Episode 4 — Generative AI Engineering / 4.18 — Building a Simple Multi-Agent Workflow

4.18 — Building a Simple Multi-Agent Workflow: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps -- reopen README.md then 4.18.a...4.18.d.
  3. Practice -- 4.18-Exercise-Questions.md.
  4. Polish answers -- 4.18-Interview-Questions.md.

Core vocabulary

| Term | One-liner |
| --- | --- |
| Multi-agent pipeline | Multiple specialized agents connected in sequence, each with one job |
| Schema contract | Zod schema defining the exact shape of data between two agents |
| Sequential pipeline | Agents run one after another; latency = sum of all |
| Parallel pipeline | Independent agents run simultaneously; latency = max (slowest) |
| Fan-out/fan-in | Planner splits work, workers run in parallel, merger collects results |
| Conditional (router) | Router agent picks which pipeline branch to run |
| Single Responsibility | Each agent does one thing well -- analyze OR transform OR generate |
| Selective context | Each agent receives only the specific fields it needs |
| Accumulated context | Each agent receives original input + all previous outputs (token-heavy) |
| Validation feedback retry | Feeding Zod errors back to the LLM so it can self-correct |
| Fail fast | Stop pipeline on first error; never return bad data |
| Graceful degradation | Cascade through cheaper/simpler approaches when primary fails |
| Fallback response | Pre-defined, schema-valid output used when an agent fails |
| PipelineError | Custom error class with agentName, stepNumber, isRetryable |
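
The PipelineError shape from the vocabulary above can be sketched as a small Error subclass. This is a minimal sketch assuming only the three fields named in the table (agentName, stepNumber, isRetryable); the constructor signature is hypothetical.

```javascript
// Hypothetical PipelineError: wraps any agent failure with enough
// context for the caller to decide whether the step should be retried.
class PipelineError extends Error {
  constructor(message, { agentName, stepNumber, isRetryable = false, cause } = {}) {
    super(message);
    this.name = "PipelineError";
    this.agentName = agentName;     // which agent threw
    this.stepNumber = stepNumber;   // position in the pipeline (1-based)
    this.isRetryable = isRetryable; // e.g. true for HTTP 429, false for schema errors
    this.cause = cause;             // the original low-level error, if any
  }
}

// Usage: wrap a rate-limit failure so callers can branch on isRetryable.
const err = new PipelineError("rate limited", {
  agentName: "analyzer", stepNumber: 1, isRetryable: true,
});
```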

Multi-agent pipeline architecture

┌────────────────────────────────────────────────────────────────────┐
│  Input ──► Agent 1 (Analyze) ──► Agent 2 (Transform) ──► Agent 3  │
│               │                     │                    (Generate)│
│           Zod Schema 1          Zod Schema 2          Zod Schema 3│
│           validates             validates              validates   │
│               │                     │                     │        │
│               ▼                     ▼                     ▼        │
│          Structured            Structured            FINAL OUTPUT  │
│          JSON passed           JSON passed           (Structured   │
│          to Agent 2            to Agent 3             JSON)        │
└────────────────────────────────────────────────────────────────────┘

KEY INSIGHT: Same pattern for ANY domain.
  Hinge:    Profile Analyzer → Bio Improver → Conversation Starter
  ImageKit: Metadata Extractor → SEO Optimizer → Tag Categorizer
  Generic:  Analyze → Transform → Generate

Pipeline patterns at a glance

| Pattern | Diagram | Latency | When to Use |
| --- | --- | --- | --- |
| Sequential | A → B → C | T_A + T_B + T_C | Each step needs previous output |
| Parallel | A, B, C (simultaneous) | max(T_A, T_B, T_C) | Agents are independent |
| Fan-out/fan-in | P → [W1, W2, W3] → M | T_P + max(T_W) + T_M | Dynamic sub-task decomposition |
| Conditional | Router → branch | T_R + T_branch | Different inputs, different paths |

// Sequential
const s1 = await agent1(input);
const s2 = await agent2(s1);
const s3 = await agent3(s2);

// Parallel
const [a, b, c] = await Promise.all([
  agentA(input), agentB(input), agentC(input)
]);

// Fan-out/fan-in
const tasks = await planner(input);
const results = await Promise.all(tasks.map(t => worker(t)));
const merged = await merger(results);

// Conditional
const route = await router(input);
switch (route) { case "A": return pipelineA(input); ... }

Data flow patterns

DIRECT PASS-THROUGH:
  Agent 1 output → Agent 2 input → Agent 3 input
  Simple. Later agents lose original input.

ACCUMULATED CONTEXT:
  Agent 2 gets: original + Agent 1 output
  Agent 3 gets: original + Agent 1 output + Agent 2 output
  Full context. Token-heavy. Can confuse agents.

SELECTIVE CONTEXT (recommended):
  Agent 2 gets: original.bio + analysis.weaknesses + analysis.tips
  Agent 3 gets: improvedBio + original.interests + original.name
  Minimal tokens. Focused agents. Best results.
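
The selective-context idea above can be sketched as a small helper that pulls only the named fields out of the pipeline state. The `selectContext` helper and the `state` shape are illustrative; the field names follow the Hinge example above.

```javascript
// Selective context: build each agent's input from named fields only,
// instead of forwarding the whole accumulated state.
function selectContext(state, fields) {
  const input = {};
  for (const path of fields) {
    const [obj, key] = path.split(".");
    input[key] = state[obj]?.[key];
  }
  return input;
}

const state = {
  original: { bio: "I like hiking", interests: ["hiking"], name: "Sam" },
  analysis: { weaknesses: ["too short"], tips: ["add detail"], overallScore: 4 },
};

// Agent 2 sees exactly three fields -- not overallScore, not name.
const agent2Input = selectContext(state, [
  "original.bio", "analysis.weaknesses", "analysis.tips",
]);
```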

Validation between agents

Agent runs → JSON output → Parse JSON → Zod validates
                                            │
                                       ┌────┴────┐
                                       │         │
                                    PASS      FAIL
                                       │         │
                                  Continue    Fail fast OR
                                  pipeline    retry with
                                              validation feedback

What Zod catches

Missing fields:     ZodError: Required at "weaknesses"
Wrong types:        ZodError: Expected number, received string at "overallScore"
Invalid enums:      ZodError: Invalid enum value at "category"
Out-of-range:       ZodError: Number must be <= 10 at "overallScore"
Array too short:    ZodError: Array must contain at least 1 element(s)
String too short:   ZodError: String must contain at least 20 character(s)

Schema contract pattern

import { z } from 'zod';

const Agent1Output = z.object({
  strengths: z.array(z.object({
    category: z.enum(["bio", "interests", "photos"]),
    description: z.string().min(10),
    impactScore: z.number().min(1).max(10),
  })).min(1),
  overallScore: z.number().min(1).max(10),
});

// Validate immediately after agent runs
const validated = Agent1Output.parse(parsed);  // throws on failure
// OR
const result = Agent1Output.safeParse(parsed); // result.success + result.error

Error handling strategies

Five error types (easiest → hardest to detect)

1. LLM API error       (HTTP 429, 500, 503, timeout)
2. Empty response       (null, undefined, empty string)
3. JSON parse error     (text instead of JSON, truncated JSON)
4. Zod validation error (missing fields, wrong types, out-of-range)
5. Semantic error       (valid JSON, correct types, but WRONG content)

Three failure strategies

| Strategy | Returns | Best For |
| --- | --- | --- |
| Fail fast | Nothing (error thrown) | Data pipelines, accuracy-critical |
| Partial results | Completed steps only | Development, debugging |
| Fallback | Always something | User-facing products |
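
Two of these strategies can be sketched as thin wrappers around one agent call. `agent` and `fallback` are hypothetical stand-ins; as noted elsewhere in this sheet, a real fallback must already satisfy the step's Zod schema.

```javascript
// "Fallback" strategy: swallow the failure and return a pre-defined,
// schema-valid default so user-facing code always gets something.
async function withFallback(agent, input, fallback) {
  try {
    return await agent(input);
  } catch {
    return fallback;
  }
}

// "Fail fast" strategy: no wrapper logic at all -- let the error
// propagate so a data pipeline never emits bad output.
function failFast(agent, input) {
  return agent(input);
}
```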

Retry strategies

SIMPLE RETRY:
  Retry N times with linear delay. No learning.

EXPONENTIAL BACKOFF:
  Wait 1s, 2s, 4s, 8s + random jitter. For rate limits.

VALIDATION FEEDBACK (best for Zod errors):
  Feed Zod error messages back to LLM as follow-up.
  LLM reads error and self-corrects.
  Does NOT help with API errors or semantic errors.
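
The validation-feedback loop above can be sketched with a mock LLM call and a Zod-style validator exposing `safeParse()`. Everything here is a stand-in: `callLLM` is any function returning a JSON string, and `schema` only needs the `safeParse` shape.

```javascript
// Validation feedback retry: on a failed parse, append the validation
// error as a follow-up user turn so the next attempt can self-correct.
async function retryWithFeedback(callLLM, schema, messages, maxRetries = 2) {
  let msgs = [...messages];
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callLLM(msgs);
    const result = schema.safeParse(JSON.parse(raw));
    if (result.success) return result.data;
    msgs = [...msgs, {
      role: "user",
      content: `Your JSON failed validation: ${result.error}. Fix it and resend.`,
    }];
  }
  throw new Error("validation failed after retries");
}
```

Note the caveat from the list above: this only fixes schema violations; API errors and semantically wrong content need other strategies.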

Graceful degradation ladder

Level 1:  Full pipeline with GPT-4o           (best quality)
Level 2:  Full pipeline with GPT-4o-mini       (faster, cheaper)
Level 3:  Simplified pipeline (fewer agents)    (reduced quality)
Level 4:  Rule-based heuristics (no LLM)       (basic but reliable)
Level 5:  Static fallback (hardcoded default)   (guaranteed to work)

Every level MUST produce output that passes the same Zod schema.
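
The ladder can be sketched as a cascade over a list of producers, best first. `levels` and `validate` are hypothetical; `validate` stands in for checking the shared Zod schema so that, per the rule above, every level's output looks identical to consumers.

```javascript
// Graceful degradation: try each level in order; return the first
// output that passes the shared validator, falling through on errors.
async function degrade(levels, validate) {
  for (const level of levels) {
    try {
      const out = await level();
      if (validate(out)) return out;
    } catch {
      // this level failed -- fall through to the next, cheaper one
    }
  }
  throw new Error("all degradation levels failed");
}
```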

Temperature strategy

HINGE PIPELINE (dating profiles):
  Agent 1 (Analyzer):     0.7  analytical but needs creative insight
  Agent 2 (Bio Writer):   0.8  writing needs creativity
  Agent 3 (Openers):      0.9  maximum creativity for conversation
  Pattern: monotonically increasing (progressively more creative)

IMAGEKIT PIPELINE (image SEO):
  Agent 1 (Extractor):    0.5  factual extraction, consistency matters
  Agent 2 (SEO):          0.7  creative but accurate titles
  Agent 3 (Tagger):       0.6  comprehensive but consistent tags
  Pattern: non-monotonic (analytical → creative → balanced)

RULE: Temperature follows the task, not the position in the pipeline.
  Factual/analytical → low (0.3-0.5)
  Balanced → medium (0.6-0.7)
  Creative/generative → high (0.8-0.9)
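
The rule can be written as a lookup keyed by task type rather than pipeline position. The task labels and the exact midpoint values here are illustrative, not a fixed taxonomy.

```javascript
// Temperature follows the task, not the position in the pipeline.
// Values are midpoints of the ranges given above (assumed defaults).
const TEMPERATURE = { factual: 0.4, balanced: 0.65, creative: 0.85 };

function tempFor(taskType) {
  const t = TEMPERATURE[taskType];
  if (t === undefined) throw new Error(`unknown task type: ${taskType}`);
  return t;
}
```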

When to use multi-agent pipelines

USE MULTI-AGENT WHEN:
  - Task has genuinely different reasoning steps
  - A single prompt produces inconsistent results
  - You need different models/temperatures per step
  - You need independent testing per step
  - Different team members own different steps

DON'T USE MULTI-AGENT WHEN:
  - Single well-crafted prompt produces equivalent quality
  - Latency budget is under 1-2 seconds
  - Cost increase not justified by quality gain
  - Task needs no LLM at all (code, regex, DB lookup)

AGENT COUNT GUIDELINES:
  2 agents:  Simple analyze → generate
  3 agents:  Standard analyze → transform → generate (most common)
  4-5 agents: Complex workflows with distinct phases
  6+ agents:  Rare. Consider merging some.

Common gotchas

| Gotcha | Fix |
| --- | --- |
| No validation between agents | Zod .parse() after every agent |
| Passing entire state to every agent | Selective context -- each agent gets only what it needs |
| Same temperature for all agents | Tune per agent: analytical=low, creative=high |
| Promise.all for batch processing | Use Promise.allSettled -- isolate failures per item |
| No logging in pipeline | Log agent name, duration, status, retries at every step |
| Monolithic "do everything" agent | Split by SRP -- one job per agent |
| Fallbacks that break downstream agents | Fallbacks must pass the same Zod schema |
| No retry for Zod errors | Validation feedback retry -- feed errors back to LLM |
| Merging agents that need different models | Keep separate if model/temperature needs differ |
| Building multi-agent before trying single prompt | Always benchmark single prompt first |
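
The Promise.all gotcha is worth a concrete sketch: with Promise.all, one rejected item rejects the whole batch; Promise.allSettled isolates failures per item. `processBatch` is an illustrative helper, not a library API.

```javascript
// Batch processing with Promise.allSettled: collect successes and
// failures separately instead of losing the whole batch to one error.
async function processBatch(items, processOne) {
  const settled = await Promise.allSettled(items.map(processOne));
  return {
    ok: settled.filter((s) => s.status === "fulfilled").map((s) => s.value),
    failed: settled
      .map((s, i) => (s.status === "rejected"
        ? { item: items[i], reason: String(s.reason) }
        : null))
      .filter(Boolean),
  };
}
```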

Reusable pipeline runner (minimal)

async function callAgent(name, prompt, input, schema, temp = 0.7) {
  const res = await client.chat.completions.create({
    model: "gpt-4o", temperature: temp,
    messages: [
      { role: "system", content: prompt },
      { role: "user", content: JSON.stringify(input) },
    ],
  });
  const raw = res.choices[0].message.content;
  if (!raw) throw new Error(`${name}: empty response`);  // error type 2
  let parsed;
  try { parsed = JSON.parse(raw); } catch {
    const m = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
    if (m) parsed = JSON.parse(m[1].trim());
    else throw new Error(`${name}: invalid JSON`);
  }
  return schema.parse(parsed);  // Zod validates
}

// Sequential pipeline
async function runPipeline(agents, input) {
  let data = input;
  for (const a of agents) {
    data = await callAgent(a.name, a.prompt, data, a.schema, a.temp);
  }
  return data;
}
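
The gotcha list above also calls for per-step logging. A hedged sketch: wrap any agent call to record name, duration, and status. The `log` parameter is injectable so tests can capture entries; the log shape is an assumption.

```javascript
// Per-step logging wrapper: records agent name, duration (ms), and
// status for every step, on both success and failure paths.
async function withLogging(name, fn, log = console.log) {
  const start = Date.now();
  try {
    const out = await fn();
    log({ agent: name, ms: Date.now() - start, status: "ok" });
    return out;
  } catch (err) {
    log({ agent: name, ms: Date.now() - start, status: "error", error: String(err) });
    throw err; // preserve fail-fast behavior after logging
  }
}
```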

Testing checklist

UNIT (per agent):
  [ ] Valid output matches Zod schema
  [ ] Missing fields rejected
  [ ] Invalid enums rejected
  [ ] Out-of-range numbers rejected
  [ ] Empty/null responses handled
  [ ] JSON in markdown code blocks extracted

INTEGRATION (agent-to-agent):
  [ ] Agent 1 output is valid Agent 2 input
  [ ] Agent 2 output is valid Agent 3 input
  [ ] Selective context extracts correct fields

END-TO-END (full pipeline):
  [ ] Happy path produces valid final output
  [ ] Each agent failure handled gracefully
  [ ] Fallback responses pass schema validation
  [ ] Pipeline metadata (duration, agentCount) correct
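
The "JSON in markdown code blocks extracted" unit item can be tested in isolation by pulling the fence-stripping logic out of the runner into a standalone helper (same regex as `callAgent` above; the helper itself is illustrative).

```javascript
// Models often wrap JSON in markdown fences; try a plain parse first,
// then strip a ```json ... ``` block before giving up.
function extractJSON(raw) {
  try { return JSON.parse(raw); } catch {}
  const m = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (m) return JSON.parse(m[1].trim());
  throw new Error("no JSON found in response");
}
```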

Quick mental model

Multi-agent pipeline =
  Input
    → Agent 1 (analyze, validate with Zod)
    → Agent 2 (transform, validate with Zod)
    → Agent 3 (generate, validate with Zod)
    → Structured output

Each agent: one job, one schema, one temperature, independently testable.
Between agents: Zod validation, selective context, error handling.
Pattern is domain-agnostic: works for dating profiles, image SEO, code review, etc.

End of 4.18 quick revision.