Episode 4 — Generative AI Engineering / 4.18 — Building a Simple Multi-Agent Workflow

4.18 — Building a Simple Multi-Agent Workflow: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps -- reopen README.md then 4.18.a...4.18.d.
  3. Practice -- 4.18-Exercise-Questions.md.
  4. Polish answers -- 4.18-Interview-Questions.md.

Core vocabulary

| Term | One-liner |
| --- | --- |
| Multi-agent pipeline | Multiple specialized agents connected in sequence, each with one job |
| Schema contract | Zod schema defining the exact shape of data between two agents |
| Sequential pipeline | Agents run one after another; latency = sum of all |
| Parallel pipeline | Independent agents run simultaneously; latency = max (slowest) |
| Fan-out/fan-in | Planner splits work, workers run in parallel, merger collects results |
| Conditional (router) | Router agent picks which pipeline branch to run |
| Single Responsibility | Each agent does one thing well -- analyze OR transform OR generate |
| Selective context | Each agent receives only the specific fields it needs |
| Accumulated context | Each agent receives original input + all previous outputs (token-heavy) |
| Validation feedback retry | Feeding Zod errors back to the LLM so it can self-correct |
| Fail fast | Stop pipeline on first error; never return bad data |
| Graceful degradation | Cascade through cheaper/simpler approaches when primary fails |
| Fallback response | Pre-defined, schema-valid output used when an agent fails |
| PipelineError | Custom error class with agentName, stepNumber, isRetryable |
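
The PipelineError shape from the vocabulary above can be sketched as a small Error subclass. This is a minimal sketch assuming only the three fields named in the table (agentName, stepNumber, isRetryable); the constructor signature is hypothetical.

```javascript
// Hypothetical PipelineError: wraps any agent failure with enough
// context for the caller to decide whether the step should be retried.
class PipelineError extends Error {
  constructor(message, { agentName, stepNumber, isRetryable = false, cause } = {}) {
    super(message);
    this.name = "PipelineError";
    this.agentName = agentName;     // which agent threw
    this.stepNumber = stepNumber;   // position in the pipeline (1-based)
    this.isRetryable = isRetryable; // e.g. true for HTTP 429, false for schema errors
    this.cause = cause;             // the original low-level error, if any
  }
}

// Usage: wrap a rate-limit failure so callers can branch on isRetryable.
const err = new PipelineError("rate limited", {
  agentName: "analyzer", stepNumber: 1, isRetryable: true,
});
```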

Multi-agent pipeline architecture

┌────────────────────────────────────────────────────────────────────┐
│  Input ──► Agent 1 (Analyze) ──► Agent 2 (Transform) ──► Agent 3  │
│               │                     │                    (Generate)│
│           Zod Schema 1          Zod Schema 2          Zod Schema 3│
│           validates             validates              validates   │
│               │                     │                     │        │
│               ▼                     ▼                     ▼        │
│          Structured            Structured            FINAL OUTPUT  │
│          JSON passed           JSON passed           (Structured   │
│          to Agent 2            to Agent 3             JSON)        │
└────────────────────────────────────────────────────────────────────┘

KEY INSIGHT: Same pattern for ANY domain.
  Hinge:    Profile Analyzer → Bio Improver → Conversation Starter
  ImageKit: Metadata Extractor → SEO Optimizer → Tag Categorizer
  Generic:  Analyze → Transform → Generate

Pipeline patterns at a glance

| Pattern | Diagram | Latency | When to Use |
| --- | --- | --- | --- |
| Sequential | A → B → C | T_A + T_B + T_C | Each step needs previous output |
| Parallel | A, B, C (simultaneous) | max(T_A, T_B, T_C) | Agents are independent |
| Fan-out/fan-in | P → [W1, W2, W3] → M | T_P + max(T_W) + T_M | Dynamic sub-task decomposition |
| Conditional | Router → branch | T_R + T_branch | Different inputs, different paths |

// Sequential
const s1 = await agent1(input);
const s2 = await agent2(s1);
const s3 = await agent3(s2);

// Parallel
const [a, b, c] = await Promise.all([
  agentA(input), agentB(input), agentC(input)
]);

// Fan-out/fan-in
const tasks = await planner(input);
const results = await Promise.all(tasks.map(t => worker(t)));
const merged = await merger(results);

// Conditional
const route = await router(input);
switch (route) { case "A": return pipelineA(input); ... }

Data flow patterns

DIRECT PASS-THROUGH:
  Agent 1 output → Agent 2 input → Agent 3 input
  Simple. Later agents lose original input.

ACCUMULATED CONTEXT:
  Agent 2 gets: original + Agent 1 output
  Agent 3 gets: original + Agent 1 output + Agent 2 output
  Full context. Token-heavy. Can confuse agents.

SELECTIVE CONTEXT (recommended):
  Agent 2 gets: original.bio + analysis.weaknesses + analysis.tips
  Agent 3 gets: improvedBio + original.interests + original.name
  Minimal tokens. Focused agents. Best results.
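
The selective-context idea above can be sketched as a small helper that pulls only the named fields out of the pipeline state. The `selectContext` helper and the `state` shape are illustrative; the field names follow the Hinge example above.

```javascript
// Selective context: build each agent's input from named fields only,
// instead of forwarding the whole accumulated state.
function selectContext(state, fields) {
  const input = {};
  for (const path of fields) {
    const [obj, key] = path.split(".");
    input[key] = state[obj]?.[key];
  }
  return input;
}

const state = {
  original: { bio: "I like hiking", interests: ["hiking"], name: "Sam" },
  analysis: { weaknesses: ["too short"], tips: ["add detail"], overallScore: 4 },
};

// Agent 2 sees exactly three fields -- not overallScore, not name.
const agent2Input = selectContext(state, [
  "original.bio", "analysis.weaknesses", "analysis.tips",
]);
```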

Validation between agents

Agent runs → JSON output → Parse JSON → Zod validates
                                            │
                                       ┌────┴────┐
                                       │         │
                                    PASS      FAIL
                                       │         │
                                  Continue    Fail fast OR
                                  pipeline    retry with
                                              validation feedback

What Zod catches

Missing fields:     ZodError: Required at "weaknesses"
Wrong types:        ZodError: Expected number, received string at "overallScore"
Invalid enums:      ZodError: Invalid enum value at "category"
Out-of-range:       ZodError: Number must be <= 10 at "overallScore"
Array too short:    ZodError: Array must contain at least 1 element(s)
String too short:   ZodError: String must contain at least 20 character(s)

Schema contract pattern

import { z } from 'zod';

const Agent1Output = z.object({
  strengths: z.array(z.object({
    category: z.enum(["bio", "interests", "photos"]),
    description: z.string().min(10),
    impactScore: z.number().min(1).max(10),
  })).min(1),
  overallScore: z.number().min(1).max(10),
});

// Validate immediately after agent runs
const validated = Agent1Output.parse(parsed);  // throws on failure
// OR
const result = Agent1Output.safeParse(parsed); // result.success + result.error

Error handling strategies

Five error types (easiest → hardest to detect)

1. LLM API error       (HTTP 429, 500, 503, timeout)
2. Empty response       (null, undefined, empty string)
3. JSON parse error     (text instead of JSON, truncated JSON)
4. Zod validation error (missing fields, wrong types, out-of-range)
5. Semantic error       (valid JSON, correct types, but WRONG content)

Three failure strategies

| Strategy | Returns | Best For |
| --- | --- | --- |
| Fail fast | Nothing (error thrown) | Data pipelines, accuracy-critical |
| Partial results | Completed steps only | Development, debugging |
| Fallback | Always something | User-facing products |
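
Two of these strategies can be sketched as thin wrappers around one agent call. `agent` and `fallback` are hypothetical stand-ins; as noted elsewhere in this sheet, a real fallback must already satisfy the step's Zod schema.

```javascript
// "Fallback" strategy: swallow the failure and return a pre-defined,
// schema-valid default so user-facing code always gets something.
async function withFallback(agent, input, fallback) {
  try {
    return await agent(input);
  } catch {
    return fallback;
  }
}

// "Fail fast" strategy: no wrapper logic at all -- let the error
// propagate so a data pipeline never emits bad output.
function failFast(agent, input) {
  return agent(input);
}
```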

Retry strategies

SIMPLE RETRY:
  Retry N times with linear delay. No learning.

EXPONENTIAL BACKOFF:
  Wait 1s, 2s, 4s, 8s + random jitter. For rate limits.

VALIDATION FEEDBACK (best for Zod errors):
  Feed Zod error messages back to LLM as follow-up.
  LLM reads error and self-corrects.
  Does NOT help with API errors or semantic errors.
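
The validation-feedback loop above can be sketched with a mock LLM call and a Zod-style validator exposing `safeParse()`. Everything here is a stand-in: `callLLM` is any function returning a JSON string, and `schema` only needs the `safeParse` shape.

```javascript
// Validation feedback retry: on a failed parse, append the validation
// error as a follow-up user turn so the next attempt can self-correct.
async function retryWithFeedback(callLLM, schema, messages, maxRetries = 2) {
  let msgs = [...messages];
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callLLM(msgs);
    const result = schema.safeParse(JSON.parse(raw));
    if (result.success) return result.data;
    msgs = [...msgs, {
      role: "user",
      content: `Your JSON failed validation: ${result.error}. Fix it and resend.`,
    }];
  }
  throw new Error("validation failed after retries");
}
```

Note the caveat from the list above: this only fixes schema violations; API errors and semantically wrong content need other strategies.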

Graceful degradation ladder

Level 1:  Full pipeline with GPT-4o           (best quality)
Level 2:  Full pipeline with GPT-4o-mini       (faster, cheaper)
Level 3:  Simplified pipeline (fewer agents)    (reduced quality)
Level 4:  Rule-based heuristics (no LLM)       (basic but reliable)
Level 5:  Static fallback (hardcoded default)   (guaranteed to work)

Every level MUST produce output that passes the same Zod schema.
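
The ladder can be sketched as a cascade over a list of producers, best first. `levels` and `validate` are hypothetical; `validate` stands in for checking the shared Zod schema so that, per the rule above, every level's output looks identical to consumers.

```javascript
// Graceful degradation: try each level in order; return the first
// output that passes the shared validator, falling through on errors.
async function degrade(levels, validate) {
  for (const level of levels) {
    try {
      const out = await level();
      if (validate(out)) return out;
    } catch {
      // this level failed -- fall through to the next, cheaper one
    }
  }
  throw new Error("all degradation levels failed");
}
```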

Temperature strategy

HINGE PIPELINE (dating profiles):
  Agent 1 (Analyzer):     0.7  analytical but needs creative insight
  Agent 2 (Bio Writer):   0.8  writing needs creativity
  Agent 3 (Openers):      0.9  maximum creativity for conversation
  Pattern: monotonically increasing (progressively more creative)

IMAGEKIT PIPELINE (image SEO):
  Agent 1 (Extractor):    0.5  factual extraction, consistency matters
  Agent 2 (SEO):          0.7  creative but accurate titles
  Agent 3 (Tagger):       0.6  comprehensive but consistent tags
  Pattern: non-monotonic (analytical → creative → balanced)

RULE: Temperature follows the task, not the position in the pipeline.
  Factual/analytical → low (0.3-0.5)
  Balanced → medium (0.6-0.7)
  Creative/generative → high (0.8-0.9)
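
The rule can be written as a lookup keyed by task type rather than pipeline position. The task labels and the exact midpoint values here are illustrative, not a fixed taxonomy.

```javascript
// Temperature follows the task, not the position in the pipeline.
// Values are midpoints of the ranges given above (assumed defaults).
const TEMPERATURE = { factual: 0.4, balanced: 0.65, creative: 0.85 };

function tempFor(taskType) {
  const t = TEMPERATURE[taskType];
  if (t === undefined) throw new Error(`unknown task type: ${taskType}`);
  return t;
}
```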

When to use multi-agent pipelines

USE MULTI-AGENT WHEN:
  - Task has genuinely different reasoning steps
  - A single prompt produces inconsistent results
  - You need different models/temperatures per step
  - You need independent testing per step
  - Different team members own different steps

DON'T USE MULTI-AGENT WHEN:
  - Single well-crafted prompt produces equivalent quality
  - Latency budget is under 1-2 seconds
  - Cost increase not justified by quality gain
  - Task needs no LLM at all (code, regex, DB lookup)

AGENT COUNT GUIDELINES:
  2 agents:  Simple analyze → generate
  3 agents:  Standard analyze → transform → generate (most common)
  4-5 agents: Complex workflows with distinct phases
  6+ agents:  Rare. Consider merging some.

Common gotchas

| Gotcha | Fix |
| --- | --- |
| No validation between agents | Zod .parse() after every agent |
| Passing entire state to every agent | Selective context -- each agent gets only what it needs |
| Same temperature for all agents | Tune per agent: analytical=low, creative=high |
| Promise.all for batch processing | Use Promise.allSettled -- isolate failures per item |
| No logging in pipeline | Log agent name, duration, status, retries at every step |
| Monolithic "do everything" agent | Split by SRP -- one job per agent |
| Fallbacks that break downstream agents | Fallbacks must pass the same Zod schema |
| No retry for Zod errors | Validation feedback retry -- feed errors back to LLM |
| Merging agents that need different models | Keep separate if model/temperature needs differ |
| Building multi-agent before trying single prompt | Always benchmark single prompt first |
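
The Promise.all gotcha is worth a concrete sketch: with Promise.all, one rejected item rejects the whole batch; Promise.allSettled isolates failures per item. `processBatch` is an illustrative helper, not a library API.

```javascript
// Batch processing with Promise.allSettled: collect successes and
// failures separately instead of losing the whole batch to one error.
async function processBatch(items, processOne) {
  const settled = await Promise.allSettled(items.map(processOne));
  return {
    ok: settled.filter((s) => s.status === "fulfilled").map((s) => s.value),
    failed: settled
      .map((s, i) => (s.status === "rejected"
        ? { item: items[i], reason: String(s.reason) }
        : null))
      .filter(Boolean),
  };
}
```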

Reusable pipeline runner (minimal)

async function callAgent(name, prompt, input, schema, temp = 0.7) {
  const res = await client.chat.completions.create({
    model: "gpt-4o", temperature: temp,
    messages: [
      { role: "system", content: prompt },
      { role: "user", content: JSON.stringify(input) },
    ],
  });
  const raw = res.choices[0].message.content;
  if (!raw) throw new Error(`${name}: empty response`);  // error type 2
  let parsed;
  try { parsed = JSON.parse(raw); } catch {
    const m = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
    if (m) parsed = JSON.parse(m[1].trim());
    else throw new Error(`${name}: invalid JSON`);
  }
  return schema.parse(parsed);  // Zod validates
}

// Sequential pipeline
async function runPipeline(agents, input) {
  let data = input;
  for (const a of agents) {
    data = await callAgent(a.name, a.prompt, data, a.schema, a.temp);
  }
  return data;
}
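
The gotcha list above also calls for per-step logging. A hedged sketch: wrap any agent call to record name, duration, and status. The `log` parameter is injectable so tests can capture entries; the log shape is an assumption.

```javascript
// Per-step logging wrapper: records agent name, duration (ms), and
// status for every step, on both success and failure paths.
async function withLogging(name, fn, log = console.log) {
  const start = Date.now();
  try {
    const out = await fn();
    log({ agent: name, ms: Date.now() - start, status: "ok" });
    return out;
  } catch (err) {
    log({ agent: name, ms: Date.now() - start, status: "error", error: String(err) });
    throw err; // preserve fail-fast behavior after logging
  }
}
```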

Testing checklist

UNIT (per agent):
  [ ] Valid output matches Zod schema
  [ ] Missing fields rejected
  [ ] Invalid enums rejected
  [ ] Out-of-range numbers rejected
  [ ] Empty/null responses handled
  [ ] JSON in markdown code blocks extracted

INTEGRATION (agent-to-agent):
  [ ] Agent 1 output is valid Agent 2 input
  [ ] Agent 2 output is valid Agent 3 input
  [ ] Selective context extracts correct fields

END-TO-END (full pipeline):
  [ ] Happy path produces valid final output
  [ ] Each agent failure handled gracefully
  [ ] Fallback responses pass schema validation
  [ ] Pipeline metadata (duration, agentCount) correct
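
The "JSON in markdown code blocks extracted" unit item can be tested in isolation by pulling the fence-stripping logic out of the runner into a standalone helper (same regex as `callAgent` above; the helper itself is illustrative).

```javascript
// Models often wrap JSON in markdown fences; try a plain parse first,
// then strip a ```json ... ``` block before giving up.
function extractJSON(raw) {
  try { return JSON.parse(raw); } catch {}
  const m = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (m) return JSON.parse(m[1].trim());
  throw new Error("no JSON found in response");
}
```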

Quick mental model

Multi-agent pipeline =
  Input
    → Agent 1 (analyze, validate with Zod)
    → Agent 2 (transform, validate with Zod)
    → Agent 3 (generate, validate with Zod)
    → Structured output

Each agent: one job, one schema, one temperature, independently testable.
Between agents: Zod validation, selective context, error handling.
Pattern is domain-agnostic: works for dating profiles, image SEO, code review, etc.

End of 4.18 quick revision.