Episode 4 — Generative AI Engineering / 4.18 — Building a Simple Multi Agent Workflow
4.18.a — Multi-Agent Pipeline Design
In one sentence: A multi-agent pipeline breaks a complex AI task into specialized agents connected in a chain, where each agent has a single responsibility, communicates through schema-validated data contracts, and the overall system is easier to build, test, debug, and improve than a monolithic single-agent approach.
Navigation: ← 4.18 Overview · 4.18.b — Hinge Direction: Profile Pipeline →
1. What Is a Multi-Agent Pipeline?
A multi-agent pipeline is an architecture where multiple AI agents are connected in sequence (or in parallel), each performing a specific sub-task and passing its output to the next agent as input.
Think of it like an assembly line in a factory:
- Worker 1 inspects the raw material
- Worker 2 shapes it
- Worker 3 paints it
- Worker 4 packages it
Each worker is specialized. Each worker only needs to know about its own job and the handoff protocol with the next station.
┌─────────────────────────────────────────────────────────────────────────┐
│ MULTI-AGENT PIPELINE │
│ │
│ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ INPUT │───►│ AGENT 1 │───►│ AGENT 2 │───►│ AGENT 3 │ │
│ │ (raw) │ │ Analyze │ │ Transform │ │ Generate │ │
│ └─────────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ Zod Schema 1 Zod Schema 2 Zod Schema 3 │
│ validates validates validates │
│ output output output │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Structured Structured FINAL OUTPUT │
│ JSON passed JSON passed (Structured JSON) │
│ to Agent 2 to Agent 3 │
└─────────────────────────────────────────────────────────────────────────┘
Why not just use one big agent?
| Concern | Single Agent | Multi-Agent Pipeline |
|---|---|---|
| Prompt complexity | One enormous prompt trying to do everything | Each prompt is short and focused |
| Reliability | One failure = total failure | Each agent can be retried independently |
| Debugging | "Something went wrong somewhere" | "Agent 2 returned invalid data" |
| Testing | Must test the whole thing | Each agent is testable in isolation |
| Iteration | Changing one behavior risks breaking others | Change one agent without affecting others |
| Token usage | Entire context needed every call | Each agent only gets what it needs |
| Model selection | One model for all tasks | Different models for different agents |
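The "retried independently" row in the table is worth making concrete: because each agent is a separate call, a flaky step can be wrapped in a retry helper without re-running anything upstream. A minimal sketch — the withRetry name and the backoff values are illustrative, not part of the lesson's runner:

```javascript
// Retry a single agent call with exponential backoff.
// Only the failing agent is re-run; earlier agents' validated
// outputs are untouched.
async function withRetry(agentFn, input, { maxAttempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await agentFn(input);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Exponential backoff: 500ms, 1000ms, 2000ms, ...
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw new Error(`Agent failed after ${maxAttempts} attempts: ${lastError.message}`);
}

// Usage: retry only the flaky middle step of a sequential pipeline
// const step2 = await withRetry(agent2, step1);
```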
2. Pipeline Architecture Patterns
2.1 Sequential Pipeline (Chain)
The simplest and most common pattern. Each agent runs after the previous one finishes.
Input ──► Agent 1 ──► Agent 2 ──► Agent 3 ──► Output
Execution order: A1 → A2 → A3
Total time: time(A1) + time(A2) + time(A3)
When to use: When each agent needs the output of the previous one.
// Sequential pipeline — basic structure
async function sequentialPipeline(input) {
const step1Result = await agent1(input);
const step2Result = await agent2(step1Result);
const step3Result = await agent3(step2Result);
return step3Result;
}
2.2 Parallel Pipeline
Multiple agents process the same input simultaneously, and results are combined.
┌──► Agent A ──┐
│ │
Input ───────┼──► Agent B ──┼──► Combiner ──► Output
│ │
└──► Agent C ──┘
Execution order: A, B, C run simultaneously
Total time: max(time(A), time(B), time(C)) + time(Combiner)
When to use: When agents are independent and don't need each other's output.
// Parallel pipeline — basic structure
async function parallelPipeline(input) {
const [resultA, resultB, resultC] = await Promise.all([
agentA(input),
agentB(input),
agentC(input),
]);
return combiner(resultA, resultB, resultC);
}
2.3 Fan-Out / Fan-In
An agent splits work into multiple sub-tasks, distributes them to specialized agents, and another agent collects and merges results.
┌──► Agent B1 ──┐
│ │
Input ──► Agent A ──────┼──► Agent B2 ──┼──► Agent C ──► Output
(fan-out) │ │ (fan-in)
└──► Agent B3 ──┘
When to use: When the first agent identifies sub-tasks that can be processed independently.
// Fan-out / fan-in pipeline
async function fanOutFanIn(input) {
// Fan-out: first agent identifies sub-tasks
const subTasks = await plannerAgent(input);
// Process sub-tasks in parallel
const subResults = await Promise.all(
subTasks.map(task => workerAgent(task))
);
// Fan-in: merge results
const finalResult = await mergerAgent(subResults);
return finalResult;
}
2.4 Conditional Pipeline (Router)
A router agent decides which pipeline branch to follow based on the input.
┌──► Pipeline A ──┐
│ │
Input ──► Router ──┼──► Pipeline B ──┼──► Output
│ │
└──► Pipeline C ──┘
When to use: When different inputs require different processing paths.
// Conditional pipeline with router
async function conditionalPipeline(input) {
const route = await routerAgent(input);
switch (route.pipeline) {
case 'simple':
return await simplePipeline(input);
case 'complex':
return await complexPipeline(input);
case 'specialized':
return await specializedPipeline(input);
default:
throw new Error(`Unknown route: ${route.pipeline}`);
}
}
Pattern Comparison
| Pattern | Latency | Complexity | Use Case |
|---|---|---|---|
| Sequential | Sum of all agents | Low | Each step depends on previous |
| Parallel | Max of all agents | Medium | Independent sub-tasks |
| Fan-out/Fan-in | Fan-out + max parallel + fan-in | High | Dynamic sub-task decomposition |
| Conditional | Router + selected branch | Medium | Input-dependent processing |
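The latency rows in the table — sum of all agents versus max of all agents — can be demonstrated with simulated agents that just sleep. The delay values here are arbitrary stand-ins for real LLM calls:

```javascript
// Simulated agents with fixed delays, to compare pipeline latency.
const fakeAgent = (label, delayMs) => async (input) =>
  new Promise(resolve => setTimeout(() => resolve(`${label}(${input})`), delayMs));

async function timed(fn) {
  const start = Date.now();
  await fn();
  return Date.now() - start;
}

async function compareLatency() {
  const a = fakeAgent('A', 300);
  const b = fakeAgent('B', 200);
  const c = fakeAgent('C', 100);

  // Sequential: roughly 300 + 200 + 100 = ~600ms
  const sequentialMs = await timed(async () => c(await b(await a('x'))));

  // Parallel: roughly max(300, 200, 100) = ~300ms
  const parallelMs = await timed(() => Promise.all([a('x'), b('x'), c('x')]));

  return { sequentialMs, parallelMs };
}
```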
3. Data Flow Between Agents
The critical design decision in any multi-agent pipeline is: what data does each agent receive, and what data does each agent produce?
3.1 Direct Pass-Through
Each agent receives only the output of the previous agent.
Agent 1 output ──► Agent 2 input
Agent 2 output ──► Agent 3 input
// Direct pass-through
const step1 = await agent1(rawInput);
const step2 = await agent2(step1); // only sees step1's output
const step3 = await agent3(step2); // only sees step2's output
Advantage: Simple, and each agent has minimal context. Disadvantage: Later agents lose access to the original input.
3.2 Accumulated Context
Each agent receives the original input PLUS all previous outputs.
Agent 1 receives: original input
Agent 2 receives: original input + Agent 1 output
Agent 3 receives: original input + Agent 1 output + Agent 2 output
// Accumulated context
const step1 = await agent1(rawInput);
const step2 = await agent2({ original: rawInput, analysis: step1 });
const step3 = await agent3({ original: rawInput, analysis: step1, transformed: step2 });
Advantage: Later agents have full context for better decisions. Disadvantage: Growing context means more tokens and potential confusion.
3.3 Selective Context
Each agent receives only the specific fields it needs from previous steps.
// Selective context — each agent gets only what it needs
const step1 = await agent1(rawInput);
const step2 = await agent2({
strengths: step1.strengths, // only specific fields
weaknesses: step1.weaknesses,
});
const step3 = await agent3({
improvedBio: step2.bio, // only what agent 3 needs
originalName: rawInput.name,
});
Advantage: Minimal token usage, agents stay focused. Disadvantage: Must carefully design what each agent needs.
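One way to make that "carefully design what each agent needs" step explicit is to give each step a selector function that picks its input out of an accumulating context. This is a sketch, not the lesson's runner; the runWithSelectors name and step shape are assumptions:

```javascript
// Selective context expressed as per-step input selectors.
// Each step declares exactly which fields it needs; the runner
// never forwards anything a step did not ask for.
async function runWithSelectors(steps, rawInput) {
  const context = { original: rawInput };
  for (const step of steps) {
    const input = step.selectInput(context);    // pick only the needed fields
    context[step.name] = await step.run(input); // store output under the step's name
  }
  return context;
}
```

Because each selectInput is plain code, the data dependency of every step is visible in one place and easy to review.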
4. Designing Agent Responsibilities (Single Responsibility Principle)
The Single Responsibility Principle (SRP) from software engineering applies directly to agent design:
Each agent should have one reason to change — it should do one thing, and do it well.
Good Decomposition
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ AGENT 1 │ │ AGENT 2 │ │ AGENT 3 │
│ "Analyze" │ │ "Transform" │ │ "Generate" │
│ │ │ │ │ │
│ - Read input │ │ - Take analysis │ │ - Take improved │
│ - Identify │ │ - Apply changes │ │ data │
│ patterns │ │ - Produce │ │ - Create final │
│ - Score/rate │ │ improved │ │ deliverable │
│ - Return │ │ version │ │ - Format output │
│ analysis │ │ - Return result │ │ - Return result │
└──────────────────┘ └──────────────────┘ └──────────────────┘
ONE JOB ONE JOB ONE JOB
Bad Decomposition (Anti-patterns)
ANTI-PATTERN 1: Agent does too much
┌─────────────────────────────────────────────────────────┐
│ Agent 1: Analyze AND transform AND generate AND format │
│ (This is just a single agent pretending to be a │
│ pipeline — all the problems of monolithic design) │
└─────────────────────────────────────────────────────────┘
ANTI-PATTERN 2: Agent does too little
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Read name │ │ Read age │ │ Read bio │ │ Combine │ │ Format │
│ │ │ │ │ │ │ fields │ │ output │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
(Unnecessary overhead — these should be one agent)
ANTI-PATTERN 3: Agents have overlapping responsibilities
┌──────────────────┐      ┌─────────────────────┐
│ Agent 1: Analyze │      │ Agent 2: Re-analyze │
│ AND improve bio  │      │ AND generate        │
└──────────────────┘      └─────────────────────┘
(Overlapping "analyze" work — unclear ownership)
How to Decide Agent Boundaries
Ask these questions:
- Can this step be tested independently? If yes, it's a good agent boundary.
- Does this step need different context than the previous step? If yes, separate agent.
- Might I want to swap the model for this step? If yes, separate agent (e.g., use GPT-4o for analysis, GPT-4o-mini for formatting).
- Does this step have a clearly different output schema? If yes, separate agent.
- Would combining this with the next step make the prompt too complex? If yes, keep them separate.
5. Schema Contracts Between Agents
A schema contract is a formal definition of what data an agent produces and what data the next agent expects. In our pipelines, we use Zod schemas as contracts.
┌─────────────────────────────────────────────────────────────────────────┐
│ SCHEMA CONTRACTS │
│ │
│ Agent 1 Agent 2 Agent 3 │
│ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │INPUT │ Schema A │INPUT │ Schema B │INPUT │ Schema C │
│ │ │──validates──► │ │──validates──► │ │──validates──► │
│ │ │ Agent 1 │ │ Agent 2 │ │ Agent 3 │
│ └──────┘ output └──────┘ output └──────┘ output │
│ │
│ If Agent 1's output doesn't match Schema A → FAIL FAST │
│ Don't let bad data propagate through the pipeline │
└─────────────────────────────────────────────────────────────────────────┘
Defining Contracts in Code
import { z } from 'zod';
// Contract: What Agent 1 MUST produce
const Agent1OutputSchema = z.object({
strengths: z.array(z.string()).min(1),
weaknesses: z.array(z.string()).min(1),
overallScore: z.number().min(1).max(10),
summary: z.string().min(10),
});
// Contract: What Agent 2 MUST produce
const Agent2OutputSchema = z.object({
improvedContent: z.string().min(20),
changesApplied: z.array(z.string()).min(1),
improvementScore: z.number().min(1).max(10),
});
// Contract: What Agent 3 MUST produce (final output)
const Agent3OutputSchema = z.object({
finalDeliverables: z.array(z.string()).min(1),
metadata: z.object({
processingSteps: z.number(),
totalAgentsUsed: z.number(),
}),
});
Why Schema Contracts Matter
- Fail fast: If Agent 1 produces garbage, you know immediately — not three agents later.
- Independent development: Two developers can work on Agent 1 and Agent 2 independently, as long as they agree on the schema contract.
- Easy testing: Mock Agent 1's output with any data matching Schema A, test Agent 2 in isolation.
- Self-documenting: The schema IS the documentation of what flows between agents.
- Type safety: With z.infer, TypeScript knows the exact shape at every step.
6. Building a Generic Pipeline Runner
Before diving into specific examples (sections b and c), here is a reusable pipeline runner that works with any set of agents:
import { z } from 'zod';
import OpenAI from 'openai';
const client = new OpenAI();
/**
 * Creates an agent definition: name, system prompt, output schema, model.
 */
function createAgent({ name, systemPrompt, outputSchema, model = 'gpt-4o' }) {
return {
name,
systemPrompt,
outputSchema,
model,
};
}
/**
* Runs a single agent: sends a prompt, parses JSON, validates with Zod.
*/
async function runAgent(agent, input) {
console.log(`\n--- Running Agent: ${agent.name} ---`);
console.log(`Input: ${JSON.stringify(input).substring(0, 200)}...`);
const response = await client.chat.completions.create({
model: agent.model,
temperature: 0.7,
messages: [
{ role: 'system', content: agent.systemPrompt },
{ role: 'user', content: JSON.stringify(input) },
],
});
const rawOutput = response.choices[0].message.content;
if (!rawOutput) {
throw new Error(`Agent "${agent.name}" returned empty response`);
}
// Parse JSON from response
let parsed;
try {
parsed = JSON.parse(rawOutput);
} catch {
// Try to extract JSON from markdown code blocks
const jsonMatch = rawOutput.match(/```(?:json)?\s*([\s\S]*?)```/);
if (jsonMatch) {
parsed = JSON.parse(jsonMatch[1].trim());
} else {
throw new Error(
`Agent "${agent.name}" returned invalid JSON:\n${rawOutput.substring(0, 500)}`
);
}
}
// Validate with Zod schema
const validated = agent.outputSchema.parse(parsed);
console.log(`Agent "${agent.name}" output validated successfully.`);
return validated;
}
/**
* Runs a sequential pipeline of agents.
* Each agent receives the output of the previous agent (plus optional context).
*/
async function runPipeline(agents, initialInput) {
console.log(`\n========== PIPELINE START ==========`);
console.log(`Agents in pipeline: ${agents.map(a => a.name).join(' → ')}`);
let currentData = initialInput;
const pipelineLog = [];
for (const agent of agents) {
const startTime = Date.now();
try {
const result = await runAgent(agent, currentData);
const duration = Date.now() - startTime;
pipelineLog.push({
agent: agent.name,
status: 'success',
duration,
output: result,
});
currentData = result;
} catch (error) {
const duration = Date.now() - startTime;
pipelineLog.push({
agent: agent.name,
status: 'failed',
duration,
error: error.message,
});
throw new Error(
`Pipeline failed at agent "${agent.name}": ${error.message}`
);
}
}
console.log(`\n========== PIPELINE COMPLETE ==========`);
return {
result: currentData,
log: pipelineLog,
};
}
export { createAgent, runAgent, runPipeline };
How the Generic Runner Works
runPipeline([agent1, agent2, agent3], input)
│
├──► runAgent(agent1, input)
│ ├── Send to LLM with agent1.systemPrompt
│ ├── Parse JSON response
│ ├── Validate with agent1.outputSchema (Zod)
│ └── Return validated output
│
├──► runAgent(agent2, agent1Output)
│ ├── Send to LLM with agent2.systemPrompt
│ ├── Parse JSON response
│ ├── Validate with agent2.outputSchema (Zod)
│ └── Return validated output
│
└──► runAgent(agent3, agent2Output)
├── Send to LLM with agent3.systemPrompt
├── Parse JSON response
├── Validate with agent3.outputSchema (Zod)
└── Return validated output → FINAL RESULT
7. Architecture Diagrams for Common Use Cases
Content Creation Pipeline
┌──────────┐ ┌──────────────┐ ┌────────────┐ ┌──────────────┐
│ Topic │───►│ Researcher │───►│ Writer │───►│ Editor │
│ Input │ │ Agent │ │ Agent │ │ Agent │
│ │ │ │ │ │ │ │
│ "Write │ │ Finds key │ │ Writes │ │ Polishes │
│ about │ │ facts, │ │ draft │ │ grammar, │
│ topic X" │ │ stats, │ │ article │ │ tone, │
│ │ │ sources │ │ from │ │ structure │
│ │ │ │ │ research │ │ │
└──────────┘ └──────────────┘ └────────────┘ └──────────────┘
Schema: facts[] Schema: draft Schema: final
sources[] sections[] article
keyPoints[] wordCount readability
Customer Support Pipeline
┌──────────┐ ┌──────────────┐ ┌────────────┐ ┌──────────────┐
│ Customer │───►│ Classifier │───►│ Resolver │───►│ Response │
│ Message │ │ Agent │ │ Agent │ │ Generator │
│ │ │ │ │ │ │ │
│ "My │ │ Categorizes │ │ Looks up │ │ Writes │
│ order │ │ intent: │ │ relevant │ │ friendly │
│ hasn't │ │ shipping, │ │ policies, │ │ response │
│ arrived" │ │ billing, │ │ solutions │ │ with │
│ │ │ technical │ │ │ │ resolution │
└──────────┘ └──────────────┘ └────────────┘ └──────────────┘
Schema: category Schema: solution Schema: response
urgency steps[] tone
sentiment confidence followUp
Data Processing Pipeline
┌──────────┐ ┌──────────────┐ ┌────────────┐ ┌──────────────┐
│ Raw │───►│ Extractor │───►│ Enricher │───►│ Formatter │
│ Data │ │ Agent │ │ Agent │ │ Agent │
│ │ │ │ │ │ │ │
│ Messy │ │ Pulls out │ │ Adds │ │ Produces │
│ text, │ │ structured │ │ context, │ │ final │
│ PDFs, │ │ fields │ │ categories │ │ clean │
│ emails │ │ from raw │ │ scores │ │ output │
└──────────┘ └──────────────┘ └────────────┘ └──────────────┘
Schema: fields{} Schema: enriched Schema: formatted
entities[] categories[] output{}
rawValues scores report
8. Choosing the Right Number of Agents
A common question: how many agents should my pipeline have?
Guidelines
| Agents | When It Makes Sense |
|---|---|
| 2 | Simple transform: analyze → generate. Example: sentiment analysis → response generation |
| 3 | Standard pipeline: analyze → transform → output. This is the most common pattern |
| 4-5 | Complex workflows with distinct phases. Example: classify → research → draft → edit → format |
| 6+ | Only for genuinely complex workflows. Consider if some agents can be merged |
Rules of Thumb
- Start with 2-3 agents. Add more only when you have evidence that a step needs to be split.
- Each agent call = ~1-5 seconds latency + API cost. More agents = more time + more money.
- If two agents always run together and never need to be tested/changed independently, merge them.
- If one agent's prompt is getting too long (>500 words of instructions), consider splitting it.
9. Key Takeaways
- A multi-agent pipeline decomposes complex AI tasks into specialized agents connected in sequence, parallel, or hybrid patterns.
- Single Responsibility Principle applies to agents — each agent should do one thing well.
- Schema contracts (Zod) between agents ensure data integrity and enable independent testing.
- Four main patterns: sequential (most common), parallel (for independent tasks), fan-out/fan-in (for dynamic sub-tasks), and conditional/router (for input-dependent paths).
- Data flow choices — direct pass-through, accumulated context, or selective context — depend on what each agent needs.
- A generic pipeline runner can execute any set of agents, handling JSON parsing, Zod validation, logging, and error reporting.
- Start simple (2-3 agents) and only add complexity when you have evidence it's needed.
Explain-It Challenge
- A junior developer asks: "Why can't I just use one really good prompt instead of three agents?" Explain when and why multi-agent pipelines outperform single agents.
- Draw (or describe) the data flow for a multi-agent pipeline that takes a job posting and produces: (a) a skills analysis, (b) interview questions based on those skills, and (c) a scoring rubric. What schemas would you define at each step?
- Explain the difference between sequential and fan-out/fan-in pipelines. Give a real-world example where fan-out/fan-in would be significantly better than sequential.
Navigation: ← 4.18 Overview · 4.18.b — Hinge Direction: Profile Pipeline →