Episode 4 — Generative AI Engineering / 4.18 — Building a Simple Multi Agent Workflow
4.18.a — Multi-Agent Pipeline Design
In one sentence: A multi-agent pipeline breaks a complex AI task into specialized agents connected in a chain, where each agent has a single responsibility, communicates through schema-validated data contracts, and the overall system is easier to build, test, debug, and improve than a monolithic single-agent approach.
Navigation: ← 4.18 Overview · 4.18.b — Hinge Direction: Profile Pipeline →
1. What Is a Multi-Agent Pipeline?
A multi-agent pipeline is an architecture where multiple AI agents are connected in sequence (or in parallel), each performing a specific sub-task and passing its output to the next agent as input.
Think of it like an assembly line in a factory:
- Worker 1 inspects the raw material
- Worker 2 shapes it
- Worker 3 paints it
- Worker 4 packages it
Each worker is specialized. Each worker only needs to know about its own job and the handoff protocol with the next station.
┌─────────────────────────────────────────────────────────────────────────┐
│ MULTI-AGENT PIPELINE │
│ │
│ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ INPUT │───►│ AGENT 1 │───►│ AGENT 2 │───►│ AGENT 3 │ │
│ │ (raw) │ │ Analyze │ │ Transform │ │ Generate │ │
│ └─────────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ Zod Schema 1 Zod Schema 2 Zod Schema 3 │
│ validates validates validates │
│ output output output │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Structured Structured FINAL OUTPUT │
│ JSON passed JSON passed (Structured JSON) │
│ to Agent 2 to Agent 3 │
└─────────────────────────────────────────────────────────────────────────┘
Why not just use one big agent?
| Concern | Single Agent | Multi-Agent Pipeline |
|---|---|---|
| Prompt complexity | One enormous prompt trying to do everything | Each prompt is short and focused |
| Reliability | One failure = total failure | Each agent can be retried independently |
| Debugging | "Something went wrong somewhere" | "Agent 2 returned invalid data" |
| Testing | Must test the whole thing | Each agent is testable in isolation |
| Iteration | Changing one behavior risks breaking others | Change one agent without affecting others |
| Token usage | Entire context needed every call | Each agent only gets what it needs |
| Model selection | One model for all tasks | Different models for different agents |
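The "retried independently" row in the table is worth making concrete: because each agent is a separate call, a flaky step can be wrapped in a retry helper without re-running anything upstream. A minimal sketch — the withRetry name and the backoff values are illustrative, not part of the lesson's runner:

```javascript
// Retry a single agent call with exponential backoff.
// Only the failing agent is re-run; earlier agents' validated
// outputs are untouched.
async function withRetry(agentFn, input, { maxAttempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await agentFn(input);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Exponential backoff: 500ms, 1000ms, 2000ms, ...
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw new Error(`Agent failed after ${maxAttempts} attempts: ${lastError.message}`);
}

// Usage: retry only the flaky middle step of a sequential pipeline
// const step2 = await withRetry(agent2, step1);
```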
2. Pipeline Architecture Patterns
2.1 Sequential Pipeline (Chain)
The simplest and most common pattern. Each agent runs after the previous one finishes.
Input ──► Agent 1 ──► Agent 2 ──► Agent 3 ──► Output
Execution order: A1 → A2 → A3
Total time: time(A1) + time(A2) + time(A3)
When to use: When each agent needs the output of the previous one.
// Sequential pipeline — basic structure
async function sequentialPipeline(input) {
const step1Result = await agent1(input);
const step2Result = await agent2(step1Result);
const step3Result = await agent3(step2Result);
return step3Result;
}
2.2 Parallel Pipeline
Multiple agents process the same input simultaneously, and results are combined.
┌──► Agent A ──┐
│ │
Input ───────┼──► Agent B ──┼──► Combiner ──► Output
│ │
└──► Agent C ──┘
Execution order: A, B, C run simultaneously
Total time: max(time(A), time(B), time(C)) + time(Combiner)
When to use: When agents are independent and don't need each other's output.
// Parallel pipeline — basic structure
async function parallelPipeline(input) {
const [resultA, resultB, resultC] = await Promise.all([
agentA(input),
agentB(input),
agentC(input),
]);
return combiner(resultA, resultB, resultC);
}
2.3 Fan-Out / Fan-In
An agent splits work into multiple sub-tasks, distributes them to specialized agents, and another agent collects and merges results.
┌──► Agent B1 ──┐
│ │
Input ──► Agent A ──────┼──► Agent B2 ──┼──► Agent C ──► Output
(fan-out) │ │ (fan-in)
└──► Agent B3 ──┘
When to use: When the first agent identifies sub-tasks that can be processed independently.
// Fan-out / fan-in pipeline
async function fanOutFanIn(input) {
// Fan-out: first agent identifies sub-tasks
const subTasks = await plannerAgent(input);
// Process sub-tasks in parallel
const subResults = await Promise.all(
subTasks.map(task => workerAgent(task))
);
// Fan-in: merge results
const finalResult = await mergerAgent(subResults);
return finalResult;
}
2.4 Conditional Pipeline (Router)
A router agent decides which pipeline branch to follow based on the input.
┌──► Pipeline A ──┐
│ │
Input ──► Router ──┼──► Pipeline B ──┼──► Output
│ │
└──► Pipeline C ──┘
When to use: When different inputs require different processing paths.
// Conditional pipeline with router
async function conditionalPipeline(input) {
const route = await routerAgent(input);
switch (route.pipeline) {
case 'simple':
return await simplePipeline(input);
case 'complex':
return await complexPipeline(input);
case 'specialized':
return await specializedPipeline(input);
default:
throw new Error(`Unknown route: ${route.pipeline}`);
}
}
Pattern Comparison
| Pattern | Latency | Complexity | Use Case |
|---|---|---|---|
| Sequential | Sum of all agents | Low | Each step depends on previous |
| Parallel | Max of all agents | Medium | Independent sub-tasks |
| Fan-out/Fan-in | Fan-out + max parallel + fan-in | High | Dynamic sub-task decomposition |
| Conditional | Router + selected branch | Medium | Input-dependent processing |
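The latency rows in the table — sum of all agents versus max of all agents — can be demonstrated with simulated agents that just sleep. The delay values here are arbitrary stand-ins for real LLM calls:

```javascript
// Simulated agents with fixed delays, to compare pipeline latency.
const fakeAgent = (label, delayMs) => async (input) =>
  new Promise(resolve => setTimeout(() => resolve(`${label}(${input})`), delayMs));

async function timed(fn) {
  const start = Date.now();
  await fn();
  return Date.now() - start;
}

async function compareLatency() {
  const a = fakeAgent('A', 300);
  const b = fakeAgent('B', 200);
  const c = fakeAgent('C', 100);

  // Sequential: roughly 300 + 200 + 100 = ~600ms
  const sequentialMs = await timed(async () => c(await b(await a('x'))));

  // Parallel: roughly max(300, 200, 100) = ~300ms
  const parallelMs = await timed(() => Promise.all([a('x'), b('x'), c('x')]));

  return { sequentialMs, parallelMs };
}
```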
3. Data Flow Between Agents
The critical design decision in any multi-agent pipeline is: what data does each agent receive, and what data does each agent produce?
3.1 Direct Pass-Through
Each agent receives only the output of the previous agent.
Agent 1 output ──► Agent 2 input
Agent 2 output ──► Agent 3 input
// Direct pass-through
const step1 = await agent1(rawInput);
const step2 = await agent2(step1); // only sees step1's output
const step3 = await agent3(step2); // only sees step2's output
Advantage: Simple, and each agent has minimal context. Disadvantage: Later agents lose access to the original input.
3.2 Accumulated Context
Each agent receives the original input PLUS all previous outputs.
Agent 1 receives: original input
Agent 2 receives: original input + Agent 1 output
Agent 3 receives: original input + Agent 1 output + Agent 2 output
// Accumulated context
const step1 = await agent1(rawInput);
const step2 = await agent2({ original: rawInput, analysis: step1 });
const step3 = await agent3({ original: rawInput, analysis: step1, transformed: step2 });
Advantage: Later agents have full context for better decisions. Disadvantage: Growing context means more tokens and potential confusion.
3.3 Selective Context
Each agent receives only the specific fields it needs from previous steps.
// Selective context — each agent gets only what it needs
const step1 = await agent1(rawInput);
const step2 = await agent2({
strengths: step1.strengths, // only specific fields
weaknesses: step1.weaknesses,
});
const step3 = await agent3({
improvedBio: step2.bio, // only what agent 3 needs
originalName: rawInput.name,
});
Advantage: Minimal token usage, agents stay focused. Disadvantage: Must carefully design what each agent needs.
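One way to make that "carefully design what each agent needs" step explicit is to give each step a selector function that picks its input out of an accumulating context. This is a sketch, not the lesson's runner; the runWithSelectors name and step shape are assumptions:

```javascript
// Selective context expressed as per-step input selectors.
// Each step declares exactly which fields it needs; the runner
// never forwards anything a step did not ask for.
async function runWithSelectors(steps, rawInput) {
  const context = { original: rawInput };
  for (const step of steps) {
    const input = step.selectInput(context);    // pick only the needed fields
    context[step.name] = await step.run(input); // store output under the step's name
  }
  return context;
}
```

Because each selectInput is plain code, the data dependency of every step is visible in one place and easy to review.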
4. Designing Agent Responsibilities (Single Responsibility Principle)
The Single Responsibility Principle (SRP) from software engineering applies directly to agent design:
Each agent should have one reason to change — it should do one thing, and do it well.
Good Decomposition
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ AGENT 1 │ │ AGENT 2 │ │ AGENT 3 │
│ "Analyze" │ │ "Transform" │ │ "Generate" │
│ │ │ │ │ │
│ - Read input │ │ - Take analysis │ │ - Take improved │
│ - Identify │ │ - Apply changes │ │ data │
│ patterns │ │ - Produce │ │ - Create final │
│ - Score/rate │ │ improved │ │ deliverable │
│ - Return │ │ version │ │ - Format output │
│ analysis │ │ - Return result │ │ - Return result │
└──────────────────┘ └──────────────────┘ └──────────────────┘
ONE JOB ONE JOB ONE JOB
Bad Decomposition (Anti-patterns)
ANTI-PATTERN 1: Agent does too much
┌─────────────────────────────────────────────────────────┐
│ Agent 1: Analyze AND transform AND generate AND format │
│ (This is just a single agent pretending to be a │
│ pipeline — all the problems of monolithic design) │
└─────────────────────────────────────────────────────────┘
ANTI-PATTERN 2: Agent does too little
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Read name │ │ Read age │ │ Read bio │ │ Combine │ │ Format │
│ │ │ │ │ │ │ fields │ │ output │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
(Unnecessary overhead — these should be one agent)
ANTI-PATTERN 3: Agents have overlapping responsibilities
┌──────────────────┐      ┌─────────────────────┐
│ Agent 1: Analyze │      │ Agent 2: Re-analyze │
│ AND improve bio  │      │ AND generate        │
└──────────────────┘      └─────────────────────┘
(Overlapping "analyze" work — unclear ownership)
How to Decide Agent Boundaries
Ask these questions:
- Can this step be tested independently? If yes, it's a good agent boundary.
- Does this step need different context than the previous step? If yes, separate agent.
- Might I want to swap the model for this step? If yes, separate agent (e.g., use GPT-4o for analysis, GPT-4o-mini for formatting).
- Does this step have a clearly different output schema? If yes, separate agent.
- Would combining this with the next step make the prompt too complex? If yes, keep them separate.
5. Schema Contracts Between Agents
A schema contract is a formal definition of what data an agent produces and what data the next agent expects. In our pipelines, we use Zod schemas as contracts.
┌─────────────────────────────────────────────────────────────────────────┐
│ SCHEMA CONTRACTS │
│ │
│ Agent 1 Agent 2 Agent 3 │
│ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │INPUT │ Schema A │INPUT │ Schema B │INPUT │ Schema C │
│ │ │──validates──► │ │──validates──► │ │──validates──► │
│ │ │ Agent 1 │ │ Agent 2 │ │ Agent 3 │
│ └──────┘ output └──────┘ output └──────┘ output │
│ │
│ If Agent 1's output doesn't match Schema A → FAIL FAST │
│ Don't let bad data propagate through the pipeline │
└─────────────────────────────────────────────────────────────────────────┘
Defining Contracts in Code
import { z } from 'zod';
// Contract: What Agent 1 MUST produce
const Agent1OutputSchema = z.object({
strengths: z.array(z.string()).min(1),
weaknesses: z.array(z.string()).min(1),
overallScore: z.number().min(1).max(10),
summary: z.string().min(10),
});
// Contract: What Agent 2 MUST produce
const Agent2OutputSchema = z.object({
improvedContent: z.string().min(20),
changesApplied: z.array(z.string()).min(1),
improvementScore: z.number().min(1).max(10),
});
// Contract: What Agent 3 MUST produce (final output)
const Agent3OutputSchema = z.object({
finalDeliverables: z.array(z.string()).min(1),
metadata: z.object({
processingSteps: z.number(),
totalAgentsUsed: z.number(),
}),
});
Why Schema Contracts Matter
- Fail fast: If Agent 1 produces garbage, you know immediately — not three agents later.
- Independent development: Two developers can work on Agent 1 and Agent 2 independently, as long as they agree on the schema contract.
- Easy testing: Mock Agent 1's output with any data matching Schema A, test Agent 2 in isolation.
- Self-documenting: The schema IS the documentation of what flows between agents.
- Type safety: With z.infer, TypeScript knows the exact shape at every step.
6. Building a Generic Pipeline Runner
Before diving into specific examples (sections b and c), here is a reusable pipeline runner that works with any set of agents:
import { z } from 'zod';
import OpenAI from 'openai';
const client = new OpenAI();
/**
 * Creates an agent definition: name, system prompt, output schema, model.
 */
function createAgent({ name, systemPrompt, outputSchema, model = 'gpt-4o' }) {
return {
name,
systemPrompt,
outputSchema,
model,
};
}
/**
* Runs a single agent: sends a prompt, parses JSON, validates with Zod.
*/
async function runAgent(agent, input) {
console.log(`\n--- Running Agent: ${agent.name} ---`);
console.log(`Input: ${JSON.stringify(input).substring(0, 200)}...`);
const response = await client.chat.completions.create({
model: agent.model,
temperature: 0.7,
messages: [
{ role: 'system', content: agent.systemPrompt },
{ role: 'user', content: JSON.stringify(input) },
],
});
const rawOutput = response.choices[0].message.content;
if (!rawOutput) {
throw new Error(`Agent "${agent.name}" returned empty response`);
}
// Parse JSON from response
let parsed;
try {
parsed = JSON.parse(rawOutput);
} catch {
// Try to extract JSON from markdown code blocks
const jsonMatch = rawOutput.match(/```(?:json)?\s*([\s\S]*?)```/);
if (jsonMatch) {
parsed = JSON.parse(jsonMatch[1].trim());
} else {
throw new Error(
`Agent "${agent.name}" returned invalid JSON:\n${rawOutput.substring(0, 500)}`
);
}
}
// Validate with Zod schema
const validated = agent.outputSchema.parse(parsed);
console.log(`Agent "${agent.name}" output validated successfully.`);
return validated;
}
/**
* Runs a sequential pipeline of agents.
* Each agent receives the output of the previous agent (plus optional context).
*/
async function runPipeline(agents, initialInput) {
console.log(`\n========== PIPELINE START ==========`);
console.log(`Agents in pipeline: ${agents.map(a => a.name).join(' → ')}`);
let currentData = initialInput;
const pipelineLog = [];
for (const agent of agents) {
const startTime = Date.now();
try {
const result = await runAgent(agent, currentData);
const duration = Date.now() - startTime;
pipelineLog.push({
agent: agent.name,
status: 'success',
duration,
output: result,
});
currentData = result;
} catch (error) {
const duration = Date.now() - startTime;
pipelineLog.push({
agent: agent.name,
status: 'failed',
duration,
error: error.message,
});
throw new Error(
`Pipeline failed at agent "${agent.name}": ${error.message}`
);
}
}
console.log(`\n========== PIPELINE COMPLETE ==========`);
return {
result: currentData,
log: pipelineLog,
};
}
export { createAgent, runAgent, runPipeline };
How the Generic Runner Works
runPipeline([agent1, agent2, agent3], input)
│
├──► runAgent(agent1, input)
│ ├── Send to LLM with agent1.systemPrompt
│ ├── Parse JSON response
│ ├── Validate with agent1.outputSchema (Zod)
│ └── Return validated output
│
├──► runAgent(agent2, agent1Output)
│ ├── Send to LLM with agent2.systemPrompt
│ ├── Parse JSON response
│ ├── Validate with agent2.outputSchema (Zod)
│ └── Return validated output
│
└──► runAgent(agent3, agent2Output)
├── Send to LLM with agent3.systemPrompt
├── Parse JSON response
├── Validate with agent3.outputSchema (Zod)
└── Return validated output → FINAL RESULT
7. Architecture Diagrams for Common Use Cases
Content Creation Pipeline
┌──────────┐ ┌──────────────┐ ┌────────────┐ ┌──────────────┐
│ Topic │───►│ Researcher │───►│ Writer │───►│ Editor │
│ Input │ │ Agent │ │ Agent │ │ Agent │
│ │ │ │ │ │ │ │
│ "Write │ │ Finds key │ │ Writes │ │ Polishes │
│ about │ │ facts, │ │ draft │ │ grammar, │
│ topic X" │ │ stats, │ │ article │ │ tone, │
│ │ │ sources │ │ from │ │ structure │
│ │ │ │ │ research │ │ │
└──────────┘ └──────────────┘ └────────────┘ └──────────────┘
Schema: facts[] Schema: draft Schema: final
sources[] sections[] article
keyPoints[] wordCount readability
Customer Support Pipeline
┌──────────┐ ┌──────────────┐ ┌────────────┐ ┌──────────────┐
│ Customer │───►│ Classifier │───►│ Resolver │───►│ Response │
│ Message │ │ Agent │ │ Agent │ │ Generator │
│ │ │ │ │ │ │ │
│ "My │ │ Categorizes │ │ Looks up │ │ Writes │
│ order │ │ intent: │ │ relevant │ │ friendly │
│ hasn't │ │ shipping, │ │ policies, │ │ response │
│ arrived" │ │ billing, │ │ solutions │ │ with │
│ │ │ technical │ │ │ │ resolution │
└──────────┘ └──────────────┘ └────────────┘ └──────────────┘
Schema: category Schema: solution Schema: response
urgency steps[] tone
sentiment confidence followUp
Data Processing Pipeline
┌──────────┐ ┌──────────────┐ ┌────────────┐ ┌──────────────┐
│ Raw │───►│ Extractor │───►│ Enricher │───►│ Formatter │
│ Data │ │ Agent │ │ Agent │ │ Agent │
│ │ │ │ │ │ │ │
│ Messy │ │ Pulls out │ │ Adds │ │ Produces │
│ text, │ │ structured │ │ context, │ │ final │
│ PDFs, │ │ fields │ │ categories │ │ clean │
│ emails │ │ from raw │ │ scores │ │ output │
└──────────┘ └──────────────┘ └────────────┘ └──────────────┘
Schema: fields{} Schema: enriched Schema: formatted
entities[] categories[] output{}
rawValues scores report
8. Choosing the Right Number of Agents
A common question: how many agents should my pipeline have?
Guidelines
| Agents | When It Makes Sense |
|---|---|
| 2 | Simple transform: analyze → generate. Example: sentiment analysis → response generation |
| 3 | Standard pipeline: analyze → transform → output. This is the most common pattern |
| 4-5 | Complex workflows with distinct phases. Example: classify → research → draft → edit → format |
| 6+ | Only for genuinely complex workflows. Consider if some agents can be merged |
Rules of Thumb
- Start with 2-3 agents. Add more only when you have evidence that a step needs to be split.
- Each agent call = ~1-5 seconds latency + API cost. More agents = more time + more money.
- If two agents always run together and never need to be tested/changed independently, merge them.
- If one agent's prompt is getting too long (>500 words of instructions), consider splitting it.
9. Key Takeaways
- A multi-agent pipeline decomposes complex AI tasks into specialized agents connected in sequence, parallel, or hybrid patterns.
- Single Responsibility Principle applies to agents — each agent should do one thing well.
- Schema contracts (Zod) between agents ensure data integrity and enable independent testing.
- Four main patterns: sequential (most common), parallel (for independent tasks), fan-out/fan-in (for dynamic sub-tasks), and conditional/router (for input-dependent paths).
- Data flow choices — direct pass-through, accumulated context, or selective context — depend on what each agent needs.
- A generic pipeline runner can execute any set of agents, handling JSON parsing, Zod validation, logging, and error reporting.
- Start simple (2-3 agents) and only add complexity when you have evidence it's needed.
Explain-It Challenge
- A junior developer asks: "Why can't I just use one really good prompt instead of three agents?" Explain when and why multi-agent pipelines outperform single agents.
- Draw (or describe) the data flow for a multi-agent pipeline that takes a job posting and produces: (a) a skills analysis, (b) interview questions based on those skills, and (c) a scoring rubric. What schemas would you define at each step?
- Explain the difference between sequential and fan-out/fan-in pipelines. Give a real-world example where fan-out/fan-in would be significantly better than sequential.
Navigation: ← 4.18 Overview · 4.18.b — Hinge Direction: Profile Pipeline →