Episode 4 — Generative AI Engineering / 4.19 — Multi Agent Architecture Concerns
4.19.e — When Not to Use Multi-Agent
In one sentence: Multi-agent architecture is a powerful tool, not a default choice -- and the mark of an experienced AI engineer is knowing when a single well-crafted prompt outperforms an elaborate pipeline of specialized agents.
Navigation: <- 4.19.d Managing Shared State | 4.19 Exercise Questions ->
1. The Importance of Logging Each Pipeline Step
Before deciding for or against a multi-agent design, you need visibility into what is actually happening. Logging every pipeline step is what makes informed architectural decisions possible.
// Minimal pipeline logger -- use this BEFORE optimizing or refactoring
class PipelineLogger {
  constructor(pipelineName) {
    this.pipelineName = pipelineName;
    this.steps = [];
  }

  logStep(stepName, input, output, durationMs, tokenCount) {
    const entry = {
      step: stepName,
      inputPreview: JSON.stringify(input).slice(0, 200),
      outputPreview: JSON.stringify(output).slice(0, 200),
      durationMs,
      tokenCount,
      timestamp: new Date().toISOString(),
    };
    this.steps.push(entry);
    console.log(`[${this.pipelineName}] ${stepName}: ${durationMs}ms, ${tokenCount} tokens`);
  }

  summary() {
    const totalDuration = this.steps.reduce((sum, s) => sum + s.durationMs, 0);
    const totalTokens = this.steps.reduce((sum, s) => sum + s.tokenCount, 0);
    console.log(`\n=== Pipeline Summary: ${this.pipelineName} ===`);
    console.log(`Steps: ${this.steps.length}`);
    console.log(`Total duration: ${totalDuration}ms`);
    console.log(`Total tokens: ${totalTokens}`);
    console.log(`Avg duration/step: ${(totalDuration / this.steps.length).toFixed(0)}ms`);
    this.steps.forEach((s) => {
      const pct = ((s.durationMs / totalDuration) * 100).toFixed(1);
      console.log(`  ${s.step}: ${s.durationMs}ms (${pct}%)`);
    });
    return { totalDuration, totalTokens, stepCount: this.steps.length };
  }
}
The reason logging comes first: you cannot make good architectural decisions without data. If you don't know how long each step takes, how many tokens each agent uses, and what quality each agent produces, you are guessing.
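Instrumenting an existing pipeline with the logger is easiest through a thin wrapper around each step. The sketch below is illustrative and not part of the original logger: `loggedStep` and the token counter are hypothetical helpers, and the stand-in logger merely mimics the `logStep` signature of `PipelineLogger`.

```javascript
// Hypothetical helper: time any async step and record it on a logger that
// exposes logStep(stepName, input, output, durationMs, tokenCount).
async function loggedStep(logger, stepName, input, fn, countTokens = () => 0) {
  const start = Date.now();
  const output = await fn(input); // run the actual step
  const durationMs = Date.now() - start;
  logger.logStep(stepName, input, output, durationMs, countTokens(output));
  return output;
}

// Stand-in logger with the same logStep signature as PipelineLogger:
const demoLogger = {
  steps: [],
  logStep(step, input, output, durationMs, tokenCount) {
    this.steps.push({ step, durationMs, tokenCount });
  },
};

(async () => {
  const result = await loggedStep(demoLogger, 'uppercase', 'hello', async (s) => s.toUpperCase());
  console.log(result, demoLogger.steps.length); // "HELLO" and 1 recorded step
})();
```

Wrapping steps this way means the per-step timing data accumulates automatically, so the architectural questions below can be answered from real numbers rather than guesses.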
2. Simple Tasks Don't Need Agents
Many tasks that get built as multi-agent pipelines could be done in a single LLM call -- or even without an LLM at all.
Tasks that do NOT need multi-agent
| Task | Why Single Call (or No LLM) Works | Anti-Pattern |
|---|---|---|
| Classify intent into 5 categories | One prompt with examples. Done. | 3-agent pipeline: "classifier -> validator -> router" |
| Extract name + email from text | Single prompt with JSON output | "Extraction agent -> Validation agent -> Format agent" |
| Summarize a short article | One prompt: "Summarize in 3 bullet points" | "Chunker agent -> Summarizer agent -> Merger agent" |
| Format a date string | new Date(input).toISOString() -- no LLM needed | "Parsing agent -> Format agent" |
| Look up a FAQ answer | Keyword search in a database | "Classifier agent -> Retriever agent -> Generator agent" for a static FAQ |
| Validate JSON structure | JSON.parse() + schema validation | "Validation agent" that calls an LLM to check JSON |
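To make the last row concrete, here is what a no-LLM JSON check might look like. The `validateContact` function and its field rules are illustrative assumptions, not a standard schema library.

```javascript
// Validate structure with plain code: JSON.parse plus explicit field checks.
function validateContact(jsonText) {
  let data;
  try {
    data = JSON.parse(jsonText);
  } catch {
    return { valid: false, error: 'not valid JSON' };
  }
  const errors = [];
  if (typeof data.name !== 'string' || data.name.length === 0) {
    errors.push('name must be a non-empty string');
  }
  if (typeof data.email !== 'string' || !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(data.email)) {
    errors.push('email must look like an address');
  }
  return errors.length === 0 ? { valid: true, data } : { valid: false, error: errors.join('; ') };
}

console.log(validateContact('{"name":"Ada","email":"ada@example.com"}').valid); // true
```

This runs in microseconds, costs nothing, and is deterministic -- three properties no LLM-based "validation agent" can match.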
The "do you actually need an LLM?" test
BEFORE adding any agent, ask:

1. Can this be done with regular code?
   (string operations, regex, database query, API call)
   YES -> Don't use an LLM at all. Code is faster, cheaper, deterministic.

2. Can this be done with a single LLM call?
   (one well-crafted prompt with clear instructions)
   YES -> Use a single call. Simpler, faster, cheaper.

3. Does this require genuinely different reasoning steps
   that a single prompt cannot handle?
   YES -> Consider multi-agent.
   NO  -> Single call is sufficient.
3. Single Well-Crafted Prompt vs Multi-Agent Overkill
A single prompt can do a surprising amount of work if written well. The key is structured output and clear instructions.
Example: Article analysis
// OVER-ENGINEERED: 4-agent pipeline
async function overEngineeredAnalysis(article) {
  const sentiment = await sentimentAgent(article);                // Agent 1: 400ms, $0.001
  const summary = await summaryAgent(article);                    // Agent 2: 900ms, $0.005
  const keywords = await keywordAgent(article);                   // Agent 3: 300ms, $0.001
  const report = await reportAgent(sentiment, summary, keywords); // Agent 4: 600ms, $0.004
  return report;
  // Total: 2,200ms, $0.011, 4 LLM calls, complex debugging
}

// RIGHT-SIZED: Single call
async function singleCallAnalysis(article) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `Analyze the following article. Return JSON with exactly this structure:
{
  "sentiment": "positive" | "negative" | "neutral",
  "sentimentScore": number between -1 and 1,
  "summary": "2-3 sentence summary",
  "keywords": ["keyword1", "keyword2", ...up to 5],
  "readingLevel": "basic" | "intermediate" | "advanced"
}`,
      },
      { role: 'user', content: article },
    ],
    temperature: 0,
    response_format: { type: 'json_object' },
  });
  return JSON.parse(response.choices[0].message.content);
  // Total: 800ms, $0.004, 1 LLM call, trivial debugging
}
When does single-call break down?
Single call works when:
- The total task fits in one prompt's context window
- The model can handle all reasoning in one pass
- Output is a single structured response
- No external tool calls needed between reasoning steps
Single call breaks down when:
- Task requires SEQUENTIAL reasoning (step 2 depends on step 1's exact output)
- Different steps need DIFFERENT models (code gen vs classification)
- Steps involve EXTERNAL actions (API calls, database queries, file operations)
- Context is too large for one call (must chunk and process separately)
- Task requires ITERATION (try something, evaluate, retry)
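For the context-too-large case, the usual fallback is chunk-and-process. A minimal character-based chunker sketch (real splitters work on tokens, and the size and overlap values here are arbitrary placeholders):

```javascript
// Split text into overlapping chunks so each fits one call's context window.
function chunkText(text, maxChars = 8000, overlap = 200) {
  if (maxChars <= overlap) throw new Error('maxChars must exceed overlap');
  const chunks = [];
  for (let start = 0; start < text.length; start += maxChars - overlap) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}

// Each chunk then gets its own call, and the partial results are merged --
// which is exactly the point where a pipeline starts to pay off.
console.log(chunkText('a'.repeat(20000)).length); // 3 chunks of <= 8000 chars
```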
4. Decision Framework: When to Add vs Remove Agents
The "Agent Justification Test"
Before adding any agent to a pipeline, it must pass all three criteria:
+-------------------------------------------------------------------+
|                     AGENT JUSTIFICATION TEST                      |
|                                                                   |
|  For each proposed agent, answer ALL THREE:                       |
|                                                                   |
|  1. NECESSITY: Does this agent do something that CANNOT be        |
|     done by the previous agent in the same call?                  |
|     (Different model? External tool? Different context?)          |
|                                                                   |
|  2. VALUE: Does adding this agent MEASURABLY improve output       |
|     quality compared to the simpler approach?                     |
|     (Run A/B test. "Probably better" is not good enough.)         |
|                                                                   |
|  3. COST-BENEFIT: Does the improvement justify the added          |
|     latency, cost, and debugging complexity?                      |
|     (A 2% quality improvement that adds 3 seconds and $0.02       |
|      per request is rarely worth it.)                             |
|                                                                   |
|  If ANY answer is NO -> Don't add the agent.                      |
+-------------------------------------------------------------------+
Decision tree
Do you need an LLM at all?
 |
 +-- NO  -> Use regular code.
 |
 +-- YES -> Can a single LLM call handle it?
             |
             +-- YES -> Use a single call.
             |
             +-- NO  -> Why not?
                         |
                         +-- Needs tool calls between steps
                         |       -> Multi-agent justified
                         |
                         +-- Needs different models/contexts
                                 for different subtasks
                                 -> Multi-agent justified

How many agents? Start with 2. Add more ONLY if measured quality improves.
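The tree can also be encoded as a small routing function. This is an illustrative encoding of the decision logic; the boolean flags are assumptions you would gather from your own analysis, not something a library provides:

```javascript
// Returns the simplest architecture the answers allow.
function chooseArchitecture({
  solvableWithCode,
  singleCallSuffices,
  needsToolCallsBetweenSteps,
  needsDifferentModels,
}) {
  if (solvableWithCode) return 'regular code (no LLM)';
  if (singleCallSuffices) return 'single LLM call';
  if (needsToolCallsBetweenSteps || needsDifferentModels) {
    return 'multi-agent (start with 2 agents, add more only if measured quality improves)';
  }
  return 'single LLM call'; // no justification found for more machinery
}

console.log(chooseArchitecture({ solvableWithCode: false, singleCallSuffices: true }));
// "single LLM call"
```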
The "subtraction test"
For an existing multi-agent pipeline, try removing each agent one at a time:
// Test: what happens if we skip the "tone checker" agent?
async function withoutToneChecker(userInput) {
  const classification = await classifyIntent(userInput);
  const research = await conductResearch(classification);
  const response = await generateResponse(research);
  // SKIP: const checked = await checkTone(response);
  return response;
}

// Run both versions on 100 test cases, compare quality
async function subtractionTest() {
  const testCases = loadTestCases(); // 100 representative inputs
  const withAgent = [];
  const withoutAgent = [];
  for (const tc of testCases) {
    withAgent.push(await fullPipeline(tc.input));
    withoutAgent.push(await withoutToneChecker(tc.input));
  }

  // Compare quality (have a human or judge LLM rate both)
  const comparison = await compareOutputQuality(withAgent, withoutAgent, testCases);

  console.log('=== Subtraction Test: Tone Checker ===');
  console.log(`With agent:    avg quality ${comparison.withAvg}/10`);
  console.log(`Without agent: avg quality ${comparison.withoutAvg}/10`);
  console.log(`Quality drop: ${(comparison.withAvg - comparison.withoutAvg).toFixed(2)}`);
  console.log(`Latency saved: ~${comparison.latencySaved}ms per request`);
  console.log(`Cost saved: ~$${comparison.costSaved.toFixed(4)} per request`);

  if (comparison.withAvg - comparison.withoutAvg < 0.5) {
    console.log('\n>>> RECOMMENDATION: Remove this agent. Quality difference is negligible.');
  }
}
5. The YAGNI Principle Applied to AI Architecture
YAGNI: You Aren't Gonna Need It. Don't build complexity for hypothetical future requirements.
YAGNI violations in AI architecture
| Over-Engineering | YAGNI Alternative | When to Upgrade |
|---|---|---|
| 5-agent pipeline "for flexibility" | Single call that works now | When single call demonstrably fails |
| Agent orchestrator framework | Simple async/await chain | When you have > 3 pipelines to manage |
| Dynamic agent routing | Hardcoded if/else for 3 intents | When you have > 10 intents with different flows |
| Shared vector database for all agents | Pass context directly | When context exceeds single-call limits |
| Kubernetes-deployed agent microservices | Functions in one file | When you have separate teams per agent |
| Custom agent communication protocol | Function return values | When agents run on different machines |
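The "simple async/await chain" row deserves emphasis: before reaching for an orchestrator framework, a plain function chain covers most pipelines. A sketch, where `classify`, `research`, and `respond` are hypothetical step functions injected by the caller:

```javascript
// A pipeline is just sequential awaits -- no framework required.
async function handleRequest(userInput, { classify, research, respond }) {
  const intent = await classify(userInput);

  // Hardcoded routing for a handful of intents -- the YAGNI alternative
  // to a dynamic router agent.
  if (intent === 'smalltalk') return respond(intent, null);

  const facts = await research(intent);
  return respond(intent, facts);
}

// Usage with stub steps:
(async () => {
  const out = await handleRequest('hi there', {
    classify: async () => 'smalltalk',
    research: async () => { throw new Error('should be skipped'); },
    respond: async (intent) => `handled: ${intent}`,
  });
  console.log(out); // "handled: smalltalk"
})();
```

Injecting the steps keeps each one independently testable while the chain itself stays readable top to bottom -- the property orchestrator frameworks often sacrifice first.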
The complexity ladder
START HERE (simplest that works):

Level 0: No LLM
  Regular code, database queries, templates.
  Latency: <10ms | Cost: $0 | Debugging: Trivial

Level 1: Single LLM call
  One well-crafted prompt with structured output.
  Latency: 200ms-2s | Cost: $0.001-$0.01 | Debugging: Easy

Level 2: Single LLM call + tool use
  One LLM call that can invoke tools (search, calculator, API).
  Latency: 1-5s | Cost: $0.005-$0.03 | Debugging: Moderate

Level 3: Simple sequential pipeline (2-3 agents)
  Each agent has a clear, distinct role.
  Latency: 2-8s | Cost: $0.01-$0.05 | Debugging: Moderate

Level 4: Parallel + sequential pipeline (3-5 agents)
  Some agents run in parallel, complex state management.
  Latency: 3-15s | Cost: $0.02-$0.10 | Debugging: Hard

Level 5: Dynamic multi-agent with routing
  Agent count and flow determined at runtime.
  Latency: 5-60s | Cost: $0.05-$1.00 | Debugging: Very hard

RULE: Start at Level 0. Move up ONLY when the current level
demonstrably fails to meet quality requirements.
6. Case Studies: Over-Engineered vs Right-Sized Solutions
Case Study 1: Customer support chatbot
Over-engineered version (6 agents):
User message
-> [Intent Classifier Agent]
-> [Entity Extractor Agent]
-> [Knowledge Retriever Agent]
-> [Response Drafter Agent]
-> [Tone Checker Agent]
-> [Safety Filter Agent]
-> Response to user
Latency: 8-12 seconds
Cost: $0.045 per request
Result: Users abandoned the chat before getting a response
Right-sized version (1 call + 1 validation):
User message
-> [Single LLM call with system prompt containing:
- Intent classification instructions
- Entity extraction instructions
- Knowledge base context (via RAG retrieval, not an agent)
- Response guidelines + tone requirements
- Safety rules]
-> [Programmatic safety check (regex + blocklist, no LLM)]
-> Response to user
Latency: 1.5-2.5 seconds
Cost: $0.008 per request
Result: Same quality, 5x faster, 5x cheaper
Lesson: The 6 agents were doing work that one well-prompted model could do in a single pass. The "Tone Checker" and "Safety Filter" added latency without measurable quality improvement over system-prompt instructions.
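The programmatic safety check in the right-sized version can be as simple as the sketch below. The blocklist terms and patterns here are placeholders; a real deployment would maintain these lists carefully.

```javascript
// Regex + blocklist check -- runs in microseconds, costs nothing, and is
// fully deterministic, unlike an LLM-based "Safety Filter Agent".
const BLOCKED_TERMS = ['internal-only', 'do not share']; // placeholder terms
const BLOCKED_PATTERNS = [/\b\d{3}-\d{2}-\d{4}\b/];      // e.g. a US SSN shape

function passesSafetyCheck(text) {
  const lower = text.toLowerCase();
  if (BLOCKED_TERMS.some((term) => lower.includes(term))) return false;
  if (BLOCKED_PATTERNS.some((re) => re.test(text))) return false;
  return true;
}

console.log(passesSafetyCheck('Your order ships Tuesday.')); // true
console.log(passesSafetyCheck('SSN on file: 123-45-6789')); // false
```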
Case Study 2: Legal document review
Under-engineered version (1 call):
Full 80-page contract
-> [Single LLM call: "Review this contract and identify all risks"]
-> Response
Problems:
- Contract exceeds context window (needs chunking)
- Single prompt cannot handle the complexity
- Output is unstructured and misses important clauses
- No way to verify which sections were actually analyzed
Right-sized version (4 agents):
Full 80-page contract
-> [Chunking Agent: splits into sections, identifies structure]
-> [Risk Analyzer Agent (per chunk, parallelized): identifies risks in each section]
-> [Cross-Reference Agent: checks for contradictions between sections]
-> [Report Generator: compiles structured risk report with citations]
Latency: 45-90 seconds (acceptable for document review)
Cost: $0.35 per document (justified at $500/document price point)
Result: Structured, verifiable, thorough analysis
Lesson: The single call literally could not handle this task. The document exceeds context limits, and the analysis requires cross-referencing between sections. Multi-agent is genuinely necessary here.
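The parallelized per-chunk step in the right-sized version is worth sketching, since fan-out is where most of the latency is recovered. `analyzeChunk` is a hypothetical LLM-backed function supplied by the caller:

```javascript
// Fan out one analysis per section and gather the results in input order.
async function analyzeSections(sections, analyzeChunk) {
  return Promise.all(
    sections.map(async (section, index) => ({
      section: index,                      // keep the citation back to the source section
      risks: await analyzeChunk(section),  // one LLM call per chunk, run in parallel
    }))
  );
}

// Usage with a stub analyzer:
(async () => {
  const results = await analyzeSections(['clause A', 'clause B'], async (s) => [`risk in ${s}`]);
  console.log(results.length); // 2
})();
```

`Promise.all` preserves input order regardless of completion order, which is what lets the report generator cite section numbers reliably.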
Case Study 3: Email categorization
Over-engineered version (3 agents):
Email
-> [Classifier Agent: categorize email]
-> [Priority Agent: assign priority]
-> [Router Agent: determine which team handles it]
Latency: 3 seconds per email
Cost: $0.015 per email
At 50,000 emails/day: $750/day
Right-sized version (1 call):
Email
-> [Single LLM call: "Classify this email. Return JSON:
{ category, priority, team }"]
Latency: 0.8 seconds per email
Cost: $0.004 per email
At 50,000 emails/day: $200/day
Savings: $550/day = $16,500/month
Even more right-sized version (no LLM for most emails):
Email
-> [Rule-based filter: known senders, keywords, patterns]
-> 70% of emails categorized by rules (0ms, $0)
-> 30% of emails sent to single LLM call ($0.004 each)
Effective cost: $0.0012 per email average
At 50,000 emails/day: $60/day
Savings vs 3-agent: $690/day = $20,700/month
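A sketch of the hybrid router follows. The rule definitions are invented examples, and the fallback function stands in for the single LLM call:

```javascript
// Rules first; only emails no rule matches fall through to the LLM.
const RULES = [
  { match: (e) => /invoice|receipt/i.test(e.subject), category: 'billing' },
  { match: (e) => /unsubscribe|newsletter/i.test(e.subject), category: 'marketing' },
  { match: (e) => e.from.endsWith('@status.example.com'), category: 'alerts' },
];

function routeEmail(email, llmFallback) {
  for (const rule of RULES) {
    if (rule.match(email)) return { category: rule.category, via: 'rules' };
  }
  return { ...llmFallback(email), via: 'llm' };
}

const result = routeEmail(
  { from: 'shop@example.com', subject: 'Your receipt for order 1234' },
  () => ({ category: 'other' })
);
console.log(result); // { category: 'billing', via: 'rules' }
```

Tagging each result with `via` also gives you the data to track what fraction of traffic the rules absorb, so the 70/30 split claimed above becomes something you measure rather than assume.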
7. Summary: Multi-Agent Is a Tool, Not a Goal
+-------------------------------------------------------------------+
|                 THE MULTI-AGENT DECISION SUMMARY                  |
+-------------------------------------------------------------------+
|                                                                   |
|  Multi-agent is JUSTIFIED when:                                   |
|  [x] Task requires sequential reasoning across distinct steps     |
|  [x] Different steps need different models or tools               |
|  [x] Context exceeds single-call limits                           |
|  [x] Quality measurably improves over single-call approach        |
|  [x] Latency and cost are acceptable for the use case             |
|                                                                   |
|  Multi-agent is OVERKILL when:                                    |
|  [x] A single well-crafted prompt produces equivalent quality     |
|  [x] The task can be solved without an LLM at all                 |
|  [x] The added latency exceeds user tolerance                     |
|  [x] The cost increase is not justified by quality improvement    |
|  [x] You're adding agents "because we might need them later"      |
|                                                                   |
|  GOLDEN RULES:                                                    |
|  1. Start simple. Single call first.                              |
|  2. Add agents only when single call demonstrably fails.          |
|  3. Measure everything: quality, latency, cost.                   |
|  4. Remove agents that don't measurably improve output.           |
|  5. The best architecture is the simplest one that works.         |
+-------------------------------------------------------------------+
8. Key Takeaways
- Simple tasks do not need agents. Classification, extraction, summarization of short text -- these are single-call tasks. Using agents for them is wasteful.
- A single well-crafted prompt is often enough. Before building a pipeline, try writing one excellent prompt with structured output. You might be surprised by the quality.
- Apply the Agent Justification Test to every proposed agent: is it necessary, does it measurably help, and does the cost-benefit work out? If not, cut it.
- YAGNI applies to AI architecture. Don't build for hypothetical complexity. Start at the simplest level that works and move up only when forced to.
- The subtraction test is powerful. Remove an agent, measure quality. If quality barely drops, that agent was not earning its keep.
- Log everything to make informed decisions. You cannot decide whether to simplify or complexify without data on latency, cost, and quality at each step.
- The goal is solving the user's problem, not building an impressive architecture. Users care about fast, accurate, affordable answers -- not how many agents you deployed.
Explain-It Challenge
- A junior engineer proposes a 5-agent pipeline for a task. You suspect a single call would suffice. How do you diplomatically and empirically demonstrate this?
- Your multi-agent pipeline has been running for 6 months. How do you evaluate whether each agent is still earning its keep?
- A product manager says "we need AI for this feature." Walk through your decision process from "no LLM at all" up to "multi-agent pipeline."
Navigation: <- 4.19.d Managing Shared State | 4.19 Exercise Questions ->