Episode 4 — Generative AI Engineering / 4.19 — Multi Agent Architecture Concerns
4.19.b — Higher Operational Cost
In one sentence: Every agent in a multi-agent pipeline makes its own LLM API call, and each call has its own input and output tokens -- so your costs multiply with every agent you add, making cost awareness essential to sustainable AI architecture.
Navigation: <- 4.19.a Increased Latency | 4.19.c -- Debugging Across Agents ->
1. Cost Multiplication: Each Agent = Separate LLM Call
In a single-call system, you pay for one set of input tokens and one set of output tokens. In a multi-agent system, every agent has its own cost.
SINGLE CALL COST:
+------------------+
| 1 LLM Call |
| Input: 2,000 | <- System prompt + user message
| Output: 500 | <- Model response
+------------------+
Total: 2,500 tokens billed
3-AGENT PIPELINE COST:
+------------------+ +------------------+ +------------------+
| Agent A | | Agent B | | Agent C |
| Input: 1,500 | -> | Input: 2,000 | -> | Input: 1,800 |
| Output: 300 | | Output: 800 | | Output: 500 |
+------------------+ +------------------+ +------------------+
Total: (1,500+300) + (2,000+800) + (1,800+500)
= 1,800 + 2,800 + 2,300
= 6,900 tokens billed (2.76x the single-call total)
But it gets worse. Notice that Agent B's input includes Agent A's output as context. Agent C's input includes both Agent A's and Agent B's outputs. Context accumulates across the pipeline, inflating input tokens at each step.
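The accumulation can be made concrete with a short sketch. The `pipelineTokens` helper below is hypothetical, and its per-agent numbers are chosen to reproduce the diagram above: each agent's output is carried into every downstream agent's input.

```javascript
// Sketch: total billed tokens for a pipeline where each agent's output
// is appended to every downstream agent's input context.
function pipelineTokens(agents) {
  let carriedContext = 0; // tokens of upstream output carried forward
  let totalBilled = 0;
  for (const agent of agents) {
    const input = agent.baseInput + carriedContext; // own prompt + accumulated context
    totalBilled += input + agent.output;
    carriedContext += agent.output; // downstream agents see this output too
  }
  return totalBilled;
}

// Agents whose own prompts are 1,500 / 1,700 / 700 tokens:
const total = pipelineTokens([
  { baseInput: 1500, output: 300 },
  { baseInput: 1700, output: 800 }, // input becomes 1,700 + 300 = 2,000
  { baseInput: 700, output: 500 },  // input becomes 700 + 300 + 800 = 1,800
]);
console.log(total); // 6900 — matching the 3-agent diagram above
```

Note how the third agent's "own" prompt is only 700 tokens, yet it bills 1,800 input tokens because of inherited context.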
2. Calculating Total Pipeline Cost
The formula
Total Cost = SUM over all agents of:
(agent_input_tokens * input_price_per_token) +
(agent_output_tokens * output_price_per_token)
Pricing reference (approximate, as of early 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.80 | $4.00 |
Worked example: cost comparison
// Pricing constants (per token)
const PRICING = {
'gpt-4o': { input: 2.50 / 1_000_000, output: 10.00 / 1_000_000 },
'gpt-4o-mini': { input: 0.15 / 1_000_000, output: 0.60 / 1_000_000 },
};
// Calculate cost of a single agent call
function agentCost(model, inputTokens, outputTokens) {
const p = PRICING[model];
return (inputTokens * p.input) + (outputTokens * p.output);
}
// --- Approach 1: Single call ---
const singleCallCost = agentCost('gpt-4o', 2000, 500);
console.log(`Single call cost: $${singleCallCost.toFixed(6)}`);
// = (2000 * 0.0000025) + (500 * 0.000010)
// = $0.005 + $0.005
// = $0.010
// --- Approach 2: 3-agent pipeline (all GPT-4o) ---
const pipeline3_allBig = [
agentCost('gpt-4o', 1500, 300), // Agent A: classify
agentCost('gpt-4o', 2000, 800), // Agent B: research
agentCost('gpt-4o', 1800, 500), // Agent C: respond
];
const total3Big = pipeline3_allBig.reduce((a, b) => a + b, 0);
console.log(`3-agent pipeline (all GPT-4o): $${total3Big.toFixed(6)}`);
// = $0.006750 + $0.013000 + $0.009500
// = $0.029250 (2.9x more expensive)
// --- Approach 3: 3-agent pipeline (optimized model selection) ---
const pipeline3_optimized = [
agentCost('gpt-4o-mini', 1500, 300), // Agent A: classify (simple task)
agentCost('gpt-4o', 2000, 800), // Agent B: research (needs power)
agentCost('gpt-4o-mini', 1800, 500), // Agent C: format (mechanical)
];
const total3Opt = pipeline3_optimized.reduce((a, b) => a + b, 0);
console.log(`3-agent pipeline (optimized): $${total3Opt.toFixed(6)}`);
// = $0.000405 + $0.013000 + $0.000570
// = $0.013975 (1.4x vs single call, but way cheaper than all-GPT-4o)
Scale impact
| Approach | Per Request | 10K req/day | Monthly (30 days) |
|---|---|---|---|
| Single call | $0.010 | $100 | $3,000 |
| 3-agent (all big) | $0.029 | $290 | $8,700 |
| 3-agent (optimized) | $0.014 | $140 | $4,200 |
Difference (big vs single): $5,700/month (190% increase)
Difference (optimized vs big): $4,500/month saved by model selection
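The table rows are straightforward multiplication; a small (hypothetical) helper makes the projection explicit and easy to rerun with your own traffic numbers.

```javascript
// Project a per-request cost to monthly spend.
function monthlySpend(costPerRequest, requestsPerDay, days = 30) {
  return costPerRequest * requestsPerDay * days;
}

const singleMonthly = monthlySpend(0.010, 10_000);    // ≈ $3,000
const bigMonthly = monthlySpend(0.029, 10_000);       // ≈ $8,700
const optimizedMonthly = monthlySpend(0.014, 10_000); // ≈ $4,200
console.log(bigMonthly - singleMonthly); // ≈ $5,700 extra per month for the all-GPT-4o pipeline
```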
3. Cost Comparison: Real Scenarios
Scenario A: Customer support bot
Single-call approach:
System prompt (500 tokens) + customer message (200 tokens) + response (300 tokens)
Cost per request: $0.0048 (GPT-4o)
Multi-agent approach (all agents on GPT-4o):
Agent 1 - Intent classifier: input 400, output 20 = $0.0012
Agent 2 - Knowledge retriever: input 800, output 400 = $0.0060
Agent 3 - Response generator: input 1200, output 300 = $0.0060
Agent 4 - Tone checker: input 500, output 50 = $0.0018
Total per request: $0.0150 (3.1x more expensive)
At 50,000 requests/day:
Single call: $240/day = $7,200/month
Multi-agent: $750/day = $22,500/month
Difference: $15,300/month
Scenario B: Document analysis pipeline
Single-call approach:
10-page document (8000 tokens) + analysis prompt (500) + output (2000)
Cost per document: $0.0413
Multi-agent approach (all agents on GPT-4o):
Agent 1 - Extract entities: input 8500, output 1000 = $0.0313
Agent 2 - Classify sections: input 8500, output 500 = $0.0263
Agent 3 - Summarize: input 9500, output 1500 = $0.0388
Agent 4 - Generate report: input 3500, output 2000 = $0.0288
Total per document: $0.1250 (3.0x more expensive)
At 1,000 documents/day:
Single call: $41/day = $1,239/month
Multi-agent: $125/day = $3,750/month
Difference: $2,511/month
BUT if multi-agent produces significantly better analysis,
the cost may be justified. This is a business decision.
4. Cost Optimization Strategies
Strategy 1: Use cheaper models for simpler agents
This is the single most impactful optimization: GPT-4o-mini is roughly 17x cheaper than GPT-4o for both input and output tokens.
// Match model complexity to task complexity
const AGENT_MODEL_MAP = {
// Simple tasks -> cheap model
intentClassifier: 'gpt-4o-mini', // Pick from 5 categories
sentimentAnalyzer: 'gpt-4o-mini', // Positive/negative/neutral
formatter: 'gpt-4o-mini', // Reformat text
validator: 'gpt-4o-mini', // Check if output matches schema
// Complex tasks -> powerful model
researcher: 'gpt-4o', // Multi-step reasoning
codeGenerator: 'gpt-4o', // Write correct code
analyst: 'gpt-4o', // Nuanced analysis
};
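One way to wire the map into a pipeline is a lookup with a deliberate fallback. The `modelFor` helper is an assumption, not part of any SDK; a trimmed `AGENT_MODEL_MAP` is repeated here only so the sketch is self-contained.

```javascript
// Trimmed copy of the agent-to-model map for this sketch.
const AGENT_MODEL_MAP = {
  intentClassifier: 'gpt-4o-mini',
  researcher: 'gpt-4o',
};

// Resolve the model for an agent, defaulting to the cheap model so a
// newly added agent never silently lands on the expensive one.
function modelFor(agentName) {
  return AGENT_MODEL_MAP[agentName] ?? 'gpt-4o-mini';
}

console.log(modelFor('researcher'));   // 'gpt-4o'
console.log(modelFor('newFormatter')); // 'gpt-4o-mini' (fallback)
```

Defaulting to the cheap model is a policy choice: the opposite default is safer for output quality but lets costs creep in unnoticed as agents are added.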
Strategy 2: Cache repeated agent calls
If the same input produces the same output, cache it. Especially useful for classification agents that see repeated patterns.
import crypto from 'crypto';
class AgentCache {
constructor(ttlMs = 3600000) { // 1 hour default TTL
this.cache = new Map();
this.ttlMs = ttlMs;
this.stats = { hits: 0, misses: 0, savedTokens: 0 };
}
hash(input) {
return crypto.createHash('sha256').update(JSON.stringify(input)).digest('hex');
}
get(input) {
const key = this.hash(input);
const entry = this.cache.get(key);
if (entry && Date.now() - entry.timestamp < this.ttlMs) {
this.stats.hits++;
this.stats.savedTokens += entry.tokensUsed;
return entry.result;
}
this.stats.misses++;
return null;
}
set(input, result, tokensUsed) {
const key = this.hash(input);
this.cache.set(key, { result, tokensUsed, timestamp: Date.now() });
}
report() {
const hitRate = this.stats.hits / (this.stats.hits + this.stats.misses) || 0;
console.log(`Cache hit rate: ${(hitRate * 100).toFixed(1)}%`);
console.log(`Tokens saved: ${this.stats.savedTokens.toLocaleString()}`);
}
}
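Usage looks roughly like the sketch below. To keep the wiring visible, `classifyIntent` is a synchronous stand-in for a billable LLM call and the cache is a bare Map; in practice you would use the `AgentCache` above, with its TTL and stats, around an async call.

```javascript
const cache = new Map();
let llmCalls = 0;

// Stand-in for a billable LLM classification call.
function classifyIntent(message) {
  llmCalls++;
  return message.includes('?') ? 'question' : 'statement';
}

// Check the cache before paying for a model call.
function cachedClassify(message) {
  if (cache.has(message)) return cache.get(message);
  const result = classifyIntent(message);
  cache.set(message, result);
  return result;
}

cachedClassify('Where is my order?');
cachedClassify('Where is my order?'); // served from cache — no second LLM call
console.log(llmCalls); // 1
```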
Strategy 3: Minimize token usage per agent
// BAD: Verbose system prompts waste tokens at every call
const verbosePrompt = `You are an AI assistant specialized in analyzing customer
feedback. Your role is to carefully examine the sentiment of the given text and
determine whether the overall tone is positive, negative, or neutral. Please
consider the nuances of language, sarcasm, and implicit sentiment in your
analysis. Return your assessment as a single word.`;
// ~65 tokens - wasted on every call
// GOOD: Minimal system prompt
const efficientPrompt = `Classify sentiment as: positive, negative, or neutral.
Reply with one word only.`;
// ~15 tokens - 77% reduction
// At 50,000 calls/day with GPT-4o:
// Verbose: 65 tokens * 50,000 * $2.50/1M = $8.13/day
// Efficient: 15 tokens * 50,000 * $2.50/1M = $1.88/day
// Savings: $6.25/day = $187.50/month (just for ONE agent's system prompt)
Strategy 4: Batch processing when real-time isn't needed
// Instead of calling the classification agent 100 times...
// Call it ONCE with 100 items batched together
// Individual calls: 100 API calls
// Cost: 100 * (500 input + 20 output) = 52,000 tokens
// Batched call: 1 API call
const batchPrompt = `Classify each message. Return a JSON array of sentiments.
Messages:
1. "${messages[0]}"
2. "${messages[1]}"
...
100. "${messages[99]}"`;
// Cost: 1 * (5000 input + 200 output) = 5,200 tokens (10x cheaper)
// Note: only works when items are independent and batch fits in context window
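A runnable version of the same idea, with the elided parts filled in by assumption: `buildBatchPrompt` numbers the items, and `parseBatchReply` checks that the model's JSON array actually lines up with the inputs, which is the main failure mode of batching.

```javascript
// Build one prompt covering many independent items instead of N prompts.
function buildBatchPrompt(messages) {
  const numbered = messages
    .map((m, i) => `${i + 1}. ${JSON.stringify(m)}`)
    .join('\n');
  return `Classify each message as positive, negative, or neutral.\n` +
    `Return a JSON array of sentiments, one per message, in order.\n` +
    `Messages:\n${numbered}`;
}

// Parse the model's reply and verify it lines up with the inputs.
function parseBatchReply(reply, expectedCount) {
  const sentiments = JSON.parse(reply);
  if (!Array.isArray(sentiments) || sentiments.length !== expectedCount) {
    throw new Error(`Expected ${expectedCount} sentiments, got ${sentiments && sentiments.length}`);
  }
  return sentiments;
}

const prompt = buildBatchPrompt(['Love it!', 'Broken on arrival.']);
const labels = parseBatchReply('["positive","negative"]', 2);
```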
Strategy 5: Early termination
If an early agent determines the query doesn't need the full pipeline, skip expensive downstream agents.
// The helpers below (classifyIntent, conductResearch, etc.) are
// stand-ins for your own agent implementations.
async function smartPipeline(userMessage) {
// Agent 1: Classify (cheap, fast)
const intent = await classifyIntent(userMessage); // gpt-4o-mini
// Early termination: simple queries don't need the research agent
if (intent.type === 'greeting' || intent.type === 'faq') {
return generateSimpleResponse(intent); // Single cheap call
// Saved: ~$0.020 by skipping Agent 2 and Agent 3
}
// Complex queries get the full pipeline
const research = await conductResearch(intent); // gpt-4o (expensive)
const response = await generateResponse(research); // gpt-4o
return response;
}
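Whether early termination pays off depends on the share of simple traffic. A back-of-envelope expected-cost calculation (the per-request figures are illustrative, roughly matching the pipelines above):

```javascript
// Expected per-request cost when a share of traffic exits early.
function expectedCost(simpleShare, simpleCost, fullCost) {
  return simpleShare * simpleCost + (1 - simpleShare) * fullCost;
}

// Say 60% of queries are greetings/FAQ ($0.001 each) and the rest
// run the full pipeline ($0.029 each):
const withTermination = expectedCost(0.6, 0.001, 0.029);
const withoutTermination = 0.029; // every request runs the full pipeline
console.log(withoutTermination - withTermination); // ≈ $0.0168 saved per request
```

At 10K requests/day that difference is roughly $168/day, which is why the cheap classifier up front tends to pay for itself.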
5. When Cost Is Justified vs When It's Wasteful
Cost IS justified when:
| Situation | Why It's Worth It |
|---|---|
| Multi-agent produces measurably better output | Higher quality = higher user satisfaction = more revenue |
| Task genuinely requires multiple reasoning steps | A single prompt cannot handle the complexity |
| Different subtasks need different model capabilities | Code agent needs GPT-4o, but formatter needs only GPT-4o-mini |
| Reliability improves with agent specialization | Validation agent catches errors that save customer support costs |
| The business value per request is high | Legal document analysis at $50/document easily justifies $0.12 in AI cost |
Cost is WASTEFUL when:
| Situation | Why It's Not Worth It |
|---|---|
| A single prompt produces equivalent quality | Multi-agent adds cost without improving output |
| Agents are doing trivial work | Using GPT-4o to format a date string |
| Most requests are simple but get the full pipeline | No early termination means simple queries cost as much as complex ones |
| There is no measurement of quality difference | You added agents "just in case" without evaluating impact |
| The task doesn't need LLM at all | Using an agent to parse a known JSON schema (use code instead) |
6. Monitoring and Budgeting for Multi-Agent Costs
Cost tracking per agent, per request
class CostTracker {
constructor() {
this.requests = [];
}
record(requestId, agentName, model, inputTokens, outputTokens) {
const pricing = PRICING[model];
const cost = (inputTokens * pricing.input) + (outputTokens * pricing.output);
this.requests.push({
requestId,
agentName,
model,
inputTokens,
outputTokens,
cost,
timestamp: new Date(),
});
return cost;
}
getTodayCost() {
  // Today's total spend; used by the budget-alert check below.
  const today = new Date().toISOString().split('T')[0];
  return this.requests
    .filter((r) => r.timestamp.toISOString().startsWith(today))
    .reduce((sum, r) => sum + r.cost, 0);
}
dailyReport() {
const today = new Date().toISOString().split('T')[0];
const todayRequests = this.requests.filter(
(r) => r.timestamp.toISOString().startsWith(today)
);
const byAgent = {};
let totalCost = 0;
let totalTokens = 0;
for (const r of todayRequests) {
if (!byAgent[r.agentName]) {
byAgent[r.agentName] = { calls: 0, cost: 0, tokens: 0 };
}
byAgent[r.agentName].calls++;
byAgent[r.agentName].cost += r.cost;
byAgent[r.agentName].tokens += r.inputTokens + r.outputTokens;
totalCost += r.cost;
totalTokens += r.inputTokens + r.outputTokens;
}
console.log(`\n=== DAILY COST REPORT (${today}) ===`);
console.log(`Total requests: ${todayRequests.length}`);
console.log(`Total tokens: ${totalTokens.toLocaleString()}`);
console.log(`Total cost: $${totalCost.toFixed(2)}`);
console.log('\nBreakdown by agent:');
for (const [name, stats] of Object.entries(byAgent)) {
const pct = ((stats.cost / totalCost) * 100).toFixed(1);
console.log(` ${name}: ${stats.calls} calls, $${stats.cost.toFixed(4)} (${pct}%)`);
}
}
}
// Usage
const tracker = new CostTracker();
// Inside your pipeline, after each agent call:
tracker.record('req-001', 'classifier', 'gpt-4o-mini', 400, 20);
tracker.record('req-001', 'researcher', 'gpt-4o', 2000, 800);
tracker.record('req-001', 'responder', 'gpt-4o', 1800, 500);
tracker.dailyReport();
Example output:
=== DAILY COST REPORT (2026-04-11) ===
Total requests: 3
Total tokens: 5,520
Total cost: $0.02
Breakdown by agent:
classifier: 1 calls, $0.0001 (0.3%)
researcher: 1 calls, $0.0130 (57.6%)
responder: 1 calls, $0.0095 (42.1%)
Budget alerts
const DAILY_BUDGET = 500; // $500/day max
function checkBudget(tracker) {
const todayCost = tracker.getTodayCost();
const percentUsed = (todayCost / DAILY_BUDGET) * 100;
if (percentUsed > 90) {
console.error(`[CRITICAL] Daily budget ${percentUsed.toFixed(1)}% used ($${todayCost.toFixed(2)} / $${DAILY_BUDGET})`);
// Send alert to ops team, consider throttling
} else if (percentUsed > 70) {
console.warn(`[WARNING] Daily budget ${percentUsed.toFixed(1)}% used`);
}
}
7. Key Takeaways
- Each agent = separate LLM call = separate cost. A 3-agent pipeline costs roughly 2-3x more than a single call, often more because context accumulates.
- Use cheaper models for simpler tasks. GPT-4o-mini costs ~17x less than GPT-4o -- use it for classification, formatting, and validation agents.
- Cache, batch, and minimize tokens. Every token you save is money saved at scale.
- Early termination saves the most money. If a cheap classifier determines the query is simple, skip expensive downstream agents entirely.
- Always compare cost vs quality. Multi-agent is justified only when it produces measurably better results. Measure, don't assume.
- Monitor costs per agent in production. You need to know which agent is eating your budget so you can optimize the right one.
- Set daily/monthly budget caps with alerts. Runaway costs from a bug or traffic spike can be devastating.
Explain-It Challenge
- Your boss asks "why did our AI costs triple this month?" Walk through how you would diagnose the cause using per-agent cost tracking.
- A 5-agent pipeline costs $0.05 per request and handles 100,000 requests/day. The PM wants to cut costs by 50%. Propose a concrete optimization plan.
- When is it cheaper to use one GPT-4o call than three GPT-4o-mini calls? Work out the token math.
Navigation: <- 4.19.a Increased Latency | 4.19.c -- Debugging Across Agents ->