Episode 4 — Generative AI Engineering / 4.19 — Multi Agent Architecture Concerns
4.19.b — Higher Operational Cost
In one sentence: Every agent in a multi-agent pipeline makes its own LLM API call, and each call has its own input and output tokens -- so your costs multiply with every agent you add, making cost awareness essential to sustainable AI architecture.
Navigation: <- 4.19.a Increased Latency | 4.19.c -- Debugging Across Agents ->
1. Cost Multiplication: Each Agent = Separate LLM Call
In a single-call system, you pay for one set of input tokens and one set of output tokens. In a multi-agent system, every agent has its own cost.
SINGLE CALL COST:
+------------------+
| 1 LLM Call |
| Input: 2,000 | <- System prompt + user message
| Output: 500 | <- Model response
+------------------+
Total: 2,500 tokens billed
3-AGENT PIPELINE COST:
+------------------+ +------------------+ +------------------+
| Agent A | | Agent B | | Agent C |
| Input: 1,500 | -> | Input: 2,000 | -> | Input: 1,800 |
| Output: 300 | | Output: 800 | | Output: 500 |
+------------------+ +------------------+ +------------------+
Total: (1,500+300) + (2,000+800) + (1,800+500)
= 1,800 + 2,800 + 2,300
= 6,900 tokens billed (2.76x the single-call total)
But it gets worse. Notice that Agent B's input includes Agent A's output as context. Agent C's input includes both Agent A's and Agent B's outputs. Context accumulates across the pipeline, inflating input tokens at each step.
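The accumulation can be made concrete with a short sketch. The `pipelineTokens` helper below is hypothetical, and its per-agent numbers are chosen to reproduce the diagram above: each agent's output is carried into every downstream agent's input.

```javascript
// Sketch: total billed tokens for a pipeline where each agent's output
// is appended to every downstream agent's input context.
function pipelineTokens(agents) {
  let carriedContext = 0; // tokens of upstream output carried forward
  let totalBilled = 0;
  for (const agent of agents) {
    const input = agent.baseInput + carriedContext; // own prompt + accumulated context
    totalBilled += input + agent.output;
    carriedContext += agent.output; // downstream agents see this output too
  }
  return totalBilled;
}

// Agents whose own prompts are 1,500 / 1,700 / 700 tokens:
const total = pipelineTokens([
  { baseInput: 1500, output: 300 },
  { baseInput: 1700, output: 800 }, // input becomes 1,700 + 300 = 2,000
  { baseInput: 700, output: 500 },  // input becomes 700 + 300 + 800 = 1,800
]);
console.log(total); // 6900 — matching the 3-agent diagram above
```

Note how the third agent's "own" prompt is only 700 tokens, yet it bills 1,800 input tokens because of inherited context.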
2. Calculating Total Pipeline Cost
The formula
Total Cost = SUM over all agents of:
(agent_input_tokens * input_price_per_token) +
(agent_output_tokens * output_price_per_token)
Pricing reference (approximate, as of early 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.80 | $4.00 |
Worked example: cost comparison
// Pricing constants (per token)
const PRICING = {
'gpt-4o': { input: 2.50 / 1_000_000, output: 10.00 / 1_000_000 },
'gpt-4o-mini': { input: 0.15 / 1_000_000, output: 0.60 / 1_000_000 },
};
// Calculate cost of a single agent call
function agentCost(model, inputTokens, outputTokens) {
const p = PRICING[model];
return (inputTokens * p.input) + (outputTokens * p.output);
}
// --- Approach 1: Single call ---
const singleCallCost = agentCost('gpt-4o', 2000, 500);
console.log(`Single call cost: $${singleCallCost.toFixed(6)}`);
// = (2000 * 0.0000025) + (500 * 0.000010)
// = $0.005 + $0.005
// = $0.010
// --- Approach 2: 3-agent pipeline (all GPT-4o) ---
const pipeline3_allBig = [
agentCost('gpt-4o', 1500, 300), // Agent A: classify
agentCost('gpt-4o', 2000, 800), // Agent B: research
agentCost('gpt-4o', 1800, 500), // Agent C: respond
];
const total3Big = pipeline3_allBig.reduce((a, b) => a + b, 0);
console.log(`3-agent pipeline (all GPT-4o): $${total3Big.toFixed(6)}`);
// = $0.006750 + $0.013000 + $0.009500
// = $0.029250 (2.9x more expensive)
// --- Approach 3: 3-agent pipeline (optimized model selection) ---
const pipeline3_optimized = [
agentCost('gpt-4o-mini', 1500, 300), // Agent A: classify (simple task)
agentCost('gpt-4o', 2000, 800), // Agent B: research (needs power)
agentCost('gpt-4o-mini', 1800, 500), // Agent C: format (mechanical)
];
const total3Opt = pipeline3_optimized.reduce((a, b) => a + b, 0);
console.log(`3-agent pipeline (optimized): $${total3Opt.toFixed(6)}`);
// = $0.000405 + $0.013000 + $0.000570
// = $0.013975 (1.4x vs single call, but way cheaper than all-GPT-4o)
Scale impact
| Approach | Per Request | 10K req/day | Monthly (30 days) |
|---|---|---|---|
| Single call | $0.010 | $100 | $3,000 |
| 3-agent (all big) | $0.029 | $290 | $8,700 |
| 3-agent (optimized) | $0.014 | $140 | $4,200 |
Difference (big vs single): $5,700/month (190% increase)
Difference (optimized vs big): $4,500/month saved by model selection
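The table rows are straightforward multiplication; a small (hypothetical) helper makes the projection explicit and easy to rerun with your own traffic numbers.

```javascript
// Project a per-request cost to monthly spend.
function monthlySpend(costPerRequest, requestsPerDay, days = 30) {
  return costPerRequest * requestsPerDay * days;
}

const singleMonthly = monthlySpend(0.010, 10_000);    // ≈ $3,000
const bigMonthly = monthlySpend(0.029, 10_000);       // ≈ $8,700
const optimizedMonthly = monthlySpend(0.014, 10_000); // ≈ $4,200
console.log(bigMonthly - singleMonthly); // ≈ $5,700 extra per month for the all-GPT-4o pipeline
```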
3. Cost Comparison: Real Scenarios
Scenario A: Customer support bot
Single-call approach:
System prompt (500 tokens) + customer message (200 tokens) + response (300 tokens)
Cost per request: $0.0048 (GPT-4o)
Multi-agent approach (all agents on GPT-4o):
Agent 1 - Intent classifier: input 400, output 20 = $0.0012
Agent 2 - Knowledge retriever: input 800, output 400 = $0.0060
Agent 3 - Response generator: input 1200, output 300 = $0.0060
Agent 4 - Tone checker: input 500, output 50 = $0.0018
Total per request: $0.0150 (3.1x more expensive)
At 50,000 requests/day:
Single call: $240/day = $7,200/month
Multi-agent: $750/day = $22,500/month
Difference: $15,300/month
Scenario B: Document analysis pipeline
Single-call approach:
10-page document (8000 tokens) + analysis prompt (500) + output (2000)
Cost per document: $0.0413
Multi-agent approach (all agents on GPT-4o):
Agent 1 - Extract entities: input 8500, output 1000 = $0.0313
Agent 2 - Classify sections: input 8500, output 500 = $0.0263
Agent 3 - Summarize: input 9500, output 1500 = $0.0388
Agent 4 - Generate report: input 3500, output 2000 = $0.0288
Total per document: $0.1250 (3.0x more expensive)
At 1,000 documents/day:
Single call: $41/day = $1,239/month
Multi-agent: $125/day = $3,750/month
Difference: $2,511/month
BUT if multi-agent produces significantly better analysis,
the cost may be justified. This is a business decision.
4. Cost Optimization Strategies
Strategy 1: Use cheaper models for simpler agents
This is the single most impactful optimization: GPT-4o-mini is roughly 17x cheaper than GPT-4o for both input and output tokens.
// Match model complexity to task complexity
const AGENT_MODEL_MAP = {
// Simple tasks -> cheap model
intentClassifier: 'gpt-4o-mini', // Pick from 5 categories
sentimentAnalyzer: 'gpt-4o-mini', // Positive/negative/neutral
formatter: 'gpt-4o-mini', // Reformat text
validator: 'gpt-4o-mini', // Check if output matches schema
// Complex tasks -> powerful model
researcher: 'gpt-4o', // Multi-step reasoning
codeGenerator: 'gpt-4o', // Write correct code
analyst: 'gpt-4o', // Nuanced analysis
};
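One way to wire the map into a pipeline is a lookup with a deliberate fallback. The `modelFor` helper is an assumption, not part of any SDK; a trimmed `AGENT_MODEL_MAP` is repeated here only so the sketch is self-contained.

```javascript
// Trimmed copy of the agent-to-model map for this sketch.
const AGENT_MODEL_MAP = {
  intentClassifier: 'gpt-4o-mini',
  researcher: 'gpt-4o',
};

// Resolve the model for an agent, defaulting to the cheap model so a
// newly added agent never silently lands on the expensive one.
function modelFor(agentName) {
  return AGENT_MODEL_MAP[agentName] ?? 'gpt-4o-mini';
}

console.log(modelFor('researcher'));   // 'gpt-4o'
console.log(modelFor('newFormatter')); // 'gpt-4o-mini' (fallback)
```

Defaulting to the cheap model is a policy choice: the opposite default is safer for output quality but lets costs creep in unnoticed as agents are added.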
Strategy 2: Cache repeated agent calls
If the same input produces the same output, cache it. Especially useful for classification agents that see repeated patterns.
import crypto from 'crypto';
class AgentCache {
constructor(ttlMs = 3600000) { // 1 hour default TTL
this.cache = new Map();
this.ttlMs = ttlMs;
this.stats = { hits: 0, misses: 0, savedTokens: 0 };
}
hash(input) {
return crypto.createHash('sha256').update(JSON.stringify(input)).digest('hex');
}
get(input) {
const key = this.hash(input);
const entry = this.cache.get(key);
if (entry && Date.now() - entry.timestamp < this.ttlMs) {
this.stats.hits++;
this.stats.savedTokens += entry.tokensUsed;
return entry.result;
}
this.stats.misses++;
return null;
}
set(input, result, tokensUsed) {
const key = this.hash(input);
this.cache.set(key, { result, tokensUsed, timestamp: Date.now() });
}
report() {
const hitRate = this.stats.hits / (this.stats.hits + this.stats.misses) || 0;
console.log(`Cache hit rate: ${(hitRate * 100).toFixed(1)}%`);
console.log(`Tokens saved: ${this.stats.savedTokens.toLocaleString()}`);
}
}
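Usage looks roughly like the sketch below. To keep the wiring visible, `classifyIntent` is a synchronous stand-in for a billable LLM call and the cache is a bare Map; in practice you would use the `AgentCache` above, with its TTL and stats, around an async call.

```javascript
const cache = new Map();
let llmCalls = 0;

// Stand-in for a billable LLM classification call.
function classifyIntent(message) {
  llmCalls++;
  return message.includes('?') ? 'question' : 'statement';
}

// Check the cache before paying for a model call.
function cachedClassify(message) {
  if (cache.has(message)) return cache.get(message);
  const result = classifyIntent(message);
  cache.set(message, result);
  return result;
}

cachedClassify('Where is my order?');
cachedClassify('Where is my order?'); // served from cache — no second LLM call
console.log(llmCalls); // 1
```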
Strategy 3: Minimize token usage per agent
// BAD: Verbose system prompts waste tokens at every call
const verbosePrompt = `You are an AI assistant specialized in analyzing customer
feedback. Your role is to carefully examine the sentiment of the given text and
determine whether the overall tone is positive, negative, or neutral. Please
consider the nuances of language, sarcasm, and implicit sentiment in your
analysis. Return your assessment as a single word.`;
// ~65 tokens - wasted on every call
// GOOD: Minimal system prompt
const efficientPrompt = `Classify sentiment as: positive, negative, or neutral.
Reply with one word only.`;
// ~15 tokens - 77% reduction
// At 50,000 calls/day with GPT-4o:
// Verbose: 65 tokens * 50,000 * $2.50/1M = $8.13/day
// Efficient: 15 tokens * 50,000 * $2.50/1M = $1.88/day
// Savings: $6.25/day = $187.50/month (just for ONE agent's system prompt)
Strategy 4: Batch processing when real-time isn't needed
// Instead of calling the classification agent 100 times...
// Call it ONCE with 100 items batched together
// Individual calls: 100 API calls
// Cost: 100 * (500 input + 20 output) = 52,000 tokens
// Batched call: 1 API call
const batchPrompt = `Classify each message. Return a JSON array of sentiments.
Messages:
1. "${messages[0]}"
2. "${messages[1]}"
...
100. "${messages[99]}"`;
// Cost: 1 * (5000 input + 200 output) = 5,200 tokens (10x cheaper)
// Note: only works when items are independent and batch fits in context window
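A runnable version of the same idea, with the elided parts filled in by assumption: `buildBatchPrompt` numbers the items, and `parseBatchReply` checks that the model's JSON array actually lines up with the inputs, which is the main failure mode of batching.

```javascript
// Build one prompt covering many independent items instead of N prompts.
function buildBatchPrompt(messages) {
  const numbered = messages
    .map((m, i) => `${i + 1}. ${JSON.stringify(m)}`)
    .join('\n');
  return `Classify each message as positive, negative, or neutral.\n` +
    `Return a JSON array of sentiments, one per message, in order.\n` +
    `Messages:\n${numbered}`;
}

// Parse the model's reply and verify it lines up with the inputs.
function parseBatchReply(reply, expectedCount) {
  const sentiments = JSON.parse(reply);
  if (!Array.isArray(sentiments) || sentiments.length !== expectedCount) {
    throw new Error(`Expected ${expectedCount} sentiments, got ${sentiments && sentiments.length}`);
  }
  return sentiments;
}

const prompt = buildBatchPrompt(['Love it!', 'Broken on arrival.']);
const labels = parseBatchReply('["positive","negative"]', 2);
```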
Strategy 5: Early termination
If an early agent determines the query doesn't need the full pipeline, skip expensive downstream agents.
// The helpers below (classifyIntent, conductResearch, etc.) are
// stand-ins for your own agent implementations.
async function smartPipeline(userMessage) {
// Agent 1: Classify (cheap, fast)
const intent = await classifyIntent(userMessage); // gpt-4o-mini
// Early termination: simple queries don't need the research agent
if (intent.type === 'greeting' || intent.type === 'faq') {
return generateSimpleResponse(intent); // Single cheap call
// Saved: ~$0.020 by skipping Agent 2 and Agent 3
}
// Complex queries get the full pipeline
const research = await conductResearch(intent); // gpt-4o (expensive)
const response = await generateResponse(research); // gpt-4o
return response;
}
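Whether early termination pays off depends on the share of simple traffic. A back-of-envelope expected-cost calculation (the per-request figures are illustrative, roughly matching the pipelines above):

```javascript
// Expected per-request cost when a share of traffic exits early.
function expectedCost(simpleShare, simpleCost, fullCost) {
  return simpleShare * simpleCost + (1 - simpleShare) * fullCost;
}

// Say 60% of queries are greetings/FAQ ($0.001 each) and the rest
// run the full pipeline ($0.029 each):
const withTermination = expectedCost(0.6, 0.001, 0.029);
const withoutTermination = 0.029; // every request runs the full pipeline
console.log(withoutTermination - withTermination); // ≈ $0.0168 saved per request
```

At 10K requests/day that difference is roughly $168/day, which is why the cheap classifier up front tends to pay for itself.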
5. When Cost Is Justified vs When It's Wasteful
Cost IS justified when:
| Situation | Why It's Worth It |
|---|---|
| Multi-agent produces measurably better output | Higher quality = higher user satisfaction = more revenue |
| Task genuinely requires multiple reasoning steps | A single prompt cannot handle the complexity |
| Different subtasks need different model capabilities | Code agent needs GPT-4o, but formatter needs only GPT-4o-mini |
| Reliability improves with agent specialization | Validation agent catches errors that save customer support costs |
| The business value per request is high | Legal document analysis at $50/document easily justifies $0.12 in AI cost |
Cost is WASTEFUL when:
| Situation | Why It's Not Worth It |
|---|---|
| A single prompt produces equivalent quality | Multi-agent adds cost without improving output |
| Agents are doing trivial work | Using GPT-4o to format a date string |
| Most requests are simple but get the full pipeline | No early termination means simple queries cost as much as complex ones |
| There is no measurement of quality difference | You added agents "just in case" without evaluating impact |
| The task doesn't need LLM at all | Using an agent to parse a known JSON schema (use code instead) |
6. Monitoring and Budgeting for Multi-Agent Costs
Cost tracking per agent, per request
class CostTracker {
constructor() {
this.requests = [];
}
record(requestId, agentName, model, inputTokens, outputTokens) {
const pricing = PRICING[model];
const cost = (inputTokens * pricing.input) + (outputTokens * pricing.output);
this.requests.push({
requestId,
agentName,
model,
inputTokens,
outputTokens,
cost,
timestamp: new Date(),
});
return cost;
}
getTodayCost() {
  // Today's total spend; used by the budget-alert check below.
  const today = new Date().toISOString().split('T')[0];
  return this.requests
    .filter((r) => r.timestamp.toISOString().startsWith(today))
    .reduce((sum, r) => sum + r.cost, 0);
}
dailyReport() {
const today = new Date().toISOString().split('T')[0];
const todayRequests = this.requests.filter(
(r) => r.timestamp.toISOString().startsWith(today)
);
const byAgent = {};
let totalCost = 0;
let totalTokens = 0;
for (const r of todayRequests) {
if (!byAgent[r.agentName]) {
byAgent[r.agentName] = { calls: 0, cost: 0, tokens: 0 };
}
byAgent[r.agentName].calls++;
byAgent[r.agentName].cost += r.cost;
byAgent[r.agentName].tokens += r.inputTokens + r.outputTokens;
totalCost += r.cost;
totalTokens += r.inputTokens + r.outputTokens;
}
console.log(`\n=== DAILY COST REPORT (${today}) ===`);
console.log(`Total requests: ${todayRequests.length}`);
console.log(`Total tokens: ${totalTokens.toLocaleString()}`);
console.log(`Total cost: $${totalCost.toFixed(2)}`);
console.log('\nBreakdown by agent:');
for (const [name, stats] of Object.entries(byAgent)) {
const pct = ((stats.cost / totalCost) * 100).toFixed(1);
console.log(` ${name}: ${stats.calls} calls, $${stats.cost.toFixed(4)} (${pct}%)`);
}
}
}
// Usage
const tracker = new CostTracker();
// Inside your pipeline, after each agent call:
tracker.record('req-001', 'classifier', 'gpt-4o-mini', 400, 20);
tracker.record('req-001', 'researcher', 'gpt-4o', 2000, 800);
tracker.record('req-001', 'responder', 'gpt-4o', 1800, 500);
tracker.dailyReport();
Example output:
=== DAILY COST REPORT (2026-04-11) ===
Total requests: 3
Total tokens: 5,520
Total cost: $0.02
Breakdown by agent:
classifier: 1 calls, $0.0001 (0.3%)
researcher: 1 calls, $0.0130 (57.6%)
responder: 1 calls, $0.0095 (42.1%)
Budget alerts
const DAILY_BUDGET = 500; // $500/day max
function checkBudget(tracker) {
const todayCost = tracker.getTodayCost();
const percentUsed = (todayCost / DAILY_BUDGET) * 100;
if (percentUsed > 90) {
console.error(`[CRITICAL] Daily budget ${percentUsed.toFixed(1)}% used ($${todayCost.toFixed(2)} / $${DAILY_BUDGET})`);
// Send alert to ops team, consider throttling
} else if (percentUsed > 70) {
console.warn(`[WARNING] Daily budget ${percentUsed.toFixed(1)}% used`);
}
}
7. Key Takeaways
- Each agent = separate LLM call = separate cost. A 3-agent pipeline costs roughly 2-3x more than a single call, often more because context accumulates.
- Use cheaper models for simpler tasks. GPT-4o-mini costs ~17x less than GPT-4o -- use it for classification, formatting, and validation agents.
- Cache, batch, and minimize tokens. Every token you save is money saved at scale.
- Early termination saves the most money. If a cheap classifier determines the query is simple, skip expensive downstream agents entirely.
- Always compare cost vs quality. Multi-agent is justified only when it produces measurably better results. Measure, don't assume.
- Monitor costs per agent in production. You need to know which agent is eating your budget so you can optimize the right one.
- Set daily/monthly budget caps with alerts. Runaway costs from a bug or traffic spike can be devastating.
Explain-It Challenge
- Your boss asks "why did our AI costs triple this month?" Walk through how you would diagnose the cause using per-agent cost tracking.
- A 5-agent pipeline costs $0.05 per request and handles 100,000 requests/day. The PM wants to cut costs by 50%. Propose a concrete optimization plan.
- When is it cheaper to use one GPT-4o call than three GPT-4o-mini calls? Work out the token math.
Navigation: <- 4.19.a Increased Latency | 4.19.c -- Debugging Across Agents ->