Episode 4 — Generative AI Engineering / 4.15 — Understanding AI Agents
4.15.d — When NOT to Use Agents
In one sentence: Do not use an AI agent when a single LLM call, a prompt chain, or plain deterministic code can solve the problem — agents add latency, cost, unpredictability, and debugging difficulty that are only justified when the task genuinely requires dynamic, multi-step reasoning with tools.
Navigation: <- 4.15.c — When to Use Agents | 4.15.e — Multi-Agent Complexity ->
1. The Golden Rule
"Just because you CAN build an agent doesn't mean you SHOULD."
Agents are the most complex AI pattern. Every layer of complexity you add must be justified by a real problem that simpler approaches cannot solve. In production systems, simplicity is a feature — simple systems are easier to test, debug, monitor, and maintain.
┌──────────────────────────────────────────────────────────────────┐
│ THE COMPLEXITY TAX │
│ │
│ Approach Cost per Latency Debug Error │
│ request difficulty rate │
│ ──────────────── ────────── ───────── ────────── ───── │
│ Deterministic $0 <50ms Trivial ~0% │
│ code (if/else) │
│ │
│ Single LLM call $0.003 1-3s Easy ~3% │
│ │
│ Prompt chain $0.01 3-8s Moderate ~8% │
│ (2-3 calls) │
│ │
│ Agent $0.05-0.50 10-60s Hard ~20% │
│ (5-20 calls) │
│ │
│ Multi-agent $0.50-5.00 30-300s Very hard ~40% │
│ (multiple agents) │
│ │
│ RULE: Pick the CHEAPEST row that solves your problem. │
└──────────────────────────────────────────────────────────────────┘
2. Anti-Pattern 1: Simple Tasks That a Single Call Handles
The most common over-engineering mistake is building an agent for a task that a single LLM call handles perfectly.
Do NOT build an agent for:
| Task | Why No Agent | Better Approach |
|---|---|---|
| Sentiment classification | All info is in the text. No tools needed. | Single call with temperature: 0 |
| Text summarization | Input in, summary out. One pass. | Single call |
| Translation | Direct transformation. | Single call |
| Data extraction (from provided text) | All data is in the prompt. | Single call with structured output |
| Email drafting (from provided info) | Creative task with all context available. | Single call |
| Code explanation | Model reads the code and explains. | Single call |
| JSON reformatting | Shape A -> Shape B. | Single call or plain code |
// BAD: Building an agent for sentiment analysis
// This is like hiring a detective to check your mailbox
const sentimentAgent = new Agent({
  systemPrompt: "You are a sentiment analysis agent...",
  tools: [], // No tools even needed!
  maxIterations: 5,
});
// The agent will:
// Iteration 1: Think "I need to analyze sentiment..."
// Iteration 2: Think "I have analyzed it. The sentiment is positive."
// Total: 2 LLM calls, 4 seconds, $0.01
//
// WASTE. The agent loop provides zero value here.
// GOOD: Single LLM call for sentiment analysis
async function analyzeSentiment(text) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    temperature: 0,
    response_format: { type: "json_object" }, // guarantees parseable JSON
    messages: [
      {
        role: "system",
        content: 'Classify sentiment as positive, negative, or neutral. Return JSON: { "sentiment": "...", "confidence": 0-1 }',
      },
      { role: "user", content: text },
    ],
  });
  return JSON.parse(response.choices[0].message.content);
}
// Total: 1 LLM call, 1.5 seconds, $0.003
The test
Ask yourself: "Does the LLM need to see its own output and decide what to do next?"
- No -> Single call (or prompt chain if multi-step but predictable).
- Yes -> Agent might be justified.
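The distinction can be made concrete with two skeletons (a sketch; `llm` and `tools` are hypothetical stand-ins, not a real SDK). In a chain, your code decides the next step. In an agent, the model reads its own previous output and decides.

```javascript
// Chain: the sequence is fixed in code. The model never chooses a step.
async function promptChain(input, llm) {
  const summary = await llm(`Summarize: ${input}`);
  return llm(`Translate to French: ${summary}`);
}

// Agent: each iteration, the model reads the accumulated context
// (including its own prior tool results) and decides the next action.
async function agentLoop(task, llm, tools, maxIterations = 5) {
  let context = task;
  for (let i = 0; i < maxIterations; i++) {
    const step = await llm(context); // e.g. { tool, args } or { done: answer }
    if (step.done !== undefined) return step.done;
    const observation = await tools[step.tool](step.args);
    context += `\nObservation: ${JSON.stringify(observation)}`;
  }
  return null; // iteration budget exhausted
}
```

If your `agentLoop` would always produce the same step sequence, you wanted `promptChain` all along.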
3. Anti-Pattern 2: Latency-Sensitive Applications
Agents are slow. Every iteration requires an LLM API call (1-3 seconds) plus tool execution (0.5-5 seconds). A 5-step agent typically takes 10-25 seconds. Users notice.
┌──────────────────────────────────────────────────────────────────┐
│ USER EXPERIENCE vs LATENCY │
│ │
│ < 1 second │ "Instant" — feels like a fast app │
│ 1-3 seconds │ "Quick" — acceptable for AI features │
│ 3-5 seconds │ "Noticeable" — users start wondering │
│ 5-10 seconds │ "Slow" — users get impatient │
│ 10-30 seconds │ "Very slow" — users switch tabs │
│ 30+ seconds │ "Broken" — users abandon │
│ │
│ Single call: 1-3 seconds (acceptable) │
│ Agent: 10-60 seconds (often problematic) │
└──────────────────────────────────────────────────────────────────┘
Where latency matters most
| Application | Max Acceptable Latency | Agent Viable? |
|---|---|---|
| Autocomplete suggestions | <500ms | No |
| Chat responses (user waiting) | <3s for first token | Marginal (use streaming) |
| Search result augmentation | <2s | No |
| Form validation | <1s | No |
| API endpoint (sync) | <5s | Rarely |
| Background task (email) | <60s | Yes |
| Report generation (async) | <5 minutes | Yes |
| Batch processing | Hours | Yes |
Rule of thumb: If the user is staring at a spinner, do not use an agent. If the task runs in the background and the user is notified when done, agents are fine.
// BAD: Agent for a real-time chat feature
// User sends a message and waits 20 seconds for a response
// GOOD: If you must use agent-like behavior in real-time,
// stream the thinking process to the user
async function* streamingAgent(query, tools) {
  yield { type: "status", message: "Searching for information..." };
  const searchResult = await webSearch(query);
  yield { type: "status", message: "Analyzing results..." };
  const analysis = await llmCall(query, searchResult);
  yield { type: "status", message: "Preparing response..." };
  yield { type: "answer", content: analysis };
}
// The user sees progress updates, making the wait tolerable.
// But this is a FIXED chain, not a true agent loop.
// If you truly need an agent, run it asynchronously.
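One way to run an agent off the request path (a minimal sketch: the in-memory `jobs` map stands in for a real job queue, and `runAgent` and `notifyUser` are hypothetical): respond to the request immediately with a job id, and let a background worker absorb the 10-60 second latency.

```javascript
// Hypothetical in-memory job store; in production this would be a
// database table or message broker.
const jobs = new Map();

function enqueueAgentTask(userId, query) {
  // The HTTP handler returns this id immediately; the user is not blocked.
  const jobId = `job-${jobs.size + 1}`;
  jobs.set(jobId, { userId, query, status: "queued", result: null });
  return jobId;
}

async function worker(jobId, runAgent, notifyUser) {
  // The worker pays the agent's latency; the user is notified when done.
  const job = jobs.get(jobId);
  job.status = "running";
  job.result = await runAgent(job.query);
  job.status = "done";
  await notifyUser(job.userId, job.result);
}
```

The user experience changes from "staring at a spinner for 30 seconds" to "got a notification a minute later", which is acceptable for background tasks.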
4. Anti-Pattern 3: Cost-Sensitive Applications
Agents make many LLM calls, and each call is more expensive than the last because the context grows. This adds up fast.
Cost comparison
Task: "Answer a customer question about their order"
Single call approach:
1 call x ~2,000 tokens = $0.005
Monthly (100K requests): $500
Agent approach (average 4 iterations):
Call 1: 2,000 tokens = $0.005
Call 2: 4,000 tokens = $0.010
Call 3: 6,000 tokens = $0.015
Call 4: 8,000 tokens = $0.020
Total per request: $0.050
Monthly (100K requests): $5,000
Agent is 10x more expensive!
With a strong model (GPT-4o at $2.50/$10 per 1M input/output):
Agent total per request: ~$0.15
Monthly (100K requests): $15,000
Can your business justify $15K/month for this feature?
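The arithmetic above can be captured in a small back-of-envelope estimator (an assumption-laden sketch: it models the context as growing by a fixed amount each iteration and uses a single blended per-token rate):

```javascript
// With context growing by tokensPerIteration each call, call i costs
// i * tokensPerIteration tokens, so per-request cost grows linearly per call.
function estimateAgentCost({ iterations, tokensPerIteration, costPer1kTokens }) {
  let total = 0;
  for (let i = 1; i <= iterations; i++) {
    total += (i * tokensPerIteration / 1000) * costPer1kTokens;
  }
  return total;
}

// 4 iterations x 2,000 new tokens at a $0.0025/1K blended rate:
// $0.005 + $0.010 + $0.015 + $0.020 = $0.05 per request,
// matching the 10x figure above ($5,000/month at 100K requests).
```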
Cost mitigation strategies (if you must use an agent)
// Strategy 1: Use a cheap model for simple reasoning steps
const agent = new Agent({
  model: "gpt-4o-mini", // 15x cheaper than gpt-4o
  maxIterations: 5, // Hard limit on iterations
});
// Strategy 2: Set a per-task token budget
async function budgetedAgent(query, toolDefs, maxTokenBudget = 20000, maxIterations = 10) {
  const messages = [{ role: "user", content: query }];
  let totalTokens = 0;
  for (let i = 0; i < maxIterations; i++) {
    const response = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages,
      tools: toolDefs,
    });
    totalTokens += response.usage.total_tokens;
    if (totalTokens > maxTokenBudget) {
      // Force the agent to give its best answer NOW
      messages.push({
        role: "user",
        content: "You are running out of budget. Give your best answer with the information you have.",
      });
      const finalResponse = await client.chat.completions.create({
        model: "gpt-4o-mini",
        messages,
      });
      return finalResponse.choices[0].message.content;
    }
    // ... normal agent loop continues: execute tool calls, append results ...
  }
}
// Strategy 3: Hybrid approach — try single call first
async function hybridApproach(query) {
  // Attempt 1: Single call (cheap, fast)
  const quickAnswer = await singleCallAttempt(query);
  if (quickAnswer.confidence > 0.85) {
    return quickAnswer; // Single call was enough!
  }
  // Attempt 2: Fall back to agent (expensive, thorough)
  return await agentAttempt(query);
}
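One way `singleCallAttempt` could work (a sketch, not a library feature: the self-reported `confidence` field and the 0.85 threshold are assumptions, and model self-confidence is known to be poorly calibrated, so validate the threshold against real traffic; the client is passed in for testability):

```javascript
async function singleCallAttempt(query, client) {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0,
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          'Answer the question. Return JSON: { "answer": "...", "confidence": 0-1 }. ' +
          "Use low confidence if you would need external data to answer well.",
      },
      { role: "user", content: query },
    ],
  });
  return JSON.parse(response.choices[0].message.content);
}
```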
When cost does not matter (agent is fine)
- Internal tools with low request volume (<1K/month)
- High-value tasks where the agent output is worth $10+ per request (e.g., sales research, legal analysis)
- Background batch processing where efficiency matters less than accuracy
- Prototyping where you are testing whether agents solve the problem at all
5. Anti-Pattern 4: When Deterministic Code Is Better
Sometimes the "AI solution" is worse than a simple if/else or a database query. Not every problem needs AI.
// BAD: Using an agent to check if a user is eligible for a discount
// "Check if the user has been a member for more than 1 year
// and has spent more than $500"
// Agent approach (over-engineered):
// Step 1: Agent queries database for membership date
// Step 2: Agent calculates membership duration
// Step 3: Agent queries database for total spending
// Step 4: Agent reasons about eligibility
// Total: 4 LLM calls, 10 seconds, $0.05, and MIGHT GET THE LOGIC WRONG
// GOOD: Deterministic code (correct, instant, free)
async function checkDiscountEligibility(userId) {
  // node-postgres style: query() resolves to { rows: [...] }
  const { rows: [user] } = await db.query(
    "SELECT created_at, total_spent FROM users WHERE id = $1",
    [userId]
  );
  const membershipDays = (Date.now() - new Date(user.created_at).getTime()) / (1000 * 60 * 60 * 24);
  const isLongTermMember = membershipDays > 365;
  const isHighSpender = user.total_spent > 500;
  return {
    eligible: isLongTermMember && isHighSpender,
    reason: isLongTermMember && isHighSpender
      ? "Qualified: 1+ year member with $500+ spent"
      : `Not qualified: ${!isLongTermMember ? "membership < 1 year" : "spending < $500"}`,
  };
}
// Total: 1 DB query, 50ms, $0, ALWAYS correct
The "deterministic code" test
Ask yourself: "Can I write the logic as a function with clear inputs and outputs?"
| Scenario | Can Write as Code? | Use AI? |
|---|---|---|
| Check if order is returnable (rules-based) | Yes: daysElapsed < 30 && status === 'delivered' | No |
| Route support ticket to correct team (rules-based) | Yes: keyword matching + category rules | Probably no |
| Calculate shipping cost | Yes: weight * rate + zone surcharge | No |
| Summarize a customer complaint | No: requires language understanding | Yes (single call) |
| Decide next best action for a complex support case | Partially: rules handle common cases, AI handles edge cases | Hybrid |
// BEST PRACTICE: Hybrid approach
// Use deterministic code for what's deterministic,
// use AI only for what requires language understanding
async function handleSupportTicket(ticket) {
  // Step 1: Deterministic routing (code)
  const category = categorizeByKeywords(ticket.subject);
  const priority = calculatePriority(ticket.customer_tier, ticket.issue_type);
  // Step 2: AI only for the parts that need language understanding
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0,
    messages: [
      { role: "system", content: "Summarize this support ticket in one sentence." },
      { role: "user", content: ticket.body },
    ],
  });
  const summary = response.choices[0].message.content; // extract the text, not the raw response object
  // Step 3: Deterministic action (code)
  await assignToTeam(category, priority, summary);
  // No agent needed. One LLM call for the part that needs language AI.
  // Everything else is reliable, fast, free, deterministic code.
}
6. Anti-Pattern 5: Over-Engineering with Agents
The "agent" label is exciting. It sounds cutting-edge. This leads developers to build agents when they do not need them, creating systems that are:
- Harder to test — Non-deterministic paths make unit testing difficult
- Harder to debug — "The agent decided to search for X instead of Y" is not a clear bug report
- Harder to maintain — Prompt changes affect tool selection in unpredictable ways
- Harder to explain — "The AI decided to do it that way" is not an acceptable explanation for stakeholders
- More expensive — In both compute cost and engineering time
Signs you are over-engineering
| Sign | What to Do Instead |
|---|---|
| Your agent has 0 tools | You don't have an agent, you have an expensive LLM loop. Use a single call. |
| Your agent always takes the same 3 steps | You don't have an agent, you have a slow prompt chain. Hard-code the chain. |
| Your agent's output could be a template with variables | Use a template engine, not AI. |
| You are building an agent for an internal tool used by 5 people | Spend the engineering time on a simpler solution. Agents are for scale or complexity. |
| Your agent makes 1 tool call then stops | You have a function-calling LLM, not an agent. Drop the loop. |
| Your agent's success rate is below 70% | The task might not be suitable for an agent. Simplify or decompose. |
// RED FLAG: "Agent" that always follows the same path
// This is a prompt chain disguised as an agent
const fakeAgent = new Agent({
  systemPrompt: "Step 1: Search for X. Step 2: Summarize. Step 3: Translate.",
  tools: [searchTool],
  maxIterations: 5,
});
// The "agent" always does: search -> summarize -> translate
// There's no dynamic decision-making. This is a chain.
// BETTER: Explicit prompt chain
async function searchSummarizeTranslate(query, targetLang) {
  const searchResults = await searchTool.execute({ query });
  const summary = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Summarize these search results." },
      { role: "user", content: JSON.stringify(searchResults) },
    ],
  });
  const translation = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: `Translate to ${targetLang}.` },
      { role: "user", content: summary.choices[0].message.content },
    ],
  });
  return translation.choices[0].message.content;
}
// Same result, predictable, testable, cheaper.
7. The "Should I Build an Agent?" Checklist
Before building an agent, honestly answer these questions:
┌──────────────────────────────────────────────────────────────────┐
│ PRE-AGENT CHECKLIST │
│ │
│ [ ] Can a single LLM call solve this? │
│ If YES -> stop. Use a single call. │
│ │
│ [ ] Can a fixed chain of 2-3 LLM calls solve this? │
│ If YES -> stop. Use a prompt chain. │
│ │
│ [ ] Can deterministic code solve this? │
│ If YES -> stop. Write code. No AI needed. │
│ │
│ [ ] Is the latency acceptable? (Will users wait 10-30 seconds?) │
│ If NO -> stop. Use a faster approach or run async. │
│ │
│ [ ] Is the cost acceptable at scale? │
│ If NO -> stop. Optimize or use a cheaper approach. │
│ │
│ [ ] Do the steps GENUINELY vary per request? │
│ If NO -> stop. You're building a chain, not an agent. │
│ │
│ [ ] Can you define clear success metrics? │
│ If NO -> stop. You can't evaluate an agent without metrics. │
│ │
│ All checks passed? Build an agent. │
│ Start minimal. Add complexity only when needed. │
└──────────────────────────────────────────────────────────────────┘
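The checklist can be encoded as a tiny decision helper (a sketch; the engineer still supplies honest boolean answers, the function just enforces the "stop at the first cheaper option" rule):

```javascript
// Walks the checklist top to bottom and returns the cheapest approach
// that survives. Any missing answer defaults to falsy, which is the
// conservative reading for the "solves it" questions.
function recommendApproach(answers) {
  if (answers.singleCallSolvesIt) return "single LLM call";
  if (answers.fixedChainSolvesIt) return "prompt chain";
  if (answers.deterministicCodeSolvesIt) return "deterministic code";
  if (!answers.latencyAcceptable) return "faster approach or run async";
  if (!answers.costAcceptableAtScale) return "optimize or use cheaper approach";
  if (!answers.stepsVaryPerRequest) return "prompt chain, not an agent";
  if (!answers.hasSuccessMetrics) return "define metrics before building";
  return "agent (start minimal)";
}
```

Notice how many conditions must all hold before "agent" is ever returned.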
8. When Simple Beats Smart: Real-World Examples
| Scenario | Instinct | Reality | Better Approach |
|---|---|---|---|
| Email auto-reply | "Build an agent that reads the email, checks CRM, and drafts a reply!" | 90% of emails need the same 5 template responses | Template matcher + single LLM call for edge cases |
| Code review bot | "An agent that reads PRs, runs tests, and gives feedback!" | Running tests is CI/CD's job. Code review is a single-pass reading task. | Single LLM call with PR diff as input |
| Meeting scheduler | "An agent that checks calendars, finds conflicts, proposes times!" | Calendar APIs have scheduling endpoints. The "AI" part is trivial. | Calendar API + simple logic + optional LLM for natural language response |
| Data validation | "An agent that understands the schema and validates data!" | Zod/Joi does this perfectly, deterministically, in milliseconds | Schema validation library |
| FAQ bot | "An agent that searches our knowledge base and answers!" | RAG pipeline (single call + retrieval) handles this without a loop | RAG with single LLM call |
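The "RAG with single LLM call" approach from the FAQ-bot row can be sketched as one retrieval plus one generation, with no loop (`retrieveTopChunks` is a hypothetical stand-in for your vector or keyword search; the client is passed in for testability):

```javascript
async function faqBot(question, retrieveTopChunks, client) {
  // One retrieval step: fetch the k most relevant knowledge-base chunks.
  const chunks = await retrieveTopChunks(question, 3);
  // One generation step: answer grounded in the retrieved context.
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0,
    messages: [
      {
        role: "system",
        content: `Answer using only this context:\n${chunks.join("\n---\n")}`,
      },
      { role: "user", content: question },
    ],
  });
  return response.choices[0].message.content;
}
```

Fixed cost per question, predictable latency, and no agent loop to debug.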
9. The Right Mental Model
Think of AI complexity as a toolbox. You do not use a power drill to hammer a nail:
┌─────────────────────────────────────────────────────────────────────┐
│ YOUR AI TOOLBOX │
│ │
│ Tool When to Reach for It │
│ ────────────── ───────────────────────────────────────────── │
│ Deterministic Known rules, math, routing, validation. │
│ code Always try this first. │
│ │
│ Single LLM call Text understanding, classification, summarization│
│ extraction, generation — when all info is in │
│ the prompt. │
│ │
│ Prompt chain Multi-step transformation with a KNOWN sequence. │
│ Summarize then translate. Extract then validate. │
│ │
│ RAG pipeline Question answering over documents. One retrieval │
│ + one generation call. │
│ │
│ Single tool call LLM + one function call (function calling from │
│ section 4.7). Search then answer. │
│ │
│ Agent (loop) Dynamic multi-step tasks. External data. │
│ Actions. Steps depend on results. LAST RESORT. │
│ │
│ MOVE DOWN this list. Do not skip to "agent" because it's exciting. │
└─────────────────────────────────────────────────────────────────────┘
10. Key Takeaways
- "Just because you can doesn't mean you should." Agents are the most complex AI pattern. Use them only when simpler approaches fail.
- Simple tasks do not need agents. Classification, summarization, translation, extraction — these are single-call tasks. An agent adds cost and latency for zero benefit.
- Latency-sensitive applications are poor fits for agents. If users are waiting, 10-30 seconds is unacceptable for most use cases. Run agents asynchronously or use simpler approaches.
- Agents are expensive at scale. A 5-step agent costs 10x+ more than a single call. Multiply by 100K requests/month and the difference is tens of thousands of dollars.
- Deterministic code beats AI for deterministic tasks. If you can write the logic as an if/else or a database query, do that. It is faster, cheaper, and always correct.
- If the steps are always the same, you have a chain, not an agent. Hard-code it as a chain. Chains are predictable, testable, and cheaper.
- Use the pre-agent checklist. Before building an agent, verify that simpler approaches genuinely cannot solve the problem.
Explain-It Challenge
- A startup founder tells you: "We're building an AI agent for everything in our app — search, recommendations, notifications, settings." Explain why this is a terrible idea and propose a better architecture.
- Your team built an agent for invoice processing. It takes 15 seconds per invoice and costs $0.12 each. You process 50,000 invoices per month. Calculate the monthly cost and propose a cheaper alternative that still uses AI where needed.
- Write a checklist for a code review: "Is this agent actually needed, or is it over-engineered?" Include at least 5 specific red flags to look for.
Navigation: <- 4.15.c — When to Use Agents | 4.15.e — Multi-Agent Complexity ->