Episode 4 — Generative AI Engineering / 4.15 — Understanding AI Agents
4.15.d — When NOT to Use Agents
In one sentence: Do not use an AI agent when a single LLM call, a prompt chain, or plain deterministic code can solve the problem — agents add latency, cost, unpredictability, and debugging difficulty that are only justified when the task genuinely requires dynamic, multi-step reasoning with tools.
Navigation: <- 4.15.c — When to Use Agents | 4.15.e — Multi-Agent Complexity ->
1. The Golden Rule
"Just because you CAN build an agent doesn't mean you SHOULD."
Agents are the most complex AI pattern. Every layer of complexity you add must be justified by a real problem that simpler approaches cannot solve. In production systems, simplicity is a feature — simple systems are easier to test, debug, monitor, and maintain.
┌──────────────────────────────────────────────────────────────────┐
│ THE COMPLEXITY TAX │
│ │
│ Approach Cost per Latency Debug Error │
│ request difficulty rate │
│ ──────────────── ────────── ───────── ────────── ───── │
│ Deterministic $0 <50ms Trivial ~0% │
│ code (if/else) │
│ │
│ Single LLM call $0.003 1-3s Easy ~3% │
│ │
│ Prompt chain $0.01 3-8s Moderate ~8% │
│ (2-3 calls) │
│ │
│ Agent $0.05-0.50 10-60s Hard ~20% │
│ (5-20 calls) │
│ │
│ Multi-agent $0.50-5.00 30-300s Very hard ~40% │
│ (multiple agents) │
│ │
│ RULE: Pick the CHEAPEST row that solves your problem. │
└──────────────────────────────────────────────────────────────────┘
2. Anti-Pattern 1: Simple Tasks That a Single Call Handles
The most common over-engineering mistake is building an agent for a task that a single LLM call handles perfectly.
Do NOT build an agent for:
| Task | Why No Agent | Better Approach |
|---|---|---|
| Sentiment classification | All info is in the text. No tools needed. | Single call with temperature: 0 |
| Text summarization | Input in, summary out. One pass. | Single call |
| Translation | Direct transformation. | Single call |
| Data extraction (from provided text) | All data is in the prompt. | Single call with structured output |
| Email drafting (from provided info) | Creative task with all context available. | Single call |
| Code explanation | Model reads the code and explains. | Single call |
| JSON reformatting | Shape A -> Shape B. | Single call or plain code |
// BAD: Building an agent for sentiment analysis
// This is like hiring a detective to check your mailbox
const sentimentAgent = new Agent({
  systemPrompt: "You are a sentiment analysis agent...",
  tools: [], // No tools even needed!
  maxIterations: 5,
});
// The agent will:
// Iteration 1: Think "I need to analyze sentiment..."
// Iteration 2: Think "I have analyzed it. The sentiment is positive."
// Total: 2 LLM calls, 4 seconds, $0.01
//
// WASTE. The agent loop provides zero value here.
// GOOD: Single LLM call for sentiment analysis
async function analyzeSentiment(text) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    temperature: 0,
    response_format: { type: "json_object" }, // guarantees parseable JSON
    messages: [
      {
        role: "system",
        content: 'Classify sentiment as positive, negative, or neutral. Return JSON: { "sentiment": "...", "confidence": 0-1 }',
      },
      { role: "user", content: text },
    ],
  });
  return JSON.parse(response.choices[0].message.content);
}
// Total: 1 LLM call, 1.5 seconds, $0.003
The test
Ask yourself: "Does the LLM need to see its own output and decide what to do next?"
- No -> Single call (or prompt chain if multi-step but predictable).
- Yes -> Agent might be justified.
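The distinction can be made concrete with two skeletons (a sketch; `llm` and `tools` are hypothetical stand-ins, not a real SDK). In a chain, your code decides the next step. In an agent, the model reads its own previous output and decides.

```javascript
// Chain: the sequence is fixed in code. The model never chooses a step.
async function promptChain(input, llm) {
  const summary = await llm(`Summarize: ${input}`);
  return llm(`Translate to French: ${summary}`);
}

// Agent: each iteration, the model reads the accumulated context
// (including its own prior tool results) and decides the next action.
async function agentLoop(task, llm, tools, maxIterations = 5) {
  let context = task;
  for (let i = 0; i < maxIterations; i++) {
    const step = await llm(context); // e.g. { tool, args } or { done: answer }
    if (step.done !== undefined) return step.done;
    const observation = await tools[step.tool](step.args);
    context += `\nObservation: ${JSON.stringify(observation)}`;
  }
  return null; // iteration budget exhausted
}
```

If your `agentLoop` would always produce the same step sequence, you wanted `promptChain` all along.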
3. Anti-Pattern 2: Latency-Sensitive Applications
Agents are slow. Every iteration requires an LLM API call (1-3 seconds) plus tool execution (0.5-5 seconds). A 5-step agent typically takes 10-25 seconds. Users notice.
┌──────────────────────────────────────────────────────────────────┐
│ USER EXPERIENCE vs LATENCY │
│ │
│ < 1 second │ "Instant" — feels like a fast app │
│ 1-3 seconds │ "Quick" — acceptable for AI features │
│ 3-5 seconds │ "Noticeable" — users start wondering │
│ 5-10 seconds │ "Slow" — users get impatient │
│ 10-30 seconds │ "Very slow" — users switch tabs │
│ 30+ seconds │ "Broken" — users abandon │
│ │
│ Single call: 1-3 seconds (acceptable) │
│ Agent: 10-60 seconds (often problematic) │
└──────────────────────────────────────────────────────────────────┘
Where latency matters most
| Application | Max Acceptable Latency | Agent Viable? |
|---|---|---|
| Autocomplete suggestions | <500ms | No |
| Chat responses (user waiting) | <3s for first token | Marginal (use streaming) |
| Search result augmentation | <2s | No |
| Form validation | <1s | No |
| API endpoint (sync) | <5s | Rarely |
| Background task (email) | <60s | Yes |
| Report generation (async) | <5 minutes | Yes |
| Batch processing | Hours | Yes |
Rule of thumb: If the user is staring at a spinner, do not use an agent. If the task runs in the background and the user is notified when done, agents are fine.
// BAD: Agent for a real-time chat feature
// User sends a message and waits 20 seconds for a response
// GOOD: If you must use agent-like behavior in real-time,
// stream the thinking process to the user
async function* streamingAgent(query, tools) {
  yield { type: "status", message: "Searching for information..." };
  const searchResult = await webSearch(query);
  yield { type: "status", message: "Analyzing results..." };
  const analysis = await llmCall(query, searchResult);
  yield { type: "status", message: "Preparing response..." };
  yield { type: "answer", content: analysis };
}
// The user sees progress updates, making the wait tolerable.
// But this is a FIXED chain, not a true agent loop.
// If you truly need an agent, run it asynchronously.
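One way to run an agent off the request path (a minimal sketch: the in-memory `jobs` map stands in for a real job queue, and `runAgent` and `notifyUser` are hypothetical): respond to the request immediately with a job id, and let a background worker absorb the 10-60 second latency.

```javascript
// Hypothetical in-memory job store; in production this would be a
// database table or message broker.
const jobs = new Map();

function enqueueAgentTask(userId, query) {
  // The HTTP handler returns this id immediately; the user is not blocked.
  const jobId = `job-${jobs.size + 1}`;
  jobs.set(jobId, { userId, query, status: "queued", result: null });
  return jobId;
}

async function worker(jobId, runAgent, notifyUser) {
  // The worker pays the agent's latency; the user is notified when done.
  const job = jobs.get(jobId);
  job.status = "running";
  job.result = await runAgent(job.query);
  job.status = "done";
  await notifyUser(job.userId, job.result);
}
```

The user experience changes from "staring at a spinner for 30 seconds" to "got a notification a minute later", which is acceptable for background tasks.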
4. Anti-Pattern 3: Cost-Sensitive Applications
Agents make many LLM calls, and each call is more expensive than the last because the context grows. This adds up fast.
Cost comparison
Task: "Answer a customer question about their order"
Single call approach:
1 call x ~2,000 tokens = $0.005
Monthly (100K requests): $500
Agent approach (average 4 iterations):
Call 1: 2,000 tokens = $0.005
Call 2: 4,000 tokens = $0.010
Call 3: 6,000 tokens = $0.015
Call 4: 8,000 tokens = $0.020
Total per request: $0.050
Monthly (100K requests): $5,000
Agent is 10x more expensive!
With a strong model (GPT-4o at $2.50/$10 per 1M input/output):
Agent total per request: ~$0.15
Monthly (100K requests): $15,000
Can your business justify $15K/month for this feature?
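The arithmetic above can be captured in a small back-of-envelope estimator (an assumption-laden sketch: it models the context as growing by a fixed amount each iteration and uses a single blended per-token rate):

```javascript
// With context growing by tokensPerIteration each call, call i costs
// i * tokensPerIteration tokens, so per-request cost grows linearly per call.
function estimateAgentCost({ iterations, tokensPerIteration, costPer1kTokens }) {
  let total = 0;
  for (let i = 1; i <= iterations; i++) {
    total += (i * tokensPerIteration / 1000) * costPer1kTokens;
  }
  return total;
}

// 4 iterations x 2,000 new tokens at a $0.0025/1K blended rate:
// $0.005 + $0.010 + $0.015 + $0.020 = $0.05 per request,
// matching the 10x figure above ($5,000/month at 100K requests).
```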
Cost mitigation strategies (if you must use an agent)
// Strategy 1: Use a cheap model for simple reasoning steps
const agent = new Agent({
  model: "gpt-4o-mini", // 15x cheaper than gpt-4o
  maxIterations: 5, // Hard limit on iterations
});
// Strategy 2: Set a per-task token budget
async function budgetedAgent(query, toolDefs, maxTokenBudget = 20000, maxIterations = 10) {
  const messages = [{ role: "user", content: query }];
  let totalTokens = 0;
  for (let i = 0; i < maxIterations; i++) {
    const response = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages,
      tools: toolDefs,
    });
    totalTokens += response.usage.total_tokens;
    if (totalTokens > maxTokenBudget) {
      // Force the agent to give its best answer NOW
      messages.push({
        role: "user",
        content: "You are running out of budget. Give your best answer with the information you have.",
      });
      const finalResponse = await client.chat.completions.create({
        model: "gpt-4o-mini",
        messages,
      });
      return finalResponse.choices[0].message.content;
    }
    // ... normal agent loop continues: execute tool calls, append results ...
  }
}
// Strategy 3: Hybrid approach — try single call first
async function hybridApproach(query) {
  // Attempt 1: Single call (cheap, fast)
  const quickAnswer = await singleCallAttempt(query);
  if (quickAnswer.confidence > 0.85) {
    return quickAnswer; // Single call was enough!
  }
  // Attempt 2: Fall back to agent (expensive, thorough)
  return await agentAttempt(query);
}
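One way `singleCallAttempt` could work (a sketch, not a library feature: the self-reported `confidence` field and the 0.85 threshold are assumptions, and model self-confidence is known to be poorly calibrated, so validate the threshold against real traffic; the client is passed in for testability):

```javascript
async function singleCallAttempt(query, client) {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0,
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          'Answer the question. Return JSON: { "answer": "...", "confidence": 0-1 }. ' +
          "Use low confidence if you would need external data to answer well.",
      },
      { role: "user", content: query },
    ],
  });
  return JSON.parse(response.choices[0].message.content);
}
```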
When cost does not matter (agent is fine)
- Internal tools with low request volume (<1K/month)
- High-value tasks where the agent output is worth $10+ per request (e.g., sales research, legal analysis)
- Background batch processing where efficiency matters less than accuracy
- Prototyping where you are testing whether agents solve the problem at all
5. Anti-Pattern 4: When Deterministic Code Is Better
Sometimes the "AI solution" is worse than a simple if/else or a database query. Not every problem needs AI.
// BAD: Using an agent to check if a user is eligible for a discount
// "Check if the user has been a member for more than 1 year
// and has spent more than $500"
// Agent approach (over-engineered):
// Step 1: Agent queries database for membership date
// Step 2: Agent calculates membership duration
// Step 3: Agent queries database for total spending
// Step 4: Agent reasons about eligibility
// Total: 4 LLM calls, 10 seconds, $0.05, and MIGHT GET THE LOGIC WRONG
// GOOD: Deterministic code (correct, instant, free)
async function checkDiscountEligibility(userId) {
  // node-postgres style: query() resolves to { rows: [...] }
  const { rows: [user] } = await db.query(
    "SELECT created_at, total_spent FROM users WHERE id = $1",
    [userId]
  );
  const membershipDays = (Date.now() - new Date(user.created_at).getTime()) / (1000 * 60 * 60 * 24);
  const isLongTermMember = membershipDays > 365;
  const isHighSpender = user.total_spent > 500;
  return {
    eligible: isLongTermMember && isHighSpender,
    reason: isLongTermMember && isHighSpender
      ? "Qualified: 1+ year member with $500+ spent"
      : `Not qualified: ${!isLongTermMember ? "membership < 1 year" : "spending < $500"}`,
  };
}
// Total: 1 DB query, 50ms, $0, ALWAYS correct
The "deterministic code" test
Ask yourself: "Can I write the logic as a function with clear inputs and outputs?"
| Scenario | Can Write as Code? | Use AI? |
|---|---|---|
| Check if order is returnable (rules-based) | Yes: daysElapsed < 30 && status === 'delivered' | No |
| Route support ticket to correct team (rules-based) | Yes: keyword matching + category rules | Probably no |
| Calculate shipping cost | Yes: weight * rate + zone surcharge | No |
| Summarize a customer complaint | No: requires language understanding | Yes (single call) |
| Decide next best action for a complex support case | Partially: rules handle common cases, AI handles edge cases | Hybrid |
// BEST PRACTICE: Hybrid approach
// Use deterministic code for what's deterministic,
// use AI only for what requires language understanding
async function handleSupportTicket(ticket) {
  // Step 1: Deterministic routing (code)
  const category = categorizeByKeywords(ticket.subject);
  const priority = calculatePriority(ticket.customer_tier, ticket.issue_type);
  // Step 2: AI only for the parts that need language understanding
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0,
    messages: [
      { role: "system", content: "Summarize this support ticket in one sentence." },
      { role: "user", content: ticket.body },
    ],
  });
  const summary = response.choices[0].message.content; // extract the text, not the raw response object
  // Step 3: Deterministic action (code)
  await assignToTeam(category, priority, summary);
  // No agent needed. One LLM call for the part that needs language AI.
  // Everything else is reliable, fast, free, deterministic code.
}
6. Anti-Pattern 5: Over-Engineering with Agents
The "agent" label is exciting. It sounds cutting-edge. This leads developers to build agents when they do not need them, creating systems that are:
- Harder to test — Non-deterministic paths make unit testing difficult
- Harder to debug — "The agent decided to search for X instead of Y" is not a clear bug report
- Harder to maintain — Prompt changes affect tool selection in unpredictable ways
- Harder to explain — "The AI decided to do it that way" is not an acceptable explanation for stakeholders
- More expensive — In both compute cost and engineering time
Signs you are over-engineering
| Sign | What to Do Instead |
|---|---|
| Your agent has 0 tools | You don't have an agent, you have an expensive LLM loop. Use a single call. |
| Your agent always takes the same 3 steps | You don't have an agent, you have a slow prompt chain. Hard-code the chain. |
| Your agent's output could be a template with variables | Use a template engine, not AI. |
| You are building an agent for an internal tool used by 5 people | Spend the engineering time on a simpler solution. Agents are for scale or complexity. |
| Your agent makes 1 tool call then stops | You have a function-calling LLM, not an agent. Drop the loop. |
| Your agent's success rate is below 70% | The task might not be suitable for an agent. Simplify or decompose. |
// RED FLAG: "Agent" that always follows the same path
// This is a prompt chain disguised as an agent
const fakeAgent = new Agent({
  systemPrompt: "Step 1: Search for X. Step 2: Summarize. Step 3: Translate.",
  tools: [searchTool],
  maxIterations: 5,
});
// The "agent" always does: search -> summarize -> translate
// There's no dynamic decision-making. This is a chain.
// BETTER: Explicit prompt chain
async function searchSummarizeTranslate(query, targetLang) {
  const searchResults = await searchTool.execute({ query });
  const summary = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Summarize these search results." },
      { role: "user", content: JSON.stringify(searchResults) },
    ],
  });
  const translation = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: `Translate to ${targetLang}.` },
      { role: "user", content: summary.choices[0].message.content },
    ],
  });
  return translation.choices[0].message.content;
}
// Same result, predictable, testable, cheaper.
7. The "Should I Build an Agent?" Checklist
Before building an agent, honestly answer these questions:
┌──────────────────────────────────────────────────────────────────┐
│ PRE-AGENT CHECKLIST │
│ │
│ [ ] Can a single LLM call solve this? │
│ If YES -> stop. Use a single call. │
│ │
│ [ ] Can a fixed chain of 2-3 LLM calls solve this? │
│ If YES -> stop. Use a prompt chain. │
│ │
│ [ ] Can deterministic code solve this? │
│ If YES -> stop. Write code. No AI needed. │
│ │
│ [ ] Is the latency acceptable? (Will users wait 10-30 seconds?) │
│ If NO -> stop. Use a faster approach or run async. │
│ │
│ [ ] Is the cost acceptable at scale? │
│ If NO -> stop. Optimize or use a cheaper approach. │
│ │
│ [ ] Do the steps GENUINELY vary per request? │
│ If NO -> stop. You're building a chain, not an agent. │
│ │
│ [ ] Can you define clear success metrics? │
│ If NO -> stop. You can't evaluate an agent without metrics. │
│ │
│ All checks passed? Build an agent. │
│ Start minimal. Add complexity only when needed. │
└──────────────────────────────────────────────────────────────────┘
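The checklist can be encoded as a tiny decision helper (a sketch; the engineer still supplies honest boolean answers, the function just enforces the "stop at the first cheaper option" rule):

```javascript
// Walks the checklist top to bottom and returns the cheapest approach
// that survives. Any missing answer defaults to falsy, which is the
// conservative reading for the "solves it" questions.
function recommendApproach(answers) {
  if (answers.singleCallSolvesIt) return "single LLM call";
  if (answers.fixedChainSolvesIt) return "prompt chain";
  if (answers.deterministicCodeSolvesIt) return "deterministic code";
  if (!answers.latencyAcceptable) return "faster approach or run async";
  if (!answers.costAcceptableAtScale) return "optimize or use cheaper approach";
  if (!answers.stepsVaryPerRequest) return "prompt chain, not an agent";
  if (!answers.hasSuccessMetrics) return "define metrics before building";
  return "agent (start minimal)";
}
```

Notice how many conditions must all hold before "agent" is ever returned.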
8. When Simple Beats Smart: Real-World Examples
| Scenario | Instinct | Reality | Better Approach |
|---|---|---|---|
| Email auto-reply | "Build an agent that reads the email, checks CRM, and drafts a reply!" | 90% of emails need the same 5 template responses | Template matcher + single LLM call for edge cases |
| Code review bot | "An agent that reads PRs, runs tests, and gives feedback!" | Running tests is CI/CD's job. Code review is a single-pass reading task. | Single LLM call with PR diff as input |
| Meeting scheduler | "An agent that checks calendars, finds conflicts, proposes times!" | Calendar APIs have scheduling endpoints. The "AI" part is trivial. | Calendar API + simple logic + optional LLM for natural language response |
| Data validation | "An agent that understands the schema and validates data!" | Zod/Joi does this perfectly, deterministically, in milliseconds | Schema validation library |
| FAQ bot | "An agent that searches our knowledge base and answers!" | RAG pipeline (single call + retrieval) handles this without a loop | RAG with single LLM call |
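The "RAG with single LLM call" approach from the FAQ-bot row can be sketched as one retrieval plus one generation, with no loop (`retrieveTopChunks` is a hypothetical stand-in for your vector or keyword search; the client is passed in for testability):

```javascript
async function faqBot(question, retrieveTopChunks, client) {
  // One retrieval step: fetch the k most relevant knowledge-base chunks.
  const chunks = await retrieveTopChunks(question, 3);
  // One generation step: answer grounded in the retrieved context.
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0,
    messages: [
      {
        role: "system",
        content: `Answer using only this context:\n${chunks.join("\n---\n")}`,
      },
      { role: "user", content: question },
    ],
  });
  return response.choices[0].message.content;
}
```

Fixed cost per question, predictable latency, and no agent loop to debug.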
9. The Right Mental Model
Think of AI complexity as a toolbox. You do not use a power drill to hammer a nail:
┌─────────────────────────────────────────────────────────────────────┐
│ YOUR AI TOOLBOX │
│ │
│ Tool When to Reach for It │
│ ────────────── ───────────────────────────────────────────── │
│ Deterministic Known rules, math, routing, validation. │
│ code Always try this first. │
│ │
│ Single LLM call Text understanding, classification, summarization│
│ extraction, generation — when all info is in │
│ the prompt. │
│ │
│ Prompt chain Multi-step transformation with a KNOWN sequence. │
│ Summarize then translate. Extract then validate. │
│ │
│ RAG pipeline Question answering over documents. One retrieval │
│ + one generation call. │
│ │
│ Single tool call LLM + one function call (function calling from │
│ section 4.7). Search then answer. │
│ │
│ Agent (loop) Dynamic multi-step tasks. External data. │
│ Actions. Steps depend on results. LAST RESORT. │
│ │
│ MOVE DOWN this list. Do not skip to "agent" because it's exciting. │
└─────────────────────────────────────────────────────────────────────┘
10. Key Takeaways
- "Just because you can doesn't mean you should." Agents are the most complex AI pattern. Use them only when simpler approaches fail.
- Simple tasks do not need agents. Classification, summarization, translation, extraction — these are single-call tasks. An agent adds cost and latency for zero benefit.
- Latency-sensitive applications are poor fits for agents. If users are waiting, 10-30 seconds is unacceptable for most use cases. Run agents asynchronously or use simpler approaches.
- Agents are expensive at scale. A 5-step agent costs 10x+ more than a single call. Multiply by 100K requests/month and the difference is tens of thousands of dollars.
- Deterministic code beats AI for deterministic tasks. If you can write the logic as an if/else or a database query, do that. It is faster, cheaper, and always correct.
- If the steps are always the same, you have a chain, not an agent. Hard-code it as a chain. Chains are predictable, testable, and cheaper.
- Use the pre-agent checklist. Before building an agent, verify that simpler approaches genuinely cannot solve the problem.
Explain-It Challenge
- A startup founder tells you: "We're building an AI agent for everything in our app — search, recommendations, notifications, settings." Explain why this is a terrible idea and propose a better architecture.
- Your team built an agent for invoice processing. It takes 15 seconds per invoice and costs $0.12 each. You process 50,000 invoices per month. Calculate the monthly cost and propose a cheaper alternative that still uses AI where needed.
- Write a checklist for a code review: "Is this agent actually needed, or is it over-engineered?" Include at least 5 specific red flags to look for.
Navigation: <- 4.15.c — When to Use Agents | 4.15.e — Multi-Agent Complexity ->