4.15 — Understanding AI Agents: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim before labs or interviews.
- Drill gaps -- reopen README.md -> 4.15.a...4.15.e.
- Practice -- 4.15-Exercise-Questions.md.
- Polish answers -- 4.15-Interview-Questions.md.
Core vocabulary
| Term | One-liner |
|---|---|
| AI agent | An LLM wrapped in an observe-think-act loop with access to tools |
| Single LLM call | Input in, output out -- one pass, no tools, no loops |
| ReAct | Reasoning + Acting -- interleave chain-of-thought reasoning with tool calls (Yao et al., 2022) |
| Tool use | External functions the agent can call -- search, database, API, calculator, code execution |
| Observation | The result of a tool call, fed back into the agent's context for the next reasoning step |
| Planning | Breaking a complex task into sub-tasks -- implicit (on the fly), explicit (upfront), or adaptive (re-plan) |
| Short-term memory | The message history that accumulates during a single agent run (grows each iteration) |
| Long-term memory | Persistent store (vector DB / database) that retains context across conversations |
| Prompt chain | Fixed sequence of 2-3 LLM calls -- no loop, no dynamic decisions |
| Tool-augmented chain | Fixed sequence of LLM calls with tool calls -- known steps, predictable path |
| Agent loop | The core cycle: perceive -> reason -> act -> observe -> repeat until done |
| Multi-agent system | Two or more specialized agents collaborating on a task |
| Orchestrator | Optional coordinator agent that assigns tasks to specialized agents |
| Handoff | Passing output from one agent to the next -- each is a potential point of information loss |
| maxIterations | Hard safety limit on how many times the agent loop can repeat |
Agent vs single call comparison
| Dimension | Single LLM Call | AI Agent |
|---|---|---|
| Flow | Input -> Output (one pass) | Observe -> Think -> Act -> loop |
| Tools | None | Search, APIs, databases, calculators |
| Steps | 1 | Variable (2 to 50+ LLM calls) |
| Latency | 1-3 seconds | 10-60 seconds |
| Cost per request | ~$0.003-0.005 | ~$0.05-0.50 |
| Complexity | Simple | Significant (error handling, loops, state) |
| Determinism | More predictable | Less predictable (path varies by run) |
| Error surface | Model error only | Model + tool + loop errors |
| Debugging | Easy (input -> output) | Hard (which step went wrong?) |
| Best for | Single-step tasks, all info in prompt | Multi-step tasks, external data, actions |
Agent loop architecture
┌──────────────────────────────────────────────────────────────┐
│ AGENT LOOP │
│ │
│ User Task │
│ | │
│ v │
│ ┌──────────┐ │
│ │ OBSERVE │ <-- tool results / user input │
│ └────┬─────┘ │
│ | │
│ v │
│ ┌──────────┐ │
│ │ THINK │ <-- LLM reasons about what to do next │
│ └────┬─────┘ │
│ | │
│ v │
│ ┌──────────┐ ┌──────────┐ │
│ │ ACT │────>│ EXECUTE │ <-- run tool, get result │
│ │(tool call│ │ (tool) │ │
│ │ or final │ └────┬─────┘ │
│ │ answer) │ | │
│ └──────────┘ | │
│ ^ | │
│ └────────────────┘ │
│ │
│ Loop repeats until LLM returns final answer (finish: stop) │
│ OR maxIterations is reached. │
└──────────────────────────────────────────────────────────────┘
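The loop above can be sketched in a few lines. This is a minimal illustration, not any framework's API: the "LLM" here is a hard-coded stand-in that decides from the message history, and the `finish` action and `run_agent` name are assumptions made for the sketch.

```python
# Minimal agent-loop sketch: observe -> think -> act -> repeat.
# fake_llm is a deterministic stand-in; a real loop would call a model API.
def fake_llm(messages):
    # If we already have a tool observation, return the final answer;
    # otherwise request a tool call.
    if any(m["role"] == "tool" for m in messages):
        return {"action": "finish", "answer": "done"}
    return {"action": "tool", "name": "search", "args": {"q": "AAPL price"}}

TOOLS = {"search": lambda q: f"results for {q}"}

def run_agent(task, max_iterations=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iterations):           # hard safety limit
        decision = fake_llm(messages)         # THINK
        if decision["action"] == "finish":    # final answer ends the loop
            return decision["answer"]
        result = TOOLS[decision["name"]](**decision["args"])  # ACT / EXECUTE
        messages.append({"role": "tool", "content": result})  # OBSERVE
    return "max iterations reached"
```

Note that `max_iterations` is checked on every pass: without it, a model that never emits a final answer would loop (and bill) forever.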
Four core components
┌──────────────────────────────────────────────────┐
│ LLM BRAIN │
│ Reasons, decides, interprets, generates │
│ (configured via system prompt) │
└──────┬──────────────┬──────────────┬──────────────┘
| | |
v v v
┌────────────┐ ┌────────────┐ ┌────────────┐
│ TOOLS │ │ MEMORY │ │ PLANNING │
│ search, │ │ short-term │ │ implicit, │
│ database, │ │ (messages) │ │ explicit, │
│ API, calc, │ │ long-term │ │ adaptive │
│ code exec │ │ (vector DB)│ │ │
└────────────┘ └────────────┘ └────────────┘
| Component | Key Fact |
|---|---|
| LLM Brain | Stronger models = better agents. GPT-4o for brain, GPT-4o-mini for sub-tasks. |
| Tools | Must have clear descriptions, specific parameters, error handling, security validation. |
| Short-term memory | Grows every iteration. Compact by summarizing older steps. |
| Long-term memory | Vector DB / database. Persists across conversations. Add only when needed. |
| Planning | Implicit (3-5 steps), explicit (10+ steps), adaptive (re-plan when results change). |
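The "clear descriptions, specific parameters" requirement for tools can be made concrete with a schema. This is a hypothetical tool definition in the JSON-schema style that most function-calling APIs accept; the name `stock_price_tool` and its fields are illustrative, not a real API's contract.

```python
# Hypothetical tool schema: a clear description tells the LLM brain WHEN
# to call it; a constrained parameter shape doubles as input validation.
stock_price_tool = {
    "name": "stock_price",
    "description": "Get the latest trading price for a stock ticker symbol.",
    "parameters": {
        "type": "object",
        "properties": {
            "symbol": {
                "type": "string",
                "description": "Ticker symbol, e.g. 'AAPL'",
                "pattern": "^[A-Z]{1,5}$",  # security: reject junk input shapes
            },
        },
        "required": ["symbol"],
    },
}
```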
ReAct cycle
1. THOUGHT: "I need the stock price of AAPL."
2. ACTION: stock_price({ symbol: "AAPL" })
3. OBSERVATION: { price: 178.52, change: +2.3% }
4. THOUGHT: "Now I need the P/E ratio."
5. ACTION: financials({ symbol: "AAPL", metric: "pe" })
6. OBSERVATION: { pe_ratio: 28.5 }
7. THOUGHT: "I have both pieces of information."
8. FINAL ANSWER: "AAPL trades at $178.52 with a P/E of 28.5."
Why ReAct > acting alone: Reasoning before acting decomposes complex queries and avoids obvious tool-call mistakes.
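In a raw-text ReAct implementation, the runtime has to parse ACTION lines like those in the trace above out of the model's output. A minimal sketch, assuming the exact `ACTION: tool(args)` text format shown above (real frameworks use structured tool-call APIs instead):

```python
import re

# Matches lines like: ACTION: stock_price({ symbol: "AAPL" })
ACTION_RE = re.compile(r'ACTION:\s*(\w+)\((.*)\)')

def parse_action(line):
    m = ACTION_RE.match(line.strip())
    if not m:
        return None  # a THOUGHT / OBSERVATION line, not an action
    return {"tool": m.group(1), "raw_args": m.group(2).strip()}
```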
When to use agents -- decision tree
Can the LLM answer with ONLY the prompt context?
|
YES --> Single call. Stop.
|
NO
|
v
Does the task require external data or actions (tools)?
|
NO --> Prompt chain (2-3 fixed LLM calls). Stop.
|
YES
|
v
Is the sequence of steps known in advance?
|
YES --> Tool-augmented chain (fixed steps + tools). Stop.
|
NO
|
v
Can it be done in under 3 decisions?
|
YES --> Simple agent (maxIterations: 5). Stop.
|
NO --> Full agent with planning (maxIterations: 15+).
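The decision tree above reduces to a short routing function. The strategy labels are this cheat sheet's vocabulary, not a library API:

```python
# The agent-vs-simpler-approach decision tree as code.
def choose_approach(prompt_is_sufficient, needs_tools,
                    steps_known_upfront, decisions_needed):
    if prompt_is_sufficient:
        return "single call"
    if not needs_tools:
        return "prompt chain"
    if steps_known_upfront:
        return "tool-augmented chain"
    if decisions_needed < 3:
        return "simple agent (maxIterations: 5)"
    return "full agent with planning (maxIterations: 15+)"
```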
Complexity spectrum
Simple <──────────────────────────────────────────> Complex

| | Single LLM Call | Prompt Chain | LLM + Single Tool Call | Tool-Augmented Chain | Full Agent (loop) |
|---|---|---|---|---|---|
| Cost | $0.003 | $0.01 | $0.01 | $0.02-0.05 | $0.05-0.50 |
| Latency | 1-3s | 3-8s | 2-5s | 5-10s | 10-60s |
| Error rate | ~3% | ~8% | ~5% | ~10% | ~20% |
RULE: Pick the CHEAPEST row that solves the problem.
When NOT to use agents -- checklist
[ ] Can a single LLM call solve this?
YES -> Stop. Use a single call.
[ ] Can a fixed chain of 2-3 LLM calls solve this?
YES -> Stop. Use a prompt chain.
[ ] Can deterministic code solve this?
YES -> Stop. Write code. No AI needed.
[ ] Is latency acceptable? (Users wait 10-30s?)
NO -> Stop. Use a faster approach or run async.
[ ] Is cost acceptable at scale?
NO -> Stop. Optimize or use a cheaper approach.
[ ] Do the steps GENUINELY vary per request?
NO -> Stop. You have a chain, not an agent.
[ ] Can you define clear success metrics?
NO -> Stop. You cannot evaluate without metrics.
All checks passed? Build an agent. Start minimal.
Signs you are over-engineering
| Red Flag | What to Do Instead |
|---|---|
| Agent has 0 tools | Use a single call (no tools = no agent) |
| Agent always takes the same 3 steps | Hard-code as a prompt chain |
| Agent output could be a template with variables | Use a template engine |
| Agent makes 1 tool call then stops | Use function calling (no loop needed) |
| Agent success rate is below 70% | Task may not suit an agent -- simplify |
| Agent is for an internal tool used by 5 people | Use a simpler solution |
Token cost math
Single call:
1 call = ~2,000 tokens = ~$0.005
Monthly (100K requests): $500
Agent (5 steps):
Step 1: 2,000 tokens
Step 2: 3,500 tokens (includes step 1 context)
Step 3: 5,000 tokens
Step 4: 6,500 tokens
Step 5: 8,000 tokens
Total: 25,000 tokens = ~$0.06
Monthly (100K requests): $6,000
Agent is ~12x more expensive than a single call.
Per-step context grows linearly, so TOTAL tokens grow quadratically: each call re-sends ALL prior messages.
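The cost arithmetic above can be reproduced directly. The ~$2.50-per-million-token price is an illustrative blended rate taken from the figures in this sheet, not any provider's actual price list:

```python
# Reproduce the token-cost math: each agent step re-sends all prior
# messages, so per-step tokens grow linearly and the total quadratically.
PRICE_PER_TOKEN = 2.50 / 1_000_000   # illustrative blended rate

single_call_tokens = 2_000
agent_step_tokens = [2_000 + 1_500 * i for i in range(5)]  # 2000..8000

agent_total = sum(agent_step_tokens)               # 25,000 tokens
single_cost = single_call_tokens * PRICE_PER_TOKEN  # ~$0.005
agent_cost = agent_total * PRICE_PER_TOKEN          # ~$0.0625
```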
Error rate math
Single call:
P(failure) = 5% per call
1 call -> 5% failure rate
Agent (5 steps):
5 LLM calls + 5 tool calls = 10 failure points
P(all succeed) = 0.95^10 = 59.9%
~40% chance something goes wrong
Multi-agent (3 agents, 5 steps each, 3 handoffs):
P(agent 1) = 0.95^5 = 0.774
P(agent 2) = 0.95^5 = 0.774
P(agent 3) = 0.95^5 = 0.774
P(handoffs) = 0.95^3 = 0.857
P(all) = 0.774 * 0.774 * 0.774 * 0.857 = 0.398
~60% failure rate!
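The compounding-failure arithmetic above, reproduced as a sanity check (assuming the same 5% independent per-call failure rate):

```python
# Failure compounds multiplicatively across independent steps.
p = 0.95  # per-call success probability

agent_5_steps = p ** 10                 # 5 LLM calls + 5 tool calls
multi_agent = (p ** 5) ** 3 * (p ** 3)  # 3 agents x 5 steps + 3 handoffs

print(round(agent_5_steps, 3))  # 0.599 -> ~40% chance something fails
print(round(multi_agent, 3))    # 0.397 -> ~60% chance something fails
```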
Latency guide
| Latency | User perception | Appropriate approach |
|---|---|---|
| < 1s | "Instant" | Single call (fast model) |
| 1-3s | "Quick" | Single call (standard) |
| 3-5s | "Noticeable" | Prompt chain |
| 5-10s | "Slow" | Simple agent |
| 10-30s | "Very slow" | Full agent (users switch tabs) |
| 30+s | "Broken" | Multi-agent (run async only) |
RULE: If the user is staring at a spinner, do NOT use an agent.
If the task runs in the background, agents are fine.
Multi-agent complexity
Communication paths: n(n-1)/2
1 agent: 0 paths
2 agents: 1 path
3 agents: 3 paths
5 agents: 10 paths
10 agents: 45 paths
Each path = potential miscommunication + latency + error propagation
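The path counts above follow from the pairwise formula:

```python
# Pairwise communication paths between n agents: n(n-1)/2.
def paths(n):
    return n * (n - 1) // 2
```

Quadratic growth is the point: doubling agents from 5 to 10 more than quadruples the paths (10 -> 45).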
The "telephone game" problem
Agent 1 output: "Revenue $4.2B (confidence: 0.95, range: $3.9-4.5B)"
Agent 2 sees: "Revenue $4.2B" (confidence lost)
Agent 3 sees: "Revenue ~$4B" (precision lost)
Final report: "Revenue is $4B" (presented as definitive fact)
Information degrades at EVERY handoff.
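The standard mitigation is to pass structured data between agents instead of prose. A sketch, with hypothetical field names, showing how confidence, range, and sources survive a handoff when they travel as typed fields:

```python
from dataclasses import dataclass, field

# Hypothetical handoff payload: typed fields cannot be silently dropped
# the way qualifiers in a prose summary can.
@dataclass
class Finding:
    claim: str                    # "Revenue $4.2B"
    confidence: float             # 0.95
    low: float                    # lower bound of the estimate
    high: float                   # upper bound of the estimate
    sources: list = field(default_factory=list)

finding = Finding("Revenue $4.2B", 0.95, 3.9e9, 4.5e9, ["10-K filing"])
# A downstream agent receives the full object, not a lossy text summary.
```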
When multi-agent IS justified
| Use Case | Why Single Agent Fails |
|---|---|
| Genuinely different expertise domains | System prompt too long, tools too diverse (legal + technical) |
| Adversarial quality control | One agent checks another's work (writer + editor) |
| Scale beyond one agent | Context window overflow, 15+ tools, parallel work streams |
When multi-agent is OVERKILL
| Proposed Multi-Agent | Better Approach |
|---|---|
| Researcher + summarizer | Single agent that researches, then summarizes |
| Planner + executor | Single agent with explicit planning step |
| Coder + tester | Single agent with code-execution tool |
| One agent per API | Single agent with multiple tools |
Common gotchas
1. Building an agent for a single-call task
-> Adds 12x cost and 5-10x latency for zero benefit
2. Using an agent where deterministic code works
-> if/else is faster, cheaper, and ALWAYS correct
3. Forgetting maxIterations
-> Agent loops forever, burns all your token budget
4. Not setting a token budget per task
-> One expensive conversation can cost more than 1,000 normal ones
5. No error handling on tool execution
-> Tool throws, agent crashes, user gets nothing
6. Trusting LLM-generated SQL without validation
-> SQL injection risk (always validate: SELECT only, parameterized)
7. Passing plain text between agents instead of structured data
-> Information loss at every handoff (confidence, sources, caveats)
8. Building multi-agent when single agent with more tools works
-> Every additional agent multiplies failure probability
9. No metrics or evaluation
-> "It seems to work" is not engineering -- measure completion,
cost, latency, correctness, and safety
10. Skipping the hybrid approach
-> Try single call first, fall back to agent only when needed
(handles 60-80% of requests without the loop)
Security essentials
ALWAYS set:
maxIterations: 15 (hard loop limit)
maxTokenBudget: 50,000 (per-task token cap)
maxToolCalls: 20 (per-task tool limit)
allowedTools: [...] (whitelist only)
requireApproval: [...] (human-in-the-loop for dangerous actions)
NEVER allow:
- Arbitrary SQL execution (SELECT only, validate inputs)
- Arbitrary code execution without sandboxing
- Outbound communication without human approval
- Access to tools beyond the agent's scope
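One way to enforce these limits in code: a guardrails object the loop consults before every tool call and token spend. The class and field names below mirror the checklist (`maxIterations`, `maxTokenBudget`, ...) but are illustrative, not a framework API:

```python
# Sketch of runtime enforcement for the per-task limits above.
class BudgetExceeded(Exception):
    pass

class Guardrails:
    def __init__(self, max_iterations=15, max_token_budget=50_000,
                 max_tool_calls=20, allowed_tools=()):
        self.max_iterations = max_iterations
        self.max_token_budget = max_token_budget
        self.max_tool_calls = max_tool_calls
        self.allowed_tools = set(allowed_tools)  # whitelist only
        self.tokens = 0
        self.tool_calls = 0

    def check_tool(self, name):
        if name not in self.allowed_tools:
            raise BudgetExceeded(f"tool not allowed: {name}")
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call limit hit")

    def charge_tokens(self, n):
        self.tokens += n
        if self.tokens > self.max_token_budget:
            raise BudgetExceeded("token budget hit")
```

The agent loop calls `check_tool` before executing and `charge_tokens` after each LLM response, so one runaway task fails fast instead of burning the budget.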
Agent success metrics
| Category | Metrics |
|---|---|
| Effectiveness | Task completion rate, correctness rate, escalation rate |
| Efficiency | Avg iterations, avg tokens, avg cost, avg latency |
| Reliability | Error rate, max-iteration rate, tool failure rate |
| Safety | Hallucination rate, out-of-scope action rate |
One-line summary
Agent = LLM + Tools + Loop. Use the simplest approach that solves the problem. Single call > prompt chain > tool-augmented chain > agent > multi-agent. Complexity is a cost, not a feature.
End of 4.15 quick revision.