Episode 4 — Generative AI Engineering / 4.15 — Understanding AI Agents

4.15 — Understanding AI Agents: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps -- reopen README.md -> 4.15.a...4.15.e.
  3. Practice -- 4.15-Exercise-Questions.md.
  4. Polish answers -- 4.15-Interview-Questions.md.

Core vocabulary

| Term | One-liner |
| --- | --- |
| AI agent | An LLM wrapped in an observe-think-act loop with access to tools |
| Single LLM call | Input in, output out -- one pass, no tools, no loops |
| ReAct | Reasoning + Acting -- interleave chain-of-thought reasoning with tool calls (Yao et al., 2022) |
| Tool use | External functions the agent can call -- search, database, API, calculator, code execution |
| Observation | The result of a tool call, fed back into the agent's context for the next reasoning step |
| Planning | Breaking a complex task into sub-tasks -- implicit (on the fly), explicit (upfront), or adaptive (re-plan) |
| Short-term memory | The message history that accumulates during a single agent run (grows each iteration) |
| Long-term memory | Persistent store (vector DB / database) that retains context across conversations |
| Prompt chain | Fixed sequence of 2-3 LLM calls -- no loop, no dynamic decisions |
| Tool-augmented chain | Fixed sequence of LLM calls with tool calls -- known steps, predictable path |
| Agent loop | The core cycle: perceive -> reason -> act -> observe -> repeat until done |
| Multi-agent system | Two or more specialized agents collaborating on a task |
| Orchestrator | Optional coordinator agent that assigns tasks to specialized agents |
| Handoff | Passing output from one agent to the next -- each is a potential point of information loss |
| maxIterations | Hard safety limit on how many times the agent loop can repeat |

Agent vs single call comparison

| Dimension | Single LLM Call | AI Agent |
| --- | --- | --- |
| Flow | Input -> Output (one pass) | Observe -> Think -> Act -> loop |
| Tools | None | Search, APIs, databases, calculators |
| Steps | 1 | Variable (2 to 50+ LLM calls) |
| Latency | 1-3 seconds | 10-60 seconds |
| Cost per request | ~$0.003-0.005 | ~$0.05-0.50 |
| Complexity | Simple | Significant (error handling, loops, state) |
| Determinism | More predictable | Less predictable (path varies by run) |
| Error surface | Model error only | Model + tool + loop errors |
| Debugging | Easy (input -> output) | Hard (which step went wrong?) |
| Best for | Single-step tasks, all info in prompt | Multi-step tasks, external data, actions |

Agent loop architecture

┌──────────────────────────────────────────────────────────────┐
│                       AGENT LOOP                              │
│                                                              │
│  User Task                                                   │
│      |                                                       │
│      v                                                       │
│  ┌──────────┐                                                │
│  │ OBSERVE  │ <-- tool results / user input                  │
│  └────┬─────┘                                                │
│       |                                                      │
│       v                                                      │
│  ┌──────────┐                                                │
│  │  THINK   │ <-- LLM reasons about what to do next          │
│  └────┬─────┘                                                │
│       |                                                      │
│       v                                                      │
│  ┌──────────┐     ┌──────────┐                               │
│  │   ACT    │────>│ EXECUTE  │ <-- run tool, get result      │
│  │(tool call│     │  (tool)  │                               │
│  │ or final │     └────┬─────┘                               │
│  │ answer)  │          |                                     │
│  └──────────┘          |                                     │
│       ^                |                                     │
│       └────────────────┘                                     │
│                                                              │
│  Loop repeats until LLM returns final answer (finish: stop)  │
│  OR maxIterations is reached.                                │
└──────────────────────────────────────────────────────────────┘
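The loop above can be sketched in a few lines. This is a minimal illustration, not a production framework: `callLLM` and `runTool` are hypothetical stand-ins for a real model client and tool registry, stubbed here so the control flow can actually run.

```typescript
// Minimal agent loop sketch. callLLM and runTool are hypothetical stubs,
// not a real model client or tool registry.

type Action =
  | { kind: "tool"; name: string; args: Record<string, unknown> }
  | { kind: "final"; answer: string };

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Stub "LLM brain": call one tool, then finish once a result is in context.
function callLLM(messages: Message[]): Action {
  const sawToolResult = messages.some((m) => m.role === "tool");
  return sawToolResult
    ? { kind: "final", answer: "AAPL trades at $178.52." }
    : { kind: "tool", name: "stock_price", args: { symbol: "AAPL" } };
}

// Stub tool registry: always returns a canned price.
function runTool(_name: string, _args: Record<string, unknown>): string {
  return JSON.stringify({ price: 178.52 });
}

function runAgent(task: string, maxIterations = 5): string {
  const messages: Message[] = [{ role: "user", content: task }]; // OBSERVE
  for (let i = 0; i < maxIterations; i++) {
    const action = callLLM(messages);                      // THINK
    if (action.kind === "final") return action.answer;     // final answer -> stop
    const observation = runTool(action.name, action.args); // ACT + EXECUTE
    messages.push({ role: "tool", content: observation }); // feed back, loop
  }
  return "Stopped: maxIterations reached."; // hard safety limit
}
```

Note the two exit conditions match the diagram: a final answer from the LLM, or the `maxIterations` cap.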

Four core components

┌──────────────────────────────────────────────────┐
│               LLM BRAIN                           │
│  Reasons, decides, interprets, generates          │
│  (configured via system prompt)                   │
└──────┬──────────────┬──────────────┬──────────────┘
       |              |              |
       v              v              v
┌────────────┐ ┌────────────┐ ┌────────────┐
│   TOOLS    │ │   MEMORY   │ │  PLANNING  │
│ search,    │ │ short-term │ │ implicit,  │
│ database,  │ │ (messages) │ │ explicit,  │
│ API, calc, │ │ long-term  │ │ adaptive   │
│ code exec  │ │ (vector DB)│ │            │
└────────────┘ └────────────┘ └────────────┘

| Component | Key Fact |
| --- | --- |
| LLM Brain | Stronger models = better agents. GPT-4o for brain, GPT-4o-mini for sub-tasks. |
| Tools | Must have clear descriptions, specific parameters, error handling, security validation. |
| Short-term memory | Grows every iteration. Compact by summarizing older steps. |
| Long-term memory | Vector DB / database. Persists across conversations. Add only when needed. |
| Planning | Implicit (3-5 steps), explicit (10+ steps), adaptive (re-plan when results change). |
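A tool with a clear description, specific parameters, and input validation might look like this. The shape is illustrative, not a specific framework's API, and the price is stubbed.

```typescript
// Illustrative tool definition: clear description, typed parameters,
// and security validation before execution. Not tied to any framework.

interface ToolParam {
  type: "string" | "number";
  description: string;
  required: boolean;
}

interface Tool {
  name: string;
  description: string; // the LLM chooses tools based on this text
  parameters: Record<string, ToolParam>;
  execute: (args: Record<string, unknown>) => string;
}

const stockPriceTool: Tool = {
  name: "stock_price",
  description: "Get the latest trading price for a stock ticker symbol.",
  parameters: {
    symbol: { type: "string", description: "Ticker, e.g. AAPL", required: true },
  },
  execute: (args) => {
    const symbol = args.symbol;
    // Security validation: reject anything that is not a plain ticker.
    if (typeof symbol !== "string" || !/^[A-Z]{1,5}$/.test(symbol)) {
      throw new Error(`Invalid symbol: ${String(symbol)}`);
    }
    return JSON.stringify({ symbol, price: 178.52 }); // stubbed result
  },
};
```

The validation step is what keeps LLM-chosen arguments from reaching a backend unchecked.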

ReAct cycle

1. THOUGHT:  "I need the stock price of AAPL."
2. ACTION:   stock_price({ symbol: "AAPL" })
3. OBSERVATION: { price: 178.52, change: +2.3% }
4. THOUGHT:  "Now I need the P/E ratio."
5. ACTION:   financials({ symbol: "AAPL", metric: "pe" })
6. OBSERVATION: { pe_ratio: 28.5 }
7. THOUGHT:  "I have both pieces of information."
8. FINAL ANSWER: "AAPL trades at $178.52 with a P/E of 28.5."

Why ReAct > acting alone: Reasoning before acting decomposes complex queries and avoids obvious tool-call mistakes.
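In practice a ReAct runner parses each model turn into a thought plus either an action or a final answer. The text format below (THOUGHT / ACTION / FINAL ANSWER with JSON arguments) is an assumed convention for illustration; real implementations vary.

```typescript
// Sketch of parsing one ReAct step from raw model output. The line format
// and JSON-argument convention are assumptions, not a standard.

interface ReActStep {
  thought: string;
  action?: { tool: string; args: Record<string, string> };
  finalAnswer?: string;
}

function parseStep(output: string): ReActStep {
  const thought = /THOUGHT:\s*(.+)/.exec(output)?.[1]?.trim() ?? "";
  const final = /FINAL ANSWER:\s*(.+)/.exec(output)?.[1]?.trim();
  if (final !== undefined) return { thought, finalAnswer: final };
  const action = /ACTION:\s*(\w+)\((.*)\)/.exec(output);
  if (!action) return { thought }; // reasoning-only step
  return {
    thought,
    action: { tool: action[1], args: JSON.parse(action[2] || "{}") },
  };
}
```

A step that parses to an action triggers a tool call; a step with a final answer ends the loop.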


When to use agents -- decision tree

Can the LLM answer with ONLY the prompt context?
  |
  YES --> Single call. Stop.
  |
  NO
  |
  v
Does the task require external data or actions (tools)?
  |
  NO --> Prompt chain (2-3 fixed LLM calls). Stop.
  |
  YES
  |
  v
Is the sequence of steps known in advance?
  |
  YES --> Tool-augmented chain (fixed steps + tools). Stop.
  |
  NO
  |
  v
Can it be done in under 3 decisions?
  |
  YES --> Simple agent (maxIterations: 5). Stop.
  |
  NO --> Full agent with planning (maxIterations: 15+).

Complexity spectrum

Simple <──────────────────────────────────────────> Complex

Single     Prompt     LLM +       Tool-         Full
LLM Call   Chain      Single      Augmented     Agent
                      Tool Call   Chain         (loop)

$0.003     $0.01      $0.01       $0.02-0.05    $0.05-0.50
1-3s       3-8s       2-5s        5-10s         10-60s
~3% err    ~8% err    ~5% err     ~10% err      ~20% err

RULE: Pick the CHEAPEST row that solves the problem.

When NOT to use agents -- checklist

[ ] Can a single LLM call solve this?
    YES -> Stop. Use a single call.

[ ] Can a fixed chain of 2-3 LLM calls solve this?
    YES -> Stop. Use a prompt chain.

[ ] Can deterministic code solve this?
    YES -> Stop. Write code. No AI needed.

[ ] Is latency acceptable? (Users wait 10-30s?)
    NO -> Stop. Use a faster approach or run async.

[ ] Is cost acceptable at scale?
    NO -> Stop. Optimize or use a cheaper approach.

[ ] Do the steps GENUINELY vary per request?
    NO -> Stop. You have a chain, not an agent.

[ ] Can you define clear success metrics?
    NO -> Stop. You cannot evaluate without metrics.

All checks passed? Build an agent. Start minimal.

Signs you are over-engineering

| Red Flag | What to Do Instead |
| --- | --- |
| Agent has 0 tools | Use a single call (no tools = no agent) |
| Agent always takes the same 3 steps | Hard-code as a prompt chain |
| Agent output could be a template with variables | Use a template engine |
| Agent makes 1 tool call then stops | Use function calling (no loop needed) |
| Agent success rate is below 70% | Task may not suit an agent -- simplify |
| Agent is for an internal tool used by 5 people | Use a simpler solution |

Token cost math

Single call:
  1 call = ~2,000 tokens = ~$0.005
  Monthly (100K requests): $500

Agent (5 steps):
  Step 1: 2,000 tokens
  Step 2: 3,500 tokens (includes step 1 context)
  Step 3: 5,000 tokens
  Step 4: 6,500 tokens
  Step 5: 8,000 tokens
  Total:  25,000 tokens = ~$0.06
  Monthly (100K requests): $6,000

Agent is ~12x more expensive than a single call.
Per-step context grows linearly, so TOTAL tokens grow roughly
quadratically with step count -- each call resends ALL prior messages.
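A small helper reproduces the arithmetic above. The 2,000-token base and 1,500 tokens of added context per step are the worked example's assumptions.

```typescript
// Token cost of an agent run: each step's LLM call includes the full
// accumulated context, so totals grow much faster than step count.

function agentTokens(steps: number, baseTokens = 2000, growthPerStep = 1500): number {
  let total = 0;
  for (let step = 0; step < steps; step++) {
    total += baseTokens + step * growthPerStep; // full context of this call
  }
  return total;
}

const fiveSteps = agentTokens(5);  // 2000 + 3500 + 5000 + 6500 + 8000 = 25000
const tenSteps = agentTokens(10);  // 87500 -- 3.5x the tokens for 2x the steps
```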

Error rate math

Single call:
  P(failure) = 5% per call
  1 call -> 5% failure rate

Agent (5 steps):
  5 LLM calls + 5 tool calls = 10 failure points
  P(all succeed) = 0.95^10 = 59.9%
  ~40% chance something goes wrong

Multi-agent (3 agents, 5 steps each, 3 handoffs):
  P(agent 1) = 0.95^5 = 0.774
  P(agent 2) = 0.95^5 = 0.774
  P(agent 3) = 0.95^5 = 0.774
  P(handoffs) = 0.95^3 = 0.857
  P(all) = 0.774 * 0.774 * 0.774 * 0.857 = 0.398
  ~60% failure rate!
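The compounding above is just independent probabilities multiplying. The 0.95 per-point success rate is the illustrative assumption from the math; real rates vary by model and tool.

```typescript
// Compound reliability: every LLM call, tool call, and handoff is an
// independent failure point, so success probabilities multiply.

function successProbability(failurePoints: number, pPerPoint = 0.95): number {
  return Math.pow(pPerPoint, failurePoints);
}

const singleCall = successProbability(1);  // 0.95
const agent = successProbability(10);      // 5 LLM calls + 5 tool calls ~ 0.599
const multiAgent = successProbability(18); // 3 agents x 5 points + 3 handoffs ~ 0.397
```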

Latency guide

< 1s       "Instant"      Single call (fast model)
1-3s       "Quick"        Single call (standard)
3-5s       "Noticeable"   Prompt chain
5-10s      "Slow"         Simple agent
10-30s     "Very slow"    Full agent (users switch tabs)
30+s       "Broken"       Multi-agent (run async only)

RULE: If the user is staring at a spinner, do NOT use an agent.
      If the task runs in the background, agents are fine.

Multi-agent complexity

Communication paths: n(n-1)/2

1 agent:   0 paths
2 agents:  1 path
3 agents:  3 paths
5 agents:  10 paths
10 agents: 45 paths

Each path = potential miscommunication + latency + error propagation
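The path count is the pairwise-combination formula, trivial to check:

```typescript
// Pairwise communication paths between n agents: n(n-1)/2.
// Grows quadratically -- every new agent adds a path to every existing one.
function communicationPaths(agents: number): number {
  return (agents * (agents - 1)) / 2;
}
```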

The "telephone game" problem

Agent 1 output: "Revenue $4.2B (confidence: 0.95, range: $3.9-4.5B)"
Agent 2 sees:   "Revenue $4.2B" (confidence lost)
Agent 3 sees:   "Revenue ~$4B"  (precision lost)
Final report:   "Revenue is $4B" (presented as definitive fact)

Information degrades at EVERY handoff.

When multi-agent IS justified

| Use Case | Why Single Agent Fails |
| --- | --- |
| Genuinely different expertise domains | System prompt too long, tools too diverse (legal + technical) |
| Adversarial quality control | One agent checks another's work (writer + editor) |
| Scale beyond one agent | Context window overflow, 15+ tools, parallel work streams |

When multi-agent is OVERKILL

| Proposed Multi-Agent | Better Approach |
| --- | --- |
| Researcher + summarizer | Single agent that researches, then summarizes |
| Planner + executor | Single agent with explicit planning step |
| Coder + tester | Single agent with code-execution tool |
| One agent per API | Single agent with multiple tools |

Common gotchas

1. Building an agent for a single-call task
   -> Adds 12x cost and 5-10x latency for zero benefit

2. Using an agent where deterministic code works
   -> if/else is faster, cheaper, and ALWAYS correct

3. Forgetting maxIterations
   -> Agent loops forever, burns all your token budget

4. Not setting a token budget per task
   -> One expensive conversation can cost more than 1,000 normal ones

5. No error handling on tool execution
   -> Tool throws, agent crashes, user gets nothing

6. Trusting LLM-generated SQL without validation
   -> SQL injection risk (always validate: SELECT only, parameterized)

7. Passing plain text between agents instead of structured data
   -> Information loss at every handoff (confidence, sources, caveats)

8. Building multi-agent when single agent with more tools works
   -> Every additional agent multiplies failure probability

9. No metrics or evaluation
   -> "It seems to work" is not engineering -- measure completion,
      cost, latency, correctness, and safety

10. Skipping the hybrid approach
    -> Try single call first, fall back to agent only when needed
       (handles 60-80% of requests without the loop)
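The hybrid approach in gotcha 10 amounts to a router: attempt the cheap single call first and escalate only when it cannot answer from the prompt alone. `singleCall` and `escalateToAgent` are hypothetical stubs; the "needs tools" signal here is faked for illustration.

```typescript
// Hybrid routing sketch: cheap path first, agent loop only on escalation.
// Both functions are stand-ins, not real model or agent clients.

function singleCall(task: string): { answer: string; needsTools: boolean } {
  // Stub: pretend the model flags "latest"-style tasks as needing tools.
  return task.includes("latest")
    ? { answer: "", needsTools: true }
    : { answer: `Answered directly: ${task}`, needsTools: false };
}

function escalateToAgent(task: string): string {
  return `Agent result for: ${task}`; // stand-in for the full agent loop
}

function handleRequest(task: string): string {
  const first = singleCall(task);             // cheap path: 1-3s, ~$0.005
  if (!first.needsTools) return first.answer;
  return escalateToAgent(task);               // expensive path: 10-60s, $0.05+
}
```

Most requests take the cheap branch, which is where the 60-80% savings come from.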

Security essentials

ALWAYS set:
  maxIterations:    15        (hard loop limit)
  maxTokenBudget:   50,000    (per-task token cap)
  maxToolCalls:     20        (per-task tool limit)
  allowedTools:     [...]     (whitelist only)
  requireApproval:  [...]     (human-in-the-loop for dangerous actions)

NEVER allow:
  - Arbitrary SQL execution (SELECT only, validate inputs)
  - Arbitrary code execution without sandboxing
  - Outbound communication without human approval
  - Access to tools beyond the agent's scope
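The limits above can be enforced as a guard checked on every loop iteration. A sketch, with field names mirroring the checklist; the enclosing agent loop is assumed, not shown.

```typescript
// Guard enforcing the ALWAYS-set limits above. Returns a stop reason,
// or null if the agent may continue.

interface AgentLimits {
  maxIterations: number;
  maxTokenBudget: number;
  maxToolCalls: number;
  allowedTools: string[]; // whitelist only
}

interface AgentUsage {
  iterations: number;
  tokens: number;
  toolCalls: number;
}

function checkLimits(
  limits: AgentLimits,
  usage: AgentUsage,
  nextTool?: string
): string | null {
  if (usage.iterations >= limits.maxIterations) return "maxIterations exceeded";
  if (usage.tokens >= limits.maxTokenBudget) return "token budget exhausted";
  if (usage.toolCalls >= limits.maxToolCalls) return "tool-call limit reached";
  if (nextTool !== undefined && !limits.allowedTools.includes(nextTool)) {
    return `tool "${nextTool}" is not whitelisted`;
  }
  return null; // safe to continue
}

const limits: AgentLimits = {
  maxIterations: 15,
  maxTokenBudget: 50_000,
  maxToolCalls: 20,
  allowedTools: ["search", "stock_price"],
};
```

Calling the guard before each THINK step (and before each tool execution) turns the checklist into hard stops instead of advice.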

Agent success metrics

| Category | Metrics |
| --- | --- |
| Effectiveness | Task completion rate, correctness rate, escalation rate |
| Efficiency | Avg iterations, avg tokens, avg cost, avg latency |
| Reliability | Error rate, max-iteration rate, tool failure rate |
| Safety | Hallucination rate, out-of-scope action rate |

One-line summary

Agent = LLM + Tools + Loop. Use the simplest approach that solves the problem. Single call > prompt chain > tool-augmented chain > agent > multi-agent. Complexity is a cost, not a feature.


End of 4.15 quick revision.