Episode 4 — Generative AI Engineering / 4.15 — Understanding AI Agents

4.15 — Understanding AI Agents: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps -- reopen README.md -> 4.15.a...4.15.e.
  3. Practice -- 4.15-Exercise-Questions.md.
  4. Polish answers -- 4.15-Interview-Questions.md.

Core vocabulary

| Term | One-liner |
| --- | --- |
| AI agent | An LLM wrapped in an observe-think-act loop with access to tools |
| Single LLM call | Input in, output out -- one pass, no tools, no loops |
| ReAct | Reasoning + Acting -- interleave chain-of-thought reasoning with tool calls (Yao et al., 2022) |
| Tool use | External functions the agent can call -- search, database, API, calculator, code execution |
| Observation | The result of a tool call, fed back into the agent's context for the next reasoning step |
| Planning | Breaking a complex task into sub-tasks -- implicit (on the fly), explicit (upfront), or adaptive (re-plan) |
| Short-term memory | The message history that accumulates during a single agent run (grows each iteration) |
| Long-term memory | Persistent store (vector DB / database) that retains context across conversations |
| Prompt chain | Fixed sequence of 2-3 LLM calls -- no loop, no dynamic decisions |
| Tool-augmented chain | Fixed sequence of LLM calls with tool calls -- known steps, predictable path |
| Agent loop | The core cycle: perceive -> reason -> act -> observe -> repeat until done |
| Multi-agent system | Two or more specialized agents collaborating on a task |
| Orchestrator | Optional coordinator agent that assigns tasks to specialized agents |
| Handoff | Passing output from one agent to the next -- each is a potential point of information loss |
| maxIterations | Hard safety limit on how many times the agent loop can repeat |

Agent vs single call comparison

| Dimension | Single LLM Call | AI Agent |
| --- | --- | --- |
| Flow | Input -> Output (one pass) | Observe -> Think -> Act -> loop |
| Tools | None | Search, APIs, databases, calculators |
| Steps | 1 | Variable (2 to 50+ LLM calls) |
| Latency | 1-3 seconds | 10-60 seconds |
| Cost per request | ~$0.003-0.005 | ~$0.05-0.50 |
| Complexity | Simple | Significant (error handling, loops, state) |
| Determinism | More predictable | Less predictable (path varies by run) |
| Error surface | Model error only | Model + tool + loop errors |
| Debugging | Easy (input -> output) | Hard (which step went wrong?) |
| Best for | Single-step tasks, all info in prompt | Multi-step tasks, external data, actions |

Agent loop architecture

┌──────────────────────────────────────────────────────────────┐
│                       AGENT LOOP                              │
│                                                              │
│  User Task                                                   │
│      |                                                       │
│      v                                                       │
│  ┌──────────┐                                                │
│  │ OBSERVE  │ <-- tool results / user input                  │
│  └────┬─────┘                                                │
│       |                                                      │
│       v                                                      │
│  ┌──────────┐                                                │
│  │  THINK   │ <-- LLM reasons about what to do next          │
│  └────┬─────┘                                                │
│       |                                                      │
│       v                                                      │
│  ┌──────────┐     ┌──────────┐                               │
│  │   ACT    │────>│ EXECUTE  │ <-- run tool, get result      │
│  │(tool call│     │  (tool)  │                               │
│  │ or final │     └────┬─────┘                               │
│  │ answer)  │          |                                     │
│  └──────────┘          |                                     │
│       ^                |                                     │
│       └────────────────┘                                     │
│                                                              │
│  Loop repeats until LLM returns final answer (finish: stop)  │
│  OR maxIterations is reached.                                │
└──────────────────────────────────────────────────────────────┘
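The loop above can be sketched in a few lines. This is a minimal illustration, not a production framework: `callLLM` and `runTool` are hypothetical stand-ins for a real model client and tool registry, stubbed here so the control flow can actually run.

```typescript
// Minimal agent loop sketch. callLLM and runTool are hypothetical stubs,
// not a real model client or tool registry.

type Action =
  | { kind: "tool"; name: string; args: Record<string, unknown> }
  | { kind: "final"; answer: string };

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Stub "LLM brain": call one tool, then finish once a result is in context.
function callLLM(messages: Message[]): Action {
  const sawToolResult = messages.some((m) => m.role === "tool");
  return sawToolResult
    ? { kind: "final", answer: "AAPL trades at $178.52." }
    : { kind: "tool", name: "stock_price", args: { symbol: "AAPL" } };
}

// Stub tool registry: always returns a canned price.
function runTool(_name: string, _args: Record<string, unknown>): string {
  return JSON.stringify({ price: 178.52 });
}

function runAgent(task: string, maxIterations = 5): string {
  const messages: Message[] = [{ role: "user", content: task }]; // OBSERVE
  for (let i = 0; i < maxIterations; i++) {
    const action = callLLM(messages);                      // THINK
    if (action.kind === "final") return action.answer;     // final answer -> stop
    const observation = runTool(action.name, action.args); // ACT + EXECUTE
    messages.push({ role: "tool", content: observation }); // feed back, loop
  }
  return "Stopped: maxIterations reached."; // hard safety limit
}
```

Note the two exit conditions match the diagram: a final answer from the LLM, or the `maxIterations` cap.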

Four core components

┌──────────────────────────────────────────────────┐
│               LLM BRAIN                           │
│  Reasons, decides, interprets, generates          │
│  (configured via system prompt)                   │
└──────┬──────────────┬──────────────┬──────────────┘
       |              |              |
       v              v              v
┌────────────┐ ┌────────────┐ ┌────────────┐
│   TOOLS    │ │   MEMORY   │ │  PLANNING  │
│ search,    │ │ short-term │ │ implicit,  │
│ database,  │ │ (messages) │ │ explicit,  │
│ API, calc, │ │ long-term  │ │ adaptive   │
│ code exec  │ │ (vector DB)│ │            │
└────────────┘ └────────────┘ └────────────┘

| Component | Key Fact |
| --- | --- |
| LLM Brain | Stronger models = better agents. GPT-4o for brain, GPT-4o-mini for sub-tasks. |
| Tools | Must have clear descriptions, specific parameters, error handling, security validation. |
| Short-term memory | Grows every iteration. Compact by summarizing older steps. |
| Long-term memory | Vector DB / database. Persists across conversations. Add only when needed. |
| Planning | Implicit (3-5 steps), explicit (10+ steps), adaptive (re-plan when results change). |
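A tool with a clear description, specific parameters, and input validation might look like this. The shape is illustrative, not a specific framework's API, and the price is stubbed.

```typescript
// Illustrative tool definition: clear description, typed parameters,
// and security validation before execution. Not tied to any framework.

interface ToolParam {
  type: "string" | "number";
  description: string;
  required: boolean;
}

interface Tool {
  name: string;
  description: string; // the LLM chooses tools based on this text
  parameters: Record<string, ToolParam>;
  execute: (args: Record<string, unknown>) => string;
}

const stockPriceTool: Tool = {
  name: "stock_price",
  description: "Get the latest trading price for a stock ticker symbol.",
  parameters: {
    symbol: { type: "string", description: "Ticker, e.g. AAPL", required: true },
  },
  execute: (args) => {
    const symbol = args.symbol;
    // Security validation: reject anything that is not a plain ticker.
    if (typeof symbol !== "string" || !/^[A-Z]{1,5}$/.test(symbol)) {
      throw new Error(`Invalid symbol: ${String(symbol)}`);
    }
    return JSON.stringify({ symbol, price: 178.52 }); // stubbed result
  },
};
```

The validation step is what keeps LLM-chosen arguments from reaching a backend unchecked.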

ReAct cycle

1. THOUGHT:  "I need the stock price of AAPL."
2. ACTION:   stock_price({ symbol: "AAPL" })
3. OBSERVATION: { price: 178.52, change: +2.3% }
4. THOUGHT:  "Now I need the P/E ratio."
5. ACTION:   financials({ symbol: "AAPL", metric: "pe" })
6. OBSERVATION: { pe_ratio: 28.5 }
7. THOUGHT:  "I have both pieces of information."
8. FINAL ANSWER: "AAPL trades at $178.52 with a P/E of 28.5."

Why ReAct > acting alone: Reasoning before acting decomposes complex queries and avoids obvious tool-call mistakes.
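In practice a ReAct runner parses each model turn into a thought plus either an action or a final answer. The text format below (THOUGHT / ACTION / FINAL ANSWER with JSON arguments) is an assumed convention for illustration; real implementations vary.

```typescript
// Sketch of parsing one ReAct step from raw model output. The line format
// and JSON-argument convention are assumptions, not a standard.

interface ReActStep {
  thought: string;
  action?: { tool: string; args: Record<string, string> };
  finalAnswer?: string;
}

function parseStep(output: string): ReActStep {
  const thought = /THOUGHT:\s*(.+)/.exec(output)?.[1]?.trim() ?? "";
  const final = /FINAL ANSWER:\s*(.+)/.exec(output)?.[1]?.trim();
  if (final !== undefined) return { thought, finalAnswer: final };
  const action = /ACTION:\s*(\w+)\((.*)\)/.exec(output);
  if (!action) return { thought }; // reasoning-only step
  return {
    thought,
    action: { tool: action[1], args: JSON.parse(action[2] || "{}") },
  };
}
```

A step that parses to an action triggers a tool call; a step with a final answer ends the loop.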


When to use agents -- decision tree

Can the LLM answer with ONLY the prompt context?
  |
  YES --> Single call. Stop.
  |
  NO
  |
  v
Does the task require external data or actions (tools)?
  |
  NO --> Prompt chain (2-3 fixed LLM calls). Stop.
  |
  YES
  |
  v
Is the sequence of steps known in advance?
  |
  YES --> Tool-augmented chain (fixed steps + tools). Stop.
  |
  NO
  |
  v
Can it be done in under 3 decisions?
  |
  YES --> Simple agent (maxIterations: 5). Stop.
  |
  NO --> Full agent with planning (maxIterations: 15+).

Complexity spectrum

Simple <──────────────────────────────────────────> Complex

Single     Prompt     LLM +       Tool-         Full
LLM Call   Chain      Single      Augmented     Agent
                      Tool Call   Chain         (loop)

$0.003     $0.01      $0.01       $0.02-0.05    $0.05-0.50
1-3s       3-8s       2-5s        5-10s         10-60s
~3% err    ~8% err    ~5% err     ~10% err      ~20% err

RULE: Pick the CHEAPEST row that solves the problem.

When NOT to use agents -- checklist

[ ] Can a single LLM call solve this?
    YES -> Stop. Use a single call.

[ ] Can a fixed chain of 2-3 LLM calls solve this?
    YES -> Stop. Use a prompt chain.

[ ] Can deterministic code solve this?
    YES -> Stop. Write code. No AI needed.

[ ] Is latency acceptable? (Users wait 10-30s?)
    NO -> Stop. Use a faster approach or run async.

[ ] Is cost acceptable at scale?
    NO -> Stop. Optimize or use a cheaper approach.

[ ] Do the steps GENUINELY vary per request?
    NO -> Stop. You have a chain, not an agent.

[ ] Can you define clear success metrics?
    NO -> Stop. You cannot evaluate without metrics.

All checks passed? Build an agent. Start minimal.

Signs you are over-engineering

| Red Flag | What to Do Instead |
| --- | --- |
| Agent has 0 tools | Use a single call (no tools = no agent) |
| Agent always takes the same 3 steps | Hard-code as a prompt chain |
| Agent output could be a template with variables | Use a template engine |
| Agent makes 1 tool call then stops | Use function calling (no loop needed) |
| Agent success rate is below 70% | Task may not suit an agent -- simplify |
| Agent is for an internal tool used by 5 people | Use a simpler solution |

Token cost math

Single call:
  1 call = ~2,000 tokens = ~$0.005
  Monthly (100K requests): $500

Agent (5 steps):
  Step 1: 2,000 tokens
  Step 2: 3,500 tokens (includes step 1 context)
  Step 3: 5,000 tokens
  Step 4: 6,500 tokens
  Step 5: 8,000 tokens
  Total:  25,000 tokens = ~$0.06
  Monthly (100K requests): $6,000

Agent is ~12x more expensive than a single call.
Per-step context grows linearly, so TOTAL tokens grow roughly
quadratically with step count -- each call resends ALL prior messages.
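A small helper reproduces the arithmetic above. The 2,000-token base and 1,500 tokens of added context per step are the worked example's assumptions.

```typescript
// Token cost of an agent run: each step's LLM call includes the full
// accumulated context, so totals grow much faster than step count.

function agentTokens(steps: number, baseTokens = 2000, growthPerStep = 1500): number {
  let total = 0;
  for (let step = 0; step < steps; step++) {
    total += baseTokens + step * growthPerStep; // full context of this call
  }
  return total;
}

const fiveSteps = agentTokens(5);  // 2000 + 3500 + 5000 + 6500 + 8000 = 25000
const tenSteps = agentTokens(10);  // 87500 -- 3.5x the tokens for 2x the steps
```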

Error rate math

Single call:
  P(failure) = 5% per call
  1 call -> 5% failure rate

Agent (5 steps):
  5 LLM calls + 5 tool calls = 10 failure points
  P(all succeed) = 0.95^10 = 59.9%
  ~40% chance something goes wrong

Multi-agent (3 agents, 5 steps each, 3 handoffs):
  P(agent 1) = 0.95^5 = 0.774
  P(agent 2) = 0.95^5 = 0.774
  P(agent 3) = 0.95^5 = 0.774
  P(handoffs) = 0.95^3 = 0.857
  P(all) = 0.774 * 0.774 * 0.774 * 0.857 = 0.398
  ~60% failure rate!
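The compounding above is just independent probabilities multiplying. The 0.95 per-point success rate is the illustrative assumption from the math; real rates vary by model and tool.

```typescript
// Compound reliability: every LLM call, tool call, and handoff is an
// independent failure point, so success probabilities multiply.

function successProbability(failurePoints: number, pPerPoint = 0.95): number {
  return Math.pow(pPerPoint, failurePoints);
}

const singleCall = successProbability(1);  // 0.95
const agent = successProbability(10);      // 5 LLM calls + 5 tool calls ~ 0.599
const multiAgent = successProbability(18); // 3 agents x 5 points + 3 handoffs ~ 0.397
```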

Latency guide

< 1s       "Instant"      Single call (fast model)
1-3s       "Quick"        Single call (standard)
3-5s       "Noticeable"   Prompt chain
5-10s      "Slow"         Simple agent
10-30s     "Very slow"    Full agent (users switch tabs)
30+s       "Broken"       Multi-agent (run async only)

RULE: If the user is staring at a spinner, do NOT use an agent.
      If the task runs in the background, agents are fine.

Multi-agent complexity

Communication paths: n(n-1)/2

1 agent:   0 paths
2 agents:  1 path
3 agents:  3 paths
5 agents:  10 paths
10 agents: 45 paths

Each path = potential miscommunication + latency + error propagation
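The path count is the pairwise-combination formula, trivial to check:

```typescript
// Pairwise communication paths between n agents: n(n-1)/2.
// Grows quadratically -- every new agent adds a path to every existing one.
function communicationPaths(agents: number): number {
  return (agents * (agents - 1)) / 2;
}
```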

The "telephone game" problem

Agent 1 output: "Revenue $4.2B (confidence: 0.95, range: $3.9-4.5B)"
Agent 2 sees:   "Revenue $4.2B" (confidence lost)
Agent 3 sees:   "Revenue ~$4B"  (precision lost)
Final report:   "Revenue is $4B" (presented as definitive fact)

Information degrades at EVERY handoff.

When multi-agent IS justified

| Use Case | Why Single Agent Fails |
| --- | --- |
| Genuinely different expertise domains | System prompt too long, tools too diverse (legal + technical) |
| Adversarial quality control | One agent checks another's work (writer + editor) |
| Scale beyond one agent | Context window overflow, 15+ tools, parallel work streams |

When multi-agent is OVERKILL

| Proposed Multi-Agent | Better Approach |
| --- | --- |
| Researcher + summarizer | Single agent that researches, then summarizes |
| Planner + executor | Single agent with explicit planning step |
| Coder + tester | Single agent with code-execution tool |
| One agent per API | Single agent with multiple tools |

Common gotchas

1. Building an agent for a single-call task
   -> Adds 12x cost and 5-10x latency for zero benefit

2. Using an agent where deterministic code works
   -> if/else is faster, cheaper, and ALWAYS correct

3. Forgetting maxIterations
   -> Agent loops forever, burns all your token budget

4. Not setting a token budget per task
   -> One expensive conversation can cost more than 1,000 normal ones

5. No error handling on tool execution
   -> Tool throws, agent crashes, user gets nothing

6. Trusting LLM-generated SQL without validation
   -> SQL injection risk (always validate: SELECT only, parameterized)

7. Passing plain text between agents instead of structured data
   -> Information loss at every handoff (confidence, sources, caveats)

8. Building multi-agent when single agent with more tools works
   -> Every additional agent multiplies failure probability

9. No metrics or evaluation
   -> "It seems to work" is not engineering -- measure completion,
      cost, latency, correctness, and safety

10. Skipping the hybrid approach
    -> Try single call first, fall back to agent only when needed
       (handles 60-80% of requests without the loop)
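The hybrid approach in gotcha 10 amounts to a router: attempt the cheap single call first and escalate only when it cannot answer from the prompt alone. `singleCall` and `escalateToAgent` are hypothetical stubs; the "needs tools" signal here is faked for illustration.

```typescript
// Hybrid routing sketch: cheap path first, agent loop only on escalation.
// Both functions are stand-ins, not real model or agent clients.

function singleCall(task: string): { answer: string; needsTools: boolean } {
  // Stub: pretend the model flags "latest"-style tasks as needing tools.
  return task.includes("latest")
    ? { answer: "", needsTools: true }
    : { answer: `Answered directly: ${task}`, needsTools: false };
}

function escalateToAgent(task: string): string {
  return `Agent result for: ${task}`; // stand-in for the full agent loop
}

function handleRequest(task: string): string {
  const first = singleCall(task);             // cheap path: 1-3s, ~$0.005
  if (!first.needsTools) return first.answer;
  return escalateToAgent(task);               // expensive path: 10-60s, $0.05+
}
```

Most requests take the cheap branch, which is where the 60-80% savings come from.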

Security essentials

ALWAYS set:
  maxIterations:    15        (hard loop limit)
  maxTokenBudget:   50,000    (per-task token cap)
  maxToolCalls:     20        (per-task tool limit)
  allowedTools:     [...]     (whitelist only)
  requireApproval:  [...]     (human-in-the-loop for dangerous actions)

NEVER allow:
  - Arbitrary SQL execution (SELECT only, validate inputs)
  - Arbitrary code execution without sandboxing
  - Outbound communication without human approval
  - Access to tools beyond the agent's scope
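The limits above can be enforced as a guard checked on every loop iteration. A sketch, with field names mirroring the checklist; the enclosing agent loop is assumed, not shown.

```typescript
// Guard enforcing the ALWAYS-set limits above. Returns a stop reason,
// or null if the agent may continue.

interface AgentLimits {
  maxIterations: number;
  maxTokenBudget: number;
  maxToolCalls: number;
  allowedTools: string[]; // whitelist only
}

interface AgentUsage {
  iterations: number;
  tokens: number;
  toolCalls: number;
}

function checkLimits(
  limits: AgentLimits,
  usage: AgentUsage,
  nextTool?: string
): string | null {
  if (usage.iterations >= limits.maxIterations) return "maxIterations exceeded";
  if (usage.tokens >= limits.maxTokenBudget) return "token budget exhausted";
  if (usage.toolCalls >= limits.maxToolCalls) return "tool-call limit reached";
  if (nextTool !== undefined && !limits.allowedTools.includes(nextTool)) {
    return `tool "${nextTool}" is not whitelisted`;
  }
  return null; // safe to continue
}

const limits: AgentLimits = {
  maxIterations: 15,
  maxTokenBudget: 50_000,
  maxToolCalls: 20,
  allowedTools: ["search", "stock_price"],
};
```

Calling the guard before each THINK step (and before each tool execution) turns the checklist into hard stops instead of advice.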

Agent success metrics

| Category | Metrics |
| --- | --- |
| Effectiveness | Task completion rate, correctness rate, escalation rate |
| Efficiency | Avg iterations, avg tokens, avg cost, avg latency |
| Reliability | Error rate, max-iteration rate, tool failure rate |
| Safety | Hallucination rate, out-of-scope action rate |

One-line summary

Agent = LLM + Tools + Loop. Use the simplest approach that solves the problem. Single call > prompt chain > tool-augmented chain > agent > multi-agent. Complexity is a cost, not a feature.


End of 4.15 quick revision.