Episode 4 — Generative AI Engineering / 4.15 — Understanding AI Agents

4.15.a — Agent vs Single LLM Call

In one sentence: A single LLM call is input in, output out — one pass, no tools, no loops; an AI agent wraps the LLM in an observe-think-act loop that can call tools, inspect results, and keep going until the task is complete.

Navigation: <- 4.15 Overview | 4.15.b -- Agent Architecture ->


1. What Is a Single LLM Call?

Everything you have built so far in this course is a single LLM call (sometimes loosely called a "one-shot" call, not to be confused with one-shot prompting, which means including a single example in the prompt). You send a prompt, the model returns a response, and you are done.

┌────────────┐        ┌─────────┐        ┌────────────┐
│   Input    │ ─────► │   LLM   │ ─────► │   Output   │
│  (prompt)  │        │ (model) │        │ (response) │
└────────────┘        └─────────┘        └────────────┘

Characteristics of a single LLM call:

  • One input, one output — the model processes your prompt exactly once
  • No external tools — the model generates from its training data and the prompt context
  • No loops — there is no feedback mechanism; the model does not see or react to its own output
  • Stateless — each call is independent (unless you manually pass conversation history)
  • Predictable cost — you know roughly how many tokens will be consumed
// Single LLM call — summarize a document
import OpenAI from "openai";

const client = new OpenAI();

async function summarize(text) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    temperature: 0,
    messages: [
      { role: "system", content: "Summarize the following text in 3 bullet points." },
      { role: "user", content: text },
    ],
  });

  return response.choices[0].message.content;
}

// One input -> one output -> done
const summary = await summarize("Long article text here...");
console.log(summary);

This is perfectly fine for tasks where the model has all the information it needs in the prompt and can answer in one pass.


2. What Is an AI Agent?

An AI agent is an LLM wrapped in a loop with access to tools. The critical difference: the agent can observe, reason, act, and then observe again — repeating this cycle until the task is complete.

┌────────────────────────────────────────────────────────────────────┐
│                           THE AGENT LOOP                           │
│                                                                    │
│                  ┌──────────────┐                                  │
│                  │   OBSERVE    │◄────────────────────┐            │
│                  │  (read tool  │                     │            │
│                  │   results,   │                     │            │
│                  │   user input)│                     │            │
│                  └──────┬───────┘                     │            │
│                         │                             │            │
│                         ▼                             │            │
│                  ┌──────────────┐                     │            │
│                  │    THINK     │                     │            │
│                  │  (LLM reasons│                     │            │
│                  │   about what │                     │            │
│                  │   to do next)│                     │            │
│                  └──────┬───────┘                     │            │
│                         │                             │            │
│                         ▼                             │            │
│                  ┌──────────────┐              ┌──────┴───────┐    │
│                  │     ACT      │──── tool ───►│   EXECUTE    │    │
│                  │  (choose a   │    call      │  (run tool,  │    │
│                  │   tool or    │              │   get result)│    │
│                  │   respond)   │              └──────────────┘    │
│                  └──────────────┘                                  │
│                                                                    │
│   Loop continues until the agent decides the task is complete      │
│   and returns a final answer to the user.                          │
└────────────────────────────────────────────────────────────────────┘

An AI agent has four defining characteristics:

  Characteristic   What It Means
  ───────────────  ─────────────────────────────────────────────────────────────
  LLM brain        The model reasons about the task, decides what to do next,
                   and interprets results
  Tools            External functions the agent can call — search, database
                   queries, APIs, calculators, code execution
  Loop             The agent iterates: reason -> act -> observe -> reason again,
                   until done
  Autonomy         The agent chooses its own path; no human pre-programs the
                   exact sequence of steps
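
These four characteristics can be sketched as a tiny runnable loop. In the sketch below, `decide` stands in for the LLM brain and is scripted for illustration; a real agent would call the model on every iteration. The helper names (`miniAgent`, `toolbox`) are our own, not a library API.

```javascript
// A runnable sketch of the four characteristics. `decide` is scripted here;
// a real agent would call the model each iteration.
async function miniAgent(task, toolbox, decide) {
  const history = [{ role: "user", content: task }];
  for (let i = 0; i < 10; i++) {                        // Loop (bounded for safety)
    const step = await decide(history);                 // LLM brain: decide what to do
    if (step.final) return step.final;                  // Autonomy: agent decides it is done
    const result = await toolbox[step.tool](step.args); // Tools: call an external function
    history.push({ role: "tool", content: result });    // Observe: feed the result back
  }
  return "Max iterations reached without an answer.";
}

// Scripted demo: one tool call, then a final answer.
const toolbox = { add: async ({ a, b }) => String(a + b) };
const script = [
  { tool: "add", args: { a: 2, b: 3 } },
  { final: "2 + 3 = 5" },
];
const miniAnswer = await miniAgent("What is 2 + 3?", toolbox, () => script.shift());
console.log(miniAnswer); // "2 + 3 = 5"
```

Section 6 fills in the part this sketch fakes: how the real `decide` step works via the model's tool-calling interface.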

3. Single Call vs Agent: Side-by-Side Comparison

  Dimension       Single LLM Call                    AI Agent
  ──────────────  ─────────────────────────────────  ──────────────────────────────────────────
  Flow            Input -> Output (one pass)         Observe -> Think -> Act -> loop
  Tools           None                               Search, APIs, databases, calculators, etc.
  Steps           1                                  Variable (2 to 50+ LLM calls)
  Latency         Low (one API call)                 High (multiple API calls + tool execution)
  Cost            Predictable (one call)             Variable (many calls, hard to predict)
  Complexity      Simple                             Significant (error handling, loops, state)
  Determinism     More predictable                   Less predictable (path varies by run)
  Error surface   Model error only                   Model errors + tool errors + loop errors
  Best for        Single-step tasks with all info    Multi-step tasks requiring external
                  in the prompt                      data or actions

4. When a Single Call Is Enough

A single LLM call is the right choice when:

  1. All information is in the prompt — The model does not need to fetch external data.
  2. The task is a single transformation — Summarize, translate, classify, extract, reformat.
  3. No side effects needed — The model does not need to write to a database, send an email, or call an API.
  4. Determinism matters — You want predictable, consistent output.
  5. Latency is critical — You need a response in under 2 seconds.
// Single call is PERFECT for classification
async function classifySentiment(review) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    temperature: 0,
    response_format: { type: "json_object" }, // ensures the output is parseable JSON
    messages: [
      {
        role: "system",
        content: `Classify the sentiment of the review.
Respond with JSON: { "sentiment": "positive" | "negative" | "neutral", "confidence": 0-1 }`,
      },
      { role: "user", content: review },
    ],
  });

  return JSON.parse(response.choices[0].message.content);
}

// Single call is PERFECT for translation
async function translate(text, targetLanguage) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    temperature: 0,
    messages: [
      { role: "system", content: `Translate to ${targetLanguage}. Return only the translation.` },
      { role: "user", content: text },
    ],
  });

  return response.choices[0].message.content;
}

// Single call is PERFECT for extraction
async function extractDates(document) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    temperature: 0,
    response_format: { type: "json_object" }, // ensures the output is parseable JSON
    messages: [
      {
        role: "system",
        content: `Extract all dates from the document.
Respond with JSON: { "dates": ["YYYY-MM-DD", ...] }`,
      },
      { role: "user", content: document },
    ],
  });

  return JSON.parse(response.choices[0].message.content);
}

5. When You Need an Agent

You need an agent when:

  1. The task requires external data — The model must search the web, query a database, or call APIs to get information not in the prompt.
  2. The task has multiple dynamic steps — The next step depends on the result of the previous step. You cannot pre-program the sequence.
  3. The task requires actions — The model must send emails, create files, update records, or trigger workflows.
  4. The task requires iterative refinement — The model must try something, evaluate the result, and try again.
  5. The task crosses domains — Finding flights, then checking hotels, then calculating budgets, then booking.
// An agent is NEEDED for this — the steps depend on results
// "Find the latest quarterly revenue for Apple and Microsoft, compare them,
//  and draft a summary email to the finance team"

// Step 1: Agent searches for Apple's revenue       -> gets $94.9B
// Step 2: Agent searches for Microsoft's revenue   -> gets $62.0B
// Step 3: Agent compares (no tool needed, LLM reasons)
// Step 4: Agent drafts email with the comparison
// Step 5: Agent sends the email via email API

// A single LLM call CANNOT do this because:
// - It doesn't have access to real-time revenue data
// - It can't send emails
// - The comparison depends on data fetched in steps 1-2

6. The ReAct Pattern (Reasoning + Acting)

ReAct is the foundational pattern behind most AI agents. It stands for Reasoning + Acting, and it interleaves the model's chain-of-thought reasoning with tool calls.

The pattern was introduced in the paper "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022). The key insight: when the model explains its reasoning before acting, it makes better decisions and can self-correct.

The ReAct cycle

┌──────────────────────────────────────────────────────────────────┐
│                          ReAct PATTERN                           │
│                                                                  │
│  1. THOUGHT:  "I need to find the current stock price of AAPL.   │
│               I should use the stock_price tool."                │
│                                                                  │
│  2. ACTION:   stock_price({ symbol: "AAPL" })                    │
│                                                                  │
│  3. OBSERVATION: { price: 178.52, change: +2.3% }                │
│                                                                  │
│  4. THOUGHT:  "Now I have the price. The user also asked for     │
│               the P/E ratio. I should use the financials tool."  │
│                                                                  │
│  5. ACTION:   company_financials({ symbol: "AAPL", metric: "pe"})│
│                                                                  │
│  6. OBSERVATION: { pe_ratio: 28.5 }                              │
│                                                                  │
│  7. THOUGHT:  "I now have both pieces of information. I can      │
│               compose the final answer."                         │
│                                                                  │
│  8. FINAL ANSWER: "Apple (AAPL) is trading at $178.52 (+2.3%)    │
│                   with a P/E ratio of 28.5."                     │
└──────────────────────────────────────────────────────────────────┘

ReAct in code

// Simplified ReAct agent loop
async function reactAgent(userQuery, tools, maxIterations = 10) {
  const messages = [
    {
      role: "system",
      content: `You are a helpful assistant with access to tools.
For each step, respond with your reasoning (THOUGHT), then choose a tool (ACTION).
When you have enough information, provide a FINAL ANSWER.

Available tools:
${tools.map((t) => `- ${t.name}: ${t.description}`).join("\n")}`,
    },
    { role: "user", content: userQuery },
  ];

  for (let i = 0; i < maxIterations; i++) {
    // THINK: Ask the LLM what to do next
    const response = await client.chat.completions.create({
      model: "gpt-4o",
      messages,
      tools: tools.map((t) => ({
        type: "function",
        function: { name: t.name, description: t.description, parameters: t.parameters },
      })),
    });

    const choice = response.choices[0];

    // Check if the agent wants to give a final answer (no tool call)
    if (choice.finish_reason === "stop") {
      return choice.message.content; // Final answer
    }

    // ACT: The agent chose to call a tool
    if (choice.message.tool_calls) {
      messages.push(choice.message); // Add assistant's decision to history

      for (const toolCall of choice.message.tool_calls) {
        const toolName = toolCall.function.name;
        const toolArgs = JSON.parse(toolCall.function.arguments);

        // EXECUTE: Run the tool
        const tool = tools.find((t) => t.name === toolName);
        const result = tool
          ? await tool.execute(toolArgs)
          : { error: `Unknown tool: ${toolName}` }; // guard against hallucinated tool names

        // OBSERVE: Feed the result back to the agent
        messages.push({
          role: "tool",
          tool_call_id: toolCall.id,
          content: JSON.stringify(result),
        });
      }
    }
  }

  return "Agent reached maximum iterations without a final answer.";
}

Why ReAct works better than acting alone

WITHOUT reasoning (act only):
  User: "What's the weather in the city where Apple is headquartered?"
  Agent: search("weather Apple")        <- wrong search!
  Result: Articles about apple orchards

WITH reasoning (ReAct):
  User: "What's the weather in the city where Apple is headquartered?"
  Thought: "Apple is headquartered in Cupertino, CA. I need weather for Cupertino."
  Agent: get_weather("Cupertino, CA")   <- correct!
  Result: 72 degrees F, sunny

By making the model reason before acting, it decomposes complex queries and avoids obvious mistakes.
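
One practical way to enforce the THOUGHT step when using native function calling is to require a reasoning field in every tool's parameter schema, so the model must articulate why it is calling a tool before it can call it. A sketch (the `withReasoning` helper and the `reasoning` field name are our own convention, not part of the API):

```javascript
// Require a "reasoning" field on every tool call, forcing a THOUGHT before
// each ACTION. Helper and field name are illustrative.
function withReasoning(tool) {
  return {
    ...tool,
    parameters: {
      ...tool.parameters,
      properties: {
        reasoning: { type: "string", description: "Why this tool, and why now" },
        ...tool.parameters.properties,
      },
      required: ["reasoning", ...(tool.parameters.required ?? [])],
    },
  };
}

const getWeather = {
  name: "get_weather",
  description: "Get current weather for a city",
  parameters: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
};

console.log(withReasoning(getWeather).parameters.required); // [ 'reasoning', 'city' ]
```

The agent loop can then log each tool call's `reasoning` argument, which makes traces far easier to debug.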


7. The Complexity Cost of Agents

Agents are powerful, but they are not free. Every agent introduces complexity that a single call does not have.

Token cost

Single call:
  1 API call = ~2,000 tokens (input + output)
  Cost: ~$0.005

Agent (5-step task):
  Step 1: 2,000 tokens (initial reasoning)
  Step 2: 3,500 tokens (includes step 1 context + tool result)
  Step 3: 5,000 tokens (growing context)
  Step 4: 6,500 tokens (more context)
  Step 5: 8,000 tokens (final answer with full context)
  Total:  25,000 tokens across 5 API calls
  Cost:   ~$0.06

That is 12x more expensive than a single call.
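
The arithmetic above, spelled out. The per-token price here is a rough assumption (about $2.50 per million tokens, blended input/output); actual rates vary by model and provider.

```javascript
// Reproducing the token math above. Pricing is a ballpark assumption.
const PRICE_PER_TOKEN = 2.5 / 1_000_000;

const singleCallTokens = 2_000;
const agentStepTokens = [2_000, 3_500, 5_000, 6_500, 8_000]; // context grows each step
const agentTokens = agentStepTokens.reduce((sum, t) => sum + t, 0);

console.log(agentTokens);                                     // 25000
console.log((singleCallTokens * PRICE_PER_TOKEN).toFixed(3)); // "0.005"
console.log((agentTokens * PRICE_PER_TOKEN).toFixed(3));      // "0.063"
console.log(`${agentTokens / singleCallTokens}x the tokens`); // "12.5x the tokens"
```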

Latency cost

Single call:
  1 API call      ~1-3 seconds
  Total:          ~1-3 seconds

Agent (5-step task):
  5 API calls     ~1-3 seconds each = 5-15 seconds
  5 tool calls    ~0.5-2 seconds each = 2.5-10 seconds
  Total:          ~7-25 seconds

Users notice when something takes more than 3 seconds.

Error amplification

Single call:
  1 point of failure
  P(failure) = 5% -> Overall: 5% failure rate

Agent (5-step task):
  5 LLM decisions + 5 tool calls = 10 points of failure
  P(each step fails) = 5% -> P(all succeed) = 0.95^10 = 59.9%
  Overall: ~40% chance SOMETHING goes wrong

More steps = more opportunities for failure.
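
The compounding-failure math above, computed directly under the stated assumption that each of the 10 steps succeeds independently with probability 0.95:

```javascript
// Probability that all steps of a 10-step agent run succeed, assuming each
// step has an independent 95% success rate.
const pStepSuccess = 0.95;
const steps = 10; // 5 LLM decisions + 5 tool calls

const pAllSucceed = Math.pow(pStepSuccess, steps);
console.log(pAllSucceed.toFixed(3)); // "0.599" -> ~40% chance something goes wrong
```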

Debugging difficulty

Single call:
  Input  -> Output
  Easy to reproduce, easy to debug.

Agent:
  Input -> Thought1 -> Tool1 -> Result1 -> Thought2 -> Tool2 -> Result2 -> ... -> Output
  Which step went wrong? Was it the reasoning? The tool? The interpretation?
  Much harder to reproduce because tool results may change.
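
A common mitigation is to record every step of the loop so a failed run can be inspected after the fact. A minimal trace recorder, as a sketch (the `makeTracer` helper and step names are our own):

```javascript
// Minimal step tracer for debugging agent runs. Names are illustrative.
function makeTracer() {
  const steps = [];
  return {
    record(kind, data) {
      steps.push({ kind, data, at: new Date().toISOString() });
    },
    dump() {
      return steps;
    },
  };
}

// Inside an agent loop you would record each thought, tool call, and result:
const tracer = makeTracer();
tracer.record("thought", "Apple HQ is in Cupertino; I need Cupertino weather");
tracer.record("tool_call", { name: "get_weather", args: { city: "Cupertino, CA" } });
tracer.record("observation", { tempF: 72, conditions: "sunny" });

console.log(tracer.dump().map((s) => s.kind)); // [ 'thought', 'tool_call', 'observation' ]
```

Dumping the full trace on failure answers the "which step went wrong?" question without rerunning the agent.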

8. Code Example: Single Call vs Agent

Let us compare two approaches to the same task: "What is the current population of Tokyo and is it the largest city in the world?"

Approach 1: Single LLM call

// Single call — uses the model's training data (may be outdated)
async function answerWithSingleCall(question) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    temperature: 0,
    messages: [
      { role: "system", content: "Answer factual questions concisely. If unsure, say so." },
      { role: "user", content: question },
    ],
  });

  return response.choices[0].message.content;
}

const answer = await answerWithSingleCall(
  "What is the current population of Tokyo and is it the largest city in the world?"
);
console.log(answer);
// "As of my last update, the Tokyo metropolitan area has approximately 37.4 million people,
//  making it the largest metropolitan area in the world..."
//
// PROBLEM: "as of my last update" — the model's data has a cutoff date.
// The answer might be outdated.

Approach 2: Agent with tools

// Agent — uses real-time data from tools
const tools = [
  {
    name: "search_population",
    description: "Search for the current population of a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "Name of the city" },
      },
      required: ["city"],
    },
    execute: async ({ city }) => {
      // In production, this calls a real API (e.g., World Bank, UN data)
      const data = {
        Tokyo: { population: 37_115_035, year: 2024, type: "metropolitan" },
        Delhi: { population: 34_400_000, year: 2024, type: "metropolitan" },
        Shanghai: { population: 29_867_000, year: 2024, type: "metropolitan" },
      };
      return data[city] || { error: `Population data not found for ${city}` };
    },
  },
  {
    name: "search_largest_cities",
    description: "Get a ranked list of the world's largest cities by population",
    parameters: { type: "object", properties: {}, required: [] },
    execute: async () => {
      return {
        ranking: [
          { rank: 1, city: "Tokyo", population: 37_115_035 },
          { rank: 2, city: "Delhi", population: 34_400_000 },
          { rank: 3, city: "Shanghai", population: 29_867_000 },
          { rank: 4, city: "Dhaka", population: 23_935_000 },
          { rank: 5, city: "Sao Paulo", population: 22_806_000 },
        ],
        source: "UN World Urbanization Prospects 2024",
      };
    },
  },
];

// Use the ReAct agent from section 6
const agentAnswer = await reactAgent(
  "What is the current population of Tokyo and is it the largest city in the world?",
  tools
);
console.log(agentAnswer);
// Agent's internal process:
//   THOUGHT: I need the current population of Tokyo. Let me search.
//   ACTION:  search_population({ city: "Tokyo" })
//   OBSERVE: { population: 37115035, year: 2024, type: "metropolitan" }
//   THOUGHT: Now I need to check if it's the largest. Let me get the ranking.
//   ACTION:  search_largest_cities()
//   OBSERVE: [{ rank: 1, city: "Tokyo", ... }, ...]
//   THOUGHT: Tokyo is #1. I have both pieces of information.
//   FINAL:   "Tokyo's metropolitan population is 37.1 million (2024 data),
//             and it IS the largest metropolitan area in the world."
//
// ADVANTAGE: The answer uses real, current data with a cited source.

9. The Spectrum: Not Just "Single Call" or "Agent"

In practice, there is a spectrum of complexity between a single call and a full agent:

┌──────────────────────────────────────────────────────────────────┐
│                     THE COMPLEXITY SPECTRUM                      │
│                                                                  │
│  Simple ◄───────────────────────────────────────────► Complex    │
│                                                                  │
│  Single      Prompt      LLM           Tool-         Full        │
│  LLM         Chain       + Single      Augmented     Agent       │
│  Call                    Tool Call     Chain         (loop)      │
│                                                                  │
│  "Summarize  "Summarize  "Search for   "Search,      "Research   │
│   this text"  then        X and         summarize,    this topic │
│               translate"  summarize"    then check    and write  │
│                                         facts"        a report"  │
│                                                                  │
│  1 LLM call  2-3 calls   1 call +      3-5 calls     5-50+       │
│                           1 tool        + tools       calls      │
│                                                                  │
│  RULE: Use the SIMPLEST approach that solves the problem.        │
└──────────────────────────────────────────────────────────────────┘

  Level  Pattern               Example                        LLM Calls  Tools
  ─────  ────────────────────  ─────────────────────────────  ─────────  ────────────────────────
  1      Single call           Classify sentiment             1          0
  2      Prompt chain          Summarize, then translate      2-3        0
  3      Single tool call      Search + summarize             1          1 (function calling)
  4      Tool-augmented chain  Search, summarize, fact-check  3-5        2-3
  5      Full agent (loop)     Research topic, write report   5-50+      Many, chosen dynamically

The key question is always: what is the simplest approach that reliably solves this task? Do not reach for an agent when a single call will do.
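
For the middle of the spectrum, a prompt chain (level 2) is just sequential calls where each output feeds the next input. A sketch with a stubbed `callLLM`; real code would call `client.chat.completions.create` for each step, as in earlier sections:

```javascript
// Level 2 on the spectrum: a fixed prompt chain — sequential calls, no tools,
// no loop. `callLLM` is a stand-in for the chat-completions calls used earlier.
async function promptChain(input, instructions, callLLM) {
  let result = input;
  for (const instruction of instructions) {
    result = await callLLM(instruction, result); // each output feeds the next step
  }
  return result;
}

// Demo with a stub LLM that just tags its instruction onto the text:
const stubLLM = async (instruction, text) => `[${instruction}] ${text}`;
const chained = await promptChain("Long article text...", [
  "Summarize in 3 bullets",
  "Translate to French",
], stubLLM);
console.log(chained); // "[Translate to French] [Summarize in 3 bullets] Long article text..."
```

Unlike an agent, the sequence of steps here is fixed in code; the model never chooses its own path, which keeps cost and latency predictable.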


10. Key Takeaways

  1. A single LLM call is input -> output in one pass. No tools, no loops. It is the right choice for the majority of AI tasks.
  2. An AI agent is an LLM in a loop with tools. It observes, thinks, acts, and repeats. It is the right choice for dynamic, multi-step tasks.
  3. The ReAct pattern (Reasoning + Acting) is the foundation of agent behavior. Making the model reason before acting leads to better tool use and fewer errors.
  4. Agents are expensive — in tokens (12x+ more), latency (5-25 seconds vs 1-3 seconds), error rate (compounds with each step), and debugging effort.
  5. There is a spectrum from single call to full agent. Prompt chains and single tool calls sit in between. Always pick the simplest approach that works.
  6. "Just because you can build an agent doesn't mean you should." Most tasks in production are better served by simpler patterns.

Explain-It Challenge

  1. A junior developer is excited about agents and wants to build one for a sentiment classification API. Explain why this is a bad idea and what they should use instead.
  2. Describe the ReAct pattern to a non-technical product manager. Use a real-world analogy (e.g., a human research assistant).
  3. A task requires fetching data from three APIs and combining the results. Would you use a single call, a prompt chain, or a full agent? Walk through your reasoning using the complexity spectrum.

Navigation: <- 4.15 Overview | 4.15.b -- Agent Architecture ->