Episode 4 — Generative AI Engineering / 4.15 — Understanding AI Agents

4.15.c — When to Use Agents

In one sentence: Use an AI agent when the task requires multiple dynamic steps, external tool use, or decisions that depend on intermediate results — in other words, when the model cannot answer in one shot because it needs to go out, fetch information, take actions, and adapt based on what it finds.

Navigation: <- 4.15.b Agent Architecture | 4.15.d -- When NOT to Use Agents ->


1. The Core Signal: "The LLM Doesn't Have Everything It Needs"

The single most reliable indicator that you need an agent is this:

The model cannot answer correctly with only the information in the prompt.

If the model needs to fetch data, check something, calculate, verify, or take action in the real world, a single LLM call will hallucinate, return outdated information, or refuse. That is when an agent earns its complexity cost.

┌──────────────────────────────────────────────────────────────────┐
│               DOES YOUR TASK NEED AN AGENT?                      │
│                                                                  │
│  Can the LLM answer correctly      YES ──► Single call.          │
│  with ONLY the prompt context?             You don't need        │
│       │                                    an agent.             │
│       NO                                                         │
│       │                                                          │
│       ▼                                                          │
│  Does the task require external    NO ──► Prompt chain.          │
│  data or actions (tools)?                 Chain 2-3 LLM calls.   │
│       │                                                          │
│       YES                                                        │
│       │                                                          │
│       ▼                                                          │
│  Is the sequence of steps          NO ──► Tool-augmented chain.  │
│  known in advance?                        Pre-defined steps      │
│       │                                   with tool calls.       │
│       NO (steps depend on                                        │
│        intermediate results)                                     │
│       │                                                          │
│       ▼                                                          │
│  You need an AGENT.                                              │
│  (LLM in a loop with tools)                                      │
└──────────────────────────────────────────────────────────────────┘

2. Use Case 1: Multi-Step Tasks with Dynamic Decision-Making

These tasks require the agent to make decisions at each step based on what it discovered in previous steps. The path through the task is not predictable in advance.

Example: Competitive analysis

// Task: "Analyze our competitor's latest product launch and recommend a response"
//
// The agent's path depends on what it finds:
//
// Step 1: Search for competitor's latest product announcement
//   -> Finds they launched a new pricing tier
//
// Step 2: Search for details on the pricing tier
//   -> Discovers it undercuts our mid-tier by 20%
//
// Step 3: Query our database for our current pricing and customer count
//   -> Gets our pricing data
//
// Step 4: Analyze impact (reasoning, no tool needed)
//   -> Determines 30% of our customers are on the affected tier
//
// Step 5: Search for market reactions and reviews
//   -> Mixed reviews — customers like the price but complain about missing features
//
// Step 6: Synthesize a recommendation
//   -> "Don't match the price cut. Instead, emphasize the features they lack."
//
// A SINGLE CALL couldn't do this — it doesn't have access to:
//   - Current competitor announcements
//   - Our internal pricing data
//   - Real-time market reactions

const competitiveAnalysisAgent = new Agent({
  systemPrompt: `You are a competitive intelligence analyst.
Given a competitor analysis request:
1. Research the competitor's latest moves
2. Compare with our position (use database tools)
3. Analyze market reaction
4. Recommend a strategic response

Always cite sources. Be specific with numbers.`,
  tools: [webSearchTool, databaseQueryTool, calculatorTool],
  maxIterations: 12,
});

Why an agent is needed here

  • Steps 2-6 depend on what step 1 discovers. If the competitor launched a product (not a pricing change), steps 2-6 would be completely different.
  • The agent needs both external data (web search) and internal data (database).
  • The number of steps varies — a simple announcement might take 4 steps, a complex multi-product launch might take 10.
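
The "LLM in a loop with tools" idea behind these points can be sketched in a few lines. This is a minimal illustration, not the Agent class used above: runAgentLoop, the decide callback (a stand-in for the LLM call), and the action shapes are all assumptions made for this example.

```javascript
// Minimal agent loop: ask the model for the next action, run the tool,
// feed the observation back, and stop on a final answer or at the cap.
// `decide` stands in for an LLM call and is an assumption of this sketch.
async function runAgentLoop({ decide, tools, task, maxIterations = 12 }) {
  const history = [{ role: "user", content: task }];

  for (let i = 1; i <= maxIterations; i++) {
    // decide() returns { type: "final", answer } or { type: "tool", name, args }.
    const action = await decide(history);

    if (action.type === "final") {
      return { status: "done", answer: action.answer, iterations: i };
    }

    const tool = tools[action.name];
    // Report unknown tools back to the model instead of throwing,
    // so it has a chance to correct itself on the next turn.
    const observation = tool
      ? await tool(action.args)
      : { error: `Unknown tool: ${action.name}` };

    history.push({ role: "tool", name: action.name, content: observation });
  }

  // The cap turns a runaway loop into an explicit, measurable outcome.
  return { status: "max_iterations", answer: null, iterations: maxIterations };
}
```

Because each call to decide sees the full history, the second action can depend on what the first tool returned, which is the property a fixed chain lacks. The iteration count in the result also shows why the number of steps varies per request.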

3. Use Case 2: Tasks Requiring Tool Use

Whenever the LLM needs to interact with external systems — search engines, databases, APIs, calculators, code runners — it needs a tool, and often needs an agent to orchestrate multiple tools.

Example: Financial data aggregation

// Task: "What is the total revenue of our top 5 customers in Q3,
//         and how does it compare to Q2?"

const financialAgent = new Agent({
  systemPrompt: `You are a financial data assistant.
Use the database to look up financial data.
Use the calculator for computations.
Always show your work.`,
  tools: [
    {
      name: "database_query",
      description: "Query the company database with SQL. Read-only access.",
      parameters: {
        type: "object",
        properties: {
          sql: { type: "string", description: "SQL SELECT query" },
        },
        required: ["sql"],
      },
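      execute: async ({ sql }) => {
        // Allow only a single SELECT statement. This check is a guard, not a
        // substitute for connecting with a read-only database user.
        const statement = sql.trim().replace(/;\s*$/, "");
        if (!/^select\b/i.test(statement) || statement.includes(";")) {
          return { error: "Only single SELECT statements are allowed." };
        }
        return await db.query(statement);
      },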
    },
    {
      name: "calculator",
      description: "Evaluate a mathematical expression. Supports +, -, *, /, %, and parentheses.",
      parameters: {
        type: "object",
        properties: {
          expression: { type: "string", description: "Math expression to evaluate" },
        },
        required: ["expression"],
      },
      execute: async ({ expression }) => {
        // Safe math evaluation (no eval()!)
        return { result: safeMathEval(expression) };
      },
    },
  ],
  maxIterations: 8,
});

// Agent's likely path:
// 1. database_query: SELECT customer, SUM(revenue) FROM orders WHERE quarter='Q3' GROUP BY customer ORDER BY SUM(revenue) DESC LIMIT 5
// 2. database_query: SELECT customer, SUM(revenue) FROM orders WHERE quarter='Q2' AND customer IN (...top 5...)
// 3. calculator: (Q3_total - Q2_total) / Q2_total * 100  (percentage change)
// 4. Synthesize: "Top 5 customers generated $X in Q3, up Y% from Q2..."
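
The calculator tool above calls safeMathEval without defining it. One possible way to write it, assuming only the operators listed in the tool description, is a small recursive-descent parser. The function body below is a hypothetical sketch and not the course's own implementation.

```javascript
// Hypothetical safeMathEval: a small recursive-descent parser, no eval().
// Grammar: expr   := term (("+" | "-") term)*
//          term   := factor (("*" | "/" | "%") factor)*
//          factor := number | "(" expr ")" | "-" factor
function safeMathEval(expression) {
  // Numbers, operators, and parentheses become tokens; any other
  // non-space character becomes its own token and is rejected below.
  const tokens = expression.match(/\d+(?:\.\d+)?|[-+*\/%()]|\S/g) || [];
  let pos = 0;

  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function parseExpr() {
    let value = parseTerm();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      const rhs = parseTerm();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  function parseTerm() {
    let value = parseFactor();
    while (peek() === "*" || peek() === "/" || peek() === "%") {
      const op = next();
      const rhs = parseFactor();
      if (op !== "*" && rhs === 0) throw new Error("Division by zero");
      value = op === "*" ? value * rhs : op === "/" ? value / rhs : value % rhs;
    }
    return value;
  }

  function parseFactor() {
    const token = next();
    if (token === "-") return -parseFactor();
    if (token === "(") {
      const value = parseExpr();
      if (next() !== ")") throw new Error("Expected ')'");
      return value;
    }
    if (token !== undefined && /^\d/.test(token)) return Number(token);
    throw new Error(`Unexpected token: ${token}`);
  }

  const result = parseExpr();
  if (pos < tokens.length) throw new Error(`Unexpected token: ${peek()}`);
  return result;
}

// Percentage change with made-up totals: (150 - 100) / 100 * 100
console.log(safeMathEval("(150 - 100) / 100 * 100")); // 50
```

Since the parser only recognizes digits, operators, and parentheses, input such as `process.exit()` fails with an error instead of being executed.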

Common tools that agents use

| Tool Category | Examples | When Needed |
|---|---|---|
| Search | Web search, internal knowledge base search | Real-time data, current events, unknown facts |
| Database | SQL queries, NoSQL queries | Structured business data, user records, analytics |
| APIs | Weather, stocks, maps, email, CRM | External service integration |
| Calculator | Math evaluation, unit conversion | Precise computation (LLMs are bad at math) |
| Code execution | Python/JS sandbox | Data analysis, chart generation, complex logic |
| File operations | Read/write files, parse documents | Document processing, report generation |
| Communication | Email, Slack, SMS | Automated notifications, outreach |

4. Use Case 3: Research and Analysis Workflows

Research tasks are natural fits for agents because they follow the human research pattern: search, read, synthesize, search more if needed, and compile findings.

Example: Technical due diligence

// Task: "Research whether we should adopt Redis or Memcached for our caching layer.
//         Consider performance, features, community, and our specific use case
//         (session storage for 10M users)."

const researchAgent = new Agent({
  systemPrompt: `You are a technical research analyst.
When researching a technology decision:
1. Search for current benchmarks and comparisons
2. Look for real-world case studies at similar scale
3. Check community health (GitHub stars, recent commits, open issues)
4. Analyze fit for the specific use case
5. Provide a clear recommendation with trade-offs

Be thorough but concise. Cite all sources.`,
  tools: [
    webSearchTool,
    readUrlTool,          // Read full page content
    githubStatsTool,      // Check repo activity
    saveNoteTool,         // Save intermediate findings
  ],
  maxIterations: 15,      // Research tasks need more iterations
});

// The agent might:
// 1. Search "Redis vs Memcached benchmark 2024"
// 2. Read the top 3 benchmark articles
// 3. Search "Redis session storage 10 million users"
// 4. Check GitHub activity for both projects
// 5. Search "Memcached session storage scaling"
// 6. Save intermediate notes for each topic
// 7. Synthesize into a recommendation

Why agents excel at research

  • The search queries depend on what was already found. Finding that Redis has a specific feature might trigger a search for whether Memcached has an equivalent.
  • The depth of research adapts to what is available. If benchmarks are easy to find, the agent moves on quickly. If they are scarce, it searches more.
  • Intermediate findings inform later searches. Discovering that Redis Cluster handles 10M users well might shift the research focus to operational complexity.

5. Use Case 4: Customer Support Automation

Customer support is one of the most common production uses for agents because it naturally requires looking up account data, checking order status, and taking actions.

// Customer support agent
const supportAgent = new Agent({
  systemPrompt: `You are a customer support agent for an e-commerce company.

PERSONALITY: Friendly, concise, solution-oriented.

RULES:
- Always verify the customer's identity before accessing account data
- Never disclose other customers' information
- For refunds over $100, escalate to a human agent
- Always confirm before taking irreversible actions (refund, cancellation)

AVAILABLE ACTIONS:
- Look up order status
- Check return eligibility
- Process returns (with confirmation)
- Update shipping address
- Escalate to human agent`,
  tools: [
    {
      name: "lookup_order",
      description: "Look up order details by order ID. Returns status, items, dates, and amounts.",
      parameters: {
        type: "object",
        properties: {
          order_id: { type: "string" },
        },
        required: ["order_id"],
      },
      execute: async ({ order_id }) => {
        return await orderService.getOrder(order_id);
      },
    },
    {
      name: "check_return_eligibility",
      description: "Check if an order is eligible for return. Returns eligibility status and deadline.",
      parameters: {
        type: "object",
        properties: {
          order_id: { type: "string" },
        },
        required: ["order_id"],
      },
      execute: async ({ order_id }) => {
        return await returnService.checkEligibility(order_id);
      },
    },
    {
      name: "process_return",
      description: "Initiate a return for an order. Only call AFTER confirming with the customer.",
      parameters: {
        type: "object",
        properties: {
          order_id: { type: "string" },
          reason: { type: "string" },
        },
        required: ["order_id", "reason"],
      },
      execute: async ({ order_id, reason }) => {
        return await returnService.initiateReturn(order_id, reason);
      },
    },
    {
      name: "escalate_to_human",
      description: "Transfer the conversation to a human support agent. Use for complex issues or refunds over $100.",
      parameters: {
        type: "object",
        properties: {
          reason: { type: "string", description: "Why escalation is needed" },
          priority: { type: "string", enum: ["low", "medium", "high"] },
        },
        required: ["reason"],
      },
      execute: async ({ reason, priority = "medium" }) => {
        return await supportService.escalate(reason, priority);
      },
    },
  ],
  maxIterations: 10,
});

// Example conversation:
// User: "I want to return my order #ORD-12345"
// Agent thinks: I need to look up the order first
// Agent calls: lookup_order({ order_id: "ORD-12345" })
// Observes: { status: "delivered", total: 85.00, delivered_date: "2024-01-15" }
// Agent thinks: I should check if it's eligible for return
// Agent calls: check_return_eligibility({ order_id: "ORD-12345" })
// Observes: { eligible: true, deadline: "2024-02-14", days_remaining: 12 }
// Agent responds: "Your order #ORD-12345 ($85) is eligible for return.
//                  You have 12 days left. Would you like me to process the return?
//                  If so, could you let me know the reason?"

Why agents work well for support

  • Each customer interaction is different. The agent adapts its tool usage to the specific request.
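  • The agent can handle the routine majority of requests automatically and escalate the complex remainder to humans. The 80/20 split is a rule of thumb, so measure the actual escalation rate for your own product.
  • The tool-based approach grounds answers in real data, which greatly reduces hallucinated order numbers and statuses. The model can still misread a tool result, so sample and review responses.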
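
The RULES in the system prompt (escalate refunds over $100, confirm before irreversible actions) are only instructions to the model. A common safeguard is to enforce them in the tool layer as well. The sketch below assumes a hypothetical makeGuardedProcessReturn wrapper with injected getOrder, initiateReturn, and escalate functions, and assumes the confirmed flag is set by the surrounding application (for example after the customer clicks a button), not by the model.

```javascript
// Hypothetical wrapper around process_return that enforces two prompt rules
// in code. The injected services and the `confirmed` flag are assumptions.
const REFUND_ESCALATION_LIMIT = 100; // dollars, taken from the RULES above

function makeGuardedProcessReturn({ getOrder, initiateReturn, escalate }) {
  // First argument: what the model supplies. Second: application context.
  return async function processReturn({ order_id, reason }, { confirmed = false } = {}) {
    const order = await getOrder(order_id);

    if (order.total > REFUND_ESCALATION_LIMIT) {
      // Never let the model refund above the limit, whatever it was told.
      await escalate(`Refund of $${order.total} exceeds the limit`, "medium");
      return { status: "escalated", order_id };
    }

    if (!confirmed) {
      // Irreversible action: hand control back so the agent asks the customer.
      return { status: "needs_confirmation", order_id, total: order.total };
    }

    const result = await initiateReturn(order_id, reason);
    return { status: "processed", order_id, result };
  };
}
```

With this in place, a prompt-injection attempt or a model mistake can at worst produce an escalation or a confirmation request, never an unapproved refund.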

6. Use Case 5: Data Pipeline Orchestration

Agents can orchestrate multi-step data pipelines where the steps depend on the data characteristics.

// Task: "Analyze the uploaded CSV, clean it, identify trends, and generate a report"
//
// Agent's dynamic path:
// 1. Read the CSV -> discovers it has 50K rows, 12 columns
// 2. Check for quality issues -> finds 3 columns with >20% nulls
// 3. Decide: drop columns or impute? -> Imputes for 2, drops 1
// 4. Run statistical analysis -> finds seasonal pattern
// 5. Generate visualizations -> creates 3 charts
// 6. Compile report with findings
//
// The specific steps depend on the DATA — different CSVs produce different paths

const dataAgent = new Agent({
  systemPrompt: `You are a data analysis assistant.
Given a dataset:
1. Inspect structure and quality
2. Clean issues (nulls, outliers, type mismatches)
3. Analyze patterns and trends
4. Generate a clear summary with key findings
Always explain your methodology.`,
  tools: [
    readFileTool,         // Read CSV/JSON files
    codeExecutionTool,    // Run Python/JS for analysis
    chartGeneratorTool,   // Generate visualizations
    saveReportTool,       // Save final report
  ],
  maxIterations: 15,
});
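
Steps 2 and 3 of the path above (find columns with many nulls, then drop or impute) are the kind of logic the agent would run through its code execution tool. A rough sketch follows, assuming rows are plain objects that all share the same keys. The function names and the 20% and 50% thresholds are illustrative assumptions, not a recommended policy.

```javascript
// Fraction of missing values per column.
// null, undefined, and "" all count as missing in this sketch.
function nullFractions(rows) {
  const counts = {};
  for (const row of rows) {
    for (const [column, value] of Object.entries(row)) {
      counts[column] = counts[column] || 0;
      if (value === null || value === undefined || value === "") {
        counts[column] += 1;
      }
    }
  }
  return Object.fromEntries(
    Object.entries(counts).map(([column, n]) => [column, n / rows.length])
  );
}

// Assumed policy: keep columns at or below 20% nulls,
// impute up to 50%, and drop anything above that.
function planCleaning(rows, { flagAbove = 0.2, dropAbove = 0.5 } = {}) {
  const plan = {};
  for (const [column, fraction] of Object.entries(nullFractions(rows))) {
    plan[column] =
      fraction > dropAbove ? "drop" : fraction > flagAbove ? "impute" : "keep";
  }
  return plan;
}
```

The plan differs for every dataset, which is why the later steps of the pipeline cannot be fixed in advance.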

7. Decision Framework: Single Call vs Chain vs Agent

Use this framework to decide which approach fits your task:

┌────────────────────────────────────────────────────────────────────────┐
│                   DECISION FRAMEWORK                                   │
│                                                                        │
│  Question 1: Does the task need external data or actions?              │
│  ┌──────┐                                                              │
│  │  NO  │ ──► SINGLE CALL or PROMPT CHAIN                              │
│  └──────┘     (no tools needed, all info is in the prompt)             │
│  ┌──────┐                                                              │
│  │ YES  │ ──► Continue to Question 2                                   │
│  └──────┘                                                              │
│                                                                        │
│  Question 2: Is the sequence of steps predictable?                     │
│  ┌──────┐                                                              │
│  │ YES  │ ──► TOOL-AUGMENTED CHAIN                                     │
│  └──────┘     (pre-defined steps with tool calls)                      │
│  ┌──────┐                                                              │
│  │  NO  │ ──► Continue to Question 3                                   │
│  └──────┘                                                              │
│                                                                        │
│  Question 3: Can a human complete it in under 3 decisions?             │
│  ┌──────┐                                                              │
│  │ YES  │ ──► SIMPLE AGENT (maxIterations: 5)                          │
│  └──────┘                                                              │
│  ┌──────┐                                                              │
│  │  NO  │ ──► FULL AGENT with planning (maxIterations: 15+)            │
│  └──────┘                                                              │
└────────────────────────────────────────────────────────────────────────┘
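
The three questions in the framework translate directly into a small function. The name chooseApproach and its input fields are made up for this sketch, and the maxIterations values are the starting points shown in the diagram (15 standing in for "15+").

```javascript
// Encodes the three framework questions in order.
// Input field names are assumptions chosen for this example.
function chooseApproach({ needsExternalDataOrActions, stepsPredictable, humanDecisions }) {
  // Question 1: no tools needed, all information is in the prompt.
  if (!needsExternalDataOrActions) {
    return { approach: "single call or prompt chain" };
  }
  // Question 2: tools are needed, but the sequence is known in advance.
  if (stepsPredictable) {
    return { approach: "tool-augmented chain" };
  }
  // Question 3: could a human finish it in under 3 decisions?
  if (humanDecisions < 3) {
    return { approach: "simple agent", maxIterations: 5 };
  }
  return { approach: "full agent with planning", maxIterations: 15 };
}

// "Look up order and process return": needs tools, path depends on status.
console.log(
  chooseApproach({ needsExternalDataOrActions: true, stepsPredictable: false, humanDecisions: 2 })
);
```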

Decision matrix

| Task | Approach | Why |
|---|---|---|
| Classify email sentiment | Single call | All info in the email text |
| Summarize, then translate | Prompt chain (2 calls) | Fixed sequence, no tools |
| Answer question using knowledge base | RAG (single call + retrieval) | One tool call, predictable flow |
| Look up order and process return | Simple agent | Dynamic: depends on order status |
| Research a topic and write a report | Full agent with planning | Many steps, dynamic path |
| "Do whatever is needed to fix this bug" | Full agent with code execution | Completely dynamic, many tools |
| Compare prices across 5 competitor websites | Full agent | Dynamic: scraping, comparing, adapting |

8. Signs Your Task Is a Good Fit for an Agent

Use this checklist. If 3 or more apply, an agent is likely the right approach:

| Signal | Example |
|---|---|
| The task requires real-time data | "What's the current stock price of AAPL?" |
| The task requires multiple data sources | "Compare our pricing with 3 competitors" |
| The next step depends on previous results | "If the order is returnable, process the return" |
| The task requires actions (not just answers) | "Send an email to the customer" |
| A human would need several minutes to complete it | "Research this topic and write a brief" |
| The task involves iterative refinement | "Find a flight, but if too expensive, try different dates" |
| The task crosses domain boundaries | "Check inventory, calculate shipping, update the CRM" |
| The number of steps varies per request | "Handle this customer support ticket" |
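
For a quick review of a backlog of candidate tasks, the checklist can be applied mechanically. The signal keys and the assessAgentFit function below are names invented for this sketch. The threshold of 3 comes from the checklist itself.

```javascript
// One key per checklist signal, in the same order as the checklist.
const AGENT_SIGNALS = [
  "needsRealTimeData",
  "needsMultipleDataSources",
  "nextStepDependsOnResults",
  "requiresActions",
  "takesHumanSeveralMinutes",
  "involvesIterativeRefinement",
  "crossesDomainBoundaries",
  "stepCountVaries",
];

// Returns the matched signals and whether the count reaches the threshold.
function assessAgentFit(task, threshold = 3) {
  const matched = AGENT_SIGNALS.filter((signal) => task[signal] === true);
  return { matched, likelyAgent: matched.length >= threshold };
}
```

The result is a prompt for discussion, not a verdict: a task that matches only two signals may still justify an agent if those two are hard requirements.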

9. Agent Success Metrics

When you deploy an agent, measure these to know if it is working:

// Metrics to track for any agent deployment
const agentMetrics = {
  // Effectiveness
  taskCompletionRate: "% of tasks the agent completes successfully",
  correctnessRate: "% of completed tasks with correct results (human-reviewed sample)",
  escalationRate: "% of tasks escalated to humans",

  // Efficiency
  averageIterations: "Mean number of loop iterations per task",
  averageTokens: "Mean tokens consumed per task",
  averageCost: "Mean $ cost per task",
  averageLatency: "Mean wall-clock time per task (seconds)",

  // Reliability
  errorRate: "% of tasks that hit an unrecoverable error",
  maxIterationRate: "% of tasks that exhaust maxIterations without finishing",
  toolFailureRate: "% of tool calls that fail",

  // Safety
  hallucinationRate: "% of responses containing fabricated information (sampled)",
  outOfScopeRate: "% of times the agent tries to do something it shouldn't",
};
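
Most of these metrics can be computed from per-task run logs. The sketch below assumes a hypothetical record shape (status, iterations, tokens, costUsd, latencyMs, toolCalls, toolFailures). Correctness, hallucination, and out-of-scope rates are left out because they need human-reviewed samples.

```javascript
// Each run record is assumed to look like:
// { status: "completed" | "escalated" | "error" | "max_iterations",
//   iterations, tokens, costUsd, latencyMs, toolCalls, toolFailures }
function summarizeRuns(runs) {
  const total = runs.length;
  if (total === 0) return null;

  const rate = (predicate) => runs.filter(predicate).length / total;
  const sum = (key) => runs.reduce((acc, run) => acc + run[key], 0);
  const toolCalls = sum("toolCalls");

  return {
    // Effectiveness
    taskCompletionRate: rate((r) => r.status === "completed"),
    escalationRate: rate((r) => r.status === "escalated"),
    // Efficiency
    averageIterations: sum("iterations") / total,
    averageTokens: sum("tokens") / total,
    averageCost: sum("costUsd") / total,
    averageLatency: sum("latencyMs") / total / 1000, // seconds
    // Reliability
    errorRate: rate((r) => r.status === "error"),
    maxIterationRate: rate((r) => r.status === "max_iterations"),
    toolFailureRate: toolCalls === 0 ? 0 : sum("toolFailures") / toolCalls,
  };
}
```

Note that toolFailureRate is computed over tool calls, not tasks, so a single task with many failing calls can dominate it.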

10. Key Takeaways

  1. The core signal for needing an agent: the LLM cannot answer correctly with only the prompt context. It needs to go get data, take actions, or make dynamic decisions.
  2. Multi-step tasks with dynamic decisions are the primary agent use case. If the next step depends on the previous result, you need a loop.
  3. Tool use is what gives agents their power. Search, databases, APIs, calculators, and code execution turn the LLM from a text generator into an actor in the world.
  4. Research workflows are natural fits because they follow the human pattern: search, read, synthesize, search more, compile.
  5. Customer support is one of the most common production agent deployments. The agent handles the routine majority of requests and escalates the complex remainder to humans (roughly 80/20 as a rule of thumb).
  6. Use the decision framework. Ask: Does it need tools? Are steps predictable? How many decisions? The simplest approach that works is always the right one.
  7. Measure everything. Track completion rate, cost, latency, error rate, and correctness. An agent that completes 60% of tasks at $0.50 each might not be worth the complexity over a simpler system.

Explain-It Challenge

  1. Your product manager says: "Let's build an agent that recommends products to users." Using the decision framework, explain whether this needs an agent or a simpler approach. Consider what data is needed and whether the steps are predictable.
  2. Design a customer support agent for a SaaS product (project management tool). List the 5-7 tools it would need and describe one complete interaction flow.
  3. A colleague proposes building an agent for data validation (check if CSV columns match expected types). Argue why this does NOT need an agent and what simpler approach would work better.
