Episode 4 — Generative AI Engineering / 4.15 — Understanding AI Agents
4.15.b — Agent Architecture
In one sentence: An AI agent is built from four core components — an LLM brain that reasons, tools that interact with the world, memory that persists context across steps, and planning that breaks complex tasks into manageable sub-tasks — all connected by the agent loop (perceive -> reason -> act -> observe -> repeat).
Navigation: <- 4.15.a Agent vs Single LLM Call | 4.15.c -- When to Use Agents ->
1. The Four Core Components
Every AI agent, regardless of framework or implementation, is composed of these four fundamental components:
┌──────────────────────────────────────────────────────────────────────────┐
│ AI AGENT — CORE COMPONENTS │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ LLM BRAIN │ │
│ │ - Understands natural language │ │
│ │ - Reasons about the task │ │
│ │ - Decides which tool to use (or whether to respond) │ │
│ │ - Interprets tool results │ │
│ │ - Generates the final answer │ │
│ └────────────────────────┬───────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────┼─────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────┐ ┌────────────┐ ┌────────────────┐ │
│ │ TOOLS │ │ MEMORY │ │ PLANNING │ │
│ │ │ │ │ │ │ │
│ │ - Search │ │ - Short: │ │ - Decompose │ │
│ │ - Database │ │ convo │ │ complex tasks│ │
│ │ - API call │ │ history │ │ - Prioritize │ │
│ │ - Calculator│ │ - Long: │ │ sub-tasks │ │
│ │ - Code exec│ │ vector │ │ - Track │ │
│ │ - File I/O │ │ store, │ │ progress │ │
│ │ │ │ database │ │ - Revise plan │ │
│ └────────────┘ └────────────┘ └────────────────┘ │
│ │
│ The LLM brain orchestrates everything. Tools extend its capabilities. │
│ Memory gives it context. Planning gives it structure. │
└──────────────────────────────────────────────────────────────────────────┘
| Component | Role | Analogy |
|---|---|---|
| LLM Brain | Thinks, reasons, decides | Your brain |
| Tools | Acts on the world | Your hands, phone, computer |
| Memory | Remembers context | Your notepad, long-term memory |
| Planning | Breaks down complex goals | Your to-do list, project plan |
2. Component 1: The LLM Brain
The LLM is the central controller of the agent. Every decision flows through it:
- What tool should I use next? — The LLM picks from available tools based on the task.
- What arguments should I pass? — The LLM generates the correct parameters.
- What does this result mean? — The LLM interprets tool output and decides if the task is complete.
- Should I try a different approach? — The LLM can self-correct if a tool returns an error or unexpected result.
// The LLM brain is configured via the system prompt
const AGENT_SYSTEM_PROMPT = `You are an AI research assistant.
You have access to the following tools:
1. web_search(query: string) — Search the web for information
2. read_url(url: string) — Read the full content of a web page
3. calculator(expression: string) — Evaluate mathematical expressions
4. save_note(title: string, content: string) — Save a note for later reference
INSTRUCTIONS:
- Think step by step before choosing a tool.
- After each tool result, assess whether you have enough information to answer.
- If a tool returns an error, try a different approach.
- When you have the final answer, respond directly to the user.
- Always cite your sources.`;
Choosing the right LLM for your agent
| Factor | Consideration |
|---|---|
| Reasoning ability | Stronger models (GPT-4o, Claude Sonnet 4) make better agents because they reason more reliably |
| Tool calling support | The model must support function/tool calling natively (not all models do) |
| Context window | Agents accumulate context fast; larger windows (128K+) prevent truncation |
| Cost | Agents make many calls; a model that costs 10x more per token makes the agent 10x more expensive |
| Speed | Agents are already slow; a faster model (GPT-4o-mini) can offset some latency |
Common pattern in production: Use a strong model (GPT-4o) for the agent brain and a fast/cheap model (GPT-4o-mini) for sub-tasks like summarization.
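The cost factor above is easy to quantify with back-of-envelope arithmetic: multiply per-iteration token counts by per-million-token prices. A sketch (the function name and the prices plugged into the example are illustrative placeholders, not current list prices; check your provider's pricing page):

```javascript
// Rough cost estimate for one agent run.
// Prices are hypothetical placeholders, not real list prices.
function estimateAgentCost({ iterations, avgInputTokens, avgOutputTokens, inputPricePerM, outputPricePerM }) {
  const inputCost = (iterations * avgInputTokens * inputPricePerM) / 1e6;
  const outputCost = (iterations * avgOutputTokens * outputPricePerM) / 1e6;
  return inputCost + outputCost;
}

// Example: 10 iterations, ~4K input tokens each (context grows every step),
// ~500 output tokens each, at $2.50 / $10 per million tokens (hypothetical)
const cost = estimateAgentCost({
  iterations: 10,
  avgInputTokens: 4000,
  avgOutputTokens: 500,
  inputPricePerM: 2.5,
  outputPricePerM: 10,
});
```

Because short-term memory grows each iteration, `avgInputTokens` climbs over a run; budgeting with the final context size gives a safer upper bound.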
3. Component 2: Tools / Actions
Tools are functions the agent can call to interact with the outside world. Without tools, an agent is just an LLM talking to itself in a loop (which is useless). Tools are what give the agent its power.
Tool registration
Every tool must be registered with the agent so the LLM knows it exists, what it does, and what arguments it accepts. This is exactly the function-calling mechanism you learned in 4.7.
// Tool definitions — these are registered with the LLM
const tools = [
{
type: "function",
function: {
name: "web_search",
description: "Search the web for current information. Use when you need facts, news, or data not in your training data.",
parameters: {
type: "object",
properties: {
query: {
type: "string",
description: "The search query",
},
max_results: {
type: "number",
description: "Maximum number of results to return (default 5)",
},
},
required: ["query"],
},
},
},
{
type: "function",
function: {
name: "database_query",
description: "Query the application database. Use for user data, orders, or product information.",
parameters: {
type: "object",
properties: {
sql: {
type: "string",
description: "SQL query to execute (read-only, no mutations)",
},
},
required: ["sql"],
},
},
},
{
type: "function",
function: {
name: "send_email",
description: "Send an email to a specified recipient. Use only when the user explicitly asks to send a message.",
parameters: {
type: "object",
properties: {
to: { type: "string", description: "Recipient email address" },
subject: { type: "string", description: "Email subject line" },
body: { type: "string", description: "Email body (plain text)" },
},
required: ["to", "subject", "body"],
},
},
},
];
Tool invocation
When the LLM decides to use a tool, it generates a tool call — a structured JSON object with the tool name and arguments. Your code then executes the function and feeds the result back.
// Tool invocation handler
async function executeTool(toolName, toolArgs) {
const toolHandlers = {
web_search: async ({ query, max_results = 5 }) => {
// Call your search API (e.g., Serper, Tavily, Brave)
const response = await fetch(
`https://api.search.example.com/search?q=${encodeURIComponent(query)}&limit=${max_results}`
);
return await response.json();
},
database_query: async ({ sql }) => {
// Guard: only allow SELECT statements. Prefix checks alone are easy to
// bypass, so also connect with a read-only database role (see Section 9)
if (!sql.trim().toUpperCase().startsWith("SELECT")) {
return { error: "Only SELECT queries are allowed" };
}
const result = await db.query(sql);
return { rows: result.rows, rowCount: result.rowCount };
},
send_email: async ({ to, subject, body }) => {
await emailService.send({ to, subject, body });
return { success: true, message: `Email sent to ${to}` };
},
};
const handler = toolHandlers[toolName];
if (!handler) {
return { error: `Unknown tool: ${toolName}` };
}
try {
return await handler(toolArgs);
} catch (error) {
return { error: `Tool execution failed: ${error.message}` };
}
}
Tool design principles
| Principle | Why It Matters |
|---|---|
| Clear descriptions | The LLM chooses tools based on descriptions. Vague descriptions lead to wrong tool choices |
| Specific parameters | Constrain what the tool accepts. The more specific, the fewer mistakes |
| Error handling | Tools fail. Return structured error objects so the LLM can reason about failures |
| Idempotency | Read operations should be idempotent (safe to retry). Write operations need safeguards |
| Security boundaries | Never let the LLM execute arbitrary code or SQL mutations. Validate inputs |
| Keep tools focused | One tool = one action. A tool that does everything is hard for the LLM to use correctly |
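The "specific parameters" and "error handling" principles can be enforced mechanically: before executing any tool, check the LLM-generated arguments against the tool's declared schema. A minimal sketch (the `validateToolArgs` helper is illustrative, not part of any SDK, and covers only required fields and basic types rather than full JSON Schema):

```javascript
// Hypothetical helper: check tool-call arguments against the tool's
// JSON-Schema-style "parameters" declaration before executing anything.
function validateToolArgs(toolDef, args) {
  const { properties = {}, required = [] } = toolDef.parameters;
  const errors = [];
  for (const name of required) {
    if (!(name in args)) errors.push(`Missing required argument: ${name}`);
  }
  for (const [name, value] of Object.entries(args)) {
    const spec = properties[name];
    if (!spec) {
      errors.push(`Unknown argument: ${name}`);
      continue;
    }
    const actual = typeof value;
    if (spec.type === "string" && actual !== "string") {
      errors.push(`Argument "${name}" should be a string, got ${actual}`);
    }
    if (spec.type === "number" && actual !== "number") {
      errors.push(`Argument "${name}" should be a number, got ${actual}`);
    }
  }
  return { valid: errors.length === 0, errors };
}

// Example: same shape as the web_search tool defined above
const webSearchDef = {
  name: "web_search",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string" },
      max_results: { type: "number" },
    },
    required: ["query"],
  },
};
```

Returning the `errors` array to the LLM as a tool result (instead of crashing) lets the agent self-correct and retry with fixed arguments.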
4. Component 3: Memory
Memory allows the agent to retain information across steps and across conversations. Without memory, the agent forgets everything after each API call.
Short-term memory (conversation context)
This is the message history that accumulates during a single agent run. Every LLM call in the loop includes all previous messages, tool calls, and tool results.
// Short-term memory = the messages array that grows during the agent loop
const messages = [
{ role: "system", content: AGENT_SYSTEM_PROMPT }, // Fixed
{ role: "user", content: "Find last quarter's revenue" }, // User's request
// --- Agent loop iteration 1 ---
{
role: "assistant",
content: null,
tool_calls: [{ id: "call_1", function: { name: "database_query", arguments: '{"sql":"SELECT ..."}' } }],
},
{
role: "tool",
tool_call_id: "call_1",
content: '{"rows":[{"revenue":2450000}],"rowCount":1}',
},
// --- Agent loop iteration 2 ---
{
role: "assistant",
content: null,
tool_calls: [{ id: "call_2", function: { name: "web_search", arguments: '{"query":"industry average"}' } }],
},
{
role: "tool",
tool_call_id: "call_2",
content: '{"results":[{"title":"Industry Report","snippet":"Average revenue..."}]}',
},
// --- Agent loop iteration 3 ---
{ role: "assistant", content: "Based on my research, last quarter's revenue was..." }, // Final answer
];
// PROBLEM: This grows with every iteration.
// After 10 steps, the context might be 15,000+ tokens.
// After 20 steps, you may hit context window limits.
Managing short-term memory growth
// Strategy 1: Summarize older steps
async function compactMemory(messages, maxTokens = 8000) {
const estimatedTokens = JSON.stringify(messages).length / 4; // rough heuristic: ~4 characters per token
if (estimatedTokens < maxTokens) {
return messages; // No compaction needed
}
// Keep system prompt and last 3 exchanges, summarize the rest
const systemMsg = messages[0];
const userMsg = messages[1];
const recentMessages = messages.slice(-6); // Last 3 exchanges
const oldMessages = messages.slice(2, -6); // Everything in between
// Summarize the old messages
const summaryResponse = await client.chat.completions.create({
model: "gpt-4o-mini", // Use a cheap model for summarization
temperature: 0,
messages: [
{
role: "system",
content: "Summarize the following agent interaction into key findings and decisions. Be concise.",
},
{
role: "user",
content: JSON.stringify(oldMessages),
},
],
});
return [
systemMsg,
userMsg,
{
role: "assistant",
content: `[Summary of previous steps: ${summaryResponse.choices[0].message.content}]`,
},
...recentMessages,
];
}
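Summarization costs an extra LLM call per compaction. A cheaper alternative, at the price of losing older findings entirely, is a plain sliding window. A sketch (the truncation-marker message is a hypothetical convention of this example, not an API feature):

```javascript
// Strategy 2 (sketch): sliding window. Keep the system prompt, the
// original user request, and only the N most recent messages.
// Cheaper than summarization, but older findings are dropped entirely.
function slidingWindowMemory(messages, keepRecent = 6) {
  // Nothing to drop yet: system + user + at most keepRecent others
  if (messages.length <= keepRecent + 2) return messages;
  return [
    messages[0], // system prompt
    messages[1], // original user request
    { role: "assistant", content: "[Older steps truncated]" },
    ...messages.slice(-keepRecent),
  ];
}
```

One caveat: when dropping messages that contain `tool_calls`, also drop their paired `tool` results (and vice versa), or the API will reject the orphaned message.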
Long-term memory (persistent store)
Long-term memory persists across conversations. It allows the agent to remember user preferences, past research, and accumulated knowledge.
// Long-term memory implementation using a vector database
class AgentLongTermMemory {
constructor(vectorDB, embeddingModel) {
this.vectorDB = vectorDB;
this.embeddingModel = embeddingModel;
}
// Store a memory
async remember(content, metadata = {}) {
const embedding = await this.embeddingModel.embed(content);
await this.vectorDB.upsert({
id: `memory_${Date.now()}`,
vector: embedding,
metadata: {
content,
timestamp: new Date().toISOString(),
...metadata,
},
});
}
// Recall relevant memories
async recall(query, topK = 5) {
const queryEmbedding = await this.embeddingModel.embed(query);
const results = await this.vectorDB.query({
vector: queryEmbedding,
topK,
});
return results.matches.map((match) => ({
content: match.metadata.content,
relevance: match.score,
timestamp: match.metadata.timestamp,
}));
}
}
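The class above assumes a vector DB client and an embedding model. For local testing, a minimal in-memory stand-in with the same `upsert`/`query` shape might look like the following (cosine similarity over raw vectors; real deployments would use Pinecone, pgvector, Chroma, or similar, and `await` on these synchronous methods works because `await` accepts non-promises):

```javascript
// Cosine similarity between two equal-length vectors
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in matching the interface AgentLongTermMemory expects
class InMemoryVectorStore {
  constructor() {
    this.records = [];
  }
  upsert({ id, vector, metadata }) {
    this.records = this.records.filter((r) => r.id !== id); // replace on same id
    this.records.push({ id, vector, metadata });
  }
  query({ vector, topK }) {
    const matches = this.records
      .map((r) => ({ score: cosineSimilarity(vector, r.vector), metadata: r.metadata }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
    return { matches };
  }
}
```

Swapping this in lets you unit-test the agent's memory logic without standing up a real database or paying for embeddings.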
// Usage in the agent loop
async function agentWithMemory(userQuery, tools, memory) {
// Recall relevant long-term memories
const relevantMemories = await memory.recall(userQuery);
const memoryContext = relevantMemories.length > 0
? `\n\nRELEVANT MEMORIES FROM PAST INTERACTIONS:\n${relevantMemories.map((m) => `- ${m.content}`).join("\n")}`
: "";
const messages = [
{
role: "system",
content: AGENT_SYSTEM_PROMPT + memoryContext,
},
{ role: "user", content: userQuery },
];
// ... run the agent loop (as in Section 6) here; assume it yields the
// final response as a string named finalAnswer ...
// After the task is complete, store a summary as a new memory
await memory.remember(
`User asked: "${userQuery}". Key finding: ${finalAnswer}`,
{ type: "task_result" }
);
return finalAnswer;
}
Memory comparison
┌─────────────────────────────────────────────────────────────────────┐
│ MEMORY TYPES COMPARED │
│ │
│ SHORT-TERM (within one task) LONG-TERM (across tasks) │
│ ───────────────────────────── ───────────────────────── │
│ Stored in: message array Stored in: vector DB / │
│ database │
│ Lifespan: single agent run Lifespan: persistent │
│ Size limit: context window Size limit: DB capacity │
│ Cost: included in LLM tokens Cost: DB storage + queries │
│ Use case: tracking current Use case: user preferences, │
│ task progress past research, learned facts │
│ │
│ BOTH are essential for sophisticated agents. │
│ Start with short-term only. Add long-term when needed. │
└─────────────────────────────────────────────────────────────────────┘
5. Component 4: Planning
Planning is the agent's ability to break down a complex task into smaller, manageable sub-tasks and execute them in the right order.
Implicit planning (let the LLM figure it out)
In the simplest agents, planning is implicit — the LLM decides the next step at each iteration based on what it has observed so far. This works for 3-5 step tasks.
// Implicit planning: the LLM decides each step on the fly
// System prompt guides it, but there's no explicit plan
const systemPrompt = `You are a research assistant.
When given a research question:
1. Think about what information you need.
2. Use tools to gather that information.
3. When you have enough, synthesize and answer.
Think step by step. Explain your reasoning before each action.`;
Explicit planning (plan first, then execute)
For complex tasks (10+ steps), the agent creates an explicit plan upfront, then executes it step by step.
// Explicit planning: generate a plan, then execute each step
async function planAndExecuteAgent(userQuery, tools) {
// Phase 1: PLAN — Ask the LLM to create a plan
const planResponse = await client.chat.completions.create({
model: "gpt-4o",
temperature: 0,
response_format: { type: "json_object" }, // ask for JSON the parser below can rely on
messages: [
{
role: "system",
content: `You are a planning assistant. Given a user's task, create a step-by-step plan.
Output JSON: { "steps": [{ "step": 1, "description": "...", "tool": "tool_name or null" }] }
Available tools: ${tools.map((t) => t.function.name).join(", ")}`,
},
{ role: "user", content: userQuery },
],
});
const plan = JSON.parse(planResponse.choices[0].message.content);
console.log("PLAN:", plan.steps);
// Phase 2: EXECUTE — Execute each step
const results = [];
for (const step of plan.steps) {
console.log(`Executing step ${step.step}: ${step.description}`);
if (step.tool) {
// Execute the tool (the LLM generates arguments in a separate call)
const toolResult = await executeStepWithTool(step, tools, results);
results.push({ step: step.step, result: toolResult });
} else {
// Pure reasoning step — the LLM synthesizes from previous results
const reasoning = await executeReasoningStep(step, results);
results.push({ step: step.step, result: reasoning });
}
}
// Phase 3: SYNTHESIZE — Combine all results into a final answer
const finalResponse = await client.chat.completions.create({
model: "gpt-4o",
messages: [
{
role: "system",
content: "Synthesize the following research results into a clear, comprehensive answer.",
},
{
role: "user",
content: `Original question: ${userQuery}\n\nResults:\n${JSON.stringify(results, null, 2)}`,
},
],
});
return finalResponse.choices[0].message.content;
}
Adaptive planning (plan, execute, re-plan)
The most sophisticated approach: plan, execute some steps, observe results, and revise the plan if needed.
┌──────────────────────────────────────────────────────────────────┐
│ ADAPTIVE PLANNING │
│ │
│ 1. Create initial plan │
│ [Step A] -> [Step B] -> [Step C] -> [Step D] │
│ │
│ 2. Execute Step A -> success │
│ │
│ 3. Execute Step B -> unexpected result! │
│ │
│ 4. RE-PLAN: Original Step C is no longer relevant. │
│ New plan: │
│ [Step B'] -> [Step E] -> [Step F] │
│ │
│ 5. Execute new plan... │
│ │
│ ANALOGY: Like a GPS that recalculates when you miss a turn. │
└──────────────────────────────────────────────────────────────────┘
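A concrete version of steps 3 and 4 needs two pieces: a check that decides whether the remaining plan is still valid, and a truncation that discards the stale steps so the planner can regenerate them. A sketch with a deliberately simple heuristic (a production agent would usually ask the LLM itself to judge the step result; `needsReplan` and `truncatePlan` are hypothetical names):

```javascript
// Heuristic re-plan trigger: a tool error or an empty result set means
// the downstream steps were built on an assumption that failed.
function needsReplan(stepResult) {
  if (stepResult == null) return true;
  if (stepResult.error) return true;
  if (Array.isArray(stepResult.rows) && stepResult.rows.length === 0) return true;
  return false;
}

// Keep only the steps that already ran; the planner regenerates the rest
function truncatePlan(plan, failedStepNumber) {
  return {
    steps: plan.steps.filter((s) => s.step < failedStepNumber),
  };
}
```

In the plan-and-execute loop from Section 5, you would call `needsReplan` after each step and, when it fires, feed the truncated plan plus the unexpected result back to the planner prompt.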
6. The Agent Loop: Putting It All Together
Here is a complete agent loop implementation that connects all four components:
import OpenAI from "openai";
const client = new OpenAI();
class Agent {
constructor({ systemPrompt, tools, maxIterations = 15, model = "gpt-4o" }) {
this.systemPrompt = systemPrompt;
this.tools = tools;
this.maxIterations = maxIterations;
this.model = model;
}
async run(userQuery) {
// Initialize short-term memory (message history)
const messages = [
{ role: "system", content: this.systemPrompt },
{ role: "user", content: userQuery },
];
// Format tools for the API
const toolDefinitions = this.tools.map((tool) => ({
type: "function",
function: {
name: tool.name,
description: tool.description,
parameters: tool.parameters,
},
}));
console.log(`\n--- Agent started: "${userQuery}" ---`);
for (let iteration = 0; iteration < this.maxIterations; iteration++) {
console.log(`\n[Iteration ${iteration + 1}]`);
// REASON: Ask the LLM what to do next
const response = await client.chat.completions.create({
model: this.model,
messages,
tools: toolDefinitions.length > 0 ? toolDefinitions : undefined,
});
const choice = response.choices[0];
const assistantMessage = choice.message;
// Add the assistant's response to memory
messages.push(assistantMessage);
// Check: Did the agent decide to give a final answer?
if (choice.finish_reason === "stop") {
console.log("[Agent] Final answer reached.");
return assistantMessage.content;
}
// ACT: The agent chose to call one or more tools
if (assistantMessage.tool_calls) {
for (const toolCall of assistantMessage.tool_calls) {
const toolName = toolCall.function.name;
const toolArgs = JSON.parse(toolCall.function.arguments);
console.log(`[Tool Call] ${toolName}(${JSON.stringify(toolArgs)})`);
// EXECUTE the tool
const tool = this.tools.find((t) => t.name === toolName);
let result;
if (!tool) {
result = { error: `Tool "${toolName}" not found` };
} else {
try {
result = await tool.execute(toolArgs);
} catch (error) {
result = { error: `Tool failed: ${error.message}` };
}
}
console.log(`[Tool Result] ${JSON.stringify(result).slice(0, 200)}`);
// OBSERVE: Feed the result back into memory
messages.push({
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify(result),
});
}
}
}
return "Agent reached maximum iterations without completing the task.";
}
}
// Usage
const agent = new Agent({
systemPrompt: "You are a helpful research assistant with access to tools. Think step by step.",
tools: [
{
name: "web_search",
description: "Search the web for current information",
parameters: {
type: "object",
properties: { query: { type: "string" } },
required: ["query"],
},
execute: async ({ query }) => {
// Real implementation calls a search API
return { results: [{ title: "Result 1", snippet: "..." }] };
},
},
],
maxIterations: 10,
});
const answer = await agent.run("What is the current GDP of Japan?");
console.log("\nFINAL ANSWER:", answer);
7. Architecture Diagram: Full Agent System
┌──────────────────────────────────────────────────────────────────────────────────┐
│ COMPLETE AGENT ARCHITECTURE │
│ │
│ ┌──────────┐ │
│ │ USER │ │
│ └────┬─────┘ │
│ │ "Research X and send a summary email" │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ AGENT LOOP │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │
│ │ │ SHORT-TERM MEMORY (message history) │ │ │
│ │ │ [system] [user] [assistant] [tool] [assistant] [tool] ... │ │ │
│ │ └─────────────────────────┬───────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────┐ │ │
│ │ │ LLM BRAIN │◄──── Long-term memory (optional) │ │
│ │ │ (GPT-4o, etc.) │ (vector DB recall) │ │
│ │ └────────┬────────┘ │ │
│ │ │ │ │
│ │ ┌─────────────┼─────────────┐ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │web_search│ │database │ │send_email│ ... more tools │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │ │ │ │
│ │ └─────────────┼─────────────┘ │ │
│ │ │ results │ │
│ │ ▼ │ │
│ │ ┌─────────────────┐ │ │
│ │ │ Back to LLM │ │ │
│ │ │ for next step │ │ │
│ │ └─────────────────┘ │ │
│ │ │ │
│ │ Loop repeats until LLM decides the task is complete │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ FINAL │ │
│ │ ANSWER │ │
│ └──────────┘ │
└──────────────────────────────────────────────────────────────────────────────────┘
8. Common Architecture Patterns
Pattern 1: Minimal agent (tools + loop only)
The simplest useful agent. No explicit planning, no long-term memory. This covers the large majority of agent use cases.
// Minimal agent — just tools and a loop
// Good for: customer support, simple research, data lookup
const minimalAgent = new Agent({
systemPrompt: "Answer the user's question using the available tools.",
tools: [searchTool, calculatorTool],
maxIterations: 5,
});
Pattern 2: Plan-and-execute agent
Separates planning from execution. Better for complex, multi-step tasks.
// Plan-and-execute — two-phase approach
// Good for: research reports, multi-step analysis
// Phase 1: Planner LLM creates a list of steps
// Phase 2: Executor LLM runs each step with tools
Pattern 3: Agent with reflection
The agent reviews its own work before returning a final answer.
// Reflection pattern — agent critiques itself
async function agentWithReflection(query, tools) {
// Step 1: Agent produces a draft answer
const draftAnswer = await agent.run(query);
// Step 2: Reflection — a separate LLM call critiques the answer
const reflection = await client.chat.completions.create({
model: "gpt-4o",
messages: [
{
role: "system",
content: `Review the following answer for accuracy, completeness, and clarity.
If there are issues, list them. If the answer is good, say "APPROVED".`,
},
{
role: "user",
content: `Question: ${query}\nAnswer: ${draftAnswer}`,
},
],
});
const critique = reflection.choices[0].message.content;
if (critique.includes("APPROVED")) {
return draftAnswer;
}
// Step 3: Revise based on critique
const revisedAnswer = await agent.run(
`${query}\n\nPrevious answer had these issues: ${critique}\nPlease address them.`
);
return revisedAnswer;
}
9. Security Considerations
Agents that use tools introduce security risks that single LLM calls do not have:
| Risk | Example | Mitigation |
|---|---|---|
| Prompt injection | User tricks agent into calling dangerous tools | Input sanitization, tool-level permissions |
| SQL injection | Agent generates malicious SQL | Read-only access, parameterized queries, query validation |
| Unbounded loops | Agent loops forever, burning tokens | maxIterations limit, token budget per task |
| Data exfiltration | Agent leaks sensitive data via email tool | Output filtering, human-in-the-loop for sensitive actions |
| Privilege escalation | Agent does more than intended | Principle of least privilege — only register needed tools |
| Cost explosion | Agent calls expensive APIs repeatedly | Per-task cost budget, rate limiting |
// Security: Always set hard limits
const SECURITY_LIMITS = {
maxIterations: 15, // Never loop more than 15 times
maxTokenBudget: 50000, // Never spend more than 50K tokens per task
maxToolCalls: 20, // Never call more than 20 tools per task
allowedTools: ["web_search", "calculator", "read_url"], // Whitelist only
requireApproval: ["send_email", "database_write"], // Human approval needed
};
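Those limits only help if something enforces them at runtime. One sketch of an enforcement wrapper (the `BudgetGuard` name and method signatures are illustrative; wire the checks into your own agent loop, passing `usage.total_tokens` from each API response into `checkTokens`):

```javascript
// Sketch: enforce hard limits from inside the agent loop.
class BudgetGuard {
  constructor(limits) {
    this.limits = limits;
    this.iterations = 0;
    this.tokensUsed = 0;
    this.toolCalls = 0;
  }
  // Call at the top of every loop iteration
  checkIteration() {
    if (++this.iterations > this.limits.maxIterations) {
      throw new Error("Iteration limit exceeded");
    }
  }
  // Call with usage.total_tokens from each API response
  checkTokens(totalTokens) {
    this.tokensUsed += totalTokens;
    if (this.tokensUsed > this.limits.maxTokenBudget) {
      throw new Error("Token budget exceeded");
    }
  }
  // Call before executing any tool the LLM requested
  checkToolCall(toolName) {
    if (!this.limits.allowedTools.includes(toolName)) {
      throw new Error(`Tool "${toolName}" is not whitelisted`);
    }
    if (++this.toolCalls > this.limits.maxToolCalls) {
      throw new Error("Tool call limit exceeded");
    }
  }
}
```

Throwing is the simplest policy; a gentler option is to return a refusal message that the loop converts into a graceful "budget exhausted" answer for the user.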
10. Key Takeaways
- Four core components: Every agent has an LLM brain (reasoning), tools (actions), memory (context), and planning (decomposition). All are connected by the agent loop.
- The LLM brain is the controller. It decides what to do, picks tools, interprets results, and determines when the task is complete.
- Tools must be well-designed. Clear descriptions, specific parameters, proper error handling, and security boundaries are essential for reliable tool use.
- Memory comes in two forms. Short-term (message history within one task) grows fast and needs compaction. Long-term (vector DB / database across tasks) adds sophistication but also complexity.
- Planning ranges from implicit to adaptive. Simple tasks need no explicit plan. Complex tasks benefit from explicit plan-then-execute. The most advanced agents re-plan when things change.
- Security is not optional. Agents that call tools can be exploited. Set hard limits on iterations, tokens, and tool access. Require human approval for dangerous actions.
Explain-It Challenge
- Draw (on paper or whiteboard) the agent loop with all four components. Trace a specific example task through the loop: "Find the cheapest hotel in Paris for next weekend."
- Explain why tool descriptions are so critical for agent performance. What happens if a tool description is vague or inaccurate?
- A colleague asks: "Why do we need long-term memory? Can't the agent just re-search everything each time?" Explain the trade-offs.
Navigation: <- 4.15.a Agent vs Single LLM Call | 4.15.c -- When to Use Agents ->