4.7 --- Function Calling / Tool Calling: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim before labs or interviews.
- Drill gaps --- reopen README.md, then 4.7.a through 4.7.e.
- Practice --- 4.7-Exercise-Questions.md.
- Polish answers --- 4.7-Interview-Questions.md.
Core principle
LLMs generate text --- they CANNOT execute code, call APIs, or query databases.
Tool calling bridges this gap:
- AI decides WHAT function to call + arguments (probabilistic reasoning)
- Your code decides HOW to execute it (deterministic logic)
Core vocabulary
| Term | One-liner |
|---|---|
| Tool calling | API feature where the model returns a structured function name + arguments for your code to execute |
| Function calling | Original (deprecated) OpenAI term for the same concept; used the functions parameter |
| Tool use | Anthropic's (Claude) term for tool calling |
| tools | API parameter --- array of tool definitions (JSON Schema) sent with each request |
| tool_choice | Controls routing: 'auto' (model decides), 'none' (no tools), 'required' (must call), or { type: 'function', function: { name: '...' } } |
| tool_calls | Array in the assistant's response containing function name, arguments, and unique call ID |
| tool_call_id | Unique ID that links a tool result back to the tool call that requested it |
| finish_reason | "tool_calls" when the model wants a function executed; "stop" for text responses |
| Router pattern | Architecture where the LLM classifies intent and dispatches to the right handler function |
| Hybrid logic | AI handles reasoning (what) + code handles execution (how) --- the core production pattern |
The six-step tool calling flow
Step 1: DEFINE tools with JSON Schema (name, description, parameters)
Step 2: SEND messages + tools + tool_choice to LLM API
Step 3: MODEL returns tool_calls (finish_reason: "tool_calls")
OR text response (finish_reason: "stop")
Step 4: EXECUTE function in YOUR code (parse args, validate, call handler)
Step 5: RETURN result to model via { role: 'tool', tool_call_id, content }
Step 6: MODEL generates final natural-language response using the result
Tool definition structure
```js
const tools = [
  {
    type: 'function',
    function: {
      name: 'improveBio',
      description: 'Improve a dating profile bio',
      parameters: {
        type: 'object',
        properties: {
          currentBio: { type: 'string', description: 'Current bio text' },
          tone: {
            type: 'string',
            enum: ['witty', 'sincere', 'adventurous'],
          },
        },
        required: ['currentBio'],
        additionalProperties: false,
      },
    },
  },
];
```
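Schema constraints like required and enum can also be enforced in your own code before execution. A minimal sketch for the improveBio schema above --- validateArgs is an illustrative helper, not part of any SDK; in production you would use a JSON Schema validation library:

```js
// Hand-rolled validator mirroring the improveBio parameter schema
// (required currentBio, enum tone, no extra fields). Illustrative only.
function validateArgs(args) {
  const errors = [];
  if (typeof args.currentBio !== 'string') {
    errors.push('currentBio is required and must be a string');
  }
  const tones = ['witty', 'sincere', 'adventurous'];
  if (args.tone !== undefined && !tones.includes(args.tone)) {
    errors.push(`tone must be one of: ${tones.join(', ')}`);
  }
  const allowed = ['currentBio', 'tone'];
  for (const key of Object.keys(args)) {
    if (!allowed.includes(key)) errors.push(`unexpected field: ${key}`);
  }
  return { ok: errors.length === 0, errors };
}
```

Models usually respect the schema, but the check is cheap insurance --- the model's output is probabilistic, your validator is not.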
Tool definition best practices
| Practice | Why |
|---|---|
| Detailed description with trigger phrases | Model uses description to decide which tool to call |
| Use enum for constrained values | Prevents model from inventing invalid values |
| Mark required fields | Model knows which arguments it must provide |
| Set additionalProperties: false | Prevents unexpected extra fields |
| Add description to every property | Helps model extract the right data from user input |
| Keep 3--5 parameters per tool | More parameters = more chances for errors |
Message structure for tool calling
Step 2 request --- conversation so far plus tool definitions:

```js
messages: [
  { role: 'system', content: '...' },
  { role: 'user', content: 'Improve my bio: "I like hiking"' },
]
```

Step 3 response --- the assistant asks for a tool call:

```js
{
  role: 'assistant',
  content: null,
  tool_calls: [{
    id: 'call_abc123',
    type: 'function',
    function: {
      name: 'improveBio',
      arguments: '{"currentBio":"I like hiking","tone":"witty"}'
    }
  }]
}
```

Step 5 follow-up --- append the assistant message, then the tool result:

```js
messages: [
  { role: 'system', content: '...' },
  { role: 'user', content: '...' },
  { role: 'assistant', content: null, tool_calls: [{ ... }] },
  {
    role: 'tool',
    tool_call_id: 'call_abc123',
    content: '{"improved":"..."}',
  },
]
```
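The bookkeeping in the tool-result message is easy to get wrong (string content, exact ID match). A small sketch of a helper that builds a correctly linked result --- makeToolResult is illustrative, not an SDK function:

```js
// Build a tool-result message linked to the tool call that requested it.
// makeToolResult is a hypothetical helper, shown for illustration only.
function makeToolResult(toolCall, result) {
  return {
    role: 'tool',
    tool_call_id: toolCall.id,        // must match the assistant's call id exactly
    content: JSON.stringify(result),  // content must be a string, not an object
  };
}

const toolCall = {
  id: 'call_abc123',
  type: 'function',
  function: { name: 'improveBio', arguments: '{"currentBio":"I like hiking"}' },
};
const msg = makeToolResult(toolCall, { improved: 'Avid hiker seeking trail partner' });
```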
When to use tool calling vs JSON mode vs plain text
| Scenario | Approach | Why |
|---|---|---|
| "Write me a poem" | Plain text | Text IS the output |
| "Classify this email as spam" | Structured output (JSON mode) | LLM judgment is the result; no function needed |
| "What is my account balance?" | Tool calling | Requires database query |
| "Schedule a meeting at 2pm" | Tool calling | Requires creating an event (side effect) |
| "Calculate 15% tip on $47.83" | Tool calling | LLMs are unreliable at math |
| "Improve my dating bio" | Tool calling | Needs business rules, validation, logging |
| "Thanks for helping!" | Plain text | Conversational; no action needed |
Decision rule: If the task requires data retrieval, API calls, calculations, mutations, or enforcing business rules --- use tool calling. If the LLM's text output IS the result --- skip it.
Decision flowchart
```
Does the task require data NOT in the LLM's training?
  YES --> Tool calling (data retrieval)
  NO  --> Does the task require a SIDE EFFECT?
    YES --> Tool calling (action tool)
    NO  --> Does the task require PRECISE COMPUTATION?
      YES --> Tool calling (calculation tool)
      NO  --> Does the task require EXACT BUSINESS RULES?
        YES --> Tool calling (business logic tool)
        NO  --> Is the output STRUCTURED DATA for downstream code?
          YES --> Structured output (JSON mode)
          NO  --> Plain text generation
```
Deterministic invocation --- key code
function safeParseArguments(argsString) {
try {
return { success: true, data: JSON.parse(argsString) };
} catch (error) {
return { success: false, error: error.message };
}
}
const functionMap = { improveBio, generateOpeners, moderateText };
const toolCall = message.tool_calls[0];
const parsed = safeParseArguments(toolCall.function.arguments);
if (!parsed.success) {
}
if (!functionMap[toolCall.function.name]) {
}
const result = await functionMap[toolCall.function.name](parsed.data);
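This dispatch pattern can be exercised standalone with stub handlers in place of the real ones (the stub bodies below are made up; safeParseArguments is repeated so the sketch runs by itself):

```js
// Stub handler standing in for the real improveBio implementation.
const functionMap = {
  improveBio: (args) => ({ improved: args.currentBio.toUpperCase() }),
};

function safeParseArguments(argsString) {
  try {
    return { success: true, data: JSON.parse(argsString) };
  } catch (error) {
    return { success: false, error: error.message };
  }
}

// Dispatch one tool call; bad JSON and unknown names become error results.
function dispatch(toolCall) {
  const parsed = safeParseArguments(toolCall.function.arguments);
  if (!parsed.success) return { error: parsed.error };
  const handler = functionMap[toolCall.function.name];
  if (!handler) return { error: `Unknown function: ${toolCall.function.name}` };
  return handler(parsed.data);
}
```

Note that every failure path produces a value rather than an exception, which is what lets you hand errors back to the model as tool results.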
Parallel tool calls
```js
const toolResults = await Promise.all(
  assistantMessage.tool_calls.map(async (tc) => {
    const args = JSON.parse(tc.function.arguments);
    const result = await functionMap[tc.function.name](args);
    return {
      role: 'tool',
      tool_call_id: tc.id,
      content: JSON.stringify(result),
    };
  })
);
```
Hybrid logic patterns
| Pattern | What happens | Best for |
|---|---|---|
| AI routes, code executes entirely | Function is pure deterministic logic (regex, DB query, math) | moderateText(), getAccountBalance(), calculateTip() |
| AI routes, code orchestrates AI | Function uses another LLM call with a specialized prompt, wrapped in guardrails | improveBio(), generateOpeners(), content generation |
| AI routes, code chains steps | Function runs a multi-step pipeline: validate, generate, filter, save, log | processProfileUpdate(), complex workflows |
The AI decision boundary
BEFORE the boundary (AI's domain):
- Understand natural language
- Classify intent (which function?)
- Extract arguments from text
- Handle ambiguity
AFTER the boundary (Code's domain):
- Validate input (length, format, required fields)
- Enforce business rules (character limits, banned words)
- Query databases (user status, premium tier)
- Execute computations (exact math)
- Call external APIs
- Log analytics, enforce rate limits, bill users
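The code side of the boundary can be sketched as a deterministic rule check; the specific limits and banned-word list below are made-up examples, not rules from any real product:

```js
// Deterministic business rules applied AFTER the model has routed the request.
// MAX_BIO_LENGTH and BANNED are illustrative values only.
const MAX_BIO_LENGTH = 500;
const BANNED = ['venmo', 'cashapp'];

function enforceBioRules(bio) {
  const trimmed = bio.slice(0, MAX_BIO_LENGTH); // enforce length in code, not in prompts
  const lower = trimmed.toLowerCase();
  const violations = BANNED.filter((word) => lower.includes(word));
  return { bio: trimmed, ok: violations.length === 0, violations };
}
```

A prompt saying "keep bios under 500 characters" is a suggestion; bio.slice(0, 500) is a guarantee.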
Tool router architecture
```
User Message
     |
     v
Input Validator ---- rejects empty/too-long messages
     |
     v
LLM Router --------- decides which tool(s) to call
     |
     v
Tool Handlers ------ execute function with validation
     |
     v
Result Logger ------ logs tool call + result
     |
     v
LLM Formatter ------ turns result into natural response
     |
     v
Final response to user
```
Error handling layers (production router)
Layer 1: Input Validation --- reject bad input before API call
Layer 2: API Errors --- try/catch, retry with backoff
Layer 3: Argument Parsing --- safeParseArguments(), return error as tool result
Layer 4: Unknown Functions --- check handlerMap, list available functions
Layer 5: Function Execution --- try/catch around handler, return error as tool result
Layer 6: Result Validation --- truncate oversized results (max ~4000 chars)
Layer 7: Final Response Fallback --- if formatting LLM fails, return raw results
Key rule: Return errors as tool role messages, not thrown exceptions. The model can then explain the problem naturally to the user.
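Layers 3 through 5 can be sketched in one wrapper; executeToolCall is an illustrative function (with a synchronous handler for brevity), not an SDK API:

```js
// Wrap parsing, lookup, and execution so every failure becomes a tool-role
// message instead of a thrown exception. Illustrative sketch.
function executeToolCall(toolCall, handlers) {
  const reply = (payload) => ({
    role: 'tool',
    tool_call_id: toolCall.id,
    content: JSON.stringify(payload),
  });

  let args;
  try {
    args = JSON.parse(toolCall.function.arguments); // Layer 3: argument parsing
  } catch (e) {
    return reply({ error: `Malformed arguments: ${e.message}` });
  }

  const handler = handlers[toolCall.function.name]; // Layer 4: unknown functions
  if (!handler) {
    return reply({ error: `Unknown function. Available: ${Object.keys(handlers).join(', ')}` });
  }

  try {
    return reply(handler(args)); // Layer 5: function execution
  } catch (e) {
    return reply({ error: `Execution failed: ${e.message}` });
  }
}
```

Because every branch returns a tool message, the follow-up LLM call can turn any failure into a natural apology or retry.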
Token cost of tool calling
Tool definition overhead:
~100-200 tokens per tool (3 params, description)
3 tools = ~450 tokens added to EVERY API call
Cost per interaction (improveBio example, 3 LLM calls):
Call 1: Routing ~$0.0025
Call 2: Generation ~$0.0015
Call 3: Formatting ~$0.004
Total: ~$0.008
At 100,000 interactions/day: ~$800/day
Reduce costs by:
- Use cheaper model (gpt-4o-mini) for routing
- Include only relevant tools per request
- Use tool_choice: 'required' when intent is obvious
- Cache results for identical inputs
- Keep tool descriptions concise
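The cost arithmetic above is easy to verify directly (per-call prices are this section's example figures, not current provider pricing):

```js
// Reproduce the improveBio cost estimate: three LLM calls per interaction.
const routing = 0.0025;
const generation = 0.0015;
const formatting = 0.004;

const perInteraction = routing + generation + formatting; // ~$0.008
const perDay = perInteraction * 100000;                   // 100,000 interactions/day
```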
tool_choice quick reference
| Value | Behavior | Use when |
|---|---|---|
| 'auto' | Model decides | General-purpose assistant (default) |
| 'none' | No tools called | You want text-only despite tools being present |
| 'required' | Must call at least one tool | You know a tool is needed |
| { type: 'function', function: { name: '...' } } | Must call this specific tool | UI context makes intent unambiguous |
Production patterns
```js
// Gate premium tools by user tier --- include only relevant tools per request.
function getToolsForUser(user) {
  const base = [moderateTextTool, getProfileTipsTool];
  if (user.isPremium) {
    base.push(improveBioTool, generateOpenersTool);
  }
  return base;
}

// Sliding-window rate limiter keyed by user + tool.
const calls = new Map();

function checkRateLimit(userId, toolName, limit = 10, windowMs = 60000) {
  const key = `${userId}:${toolName}`;
  const recent = calls.get(key)?.filter((ts) => Date.now() - ts < windowMs) || [];
  if (recent.length >= limit) return { allowed: false };
  recent.push(Date.now());
  calls.set(key, recent);
  return { allowed: true };
}
```
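A quick standalone exercise of the sliding-window limiter (re-declared here so the sketch runs by itself): the first ten calls inside the window pass and the eleventh is rejected.

```js
// In-memory sliding-window rate limiter, keyed by user + tool.
const calls = new Map();

function checkRateLimit(userId, toolName, limit = 10, windowMs = 60000) {
  const key = `${userId}:${toolName}`;
  const recent = calls.get(key)?.filter((ts) => Date.now() - ts < windowMs) || [];
  if (recent.length >= limit) return { allowed: false };
  recent.push(Date.now());
  calls.set(key, recent);
  return { allowed: true };
}

// 11 rapid calls: 10 allowed, then blocked.
const results = [];
for (let i = 0; i < 11; i++) {
  results.push(checkRateLimit('u1', 'improveBio').allowed);
}
```

In production you would back this with Redis or similar so limits survive restarts and hold across instances; the Map version is single-process only.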
Common gotchas
| Gotcha | Why |
|---|---|
| arguments is a JSON string, not an object | Must JSON.parse() before use; can be malformed |
| Mismatched tool_call_id | Tool result must reference the exact id from the assistant's tool call |
| content in tool result must be a string | JSON.stringify() objects before returning |
| Tool definitions consume tokens on EVERY call | Include only tools needed for the current context |
| LLM character counts are unreliable | Enforce limits in code (bio.slice(0, 500)), not in prompts |
| Too many tools degrade routing accuracy | 5--10 well-scoped tools outperform 30+ narrow ones |
| AI might hallucinate function names | Always check handlerMap[fnName] before executing |
| functions param is deprecated | Use tools param (current standard) |
| Putting all business rules in the system prompt | Rules are probabilistic in prompts; deterministic in code |
| Tool that just calls the LLM again (no-op wrapper) | Anti-pattern: two LLM calls for zero benefit |
Testing routing accuracy
```js
const testCases = [
  { input: 'Make my bio better: "I like coffee"', expectedTool: 'improveBio' },
  { input: 'Help me message a rock climber', expectedTool: 'generateOpeners' },
  { input: 'Is "Venmo me @john" safe?', expectedTool: 'moderateText' },
  { input: 'Thanks!', expectedTool: null },
];
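A harness over these cases can be sketched with a mocked router; mockRoute below is a keyword stand-in for the real LLM call, used only so the loop is runnable offline:

```js
// mockRoute stands in for the LLM router; a real harness would call the API
// and read tool_calls[0].function.name from the response.
function mockRoute(input) {
  if (/bio/i.test(input)) return 'improveBio';
  if (/message/i.test(input)) return 'generateOpeners';
  if (/safe/i.test(input)) return 'moderateText';
  return null; // plain conversational reply, no tool
}

const testCases = [
  { input: 'Make my bio better: "I like coffee"', expectedTool: 'improveBio' },
  { input: 'Help me message a rock climber', expectedTool: 'generateOpeners' },
  { input: 'Is "Venmo me @john" safe?', expectedTool: 'moderateText' },
  { input: 'Thanks!', expectedTool: null },
];

const passed = testCases.filter((tc) => mockRoute(tc.input) === tc.expectedTool).length;
const accuracy = passed / testCases.length;
```

Run the same loop against the real router whenever you add or reword a tool; routing accuracy regresses silently as descriptions change.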
Anti-patterns
| Anti-pattern | Problem | Fix |
|---|---|---|
| Wrapping pure LLM tasks as tools | generatePoem() tool just calls the LLM again --- double cost, zero benefit | Let the LLM generate text directly |
| Too many tools (30+) | Overwhelms the model; degrades routing accuracy; increases token cost | Consolidate into fewer, more capable tools |
| No-op tools | thinkAboutResponse() tool that does nothing --- wasted latency | Remove; the model can reason without a tool |
| One field per tool | getName(), getEmail(), getPhone() --- 50 tools for one entity | Consolidate: getUserProfile(userId, fields[]) |
End of 4.7 quick revision.