Episode 4 — Generative AI Engineering / 4.7 — Function Calling / Tool Calling

4.7.b --- When to Use Tool Calling

In one sentence: Use tool calling when the user's request requires a deterministic action (database query, API call, calculation, code execution) --- and stick with plain text generation when the LLM's natural language output is the desired result.

Navigation: <- 4.7.a --- What Is Tool Calling | 4.7.c --- Deterministic Tool Invocation ->


1. The Core Decision: Does the Task Require Execution?

The simplest way to decide whether to use tool calling:

Does the user's request require EXECUTING something
outside of generating text?

  YES --> Use tool calling
  NO  --> Use plain text generation (possibly with structured output)

"Executing something" means: querying a database, calling an API, performing a calculation, writing to a file, sending a notification, or any other side effect that the LLM cannot do by generating text alone.


2. When to Use Tool Calling

Category 1: Data retrieval

The user needs information that exists in your system but is not in the model's training data.

User: "What's my account balance?"
--> Tool call: getAccountBalance(userId)
--> Execute: query database
--> Return: $4,523.67
--> LLM formats: "Your current account balance is $4,523.67."

User: "Show me my last 5 orders"
--> Tool call: getOrders(userId, limit: 5)
--> Execute: query database
--> Return: [order1, order2, ...]
--> LLM formats: "Here are your last 5 orders: ..."
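
The account-balance example above can be sketched as a tool definition plus a handler. The schema follows the OpenAI tools format; the in-memory Map, the user ID, and the balance are made-up stand-ins for a real database query.

```javascript
// Hypothetical tool schema (OpenAI tools format) for the balance example
const tools = [{
  type: 'function',
  function: {
    name: 'getAccountBalance',
    description: "Get the authenticated user's current account balance.",
    parameters: {
      type: 'object',
      properties: {
        userId: { type: 'string', description: 'Internal user ID' },
      },
      required: ['userId'],
    },
  },
}];

// In-memory stand-in for the real database query
const balances = new Map([['user_42', 4523.67]]);

async function getAccountBalance({ userId }) {
  const balance = balances.get(userId);
  if (balance === undefined) throw new Error(`Unknown user: ${userId}`);
  return { balance, currency: 'USD' };
}
```

When the model emits a tool call naming getAccountBalance, your code runs the handler and sends the JSON result back; the model then writes the final sentence for the user.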

Category 2: External API calls

The user needs data from a third-party service.

User: "What's the weather in London?"
--> Tool call: getWeather(city: "London")
--> Execute: call weather API
--> Return: { temp: 15, condition: "cloudy" }
--> LLM formats: "It's currently 15 degrees C and cloudy in London."

User: "Search for flights to Tokyo next week"
--> Tool call: searchFlights(destination: "Tokyo", dateRange: "next week")
--> Execute: call flights API
--> Return: [flight1, flight2, ...]
--> LLM formats: "I found these flights to Tokyo: ..."

Category 3: Precise calculations

LLMs are notoriously unreliable at math. Tool calling lets you use real code.

User: "What's 15% tip on a $47.83 bill split 3 ways?"
--> Tool call: calculateTip(amount: 47.83, tipPercent: 15, splitWays: 3)
--> Execute: (47.83 * 1.15) / 3 = 18.3348..., rounded up to 18.34
--> LLM formats: "Each person pays $18.34 (including 15% tip)."

User: "Convert 500 EUR to JPY"
--> Tool call: convertCurrency(amount: 500, from: "EUR", to: "JPY")
--> Execute: call exchange rate API, calculate
--> Return: 81,250 JPY
--> LLM formats: "500 EUR is approximately 81,250 JPY."
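
A minimal implementation of the tip example. The function name matches the tool call above; the round-up-per-person policy (so the shares always cover the bill) is an illustrative business decision.

```javascript
// Deterministic tip calculation -- the kind of exact arithmetic
// an LLM should delegate rather than attempt token-by-token.
function calculateTip({ amount, tipPercent, splitWays }) {
  const total = amount * (1 + tipPercent / 100);
  // Round each share UP to the nearest cent so the shares cover the total
  const perPerson = Math.ceil((total / splitWays) * 100) / 100;
  return {
    total: Math.round(total * 100) / 100,
    perPerson,
  };
}
```

For the example above this returns 18.34 per person -- computed, not guessed.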

Category 4: Actions and mutations

The user wants to change something in your system.

User: "Schedule a meeting for tomorrow at 2pm"
--> Tool call: createEvent(title: "Meeting", date: "tomorrow", time: "14:00")
--> Execute: create calendar event
--> Return: { eventId: "evt_123", confirmed: true }
--> LLM formats: "Done! I've scheduled your meeting for tomorrow at 2:00 PM."

User: "Send a reminder to the team about the deadline"
--> Tool call: sendNotification(recipients: "team", message: "...")
--> Execute: send notification
--> Return: { sent: true, recipientCount: 8 }
--> LLM formats: "Reminder sent to 8 team members."

Category 5: Deterministic text processing with business rules

When text processing must follow exact rules, not probabilistic generation.

User: "Improve my dating bio: 'I like hiking and coffee'"
--> Tool call: improveBio(currentBio: "I like hiking and coffee", tone: "witty")
--> Execute: your function applies specific templates, character limits,
    banned word filters, and other business rules
--> Return: "Weekend adventurer fueled by espresso. If you can keep up 
    on a trail, you can definitely keep up in conversation."
--> LLM formats the result naturally for the user

User: "Check if this message is okay to send"
--> Tool call: moderateText(text: "...")
--> Execute: run through your moderation pipeline with exact rules
--> Return: { safe: false, reason: "Contains personal contact info" }
--> LLM formats: "I'd suggest removing the phone number before sending."
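
The moderateText example can be sketched as a deterministic pipeline. The phone-number regex and the banned-word list below are illustrative placeholders, not a real moderation system.

```javascript
// Deterministic moderation sketch: exact rules, not model judgment.
const PHONE_PATTERN = /\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/;
const BANNED_WORDS = ['venmo', 'cashapp'];

function moderateText({ text }) {
  // Rule 1: block personal contact info
  if (PHONE_PATTERN.test(text)) {
    return { safe: false, reason: 'Contains personal contact info' };
  }
  // Rule 2: block payment handles and other banned terms
  const lowered = text.toLowerCase();
  for (const word of BANNED_WORDS) {
    if (lowered.includes(word)) {
      return { safe: false, reason: `Contains banned word: ${word}` };
    }
  }
  return { safe: true };
}
```

The same input always produces the same verdict -- exactly what a compliance rule needs and what probabilistic generation cannot guarantee.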

3. When NOT to Use Tool Calling

Pure text generation

If the desired output is text and no external action is needed, tool calling adds unnecessary complexity.

User: "Write me a poem about autumn"
--> Just let the LLM generate text. No tool needed.

User: "Explain quantum computing in simple terms"
--> Just let the LLM generate text. No tool needed.

User: "Translate this to French: 'Hello, how are you?'"
--> Just let the LLM generate text. No tool needed.
    (Unless you need a specific translation API for quality/consistency)

Simple classification or extraction

If you just need structured data from the LLM's reasoning, use structured output (response_format / JSON mode) instead of tool calling.

// Use structured output, NOT tool calling
User: "Is this email spam or legitimate?"
--> response_format: { type: 'json_schema', ... }
--> Returns: { "classification": "spam", "confidence": 0.95 }

// Tool calling would be overkill here -- no function needs to execute.
// The LLM's judgment IS the result.
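
Fleshing out the snippet above: a json_schema response_format payload in the shape the OpenAI Chat Completions API accepts (the schema name and fields here are our own).

```javascript
// A response_format payload for the spam example -- passed to
// openai.chat.completions.create instead of a `tools` array.
const responseFormat = {
  type: 'json_schema',
  json_schema: {
    name: 'spam_classification',
    strict: true,
    schema: {
      type: 'object',
      properties: {
        classification: { type: 'string', enum: ['spam', 'legitimate'] },
        confidence: { type: 'number' },
      },
      required: ['classification', 'confidence'],
      additionalProperties: false,
    },
  },
};
```

Note that nothing executes here: the model's JSON output is the final answer, which is exactly why this is structured output territory rather than tool calling.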

Conversational filler

Greetings, follow-ups, and social conversation do not need tools.

User: "Thanks for helping me!"
--> LLM responds: "You're welcome! Let me know if you need anything else."
    No tool needed.

User: "Can you explain what you just did?"
--> LLM responds with an explanation based on the conversation history.
    No tool needed.

4. Decision Framework

Use this flowchart to decide whether a task needs tool calling:

+------------------------------------------------------------------------+
|                     TOOL CALLING DECISION FRAMEWORK                     |
|                                                                         |
|  User sends a message                                                   |
|        |                                                                |
|        v                                                                |
|  Does the response require data NOT in the LLM's training?             |
|        |                                                                |
|    YES --> Does the data come from YOUR system or an external API?      |
|    |           |                                                        |
|    |       YES --> USE TOOL CALLING (data retrieval tool)               |
|    |       NO  --> Consider RAG instead                                 |
|    |                                                                    |
|    NO                                                                   |
|        |                                                                |
|        v                                                                |
|  Does the response require a SIDE EFFECT?                               |
|  (creating, updating, deleting, sending, scheduling)                    |
|        |                                                                |
|    YES --> USE TOOL CALLING (action tool)                               |
|    |                                                                    |
|    NO                                                                   |
|        |                                                                |
|        v                                                                |
|  Does the response require PRECISE COMPUTATION?                         |
|  (math, date calculations, unit conversions)                            |
|        |                                                                |
|    YES --> USE TOOL CALLING (calculation tool)                          |
|    |                                                                    |
|    NO                                                                   |
|        |                                                                |
|        v                                                                |
|  Does the response require EXACT BUSINESS RULES?                        |
|  (validation, compliance checks, format enforcement)                    |
|        |                                                                |
|    YES --> USE TOOL CALLING (business logic tool)                       |
|    |                                                                    |
|    NO                                                                   |
|        |                                                                |
|        v                                                                |
|  Is the desired output STRUCTURED DATA for downstream code?             |
|        |                                                                |
|    YES --> Use STRUCTURED OUTPUT (response_format / JSON mode)          |
|    |       (Not tool calling -- no function needs to execute)           |
|    |                                                                    |
|    NO                                                                   |
|        |                                                                |
|        v                                                                |
|  PLAIN TEXT GENERATION -- no tool calling needed                        |
+------------------------------------------------------------------------+
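
The flowchart reads naturally as a small routing function. The parameter names are ours, and the booleans come from your own analysis of the feature at design time, not from runtime inspection of the message.

```javascript
// The decision framework as code: answer each flowchart question
// about the feature you are building, top to bottom.
function chooseApproach({
  needsDataOutsideTraining = false,
  dataComesFromYourSystemOrApi = false,
  needsSideEffect = false,
  needsPreciseComputation = false,
  needsExactBusinessRules = false,
  wantsStructuredDataForCode = false,
} = {}) {
  if (needsDataOutsideTraining) {
    return dataComesFromYourSystemOrApi ? 'tool calling' : 'RAG';
  }
  if (needsSideEffect || needsPreciseComputation || needsExactBusinessRules) {
    return 'tool calling';
  }
  if (wantsStructuredDataForCode) return 'structured output';
  return 'plain text generation';
}
```

For example, "schedule a meeting" is a side effect, so chooseApproach({ needsSideEffect: true }) routes to tool calling, while "write me a poem" answers no to every question and falls through to plain text generation.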

5. Tool Calling vs Other Approaches

Tool calling vs prompt engineering

Situation                      | Approach                             | Why
User asks for a joke           | Prompt engineering                   | The text IS the output
User asks for account balance  | Tool calling                         | Requires database query
User asks for writing feedback | Prompt engineering (or tool calling) | LLM judgment is the output, BUT if feedback must follow business rules, use a tool
User asks to book a flight     | Tool calling                         | Requires API call + mutation

Tool calling vs structured output (JSON mode)

Situation                                       | Approach          | Why
Classify sentiment as positive/negative/neutral | Structured output | LLM's judgment is the result; no execution needed
Extract names and dates from text               | Structured output | LLM's extraction is the result; no execution needed
Look up a user's order status                   | Tool calling      | Requires querying your order database
Generate 3 bio variations and save the best     | Tool calling      | "Save" requires executing a write operation

Tool calling vs RAG

Situation                                     | Approach     | Why
Answer questions about your product docs      | RAG          | Static knowledge retrieval + LLM generation
Look up a specific user's subscription status | Tool calling | Dynamic, user-specific data
Find relevant case studies                    | RAG          | Semantic search over knowledge base
Calculate shipping cost for an order          | Tool calling | Requires computation with real-time data

Using them together

In real systems, you often combine all three:

// A single user request might involve:

// 1. RAG: Retrieve relevant product information
const productDocs = await vectorSearch(query);

// 2. Tool calling: Look up user's specific order
// (AI decides to call getOrder based on user message)
const order = await getOrder(userId, orderId);

// 3. Structured output: Format the final response
// (AI combines RAG context + order data into a structured reply)

6. The "Hybrid Zone": When It's Not Clear

Some tasks could go either way. Here are the considerations:

Should improveBio() be a tool or just a prompt?

Option A: Pure prompt (no tool)

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a dating profile expert. Improve bios.' },
    { role: 'user', content: 'Improve this bio: "I like hiking"' },
  ],
});
// Works fine for simple cases

Option B: Tool calling

// Define improveBio as a tool
// When the AI calls it, your function can:
//   - Enforce a 500-character limit
//   - Filter banned words
//   - Check against a database of existing bios for uniqueness
//   - Apply A/B tested templates
//   - Log the transformation for analytics
//   - Bill the user for the premium feature

When to choose the tool approach:

  • The function has business rules beyond what prompting can reliably enforce
  • The function needs to access external data (user preferences, existing bios)
  • The function has side effects (logging, billing, saving)
  • The function is part of a multi-function system where the AI must choose between several options
  • You need auditability --- knowing exactly which function ran with which arguments
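
The Option B function might look like the sketch below. The 500-character limit and word filter are the hypothetical business rules listed above; the template string stands in for whatever rewrite step (templates, another model call) produces the new text, and the console.log stands in for a real audit log.

```javascript
// Option B sketch: improveBio as a tool that enforces business rules.
const MAX_LENGTH = 500;
const BANNED_WORDS = ['instagram', 'snapchat'];

function improveBio({ currentBio, tone = 'witty' }) {
  // Stand-in for the real rewrite step (templates, model call, etc.)
  let improved = `${currentBio.trim()} (rewritten in a ${tone} tone)`;

  // Business rule 1: hard character limit
  if (improved.length > MAX_LENGTH) {
    improved = improved.slice(0, MAX_LENGTH);
  }
  // Business rule 2: banned-word filter
  const lowered = improved.toLowerCase();
  for (const word of BANNED_WORDS) {
    if (lowered.includes(word)) {
      throw new Error(`Bio rejected: contains banned word "${word}"`);
    }
  }
  // Business rule 3 (side effect): audit log of what ran, with which args
  console.log(JSON.stringify({ tool: 'improveBio', tone, length: improved.length }));
  return { bio: improved };
}
```

None of these guarantees -- the hard limit, the rejection, the audit trail -- can be reliably enforced by a prompt alone, which is the whole argument for the tool approach in this zone.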

7. Anti-Patterns: Misusing Tool Calling

Anti-pattern 1: Wrapping pure LLM tasks as tools

// BAD: This tool just asks the LLM to do something it can do natively
const tools = [{
  type: 'function',
  function: {
    name: 'generatePoem',
    description: 'Generate a poem about a given topic',
    parameters: {
      type: 'object',
      properties: {
        topic: { type: 'string' },
      },
    },
  },
}];

// Your "tool implementation":
async function generatePoem({ topic }) {
  // This just calls the LLM again!
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: `Write a poem about ${topic}` }],
  });
  return response.choices[0].message.content;
}
// PROBLEM: Two LLM calls instead of one. No benefit.

Anti-pattern 2: Too many tools

// BAD: 50+ tools overwhelms the model and increases latency
const tools = [
  { /* getUserName */ },
  { /* getUserEmail */ },
  { /* getUserPhone */ },
  { /* getUserAddress */ },
  { /* getUserAge */ },
  // ... 45 more single-field getters
];

// GOOD: Consolidate into fewer, more capable tools
const tools = [
  {
    type: 'function',
    function: {
      name: 'getUserProfile',
      description: 'Get user profile information. Returns all available fields.',
      parameters: {
        type: 'object',
        properties: {
          userId: { type: 'string' },
          fields: {
            type: 'array',
            items: { type: 'string', enum: ['name', 'email', 'phone', 'address', 'age'] },
            description: 'Which fields to retrieve (omit for all)',
          },
        },
        required: ['userId'],
      },
    },
  },
];

Anti-pattern 3: No-op tools

// BAD: Tool that doesn't actually do anything
const tools = [{
  type: 'function',
  function: {
    name: 'thinkAboutResponse',
    description: 'Think carefully before responding',
    parameters: { type: 'object', properties: {} },
  },
}];

// The model can "think" without a tool. This is wasted latency.

8. Real-World Use Case Map

Domain           | Tool Calling Use Cases
Dating app       | improveBio(), generateOpeners(), moderateText(), reportProfile(), getMatchSuggestions()
E-commerce       | searchProducts(), getOrderStatus(), processReturn(), calculateShipping(), applyDiscount()
Customer support | lookupTicket(), escalateToAgent(), updateTicketStatus(), searchKnowledgeBase()
Finance          | getAccountBalance(), transferFunds(), calculateInterest(), getTransactionHistory()
Developer tools  | runCode(), queryDatabase(), searchDocs(), createGitHubIssue(), deployService()
Healthcare       | lookupDrugInteractions(), scheduleAppointment(), getLabResults(), calculateBMI()

9. Key Takeaways

  1. Use tool calling when the user's request requires data retrieval, API calls, calculations, mutations, or enforcing business rules --- anything beyond pure text generation.
  2. Do not use tool calling for pure text generation, simple classification, or conversational responses where the LLM's output is the final result.
  3. Structured output (JSON mode) is for when you need structured data from the LLM's reasoning. Tool calling is for when you need to execute a function.
  4. Consolidate tools --- fewer well-designed tools outperform many narrow ones.
  5. In the hybrid zone, choose tool calling when you need business rules, side effects, external data, auditability, or multi-function routing.

Explain-It Challenge

  1. A junior developer asks: "Why can't I just ask the LLM to calculate the tip instead of making a calculateTip tool?" Explain when LLM math is sufficient and when it is not.
  2. Your product manager wants 30 separate tools for every feature in the app. Why is this a problem, and how would you redesign it?
  3. Walk through the decision framework for this user message: "My order #12345 is late --- can you check the status and give me a discount code?"

Navigation: <- 4.7.a --- What Is Tool Calling | 4.7.c --- Deterministic Tool Invocation ->