Episode 4 — Generative AI Engineering / 4.7 — Function Calling / Tool Calling

4.7 --- Exercise Questions: Function Calling / Tool Calling

Practice questions for all five subtopics in Section 4.7 --- a mix of conceptual, code-based, and design tasks.

How to use this material

  1. Read lessons in order --- README.md, then 4.7.a through 4.7.e.
  2. Answer closed-book first --- then compare to the matching lesson.
  3. Build the code examples --- hands-on practice with the OpenAI API.
  4. Interview prep --- 4.7-Interview-Questions.md.
  5. Quick review --- 4.7-Quick-Revision.md.

4.7.a --- What Is Tool Calling (Q1--Q10)

Q1. Define tool calling (function calling) in one sentence. What is the model's role vs your code's role?

Q2. A colleague says: "Tool calling means the AI runs my JavaScript functions." Why is this wrong? What actually happens?

Q3. List three things the LLM can do in a tool calling workflow and three things it cannot do.

Q4. Explain the difference between the old functions parameter and the current tools parameter in the OpenAI API. Why was the change made?

Q5. Why is tool calling considered a bridge between natural language and deterministic code execution? Describe the bridge in your own words.

Q6. The model decides to call improveBio({ currentBio: "I like dogs", tone: "witty" }). In the API response, what data structure contains this information? What is the finish_reason?
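
For orientation, the relevant part of the response looks roughly like this --- a sketch of the OpenAI Chat Completions format, with the call details mirroring the question:

```javascript
// Approximate shape of a Chat Completions response when the model
// decides to call a tool instead of answering in plain text.
const response = {
  choices: [
    {
      finish_reason: "tool_calls", // not "stop" --- the model wants a tool run
      message: {
        role: "assistant",
        content: null, // no text; the payload is in tool_calls
        tool_calls: [
          {
            id: "call_abc123", // echo this back later as tool_call_id
            type: "function",
            function: {
              name: "improveBio",
              // arguments arrive as a JSON *string*, not an object
              arguments: '{"currentBio":"I like dogs","tone":"witty"}',
            },
          },
        ],
      },
    },
  ],
};

const call = response.choices[0].message.tool_calls[0];
console.log(call.function.name, JSON.parse(call.function.arguments).tone);
```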

Q7. Name five real-world actions that are impossible for an LLM to perform without tool calling.

Q8. Comparison: You prompt the model with "Return JSON: {function, args}" vs using the tools parameter. Give three specific ways the tools approach is more reliable.

Q9. What does it mean to say "the LLM is a smart router"? Explain with an example involving improveBio(), generateOpeners(), and moderateText().

Q10. Which of these models support tool calling: GPT-4o, GPT-3.5-turbo, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 2? For any that do not, explain why.


4.7.b --- When to Use Tool Calling (Q11--Q20)

Q11. State the simplest rule for deciding whether a task needs tool calling.

Q12. Classify each request as "tool calling," "structured output," or "plain text generation":

  • (a) "What's my account balance?"
  • (b) "Write me a poem about autumn"
  • (c) "Classify this email as spam or not spam"
  • (d) "Schedule a meeting for tomorrow at 2pm"
  • (e) "Convert 500 EUR to JPY"

Q13. Why should calculateTip(amount, percent, splitWays) be a tool rather than asking the LLM to compute the tip directly?

Q14. Give two examples where using tool calling would be overkill (anti-pattern) and explain why.

Q15. A product manager wants 40 separate tools (one for each field in the user profile: getName, getEmail, getPhone, etc.). What is the problem and how would you redesign?

Q16. Explain the difference between tool calling and RAG for answering user questions. When would you use each? When would you combine them?

Q17. Your team is building a customer support bot. List five tools you would define and explain what each one does.

Q18. When should improveBio be a tool vs just a prompt instruction? List three conditions that tip the decision toward making it a tool.

Q19. Decision framework: Walk through the decision flowchart for this user message: "My order #12345 hasn't arrived. Can you check the status and email me an update?"

Q20. Anti-pattern analysis: A developer created a generatePoem tool whose implementation just calls the LLM with "Write a poem about ${topic}". What is wrong with this design?


4.7.c --- Deterministic Tool Invocation (Q21--Q32)

Q21. List the six steps of the tool calling flow in order.

Q22. Write a JSON Schema for a searchProducts tool that takes query (string, required), category (enum: electronics/clothing/books, optional), and maxPrice (number, optional).
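
One possible answer sketch, wrapped in the OpenAI tools-array format (the description strings are illustrative):

```javascript
// Tool definition: only `query` is required; `category` and
// `maxPrice` stay optional because they are not listed in `required`.
const searchProductsTool = {
  type: "function",
  function: {
    name: "searchProducts",
    description: "Search the product catalog",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "Free-text search query" },
        category: {
          type: "string",
          enum: ["electronics", "clothing", "books"],
          description: "Optional category filter",
        },
        maxPrice: { type: "number", description: "Optional price ceiling" },
      },
      required: ["query"],
    },
  },
};
```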

Q23. What is the tool_choice parameter? List all four possible values and when to use each.

Q24. The model returns arguments: '{"currentBio":"I like dogs","tone":"witty"}'. This is a string, not an object. Why? What must you do before using the arguments?

Q25. In the message array for the second API call, what three messages must be present (after the original system + user messages)?

Q26. What is the tool_call_id and why is it critical when returning tool results?

Q27. Write code that safely parses tool call arguments, handling the case where the JSON is malformed. Return a structured error instead of crashing.

Q28. The model returns two tool calls in a single response: improveBio and moderateText. Write the code to execute both in parallel using Promise.all and return both results.
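
A sketch of one solution, with stand-in implementations for improveBio and moderateText (the real ones would call your AI and moderation logic):

```javascript
// Stand-in tool implementations for the sketch.
async function improveBio({ currentBio, tone }) {
  return { bio: `${currentBio} (rewritten, ${tone})` };
}
async function moderateText({ text }) {
  return { safe: !/\d{3}/.test(text), issues: [] };
}

const impls = { improveBio, moderateText };

// Execute every tool call in parallel; each result becomes a
// tool-role message carrying the matching tool_call_id.
async function executeToolCalls(toolCalls) {
  return Promise.all(
    toolCalls.map(async (call) => ({
      role: "tool",
      tool_call_id: call.id,
      content: JSON.stringify(
        await impls[call.function.name](JSON.parse(call.function.arguments))
      ),
    }))
  );
}
```

The returned messages are appended to the conversation before the second API call.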

Q29. Calculation: Your tool definitions include 5 tools with 4 parameters each. Each tool definition is approximately 150 tokens. You make 50,000 API calls per day at $2.50/1M input tokens. What is the daily cost just for tool definitions?
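
The arithmetic can be checked in code --- tool definitions are sent with every request, so they accrue input-token cost on every call:

```javascript
// Tool definitions count as input tokens on every single API call.
const tokensPerCall = 5 * 150;               // 5 tools x ~150 tokens = 750
const tokensPerDay = tokensPerCall * 50000;  // 37,500,000 tokens
const dailyCost = (tokensPerDay / 1000000) * 2.5; // $93.75
console.log(dailyCost); // 93.75
```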

Q30. Write a tool role message for returning this result: { safe: false, issues: ["Phone number detected"] } with tool_call_id "call_xyz789".
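
A sketch of the expected message --- note that content must be a string, so the result object is serialized:

```javascript
// Tool-role message answering a specific tool call; tool_call_id must
// match the id from the assistant's tool_calls entry.
const toolMessage = {
  role: "tool",
  tool_call_id: "call_xyz789",
  content: JSON.stringify({ safe: false, issues: ["Phone number detected"] }),
};
```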

Q31. What happens if you send a tool role message with a tool_call_id that does not match any tool call in the assistant's message?

Q32. Multi-turn: Write the complete message array for a three-step conversation: (1) user asks to improve bio, (2) model calls improveBio, you return the result, (3) user asks "make it more sincere instead."


4.7.d --- Hybrid Logic (Q33--Q40)

Q33. Explain the hybrid principle in one sentence: what does the AI handle and what does the code handle?

Q34. A developer puts all business rules (character limits, banned words, rate limiting) in the system prompt. List three specific things that will go wrong in production.

Q35. Draw the AI decision boundary for this scenario: user says "Improve my bio: I like coffee." Mark which steps are AI's job and which are code's job.

Q36. Your improveBio() function sometimes returns bios with 600+ characters despite a 500-character instruction in the AI prompt. Why does this happen and how does the hybrid approach fix it?
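
The code-side guarantee the hybrid approach adds can be as small as this (enforceBioLimit is an illustrative name):

```javascript
// Prompt instructions are probabilistic; a slice is deterministic.
function enforceBioLimit(bio, maxLen = 500) {
  return bio.length > maxLen ? bio.slice(0, maxLen) : bio;
}
```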

Q37. Name and describe three hybrid patterns: (a) AI routes + code executes entirely, (b) AI routes + code orchestrates AI, (c) AI routes + code chains multiple steps. Give an example tool for each.

Q38. Write a test case that verifies the AI routes "Help me message a rock climber" to generateOpeners and not to improveBio or moderateText. Include the assertion logic.

Q39. Cost analysis: A hybrid tool call requires 3 LLM calls (routing, bio generation inside the function, final formatting). Each costs roughly $0.003. At 100,000 interactions/day, what is the daily cost? How would you reduce it?

Q40. The user says "Improve my bio AND check if 'Hey, Venmo me @john' is safe." The AI must call two tools. Describe what happens at each step of the hybrid flow.


4.7.e --- Building an AI Tool Router (Q41--Q50)

Q41. List the seven layers of error handling in a production tool router.

Q42. Why should errors be returned as tool results (to the model) rather than thrown as exceptions?

Q43. Write a safeParseArguments function that returns { success: true, data } or { success: false, error } instead of throwing.
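
One possible implementation sketch:

```javascript
// Returns a result object instead of throwing, so the router can
// report the problem back to the model as a tool result.
function safeParseArguments(raw) {
  try {
    return { success: true, data: JSON.parse(raw) };
  } catch (err) {
    return { success: false, error: `Invalid JSON arguments: ${err.message}` };
  }
}

console.log(safeParseArguments('{"tone":"witty"}').success); // true
console.log(safeParseArguments('{oops').success);            // false
```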

Q44. A user is on the free tier. Your router has 5 tools but free users should only access 2. Write the code that dynamically selects which tools to include in the API call.
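
A sketch under the assumption that tools are filtered by name before the API call (tool names beyond improveBio and moderateText are invented for illustration, and parameter schemas are omitted):

```javascript
// Hypothetical tool definitions; only the name field matters here.
const ALL_TOOLS = [
  { type: "function", function: { name: "improveBio", parameters: {} } },
  { type: "function", function: { name: "moderateText", parameters: {} } },
  { type: "function", function: { name: "generateOpeners", parameters: {} } },
  { type: "function", function: { name: "analyzeProfile", parameters: {} } },
  { type: "function", function: { name: "exportData", parameters: {} } },
];

const FREE_TIER_TOOLS = new Set(["improveBio", "moderateText"]);

// Select the tool list per user; the result is passed as the
// `tools` parameter of the API call.
function toolsForUser(user) {
  if (user.tier === "free") {
    return ALL_TOOLS.filter((t) => FREE_TIER_TOOLS.has(t.function.name));
  }
  return ALL_TOOLS; // paid tiers see everything
}
```

Filtering before the call also means free-tier requests never pay input tokens for tools they cannot use.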

Q45. Implement a simple rate limiter that allows a maximum of 10 improveBio calls per user per minute. Use an in-memory Map.
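
A minimal sliding-window sketch (in-memory only, so it resets on restart and does not share state across processes):

```javascript
// userId -> array of recent call timestamps (ms)
const callLog = new Map();

// Allow at most `limit` calls per user inside the trailing window.
// `now` is injectable for testing.
function allowCall(userId, limit = 10, windowMs = 60000, now = Date.now()) {
  const recent = (callLog.get(userId) || []).filter((t) => now - t < windowMs);
  if (recent.length >= limit) {
    callLog.set(userId, recent);
    return false; // over the limit --- reject this call
  }
  recent.push(now);
  callLog.set(userId, recent);
  return true;
}
```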

Q46. Write a validateToolResult function that truncates results longer than 4000 characters and logs a warning.
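
A possible sketch --- non-string results are serialized first so the length check is meaningful:

```javascript
// Cap tool results at 4000 characters to keep the second API call's
// token count bounded; log a warning when truncation happens.
function validateToolResult(name, result) {
  const text = typeof result === "string" ? result : JSON.stringify(result);
  if (text.length > 4000) {
    console.warn(`[router] ${name} result truncated from ${text.length} chars`);
    return text.slice(0, 4000);
  }
  return text;
}
```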

Q47. Your tool router logs show that improveBio is called 150 times/day with a 2% error rate and 1200ms average latency, while moderateText is called 300 times/day with 0% error rate and 5ms latency. What conclusions can you draw and what actions would you take?

Q48. Design: The model sometimes chains tool calls (calls a second tool after seeing the first result). Write the router logic to handle this recursively, with a maximum depth of 3.
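
A skeleton for the loop, with the model call and tool implementations injected so the sketch stays SDK-agnostic (all names are illustrative):

```javascript
// `callModel(messages)` wraps your SDK call; `impls` maps tool names
// to async implementations. Recursion is capped at MAX_DEPTH.
async function runRouter(messages, impls, callModel, depth = 0) {
  const MAX_DEPTH = 3;
  const reply = await callModel(messages);

  // Plain text answer, or chain cut off at the depth limit: stop here.
  if (!reply.tool_calls || depth >= MAX_DEPTH) {
    return reply.content ?? "Sorry, I couldn't finish that request.";
  }

  messages.push(reply); // assistant message carrying tool_calls
  for (const call of reply.tool_calls) {
    const fn = impls[call.function.name];
    const result = fn
      ? await fn(JSON.parse(call.function.arguments))
      : { error: `Unknown function: ${call.function.name}` };
    messages.push({
      role: "tool",
      tool_call_id: call.id,
      content: JSON.stringify(result),
    });
  }
  // The model may chain another tool after seeing these results.
  return runRouter(messages, impls, callModel, depth + 1);
}
```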

Q49. Write the complete test suite for a router with 5 tools. Include: routing accuracy tests (which tool is called), error handling tests (empty input, unknown function), and parallel call tests.

Q50. Production readiness checklist: List 10 things you would verify before deploying an AI tool router to production.


Answer Hints

  • Q1 --- Model decides WHAT function + args; your code EXECUTES the function
  • Q6 --- message.tool_calls[0].function.name and message.tool_calls[0].function.arguments; finish_reason: 'tool_calls'
  • Q12 --- (a) tool calling, (b) plain text, (c) structured output, (d) tool calling, (e) tool calling
  • Q15 --- Consolidate into getUserProfile(userId, fields[]) --- fewer tools = better routing
  • Q22 --- { type: 'object', properties: { query: { type: 'string' }, category: { type: 'string', enum: [...] }, maxPrice: { type: 'number' } }, required: ['query'] }
  • Q23 --- 'auto' (default), 'none' (text only), 'required' (must call), { type: 'function', function: { name: '...' } } (specific tool)
  • Q25 --- The assistant message with tool_calls plus a tool result message with the matching tool_call_id for each call --- these follow the original system + user messages
  • Q29 --- 5 tools x 150 tokens = 750 tokens/call; 750 x 50,000 = 37.5M tokens; 37.5 x $2.50/1M = $93.75/day
  • Q36 --- LLM instructions are probabilistic (the model may overshoot); code enforces bio.slice(0, 500) deterministically
  • Q39 --- $0.003 x 3 x 100,000 = $900/day. Reduce by using gpt-4o-mini for routing, caching, and combining calls
  • Q41 --- Input validation, API errors, argument parsing, unknown functions, function execution, result validation, final-response fallback
  • Q47 --- improveBio uses AI (slow, error-prone); moderateText is pure code (fast, reliable). Investigate improveBio errors; add caching for repeated inputs
  • Q50 --- Input validation, error handling, rate limiting, logging, monitoring, alerting, auth, cost budgeting, tool_choice config, load testing

<- Back to 4.7 --- Function Calling / Tool Calling (README)