*Episode 4 — Generative AI Engineering / 4.5 — Generating JSON Responses from LLMs*

# 4.5 — Exercise Questions: Generating JSON Responses from LLMs

Practice questions for all five subtopics in Section 4.5: a mix of conceptual, coding, debugging, and design tasks.
## How to Use This Material

- **Read lessons in order** — README.md, then 4.5.a → 4.5.e.
- **Answer closed-book first** — then compare to the matching lesson.
- **Code the hands-on questions** — run them against a real API if you can.
- **Interview prep** — 4.5-Interview-Questions.md.
- **Quick review** — 4.5-Quick-Revision.md.
---

## 4.5.a — JSON Mode (Q1–Q10)
**Q1.** What does `response_format: { type: "json_object" }` guarantee? What does it NOT guarantee?

**Q2.** What happens if you enable JSON mode in OpenAI's API but forget to mention "JSON" anywhere in your prompt? Why does OpenAI enforce this requirement?

**Q3.** You receive this response from an LLM without JSON mode enabled:

````
Sure! Here's the data you requested:

```json
{ "name": "Alice", "age": 30 }
```

Have a great day!
````

Write a JavaScript function that extracts and parses the JSON from this response.
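One possible solution sketch for Q3 (illustrative, not the only valid approach): prefer a fenced code block if present, otherwise fall back to the first brace-delimited span.

```javascript
// Sketch of one answer to Q3: extract JSON from a chatty LLM reply.
// Tries a fenced ```json block first, then falls back to the first {...} span.
function extractJson(response) {
  const fenced = response.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : response.match(/\{[\s\S]*\}/)?.[0];
  if (!candidate) throw new Error("No JSON found in response");
  return JSON.parse(candidate.trim());
}

// Build the sample reply without literal triple backticks in the source.
const fence = "`".repeat(3);
const reply = `Sure! Here's the data you requested:\n${fence}json\n{ "name": "Alice", "age": 30 }\n${fence}\nHave a great day!`;
console.log(extractJson(reply)); // → { name: 'Alice', age: 30 }
```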
**Q4.** Compare JSON mode vs Structured Outputs (`json_schema`). When would you choose each? Give one specific use case for each.
**Q5.** How does Anthropic (Claude) achieve JSON-only output without a native JSON mode parameter? Describe two techniques.
**Q6.** You're streaming a response with JSON mode enabled. At what point can you safely call `JSON.parse()` on the accumulated content? What happens if you try too early?
**Q7.** Your JSON mode response has `finish_reason: "length"`. What happened and what should your code do?
**Q8.** **Hands-on:** Write an OpenAI API call that uses JSON mode to extract the following fields from a movie review: `title`, `rating` (1-10), `sentiment` ("positive", "negative", "mixed"), and `key_points` (array of strings).
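A sketch of what a Q8 answer could look like. The payload shape follows OpenAI's chat completions API; the model name, prompt wording, and sample review are illustrative assumptions, and the commented-out lines show where the real network call would go.

```javascript
// Sketch of a possible Q8 answer: a JSON-mode request payload.
const review =
  "Dune: Part Two is a triumph. Visually stunning, though the pacing drags in act two.";

const request = {
  model: "gpt-4o", // illustrative model choice
  response_format: { type: "json_object" }, // JSON mode
  messages: [
    {
      role: "system",
      content:
        "Extract movie review data as JSON with exactly these fields: " +
        '"title" (string), "rating" (integer 1-10), ' +
        '"sentiment" ("positive" | "negative" | "mixed"), ' +
        '"key_points" (array of strings).', // the word "JSON" must appear (see Q2)
    },
    { role: "user", content: review },
  ],
};

// With the openai npm client this payload would be sent as:
//   const completion = await client.chat.completions.create(request);
//   const data = JSON.parse(completion.choices[0].message.content);
console.log(request.response_format.type); // → json_object
```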
**Q9.** A developer says: "I'll just use regex to extract JSON from the response instead of using JSON mode." List three reasons why JSON mode is superior.
**Q10.** JSON mode adds `response_format` to the API call. Does this increase token usage? Does it affect pricing? Does it affect latency?
---
## 4.5.b — Schema-Based Prompting (Q11–Q20)
**Q11.** Why is JSON mode insufficient on its own — why do you still need a schema in the prompt?
**Q12.** You want the model to return `{ "first_name": "Alice", "last_name": "Smith" }` but it keeps returning `{ "name": "Alice Smith" }`. Write a system prompt that fixes this.
**Q13.** Compare these three schema-prompting strategies. Give one advantage of each:
- (a) Showing the exact JSON structure with placeholder types
- (b) Providing a concrete filled-in example
- (c) Using TypeScript-style type definitions
**Q14.** Write a system prompt using TypeScript-style type definitions for extracting restaurant review data with these fields: `restaurant_name`, `cuisine`, `rating` (1-5), `price_range` ("$" | "$$" | "$$$" | "$$$$"), `highlights` (array), `would_recommend` (boolean).
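One possible wording for the Q14 system prompt, built as a plain template string; the exact phrasing is an assumption, not a canonical answer.

```javascript
// Sketch for Q14: a TypeScript-style schema embedded in a system prompt.
const systemPrompt = `You are a restaurant review extractor. Respond with JSON matching:

type ReviewData = {
  restaurant_name: string;
  cuisine: string;
  rating: 1 | 2 | 3 | 4 | 5;
  price_range: "$" | "$$" | "$$$" | "$$$$";
  highlights: string[];
  would_recommend: boolean;
};

Output only the JSON object, with no extra commentary.`;

console.log(systemPrompt.includes("price_range")); // → true
```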
**Q15.** How should you handle optional fields in your schema prompt? Compare three approaches: (a) null for missing, (b) omit if missing, (c) default value like "N/A". Which do you recommend for programmatic use and why?
**Q16.** Why is it better to put the schema in the **system message** and the data in the **user message** rather than combining both in a single message?
**Q17.** You're building a few-shot prompt with examples. How many examples should you include for a simple extraction task? For a complex classification task? What's the trade-off of adding more examples?
**Q18.** **Hands-on:** Write a system prompt that uses both a TypeScript type definition AND a concrete example to guide the model in extracting job posting data: `title`, `company`, `location` (with city/state/remote fields), `salary` (min/max/currency), `skills` (array), `experience_years`.
**Q19.** The model returns `"age": "thirty"` instead of `"age": 30`. How would you modify your schema prompt to prevent this? Show the before and after.
**Q20.** You need to extract data from text in 5 different languages, all producing the same JSON schema. Should you include examples in every language? What's the most token-efficient approach?
---
## 4.5.c — Function Calling Basics (Q21–Q29)
**Q21.** In plain English, explain the difference between JSON mode and function calling. When does the model return `message.content` vs `message.tool_calls`?
**Q22.** What is `tool_choice` and what are its four possible values? Give a use case for each.
**Q23.** You define a tool called `search_users` with parameters `query`, `min_age`, `max_age`. The model returns:
```json
{
"function": { "name": "search_users", "arguments": "{\"query\": \"hiking\", \"min_age\": 25}" }
}
Why is arguments a JSON string inside a JSON object? Write the code to extract the parsed arguments.
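A minimal sketch for the extraction half of Q23, using a hard-coded `toolCall` object shaped like the response above:

```javascript
// Sketch for Q23: `arguments` arrives as a string, so parse it explicitly.
const toolCall = {
  function: {
    name: "search_users",
    arguments: '{"query": "hiking", "min_age": 25}',
  },
};

const args = JSON.parse(toolCall.function.arguments);
console.log(args.query, args.min_age); // → hiking 25
```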
**Q24.** How does Anthropic's tool use differ from OpenAI's function calling in terms of: (a) how you define tools, (b) how the response is structured, (c) how you access the arguments?

**Q25.** Write the complete round-trip flow for a weather assistant: define the tool, make the initial call, handle the tool call response, execute the function, send the result back, and get the final text response.

**Q26.** A developer uses function calling with `tool_choice: "required"` but doesn't plan to actually execute any function. What pattern is this? Why is it useful for structured output?

**Q27.** The model returns two tool calls in a single response: `get_weather("Paris")` and `get_events("Paris")`. Write the code to handle both, execute them in parallel, and send both results back.
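A possible Q27 sketch with stubbed tool implementations so it runs offline; the stub return values are invented for illustration.

```javascript
// Sketch for Q27: execute multiple tool calls in parallel, then format
// each result as a "tool" message to append before the follow-up API call.
const tools = {
  get_weather: async (args) => ({ city: args.city, forecast: "sunny" }),
  get_events: async (args) => ({ city: args.city, events: ["open-air cinema"] }),
};

// Shaped like message.tool_calls from the API response.
const toolCalls = [
  { id: "call_1", function: { name: "get_weather", arguments: '{"city": "Paris"}' } },
  { id: "call_2", function: { name: "get_events", arguments: '{"city": "Paris"}' } },
];

async function runToolCalls(toolCalls) {
  return Promise.all(
    toolCalls.map(async (tc) => ({
      role: "tool",
      tool_call_id: tc.id,
      content: JSON.stringify(
        await tools[tc.function.name](JSON.parse(tc.function.arguments))
      ),
    }))
  );
}

runToolCalls(toolCalls).then((msgs) => console.log(msgs.length)); // → 2
```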
**Q28.** Explain why function calling provides better schema enforcement than JSON mode. What mechanisms enforce the schema?

**Q29.** You're choosing between JSON mode, Structured Outputs, and function calling for a feature that extracts structured data from user messages. The data is displayed in the UI — no real functions need to run. Which approach do you choose and why?
---

## 4.5.d — Validating Returned Structure (Q30–Q39)
**Q30.** List five things that can go wrong with AI-generated JSON even when JSON mode is enabled.

**Q31.** Write a `safeJsonParse()` function that returns `{ success, data, error }` instead of throwing. Why is this pattern preferred over try/catch at every call site?
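A minimal `safeJsonParse()` sketch for Q31:

```javascript
// Sketch for Q31: callers branch on `success` instead of wrapping
// every parse call in its own try/catch.
function safeJsonParse(text) {
  try {
    return { success: true, data: JSON.parse(text), error: null };
  } catch (err) {
    return { success: false, data: null, error: err.message };
  }
}

console.log(safeJsonParse('{"ok": true}').success); // → true
console.log(safeJsonParse("not json").success);     // → false
```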
**Q32.** You receive `{ "age": "30", "score": "85.5", "active": "true" }` but your schema expects `{ age: number, score: number, active: boolean }`. Write a type coercion function that safely converts these values.
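One possible coercion helper for Q32; returning `undefined` for unconvertible values is a design choice, not the only option.

```javascript
// Sketch for Q32: coerce stringified primitives back to the expected types,
// returning undefined when a value can't be converted safely.
function coerce(value, type) {
  if (type === "number") {
    const n = Number(value);
    return Number.isNaN(n) ? undefined : n;
  }
  if (type === "boolean") {
    if (value === true || value === "true") return true;
    if (value === false || value === "false") return false;
    return undefined;
  }
  return value;
}

const raw = { age: "30", score: "85.5", active: "true" };
console.log(coerce(raw.age, "number"), coerce(raw.score, "number"), coerce(raw.active, "boolean"));
// → 30 85.5 true
```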
**Q33.** Design a validation function for this schema:

```
{
  "user_id": string (required, UUID format),
  "rating": integer (required, 1-5),
  "tags": array of strings (required, 1-10 items),
  "comment": string or null (optional),
  "recommend": boolean (required)
}
```
Include checks for: existence, type, range, format, and array length.
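One possible validator for the Q33 schema, returning a list of error strings (an empty list means valid). The UUID regex is a common pattern, assumed here rather than taken from the lesson.

```javascript
// Sketch for Q33: checks existence, type, range, format, and array length.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function validateReview(data) {
  const errors = [];
  if (typeof data.user_id !== "string" || !UUID_RE.test(data.user_id))
    errors.push("user_id must be a UUID string");
  if (!Number.isInteger(data.rating) || data.rating < 1 || data.rating > 5)
    errors.push("rating must be an integer 1-5");
  if (!Array.isArray(data.tags) || data.tags.length < 1 || data.tags.length > 10 ||
      !data.tags.every((t) => typeof t === "string"))
    errors.push("tags must be 1-10 strings");
  if (data.comment !== null && data.comment !== undefined && typeof data.comment !== "string")
    errors.push("comment must be a string or null");
  if (typeof data.recommend !== "boolean")
    errors.push("recommend must be a boolean");
  return errors;
}

const ok = {
  user_id: "123e4567-e89b-12d3-a456-426614174000",
  rating: 4, tags: ["fast"], comment: null, recommend: true,
};
console.log(validateReview(ok)); // → []
```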
**Q34.** What is a "validate-or-retry loop"? Draw the flow and explain when to retry vs when to fail.
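A sketch of the Q34 loop with an injectable `callModel` function, so it can be exercised without an API key; the retry-message wording is illustrative.

```javascript
// Sketch for Q34: call → parse → validate → retry with error feedback → fail.
async function getValidJson(messages, callModel, validate, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const raw = await callModel(messages);
    try {
      const data = JSON.parse(raw);
      const errors = validate(data);
      if (errors.length === 0) return data; // valid → done
      // Invalid: tell the model exactly what was wrong before retrying (see Q37).
      messages = [...messages,
        { role: "assistant", content: raw },
        { role: "user", content: `Fix these validation errors and resend: ${errors.join("; ")}` },
      ];
    } catch {
      messages = [...messages, { role: "user", content: "Respond with valid JSON only." }];
    }
  }
  throw new Error("Validation failed after max retries");
}
```

Injecting `callModel` also makes the loop trivially testable with a stub that fails once and then succeeds.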
**Q35.** When the model's response fails validation, should you just call the API again with the same messages? Why or why not? What should you add to the messages?

**Q36.** You need to handle extra fields the model added (like `_reasoning` or `personality_type`). Write a `stripExtraFields()` function. Should you log the stripped fields? Why?
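A possible `stripExtraFields()` sketch for Q36, which also answers the logging sub-question by warning on dropped keys:

```javascript
// Sketch for Q36: keep only whitelisted keys, and report what was dropped
// so unexpected model behavior shows up in logs.
function stripExtraFields(data, allowedKeys) {
  const cleaned = {};
  const stripped = [];
  for (const key of Object.keys(data)) {
    if (allowedKeys.includes(key)) cleaned[key] = data[key];
    else stripped.push(key);
  }
  if (stripped.length) console.warn("Stripped unexpected fields:", stripped);
  return cleaned;
}

const result = stripExtraFields(
  { score: 72, _reasoning: "they both like hiking" },
  ["score"]
);
console.log(result); // → { score: 72 }
```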
**Q37.** **Debugging:** Your validate-or-retry loop is stuck in an infinite loop — the model keeps returning the same invalid structure. What went wrong and how do you fix it?

**Q38.** Compare validation strategies for three risk levels:

- (a) Low risk: displaying AI-suggested conversation starters in a UI
- (b) Medium risk: saving extracted user data to a database
- (c) High risk: calculating financial compatibility scores for a premium feature

**Q39.** **Hands-on:** Build a complete `JSONResponsePipeline` class with: `execute(messages, schema)`, `validate(data)`, `clean(data)`, `buildRetryMessages(error)`. Include at least 3 layers of validation.
---

## 4.5.e — Building Structured Profile Analysis (Q40–Q50)
**Q40.** In the profile analysis system prompt, why do we include a score range guide (0-20 = very low, 21-40 = below average, etc.)? What happens if we don't?

**Q41.** The system prompt says "Do not invent details not present in the profiles." Why is this instruction necessary? What specific hallucination risk does it address?

**Q42.** Why do we use `temperature: 0` for the compatibility analysis? Name one scenario where you might intentionally use a higher temperature.

**Q43.** The `cleanAnalysis()` function rounds the score and trims arrays. Why not just reject the response if the score isn't already an integer? What's the trade-off between strict rejection and lenient cleaning?

**Q44.** Calculate the approximate cost of running 50,000 compatibility analyses per day using GPT-4o. Include the cost impact of a 10% retry rate.

**Q45.** **Design exercise:** Extend the compatibility analysis schema to include a `personality_match` object with `summary`, `energy_level`, and `communication_style`. Write the updated system prompt section and validation function.

**Q46.** Your compatibility analysis works great with OpenAI. Now you need to support Claude as a fallback. Write the Anthropic version using the prefill technique. What do you need to change?

**Q47.** Write a complete test suite for the `analyzeCompatibility()` function. Include tests for: normal case, low compatibility, minimal profiles, validation failure, and API error handling.

**Q48.** The product team wants to add an "Explain this score" feature. How would you extend the JSON schema to include a `score_explanation` field? What prompt instructions would you add?

**Q49.** A user reports that the same two profiles got a score of 72 yesterday and 78 today. Is this a bug? What are the possible causes, and how do you investigate?

**Q50.** **Architecture exercise:** Design a production system that runs compatibility analysis at scale: API gateway, queue, worker pool, caching (should you cache results?), monitoring, and cost alerts. Sketch the architecture and explain each component.
---

## Cross-Cutting Questions (Q51–Q55)
**Q51.** Trace the full lifecycle of a JSON response from the LLM: API call → JSON mode → raw string → parse → validate → clean → use. At which step does each type of error get caught?

**Q52.** You're building a system that needs to work with both OpenAI and Anthropic APIs. Design an abstraction layer that provides a unified `getStructuredResponse(prompt, schema)` function.

**Q53.** Your team is debating: "Should we use JSON mode + prompt-based schema, or Structured Outputs, or function calling for all our extraction tasks?" Write a decision matrix with criteria: reliability, flexibility, token cost, implementation effort, and provider portability.

**Q54.** A junior developer submitted a PR that calls the OpenAI API with JSON mode but has no validation, no error handling, and directly accesses `data.score` without checking if it exists. Write a code review with specific action items.

**Q55.** You're given a budget of $100/day for LLM API costs. Design a system that maximizes the number of profile analyses while staying within budget. Consider: model choice, token optimization, caching, and batch processing.
---

## Answer Hints
| Q | Hint |
|---|---|
| Q1 | Valid syntax YES, correct schema NO |
| Q2 | OpenAI returns an error — the word "json" must appear in the prompt |
| Q7 | Truncation means incomplete JSON — increase `max_tokens` and retry |
| Q10 | JSON mode does not increase token cost; minor latency impact |
| Q15 | `null` is most programmatic — `if (value !== null)` is cleaner than checking for `"N/A"` |
| Q23 | `arguments` is a string to allow streaming; parse with `JSON.parse(toolCall.function.arguments)` |
| Q26 | "Function calling as structured output" pattern — schema enforced without real function execution |
| Q34 | Loop: call → parse → validate → (valid? return) / (invalid? add error to messages → call again) → max retries → fail |
| Q37 | The retry message must include WHAT was wrong; add the validation errors to the retry prompt |
| Q42 | Temperature 0 for consistency; higher temperature if you want varied conversation starters each refresh |
| Q44 | ~$0.004/call × 50,000 = $200/day. With 10% retry: $200 × 1.1 = $220/day |
| Q49 | Not necessarily a bug — GPU floating-point variance, model version updates, or non-zero temperature. Investigate: check logs for temperature, model version, system fingerprint |
---

*← Back to 4.5 — Generating JSON Responses from LLMs (README)*