Episode 4 — Generative AI Engineering / 4.5 — Generating JSON Responses from LLMs
Interview Questions: Generating JSON Responses from LLMs
Model answers for JSON mode, schema-based prompting, function calling, validation, and building structured AI features.
How to use this material (instructions)
- Read lessons in order — README.md, then 4.5.a → 4.5.e.
- Practice out loud — definition → example → pitfall.
- Pair with exercises — 4.5-Exercise-Questions.md.
- Quick review — 4.5-Quick-Revision.md.
Beginner (Q1–Q4)
Q1. How do you get an LLM to return valid JSON instead of free-form text?
Why interviewers ask: Tests if you know the fundamental tools for structured output — JSON mode, prompting strategies, and the distinction between syntax validity and schema correctness.
Model answer:
There are three main approaches, in increasing order of strictness:
1. JSON Mode (response_format: { type: "json_object" } in OpenAI). This constrains the model to output syntactically valid JSON — no markdown wrapping, no conversational text, just a JSON string that JSON.parse() will accept. You must mention "JSON" in your prompt for OpenAI to accept the request.
2. Schema-based prompting — including the exact JSON structure, TypeScript type definitions, or concrete examples in your prompt. This guides the model toward the correct keys, types, and nesting. JSON mode ensures the syntax; schema prompting ensures the structure.
3. Structured Outputs (response_format: { type: "json_schema" } with a full JSON Schema definition). This enforces both valid syntax AND exact schema compliance at the API level — guaranteed field names, types, and required fields.
For Anthropic/Claude, you use prompt engineering ("respond with ONLY valid JSON") and assistant prefilling (starting the assistant response with {) since there's no dedicated JSON mode parameter.
In production, always combine with validation — parse the JSON, check field types and ranges, and retry if validation fails.
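That final parse-and-validate step can be sketched in a few lines of JavaScript. The `name`/`age` schema below is a hypothetical example for illustration, not part of any SDK:

```javascript
// Minimal parse-then-validate sketch. JSON mode makes the parse step rarely
// fail, but the guard stays because truncated responses can still break it.
function parseAndValidate(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return { ok: false, errors: ["invalid JSON: " + err.message] };
  }
  const errors = [];
  if (typeof data.name !== "string") errors.push("name must be a string");
  if (typeof data.age !== "number") errors.push("age must be a number");
  return errors.length ? { ok: false, errors } : { ok: true, data };
}
```

The returned `errors` array is what you would feed back to the model on a retry.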
Q2. What is the difference between JSON mode and function calling?
Why interviewers ask: Both produce structured output, but they serve fundamentally different purposes. This tests architectural understanding.
Model answer:
JSON mode makes the model return a generic JSON object in message.content. You define the structure via prompting. The model simply returns data — no action is implied.
Function calling (tool calling) makes the model return a specific function name and arguments in message.tool_calls. The model is saying "call this function with these parameters." Your code then executes the function and optionally sends the result back for a final response.
Key differences:
- Response location: JSON mode → message.content (string). Function calling → message.tool_calls (array of structured objects).
- Schema enforcement: JSON mode doesn't enforce a schema (unless using Structured Outputs). Function calling enforces the schema defined in the tool's parameters.
- Intent: JSON mode = "here's data." Function calling = "please execute this action."
- Follow-up: JSON mode is one-shot. Function calling often involves a round-trip: model requests a tool call → you execute → you send results back → model generates final response.
Use JSON mode for data extraction and analysis. Use function calling when the model needs to trigger actions (API calls, database queries, calculations) or when you need strict schema enforcement without Structured Outputs.
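The response-location difference is easiest to see in code. The object below is a hand-written stand-in shaped like an OpenAI chat completion, not real API output:

```javascript
// Hand-written stand-in for a chat completion where the model chose a tool call.
const response = {
  choices: [{
    finish_reason: "tool_calls",
    message: {
      content: null, // with JSON mode, the JSON string would be here instead
      tool_calls: [{
        function: {
          name: "get_weather", // illustrative tool name
          arguments: '{"city":"Paris","unit":"celsius"}', // arrives as a JSON string
        },
      }],
    },
  }],
};

const call = response.choices[0].message.tool_calls[0];
const args = JSON.parse(call.function.arguments); // arguments still need parsing
```

Note that even with function calling, `arguments` is a JSON string you must parse yourself.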
Q3. Why must you validate AI-generated JSON even with JSON mode enabled?
Why interviewers ask: This is a critical production engineering concern. Candidates who skip validation build fragile systems.
Model answer:
JSON mode guarantees syntactically valid JSON but not structurally correct JSON. Five things can still go wrong:
- Wrong field names — the model returns "user_name" instead of your expected "username".
- Wrong types — "age": "thirty" instead of "age": 30.
- Missing fields — the model omits a required field entirely.
- Extra fields — the model adds fields you didn't ask for (chain-of-thought leaking, extra analysis).
- Out-of-range values — "score": 150 when your schema says 0-100.
Additionally, JSON.parse() itself can fail if finish_reason is "length" (response truncated before the JSON is complete).
Production approach: Parse → Validate → Clean → Use. Wrap JSON.parse() in try/catch, validate against your expected schema (required fields, types, ranges), apply defaults for missing fields, strip extra fields, and retry with error feedback if validation fails. Match validation rigor to risk level — UI suggestions can tolerate lenient validation, but financial data needs strict checks with human fallback.
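The Clean stage can be sketched as a single pass over an expected-field table. The `username`/`score` schema is a hypothetical example chosen to match the failure modes above:

```javascript
// Clean step: apply defaults for missing/wrong-typed fields, clamp ranges,
// and silently drop fields not in the expected schema.
const SCHEMA = {
  username: { type: "string", default: "unknown" },
  score: { type: "number", default: 0, min: 0, max: 100 },
};

function clean(data) {
  const out = {};
  for (const [key, rule] of Object.entries(SCHEMA)) {
    let value = data[key];
    if (typeof value !== rule.type) value = rule.default; // missing or wrong type
    if (rule.min !== undefined) value = Math.min(Math.max(value, rule.min), rule.max);
    out[key] = value; // fields absent from SCHEMA never reach `out`
  }
  return out;
}
```

Because only schema-listed keys are copied, extra fields (leaked reasoning, unrequested analysis) are stripped automatically.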
Q4. Explain schema-based prompting. Why isn't JSON mode enough?
Why interviewers ask: Tests whether you understand that syntax and structure are different problems requiring different solutions.
Model answer:
JSON mode ensures the output is parseable JSON. Schema-based prompting ensures the JSON has the correct keys, nesting, types, and content. Without it, asking "return user info as JSON" might produce {"name": "Alice"}, {"user": {"first": "Alice"}}, or {"person_name": "Alice"} — all valid JSON, all different structures.
Four effective strategies:
- Show the exact structure — include a template with placeholder types: { "first_name": "string", "age": number }.
- Provide concrete examples (few-shot) — show a complete input/output pair so the model pattern-matches.
- TypeScript type definitions — type User = { name: string; age: number; interests: string[] }. LLMs understand TypeScript natively from training data.
- Field descriptions — annotate each field: "age" (integer): The person's age as a number, not a string.
Best practice: put the schema in the system message (persistent, defines the contract) and the data in the user message (variable, one per request). Combine strategies for complex schemas: TypeScript types for structure + one example for demonstration + explicit rules for edge cases.
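Putting the schema/data split into practice looks like this. The wording of the system message is illustrative, not a fixed recipe:

```javascript
// Schema lives in the system message (the fixed contract);
// the variable input lives in the user message.
const systemMessage = [
  "You extract user data. Respond with ONLY a JSON object matching this TypeScript type:",
  "type User = { name: string; age: number; interests: string[] };",
  'Example output: {"name":"Alice","age":30,"interests":["hiking"]}',
  'Rules: "age" must be a number, not a string.',
].join("\n");

const messages = [
  { role: "system", content: systemMessage },          // persistent contract
  { role: "user", content: "Bob, 42, likes chess." },  // varies per request
];
```

Only the user message changes between requests, so the contract stays stable and cacheable.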
Intermediate (Q5–Q8)
Q5. Walk through how you'd build a validate-or-retry pipeline for structured LLM output.
Why interviewers ask: Tests practical engineering skills — handling non-deterministic systems gracefully instead of crashing on bad output.
Model answer:
The pipeline has five stages inside a retry loop:
for (attempt = 1; attempt <= maxRetries; attempt++) {
1. API Call → Send messages with JSON mode
2. Parse → JSON.parse() with try/catch, check finish_reason for truncation
3. Validate → Check required fields, types, ranges, array lengths
4. Clean → Round numbers, trim arrays, strip extra fields, apply defaults
5. Return → If valid, return cleaned data + metadata
On failure:
- Add the model's response AND the specific validation errors to the message history
- Retry with augmented context: "Your JSON had these errors: [errors]. Please fix."
}
On all retries exhausted: return default/fallback response
Critical details: (1) Feed validation errors back to the model — this is more effective than a blind retry. (2) Set a max retry count (typically 3). (3) Don't retry on auth errors (401) or bad request errors (400) — those won't fix themselves. (4) Use exponential backoff for rate limits (429). (5) Always have a fallback response so the application doesn't crash.
The success rate for well-prompted JSON mode is ~95%+ on the first attempt, rising to ~99%+ with 3 retries.
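The loop above can be written as a small provider-agnostic function. Here `callModel` is an injected stand-in for your actual API call, and `validate` returns an array of error strings (empty means valid); both names are assumptions for this sketch:

```javascript
// Validate-or-retry loop with error feedback. `callModel(messages)` is a
// stand-in for the provider call; it returns the raw response string.
async function getValidJSON(messages, validate, callModel, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const raw = await callModel(messages);
    let data;
    try {
      data = JSON.parse(raw);
    } catch {
      messages = [...messages,
        { role: "assistant", content: raw },
        { role: "user", content: "That was not valid JSON. Return ONLY valid JSON." }];
      continue;
    }
    const errors = validate(data);
    if (errors.length === 0) return { data, attempts: attempt };
    // Feeding the specific errors back beats a blind retry.
    messages = [...messages,
      { role: "assistant", content: raw },
      { role: "user", content: `Your JSON had these errors: ${errors.join("; ")}. Please fix.` }];
  }
  return { data: null, attempts: maxRetries }; // caller supplies the fallback
}
```

Rate-limit backoff and the no-retry rule for 400/401 errors would live inside `callModel`, keeping the loop itself purely about structure.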
Q6. How does function calling provide better schema enforcement than JSON mode?
Why interviewers ask: Tests depth of understanding about the structured output landscape and when to use each tool.
Model answer:
Function calling enforces schema at three levels that JSON mode alone does not:
- Parameter-level type enforcement — when you define age: { type: "integer" } in the function schema, the model is trained to generate an integer. With JSON mode, the model might return "age": "30" (string) even if your prompt says "number."
- Required field enforcement — the required array in the function schema tells the model which fields must be present. JSON mode relies on prompt instructions, which can be ignored.
- Enum constraints — enum: ["celsius", "fahrenheit"] restricts valid values. JSON mode would need prompt instructions like "must be one of: celsius, fahrenheit", which the model might not follow.
Additionally, function calling provides a named action (the function name), so the model understands the semantic purpose of the output. This improves accuracy compared to generic "return JSON" instructions.
A powerful pattern: use tool_choice: { type: "function", function: { name: "my_function" } } to force the model to return data through a tool schema even when you have no real function to execute. This gives you schema enforcement similar to Structured Outputs, available across more models.
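A request body for this forced-tool pattern might look like the sketch below. The tool name `record_person` is hypothetical, and the function exists only to carry a schema; nothing is ever executed:

```javascript
// "Forced tool call" request sketch: the tool is a schema carrier, not a real
// function. Field names follow the OpenAI chat completions request shape.
const requestBody = {
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Extract: Alice, 30, Paris." }],
  tools: [{
    type: "function",
    function: {
      name: "record_person", // hypothetical; never executed by our code
      parameters: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "integer" },
          city: { type: "string" },
        },
        required: ["name", "age", "city"],
      },
    },
  }],
  // Forces the model to answer through this tool's schema:
  tool_choice: { type: "function", function: { name: "record_person" } },
};
```

The structured result then comes back in `message.tool_calls[0].function.arguments` rather than `message.content`.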
Q7. How do you handle getting structured JSON from both OpenAI and Anthropic APIs?
Why interviewers ask: Tests practical multi-provider experience — real production systems often need provider flexibility.
Model answer:
The two providers have fundamentally different approaches:
OpenAI: Native JSON mode (response_format: { type: "json_object" }), Structured Outputs (json_schema), and function calling with typed schemas. Response is in message.content (JSON mode) or message.tool_calls (function calling).
Anthropic (Claude): No native JSON mode parameter. Instead: (1) Prompt engineering — instruct "respond with ONLY valid JSON" in the system message. (2) Assistant prefilling — put { in the assistant role to force JSON continuation. (3) Tool use — Claude's version of function calling with input_schema.
To support both, build an abstraction layer:
async function getStructuredJSON(prompt, schema, provider) {
if (provider === 'openai') {
// Use response_format: { type: 'json_object' }
// Parse from message.content
} else if (provider === 'anthropic') {
// Use system prompt + assistant prefill
// Prepend '{' to response, then parse
}
// Same validation and cleaning logic for both
}
Key point: validation is provider-independent. The same validate() and clean() functions work regardless of which API generated the JSON. This is where you get the most reuse.
Q8. Design the system prompt for a feature that generates structured profile analysis JSON.
Why interviewers ask: Tests prompt engineering for structured output — a concrete, practical skill.
Model answer:
A production system prompt needs four sections:
1. Role and context: "You are a dating app compatibility analyzer. Given two user profiles, analyze their compatibility."
2. Exact output schema — either TypeScript types or a JSON template with placeholder types:
{ "compatibility_score": integer 0-100, "strengths": [string, ...], ... }
3. Field-level rules — what each field should contain, valid ranges, minimum/maximum counts:
- "compatibility_score: integer 0-100. 0-20 very low, 81-100 excellent."
- "strengths: 2-5 specific factors referencing actual profile details."
- "suggested_openers: 2-3 personalized messages the user could send."
4. Constraints — rules the model must follow:
- "Return ONLY the JSON object"
- "Do not invent details not present in the profiles"
- "All array items must be strings"
- "Be specific, not generic"
The score range guide (0-20, 21-40, etc.) is critical — without it, models cluster scores around 50-70 regardless of actual compatibility. Specificity instructions prevent useless generic output like "Both enjoy activities."
Keep the schema in the system message (fixed, reusable) and the profiles in the user message (variable per request).
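Assembled, the four sections might read as follows. The exact wording is illustrative, not a canonical prompt:

```javascript
// The four sections of the system prompt in one template literal.
const SYSTEM_PROMPT = `You are a dating app compatibility analyzer. Given two user profiles, analyze their compatibility.

Return ONLY a JSON object with this exact shape:
{ "compatibility_score": <integer 0-100>, "strengths": [<string>, ...], "suggested_openers": [<string>, ...] }

Field rules:
- compatibility_score: integer 0-100. 0-20 very low, 21-40 low, 41-60 moderate, 61-80 good, 81-100 excellent.
- strengths: 2-5 specific factors referencing actual profile details.
- suggested_openers: 2-3 personalized messages the user could send.

Constraints:
- Do not invent details not present in the profiles.
- All array items must be strings. Be specific, not generic.`;
```

The profiles themselves would arrive in the user message, one pair per request, while this prompt stays fixed.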
Advanced (Q9–Q11)
Q9. You need to process 100,000 profile analyses per day reliably. How do you architect this system?
Why interviewers ask: Tests system design thinking for AI-powered features at scale — cost, reliability, latency, and monitoring.
Model answer:
Architecture:
- API Gateway — receives analysis requests, rate-limits, authenticates users.
- Queue (SQS, Bull, etc.) — buffers requests to handle burst traffic without overloading the LLM API.
- Worker Pool — 10-20 workers pulling from the queue, each making OpenAI API calls. Configured with the retry pipeline (parse → validate → retry up to 3x).
- Result Cache — hash the two profile IDs (sorted) to create a cache key. Cache results for 24 hours. If profiles haven't changed, serve cached result. This alone can reduce API calls by 30-50% if users revisit the same profiles.
- Fallback Provider — if OpenAI is down or rate-limited, fall back to Anthropic with the same validation pipeline.
- Monitoring — track: success rate, retry rate, average latency, tokens per call, validation failure types, cost per hour, cache hit rate.
- Cost Alerts — if hourly spend exceeds threshold (e.g., $15/hour for $360/day budget), alert and optionally throttle.
Cost math: 100K calls × ~$0.004 = $400/day. With caching (40% hit rate): 60K actual API calls = $240/day. With 10% retries: $264/day.
Optimization levers: (1) Use a cheaper model (GPT-4o-mini at ~$0.0006/call = $60/day). (2) Batch similar requests. (3) Aggressive caching. (4) Shorter system prompt. (5) Reduce max_tokens.
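The result-cache key described above can be sketched in one function. The `analysis:` prefix is an assumption; in production you might also hash the string, but the sketch stays dependency-free:

```javascript
// Cache-key sketch: sort the two profile IDs so (A, B) and (B, A) share one
// cache entry. TTL/invalidation handling lives in the cache layer, not here.
function cacheKey(profileIdA, profileIdB) {
  return "analysis:" + [profileIdA, profileIdB].sort().join(":");
}
```

Sorting is the important detail: without it, each direction of the same pair would miss the other's cached result.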
Q10. How do you evaluate and improve the quality of structured LLM output over time?
Why interviewers ask: Production AI features need ongoing quality management, not just initial deployment.
Model answer:
1. Build an evaluation dataset. Create 200+ profile pairs with human-annotated expected outputs: expected score range, required strengths, unacceptable weaknesses. Include edge cases: identical profiles, completely incompatible profiles, minimal profiles.
2. Automated evaluation. Run the model on the eval set weekly. Score on: schema compliance (100% required), score accuracy (within 15 points of human annotation), strength relevance (reference actual profile details), opener quality (personalized, not generic).
3. Production monitoring. Sample 1-5% of production outputs for human review. Track: retry rate (should be <10%), validation failure rate by field, score distribution (watch for clustering), user engagement signals (do users who see the analysis actually send openers?).
4. Prompt iteration. When quality drops, analyze failure patterns. Common fixes: more explicit field rules, additional examples for edge cases, rephrasing ambiguous instructions. Always A/B test prompt changes against the eval set before deploying.
5. Model migration testing. When updating models (e.g., GPT-4o → GPT-4o-mini for cost), run the full eval suite. Some models follow schema instructions better; others produce more creative content but less reliable structure.
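Two of the automated checks from step 2, schema compliance and score accuracy within 15 points of the human annotation, can be sketched as a per-case scorer. The field names mirror the earlier hypothetical schema:

```javascript
// Per-case eval sketch: structural check plus score-accuracy check.
function evalCase(output, expected) {
  const schemaOk = typeof output.compatibility_score === "number";
  const scoreOk = schemaOk &&
    Math.abs(output.compatibility_score - expected.score) <= 15;
  return { schemaOk, scoreOk };
}
```

Aggregating these booleans across the 200+ eval cases gives the weekly compliance and accuracy rates the answer describes.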
Q11. A teammate proposes using Structured Outputs for everything instead of JSON mode + validation. Evaluate this approach.
Why interviewers ask: Tests nuanced understanding of trade-offs — there's no universally best approach.
Model answer:
Pros of Structured Outputs everywhere:
- Guaranteed schema compliance — field names, types, required fields, and additionalProperties: false are enforced at the API level.
- Less validation code — you still need semantic validation, but type/structure checks are unnecessary.
- No retry needed for structural issues — retries only for semantic quality.
Cons:
- Provider lock-in — Structured Outputs is an OpenAI-specific feature. If you need Anthropic fallback, you need the full validation pipeline anyway.
- Schema rigidity — once defined, the schema can't flex. If the model wants to add a useful field, it can't. JSON mode with validation can be more forgiving.
- Not all models support it — older models, local models (Ollama), and other providers may not support json_schema.
- Complex schemas — deeply nested schemas with conditional fields are hard to express in JSON Schema and may confuse the model.
- Output quality — the model may fill required fields with low-quality content just to satisfy the schema ("N/A" strings, empty arrays where it should have said something).
My recommendation: Use Structured Outputs for simple, stable schemas on OpenAI. Use JSON mode + validation for complex schemas, multi-provider setups, and evolving schemas. Always validate semantics regardless — "is this output useful?" is never guaranteed by any structural enforcement.
Quick-fire
| # | Question | One-line answer |
|---|---|---|
| 1 | What does JSON mode guarantee? | Syntactically valid JSON — not schema correctness |
| 2 | Must you mention "JSON" in the prompt with OpenAI JSON mode? | Yes — the API returns an error if you don't |
| 3 | Where are function call arguments in the response? | message.tool_calls[0].function.arguments (JSON string) |
| 4 | What's finish_reason: "tool_calls"? | Model wants to call a function instead of returning text |
| 5 | Should you validate even with Structured Outputs? | Yes — validate semantics (ranges, content quality) even when structure is guaranteed |
| 6 | What's assistant prefilling? | Starting the assistant response with { to force Claude to output JSON |
| 7 | How many retries for validation failure? | Typically 3 — with error feedback in each retry prompt |
| 8 | JSON mode vs Structured Outputs? | JSON mode = valid syntax. Structured Outputs = valid syntax + exact schema |
| 9 | Temperature for structured extraction? | 0 — deterministic, consistent output |
| 10 | How to handle extra fields from the model? | Strip them — only keep fields in your expected schema |
| 11 | Approximate cost per compatibility analysis (GPT-4o)? | ~$0.004 (less than half a cent) |
← Back to 4.5 — Generating JSON Responses from LLMs (README)