Episode 4 — Generative AI Engineering / 4.5 — Generating JSON Responses from LLMs
4.5 — Generating JSON Responses from LLMs: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim before labs or interviews.
- Drill gaps — reopen README.md → 4.5.a…4.5.e.
- Practice — 4.5-Exercise-Questions.md.
- Polish answers — 4.5-Interview-Questions.md.
Core vocabulary
| Term | One-liner |
|---|---|
| JSON mode | response_format: { type: "json_object" } — guarantees valid JSON syntax |
| Structured Outputs | response_format: { type: "json_schema" } — guarantees valid JSON + exact schema |
| Schema-based prompting | Embedding JSON structure/types/examples in the prompt to guide model output |
| Function calling | Model returns function name + typed arguments instead of text (a.k.a. tool calling) |
| Tool use | Anthropic's term for function calling |
| tool_calls | Response field containing function name + arguments when model wants to call a tool |
| tool_choice | Controls whether model must/may/cannot call tools (auto, none, required, specific) |
| Assistant prefilling | Starting Claude's response with { to force JSON output |
| Validate-or-retry | Parse → validate → (pass or retry with error feedback) → max retries → fallback |
| Type coercion | Safely converting "30" → 30, "true" → true when the model returns wrong types |
JSON mode — the basics
```typescript
// OpenAI: JSON mode
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' }, // Guarantees valid JSON
  temperature: 0,
  messages: [
    { role: 'system', content: 'Return JSON with "name" and "age".' }, // MUST mention JSON
    { role: 'user', content: 'Alice is 30.' },
  ],
});
const data = JSON.parse(response.choices[0].message.content);
```
JSON mode guarantees: Valid syntax (parseable by JSON.parse())
JSON mode does NOT: Enforce field names, types, or structure
Structured Outputs: Guarantees syntax + exact schema match
Must mention "JSON": Yes (OpenAI requirement when using json_object)
Check finish_reason: "length" = truncated = invalid JSON
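The parse and truncation checks above can be wrapped in one small helper. A minimal sketch; `finishReason` stands in for whatever your SDK reports (e.g. `response.choices[0].finish_reason` in OpenAI's SDK):

```typescript
// Safe parse: returns null instead of throwing, and treats a
// truncated response (finish_reason === "length") as invalid.
function safeParseJson(text: string, finishReason?: string): unknown {
  if (finishReason === 'length') return null; // truncated → never valid JSON
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}
```

A `null` result means "retry or fall back", never "crash the request handler".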
Anthropic / Claude approach
```typescript
// Prompt-based: instruct JSON-only output
system: 'Respond with ONLY valid JSON, no other text.'

// Prefill technique: force JSON start
messages: [
  { role: 'user', content: 'Extract data...' },
  { role: 'assistant', content: '{' }, // Prefill
]
// Then prepend '{' to the response: '{' + response.content[0].text
```
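The prepend step can be isolated in a helper. A minimal sketch, assuming a Claude-style response where `claudeText` plays the role of `response.content[0].text` (Claude continues from the prefilled brace, so the brace itself is not in the returned text):

```typescript
// Reconstruct the full JSON object after prefilling the assistant turn with '{'.
function completePrefilledJson(claudeText: string): unknown {
  return JSON.parse('{' + claudeText);
}

// Example: the model's continuation of the prefilled '{'.
const claudeText = '"name": "Alice", "age": 30}';
const data = completePrefilledJson(claudeText) as { name: string; age: number };
```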
Schema prompting strategies
1. EXACT STRUCTURE: { "name": "string", "age": number }
2. CONCRETE EXAMPLE: Input: "Alice, 30" → Output: { "name": "Alice", "age": 30 }
3. TYPESCRIPT TYPES: type User = { name: string; age: number; interests: string[] }
4. FIELD DESCRIPTIONS: "age" (integer): The person's age as a number, not a string
5. COMBINED: TypeScript type + example + rules = maximum reliability
BEST PRACTICE:
System message = schema + rules (fixed, reusable)
User message = data to process (variable per request)
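Put together, a reusable system prompt might look like the sketch below. The wording and the `User` type are illustrative, not from any SDK:

```typescript
// Fixed system prompt: TypeScript type + concrete example + rules (strategy 5).
const SYSTEM_PROMPT = [
  'You are an extraction engine. Respond with ONLY valid JSON.',
  'Output must match this TypeScript type:',
  'type User = { name: string; age: number; interests: string[] }',
  'Example: Input: "Alice, 30, hiking" → Output: {"name":"Alice","age":30,"interests":["hiking"]}',
  'Rules: "age" is an integer, not a string. Use null for unknown fields, never omit them.',
].join('\n');

// Per-request user message carries only the variable data:
const messages = [
  { role: 'system', content: SYSTEM_PROMPT },
  { role: 'user', content: 'Bob is 42 and likes chess.' },
];
```

Keeping the schema in the system message means it can be cached and reused across every request.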
Function calling vs JSON mode
| | JSON mode | Function calling |
|---|---|---|
| Returns | message.content | message.tool_calls |
| Purpose | return data | trigger an action |
| Schema | prompt-guided | defined in tool parameters |
| Types | not enforced | enforced by schema |
| Follow-up | none | send function result back |
```typescript
// Function calling flow
tools: [{ type: 'function', function: { name, description, parameters } }]
tool_choice: 'auto' | 'none' | 'required' | { type: 'function', function: { name } }

// Response
message.tool_calls[0].function.name      // Function name (string)
message.tool_calls[0].function.arguments // JSON string — must JSON.parse()
message.tool_calls[0].id                 // ID for sending result back
```
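The response fields above can be exercised against a mocked response object (no API call here; the shape follows OpenAI's chat completions format, and `get_weather` is a made-up tool):

```typescript
// Mocked OpenAI-style message: note that arguments arrive as a JSON *string*.
const message = {
  tool_calls: [{
    id: 'call_123',
    type: 'function',
    function: { name: 'get_weather', arguments: '{"city":"Paris","unit":"celsius"}' },
  }],
};

const call = message.tool_calls[0];
const args = JSON.parse(call.function.arguments); // parse: it is a string, not an object

// After executing the function, send the result back with role "tool"
// and the matching tool_call_id so the model can finish its answer:
const toolResult = { role: 'tool', tool_call_id: call.id, content: '{"temp": 18}' };
```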
Validation layers
Layer 1: SAFE PARSE
try { JSON.parse(text) } catch { handle error }
Check finish_reason !== "length" (truncation)
Layer 2: REQUIRED FIELDS
Check all expected keys exist in parsed object
Layer 3: TYPE CHECK
typeof data.age === 'number'
Array.isArray(data.items)
typeof data.name === 'string'
Layer 4: RANGE / CONSTRAINT
0 <= score <= 100
array.length >= 2 && array.length <= 5
value in ['high', 'medium', 'low']
Layer 5: CLEAN
Round floats to integers where needed
Trim arrays to max length
Strip extra fields
Apply defaults for missing optional fields
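The five layers can be combined into one function. A sketch against a hypothetical `{ score, tags }` schema; names and limits are illustrative:

```typescript
type Clean = { score: number; tags: string[] };

function validateAndClean(text: string, finishReason?: string): Clean | null {
  // Layer 1: safe parse + truncation check
  if (finishReason === 'length') return null;
  let data: any;
  try { data = JSON.parse(text); } catch { return null; }
  // Layer 2: required fields
  if (!('score' in data) || !('tags' in data)) return null;
  // Layer 3: type check
  if (typeof data.score !== 'number' || !Array.isArray(data.tags)) return null;
  // Layer 4: range / constraint
  if (data.score < 0 || data.score > 100) return null;
  // Layer 5: clean — round, trim to max length, strip extra fields
  return {
    score: Math.round(data.score),
    tags: data.tags.slice(0, 5).map(String),
  };
}
```

Returning a fresh object in layer 5 doubles as the allowlist: unexpected fields never survive.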
Validate-or-retry pattern
```
for (attempt = 1..3):
  response = callAPI(messages)
  parsed = JSON.parse(response)
  valid = validate(parsed)
  if valid: return clean(parsed)
  else: messages.push(errorFeedback)  ← FEED ERRORS BACK
if all fail: return defaultResponse
```
KEY: Don't just retry blindly — tell the model WHAT was wrong
"Your JSON had errors: score must be 0-100, got 150. Please fix."
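A runnable sketch of the loop, with `callAPI` stubbed to return an out-of-range score once and a valid one on retry; in real use it would wrap your model call:

```typescript
type Msg = { role: string; content: string };

// Stubbed model: fails validation on attempt 1, succeeds on attempt 2.
let attemptCount = 0;
async function callAPI(_messages: Msg[]): Promise<string> {
  attemptCount++;
  return attemptCount === 1 ? '{"score": 150}' : '{"score": 90}';
}

// Returns a specific error message, or null when the data is valid.
function validate(data: any): string | null {
  if (typeof data?.score !== 'number') return 'score must be a number';
  if (data.score < 0 || data.score > 100) return `score must be 0-100, got ${data.score}`;
  return null;
}

async function extractWithRetry(messages: Msg[], maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const text = await callAPI(messages);
    let parsed: any = null;
    try { parsed = JSON.parse(text); } catch { /* fall through to retry */ }
    const error = parsed === null ? 'response was not valid JSON' : validate(parsed);
    if (error === null) return parsed;
    // KEY STEP: feed the specific error back instead of retrying blindly.
    messages = [...messages,
      { role: 'assistant', content: text },
      { role: 'user', content: `Your JSON had errors: ${error}. Please fix.` },
    ];
  }
  return { score: 50 }; // fallback default after max retries
}
```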
Type coercion helpers
// Safe coercion (when rejection is too strict)
"30" → 30 (string to number)
"true" → true (string to boolean)
85.7 → 86 (float to integer via Math.round)
150 → 100 (clamp to valid range)
"HIGH" → "high" (normalize enum)
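These coercions can be written as small helpers; names are illustrative:

```typescript
// Lenient coercion: use when hard rejection is stricter than you need.
function coerceNumber(v: unknown): number | null {
  if (typeof v === 'number') return v;
  if (typeof v === 'string' && v.trim() !== '' && !isNaN(Number(v))) return Number(v);
  return null; // not coercible
}

function coerceBoolean(v: unknown): boolean | null {
  if (typeof v === 'boolean') return v;
  if (v === 'true') return true;
  if (v === 'false') return false;
  return null;
}

// Clamp out-of-range numbers instead of rejecting the whole response.
const clamp = (n: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, n));

// Normalize case before matching against an enum allowlist.
const normalizeEnum = (v: string, allowed: string[]): string | null => {
  const lower = v.trim().toLowerCase();
  return allowed.includes(lower) ? lower : null;
};
```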
The compatibility analysis pattern
System prompt:
1. Role: "You are a compatibility analyzer"
2. Schema: { compatibility_score, strengths, weaknesses, suggested_openers }
3. Rules: score 0-100, strengths 2-5 items, be specific not generic
4. Limits: "Return ONLY JSON, do not invent details"
API call:
model: 'gpt-4o'
temperature: 0
max_tokens: 1024
response_format: { type: 'json_object' }
Validation:
score: integer, 0-100
strengths: array of strings, 2-5 items
weaknesses: array of strings, 0-4 items
suggested_openers: array of strings, 2-3 items
Cost: ~$0.004 per analysis (~750-950 tokens)
Provider comparison
| Feature | OpenAI | Anthropic | Google Gemini |
|---|---|---|---|
| JSON mode | json_object | Prompt + prefill | response_mime_type |
| Structured Outputs | json_schema | Not available | Available |
| Function calling | tools + tool_choice | tools + tool_choice | tools |
| Arguments format | JSON string (parse it) | Parsed object | JSON string |
| Force specific tool | tool_choice: { function } | tool_choice: { tool } | Supported |
Decision guide
Need to EXECUTE actions? → Function calling
Need STRICT schema? → Structured Outputs (or forced tool_choice)
Need valid JSON, flexible? → JSON mode + schema prompt
Need multi-provider support? → JSON mode + prompt + validation (works everywhere)
Simple prototype? → JSON mode (lowest setup)
Validation by risk level
LOW (UI suggestions): Parse + basic type check + default on failure
MED (database writes): Parse + full schema + defaults + 1 retry
HIGH (financial/medical): Parse + strict schema + range check + 3 retries + human fallback
Cost estimation
GPT-4o:
Input: $2.50 / 1M tokens
Output: $10.00 / 1M tokens
Typical structured extraction:
System prompt: ~300-500 tokens (fixed)
User message: ~100-300 tokens (variable)
Model output: ~200-400 tokens
Total: ~600-1200 tokens
Cost: ~$0.003-0.006 per call
10K calls/day: ~$30-60/day
100K calls/day: ~$300-600/day (before caching)
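The arithmetic above as a tiny calculator, with the GPT-4o prices listed hard-coded (adjust per model and check current pricing):

```typescript
// Cost per call at $2.50 / 1M input tokens and $10.00 / 1M output tokens.
function costPerCall(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * 2.5 + (outputTokens / 1_000_000) * 10.0;
}

// Typical extraction: ~700 input tokens (system + user), ~300 output.
const perCall = costPerCall(700, 300); // ≈ $0.00475
const perDay10k = perCall * 10_000;    // ≈ $47.50/day at 10K calls
```

Output tokens dominate at 4x the input price, which is one more reason to keep schemas terse.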
Common gotchas
| Gotcha | Fix |
|---|---|
| Forgot "JSON" in prompt (OpenAI) | API returns error — mention JSON in system or user message |
| finish_reason: "length" | Increase max_tokens — truncated JSON is invalid |
| Model returns wrong field names | Use TypeScript types or exact template in prompt |
| arguments is a string not object | JSON.parse(toolCall.function.arguments) |
| Retry loop doesn't improve | Feed specific validation errors back to the model |
| Score clusters around 50-70 | Add score range guide in prompt (0-20, 21-40, etc.) |
| Claude wraps JSON in text | Use prefill technique: { role: 'assistant', content: '{' } |
| Extra fields in response | Strip with allowlist of expected fields |
| null vs missing field | Instruct model: "use null for missing, never omit fields" |
End of 4.5 quick revision.