Episode 4 — Generative AI Engineering / 4.5 — Generating JSON Responses from LLMs

# 4.5.a — JSON Mode

**In one sentence:** JSON mode tells the LLM to output only valid JSON — no markdown, no explanation, no "here's the JSON:" prefix — by setting `response_format: { type: "json_object" }`. It guarantees syntactically valid JSON but does not guarantee that the structure matches your schema.

Navigation: ← 4.5 Overview · 4.5.b — Schema-Based Prompting →


## 1. The Problem: LLMs Love to Talk

By default, when you ask an LLM for JSON, you get something like this:

````
Sure! Here's the JSON you requested:

```json
{
  "name": "Alice",
  "age": 30
}
```

Hope that helps! Let me know if you need anything else.
````


That response contains valid JSON — buried inside markdown code fences and wrapped in conversational text. Your `JSON.parse()` call will fail because the response isn't **pure JSON**. You could try to extract the JSON with regex, but that's fragile and error-prone.

**JSON mode solves this.** When enabled, the model's output is guaranteed to be a valid JSON string — nothing before it, nothing after it, no markdown formatting.

---

## 2. How JSON Mode Works (OpenAI)

OpenAI introduced JSON mode via the `response_format` parameter:

```javascript
import OpenAI from 'openai';

const openai = new OpenAI();

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' },
  messages: [
    {
      role: 'system',
      content: 'You are a helpful assistant. Respond in JSON format with a "name" and "age" field.'
    },
    {
      role: 'user',
      content: 'Tell me about Alice who is 30 years old.'
    }
  ],
});

const data = JSON.parse(response.choices[0].message.content);
console.log(data);
// { name: "Alice", age: 30 }

```

### What `response_format: { type: "json_object" }` does

1. **Constrains the model to only output valid JSON tokens** — it cannot produce text that would break `JSON.parse()`.
2. **Eliminates wrapper text** — no "Sure, here's the JSON:" preamble.
3. **Guarantees valid syntax** — balanced braces, proper quoting, correct comma placement.
4. **Does NOT enforce structure** — the model might return `{ "user": "Alice", "years": 30 }` instead of `{ "name": "Alice", "age": 30 }` unless you tell it the schema.
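
Point 4 is worth making concrete: both objects below parse cleanly, but only one has the intended shape. A minimal post-parse check (a hypothetical helper, not part of any SDK) catches the drift:

```javascript
// Two syntactically valid responses to the same request.
const expected = JSON.parse('{"name": "Alice", "age": 30}');
const drifted = JSON.parse('{"user": "Alice", "years": 30}');

// Minimal shape check: right keys, right types.
// Hypothetical helper; real code often uses a schema library instead.
function matchesPersonShape(obj) {
  return typeof obj.name === 'string' && typeof obj.age === 'number';
}

console.log(matchesPersonShape(expected)); // true
console.log(matchesPersonShape(drifted));  // false: valid JSON, wrong keys
```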

### Critical requirement: You MUST mention JSON in your prompt

```javascript
// This will ERROR or produce unexpected results
const bad = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' },
  messages: [
    { role: 'user', content: 'Tell me about Paris.' }  // No mention of JSON!
  ],
});
// OpenAI will return an error:
// "When using JSON mode, you must include the word 'json' in the prompt"
```

**Fix:** Always instruct the model to respond in JSON somewhere in the system or user message.

```javascript
// This works
const good = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' },
  messages: [
    {
      role: 'system',
      content: 'You are a travel guide. Always respond in JSON format.'
    },
    { role: 'user', content: 'Tell me about Paris.' }
  ],
});
```

## 3. How Anthropic Handles JSON Output

Anthropic (Claude) does not have an identical `response_format` parameter. Instead, Claude offers several approaches:

### Approach 1: Prompt-based JSON instruction

```javascript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  system: 'You are a data extraction assistant. Always respond with ONLY valid JSON, no other text.',
  messages: [
    {
      role: 'user',
      content: 'Extract the name and age from: "Alice is 30 years old." Return JSON with "name" and "age" fields.'
    }
  ],
});

const data = JSON.parse(response.content[0].text);
console.log(data);
// { name: "Alice", age: 30 }
```

### Approach 2: Prefilling the assistant response

Claude supports prefilling — you start the assistant's response to force JSON output:

```javascript
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  system: 'You are a data extraction assistant.',
  messages: [
    {
      role: 'user',
      content: 'Extract the name and age from: "Alice is 30 years old." Return as JSON.'
    },
    {
      role: 'assistant',
      content: '{'  // Prefill forces JSON output starting with {
    }
  ],
});

// Note: the response continues from where the prefill left off
const jsonString = '{' + response.content[0].text;
const data = JSON.parse(jsonString);
console.log(data);
// { name: "Alice", age: 30 }
```

**How prefilling works:** By placing an opening brace `{` in the assistant turn, you tell Claude "your response has already started with `{` — continue from there." Claude will then continue generating valid JSON because it's completing a JSON object. This is a powerful technique that's unique to Anthropic's API.
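
The reassembly step is the part people forget: the model's text continues after the prefill, so the prefill must be prepended before parsing. A tiny sketch with no API call — the continuation string is a stand-in for `response.content[0].text`:

```javascript
// Prepend the prefill before parsing; the model's output alone is not valid JSON.
function parsePrefilled(prefill, continuation) {
  return JSON.parse(prefill + continuation);
}

// Stand-in for the text a prefilled request might return.
const continuation = '"name": "Alice", "age": 30}';
console.log(parsePrefilled('{', continuation)); // { name: 'Alice', age: 30 }
```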

### Approach 3: Tool use for structured output

Claude also supports tool use (function calling), which provides schema-enforced structured output. We'll cover this in 4.5.c.


## 4. JSON Mode vs Free-Form with Parsing

Before JSON mode existed, developers had to extract JSON from free-form responses:

### The old way (fragile)

```javascript
// Ask the model normally
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      content: 'Return your answer as a JSON object with "name" and "age" fields.'
    },
    { role: 'user', content: 'Info about Alice, age 30.' }
  ],
  // No response_format — free-form output
});

const raw = response.choices[0].message.content;
// raw might be: '```json\n{"name": "Alice", "age": 30}\n```'
// or: 'Here is the JSON:\n{"name": "Alice", "age": 30}'
// or: '{"name": "Alice", "age": 30}'

// Fragile extraction
function extractJSON(text) {
  // Try direct parse first
  try {
    return JSON.parse(text);
  } catch (e) {
    // Try to find JSON in code fence
    const match = text.match(/```(?:json)?\s*([\s\S]*?)```/);
    if (match) {
      return JSON.parse(match[1].trim());
    }
    // Try to find JSON object
    const objMatch = text.match(/\{[\s\S]*\}/);
    if (objMatch) {
      return JSON.parse(objMatch[0]);
    }
    throw new Error('Could not extract JSON from response');
  }
}

const data = extractJSON(raw);
```
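
To see the fallbacks fire, the extractor can be exercised against the three response shapes above. The function is repeated here so the snippet runs standalone:

```javascript
// Same extractor as above, repeated so this snippet is self-contained.
function extractJSON(text) {
  try {
    return JSON.parse(text);  // Shape: already pure JSON
  } catch (e) {
    const match = text.match(/```(?:json)?\s*([\s\S]*?)```/);
    if (match) return JSON.parse(match[1].trim());  // Shape: code fence
    const objMatch = text.match(/\{[\s\S]*\}/);
    if (objMatch) return JSON.parse(objMatch[0]);   // Shape: embedded object
    throw new Error('Could not extract JSON from response');
  }
}

const shapes = [
  '```json\n{"name": "Alice", "age": 30}\n```',
  'Here is the JSON:\n{"name": "Alice", "age": 30}',
  '{"name": "Alice", "age": 30}',
];
const results = shapes.map(extractJSON);
// Every shape yields { name: "Alice", age: 30 }
```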

### The new way (JSON mode)

```javascript
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' },
  messages: [
    {
      role: 'system',
      content: 'Return a JSON object with "name" and "age" fields.'
    },
    { role: 'user', content: 'Info about Alice, age 30.' }
  ],
});

// Guaranteed to be valid JSON — no extraction needed
const data = JSON.parse(response.choices[0].message.content);
```

### Comparison

| Aspect | Free-form + Parsing | JSON Mode |
|---|---|---|
| Valid JSON guaranteed? | No — model might wrap in markdown, add commentary | Yes — always valid syntax |
| Extra parsing code? | Yes — regex extraction, multiple fallbacks | No — direct `JSON.parse()` |
| Failure rate | 5–15% of responses need extraction | <0.1% syntax errors |
| Schema enforcement? | No | No (just syntax) |
| Works with all models? | Yes — any LLM | Only models that support it |
| Token efficiency | Worse — model wastes tokens on wrapper text | Better — pure JSON only |

## 5. JSON Mode vs Structured Outputs

OpenAI also offers **Structured Outputs** — a stricter version of JSON mode that enforces a specific JSON Schema:

```javascript
// JSON Mode — guarantees valid JSON, but not specific fields
const jsonMode = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' },
  messages: [
    { role: 'system', content: 'Respond in JSON with name and age.' },
    { role: 'user', content: 'Alice is 30.' }
  ],
});
// Could return { "name": "Alice", "age": 30 }
// Could also return { "person": "Alice", "years_old": 30 }  -- valid JSON, wrong schema!

// Structured Outputs — guarantees specific JSON Schema
const structured = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'person_info',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string', description: 'The person\'s name' },
          age: { type: 'integer', description: 'The person\'s age' }
        },
        required: ['name', 'age'],
        additionalProperties: false
      }
    }
  },
  messages: [
    { role: 'system', content: 'Extract person information.' },
    { role: 'user', content: 'Alice is 30.' }
  ],
});
// GUARANTEED to return exactly { "name": "...", "age": ... }
// with correct types and no extra fields
```

### When to use which

| Feature | JSON Mode | Structured Outputs |
|---|---|---|
| Valid JSON | Yes | Yes |
| Schema enforcement | No — model chooses keys/structure | Yes — exact schema match |
| Type enforcement | No — age could be `"30"` or `30` | Yes — integer means integer |
| Required fields | No guarantee | Guaranteed |
| No extra fields | No guarantee | Guaranteed (with `additionalProperties: false`) |
| Flexibility | High — model decides structure | Low — locked to schema |
| Use case | Exploratory, simple tasks | Production pipelines, strict contracts |
| Prompt must mention JSON? | Yes | No (schema is sufficient) |

### Decision guide

```
Do you need EXACT field names and types?
  ├── YES → Use Structured Outputs (json_schema)
  └── NO
      ├── Do you need valid JSON syntax?
      │   ├── YES → Use JSON Mode (json_object)
      │   └── NO → Use free-form with parsing
      └── Are you using function/tool calling?
          └── YES → Use tools (covered in 4.5.c)
```

## 6. Practical Example: Profile Data Extraction

Let's build a practical example that extracts structured user profile data using JSON mode:

```javascript
import OpenAI from 'openai';

const openai = new OpenAI();

async function extractProfile(bioText) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    temperature: 0,  // Deterministic for consistent extraction
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content: `You are a profile data extractor. Given a user's bio text, extract their information into JSON format.

Return a JSON object with these fields:
- "name" (string): the person's name
- "age" (number): their age
- "interests" (array of strings): their hobbies and interests
- "location" (string or null): where they live, null if not mentioned
- "occupation" (string or null): their job, null if not mentioned`
      },
      {
        role: 'user',
        content: bioText
      }
    ],
  });

  return JSON.parse(response.choices[0].message.content);
}

// Usage
const profile = await extractProfile(
  "Hi! I'm Jordan, 28, living in Austin. I'm a software developer who " +
  "loves hiking, cooking, and playing guitar on weekends."
);

console.log(profile);
// {
//   name: "Jordan",
//   age: 28,
//   interests: ["hiking", "cooking", "playing guitar"],
//   location: "Austin",
//   occupation: "software developer"
// }
```

### Same example with Anthropic (Claude)

```javascript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

async function extractProfileClaude(bioText) {
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    temperature: 0,
    system: `You are a profile data extractor. Given a user's bio text, extract their information.

Return ONLY a valid JSON object with these fields:
- "name" (string): the person's name
- "age" (number): their age
- "interests" (array of strings): their hobbies and interests
- "location" (string or null): where they live, null if not mentioned
- "occupation" (string or null): their job, null if not mentioned

Do not include any text outside the JSON object.`,
    messages: [
      {
        role: 'user',
        content: bioText
      }
    ],
  });

  return JSON.parse(response.content[0].text);
}
```
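
Whichever provider you use, validate the parsed result before trusting it. Below is a minimal sketch of a shape check for the profile fields above (`validateProfile` is a hypothetical helper; in production a schema library such as Zod is a common choice):

```javascript
// Minimal post-parse validation for the profile shape described above.
// Returns a list of problems; an empty list means the shape is acceptable.
function validateProfile(p) {
  const errors = [];
  if (typeof p.name !== 'string') errors.push('name must be a string');
  if (typeof p.age !== 'number') errors.push('age must be a number');
  if (!Array.isArray(p.interests)) errors.push('interests must be an array');
  if (p.location !== null && typeof p.location !== 'string') {
    errors.push('location must be a string or null');
  }
  if (p.occupation !== null && typeof p.occupation !== 'string') {
    errors.push('occupation must be a string or null');
  }
  return errors;
}

const ok = validateProfile({
  name: 'Jordan', age: 28, interests: ['hiking'],
  location: 'Austin', occupation: 'software developer',
});
// ok is [] (no problems)

const bad = validateProfile({
  name: 'Jordan', age: '28', interests: 'hiking',
  location: null, occupation: null,
});
// bad reports the age and interests type mismatches
```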

## 7. Common Pitfalls

### Pitfall 1: Forgetting to mention JSON in the prompt

```javascript
// WRONG — will error with OpenAI
await openai.chat.completions.create({
  response_format: { type: 'json_object' },
  messages: [{ role: 'user', content: 'Tell me about dogs.' }],
  // ...
});
// Error: "you must include the word 'json' in the prompt"
```

### Pitfall 2: Assuming JSON mode enforces your schema

```javascript
// JSON mode guarantees valid JSON, but NOT the right keys
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' },
  messages: [
    {
      role: 'system',
      content: 'Return JSON with "first_name" and "last_name".'
    },
    { role: 'user', content: 'Alice Smith' }
  ],
});

const data = JSON.parse(response.choices[0].message.content);
// MIGHT return: { "first_name": "Alice", "last_name": "Smith" }
// MIGHT return: { "name": "Alice Smith" }   -- valid JSON, wrong keys!
// MIGHT return: { "firstName": "Alice", "lastName": "Smith" }  -- camelCase instead

// Always validate the structure!
if (!data.first_name || !data.last_name) {
  throw new Error('Response missing required fields');
}
```

### Pitfall 3: Not handling the `finish_reason`

```javascript
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' },
  max_tokens: 50,  // Too small for the JSON!
  messages: [
    {
      role: 'system',
      content: 'Return JSON with a detailed profile analysis.'
    },
    { role: 'user', content: 'Analyze this profile...' }
  ],
});

// Check finish_reason before parsing!
if (response.choices[0].finish_reason === 'length') {
  // Output was truncated — JSON is likely incomplete/invalid
  console.error('Response truncated — increase max_tokens');
} else {
  const data = JSON.parse(response.choices[0].message.content);
}
```

### Pitfall 4: JSON mode with streaming

When streaming with JSON mode, the JSON is emitted token-by-token. You cannot parse until the stream is complete:

```javascript
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' },
  messages: [
    { role: 'system', content: 'Return a JSON object with user info.' },
    { role: 'user', content: 'Alice, 30' }
  ],
  stream: true,
});

let fullContent = '';
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content || '';
  fullContent += delta;
  // DON'T try to JSON.parse(fullContent) here — it's incomplete!
}

// Parse only after stream is complete
const data = JSON.parse(fullContent);
```
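
The accumulate-then-parse pattern can be exercised without an API call by simulating the stream's deltas:

```javascript
// Simulated stream deltas; a real stream yields these incrementally.
const deltas = ['{"na', 'me": "Ali', 'ce", "age"', ': 30}'];

let fullContent = '';
const parseable = [];
for (const delta of deltas) {
  fullContent += delta;
  // Parsing here throws on every chunk except the last.
  try { JSON.parse(fullContent); parseable.push(true); }
  catch (e) { parseable.push(false); }
}

console.log(parseable); // [ false, false, false, true ]
console.log(JSON.parse(fullContent)); // { name: 'Alice', age: 30 }
```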

## 8. JSON Mode Across Providers

| Provider | JSON Mode Support | How to Enable |
|---|---|---|
| OpenAI | Native | `response_format: { type: "json_object" }` |
| Anthropic | Via prompting + prefill | System prompt + assistant prefill with `{` |
| Google (Gemini) | Native | `response_mime_type: "application/json"` |
| Mistral | Native | `response_format: { type: "json_object" }` |
| Ollama / Local | Varies | `format: "json"` parameter |
| Azure OpenAI | Native | Same as OpenAI |

## 9. Key Takeaways

1. **JSON mode guarantees syntactically valid JSON output** — no wrapper text, no markdown fences, no commentary.
2. **You MUST mention JSON in the prompt** when using OpenAI's JSON mode — the API enforces this.
3. **JSON mode guarantees valid syntax but not correct structure** — your schema might be ignored.
4. **Structured Outputs (`json_schema`) go further** and enforce an exact JSON Schema — use them for strict production contracts.
5. **Always check `finish_reason`** — a truncated response (`"length"`) produces invalid JSON.
6. **Anthropic uses prompt engineering and assistant prefilling** instead of a dedicated JSON mode parameter.
7. **Always validate after parsing** — JSON mode is necessary but not sufficient for production use.

## Explain-It Challenge

1. A junior developer says "I enabled JSON mode, so my output schema is guaranteed now." What's wrong with this assumption, and what would you add?
2. Why does OpenAI require you to mention "JSON" in the prompt when JSON mode is enabled? What would happen if this requirement didn't exist?
3. Explain the trade-off between JSON mode (flexible structure) and Structured Outputs (strict schema). When would you choose each?

Navigation: ← 4.5 Overview · 4.5.b — Schema-Based Prompting →