Episode 4 — Generative AI Engineering / 4.10 — Error Handling in AI Applications
4.10.a — Handling Invalid JSON
In one sentence: LLMs frequently return malformed JSON (wrapped in extra text, sprinkled with trailing commas, using single quotes, or missing closing brackets), so every production AI system needs a multi-layered JSON parsing strategy that can extract, clean, and validate JSON from messy responses, and fail cleanly when nothing usable can be recovered.
Navigation: ← 4.10 Overview · 4.10.b — Partial Responses and Timeouts →
## 1. Why LLMs Return Invalid JSON
When you ask an LLM to "respond in JSON," you might expect clean, parseable output. In reality, you'll get invalid JSON surprisingly often, even with clear instructions. Understanding why is the first step to handling it.
### The fundamental problem
LLMs generate text token by token. They don't have a JSON parser running internally — they're predicting the next token based on patterns. This means:
- The model has no concept of "valid JSON" — it's just producing text that looks like JSON based on training data
- The model may add conversational preamble or explanation around the JSON
- The model may mix JSON conventions with JavaScript conventions (single quotes, trailing commas)
- The model may lose track of bracket nesting in complex structures
````text
What you asked for:
"Return the user data as JSON"

What you hoped for:
{"name": "Alice", "age": 30, "email": "alice@example.com"}

What you might actually get (any of these):

─── Response 1: Wrapped in explanation ───
Sure! Here's the user data in JSON format:
```json
{"name": "Alice", "age": 30, "email": "alice@example.com"}
```
Let me know if you need anything else!

─── Response 2: JavaScript-style, not JSON ───
{name: 'Alice', age: 30, email: 'alice@example.com'}

─── Response 3: Trailing comma ───
{"name": "Alice", "age": 30, "email": "alice@example.com",}

─── Response 4: Truncated (hit token limit) ───
{"name": "Alice", "age": 30, "ema

─── Response 5: Mixed content ───
The user data is: {"name": "Alice", "age": 30} and that's the result.
````
---
## 2. The JSON.parse() Error
The most basic error you'll encounter is a `JSON.parse()` failure. Let's understand what happens and why.
```javascript
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// This is what most developers write first
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Return user data as JSON' }]
});

const content = response.choices[0].message.content;

// DANGER: this throws a SyntaxError if content isn't perfectly valid JSON
const data = JSON.parse(content);
```
### What `JSON.parse()` rejects
```javascript
// Each call below throws a SyntaxError when run on its own
// (exact error messages vary by engine)
JSON.parse('{"name": "Alice",}');
// trailing comma

JSON.parse("{'name': 'Alice'}");
// single quotes

JSON.parse('{name: "Alice"}');
// unquoted keys

JSON.parse('Here is the JSON: {"name": "Alice"}');
// text before the JSON

JSON.parse('{"name": "Alice"} Hope this helps!');
// text after the JSON: the entire input must be a single JSON value

JSON.parse('{"name": "Alice", "bio": "She said "hello""}');
// unescaped quotes inside strings

JSON.parse('{"name": "Alice", // a comment\n"age": 30}');
// comments (JSON doesn't support them)

JSON.parse('undefined');
// undefined is not a JSON value

JSON.parse('');
// empty input ("Unexpected end of JSON input" in V8)
```
## 3. Layer 1: Basic Safe Parsing
The first layer of defense is wrapping `JSON.parse()` in a try/catch. Never call it without protection.
```javascript
/**
 * Safely parse a JSON string. Returns the parsed value or null on failure.
 */
function safeJsonParse(text) {
  try {
    return JSON.parse(text);
  } catch (error) {
    console.warn('JSON parse failed:', error.message);
    return null;
  }
}

// Usage
const data = safeJsonParse(response.choices[0].message.content);
if (data === null) {
  // Parsing failed: move to recovery strategies
}
```
This is necessary but not sufficient. Many LLM responses that fail `JSON.parse()` still contain recoverable JSON, so you need extraction strategies.
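One wrinkle with `safeJsonParse`: `JSON.parse('null')` is valid and returns `null`, so a model that legitimately answers `null` looks exactly like a parse failure. A minimal sketch that keeps the two apart by returning a small result object instead; the name `safeJsonParseResult` and the `{ ok, value, error }` shape are illustrative choices, not part of any library.

```javascript
/**
 * Hypothetical helper: parse a JSON string without conflating a parsed
 * `null` with failure. Returns { ok: true, value } or { ok: false, error }.
 */
function safeJsonParseResult(text) {
  if (typeof text !== 'string') {
    return { ok: false, error: new TypeError('Expected a string') };
  }
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (error) {
    // Keep the SyntaxError so callers can log its message
    return { ok: false, error };
  }
}

console.log(safeJsonParseResult('null'));
// { ok: true, value: null }
console.log(safeJsonParseResult('{"a": 1,}').ok);
// false
```

Callers then branch on `ok` rather than on the parsed value itself.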
## 4. Layer 2: Extracting JSON from Mixed Text
The most common failure pattern is the LLM wrapping valid JSON in conversational text. You need to extract the JSON from the surrounding text.
### Strategy 1: Extract from markdown code blocks
LLMs love wrapping JSON in markdown code fences. This is actually helpful: it gives us clear delimiters.
```javascript
/**
 * Extract JSON from markdown code blocks.
 * Handles ```json ... ``` and ``` ... ``` blocks.
 */
function extractJsonFromCodeBlock(text) {
  // Match ```json ... ``` or ``` ... ```
  const codeBlockRegex = /```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/;
  const match = text.match(codeBlockRegex);

  if (match) {
    try {
      return JSON.parse(match[1].trim());
    } catch (e) {
      // Code block content isn't valid JSON either
      return null;
    }
  }
  return null;
}

// Test
const response1 = `Here's the data:
\`\`\`json
{"name": "Alice", "age": 30}
\`\`\`
Let me know if you need more!`;

console.log(extractJsonFromCodeBlock(response1));
// { name: 'Alice', age: 30 }
```
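`extractJsonFromCodeBlock` only looks at the first fenced block, so a reply that shows a shell command in one block and the JSON in a later one comes back as `null`. A sketch of a variant that walks every block with `String.prototype.matchAll` and returns the first one that parses; `extractJsonFromAnyCodeBlock` is a hypothetical name, and accepting any alphabetic language tag after the fence is an assumption.

```javascript
/**
 * Hypothetical helper: try every fenced code block in order and return the
 * first one whose contents parse as JSON, or null if none do.
 */
function extractJsonFromAnyCodeBlock(text) {
  // Global flag so matchAll visits every block; an optional
  // alphabetic language tag (json, bash, ...) after the fence is skipped
  const codeBlockRegex = /```[a-zA-Z]*\s*\n?([\s\S]*?)\n?\s*```/g;
  for (const match of text.matchAll(codeBlockRegex)) {
    try {
      return JSON.parse(match[1].trim());
    } catch (e) {
      // Not JSON (for example a shell snippet); try the next block
    }
  }
  return null;
}

const reply = 'Run this first:\n```bash\nnpm install\n```\nThen the data:\n```json\n{"ok": true}\n```';
console.log(extractJsonFromAnyCodeBlock(reply));
// { ok: true }
```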
### Strategy 2: Find the first `{` and last `}`
If there's no code block, look for the outermost curly braces.
```javascript
/**
 * Extract a JSON object by taking everything from the first { to the last }.
 * Handles text before and after the JSON.
 */
function extractJsonObject(text) {
  const firstBrace = text.indexOf('{');
  const lastBrace = text.lastIndexOf('}');

  if (firstBrace === -1 || lastBrace === -1 || lastBrace <= firstBrace) {
    return null;
  }

  const candidate = text.substring(firstBrace, lastBrace + 1);
  try {
    return JSON.parse(candidate);
  } catch (e) {
    return null;
  }
}

// Test
const response2 = 'The user data is: {"name": "Alice", "age": 30} and that\'s it.';
console.log(extractJsonObject(response2));
// { name: 'Alice', age: 30 }
```
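Taking everything from the first `{` to the last `}` breaks as soon as a reply contains more than one JSON value, because the prose between them is swept into the candidate. A stricter alternative, sketched here under the assumption that strings use double quotes, scans the text once, tracks `{`/`[` nesting depth while skipping over string contents, and parses each balanced top-level span. `findJsonValues` is a hypothetical helper; it also covers replies that contain several objects in a row.

```javascript
/**
 * Hypothetical helper: return every top-level JSON object or array found in
 * `text` that parses, in order of appearance. Assumes double-quoted strings.
 */
function findJsonValues(text) {
  const values = [];
  let start = -1;        // index where the current candidate began
  let depth = 0;         // current { / [ nesting depth
  let inString = false;  // inside a double-quoted string within a candidate
  let escaped = false;   // previous character was a backslash inside a string

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];

    if (inString) {
      // Inside a string, only an unescaped quote ends it
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }

    if (ch === '"') {
      // Quotes in the surrounding prose (depth 0) are ignored
      if (depth > 0) inString = true;
    } else if (ch === '{' || ch === '[') {
      if (depth === 0) start = i;
      depth++;
    } else if ((ch === '}' || ch === ']') && depth > 0) {
      depth--;
      if (depth === 0) {
        try {
          values.push(JSON.parse(text.slice(start, i + 1)));
        } catch (e) {
          // Balanced but not JSON (e.g. "[see note]"); skip it
        }
      }
    }
  }
  return values;
}

const reply = 'Old value: {"score": 3}. New value: {"score": 5}. Scale: [1, 10]';
console.log(findJsonValues(reply));
// [ { score: 3 }, { score: 5 }, [ 1, 10 ] ]
```

To mimic the single-result extractors, take `findJsonValues(text)[0]`. The scan does not check that `{` is closed by `}` rather than `]`; mismatched spans simply fail `JSON.parse()` and are skipped.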
### Strategy 3: Find JSON arrays
Sometimes the model returns an array, not an object.
```javascript
/**
 * Extract a JSON array or object from text.
 */
function extractJson(text) {
  // Try direct parse first
  try {
    return JSON.parse(text.trim());
  } catch (e) {
    // Continue to extraction strategies
  }

  // Try code block extraction
  const codeBlockRegex = /```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/;
  const codeMatch = text.match(codeBlockRegex);
  if (codeMatch) {
    try {
      return JSON.parse(codeMatch[1].trim());
    } catch (e) {
      // Fall through
    }
  }

  // Try object and array boundaries, starting with whichever opener appears
  // first, so "Result: [{...}]" yields the array rather than its first element
  const candidates = [
    [text.indexOf('{'), text.lastIndexOf('}')],
    [text.indexOf('['), text.lastIndexOf(']')]
  ]
    .filter(([start, end]) => start !== -1 && end > start)
    .sort((a, b) => a[0] - b[0]);

  for (const [start, end] of candidates) {
    try {
      return JSON.parse(text.substring(start, end + 1));
    } catch (e) {
      // Try the next candidate
    }
  }

  return null;
}
```
## 5. Layer 3: Cleaning Common JSON Errors
When extraction finds JSON-like text but `JSON.parse()` still fails, you can attempt to fix common patterns.
### Fixing trailing commas
```javascript
/**
 * Remove trailing commas before } or ] in JSON strings.
 * Handles nested structures.
 */
function removeTrailingCommas(jsonString) {
  // Remove trailing commas before closing braces/brackets.
  // This regex handles whitespace/newlines between the comma and the closer
  return jsonString.replace(/,\s*([\]}])/g, '$1');
}

// Test
const badJson1 = '{"name": "Alice", "age": 30,}';
console.log(JSON.parse(removeTrailingCommas(badJson1)));
// { name: 'Alice', age: 30 }

const badJson2 = '{"items": ["a", "b", "c",], "count": 3,}';
console.log(JSON.parse(removeTrailingCommas(badJson2)));
// { items: [ 'a', 'b', 'c' ], count: 3 }
```
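`removeTrailingCommas` cannot tell structure from string contents, so a value such as `"a,]"` gets rewritten too. One way around this, sketched here as a hypothetical `removeTrailingCommasSafe`, is to give the regex a first alternative that matches a whole double-quoted string and puts it back unchanged. Because the regex scans left to right, commas inside strings are consumed as part of the string and never reach the comma alternative.

```javascript
/**
 * Hypothetical string-aware variant of removeTrailingCommas.
 * Alternative 1 matches a complete double-quoted string (escapes included)
 * and is returned unchanged; alternative 2 matches a comma followed only by
 * whitespace and a closing } or ], and keeps just the whitespace and closer.
 */
function removeTrailingCommasSafe(jsonString) {
  return jsonString.replace(
    /("(?:\\.|[^"\\])*")|,(\s*[\]}])/g,
    (match, stringLiteral, closer) =>
      stringLiteral !== undefined ? stringLiteral : closer
  );
}

const tricky = '{"pattern": "a,]", "tags": ["x", "y",],}';
console.log(removeTrailingCommasSafe(tricky));
// {"pattern": "a,]", "tags": ["x", "y"]}
```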
### Fixing single quotes
```javascript
/**
 * Replace single-quoted strings with double-quoted strings.
 * WARNING: This is a heuristic and can break if values contain apostrophes.
 */
function fixSingleQuotes(jsonString) {
  // Replace every single quote with a double quote.
  // This is a simplified approach that handles most LLM output patterns
  return jsonString.replace(/'/g, '"');
}

// Better approach: only replace quotes around keys and values
function fixSingleQuotesSmart(jsonString) {
  // Match patterns like 'key': 'value' and replace the outer quotes.
  // This handles most LLM output but is not perfect
  let result = jsonString;

  // Replace single-quoted keys: {'key': → {"key":
  result = result.replace(/{\s*'/g, '{"');
  result = result.replace(/,\s*'/g, ',"');
  result = result.replace(/':/g, '":');

  // Replace single-quoted string values: : 'value' → : "value"
  result = result.replace(/:\s*'([^']*)'/g, ': "$1"');

  return result;
}

// Test
const badJson3 = "{'name': 'Alice', 'age': 30}";
console.log(JSON.parse(fixSingleQuotes(badJson3)));
// { name: 'Alice', age: 30 }
```
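Both functions above still break on apostrophes inside values (`'O'Brien'`), and the simple one also mangles `"it's"` inside double-quoted strings. A character-by-character converter avoids the second problem by tracking which quote, if any, opened the current string. The sketch below, a hypothetical `convertSingleQuotedStrings`, assumes apostrophes inside single-quoted values are escaped (`'O\'Brien'`); an unescaped one is genuinely ambiguous and is not handled.

```javascript
/**
 * Hypothetical converter: rewrite single-quoted strings as double-quoted JSON
 * strings, leaving double-quoted strings (and apostrophes inside them) alone.
 * Assumes apostrophes inside single-quoted values are escaped as \'.
 */
function convertSingleQuotedStrings(input) {
  let out = '';
  let quote = null; // null outside strings, otherwise the opening quote char

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];

    if (quote === null) {
      // Outside any string: a quote character opens one
      if (ch === "'") {
        quote = "'";
        out += '"';
      } else {
        if (ch === '"') quote = '"';
        out += ch;
      }
      continue;
    }

    if (ch === '\\') {
      // Escape sequence: copy it, except \' which JSON does not allow
      const next = input[i + 1] ?? '';
      out += quote === "'" && next === "'" ? "'" : ch + next;
      i++; // the escaped character has been handled
      continue;
    }

    if (ch === quote) {
      // Closing delimiter: always emit a double quote
      out += '"';
      quote = null;
    } else if (quote === "'" && ch === '"') {
      // A bare " inside a single-quoted string must be escaped in JSON
      out += '\\"';
    } else {
      out += ch;
    }
  }
  return out;
}

const input = `{'name': 'O\\'Brien', 'quote': 'She said "hi"', "note": "it's fine"}`;
console.log(convertSingleQuotedStrings(input));
// {"name": "O'Brien", "quote": "She said \"hi\"", "note": "it's fine"}
```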
### Fixing unquoted keys
```javascript
/**
 * Add quotes around unquoted object keys.
 * Handles keys that are valid JavaScript identifiers.
 */
function fixUnquotedKeys(jsonString) {
  // Match unquoted keys: { key: or , key:
  return jsonString.replace(
    /([{,]\s*)([a-zA-Z_$][a-zA-Z0-9_$]*)\s*:/g,
    '$1"$2":'
  );
}

// Test
const badJson4 = '{name: "Alice", age: 30}';
console.log(JSON.parse(fixUnquotedKeys(badJson4)));
// { name: 'Alice', age: 30 }
```
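`fixUnquotedKeys` also fires inside string values: `{"note": "a, b: c"}` becomes `{"note": "a, "b": c"}`, which no longer parses. A hedged sketch of a hypothetical `fixUnquotedKeysSafe` that keeps the same key pattern but first matches whole double-quoted strings and returns them untouched, so only keys outside strings get quoted.

```javascript
/**
 * Hypothetical string-aware variant of fixUnquotedKeys: double-quoted strings
 * are matched first and returned unchanged, so "a, b: c" inside a value is
 * never mistaken for a key.
 */
function fixUnquotedKeysSafe(jsonString) {
  return jsonString.replace(
    /("(?:\\.|[^"\\])*")|([{,]\s*)([a-zA-Z_$][a-zA-Z0-9_$]*)\s*:/g,
    (match, stringLiteral, prefix, key) =>
      stringLiteral !== undefined ? stringLiteral : `${prefix}"${key}":`
  );
}

console.log(fixUnquotedKeysSafe('{name: "Alice", note: "a, b: c"}'));
// {"name": "Alice", "note": "a, b: c"}
```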
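### Removing comments
```javascript
/**
 * Remove JavaScript-style comments from JSON strings.
 * Double-quoted strings are matched first and kept as-is, so a "//" inside a
 * value such as "https://example.com" is not mistaken for a comment.
 */
function removeComments(jsonString) {
  // Alternative 1: a complete double-quoted string (kept)
  // Alternative 2: a // comment up to the end of the line (removed)
  // Alternative 3: a /* ... */ comment (removed)
  return jsonString.replace(
    /("(?:\\.|[^"\\])*")|\/\/[^\n]*|\/\*[\s\S]*?\*\//g,
    (match, stringLiteral) => (stringLiteral !== undefined ? stringLiteral : '')
  );
}

// Test
const badJson5 = `{
  "name": "Alice", // the user's name
  "age": 30 /* years old */
}`;
console.log(JSON.parse(removeComments(badJson5)));
// { name: 'Alice', age: 30 }

const withUrl = '{"site": "https://example.com"} // link';
console.log(JSON.parse(removeComments(withUrl)));
// { site: 'https://example.com' }
```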
## 6. The Complete Multi-Layer Parser
Putting all the layers together gives a single parser, `parseLlmJson`, plus two helpers, `extractByBoundary` and `cleanJson`.
```javascript
/**
 * Robust JSON parser for LLM outputs.
 * Attempts multiple strategies to extract and parse JSON.
 *
 * Strategies (in order):
 *   1. Direct JSON.parse()
 *   2. Extract from a markdown code block (as-is, then cleaned)
 *   3. Extract by { } or [ ] boundaries (as-is, then cleaned)
 *   4. Clean the full text (comments, trailing commas, single quotes,
 *      unquoted keys) and parse it
 *
 * @param {string} text - Raw LLM response text
 * @returns {{ success: boolean, data: any, strategy: string, error?: string }}
 */
function parseLlmJson(text) {
  if (!text || typeof text !== 'string') {
    return { success: false, data: null, strategy: 'none', error: 'Input is empty or not a string' };
  }

  const trimmed = text.trim();

  // Strategy 1: Direct parse
  try {
    return { success: true, data: JSON.parse(trimmed), strategy: 'direct' };
  } catch (e) {
    // Continue
  }

  // Strategy 2: Extract from code blocks
  const codeBlockRegex = /```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/;
  const codeMatch = trimmed.match(codeBlockRegex);
  if (codeMatch) {
    try {
      return { success: true, data: JSON.parse(codeMatch[1].trim()), strategy: 'code-block' };
    } catch (e) {
      // Try cleaning the code block content
      const cleaned = cleanJson(codeMatch[1].trim());
      try {
        return { success: true, data: JSON.parse(cleaned), strategy: 'code-block-cleaned' };
      } catch (e2) {
        // Continue
      }
    }
  }

  // Strategy 3: Extract by brace/bracket boundaries
  const extracted = extractByBoundary(trimmed);
  if (extracted) {
    try {
      return { success: true, data: JSON.parse(extracted), strategy: 'boundary-extraction' };
    } catch (e) {
      // Try cleaning the extracted content
      const cleaned = cleanJson(extracted);
      try {
        return { success: true, data: JSON.parse(cleaned), strategy: 'boundary-extraction-cleaned' };
      } catch (e2) {
        // Continue
      }
    }
  }

  // Strategy 4: Clean the full text and try again
  const fullCleaned = cleanJson(trimmed);
  try {
    return { success: true, data: JSON.parse(fullCleaned), strategy: 'full-clean' };
  } catch (e) {
    // All strategies failed
  }

  return {
    success: false,
    data: null,
    strategy: 'all-failed',
    error: 'Could not extract valid JSON from response'
  };
}
```
```javascript
// Helper: extract the outermost {...} or [...] span.
// Whichever opener appears first wins, so "Result: [{...}]" returns the
// whole array instead of just its first object.
function extractByBoundary(text) {
  const firstBrace = text.indexOf('{');
  const lastBrace = text.lastIndexOf('}');
  const hasObject = firstBrace !== -1 && lastBrace > firstBrace;

  const firstBracket = text.indexOf('[');
  const lastBracket = text.lastIndexOf(']');
  const hasArray = firstBracket !== -1 && lastBracket > firstBracket;

  if (hasArray && (!hasObject || firstBracket < firstBrace)) {
    return text.substring(firstBracket, lastBracket + 1);
  }
  if (hasObject) {
    return text.substring(firstBrace, lastBrace + 1);
  }
  return null;
}
```
```javascript
// Helper: apply all cleaning transformations (heuristics from section 5)
function cleanJson(text) {
  let result = text;

  // Remove // and /* */ comments, skipping over double-quoted strings
  // so values like "https://example.com" survive
  result = result.replace(
    /("(?:\\.|[^"\\])*")|\/\/[^\n]*|\/\*[\s\S]*?\*\//g,
    (match, stringLiteral) => (stringLiteral !== undefined ? stringLiteral : '')
  );

  // Remove trailing commas
  result = result.replace(/,\s*([\]}])/g, '$1');

  // Fix single quotes (simple replacement), but only if the string has no
  // double quotes (suggesting it consistently uses single-quote style)
  if (!result.includes('"') && result.includes("'")) {
    result = result.replace(/'/g, '"');
  }

  // Fix unquoted keys
  result = result.replace(/([{,]\s*)([a-zA-Z_$][a-zA-Z0-9_$]*)\s*:/g, '$1"$2":');

  return result;
}
```
### Using the complete parser
```javascript
// Example usage in a real application
async function getStructuredResponse(prompt) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: 'You are a data extraction assistant. Always respond with valid JSON only.'
      },
      { role: 'user', content: prompt }
    ],
    temperature: 0
  });

  const content = response.choices[0].message.content;
  const result = parseLlmJson(content);

  if (!result.success) {
    console.error('Failed to parse LLM response:', {
      raw: content,
      error: result.error,
      strategy: result.strategy
    });
    throw new Error('LLM returned unparseable response');
  }

  console.log(`Parsed successfully using strategy: ${result.strategy}`);
  return result.data;
}
```
## 7. Common Failure Patterns Catalog
Here's a reference of the most common failure patterns, what causes them, and how to handle them.

| # | Pattern | Example | Cause | Fix |
|---|---|---|---|---|
| 1 | Text before JSON | `Here's the JSON: {"a": 1}` | Model adds preamble | Boundary extraction |
| 2 | Text after JSON | `{"a": 1} Hope this helps!` | Model adds postamble | Boundary extraction |
| 3 | Markdown code block | ```` ```json {"a": 1} ``` ```` | Model wraps in fences | Code block extraction |
| 4 | Trailing comma | `{"a": 1, "b": 2,}` | JavaScript habit | Regex removal |
| 5 | Single quotes | `{'a': 'value'}` | Python/JS habit | Quote replacement |
| 6 | Unquoted keys | `{a: 1, b: 2}` | JavaScript habit | Regex quoting |
| 7 | Comments | `{"a": 1 // comment}` | Code habit | Comment removal |
| 8 | Unescaped quotes | `{"text": "She said "hi""}` | Model mistake | Very hard to auto-fix |
| 9 | Truncated JSON | `{"name": "Al` | Token limit reached | See 4.10.b |
| 10 | Multiple JSON objects | `{"a": 1}\n{"b": 2}` | Model returns list | Split and parse each |
| 11 | NaN / Infinity | `{"score": NaN}` | JavaScript-ism | Replace with null |
| 12 | Undefined | `{"value": undefined}` | JavaScript-ism | Replace with null |
### Handling patterns 10, 11, and 12
```javascript
// Pattern 10: Multiple JSON objects, one per line.
// Assumes each object sits on a single line; pretty-printed objects that
// span several lines are not reassembled.
function parseJsonLines(text) {
  const lines = text.trim().split('\n');
  const results = [];
  for (const line of lines) {
    const result = parseLlmJson(line);
    if (result.success) {
      results.push(result.data);
    }
  }
  return results;
}

// Patterns 11 & 12: Replace JavaScript-isms with null.
// Heuristic: only values that follow a colon are handled. cleanJson in
// section 6 does not call this, so apply it separately if your model emits them.
function fixJavaScriptValues(jsonString) {
  return jsonString
    .replace(/:\s*NaN\b/g, ': null')
    .replace(/:\s*-?Infinity\b/g, ': null')
    .replace(/:\s*undefined\b/g, ': null');
}
```
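`fixJavaScriptValues` only catches values that follow a colon, so `[1, NaN]` slips through, and it will also rewrite `: NaN` if that text appears inside a string value. A sketch of a string-aware alternative, the hypothetical `fixJavaScriptValuesSafe`, that skips double-quoted strings and replaces bare `NaN`, `Infinity`, `-Infinity`, and `undefined` tokens anywhere else, including inside arrays.

```javascript
/**
 * Hypothetical string-aware variant of fixJavaScriptValues.
 * Double-quoted strings are kept as-is; bare NaN, Infinity, -Infinity and
 * undefined tokens elsewhere (object values or array elements) become null.
 * The \b word boundaries keep tokens like "NaNx" from matching.
 */
function fixJavaScriptValuesSafe(jsonString) {
  return jsonString.replace(
    /("(?:\\.|[^"\\])*")|-?\b(?:NaN|Infinity|undefined)\b/g,
    (match, stringLiteral) => (stringLiteral !== undefined ? stringLiteral : 'null')
  );
}

const raw = '{"score": NaN, "range": [-Infinity, 3], "note": "NaN is not JSON", "x": undefined}';
console.log(fixJavaScriptValuesSafe(raw));
// {"score": null, "range": [null, 3], "note": "NaN is not JSON", "x": null}
```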
## 8. Prevention: Reducing Invalid JSON at the Source
The best error handling is preventing errors in the first place. These API features and prompt strategies reduce the chance of getting invalid JSON.
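### Use `response_format` (OpenAI)
```javascript
// OpenAI's JSON mode constrains the model to produce valid JSON.
// The messages must mention JSON, or the API rejects the request.
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'Extract user info and respond in JSON.' },
    { role: 'user', content: 'Alice is 30 years old and works at Acme Corp.' }
  ],
  response_format: { type: 'json_object' }
});

const choice = response.choices[0];

// JSON mode does not protect against truncation: if generation stopped at
// the token limit, the JSON may be cut off
if (choice.finish_reason === 'length') {
  throw new Error('Response truncated; JSON is likely incomplete');
}

const data = JSON.parse(choice.message.content);
```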
### Use structured outputs (OpenAI)
```javascript
// Even stronger: define the exact schema
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'Extract user info.' },
    { role: 'user', content: 'Alice is 30, works at Acme.' }
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'user_info',
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          age: { type: 'number' },
          company: { type: 'string' }
        },
        required: ['name', 'age', 'company'],
        additionalProperties: false
      },
      strict: true
    }
  }
});
```
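Structured outputs remove most syntax problems, but a few outcomes still need handling: the model can decline to answer, and a response that hits the token limit is cut off. A minimal sketch that turns one Chat Completions `choice` into a result, assuming the response shape where `finish_reason` is `'length'` on truncation and `message.refusal` holds the refusal text when the model declines (check the current OpenAI API reference). `readStructuredChoice` is a hypothetical name, and the fake `choice` object stands in for a real API response.

```javascript
/**
 * Hypothetical helper: turn one Chat Completions `choice` into parsed data,
 * checking the cases that structured outputs do not prevent.
 * Assumed fields: finish_reason, message.refusal, message.content.
 */
function readStructuredChoice(choice) {
  if (choice.message.refusal) {
    // The model declined instead of filling the schema
    return { ok: false, reason: 'refusal', detail: choice.message.refusal };
  }
  if (choice.finish_reason === 'length') {
    // Output hit the token limit, so the JSON is probably cut off
    return { ok: false, reason: 'truncated', detail: choice.message.content };
  }
  try {
    return { ok: true, data: JSON.parse(choice.message.content) };
  } catch (error) {
    return { ok: false, reason: 'invalid_json', detail: error.message };
  }
}

// Fake choice standing in for response.choices[0]
const fakeChoice = {
  finish_reason: 'stop',
  message: { content: '{"name": "Alice", "age": 30, "company": "Acme"}', refusal: null }
};
const result = readStructuredChoice(fakeChoice);
console.log(result.ok, result.data.company); // true Acme
```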
### Strong prompt instructions
```javascript
// If you can't use response_format, make instructions explicit
const systemPrompt = `You are a JSON extraction API.

CRITICAL RULES:
1. Respond with ONLY a valid JSON object, no text before or after
2. Use double quotes for all strings and keys
3. No trailing commas
4. No comments
5. No markdown formatting or code blocks

Response format:
{"name": "string", "age": number, "company": "string"}`;
```
### Why you still need error handling despite prevention
Even with `response_format: { type: 'json_object' }`:
- The response can still be truncated if it hits the token limit
- The structure can be wrong (valid JSON, wrong shape), because JSON mode does not enforce a schema
- The API call can fail entirely (timeout, rate limit, server error)

Prevention reduces error frequency; it does not eliminate the need for error handling.
## 9. Validation After Parsing: Zod Integration
Parsing JSON is only half the battle. The JSON might be valid but contain the wrong structure or types. Combine parsing with schema validation.
```javascript
import { z } from 'zod';

// Define expected schema
const UserSchema = z.object({
  name: z.string().min(1),
  age: z.number().int().positive(),
  email: z.string().email()
});

/**
 * Parse AND validate an LLM JSON response.
 * Returns validated data or a detailed error.
 */
function parseLlmJsonWithValidation(text, schema) {
  // Step 1: Parse JSON (parseLlmJson from section 6)
  const parseResult = parseLlmJson(text);
  if (!parseResult.success) {
    return {
      success: false,
      error: 'json_parse_failed',
      details: parseResult.error,
      raw: text
    };
  }

  // Step 2: Validate schema
  const validation = schema.safeParse(parseResult.data);
  if (!validation.success) {
    return {
      success: false,
      error: 'schema_validation_failed',
      details: validation.error.issues,
      parsed: parseResult.data,
      raw: text
    };
  }

  return {
    success: true,
    data: validation.data,
    strategy: parseResult.strategy
  };
}

// Usage (llmResponse is the raw text returned by the model)
const result = parseLlmJsonWithValidation(llmResponse, UserSchema);

if (result.success) {
  console.log('Valid user:', result.data);
  // In a TypeScript version, result.data can be typed as z.infer<typeof UserSchema>
} else if (result.error === 'json_parse_failed') {
  console.error('Could not parse JSON from response');
  // Retry with modified prompt, or return default
} else if (result.error === 'schema_validation_failed') {
  console.error('JSON parsed but wrong structure:', result.details);
  // Retry with validation errors included in prompt (see 4.10.c)
}
```
## 10. Key Takeaways
- Never call `JSON.parse()` without try/catch. LLM responses are unreliable by nature, and an unhandled parse error will crash your application.
- Use a multi-layer strategy: try direct parse, then code block extraction, then boundary extraction, then cleaning, in that order.
- Common LLM JSON errors are predictable: surrounding text, trailing commas, single quotes, and unquoted keys are the failures you'll run into most often.
- Prevention helps but isn't enough: `response_format: { type: 'json_object' }` reduces errors but doesn't eliminate truncation, timeouts, or schema mismatches.
- Always validate structure after parsing. Valid JSON is not the same as correct JSON; use Zod or a similar schema validator to verify the shape and types of your data.
## Explain-It Challenge
- A junior developer writes `const data = JSON.parse(response.choices[0].message.content)` with no error handling. List five different ways this line can crash in production.
- Your parsing pipeline extracts JSON from a response using the "first `{` to last `}`" strategy. The model responds: `I found {"count": 2} users: {"name": "Alice"} and {"name": "Bob"}.` What JSON does your extractor produce, and why is it wrong?
- A teammate suggests "just use `response_format: json_object` and you'll never need error handling." Write a three-sentence rebuttal.