Episode 4 — Generative AI Engineering / 4.10 — Error Handling in AI Applications

4.10.a — Handling Invalid JSON

In one sentence: LLMs frequently return malformed JSON — wrapped in extra text, sprinkled with trailing commas, using single quotes, or missing closing brackets — so every production AI system needs a multi-layered JSON parsing strategy that can extract, clean, and validate JSON from any response the model gives you.

Navigation: ← 4.10 Overview · 4.10.b — Partial Responses and Timeouts →


1. Why LLMs Return Invalid JSON

When you ask an LLM to "respond in JSON," you might expect clean, parseable output. In reality, you'll get invalid JSON surprisingly often, even with clear instructions. Understanding why is the first step to handling it.

The fundamental problem

LLMs generate text token by token. They don't have a JSON parser running internally — they're predicting the next token based on patterns. This means:

  • The model has no concept of "valid JSON" — it's just producing text that looks like JSON based on training data
  • The model may add conversational preamble or explanation around the JSON
  • The model may mix JSON conventions with JavaScript conventions (single quotes, trailing commas)
  • The model may lose track of bracket nesting in complex structures

What you asked for:
  "Return the user data as JSON"

What you hoped for:
  {"name": "Alice", "age": 30, "email": "alice@example.com"}

What you might actually get (any of these):

  ─── Response 1: Wrapped in explanation ───
  Sure! Here's the user data in JSON format:
  ```json
  {"name": "Alice", "age": 30, "email": "alice@example.com"}
  ```
  Let me know if you need anything else!

  ─── Response 2: JavaScript-style, not JSON ───
  {name: 'Alice', age: 30, email: 'alice@example.com'}

  ─── Response 3: Trailing comma ───
  {"name": "Alice", "age": 30, "email": "alice@example.com",}

  ─── Response 4: Truncated (hit token limit) ───
  {"name": "Alice", "age": 30, "ema

  ─── Response 5: Mixed content ───
  The user data is: {"name": "Alice", "age": 30} and that's the result.


2. The JSON.parse() Error

The most basic error you'll encounter is a JSON.parse() failure. Let's understand what happens and why.

// This is what most developers write first
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Return user data as JSON' }]
});

const content = response.choices[0].message.content;

// DANGER: This will throw if the content isn't perfect JSON
const data = JSON.parse(content);

What JSON.parse() rejects

// All of these throw SyntaxError

JSON.parse('{"name": "Alice",}');
// SyntaxError: Unexpected token } — trailing comma

JSON.parse("{'name': 'Alice'}");
// SyntaxError: Unexpected token ' — single quotes

JSON.parse('{name: "Alice"}');
// SyntaxError: Unexpected token n — unquoted keys

JSON.parse('Here is the JSON: {"name": "Alice"}');
// SyntaxError: Unexpected token H — text before JSON

JSON.parse('{"name": "Alice"} Hope this helps!');
// SyntaxError: text after the JSON value is rejected, not ignored

JSON.parse('{"name": "Alice", "bio": "She said "hello""}');
// SyntaxError — unescaped quotes inside strings

JSON.parse('{"name": "Alice", // a comment\n"age": 30}');
// SyntaxError — JSON doesn't support comments

JSON.parse('undefined');
// SyntaxError

JSON.parse('');
// SyntaxError: Unexpected end of JSON input

3. Layer 1: Basic Safe Parsing

The first layer of defense is wrapping JSON.parse() in a try/catch. Never call it without protection.

/**
 * Safely parse a JSON string. Returns the parsed value or null on failure.
 */
function safeJsonParse(text) {
  try {
    return JSON.parse(text);
  } catch (error) {
    console.warn('JSON parse failed:', error.message);
    return null;
  }
}

// Usage
const data = safeJsonParse(response.choices[0].message.content);
if (data === null) {
  // Parsing failed — move to recovery strategies
}

This is necessary but not sufficient. Most LLM responses that fail JSON.parse() have recoverable JSON inside them. You need extraction strategies.
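One wrinkle with returning null on failure: the string 'null' is itself valid JSON, so a null return value is ambiguous. Here is a minimal sketch of a variant that returns a tagged result instead. The name tryParseJson and the { ok, value, error } shape are illustrative choices for this example, not part of the code above.

```javascript
/**
 * Parse JSON without throwing. Returns a tagged result so that a
 * legitimate JSON null is distinguishable from a parse failure.
 * (tryParseJson and its result shape are hypothetical names for this sketch.)
 */
function tryParseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (error) {
    return { ok: false, error: error.message };
  }
}

const valid = tryParseJson('null');
const invalid = tryParseJson('{"name": "Alice",}');

console.log(valid.ok, valid.value); // true null
console.log(invalid.ok);            // false
```

The complete parser later in this section uses the same idea with a { success, data, strategy } result.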


4. Layer 2: Extracting JSON from Mixed Text

The most common failure pattern is the LLM wrapping valid JSON in conversational text. You need to extract the JSON from the surrounding text.

Strategy 1: Extract from markdown code blocks

LLMs love wrapping JSON in markdown code fences. This is actually helpful — it gives us clear delimiters.

/**
 * Extract JSON from markdown code blocks.
 * Handles ```json ... ``` and ``` ... ``` blocks.
 */
function extractJsonFromCodeBlock(text) {
  // Match ```json ... ``` or ``` ... ```
  const codeBlockRegex = /```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/;
  const match = text.match(codeBlockRegex);
  
  if (match) {
    try {
      return JSON.parse(match[1].trim());
    } catch (e) {
      // Code block content isn't valid JSON either
      return null;
    }
  }
  return null;
}

// Test
const response1 = `Here's the data:
\`\`\`json
{"name": "Alice", "age": 30}
\`\`\`
Let me know if you need more!`;

console.log(extractJsonFromCodeBlock(response1));
// { name: 'Alice', age: 30 }
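The regex above only looks at the first fence and only recognizes a lowercase json label. As a rough sketch of one way to loosen that, the variant below walks every fenced block, accepts any label, and returns the first block that parses. It assumes the opening fence sits on its own line, and extractJsonFromAnyCodeBlock is a made-up name for this example.

```javascript
/**
 * Try every fenced block in the text and return the first one that parses.
 * Accepts any language label after the opening fence (json, JSON, js, ...).
 * (extractJsonFromAnyCodeBlock is a hypothetical name for this sketch.)
 */
function extractJsonFromAnyCodeBlock(text) {
  const fence = /```[^\n`]*\n([\s\S]*?)```/g;
  for (const match of text.matchAll(fence)) {
    try {
      return JSON.parse(match[1].trim());
    } catch (e) {
      // Not JSON; try the next block
    }
  }
  return null;
}

const reply = [
  'First, the shell command:',
  '```bash',
  'curl https://example.com/users/1',
  '```',
  'And the result:',
  '```JSON',
  '{"name": "Alice", "age": 30}',
  '```'
].join('\n');

console.log(extractJsonFromAnyCodeBlock(reply));
// { name: 'Alice', age: 30 }
```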

Strategy 2: Find the first { and last }

If there's no code block, look for the outermost curly braces.

/**
 * Extract a JSON object by slicing from the first { to the last }.
 * Handles text before and after the JSON, but not several separate objects.
 */
function extractJsonObject(text) {
  const firstBrace = text.indexOf('{');
  const lastBrace = text.lastIndexOf('}');
  
  if (firstBrace === -1 || lastBrace === -1 || lastBrace <= firstBrace) {
    return null;
  }
  
  const candidate = text.substring(firstBrace, lastBrace + 1);
  
  try {
    return JSON.parse(candidate);
  } catch (e) {
    return null;
  }
}

// Test
const response2 = 'The user data is: {"name": "Alice", "age": 30} and that\'s it.';
console.log(extractJsonObject(response2));
// { name: 'Alice', age: 30 }
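First-to-last slicing breaks down when a response contains more than one object, because everything between the first { and the last } is treated as a single candidate. A possible alternative, sketched here under the assumption that strings use double quotes, is to track brace depth and collect each balanced span separately. extractBalancedObjects is a hypothetical helper, not part of the strategies above.

```javascript
/**
 * Scan for balanced {...} spans, ignoring braces inside string literals,
 * and return every span that parses as JSON.
 * (extractBalancedObjects is a hypothetical helper for this sketch.)
 */
function extractBalancedObjects(text) {
  const results = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"' && depth > 0) {
      // Only track strings inside an object, so quotes in prose are ignored
      inString = true;
    } else if (ch === '{') {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) {
        try {
          results.push(JSON.parse(text.slice(start, i + 1)));
        } catch (e) {
          // Balanced but not valid JSON; skip this span
        }
      }
    }
  }
  return results;
}

const reply = 'I found {"count": 2} users: {"name": "Alice"} and {"name": "Bob"}.';
console.log(extractBalancedObjects(reply));
// [ { count: 2 }, { name: 'Alice' }, { name: 'Bob' } ]
```

The caller still has to decide which of the returned objects it wanted; the scanner only guarantees that each one is a complete, parseable span.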

Strategy 3: Find JSON arrays

Sometimes the model returns an array, not an object.

/**
 * Extract JSON array or object from text.
 */
function extractJson(text) {
  // Try direct parse first
  try {
    return JSON.parse(text.trim());
  } catch (e) {
    // Continue to extraction strategies
  }

  // Try code block extraction
  const codeBlockRegex = /```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/;
  const codeMatch = text.match(codeBlockRegex);
  if (codeMatch) {
    try {
      return JSON.parse(codeMatch[1].trim());
    } catch (e) {
      // Fall through
    }
  }

  // Try boundary extraction, starting with whichever opener appears first,
  // so an array of objects is not reduced to one of its elements
  const firstBrace = text.indexOf('{');
  const firstBracket = text.indexOf('[');
  const arrayFirst = firstBracket !== -1 && (firstBrace === -1 || firstBracket < firstBrace);
  const pairs = arrayFirst ? [['[', ']'], ['{', '}']] : [['{', '}'], ['[', ']']];

  for (const [open, close] of pairs) {
    const start = text.indexOf(open);
    const end = text.lastIndexOf(close);
    if (start !== -1 && end > start) {
      try {
        return JSON.parse(text.substring(start, end + 1));
      } catch (e) {
        // Fall through to the next pair
      }
    }
  }

  return null;
}

5. Layer 3: Cleaning Common JSON Errors

When extraction finds JSON-like text but JSON.parse() still fails, you can attempt to fix common patterns.

Fixing trailing commas

/**
 * Remove trailing commas before } or ] in JSON strings.
 * Handles nested structures.
 */
function removeTrailingCommas(jsonString) {
  // Remove trailing commas before closing braces/brackets
  // This regex handles whitespace/newlines between the comma and the closer
  return jsonString.replace(/,\s*([\]}])/g, '$1');
}

// Test
const badJson1 = '{"name": "Alice", "age": 30,}';
console.log(JSON.parse(removeTrailingCommas(badJson1)));
// { name: 'Alice', age: 30 }

const badJson2 = '{"items": ["a", "b", "c",], "count": 3,}';
console.log(JSON.parse(removeTrailingCommas(badJson2)));
// { items: ['a', 'b', 'c'], count: 3 }
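The regex version also rewrites a comma followed by ] or } when that sequence appears inside a string value. If that matters for your data, one option is a small scanner that only touches commas outside double-quoted strings. This is a sketch under that assumption, and removeTrailingCommasSafe is a hypothetical name.

```javascript
/**
 * Remove trailing commas, but only outside double-quoted strings.
 * (removeTrailingCommasSafe is a hypothetical name for this sketch.)
 */
function removeTrailingCommasSafe(jsonString) {
  let out = '';
  let inString = false;
  let escaped = false;

  for (let i = 0; i < jsonString.length; i++) {
    const ch = jsonString[i];
    if (inString) {
      out += ch;
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
      out += ch;
    } else if (ch === ',') {
      // Look ahead past whitespace: drop the comma if a closer follows
      let j = i + 1;
      while (j < jsonString.length && /\s/.test(jsonString[j])) j++;
      if (jsonString[j] !== '}' && jsonString[j] !== ']') out += ch;
    } else {
      out += ch;
    }
  }
  return out;
}

const tricky = '{"label": "pick a,] or b,}", "items": [1, 2,],}';
console.log(removeTrailingCommasSafe(tricky));
// {"label": "pick a,] or b,}", "items": [1, 2]}
```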

Fixing single quotes

/**
 * Replace single-quoted strings with double-quoted strings.
 * WARNING: This is a heuristic — can break if values contain apostrophes.
 */
function fixSingleQuotes(jsonString) {
  // Replace single quotes that are used as string delimiters
  // This is a simplified approach — handles most LLM output patterns
  return jsonString
    .replace(/'/g, '"');
}

// Better approach: only replace quotes around keys and values
function fixSingleQuotesSmart(jsonString) {
  // Match patterns like 'key': 'value' and replace outer quotes
  // This handles most LLM output but is not perfect
  let result = jsonString;
  
  // Replace single-quoted keys: {'key': → {"key":
  result = result.replace(/{\s*'/g, '{"');
  result = result.replace(/,\s*'/g, ',"');
  result = result.replace(/':/g, '":');
  
  // Replace single-quoted string values: : 'value' → : "value"
  result = result.replace(/:\s*'([^']*)'/g, ': "$1"');
  
  return result;
}

// Test
const badJson3 = "{'name': 'Alice', 'age': 30}";
console.log(JSON.parse(fixSingleQuotes(badJson3)));
// { name: 'Alice', age: 30 }
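Both versions above can misfire when single and double quotes are mixed in one response. A somewhat more careful approach, sketched below, walks the text one character at a time and only rewrites the delimiters of single-quoted strings. It still cannot recover a single-quoted string that contains an unescaped apostrophe (such as 'it's'), and convertSingleQuotedStrings is a hypothetical name.

```javascript
/**
 * Convert single-quoted strings to double-quoted ones, one character at a time.
 * Apostrophes inside double-quoted strings are left alone, and double quotes
 * inside single-quoted strings are escaped.
 * (convertSingleQuotedStrings is a hypothetical name for this sketch.)
 */
function convertSingleQuotedStrings(text) {
  let out = '';
  let quote = null;   // delimiter of the string we are inside, if any
  let escaped = false;

  for (const ch of text) {
    if (quote === null) {
      if (ch === '"' || ch === "'") {
        quote = ch;
        out += '"';
      } else {
        out += ch;
      }
    } else if (escaped) {
      // \' is not a valid JSON escape, so emit a bare apostrophe instead
      out += ch === "'" ? "'" : '\\' + ch;
      escaped = false;
    } else if (ch === '\\') {
      escaped = true;
    } else if (ch === quote) {
      quote = null;
      out += '"';
    } else if (ch === '"') {
      out += '\\"'; // a double quote inside a single-quoted string
    } else {
      out += ch;
    }
  }
  return out;
}

const mixed = `{'name': 'Alice', "note": "it's fine", 'quote': 'say "hi"'}`;
console.log(JSON.parse(convertSingleQuotedStrings(mixed)));
```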

Fixing unquoted keys

/**
 * Add quotes around unquoted object keys.
 * Handles keys that are valid JavaScript identifiers.
 */
function fixUnquotedKeys(jsonString) {
  // Match unquoted keys: { key: or , key:
  return jsonString.replace(
    /([{,]\s*)([a-zA-Z_$][a-zA-Z0-9_$]*)\s*:/g,
    '$1"$2":'
  );
}

// Test
const badJson4 = '{name: "Alice", age: 30}';
console.log(JSON.parse(fixUnquotedKeys(badJson4)));
// { name: 'Alice', age: 30 }

Removing comments

/**
 * Remove JavaScript-style comments from JSON strings.
 */
function removeComments(jsonString) {
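  // Remove single-line comments (// ...), skipping "//" that directly follows
  // a colon so URLs such as https://example.com in string values are not cut off
  let result = jsonString.replace(/(?<!:)\/\/[^\n]*/g, '');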
  // Remove multi-line comments (/* ... */)
  result = result.replace(/\/\*[\s\S]*?\*\//g, '');
  return result;
}

// Test
const badJson5 = `{
  "name": "Alice", // the user's name
  "age": 30 /* years old */
}`;
console.log(JSON.parse(removeComments(badJson5)));
// { name: 'Alice', age: 30 }
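Regex-based comment removal cannot tell a comment from a // inside a string value, such as a URL. A possible string-aware alternative is sketched below. It assumes double-quoted strings, and stripCommentsOutsideStrings is a hypothetical name.

```javascript
// Strip JavaScript-style comments, but only outside double-quoted strings,
// so values such as "https://example.com" are left intact.
// (stripCommentsOutsideStrings is a hypothetical name for this sketch.)
function stripCommentsOutsideStrings(text) {
  let out = '';
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (ch === '"') {
      // Copy the whole string literal, honoring backslash escapes
      let j = i + 1;
      while (j < text.length && text[j] !== '"') {
        j += text[j] === '\\' ? 2 : 1;
      }
      out += text.slice(i, j + 1);
      i = j + 1;
    } else if (ch === '/' && text[i + 1] === '/') {
      // Line comment: skip to the end of the line, keeping the newline
      while (i < text.length && text[i] !== '\n') i++;
    } else if (ch === '/' && text[i + 1] === '*') {
      // Block comment: skip past the closing marker, or to the end if unterminated
      const end = text.indexOf('*/', i + 2);
      i = end === -1 ? text.length : end + 2;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

const withUrl = [
  '{',
  '  "site": "https://example.com", // homepage',
  '  "age": 30 /* years old */',
  '}'
].join('\n');

console.log(JSON.parse(stripCommentsOutsideStrings(withUrl)));
// { site: 'https://example.com', age: 30 }
```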

6. The Complete Multi-Layer Parser

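The function below combines all layers into a single parser. The cleaning steps are regex heuristics, so test it against your own model's output before relying on it in production.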

/**
 * Robust JSON parser for LLM outputs.
 * Attempts multiple strategies to extract and parse JSON.
 * 
 * Strategies (in order):
 * 1. Direct JSON.parse()
 * 2. Extract from markdown code blocks
 * 3. Extract by finding { } or [ ] boundaries
 * 4. Clean common errors (trailing commas, single quotes, unquoted keys, comments)
 * 5. Combine extraction + cleaning
 * 
 * @param {string} text - Raw LLM response text
 * @returns {{ success: boolean, data: any, strategy: string, error?: string }}
 */
function parseLlmJson(text) {
  if (!text || typeof text !== 'string') {
    return { success: false, data: null, strategy: 'none', error: 'Input is empty or not a string' };
  }

  const trimmed = text.trim();

  // Strategy 1: Direct parse
  try {
    return { success: true, data: JSON.parse(trimmed), strategy: 'direct' };
  } catch (e) {
    // Continue
  }

  // Strategy 2: Extract from code blocks
  const codeBlockRegex = /```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/;
  const codeMatch = trimmed.match(codeBlockRegex);
  if (codeMatch) {
    try {
      return { success: true, data: JSON.parse(codeMatch[1].trim()), strategy: 'code-block' };
    } catch (e) {
      // Try cleaning the code block content
      const cleaned = cleanJson(codeMatch[1].trim());
      try {
        return { success: true, data: JSON.parse(cleaned), strategy: 'code-block-cleaned' };
      } catch (e2) {
        // Continue
      }
    }
  }

  // Strategy 3: Extract by brace/bracket boundaries
  for (const candidate of extractByBoundary(trimmed)) {
    try {
      return { success: true, data: JSON.parse(candidate), strategy: 'boundary-extraction' };
    } catch (e) {
      // Try cleaning the extracted content
      try {
        return { success: true, data: JSON.parse(cleanJson(candidate)), strategy: 'boundary-extraction-cleaned' };
      } catch (e2) {
        // Try the next candidate
      }
    }
  }

  // Strategy 4: Clean the full text and try again
  const fullCleaned = cleanJson(trimmed);
  try {
    return { success: true, data: JSON.parse(fullCleaned), strategy: 'full-clean' };
  } catch (e) {
    // All strategies failed
  }

  return {
    success: false,
    data: null,
    strategy: 'all-failed',
    error: 'Could not extract valid JSON from response'
  };
}

// Helper: list candidate substrings, each from a first opener to its last closer.
// Whichever of { or [ appears first is listed first, so an array of objects
// is tried as an array before its inner braces are tried as an object.
function extractByBoundary(text) {
  const firstBrace = text.indexOf('{');
  const firstBracket = text.indexOf('[');
  const arrayFirst = firstBracket !== -1 && (firstBrace === -1 || firstBracket < firstBrace);
  const pairs = arrayFirst ? [['[', ']'], ['{', '}']] : [['{', '}'], ['[', ']']];

  const candidates = [];
  for (const [open, close] of pairs) {
    const start = text.indexOf(open);
    const end = text.lastIndexOf(close);
    if (start !== -1 && end > start) {
      candidates.push(text.substring(start, end + 1));
    }
  }
  return candidates;
}

// Helper: apply all cleaning transformations
function cleanJson(text) {
  let result = text;

  // Remove comments (the lookbehind skips "//" right after a colon, as in https:// URLs)
  result = result.replace(/(?<!:)\/\/[^\n]*/g, '');
  result = result.replace(/\/\*[\s\S]*?\*\//g, '');

  // Remove trailing commas
  result = result.replace(/,\s*([\]}])/g, '$1');

  // Fix single quotes (simple replacement)
  // Only if the string has no double quotes (suggesting it's using single-quote style)
  if (!result.includes('"') && result.includes("'")) {
    result = result.replace(/'/g, '"');
  }

  // Fix unquoted keys
  result = result.replace(/([{,]\s*)([a-zA-Z_$][a-zA-Z0-9_$]*)\s*:/g, '$1"$2":');

  return result;
}

Using the complete parser

// Example usage in a real application
async function getStructuredResponse(prompt) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: 'You are a data extraction assistant. Always respond with valid JSON only.'
      },
      { role: 'user', content: prompt }
    ],
    temperature: 0
  });

  const content = response.choices[0].message.content;
  const result = parseLlmJson(content);

  if (!result.success) {
    console.error('Failed to parse LLM response:', {
      raw: content,
      error: result.error,
      strategy: result.strategy
    });
    throw new Error('LLM returned unparseable response');
  }

  console.log(`Parsed successfully using strategy: ${result.strategy}`);
  return result.data;
}

7. Common Failure Patterns Catalog

Here's a reference of every common failure pattern, what causes it, and how to handle it.

| #  | Pattern               | Example                    | Cause                 | Fix                   |
|----|-----------------------|----------------------------|-----------------------|-----------------------|
| 1  | Text before JSON      | Here's the JSON: {"a": 1}  | Model adds preamble   | Boundary extraction   |
| 2  | Text after JSON       | {"a": 1} Hope this helps!  | Model adds postamble  | Boundary extraction   |
| 3  | Markdown code block   | ```json {"a": 1} ```       | Model wraps in fences | Code block extraction |
| 4  | Trailing comma        | {"a": 1, "b": 2,}          | JavaScript habit      | Regex removal         |
| 5  | Single quotes         | {'a': 'value'}             | Python/JS habit       | Quote replacement     |
| 6  | Unquoted keys         | {a: 1, b: 2}               | JavaScript habit      | Regex quoting         |
| 7  | Comments              | {"a": 1 // comment}        | Code habit            | Comment removal       |
| 8  | Unescaped quotes      | {"text": "She said "hi""}  | Model mistake         | Very hard to auto-fix |
| 9  | Truncated JSON        | {"name": "Al               | Token limit reached   | See 4.10.b            |
| 10 | Multiple JSON objects | {"a": 1}\n{"b": 2}         | Model returns list    | Split and parse each  |
| 11 | NaN / Infinity        | {"score": NaN}             | JavaScript-ism        | Replace with null     |
| 12 | Undefined             | {"value": undefined}       | JavaScript-ism        | Replace with null     |

Handling patterns 10, 11, 12

// Pattern 10: Multiple JSON objects (one per line)
function parseJsonLines(text) {
  const lines = text.trim().split('\n');
  const results = [];
  
  for (const line of lines) {
    const result = parseLlmJson(line);
    if (result.success) {
      results.push(result.data);
    }
  }
  
  return results;
}

// Patterns 11 & 12: Replace JavaScript-isms
function fixJavaScriptValues(jsonString) {
  return jsonString
    .replace(/:\s*NaN/g, ': null')
    .replace(/:\s*Infinity/g, ': null')
    .replace(/:\s*-Infinity/g, ': null')
    .replace(/:\s*undefined/g, ': null');
}

8. Prevention: Reducing Invalid JSON at the Source

The best error handling is preventing errors in the first place. These prompt strategies reduce the chance of getting invalid JSON.

Use response_format (OpenAI)

// OpenAI's JSON mode constrains the output to syntactically valid JSON
// (the prompt must mention JSON, as the system message here does)
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'Extract user info and respond in JSON.' },
    { role: 'user', content: 'Alice is 30 years old and works at Acme Corp.' }
  ],
  response_format: { type: 'json_object' }
});

// Valid JSON when finish_reason is 'stop'; output cut off at the token limit
// (finish_reason 'length') can still be incomplete, so keep a try/catch in real code
const data = JSON.parse(response.choices[0].message.content);

Use structured outputs (OpenAI)

// Even stronger: define the exact schema
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'Extract user info.' },
    { role: 'user', content: 'Alice is 30, works at Acme.' }
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'user_info',
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          age: { type: 'number' },
          company: { type: 'string' }
        },
        required: ['name', 'age', 'company'],
        additionalProperties: false
      },
      strict: true
    }
  }
});

Strong prompt instructions

// If you can't use response_format, make instructions explicit
const systemPrompt = `You are a JSON extraction API.

CRITICAL RULES:
1. Respond with ONLY a valid JSON object — no text before or after
2. Use double quotes for all strings and keys
3. No trailing commas
4. No comments
5. No markdown formatting or code blocks

Response format:
{"name": "string", "age": number, "company": "string"}`;
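One small refinement worth considering: the format line in the prompt above is a type sketch rather than valid JSON ("age": number would not parse). The sketch below generates the example with JSON.stringify so that the sample the model sees is itself valid JSON. buildJsonSystemPrompt and the placeholder values are assumptions made for illustration.

```javascript
// Build a system prompt whose format example is generated with JSON.stringify,
// so the sample shown to the model is itself valid JSON.
// (buildJsonSystemPrompt and the example values are hypothetical.)
function buildJsonSystemPrompt(exampleObject) {
  return [
    'You are a JSON extraction API.',
    '',
    'Respond with ONLY a valid JSON object: no text before or after,',
    'no markdown formatting, no comments, no trailing commas.',
    '',
    'Example of the expected shape (values are placeholders):',
    JSON.stringify(exampleObject)
  ].join('\n');
}

const prompt = buildJsonSystemPrompt({ name: 'Alice', age: 30, company: 'Acme Corp' });
console.log(prompt);
```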

Why you still need error handling despite prevention

Even with response_format: { type: 'json_object' }:

  • The response can still be truncated if it hits the token limit
  • The schema can be wrong (correct JSON, wrong structure)
  • The API call can fail entirely (timeout, rate limit, server error)

Prevention reduces error frequency; it does not eliminate the need for error handling.
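As a concrete illustration of the truncation point, here is a minimal sketch that checks finish_reason before attempting to parse. It assumes the chat completion response shape used elsewhere in this section (choices[0].finish_reason and choices[0].message.content). readJsonCompletion and its error labels are hypothetical, and the sample objects are hand-built stand-ins for API responses.

```javascript
/**
 * Inspect a chat completion before parsing its content.
 * (readJsonCompletion and the error labels are hypothetical for this sketch.)
 */
function readJsonCompletion(response) {
  const choice = response?.choices?.[0];
  const content = choice?.message?.content;
  if (typeof content !== 'string') {
    return { success: false, error: 'no_content' };
  }
  if (choice.finish_reason === 'length') {
    // Output was cut off at the token limit, so the JSON is likely incomplete
    return { success: false, error: 'truncated', raw: content };
  }
  try {
    return { success: true, data: JSON.parse(content) };
  } catch (e) {
    return { success: false, error: 'json_parse_failed', raw: content };
  }
}

// Hand-built objects standing in for API responses
const truncated = {
  choices: [{ finish_reason: 'length', message: { content: '{"name": "Alice", "ag' } }]
};
const complete = {
  choices: [{ finish_reason: 'stop', message: { content: '{"name": "Alice", "age": 30}' } }]
};

console.log(readJsonCompletion(truncated).error); // truncated
console.log(readJsonCompletion(complete).data);   // { name: 'Alice', age: 30 }
```

Separating 'truncated' from 'json_parse_failed' lets the caller react differently, for example by raising the token limit for the former and retrying the prompt for the latter.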


9. Validation After Parsing: Zod Integration

Parsing JSON is only half the battle. The JSON might be valid but contain the wrong structure or types. Combine parsing with schema validation.

import { z } from 'zod';

// Define expected schema
const UserSchema = z.object({
  name: z.string().min(1),
  age: z.number().int().positive(),
  email: z.string().email()
});

/**
 * Parse AND validate LLM JSON response.
 * Returns validated, typed data or a detailed error.
 */
function parseLlmJsonWithValidation(text, schema) {
  // Step 1: Parse JSON
  const parseResult = parseLlmJson(text);
  
  if (!parseResult.success) {
    return {
      success: false,
      error: 'json_parse_failed',
      details: parseResult.error,
      raw: text
    };
  }

  // Step 2: Validate schema
  const validation = schema.safeParse(parseResult.data);
  
  if (!validation.success) {
    return {
      success: false,
      error: 'schema_validation_failed',
      details: validation.error.issues,
      parsed: parseResult.data,
      raw: text
    };
  }

  return {
    success: true,
    data: validation.data,
    strategy: parseResult.strategy
  };
}

// Usage
const result = parseLlmJsonWithValidation(llmResponse, UserSchema);

if (result.success) {
  console.log('Valid user:', result.data);
  // result.data has passed validation: { name: string, age: number, email: string }
} else if (result.error === 'json_parse_failed') {
  console.error('Could not parse JSON from response');
  // Retry with modified prompt, or return default
} else if (result.error === 'schema_validation_failed') {
  console.error('JSON parsed but wrong structure:', result.details);
  // Retry with validation errors included in prompt (see 4.10.c)
}
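For the schema_validation_failed branch, the issues array can be turned into feedback text for a follow-up prompt. A rough sketch follows, assuming each issue carries a path array and a message string as Zod issues do. formatIssuesForRetry is a hypothetical helper, and the sample issues are hand-written so the example runs without Zod installed.

```javascript
/**
 * Turn schema validation issues into a short feedback message that could be
 * appended to a retry prompt. Assumes each issue carries `path` (array) and
 * `message` (string); formatIssuesForRetry is a hypothetical helper.
 */
function formatIssuesForRetry(issues) {
  const lines = issues.map((issue) => {
    const field = issue.path.length > 0 ? issue.path.join('.') : '(root)';
    return `- ${field}: ${issue.message}`;
  });
  return ['Your previous JSON did not match the schema:', ...lines].join('\n');
}

// Hand-written issues in the same shape as validation output
const sampleIssues = [
  { path: ['age'], message: 'Expected number, received string' },
  { path: ['email'], message: 'Invalid email' }
];

console.log(formatIssuesForRetry(sampleIssues));
// Your previous JSON did not match the schema:
// - age: Expected number, received string
// - email: Invalid email
```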

10. Key Takeaways

  1. Never call JSON.parse() without try/catch — LLM responses are unreliable by nature, and an unhandled parse error will crash your application.
  2. Use a multi-layer strategy — try direct parse, then code block extraction, then boundary extraction, then cleaning, in that order.
  3. Common LLM JSON errors are predictable — trailing commas, single quotes, unquoted keys, and surrounding text account for the vast majority of failures.
  4. Prevention helps but isn't enough: setting response_format to json_object reduces errors but doesn't eliminate truncation, timeouts, or schema mismatches.
  5. Always validate structure after parsing — valid JSON is not the same as correct JSON; use Zod or a similar schema validator to verify the shape and types of your data.

Explain-It Challenge

  1. A junior developer writes const data = JSON.parse(response.choices[0].message.content) with no error handling. List five different ways this line can crash in production.
  2. Your parsing pipeline extracts JSON from a response using the "first { to last }" strategy. The model responds: I found {"count": 2} users: {"name": "Alice"} and {"name": "Bob"}. What JSON does your extractor produce, and why is it wrong?
  3. A teammate suggests "just use response_format: json_object and you'll never need error handling." Write a three-sentence rebuttal.
