Episode 4 — Generative AI Engineering / 4.5 — Generating JSON Responses from LLMs

4.5.d — Validating Returned Structure

In one sentence: Never trust AI-generated JSON blindly — always validate the parsed structure against your expected schema, check types, handle missing or extra fields, and build a retry loop that re-prompts the model when validation fails.

Navigation: ← 4.5.c — Function Calling Basics · 4.5.e — Building Structured Profile Analysis →


1. Why You MUST Validate Before Using AI-Generated JSON

Even with JSON mode enabled and a perfect schema in your prompt, things go wrong:

┌─────────────────────────────────────────────────────────────────────────┐
│                  WHAT CAN GO WRONG                                      │
│                                                                         │
│  1. JSON.parse() fails      → Model returned invalid syntax            │
│  2. Wrong field names        → "userName" instead of "user_name"       │
│  3. Wrong types              → age: "thirty" instead of age: 30        │
│  4. Missing required fields  → No "email" field at all                 │
│  5. Extra unexpected fields  → Random fields you didn't ask for        │
│  6. Wrong value ranges       → score: 150 when max is 100             │
│  7. Wrong array length       → 0 items when you asked for 3-5         │
│  8. Truncated response       → max_tokens hit, JSON cut off mid-way   │
│  9. Nested structure wrong   → Flat object instead of nested          │
│ 10. Null where not expected  → location: null when it should be a     │
│                                 string                                  │
└─────────────────────────────────────────────────────────────────────────┘

If your code blindly reads data.compatibility_score and the field doesn't exist, you get undefined — and the crash surfaces later, far from the real cause. If data.age is the string "thirty" and you do math on it, you get NaN. Validation is not optional.
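A quick self-contained sketch of those failure modes, using a hypothetical payload:

```javascript
// Hypothetical payload exhibiting the wrong-type and missing-field problems
const data = { age: "thirty" };

// Math on a non-numeric string silently produces NaN rather than throwing
console.log(data.age * 2);              // NaN

// Reading a missing field yields undefined, not an error...
console.log(data.compatibility_score);  // undefined

// ...the crash only happens one step later, when you use it
try {
  data.scores.looks;                    // data.scores is undefined
} catch (e) {
  console.log(e instanceof TypeError);  // true
}
```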


2. Layer 1: Safe JSON Parsing

The first defense — JSON.parse() can throw. Always wrap it:

function safeJsonParse(text) {
  try {
    return { success: true, data: JSON.parse(text) };
  } catch (error) {
    return { success: false, error: error.message, raw: text };
  }
}

// Usage
const result = safeJsonParse(response.choices[0].message.content);

if (!result.success) {
  console.error('Failed to parse JSON:', result.error);
  console.error('Raw response:', result.raw);
  // Handle: retry, fallback, or error response
} else {
  const data = result.data;
  // Continue with validation...
}
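To see both branches, here is the helper exercised on its own, outside the API call (redefined from the snippet above so it runs standalone):

```javascript
// safeJsonParse as defined above, repeated so this snippet runs standalone
function safeJsonParse(text) {
  try {
    return { success: true, data: JSON.parse(text) };
  } catch (error) {
    return { success: false, error: error.message, raw: text };
  }
}

console.log(safeJsonParse('{"age": 30}'));
// { success: true, data: { age: 30 } }

console.log(safeJsonParse('Sure! Here is the JSON: {"age": 30}').success);
// false: any prefix text before the JSON breaks JSON.parse
```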

Common parse failures and fixes

| Failure | Cause | Fix |
|---|---|---|
| `Unexpected token ... at position 0` | Model included text before the JSON | Strip prefix text, find the first `{` |
| `Unexpected end of JSON input` | Response was truncated (`max_tokens` hit) | Increase `max_tokens`, check `finish_reason` |
| ``Unexpected token '`'`` | Model wrapped JSON in markdown code fences | Strip the fence lines before parsing |
| `Unterminated string` | JSON cut off inside a string value | Truncation issue — retry with more tokens |

Robust parser with common fix-ups

function robustJsonParse(text) {
  // Attempt 1: Direct parse
  try {
    return JSON.parse(text);
  } catch (e) {
    // Attempt 2: Strip markdown code fences
    const fenceMatch = text.match(/```(?:json)?\s*([\s\S]*?)```/);
    if (fenceMatch) {
      try {
        return JSON.parse(fenceMatch[1].trim());
      } catch (e2) { /* fall through */ }
    }

    // Attempt 3: Find JSON object in text
    const objectMatch = text.match(/\{[\s\S]*\}/);
    if (objectMatch) {
      try {
        return JSON.parse(objectMatch[0]);
      } catch (e3) { /* fall through */ }
    }

    // Attempt 4: Find JSON array in text
    const arrayMatch = text.match(/\[[\s\S]*\]/);
    if (arrayMatch) {
      try {
        return JSON.parse(arrayMatch[0]);
      } catch (e4) { /* fall through */ }
    }

    throw new Error(`Could not parse JSON from response: ${text.substring(0, 200)}`);
  }
}

Important: when using JSON mode, you should rarely need the fallback attempts. But for models without a native JSON mode (the Anthropic/Claude API, for example), the robust parser is valuable.
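As a concrete example, here is the fence-stripping fallback (Attempt 2) in isolation. The backtick marker is built with repeat() only so this snippet can itself sit inside a fenced block:

```javascript
// Build the fence marker programmatically (avoids literal nested fences here)
const fence = '`'.repeat(3);

// A typical reply from a model without JSON mode: JSON wrapped in markdown fences
const reply = 'Here you go:\n' + fence + 'json\n{"score": 85}\n' + fence;

// Direct JSON.parse(reply) would throw; the Attempt 2 regex recovers the payload
const fenceRegex = new RegExp(fence + '(?:json)?\\s*([\\s\\S]*?)' + fence);
const fenceMatch = reply.match(fenceRegex);
console.log(JSON.parse(fenceMatch[1].trim()));  // { score: 85 }
```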


3. Layer 2: Schema Validation with Basic Checks

After parsing, validate the structure matches your expectations:

function validateProfileData(data) {
  const errors = [];

  // Check required fields exist
  const requiredFields = ['name', 'age', 'interests', 'compatibility_score'];
  for (const field of requiredFields) {
    if (data[field] === undefined) {
      errors.push(`Missing required field: "${field}"`);
    }
  }

  // Check types
  if (data.name !== undefined && typeof data.name !== 'string') {
    errors.push(`"name" must be a string, got ${typeof data.name}`);
  }

  if (data.age !== undefined && typeof data.age !== 'number') {
    errors.push(`"age" must be a number, got ${typeof data.age}`);
  }

  if (data.interests !== undefined && !Array.isArray(data.interests)) {
    errors.push(`"interests" must be an array, got ${typeof data.interests}`);
  }

  if (data.compatibility_score !== undefined) {
    if (typeof data.compatibility_score !== 'number') {
      errors.push(`"compatibility_score" must be a number, got ${typeof data.compatibility_score}`);
    } else if (data.compatibility_score < 0 || data.compatibility_score > 100) {
      errors.push(`"compatibility_score" must be 0-100, got ${data.compatibility_score}`);
    }
  }

  return {
    valid: errors.length === 0,
    errors,
    data
  };
}

// Usage
const parsed = JSON.parse(response.choices[0].message.content);
const validation = validateProfileData(parsed);

if (!validation.valid) {
  console.error('Validation failed:', validation.errors);
  // Handle: retry, fix, or reject
} else {
  // Safe to use validation.data
  processProfile(validation.data);
}

4. Layer 3: Type Checking Returned Fields

Go deeper than just checking existence — verify types, ranges, and constraints:

// Type checking utilities
function isString(value) {
  return typeof value === 'string';
}

function isNumber(value) {
  // Number.isFinite rejects NaN and ±Infinity in a single check
  return typeof value === 'number' && Number.isFinite(value);
}

function isInteger(value) {
  return isNumber(value) && Number.isInteger(value);
}

function isBoolean(value) {
  return typeof value === 'boolean';
}

function isArrayOf(value, typeCheck) {
  return Array.isArray(value) && value.every(typeCheck);
}

function isOneOf(value, allowedValues) {
  return allowedValues.includes(value);
}

function isInRange(value, min, max) {
  return isNumber(value) && value >= min && value <= max;
}

function isNullableString(value) {
  return value === null || typeof value === 'string';
}

// Comprehensive validator
function validateCompatibilityAnalysis(data) {
  const errors = [];

  // Type + constraint checks
  if (!isInteger(data.compatibility_score)) {
    errors.push('compatibility_score must be an integer');
  } else if (!isInRange(data.compatibility_score, 0, 100)) {
    errors.push('compatibility_score must be between 0 and 100');
  }

  if (!isArrayOf(data.strengths, isString)) {
    errors.push('strengths must be an array of strings');
  } else if (data.strengths.length < 1 || data.strengths.length > 5) {
    errors.push('strengths must have 1-5 items');
  }

  if (!isArrayOf(data.weaknesses, isString)) {
    errors.push('weaknesses must be an array of strings');
  }

  if (!isArrayOf(data.suggested_openers, isString)) {
    errors.push('suggested_openers must be an array of strings');
  } else if (data.suggested_openers.length < 1 || data.suggested_openers.length > 3) {
    errors.push('suggested_openers must have 1-3 items');
  }

  if (data.confidence !== undefined && !isOneOf(data.confidence, ['high', 'medium', 'low'])) {
    errors.push('confidence must be "high", "medium", or "low"');
  }

  return { valid: errors.length === 0, errors };
}
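One subtlety worth a standalone check: isArrayOf accepts an empty array, because every() is vacuously true on []. That is exactly why the validator pairs it with explicit length checks. Redefining just the two helpers used here:

```javascript
// Minimal redefinitions of two utilities from above, so this runs standalone
function isString(value) {
  return typeof value === 'string';
}

function isArrayOf(value, typeCheck) {
  return Array.isArray(value) && value.every(typeCheck);
}

console.log(isArrayOf(['hiking', 'coffee'], isString)); // true
console.log(isArrayOf(['hiking', 42], isString));       // false
console.log(isArrayOf('hiking', isString));             // false: not an array
console.log(isArrayOf([], isString));                   // true: vacuously passes
```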

5. Layer 4: Handling Missing or Extra Fields

Missing fields — provide defaults

function applyDefaults(data, defaults) {
  const result = { ...defaults, ...data };

  // Deep merge for nested objects
  for (const key of Object.keys(defaults)) {
    if (
      defaults[key] !== null &&
      typeof defaults[key] === 'object' &&
      !Array.isArray(defaults[key]) &&
      data[key] !== undefined &&
      typeof data[key] === 'object'
    ) {
      result[key] = { ...defaults[key], ...data[key] };
    }
  }

  return result;
}

// Define defaults for your schema
const profileDefaults = {
  name: 'Unknown',
  age: null,
  interests: [],
  location: null,
  compatibility_score: 50,
  strengths: [],
  weaknesses: [],
  suggested_openers: [],
  confidence: 'low'
};

// Apply defaults to fill gaps
const rawData = JSON.parse(response.choices[0].message.content);
const data = applyDefaults(rawData, profileDefaults);
// Now data always has all expected fields
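The deep-merge branch matters for nested objects. Here is a standalone run (the nested scores field is hypothetical, chosen only to exercise the merge):

```javascript
// applyDefaults as defined above, repeated so this snippet runs standalone
function applyDefaults(data, defaults) {
  const result = { ...defaults, ...data };

  for (const key of Object.keys(defaults)) {
    if (
      defaults[key] !== null &&
      typeof defaults[key] === 'object' &&
      !Array.isArray(defaults[key]) &&
      data[key] !== undefined &&
      typeof data[key] === 'object'
    ) {
      result[key] = { ...defaults[key], ...data[key] };
    }
  }

  return result;
}

// Hypothetical schema with a nested object, to show the deep merge
const defaults = { name: 'Unknown', scores: { looks: 0, humor: 0 } };
const partial  = { name: 'Maya', scores: { humor: 7 } };

const merged = applyDefaults(partial, defaults);
console.log(merged);
// { name: 'Maya', scores: { looks: 0, humor: 7 } }
```

Without the deep-merge loop, the shallow spread would replace scores wholesale and silently drop the default for looks.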

Extra fields — strip them

function stripExtraFields(data, allowedFields) {
  const result = {};
  for (const field of allowedFields) {
    if (data[field] !== undefined) {
      result[field] = data[field];
    }
  }
  return result;
}

const allowedFields = [
  'name', 'age', 'interests', 'location',
  'compatibility_score', 'strengths', 'weaknesses', 'suggested_openers'
];

const cleanData = stripExtraFields(rawData, allowedFields);
// Any fields the model added that you didn't expect are removed

Why strip extra fields?

// Model sometimes adds unwanted fields
const modelResponse = {
  name: "Alice",
  age: 30,
  interests: ["hiking"],
  compatibility_score: 85,
  strengths: ["outdoor lover"],
  weaknesses: [],
  suggested_openers: ["Ask about hiking!"],
  // Extra fields the model decided to add:
  personality_type: "ENFP",          // Didn't ask for this
  zodiac_sign: "Leo",               // Definitely didn't ask for this
  _internal_reasoning: "Based on..."  // Model's chain-of-thought leaked
};

// Stripping ensures only YOUR expected fields go to your database/API

6. Layer 5: Retry on Validation Failure

When validation fails, don't give up — try again with more explicit instructions:

async function callWithRetry(messages, validateFn, maxRetries = 3) {
  let lastError = null;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await openai.chat.completions.create({
        model: 'gpt-4o',
        temperature: 0,
        response_format: { type: 'json_object' },
        messages: messages,
      });

      // Check for truncation
      if (response.choices[0].finish_reason === 'length') {
        throw new Error('Response truncated — JSON likely incomplete');
      }

      // Parse
      const data = JSON.parse(response.choices[0].message.content);

      // Validate
      const validation = validateFn(data);
      if (validation.valid) {
        return { success: true, data, attempts: attempt };
      }

      // Validation failed — add error feedback for retry
      lastError = validation.errors.join('; ');
      console.warn(`Attempt ${attempt} validation failed: ${lastError}`);

      // Add the failed response and error feedback to messages for retry
      messages = [
        ...messages,
        { role: 'assistant', content: response.choices[0].message.content },
        {
          role: 'user',
          content: `The JSON you returned has validation errors: ${lastError}. Please fix these issues and return the corrected JSON.`
        }
      ];

    } catch (error) {
      lastError = error.message;
      console.warn(`Attempt ${attempt} error: ${lastError}`);

      // For parse errors, provide different feedback
      if (error instanceof SyntaxError) {
        messages = [
          ...messages,
          {
            role: 'user',
            content: 'Your previous response was not valid JSON. Please return ONLY a valid JSON object.'
          }
        ];
      }
    }
  }

  return { success: false, error: lastError, attempts: maxRetries };
}

Usage example

const messages = [
  {
    role: 'system',
    content: `Analyze dating profiles and return JSON with:
- "compatibility_score" (integer, 0-100)
- "strengths" (array of 2-4 strings)
- "weaknesses" (array of 0-3 strings)
- "suggested_openers" (array of 2-3 strings)`
  },
  {
    role: 'user',
    content: 'Profile 1: Maya, 27, loves hiking. Profile 2: Alex, 29, enjoys cooking and outdoor adventures.'
  }
];

const result = await callWithRetry(messages, validateCompatibilityAnalysis);

if (result.success) {
  console.log(`Success after ${result.attempts} attempt(s):`);
  console.log(result.data);
} else {
  console.error(`Failed after ${result.attempts} attempts: ${result.error}`);
  // Return a graceful error or default response to the user
}

7. Building a Complete Validate-or-Retry Pipeline

Here's a production-grade pipeline that combines all layers:

class JSONResponsePipeline {
  constructor(openai, config = {}) {
    this.openai = openai;
    // ?? (not ||) so explicit falsy values like maxRetries: 0 are respected
    this.model = config.model ?? 'gpt-4o';
    this.maxRetries = config.maxRetries ?? 3;
    this.temperature = config.temperature ?? 0;
    this.maxTokens = config.maxTokens ?? 2048;
  }

  async execute(messages, schema) {
    let currentMessages = [...messages];

    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      const startTime = Date.now();

      try {
        // Step 1: API Call
        const response = await this.openai.chat.completions.create({
          model: this.model,
          temperature: this.temperature,
          max_tokens: this.maxTokens,
          response_format: { type: 'json_object' },
          messages: currentMessages,
        });

        const raw = response.choices[0].message.content;
        const finishReason = response.choices[0].finish_reason;

        // Step 2: Check for truncation
        if (finishReason === 'length') {
          throw new ValidationError(
            'TRUNCATED',
            'Response was truncated. Increase max_tokens or simplify the request.'
          );
        }

        // Step 3: Parse JSON
        let data;
        try {
          data = JSON.parse(raw);
        } catch (parseError) {
          throw new ValidationError('PARSE_ERROR', `Invalid JSON: ${parseError.message}`);
        }

        // Step 4: Validate against schema
        const validation = schema.validate(data);
        if (!validation.valid) {
          throw new ValidationError('SCHEMA_ERROR', validation.errors.join('; '));
        }

        // Step 5: Clean data (apply defaults, strip extras)
        const cleanData = schema.clean(data);

        // Step 6: Return success
        return {
          success: true,
          data: cleanData,
          metadata: {
            attempt,
            latencyMs: Date.now() - startTime,
            tokensUsed: response.usage,
            finishReason,
          }
        };

      } catch (error) {
        console.warn(`Attempt ${attempt}/${this.maxRetries}: ${error.message}`);

        if (attempt === this.maxRetries) {
          return {
            success: false,
            error: error.message,
            errorType: error.type || 'UNKNOWN',
            metadata: { attempt, latencyMs: Date.now() - startTime }
          };
        }

        // Build retry messages with error context
        currentMessages = this.buildRetryMessages(currentMessages, error);
      }
    }
  }

  buildRetryMessages(messages, error) {
    const errorFeedback = {
      TRUNCATED: 'Your response was too long and got cut off. Return a shorter, more concise JSON response.',
      PARSE_ERROR: 'Your response was not valid JSON. Return ONLY a valid JSON object with no other text.',
      SCHEMA_ERROR: `Your JSON had validation errors: ${error.message}. Please fix these issues.`
    };

    return [
      ...messages,
      {
        role: 'user',
        content: errorFeedback[error.type] || `Error: ${error.message}. Please try again.`
      }
    ];
  }
}

class ValidationError extends Error {
  constructor(type, message) {
    super(message);
    this.type = type;
  }
}

Defining a schema for the pipeline

const compatibilitySchema = {
  validate(data) {
    const errors = [];

    if (typeof data.compatibility_score !== 'number' || data.compatibility_score < 0 || data.compatibility_score > 100) {
      errors.push('compatibility_score must be a number between 0 and 100');
    }
    if (!Array.isArray(data.strengths) || data.strengths.length === 0) {
      errors.push('strengths must be a non-empty array');
    }
    if (!Array.isArray(data.weaknesses)) {
      errors.push('weaknesses must be an array');
    }
    if (!Array.isArray(data.suggested_openers) || data.suggested_openers.length === 0) {
      errors.push('suggested_openers must be a non-empty array');
    }

    return { valid: errors.length === 0, errors };
  },

  clean(data) {
    return {
      compatibility_score: Math.round(data.compatibility_score),
      strengths: data.strengths.slice(0, 5),
      weaknesses: data.weaknesses.slice(0, 3),
      suggested_openers: data.suggested_openers.slice(0, 3),
    };
  }
};
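To see the clean step's clamping behavior in isolation (the payload values below are made up):

```javascript
// The clean() step from compatibilitySchema, isolated: it rounds the score
// and trims over-long arrays to the documented maxima
function clean(data) {
  return {
    compatibility_score: Math.round(data.compatibility_score),
    strengths: data.strengths.slice(0, 5),
    weaknesses: data.weaknesses.slice(0, 3),
    suggested_openers: data.suggested_openers.slice(0, 3),
  };
}

const cleaned = clean({
  compatibility_score: 84.6,
  strengths: ['a', 'b', 'c', 'd', 'e', 'f'],  // 6 items, trimmed to 5
  weaknesses: [],
  suggested_openers: ['x'],
});

console.log(cleaned.compatibility_score);  // 85
console.log(cleaned.strengths.length);     // 5
```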

Using the pipeline

const pipeline = new JSONResponsePipeline(openai, {
  model: 'gpt-4o',
  maxRetries: 3,
  maxTokens: 1024,
});

const result = await pipeline.execute(
  [
    {
      role: 'system',
      content: `Analyze two dating profiles for compatibility. Return JSON:
{
  "compatibility_score": integer 0-100,
  "strengths": ["string", ...],
  "weaknesses": ["string", ...],
  "suggested_openers": ["string", ...]
}`
    },
    {
      role: 'user',
      content: 'Profile A: Maya, 27, hiking, coffee, coding. Profile B: Alex, 29, cooking, trail running, tech.'
    }
  ],
  compatibilitySchema
);

if (result.success) {
  console.log('Score:', result.data.compatibility_score);
  console.log('Attempts:', result.metadata.attempt);
  console.log('Tokens used:', result.metadata.tokensUsed);
} else {
  console.error('Pipeline failed:', result.error);
  // Show user a graceful error or fallback
}

8. Validation Strategy by Risk Level

Different use cases need different validation rigor:

| Risk Level | Example | Validation Strategy |
|---|---|---|
| Low | Display suggestions in UI | Parse + basic type check. Show "unable to generate" on failure. |
| Medium | Save to database | Parse + full schema validation + defaults + retry once. |
| High | Financial calculation, API contract | Parse + strict schema + type checking + range validation + retry up to 3 times + human fallback. |
| Critical | Medical data, legal documents | All of the above + human review before use. Never auto-process. |
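One way to encode this in code is a policy map keyed by risk level. The names and numbers below are illustrative, not from any library:

```javascript
// Hypothetical mapping from risk level to validation settings
const VALIDATION_POLICY = {
  low:      { maxRetries: 0, coerce: true,  humanFallback: false, humanReview: false },
  medium:   { maxRetries: 1, coerce: true,  humanFallback: false, humanReview: false },
  high:     { maxRetries: 3, coerce: false, humanFallback: true,  humanReview: false },
  critical: { maxRetries: 3, coerce: false, humanFallback: true,  humanReview: true  },
};

function policyFor(riskLevel) {
  // Fail safe: an unknown risk level gets the strictest policy
  return VALIDATION_POLICY[riskLevel] ?? VALIDATION_POLICY.critical;
}

console.log(policyFor('medium').maxRetries);   // 1
console.log(policyFor('unknown').humanReview); // true
```

Defaulting unknown levels to the strictest policy means a typo in a caller degrades to over-validation rather than under-validation.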

9. Common Validation Patterns

Pattern 1: Coerce types when safe

function coerceTypes(data, schema) {
  const coerced = { ...data };

  for (const [field, type] of Object.entries(schema)) {
    if (coerced[field] === undefined) continue;

    switch (type) {
      case 'number':
        if (typeof coerced[field] === 'string') {
          const num = Number(coerced[field]);
          if (!isNaN(num)) coerced[field] = num;
        }
        break;
      case 'integer':
        if (typeof coerced[field] === 'string') {
          const int = parseInt(coerced[field], 10);
          if (!isNaN(int)) coerced[field] = int;
        } else if (typeof coerced[field] === 'number') {
          coerced[field] = Math.round(coerced[field]);
        }
        break;
      case 'boolean':
        if (coerced[field] === 'true') coerced[field] = true;
        if (coerced[field] === 'false') coerced[field] = false;
        break;
      case 'string':
        if (typeof coerced[field] !== 'string') {
          coerced[field] = String(coerced[field]);
        }
        break;
    }
  }

  return coerced;
}

// Usage
const raw = { age: "30", score: "85.7", active: "true" };
const coerced = coerceTypes(raw, { age: 'integer', score: 'number', active: 'boolean' });
// { age: 30, score: 85.7, active: true }

Pattern 2: Clamp values to valid ranges

function clampValue(value, min, max) {
  return Math.max(min, Math.min(max, value));
}

// Model returned score: 150 (out of range)
data.compatibility_score = clampValue(data.compatibility_score, 0, 100);
// Now it's 100

Pattern 3: Normalize string values

function normalizeEnum(value, allowedValues, defaultValue) {
  if (typeof value !== 'string') return defaultValue; // guard: model may return null or a number
  const normalized = value.toLowerCase().trim();
  if (allowedValues.includes(normalized)) return normalized;

  // Fuzzy match
  const close = allowedValues.find(v => normalized.includes(v) || v.includes(normalized));
  return close || defaultValue;
}

// Model returned "Medium confidence" instead of "medium"
const confidence = normalizeEnum(data.confidence, ['high', 'medium', 'low'], 'medium');
// Returns "medium"

10. Key Takeaways

  1. Never trust AI-generated JSON blindly — always validate after parsing, even with JSON mode enabled.
  2. Validation has five layers: safe parse → schema check → type/range check → handle missing and extra fields → retry with feedback.
  3. JSON.parse() can fail even with JSON mode — always wrap in try/catch and check finish_reason for truncation.
  4. Handle missing fields with defaults, not crashes. Handle extra fields by stripping them.
  5. Type coercion (string "30" → number 30) is often safer than rejection for non-critical fields.
  6. Build a retry loop — when validation fails, feed the errors back to the model and ask it to fix them.
  7. Match validation rigor to risk level — UI suggestions need less than financial calculations.
  8. A validate-or-retry pipeline (parse → validate → clean → or retry with feedback) is the production pattern.

Explain-It Challenge

  1. A teammate says "JSON mode guarantees valid JSON, so we don't need validation." List three things that can still go wrong.
  2. Why is it more effective to feed validation errors back to the model (in the retry) rather than just making the same request again?
  3. When should you coerce a wrong type (string "30" to number 30) vs reject the response and retry?

Navigation: ← 4.5.c — Function Calling Basics · 4.5.e — Building Structured Profile Analysis →