Episode 4 — Generative AI Engineering / 4.6 — Schema Validation with Zod

4.6.d — Handling Invalid Responses

In one sentence: AI responses fail in predictable ways — extra text around JSON, wrong types, missing fields, unexpected structures — and each failure mode has a specific recovery strategy that ranges from JSON extraction and type coercion to default values and graceful degradation.

Navigation: ← 4.6.c Verifying AI Responses · 4.6.e — Retry Strategies →


1. Common AI Response Failures

AI models fail in predictable patterns. Understanding these patterns lets you build targeted recovery strategies.

┌─────────────────────────────────────────────────────────────────────┐
│                   AI RESPONSE FAILURE TAXONOMY                      │
│                                                                     │
│  Category 1: NOT VALID JSON                                         │
│  ────────────────────────                                           │
│  • Extra text before/after JSON (most common)                       │
│  • Markdown code fences around JSON                                 │
│  • Truncated JSON (context limit or max_tokens hit)                 │
│  • Completely non-JSON response ("I'd be happy to help...")         │
│                                                                     │
│  Category 2: VALID JSON, WRONG SHAPE                                │
│  ─────────────────────────────────                                  │
│  • Missing required fields                                          │
│  • Extra unexpected fields (usually harmless)                       │
│  • Wrong nesting level (flat vs nested)                             │
│  • Array instead of object (or vice versa)                          │
│                                                                     │
│  Category 3: RIGHT SHAPE, WRONG TYPES                               │
│  ────────────────────────────────────                               │
│  • Number as string ("82" instead of 82)                            │
│  • Boolean as string ("true" instead of true)                       │
│  • Enum value not in allowed list                                   │
│  • null where a value is expected                                   │
│                                                                     │
│  Category 4: RIGHT SHAPE AND TYPES, WRONG VALUES                    │
│  ────────────────────────────────────────────────                   │
│  • Number out of range (confidence: 95 instead of 0.95)             │
│  • String too short/long                                            │
│  • Array with wrong number of elements                              │
│  • Cross-field inconsistency (score 20 but label "excellent")       │
└─────────────────────────────────────────────────────────────────────┘
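To make the taxonomy concrete, here is a minimal plain-TypeScript check that sorts a sample payload into one of the four categories. This is a demo only: the `score` field and its 0-100 range are invented for illustration, and no validation library is involved.

```typescript
// Classify a raw AI response into the four failure categories above.
// Demo assumption: the expected shape is { score: number in 0..100 }.
function categorize(raw: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return 'not valid JSON';                                  // Category 1
  }
  if (typeof parsed !== 'object' || parsed === null) return 'wrong shape';
  const obj = parsed as Record<string, unknown>;
  if (!('score' in obj)) return 'wrong shape';                // Category 2
  if (typeof obj.score !== 'number') return 'wrong type';     // Category 3
  if (obj.score < 0 || obj.score > 100) return 'wrong value'; // Category 4
  return 'valid';
}

categorize('Sure! {"score": 82}');        // 'not valid JSON' (text before JSON)
categorize('{"result": {"score": 82}}');  // 'wrong shape' (wrong nesting level)
categorize('{"score": "82"}');            // 'wrong type' (number as string)
categorize('{"score": 820}');             // 'wrong value' (out of range)
```

The rest of this section builds recovery strategies for each of these branches in turn.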

2. Extracting JSON from Text with Extra Content

This is the single most common AI failure: the model wraps its JSON in explanatory text.

Common patterns

Pattern 1: Text before JSON
────────────────────────────
"Sure! Here's the analysis:
{"sentiment": "positive", "confidence": 0.92}"

Pattern 2: Markdown code fences
────────────────────────────────
"```json
{"sentiment": "positive", "confidence": 0.92}
```"

Pattern 3: Text before AND after JSON
──────────────────────────────────────
"Based on my analysis, the result is:
{"sentiment": "positive", "confidence": 0.92}
I hope this helps!"

Pattern 4: Multiple JSON objects (take the first or last)
──────────────────────────────────────────────────────────
"Step 1 output: {"intermediate": true}
Final output: {"sentiment": "positive", "confidence": 0.92}"

JSON extraction utility

/**
 * Extract JSON from a string that may contain extra text.
 * Tries multiple strategies in order of reliability.
 */
function extractJSON(text: string): unknown {
  // Strategy 1: Direct parse (fastest path — handles clean JSON)
  try {
    return JSON.parse(text);
  } catch {
    // Not clean JSON, try extraction strategies
  }

  // Strategy 2: Remove markdown code fences
  const fenceMatch = text.match(/```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/);
  if (fenceMatch) {
    try {
      return JSON.parse(fenceMatch[1].trim());
    } catch {
      // Fenced content is not valid JSON either
    }
  }

  // Strategy 3: Find the first { ... } or [ ... ] block
  const jsonMatch = text.match(/(\{[\s\S]*\}|\[[\s\S]*\])/);
  if (jsonMatch) {
    try {
      return JSON.parse(jsonMatch[1]);
    } catch {
      // Matched braces but not valid JSON (probably nested issue)
    }
  }

  // Strategy 4: Try the LAST { ... } block (sometimes the final output is what you want).
  // Note: this non-greedy match cannot span nested objects; see extractJSONRobust below.
  const allObjects = [...text.matchAll(/\{[\s\S]*?\}/g)];
  for (let i = allObjects.length - 1; i >= 0; i--) {
    try {
      return JSON.parse(allObjects[i][0]);
    } catch {
      continue;
    }
  }

  // All strategies failed
  throw new Error(`Could not extract JSON from response: ${text.substring(0, 200)}...`);
}

// Usage
const raw1 = 'Sure! Here is the result: {"sentiment": "positive", "score": 0.95}';
const parsed1 = extractJSON(raw1);
// { sentiment: 'positive', score: 0.95 }

const raw2 = '```json\n{"sentiment": "positive", "score": 0.95}\n```';
const parsed2 = extractJSON(raw2);
// { sentiment: 'positive', score: 0.95 }

More robust extraction with balanced brace matching

The regex approach above can fail with nested objects. Here is a more robust approach:

function extractJSONRobust(text: string): unknown {
  // Try direct parse first
  try {
    return JSON.parse(text);
  } catch {
    // Continue to extraction
  }

  // Remove markdown code fences
  const cleaned = text
    .replace(/```json\s*\n?/g, '')
    .replace(/```\s*\n?/g, '')
    .trim();

  try {
    return JSON.parse(cleaned);
  } catch {
    // Continue
  }

  // Find balanced braces
  const startIndex = cleaned.indexOf('{');
  if (startIndex === -1) {
    const arrayStart = cleaned.indexOf('[');
    if (arrayStart === -1) {
      throw new Error('No JSON object or array found in response');
    }
    return extractBalanced(cleaned, arrayStart, '[', ']');
  }

  return extractBalanced(cleaned, startIndex, '{', '}');
}

function extractBalanced(
  text: string,
  startIndex: number,
  openChar: string,
  closeChar: string,
): unknown {
  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = startIndex; i < text.length; i++) {
    const char = text[i];

    if (escaped) {
      escaped = false;
      continue;
    }

    if (char === '\\') {
      escaped = true;
      continue;
    }

    if (char === '"') {
      inString = !inString;
      continue;
    }

    if (inString) continue;

    if (char === openChar) depth++;
    if (char === closeChar) depth--;

    if (depth === 0) {
      const jsonString = text.substring(startIndex, i + 1);
      return JSON.parse(jsonString);
    }
  }

  throw new Error('Unbalanced braces in JSON');
}
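To see why the balanced scan matters, here is a self-contained demo on input that the non-greedy regex from Strategy 4 cannot handle. It repeats extractBalanced verbatim so the snippet runs on its own; the sample string is invented.

```typescript
// Standalone copy of extractBalanced (identical to the version above)
// so this demo runs without the rest of the file.
function extractBalanced(
  text: string,
  startIndex: number,
  openChar: string,
  closeChar: string,
): unknown {
  let depth = 0, inString = false, escaped = false;
  for (let i = startIndex; i < text.length; i++) {
    const char = text[i];
    if (escaped) { escaped = false; continue; }
    if (char === '\\') { escaped = true; continue; }
    if (char === '"') { inString = !inString; continue; }
    if (inString) continue;
    if (char === openChar) depth++;
    if (char === closeChar) depth--;
    if (depth === 0) return JSON.parse(text.substring(startIndex, i + 1));
  }
  throw new Error('Unbalanced braces in JSON');
}

// Nested object: a non-greedy /\{[\s\S]*?\}/ match would stop at the inner "}"
// and hand JSON.parse the invalid fragment '{"user": {"name": "Ada"}'.
const noisy = 'Result: {"user": {"name": "Ada"}, "ok": true} ...done';
const value = extractBalanced(noisy, noisy.indexOf('{'), '{', '}');
// value: { user: { name: 'Ada' }, ok: true }
```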

Integrating extraction into the validation pipeline

import { z } from 'zod';

async function validateAIOutput<T>(
  rawContent: string,
  schema: z.ZodSchema<T>,
): Promise<{ success: true; data: T } | { success: false; errors: string[] }> {
  // Step 1: Extract JSON (handles extra text)
  let parsed: unknown;
  try {
    parsed = extractJSON(rawContent);
  } catch (err) {
    return {
      success: false,
      errors: [`JSON extraction failed: ${(err as Error).message}`],
    };
  }

  // Step 2: Validate with Zod
  const result = schema.safeParse(parsed);
  if (result.success) {
    return { success: true, data: result.data };
  }

  return {
    success: false,
    errors: result.error.issues.map((i) => `${i.path.join('.')}: ${i.message}`),
  };
}

3. Type Coercion Strategies

AI models frequently return the right value in the wrong type. Here are targeted strategies for each case.

String "82" to number 82

import { z } from 'zod';

// Option 1: z.coerce (simplest)
const Score1 = z.coerce.number();
Score1.parse('82');   // 82
Score1.parse(82);     // 82

// Option 2: Union with transform (explicit)
const Score2 = z.union([
  z.number(),
  z.string().transform((val) => {
    const num = Number(val);
    if (isNaN(num)) throw new Error(`Cannot convert "${val}" to number`);
    return num;
  }),
]);

// Option 3: Preprocess (runs BEFORE schema validation)
const Score3 = z.preprocess(
  (val) => (typeof val === 'string' ? Number(val) : val),
  z.number().min(0).max(100),
);

String "true"/"false" to boolean

// z.coerce.boolean() is DANGEROUS — any non-empty string becomes true
// "false" → true (because "false" is truthy in JS)

// Safe approach:
const SafeBoolean = z.union([
  z.boolean(),
  z.literal('true').transform(() => true),
  z.literal('false').transform(() => false),
  z.literal('yes').transform(() => true),
  z.literal('no').transform(() => false),
  z.literal(1).transform(() => true),
  z.literal(0).transform(() => false),
]);

SafeBoolean.parse(true);     // true
SafeBoolean.parse('false');   // false (correct!)
SafeBoolean.parse('yes');     // true
SafeBoolean.parse(0);         // false

Confidence 95 to 0.95 (wrong scale)

const ConfidenceSchema = z.number().transform((val) => {
  // If the number is > 1, assume it's a percentage and convert
  if (val > 1 && val <= 100) {
    return val / 100;
  }
  return val;
}).pipe(z.number().min(0).max(1));

ConfidenceSchema.parse(0.95);  // 0.95
ConfidenceSchema.parse(95);    // 0.95
ConfidenceSchema.parse(0.5);   // 0.5
ConfidenceSchema.parse(50);    // 0.5
ConfidenceSchema.parse(150);   // ✗ throws (150 > 100, so it is not rescaled and fails max(1))

Comma-separated string to array

const TagsSchema = z.union([
  z.array(z.string()),
  z.string().transform((val) =>
    val.split(',').map((t) => t.trim()).filter((t) => t.length > 0)
  ),
]);

TagsSchema.parse(['a', 'b', 'c']);              // ['a', 'b', 'c']
TagsSchema.parse('machine learning, nlp, ai');   // ['machine learning', 'nlp', 'ai']

Building a flexible AI response schema

const FlexibleAISchema = z.object({
  // Handle mixed-type confidence
  confidence: z.preprocess(
    (val) => {
      if (typeof val === 'string') return parseFloat(val);
      return val;
    },
    z.number().transform((n) => (n > 1 ? n / 100 : n)).pipe(z.number().min(0).max(1)),
  ),

  // Handle boolean as string
  is_reliable: z.preprocess(
    (val) => {
      if (val === 'true' || val === 'yes' || val === 1) return true;
      if (val === 'false' || val === 'no' || val === 0) return false;
      return val;
    },
    z.boolean(),
  ),

  // Handle tags as string or array
  categories: z.preprocess(
    (val) => {
      if (typeof val === 'string') return val.split(',').map((s) => s.trim());
      return val;
    },
    z.array(z.string()),
  ),
});
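For comparison, here are the same three normalizations written as a plain function: a sketch with no validation library, assuming each input field already matches one of the shapes the schema above accepts (no error handling).

```typescript
// Mirror of FlexibleAISchema's preprocessing, as plain TypeScript.
// Assumes each field is one of the shapes the schema above accepts.
function normalizeAIFields(raw: Record<string, unknown>): {
  confidence: number;
  is_reliable: boolean;
  categories: string[];
} {
  // confidence: string → number, percentage → 0..1
  let confidence =
    typeof raw.confidence === 'string' ? parseFloat(raw.confidence) : (raw.confidence as number);
  if (confidence > 1) confidence = confidence / 100;

  // is_reliable: accept 'true'/'yes'/1 and 'false'/'no'/0
  let is_reliable = raw.is_reliable as boolean;
  if (raw.is_reliable === 'true' || raw.is_reliable === 'yes' || raw.is_reliable === 1) is_reliable = true;
  if (raw.is_reliable === 'false' || raw.is_reliable === 'no' || raw.is_reliable === 0) is_reliable = false;

  // categories: comma-separated string → trimmed array
  const categories =
    typeof raw.categories === 'string'
      ? raw.categories.split(',').map((s) => s.trim())
      : (raw.categories as string[]);

  return { confidence, is_reliable, categories };
}

normalizeAIFields({ confidence: '87', is_reliable: 'yes', categories: 'nlp, ai' });
// → { confidence: 0.87, is_reliable: true, categories: ['nlp', 'ai'] }
```

The Zod version is preferable in practice: it rejects values that no branch recognizes, while this plain version silently casts them through.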

4. Default Values for Missing Fields

When the AI omits a field, you can provide sensible defaults instead of failing.

Simple defaults

const AnalysisSchema = z.object({
  sentiment: z.string(),
  confidence: z.number().default(0),
  language: z.string().default('unknown'),
  tags: z.array(z.string()).default([]),
  metadata: z.object({
    model: z.string().default('unknown'),
    version: z.string().default('1.0'),
  }).default({}), // when missing, defaults to {} and the inner field defaults fill it in
});

// AI returns minimal response
const result = AnalysisSchema.parse({
  sentiment: 'positive',
  // everything else is missing
});

console.log(result);
// {
//   sentiment: 'positive',
//   confidence: 0,
//   language: 'unknown',
//   tags: [],
//   metadata: { model: 'unknown', version: '1.0' }
// }

Conditional defaults based on other fields

const ResponseSchema = z.object({
  answer: z.string(),
  confidence: z.number().min(0).max(1),
  source: z.string().optional(),
}).transform((data) => ({
  ...data,
  // If no source provided and confidence is high, mark as "model knowledge"
  source: data.source || (data.confidence > 0.9 ? 'model_knowledge' : 'unverified'),
}));

ResponseSchema.parse({ answer: 'Paris', confidence: 0.99 });
// { answer: 'Paris', confidence: 0.99, source: 'model_knowledge' }

ResponseSchema.parse({ answer: 'Maybe Lisbon', confidence: 0.4 });
// { answer: 'Maybe Lisbon', confidence: 0.4, source: 'unverified' }

5. Graceful Degradation vs Hard Failure

Not all validation failures deserve the same response. Design a strategy based on severity.

The degradation ladder

Level 1: FULL SUCCESS
  → All fields valid, all constraints met
  → Use the data as-is

Level 2: PARTIAL SUCCESS
  → Core fields valid, optional fields missing or invalid
  → Use validated fields, apply defaults for the rest

Level 3: RECOVERABLE FAILURE
  → JSON is valid, some required fields wrong
  → Attempt type coercion, extraction, or transformation
  → If that works, use the recovered data with a warning flag

Level 4: RETRY-WORTHY FAILURE
  → Response is structurally wrong but the model can fix it
  → Retry with error feedback (see 4.6.e)

Level 5: HARD FAILURE
  → Response is completely unusable
  → Return an error to the user or use a fallback

Implementation

import { z } from 'zod';

// Strict schema — what we ideally want
const StrictSchema = z.object({
  category: z.enum(['bug', 'feature', 'question']),
  severity: z.enum(['low', 'medium', 'high', 'critical']),
  summary: z.string().min(10),
  tags: z.array(z.string()).min(1),
  confidence: z.number().min(0).max(1),
});

// Lenient schema — minimum viable data
const LenientSchema = z.object({
  category: z.enum(['bug', 'feature', 'question']),
  severity: z.enum(['low', 'medium', 'high', 'critical']).default('medium'),
  summary: z.string().min(1), // shorter minimum
  tags: z.array(z.string()).default([]),
  confidence: z.number().min(0).max(1).default(0),
});

type Classification = z.infer<typeof StrictSchema>;

interface ClassificationResult {
  data: Classification;
  quality: 'full' | 'partial' | 'degraded';
  warnings: string[];
}

function classifyWithDegradation(rawContent: string): ClassificationResult | null {
  // Step 1: Extract JSON
  let parsed: unknown;
  try {
    parsed = extractJSON(rawContent);
  } catch {
    return null; // Hard failure — can't even extract JSON
  }

  // Step 2: Try strict validation
  const strict = StrictSchema.safeParse(parsed);
  if (strict.success) {
    return { data: strict.data, quality: 'full', warnings: [] };
  }

  // Step 3: Try lenient validation
  const lenient = LenientSchema.safeParse(parsed);
  if (lenient.success) {
    const warnings = strict.error.issues.map(
      (i) => `Degraded: ${i.path.join('.')}: ${i.message}`
    );
    return {
      data: lenient.data as Classification,
      quality: 'partial',
      warnings,
    };
  }

  // Step 4: Try with type coercion
  const CoercedSchema = z.object({
    category: z.string().toLowerCase().pipe(
      z.enum(['bug', 'feature', 'question'])
    ),
    severity: z.string().toLowerCase().pipe(
      z.enum(['low', 'medium', 'high', 'critical'])
    ).default('medium'),
    summary: z.string().default('No summary provided'),
    tags: z.union([
      z.array(z.string()),
      z.string().transform((s) => s.split(',').map((t) => t.trim())),
    ]).default([]),
    confidence: z.coerce.number().min(0).max(1).default(0),
  });

  const coerced = CoercedSchema.safeParse(parsed);
  if (coerced.success) {
    return {
      data: coerced.data as Classification,
      quality: 'degraded',
      warnings: [
        ...strict.error.issues.map((i) => `Original: ${i.path.join('.')}: ${i.message}`),
        'Data was recovered via type coercion',
      ],
    };
  }

  return null; // All recovery strategies failed
}

6. Logging Invalid Responses for Debugging

Invalid AI responses are gold for debugging and improving your system. Log them properly.

What to log

interface AIValidationLog {
  // Identity
  request_id: string;
  timestamp: string;

  // Input context
  prompt_hash: string; // hash of system prompt (don't log full prompt — too large)
  model: string;
  temperature: number;

  // Raw output
  raw_response: string; // FULL raw response — essential for debugging
  raw_response_length: number;

  // Validation outcome
  json_parseable: boolean;
  json_parse_error: string | null;
  zod_valid: boolean;
  zod_errors: Array<{
    path: string;
    code: string;
    message: string;
    received?: string;
    expected?: string;
  }>;

  // Recovery
  recovery_attempted: boolean;
  recovery_strategy: string | null; // 'extraction' | 'coercion' | 'defaults' | 'retry'
  recovery_successful: boolean;

  // Performance
  api_latency_ms: number;
  validation_latency_ms: number;
  tokens_used: { input: number; output: number };
}
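A minimal sketch of filling one of these records. The `promptHash` helper uses Node's built-in crypto module; the request id, model name, and response are illustrative, and only a subset of the AIValidationLog fields is shown.

```typescript
import { createHash } from 'crypto';

// Hash the system prompt: logs stay small, yet identical prompts remain comparable.
function promptHash(prompt: string): string {
  return createHash('sha256').update(prompt).digest('hex').slice(0, 16);
}

// Partial log record (illustrative values; a subset of AIValidationLog).
const raw = '{"sentiment": "positive"}';
const logEntry = {
  request_id: 'req_001',
  timestamp: new Date().toISOString(),
  prompt_hash: promptHash('You are a sentiment classifier. Respond in JSON.'),
  model: 'example-model-v1',
  raw_response: raw,
  raw_response_length: raw.length,
  json_parseable: true,
  zod_valid: true,
};
```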

Aggregation queries you should build

1. What % of AI responses fail validation?
   → SELECT COUNT(CASE WHEN zod_valid = false THEN 1 END) * 1.0 / COUNT(*) FROM ai_validation_logs

2. Which fields fail most often?
   → SELECT path, COUNT(*) FROM ai_validation_errors GROUP BY path ORDER BY COUNT(*) DESC

3. What's the most common error type?
   → SELECT code, COUNT(*) FROM ai_validation_errors GROUP BY code ORDER BY COUNT(*) DESC

4. Does a specific model version have higher failure rates?
   → SELECT model, COUNT(CASE WHEN zod_valid = false THEN 1 END) * 1.0 / COUNT(*) AS fail_rate
     FROM ai_validation_logs GROUP BY model

5. Are failure rates increasing over time?
   → SELECT DATE(timestamp), COUNT(CASE WHEN zod_valid = false THEN 1 END) * 1.0 / COUNT(*) AS fail_rate
     FROM ai_validation_logs GROUP BY DATE(timestamp)

Simple in-memory log aggregator

class ValidationMetrics {
  private logs: AIValidationLog[] = [];

  record(log: AIValidationLog): void {
    this.logs.push(log);

    // Alert if failure rate spikes
    const recentLogs = this.logs.slice(-100);
    const failRate = recentLogs.filter((l) => !l.zod_valid).length / recentLogs.length;

    if (failRate > 0.1 && recentLogs.length >= 50) {
      console.warn(
        `[ALERT] AI validation failure rate is ${(failRate * 100).toFixed(1)}% ` +
        `(last ${recentLogs.length} requests)`
      );
    }
  }

  getTopFailingFields(limit = 10): Array<{ path: string; count: number }> {
    const counts = new Map<string, number>();
    for (const log of this.logs) {
      for (const error of log.zod_errors) {
        counts.set(error.path, (counts.get(error.path) || 0) + 1);
      }
    }
    return [...counts.entries()]
      .map(([path, count]) => ({ path, count }))
      .sort((a, b) => b.count - a.count)
      .slice(0, limit);
  }

  getFailureRate(): number {
    if (this.logs.length === 0) return 0;
    return this.logs.filter((l) => !l.zod_valid).length / this.logs.length;
  }

  getSummary(): string {
    return [
      `Total requests: ${this.logs.length}`,
      `Failure rate: ${(this.getFailureRate() * 100).toFixed(1)}%`,
      `JSON parse failures: ${this.logs.filter((l) => !l.json_parseable).length}`,
      `Schema validation failures: ${this.logs.filter((l) => l.json_parseable && !l.zod_valid).length}`,
      `Recovery attempts: ${this.logs.filter((l) => l.recovery_attempted).length}`,
      `Recovery successes: ${this.logs.filter((l) => l.recovery_successful).length}`,
      `Top failing fields: ${JSON.stringify(this.getTopFailingFields(5))}`,
    ].join('\n');
  }
}

7. Handling Truncated JSON

When the AI response hits max_tokens, the JSON may be cut off mid-stream:

{"summary": "This is a long analysis of the document that covers multiple topics including

Detection and repair strategies

function repairTruncatedJSON(text: string): unknown {
  // Try normal parse first
  try {
    return JSON.parse(text);
  } catch {
    // Continue to repair
  }

  // Extract JSON portion
  let json = text;
  const startIdx = json.indexOf('{');
  if (startIdx === -1) {
    throw new Error('No JSON object found to repair');
  }
  json = json.substring(startIdx);

  // Count open braces/brackets
  let openBraces = 0;
  let openBrackets = 0;
  let inString = false;
  let escaped = false;

  for (const char of json) {
    if (escaped) { escaped = false; continue; }
    if (char === '\\') { escaped = true; continue; }
    if (char === '"') { inString = !inString; continue; }
    if (inString) continue;
    if (char === '{') openBraces++;
    if (char === '}') openBraces--;
    if (char === '[') openBrackets++;
    if (char === ']') openBrackets--;
  }

  // If we're inside a string, close it
  if (inString) {
    json += '"';
  }

  // Close open brackets and braces
  json += ']'.repeat(Math.max(0, openBrackets));
  json += '}'.repeat(Math.max(0, openBraces));

  try {
    return JSON.parse(json);
  } catch {
    throw new Error('Could not repair truncated JSON');
  }
}

// Example: truncated response
const truncated = '{"summary": "The market shows strong growth in Q4 with';
try {
  const repaired = repairTruncatedJSON(truncated);
  console.log(repaired);
  // { summary: 'The market shows strong growth in Q4 with' }
} catch {
  console.error('Repair failed');
}

Warning: Repaired JSON may have truncated values. Always validate with Zod after repair, and flag the result as potentially incomplete.


8. Putting It All Together: The Defense-in-Depth Approach

import { z } from 'zod';

const AnalysisSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  summary: z.string().min(10),
  topics: z.array(z.string()).min(1),
});

type Analysis = z.infer<typeof AnalysisSchema>;

interface DefenseResult {
  data: Analysis | null;
  stage_reached: 'direct_parse' | 'extraction' | 'coercion' | 'repair' | 'failed';
  warnings: string[];
}

function defendedValidation(rawContent: string): DefenseResult {
  const warnings: string[] = [];

  // Layer 1: Direct JSON parse + strict validation
  try {
    const direct = JSON.parse(rawContent);
    const result = AnalysisSchema.safeParse(direct);
    if (result.success) {
      return { data: result.data, stage_reached: 'direct_parse', warnings };
    }
  } catch {
    warnings.push('Direct JSON parse failed, attempting extraction');
  }

  // Layer 2: Extract JSON from text
  let extracted: unknown = undefined; // initialize: extraction below may throw before assigning
  try {
    extracted = extractJSON(rawContent);
    const result = AnalysisSchema.safeParse(extracted);
    if (result.success) {
      warnings.push('JSON was extracted from surrounding text');
      return { data: result.data, stage_reached: 'extraction', warnings };
    }
  } catch {
    warnings.push('JSON extraction failed');
  }

  // Layer 3: Type coercion (only if extraction produced a value)
  if (extracted !== undefined) {
    const CoercedSchema = z.object({
      sentiment: z.string().toLowerCase().pipe(
        z.enum(['positive', 'negative', 'neutral'])
      ),
      confidence: z.coerce.number().transform((n) => n > 1 ? n / 100 : n)
        .pipe(z.number().min(0).max(1)),
      summary: z.coerce.string().pipe(z.string().min(1)),
      topics: z.union([
        z.array(z.string()),
        z.string().transform((s) => s.split(',').map((t) => t.trim())),
      ]).pipe(z.array(z.string()).min(1)),
    });

    const coerced = CoercedSchema.safeParse(extracted);
    if (coerced.success) {
      warnings.push('Data required type coercion');
      return {
        data: coerced.data as Analysis,
        stage_reached: 'coercion',
        warnings,
      };
    }
  }

  // Layer 4: Truncation repair
  try {
    const repaired = repairTruncatedJSON(rawContent);
    const result = AnalysisSchema.safeParse(repaired);
    if (result.success) {
      warnings.push('JSON appeared truncated and was repaired');
      return { data: result.data, stage_reached: 'repair', warnings };
    }
  } catch {
    warnings.push('Truncation repair failed');
  }

  // All layers failed
  return { data: null, stage_reached: 'failed', warnings };
}

9. Key Takeaways

  1. AI failures are predictable — extra text around JSON, wrong types, missing fields, and truncation cover 95% of cases. Build handlers for each.
  2. JSON extraction (removing surrounding text, markdown fences) should be your first recovery step — it's the most common failure mode.
  3. Type coercion (string "82" to number 82, string "true" to boolean true) handles the second most common failure mode. Use z.coerce or z.preprocess, but be careful with edge cases.
  4. Default values provide resilience for optional fields, but never default required business-critical fields silently — flag them as degraded.
  5. Graceful degradation is a ladder: try strict validation first, then lenient, then coercion, then defaults. Each step down should record a warning.
  6. Log every failure with the full raw response, error details, and recovery outcome. This data drives prompt improvement and model evaluation.
  7. Truncation repair is a last resort. Repaired JSON may have truncated values that pass type checks but contain incomplete data.

Explain-It Challenge

  1. An AI returns Sure! Here's the analysis:\n\n```json\n{"score": 85}\n```\n\nHope that helps!. Write the extraction code that handles this and explain why simple JSON.parse() fails.
  2. Your confidence field keeps getting values like 85, 92, 7 (percentages) instead of 0.85, 0.92, 0.07 (decimals). Design a Zod schema that normalizes both formats and explain the edge case where a value of 1 is ambiguous.
  3. Your team debates whether to "fail fast" (reject any invalid AI response) or "degrade gracefully" (salvage what you can). List three scenarios where each approach is correct.

Navigation: ← 4.6.c Verifying AI Responses · 4.6.e — Retry Strategies →