Episode 4 — Generative AI Engineering / 4.6 — Schema Validation with Zod

4.6.c — Verifying AI Responses

In one sentence: The validation pipeline — API response string to JSON.parse() to Zod schema validation — is the core pattern for turning untrusted AI output into clean, typed data, using safeParse for graceful handling, transformations for data normalization, and coercion for flexible type conversion.

Navigation: ← 4.6.b Defining Validation Schemas · 4.6.d — Handling Invalid Responses →

1. The Validation Pipeline

Every AI response goes through the same pipeline:

┌──────────────────────────────────────────────────────────────────┐
│                    AI RESPONSE VALIDATION PIPELINE                │
│                                                                  │
│  Step 1: Raw API Response                                        │
│  ────────────────────────                                        │
│  response.choices[0].message.content                             │
│  → A string. Could be anything.                                  │
│                                                                  │
│  Step 2: JSON.parse()                                            │
│  ────────────────────                                            │
│  → Converts the string to a JavaScript value (usually an object) │
│  → Can throw SyntaxError if the string is not valid JSON         │
│                                                                  │
│  Step 3: Zod Validation                                          │
│  ────────────────────                                            │
│  schema.parse(data) or schema.safeParse(data)                    │
│  → Validates structure, types, constraints                       │
│  → Returns typed data or structured error                        │
│                                                                  │
│  Result: Fully validated, fully typed data                       │
│  ─────── OR a clear, actionable error                            │
└──────────────────────────────────────────────────────────────────┘

Here is the pipeline in code:

import { z } from 'zod';

const ResponseSchema = z.object({
  answer: z.string(),
  confidence: z.number().min(0).max(1),
  sources: z.array(z.string()),
});

type ValidatedResponse = z.infer<typeof ResponseSchema>;

function validateAIResponse(rawContent: string): ValidatedResponse {
  // Step 1: We already have the raw string from the API

  // Step 2: Parse JSON
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawContent);
  } catch (error) {
    throw new Error(`AI returned invalid JSON: ${rawContent.substring(0, 200)}`);
  }

  // Step 3: Validate with Zod
  const validated = ResponseSchema.parse(parsed);
  return validated;
}
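Step 2 has a subtlety worth spelling out: JSON.parse() does not always return an object. Any valid JSON value parses successfully, including a bare number or null, while prose around the JSON makes it throw. The sketch below uses a hypothetical helper, `parseJsonStep` (not part of Zod or any library), to show the outcomes that Step 3 has to be ready for.

```typescript
// Hypothetical helper for illustration: runs Step 2 and reports the outcome
// as a value instead of throwing.
type JsonStepResult =
  | { ok: true; value: unknown }
  | { ok: false; reason: string };

function parseJsonStep(rawContent: string): JsonStepResult {
  try {
    return { ok: true, value: JSON.parse(rawContent) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, reason };
  }
}

// A well-formed object: what we hope the model returns.
const objectResult = parseJsonStep('{"answer": "42", "confidence": 0.9, "sources": []}');

// Valid JSON, but not an object. JSON.parse accepts these; only schema validation rejects them.
const numberResult = parseJsonStep('42');
const nullResult = parseJsonStep('null');

// Prose wrapped around the JSON is not valid JSON and fails at this step.
const proseResult = parseJsonStep('Sure! Here is the JSON: {"answer": "42"}');

console.log(objectResult.ok, numberResult.ok, nullResult.ok, proseResult.ok);
// true true true false
```

This is why the parsed value is typed as `unknown` in the pipeline: a successful JSON.parse() says nothing about the shape of the result.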

2. schema.parse() vs schema.safeParse()

Zod gives you two ways to validate data. The choice between them determines your error-handling strategy.

schema.parse() — throws on failure

const Schema = z.object({
  name: z.string(),
  age: z.number(),
});

// Successful parse — returns the validated data
const result = Schema.parse({ name: 'Alice', age: 30 });
// result = { name: 'Alice', age: 30 }

// Failed parse — THROWS a ZodError
try {
  Schema.parse({ name: 'Alice', age: 'thirty' });
} catch (error) {
  if (error instanceof z.ZodError) {
    console.log(error.issues);
    // [{ code: 'invalid_type', expected: 'number', received: 'string',
    //    path: ['age'], message: 'Expected number, received string' }]
  }
}

When to use parse():

When invalid data should stop execution
In pipelines where you want errors to bubble up
When you have a try/catch at a higher level

schema.safeParse() — never throws

const Schema = z.object({
  name: z.string(),
  age: z.number(),
});

// Successful parse
const success = Schema.safeParse({ name: 'Alice', age: 30 });
// { success: true, data: { name: 'Alice', age: 30 } }

if (success.success) {
  console.log(success.data); // Fully typed
}

// Failed parse — does NOT throw
const failure = Schema.safeParse({ name: 'Alice', age: 'thirty' });
// { success: false, error: ZodError }

if (!failure.success) {
  console.log(failure.error.issues);
  // [{ code: 'invalid_type', expected: 'number', received: 'string',
  //    path: ['age'], message: 'Expected number, received string' }]
}

When to use safeParse():

When you want to handle errors without try/catch
When you need to inspect errors before deciding what to do
In retry loops, where the errors are sent back to the model
When you want to salvage the valid parts of a partially invalid response

The pattern for AI validation (recommended)

async function callAIWithValidation<T>(
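  // Input type is `unknown` so that schemas using .transform() or z.coerce are accepted too
  schema: z.ZodType<T, z.ZodTypeDef, unknown>,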
  apiCall: () => Promise<string>,
): Promise<{ success: true; data: T } | { success: false; error: string }> {
  const rawContent = await apiCall();

  // Step 1: Try JSON parse
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawContent);
  } catch {
    return {
      success: false,
      error: `Invalid JSON: ${rawContent.substring(0, 100)}...`,
    };
  }

  // Step 2: Validate with safeParse
  const result = schema.safeParse(parsed);

  if (result.success) {
    return { success: true, data: result.data };
  }

  return {
    success: false,
    error: result.error.issues
      .map((i) => `${i.path.join('.')}: ${i.message}`)
      .join('; '),
  };
}

3. Handling ZodError: Accessing error.issues

When validation fails, Zod provides a rich error object with detailed information about what went wrong.

ZodError structure

import { z, ZodError } from 'zod';

const Schema = z.object({
  name: z.string(),
  age: z.number().min(0).max(150),
  email: z.string().email(),
  tags: z.array(z.string()).min(1),
});

const result = Schema.safeParse({
  name: 42,            // wrong type
  age: -5,             // below min
  email: 'not-email',  // invalid format
  tags: [],            // below min length
});

if (!result.success) {
  console.log(result.error.issues);
  // [
  //   {
  //     code: 'invalid_type',
  //     expected: 'string',
  //     received: 'number',
  //     path: ['name'],
  //     message: 'Expected string, received number'
  //   },
  //   {
  //     code: 'too_small',
  //     minimum: 0,
  //     type: 'number',
  //     inclusive: true,
  //     exact: false,
  //     path: ['age'],
  //     message: 'Number must be greater than or equal to 0'
  //   },
  //   {
  //     code: 'invalid_string',
  //     validation: 'email',
  //     path: ['email'],
  //     message: 'Invalid email'
  //   },
  //   {
  //     code: 'too_small',
  //     minimum: 1,
  //     type: 'array',
  //     inclusive: true,
  //     exact: false,
  //     path: ['tags'],
  //     message: 'Array must contain at least 1 element(s)'
  //   }
  // ]
}

Key properties of each issue

| Property | Description |
| --- | --- |
| `code` | The type of error (invalid_type, too_small, too_big, invalid_string, custom, etc.) |
| `path` | Array of keys/indices showing where the error is (e.g., `['analysis', 'sentiment', 0]`) |
| `message` | Human-readable error message |
| `expected` | What was expected (for type errors) |
| `received` | What was actually received (for type errors) |
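
The `path` array mixes object keys and array indices, so a plain `join('.')` renders the example above as `analysis.sentiment.0`. A small formatter can make log lines easier to read. `formatIssuePath` below is a hypothetical helper, not a Zod API; it assumes path segments are only strings and numbers, as in the table.

```typescript
// Hypothetical helper: renders an issue path such as ['analysis', 'sentiment', 0]
// as "analysis.sentiment[0]". Assumes segments are strings (keys) or numbers (indices).
function formatIssuePath(path: ReadonlyArray<string | number>): string {
  if (path.length === 0) return '(root)';

  let out = '';
  for (const segment of path) {
    if (typeof segment === 'number') {
      out += `[${segment}]`;   // array index
    } else if (out === '') {
      out = segment;           // first key, no leading dot
    } else {
      out += `.${segment}`;    // nested key
    }
  }
  return out;
}

console.log(formatIssuePath(['analysis', 'sentiment', 0])); // analysis.sentiment[0]
console.log(formatIssuePath(['tags']));                     // tags
console.log(formatIssuePath([]));                           // (root)
```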

Useful error methods

if (!result.success) {
  // Flat list of all errors
  console.log(result.error.issues);

  // Formatted error object (grouped by path)
  console.log(result.error.format());
  // {
  //   name: { _errors: ['Expected string, received number'] },
  //   age: { _errors: ['Number must be greater than or equal to 0'] },
  //   email: { _errors: ['Invalid email'] },
  //   tags: { _errors: ['Array must contain at least 1 element(s)'] }
  // }

  // Flattened (for simpler error display)
  console.log(result.error.flatten());
  // {
  //   formErrors: [],
  //   fieldErrors: {
  //     name: ['Expected string, received number'],
  //     age: ['Number must be greater than or equal to 0'],
  //     email: ['Invalid email'],
  //     tags: ['Array must contain at least 1 element(s)']
  //   }
  // }
}

4. Formatting Validation Errors for Logging

In production, you need to log validation errors in a structured, searchable format.

interface ValidationLog {
  timestamp: string;
  request_id: string;
  validation_passed: boolean;
  error_count: number;
  errors: Array<{
    path: string;
    code: string;
    message: string;
    expected?: string;
    received?: string;
  }>;
  raw_response_preview: string;
}

function logValidationResult(
  requestId: string,
  rawResponse: string,
  result: z.SafeParseReturnType<unknown, unknown>,
): ValidationLog {
  const log: ValidationLog = {
    timestamp: new Date().toISOString(),
    request_id: requestId,
    validation_passed: result.success,
    error_count: result.success ? 0 : result.error.issues.length,
    errors: result.success
      ? []
      : result.error.issues.map((issue) => ({
          path: issue.path.join('.') || '(root)',
          code: issue.code,
          message: issue.message,
          expected: 'expected' in issue ? String(issue.expected) : undefined,
          received: 'received' in issue ? String(issue.received) : undefined,
        })),
    raw_response_preview: rawResponse.substring(0, 500),
  };

  if (!result.success) {
    console.error('[AI_VALIDATION_FAILED]', JSON.stringify(log));
  } else {
    console.info('[AI_VALIDATION_PASSED]', JSON.stringify({
      timestamp: log.timestamp,
      request_id: log.request_id,
    }));
  }

  return log;
}

Building error messages for the AI (retry context)

function formatErrorsForAI(error: z.ZodError): string {
  const errorLines = error.issues.map((issue) => {
    const path = issue.path.join('.') || 'root';
    return `- Field "${path}": ${issue.message}`;
  });

  return [
    'Your previous response had validation errors:',
    ...errorLines,
    '',
    'Please fix these issues and respond again with valid JSON.',
  ].join('\n');
}

// Example output:
// Your previous response had validation errors:
// - Field "sentiment": Invalid enum value. Expected 'positive' | 'negative' | 'neutral', received 'somewhat positive'
// - Field "confidence": Number must be less than or equal to 1
//
// Please fix these issues and respond again with valid JSON.
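
The message built by `formatErrorsForAI` is meant to be sent back to the model on the next attempt. A minimal sketch of such a loop follows. Everything in it is an assumption for illustration: `callModel` stands in for the real API call, `validate` stands in for JSON parsing plus `safeParse`, the `Validation` type is hand-written rather than Zod's own, and `maxAttempts` is an arbitrary limit.

```typescript
// Result shape loosely modelled on safeParse's success/failure union (not Zod's actual type).
type Validation<T> =
  | { success: true; data: T }
  | { success: false; feedback: string };

// Hypothetical retry loop: ask, validate, and on failure ask again with the
// validation feedback appended to the original prompt.
async function askWithRetry<T>(
  callModel: (prompt: string) => Promise<string>,
  validate: (raw: string) => Validation<T>,
  prompt: string,
  maxAttempts = 3,
): Promise<{ attempts: number; result: Validation<T> }> {
  let currentPrompt = prompt;
  let result: Validation<T> = { success: false, feedback: 'No attempt was made.' };

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(currentPrompt);
    result = validate(raw);
    if (result.success) return { attempts: attempt, result };
    currentPrompt = `${prompt}\n\n${result.feedback}`;
  }
  return { attempts: maxAttempts, result };
}

// Fake model for the demo: wrong on the first call, correct afterwards.
function makeFakeModel(): (prompt: string) => Promise<string> {
  let calls = 0;
  return async () => {
    calls += 1;
    return calls === 1 ? '{"confidence": 7}' : '{"confidence": 0.7}';
  };
}

// Hand-written stand-in for JSON.parse + schema validation.
function validateConfidence(raw: string): Validation<{ confidence: number }> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { success: false, feedback: 'Your previous response was not valid JSON.' };
  }
  if (typeof parsed === 'object' && parsed !== null && 'confidence' in parsed) {
    const confidence = (parsed as { confidence: unknown }).confidence;
    if (typeof confidence === 'number' && confidence >= 0 && confidence <= 1) {
      return { success: true, data: { confidence } };
    }
  }
  return { success: false, feedback: '- Field "confidence": must be a number between 0 and 1' };
}

askWithRetry(makeFakeModel(), validateConfidence, 'Rate this review.').then((outcome) => {
  console.log(outcome.attempts, outcome.result.success); // 2 true
});
```

Capping the number of attempts matters: a model that keeps making the same mistake would otherwise loop forever and keep incurring API cost.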

5. Partial Validation: When Some Fields Are Valid

Sometimes an AI response is mostly correct but has one bad field. You might want to salvage the good parts.

Strategy 1: Make failing fields optional

const StrictSchema = z.object({
  title: z.string(),
  body: z.string(),
  rating: z.number().min(1).max(5),
  tags: z.array(z.string()),
});

const LenientSchema = z.object({
  title: z.string(),
  body: z.string(),
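  rating: z.number().min(1).max(5).optional(), // allow a missing rating
  tags: z.array(z.string()).default([]),        // default to empty when missing
  // Note: .optional() and .default() only cover missing (undefined) values.
  // A present but invalid value, such as rating: 'five', still fails this schema;
  // appending .catch(fallback) to a field is one way to absorb those as well.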
});

// If strict validation fails, try lenient
function validateWithFallback(data: unknown) {
  const strict = StrictSchema.safeParse(data);
  if (strict.success) return { data: strict.data, quality: 'full' as const };

  const lenient = LenientSchema.safeParse(data);
  if (lenient.success) return { data: lenient.data, quality: 'partial' as const };

  return { data: null, quality: 'failed' as const };
}

Strategy 2: Validate fields individually

function extractValidFields(data: unknown) {
  if (typeof data !== 'object' || data === null) {
    return { validFields: {}, invalidFields: ['(root): not an object'] };
  }

  const obj = data as Record<string, unknown>;
  const validFields: Record<string, unknown> = {};
  const invalidFields: string[] = [];

  // Try each field independently
  const fieldSchemas = {
    title: z.string(),
    body: z.string(),
    rating: z.number().min(1).max(5),
    tags: z.array(z.string()),
  };

  for (const [key, schema] of Object.entries(fieldSchemas)) {
    const result = schema.safeParse(obj[key]);
    if (result.success) {
      validFields[key] = result.data;
    } else {
      invalidFields.push(`${key}: ${result.error.issues[0].message}`);
    }
  }

  return { validFields, invalidFields };
}

// Usage
const aiOutput = { title: 'Great', body: 'Nice product', rating: 'five', tags: ['review'] };
const { validFields, invalidFields } = extractValidFields(aiOutput);
// validFields: { title: 'Great', body: 'Nice product', tags: ['review'] }
// invalidFields: ['rating: Expected number, received string']

6. Transformations with .transform()

.transform() lets you validate data AND convert it in one step. This is invaluable when AI output needs normalization.

Basic transformation

// AI returns a string "95" but you need a number
const PercentageSchema = z.string()
  .transform((val) => parseFloat(val))
  .pipe(z.number().min(0).max(100));

PercentageSchema.parse('95');      // returns 95 (number)
PercentageSchema.parse('150');     // ✗ Number must be less than or equal to 100
PercentageSchema.parse('abc');     // ✗ NaN fails the number check
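
One caveat about the schema above: `parseFloat` stops at the first character it cannot read, so `'95abc'` becomes `95` and passes the range check. If that leniency is unwanted, the conversion can be made stricter. `toStrictNumber` below is a hypothetical helper shown as one possible approach; it could be passed to `.transform()` in place of `parseFloat`.

```typescript
// Hypothetical helper: converts a string to a number only when the WHOLE string
// is a numeric literal. Returns NaN otherwise, which a later number check can reject.
function toStrictNumber(val: string): number {
  const trimmed = val.trim();
  if (trimmed === '') return NaN; // Number('') would be 0, which hides empty output
  return Number(trimmed);         // Number() rejects trailing garbage such as '95abc'
}

// parseFloat is lenient about trailing characters:
console.log(parseFloat('95abc'));       // 95
console.log(parseFloat('95%'));         // 95

// The strict helper is not:
console.log(toStrictNumber('95'));      // 95
console.log(toStrictNumber(' 95.5 '));  // 95.5
console.log(toStrictNumber('95abc'));   // NaN
console.log(toStrictNumber(''));        // NaN
```

Note that `Number()` still accepts forms such as `'1e3'` and `'0x10'`, so this is stricter than `parseFloat` but not a full format check.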

Transform AI date strings

const DateSchema = z.string()
  .transform((val) => new Date(val))
  .refine((date) => !isNaN(date.getTime()), { message: 'Invalid date' });

DateSchema.parse('2025-01-15');                  // Date object
DateSchema.parse('January 15, 2025');            // Date object
DateSchema.parse('not a date');                  // ✗ Invalid date

type ParsedDate = z.infer<typeof DateSchema>;    // Date

Normalize AI response fields

const NormalizedResponseSchema = z.object({
  // AI might return mixed-case sentiment
  sentiment: z.string()
    .toLowerCase()
    .transform((val) => {
      // Normalize common AI variants
      const mapping: Record<string, string> = {
        'pos': 'positive',
        'neg': 'negative',
        'neu': 'neutral',
        'positive': 'positive',
        'negative': 'negative',
        'neutral': 'neutral',
      };
      return mapping[val] || val;
    })
    .pipe(z.enum(['positive', 'negative', 'neutral'])),

  // AI might return confidence as "0.95" or "95%"
  confidence: z.union([
    z.number(),
    z.string().transform((val) => {
      if (val.endsWith('%')) {
        return parseFloat(val) / 100;
      }
      return parseFloat(val);
    }),
  ]).pipe(z.number().min(0).max(1)),

  // AI might return tags as comma-separated string or array
  tags: z.union([
    z.array(z.string()),
    z.string().transform((val) => val.split(',').map((t) => t.trim())),
  ]),
});

// All of these work:
NormalizedResponseSchema.parse({
  sentiment: 'POSITIVE',
  confidence: '95%',
  tags: 'ai, machine learning, nlp',
});
// → { sentiment: 'positive', confidence: 0.95, tags: ['ai', 'machine learning', 'nlp'] }

NormalizedResponseSchema.parse({
  sentiment: 'pos',
  confidence: 0.87,
  tags: ['tag1', 'tag2'],
});
// → { sentiment: 'positive', confidence: 0.87, tags: ['tag1', 'tag2'] }

Transform on objects

const AIOutputSchema = z.object({
  first_name: z.string(),
  last_name: z.string(),
  birth_year: z.number(),
}).transform((data) => ({
  fullName: `${data.first_name} ${data.last_name}`,
  age: new Date().getFullYear() - data.birth_year,
}));

type Transformed = z.infer<typeof AIOutputSchema>;
// { fullName: string; age: number }

AIOutputSchema.parse({ first_name: 'Alice', last_name: 'Smith', birth_year: 1990 });
// { fullName: 'Alice Smith', age: 35 } when run in 2025 (age depends on the current year)

7. Coercion with z.coerce

z.coerce converts the input with the matching JavaScript constructor before validating it. This differs from .transform(), which runs only after the schema's own checks have passed.

// z.coerce.number() — calls Number() on the input first
const CoercedNumber = z.coerce.number();
CoercedNumber.parse('42');        // 42 (number)
CoercedNumber.parse(42);          // 42 (number)
CoercedNumber.parse(true);        // 1 (number)
CoercedNumber.parse('');           // 0 (number — this might surprise you)

// z.coerce.string() — calls String() on the input first
const CoercedString = z.coerce.string();
CoercedString.parse(42);          // '42' (string)
CoercedString.parse(true);        // 'true' (string)
CoercedString.parse(null);        // 'null' (string)

// z.coerce.boolean() — calls Boolean() on the input first
const CoercedBool = z.coerce.boolean();
CoercedBool.parse('true');        // true
CoercedBool.parse('');            // false
CoercedBool.parse(1);             // true
CoercedBool.parse(0);             // false

// z.coerce.date() — calls new Date() on the input first
const CoercedDate = z.coerce.date();
CoercedDate.parse('2025-01-15');  // Date object
CoercedDate.parse(1705276800000); // Date object (from timestamp)

Coercion in AI response schemas

// AI models sometimes return numbers as strings
const FlexibleScoreSchema = z.object({
  label: z.string(),
  score: z.coerce.number().min(0).max(100),
  is_reliable: z.coerce.boolean(),
  timestamp: z.coerce.date(),
});

FlexibleScoreSchema.parse({
  label: 'sentiment',
  score: '87',                 // string → 87
  is_reliable: 'true',        // string → true (any non-empty string would, even 'false')
  timestamp: '2025-01-15',    // string → Date
});
// ✓ All coerced to correct types

Coercion gotchas

// WATCH OUT: z.coerce.number() on empty string gives 0
z.coerce.number().parse('');     // 0 — probably not what you want

// Safer: use transform with explicit handling
const SafeNumber = z.string()
  .refine((val) => val.trim() !== '', { message: 'Cannot be empty' })
  .transform((val) => Number(val))
  .refine((val) => !isNaN(val), { message: 'Not a valid number' });

// WATCH OUT: z.coerce.boolean() treats any non-empty string as true
z.coerce.boolean().parse('false'); // true (!) — 'false' is a truthy string

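// Safer: explicit boolean parsing
// Report bad input through ctx.addIssue instead of throwing: an exception thrown
// inside .transform() is not converted into a ZodError, so even safeParse() would throw.
const SafeBool = z.union([
  z.boolean(),
  z.string().transform((val, ctx) => {
    if (val === 'true') return true;
    if (val === 'false') return false;
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: `Invalid boolean string: ${val}`,
    });
    return z.NEVER;
  }),
]);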
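
Both gotchas come from the JavaScript conversion functions that z.coerce is described as calling first (`Number()`, `Boolean()`, `String()`). Running a suspicious input through those functions directly, with no Zod involved, is a quick way to predict what a coerced schema will see. `previewCoercion` is a hypothetical helper written for this illustration.

```typescript
// Shows what the plain JavaScript conversions produce for a given input.
// No Zod involved: these are the built-in functions the coercion comments above refer to.
function previewCoercion(input: unknown) {
  return {
    asNumber: Number(input),
    asBoolean: Boolean(input),
    asString: String(input),
  };
}

console.log(previewCoercion(''));      // { asNumber: 0, asBoolean: false, asString: '' }
console.log(previewCoercion('false')); // { asNumber: NaN, asBoolean: true, asString: 'false' }
console.log(previewCoercion(null));    // { asNumber: 0, asBoolean: false, asString: 'null' }
console.log(previewCoercion('87%'));   // { asNumber: NaN, asBoolean: true, asString: '87%' }
console.log(previewCoercion([]));      // { asNumber: 0, asBoolean: true, asString: '' }
```

The rows where a missing or malformed value turns into a plausible one (`''`, `null`, and `[]` becoming `0`, or `'false'` becoming `true`) are the cases worth guarding with explicit parsing.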

8. Complete Pipeline Examples with Real AI Responses

Example 1: Sentiment analysis

import { z } from 'zod';
import OpenAI from 'openai';

const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral', 'mixed']),
  confidence: z.number().min(0).max(1),
  key_phrases: z.array(z.object({
    text: z.string(),
    sentiment: z.enum(['positive', 'negative', 'neutral']),
  })),
  summary: z.string().min(10),
});

type SentimentResult = z.infer<typeof SentimentSchema>;

async function analyzeSentiment(text: string): Promise<SentimentResult> {
  const client = new OpenAI();

  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    temperature: 0,
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content: `Analyze text sentiment. Return JSON:
{
  "sentiment": "positive"|"negative"|"neutral"|"mixed",
  "confidence": 0.0 to 1.0,
  "key_phrases": [{"text": "phrase", "sentiment": "positive"|"negative"|"neutral"}],
  "summary": "10+ char explanation"
}`,
      },
      { role: 'user', content: text },
    ],
  });

  const rawContent = response.choices[0].message.content;
  if (!rawContent) throw new Error('Empty AI response');

  // Parse and validate
  const parsed = JSON.parse(rawContent);
  return SentimentSchema.parse(parsed);
}

Example 2: Entity extraction with flexible types

const EntitySchema = z.object({
  entities: z.array(z.object({
    text: z.string(),
    type: z.enum(['person', 'organization', 'location', 'date', 'money', 'other']),
    confidence: z.coerce.number().min(0).max(1), // coerce string → number
    context: z.string().optional(),
  })),
  total_entities: z.coerce.number().int().nonnegative(),
  language: z.string().default('en'),
});

async function extractEntities(text: string) {
  const client = new OpenAI();

  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    temperature: 0,
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content: `Extract named entities from the text. Return JSON with:
- entities: array of {text, type, confidence, context}
- total_entities: count
- language: detected language code`,
      },
      { role: 'user', content: text },
    ],
  });

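  const raw = response.choices[0].message.content;
  if (!raw) throw new Error('Empty AI response');

  // JSON.parse can throw on malformed output, so handle it before validating
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error(`AI returned invalid JSON: ${raw.substring(0, 200)}`);
  }
  const result = EntitySchema.safeParse(parsed);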

  if (!result.success) {
    console.error('Entity extraction validation failed:', result.error.flatten());
    throw new Error('Invalid entity extraction response');
  }

  return result.data;
}

Example 3: Multi-step validation with logging

const ClassificationSchema = z.object({
  category: z.enum(['technical', 'billing', 'general', 'urgent']),
  sub_category: z.string(),
  priority: z.number().int().min(1).max(5),
  auto_response: z.string().optional(),
  requires_human: z.boolean(),
  confidence: z.number().min(0).max(1),
});

type Classification = z.infer<typeof ClassificationSchema>;

interface ClassifyResult {
  data: Classification | null;
  raw_response: string;
  validation_errors: string[];
  json_parse_error: string | null;
}

async function classifyTicket(ticket: string): Promise<ClassifyResult> {
  const client = new OpenAI();

  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    temperature: 0,
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content: `Classify support ticket. Return JSON:
{
  "category": "technical"|"billing"|"general"|"urgent",
  "sub_category": "specific category",
  "priority": 1-5,
  "auto_response": "optional suggested response",
  "requires_human": true/false,
  "confidence": 0.0-1.0
}`,
      },
      { role: 'user', content: ticket },
    ],
  });

  const rawContent = response.choices[0].message.content || '';

  // Step 1: JSON parse
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawContent);
  } catch (err) {
    return {
      data: null,
      raw_response: rawContent,
      validation_errors: [],
      json_parse_error: `JSON parse failed: ${(err as Error).message}`,
    };
  }

  // Step 2: Zod validation
  const result = ClassificationSchema.safeParse(parsed);

  if (result.success) {
    return {
      data: result.data,
      raw_response: rawContent,
      validation_errors: [],
      json_parse_error: null,
    };
  }

  return {
    data: null,
    raw_response: rawContent,
    validation_errors: result.error.issues.map(
      (i) => `${i.path.join('.')}: ${i.message}`
    ),
    json_parse_error: null,
  };
}

// Usage with full observability
const result = await classifyTicket('My payment failed twice and I need help urgently!');

if (result.data) {
  console.log(`Category: ${result.data.category}`);
  console.log(`Priority: ${result.data.priority}`);
  console.log(`Needs human: ${result.data.requires_human}`);
} else {
  console.error('Classification failed');
  if (result.json_parse_error) console.error(result.json_parse_error);
  if (result.validation_errors.length) console.error(result.validation_errors);
  console.error('Raw response:', result.raw_response);
}

9. Async Validation with .parseAsync() and .safeParseAsync()

If your refinements or transforms need to perform async operations (database lookups, API calls), use the async variants.

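// Hypothetical helper, assumed to be implemented elsewhere in your codebase
declare function checkUsernameExists(username: string): Promise<boolean>;

const UserInputSchema = z.object({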
  username: z.string().min(3),
  email: z.string().email(),
}).refine(
  async (data) => {
    // Check if username is taken (async operation)
    const exists = await checkUsernameExists(data.username);
    return !exists;
  },
  { message: 'Username already taken', path: ['username'] }
);

// Must use parseAsync / safeParseAsync
const result = await UserInputSchema.safeParseAsync({
  username: 'alice',
  email: 'alice@example.com',
});

Note: For pure AI response validation, you typically do not need async validation since you are just checking data shapes. Async validation is more relevant when combining AI validation with database checks.

10. Key Takeaways

The validation pipeline is always: raw string -> JSON.parse() -> Zod validate. Handle failures at each step independently.
safeParse() is preferred for AI validation because it lets you inspect errors without try/catch, which is essential for retry logic and logging.
ZodError.issues gives you structured error data with path, code, and message — use these for logging and for feeding errors back to the AI.
Partial validation lets you salvage good fields from mostly-correct AI responses rather than discarding everything.
.transform() normalizes AI output in the validation step — convert formats, rename fields, compute derived values.
z.coerce is a quick way to handle AI models that return numbers as strings, but watch out for edge cases (empty strings, 'false' string).
Log everything — validation failures are your best signal for prompt improvement and model evaluation.

Explain-It Challenge

Your team uses schema.parse() everywhere. A colleague suggests switching to schema.safeParse(). What are the trade-offs? When would you keep parse()?
An AI returns {"score": "87", "label": "good"} but your schema expects score to be a number. Show three different ways to handle this with Zod (coerce, transform, union).
Design a validation logging system that would let you answer: "What percentage of AI responses fail validation, and which fields fail most often?"
