Episode 4 — Generative AI Engineering / 4.6 — Schema Validation with Zod

4.6.a — Introduction to Zod

In one sentence: Zod is a TypeScript-first schema validation library that lets you define the exact shape of data you expect, validate it at runtime, and automatically infer TypeScript types, which makes it well suited to validating unpredictable AI output.

Navigation: ← 4.6 Overview · 4.6.b — Defining Validation Schemas →


1. What Is Zod?

Zod is a schema declaration and validation library for JavaScript and TypeScript. You describe the shape of your data using Zod's API, and then you can validate any value against that schema at runtime.

import { z } from 'zod';

// Define a schema — "I expect an object with these fields"
const UserSchema = z.object({
  name: z.string(),
  age: z.number(),
  email: z.string().email(),
});

// Validate data against the schema
const result = UserSchema.parse({
  name: 'Alice',
  age: 30,
  email: 'alice@example.com',
});
// result is now typed as { name: string; age: number; email: string }

// This throws a ZodError:
UserSchema.parse({
  name: 'Alice',
  age: 'thirty', // wrong type
  email: 'not-an-email', // invalid format
});
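Because parse throws on bad input, callers usually wrap it in try/catch. Here is a minimal sketch of that pattern, assuming the same UserSchema as above. The helper name describeFailure is hypothetical, not part of Zod. It relies on ZodError exposing an issues array whose entries carry a path and a message.

```typescript
import { z, ZodError } from 'zod';

const UserSchema = z.object({
  name: z.string(),
  age: z.number(),
  email: z.string().email(),
});

// Hypothetical helper: returns one "field: message" line per validation issue,
// or an empty array when the input is valid.
function describeFailure(input: unknown): string[] {
  try {
    UserSchema.parse(input);
    return [];
  } catch (err) {
    if (err instanceof ZodError) {
      return err.issues.map((issue) => `${issue.path.join('.')}: ${issue.message}`);
    }
    throw err; // not a validation problem, so let it propagate
  }
}

const problems = describeFailure({
  name: 'Alice',
  age: 'thirty',
  email: 'not-an-email',
});
console.log(problems); // one entry for age, one for email
```

The exact message wording differs between Zod versions, so code that reacts to failures should branch on issue.path or issue.code rather than on message text.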

Key characteristics of Zod:

  • Zero dependencies — the library has no external dependencies
  • TypeScript-first — designed from the ground up for TypeScript, with full type inference
  • Runtime validation — validates data at runtime, not just compile time
  • Composable — schemas can be nested, combined, and extended
  • Small bundle — approximately 13KB minified + gzipped

2. Why Zod for AI Validation?

When you call an LLM API, you get back a string. Even when you ask for JSON, the model might return:

  • Valid JSON with the wrong shape
  • JSON with missing fields
  • JSON with fields of the wrong type
  • Valid JSON wrapped in explanation text
  • Completely invalid JSON
  • An apology instead of data

TypeScript types alone cannot help you. TypeScript types are erased at compile time — they don't exist at runtime. When you receive data from an API (any API, especially an AI API), you need runtime validation.

// TypeScript type — compile-time only, gone at runtime
type AIResponse = {
  sentiment: 'positive' | 'negative' | 'neutral';
  confidence: number;
  summary: string;
};

// This compiles fine but gives NO runtime protection:
const response: AIResponse = JSON.parse(aiOutput);
// If aiOutput parses but has the wrong shape, your app crashes somewhere downstream
// with a confusing error like "Cannot read properties of undefined (reading 'toFixed')"

// Zod schema — exists at RUNTIME, validates actual data
const AIResponseSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  summary: z.string().min(1),
});

// This catches problems immediately with clear error messages:
const validated = AIResponseSchema.parse(JSON.parse(aiOutput));
// On bad input this throws a ZodError, for example (Zod 3 wording):
// "Invalid enum value. Expected 'positive' | 'negative' | 'neutral', received 'somewhat positive'"

The core argument for Zod in AI systems

Traditional API:  Server schema is known → TypeScript types are usually sufficient
AI API:           Output is probabilistic → Runtime validation is ESSENTIAL

Without Zod:
  AI response → JSON.parse() → Hope for the best → Crash at line 247

With Zod:
  AI response → JSON.parse() → schema.safeParse() → Validated data OR clear error
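The "With Zod" pipeline above can be sketched as a single function. This is one possible shape under stated assumptions, not a prescribed pattern: the ParseOutcome type and the parseAIOutput name are hypothetical. The sketch uses safeParse, which returns a result object with a success flag instead of throwing.

```typescript
import { z } from 'zod';

const AIResponseSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  summary: z.string().min(1),
});
type AIResponse = z.infer<typeof AIResponseSchema>;

// Hypothetical result type: either validated data, or the stage that failed.
type ParseOutcome =
  | { ok: true; data: AIResponse }
  | { ok: false; stage: 'json' | 'schema'; errors: string[] };

function parseAIOutput(raw: string): ParseOutcome {
  // Stage 1: is it JSON at all? (catches apologies and other prose)
  let candidate: unknown;
  try {
    candidate = JSON.parse(raw);
  } catch (err) {
    return { ok: false, stage: 'json', errors: [String(err)] };
  }

  // Stage 2: does the JSON have the expected shape?
  const result = AIResponseSchema.safeParse(candidate);
  if (!result.success) {
    return {
      ok: false,
      stage: 'schema',
      errors: result.error.issues.map((i) => `${i.path.join('.')}: ${i.message}`),
    };
  }
  return { ok: true, data: result.data };
}

const good = parseAIOutput('{"sentiment":"positive","confidence":0.9,"summary":"Great"}');
const apology = parseAIOutput("I'm sorry, I can't help with that.");
const wrongShape = parseAIOutput('{"sentiment":"somewhat positive","confidence":0.9,"summary":"x"}');
console.log(good.ok, apology.ok, wrongShape.ok);
```

Reporting the failing stage separately lets the caller decide what to do next, for example re-prompting the model with the schema errors when the JSON parsed but the shape was wrong.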

3. Installation

npm install zod

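Zod 3 requires TypeScript 4.5+ with strict mode enabled in tsconfig.json (newer major versions may require a newer compiler, so check the release notes for the version you install). It also works from plain JavaScript, where you get runtime validation but no inferred types.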

// ES modules (TypeScript or modern JavaScript)
import { z } from 'zod';

// CommonJS (Node.js without ESM)
const { z } = require('zod');

Zod has no peer dependencies, no native modules, and no build step. It works in Node.js, browsers, Deno, Bun, and edge runtimes.


4. Basic Schemas: Primitive Types

Every Zod schema starts with a primitive type validator. These are the building blocks.

String

const stringSchema = z.string();

stringSchema.parse('hello');     // ✓ returns 'hello'
stringSchema.parse(42);          // ✗ throws ZodError
stringSchema.parse(undefined);   // ✗ throws ZodError
stringSchema.parse(null);        // ✗ throws ZodError
stringSchema.parse('');          // ✓ returns '' (empty string is still a string)

Number

const numberSchema = z.number();

numberSchema.parse(42);          // ✓ returns 42
numberSchema.parse(3.14);        // ✓ returns 3.14
numberSchema.parse('42');        // ✗ throws ZodError (string, not number)
numberSchema.parse(NaN);         // ✗ throws ZodError (z.number() rejects NaN; use z.nan() to match it)
numberSchema.parse(Infinity);    // ✓ returns Infinity in Zod 3 (add .finite() to reject); Zod 4 rejects it by default
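
Numbers coming from a model usually need tighter bounds than "any number". A small sketch, assuming a confidence score in the range 0 to 1 and a non-negative integer count (both names are illustrative), using Zod's built-in .finite(), .int(), .min(), .max(), and .nonnegative() checks:

```typescript
import { z } from 'zod';

// Illustrative schemas; the variable names are assumptions for this example.
const confidence = z.number().finite().min(0).max(1);
const itemCount = z.number().int().nonnegative();

console.log(confidence.safeParse(0.5).success);       // true
console.log(confidence.safeParse(1.5).success);       // false (above max)
console.log(confidence.safeParse(Infinity).success);  // false (not finite)

console.log(itemCount.safeParse(3).success);          // true
console.log(itemCount.safeParse(3.5).success);        // false (not an integer)
console.log(itemCount.safeParse(-1).success);         // false (negative)
```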

Boolean

const boolSchema = z.boolean();

boolSchema.parse(true);          // ✓ returns true
boolSchema.parse(false);         // ✓ returns false
boolSchema.parse('true');        // ✗ throws ZodError (string, not boolean)
boolSchema.parse(1);             // ✗ throws ZodError (number, not boolean)

Null and Undefined

const nullSchema = z.null();
nullSchema.parse(null);          // ✓

const undefinedSchema = z.undefined();
undefinedSchema.parse(undefined); // ✓
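
In practice, null and undefined mostly show up as modifiers on other schemas rather than on their own. The sketch below assumes a hypothetical AuthorSchema and uses .optional() (the key may be missing) and .nullable() (the key must be present but may be null). Models often emit null for "unknown", so the distinction matters.

```typescript
import { z } from 'zod';

// Hypothetical schema for illustration.
const AuthorSchema = z.object({
  name: z.string(),
  nickname: z.string().optional(),     // may be absent or undefined, but not null
  affiliation: z.string().nullable(),  // must be present; value may be null
});

const withNull = AuthorSchema.safeParse({ name: 'Ada', affiliation: null });
const missingKey = AuthorSchema.safeParse({ name: 'Ada' });
const nullNickname = AuthorSchema.safeParse({ name: 'Ada', nickname: null, affiliation: 'X' });

console.log(withNull.success);      // true
console.log(missingKey.success);    // false: affiliation is required
console.log(nullNickname.success);  // false: optional does not accept null
```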

Literal

const literalSchema = z.literal('positive');

literalSchema.parse('positive');  // ✓
literalSchema.parse('negative');  // ✗ throws ZodError

// Works with numbers and booleans too
const fortyTwo = z.literal(42);
const trueLiteral = z.literal(true);
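
A single literal is rarely enough for AI output; usually a field must be one of a few allowed labels. As a sketch, the same constraint can be written either as a union of literals or with z.enum, which is the shorter form for string values. The schema names here are illustrative.

```typescript
import { z } from 'zod';

// Two equivalent ways to say "one of these three strings".
const SentimentUnion = z.union([
  z.literal('positive'),
  z.literal('negative'),
  z.literal('neutral'),
]);
const SentimentEnum = z.enum(['positive', 'negative', 'neutral']);

// The inferred type is the union 'positive' | 'negative' | 'neutral'.
type Sentiment = z.infer<typeof SentimentEnum>;
const label: Sentiment = SentimentEnum.parse('neutral');

console.log(label);                                     // 'neutral'
console.log(SentimentUnion.safeParse('mixed').success); // false
console.log(SentimentEnum.safeParse('mixed').success);  // false
console.log(SentimentEnum.options.length);              // 3
```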

5. Compound Types: Arrays and Objects

z.array()

const stringArray = z.array(z.string());

stringArray.parse(['a', 'b', 'c']);     // ✓ returns ['a', 'b', 'c']
stringArray.parse([]);                   // ✓ returns []
stringArray.parse(['a', 1, 'c']);        // ✗ element at index 1 is not a string
stringArray.parse('not an array');       // ✗ not an array

// Array of numbers
const numberArray = z.array(z.number());
numberArray.parse([1, 2, 3]);           // ✓

// Array with constraints
const tags = z.array(z.string()).min(1).max(10);
tags.parse([]);                          // ✗ too few
tags.parse(['a', 'b', 'c']);            // ✓

z.object()

This is the workhorse for AI response validation.

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
  isActive: z.boolean(),
});

PersonSchema.parse({
  name: 'Alice',
  age: 30,
  isActive: true,
});
// ✓ returns { name: 'Alice', age: 30, isActive: true }

PersonSchema.parse({
  name: 'Alice',
  age: 30,
  // missing isActive
});
// ✗ ZodError: Required at "isActive"

PersonSchema.parse({
  name: 'Alice',
  age: 30,
  isActive: true,
  extra: 'field', // extra property
});
// ✓ returns { name: 'Alice', age: 30, isActive: true }
// By default, extra fields are STRIPPED (not included in output)
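
Stripping is only the default. As a sketch, assuming the Zod 3 method names .passthrough() and .strict(), the same PersonSchema can instead keep unknown keys or reject them outright. Rejecting can be useful when an unexpected key suggests the model misunderstood the requested format.

```typescript
import { z } from 'zod';

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
  isActive: z.boolean(),
});

const input = { name: 'Alice', age: 30, isActive: true, extra: 'field' };

const stripped = PersonSchema.parse(input);                    // default: drop unknown keys
const kept = PersonSchema.passthrough().parse(input);          // keep unknown keys
const strictResult = PersonSchema.strict().safeParse(input);   // fail on unknown keys

console.log(Object.keys(stripped));   // ['name', 'age', 'isActive']
console.log(Object.keys(kept));       // includes 'extra'
console.log(strictResult.success);    // false
```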

Nested objects

const AIAnalysisSchema = z.object({
  sentiment: z.object({
    label: z.string(),
    score: z.number(),
  }),
  entities: z.array(z.object({
    text: z.string(),
    type: z.string(),
    confidence: z.number(),
  })),
  summary: z.string(),
});

// This mirrors a realistic AI response structure
const validated = AIAnalysisSchema.parse({
  sentiment: { label: 'positive', score: 0.92 },
  entities: [
    { text: 'OpenAI', type: 'ORGANIZATION', confidence: 0.98 },
    { text: 'GPT-4', type: 'PRODUCT', confidence: 0.95 },
  ],
  summary: 'The article discusses recent AI developments.',
});
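
With nested schemas, the useful part of a failure is the path to the bad value. The following sketch assumes the same AIAnalysisSchema and a hypothetical helper, failingPaths, that joins each issue's path with dots.

```typescript
import { z } from 'zod';

const AIAnalysisSchema = z.object({
  sentiment: z.object({
    label: z.string(),
    score: z.number(),
  }),
  entities: z.array(z.object({
    text: z.string(),
    type: z.string(),
    confidence: z.number(),
  })),
  summary: z.string(),
});

// Hypothetical helper: dotted paths of every failing field.
function failingPaths(input: unknown): string[] {
  const result = AIAnalysisSchema.safeParse(input);
  return result.success
    ? []
    : result.error.issues.map((issue) => issue.path.join('.'));
}

const paths = failingPaths({
  sentiment: { label: 'positive' }, // score is missing
  entities: [
    { text: 'OpenAI', type: 'ORGANIZATION', confidence: 0.98 },
    { text: 'GPT-4', type: 'PRODUCT', confidence: 'high' }, // wrong type
  ],
  summary: 'The article discusses recent AI developments.',
});
console.log(paths); // contains 'sentiment.score' and 'entities.1.confidence'
```

Paths like entities.1.confidence are specific enough to log, or to feed back to the model in a follow-up prompt.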

6. Type Inference with z.infer

One of Zod's most powerful features: you define the schema once, and TypeScript infers the type automatically. No need to maintain a separate type definition.

import { z } from 'zod';

const ProductReviewSchema = z.object({
  product: z.string(),
  rating: z.number().min(1).max(5),
  pros: z.array(z.string()),
  cons: z.array(z.string()),
  recommendation: z.boolean(),
});

// Automatically infer the TypeScript type from the schema
type ProductReview = z.infer<typeof ProductReviewSchema>;

// ProductReview is exactly:
// {
//   product: string;
//   rating: number;
//   pros: string[];
//   cons: string[];
//   recommendation: boolean;
// }

// Now you can use the type throughout your code
function displayReview(review: ProductReview) {
  console.log(`${review.product}: ${review.rating}/5`);
  console.log(`Pros: ${review.pros.join(', ')}`);
  console.log(`Cons: ${review.cons.join(', ')}`);
  console.log(`Recommended: ${review.recommendation ? 'Yes' : 'No'}`);
}

// Validate AI output and get full type safety
const raw = JSON.parse(aiResponse);
const review = ProductReviewSchema.parse(raw);
displayReview(review); // Full IntelliSense, full type safety

Why this matters

Without Zod, you maintain two things that can drift apart:

// Without Zod — TWO sources of truth
type AIResponse = {
  sentiment: string;
  confidence: number;
};

function validateAIResponse(data: unknown): AIResponse {
  if (typeof data !== 'object' || data === null) throw new Error('Not an object');
  if (typeof (data as any).sentiment !== 'string') throw new Error('Bad sentiment');
  if (typeof (data as any).confidence !== 'number') throw new Error('Bad confidence');
  return data as AIResponse;
}
// Type and validation logic can drift apart — bugs happen

// With Zod — ONE source of truth
const AIResponseSchema = z.object({
  sentiment: z.string(),
  confidence: z.number(),
});
type AIResponse = z.infer<typeof AIResponseSchema>;
// Type is ALWAYS in sync with validation

7. Comparison with Alternatives

Zod vs Joi

Joi:
  - Mature, battle-tested (from the Hapi ecosystem)
  - No native TypeScript type inference
  - Larger bundle size (~150KB)
  - Requires separate TypeScript type definitions

Zod:
  - TypeScript-first with automatic type inference
  - Smaller bundle (~13KB gzipped)
  - Single source of truth for types AND validation
  - Better developer experience with modern TypeScript

// Joi: requires a separate type
import Joi from 'joi';
const joiSchema = Joi.object({
  name: Joi.string().required(),
  age: Joi.number().required(),
});
// You must ALSO define:
interface User { name: string; age: number; }

// Zod — type comes free
import { z } from 'zod';
const zodSchema = z.object({
  name: z.string(),
  age: z.number(),
});
type User = z.infer<typeof zodSchema>; // Automatic!

Zod vs Yup

Yup:
  - Popular in the React/form validation space
  - Has TypeScript support but inference is weaker
  - Designed primarily for form validation
  - Casts/coerces values by default (can hide bugs)

Zod:
  - Stricter by default (no silent coercion unless you opt in)
  - Better TypeScript inference
  - Designed for general-purpose validation (API, AI, forms)
  - Immutable API (every method returns a new schema)
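
To illustrate "no silent coercion unless you opt in": a plain z.number() rejects the string '42', while z.coerce.number() converts the input with Number() before validating. This is a sketch only. Coercion has its own surprises, such as the empty string becoming 0, so it is worth opting in deliberately.

```typescript
import { z } from 'zod';

const strictNumber = z.number();
const coercedNumber = z.coerce.number(); // opt-in: runs Number(input) first

console.log(strictNumber.safeParse('42').success); // false: a string is not a number
console.log(coercedNumber.parse('42'));            // 42
console.log(coercedNumber.parse(''));              // 0, because Number('') is 0
```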

Zod vs AJV (JSON Schema)

AJV:
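  - Among the fastest JSON Schema validators
  - Uses JSON Schema standard (shareable across languages)
  - Limited TypeScript type inference for JSON Schema (the JSONSchemaType helper checks a schema against a hand-written type)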
  - Schema is a JSON object, not code — harder to compose

Zod:
  - Schema is code — composable, reusable, refactorable
  - Full TypeScript type inference
  - Not JSON Schema standard (but converters exist: zod-to-json-schema)
  - Slightly slower than AJV for raw validation speed

When to use what

Use Case                    Best Choice       Why
AI output validation        Zod               Type inference, composable schemas, great DX
Form validation (React)     Zod or Yup        Both integrate well with React Hook Form
API input validation        Zod               Single source of truth for types + validation
Cross-language schemas      AJV/JSON Schema   Standard format shared across services
Legacy Node.js projects     Joi               Mature, well-supported in Hapi ecosystem
Performance-critical paths  AJV               Fastest raw validation speed

8. Your First AI Validation Example

Here's a complete example that ties everything together — calling an AI API and validating the response with Zod:

import { z } from 'zod';
import OpenAI from 'openai';

// Step 1: Define the schema for expected AI output
const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
});

// Step 2: Infer the TypeScript type
type SentimentAnalysis = z.infer<typeof SentimentSchema>;

// Step 3: Call the AI and validate
async function analyzeSentiment(text: string): Promise<SentimentAnalysis> {
  const client = new OpenAI();

  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    temperature: 0,
    messages: [
      {
        role: 'system',
        content: `Analyze the sentiment of the given text.
Respond with JSON only, no other text.
Schema: { "sentiment": "positive"|"negative"|"neutral", "confidence": 0-1, "reasoning": "string" }`,
      },
      { role: 'user', content: text },
    ],
  });

  // Optional chaining guards against an empty choices array
  const rawOutput = response.choices[0]?.message?.content;
  if (!rawOutput) {
    throw new Error('Empty response from AI');
  }

  // Step 4: Parse JSON
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawOutput);
  } catch {
    throw new Error(`AI returned invalid JSON: ${rawOutput}`);
  }

  // Step 5: Validate with Zod
  const validated = SentimentSchema.parse(parsed);
  // validated is fully typed as SentimentAnalysis
  return validated;
}

// Usage
const result = await analyzeSentiment('I love this product!');
console.log(result.sentiment);   // e.g. 'positive' (typed as 'positive' | 'negative' | 'neutral')
console.log(result.confidence);  // e.g. 0.95 (typed as number)
console.log(result.reasoning);   // e.g. 'The text expresses...' (typed as string)
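
One failure mode from section 2, valid JSON wrapped in explanation text, would make Step 4 above throw even though usable data is present. The sketch below shows one way to handle it, assuming the model wraps its JSON either in a Markdown code fence or in surrounding prose. The extractJson and validateSentiment helpers are hypothetical, and the extraction is a heuristic, not a full parser.

```typescript
import { z } from 'zod';

const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
});
type SentimentAnalysis = z.infer<typeof SentimentSchema>;

// Three backticks, built indirectly so this example can itself sit in a fence.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE, 'i');

// Hypothetical helper: best-effort extraction of the JSON portion of a reply.
function extractJson(raw: string): string {
  const trimmed = raw.trim();
  // Case 1: a fenced block, optionally tagged "json"
  const fenced = trimmed.match(FENCED_BLOCK);
  if (fenced && fenced[1] !== undefined) return fenced[1].trim();
  // Case 2: prose around an object; take from the first "{" to the last "}"
  const start = trimmed.indexOf('{');
  const end = trimmed.lastIndexOf('}');
  if (start !== -1 && end > start) return trimmed.slice(start, end + 1);
  return trimmed;
}

// Hypothetical helper: returns validated data, or null if anything fails.
function validateSentiment(raw: string): SentimentAnalysis | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(extractJson(raw));
  } catch {
    return null;
  }
  const result = SentimentSchema.safeParse(parsed);
  return result.success ? result.data : null;
}

const body = '{"sentiment":"positive","confidence":0.95,"reasoning":"Enthusiastic."}';
const fencedReply = 'Here is the analysis:\n' + FENCE + 'json\n' + body + '\n' + FENCE;
const chattyReply = 'Sure! ' + body + ' Hope that helps.';

console.log(validateSentiment(fencedReply)?.sentiment);  // 'positive'
console.log(validateSentiment(chattyReply)?.confidence); // 0.95
console.log(validateSentiment('Sorry, I cannot do that.')); // null
```

Extraction only recovers the JSON text; the Zod schema still decides whether the result is acceptable, so a lenient extractor does not weaken validation.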

9. Key Takeaways

  1. Zod is TypeScript-first — it gives you runtime validation AND compile-time types from a single schema definition.
  2. AI output needs runtime validation — TypeScript types are erased at runtime, so they cannot protect you from malformed AI responses.
  3. z.infer eliminates type duplication — define the schema once, infer the type automatically.
  4. Basic building blocks — z.string(), z.number(), z.boolean(), z.array(), z.object() cover most AI response shapes.
  5. Zod vs alternatives — Zod wins for AI validation because of TypeScript inference, small bundle, and composable API. AJV wins for raw speed, Joi for legacy projects.

Explain-It Challenge

  1. A colleague says "We already have TypeScript types for our AI responses, so we don't need Zod." Explain why they are wrong.
  2. Show how z.infer<typeof schema> eliminates the problem of type definitions drifting out of sync with validation logic.
  3. Write a Zod schema for an AI-generated recipe response: name (string), servings (number, 1-20), ingredients (array of objects with name and quantity), instructions (array of strings, at least 1 step).
