Episode 4 — Generative AI Engineering / 4.6 — Schema Validation with Zod

4.6.b — Defining Validation Schemas

In one sentence: The power of Zod lies in building complex, nested schemas that precisely describe the shape of AI output — including enums for restricted values, unions for variant responses, optional fields with defaults, and custom refinements for business rules.

Navigation: ← 4.6.a Introduction to Zod · 4.6.c — Verifying AI Responses →

1. Why Complex Schemas Matter for AI

AI responses are rarely flat. A realistic AI analysis might return:

{
  "analysis": {
    "sentiment": "positive",
    "topics": ["pricing", "customer service"],
    "entities": [
      { "name": "Acme Corp", "type": "company", "mentioned_count": 3 }
    ]
  },
  "metadata": {
    "model_confidence": 0.87,
    "processing_time_ms": 1200
  },
  "recommendations": [
    {
      "action": "follow_up",
      "priority": "high",
      "reason": "Customer expressed purchase intent"
    }
  ]
}

You need schemas that can describe and validate every level of this structure. This lesson covers the main tools Zod gives you for building them: nested objects, arrays, enums, unions, optional fields, refinements, and schema composition.
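As a preview, the sample response above could be described roughly as follows. This is a minimal sketch assuming Zod v3. The name `AnalysisResponseSchema` and the enum values are illustrative guesses based on that single sample, not a fixed contract.

```typescript
import { z } from 'zod';

// Hypothetical schema for the sample response. The enum values are
// assumptions and would need to match what your prompt actually asks for.
const AnalysisResponseSchema = z.object({
  analysis: z.object({
    sentiment: z.enum(['positive', 'negative', 'neutral']),
    topics: z.array(z.string()),
    entities: z.array(
      z.object({
        name: z.string(),
        type: z.string(),
        mentioned_count: z.number().int().nonnegative(),
      })
    ),
  }),
  metadata: z.object({
    model_confidence: z.number().min(0).max(1),
    processing_time_ms: z.number().nonnegative(),
  }),
  recommendations: z.array(
    z.object({
      action: z.string(),
      priority: z.enum(['low', 'medium', 'high']),
      reason: z.string(),
    })
  ),
});

const sample = {
  analysis: {
    sentiment: 'positive',
    topics: ['pricing', 'customer service'],
    entities: [{ name: 'Acme Corp', type: 'company', mentioned_count: 3 }],
  },
  metadata: { model_confidence: 0.87, processing_time_ms: 1200 },
  recommendations: [
    {
      action: 'follow_up',
      priority: 'high',
      reason: 'Customer expressed purchase intent',
    },
  ],
};

// safeParse returns a result object instead of throwing.
const checked = AnalysisResponseSchema.safeParse(sample);

// Same payload, but with an out-of-range confidence value.
const broken = AnalysisResponseSchema.safeParse({
  ...sample,
  metadata: { model_confidence: 1.7, processing_time_ms: 1200 },
});
```

Each piece used here (nested objects, arrays, enums, number ranges) is covered in the sections that follow.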

2. z.object() with Nested Objects

Basic nesting

import { z } from 'zod';

const AddressSchema = z.object({
  street: z.string(),
  city: z.string(),
  state: z.string(),
  zip: z.string(),
});

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
  address: AddressSchema, // Nested object schema
});

type Person = z.infer<typeof PersonSchema>;
// {
//   name: string;
//   age: number;
//   address: { street: string; city: string; state: string; zip: string };
// }

Deep nesting for AI responses

const AIReportSchema = z.object({
  summary: z.string(),
  analysis: z.object({
    sentiment: z.object({
      overall: z.string(),
      score: z.number(),
      breakdown: z.object({
        positive_signals: z.array(z.string()),
        negative_signals: z.array(z.string()),
      }),
    }),
    risk_level: z.string(),
  }),
  recommendations: z.array(z.object({
    title: z.string(),
    description: z.string(),
    priority: z.number(),
  })),
});

type AIReport = z.infer<typeof AIReportSchema>;

Extracting and reusing sub-schemas

// Define sub-schemas separately for reuse
const SentimentBreakdown = z.object({
  positive_signals: z.array(z.string()),
  negative_signals: z.array(z.string()),
});

const SentimentAnalysis = z.object({
  overall: z.string(),
  score: z.number(),
  breakdown: SentimentBreakdown,
});

const Recommendation = z.object({
  title: z.string(),
  description: z.string(),
  priority: z.number(),
});

// Compose them
const AIReportSchema = z.object({
  summary: z.string(),
  analysis: z.object({
    sentiment: SentimentAnalysis,
    risk_level: z.string(),
  }),
  recommendations: z.array(Recommendation),
});

3. z.array() with Element Schemas

Basic arrays

const TagsSchema = z.array(z.string());
// Validates: ["ai", "machine-learning", "zod"]

const ScoresSchema = z.array(z.number());
// Validates: [0.95, 0.87, 0.72]

Array constraints

// At least 1 item, at most 10
const TopicsSchema = z.array(z.string()).min(1).max(10);

TopicsSchema.parse([]);                    // ✗ Array must contain at least 1 element(s)
TopicsSchema.parse(['ai']);                // ✓
TopicsSchema.parse(new Array(11).fill('x')); // ✗ Array must contain at most 10 element(s)

// Exact length
const CoordinatesSchema = z.array(z.number()).length(2);
CoordinatesSchema.parse([40.7, -74.0]); // ✓
CoordinatesSchema.parse([40.7]);         // ✗

// Non-empty shorthand
const NonEmptySchema = z.array(z.string()).nonempty();
NonEmptySchema.parse([]);      // ✗
NonEmptySchema.parse(['a']);   // ✓

Arrays of objects (very common for AI output)

const EntitySchema = z.object({
  text: z.string(),
  type: z.enum(['person', 'organization', 'location', 'date', 'other']),
  confidence: z.number().min(0).max(1),
  start_index: z.number().int().nonnegative(),
  end_index: z.number().int().nonnegative(),
});

const EntitiesSchema = z.array(EntitySchema);

type Entity = z.infer<typeof EntitySchema>;
type Entities = z.infer<typeof EntitiesSchema>;
// Entity[] — fully typed

EntitiesSchema.parse([
  { text: 'OpenAI', type: 'organization', confidence: 0.98, start_index: 0, end_index: 6 },
  { text: 'San Francisco', type: 'location', confidence: 0.91, start_index: 15, end_index: 28 },
]);
// ✓ Valid
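When one element of an array is invalid, the issue path includes the array index, which helps when logging or building a retry prompt. The sketch below assumes Zod v3 and uses a trimmed-down version of the entity schema. The variable names `outcome` and `failedPaths` are illustrative.

```typescript
import { z } from 'zod';

const EntitySchema = z.object({
  text: z.string(),
  type: z.enum(['person', 'organization', 'location', 'date', 'other']),
  confidence: z.number().min(0).max(1),
});

const EntitiesSchema = z.array(EntitySchema);

// The second element has an out-of-range confidence.
const outcome = EntitiesSchema.safeParse([
  { text: 'OpenAI', type: 'organization', confidence: 0.98 },
  { text: 'Paris', type: 'location', confidence: 1.4 },
]);

// Turn each issue path (for example [1, 'confidence']) into "1.confidence".
const failedPaths: string[] = outcome.success
  ? []
  : outcome.error.issues.map((issue) => issue.path.join('.'));

console.log(failedPaths); // [ '1.confidence' ]
```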

4. z.enum() for Restricted Values

AI models often return categorical values. Use z.enum() to ensure the model returned one of the expected values.

Basic enum

const SentimentSchema = z.enum(['positive', 'negative', 'neutral']);

SentimentSchema.parse('positive');           // ✓
SentimentSchema.parse('somewhat positive');  // ✗ Invalid enum value

type Sentiment = z.infer<typeof SentimentSchema>;
// 'positive' | 'negative' | 'neutral'

Enum in object

const ClassificationSchema = z.object({
  category: z.enum(['bug', 'feature', 'question', 'documentation']),
  severity: z.enum(['low', 'medium', 'high', 'critical']),
  confidence: z.number(),
});

// AI returns:
ClassificationSchema.parse({
  category: 'bug',
  severity: 'high',
  confidence: 0.92,
});
// ✓

ClassificationSchema.parse({
  category: 'enhancement', // not in the enum
  severity: 'high',
  confidence: 0.92,
});
// ✗ Invalid enum value. Expected 'bug' | 'feature' | 'question' | 'documentation', received 'enhancement'

Extracting enum values

const PrioritySchema = z.enum(['low', 'medium', 'high', 'critical']);

// Get the array of valid values
PrioritySchema.options; // ['low', 'medium', 'high', 'critical']

// Useful for prompts — tell the AI what values are valid
const validValues = PrioritySchema.options.join(', ');
const prompt = `Classify the priority. Must be one of: ${validValues}`;

z.nativeEnum() for TypeScript enums

enum Status {
  Active = 'active',
  Inactive = 'inactive',
  Pending = 'pending',
}

const StatusSchema = z.nativeEnum(Status);
StatusSchema.parse('active');   // ✓
StatusSchema.parse('deleted');  // ✗

5. z.union() and z.discriminatedUnion()

AI responses sometimes have variant structures — the shape of the response depends on the type of result.

z.union() — either shape A or shape B

// AI might return a success or an error
const AIResultSchema = z.union([
  z.object({
    success: z.literal(true),
    data: z.object({
      answer: z.string(),
      confidence: z.number(),
    }),
  }),
  z.object({
    success: z.literal(false),
    error: z.string(),
    suggestion: z.string().optional(),
  }),
]);

type AIResult = z.infer<typeof AIResultSchema>;

// Both are valid:
AIResultSchema.parse({
  success: true,
  data: { answer: 'The capital is Paris', confidence: 0.99 },
});

AIResultSchema.parse({
  success: false,
  error: 'Insufficient context to answer',
  suggestion: 'Please provide more details about which country you mean',
});
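Once a union has been parsed, TypeScript can narrow the inferred type by checking the literal field. The following is a minimal sketch assuming Zod v3. `describeResult` is a hypothetical helper introduced only for illustration.

```typescript
import { z } from 'zod';

const AIResultSchema = z.union([
  z.object({
    success: z.literal(true),
    data: z.object({ answer: z.string(), confidence: z.number() }),
  }),
  z.object({
    success: z.literal(false),
    error: z.string(),
    suggestion: z.string().optional(),
  }),
]);

type AIResult = z.infer<typeof AIResultSchema>;

// Hypothetical helper: turns a validated result into a display string.
function describeResult(result: AIResult): string {
  if (result.success) {
    // Narrowed to the success variant: `data` is available here.
    return `${result.data.answer} (confidence ${result.data.confidence})`;
  }
  // Narrowed to the failure variant: `error` and `suggestion` are available.
  return result.suggestion
    ? `${result.error}. ${result.suggestion}`
    : result.error;
}

const failureMessage = describeResult(
  AIResultSchema.parse({ success: false, error: 'Insufficient context' })
);
const successMessage = describeResult(
  AIResultSchema.parse({
    success: true,
    data: { answer: 'Paris', confidence: 0.99 },
  })
);

console.log(failureMessage); // Insufficient context
console.log(successMessage); // Paris (confidence 0.99)
```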

z.discriminatedUnion() — faster and better error messages

When variants share a discriminator field (a field whose value determines the shape), use discriminatedUnion for better performance and error messages.

const AIResponseSchema = z.discriminatedUnion('type', [
  z.object({
    type: z.literal('text'),
    content: z.string(),
  }),
  z.object({
    type: z.literal('code'),
    language: z.string(),
    code: z.string(),
    explanation: z.string().optional(),
  }),
  z.object({
    type: z.literal('table'),
    headers: z.array(z.string()),
    rows: z.array(z.array(z.string())),
  }),
]);

type AIResponse = z.infer<typeof AIResponseSchema>;

// Zod looks at the "type" field first, then validates against the matching shape
AIResponseSchema.parse({
  type: 'code',
  language: 'javascript',
  code: 'console.log("hello")',
  explanation: 'Prints hello to the console',
}); // ✓

AIResponseSchema.parse({
  type: 'code',
  language: 'javascript',
  // missing "code" field
}); // ✗ Error clearly says: "Required" at path "code"

Why discriminatedUnion is better than union

z.union():
  - Tries each variant in order and returns the first one that matches
  - If none match, the error combines the errors from ALL variants, which is hard to read
  - O(n) where n is the number of variants

z.discriminatedUnion():
  - Reads the discriminator field, jumps to the correct variant
  - Error message is specific to ONE variant — clear and actionable
  - O(1) lookup by discriminator
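The difference in error reporting can be seen by feeding the same invalid input to both kinds of union. This is a sketch assuming Zod v3. The names `PlainUnion` and `TaggedUnion` are illustrative.

```typescript
import { z } from 'zod';

const TextVariant = z.object({
  type: z.literal('text'),
  content: z.string(),
});
const CodeVariant = z.object({
  type: z.literal('code'),
  language: z.string(),
  code: z.string(),
});

const PlainUnion = z.union([TextVariant, CodeVariant]);
const TaggedUnion = z.discriminatedUnion('type', [TextVariant, CodeVariant]);

// Same invalid input for both: a "code" response missing its `code` field.
const badInput = { type: 'code', language: 'javascript' };

const plain = PlainUnion.safeParse(badInput);
const tagged = TaggedUnion.safeParse(badInput);

// The plain union reports one wrapper issue covering all failed variants.
const plainCodes: string[] = plain.success
  ? []
  : plain.error.issues.map((issue) => issue.code);

// The discriminated union reports the missing field on the matched variant.
const taggedPaths: string[] = tagged.success
  ? []
  : tagged.error.issues.map((issue) => issue.path.join('.'));

console.log(plainCodes);  // [ 'invalid_union' ]
console.log(taggedPaths); // [ 'code' ]
```

With the plain union you would have to dig into the nested per-variant errors to find the missing field, while the discriminated union points at it directly.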

6. Optional Fields and Defaults

AI models sometimes omit fields. You need to handle this gracefully.

z.optional()

const ReviewSchema = z.object({
  title: z.string(),
  body: z.string(),
  rating: z.number(),
  tags: z.array(z.string()).optional(),     // might be missing
  author_note: z.string().optional(),       // might be missing
});

type Review = z.infer<typeof ReviewSchema>;
// {
//   title: string;
//   body: string;
//   rating: number;
//   tags?: string[] | undefined;
//   author_note?: string | undefined;
// }

ReviewSchema.parse({
  title: 'Great product',
  body: 'Really enjoyed using it.',
  rating: 5,
  // tags and author_note are omitted — that's fine
}); // ✓

.default() — provide a fallback value

const AnalysisSchema = z.object({
  text: z.string(),
  language: z.string().default('en'),            // defaults to 'en' if missing
  confidence: z.number().default(0),              // defaults to 0 if missing
  tags: z.array(z.string()).default([]),          // defaults to [] if missing
  processed: z.boolean().default(false),          // defaults to false if missing
});

type Analysis = z.infer<typeof AnalysisSchema>;

const result = AnalysisSchema.parse({
  text: 'Hello world',
  // everything else is missing
});

console.log(result);
// {
//   text: 'Hello world',
//   language: 'en',
//   confidence: 0,
//   tags: [],
//   processed: false
// }
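Defaults mean a schema has two different types: what may be supplied and what `parse()` returns. The sketch below, assuming Zod v3, uses `z.input` and `z.infer` to show the difference. The type name `AnalysisInput` is illustrative.

```typescript
import { z } from 'zod';

const AnalysisSchema = z.object({
  text: z.string(),
  language: z.string().default('en'),
  tags: z.array(z.string()).default([]),
});

// What the model (or a caller) may supply: defaulted fields are optional.
type AnalysisInput = z.input<typeof AnalysisSchema>;
// What parse() returns: defaulted fields are always present.
type Analysis = z.infer<typeof AnalysisSchema>;

const raw: AnalysisInput = { text: 'Hello world' };
const parsed: Analysis = AnalysisSchema.parse(raw);

// No undefined checks are needed on the parsed value.
console.log(parsed.language.toUpperCase(), parsed.tags.length); // EN 0
```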

.nullable() — allow null

const ResponseSchema = z.object({
  answer: z.string(),
  source: z.string().nullable(), // can be string or null
});

ResponseSchema.parse({ answer: 'Yes', source: null });      // ✓
ResponseSchema.parse({ answer: 'Yes', source: 'Wikipedia'}); // ✓
ResponseSchema.parse({ answer: 'Yes' });                     // ✗ source is required (even though nullable)

.nullish() — optional AND nullable

const FlexibleSchema = z.object({
  answer: z.string(),
  source: z.string().nullish(), // can be string, null, or undefined
});

FlexibleSchema.parse({ answer: 'Yes' });                    // ✓ source is undefined
FlexibleSchema.parse({ answer: 'Yes', source: null });      // ✓ source is null
FlexibleSchema.parse({ answer: 'Yes', source: 'Wiki' });    // ✓ source is string

7. String Refinements

AI-generated strings often need to meet specific criteria. Zod provides built-in string validators.

Length constraints

const SummarySchema = z.string().min(10).max(500);
// Summary must be between 10 and 500 characters

SummarySchema.parse('Too short');                    // ✗ String must contain at least 10 character(s)
SummarySchema.parse('This is a perfectly valid summary of the content.'); // ✓

Format validators

const EmailSchema = z.string().email();
EmailSchema.parse('user@example.com');  // ✓
EmailSchema.parse('not-an-email');      // ✗

const UrlSchema = z.string().url();
UrlSchema.parse('https://example.com'); // ✓
UrlSchema.parse('not a url');           // ✗

const UuidSchema = z.string().uuid();
UuidSchema.parse('550e8400-e29b-41d4-a716-446655440000'); // ✓

const DateTimeSchema = z.string().datetime();
DateTimeSchema.parse('2025-01-15T10:30:00Z'); // ✓

Regex patterns

// AI might return a version string
const VersionSchema = z.string().regex(/^\d+\.\d+\.\d+$/, 'Must be semver format (x.y.z)');
VersionSchema.parse('1.2.3');    // ✓
VersionSchema.parse('v1.2.3');   // ✗

// AI might return a hex color
const HexColorSchema = z.string().regex(/^#[0-9a-fA-F]{6}$/, 'Must be hex color');
HexColorSchema.parse('#ff0000'); // ✓
HexColorSchema.parse('red');     // ✗

Chaining string refinements

const UsernameSchema = z.string()
  .min(3, 'Username too short')
  .max(20, 'Username too long')
  .regex(/^[a-zA-Z0-9_]+$/, 'Only alphanumeric and underscore');

UsernameSchema.parse('alice_42');  // ✓
UsernameSchema.parse('ab');        // ✗ Username too short
UsernameSchema.parse('hello world'); // ✗ Only alphanumeric and underscore

// .trim() removes whitespace before validation
const CleanStringSchema = z.string().trim().min(1);
CleanStringSchema.parse('  hello  '); // ✓ returns 'hello'
CleanStringSchema.parse('   ');        // ✗ too short after trim

// .toLowerCase() / .toUpperCase() transform the value
const TagSchema = z.string().trim().toLowerCase();
TagSchema.parse('  Machine Learning  '); // returns 'machine learning'

8. Number Refinements

Range and type constraints

// AI confidence scores
const ConfidenceSchema = z.number().min(0).max(1);
ConfidenceSchema.parse(0.95);  // ✓
ConfidenceSchema.parse(1.5);   // ✗ Number must be less than or equal to 1
ConfidenceSchema.parse(-0.1);  // ✗ Number must be greater than or equal to 0

// Integer only
const CountSchema = z.number().int();
CountSchema.parse(42);    // ✓
CountSchema.parse(3.14);  // ✗ Expected integer, received float

// Positive / negative / nonnegative
const PositiveSchema = z.number().positive();    // > 0
const NegativeSchema = z.number().negative();    // < 0
const NonNegSchema = z.number().nonnegative();   // >= 0

// Combine constraints
const AgeSchema = z.number().int().min(0).max(150);
const PriceSchema = z.number().nonnegative().multipleOf(0.01); // valid currency amount

Finite check

const SafeNumberSchema = z.number().finite();
SafeNumberSchema.parse(42);         // ✓
SafeNumberSchema.parse(Infinity);   // ✗
SafeNumberSchema.parse(NaN);        // ✗ (z.number() already rejects NaN, even without .finite())

9. Custom Refinements with .refine()

When built-in validators are not enough, .refine() lets you add arbitrary validation logic.

Basic refine

const EvenNumberSchema = z.number().refine(
  (n) => n % 2 === 0,
  { message: 'Number must be even' }
);

EvenNumberSchema.parse(4);  // ✓
EvenNumberSchema.parse(3);  // ✗ Number must be even

Refine on objects — cross-field validation

const DateRangeSchema = z.object({
  start_date: z.string().datetime(),
  end_date: z.string().datetime(),
}).refine(
  (data) => new Date(data.end_date) > new Date(data.start_date),
  { message: 'end_date must be after start_date', path: ['end_date'] }
);

DateRangeSchema.parse({
  start_date: '2025-01-01T00:00:00Z',
  end_date: '2025-06-01T00:00:00Z',
}); // ✓

DateRangeSchema.parse({
  start_date: '2025-06-01T00:00:00Z',
  end_date: '2025-01-01T00:00:00Z',
}); // ✗ end_date must be after start_date
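The `path` option controls where the cross-field error is reported, which matters when you map issues back to fields. A minimal sketch, assuming Zod v3, that inspects the issue with `safeParse`:

```typescript
import { z } from 'zod';

const DateRangeSchema = z
  .object({
    start_date: z.string().datetime(),
    end_date: z.string().datetime(),
  })
  .refine((data) => new Date(data.end_date) > new Date(data.start_date), {
    message: 'end_date must be after start_date',
    path: ['end_date'],
  });

// Dates are in the wrong order.
const reversed = DateRangeSchema.safeParse({
  start_date: '2025-06-01T00:00:00Z',
  end_date: '2025-01-01T00:00:00Z',
});

// The issue is attached to end_date rather than to the object root.
const firstIssue = reversed.success ? undefined : reversed.error.issues[0];

console.log(firstIssue?.path, firstIssue?.message);
// [ 'end_date' ] end_date must be after start_date
```

Note that in Zod v3 an object-level refinement runs only after the base object has validated, so a malformed date string would be reported on its own, without the cross-field error.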

.superRefine() for multiple custom errors

const PasswordSchema = z.string().superRefine((val, ctx) => {
  if (val.length < 8) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'Password must be at least 8 characters',
    });
  }
  if (!/[A-Z]/.test(val)) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'Password must contain at least one uppercase letter',
    });
  }
  if (!/[0-9]/.test(val)) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'Password must contain at least one number',
    });
  }
});

// Can produce MULTIPLE errors at once
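To see several issues reported in one pass, parse a value that breaks every rule. This sketch assumes Zod v3 and repeats the password schema so it runs on its own. The variable names `attempt` and `messages` are illustrative.

```typescript
import { z } from 'zod';

const PasswordSchema = z.string().superRefine((val, ctx) => {
  if (val.length < 8) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'Password must be at least 8 characters',
    });
  }
  if (!/[A-Z]/.test(val)) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'Password must contain at least one uppercase letter',
    });
  }
  if (!/[0-9]/.test(val)) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'Password must contain at least one number',
    });
  }
});

// 'abc' is too short, has no uppercase letter, and has no digit.
const attempt = PasswordSchema.safeParse('abc');
const messages: string[] = attempt.success
  ? []
  : attempt.error.issues.map((issue) => issue.message);

console.log(messages.length); // 3
```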

10. The compatibility_score Schema Example

Let's build a realistic schema for an AI-powered compatibility scoring system — the kind of thing you might use in a matching platform, recommendation engine, or content analysis tool.

import { z } from 'zod';

// Sub-schemas for reusability
const TraitSchema = z.object({
  name: z.string().min(1),
  score: z.number().min(0).max(100),
  weight: z.number().min(0).max(1).default(1),
  description: z.string().optional(),
});

const CompatibilityDimensionSchema = z.object({
  dimension: z.enum([
    'personality',
    'interests',
    'values',
    'communication_style',
    'goals',
  ]),
  score: z.number().min(0).max(100),
  traits_compared: z.array(TraitSchema).min(1),
  explanation: z.string().min(10),
});

// Main schema
const CompatibilityResultSchema = z.object({
  overall_score: z.number().min(0).max(100),
  compatibility_level: z.enum(['excellent', 'good', 'moderate', 'low', 'poor']),
  dimensions: z.array(CompatibilityDimensionSchema).min(1).max(10),
  strengths: z.array(z.string()).min(1),
  challenges: z.array(z.string()),
  recommendation: z.string().min(20),
  confidence: z.number().min(0).max(1),
  metadata: z.object({
    model_version: z.string().optional(),
    analysis_depth: z.enum(['basic', 'standard', 'deep']).default('standard'),
    timestamp: z.string().datetime().optional(),
  }).default({}),
}).refine(
  (data) => {
    // Business rule: overall_score should roughly match compatibility_level
    // (only the two extremes, 'excellent' and 'poor', are checked here)
    const level = data.compatibility_level;
    const score = data.overall_score;
    if (level === 'excellent' && score < 80) return false;
    if (level === 'poor' && score > 30) return false;
    return true;
  },
  { message: 'overall_score does not match compatibility_level' }
);

// Infer the full type
type CompatibilityResult = z.infer<typeof CompatibilityResultSchema>;

// Example usage: validate AI output
function validateCompatibilityResponse(rawJson: string): CompatibilityResult {
  const parsed = JSON.parse(rawJson);
  return CompatibilityResultSchema.parse(parsed);
}

This schema demonstrates:

Nested objects (metadata inside the main result)
Arrays of typed objects (dimensions, traits)
Enums (compatibility_level, dimension names, analysis_depth)
Number ranges (scores 0-100, confidence 0-1)
String constraints (min length on explanation and recommendation)
Defaults (analysis_depth defaults to 'standard')
Optional fields (model_version, timestamp, description)
Cross-field refinement (score must match level)
Composable sub-schemas (TraitSchema reused inside CompatibilityDimensionSchema)
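The `validateCompatibilityResponse` function above throws on both malformed JSON and schema violations. A non-throwing variant might look like the sketch below, assuming Zod v3. `ScoreSchema` is a deliberately small stand-in for the full schema, and `validateScoreResponse` and `ValidationOutcome` are hypothetical names.

```typescript
import { z } from 'zod';

// Small stand-in for CompatibilityResultSchema to keep the example short.
const ScoreSchema = z.object({
  overall_score: z.number().min(0).max(100),
  compatibility_level: z.enum(['excellent', 'good', 'moderate', 'low', 'poor']),
});

type Score = z.infer<typeof ScoreSchema>;

type ValidationOutcome =
  | { ok: true; value: Score }
  | { ok: false; errors: string[] };

// Hypothetical helper: never throws, always returns a tagged outcome.
function validateScoreResponse(rawJson: string): ValidationOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch {
    // The model returned prose or truncated JSON.
    return { ok: false, errors: ['Response is not valid JSON'] };
  }

  const result = ScoreSchema.safeParse(parsed);
  if (!result.success) {
    // One readable line per issue, prefixed with the field path.
    const errors = result.error.issues.map(
      (issue) => `${issue.path.join('.') || '(root)'}: ${issue.message}`
    );
    return { ok: false, errors };
  }
  return { ok: true, value: result.data };
}

const good = validateScoreResponse(
  '{"overall_score": 85, "compatibility_level": "excellent"}'
);
const notJson = validateScoreResponse('Sure! Here is the JSON you asked for');
const wrongShape = validateScoreResponse(
  '{"overall_score": 140, "compatibility_level": "great"}'
);

console.log(good.ok, notJson.ok, wrongShape.ok); // true false false
```

Returning a tagged outcome keeps the JSON failure and the schema failure on the same code path, so the caller can decide in one place whether to retry.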

11. Schema Composition Techniques

.extend() — add fields to an existing schema

const BaseResponseSchema = z.object({
  id: z.string(),
  created_at: z.string().datetime(),
});

const AIResponseSchema = BaseResponseSchema.extend({
  content: z.string(),
  tokens_used: z.number(),
  model: z.string(),
});

type AIResponse = z.infer<typeof AIResponseSchema>;
// { id: string; created_at: string; content: string; tokens_used: number; model: string }

.merge() — combine two schemas

const MetadataSchema = z.object({
  model: z.string(),
  latency_ms: z.number(),
});

const ContentSchema = z.object({
  text: z.string(),
  confidence: z.number(),
});

const FullResponseSchema = MetadataSchema.merge(ContentSchema);
// { model: string; latency_ms: number; text: string; confidence: number }

.pick() and .omit() — select or exclude fields

const UserSchema = z.object({
  id: z.string(),
  name: z.string(),
  email: z.string().email(),
  password_hash: z.string(),
  created_at: z.string(),
});

// Only pick certain fields (like a DTO)
const PublicUserSchema = UserSchema.pick({ id: true, name: true, email: true });
// { id: string; name: string; email: string }

// Omit sensitive fields
const SafeUserSchema = UserSchema.omit({ password_hash: true });
// { id: string; name: string; email: string; created_at: string }

.partial() and .required()

const ConfigSchema = z.object({
  model: z.string(),
  temperature: z.number(),
  max_tokens: z.number(),
  top_p: z.number(),
});

// Make all fields optional (for partial updates)
const PartialConfigSchema = ConfigSchema.partial();
// { model?: string; temperature?: number; max_tokens?: number; top_p?: number }

// Make specific fields optional
const FlexConfigSchema = ConfigSchema.partial({ temperature: true, top_p: true });
// { model: string; temperature?: number; max_tokens: number; top_p?: number }
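`.required()` is the inverse of `.partial()`: it removes optionality from all fields, or from selected ones when given a mask. A minimal sketch, assuming Zod v3. `DraftConfigSchema` and the placeholder model name are illustrative.

```typescript
import { z } from 'zod';

const DraftConfigSchema = z.object({
  model: z.string().optional(),
  temperature: z.number().optional(),
  max_tokens: z.number().optional(),
});

// Make every field required again.
const CompleteConfigSchema = DraftConfigSchema.required();

// Make only selected fields required.
const ModelRequiredSchema = DraftConfigSchema.required({ model: true });

const draft = { temperature: 0.2 };
const withModel = { model: 'placeholder-model' };

console.log(DraftConfigSchema.safeParse(draft).success);        // true
console.log(CompleteConfigSchema.safeParse(draft).success);     // false
console.log(ModelRequiredSchema.safeParse(draft).success);      // false
console.log(ModelRequiredSchema.safeParse(withModel).success);  // true
```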

12. Key Takeaways

z.object() with nesting handles any depth of AI response structure — break complex schemas into reusable sub-schemas.
z.enum() catches AI responses that return unexpected categorical values — and its .options property is useful for building prompts.
z.discriminatedUnion() is the right choice when AI responses have variant shapes determined by a type field — it gives faster validation and clearer errors than z.union().
Optional fields and defaults make schemas resilient to missing data: use .optional() when absence is acceptable, .default() when you need a fallback value, and .nullable() or .nullish() when the model may return an explicit null.
String and number refinements (.min(), .max(), .email(), .int(), .positive()) catch subtle type-correct-but-value-wrong errors that AI models commonly produce.
Custom .refine() handles business rules that span multiple fields — like ensuring a score matches its label.
Schema composition (.extend(), .merge(), .pick(), .omit(), .partial()) lets you build a library of reusable schemas without duplication.

Explain-It Challenge

An AI is supposed to return a severity level of "low", "medium", or "high" but sometimes returns "moderate". How would you define a schema that catches this? Would you use z.enum() or z.string()?
Build a schema for an AI-generated invoice with: invoice_number (string), items (array of {description, quantity (integer, min 1), unit_price (non-negative number)}), total (number), due_date (datetime string), and status (enum: draft, sent, paid, overdue). Add a refine that checks total equals sum of quantity * unit_price.
Explain when you would use z.discriminatedUnion() vs z.union() vs z.enum(). Give an AI response example for each.
