Episode 4 — Generative AI Engineering / 4.6 — Schema Validation with Zod

Interview Questions: Schema Validation with Zod

Model answers for Zod basics, schema design, AI response validation, error handling, and retry strategies.

How to use this material (instructions)

  1. Read lessons in order: README.md, then 4.6.a through 4.6.e.
  2. Practice out loud — definition → example → pitfall.
  3. Pair with exercises: 4.6-Exercise-Questions.md.
  4. Quick review: 4.6-Quick-Revision.md.

Beginner (Q1–Q4)

Q1. What is Zod and why would you use it for AI applications?

Why interviewers ask: Tests foundational understanding of runtime validation and why TypeScript alone is not sufficient for AI output.

Model answer:

Zod is a TypeScript-first schema declaration and validation library. You define the expected shape of data using Zod's API — z.string(), z.number(), z.object(), etc. — and then validate any unknown value against that schema at runtime. If validation fails, Zod returns structured errors with the exact path and nature of each problem.

For AI applications, Zod is essential because LLM output is fundamentally untrusted. Even with response_format: { type: 'json_object' }, the model can return JSON with wrong types, missing fields, out-of-range values, or unexpected enum values. TypeScript types are erased at compile time and provide zero runtime protection. Zod fills this gap by validating the actual data at the moment you receive it.

The key advantage of Zod over alternatives like Joi or Yup is z.infer<typeof schema> — it automatically derives the TypeScript type from the schema, so you have a single source of truth for both validation logic and type definitions. This eliminates the common bug where a type definition drifts out of sync with the actual validation code.


Q2. Explain the difference between schema.parse() and schema.safeParse().

Why interviewers ask: This is a fundamental API choice that affects error handling strategy throughout the application.

Model answer:

Both methods validate data against a schema, but they differ in how they report failure:

schema.parse(data) returns the validated, typed data on success. On failure, it throws a ZodError. This means you need a try/catch block to handle errors. Use parse() when invalid data should stop execution — for example, in a middleware that should reject bad requests.

schema.safeParse(data) never throws. It returns a discriminated union: { success: true, data: T } on success or { success: false, error: ZodError } on failure. Use safeParse() when you need to inspect or log errors before deciding what to do — this is the standard choice for AI validation because you typically want to examine the errors, decide whether to retry, log the failure, or degrade gracefully.

In AI validation pipelines, safeParse() is almost always preferred because: (1) you need to format errors for retry messages to the model, (2) you want to log specific field failures for monitoring, and (3) not every validation failure should crash the request.


Q3. How do you handle the case where an AI model returns a number as a string (e.g., "82" instead of 82)?

Why interviewers ask: This is the most common AI response type mismatch, and the answer reveals depth of Zod knowledge.

Model answer:

There are three approaches, each with trade-offs:

1. z.coerce.number() — calls Number() on the input before validation. Simplest solution, but has a gotcha: Number('') returns 0, which is probably not what you want. Best for cases where you're confident the AI will return a numeric string.

2. z.preprocess() — runs a custom function before the schema validation. More explicit: z.preprocess((val) => typeof val === 'string' ? Number(val) : val, z.number().min(0).max(100)). Use this when you want to handle the empty-string edge case yourself.

3. z.union() with .transform() — accepts either a number or a string, transforms the string variant: z.union([z.number(), z.string().transform(Number)]). Most explicit and composable. Use this when the input could genuinely be either type and you want each path to have its own validation.

In production AI systems, I typically use z.preprocess() for flexibility, wrapped in a reusable utility like flexibleNumber() that handles strings, percentages ("85%"), and the empty-string edge case.


Q4. What are Zod's error objects and how do you use them?

Why interviewers ask: Demonstrates ability to work with validation error data for logging, retry feedback, and user-facing messages.

Model answer:

When validation fails, Zod produces a ZodError object containing an issues array. Each issue has: path (array of keys showing where the error is, like ['analysis', 'score']), code (the error type: invalid_type, too_small, invalid_enum_value, custom, etc.), message (human-readable description), and type-specific fields like expected and received.

For AI systems, I use these in three ways: (1) Retry feedback — format the errors into a message for the model: "Field 'confidence': Number must be <= 1". Models can self-correct when told exactly what's wrong. (2) Structured logging — log each issue with path, code, and raw response for monitoring dashboards. (3) Aggregation — track which fields fail most often to identify prompt improvement opportunities.

Zod also provides .format() (grouped by path), .flatten() (simplified for forms), and direct .issues access. For AI validation, I use .issues directly because it gives the most control over formatting.


Intermediate (Q5–Q8)

Q5. Walk me through how you validate AI responses in production. What is the full pipeline?

Why interviewers ask: Tests end-to-end system thinking, not just individual API knowledge.

Model answer:

The pipeline has five stages:

Stage 1: Raw extraction. Read response.choices[0].message.content — it is a string or null, so check for null and empty strings before parsing.

Stage 2: JSON parsing. Wrap JSON.parse() in try/catch. If it fails, try JSON extraction — the model often wraps JSON in markdown fences or explanatory text. I use a utility that tries direct parse, then strips code fences, then uses regex to find the first {...} block.

Stage 3: Schema validation. schema.safeParse(parsed) validates the entire structure. If it succeeds, return the typed data. If it fails, collect the error details.

Stage 4: Recovery attempt. Depending on the failure type: try type coercion (string numbers), lenient schema with defaults, or truncation repair (if the response hit max_tokens). Each recovery path logs a warning flag.

Stage 5: Retry or fail. If recovery fails, retry with the validation errors fed back to the model as a new user message. After max retries (typically 3), either return a degraded response or fail with a structured error.

In production, every stage is instrumented: JSON parse success rate, schema validation success rate, retry rate, recovery rate. I alert when validation failure rate exceeds 5% over a 15-minute window — that usually indicates a prompt regression or model version change.


Q6. How do you design schemas for AI responses that have multiple possible shapes (variant responses)?

Why interviewers ask: Tests schema design sophistication — union types are common in real AI systems.

Model answer:

Zod offers two approaches for variant schemas:

z.union([schemaA, schemaB]) tries each variant in order and returns the first match. It's flexible but has two drawbacks: (1) performance is O(n) since it tries every variant, and (2) error messages combine failures from ALL variants, making them confusing.

z.discriminatedUnion('type', [...]) requires a shared discriminator field (like "type") whose literal value determines which variant to validate against. It's O(1) lookup and produces clear, single-variant error messages.

For AI responses, I always use discriminatedUnion when the response has a type indicator field. For example, an AI might return { type: "text", content: "..." } or { type: "code", language: "js", code: "..." }. The discriminator field tells Zod exactly which variant schema to apply.

When there's no natural discriminator, I use z.union() but order the variants from most specific to least specific — Zod returns the first match, so putting the broader schema first would cause it to always match, potentially hiding errors.

I also compose variants using extracted sub-schemas for reusability. For example, a MetadataSchema that appears in all variants gets defined once and included in each variant via .extend().


Q7. How do you handle the cost implications of retry loops with AI validation?

Why interviewers ask: Evaluates production awareness — retries are not free, and cost management is a real concern at scale.

Model answer:

Retry costs grow non-linearly because each retry includes the full conversation history. Attempt 1 sends the system prompt + user message (~700 tokens). Attempt 2 adds the failed response + error feedback (~450 tokens, so ~1,150). Attempt 3 adds another failed response + feedback (~1,600). So 3 attempts consume nearly 5x the prompt tokens of a single call (700 + 1,150 + 1,600 ≈ 3,450 vs 700), not 3x.

I control this with three mechanisms:

1. Token budget per request. I set a maxTotalTokens limit (e.g., 15,000). Each attempt tracks usage.prompt_tokens + usage.completion_tokens. When the budget is exhausted, stop retrying regardless of attempt count.

2. Dollar cap per request. Calculate running cost using the model's per-token pricing. Stop if cumulative cost exceeds a threshold (e.g., $0.05).

3. Diminishing returns analysis. In practice, attempt 1 succeeds ~90% of the time, attempt 2 (with error feedback) succeeds ~98% of remaining failures. Attempt 3 adds <1% marginal improvement. Beyond 3 retries, you're spending money on a fundamentally broken prompt or model edge case — better to fail fast and fix the root cause.

At 50,000 daily calls with a 5% first-attempt failure rate, the retry overhead is ~2,500 extra calls per day. At $0.005/call, that is $12.50/day — acceptable. But if failure rate spikes to 20% (bad prompt change), retry costs quadruple. That is why monitoring validation failure rates with real-time alerts is critical.
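The first two mechanisms can be sketched as a budget-bounded loop. Everything here is illustrative: the attempt outcomes are pre-computed stand-ins for real API calls, and the limits are the example numbers from above.

```typescript
// Each attempt reports its token usage (as usage.prompt_tokens +
// usage.completion_tokens would in a real API response).
type Attempt = { success: boolean; promptTokens: number; completionTokens: number };

function retryWithBudget(
  attempts: Attempt[],          // pre-computed outcomes, for illustration
  maxTotalTokens = 15_000,
  maxAttempts = 3,
): { attemptsUsed: number; tokensUsed: number; success: boolean } {
  let tokensUsed = 0;
  for (let i = 0; i < Math.min(attempts.length, maxAttempts); i++) {
    const cost = attempts[i].promptTokens + attempts[i].completionTokens;
    if (tokensUsed + cost > maxTotalTokens) {
      // Budget exhausted: stop retrying regardless of attempt count.
      return { attemptsUsed: i, tokensUsed, success: false };
    }
    tokensUsed += cost;
    if (attempts[i].success) {
      return { attemptsUsed: i + 1, tokensUsed, success: true };
    }
  }
  return { attemptsUsed: Math.min(attempts.length, maxAttempts), tokensUsed, success: false };
}

// Attempt 2's prompt is larger because it carries the failed response + feedback.
const result = retryWithBudget([
  { success: false, promptTokens: 700, completionTokens: 300 },
  { success: true, promptTokens: 1150, completionTokens: 300 },
]);
```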


Q8. Explain the "never trust AI output without validation" principle. What does it cover?

Why interviewers ask: Tests philosophical understanding and ability to apply a principle across different AI integration patterns.

Model answer:

This principle states that every piece of AI-generated output must pass through runtime validation before it enters your application logic, regardless of how the output was requested. It covers:

Raw completions — obviously untrusted. The model can return anything.

JSON mode (response_format: json_object) — guarantees syntactically valid JSON, but NOT schema compliance. The model can still return wrong types, missing fields, or unexpected values.

Function calling / tool use — the model fills in function arguments, but the arguments are not enforced by a schema at the model level. A temperature argument expected to be 0-1 might come back as 100.

Fine-tuned models — fine-tuning changes what the model generates but doesn't add runtime constraints. Fine-tuned models hallucinate differently, but they still hallucinate.

Strict structured output (OpenAI's strict JSON schema mode) — the closest to guaranteed schema compliance, but even here, value ranges and business logic are not enforced. The model might return { "score": -500 } in a field that should be 0-100.

The only safe stance is: define a Zod schema for every AI output boundary in your system, validate with safeParse(), and have a clear failure path (retry, degrade, or error) for every schema.


Advanced (Q9–Q11)

Q9. Design a validation strategy for a production AI pipeline that processes 100,000+ requests per day.

Why interviewers ask: Tests system design ability — combining validation, monitoring, cost control, and operational excellence.

Model answer:

Architecture:

Request → AI API → JSON Extract → Zod Validate → [Success] → Business Logic
                                       ↓ [Fail]
                                   Recovery Layer (coercion, defaults)
                                       ↓ [Fail]
                                   Retry Loop (max 3, with error feedback)
                                       ↓ [Fail]
                                   Fallback (cached response / human queue / error)

Schema management: Schemas are defined in a shared package, versioned alongside the prompts they validate. When a prompt changes, the schema must be updated in the same PR. CI/CD runs the prompt against 200+ eval examples and verifies validation pass rate exceeds 95%.

Tiered retry policy:

  • Tier 1 (latency-sensitive, e.g., real-time chat): max 1 retry, 500ms timeout per attempt
  • Tier 2 (standard, e.g., batch classification): max 3 retries, exponential backoff
  • Tier 3 (background, e.g., content moderation): max 5 retries, aggressive backoff, dead letter queue for persistent failures

Monitoring:

  • Real-time: validation pass rate, JSON parse rate, retry rate, per-field failure rate (all as time-series metrics)
  • Alerts: pass rate < 95% triggers PagerDuty for the on-call AI engineer
  • Weekly: top 10 failing fields, cost of retries, comparison across model versions

Cost control: Per-request token budget (15K), daily budget per pipeline ($500), automatic circuit breaker that pauses processing if cost anomaly detected.

Logging: Every validation failure logged with: request ID, full raw response, schema version, all Zod issues, recovery outcome. Sampled (1%) success cases logged for eval data generation.


Q10. How would you build a reusable callWithValidation() utility that your entire team can use across different AI features?

Why interviewers ask: Tests ability to build developer-facing abstractions — a utility that's both flexible and safe.

Model answer:

The utility should be generic over the schema type and expose the right hooks:

async function callWithValidation<T>(
  schema: z.ZodSchema<T>,
  systemPrompt: string,
  userMessage: string,
  options?: CallOptions,
): Promise<CallResult<T>>

Key design decisions:

  1. Generic over schema type: z.ZodSchema<T> means the return type is automatically inferred from the schema. The caller gets full type safety without any casting.

  2. Options with sensible defaults — model (gpt-4o), temperature (0), maxRetries (3), maxTokens (15000), useJsonMode (true), extractJsonFromText (true). Everything overridable.

  3. Rich result type — return { success, data, attempts, totalTokens, errors, latencyMs }, not just the data or an exception. The caller needs metadata for logging and monitoring.

  4. Lifecycle callbacks: onRetry(attempt, error) and onSuccess(attempt, data) let the caller plug in metrics, logging, and alerting without the utility needing to know about the monitoring stack.

  5. JSON extraction built-in — the utility should handle markdown fences and surrounding text by default, since this is the most common failure mode.

  6. Error feedback formatting — the utility formats Zod errors into clear retry messages. The caller doesn't need to know about ZodError.issues.

  7. Testable — accept the OpenAI client as a parameter (or via dependency injection) so tests can provide a mock client. The utility itself is just orchestration logic.

  8. Thread-safe / stateless — no shared mutable state. Each call is independent. Metrics are tracked via the callback hooks, not internal state.

The utility becomes the single integration point between your AI layer and your business logic. Every AI call in the codebase goes through it, giving you centralized validation, retry, cost tracking, and monitoring.


Q11. Compare schema validation (Zod) with OpenAI's native structured output (strict mode). When do you still need Zod?

Why interviewers ask: Tests understanding of the latest API features and ability to reason about defense in depth.

Model answer:

OpenAI's strict structured output mode (response_format: { type: "json_schema", json_schema: { strict: true, schema: ... } }) constrains the model's token generation to comply with a JSON Schema at decode time. This is a significant improvement: it guarantees the structure (correct fields, correct types, no extra fields) matches the schema.

However, you still need Zod for:

  1. Value constraints. Strict mode ensures a field is a number, but not that it's between 0 and 1. The model can return confidence: 500. Zod's .min(0).max(1) catches this.

  2. Cross-field validation. JSON Schema (and strict mode) cannot express "end_date must be after start_date." Zod's .refine() can.

  3. Transformations. You often need to normalize AI output — convert percentages to decimals, merge first/last name, parse date strings into Date objects. Strict mode gives you raw JSON; Zod transforms it into application-ready data.

  4. Provider portability. Strict structured output is an OpenAI-specific feature. If you use Claude, Gemini, Llama, or a mixture, you need a provider-agnostic validation layer. Zod works with any JSON source.

  5. Type inference. Even with strict mode, you still need to type the parsed JSON in TypeScript. Zod gives you that type automatically via z.infer. Without Zod, you maintain a JSON Schema (for the API) AND a TypeScript type (for your code) that can drift apart.

  6. Defense in depth. Models can be updated, prompts can be changed, and APIs can have bugs. Even if strict mode is "guaranteed" to produce valid structure, a defensive engineering stance validates anyway. The cost of a Zod safeParse() call is microseconds — negligible compared to an API call. The cost of a bad value reaching your database is hours of debugging.

My recommendation: use strict structured output to get the model as close to correct as possible (reducing retry rates), AND use Zod to validate value constraints, apply transformations, and ensure defense in depth.


Quick-fire

  1. What does Zod validate at? Runtime, not compile time.
  2. z.infer does what? Derives a TypeScript type from a Zod schema.
  3. parse() vs safeParse()? parse() throws; safeParse() returns { success, data/error }.
  4. z.coerce.boolean() on "false"? Returns true (any non-empty string is truthy).
  5. Why retry with error feedback? The model can self-correct: ~98% success on attempt 2 vs ~85% for a blind retry.

← Back to 4.6 — Schema Validation with Zod (README)