
4.6 — Exercise Questions: Schema Validation with Zod

Practice questions for all five subtopics in Section 4.6. Mix of conceptual, coding, and system design tasks.

How to use this material (instructions)

  1. Read lessons in order: README.md, then 4.6.a through 4.6.e.
  2. Answer closed-book first — then compare to the matching lesson.
  3. Code the examples — type them out, do not copy-paste. Run them.
  4. Interview prep: 4.6-Interview-Questions.md.
  5. Quick review: 4.6-Quick-Revision.md.

4.6.a — Introduction to Zod (Q1–Q9)

Q1. What is Zod? Explain in one sentence why it is described as "TypeScript-first."

Q2. Why can't TypeScript types alone protect you from invalid AI responses? Give a concrete example of what goes wrong at runtime.

Q3. Write a Zod schema for a simple object with three fields: name (string), score (number), and active (boolean). Parse a valid object and an invalid object — show what happens in each case.

Q4. Explain what z.infer<typeof schema> does. Why is this better than maintaining a separate TypeScript type definition alongside your schema?

Q5. Compare Zod with Joi in two sentences. When would you choose Joi over Zod?

Q6. Compare Zod with AJV (JSON Schema). Name one scenario where AJV is the better choice and one where Zod is the better choice.

Q7. Install Zod in a new Node.js project and write a script that defines a schema for an AI sentiment response (sentiment, confidence, summary), parses a hard-coded valid object, and prints the result with its inferred type annotation.

Q8. What is the approximate bundle size of Zod (minified + gzipped)? Why does this matter for edge runtimes and serverless functions?

Q9. A teammate says: "We already have response_format: { type: 'json_object' } in our OpenAI call, so we don't need Zod." Is this correct? Explain why or why not.


4.6.b — Defining Validation Schemas (Q10–Q18)

Q10. Write a Zod schema for a nested AI analysis response:

{
  "analysis": {
    "sentiment": "positive",
    "topics": ["pricing", "support"]
  },
  "metadata": {
    "model": "gpt-4o",
    "tokens_used": 450
  }
}

Q11. Use z.enum() to define a schema for a field that only accepts "bug", "feature", "question", or "documentation". What error message does Zod produce if the AI returns "enhancement" instead?

Q12. Explain the difference between z.union() and z.discriminatedUnion(). When should you use each? Which gives better error messages?

Q13. Write a schema for an AI response that might return either a success object { success: true, data: { answer: string } } or an error object { success: false, error: string }. Use z.discriminatedUnion().

Q14. What is the difference between z.optional(), .default(), .nullable(), and .nullish()? Write a one-line example for each.

Q15. Write a schema for a string field that must be: at least 10 characters long, at most 500 characters, and trimmed of whitespace. Chain the appropriate Zod methods.

Q16. Write a schema for a number field representing a confidence score: must be a finite number between 0 and 1 (inclusive). What happens if the AI returns NaN or Infinity?

Q17. Use .refine() to add a cross-field validation that checks end_date is after start_date in an object schema. Show the error when the rule is violated.

Q18. Explain how .extend(), .merge(), .pick(), .omit(), and .partial() work. Write a code example that uses at least three of these on a single base schema.


4.6.c — Verifying AI Responses (Q19–Q27)

Q19. Draw (in text/ASCII) the three-step AI validation pipeline. What happens at each step and what error can occur?

Q20. What is the difference between schema.parse() and schema.safeParse()? Write both versions for the same schema and show the output for a failing input.

Q21. Given a ZodError object, how do you access the list of individual validation issues? Write code that extracts each issue's path, code, and message.

Q22. Write a function formatErrorsForLogging(error: ZodError): string that produces a single-line log entry with all errors separated by semicolons.

Q23. Explain the concept of "partial validation." Write code that validates each field of an AI response independently and returns an object with validFields and invalidFields.

Q24. Use .transform() to create a schema that accepts a date string from the AI and converts it into a JavaScript Date object. Add a refinement that rejects invalid dates.

Q25. Use z.coerce.number() to handle the case where an AI returns "87" instead of 87. What is the gotcha with z.coerce.number() when the input is an empty string?

Q26. Write a "normalized response" schema that accepts confidence as either a number (0-1) or a string (e.g., "95%" or "0.87") and always outputs a number between 0 and 1.

Q27. Explain when you would use z.preprocess() vs .transform() vs z.coerce. Give one use case for each.


4.6.d — Handling Invalid Responses (Q28–Q36)

Q28. List the four categories of AI response failures. Give one real-world example of each.

Q29. Write an extractJSON(text) function that handles these three inputs:

  • '{"score": 85}'
  • 'Sure! Here is the result: {"score": 85}'
  • '```json\n{"score": 85}\n```'

Q30. The AI returns {"confidence": "82", "label": "good"} but your schema expects confidence to be a number. Show three different Zod-based solutions: (a) z.coerce, (b) z.preprocess, (c) z.union with .transform().

Q31. Write a safe boolean parser that correctly handles true, false, "true", "false", "yes", "no", 1, and 0. Explain why z.coerce.boolean() is dangerous for the string "false".

Q32. An AI returns confidence scores on a 0-100 scale instead of 0-1. Write a Zod .transform() that normalizes both scales. What is the ambiguous edge case with a value of 1?

Q33. Write a schema with .default() for every optional field, so that a minimal AI response { "text": "hello" } fills in all missing fields with sensible defaults.

Q34. Explain the "graceful degradation ladder" (5 levels). For each level, describe what happened and what action to take.

Q35. Implement a defendedValidation() function that tries four layers: direct parse, JSON extraction, type coercion, and truncation repair. Show the function signature and the first two layers.

Q36. Design a ValidationMetrics class that tracks: total requests, failure rate, top failing fields, most common error codes, and recovery success rate. Write the record() and getSummary() methods.


4.6.e — Retry Strategies (Q37–Q47)

Q37. When should you retry a failed AI validation vs fail immediately? List three examples of each.

Q38. Write a basic retry loop that calls an AI API, validates with Zod, and retries up to 3 times with error feedback. Use safeParse().

Q39. Explain why passing Zod validation errors back to the model is dramatically more effective than blind retries. Include approximate success rate numbers.

Q40. Write a formatZodErrors(error: ZodError): string function that creates a human-readable error list that can be included in a retry message to the model.

Q41. Calculate the token cost for 3 retry attempts given: system prompt = 600 tokens, user message = 200 tokens, average response = 350 tokens, error feedback = 120 tokens, GPT-4o pricing ($2.50/1M input, $10/1M output). Show the cost for each attempt and the total.

Q42. Implement exponential backoff with jitter. Write the formula and a function calculateBackoff(attempt, baseDelay, maxDelay) that returns the delay in milliseconds.

Q43. Write a shouldRetry() function that checks three conditions: max retries not exceeded, total tokens under budget, and total cost under a dollar limit.

Q44. Using the callWithValidation() utility pattern from lesson 4.6.e, write a complete ticket classifier that takes a support ticket string and returns a validated classification with category, priority, and summary.

Q45. Explain why response_format: { type: 'json_object' } does NOT eliminate the need for Zod validation. What can still go wrong?

Q46. Calculation: Your system makes 50,000 AI calls per day. With Zod validation and retry, 4% of calls need 1 retry and 0.5% need 2 retries. The rest succeed on the first attempt. Calculate: (a) total API calls per day, (b) additional cost of retries at $0.005 per call, (c) percentage of calls that ultimately fail (assume 0.01% fail after all retries).

Q47. Design exercise: Describe how you would build a monitoring dashboard for AI validation health. What metrics would you track? What alerts would you set?


Answer Hints

| Q   | Hint |
|-----|------|
| Q2  | TypeScript types are erased at compile time; JSON.parse() returns any, so nothing is checked at runtime |
| Q4  | z.infer derives the TS type from the schema: a single source of truth that never drifts |
| Q9  | json_object mode guarantees valid JSON but NOT schema compliance (wrong types, missing fields) |
| Q12 | discriminatedUnion uses a shared field to pick the variant: O(1) lookup, clearer errors |
| Q14 | .optional() = may be missing; .default() = fill in if missing; .nullable() = may be null; .nullish() = may be missing or null |
| Q20 | parse() throws ZodError; safeParse() returns { success, data } or { success, error } and never throws |
| Q25 | z.coerce.number() converts an empty string to 0 via Number(''), which is probably not what you want |
| Q31 | z.coerce.boolean() calls Boolean("false"), which is true because any non-empty string is truthy |
| Q32 | The value 1 is ambiguous: is it 1% (0.01) or 100% (1.0)? You need a heuristic or convention |
| Q39 | Blind retry: 85% success each time, so P(fail all 3) = 0.15^3 ≈ 0.34%. With feedback: attempt 2 is ~98% |
| Q41 | Attempt 1: 800 input, 350 output (≈ $0.0055); Attempt 2: 1270 input, 350 output (≈ $0.0067); Attempt 3: 1740 input, 350 output (≈ $0.0079). Total ≈ $0.020 |
| Q46 | (a) 50,000 + 2,000 + 500 = 52,500 calls. (b) 2,500 extra calls × $0.005 = $12.50/day. (c) 0.01% of 50,000 = 5 failures/day |
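
The arithmetic and coercion hints can be checked with a few lines of plain TypeScript. The sketch below is only a worked check under the assumptions stated in Q41 and Q46: each retry resends the earlier conversation plus the previous response and the error feedback, and every attempt produces a 350-token response. The names retryCosts, dailyRetryStats, and the constants are hypothetical helpers made up for this check; they do not come from the lessons or from Zod.

```typescript
// Worked check for hints Q25, Q31, Q39, Q41 and Q46.
// All names here are illustrative helpers, not part of Zod or the lessons.

interface AttemptCost {
  attempt: number;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Q41: input tokens grow on each retry because the previous response and
// the error feedback are appended to the conversation (assumed model).
function retryCosts(
  attempts: number,
  basePromptTokens: number, // system prompt + user message
  responseTokens: number,
  feedbackTokens: number,
  inputPricePerMillion: number,
  outputPricePerMillion: number,
): AttemptCost[] {
  const rows: AttemptCost[] = [];
  for (let i = 0; i < attempts; i++) {
    const inputTokens = basePromptTokens + i * (responseTokens + feedbackTokens);
    const costUsd =
      (inputTokens * inputPricePerMillion + responseTokens * outputPricePerMillion) /
      1_000_000;
    rows.push({ attempt: i + 1, inputTokens, outputTokens: responseTokens, costUsd });
  }
  return rows;
}

// Q46: calls needing one retry add one extra call, calls needing two add two.
function dailyRetryStats(
  baseCalls: number,
  oneRetryRate: number,
  twoRetryRate: number,
  costPerCallUsd: number,
  finalFailureRate: number,
) {
  const extraCalls =
    Math.round(baseCalls * oneRetryRate) + 2 * Math.round(baseCalls * twoRetryRate);
  return {
    totalCalls: baseCalls + extraCalls,
    extraCalls,
    extraCostUsd: extraCalls * costPerCallUsd,
    failuresPerDay: Math.round(baseCalls * finalFailureRate),
  };
}

const q41 = retryCosts(3, 600 + 200, 350, 120, 2.5, 10);
const q41Total = q41.reduce((sum, row) => sum + row.costUsd, 0);
const q46 = dailyRetryStats(50_000, 0.04, 0.005, 0.005, 0.0001);

// Q39: probability that three independent blind attempts all fail at 85% success.
const blindFailAll3 = (1 - 0.85) ** 3;

// Q25 / Q31: the plain JavaScript conversions that the hints refer to.
const emptyStringAsNumber = Number("");
const falseStringAsBoolean = Boolean("false");

for (const row of q41) {
  console.log(`attempt ${row.attempt}: ${row.inputTokens} in, ${row.outputTokens} out, $${row.costUsd.toFixed(6)}`);
}
console.log(`Q41 total: $${q41Total.toFixed(6)}`);
console.log("Q46:", q46);
console.log(`Q39 blind failure probability: ${(blindFailAll3 * 100).toFixed(2)}%`);
console.log("Number('') =", emptyStringAsNumber, "| Boolean('false') =", falseStringAsBoolean);
```

Under these assumptions the three attempts use 800, 1270, and 1740 input tokens and the Q41 total comes to about $0.020. If your retry loop does not resend the failed response, the input token counts and the total will be lower.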
