Episode 4 — Generative AI Engineering / 4.4 — Structured Output in AI Systems
4.4 — Exercise Questions: Structured Output in AI Systems
Practice questions for all four subtopics in Section 4.4. Mix of conceptual, design, coding, and debugging tasks.
How to use this material (instructions)
- Read lessons in order: README.md, then 4.4.a → 4.4.d.
- Answer closed-book first, then compare to the matching lesson.
- Build the code examples: structured output is best learned by doing.
- Interview prep: 4.4-Interview-Questions.md.
- Quick review: 4.4-Quick-Revision.md.
4.4.a — Why Unstructured Responses Are Difficult (Q1–Q10)
Q1. An LLM is asked "Is this email spam?" and responds with "This email is likely spam, but it could also be a legitimate promotional message." Write JavaScript code that tries to extract a boolean isSpam value from this response. Explain why your code fails.
Q2. List five different ways an LLM might phrase the answer to "What is the price of this product?" that would each require different regex patterns to parse.
Q3. Explain the regex escalation spiral in your own words. Why does adding more regex patterns to handle edge cases actually make the problem worse?
Q4. You have a production system that parses LLM responses using response.includes('positive') to detect sentiment. Write three LLM responses where this approach produces the wrong result.
Q5. Explain the difference between a parsing failure (throws an error) and silent data corruption (wrong data, no error). Which is more dangerous in production and why?
Q6. A junior developer says "I'll just use response.split('\n') to extract the data from the LLM response." Give three examples of how the LLM might format its response that would break this approach.
Q7. Describe the human-readable vs machine-readable gap. Why can a human extract "about thirty bucks" as a price of approximately $30, but code cannot?
Q8. You're building a pipeline where Step 1 extracts a price from an LLM, Step 2 calculates tax, and Step 3 generates an invoice. If Step 1 returns NaN, trace what happens through Steps 2 and 3. Why is this called a cascading error?
Q9. Your API wraps an LLM call and sometimes returns { sentiment: "positive", confidence: 0.92 } and other times returns { sentiment: undefined, confidence: undefined }. Explain why this happens and how it affects frontend consumers.
Q10. Write a test case for a function analyzeSentiment(text) that returns unstructured text. Explain why it's nearly impossible to write a meaningful assertion.
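The pitfall behind Q4 can be shown in a few lines. The responses below are hypothetical LLM outputs, and the naive check is the one named in the question:

```javascript
// Fragile approach from Q4: substring matching to detect sentiment.
function naiveIsPositive(response) {
  return response.includes('positive');
}

// Hypothetical LLM responses: all negative or mixed in meaning.
const tricky = [
  "The review is not positive at all.",                  // negation
  "I wouldn't call this positive.",                      // hedged negation
  "The positive aspects don't outweigh the negatives."   // mixed verdict
];

// Every response contains the word "positive", so the naive check
// flags all three as positive: three wrong answers, no errors thrown.
const results = tricky.map(naiveIsPositive);
console.log(results); // [ true, true, true ]
```

This is also why Q5's distinction matters: nothing here throws, so the wrong classifications would flow silently into downstream code.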
4.4.b — How Structured Responses Help (Q11–Q20)
Q11. Compare JSON, XML, and CSV as structured output formats for LLMs. For each, give one scenario where it's the best choice.
Q12. Rewrite this unstructured parsing code to use structured JSON output instead:

```javascript
const text = response.choices[0].message.content;
const nameMatch = text.match(/Name:\s*(.+)/i);
const ageMatch = text.match(/Age:\s*(\d+)/i);
const name = nameMatch?.[1]?.trim() || 'Unknown';
const age = ageMatch ? parseInt(ageMatch[1], 10) : null;
```

Show both the prompt you'd write and the parsing code.
Q13. Explain how structured output enables type safety. Give an example where the LLM returns "0.85" (string) instead of 0.85 (number) and show how validation catches this.
Q14. You're told that structured output reduces parsing code by ~80%. Using the product extraction example from the lesson, count the approximate lines of code for both approaches and verify this claim.
Q15. Write a robustParse() function that handles these common LLM quirks when returning JSON:
- Response wrapped in markdown code fences (```json ... ```)
- Response has leading text before the JSON
- Response has trailing text after the JSON
Q16. Design an error handling strategy for a structured LLM response. List all five failure modes (empty response, non-JSON, missing field, invalid value, wrong type) and the recovery action for each.
Q17. Your API must return a consistent response shape for all clients. Write an Express.js endpoint that wraps an LLM call with structured output and guarantees the response shape, even when the LLM fails.
Q18. A teammate argues: "We don't need structured output — we can just use a second LLM call to parse the first LLM's response." What are three problems with this approach?
Q19. Explain why structured output makes automated testing possible. Write a Jest test suite for a sentiment analysis function that returns structured JSON.
Q20. How does structured output reduce operational overhead? Calculate the daily time savings for a system processing 10,000 requests with a 3% failure rate (5 min manual review per failure) vs 0.01% failure rate.
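As a reference point for Q15, here is one minimal sketch of a tolerant parser. The fence-stripping regex and brace-scanning approach are one possible strategy, not the only correct answer:

```javascript
// Sketch of robustParse() from Q15: tolerate markdown fences and
// stray prose around a single JSON object.
function robustParse(raw) {
  // Strip markdown code fences such as ```json ... ```
  const text = raw.replace(/```(?:json)?/gi, '').trim();
  // Keep only the first {...} span, dropping leading/trailing text
  const first = text.indexOf('{');
  const last = text.lastIndexOf('}');
  if (first === -1 || last === -1 || last < first) {
    throw new Error('No JSON object found in response');
  }
  return JSON.parse(text.substring(first, last + 1));
}

// Handles all three quirks listed in Q15:
robustParse('```json\n{"sentiment": "positive"}\n```');
robustParse('Here is the result: {"score": 7}');
robustParse('{"score": 7} Hope that helps!');
```

A fuller answer would also address Q16's remaining failure modes (missing fields, wrong types), which JSON.parse alone cannot catch.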
4.4.c — Common Applications (Q21–Q32)
Q21. Design a structured output schema for resume parsing that captures: personal info, work experience, education, skills, and certifications. Include at least 15 fields with appropriate types.
Q22. Write the system prompt and JavaScript code for a content moderation system that returns { flagged, severity, categories, reason, suggestedAction }. Include the routing logic that auto-removes high-severity content and queues medium-severity for human review.
Q23. Design a schema for email classification that captures intent, urgency, department, and extracted entities (order number, account ID). Then write the routing logic that: (a) sends critical issues to Slack, (b) auto-responds with templates for common questions, (c) escalates angry customers.
Q24. Build a product metadata generator that takes raw supplier text and produces structured data with title, description, tags, category, and attributes. Include validation that rejects titles over 100 characters and requires at least 3 tags.
Q25. Design a scoring engine schema for job-candidate matching. Include overall score, weighted criterion breakdown, strengths, weaknesses, and recommendation. Write the code that ranks 10 candidates and returns the top 3.
Q26. Write a structured output prompt for invoice extraction that captures vendor, customer, line items, taxes, and total. Include a validation function that verifies the line item totals add up to the subtotal.
Q27. Compare aspect-based sentiment analysis vs simple positive/negative classification. Design schemas for both and explain when you'd use each.
Q28. Build a contract term extraction system. Design the schema, write the prompt, and implement risk detection logic that flags: (a) non-compete clauses, (b) auto-renewal terms, (c) contracts over $100K, (d) unusual liability terms.
Q29. You're processing 5,000 receipts per day for an expense management system. Design the complete pipeline: LLM extraction, validation, categorization, and database storage. What's your strategy for handling the ~1% of receipts that fail extraction?
Q30. Write the structuredLLMCall() universal wrapper function from the lesson. Include: markdown fence removal, retry with exponential backoff, custom validation, and a fallback default value.
Q31. Your sentiment analysis system processes product reviews in 6 languages. How does this affect your schema design? Should sentiment labels be in English regardless of input language? Why?
Q32. Design a structured output system for automated code review that returns: issues found (severity, line number, description, suggestion), overall quality score, and a pass/fail recommendation. Write the schema and a sample prompt.
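For Q24, the validation rules stated in the question can be sketched as a small function. The field names `title` and `tags` and the thresholds come from the question text; the error-list return shape is an assumption:

```javascript
// Validation step for Q24's product metadata generator:
// reject titles over 100 characters, require at least 3 tags.
function validateProductMetadata(data) {
  const errors = [];
  if (typeof data.title !== 'string' || data.title.length === 0) {
    errors.push('title is required and must be a string');
  } else if (data.title.length > 100) {
    errors.push('title must be 100 characters or fewer');
  }
  if (!Array.isArray(data.tags) || data.tags.length < 3) {
    errors.push('at least 3 tags are required');
  }
  return { valid: errors.length === 0, errors };
}
```

Returning an error list rather than throwing keeps the function usable inside a retry loop like the one Q30's wrapper calls for.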
4.4.d — Designing Output Schemas (Q33–Q44)
Q33. Convert this inconsistently named schema into proper camelCase with correct naming conventions:
{ fn: '', last_name: '', email_addr: '', DOB: '', Active: '', phone_num: '' }
Q34. You have a schema with 20 fields. 8 are always present in the source data, and 12 are sometimes present. Design the schema with correct required/optional field designations. Write the validation code for the required fields.
Q35. Explain why enums are the most important data type for structured LLM output. Give an example where using a free-form string instead of an enum causes a bug in a switch statement.
Q36. Your schema has 4 levels of nesting and the LLM produces malformed JSON 15% of the time. Propose two strategies to fix this: (a) flatten the schema, and (b) split into multiple LLM calls. Show both approaches.
Q37. Design a schema versioning strategy for a sentiment analysis system. Start with v1.0, then show what changes for v1.1 (add optional field), v1.2 (add new enum value), and v2.0 (add required field). For each version, explain whether it's backward-compatible.
Q38. Write a schema documentation template that includes: field name, type, required/optional, allowed values, description, consumer (who uses this field), and example value. Fill it in for a 6-field content moderation schema.
Q39. A new team member adds these fields to the schema: "dt", "usr_nm", "isACtive", "items_LIST". Explain what's wrong with each name and provide the corrected version.
Q40. Design a flat schema and a nested schema for the same dataset: a customer order with personal info, shipping address, billing address, and 3 line items. Compare token counts (approximate), parseability, and code ergonomics.
Q41. Write a migration guide for moving from schema v1.0 to v2.0 of your email classification system. Include: what changed, who is affected, the transition plan (canary deploy), and the code that handles both versions during the migration.
Q42. You need the LLM to return dates. Compare these approaches: (a) ISO 8601 strings "2025-06-15", (b) Unix timestamps 1750032000, (c) human-readable "June 15, 2025". Which is best for structured output and why?
Q43. Design a schema for a multi-step LLM pipeline where Step 1 extracts raw data, Step 2 enriches it, and Step 3 scores it. Each step's output is the next step's input. How do you design schemas that chain cleanly?
Q44. Your team uses TypeScript. Write a TypeScript interface that matches your LLM output schema, then write a type guard function that validates a parsed JSON object matches the interface at runtime.
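Q35 and Q44 both hinge on runtime checks. A plain-JavaScript sketch of such a guard follows; the schema shape here is illustrative, not taken from the lessons:

```javascript
// Enum values are code, not display text (see Q31), so membership
// checks must be exact: "Positive" or "POSITIVE" should fail.
const SENTIMENTS = ['positive', 'negative', 'neutral'];

// Runtime guard in the spirit of Q44's TypeScript type guard.
function isSentimentResult(obj) {
  return (
    obj !== null &&
    typeof obj === 'object' &&
    SENTIMENTS.includes(obj.sentiment) &&        // exact enum match
    typeof obj.confidence === 'number' &&        // catches "0.92" as string
    obj.confidence >= 0 &&
    obj.confidence <= 1
  );
}

isSentimentResult({ sentiment: 'positive', confidence: 0.92 });   // true
isSentimentResult({ sentiment: 'Positive', confidence: '0.92' }); // false
```

In TypeScript the same function would be declared with an `obj is SentimentResult` return type so the compiler narrows the type after the check.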
Answer Hints
| Q | Hint |
|---|---|
| Q1 | includes("spam") is true, but the response says "likely" and "could also be legitimate" — no clear boolean |
| Q4 | "not positive" contains "positive"; "the positive aspect doesn't outweigh negatives" contains "positive"; "I wouldn't call it positive" contains "positive" |
| Q5 | Silent corruption is more dangerous — errors that throw are caught immediately; wrong data without errors propagates through the system undetected |
| Q8 | NaN * taxRate = NaN, NaN.toFixed(2) = "NaN", displayed to user as "$NaN" |
| Q11 | JSON: API/nested data (most cases); XML: legacy SOAP systems; CSV: tabular data/spreadsheet export |
| Q14 | Unstructured: ~50-110 lines regex/parsing; Structured: ~15-25 lines prompt+parse+validate; reduction ~75-85% |
| Q15 | Strip ```json prefix and ``` suffix; find first { and last }; use content.substring(firstBrace, lastBrace+1) |
| Q18 | Double the cost, double the latency, second LLM can also fail, adds complexity without solving the root problem |
| Q20 | 3% of 10K = 300 failures * 5 min = 1,500 min = 25 hours/day; 0.01% of 10K = 1 failure * 5 min = 5 min/day; savings = ~25 hours/day |
| Q31 | Labels should always be in English (enum values are code, not display text); localization happens in the UI layer |
| Q33 | { firstName: '', lastName: '', emailAddress: '', dateOfBirth: '', isActive: '', phoneNumber: '' } |
| Q35 | switch(data.status) with case "active" — if LLM returns "Active" or "ACTIVE" instead, no case matches and hits default |
| Q37 | v1.1: backward-compatible (new optional field); v1.2: backward-compatible if consumers handle unknown enum values; v2.0: NOT backward-compatible (new required field) |
| Q42 | ISO 8601 strings: unambiguous, timezone-aware, easy to parse with new Date(), LLMs produce them reliably |
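The Q8 hint can be reproduced in a few lines. The literal input string is invented for illustration:

```javascript
// Cascading error from Q8: one NaN poisons every downstream step
// without ever throwing.
const price = parseFloat('about thirty bucks');            // Step 1: extraction fails, returns NaN
const tax = price * 0.08;                                  // Step 2: NaN * 0.08 is still NaN
const invoiceLine = `Total: $${(price + tax).toFixed(2)}`; // Step 3: NaN.toFixed(2) is "NaN"
console.log(invoiceLine); // "Total: $NaN", shown to the user with no error logged
```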