Episode 4 — Generative AI Engineering / 4.4 — Structured Output in AI Systems
4.4 — Structured Output in AI Systems: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim before labs or interviews.
- Drill gaps — reopen README.md → 4.4.a…4.4.d.
- Practice — 4.4-Exercise-Questions.md.
- Polish answers — 4.4-Interview-Questions.md.
Core vocabulary
| Term | One-liner |
|---|---|
| Structured output | LLM response in a predefined, parseable format (JSON/XML/CSV) instead of free-form text |
| Unstructured output | Free-form natural language — infinite variations, unparseable by code |
| Schema | The contract defining the exact shape, fields, types, and constraints of the output |
| Enum | A restricted set of allowed string values (e.g., "positive" | "negative" | "neutral") |
| Silent data corruption | Wrong data enters the system without throwing an error — worse than a crash |
| Regex escalation spiral | Adding more regex to handle edge cases → more complexity → more bugs → more regex |
| Schema versioning | Tracking changes to schemas with major/minor versions for backward compatibility |
| Type safety | Validating that each field has the expected data type before using it in code |
The core problem
WITHOUT structured output:
LLM → "The sentiment is positive with 92% confidence" → regex parsing → BREAKS
WITH structured output:
LLM → {"sentiment": "positive", "confidence": 0.92} → JSON.parse() → WORKS
RULE: Don't parse free text. Make the LLM produce structured data directly.
Format comparison
JSON → 95% of use cases. Native JS. Best LLM reliability. Nested data.
XML → Legacy systems. Verbose. Medium LLM reliability.
CSV → Flat tabular data. Spreadsheet export. Medium LLM reliability.
DEFAULT CHOICE: JSON. Always JSON unless you have a specific reason not to.
Structured output benefits stack
Layer 1: FORMAT → Standard parsers (JSON.parse), no custom regex
Layer 2: PREDICTABLE → Same fields every time → reliable pipelines
Layer 3: TYPE SAFE → Validation catches wrong types before they propagate
Layer 4: LESS CODE → ~80% reduction in parsing code
Layer 5: ERROR HANDLING → "Missing field X" vs "regex didn't match something"
Layer 6: API CONTRACT → Consistent response shape for all consumers
Basic structured output pattern
```javascript
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  temperature: 0, // Always 0 for structured output
  messages: [
    {
      role: 'system',
      content: `Analyze sentiment. Respond with ONLY valid JSON:
{ "sentiment": "positive"|"negative"|"neutral", "confidence": number }`,
    },
    { role: 'user', content: userText },
  ],
});

let content = response.choices[0].message.content.trim();

// Strip markdown fences (common LLM quirk)
if (content.startsWith('```')) {
  content = content.replace(/^```(?:json)?\n?/, '').replace(/\n?```$/, '');
}

const data = JSON.parse(content);
```
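Parsing alone is not enough; step 6 of the pattern is validating the parsed object. A minimal validation sketch for the sentiment schema above (the error messages and range check are assumptions, not part of the original example):

```javascript
// Minimal post-parse validation for the sentiment schema above.
// Throws with a specific field name instead of failing silently.
const ALLOWED_SENTIMENTS = ['positive', 'negative', 'neutral'];

function validateSentiment(data) {
  if (!ALLOWED_SENTIMENTS.includes(data.sentiment)) {
    throw new Error(`Invalid enum value for "sentiment": ${data.sentiment}`);
  }
  if (typeof data.confidence !== 'number' || data.confidence < 0 || data.confidence > 1) {
    throw new Error('"confidence" must be a number between 0 and 1');
  }
  return data;
}
```

A failed check here is the "loud" error the comparison section describes: it names the bad field instead of letting corrupt data propagate.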
Common applications cheat sheet
| Application | Key Schema Fields | Use Case |
|---|---|---|
| Resume parsing | name, experience[], skills{}, education[] | Candidate data extraction + filtering |
| Product metadata | title, description, tags[], category | E-commerce catalog generation |
| Content moderation | flagged, severity, categories[], suggestedAction | Auto-moderate + human review queue |
| Scoring engine | overallScore, breakdown[], strengths[], weaknesses[] | Job matching, compatibility |
| Email classification | intent, urgency, department, extractedEntities{} | Auto-routing + priority |
| Sentiment analysis | sentiment, score, confidence, aspects[] | Reviews, feedback, social media |
| Document extraction | invoiceNumber, lineItems[], totalAmount | Invoices, receipts, contracts |
Schema design rules
NAMING:
✓ camelCase for JavaScript projects
✓ Descriptive names: firstName (not fn)
✓ Booleans read as questions: isActive, hasAttachment, needsReview
✓ Arrays are plural: tags, skills, items
✓ Consistent prefixes: shippingAddress, billingAddress (not shippingAddr, billing_address)
✗ Never mix conventions in one schema
TYPES:
string → Names, descriptions, summaries
number → Scores, prices, quantities (specify integer vs decimal)
boolean → Yes/no decisions (not "yes"/"no" strings)
array → Lists of items (specify min/max length)
enum → THE MOST IMPORTANT — constrains LLM to exact allowed values
null → Optional field not present in input
REQUIRED vs OPTIONAL:
Required → LLM can always determine from input (sentiment, category)
Optional → May not be in input — use null, NOT fabricated data
RULE: Explicit instruction: "Return null for optional fields not in the text"
NESTING:
1-2 levels → Reliable. Recommended.
3 levels → OK but test carefully. LLMs sometimes misplace braces.
4+ levels → Avoid. High malformed JSON rate. Flatten or split calls.
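The naming, typing, and null rules above can all be encoded directly in the system prompt. A sketch with illustrative field names (the contact-extraction schema here is an assumption, not from the source):

```javascript
// Example system prompt applying the schema design rules:
// camelCase names, boolean-as-question, plural array, optional field as null.
const systemPrompt = `Extract contact info. Respond with ONLY valid JSON:
{
  "firstName": string,
  "lastName": string,
  "isSubscribed": boolean,
  "tags": string[] (max 5),
  "phoneNumber": string | null
}
Return null for optional fields not present in the text. Do not fabricate data.`;
```

Note the explicit "Return null … do not fabricate" line: without it, an all-required schema tempts the model to invent values.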
Error handling strategy
| Failure mode | Recovery action |
|---|---|
| Empty response | Retry the request |
| Non-JSON response | Strip markdown fences, retry with stricter prompt |
| Missing required field | Retry or use default value |
| Invalid enum value | Map to closest valid value or retry |
| Wrong type (string vs number) | Attempt type coercion or retry |
| All retries fail | Return safe default + log for review |
IMPLEMENTATION: Retry wrapper with exponential backoff (max 2 retries)
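A sketch of that retry wrapper, assuming the caller supplies two hypothetical callbacks (`callLLM`, `parseAndValidate`) so any of the failure modes above surfaces as a thrown error:

```javascript
// Retry wrapper sketch: exponential backoff, max 2 retries.
// `callLLM` and `parseAndValidate` are hypothetical caller-supplied functions.
async function withRetries(callLLM, parseAndValidate, maxRetries = 2) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const raw = await callLLM(attempt); // attempt number lets the caller tighten the prompt
      return parseAndValidate(raw);       // throws on malformed / invalid output
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        // back off 500ms, then 1s, before the next attempt
        await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** attempt));
      }
    }
  }
  throw lastError; // caller logs this and falls back to a safe default
}
```

Passing the attempt number to `callLLM` is one way to implement "retry with stricter prompt" from the table above.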
Robust JSON extraction
```javascript
function extractJSON(content) {
  let text = content.trim();

  // Strip markdown code fences
  if (text.startsWith('```')) {
    text = text.replace(/^```(?:json)?\n?/, '').replace(/\n?```$/, '');
  }

  // Find JSON object boundaries (lastBrace must come after firstBrace)
  const firstBrace = text.indexOf('{');
  const lastBrace = text.lastIndexOf('}');
  if (firstBrace !== -1 && lastBrace > firstBrace) {
    text = text.substring(firstBrace, lastBrace + 1);
  }

  return JSON.parse(text);
}
```
Schema versioning
MINOR version (1.0 → 1.1): Backward compatible
+ New optional field
+ New enum value (if consumers handle unknown values)
MAJOR version (1.x → 2.0): Breaking change
+ New required field
- Removed field
~ Changed field type
~ Renamed field
MIGRATION:
1. Deploy code that handles BOTH v1 and v2
2. Canary: 10% traffic on v2 prompt
3. Monitor error rates + compare outputs
4. Gradual rollout: 10% → 50% → 100%
5. Remove v1 code after 2-week transition
ALWAYS include _schemaVersion in output for routing.
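A routing sketch built on that `_schemaVersion` field, so v1 and v2 handlers coexist during the migration window (the handler stubs here are illustrative, not real migration logic):

```javascript
// Dispatch on _schemaVersion so v1 and v2 consumers coexist during migration.
// The handlers are illustrative stubs; real ones would normalize each shape.
const handlersByMajor = {
  '1': (data) => ({ ...data, handledBy: 'v1' }),
  '2': (data) => ({ ...data, handledBy: 'v2' }),
};

function routeBySchemaVersion(data) {
  const version = data._schemaVersion || '1.0'; // treat a missing version as v1
  const handler = handlersByMajor[version.split('.')[0]];
  if (!handler) throw new Error(`Unsupported schema version: ${version}`);
  return handler(data);
}
```

Routing on the major version only means minor bumps (1.0 → 1.1) flow through unchanged, matching the backward-compatibility rule above.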
Token cost awareness
Structured JSON is more expensive than plain text:
{ "name": "John" } → ~7 tokens (braces, quotes, colon, spaces)
John → ~1 token
But the reliability trade-off is worth it:
5% more tokens per response
vs
80% less parsing code + 99% fewer parsing errors
OPTIMIZATION: Use flat schemas when possible (fewer braces = fewer tokens)
Temperature for structured output
ALWAYS temperature: 0 for structured output
→ Deterministic, consistent, parseable
Exception: If the schema includes a creative field (e.g., "generatedDescription")
→ Consider splitting into two calls:
Call 1: temp 0 → structured analysis
Call 2: temp 0.5 → creative generation using analysis as context
Common gotchas
| Gotcha | Why |
|---|---|
| LLM wraps JSON in markdown fences | Training data has lots of markdown — strip ```json ``` |
| LLM adds explanation text around JSON | Prompt says "Respond with ONLY JSON" — enforce in sanitization |
| LLM returns "true" (string) not true (boolean) | Validate types; coerce "true" → true if needed |
| LLM returns "$29.99" not 29.99 | Specify: "number (no currency symbol)" |
| Deep nesting causes malformed JSON | Keep to 2-3 levels max |
| All-required schema causes hallucination | Use optional + null for fields that may not be in input |
| Mixed naming conventions | Pick one (camelCase) and enforce everywhere |
| No schema versioning | Schema changes break consumers silently |
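A small coercion sketch for the string-boolean and currency gotchas above (the field names `flagged` and `price` are illustrative):

```javascript
// Coerce common LLM type slips: "true"/"false" strings and "$29.99"-style prices.
function coerceTypes(data) {
  const out = { ...data };
  if (out.flagged === 'true') out.flagged = true;
  if (out.flagged === 'false') out.flagged = false;
  if (typeof out.price === 'string') {
    // Strip currency symbols and separators, keep digits, sign, and decimal point
    const parsed = Number(out.price.replace(/[^0-9.+-]/g, ''));
    if (!Number.isNaN(parsed)) out.price = parsed;
  }
  return out;
}
```

Coercion is a fallback, not a substitute for tightening the prompt ("number, no currency symbol"); prefer fixing the schema instruction first.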
The universal pattern
1. DESIGN the schema (fields, types, required/optional, enums)
2. WRITE the prompt with the exact JSON shape
3. CALL the LLM with temperature 0
4. SANITIZE the response (strip fences, trim)
5. PARSE with JSON.parse()
6. VALIDATE against the schema
7. USE the typed, validated data in your application
8. HANDLE ERRORS with retry + fallback
Quick comparisons
| Dimension | Unstructured | Structured |
|---|---|---|
| Parsing code | ~110 lines of fragile regex | ~21 lines of robust parse + validate |
| Failure rate | 3-5% (regex doesn't match) | 0.01-0.1% (malformed JSON) |
| Error type | Silent (null propagates, data corrupts) | Loud (JSON.parse throws, validation rejects) |
| Testing | Nearly impossible (output varies) | Straightforward (assert on field values) |
End of 4.4 quick revision.