Episode 4 — Generative AI Engineering / 4.4 — Structured Output in AI Systems

4.4 — Structured Output in AI Systems: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps — reopen README.md → 4.4.a–4.4.d.
  3. Practice → 4.4-Exercise-Questions.md.
  4. Polish answers → 4.4-Interview-Questions.md.

Core vocabulary

Term                      → One-liner
─────────────────────────────────────
Structured output         → LLM response in a predefined, parseable format (JSON/XML/CSV) instead of free-form text
Unstructured output       → Free-form natural language — infinite variations, unparseable by code
Schema                    → The contract defining the exact shape, fields, types, and constraints of the output
Enum                      → A restricted set of allowed string values (e.g., "positive" | "negative" | "neutral")
Silent data corruption    → Wrong data enters the system without throwing an error — worse than a crash
Regex escalation spiral   → Adding more regex to handle edge cases → more complexity → more bugs → more regex
Schema versioning         → Tracking changes to schemas with major/minor versions for backward compatibility
Type safety               → Validating that each field has the expected data type before using it in code

The core problem

WITHOUT structured output:
  LLM → "The sentiment is positive with 92% confidence" → regex parsing → BREAKS

WITH structured output:
  LLM → {"sentiment": "positive", "confidence": 0.92} → JSON.parse() → WORKS

RULE: Don't parse free text. Make the LLM produce structured data directly.
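
A minimal illustration of the two paths (strings hardcoded for demonstration; the regex and field names are illustrative):

```javascript
// Hypothetical responses: same information, two formats.
const freeText = 'The sentiment is positive with 92% confidence';
const structured = '{"sentiment": "positive", "confidence": 0.92}';

// Fragile: regex tied to one exact phrasing — breaks on any rewording.
const match = freeText.match(/sentiment is (\w+) with (\d+)% confidence/);
const fromRegex = match
  ? { sentiment: match[1], confidence: Number(match[2]) / 100 }
  : null; // "The tone seems upbeat" → null, silently

// Robust: a standard parser handles any valid JSON shape.
const fromJson = JSON.parse(structured);
```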

Format comparison

JSON  → 95% of use cases. Native JS. Best LLM reliability. Nested data.
XML   → Legacy systems. Verbose. Medium LLM reliability.
CSV   → Flat tabular data. Spreadsheet export. Medium LLM reliability.

DEFAULT CHOICE: JSON. Always JSON unless you have a specific reason not to.

Structured output benefits stack

Layer 1: FORMAT        → Standard parsers (JSON.parse), no custom regex
Layer 2: PREDICTABLE   → Same fields every time → reliable pipelines
Layer 3: TYPE SAFE     → Validation catches wrong types before they propagate
Layer 4: LESS CODE     → ~80% reduction in parsing code
Layer 5: ERROR HANDLING → "Missing field X" vs "regex didn't match something"
Layer 6: API CONTRACT  → Consistent response shape for all consumers

Basic structured output pattern

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  temperature: 0,  // Always 0 for structured output
  messages: [
    {
      role: 'system',
      content: `Analyze sentiment. Respond with ONLY valid JSON:
{ "sentiment": "positive"|"negative"|"neutral", "confidence": number }`
    },
    { role: 'user', content: userText },
  ],
});

let content = response.choices[0].message.content.trim();

// Strip markdown fences (common LLM quirk)
if (content.startsWith('```')) {
  content = content.replace(/^```(?:json)?\n?/, '').replace(/\n?```$/, '');
}

const data = JSON.parse(content);

Common applications cheat sheet

Application            Key Schema Fields                                      Use Case
──────────────────────────────────────────────────────────────────────────────────────
Resume parsing         name, experience[], skills{}, education[]              Candidate data extraction + filtering
Product metadata       title, description, tags[], category                   E-commerce catalog generation
Content moderation     flagged, severity, categories[], suggestedAction       Auto-moderate + human review queue
Scoring engine         overallScore, breakdown[], strengths[], weaknesses[]   Job matching, compatibility
Email classification   intent, urgency, department, extractedEntities{}      Auto-routing + priority
Sentiment analysis     sentiment, score, confidence, aspects[]                Reviews, feedback, social media
Document extraction    invoiceNumber, lineItems[], totalAmount                Invoices, receipts, contracts

Schema design rules

NAMING:
  ✓ camelCase for JavaScript projects
  ✓ Descriptive names: firstName (not fn)
  ✓ Booleans read as questions: isActive, hasAttachment, needsReview
  ✓ Arrays are plural: tags, skills, items
  ✓ Consistent prefixes: shippingAddress, billingAddress (not shippingAddr, billing_address)
  ✗ Never mix conventions in one schema

TYPES:
  string     → Names, descriptions, summaries
  number     → Scores, prices, quantities (specify integer vs decimal)
  boolean    → Yes/no decisions (not "yes"/"no" strings)
  array      → Lists of items (specify min/max length)
  enum       → THE MOST IMPORTANT — constrains LLM to exact allowed values
  null       → Optional field not present in input

REQUIRED vs OPTIONAL:
  Required   → LLM can always determine from input (sentiment, category)
  Optional   → May not be in input — use null, NOT fabricated data
  RULE: Explicit instruction: "Return null for optional fields not in the text"

NESTING:
  1-2 levels → Reliable. Recommended.
  3 levels   → OK but test carefully. LLMs sometimes misplace braces.
  4+ levels  → Avoid. High malformed JSON rate. Flatten or split calls.
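
Putting the rules together — a hypothetical review-analysis output (field names are illustrative, not from a specific API):

```javascript
// Example output shape applying the naming, type, and optionality rules.
const exampleOutput = {
  sentiment: 'negative',              // enum, required — always determinable
  confidence: 0.87,                   // number, 0–1 decimal (not a percentage)
  isActionable: true,                 // boolean reads as a question
  aspects: ['shipping', 'support'],   // array name is plural
  customerName: null,                 // optional — null, never fabricated
};
```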

Error handling strategy

Failure Mode               → Recovery Action
─────────────────────────────────────────────
Empty response             → Retry the request
Non-JSON response          → Strip markdown fences, retry with stricter prompt
Missing required field     → Retry or use default value
Invalid enum value         → Map to closest valid value or retry
Wrong type (string vs num) → Attempt type coercion or retry
All retries fail           → Return safe default + log for review

IMPLEMENTATION: Retry wrapper with exponential backoff (max 2 retries)
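
A sketch of that retry wrapper (`callLLM` is any async function that returns parsed, validated data and throws on empty or malformed responses — names are illustrative):

```javascript
// Retry wrapper with exponential backoff: 2 retries = 3 attempts max.
async function withRetry(callLLM, maxRetries = 2, baseDelayMs = 500) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await callLLM(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        // 500ms, 1000ms, ... — exponential backoff between attempts
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  // All retries failed: caller returns a safe default and logs lastError.
  throw lastError;
}
```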

Robust JSON extraction

function extractJSON(content) {
  let text = content.trim();
  
  // Strip markdown code fences
  if (text.startsWith('```')) {
    text = text.replace(/^```(?:json)?\n?/, '').replace(/\n?```$/, '');
  }
  
  // Find JSON object boundaries
  const firstBrace = text.indexOf('{');
  const lastBrace = text.lastIndexOf('}');
  if (firstBrace !== -1 && lastBrace > firstBrace) {
    text = text.substring(firstBrace, lastBrace + 1);
  }
  
  return JSON.parse(text);
}
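
A minimal validator for the sentiment schema used earlier — it rejects bad data loudly instead of letting it propagate (a sketch; adapt the checks per schema):

```javascript
// Validate the parsed object against the sentiment schema.
const VALID_SENTIMENTS = ['positive', 'negative', 'neutral'];

function validateSentiment(data) {
  if (!VALID_SENTIMENTS.includes(data.sentiment)) {
    throw new Error(`Invalid enum value for "sentiment": ${data.sentiment}`);
  }
  if (typeof data.confidence !== 'number' || data.confidence < 0 || data.confidence > 1) {
    throw new Error('"confidence" must be a number between 0 and 1');
  }
  return data; // safe to use downstream
}
```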

Schema versioning

MINOR version (1.0 → 1.1): Backward compatible
  + New optional field
  + New enum value (if consumers handle unknown values)

MAJOR version (1.x → 2.0): Breaking change
  + New required field
  - Removed field
  ~ Changed field type
  ~ Renamed field

MIGRATION:
  1. Deploy code that handles BOTH v1 and v2
  2. Canary: 10% traffic on v2 prompt
  3. Monitor error rates + compare outputs
  4. Gradual rollout: 10% → 50% → 100%
  5. Remove v1 code after 2-week transition

ALWAYS include _schemaVersion in output for routing.
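
Version routing during a migration can be sketched like this (hypothetical breaking change: v2 renamed "score" to "confidence"):

```javascript
// Route on _schemaVersion so downstream code only ever sees the v2 shape.
function normalizeResponse(data) {
  switch (data._schemaVersion) {
    case '2.0':
      return data;
    case '1.0': {
      // Upgrade the v1 shape: drop "score", emit "confidence".
      const { score, ...rest } = data;
      return { ...rest, _schemaVersion: '2.0', confidence: score };
    }
    default:
      throw new Error(`Unknown _schemaVersion: ${data._schemaVersion}`);
  }
}
```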

Token cost awareness

Structured JSON is more expensive than plain text:
  { "name": "John" }  → ~7 tokens (braces, quotes, colon, spaces)
  John                 → ~1 token

But the reliability trade-off is worth it:
  5% more tokens per response
  vs
  80% less parsing code + 99% fewer parsing errors

OPTIMIZATION: Use flat schemas when possible (fewer braces = fewer tokens)

Temperature for structured output

ALWAYS temperature: 0 for structured output
  → Deterministic, consistent, parseable

Exception: If the schema includes a creative field (e.g., "generatedDescription")
  → Consider splitting into two calls:
    Call 1: temp 0 → structured analysis
    Call 2: temp 0.5 → creative generation using analysis as context

Common gotchas

Gotcha                                          → Why / Fix
───────────────────────────────────────────────────────────
LLM wraps JSON in markdown fences               → Training data has lots of markdown — strip the ```json fences
LLM adds explanation text around JSON           → Prompt says "Respond with ONLY JSON" — enforce in sanitization
LLM returns "true" (string) not true (boolean)  → Validate types; coerce "true" → true if needed
LLM returns "$29.99" not 29.99                  → Specify: "number (no currency symbol)"
Deep nesting causes malformed JSON              → Keep to 2-3 levels max
All-required schema causes hallucination        → Use optional + null for fields that may not be in input
Mixed naming conventions                        → Pick one (camelCase) and enforce everywhere
No schema versioning                            → Schema changes break consumers silently
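
The type-mismatch gotchas above can be patched with small coercion helpers (a sketch — prefer tightening the prompt first, coerce only as a last resort):

```javascript
// Coerce string booleans: "true" → true, "false" → false.
function coerceBoolean(value) {
  if (typeof value === 'boolean') return value;
  if (value === 'true') return true;
  if (value === 'false') return false;
  throw new Error(`Cannot coerce to boolean: ${value}`);
}

// Coerce formatted numbers: strip currency symbols and separators.
function coerceNumber(value) {
  if (typeof value === 'number') return value;
  const parsed = Number(String(value).replace(/[^0-9.\-]/g, ''));
  if (Number.isNaN(parsed)) throw new Error(`Cannot coerce to number: ${value}`);
  return parsed;
}
```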

The universal pattern

1. DESIGN the schema (fields, types, required/optional, enums)
2. WRITE the prompt with the exact JSON shape
3. CALL the LLM with temperature 0
4. SANITIZE the response (strip fences, trim)
5. PARSE with JSON.parse()
6. VALIDATE against the schema
7. USE the typed, validated data in your application
8. HANDLE ERRORS with retry + fallback
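
Steps 4–8 can be sketched end-to-end as one function (sentiment schema as the example; retry omitted for brevity):

```javascript
// Sanitize → parse → validate → use, with a safe fallback on any failure.
function processLLMResponse(raw) {
  try {
    // 4. SANITIZE: trim and strip markdown fences
    const text = raw.trim().replace(/^```(?:json)?\n?/, '').replace(/\n?```$/, '');
    // 5. PARSE with the standard parser
    const data = JSON.parse(text);
    // 6. VALIDATE against the schema
    if (!['positive', 'negative', 'neutral'].includes(data.sentiment)) {
      throw new Error('invalid enum value');
    }
    if (typeof data.confidence !== 'number') throw new Error('wrong type');
    // 7. USE the typed, validated data
    return data;
  } catch (err) {
    // 8. HANDLE ERRORS: safe default + log for review
    console.error('LLM response rejected:', err.message);
    return { sentiment: 'neutral', confidence: 0 };
  }
}
```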

Quick comparisons

Unstructured parsing:     ~110 lines of fragile regex code
Structured output:        ~21 lines of robust parse + validate

Unstructured failure rate: 3-5% (regex doesn't match)
Structured failure rate:   0.01-0.1% (malformed JSON)

Unstructured error type:   Silent (null propagates, data corrupts)
Structured error type:     Loud (JSON.parse throws, validation rejects)

Unstructured testing:      Nearly impossible (output varies)
Structured testing:        Straightforward (assert on field values)

End of 4.4 quick revision.