Episode 4 — Generative AI Engineering / 4.5 — Generating JSON Responses from LLMs
4.5.d — Validating Returned Structure
In one sentence: Never trust AI-generated JSON blindly — always validate the parsed structure against your expected schema, check types, handle missing or extra fields, and build a retry loop that re-prompts the model when validation fails.
Navigation: ← 4.5.c — Function Calling Basics · 4.5.e — Building Structured Profile Analysis →
1. Why You MUST Validate Before Using AI-Generated JSON
Even with JSON mode enabled and a perfect schema in your prompt, things go wrong:
┌─────────────────────────────────────────────────────────────────────────┐
│ WHAT CAN GO WRONG │
│ │
│ 1. JSON.parse() fails → Model returned invalid syntax │
│ 2. Wrong field names → "userName" instead of "user_name" │
│ 3. Wrong types → age: "thirty" instead of age: 30 │
│ 4. Missing required fields → No "email" field at all │
│ 5. Extra unexpected fields → Random fields you didn't ask for │
│ 6. Wrong value ranges → score: 150 when max is 100 │
│ 7. Wrong array length → 0 items when you asked for 3-5 │
│ 8. Truncated response → max_tokens hit, JSON cut off mid-way │
│ 9. Nested structure wrong → Flat object instead of nested │
│ 10. Null where not expected → location: null when it should be a │
│ string │
└─────────────────────────────────────────────────────────────────────────┘
If your code blindly accesses data.compatibility_score and it doesn't exist, you get a runtime crash. If data.age is the string "thirty" and you do math on it, you get NaN. Validation is not optional.
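The failure modes above are often silent at first. A minimal sketch (the `parsed` object here is a hand-built stand-in for model output with a missing field and a wrong type):

```javascript
// Stand-in for unvalidated model output: no compatibility_score, age is a string.
const parsed = { name: 'Maya', age: 'thirty' };

const score = parsed.compatibility_score; // undefined, no error raised yet
const doubled = score * 2;                // NaN: the failure surfaces later, far from the cause
const nextAge = parsed.age + 1;           // "thirty1": string concatenation, not arithmetic
```

Note that none of these lines throws. The bad values propagate quietly until something downstream (a database write, a UI render, a calculation) misbehaves.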
2. Layer 1: Safe JSON Parsing
The first defense — JSON.parse() can throw. Always wrap it:
function safeJsonParse(text) {
try {
return { success: true, data: JSON.parse(text) };
} catch (error) {
return { success: false, error: error.message, raw: text };
}
}
// Usage
const result = safeJsonParse(response.choices[0].message.content);
if (!result.success) {
console.error('Failed to parse JSON:', result.error);
console.error('Raw response:', result.raw);
// Handle: retry, fallback, or error response
} else {
const data = result.data;
// Continue with validation...
}
Common parse failures and fixes
| Failure | Cause | Fix |
|---|---|---|
| `Unexpected token ... at position 0` | Model included text before the JSON | Strip the prefix text; find the first `{` |
| `Unexpected end of JSON input` | Response was truncated (`max_tokens` hit) | Increase `max_tokens`; check `finish_reason` |
| ``Unexpected token ` `` | Model wrapped JSON in markdown code fences | Strip the ```` ```json ```` and ```` ``` ```` fences |
| `Unterminated string` | JSON cut off inside a string value | Truncation issue — retry with more tokens |
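Two of these failures are really truncation in disguise, so it is worth checking `finish_reason` before ever calling `JSON.parse`. A small sketch (the `response` objects in the test are hand-built stand-ins that follow the OpenAI Chat Completions response shape):

```javascript
// Check for truncation first, then parse safely.
// Assumes the OpenAI Chat Completions response shape:
// { choices: [{ finish_reason, message: { content } }] }
function parseIfComplete(response) {
  const choice = response.choices[0];
  if (choice.finish_reason === 'length') {
    // The JSON is almost certainly cut off; parsing would fail or, worse,
    // succeed on a partial value. Bail out before parsing.
    return { success: false, error: 'Response truncated: raise max_tokens and retry' };
  }
  try {
    return { success: true, data: JSON.parse(choice.message.content) };
  } catch (error) {
    return { success: false, error: error.message };
  }
}
```

Checking `finish_reason` first distinguishes "the model wrote bad JSON" from "we cut the model off", which call for different fixes (re-prompting vs. a larger `max_tokens`).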
Robust parser with common fix-ups
function robustJsonParse(text) {
// Attempt 1: Direct parse
try {
return JSON.parse(text);
} catch (e) {
// Attempt 2: Strip markdown code fences
const fenceMatch = text.match(/```(?:json)?\s*([\s\S]*?)```/);
if (fenceMatch) {
try {
return JSON.parse(fenceMatch[1].trim());
} catch (e2) { /* fall through */ }
}
// Attempt 3: Find JSON object in text
const objectMatch = text.match(/\{[\s\S]*\}/);
if (objectMatch) {
try {
return JSON.parse(objectMatch[0]);
} catch (e3) { /* fall through */ }
}
// Attempt 4: Find JSON array in text
const arrayMatch = text.match(/\[[\s\S]*\]/);
if (arrayMatch) {
try {
return JSON.parse(arrayMatch[0]);
} catch (e4) { /* fall through */ }
}
throw new Error(`Could not parse JSON from response: ${text.substring(0, 200)}`);
}
}
Important: When using JSON mode, you should rarely need the fallback attempts. But for models without an enforced JSON mode (for example, plain prompting with Anthropic's Claude), the robust parser earns its keep.
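To see the most common fallback in isolation, here is the fence-stripping step on its own, using the same regex as the parser above (the `reply` string is an invented example of a fenced model response):

```javascript
// A typical un-coaxed model reply: JSON wrapped in a markdown code fence
// with chatter on either side.
const reply = 'Here you go:\n```json\n{"score": 85}\n```\nLet me know!';

// Capture whatever sits between the opening ```json (or bare ```) and the
// closing ``` fence, then parse the trimmed capture group.
const fenceMatch = reply.match(/```(?:json)?\s*([\s\S]*?)```/);
const data = JSON.parse(fenceMatch[1].trim());
// data.score is now 85
```

The lazy quantifier (`*?`) matters: a greedy match would run to the last fence in a reply that contains several code blocks.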
3. Layer 2: Schema Validation with Basic Checks
After parsing, validate the structure matches your expectations:
function validateProfileData(data) {
const errors = [];
// Check required fields exist
const requiredFields = ['name', 'age', 'interests', 'compatibility_score'];
for (const field of requiredFields) {
if (data[field] === undefined) {
errors.push(`Missing required field: "${field}"`);
}
}
// Check types
if (data.name !== undefined && typeof data.name !== 'string') {
errors.push(`"name" must be a string, got ${typeof data.name}`);
}
if (data.age !== undefined && typeof data.age !== 'number') {
errors.push(`"age" must be a number, got ${typeof data.age}`);
}
if (data.interests !== undefined && !Array.isArray(data.interests)) {
errors.push(`"interests" must be an array, got ${typeof data.interests}`);
}
if (data.compatibility_score !== undefined) {
if (typeof data.compatibility_score !== 'number') {
errors.push(`"compatibility_score" must be a number, got ${typeof data.compatibility_score}`);
} else if (data.compatibility_score < 0 || data.compatibility_score > 100) {
errors.push(`"compatibility_score" must be 0-100, got ${data.compatibility_score}`);
}
}
return {
valid: errors.length === 0,
errors,
data
};
}
// Usage
const parsed = JSON.parse(response.choices[0].message.content);
const validation = validateProfileData(parsed);
if (!validation.valid) {
console.error('Validation failed:', validation.errors);
// Handle: retry, fix, or reject
} else {
// Safe to use validation.data
processProfile(validation.data);
}
4. Layer 3: Type Checking Returned Fields
Go deeper than just checking existence — verify types, ranges, and constraints:
// Type checking utilities
function isString(value) {
return typeof value === 'string';
}
function isNumber(value) {
  return typeof value === 'number' && Number.isFinite(value); // excludes NaN and ±Infinity
}
function isInteger(value) {
return isNumber(value) && Number.isInteger(value);
}
function isBoolean(value) {
return typeof value === 'boolean';
}
function isArrayOf(value, typeCheck) {
return Array.isArray(value) && value.every(typeCheck);
}
function isOneOf(value, allowedValues) {
return allowedValues.includes(value);
}
function isInRange(value, min, max) {
return isNumber(value) && value >= min && value <= max;
}
function isNullableString(value) {
return value === null || typeof value === 'string';
}
// Comprehensive validator
function validateCompatibilityAnalysis(data) {
const errors = [];
// Type + constraint checks
if (!isInteger(data.compatibility_score)) {
errors.push('compatibility_score must be an integer');
} else if (!isInRange(data.compatibility_score, 0, 100)) {
errors.push('compatibility_score must be between 0 and 100');
}
if (!isArrayOf(data.strengths, isString)) {
errors.push('strengths must be an array of strings');
} else if (data.strengths.length < 1 || data.strengths.length > 5) {
errors.push('strengths must have 1-5 items');
}
if (!isArrayOf(data.weaknesses, isString)) {
errors.push('weaknesses must be an array of strings');
}
if (!isArrayOf(data.suggested_openers, isString)) {
errors.push('suggested_openers must be an array of strings');
} else if (data.suggested_openers.length < 1 || data.suggested_openers.length > 3) {
errors.push('suggested_openers must have 1-3 items');
}
if (data.confidence !== undefined && !isOneOf(data.confidence, ['high', 'medium', 'low'])) {
errors.push('confidence must be "high", "medium", or "low"');
}
return { valid: errors.length === 0, errors };
}
5. Layer 4: Handling Missing or Extra Fields
Missing fields — provide defaults
function applyDefaults(data, defaults) {
const result = { ...defaults, ...data };
// Deep merge for nested objects
for (const key of Object.keys(defaults)) {
if (
defaults[key] !== null &&
typeof defaults[key] === 'object' &&
!Array.isArray(defaults[key]) &&
      data[key] !== undefined &&
      data[key] !== null && // preserve an explicit null instead of spreading it away
      typeof data[key] === 'object'
) {
result[key] = { ...defaults[key], ...data[key] };
}
}
return result;
}
// Define defaults for your schema
const profileDefaults = {
name: 'Unknown',
age: null,
interests: [],
location: null,
compatibility_score: 50,
strengths: [],
weaknesses: [],
suggested_openers: [],
confidence: 'low'
};
// Apply defaults to fill gaps
const rawData = JSON.parse(response.choices[0].message.content);
const data = applyDefaults(rawData, profileDefaults);
// Now data always has all expected fields
Extra fields — strip them
function stripExtraFields(data, allowedFields) {
const result = {};
for (const field of allowedFields) {
if (data[field] !== undefined) {
result[field] = data[field];
}
}
return result;
}
const allowedFields = [
'name', 'age', 'interests', 'location',
'compatibility_score', 'strengths', 'weaknesses', 'suggested_openers'
];
const cleanData = stripExtraFields(rawData, allowedFields);
// Any fields the model added that you didn't expect are removed
Why strip extra fields?
// Model sometimes adds unwanted fields
const modelResponse = {
name: "Alice",
age: 30,
interests: ["hiking"],
compatibility_score: 85,
strengths: ["outdoor lover"],
weaknesses: [],
suggested_openers: ["Ask about hiking!"],
// Extra fields the model decided to add:
personality_type: "ENFP", // Didn't ask for this
zodiac_sign: "Leo", // Definitely didn't ask for this
_internal_reasoning: "Based on..." // Model's chain-of-thought leaked
};
// Stripping ensures only YOUR expected fields go to your database/API
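The two fix-ups compose naturally: deriving the allow-list from the keys of the defaults object gives you defaulting and stripping in a single pass. A minimal stand-alone sketch (it deliberately skips the deep-merge handling that `applyDefaults` above provides):

```javascript
// One-pass normalization: every key in `defaults` is guaranteed present,
// and any key the model invented is dropped.
function normalize(raw, defaults) {
  const out = {};
  for (const key of Object.keys(defaults)) {
    // An undefined check (rather than ??) preserves an explicit null
    // the model returned on purpose, e.g. location: null.
    out[key] = raw[key] !== undefined ? raw[key] : defaults[key];
  }
  return out;
}
```

Usage: `normalize(rawData, profileDefaults)` yields an object with exactly the keys of `profileDefaults`, no more and no fewer.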
6. Layer 5: Retry on Validation Failure
When validation fails, don't give up — try again with more explicit instructions:
async function callWithRetry(messages, validateFn, maxRetries = 3) {
let lastError = null;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
temperature: 0,
response_format: { type: 'json_object' },
messages: messages,
});
// Check for truncation
if (response.choices[0].finish_reason === 'length') {
throw new Error('Response truncated — JSON likely incomplete');
}
// Parse
const data = JSON.parse(response.choices[0].message.content);
// Validate
const validation = validateFn(data);
if (validation.valid) {
return { success: true, data, attempts: attempt };
}
// Validation failed — add error feedback for retry
lastError = validation.errors.join('; ');
console.warn(`Attempt ${attempt} validation failed: ${lastError}`);
// Add the failed response and error feedback to messages for retry
messages = [
...messages,
{ role: 'assistant', content: response.choices[0].message.content },
{
role: 'user',
content: `The JSON you returned has validation errors: ${lastError}. Please fix these issues and return the corrected JSON.`
}
];
} catch (error) {
lastError = error.message;
console.warn(`Attempt ${attempt} error: ${lastError}`);
// For parse errors, provide different feedback
if (error instanceof SyntaxError) {
messages = [
...messages,
{
role: 'user',
content: 'Your previous response was not valid JSON. Please return ONLY a valid JSON object.'
}
];
}
}
}
return { success: false, error: lastError, attempts: maxRetries };
}
Usage example
const messages = [
{
role: 'system',
content: `Analyze dating profiles and return JSON with:
- "compatibility_score" (integer, 0-100)
- "strengths" (array of 2-4 strings)
- "weaknesses" (array of 0-3 strings)
- "suggested_openers" (array of 2-3 strings)`
},
{
role: 'user',
content: 'Profile 1: Maya, 27, loves hiking. Profile 2: Alex, 29, enjoys cooking and outdoor adventures.'
}
];
const result = await callWithRetry(messages, validateCompatibilityAnalysis);
if (result.success) {
console.log(`Success after ${result.attempts} attempt(s):`);
console.log(result.data);
} else {
console.error(`Failed after ${result.attempts} attempts: ${result.error}`);
// Return a graceful error or default response to the user
}
7. Building a Complete Validate-or-Retry Pipeline
Here's a production-grade pipeline that combines all layers:
class JSONResponsePipeline {
constructor(openai, config = {}) {
this.openai = openai;
this.model = config.model || 'gpt-4o';
this.maxRetries = config.maxRetries || 3;
this.temperature = config.temperature || 0;
this.maxTokens = config.maxTokens || 2048;
}
async execute(messages, schema) {
let currentMessages = [...messages];
for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
const startTime = Date.now();
try {
// Step 1: API Call
const response = await this.openai.chat.completions.create({
model: this.model,
temperature: this.temperature,
max_tokens: this.maxTokens,
response_format: { type: 'json_object' },
messages: currentMessages,
});
const raw = response.choices[0].message.content;
const finishReason = response.choices[0].finish_reason;
// Step 2: Check for truncation
if (finishReason === 'length') {
throw new ValidationError(
'TRUNCATED',
'Response was truncated. Increase max_tokens or simplify the request.'
);
}
// Step 3: Parse JSON
let data;
try {
data = JSON.parse(raw);
} catch (parseError) {
throw new ValidationError('PARSE_ERROR', `Invalid JSON: ${parseError.message}`);
}
// Step 4: Validate against schema
const validation = schema.validate(data);
if (!validation.valid) {
throw new ValidationError('SCHEMA_ERROR', validation.errors.join('; '));
}
// Step 5: Clean data (apply defaults, strip extras)
const cleanData = schema.clean(data);
// Step 6: Return success
return {
success: true,
data: cleanData,
metadata: {
attempt,
latencyMs: Date.now() - startTime,
tokensUsed: response.usage,
finishReason,
}
};
} catch (error) {
console.warn(`Attempt ${attempt}/${this.maxRetries}: ${error.message}`);
if (attempt === this.maxRetries) {
return {
success: false,
error: error.message,
errorType: error.type || 'UNKNOWN',
metadata: { attempt, latencyMs: Date.now() - startTime }
};
}
// Build retry messages with error context
currentMessages = this.buildRetryMessages(currentMessages, error);
}
}
}
buildRetryMessages(messages, error) {
const errorFeedback = {
TRUNCATED: 'Your response was too long and got cut off. Return a shorter, more concise JSON response.',
PARSE_ERROR: 'Your response was not valid JSON. Return ONLY a valid JSON object with no other text.',
SCHEMA_ERROR: `Your JSON had validation errors: ${error.message}. Please fix these issues.`
};
return [
...messages,
{
role: 'user',
content: errorFeedback[error.type] || `Error: ${error.message}. Please try again.`
}
];
}
}
class ValidationError extends Error {
constructor(type, message) {
super(message);
this.type = type;
}
}
Defining a schema for the pipeline
const compatibilitySchema = {
validate(data) {
const errors = [];
if (typeof data.compatibility_score !== 'number' || data.compatibility_score < 0 || data.compatibility_score > 100) {
errors.push('compatibility_score must be a number between 0 and 100');
}
if (!Array.isArray(data.strengths) || data.strengths.length === 0) {
errors.push('strengths must be a non-empty array');
}
if (!Array.isArray(data.weaknesses)) {
errors.push('weaknesses must be an array');
}
if (!Array.isArray(data.suggested_openers) || data.suggested_openers.length === 0) {
errors.push('suggested_openers must be a non-empty array');
}
return { valid: errors.length === 0, errors };
},
clean(data) {
return {
compatibility_score: Math.round(data.compatibility_score),
strengths: data.strengths.slice(0, 5),
weaknesses: data.weaknesses.slice(0, 3),
suggested_openers: data.suggested_openers.slice(0, 3),
};
}
};
Using the pipeline
const pipeline = new JSONResponsePipeline(openai, {
model: 'gpt-4o',
maxRetries: 3,
maxTokens: 1024,
});
const result = await pipeline.execute(
[
{
role: 'system',
content: `Analyze two dating profiles for compatibility. Return JSON:
{
"compatibility_score": integer 0-100,
"strengths": ["string", ...],
"weaknesses": ["string", ...],
"suggested_openers": ["string", ...]
}`
},
{
role: 'user',
content: 'Profile A: Maya, 27, hiking, coffee, coding. Profile B: Alex, 29, cooking, trail running, tech.'
}
],
compatibilitySchema
);
if (result.success) {
console.log('Score:', result.data.compatibility_score);
console.log('Attempts:', result.metadata.attempt);
console.log('Tokens used:', result.metadata.tokensUsed);
} else {
console.error('Pipeline failed:', result.error);
// Show user a graceful error or fallback
}
8. Validation Strategy by Risk Level
Different use cases need different validation rigor:
| Risk Level | Example | Validation Strategy |
|---|---|---|
| Low | Display suggestions in UI | Parse + basic type check. Show "unable to generate" on failure. |
| Medium | Save to database | Parse + full schema validation + defaults + retry once. |
| High | Financial calculation, API contract | Parse + strict schema + type checking + range validation + retry up to 3 times + human fallback. |
| Critical | Medical data, legal documents | All of the above + human review before use. Never auto-process. |
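One way to keep the table above enforceable in code is a policy map keyed by risk level. This is a hypothetical sketch: the names (`VALIDATION_POLICY`, `strictTypes`, `humanReview`, `policyFor`) are illustrative, not part of any library.

```javascript
// Map each risk level from the table to concrete pipeline settings.
const VALIDATION_POLICY = {
  low:      { retries: 0, strictTypes: false, humanReview: false },
  medium:   { retries: 1, strictTypes: true,  humanReview: false },
  high:     { retries: 3, strictTypes: true,  humanReview: true  }, // human fallback on failure
  critical: { retries: 3, strictTypes: true,  humanReview: true  }, // always reviewed before use
};

function policyFor(riskLevel) {
  // Fail closed: an unknown risk level gets the strictest treatment.
  return VALIDATION_POLICY[riskLevel] ?? VALIDATION_POLICY.critical;
}
```

Centralizing the policy keeps "how rigorously do we validate?" out of individual call sites, so tightening a risk level is a one-line change.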
9. Common Validation Patterns
Pattern 1: Coerce types when safe
function coerceTypes(data, schema) {
const coerced = { ...data };
for (const [field, type] of Object.entries(schema)) {
if (coerced[field] === undefined) continue;
switch (type) {
case 'number':
if (typeof coerced[field] === 'string') {
const num = Number(coerced[field]);
if (!isNaN(num)) coerced[field] = num;
}
break;
case 'integer':
if (typeof coerced[field] === 'string') {
const int = parseInt(coerced[field], 10);
if (!isNaN(int)) coerced[field] = int;
} else if (typeof coerced[field] === 'number') {
coerced[field] = Math.round(coerced[field]);
}
break;
case 'boolean':
if (coerced[field] === 'true') coerced[field] = true;
if (coerced[field] === 'false') coerced[field] = false;
break;
case 'string':
if (typeof coerced[field] !== 'string') {
coerced[field] = String(coerced[field]);
}
break;
}
}
return coerced;
}
// Usage
const raw = { age: "30", score: "85.7", active: "true" };
const coerced = coerceTypes(raw, { age: 'integer', score: 'number', active: 'boolean' });
// { age: 30, score: 85.7, active: true }
Pattern 2: Clamp values to valid ranges
function clampValue(value, min, max) {
return Math.max(min, Math.min(max, value));
}
// Model returned score: 150 (out of range)
data.compatibility_score = clampValue(data.compatibility_score, 0, 100);
// Now it's 100
Pattern 3: Normalize string values
function normalizeEnum(value, allowedValues, defaultValue) {
  if (typeof value !== 'string') return defaultValue; // guard: field missing or wrong type
  const normalized = value.toLowerCase().trim();
  if (allowedValues.includes(normalized)) return normalized;
  // Fuzzy match: accept values like "Medium confidence" that contain an allowed value
  const close = allowedValues.find(v => normalized.includes(v) || v.includes(normalized));
  return close || defaultValue;
}
// Model returned "Medium confidence" instead of "medium"
const confidence = normalizeEnum(data.confidence, ['high', 'medium', 'low'], 'medium');
// Returns "medium"
10. Key Takeaways
- Never trust AI-generated JSON blindly — always validate after parsing, even with JSON mode enabled.
- Validation has five layers: safe parse → schema check → type check → range/constraint check → clean/normalize.
- `JSON.parse()` can fail even with JSON mode — always wrap it in try/catch and check `finish_reason` for truncation.
- Handle missing fields with defaults, not crashes. Handle extra fields by stripping them.
- Type coercion (string "30" → number 30) is often safer than rejection for non-critical fields.
- Build a retry loop — when validation fails, feed the errors back to the model and ask it to fix them.
- Match validation rigor to risk level — UI suggestions need less than financial calculations.
- A validate-or-retry pipeline (parse → validate → clean → or retry with feedback) is the production pattern.
Explain-It Challenge
- A teammate says "JSON mode guarantees valid JSON, so we don't need validation." List three things that can still go wrong.
- Why is it more effective to feed validation errors back to the model (in the retry) rather than just making the same request again?
- When should you coerce a wrong type (string "30" to number 30) vs reject the response and retry?
Navigation: ← 4.5.c — Function Calling Basics · 4.5.e — Building Structured Profile Analysis →