Episode 4 — Generative AI Engineering / 4.6 — Schema Validation with Zod
4.6.e — Retry Strategies
In one sentence: When AI output fails validation, retry with the validation errors fed back to the model — this self-correcting loop, combined with exponential backoff and cost controls, is the production pattern for achieving reliable structured output from probabilistic models.
Navigation: ← 4.6.d Handling Invalid Responses · ↑ 4.6 Overview
1. When to Retry vs When to Fail
Not every validation failure deserves a retry. Retries cost money, add latency, and can amplify problems if the underlying issue is with your prompt, not the model's response.
Retry when:
- JSON parse failed but the model clearly attempted JSON (e.g., extra text around it)
- One or two fields have wrong types (string instead of number)
- Enum value is close but not exact ("somewhat positive" instead of "positive")
- A required field is missing but others are correct
- The response is structurally correct but violates a constraint (score: 150 instead of 0-100)
Fail immediately when:
- Response is completely non-JSON ("I'd be happy to help you with that!")
- Model refuses the request ("I cannot provide that analysis")
- Response is in a completely wrong format (HTML, XML, prose)
- Same error repeats after 2-3 retries (model cannot self-correct)
- The request itself is problematic (bad input data, impossible prompt)
Decision matrix
┌────────────────────────────────────────────────────────┐
│ SHOULD I RETRY? │
│ │
│ Q1: Is the response JSON-like? │
│ NO → Was the prompt clear about JSON format? │
│ NO → Fix prompt, don't retry │
│ YES → Retry once with stronger instruction │
│ YES → Continue to Q2 │
│ │
│ Q2: Is the structure approximately correct? │
│ NO → (wrong schema entirely) │
│ → Retry with schema in error message │
│ YES → Continue to Q3 │
│ │
│ Q3: Are the errors fixable? │
│ Type errors → Retry with specific field errors │
│ Range errors → Retry with constraint reminders │
│ Missing fields → Retry listing required fields │
│ Logic errors → Retry with business rule explained │
│ │
│ Q4: Have we already retried? │
│ < max_retries → Retry with accumulated context │
│ = max_retries → Fail gracefully │
└────────────────────────────────────────────────────────┘
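The matrix above can be sketched as a small triage helper. The failure categories and the one-retry rule for non-JSON output are illustrative assumptions for this sketch, not part of any library:

```typescript
type FailureKind =
  | 'not_json'
  | 'wrong_schema'
  | 'type_error'
  | 'range_error'
  | 'missing_field'
  | 'refusal';

// Hypothetical triage helper mirroring the decision matrix above.
function shouldRetryFailure(
  kind: FailureKind,
  attempt: number,
  maxRetries: number,
): { retry: boolean; reason: string } {
  if (attempt >= maxRetries) {
    return { retry: false, reason: 'max retries reached' };
  }
  switch (kind) {
    case 'refusal':
      // Model refused — the request is the problem, not the response.
      return { retry: false, reason: 'fix the request, not the retry' };
    case 'not_json':
      // Q1: retry once with a stronger JSON-only instruction, then give up.
      return attempt === 1
        ? { retry: true, reason: 'retry with stronger JSON instruction' }
        : { retry: false, reason: 'model is not attempting JSON; fix the prompt' };
    case 'wrong_schema':
      return { retry: true, reason: 'retry with schema in error message' };
    case 'type_error':
    case 'range_error':
    case 'missing_field':
      // Q3: fixable errors — retry with specific field-level feedback.
      return { retry: true, reason: 'retry with specific field errors' };
  }
}

console.log(shouldRetryFailure('refusal', 1, 3).retry);    // false
console.log(shouldRetryFailure('type_error', 1, 3).retry); // true
```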
2. Basic Retry Loop with Zod Validation
The simplest retry loop: call the AI, validate, retry if invalid.
import { z } from 'zod';
import OpenAI from 'openai';
const SentimentSchema = z.object({
sentiment: z.enum(['positive', 'negative', 'neutral']),
confidence: z.number().min(0).max(1),
reasoning: z.string().min(10),
});
type SentimentResult = z.infer<typeof SentimentSchema>;
async function analyzeSentimentWithRetry(
text: string,
maxRetries: number = 3,
): Promise<SentimentResult> {
const client = new OpenAI();
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
{
role: 'system',
content: `Analyze the sentiment of the given text.
Respond with JSON ONLY. No explanation, no markdown, no code fences.
Schema:
{
"sentiment": "positive" | "negative" | "neutral",
"confidence": number between 0 and 1,
"reasoning": "explanation string (at least 10 characters)"
}`,
},
{ role: 'user', content: text },
];
for (let attempt = 1; attempt <= maxRetries; attempt++) {
const response = await client.chat.completions.create({
model: 'gpt-4o',
temperature: 0,
response_format: { type: 'json_object' },
messages,
});
const rawContent = response.choices[0]?.message?.content || '';
// Try to parse JSON
let parsed: unknown;
try {
parsed = JSON.parse(rawContent);
} catch {
// Feed JSON error back to model
messages.push(
{ role: 'assistant', content: rawContent },
{
role: 'user',
content: `Your response was not valid JSON. Please respond with ONLY a JSON object, no other text. Attempt ${attempt}/${maxRetries}.`,
},
);
continue;
}
// Validate with Zod
const result = SentimentSchema.safeParse(parsed);
if (result.success) {
if (attempt > 1) {
console.log(`Validation succeeded on attempt ${attempt}`);
}
return result.data;
}
// Feed validation errors back to model
const errorFeedback = result.error.issues
.map((i) => `- "${i.path.join('.')}": ${i.message}`)
.join('\n');
messages.push(
{ role: 'assistant', content: rawContent },
{
role: 'user',
content: `Your response had validation errors:\n${errorFeedback}\n\nPlease fix these issues and respond with valid JSON. Attempt ${attempt}/${maxRetries}.`,
},
);
}
throw new Error(
`Failed to get valid response after ${maxRetries} attempts`
);
}
3. Passing Validation Errors Back to the Model
The key insight: AI models can self-correct when you tell them exactly what was wrong. This is dramatically more effective than simply retrying the same prompt.
Error formatting strategies
import { z } from 'zod';
// Strategy 1: Simple error list
function formatErrorsSimple(error: z.ZodError): string {
return error.issues
.map((i) => `- Field "${i.path.join('.')}": ${i.message}`)
.join('\n');
}
// Strategy 2: Error with expected vs received
function formatErrorsDetailed(error: z.ZodError): string {
return error.issues
.map((i) => {
let msg = `- Field "${i.path.join('.')}": ${i.message}`;
if ('expected' in i && 'received' in i) msg += ` (expected: ${i.expected}, got: ${i.received})`;
return msg;
})
.join('\n');
}
// Strategy 3: Error with the schema reminder
function formatErrorsWithSchema(error: z.ZodError, schemaDescription: string): string {
const errors = error.issues
.map((i) => `- "${i.path.join('.')}": ${i.message}`)
.join('\n');
return `Your response had validation errors:
${errors}
Expected schema:
${schemaDescription}
Please respond with corrected JSON only.`;
}
Example: what the model sees on retry
// First attempt from model:
{
"sentiment": "somewhat positive",
"confidence": 95,
"reasoning": "Good review"
}
// Error feedback sent to model:
Your response had validation errors:
- "sentiment": Invalid enum value. Expected 'positive' | 'negative' | 'neutral', received 'somewhat positive'
- "confidence": Number must be less than or equal to 1 (expected: <=1, got: 95)
- "reasoning": String must contain at least 10 character(s)
Please fix these issues and respond with valid JSON.
// Second attempt from model (usually correct):
{
"sentiment": "positive",
"confidence": 0.95,
"reasoning": "The review expresses strong satisfaction with the product quality and service"
}
How effective is error feedback?
In practice, when you feed Zod validation errors back to GPT-4o or Claude:
- Attempt 1 success rate: ~85-95% (with good prompts and response_format: json_object)
- Attempt 2 success rate: ~95-99% (with error feedback)
- Attempt 3 success rate: ~99%+ (nearly always succeeds)
Without error feedback (blind retry):
- Attempt 1 success rate: ~85-95%
- Attempt 2 success rate: ~85-95% (same odds — model makes the same mistake)
- Attempt 3 success rate: ~85-95% (still the same odds)
The difference is massive. Error feedback turns a probabilistic coin flip into a self-correcting loop.
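These success rates compound as simple probabilities. A quick sketch of the arithmetic, using illustrative per-attempt rates (the real numbers depend on your model, prompt, and schema):

```typescript
// Cumulative success after a sequence of attempts, treating each attempt's
// success rate as independent (a simplifying assumption).
function cumulativeSuccess(perAttemptRates: number[]): number {
  let failProb = 1;
  for (const p of perAttemptRates) failProb *= 1 - p;
  return 1 - failProb;
}

// With error feedback: retries succeed at a higher rate than the first try.
const withFeedback = cumulativeSuccess([0.9, 0.8, 0.8]);
// Blind retry at temperature 0: the model repeats the same mistake,
// so further attempts add essentially nothing.
const blind = cumulativeSuccess([0.9, 0.0, 0.0]);

console.log(withFeedback.toFixed(3), blind.toFixed(3)); // prints: 0.996 0.900
```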
4. Maximum Retry Count and Backoff
Why limit retries
Each retry costs:
- API tokens (input: all previous messages + error feedback)
- Latency (typically 500ms-3s per call)
- Money (accumulates quickly at scale)
Token cost per retry grows because you include the conversation history:
Attempt 1: system prompt + user message ≈ 500 tokens
Attempt 2: above + assistant response + error feedback ≈ 1,200 tokens
Attempt 3: above + assistant response + error feedback ≈ 2,000 tokens
Total for 3 attempts: ~3,700 input tokens (vs 500 for a single call)
Exponential backoff
function calculateBackoff(
attempt: number,
baseDelay: number = 1000,
maxDelay: number = 10000,
): number {
const delay = Math.min(
baseDelay * Math.pow(2, attempt - 1),
maxDelay,
);
// Add jitter to prevent thundering herd
return delay + Math.random() * delay * 0.1;
}
// attempt 1: 1000ms + jitter
// attempt 2: 2000ms + jitter
// attempt 3: 4000ms + jitter (capped at 10000ms)
Retry with backoff implementation
async function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
async function callWithRetryAndBackoff<T>(
schema: z.ZodSchema<T>,
apiCall: (messages: OpenAI.Chat.ChatCompletionMessageParam[]) => Promise<string>,
systemPrompt: string,
userMessage: string,
options: {
maxRetries?: number;
baseDelay?: number;
maxDelay?: number;
} = {},
): Promise<T> {
const { maxRetries = 3, baseDelay = 1000, maxDelay = 10000 } = options;
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userMessage },
];
for (let attempt = 1; attempt <= maxRetries; attempt++) {
// Backoff on retries (not on first attempt)
if (attempt > 1) {
const delay = calculateBackoff(attempt - 1, baseDelay, maxDelay);
await sleep(delay);
}
const rawContent = await apiCall(messages);
let parsed: unknown;
try {
parsed = JSON.parse(rawContent);
} catch {
if (attempt < maxRetries) {
messages.push(
{ role: 'assistant', content: rawContent },
{ role: 'user', content: 'Invalid JSON. Respond with JSON only.' },
);
}
continue;
}
const result = schema.safeParse(parsed);
if (result.success) return result.data;
if (attempt < maxRetries) {
const errors = result.error.issues
.map((i) => `- "${i.path.join('.')}": ${i.message}`)
.join('\n');
messages.push(
{ role: 'assistant', content: rawContent },
{ role: 'user', content: `Validation errors:\n${errors}\nFix and respond with JSON.` },
);
}
}
throw new Error(`Validation failed after ${maxRetries} attempts`);
}
5. Cost Implications of Retries
Calculating retry costs
interface RetryCostEstimate {
attempt: number;
input_tokens: number;
output_tokens: number;
cumulative_input_tokens: number;
cumulative_output_tokens: number;
cumulative_cost_usd: number;
}
function estimateRetryCosts(
systemPromptTokens: number,
userMessageTokens: number,
avgResponseTokens: number,
errorFeedbackTokens: number,
maxRetries: number,
inputCostPer1M: number, // e.g., 2.50 for GPT-4o
outputCostPer1M: number, // e.g., 10.00 for GPT-4o
): RetryCostEstimate[] {
const estimates: RetryCostEstimate[] = [];
let cumulativeInput = 0;
let cumulativeOutput = 0;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
// Input tokens grow with each retry (includes conversation history)
const inputTokens =
systemPromptTokens +
userMessageTokens +
(attempt - 1) * (avgResponseTokens + errorFeedbackTokens);
const outputTokens = avgResponseTokens;
cumulativeInput += inputTokens;
cumulativeOutput += outputTokens;
const cost =
(cumulativeInput / 1_000_000) * inputCostPer1M +
(cumulativeOutput / 1_000_000) * outputCostPer1M;
estimates.push({
attempt,
input_tokens: inputTokens,
output_tokens: outputTokens,
cumulative_input_tokens: cumulativeInput,
cumulative_output_tokens: cumulativeOutput,
cumulative_cost_usd: cost,
});
}
return estimates;
}
// Example: GPT-4o pricing
const costs = estimateRetryCosts(
500, // system prompt
200, // user message
300, // average response
150, // error feedback
3, // max retries
2.50, // input cost per 1M tokens
10.00, // output cost per 1M tokens
);
console.table(costs);
// ┌─────────┬──────────────┬───────────────┬───────────────────────┐
// │ attempt │ input_tokens │ output_tokens │ cumulative_cost_usd │
// ├─────────┼──────────────┼───────────────┼───────────────────────┤
// │ 1 │ 700 │ 300 │ $0.0048 │
// │ 2 │ 1150 │ 300 │ $0.0109 │
// │ 3 │ 1600 │ 300 │ $0.0186 │
// └─────────┴──────────────┴───────────────┴───────────────────────┘
// 3 attempts cost ~3.9x a single call (not 3x, because of growing context)
Cost control strategies
interface CostLimits {
max_retries: number;
max_total_tokens: number;
max_cost_usd: number;
}
const DEFAULT_LIMITS: CostLimits = {
max_retries: 3,
max_total_tokens: 10_000,
max_cost_usd: 0.05,
};
function shouldRetry(
attempt: number,
totalTokensUsed: number,
totalCostUsd: number,
limits: CostLimits = DEFAULT_LIMITS,
): { retry: boolean; reason: string } {
if (attempt >= limits.max_retries) {
return { retry: false, reason: `Max retries reached (${limits.max_retries})` };
}
if (totalTokensUsed >= limits.max_total_tokens) {
return { retry: false, reason: `Token limit reached (${totalTokensUsed}/${limits.max_total_tokens})` };
}
if (totalCostUsd >= limits.max_cost_usd) {
return { retry: false, reason: `Cost limit reached ($${totalCostUsd.toFixed(4)}/$${limits.max_cost_usd})` };
}
return { retry: true, reason: 'Within limits' };
}
6. Building a Robust callWithValidation() Utility
This is the production-ready utility that combines everything from sections 4.6.a through 4.6.e.
import { z, ZodError } from 'zod';
import OpenAI from 'openai';
// ─── Configuration ───────────────────────────────────────────────
interface CallWithValidationOptions {
model?: string;
temperature?: number;
maxRetries?: number;
maxTotalTokens?: number;
baseDelayMs?: number;
useJsonMode?: boolean;
extractJsonFromText?: boolean;
onRetry?: (attempt: number, error: string) => void;
onSuccess?: (attempt: number, data: unknown) => void;
}
interface CallWithValidationResult<T> {
success: boolean;
data: T | null;
attempts: number;
totalTokens: { input: number; output: number };
errors: string[];
latencyMs: number;
}
// ─── JSON Extraction ─────────────────────────────────────────────
function extractJSON(text: string): unknown {
try { return JSON.parse(text); } catch { /* continue */ }
const fenceMatch = text.match(/```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/);
if (fenceMatch) {
try { return JSON.parse(fenceMatch[1].trim()); } catch { /* continue */ }
}
const jsonMatch = text.match(/(\{[\s\S]*\})/);
if (jsonMatch) {
try { return JSON.parse(jsonMatch[1]); } catch { /* continue */ }
}
throw new Error('No JSON found in response');
}
// ─── Error Formatting ────────────────────────────────────────────
function formatZodErrors(error: ZodError): string {
return error.issues
.map((i) => {
const path = i.path.join('.') || 'root';
let msg = `- "${path}": ${i.message}`;
if ('expected' in i && 'received' in i) {
msg += ` (expected ${i.expected}, got ${i.received})`;
}
return msg;
})
.join('\n');
}
// ─── Backoff ─────────────────────────────────────────────────────
function backoff(attempt: number, baseMs: number): Promise<void> {
const delay = Math.min(baseMs * Math.pow(2, attempt - 1), 15000);
const jitter = delay * 0.1 * Math.random();
return new Promise((resolve) => setTimeout(resolve, delay + jitter));
}
// ─── Main Function ───────────────────────────────────────────────
async function callWithValidation<T>(
schema: z.ZodSchema<T>,
systemPrompt: string,
userMessage: string,
options: CallWithValidationOptions = {},
): Promise<CallWithValidationResult<T>> {
const {
model = 'gpt-4o',
temperature = 0,
maxRetries = 3,
maxTotalTokens = 15_000,
baseDelayMs = 1000,
useJsonMode = true,
extractJsonFromText = true,
onRetry,
onSuccess,
} = options;
const client = new OpenAI();
const startTime = Date.now();
const errors: string[] = [];
let totalInputTokens = 0;
let totalOutputTokens = 0;
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userMessage },
];
for (let attempt = 1; attempt <= maxRetries; attempt++) {
// Backoff on retries
if (attempt > 1) {
await backoff(attempt - 1, baseDelayMs);
}
// Check token budget
if (totalInputTokens + totalOutputTokens >= maxTotalTokens) {
errors.push(`Token budget exhausted (${totalInputTokens + totalOutputTokens}/${maxTotalTokens})`);
break;
}
// Call the API
let response: OpenAI.Chat.Completions.ChatCompletion;
try {
response = await client.chat.completions.create({
model,
temperature,
messages,
...(useJsonMode ? { response_format: { type: 'json_object' as const } } : {}),
});
} catch (err) {
const errMsg = `API call failed: ${(err as Error).message}`;
errors.push(errMsg);
if (attempt < maxRetries) {
onRetry?.(attempt, errMsg);
continue;
}
break;
}
// Track token usage
const usage = response.usage;
if (usage) {
totalInputTokens += usage.prompt_tokens;
totalOutputTokens += usage.completion_tokens;
}
const rawContent = response.choices[0]?.message?.content || '';
// Step 1: Parse JSON
let parsed: unknown;
try {
parsed = extractJsonFromText ? extractJSON(rawContent) : JSON.parse(rawContent);
} catch (err) {
const errMsg = `JSON parse failed (attempt ${attempt}): ${(err as Error).message}`;
errors.push(errMsg);
if (attempt < maxRetries) {
messages.push(
{ role: 'assistant', content: rawContent },
{
role: 'user',
content: 'Your response was not valid JSON. Respond with a JSON object ONLY, no other text.',
},
);
onRetry?.(attempt, errMsg);
}
continue;
}
// Step 2: Validate with Zod
const result = schema.safeParse(parsed);
if (result.success) {
onSuccess?.(attempt, result.data);
return {
success: true,
data: result.data,
attempts: attempt,
totalTokens: { input: totalInputTokens, output: totalOutputTokens },
errors,
latencyMs: Date.now() - startTime,
};
}
// Validation failed — format errors for the model
const errorFeedback = formatZodErrors(result.error);
errors.push(`Validation failed (attempt ${attempt}):\n${errorFeedback}`);
if (attempt < maxRetries) {
messages.push(
{ role: 'assistant', content: rawContent },
{
role: 'user',
content: `Your response had validation errors:\n${errorFeedback}\n\nPlease fix these specific issues and respond with corrected JSON only.`,
},
);
onRetry?.(attempt, errorFeedback);
}
}
// All retries exhausted
return {
success: false,
data: null,
attempts: maxRetries,
totalTokens: { input: totalInputTokens, output: totalOutputTokens },
errors,
latencyMs: Date.now() - startTime,
};
}
// ─── Export ──────────────────────────────────────────────────────
export { callWithValidation };
export type { CallWithValidationOptions, CallWithValidationResult };
7. Using the callWithValidation() Utility
Basic usage
const ReviewSchema = z.object({
sentiment: z.enum(['positive', 'negative', 'neutral', 'mixed']),
score: z.number().min(0).max(10),
highlights: z.array(z.string()).min(1),
summary: z.string().min(20),
});
type Review = z.infer<typeof ReviewSchema>;
const result = await callWithValidation(
ReviewSchema,
`Analyze the product review. Return JSON:
{
"sentiment": "positive"|"negative"|"neutral"|"mixed",
"score": 0-10,
"highlights": ["key point 1", "key point 2"],
"summary": "20+ char summary"
}`,
'This laptop is amazing! The battery lasts forever and the keyboard feels great. Only downside is the weight.',
);
if (result.success) {
console.log('Analysis:', result.data);
console.log(`Completed in ${result.attempts} attempt(s), ${result.latencyMs}ms`);
} else {
console.error('All retries failed:', result.errors);
}
With callbacks for monitoring
const result = await callWithValidation(
ReviewSchema,
systemPrompt,
userMessage,
{
maxRetries: 3,
model: 'gpt-4o',
onRetry: (attempt, error) => {
console.warn(`[RETRY] Attempt ${attempt} failed: ${error}`);
// Track in your metrics system
metrics.increment('ai.validation.retry', { attempt: String(attempt) });
},
onSuccess: (attempt, data) => {
console.info(`[SUCCESS] Validated on attempt ${attempt}`);
metrics.histogram('ai.validation.attempts', attempt);
},
},
);
Batch processing with validation
async function processReviewBatch(reviews: string[]): Promise<{
results: Review[];
failures: Array<{ index: number; errors: string[] }>;
stats: { total: number; success: number; retried: number; failed: number };
}> {
const results: Review[] = [];
const failures: Array<{ index: number; errors: string[] }> = [];
let retried = 0;
for (let i = 0; i < reviews.length; i++) {
const result = await callWithValidation(
ReviewSchema,
systemPrompt,
reviews[i],
{
maxRetries: 2, // Fewer retries for batch to control cost
onRetry: () => { retried++; },
},
);
if (result.success && result.data) {
results.push(result.data);
} else {
failures.push({ index: i, errors: result.errors });
}
}
return {
results,
failures,
stats: {
total: reviews.length,
success: results.length,
retried,
failed: failures.length,
},
};
}
8. The Principle: NEVER Trust AI Output Without Validation
This principle applies at every level of your system:
┌──────────────────────────────────────────────────────────────────┐
│ TRUST NOTHING. VALIDATE EVERYTHING. │
│ │
│ Level 1: Individual field validation │
│ → Zod schema validates every field type, range, format │
│ │
│ Level 2: Structural validation │
│ → Zod validates the overall shape (required fields, nesting) │
│ │
│ Level 3: Business logic validation │
│ → Custom .refine() checks cross-field consistency │
│ │
│ Level 4: System-level validation │
│ → Monitor aggregate failure rates │
│ → Alert when quality degrades │
│ → A/B test prompt changes against validation metrics │
│ │
│ This applies to: │
│ ✗ Raw LLM completions │
│ ✗ "Structured output" mode (can still have wrong values) │
│ ✗ Function calling responses (schema isn't enforced by model) │
│ ✗ Fine-tuned models (hallucinate differently, but still do) │
│ ✗ Even response_format: json_object (guarantees JSON, not │
│ schema compliance) │
│ │
│ The ONLY exception: native structured output with strict mode │
│ (OpenAI's strict JSON schemas) — and even then, validate │
│ value ranges and business logic. │
└──────────────────────────────────────────────────────────────────┘
9. Complete Production-Ready Code Example
Here is a full, self-contained example of a customer ticket classifier with Zod validation, retry logic, error handling, and monitoring.
import { z, ZodError } from 'zod';
import OpenAI from 'openai';
// ─── 1. Define the Schema ────────────────────────────────────────
const TicketClassificationSchema = z.object({
category: z.enum([
'technical_issue',
'billing',
'account_access',
'feature_request',
'general_inquiry',
]),
priority: z.enum(['p0_critical', 'p1_high', 'p2_medium', 'p3_low']),
sentiment: z.enum(['frustrated', 'neutral', 'positive']),
summary: z.string().min(20).max(200),
suggested_response: z.string().min(50),
requires_escalation: z.boolean(),
confidence: z.number().min(0).max(1),
tags: z.array(z.string()).min(1).max(10),
});
type TicketClassification = z.infer<typeof TicketClassificationSchema>;
// ─── 2. System Prompt (includes schema) ──────────────────────────
const SYSTEM_PROMPT = `You are a customer support ticket classifier.
Given a customer message, classify it and generate a suggested response.
RESPOND WITH JSON ONLY. No explanation, no markdown.
Required JSON schema:
{
"category": "technical_issue" | "billing" | "account_access" | "feature_request" | "general_inquiry",
"priority": "p0_critical" | "p1_high" | "p2_medium" | "p3_low",
"sentiment": "frustrated" | "neutral" | "positive",
"summary": "20-200 char summary of the issue",
"suggested_response": "50+ char professional response to the customer",
"requires_escalation": true/false,
"confidence": 0.0 to 1.0,
"tags": ["tag1", "tag2"] (1-10 tags)
}
Priority guidelines:
- p0_critical: Service is down, data loss, security breach
- p1_high: Major functionality broken, billing error
- p2_medium: Minor issue, question about features
- p3_low: General inquiry, feedback, feature request`;
// ─── 3. Metrics Tracker ──────────────────────────────────────────
class ClassifierMetrics {
private calls = 0;
private successes = 0;
private retries = 0;
private failures = 0;
private totalLatencyMs = 0;
private totalTokens = 0;
recordSuccess(attempts: number, latencyMs: number, tokens: number): void {
this.calls++;
this.successes++;
this.retries += attempts - 1;
this.totalLatencyMs += latencyMs;
this.totalTokens += tokens;
}
recordFailure(attempts: number, latencyMs: number, tokens: number): void {
this.calls++;
this.failures++;
this.retries += attempts - 1;
this.totalLatencyMs += latencyMs;
this.totalTokens += tokens;
}
getReport(): string {
const successRate = this.calls > 0 ? (this.successes / this.calls * 100).toFixed(1) : '0';
const avgLatency = this.calls > 0 ? (this.totalLatencyMs / this.calls).toFixed(0) : '0';
const avgTokens = this.calls > 0 ? (this.totalTokens / this.calls).toFixed(0) : '0';
return [
`=== Classifier Metrics ===`,
`Total calls: ${this.calls}`,
`Success rate: ${successRate}%`,
`Total retries: ${this.retries}`,
`Failures: ${this.failures}`,
`Avg latency: ${avgLatency}ms`,
`Avg tokens: ${avgTokens}`,
`Total tokens: ${this.totalTokens}`,
].join('\n');
}
}
// ─── 4. Main Classifier Function ─────────────────────────────────
const metrics = new ClassifierMetrics();
async function classifyTicket(
ticketText: string,
): Promise<{
success: boolean;
classification: TicketClassification | null;
attempts: number;
error?: string;
}> {
const client = new OpenAI();
const maxRetries = 3;
const startTime = Date.now();
let totalTokens = 0;
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
{ role: 'system', content: SYSTEM_PROMPT },
{ role: 'user', content: `Classify this customer ticket:\n\n${ticketText}` },
];
for (let attempt = 1; attempt <= maxRetries; attempt++) {
// Backoff
if (attempt > 1) {
await new Promise((r) => setTimeout(r, 1000 * Math.pow(2, attempt - 2)));
}
try {
// Call API
const response = await client.chat.completions.create({
model: 'gpt-4o',
temperature: 0,
response_format: { type: 'json_object' },
messages,
});
if (response.usage) {
totalTokens += response.usage.prompt_tokens + response.usage.completion_tokens;
}
const rawContent = response.choices[0]?.message?.content || '';
// Parse JSON
let parsed: unknown;
try {
parsed = JSON.parse(rawContent);
} catch {
if (attempt < maxRetries) {
messages.push(
{ role: 'assistant', content: rawContent },
{ role: 'user', content: 'Invalid JSON. Respond with JSON only.' },
);
continue;
}
const latency = Date.now() - startTime;
metrics.recordFailure(attempt, latency, totalTokens);
return { success: false, classification: null, attempts: attempt, error: 'JSON parse failed' };
}
// Validate
const result = TicketClassificationSchema.safeParse(parsed);
if (result.success) {
const latency = Date.now() - startTime;
metrics.recordSuccess(attempt, latency, totalTokens);
return { success: true, classification: result.data, attempts: attempt };
}
// Feed errors back
if (attempt < maxRetries) {
const errors = result.error.issues
.map((i) => `- "${i.path.join('.')}": ${i.message}`)
.join('\n');
messages.push(
{ role: 'assistant', content: rawContent },
{ role: 'user', content: `Validation errors:\n${errors}\nFix and respond with JSON.` },
);
}
} catch (err) {
if (attempt >= maxRetries) {
const latency = Date.now() - startTime;
metrics.recordFailure(attempt, latency, totalTokens);
return {
success: false,
classification: null,
attempts: attempt,
error: `API error: ${(err as Error).message}`,
};
}
}
}
const latency = Date.now() - startTime;
metrics.recordFailure(maxRetries, latency, totalTokens);
return {
success: false,
classification: null,
attempts: maxRetries,
error: 'Max retries exceeded',
};
}
// ─── 5. Usage ────────────────────────────────────────────────────
async function main() {
const tickets = [
'My account has been locked for 3 days and I cannot access any of my data! This is unacceptable!',
'Hey, I was wondering if you could add dark mode to the dashboard? Would be cool.',
'I was charged twice for my subscription last month. Please refund one payment.',
];
for (const ticket of tickets) {
const result = await classifyTicket(ticket);
if (result.success && result.classification) {
const c = result.classification;
console.log(`\n--- Ticket Classification ---`);
console.log(`Category: ${c.category}`);
console.log(`Priority: ${c.priority}`);
console.log(`Sentiment: ${c.sentiment}`);
console.log(`Summary: ${c.summary}`);
console.log(`Escalate: ${c.requires_escalation}`);
console.log(`Confidence: ${(c.confidence * 100).toFixed(0)}%`);
console.log(`Tags: ${c.tags.join(', ')}`);
console.log(`Attempts: ${result.attempts}`);
} else {
console.error(`\nClassification failed: ${result.error}`);
}
}
// Print metrics
console.log('\n' + metrics.getReport());
}
main().catch(console.error);
10. Key Takeaways
- Retry with error feedback is dramatically more effective than blind retry. Pass the exact Zod validation errors back to the model as a user message.
- Limit retries to 3 attempts maximum in most cases. Set token budgets and cost caps to prevent runaway costs.
- Exponential backoff prevents overwhelming the API during rate limits or outages. Add jitter to avoid thundering herd.
- Cost grows non-linearly with retries because each attempt includes the full conversation history. Monitor and budget for this.
- The callWithValidation() pattern is the production standard — it combines JSON extraction, Zod validation, error feedback, retry logic, backoff, cost tracking, and monitoring callbacks.
- NEVER trust AI output without validation — this applies to all models, all modes (including json_object mode), and all architectures. Zod is your last line of defense.
- Monitor retry rates in production. A spike in retries usually means a prompt change broke something, a model version updated, or input data changed character.
Explain-It Challenge
- A colleague suggests removing the retry loop because "GPT-4o almost always returns correct JSON." Using the math from section 5, calculate how many failures a system with 100,000 daily API calls would see at a 5% failure rate, and what the retry cost would be.
- Explain why blind retries (same prompt, no error feedback) are ineffective. What is the expected success probability after 3 blind retries if each attempt has an 85% chance of the same error?
- Design a callWithValidation wrapper for a multi-step AI pipeline where step 2 depends on step 1's output. How do you handle the case where step 1 succeeds but step 2 fails validation?
Navigation: ← 4.6.d Handling Invalid Responses · ↑ 4.6 Overview