Episode 4 — Generative AI Engineering / 4.6 — Schema Validation with Zod

4.6.e — Retry Strategies

In one sentence: When AI output fails validation, retry with the validation errors fed back to the model — this self-correcting loop, combined with exponential backoff and cost controls, is the production pattern for achieving reliable structured output from probabilistic models.

Navigation: ← 4.6.d Handling Invalid Responses · ← 4.6 Overview


1. When to Retry vs When to Fail

Not every validation failure deserves a retry. Retries cost money, add latency, and can amplify problems if the underlying issue is with your prompt, not the model's response.

Retry when:

- JSON parse failed but the model clearly attempted JSON (e.g., extra text around it)
- One or two fields have wrong types (string instead of number)
- Enum value is close but not exact ("somewhat positive" instead of "positive")
- A required field is missing but others are correct
- The response is structurally correct but violates a constraint (score: 150 instead of 0-100)

Fail immediately when:

- Response is completely non-JSON ("I'd be happy to help you with that!")
- Model refuses the request ("I cannot provide that analysis")
- Response is in a completely wrong format (HTML, XML, prose)
- Same error repeats after 2-3 retries (model cannot self-correct)
- The request itself is problematic (bad input data, impossible prompt)

Decision matrix

┌────────────────────────────────────────────────────────┐
│              SHOULD I RETRY?                           │
│                                                        │
│  Q1: Is the response JSON-like?                        │
│      NO  → Was the prompt clear about JSON format?     │
│            NO  → Fix prompt, don't retry               │
│            YES → Retry once with stronger instruction  │
│      YES → Continue to Q2                              │
│                                                        │
│  Q2: Is the structure approximately correct?           │
│      NO  → (wrong schema entirely)                     │
│            → Retry with schema in error message        │
│      YES → Continue to Q3                              │
│                                                        │
│  Q3: Are the errors fixable?                           │
│      Type errors  → Retry with specific field errors   │
│      Range errors → Retry with constraint reminders    │
│      Missing fields → Retry listing required fields    │
│      Logic errors → Retry with business rule explained │
│                                                        │
│  Q4: Have we already retried?                          │
│      < max_retries → Retry with accumulated context    │
│      = max_retries → Fail gracefully                   │
└────────────────────────────────────────────────────────┘
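Q1 and Q4 of this matrix can be checked mechanically before the schema is ever involved. A minimal sketch, with illustrative heuristics and a hypothetical function name (Q2/Q3 need the schema's validation errors, covered later in this section):

```typescript
type TriageDecision = 'retry' | 'fail';

// Rough pre-validation triage mirroring Q1 and Q4 of the matrix above.
// The regexes are deliberately crude; tune them to your own failure modes.
function triageResponse(
  raw: string,
  attempt: number,
  maxRetries: number,
): TriageDecision {
  if (attempt >= maxRetries) return 'fail'; // Q4: retry budget exhausted

  const looksJsonLike = /[{[]/.test(raw); // Q1: any JSON-ish structure at all?
  const looksLikeRefusal = /\b(i cannot|i can't|i'd be happy to)\b/i.test(raw);

  // Pure prose or an outright refusal: retrying rarely helps
  if (!looksJsonLike || looksLikeRefusal) return 'fail';

  return 'retry'; // structurally salvageable: retry with error feedback
}
```

For example, a chatty refusal fails fast, while a truncated JSON object gets routed into the retry loop.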

2. Basic Retry Loop with Zod Validation

The simplest retry loop: call the AI, validate, retry if invalid.

import { z } from 'zod';
import OpenAI from 'openai';

const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  reasoning: z.string().min(10),
});

type SentimentResult = z.infer<typeof SentimentSchema>;

async function analyzeSentimentWithRetry(
  text: string,
  maxRetries: number = 3,
): Promise<SentimentResult> {
  const client = new OpenAI();
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    {
      role: 'system',
      content: `Analyze the sentiment of the given text.
Respond with JSON ONLY. No explanation, no markdown, no code fences.
Schema:
{
  "sentiment": "positive" | "negative" | "neutral",
  "confidence": number between 0 and 1,
  "reasoning": "explanation string (at least 10 characters)"
}`,
    },
    { role: 'user', content: text },
  ];

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const response = await client.chat.completions.create({
      model: 'gpt-4o',
      temperature: 0,
      response_format: { type: 'json_object' },
      messages,
    });

    const rawContent = response.choices[0].message.content || '';

    // Try to parse JSON
    let parsed: unknown;
    try {
      parsed = JSON.parse(rawContent);
    } catch {
      // Feed JSON error back to model
      messages.push(
        { role: 'assistant', content: rawContent },
        {
          role: 'user',
          content: `Your response was not valid JSON. Please respond with ONLY a JSON object, no other text. Attempt ${attempt}/${maxRetries}.`,
        },
      );
      continue;
    }

    // Validate with Zod
    const result = SentimentSchema.safeParse(parsed);

    if (result.success) {
      if (attempt > 1) {
        console.log(`Validation succeeded on attempt ${attempt}`);
      }
      return result.data;
    }

    // Feed validation errors back to model
    const errorFeedback = result.error.issues
      .map((i) => `- "${i.path.join('.')}": ${i.message}`)
      .join('\n');

    messages.push(
      { role: 'assistant', content: rawContent },
      {
        role: 'user',
        content: `Your response had validation errors:\n${errorFeedback}\n\nPlease fix these issues and respond with valid JSON. Attempt ${attempt}/${maxRetries}.`,
      },
    );
  }

  throw new Error(
    `Failed to get valid response after ${maxRetries} attempts`
  );
}

3. Passing Validation Errors Back to the Model

The key insight: AI models can self-correct when you tell them exactly what was wrong. This is dramatically more effective than simply retrying the same prompt.

Error formatting strategies

import { z } from 'zod';

// Strategy 1: Simple error list
function formatErrorsSimple(error: z.ZodError): string {
  return error.issues
    .map((i) => `- Field "${i.path.join('.')}": ${i.message}`)
    .join('\n');
}

// Strategy 2: Error with expected vs received
function formatErrorsDetailed(error: z.ZodError): string {
  return error.issues
    .map((i) => {
      let msg = `- Field "${i.path.join('.')}": ${i.message}`;
      // Only some issue types (e.g. invalid_type) carry both fields, so guard for both
      if ('expected' in i && 'received' in i) {
        msg += ` (expected: ${i.expected}, got: ${i.received})`;
      }
      return msg;
    })
    .join('\n');
}

// Strategy 3: Error with the schema reminder
function formatErrorsWithSchema(error: z.ZodError, schemaDescription: string): string {
  const errors = error.issues
    .map((i) => `- "${i.path.join('.')}": ${i.message}`)
    .join('\n');

  return `Your response had validation errors:
${errors}

Expected schema:
${schemaDescription}

Please respond with corrected JSON only.`;
}

Example: what the model sees on retry

// First attempt from model:
{
  "sentiment": "somewhat positive",
  "confidence": 95,
  "reasoning": "Good review"
}

// Error feedback sent to model:
Your response had validation errors:
- "sentiment": Invalid enum value. Expected 'positive' | 'negative' | 'neutral', received 'somewhat positive'
- "confidence": Number must be less than or equal to 1 (expected: <=1, got: 95)
- "reasoning": String must contain at least 10 character(s)

Please fix these issues and respond with valid JSON.

// Second attempt from model (usually correct):
{
  "sentiment": "positive",
  "confidence": 0.95,
  "reasoning": "The review expresses strong satisfaction with the product quality and service"
}

How effective is error feedback?

In practice, when you feed Zod validation errors back to GPT-4o or Claude:

Attempt 1 success rate:  ~85-95% (with good prompts and response_format: json_object)
Attempt 2 success rate:  ~95-99% (with error feedback)
Attempt 3 success rate:  ~99%+   (nearly always succeeds)

Without error feedback (blind retry):
Attempt 1 success rate:  ~85-95%
Attempt 2 success rate:  ~85-95% (same odds — model makes the same mistake)
Attempt 3 success rate:  ~85-95% (still the same odds)

The difference is massive. Error feedback turns a probabilistic coin flip into a self-correcting loop.
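The "same odds" point is the crux: blind retries only compound if failures are independent, and at temperature 0 the same prompt tends to reproduce the same mistake. A small probability sketch of the three curves (all rates illustrative):

```typescript
const pFirst = 0.9; // illustrative first-attempt success rate

// Hypothetical best case for blind retries: independent coin flips.
// Deterministic sampling (temperature 0) does NOT deliver this curve.
function blindIndependent(n: number): number {
  return 1 - Math.pow(1 - pFirst, n);
}

// The common case at temperature 0: the model repeats its mistake,
// so extra blind attempts add nothing.
function blindCorrelated(_n: number): number {
  return pFirst;
}

// With error feedback, each retry has its own conditional success rate,
// because the model is told exactly what to fix.
const conditional = [0.9, 0.8, 0.7]; // attempt 1, retry 1, retry 2
function withFeedback(n: number): number {
  let pAllFail = 1;
  for (let i = 0; i < n; i++) pAllFail *= 1 - conditional[i];
  return 1 - pAllFail;
}
```

Under these numbers, feedback reaches 0.98 cumulative success after two attempts and 0.994 after three, while correlated blind retries stay flat at 0.9 no matter how many times you call.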


4. Maximum Retry Count and Backoff

Why limit retries

Each retry costs:
  - API tokens (input: all previous messages + error feedback)
  - Latency (typically 500ms-3s per call)
  - Money (accumulates quickly at scale)

Token cost per retry grows because you include the conversation history:
  Attempt 1: system prompt + user message                    ≈ 500 tokens
  Attempt 2: above + assistant response + error feedback     ≈ 1,200 tokens
  Attempt 3: above + assistant response + error feedback     ≈ 2,000 tokens

Total for 3 attempts: ~3,700 input tokens (vs 500 for a single call)

Exponential backoff

function calculateBackoff(
  attempt: number,
  baseDelay: number = 1000,
  maxDelay: number = 10000,
): number {
  const delay = Math.min(
    baseDelay * Math.pow(2, attempt - 1),
    maxDelay,
  );
  // Add jitter to prevent thundering herd
  return delay + Math.random() * delay * 0.1;
}

// attempt 1: 1000ms + jitter
// attempt 2: 2000ms + jitter
// attempt 3: 4000ms + jitter (capped at 10000ms)

Retry with backoff implementation

import { z } from 'zod';
import OpenAI from 'openai';

async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function callWithRetryAndBackoff<T>(
  schema: z.ZodSchema<T>,
  apiCall: (messages: OpenAI.Chat.ChatCompletionMessageParam[]) => Promise<string>,
  systemPrompt: string,
  userMessage: string,
  options: {
    maxRetries?: number;
    baseDelay?: number;
    maxDelay?: number;
  } = {},
): Promise<T> {
  const { maxRetries = 3, baseDelay = 1000, maxDelay = 10000 } = options;

  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userMessage },
  ];

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    // Backoff on retries (not on first attempt)
    if (attempt > 1) {
      const delay = calculateBackoff(attempt - 1, baseDelay, maxDelay);
      await sleep(delay);
    }

    const rawContent = await apiCall(messages);

    let parsed: unknown;
    try {
      parsed = JSON.parse(rawContent);
    } catch {
      if (attempt < maxRetries) {
        messages.push(
          { role: 'assistant', content: rawContent },
          { role: 'user', content: 'Invalid JSON. Respond with JSON only.' },
        );
      }
      continue;
    }

    const result = schema.safeParse(parsed);
    if (result.success) return result.data;

    if (attempt < maxRetries) {
      const errors = result.error.issues
        .map((i) => `- "${i.path.join('.')}": ${i.message}`)
        .join('\n');

      messages.push(
        { role: 'assistant', content: rawContent },
        { role: 'user', content: `Validation errors:\n${errors}\nFix and respond with JSON.` },
      );
    }
  }

  throw new Error(`Validation failed after ${maxRetries} attempts`);
}

5. Cost Implications of Retries

Calculating retry costs

interface RetryCostEstimate {
  attempt: number;
  input_tokens: number;
  output_tokens: number;
  cumulative_input_tokens: number;
  cumulative_output_tokens: number;
  cumulative_cost_usd: number;
}

function estimateRetryCosts(
  systemPromptTokens: number,
  userMessageTokens: number,
  avgResponseTokens: number,
  errorFeedbackTokens: number,
  maxRetries: number,
  inputCostPer1M: number,  // e.g., 2.50 for GPT-4o
  outputCostPer1M: number, // e.g., 10.00 for GPT-4o
): RetryCostEstimate[] {
  const estimates: RetryCostEstimate[] = [];
  let cumulativeInput = 0;
  let cumulativeOutput = 0;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    // Input tokens grow with each retry (includes conversation history)
    const inputTokens =
      systemPromptTokens +
      userMessageTokens +
      (attempt - 1) * (avgResponseTokens + errorFeedbackTokens);

    const outputTokens = avgResponseTokens;

    cumulativeInput += inputTokens;
    cumulativeOutput += outputTokens;

    const cost =
      (cumulativeInput / 1_000_000) * inputCostPer1M +
      (cumulativeOutput / 1_000_000) * outputCostPer1M;

    estimates.push({
      attempt,
      input_tokens: inputTokens,
      output_tokens: outputTokens,
      cumulative_input_tokens: cumulativeInput,
      cumulative_output_tokens: cumulativeOutput,
      cumulative_cost_usd: cost,
    });
  }

  return estimates;
}

// Example: GPT-4o pricing
const costs = estimateRetryCosts(
  500,    // system prompt
  200,    // user message
  300,    // average response
  150,    // error feedback
  3,      // max retries
  2.50,   // input cost per 1M tokens
  10.00,  // output cost per 1M tokens
);

console.table(costs);
// ┌─────────┬──────────────┬───────────────┬───────────────────────┐
// │ attempt │ input_tokens │ output_tokens │ cumulative_cost_usd   │
// ├─────────┼──────────────┼───────────────┼───────────────────────┤
// │    1    │     700      │     300       │      $0.0048          │
// │    2    │    1150      │     300       │      $0.0106          │
// │    3    │    1600      │     300       │      $0.0176          │
// └─────────┴──────────────┴───────────────┴───────────────────────┘
// 3 attempts cost ~3.7x a single call (not 3x, because of growing context)

Cost control strategies

interface CostLimits {
  max_retries: number;
  max_total_tokens: number;
  max_cost_usd: number;
}

const DEFAULT_LIMITS: CostLimits = {
  max_retries: 3,
  max_total_tokens: 10_000,
  max_cost_usd: 0.05,
};

function shouldRetry(
  attempt: number,
  totalTokensUsed: number,
  totalCostUsd: number,
  limits: CostLimits = DEFAULT_LIMITS,
): { retry: boolean; reason: string } {
  if (attempt >= limits.max_retries) {
    return { retry: false, reason: `Max retries reached (${limits.max_retries})` };
  }
  if (totalTokensUsed >= limits.max_total_tokens) {
    return { retry: false, reason: `Token limit reached (${totalTokensUsed}/${limits.max_total_tokens})` };
  }
  if (totalCostUsd >= limits.max_cost_usd) {
    return { retry: false, reason: `Cost limit reached ($${totalCostUsd.toFixed(4)}/$${limits.max_cost_usd})` };
  }
  return { retry: true, reason: 'Within limits' };
}
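Another way to package these checks is a small stateful budget object that the retry loop updates after each attempt. A sketch of the same gates in closure form (names and limits are illustrative):

```typescript
// A tiny retry budget: the loop reports usage after each attempt and asks
// whether another attempt is allowed. Defaults mirror DEFAULT_LIMITS above.
function makeRetryBudget(maxRetries = 3, maxTokens = 10_000, maxCostUsd = 0.05) {
  let attempts = 0;
  let tokens = 0;
  let costUsd = 0;

  return {
    // Call once per completed attempt with that attempt's usage
    record(usedTokens: number, usedCostUsd: number): void {
      attempts++;
      tokens += usedTokens;
      costUsd += usedCostUsd;
    },
    // All three limits must still have headroom
    canRetry(): boolean {
      return attempts < maxRetries && tokens < maxTokens && costUsd < maxCostUsd;
    },
  };
}

// Example: two moderate attempts stay within budget...
const budget = makeRetryBudget();
budget.record(4000, 0.015);
budget.record(4500, 0.018);
console.log(budget.canRetry()); // true

// ...but a third attempt exhausts the attempt limit
budget.record(4000, 0.015);
console.log(budget.canRetry()); // false
```

The closure keeps the accounting next to the decision, which makes it harder to forget to update one of the three counters.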

6. Building a Robust callWithValidation() Utility

This is the production-ready utility that combines everything from sections 4.6.a through 4.6.e.

import { z, ZodError } from 'zod';
import OpenAI from 'openai';

// ─── Configuration ───────────────────────────────────────────────

interface CallWithValidationOptions {
  model?: string;
  temperature?: number;
  maxRetries?: number;
  maxTotalTokens?: number;
  baseDelayMs?: number;
  useJsonMode?: boolean;
  extractJsonFromText?: boolean;
  onRetry?: (attempt: number, error: string) => void;
  onSuccess?: (attempt: number, data: unknown) => void;
}

interface CallWithValidationResult<T> {
  success: boolean;
  data: T | null;
  attempts: number;
  totalTokens: { input: number; output: number };
  errors: string[];
  latencyMs: number;
}

// ─── JSON Extraction ─────────────────────────────────────────────

function extractJSON(text: string): unknown {
  try { return JSON.parse(text); } catch { /* continue */ }

  const fenceMatch = text.match(/```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/);
  if (fenceMatch) {
    try { return JSON.parse(fenceMatch[1].trim()); } catch { /* continue */ }
  }

  const jsonMatch = text.match(/(\{[\s\S]*\})/);
  if (jsonMatch) {
    try { return JSON.parse(jsonMatch[1]); } catch { /* continue */ }
  }

  throw new Error('No JSON found in response');
}

// ─── Error Formatting ────────────────────────────────────────────

function formatZodErrors(error: ZodError): string {
  return error.issues
    .map((i) => {
      const path = i.path.join('.') || 'root';
      let msg = `- "${path}": ${i.message}`;
      if ('expected' in i && 'received' in i) {
        msg += ` (expected ${i.expected}, got ${i.received})`;
      }
      return msg;
    })
    .join('\n');
}

// ─── Backoff ─────────────────────────────────────────────────────

function backoff(attempt: number, baseMs: number): Promise<void> {
  const delay = Math.min(baseMs * Math.pow(2, attempt - 1), 15000);
  const jitter = delay * 0.1 * Math.random();
  return new Promise((resolve) => setTimeout(resolve, delay + jitter));
}

// ─── Main Function ───────────────────────────────────────────────

async function callWithValidation<T>(
  schema: z.ZodSchema<T>,
  systemPrompt: string,
  userMessage: string,
  options: CallWithValidationOptions = {},
): Promise<CallWithValidationResult<T>> {
  const {
    model = 'gpt-4o',
    temperature = 0,
    maxRetries = 3,
    maxTotalTokens = 15_000,
    baseDelayMs = 1000,
    useJsonMode = true,
    extractJsonFromText = true,
    onRetry,
    onSuccess,
  } = options;

  const client = new OpenAI();
  const startTime = Date.now();
  const errors: string[] = [];
  let totalInputTokens = 0;
  let totalOutputTokens = 0;

  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userMessage },
  ];

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    // Backoff on retries
    if (attempt > 1) {
      await backoff(attempt - 1, baseDelayMs);
    }

    // Check token budget
    if (totalInputTokens + totalOutputTokens >= maxTotalTokens) {
      errors.push(`Token budget exhausted (${totalInputTokens + totalOutputTokens}/${maxTotalTokens})`);
      break;
    }

    // Call the API
    let response: OpenAI.Chat.Completions.ChatCompletion;
    try {
      response = await client.chat.completions.create({
        model,
        temperature,
        messages,
        ...(useJsonMode ? { response_format: { type: 'json_object' as const } } : {}),
      });
    } catch (err) {
      const errMsg = `API call failed: ${(err as Error).message}`;
      errors.push(errMsg);

      if (attempt < maxRetries) {
        onRetry?.(attempt, errMsg);
        continue;
      }
      break;
    }

    // Track token usage
    const usage = response.usage;
    if (usage) {
      totalInputTokens += usage.prompt_tokens;
      totalOutputTokens += usage.completion_tokens;
    }

    const rawContent = response.choices[0]?.message?.content || '';

    // Step 1: Parse JSON
    let parsed: unknown;
    try {
      parsed = extractJsonFromText ? extractJSON(rawContent) : JSON.parse(rawContent);
    } catch (err) {
      const errMsg = `JSON parse failed (attempt ${attempt}): ${(err as Error).message}`;
      errors.push(errMsg);

      if (attempt < maxRetries) {
        messages.push(
          { role: 'assistant', content: rawContent },
          {
            role: 'user',
            content: 'Your response was not valid JSON. Respond with a JSON object ONLY, no other text.',
          },
        );
        onRetry?.(attempt, errMsg);
      }
      continue;
    }

    // Step 2: Validate with Zod
    const result = schema.safeParse(parsed);

    if (result.success) {
      onSuccess?.(attempt, result.data);
      return {
        success: true,
        data: result.data,
        attempts: attempt,
        totalTokens: { input: totalInputTokens, output: totalOutputTokens },
        errors,
        latencyMs: Date.now() - startTime,
      };
    }

    // Validation failed — format errors for the model
    const errorFeedback = formatZodErrors(result.error);
    errors.push(`Validation failed (attempt ${attempt}):\n${errorFeedback}`);

    if (attempt < maxRetries) {
      messages.push(
        { role: 'assistant', content: rawContent },
        {
          role: 'user',
          content: `Your response had validation errors:\n${errorFeedback}\n\nPlease fix these specific issues and respond with corrected JSON only.`,
        },
      );
      onRetry?.(attempt, errorFeedback);
    }
  }

  // All retries exhausted
  return {
    success: false,
    data: null,
    attempts: maxRetries,
    totalTokens: { input: totalInputTokens, output: totalOutputTokens },
    errors,
    latencyMs: Date.now() - startTime,
  };
}

// ─── Export ──────────────────────────────────────────────────────

export { callWithValidation };
export type { CallWithValidationOptions, CallWithValidationResult };

7. Using the callWithValidation() Utility

Basic usage

const ReviewSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral', 'mixed']),
  score: z.number().min(0).max(10),
  highlights: z.array(z.string()).min(1),
  summary: z.string().min(20),
});

type Review = z.infer<typeof ReviewSchema>;

const result = await callWithValidation(
  ReviewSchema,
  `Analyze the product review. Return JSON:
{
  "sentiment": "positive"|"negative"|"neutral"|"mixed",
  "score": 0-10,
  "highlights": ["key point 1", "key point 2"],
  "summary": "20+ char summary"
}`,
  'This laptop is amazing! The battery lasts forever and the keyboard feels great. Only downside is the weight.',
);

if (result.success) {
  console.log('Analysis:', result.data);
  console.log(`Completed in ${result.attempts} attempt(s), ${result.latencyMs}ms`);
} else {
  console.error('All retries failed:', result.errors);
}

With callbacks for monitoring

const result = await callWithValidation(
  ReviewSchema,
  systemPrompt,
  userMessage,
  {
    maxRetries: 3,
    model: 'gpt-4o',
    onRetry: (attempt, error) => {
      console.warn(`[RETRY] Attempt ${attempt} failed: ${error}`);
      // Track in your metrics system
      metrics.increment('ai.validation.retry', { attempt: String(attempt) });
    },
    onSuccess: (attempt, data) => {
      console.info(`[SUCCESS] Validated on attempt ${attempt}`);
      metrics.histogram('ai.validation.attempts', attempt);
    },
  },
);

Batch processing with validation

async function processReviewBatch(reviews: string[]): Promise<{
  results: Review[];
  failures: Array<{ index: number; errors: string[] }>;
  stats: { total: number; success: number; retried: number; failed: number };
}> {
  const results: Review[] = [];
  const failures: Array<{ index: number; errors: string[] }> = [];
  let retried = 0;

  for (let i = 0; i < reviews.length; i++) {
    const result = await callWithValidation(
      ReviewSchema,
      systemPrompt,
      reviews[i],
      {
        maxRetries: 2,  // Fewer retries for batch to control cost
        onRetry: () => { retried++; },
      },
    );

    if (result.success && result.data) {
      results.push(result.data);
    } else {
      failures.push({ index: i, errors: result.errors });
    }
  }

  return {
    results,
    failures,
    stats: {
      total: reviews.length,
      success: results.length,
      retried,
      failed: failures.length,
    },
  };
}

8. The Principle: NEVER Trust AI Output Without Validation

This principle applies at every level of your system:

┌──────────────────────────────────────────────────────────────────┐
│                 TRUST NOTHING. VALIDATE EVERYTHING.              │
│                                                                  │
│  Level 1: Individual field validation                            │
│    → Zod schema validates every field type, range, format        │
│                                                                  │
│  Level 2: Structural validation                                  │
│    → Zod validates the overall shape (required fields, nesting)  │
│                                                                  │
│  Level 3: Business logic validation                              │
│    → Custom .refine() checks cross-field consistency             │
│                                                                  │
│  Level 4: System-level validation                                │
│    → Monitor aggregate failure rates                             │
│    → Alert when quality degrades                                 │
│    → A/B test prompt changes against validation metrics          │
│                                                                  │
│  This applies to:                                                │
│    ✗ Raw LLM completions                                         │
│    ✗ "Structured output" mode (can still have wrong values)      │
│    ✗ Function calling responses (schema isn't enforced by model) │
│    ✗ Fine-tuned models (hallucinate differently, but still do)   │
│    ✗ Even response_format: json_object (guarantees JSON, not     │
│      schema compliance)                                          │
│                                                                  │
│  The ONLY exception: native structured output with strict mode   │
│  (OpenAI's strict JSON schemas) — and even then, validate        │
│  value ranges and business logic.                                │
└──────────────────────────────────────────────────────────────────┘
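Level 3 deserves a concrete example. Using the ticket classifier fields from section 9, a cross-field business rule looks like this; in Zod it would live in a .refine() on the schema, but the check itself needs no library (the specific rules shown are illustrative):

```typescript
// Fields borrowed from the ticket classifier schema in section 9
interface Classification {
  priority: 'p0_critical' | 'p1_high' | 'p2_medium' | 'p3_low';
  requires_escalation: boolean;
  confidence: number;
}

// Level 3 checks: each value can be individually valid while the
// combination still violates a business invariant.
function checkBusinessRules(c: Classification): string[] {
  const errors: string[] = [];
  if (c.priority === 'p0_critical' && !c.requires_escalation) {
    errors.push('p0_critical tickets must set requires_escalation to true');
  }
  if (c.confidence < 0.5) {
    errors.push('low-confidence classifications should be routed to a human');
  }
  return errors;
}
```

These errors can be fed back to the model exactly like Zod issues, closing the same retry loop at the business-logic level.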

9. Complete Production-Ready Code Example

Here is a full, self-contained example of a customer ticket classifier with Zod validation, retry logic, error handling, and monitoring.

import { z, ZodError } from 'zod';
import OpenAI from 'openai';

// ─── 1. Define the Schema ────────────────────────────────────────

const TicketClassificationSchema = z.object({
  category: z.enum([
    'technical_issue',
    'billing',
    'account_access',
    'feature_request',
    'general_inquiry',
  ]),
  priority: z.enum(['p0_critical', 'p1_high', 'p2_medium', 'p3_low']),
  sentiment: z.enum(['frustrated', 'neutral', 'positive']),
  summary: z.string().min(20).max(200),
  suggested_response: z.string().min(50),
  requires_escalation: z.boolean(),
  confidence: z.number().min(0).max(1),
  tags: z.array(z.string()).min(1).max(10),
});

type TicketClassification = z.infer<typeof TicketClassificationSchema>;

// ─── 2. System Prompt (includes schema) ──────────────────────────

const SYSTEM_PROMPT = `You are a customer support ticket classifier.

Given a customer message, classify it and generate a suggested response.

RESPOND WITH JSON ONLY. No explanation, no markdown.

Required JSON schema:
{
  "category": "technical_issue" | "billing" | "account_access" | "feature_request" | "general_inquiry",
  "priority": "p0_critical" | "p1_high" | "p2_medium" | "p3_low",
  "sentiment": "frustrated" | "neutral" | "positive",
  "summary": "20-200 char summary of the issue",
  "suggested_response": "50+ char professional response to the customer",
  "requires_escalation": true/false,
  "confidence": 0.0 to 1.0,
  "tags": ["tag1", "tag2"] (1-10 tags)
}

Priority guidelines:
- p0_critical: Service is down, data loss, security breach
- p1_high: Major functionality broken, billing error
- p2_medium: Minor issue, question about features
- p3_low: General inquiry, feedback, feature request`;

// ─── 3. Metrics Tracker ──────────────────────────────────────────

class ClassifierMetrics {
  private calls = 0;
  private successes = 0;
  private retries = 0;
  private failures = 0;
  private totalLatencyMs = 0;
  private totalTokens = 0;

  recordSuccess(attempts: number, latencyMs: number, tokens: number): void {
    this.calls++;
    this.successes++;
    this.retries += attempts - 1;
    this.totalLatencyMs += latencyMs;
    this.totalTokens += tokens;
  }

  recordFailure(attempts: number, latencyMs: number, tokens: number): void {
    this.calls++;
    this.failures++;
    this.retries += attempts - 1;
    this.totalLatencyMs += latencyMs;
    this.totalTokens += tokens;
  }

  getReport(): string {
    const successRate = this.calls > 0 ? (this.successes / this.calls * 100).toFixed(1) : '0';
    const avgLatency = this.calls > 0 ? (this.totalLatencyMs / this.calls).toFixed(0) : '0';
    const avgTokens = this.calls > 0 ? (this.totalTokens / this.calls).toFixed(0) : '0';

    return [
      `=== Classifier Metrics ===`,
      `Total calls: ${this.calls}`,
      `Success rate: ${successRate}%`,
      `Total retries: ${this.retries}`,
      `Failures: ${this.failures}`,
      `Avg latency: ${avgLatency}ms`,
      `Avg tokens: ${avgTokens}`,
      `Total tokens: ${this.totalTokens}`,
    ].join('\n');
  }
}

// ─── 4. Main Classifier Function ─────────────────────────────────

const metrics = new ClassifierMetrics();

async function classifyTicket(
  ticketText: string,
): Promise<{
  success: boolean;
  classification: TicketClassification | null;
  attempts: number;
  error?: string;
}> {
  const client = new OpenAI();
  const maxRetries = 3;
  const startTime = Date.now();
  let totalTokens = 0;

  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: 'system', content: SYSTEM_PROMPT },
    { role: 'user', content: `Classify this customer ticket:\n\n${ticketText}` },
  ];

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    // Backoff
    if (attempt > 1) {
      await new Promise((r) => setTimeout(r, 1000 * Math.pow(2, attempt - 2)));
    }

    try {
      // Call API
      const response = await client.chat.completions.create({
        model: 'gpt-4o',
        temperature: 0,
        response_format: { type: 'json_object' },
        messages,
      });

      if (response.usage) {
        totalTokens += response.usage.prompt_tokens + response.usage.completion_tokens;
      }

      const rawContent = response.choices[0]?.message?.content || '';

      // Parse JSON
      let parsed: unknown;
      try {
        parsed = JSON.parse(rawContent);
      } catch {
        if (attempt < maxRetries) {
          messages.push(
            { role: 'assistant', content: rawContent },
            { role: 'user', content: 'Invalid JSON. Respond with JSON only.' },
          );
          continue;
        }
        const latency = Date.now() - startTime;
        metrics.recordFailure(attempt, latency, totalTokens);
        return { success: false, classification: null, attempts: attempt, error: 'JSON parse failed' };
      }

      // Validate
      const result = TicketClassificationSchema.safeParse(parsed);

      if (result.success) {
        const latency = Date.now() - startTime;
        metrics.recordSuccess(attempt, latency, totalTokens);
        return { success: true, classification: result.data, attempts: attempt };
      }

      // Feed errors back
      if (attempt < maxRetries) {
        const errors = result.error.issues
          .map((i) => `- "${i.path.join('.')}": ${i.message}`)
          .join('\n');

        messages.push(
          { role: 'assistant', content: rawContent },
          { role: 'user', content: `Validation errors:\n${errors}\nFix and respond with JSON.` },
        );
      }
    } catch (err) {
      if (attempt >= maxRetries) {
        const latency = Date.now() - startTime;
        metrics.recordFailure(attempt, latency, totalTokens);
        return {
          success: false,
          classification: null,
          attempts: attempt,
          error: `API error: ${(err as Error).message}`,
        };
      }
    }
  }

  const latency = Date.now() - startTime;
  metrics.recordFailure(maxRetries, latency, totalTokens);
  return {
    success: false,
    classification: null,
    attempts: maxRetries,
    error: 'Max retries exceeded',
  };
}

// ─── 5. Usage ────────────────────────────────────────────────────

async function main() {
  const tickets = [
    'My account has been locked for 3 days and I cannot access any of my data! This is unacceptable!',
    'Hey, I was wondering if you could add dark mode to the dashboard? Would be cool.',
    'I was charged twice for my subscription last month. Please refund one payment.',
  ];

  for (const ticket of tickets) {
    const result = await classifyTicket(ticket);

    if (result.success && result.classification) {
      const c = result.classification;
      console.log(`\n--- Ticket Classification ---`);
      console.log(`Category: ${c.category}`);
      console.log(`Priority: ${c.priority}`);
      console.log(`Sentiment: ${c.sentiment}`);
      console.log(`Summary: ${c.summary}`);
      console.log(`Escalate: ${c.requires_escalation}`);
      console.log(`Confidence: ${(c.confidence * 100).toFixed(0)}%`);
      console.log(`Tags: ${c.tags.join(', ')}`);
      console.log(`Attempts: ${result.attempts}`);
    } else {
      console.error(`\nClassification failed: ${result.error}`);
    }
  }

  // Print metrics
  console.log('\n' + metrics.getReport());
}

main().catch(console.error);

10. Key Takeaways

  1. Retry with error feedback is dramatically more effective than blind retry. Pass the exact Zod validation errors back to the model as a user message.
  2. Limit retries to 3 attempts maximum in most cases. Set token budgets and cost caps to prevent runaway costs.
  3. Exponential backoff prevents overwhelming the API during rate limits or outages. Add jitter to avoid thundering herd.
  4. Cost grows non-linearly with retries because each attempt includes the full conversation history. Monitor and budget for this.
  5. The callWithValidation() pattern is the production standard — it combines JSON extraction, Zod validation, error feedback, retry logic, backoff, cost tracking, and monitoring callbacks.
  6. NEVER trust AI output without validation — this applies to all models, all modes (including json_object mode), and all architectures. Zod is your last line of defense.
  7. Monitor retry rates in production. A spike in retries usually means a prompt change broke something, a model version updated, or input data changed character.

Explain-It Challenge

  1. A colleague suggests removing the retry loop because "GPT-4o almost always returns correct JSON." Using the math from section 5, calculate how many failures a system with 100,000 daily API calls would see at a 5% failure rate, and what the retry cost would be.
  2. Explain why blind retries (same prompt, no error feedback) are ineffective. What is the expected success probability after 3 blind retries if each attempt has an 85% chance of the same error?
  3. Design a callWithValidation wrapper for a multi-step AI pipeline where step 2 depends on step 1's output. How do you handle the case where step 1 succeeds but step 2 fails validation?

Navigation: ← 4.6.d Handling Invalid Responses · ← 4.6 Overview