Episode 4 — Generative AI Engineering / 4.18 — Building a Simple Multi Agent Workflow

4.18.d — Validation and Error Handling

In one sentence: A multi-agent pipeline is only as strong as its weakest link — this section covers validating output at every agent step, handling error propagation when one agent fails, implementing retry strategies, building fallback responses, logging the entire pipeline for debugging, and testing multi-agent systems in isolation and integration.

Navigation: ← 4.18.c — ImageKit Direction: SEO Pipeline · 4.18 Exercise Questions →


1. Why Every Agent Step Needs Validation

In a multi-agent pipeline, bad data compounds. If Agent 1 produces slightly wrong output and you pass it unchecked to Agent 2, Agent 2 builds on those wrong assumptions, and Agent 3 produces something completely useless. By the time you notice the problem, you cannot tell where it started.

┌─────────────────────────────────────────────────────────────────────────┐
│              WITHOUT VALIDATION AT EACH STEP                             │
│                                                                         │
│  Input ──► Agent 1 ──► Agent 2 ──► Agent 3 ──► Output (garbage)         │
│                │                                                        │
│                └── Produces slightly wrong data                         │
│                    (missing field, wrong type, hallucinated value)       │
│                         │                                               │
│                         └── Agent 2 works with bad input                │
│                             Produces worse data                         │
│                                  │                                      │
│                                  └── Agent 3 amplifies the error        │
│                                      Final output: COMPLETELY WRONG     │
│                                      Debugging: WHERE did it go wrong?  │
│                                      Answer: WHO KNOWS                  │
│                                                                         │
│              WITH VALIDATION AT EACH STEP                                │
│                                                                         │
│  Input ──► Agent 1 ──► [Zod] ──► Agent 2 ──► [Zod] ──► Agent 3 ──►    │
│                          │                      │                       │
│                     ✓ PASS or              ✓ PASS or                    │
│                     ✗ FAIL FAST            ✗ FAIL FAST                  │
│                       │                      │                          │
│                       └── Exact error:       └── Exact error:           │
│                          "Agent 1 field X       "Agent 2 field Y        │
│                           is missing"            wrong type"            │
└─────────────────────────────────────────────────────────────────────────┘

The Validation Contract

Every agent in the pipeline has:

  1. An input expectation — what shape of data it requires
  2. An output schema (Zod) — what shape of data it must produce
  3. A validation step — immediately after the agent runs, before passing data downstream

import { z } from "zod";

// Agent 1's contract
const Agent1OutputSchema = z.object({
  analysis: z.string().min(10),
  scores: z.array(z.number().min(0).max(10)).min(1),
  category: z.enum(["A", "B", "C"]),
});

// Running Agent 1 with validation
// (callLLM and agent1Prompt stand in for your LLM client wrapper and
// Agent 1's system prompt — whatever client code you already have)
async function runAgent1(input) {
  const rawOutput = await callLLM(agent1Prompt, input);
  const parsed = JSON.parse(rawOutput);

  // This line is the CRITICAL validation step
  // If it fails, the pipeline stops HERE — not three agents later
  const validated = Agent1OutputSchema.parse(parsed);

  return validated;
}
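Schema validation catches structural problems, but a structurally valid output can still be nonsense (the "semantic error" case in section 3.1). A hedged sketch of a post-validation sanity layer for Agent 1's output — the field names match `Agent1OutputSchema` above, but the heuristics and thresholds are purely illustrative:

```javascript
// Semantic sanity checks run AFTER schema validation. Zod guarantees the
// shape; these heuristics flag content that is structurally valid but
// suspicious. The specific checks here are examples, not a standard.
function semanticCheck(output) {
  const warnings = [];

  // An "analysis" that merely echoes the category is suspicious.
  if (output.analysis.trim().toLowerCase() === output.category.toLowerCase()) {
    warnings.push("analysis merely repeats the category");
  }

  // All-identical scores often mean the model ignored the rubric.
  const unique = new Set(output.scores);
  if (output.scores.length >= 3 && unique.size === 1) {
    warnings.push("all scores identical — model may not have differentiated");
  }

  return { ok: warnings.length === 0, warnings };
}
```

Run it right after `Agent1OutputSchema.parse(...)`, and treat warnings as a signal to retry or flag for human review rather than as a hard failure.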

2. What Happens When One Agent Fails

When an agent fails (invalid JSON, schema validation error, LLM timeout, empty response), the pipeline needs a strategy. There are three main approaches:

2.1 Fail Fast (Recommended Default)

Stop the entire pipeline immediately and report which agent failed and why.

async function runPipelineFailFast(input) {
  try {
    const step1 = await runAgent1(input);
    const step2 = await runAgent2(step1);
    const step3 = await runAgent3(step2);
    return { success: true, result: step3 };
  } catch (error) {
    return {
      success: false,
      failedAt: error.agentName || "unknown",
      error: error.message,
      // Include partial results for debugging
      partialResults: error.partialResults || null,
    };
  }
}

When to use: Most cases. Better to return nothing than to return bad data.

2.2 Fail with Partial Results

Stop the pipeline but return whatever was successfully completed.

async function runPipelinePartialResults(input) {
  const results = {};

  try {
    results.step1 = await runAgent1(input);
  } catch (error) {
    return {
      success: false,
      failedAt: "Agent 1",
      error: error.message,
      completedSteps: {},
    };
  }

  try {
    results.step2 = await runAgent2(results.step1);
  } catch (error) {
    return {
      success: false,
      failedAt: "Agent 2",
      error: error.message,
      completedSteps: { step1: results.step1 },
    };
  }

  try {
    results.step3 = await runAgent3(results.step2);
  } catch (error) {
    return {
      success: false,
      failedAt: "Agent 3",
      error: error.message,
      completedSteps: { step1: results.step1, step2: results.step2 },
    };
  }

  return { success: true, result: results };
}

When to use: When partial results are useful (e.g., analysis is valuable even without the final generation step).

2.3 Fail with Fallback

When an agent fails, substitute a default/fallback response and continue.

async function runPipelineWithFallbacks(input) {
  let step1;
  try {
    step1 = await runAgent1(input);
  } catch (error) {
    console.warn(`Agent 1 failed: ${error.message}. Using fallback.`);
    step1 = getAgent1Fallback(input);
  }

  let step2;
  try {
    step2 = await runAgent2(step1);
  } catch (error) {
    console.warn(`Agent 2 failed: ${error.message}. Using fallback.`);
    step2 = getAgent2Fallback(step1);
  }

  let step3;
  try {
    step3 = await runAgent3(step2);
  } catch (error) {
    console.warn(`Agent 3 failed: ${error.message}. Using fallback.`);
    step3 = getAgent3Fallback(step2);
  }

  return { result: step3, usedFallbacks: true };
}

When to use: When you MUST return something (e.g., user-facing product where no response is worse than a mediocre response).

Comparison

Strategy         | Reliability                   | User Experience      | Data Quality                        | Debugging
-----------------|-------------------------------|----------------------|-------------------------------------|-------------------------------------------
Fail Fast        | Pipeline stops on first error | No result            | High (never returns bad data)       | Easy (exact failure point)
Partial Results  | Returns what completed        | Some useful data     | High for completed steps            | Easy
Fallback         | Always returns something      | Always gets a result | Variable (fallbacks may be generic) | Harder (must check if fallbacks were used)
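The choice between the three runners can also be made per request rather than hardcoded. A minimal sketch of a dispatcher; the strategy map would hold the `runPipelineFailFast`, `runPipelinePartialResults`, and `runPipelineWithFallbacks` functions from above, and the key names are illustrative:

```javascript
// Dispatch to one of the pipeline strategies by name, so e.g. internal
// batch jobs can fail fast while user-facing requests use fallbacks.
// The strategies map is injected by the caller.
function makePipelineRunner(strategies, defaultStrategy = "failFast") {
  return async function run(input, strategy = defaultStrategy) {
    const impl = strategies[strategy];
    if (!impl) throw new Error(`Unknown pipeline strategy: ${strategy}`);
    return impl(input);
  };
}
```

Usage: `const run = makePipelineRunner({ failFast: runPipelineFailFast, partial: runPipelinePartialResults, fallback: runPipelineWithFallbacks });` then `await run(input, "fallback")` for user-facing traffic.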

3. Error Propagation Patterns

3.1 The Error Types You'll Encounter

┌──────────────────────────────────────────────────────────────────────┐
│  ERROR TYPES IN MULTI-AGENT PIPELINES                                 │
│                                                                       │
│  1. LLM API Error (network, rate limit, timeout)                      │
│     ├── HTTP 429: Rate limited                                        │
│     ├── HTTP 500: Server error                                        │
│     ├── HTTP 503: Service unavailable                                 │
│     └── Timeout: LLM took too long                                    │
│                                                                       │
│  2. Empty Response                                                    │
│     └── LLM returned null/undefined/empty string                      │
│                                                                       │
│  3. JSON Parse Error                                                  │
│     ├── LLM returned explanation text instead of JSON                 │
│     ├── LLM returned truncated JSON                                   │
│     └── LLM wrapped JSON in markdown code blocks                      │
│                                                                       │
│  4. Zod Validation Error                                              │
│     ├── Missing required fields                                       │
│     ├── Wrong field types (string instead of number)                  │
│     ├── Out-of-range values                                           │
│     ├── Invalid enum values                                           │
│     └── Array too short/long                                          │
│                                                                       │
│  5. Semantic Error (hardest to catch)                                 │
│     └── Valid JSON, correct types, but nonsensical content            │
│         e.g., sentiment "positive" for clearly negative text          │
└──────────────────────────────────────────────────────────────────────┘
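These categories can be detected mechanically. A hedged sketch of a classifier — the status codes and the `ZodError` name line up with the checks in `PipelineError.isRetryable` below, while the message patterns are assumptions about how your own code formats errors:

```javascript
// Map a raw error to one of the five categories above. Classification
// drives the reaction: API errors back off, parse/validation errors
// re-prompt, semantic errors need separate explicit sanity checks.
function classifyError(error) {
  if ([429, 500, 503].includes(error.status) || /timeout/i.test(error.message)) {
    return "llm_api_error";
  }
  if (/empty response/i.test(error.message)) return "empty_response";
  if (error instanceof SyntaxError || /invalid json/i.test(error.message)) {
    return "json_parse_error";
  }
  if (error.name === "ZodError") return "validation_error";
  return "unknown"; // semantic errors never throw — they must be checked explicitly
}
```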

3.2 Wrapping Errors with Context

Always wrap errors with the agent name and step number so you know exactly where things went wrong:

class PipelineError extends Error {
  constructor(agentName, stepNumber, originalError, partialResults = null) {
    super(`Pipeline failed at step ${stepNumber} (${agentName}): ${originalError.message}`);
    this.agentName = agentName;
    this.stepNumber = stepNumber;
    this.originalError = originalError;
    this.partialResults = partialResults;
    this.isRetryable = PipelineError.isRetryable(originalError);
  }

  static isRetryable(error) {
    // Rate limits and server errors are retryable
    if (error.status === 429 || error.status === 500 || error.status === 503) {
      return true;
    }
    // Zod validation errors are retryable (LLM might produce valid output next time)
    if (error.name === "ZodError") {
      return true;
    }
    // JSON parse errors are retryable (match case-insensitively —
    // different code paths capitalize "Invalid JSON" differently)
    if (/invalid json/i.test(error.message)) {
      return true;
    }
    return false;
  }
}

// Usage in pipeline
async function runAgentSafe(agent, input, stepNumber) {
  try {
    return await runAgent(agent, input);
  } catch (error) {
    throw new PipelineError(agent.name, stepNumber, error);
  }
}

4. Retry Strategies for Individual Agents

When an agent fails, you often want to retry it rather than failing the entire pipeline. The key is to retry intelligently.

4.1 Simple Retry with Limit

async function runAgentWithRetry(agent, input, maxRetries = 3) {
  let lastError;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      console.log(`${agent.name}: Attempt ${attempt}/${maxRetries}`);
      const result = await runAgent(agent, input);
      return result;
    } catch (error) {
      lastError = error;
      console.warn(`${agent.name}: Attempt ${attempt} failed — ${error.message}`);

      if (attempt < maxRetries) {
        // Wait before retrying (simple linear backoff)
        const waitMs = attempt * 1000;
        console.log(`Waiting ${waitMs}ms before retry...`);
        await new Promise(resolve => setTimeout(resolve, waitMs));
      }
    }
  }

  throw new Error(
    `${agent.name} failed after ${maxRetries} attempts. Last error: ${lastError.message}`
  );
}

4.2 Exponential Backoff

async function runAgentWithExponentialBackoff(agent, input, maxRetries = 3) {
  let lastError;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await runAgent(agent, input);
    } catch (error) {
      lastError = error;

      if (attempt < maxRetries) {
        // Exponential backoff: 1s, 2s, 4s, 8s, ...
        const baseWait = Math.pow(2, attempt - 1) * 1000;
        // Add jitter (random 0-500ms) to prevent thundering herd
        const jitter = Math.random() * 500;
        const waitMs = baseWait + jitter;
        console.log(`${agent.name}: Retry in ${Math.round(waitMs)}ms (attempt ${attempt}/${maxRetries})`);
        await new Promise(resolve => setTimeout(resolve, waitMs));
      }
    }
  }

  throw new Error(
    `${agent.name} failed after ${maxRetries} retries. Last error: ${lastError.message}`
  );
}

4.3 Smart Retry with Validation Feedback

When the failure is a Zod validation error, you can feed the error back to the LLM so it can correct its output:

async function runAgentWithValidationFeedback(agent, input, maxRetries = 3) {
  let lastError;
  let messages = [
    { role: "system", content: agent.systemPrompt },
    { role: "user", content: JSON.stringify(input) },
  ];

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await client.chat.completions.create({
        model: agent.model,
        temperature: agent.temperature ?? 0.7, // ?? so an explicit 0 is kept
        messages,
      });

      const raw = response.choices[0].message.content;
      if (!raw) throw new Error("Empty response");

      let parsed;
      try {
        parsed = JSON.parse(raw);
      } catch {
        const match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
        if (match) parsed = JSON.parse(match[1].trim());
        else throw new Error(`Invalid JSON: ${raw.substring(0, 200)}`);
      }

      const validated = agent.outputSchema.parse(parsed);
      return validated;
    } catch (error) {
      lastError = error;

      if (attempt < maxRetries && error.name === "ZodError") {
        // Feed validation error back to the LLM
        const errorFeedback = error.issues.map(issue =>
          `Field "${issue.path.join(".")}": ${issue.message}`
        ).join("\n");

        messages.push(
          { role: "assistant", content: "(previous invalid response)" },
          {
            role: "user",
            content: `Your previous response had validation errors:\n${errorFeedback}\n\nPlease fix these issues and respond again with valid JSON.`,
          }
        );

        console.log(`${agent.name}: Feeding validation errors back (attempt ${attempt})`);
      } else if (attempt < maxRetries) {
        // For non-validation errors, just retry fresh
        const waitMs = Math.pow(2, attempt - 1) * 1000;
        await new Promise(resolve => setTimeout(resolve, waitMs));
      }
    }
  }

  throw new Error(
    `${agent.name} failed after ${maxRetries} retries. Last error: ${lastError.message}`
  );
}

Retry Strategy Comparison

Strategy             | When to Use                    | Pros                 | Cons
---------------------|--------------------------------|----------------------|--------------------------------
Simple retry         | Quick failures (network blips) | Simple, fast         | Doesn't learn from errors
Exponential backoff  | Rate limits, server overload   | Respects API limits  | Slower recovery
Validation feedback  | Zod errors (wrong format)      | LLM can self-correct | Uses more tokens, adds messages

5. Fallback Responses

When retries are exhausted and the pipeline cannot continue, you need a fallback — a safe, pre-defined response that the rest of the system can work with.

5.1 Static Fallbacks

function getProfileAnalysisFallback(profile) {
  return {
    strengths: [
      {
        category: "overall",
        description: "Profile contains basic information needed for matching",
        impactScore: 5,
      },
    ],
    weaknesses: [
      {
        category: "overall",
        description: "Profile could not be fully analyzed — please review manually",
        severity: "medium",
        suggestion: "Try updating your profile with more specific details",
      },
    ],
    overallScore: 5,
    profilePersonality: "Unable to determine personality — analysis service temporarily unavailable",
    improvementTips: ["Please try again later for a detailed analysis"],
    toneAnalysis: {
      currentTone: "unknown",
      suggestedTone: "authentic",
      reasoning: "Analysis service unavailable",
    },
  };
}

5.2 Graceful Degradation

Instead of a static fallback, degrade gracefully by doing simpler processing:

async function getProfileAnalysisWithDegradation(profile, maxRetries = 3) {
  // Try full analysis first
  try {
    return await runAgentWithRetry(profileAnalyzerAgent, profile, maxRetries);
  } catch (error) {
    console.warn(`Full analysis failed: ${error.message}`);
  }

  // Fallback: try a simpler model (cheaper, faster, but less capable)
  try {
    const simpleAgent = { ...profileAnalyzerAgent, model: "gpt-4o-mini" };
    return await runAgentWithRetry(simpleAgent, profile, 2);
  } catch (error) {
    console.warn(`Simple analysis failed: ${error.message}`);
  }

  // Last resort: rule-based fallback (no LLM at all)
  return ruleBasedAnalysis(profile);
}

function ruleBasedAnalysis(profile) {
  const strengths = [];
  const weaknesses = [];

  // Simple heuristics — guard against missing fields, since this
  // rule-based path is the last resort and must never throw
  if ((profile.bio || "").length > 100) {
    strengths.push({
      category: "bio",
      description: "Bio has good length with plenty of detail",
      impactScore: 6,
    });
  } else {
    weaknesses.push({
      category: "bio",
      description: "Bio is very short — consider adding more detail",
      severity: "high",
      suggestion: "Aim for at least 100 characters",
    });
  }

  if ((profile.interests || []).length >= 4) {
    strengths.push({
      category: "interests",
      description: "Good variety of interests listed",
      impactScore: 5,
    });
  }

  return {
    strengths: strengths.length > 0 ? strengths : [{
      category: "overall",
      description: "Profile exists and has basic information",
      impactScore: 3,
    }],
    weaknesses: weaknesses.length > 0 ? weaknesses : [{
      category: "overall",
      description: "Could not identify specific weaknesses without AI analysis",
      severity: "low",
      suggestion: "Try again later for detailed feedback",
    }],
    overallScore: 5,
    profilePersonality: "Analysis performed with basic heuristics — limited insight available",
    improvementTips: ["Full AI analysis unavailable — basic review completed"],
    toneAnalysis: {
      currentTone: "unknown",
      suggestedTone: "authentic",
      reasoning: "Rule-based fallback — no tone analysis available",
    },
  };
}

Degradation Ladder

┌───────────────────────────────────────────────────────────────┐
│  GRACEFUL DEGRADATION LADDER                                   │
│                                                                │
│  Level 1: Full pipeline with GPT-4o (best quality)             │
│     │                                                          │
│     └── Failed? Try Level 2                                    │
│                                                                │
│  Level 2: Full pipeline with GPT-4o-mini (faster, cheaper)     │
│     │                                                          │
│     └── Failed? Try Level 3                                    │
│                                                                │
│  Level 3: Simplified pipeline (fewer agents, simpler prompts)  │
│     │                                                          │
│     └── Failed? Try Level 4                                    │
│                                                                │
│  Level 4: Rule-based heuristics (no LLM, just code logic)     │
│     │                                                          │
│     └── Failed? Return static fallback                         │
│                                                                │
│  Level 5: Static fallback response (hardcoded safe default)    │
└───────────────────────────────────────────────────────────────┘
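The ladder above can be driven by one generic runner instead of a chain of nested try/catch blocks. A hedged sketch — the level names and functions are placeholders for the real pipeline variants (full pipeline, mini-model pipeline, `ruleBasedAnalysis`, static fallback):

```javascript
// Walk the degradation ladder: try each level in order, return the first
// success along with a record of what failed. Levels are [name, asyncFn]
// pairs; the last level (static fallback) should never throw.
async function runWithDegradation(levels, input) {
  const attempts = [];
  for (const [name, fn] of levels) {
    try {
      const result = await fn(input);
      return { result, level: name, attempts };
    } catch (error) {
      attempts.push({ level: name, error: error.message });
    }
  }
  throw new Error(`All ${levels.length} degradation levels failed`);
}
```

Returning which level succeeded (plus the failed attempts) matters downstream: callers can show a "limited analysis" notice when anything below Level 1 produced the result.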

6. Logging the Entire Pipeline

Good logging is essential for debugging multi-agent pipelines. You need to know exactly what happened at every step.

6.1 Pipeline Logger

class PipelineLogger {
  constructor(pipelineName) {
    this.pipelineName = pipelineName;
    this.startTime = Date.now();
    this.steps = [];
    this.metadata = {};
  }

  logStep(agentName, { status, duration, input, output, error, retryCount = 0 }) {
    const step = {
      agent: agentName,
      status,            // "success" | "failed" | "fallback"
      duration,          // milliseconds
      retryCount,
      timestamp: new Date().toISOString(),
      inputSummary: this.summarize(input),
      outputSummary: status === "success" ? this.summarize(output) : null,
      error: error ? {
        message: error.message,
        type: error.name,
        // For Zod errors, include the specific issues
        issues: error.issues || null,
      } : null,
    };

    this.steps.push(step);
    this.printStep(step);
    return step;
  }

  summarize(data) {
    if (!data) return null;
    const json = JSON.stringify(data);
    if (json.length <= 200) return json;
    return json.substring(0, 200) + "... (truncated)";
  }

  printStep(step) {
    const icon = step.status === "success" ? "[OK]" : step.status === "fallback" ? "[FB]" : "[!!]";
    console.log(
      `${icon} ${step.agent} | ${step.status} | ${step.duration}ms` +
      (step.retryCount > 0 ? ` | retries: ${step.retryCount}` : "") +
      (step.error ? ` | error: ${step.error.message}` : "")
    );
  }

  getReport() {
    const totalDuration = Date.now() - this.startTime;
    return {
      pipeline: this.pipelineName,
      totalDuration,
      stepsCompleted: this.steps.filter(s => s.status === "success").length,
      stepsFailed: this.steps.filter(s => s.status === "failed").length,
      stepsFallback: this.steps.filter(s => s.status === "fallback").length,
      totalRetries: this.steps.reduce((sum, s) => sum + s.retryCount, 0),
      steps: this.steps,
      // A failure recovered by a later fallback still counts as completed;
      // only an unrecovered final failure marks the whole run as failed
      overallStatus:
        this.steps.length > 0 && this.steps.at(-1).status !== "failed"
          ? "completed"
          : "failed",
    };
  }

  printReport() {
    const report = this.getReport();
    console.log("\n╔══════════════════════════════════════════╗");
    console.log(`║  PIPELINE REPORT: ${report.pipeline}`);
    console.log("╠══════════════════════════════════════════╣");
    console.log(`║  Status:     ${report.overallStatus}`);
    console.log(`║  Duration:   ${report.totalDuration}ms`);
    console.log(`║  Steps OK:   ${report.stepsCompleted}`);
    console.log(`║  Steps Fail: ${report.stepsFailed}`);
    console.log(`║  Fallbacks:  ${report.stepsFallback}`);
    console.log(`║  Retries:    ${report.totalRetries}`);
    console.log("╚══════════════════════════════════════════╝");
    return report;
  }
}

6.2 Using the Logger in a Pipeline

async function runLoggedPipeline(input) {
  const logger = new PipelineLogger("Hinge Profile Pipeline");

  let step1Result;
  const step1Start = Date.now();
  try {
    step1Result = await runAgentWithRetry(profileAnalyzerAgent, input, 3);
    logger.logStep("Profile Analyzer", {
      status: "success",
      duration: Date.now() - step1Start,
      input,
      output: step1Result,
    });
  } catch (error) {
    logger.logStep("Profile Analyzer", {
      status: "failed",
      duration: Date.now() - step1Start,
      input,
      error,
      retryCount: 3,
    });

    // Use fallback
    step1Result = getProfileAnalysisFallback(input);
    logger.logStep("Profile Analyzer (Fallback)", {
      status: "fallback",
      duration: 0,
      output: step1Result,
    });
  }

  // ... continue for Agent 2 and Agent 3 ...

  const report = logger.printReport();
  return { result: step1Result, report };
}

6.3 Example Log Output

--- Pipeline: Hinge Profile Pipeline ---
[OK] Profile Analyzer       | success  | 2341ms
[OK] Bio Improver           | success  | 1876ms | retries: 1
[!!] Conversation Generator | failed   | 3012ms | retries: 3 | error: Invalid enum value
[FB] Conversation Generator (Fallback) | fallback | 0ms

╔══════════════════════════════════════════╗
║  PIPELINE REPORT: Hinge Profile Pipeline
╠══════════════════════════════════════════╣
║  Status:     completed
║  Duration:   7229ms
║  Steps OK:   2
║  Steps Fail: 1
║  Fallbacks:  1
║  Retries:    4
╚══════════════════════════════════════════╝
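Console output disappears when the process exits; in production you also want each report persisted in machine-readable form. A minimal sketch that flattens `getReport()`'s output into one JSON line (JSONL) — the selected fields are illustrative, and the caller decides where the line goes (appended to a file, shipped to a log collector):

```javascript
// Flatten a PipelineLogger report into a single JSON line, so a file of
// runs can be grepped and aggregated later. Pure function: persistence
// (fs.appendFile, a log shipper, ...) is left to the caller.
function toLogLine(report) {
  return JSON.stringify({
    ts: new Date().toISOString(),
    pipeline: report.pipeline,
    status: report.overallStatus,
    durationMs: report.totalDuration,
    retries: report.totalRetries,
    steps: report.steps.map(s => ({ agent: s.agent, status: s.status, ms: s.duration })),
  });
}
```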

7. Testing Multi-Agent Pipelines

Testing multi-agent pipelines requires testing at three levels: unit (individual agents), integration (agent-to-agent handoffs), and end-to-end (full pipeline).

7.1 Unit Testing Individual Agents

Test each agent in isolation by mocking the LLM response:

// test/agent1.test.js
import { describe, it, expect } from "vitest";
import { ProfileAnalysisSchema } from "../schemas.js";

describe("Profile Analyzer Agent", () => {
  it("should produce valid output for a complete profile", async () => {
    // Mock LLM response
    const mockResponse = {
      strengths: [
        { category: "bio", description: "Bio has specific details about hobbies", impactScore: 7 },
      ],
      weaknesses: [
        { category: "bio", description: "Generic opener reduces engagement", severity: "high",
          suggestion: "Replace 'Hey' with something specific" },
      ],
      overallScore: 6,
      profilePersonality: "Adventurous person who enjoys outdoor activities and cooking",
      improvementTips: ["Add conversation hooks", "Be more specific"],
      toneAnalysis: {
        currentTone: "casual-generic",
        suggestedTone: "warm-specific",
        reasoning: "Generic tone does not stand out",
      },
    };

    // Validate against schema
    const result = ProfileAnalysisSchema.safeParse(mockResponse);
    expect(result.success).toBe(true);
  });

  it("should reject output with missing required fields", () => {
    const badResponse = {
      strengths: [{ category: "bio", description: "Good bio", impactScore: 7 }],
      // Missing: weaknesses, overallScore, etc.
    };

    const result = ProfileAnalysisSchema.safeParse(badResponse);
    expect(result.success).toBe(false);
    // Don't depend on issue ordering — just check that "weaknesses" is flagged
    expect(result.error.issues.some(i => i.path.includes("weaknesses"))).toBe(true);
  });

  it("should reject output with invalid enum values", () => {
    const badResponse = {
      strengths: [
        { category: "appearance", description: "Good looking photos", impactScore: 7 },
        // "appearance" is not a valid category
      ],
      weaknesses: [
        { category: "bio", description: "Too short bio text", severity: "critical",
          suggestion: "Make it longer" },
        // "critical" is not a valid severity
      ],
      overallScore: 6,
      profilePersonality: "A person who takes good photos and has style",
      improvementTips: ["Add more details"],
      toneAnalysis: { currentTone: "casual", suggestedTone: "warm", reasoning: "Better fit" },
    };

    const result = ProfileAnalysisSchema.safeParse(badResponse);
    expect(result.success).toBe(false);
  });

  it("should reject out-of-range scores", () => {
    const badResponse = {
      strengths: [{ category: "bio", description: "Good length and details", impactScore: 15 }],
      weaknesses: [{ category: "bio", description: "Could be more specific", severity: "low",
        suggestion: "Add specific details" }],
      overallScore: 12,  // max is 10
      profilePersonality: "An interesting person with varied hobbies",
      improvementTips: ["Keep doing what you are doing"],
      toneAnalysis: { currentTone: "good", suggestedTone: "great", reasoning: "Already good" },
    };

    const result = ProfileAnalysisSchema.safeParse(badResponse);
    expect(result.success).toBe(false);
  });
});

7.2 Integration Testing Agent Handoffs

Test that the output of Agent 1 can be correctly consumed by Agent 2:

describe("Agent 1 → Agent 2 Handoff", () => {
  it("Agent 1 output should be valid input for Agent 2", () => {
    // Simulate Agent 1's output
    const agent1Output = {
      strengths: [
        { category: "interests", description: "Diverse range of interests", impactScore: 7 },
      ],
      weaknesses: [
        { category: "bio", description: "Generic and forgettable opener", severity: "high",
          suggestion: "Start with something unique" },
      ],
      overallScore: 5,
      profilePersonality: "Adventurous tech professional who loves the outdoors",
      improvementTips: ["Replace generic opener", "Add humor"],
      toneAnalysis: {
        currentTone: "casual-generic",
        suggestedTone: "warm-authentic",
        reasoning: "Current tone blends in too much",
      },
    };

    // Validate it passes Agent 1's schema
    expect(ProfileAnalysisSchema.safeParse(agent1Output).success).toBe(true);

    // Build Agent 2's input from Agent 1's output
    const agent2Input = {
      originalBio: "Hey I'm Alex. I like hiking.",
      interests: ["Hiking", "Cooking"],
      name: "Alex",
      analysis: {
        weaknesses: agent1Output.weaknesses,
        improvementTips: agent1Output.improvementTips,
        toneAnalysis: agent1Output.toneAnalysis,
        profilePersonality: agent1Output.profilePersonality,
      },
    };

    // Verify Agent 2's input is well-formed
    expect(agent2Input.analysis.weaknesses.length).toBeGreaterThan(0);
    expect(agent2Input.analysis.improvementTips.length).toBeGreaterThan(0);
    expect(agent2Input.originalBio).toBeTruthy();
  });
});

7.3 End-to-End Testing with Snapshot Validation

Test the full pipeline using recorded responses:

describe("Full Pipeline E2E", () => {
  it("should produce valid final output", async () => {
    // Use a recorded/mocked pipeline run
    const result = await runPipelineWithMockedLLM(sampleProfile);

    // Validate the entire output
    const validation = HingePipelineOutputSchema.safeParse(result);
    expect(validation.success).toBe(true);

    // Check structural requirements
    expect(result.analysis.strengths.length).toBeGreaterThan(0);
    expect(result.analysis.weaknesses.length).toBeGreaterThan(0);
    expect(result.improvedBio.improvedBio.length).toBeGreaterThan(20);
    expect(result.conversationStarters.openers.length).toBeGreaterThanOrEqual(3);
    expect(result.pipelineMetadata.agentCount).toBe(3);
  });

  it("should handle pipeline failure gracefully", async () => {
    // Force Agent 2 to fail
    const result = await runPipelineWithFallbacks(sampleProfile);
    // Should still return a result (with fallback)
    expect(result).toBeTruthy();
  });
});
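`runPipelineWithMockedLLM` above is not shown; one hedged way to build it is to inject a fake LLM call that replays canned responses in order (this assumes the pipeline accepts the LLM function as a dependency rather than calling the client directly):

```javascript
// Build a fake callLLM that replays canned responses in order, so E2E
// tests exercise the real pipeline wiring — JSON parsing, Zod
// validation, handoffs — without hitting the API. The responses array
// holds one raw JSON string per expected agent call.
function makeMockLLM(responses) {
  let call = 0;
  return async function mockCallLLM(_prompt, _input) {
    if (call >= responses.length) {
      throw new Error(`Mock LLM exhausted after ${responses.length} calls`);
    }
    return responses[call++];
  };
}
```

Each test then constructs `makeMockLLM([agent1Json, agent2Json, agent3Json])` and passes it where the real `callLLM` would go; returning fewer responses than the pipeline needs is also a convenient way to force mid-pipeline failures.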

7.4 Testing Checklist

┌──────────────────────────────────────────────────────────────────────┐
│  TESTING CHECKLIST FOR MULTI-AGENT PIPELINES                         │
│                                                                       │
│  Unit Tests (per agent):                                              │
│  [ ] Valid output matches Zod schema                                  │
│  [ ] Missing required fields are rejected                             │
│  [ ] Invalid enum values are rejected                                 │
│  [ ] Out-of-range numbers are rejected                                │
│  [ ] Empty/null responses are handled                                 │
│  [ ] JSON wrapped in markdown is extracted                            │
│                                                                       │
│  Integration Tests (agent-to-agent):                                  │
│  [ ] Agent 1 output is valid Agent 2 input                            │
│  [ ] Agent 2 output is valid Agent 3 input                            │
│  [ ] Selective context correctly extracts needed fields               │
│                                                                       │
│  E2E Tests (full pipeline):                                           │
│  [ ] Happy path produces valid final output                           │
│  [ ] Pipeline fails gracefully when Agent 1 fails                     │
│  [ ] Pipeline fails gracefully when Agent 2 fails                     │
│  [ ] Pipeline fails gracefully when Agent 3 fails                     │
│  [ ] Fallback responses pass schema validation                        │
│  [ ] Pipeline metadata (duration, agent count) is correct             │
│                                                                       │
│  Performance Tests:                                                   │
│  [ ] Pipeline completes within acceptable time                        │
│  [ ] Retry logic respects max retry count                             │
│  [ ] Concurrent pipeline runs don't interfere                         │
└──────────────────────────────────────────────────────────────────────┘
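
To make the first unit-test items concrete, here is a minimal sketch of the checklist item "JSON wrapped in markdown is extracted," using plain assertions rather than a test framework. The `parseJSON` helper mirrors the runner's; the backtick fences in the test input are built at runtime (via `String.fromCharCode`) purely so the snippet stays safe to embed here.

```javascript
// Three backticks, built without typing them so the snippet embeds cleanly
const FENCE = String.fromCharCode(96).repeat(3);
const FENCE_RE = new RegExp(`${FENCE}(?:json)?\\s*([\\s\\S]*?)${FENCE}`);

// Mirrors the runner's parseJSON: try raw JSON first, then unwrap a fence
function parseJSON(raw, agentName) {
  if (!raw) throw new Error(`${agentName}: Empty response`);
  try {
    return JSON.parse(raw);
  } catch {
    const match = raw.match(FENCE_RE);
    if (match) return JSON.parse(match[1].trim());
    throw new Error(`${agentName}: Invalid JSON`);
  }
}

// Case 1: the model wrapped its JSON in a markdown fence — should still parse
const wrapped = `${FENCE}json\n{"score": 7}\n${FENCE}`;
console.log(parseJSON(wrapped, "Profile Analyzer").score); // 7

// Case 2: empty response — should throw, never return undefined
try {
  parseJSON("", "Profile Analyzer");
} catch (e) {
  console.log(e.message); // "Profile Analyzer: Empty response"
}
```

The same pattern extends to the other unit-test items: feed each malformed variant in, and assert either a clean parse or a thrown error — never a silent `undefined`.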

8. Production-Ready Pipeline Runner

Here is a pipeline runner that combines the error-handling, retry, fallback, and logging strategies from this section into one reusable, chainable class:

import { z } from "zod";
import OpenAI from "openai";

const client = new OpenAI();

// ═══════════════════════════════════════════════════════
// PRODUCTION PIPELINE RUNNER
// ═══════════════════════════════════════════════════════

class ProductionPipeline {
  constructor(name, options = {}) {
    this.name = name;
    this.agents = [];
    this.maxRetries = options.maxRetries ?? 3;
    this.enableFallbacks = options.enableFallbacks ?? false;
    this.logger = new PipelineLogger(name);
  }

  addAgent({ name, systemPrompt, outputSchema, model, temperature, fallback }) {
    this.agents.push({
      name,
      systemPrompt,
      outputSchema,
      model: model ?? "gpt-4o",
      temperature: temperature ?? 0.7, // ?? (not ||) so an explicit 0 survives
      fallback: fallback || null,
    });
    return this; // Allow chaining
  }

  async callLLM(agent, messages) {
    const response = await client.chat.completions.create({
      model: agent.model,
      temperature: agent.temperature,
      messages,
    });
    return response.choices[0].message.content;
  }

  parseJSON(raw, agentName) {
    if (!raw) throw new Error(`${agentName}: Empty response`);

    try {
      return JSON.parse(raw);
    } catch {
      const match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
      if (match) return JSON.parse(match[1].trim());
      throw new Error(`${agentName}: Invalid JSON`);
    }
  }

  async runSingleAgent(agent, input) {
    let lastError;
    let raw;
    const messages = [
      { role: "system", content: agent.systemPrompt },
      { role: "user", content: JSON.stringify(input) },
    ];

    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        raw = await this.callLLM(agent, messages);
        const parsed = this.parseJSON(raw, agent.name);
        return agent.outputSchema.parse(parsed);
      } catch (error) {
        lastError = error;

        if (attempt < this.maxRetries) {
          if (error.name === "ZodError") {
            // Show the model its own invalid output plus the exact
            // validation errors so it can self-correct on the next attempt
            const feedback = error.issues
              .map(i => `"${i.path.join(".")}": ${i.message}`)
              .join("\n");
            messages.push(
              { role: "assistant", content: raw ?? "(empty response)" },
              { role: "user", content: `Validation errors:\n${feedback}\nFix these issues and respond with valid JSON only.` }
            );
          } else {
            // Exponential backoff with jitter for transient errors
            // (rate limits, timeouts): ~1s, ~2s, ~4s plus up to 500ms
            const wait = Math.pow(2, attempt - 1) * 1000 + Math.random() * 500;
            await new Promise(r => setTimeout(r, wait));
          }
        }
      }
    }

    throw lastError;
  }

  async run(initialInput, contextBuilder) {
    let currentData = initialInput;
    const results = {};

    for (let i = 0; i < this.agents.length; i++) {
      const agent = this.agents[i];
      const stepStart = Date.now();

      // Build input — either direct pass-through or custom context
      const agentInput = contextBuilder
        ? contextBuilder(i, agent.name, currentData, results, initialInput)
        : currentData;

      try {
        const output = await this.runSingleAgent(agent, agentInput);
        const duration = Date.now() - stepStart;

        this.logger.logStep(agent.name, {
          status: "success", duration, input: agentInput, output,
        });

        results[agent.name] = output;
        currentData = output;
      } catch (error) {
        const duration = Date.now() - stepStart;

        this.logger.logStep(agent.name, {
          status: "failed", duration, input: agentInput, error,
          retryCount: this.maxRetries,
        });

        if (this.enableFallbacks && agent.fallback) {
          const fallbackResult = typeof agent.fallback === "function"
            ? agent.fallback(agentInput)
            : agent.fallback;

          this.logger.logStep(`${agent.name} (Fallback)`, {
            status: "fallback", duration: 0, output: fallbackResult,
          });

          results[agent.name] = fallbackResult;
          currentData = fallbackResult;
        } else {
          const report = this.logger.printReport();
          throw Object.assign(error, { pipelineReport: report });
        }
      }
    }

    const report = this.logger.printReport();
    return { results, finalOutput: currentData, report };
  }
}

// ═══════════════════════════════════════════════════════
// USAGE EXAMPLE
// ═══════════════════════════════════════════════════════

const pipeline = new ProductionPipeline("Hinge Profile Pipeline", {
  maxRetries: 3,
  enableFallbacks: true,
});

pipeline
  .addAgent({
    name: "Profile Analyzer",
    systemPrompt: PROFILE_ANALYZER_PROMPT,
    outputSchema: ProfileAnalysisSchema,
    temperature: 0.7,
    fallback: (input) => getProfileAnalysisFallback(input),
  })
  .addAgent({
    name: "Bio Improver",
    systemPrompt: BIO_IMPROVER_PROMPT,
    outputSchema: ImprovedBioSchema,
    temperature: 0.8,
    fallback: (input) => getBioImproverFallback(input),
  })
  .addAgent({
    name: "Conversation Starter Generator",
    systemPrompt: CONVERSATION_STARTER_PROMPT,
    outputSchema: ConversationStartersSchema,
    temperature: 0.9,
    fallback: (input) => getConversationStartersFallback(input),
  });

// Custom context builder (selective context for each agent)
function contextBuilder(stepIndex, agentName, currentData, results, originalInput) {
  switch (stepIndex) {
    case 0: return originalInput; // Agent 1 gets original input
    case 1: return {              // Agent 2 gets selective context
      originalBio: originalInput.bio,
      interests: originalInput.interests,
      name: originalInput.name,
      analysis: {
        weaknesses: currentData.weaknesses,
        improvementTips: currentData.improvementTips,
        toneAnalysis: currentData.toneAnalysis,
        profilePersonality: currentData.profilePersonality,
      },
    };
    case 2: return {              // Agent 3 gets selective context
      improvedBio: currentData.improvedBio,
      conversationHooks: currentData.conversationStarters,
      interests: originalInput.interests,
      name: originalInput.name,
      lookingFor: originalInput.lookingFor,
    };
    default: return currentData;
  }
}

const result = await pipeline.run(sampleProfile, contextBuilder);
console.log(result.finalOutput);
console.log(result.report);
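
The runner assumes the `PipelineLogger` from the logging discussion earlier in this section. If you are wiring this up from the listing alone, a minimal duck-typed stand-in only needs the two methods the runner calls, `logStep(name, details)` and `printReport()`. The sketch below is a hypothetical reduced version, not the full logger:

```javascript
// Hypothetical minimal stand-in for PipelineLogger — just enough to
// satisfy the runner's calls to logStep() and printReport()
class PipelineLogger {
  constructor(pipelineName) {
    this.pipelineName = pipelineName;
    this.steps = [];
  }

  logStep(agentName, details) {
    // Record whatever the runner passes: status, duration, output, error
    this.steps.push({ agentName, loggedAt: new Date().toISOString(), ...details });
  }

  printReport() {
    const report = {
      pipeline: this.pipelineName,
      totalDurationMs: this.steps.reduce((sum, s) => sum + (s.duration || 0), 0),
      steps: this.steps.map(s => ({
        agent: s.agentName,
        status: s.status,
        durationMs: s.duration,
        error: s.error ? String(s.error.message || s.error) : undefined,
      })),
    };
    console.log(JSON.stringify(report.steps, null, 2));
    return report;
  }
}

// Quick check: two steps, one failure
const logger = new PipelineLogger("Demo");
logger.logStep("Profile Analyzer", { status: "success", duration: 1200 });
logger.logStep("Bio Improver", { status: "failed", duration: 800, error: new Error("Invalid JSON") });
console.log(logger.printReport().totalDurationMs); // 2000
```

Because the runner only depends on these two methods, you can swap in a richer logger (structured JSON to stdout, a tracing backend, etc.) without touching the pipeline code.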

9. Key Takeaways

  1. Validate at every step — Zod validation between agents catches errors immediately instead of letting bad data propagate and compound.
  2. Five error types to handle: LLM API errors, empty responses, JSON parse errors, Zod validation errors, and semantic errors (valid but wrong).
  3. Three failure strategies: Fail fast (safest), fail with partial results (useful for debugging), and fail with fallback (best UX).
  4. Smart retries feed validation errors back to the LLM — the model can often self-correct when told exactly what was wrong.
  5. Graceful degradation ladder: Full model → cheaper model → simpler pipeline → rule-based heuristics → static fallback.
  6. Logging every step with duration, status, retries, and errors makes debugging multi-agent pipelines practical.
  7. Test at three levels: Unit (each agent's schema), integration (agent-to-agent handoffs), and end-to-end (full pipeline).
  8. The ProductionPipeline class wraps all these concerns into a reusable, configurable runner.
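
The degradation ladder in takeaway 5 can be sketched as a tiered runner that tries each rung in order until one succeeds. The tier names and `run` functions below are hypothetical stubs — in production each rung would be an async LLM call with its own retry budget; synchronous stubs keep the control flow visible:

```javascript
// Sketch of a graceful degradation ladder: try each tier in order,
// falling through on failure, and rethrow only if every rung fails
function runWithDegradation(input, tiers) {
  let lastError;
  for (const tier of tiers) {
    try {
      return { tier: tier.name, output: tier.run(input) };
    } catch (error) {
      lastError = error; // fall through to the next, cheaper tier
    }
  }
  throw lastError; // every rung failed — surface the last error
}

// The first two tiers fail (e.g. rate limited); the rule-based rung catches
const result = runWithDegradation({ bio: "  hiker & coffee nerd  " }, [
  { name: "gpt-4o",      run: () => { throw new Error("429 rate limited"); } },
  { name: "gpt-4o-mini", run: () => { throw new Error("429 rate limited"); } },
  { name: "rule-based",  run: (i) => ({ improvedBio: i.bio.trim() }) },
]);
console.log(result.tier);               // "rule-based"
console.log(result.output.improvedBio); // "hiker & coffee nerd"
```

Note the deliberate ordering: quality degrades down the ladder, but each rung is cheaper and more reliable than the one above it, so the user always gets something schema-valid back.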

Explain-It Challenge

  1. You have a 3-agent pipeline where Agent 2 keeps failing with Zod validation errors. Walk through the exact debugging process: what do you check first, second, third? What logs do you look at?
  2. Explain why "validation feedback retry" (feeding Zod errors back to the LLM) is more effective than simple retry for schema validation errors, but NOT more effective for API rate limit errors.
  3. Design a fallback strategy for the ImageKit SEO Pipeline (4.18.c). For each agent, define what a reasonable fallback response would look like and explain why it's "good enough" for the downstream agent.

Navigation: ← 4.18.c — ImageKit Direction: SEO Pipeline · 4.18 Exercise Questions →