Episode 4 — Generative AI Engineering / 4.18 — Building a Simple Multi Agent Workflow

4.18 — Exercise Questions

How to use this material: Work through these questions in order. For coding questions, write the code yourself before checking the hint; for conceptual questions, try to answer from memory first. Questions progress from basic to advanced within each section.

Navigation: ← 4.18.d — Validation and Error Handling · 4.18 Interview Questions →


Section A — Multi-Agent Pipeline Design (4.18.a)

Conceptual Questions

Q1. What is a multi-agent pipeline, and how does it differ from a single-agent approach?

Hint: Think assembly line vs. one worker doing everything. Key differences: prompt complexity, reliability, debugging, testing, iteration.

Q2. Name the four main pipeline architecture patterns covered in 4.18.a and give one real-world use case for each.

Hint: Sequential, Parallel, Fan-out/Fan-in, Conditional (Router). Consider content creation, data analysis, customer support, and dynamic task decomposition.

Q3. Explain the Single Responsibility Principle as it applies to agent design. Why should each agent have only one job?

Hint: Testing in isolation, independent model selection, clear failure boundaries, simpler prompts. Reference the anti-patterns: too much, too little, overlapping.

Q4. What is a "schema contract" between agents, and why are Zod schemas ideal for this?

Hint: Formal definition of output/input shape. Zod provides runtime validation, TypeScript inference, and self-documenting schemas.

Q5. Describe three data flow patterns between agents. When would you choose "selective context" over "accumulated context"?

Hint: Direct pass-through, accumulated context, selective context. Choose selective when agents need minimal context to reduce token usage and keep focus.

Coding Questions

Q6. Write a Zod schema for a "Movie Recommendation Agent" that outputs:

  • recommendations: array of objects with title (string), year (number, 1900-2030), genre (enum: action, comedy, drama, thriller, sci-fi), matchScore (number 0-100)
  • reasoning: string (min 20 chars)
  • mood: string

Hint: Use z.array(z.object({...})), z.number().min().max(), z.enum([...]).

Q7. Implement a basic sequential pipeline runner that takes an array of agent functions and chains them:

const result = await runSequential([agent1, agent2, agent3], input);

Hint: Use a for loop, pass each agent's output as the next agent's input. Wrap in try/catch to report which agent failed.
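A minimal sketch of the runner the hint describes (the `Agent` type here is a stand-in for your real agent functions):

```typescript
// Each "agent" is modeled as an async function of one input.
type Agent = (input: unknown) => Promise<unknown>;

async function runSequential(agents: Agent[], input: unknown): Promise<unknown> {
  let current = input;
  for (let i = 0; i < agents.length; i++) {
    try {
      // Each agent's output becomes the next agent's input.
      current = await agents[i](current);
    } catch (err) {
      throw new Error(`Agent ${i + 1} of ${agents.length} failed: ${err}`);
    }
  }
  return current;
}
```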

Q8. Write a parallel pipeline that runs three agents simultaneously on the same input and combines their results:

const combined = await runParallel([agentA, agentB, agentC], input);

Hint: Promise.all() or Promise.allSettled() for parallel execution. Combine results into a single object.
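One way the parallel combine could look (a sketch; each agent is assumed to return an object so results can be merged):

```typescript
type ParallelAgent<T> = (input: T) => Promise<Record<string, unknown>>;

async function runParallel<T>(agents: ParallelAgent<T>[], input: T) {
  // Promise.all rejects on the first failure; swap in Promise.allSettled
  // if you want partial results instead.
  const results = await Promise.all(agents.map((agent) => agent(input)));
  // Merge each agent's result object into one combined object.
  return Object.assign({}, ...results);
}
```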

Q9. Design Zod schemas for a 3-agent "Recipe Generator" pipeline:

  • Agent 1: Ingredient Analyzer (input: list of ingredients → output: flavor profiles, cuisine suggestions, dietary info)
  • Agent 2: Recipe Creator (input: analysis → output: recipe name, instructions, timing)
  • Agent 3: Nutrition Calculator (input: recipe → output: calories, macros, health score)

Hint: Define three separate schemas. Make sure Agent 2's input matches Agent 1's output shape.

Q10. Implement a conditional (router) pipeline that routes input to different pipelines based on content type:

Hint: Create a router function that examines the input and returns a pipeline identifier. Use a switch statement to select the right pipeline.
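A sketch of the router-plus-switch structure (the routing heuristics here are placeholders; a real router might itself be an LLM call):

```typescript
type Pipeline = (input: string) => Promise<string>;

// Hypothetical content-type router: inspects the input, picks a pipeline id.
function routeContentType(input: string): "code" | "question" | "general" {
  if (input.includes("function") || input.includes("=>")) return "code";
  if (input.trim().endsWith("?")) return "question";
  return "general";
}

async function runConditional(
  input: string,
  pipelines: Record<"code" | "question" | "general", Pipeline>
): Promise<string> {
  switch (routeContentType(input)) {
    case "code":
      return pipelines.code(input);
    case "question":
      return pipelines.question(input);
    default:
      return pipelines.general(input);
  }
}
```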


Section B — Hinge Direction (4.18.b)

Conceptual Questions

Q11. Why does the Hinge pipeline use three agents instead of one? What specific problems would a single agent face?

Hint: Analyzing a profile requires different skills than rewriting a bio, which requires different skills than generating conversation starters. A single prompt would be too long and too complex, and would produce inconsistent results.

Q12. Explain why Agent 2 (Bio Improver) uses "selective context" — receiving specific fields from Agent 1's output rather than the entire output.

Hint: Reduces token usage, keeps Agent 2 focused on writing (not re-analyzing), and each field is deliberately chosen for the writing task.

Q13. Why does the Conversation Starter Generator use a higher temperature (0.9) than the Profile Analyzer (0.7)?

Hint: Conversation starters need maximum creativity and variety. Analysis needs consistency and accuracy. Temperature controls this tradeoff.

Q14. What would happen if you removed Zod validation between Agent 1 and Agent 2? Give a specific scenario where this would cause problems.

Hint: Agent 1 could return overallScore: "high" (string instead of number), and Agent 2 would try to work with broken analysis data, producing a confusing rewrite.

Coding Questions

Q15. Add a .refine() check to the ProfileAnalysisSchema that ensures overallScore is consistent with the severity of weaknesses — if any weakness has severity "high", the overall score should be <= 7.

Hint: .refine(data => !(data.weaknesses.some(w => w.severity === "high") && data.overallScore > 7), { message: "..." })

Q16. Modify the Bio Improver's system prompt to also accept a targetAudience parameter (e.g., "professionals aged 25-35") and adjust the writing style accordingly.

Hint: Add the parameter to the input, mention it in the system prompt rules, and update the schema if needed.

Q17. Write a new Agent 4: "Profile Photo Caption Suggester" that takes the improved bio and photo descriptions, and generates caption suggestions for each photo. Define the Zod schema and system prompt.

Hint: Output schema should include an array of objects: { photoDescription, suggestedCaption, tone, connectionToBio }.

Q18. Implement an A/B testing mechanism where the pipeline runs two different Bio Improver prompts and returns both results for comparison.

Hint: Run Agent 2 twice with different system prompts (or different temperatures) using Promise.all(), then include both results in the output.

Q19. Add input validation that rejects profiles with inappropriate content before the pipeline starts.

Hint: Use z.refine() on the UserProfileSchema to check for a blocklist of terms in the bio.

Q20. Modify the pipeline to support "streaming progress" — logging the completion of each agent step as it happens, suitable for a progress bar UI.

Hint: Accept a callback function onProgress(step, total, agentName, status) and call it after each agent completes.
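A minimal sketch of the callback pattern the hint suggests (agent names and statuses are illustrative):

```typescript
type Agent = (input: unknown) => Promise<unknown>;
type ProgressCallback = (step: number, total: number, agentName: string, status: "running" | "done") => void;

async function runWithProgress(
  agents: { name: string; run: Agent }[],
  input: unknown,
  onProgress: ProgressCallback
): Promise<unknown> {
  let current = input;
  for (let i = 0; i < agents.length; i++) {
    onProgress(i + 1, agents.length, agents[i].name, "running");
    current = await agents[i].run(current);
    onProgress(i + 1, agents.length, agents[i].name, "done");
  }
  return current;
}
```

A UI would pass an onProgress that updates a progress bar instead of logging.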


Section C — ImageKit Direction (4.18.c)

Conceptual Questions

Q21. Why is the temperature pattern for the ImageKit pipeline (0.5 → 0.7 → 0.6) different from the Hinge pipeline (0.7 → 0.8 → 0.9)?

Hint: Image metadata extraction is highly factual (low temp). SEO titles need some creativity (medium). Tags need coverage but also consistency (moderate). Hinge tasks are progressively more creative.

Q22. Could Agent 2 (SEO Optimizer) and Agent 3 (Tag Categorizer) run in parallel instead of sequentially? What would change?

Hint: Partially yes — if Agent 3 doesn't need Agent 2's keywords. But the current design has Agent 3 receiving SEO keywords to inform tag generation. You'd need to remove that dependency or restructure.

Q23. Why does the CategorizedTagsSchema include a confidence score for each tag? How would you use this in production?

Hint: Confidence scores enable filtering (show only tags with confidence > 0.8), sorting, and quality-based decisions. In production, you might only index high-confidence tags.

Q24. Explain the "batch processing" approach using Promise.allSettled. Why is it better than Promise.all for batch pipelines?

Hint: Promise.all fails the entire batch if one image fails. Promise.allSettled processes all images and reports individual failures without blocking others.

Coding Questions

Q25. Add a .transform() to the SEOContentSchema that automatically truncates the meta description to 160 characters if the LLM returns one that is too long.

Hint: Apply .transform(val => val.substring(0, 160)) to the meta description field. Note that in a Zod chain, checks run before transforms — z.string().max(160).transform(...) would reject an over-long string before the transform could truncate it. Put the transform first if you want truncation rather than rejection.

Q26. Implement a "similarity checker" that compares tags generated by Agent 3 against the existing tags on the image and reports which new tags were added.

Hint: Compare categorizedTags.flatTags with imageInput.existingTags. Return { added: [], existing: [], overlap: number }.
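One way the comparison could look (a sketch; field names like `flatTags` and `existingTags` follow the hint, and matching is assumed case-insensitive):

```typescript
// Compares newly generated tags against tags already on the image.
function compareTags(flatTags: string[], existingTags: string[]) {
  const existing = new Set(existingTags.map((t) => t.toLowerCase()));
  const added: string[] = [];
  const overlapping: string[] = [];
  for (const tag of flatTags) {
    (existing.has(tag.toLowerCase()) ? overlapping : added).push(tag);
  }
  return {
    added,
    existing: overlapping,
    // Fraction of the new tags that the image already had.
    overlap: flatTags.length === 0 ? 0 : overlapping.length / flatTags.length,
  };
}
```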

Q27. Write an Agent 4: "Image Format Recommender" that takes the use-case tags from Agent 3 and recommends optimal formats (WebP, AVIF, etc.) and compression settings. Define its schema.

Hint: Output should include recommended formats ranked by suitability, compression level (0-100), estimated file sizes, and reasoning.

Q28. Modify the batch processor to implement a rate limiter that ensures no more than 5 API calls per second across all concurrent pipelines.

Hint: Use a token bucket or simple queue with timestamp tracking. Each callAgent checks the limiter before making the API call.
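A sketch of the timestamp-queue variant the hint mentions (a sliding window rather than a strict token bucket; limits and window are parameters):

```typescript
// Sliding-window limiter: at most `limit` acquisitions per `windowMs`,
// shared across all concurrent pipelines that hold a reference to it.
class RateLimiter {
  private timestamps: number[] = [];
  constructor(private limit = 5, private windowMs = 1000) {}

  async acquire(): Promise<void> {
    for (;;) {
      const now = Date.now();
      // Drop timestamps that have left the window.
      this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
      if (this.timestamps.length < this.limit) {
        this.timestamps.push(now);
        return;
      }
      // Wait until the oldest call exits the window, then re-check.
      const waitMs = this.windowMs - (now - this.timestamps[0]);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

Each callAgent would `await limiter.acquire()` immediately before its API call.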

Q29. Add a "cost estimator" to the pipeline that tracks token usage per agent and calculates the total API cost.

Hint: OpenAI API responses include usage.prompt_tokens and usage.completion_tokens. Track these per agent and multiply by the per-token cost.

Q30. Implement a caching layer: if the same image description has been processed before, return the cached result instead of calling the pipeline again.

Hint: Hash the image description + dimensions as a cache key. Store results in a Map or Redis. Check cache before running the pipeline.
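A minimal in-memory sketch of the hint (a Map stands in for Redis; the pipeline signature is a placeholder):

```typescript
import { createHash } from "node:crypto";

type PipelineFn = (description: string, dimensions: string) => Promise<unknown>;

const cache = new Map<string, unknown>();

async function runWithCache(pipeline: PipelineFn, description: string, dimensions: string) {
  // Hash description + dimensions into a stable cache key.
  const key = createHash("sha256").update(description + "|" + dimensions).digest("hex");
  if (cache.has(key)) return cache.get(key);
  const result = await pipeline(description, dimensions);
  cache.set(key, result);
  return result;
}
```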


Section D — Validation and Error Handling (4.18.d)

Conceptual Questions

Q31. Name the five types of errors that can occur in a multi-agent pipeline and rank them from easiest to hardest to detect.

Hint: LLM API errors (easiest), empty responses, JSON parse errors, Zod validation errors, semantic errors (hardest). Semantic errors are valid data that is factually wrong.

Q32. Compare the three failure strategies: fail fast, fail with partial results, and fail with fallback. When would you choose each?

Hint: Fail fast for data pipelines where accuracy matters. Partial results for debugging/development. Fallback for user-facing products where something is better than nothing.

Q33. Explain why "validation feedback retry" (feeding Zod errors back to the LLM) works well for schema errors but not for API rate limit errors.

Hint: The LLM can read the Zod error and fix its output format. But rate limit errors have nothing to do with the LLM's output — they require waiting, not prompt changes.

Q34. What is the "graceful degradation ladder"? Describe all five levels.

Hint: Full model → cheaper model → simpler pipeline → rule-based heuristics → static fallback. Each level trades quality for reliability.

Coding Questions

Q35. Implement the PipelineError class with properties: agentName, stepNumber, originalError, isRetryable, and partialResults.

Hint: Extend the built-in Error class. Use a static method to determine if the error is retryable based on HTTP status codes and error types.
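A sketch of the class described in the question (the retryable-status rule here is an assumption, not the course's exact logic):

```typescript
class PipelineError extends Error {
  constructor(
    public agentName: string,
    public stepNumber: number,
    public originalError: Error,
    public isRetryable: boolean,
    public partialResults: Record<string, unknown> = {}
  ) {
    super(`Agent "${agentName}" (step ${stepNumber}) failed: ${originalError.message}`);
    this.name = "PipelineError";
  }

  // Assumed rule: rate limits (429) and server errors (5xx) are worth
  // retrying; client errors like 400 are not.
  static isRetryableStatus(status?: number): boolean {
    return status === 429 || (status !== undefined && status >= 500);
  }
}
```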

Q36. Write a runAgentWithValidationFeedback function that feeds Zod errors back to the LLM and retries up to 3 times.

Hint: Maintain the conversation history. When Zod validation fails, append the error as a user message and request correction.

Q37. Create a ruleBasedAnalysis fallback for the Profile Analyzer that uses simple heuristics (bio length, interest count, photo count) instead of an LLM.

Hint: Check bio.length > 100 for strength, interests.length >= 4 for strength, etc. Return a valid ProfileAnalysisSchema object.
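One way the heuristics could look, assuming a simplified ProfileAnalysisSchema shape (the thresholds and scoring rule are illustrative, taken from the hint):

```typescript
interface Profile { bio: string; interests: string[]; photoCount: number; }
interface Analysis {
  overallScore: number;
  strengths: string[];
  weaknesses: { issue: string; severity: "low" | "medium" | "high" }[];
}

function ruleBasedAnalysis(profile: Profile): Analysis {
  const strengths: string[] = [];
  const weaknesses: Analysis["weaknesses"] = [];
  if (profile.bio.length > 100) strengths.push("Detailed bio");
  else weaknesses.push({ issue: "Bio is too short", severity: "high" });
  if (profile.interests.length >= 4) strengths.push("Varied interests");
  else weaknesses.push({ issue: "Few interests listed", severity: "medium" });
  if (profile.photoCount >= 3) strengths.push("Enough photos");
  else weaknesses.push({ issue: "Too few photos", severity: "medium" });
  // Start at 10, subtract per weakness, floor at 1.
  const overallScore = Math.max(1, 10 - weaknesses.length * 3);
  return { overallScore, strengths, weaknesses };
}
```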

Q38. Implement the PipelineLogger class with logStep(), getReport(), and printReport() methods.

Hint: Store an array of step objects with agent name, status, duration, retry count, and error info. Calculate totals in getReport().
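A sketch of the logger's shape (the step fields mirror the hint; the report totals are one reasonable choice):

```typescript
interface StepLog {
  agentName: string;
  status: "success" | "failed";
  durationMs: number;
  retries: number;
  error?: string;
}

class PipelineLogger {
  private steps: StepLog[] = [];

  logStep(step: StepLog): void {
    this.steps.push(step);
  }

  getReport() {
    return {
      steps: this.steps,
      totalDurationMs: this.steps.reduce((sum, s) => sum + s.durationMs, 0),
      totalRetries: this.steps.reduce((sum, s) => sum + s.retries, 0),
      failed: this.steps.filter((s) => s.status === "failed").length,
    };
  }

  printReport(): void {
    for (const s of this.steps) {
      console.log(`${s.agentName}: ${s.status} in ${s.durationMs}ms (retries: ${s.retries})`);
    }
  }
}
```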

Q39. Write unit tests (using Vitest or Jest) for the ProfileAnalysisSchema that test:

  • Valid output passes
  • Missing fields fail
  • Invalid enums fail
  • Out-of-range numbers fail

Hint: Use schema.safeParse() and check result.success and result.error.issues.

Q40. Implement the complete ProductionPipeline class with:

  • addAgent() for builder-pattern configuration
  • run() with retry logic and optional fallbacks
  • A contextBuilder callback for custom data flow between agents

Hint: Follow the pattern from section 8 of 4.18.d. The key is making run() flexible enough to support any pipeline.


Bonus Advanced Questions

Q41. Design a multi-agent pipeline for a code review system: Agent 1 analyzes code quality, Agent 2 identifies bugs, Agent 3 suggests improvements, Agent 4 writes the review summary. Define all four schemas.

Hint: Each agent needs different expertise. Agent 1 focuses on style/patterns, Agent 2 on logic/bugs, Agent 3 on refactoring, Agent 4 on human-readable output.

Q42. Implement a "pipeline versioning" system where you can run pipeline v1 and v2 side-by-side on the same input and compare results.

Hint: Create two pipeline configurations with different agents/prompts. Run both with Promise.all(). Compare outputs structurally.

Q43. Add observability to the pipeline: track token usage, latency percentiles (p50, p95, p99), and error rates across many pipeline runs.

Hint: Collect metrics from each run into an array. Calculate percentiles with sorted[Math.floor(sorted.length * 0.95)]. Track error rates as failures / total.
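The percentile and error-rate math from the hint, sketched (nearest-rank indexing, as in the hint's formula):

```typescript
// Percentile over raw latency samples.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * p));
  return sorted[idx];
}

function summarize(latenciesMs: number[], totalRuns: number, failures: number) {
  return {
    p50: percentile(latenciesMs, 0.5),
    p95: percentile(latenciesMs, 0.95),
    p99: percentile(latenciesMs, 0.99),
    errorRate: failures / totalRuns,
  };
}
```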

Q44. Design a multi-agent pipeline that processes a YouTube video transcript: Agent 1 segments the transcript, Agent 2 summarizes each segment, Agent 3 generates a blog post, Agent 4 creates social media posts. Define all schemas and data flow.

Hint: This is a fan-out/fan-in + sequential hybrid. Agent 1 produces segments (fan-out), Agent 2 processes them in parallel, Agent 3 collects summaries (fan-in), Agent 4 generates posts.

Q45. Implement a "circuit breaker" pattern: if an agent fails more than N times in a rolling time window, automatically skip it and use the fallback for all subsequent calls until the window resets.

Hint: Track failures with timestamps. If recentFailures > threshold, short-circuit to fallback without calling the LLM. Reset after the window expires.

