Episode 4 — Generative AI Engineering / 4.19 — Multi-Agent Architecture Concerns
4.19 — Exercise Questions: Multi-Agent Architecture Concerns
Practice questions covering all five subtopics in Section 4.19: a mix of conceptual, calculation, design, and hands-on tasks.
How to use this material
- Read lessons in order -- README.md, then 4.19.a through 4.19.e.
- Answer closed-book first -- then compare your answers to the matching lesson.
- Try the code examples -- modify the pipeline examples from each subtopic.
- Interview prep -- 4.19-Interview-Questions.md.
- Quick review -- 4.19-Quick-Revision.md.
4.19.a — Increased Latency (Q1-Q10)
Q1. A 3-agent sequential pipeline has agents that take 500ms, 1200ms, and 800ms respectively. What is the total pipeline latency? What would it be if all three could run in parallel?
Q2. Explain why LLM output token count directly affects latency. If Agent B produces 500 tokens and takes 1200ms, roughly how long would it take if you constrained its output to 100 tokens?
Q3. Your user-facing chatbot has a 2-second latency budget. You have a 3-agent sequential pipeline. What is the maximum average latency each agent can take? Is this realistic?
Q4. Write JavaScript code using Promise.all() to run three independent agents in parallel and collect their results. Show how total latency changes from sequential to parallel.
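A possible sketch for Q4, with `setTimeout` delays standing in for real LLM calls (the agent names and timings come from Q1; everything else is illustrative):

```javascript
// Simulated agents: each resolves after a fixed delay (stand-ins for LLM calls).
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

const agentA = () => delay(500, "A result");
const agentB = () => delay(1200, "B result");
const agentC = () => delay(800, "C result");

async function runSequential() {
  const start = Date.now();
  const a = await agentA(); // each await blocks until the agent finishes
  const b = await agentB();
  const c = await agentC();
  return { results: [a, b, c], elapsedMs: Date.now() - start }; // ~500+1200+800 = 2500ms
}

async function runParallel() {
  const start = Date.now();
  // Independent agents: start all three at once, wait only for the slowest.
  const results = await Promise.all([agentA(), agentB(), agentC()]);
  return { results, elapsedMs: Date.now() - start }; // ~max(500,1200,800) = 1200ms
}
```

Note that this only works when the agents are genuinely independent; if B needs A's output, the `await` chain is unavoidable.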
Q5. Explain the difference between actual latency and perceived latency. How does streaming the final agent's output help with perceived latency even if actual latency is unchanged?
Q6. You measured your pipeline and found Agent 2 takes 3x longer than Agents 1 and 3 combined. List three strategies to reduce Agent 2's latency specifically.
Q7. Calculation: Your pipeline has this structure: Agent A (400ms) feeds into parallel agents B1 (800ms) and B2 (1100ms), which both feed into Agent C (500ms). Calculate the total latency.
Q8. Why is adding a timeout with fallback important for production multi-agent systems? What should the fallback do?
Q9. A mobile user on 3G has 500ms of additional network latency per API call. How does this affect a 3-agent pipeline where each agent makes its own API call? What about a 3-agent pipeline where all agents share a single server?
Q10. Your boss asks "why can't we just make it faster?" for a 5-agent pipeline taking 8 seconds. Explain the fundamental constraints and which strategies have diminishing returns.
4.19.b — Higher Operational Cost (Q11-Q20)
Q11. Calculation: A single GPT-4o call uses 2000 input tokens and 500 output tokens. At $2.50/1M input and $10.00/1M output, what is the cost per request? What is the monthly cost at 100,000 requests/day?
Q12. The same task is split into 3 agents, each using GPT-4o. Agent A: 800 input, 200 output. Agent B: 1500 input, 600 output. Agent C: 1200 input, 500 output. Calculate the per-request cost and compare to Q11.
Q13. You replace Agent A and Agent C in Q12 with GPT-4o-mini ($0.15/1M input, $0.60/1M output). Recalculate. How much do you save monthly at 100,000 requests/day?
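The Q11-Q13 arithmetic can be scripted. A small helper, using only the per-million-token prices quoted in the questions (the token counts and models come straight from the exercises):

```javascript
// Per-million-token prices (USD) as quoted in Q11 and Q13.
const PRICES = {
  "gpt-4o":      { input: 2.50, output: 10.00 },
  "gpt-4o-mini": { input: 0.15, output: 0.60 },
};

// Cost of one LLM call, in dollars.
function callCost(model, inputTokens, outputTokens) {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// Sum per-agent costs, then scale to a 30-day monthly bill.
function pipelineCost(agents, requestsPerDay) {
  const perRequest = agents.reduce(
    (sum, a) => sum + callCost(a.model, a.input, a.output), 0);
  return { perRequest, monthly: perRequest * requestsPerDay * 30 };
}

// Q12: three GPT-4o agents at 100K requests/day.
const q12 = pipelineCost([
  { model: "gpt-4o", input: 800,  output: 200 },
  { model: "gpt-4o", input: 1500, output: 600 },
  { model: "gpt-4o", input: 1200, output: 500 },
], 100_000); // perRequest ≈ $0.02175, monthly ≈ $65,250
```

Swapping Agents A and C to `"gpt-4o-mini"` reproduces the Q13 comparison.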
Q14. Explain why context accumulates across a sequential pipeline. If Agent A's output (300 tokens) is included in Agent B's input, and both A's and B's outputs (300 + 600 = 900 tokens) are included in Agent C's input, what are the true input tokens for C?
Q15. Design a cost tracking system that records per-agent, per-request costs and can generate a daily report. What fields would you log? Write pseudocode.
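One possible shape for the Q15 tracker (field names are illustrative; a real system would write to a database or metrics store rather than an in-memory array):

```javascript
// In-memory cost log; each entry is one agent call.
const costLog = [];

function logAgentCall({ traceId, agent, model, inputTokens, outputTokens, costUsd }) {
  costLog.push({
    timestamp: new Date().toISOString(),
    traceId,       // ties the call to one end-to-end request
    agent,         // which pipeline stage made the call
    model,
    inputTokens,
    outputTokens,
    costUsd,
  });
}

// Daily report: total spend plus a per-agent breakdown, so the most
// expensive stage is immediately visible.
function dailyReport(entries) {
  const byAgent = {};
  let total = 0;
  for (const e of entries) {
    byAgent[e.agent] = (byAgent[e.agent] ?? 0) + e.costUsd;
    total += e.costUsd;
  }
  return { total, byAgent };
}

logAgentCall({ traceId: "t-1", agent: "classifier", model: "gpt-4o-mini",
               inputTokens: 800, outputTokens: 200, costUsd: 0.00024 });
```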
Q16. Your multi-agent pipeline costs $0.05 per request. At 200,000 requests/day, what is the monthly cost? The PM says "cut it in half." Propose three specific strategies.
Q17. Explain early termination as a cost optimization. Give an example where a cheap classifier agent can skip expensive downstream agents for 60% of requests. Calculate the savings.
Q18. When is the cost of a multi-agent pipeline justified despite being 3x more expensive than a single call? Give three concrete scenarios with reasoning.
Q19. You discover that 40% of your pipeline's requests have identical inputs (same customer asks the same question). How much money can caching save? What is the trade-off?
Q20. Design exercise: Create a cost budget for a multi-agent system with 5 agents. Allocate a model (GPT-4o or GPT-4o-mini) to each agent, estimate tokens, and calculate total per-request and monthly cost at 50,000 requests/day.
4.19.c — Debugging Across Agents (Q21-Q28)
Q21. Explain the error propagation problem in multi-agent systems. Why are LLM errors especially dangerous compared to traditional software errors (which throw exceptions)?
Q22. Agent A misclassifies a billing question as a technical issue. Agents B and C work perfectly on the wrong input. The user gets a wrong answer. Describe the debugging process to find that Agent A was the root cause.
Q23. What is a trace ID and why is it essential for multi-agent debugging? What happens if you don't have trace IDs?
Q24. List the minimum fields you should log for every agent call in a pipeline. Explain why each field is important.
Q25. Write code that implements a blame isolation protocol: given a pipeline trace, walk backwards through agents and check whether each agent's output is correct given its input.
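A sketch of the Q25 backward walk. The per-agent `checkers` are hypothetical validators; in practice they might be rules, golden test data, or an LLM judge:

```javascript
// A trace is an ordered list of { agent, input, output } steps.
// `checkers` maps agent name -> (input, output) => boolean.
function findRootCause(trace, checkers) {
  // Walk backwards from the final agent. If an agent's output is correct
  // FOR THE INPUT IT RECEIVED, its own behavior is fine and the fault
  // must be upstream -- keep walking.
  for (let i = trace.length - 1; i >= 0; i--) {
    const { agent, input, output } = trace[i];
    if (!checkers[agent](input, output)) {
      return agent; // first agent (searching backwards) that failed on its own input
    }
  }
  return null; // every agent behaved correctly given its input
}
```

In the Q22 scenario, Agents B and C pass their checks (correct output for the wrong input they were given), so the walk continues back to Agent A, whose misclassification fails its check.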
Q26. Your pipeline works 95% of the time but fails 5% with the same input. This is a non-deterministic failure. How do you debug it? What logging would have helped you catch it?
Q27. Compare three tracing approaches: console.log, custom structured tracer, and LangSmith. For each, describe when it's appropriate and what its limitations are.
Q28. Design an agent unit test for a classification agent. It should test 10 known inputs and verify the output matches expected results. How do you handle the fact that LLM outputs are non-deterministic?
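One way to structure the Q28 harness. Non-determinism is handled two ways: run the agent at temperature 0 where the API allows it, and/or run each case several times and require a pass rate rather than exact agreement on every run (the harness and its options are illustrative):

```javascript
// Test harness for a classification agent. `cases` is a list of
// { input, expected } pairs; each case is run `runs` times and must
// hit the expected label at least `minPassRate` of the time.
async function testClassifier(classify, cases, { runs = 3, minPassRate = 1.0 } = {}) {
  const failures = [];
  for (const { input, expected } of cases) {
    let hits = 0;
    for (let i = 0; i < runs; i++) {
      if ((await classify(input)) === expected) hits++;
    }
    if (hits / runs < minPassRate) failures.push({ input, expected, hits, runs });
  }
  return failures; // an empty array means the suite passed
}
```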
4.19.d — Managing Shared State (Q29-Q36)
Q29. Define shared state in a multi-agent pipeline. Give three examples of data that constitutes shared state.
Q30. Compare the pipeline state object pattern with the explicit context passing pattern. When would you choose each one?
Q31. Explain why mutable state is dangerous in multi-agent systems. Give a concrete example where Agent B accidentally overwrites Agent A's data.
Q32. Write code that implements immutable state management using Object.freeze() and the spread operator. Show how each agent returns its contribution and the runner merges it.
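A minimal sketch for Q32. The agents are stubs that return only their own contribution (real agents would call an LLM); the runner does the merging:

```javascript
// Each agent reads frozen state and returns ONLY its own new fields.
const classify = (state) => ({ category: "billing" });
const draft = (state) => ({ reply: `Re: your ${state.category} question...` });

function runPipeline(agents, initialState) {
  let state = Object.freeze({ ...initialState });
  for (const agent of agents) {
    // Spread creates a fresh object each stage; Object.freeze blocks later
    // in-place mutation, so no agent can overwrite upstream data directly.
    // (In strict mode an attempted write throws; otherwise it is a no-op.)
    state = Object.freeze({ ...state, ...agent(state) });
  }
  return state;
}

const result = runPipeline([classify, draft], { query: "Why was I charged twice?" });
```

Note that `Object.freeze` is shallow, and an agent returning an existing key name can still shadow it in the merge, so a shared state schema with one owner per field remains important.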
Q33. Three agents run in parallel and all write to the same state object. Explain the potential race condition. Write code that safely handles parallel agent results without race conditions.
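A sketch of the safe pattern for Q33: agents receive a read-only snapshot and return results, and the runner merges exactly once after `Promise.all` resolves, so no writes happen while agents are in flight (agent bodies are stubs standing in for LLM calls):

```javascript
const sentimentAgent = async (state) => ({ sentiment: "positive" });
const keywordAgent   = async (state) => ({ keywords: ["refund", "billing"] });
const summaryAgent   = async (state) => ({ summary: "Customer asks about a refund." });

async function runParallelStage(baseState, agents) {
  const snapshot = Object.freeze({ ...baseState }); // agents cannot mutate it
  const contributions = await Promise.all(agents.map((a) => a(snapshot)));
  // Single merge point, after all agents have completed -- no interleaved writes.
  return Object.assign({}, baseState, ...contributions);
}
```

The race the question describes arises when agents do `state.x = ...` on one shared object while running concurrently; returning namespaced contributions eliminates it by construction.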
Q34. Design a state schema for a 4-agent document analysis pipeline. Define what fields are added at each stage and what types/constraints they have.
Q35. Your pipeline processes a large document and fails at Agent 3 of 5. Without persistent state, you must restart from scratch. Design a resumable pipeline that persists state after each agent.
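One possible shape for the Q35 resumable runner. The checkpoint `store` is an in-memory `Map` here purely for illustration; a real system would persist to a database or files so state survives a process crash:

```javascript
// Checkpoint store: runId -> { nextAgent, state }.
const store = new Map();

async function runResumable(runId, agents, initialState) {
  // Resume from the last checkpoint if one exists; otherwise start fresh.
  const checkpoint = store.get(runId) ?? { nextAgent: 0, state: initialState };
  let { nextAgent, state } = checkpoint;
  for (let i = nextAgent; i < agents.length; i++) {
    state = { ...state, ...(await agents[i](state)) };
    store.set(runId, { nextAgent: i + 1, state }); // persist after each stage
  }
  return state;
}
```

If Agent 3 of 5 throws, the checkpoint still records stages 1-2, so rerunning with the same `runId` skips straight to Agent 3 instead of repaying for the earlier LLM calls.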
Q36. List three anti-patterns in multi-agent state management and explain how to fix each one.
4.19.e — When Not to Use Multi-Agent (Q37-Q46)
Q37. Name five tasks that are commonly over-engineered with multi-agent pipelines but work fine with a single LLM call. For each, explain why multi-agent is unnecessary.
Q38. Apply the "do you actually need an LLM?" test to these tasks: (a) date formatting, (b) sentiment analysis, (c) JSON validation, (d) code review, (e) email routing based on sender domain.
Q39. Write a single LLM prompt that performs sentiment analysis, summarization, and keyword extraction simultaneously. Compare its output quality and cost to a 3-agent approach.
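One possible phrasing for the Q39 combined prompt, built as a string so the three tasks come back as separable JSON fields in a single call (the exact wording and schema are illustrative; pass the result to whatever LLM client you use):

```javascript
// One call, three tasks: request structured JSON so sentiment, summary,
// and keywords can each be extracted from a single response.
function buildCombinedPrompt(text) {
  return [
    "Analyze the text below and respond with ONLY a JSON object of this form:",
    '{ "sentiment": "positive" | "neutral" | "negative",',
    '  "summary": "<one- to two-sentence summary>",',
    '  "keywords": ["<up to 5 keywords>"] }',
    "",
    `Text: ${text}`,
  ].join("\n");
}
```

Compared with a 3-agent pipeline, this pays for the input text once instead of three times and makes one network round trip instead of three, at the cost of less isolation between the tasks.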
Q40. Explain the Agent Justification Test (Necessity, Value, Cost-Benefit). Apply it to a proposed "Tone Checker Agent" in a customer support pipeline.
Q41. What is the YAGNI principle and how does it apply to AI architecture? Give an example of a team violating YAGNI with a multi-agent system.
Q42. Describe the subtraction test. You have a 5-agent pipeline. How do you determine which agents can be removed without hurting quality?
Q43. A PM asks for a multi-agent pipeline for email categorization (5 categories). Walk through the complexity ladder and argue for the simplest approach that works.
Q44. Case study analysis: A team built a 6-agent customer support bot that takes 10 seconds per response. Users abandon after 3 seconds. Propose a redesign using the principles from 4.19.e.
Q45. When IS multi-agent architecture genuinely necessary? Give three scenarios where a single call cannot solve the problem and explain why.
Q46. Your team has been running a multi-agent pipeline for 6 months. Write a review checklist with 5 questions to evaluate whether the architecture is still appropriate.
Answer Hints
| Q | Hint |
|---|---|
| Q1 | Sequential: 500+1200+800 = 2500ms. Parallel: max(500,1200,800) = 1200ms. |
| Q7 | Total = T_A + max(T_B1, T_B2) + T_C = 400 + max(800,1100) + 500 = 2000ms. |
| Q11 | (2000 * $2.50/1M) + (500 * $10.00/1M) = $0.005 + $0.005 = $0.01/request. Monthly: $0.01 * 100K * 30 = $30,000. |
| Q12 | A: $0.004, B: $0.009750, C: $0.008. Total: $0.02175/request. Monthly: $65,250. More than 2x single call. |
| Q13 | A (mini): $0.000240, B (4o): $0.009750, C (mini): $0.000480. Total: $0.01047. Monthly: $31,410. Savings: ~$33,840/month vs all-GPT-4o. |
| Q14 | Agent C input = its own system prompt + A's output + B's output + user context. Context grows at each stage. |
| Q16 | 200K * $0.05 * 30 = $300,000/month. Strategies: cheaper models for simple agents, caching, early termination. |
| Q17 | Assuming 200K requests/day (Q16's volume): if 60% of requests skip two expensive agents costing $0.04 combined, savings = 200K * 0.6 * $0.04 * 30 = $144,000/month. |
| Q19 | If 40% are cache hits, you avoid 40% of LLM costs. Trade-off: stale cache may serve outdated answers. |
| Q33 | Use Promise.all() to collect independent results, then merge after all complete. Never write to shared object during execution. |
| Q38 | (a) No LLM needed: code. (b) Single call. (c) No LLM: JSON.parse + schema. (d) Single call or multi-agent for large codebases. (e) No LLM: check sender domain with code. |
| Q43 | Level 0: rule-based keyword matching. Level 1: single LLM call with 5 categories. Start with rules, escalate unmatched to LLM. |
← Back to 4.19 -- Multi-Agent Architecture Concerns (README)