Episode 4 — Generative AI Engineering / 4.16 — Agent Design Patterns
4.16 -- Agent Design Patterns: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim before labs or interviews.
- Drill gaps -- reopen README.md, then 4.16.a...4.16.d.
- Practice -- 4.16-Exercise-Questions.md.
- Polish answers -- 4.16-Interview-Questions.md.
Core vocabulary
| Term | One-liner |
|---|---|
| Agent design pattern | Reusable architecture for dividing complex AI work across multiple specialized agents |
| Planner-Executor | One agent decomposes a task into a structured plan; another executes each step |
| Researcher-Writer | One agent gathers facts from external sources; another synthesizes them into polished output |
| Critic-Refiner | A loop where one agent evaluates output quality and another improves it until a threshold is met |
| Router | One agent classifies user intent and dispatches to the appropriate specialized handler |
| Dependency graph | Map of which plan steps depend on which others; determines execution order and parallelism |
| Re-planning | When a step fails, asking the Planner to create a new plan that works around the failure |
| Grounding constraint | Writer instruction: "Use ONLY the provided facts" -- prevents hallucination |
| Quality threshold | Minimum Critic score (e.g., 8/10) to exit the Critic-Refiner loop |
| Diminishing returns | When score improves < 1 point per iteration; signals it is time to stop the loop |
| Intent classification | The Router's core job: determine what type of request the user is making |
| Fallback routing | When Router confidence is low, dispatch to a general handler or ask for clarification |
| Handler registry | Map of intent -> specialized agent config; enables adding capabilities without changing Router logic |
| Self-reflection | Lighter Critic-Refiner variant where a single agent critiques its own output |
| Composability | The property that patterns can be stacked: Router -> Planner-Executor -> Researcher-Writer -> Critic-Refiner |
Pattern 1: Planner-Executor
User Task ──► Planner Agent ──► Structured Plan (JSON) ──► Executor Agent ──► Results
              (temperature 0)    [step, tool, params,      (temperature 0)
              "Break this         depends_on]              "Execute each
               into steps"                                  step in order"
Planner outputs structured JSON:
{ "steps": [
{ "step_number": 1, "action": "load_csv", "depends_on": [] },
{ "step_number": 2, "action": "clean_data", "depends_on": [1] },
{ "step_number": 3, "action": "calculate_stats", "depends_on": [2] },
{ "step_number": 4, "action": "detect_trends", "depends_on": [3] },
{ "step_number": 5, "action": "generate_chart", "depends_on": [3] }, <- parallel with 4
{ "step_number": 6, "action": "write_report", "depends_on": [3, 4, 5] }
] }
Executor rules: check dependencies -> call tool -> capture output -> pass to dependents. Steps 4 and 5 run in parallel (both depend on 3, not on each other).
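The dependency rules above can be sketched as a minimal round-based executor. This is an illustrative sketch, not a full Executor: it only computes execution order from `depends_on`, using the example plan (actual tool calls would happen inside each round).

```python
# Minimal sketch of the Executor's dependency logic: a step runs only once
# every step in its depends_on list has completed. Steps that become ready
# in the same round are parallelizable (here: 4 and 5).
def execution_rounds(steps):
    """Group step numbers into rounds; steps within a round can run in parallel."""
    done = set()
    pending = {s["step_number"]: set(s["depends_on"]) for s in steps}
    rounds = []
    while pending:
        ready = [n for n, deps in pending.items() if deps <= done]
        if not ready:
            raise ValueError("cycle or missing dependency in plan")
        rounds.append(sorted(ready))
        done.update(ready)
        for n in ready:
            del pending[n]
    return rounds

plan = [
    {"step_number": 1, "action": "load_csv",        "depends_on": []},
    {"step_number": 2, "action": "clean_data",      "depends_on": [1]},
    {"step_number": 3, "action": "calculate_stats", "depends_on": [2]},
    {"step_number": 4, "action": "detect_trends",   "depends_on": [3]},
    {"step_number": 5, "action": "generate_chart",  "depends_on": [3]},
    {"step_number": 6, "action": "write_report",    "depends_on": [3, 4, 5]},
]
print(execution_rounds(plan))  # [[1], [2], [3], [4, 5], [6]]
```

Note how steps 4 and 5 land in the same round: both depend only on 3, so nothing forces them to run sequentially.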
Failure strategies:
| Strategy | When to use | Cost |
|---|---|---|
| Skip dependents | Non-critical step fails | Free (partial results) |
| Retry with backoff | Transient error (network, rate limit) | 1-3 extra tool calls |
| Re-plan | Structural failure (wrong tool, wrong approach) | 1 extra Planner LLM call |
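The "retry with backoff" row can be sketched as a small wrapper around any tool call. `TransientError` and `flaky_tool` are hypothetical stand-ins for a real tool and its failure mode:

```python
import time

# Sketch of the "retry with backoff" strategy for transient errors
# (network, rate limit): retry up to max_retries times with exponential delay.
class TransientError(Exception):
    pass

def retry_with_backoff(call, max_retries=3, base_delay=0.01):
    for attempt in range(max_retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_retries:
                raise                              # exhausted: escalate to re-planning
            time.sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x ...

attempts = {"n": 0}
def flaky_tool():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("rate limit")
    return "ok"

print(retry_with_backoff(flaky_tool))  # "ok" after two failed attempts
```

If retries are exhausted, the exception propagates so the orchestrator can fall back to skipping dependents or re-planning.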
Pattern 2: Researcher-Writer
User Query ──► Researcher Agent ──► Raw Facts (JSON) ──► Writer Agent ──► Polished Output
               (temperature 0)      { facts[], stats[],   (temperature 0.4)
               tools: search,         sources[], gaps[] } "Use ONLY the facts
               RAG, APIs                                   provided below"
Key design rules:
Researcher:
- temperature 0 (factual precision)
- Multiple sources (web + RAG + API)
- Structured output: facts[], key_statistics[], gaps[]
- Source attribution for every fact
Writer:
- temperature 0.3-0.5 (readable prose)
- No tools (grounding constraint)
- "Use ONLY the facts provided" (critical instruction)
- Cite sources inline
Validate research before writing:
validateResearch(research):
  facts.length >= 5?           (sufficient facts)
  key_statistics.length >= 2?  (sufficient data)
  unique sources >= 2?         (source diversity)
  gaps < facts?                (more found than missing)
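The checklist above, as a runnable sketch. Field names follow the Researcher output shape from the diagram; the exact fact/source schema is an assumption:

```python
# Runnable version of the validateResearch checklist: gate the Writer on
# fact count, statistic count, source diversity, and gaps-vs-facts ratio.
def validate_research(research):
    facts = research.get("facts", [])
    stats = research.get("key_statistics", [])
    gaps = research.get("gaps", [])
    sources = {f.get("source") for f in facts if f.get("source")}
    checks = {
        "sufficient_facts": len(facts) >= 5,
        "sufficient_data": len(stats) >= 2,
        "source_diversity": len(sources) >= 2,
        "more_found_than_missing": len(gaps) < len(facts),
    }
    return all(checks.values()), checks

research = {
    "facts": [{"text": f"fact {i}", "source": f"src{i % 2}"} for i in range(5)],
    "key_statistics": [{"value": 0.4}, {"value": 7}],
    "gaps": ["pricing data"],
}
ok, report = validate_research(research)
print(ok)  # True -- all four checks pass, safe to invoke the Writer
```

Returning the per-check report (not just the boolean) tells the orchestrator *which* check failed, so it can send the Researcher back for more facts versus more sources.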
Pattern 3: Critic-Refiner
Task ──► Generator ──► Draft v1 ──► Critic ──► Score + Issues
         (temp 0.7)                 (temp 0)        |
                                         score < threshold?
                                         YES ──► Refiner ──► Draft v2 ──► Critic again
                                                 (temp 0.3)
                                         NO  ──► Exit loop (quality met)
                                         Max iterations reached? ──► Exit loop (safety)
Critic output format:
{
  "overall_score": 6,
  "issues": [
    { "severity": "critical", "location": "paragraph 2",
      "issue": "SQL injection vulnerability", "suggestion": "use parameterized queries" }
  ],
  "strengths": ["clear structure", "good examples"],
  "ready_to_publish": false
}
Two mandatory exit conditions:
| Condition | Purpose |
|---|---|
| overall_score >= qualityThreshold | Happy path: output is good enough |
| iteration >= maxIterations | Safety net: prevent infinite loops |
Cost per iteration: ~$0.025 (critique + refine). Most tasks converge in 2-3 iterations.
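The loop with both exit conditions can be sketched as follows. The `generate`/`critique`/`refine` callables are hypothetical stubs standing in for the three LLM calls; the loop logic is the point:

```python
# Sketch of the Critic-Refiner loop with both mandatory exit conditions:
# exit early when the score meets the threshold, or stop at max_iterations.
def critic_refiner_loop(generate, critique, refine,
                        quality_threshold=8, max_iterations=3):
    draft = generate()
    score_history = []                      # track progression: detect diminishing returns
    for iteration in range(1, max_iterations + 1):
        review = critique(draft)
        score_history.append(review["overall_score"])
        if review["overall_score"] >= quality_threshold:
            return draft, score_history     # happy path: quality met
        draft = refine(draft, review)
    return draft, score_history             # safety net: max iterations reached

scores = iter([6, 7, 9])                    # mock Critic converging over three runs
draft_out, score_history = critic_refiner_loop(
    generate=lambda: "draft v1",
    critique=lambda d: {"overall_score": next(scores), "issues": []},
    refine=lambda d, review: d + " (refined)",
)
print(score_history)  # [6, 7, 9] -- exits on the threshold at iteration 3
```

Logging `score_history` is what makes the diminishing-returns check possible: if the delta between consecutive scores drops below 1, stop even before `max_iterations`.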
Pattern 4: Router
User Message ──► Router Agent ──┬──► Code Agent (temp 0.2, tools: run_code, lint)
(gpt-4o-mini) ├──► Data Agent (temp 0, tools: load_csv, stats)
(temp 0) ├──► Creative Agent (temp 0.9, tools: none)
├──► Math Agent (temp 0, tools: calculator)
└──► Fallback Agent (temp 0.5, general purpose)
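The fan-out above is usually backed by a handler registry: a plain map from intent to agent config, so adding a capability is one new entry and the dispatch logic never changes. Configs below mirror the diagram and are illustrative:

```python
# Sketch of a handler registry: intent -> specialized agent config.
# Unknown intents fall through to the general-purpose fallback.
HANDLERS = {
    "code":     {"temperature": 0.2, "tools": ["run_code", "lint"]},
    "data":     {"temperature": 0.0, "tools": ["load_csv", "stats"]},
    "creative": {"temperature": 0.9, "tools": []},
    "math":     {"temperature": 0.0, "tools": ["calculator"]},
}
FALLBACK = {"temperature": 0.5, "tools": []}  # general purpose

def dispatch(intent):
    return HANDLERS.get(intent, FALLBACK)

print(dispatch("math")["tools"])        # ['calculator']
print(dispatch("unknown") is FALLBACK)  # True
```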
Three classification approaches:
| Approach | Speed | Accuracy | Cost |
|---|---|---|---|
| Keyword matching | Fast (~1ms) | Medium | Free |
| LLM classification | Slower (~500ms) | High | ~$0.00015/call |
| Hybrid (keyword first, LLM fallback) | Fast for clear cases | High | Minimal |
Three-tier fallback:
Confidence >= 0.8 -> Route to specialist (Tier 1)
Confidence 0.5-0.8 -> Route to specialist + add disclaimer (Tier 2)
Confidence < 0.5 -> General fallback or ask for clarification (Tier 3)
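The hybrid approach and the three-tier fallback compose naturally: keywords catch clear cases for free, an LLM classifier (mocked here) handles the rest, and confidence picks the tier. Keyword lists and thresholds below are illustrative:

```python
# Sketch of hybrid classification (keyword first, LLM fallback) feeding
# the three-tier confidence fallback.
KEYWORDS = {
    "code": ["function", "bug", "compile"],
    "math": ["calculate", "equation", "integral"],
}

def classify(message, llm_classify):
    text = message.lower()
    for intent, words in KEYWORDS.items():
        if any(w in text for w in words):
            return intent, 0.95            # keyword hit: high confidence, free
    return llm_classify(message)           # ambiguous: pay for the LLM call

def route(intent, confidence):
    if confidence >= 0.8:
        return intent                      # Tier 1: specialist
    if confidence >= 0.5:
        return f"{intent}+disclaimer"      # Tier 2: specialist + disclaimer
    return "clarify"                       # Tier 3: general fallback / ask user

intent, conf = classify("please fix this bug", llm_classify=None)
print(route(intent, conf))  # code
```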
Router vs function calling:
Router: Selects WHICH AGENT handles the request
Function calling: Selects WHICH TOOLS the agent uses
They operate at different levels and are complementary.
When to use each pattern (decision table)
| Condition | Pattern |
|---|---|
| System handles multiple distinct request types | Router (front door) |
| Task needs external data then synthesis | Researcher-Writer |
| Task has 5+ sequential steps with dependencies | Planner-Executor |
| Output needs iterative quality improvement | Critic-Refiner (end of pipeline) |
| Simple question, all info in prompt | No pattern -- single LLM call |
Decision flowchart
Multiple request types? ──YES──► ROUTER at the front
|
NO
|
Needs external data? ──YES──► RESEARCHER-WRITER
|
NO
|
Multi-step with deps? ──YES──► PLANNER-EXECUTOR
|
NO
|
Quality threshold >= 8? ──YES──► CRITIC-REFINER
|
NO
|
▼
Single LLM call
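The flowchart above reduces to a short function; the four booleans correspond to the four questions, checked in the same order:

```python
# The pattern-selection flowchart as code: first YES wins, all NO means
# a plain single LLM call.
def choose_pattern(multiple_request_types, needs_external_data,
                   multi_step_with_deps, quality_threshold_high):
    if multiple_request_types:
        return "Router"
    if needs_external_data:
        return "Researcher-Writer"
    if multi_step_with_deps:
        return "Planner-Executor"
    if quality_threshold_high:
        return "Critic-Refiner"
    return "Single LLM call"

print(choose_pattern(False, True, False, False))  # Researcher-Writer
```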
Pattern comparison
| Aspect | Planner-Executor | Researcher-Writer | Critic-Refiner | Router |
|---|---|---|---|---|
| Core idea | Decompose + execute | Gather + synthesize | Generate + iterate | Classify + dispatch |
| Agent count | 2 (Planner + Executor) | 2 (Researcher + Writer) | 3 (Generator + Critic + Refiner) | 1 + N handlers |
| Flow type | Linear with dependencies | Two-phase pipeline | Loop | Fan-out |
| LLM calls | 1 (plan) + N (steps) | 1-5 (research) + 1 (write) | 1 + 2 per iteration | 1 (route) + 1 (handle) |
| Key output | Per-step results | Grounded content | Polished content | Routed response |
| Temperature | 0 / 0 | 0 / 0.4 | 0.7 / 0 / 0.3 | 0 / varies by handler |
| Best for | Data pipelines, code gen | Reports, summaries | Code review, writing | Chatbots, API gateways |
Combining patterns
Full production pipeline:
ROUTER (classify) ──► PLANNER-EXECUTOR (orchestrate)
|
├── Step 1: RESEARCHER (gather facts)
├── Step 2: WRITER (synthesize)
└── Step 3: CRITIC-REFINER (polish)
Composition rules:
| Pattern | Position in pipeline |
|---|---|
| Router | Always at the front (classifies and dispatches) |
| Planner-Executor | Orchestrator (breaks task into steps that invoke other patterns) |
| Researcher-Writer | Pipeline stage (research feeds into writing) |
| Critic-Refiner | Always at the end (polishes whatever upstream produced) |
Temperature cheat sheet
| Role | Temperature | Why |
|---|---|---|
| Planner | 0 | Plans must be deterministic |
| Executor | 0 | Execution must be precise |
| Researcher | 0 | Facts must be accurate |
| Writer | 0.3-0.5 | Needs creativity for prose |
| Generator | 0.7 | Creative exploration for first draft |
| Critic | 0 | Evaluation must be consistent |
| Refiner | 0.3 | Controlled improvement |
| Router | 0 | Classification must be deterministic |
Rule of thumb: Agents that judge, plan, or route use temperature 0. Agents that create use 0.3-0.7.
Common gotchas
| Gotcha | Why it hurts | Fix |
|---|---|---|
| Using one temperature for all agents | Research is creative (inaccurate) or writing is flat (robotic) | Match temperature to role (see cheat sheet above) |
| Critic at temperature > 0 | Same content gets 5/10 one run, 9/10 the next | Critic temperature must be 0 |
| Vague Critic rubric | Scores are subjective and inconsistent | Add numbered criteria with explicit score anchors |
| No max iterations on Critic-Refiner | Infinite loop burning tokens | Always set maxIterations: 3 as safety net |
| Writer hallucinating beyond research | Report includes facts not in research data | Add "Use ONLY the facts provided" + validate programmatically |
| Router using expensive model | GPT-4o for classification at 100K/day = $300/day | Use GPT-4o-mini for routing ($15/day) |
| Single source in Researcher output | Low diversity, biased or unreliable findings | Validate unique sources >= 2 before passing to Writer |
| Free-text Planner output | Executor cannot parse steps reliably | Force JSON output with response_format: { type: "json_object" } |
| No fallback in Router | Low-confidence messages get wrong handler | Add three-tier fallback (specialist, disclaimer, general) |
| Skipping research validation | Writer produces report from 1 fact | Run validateResearch() before invoking the Writer |
| Not tracking Critic score progression | Cannot detect diminishing returns or quality drift | Log scores per iteration; stop if delta < 1 point |
| Separate Critic-Refiner for low-stakes content | Over-engineering: 3 agents for an internal note | Use self-reflection (single agent) for low-stakes, separate agents for high-stakes |
Quick mental model
Agent Design Patterns = 4 composable building blocks
- Planner-Executor -- Decompose + Execute (complex multi-step tasks)
- Researcher-Writer -- Gather + Synthesize (grounded content creation)
- Critic-Refiner -- Generate + Iterate (quality improvement loop)
- Router -- Classify + Dispatch (multi-capability systems)
Composition:
Router (front) -> Planner (orchestrate) -> Researcher-Writer (pipeline) -> Critic-Refiner (polish)
Golden rules:
1. Start with the simplest pattern that works
2. Temperature 0 for judging/planning/routing, 0.3-0.7 for creating
3. Structured JSON at every handoff (auditability)
4. Two exit conditions on every loop (threshold + max iterations)
5. Validate upstream output before passing downstream
End of 4.16 quick revision.