Episode 4 — Generative AI Engineering / 4.16 — Agent Design Patterns

4.16 -- Agent Design Patterns: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps -- reopen README.md then 4.16.a...4.16.d.
  3. Practice -- 4.16-Exercise-Questions.md.
  4. Polish answers -- 4.16-Interview-Questions.md.

Core vocabulary

Term                    One-liner
Agent design pattern    Reusable architecture for dividing complex AI work across multiple specialized agents
Planner-Executor        One agent decomposes a task into a structured plan; another executes each step
Researcher-Writer       One agent gathers facts from external sources; another synthesizes them into polished output
Critic-Refiner          A loop where one agent evaluates output quality and another improves it until a threshold is met
Router                  One agent classifies user intent and dispatches to the appropriate specialized handler
Dependency graph        Map of which plan steps depend on which others; determines execution order and parallelism
Re-planning             When a step fails, asking the Planner to create a new plan that works around the failure
Grounding constraint    Writer instruction: "Use ONLY the provided facts" -- prevents hallucination
Quality threshold       Minimum Critic score (e.g., 8/10) to exit the Critic-Refiner loop
Diminishing returns     When score improves < 1 point per iteration; signals it is time to stop the loop
Intent classification   The Router's core job: determine what type of request the user is making
Fallback routing        When Router confidence is low, dispatch to a general handler or ask for clarification
Handler registry        Map of intent -> specialized agent config; enables adding capabilities without changing Router logic
Self-reflection         Lighter Critic-Refiner variant where a single agent critiques its own output
Composability           The property that patterns can be stacked: Router -> Planner-Executor -> Researcher-Writer -> Critic-Refiner

Pattern 1: Planner-Executor

User Task ──► Planner Agent ──► Structured Plan (JSON) ──► Executor Agent ──► Results
              (temperature 0)   [step, tool, params,       (temperature 0)
               "Break this       depends_on]                "Execute each
                into steps"                                  step in order"

Planner outputs structured JSON:

{ "steps": [
    { "step_number": 1, "action": "load_csv",        "depends_on": []  },
    { "step_number": 2, "action": "clean_data",       "depends_on": [1] },
    { "step_number": 3, "action": "calculate_stats",  "depends_on": [2] },
    { "step_number": 4, "action": "detect_trends",    "depends_on": [3] },
    { "step_number": 5, "action": "generate_chart",   "depends_on": [3] },  <- parallel with 4
    { "step_number": 6, "action": "write_report",     "depends_on": [3, 4, 5] }
] }

Executor rules: check dependencies -> call tool -> capture output -> pass to dependents. Steps 4 and 5 run in parallel (both depend on 3, not on each other).
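The dependency check can be sketched as a small wave scheduler. This is an illustrative TypeScript sketch, not a library API: the `Step` interface mirrors the Planner JSON above, and `executionWaves` groups steps whose dependencies are all complete into batches that may run in parallel.

```typescript
// Hypothetical Step shape, mirroring the Planner's JSON output above.
interface Step {
  step_number: number;
  action: string;
  depends_on: number[];
}

// Group steps into "waves": every step in a wave has all of its
// dependencies already completed, so a wave can run in parallel.
function executionWaves(steps: Step[]): number[][] {
  const done = new Set<number>();
  const pending = [...steps];
  const waves: number[][] = [];
  while (pending.length > 0) {
    const ready = pending.filter(s => s.depends_on.every(d => done.has(d)));
    if (ready.length === 0) {
      throw new Error("Cycle or missing dependency in plan");
    }
    waves.push(ready.map(s => s.step_number));
    for (const s of ready) {
      done.add(s.step_number);
      pending.splice(pending.indexOf(s), 1);
    }
  }
  return waves;
}
```

For the six-step plan above this yields the waves [[1], [2], [3], [4, 5], [6]]: steps 4 and 5 share a wave because both depend only on step 3.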

Failure strategies:

Strategy              When to use                                        Cost
Skip dependents       Non-critical step fails                            Free (partial results)
Retry with backoff    Transient error (network, rate limit)              1-3 extra tool calls
Re-plan               Structural failure (wrong tool, wrong approach)    1 extra Planner LLM call
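The retry strategy can be sketched as a generic backoff wrapper. This is a hypothetical helper (`retryWithBackoff` is not a library function); the Executor would wrap each tool call that may fail transiently:

```typescript
// Retry a failing async operation with exponential backoff.
// Illustrative sketch: maxRetries and baseMs are tunable assumptions.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // give up: escalate to re-plan
      // Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
      await new Promise(res => setTimeout(res, baseMs * 2 ** attempt));
    }
  }
}
```

A structural failure (wrong tool, wrong approach) should not be retried this way; after exhausting retries the Executor escalates to re-planning instead.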

Pattern 2: Researcher-Writer

User Query ──► Researcher Agent ──► Raw Facts (JSON) ──► Writer Agent ──► Polished Output
               (temperature 0)      { facts[], stats[],   (temperature 0.4)
               tools: search,         sources[], gaps[] }  "Use ONLY the facts
               RAG, APIs                                    provided below"

Key design rules:

Researcher:
  - temperature 0 (factual precision)
  - Multiple sources (web + RAG + API)
  - Structured output: facts[], key_statistics[], gaps[]
  - Source attribution for every fact

Writer:
  - temperature 0.3-0.5 (readable prose)
  - No tools (grounding constraint)
  - "Use ONLY the facts provided" (critical instruction)
  - Cite sources inline

Validate research before writing:

validateResearch(research):
  facts.length >= 5?           (sufficient facts)
  key_statistics.length >= 2?  (sufficient data)
  unique sources >= 2?         (source diversity)
  gaps < facts?                (more found than missing)
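The checklist can be made executable. A minimal sketch, assuming the Researcher's structured output shape shown above (the `Research` interface here is an illustrative guess at the fields):

```typescript
// Hypothetical shape of the Researcher's structured output.
interface Research {
  facts: { text: string; source: string }[];
  key_statistics: string[];
  gaps: string[];
}

// Returns the list of failed checks; an empty array means the
// research is good enough to hand to the Writer.
function validateResearch(r: Research): string[] {
  const issues: string[] = [];
  if (r.facts.length < 5) issues.push("fewer than 5 facts");
  if (r.key_statistics.length < 2) issues.push("fewer than 2 key statistics");
  const uniqueSources = new Set(r.facts.map(f => f.source));
  if (uniqueSources.size < 2) issues.push("fewer than 2 unique sources");
  if (r.gaps.length >= r.facts.length) issues.push("more gaps than facts");
  return issues;
}
```

Run this before invoking the Writer; if any check fails, send the Researcher back for another pass instead of writing from thin data.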

Pattern 3: Critic-Refiner

Task ──► Generator ──► Draft v1 ──► Critic ──► Score + Issues
         (temp 0.7)                 (temp 0)         |
                                                     |  score < threshold?
                                                     |  YES ──► Refiner ──► Draft v2 ──► Critic again
                                                     |          (temp 0.3)
                                                     |  NO ──► Exit loop (quality met)
                                                     |
                               Max iterations reached? ──► Exit loop (safety)

Critic output format:

{
  "overall_score": 6,
  "issues": [
    { "severity": "critical", "location": "paragraph 2",
      "issue": "SQL injection vulnerability", "suggestion": "use parameterized queries" }
  ],
  "strengths": ["clear structure", "good examples"],
  "ready_to_publish": false
}

Two mandatory exit conditions:

Condition                            Purpose
overall_score >= qualityThreshold    Happy path: output is good enough
iteration >= maxIterations           Safety net: prevent infinite loops

Cost per iteration: ~$0.025 (critique + refine). Most tasks converge in 2-3 iterations.
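The loop with both exit conditions can be sketched as follows. `generate`, `critique`, and `refine` are stand-ins for the three LLM calls (the function names and `Critique` shape are assumptions for illustration):

```typescript
// Minimal Critique shape, matching the Critic output format above.
interface Critique {
  overall_score: number;
  issues: string[];
}

// Critic-Refiner loop skeleton. Both exit conditions are mandatory:
// the quality threshold (happy path) and maxIterations (safety net).
function criticRefinerLoop(
  generate: () => string,
  critique: (draft: string) => Critique,
  refine: (draft: string, c: Critique) => string,
  qualityThreshold = 8,
  maxIterations = 3,
): { draft: string; score: number; iterations: number } {
  let draft = generate();
  let c = critique(draft);
  let iteration = 0;
  while (c.overall_score < qualityThreshold && iteration < maxIterations) {
    draft = refine(draft, c); // fix the issues the Critic found
    c = critique(draft);      // re-score the refined draft
    iteration++;
  }
  return { draft, score: c.overall_score, iterations: iteration };
}
```

Logging the score per iteration (as returned here) is also what lets you detect diminishing returns.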


Pattern 4: Router

User Message ──► Router Agent ──┬──► Code Agent       (temp 0.2, tools: run_code, lint)
                 (gpt-4o-mini)  ├──► Data Agent       (temp 0,   tools: load_csv, stats)
                 (temp 0)       ├──► Creative Agent   (temp 0.9, tools: none)
                                ├──► Math Agent       (temp 0,   tools: calculator)
                                └──► Fallback Agent   (temp 0.5, general purpose)

Three classification approaches:

Approach                               Speed                  Accuracy   Cost
Keyword matching                       Fast (~1ms)            Medium     Free
LLM classification                     Slower (~500ms)        High       ~$0.00015/call
Hybrid (keyword first, LLM fallback)   Fast for clear cases   High       Minimal
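The hybrid approach's keyword tier might look like this sketch. The keyword table is purely illustrative, and a `null` result signals that the message should fall through to the (slower, more expensive) LLM classifier:

```typescript
// Hypothetical keyword table; real systems would tune these per intent.
const KEYWORDS: Record<string, string[]> = {
  code: ["function", "bug", "compile", "typescript"],
  data: ["csv", "average", "dataset"],
  math: ["calculate", "equation", "solve"],
};

// Tier-1 keyword match. Returns the intent for clear cases, or null
// to signal that the LLM classifier should be consulted instead.
function keywordClassify(message: string): string | null {
  const lower = message.toLowerCase();
  for (const [intent, words] of Object.entries(KEYWORDS)) {
    if (words.some(w => lower.includes(w))) return intent;
  }
  return null; // ambiguous: fall through to LLM classification
}
```

This is what makes the hybrid row in the table cheap: most messages never reach the LLM call.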

Three-tier fallback:

Confidence >= 0.8  ->  Route to specialist (Tier 1)
Confidence 0.5-0.8 ->  Route to specialist + add disclaimer (Tier 2)
Confidence < 0.5   ->  General fallback or ask for clarification (Tier 3)
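The tiering rule above is a simple threshold mapping; a sketch (the `Tier` names are illustrative labels, not a standard API):

```typescript
type Tier = "specialist" | "specialist_with_disclaimer" | "fallback";

// Map Router confidence to the three fallback tiers described above.
function routeTier(confidence: number): Tier {
  if (confidence >= 0.8) return "specialist";                 // Tier 1
  if (confidence >= 0.5) return "specialist_with_disclaimer"; // Tier 2
  return "fallback";                                          // Tier 3
}
```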

Router vs function calling:

Router:           Selects WHICH AGENT handles the request
Function calling: Selects WHICH TOOLS the agent uses
They operate at different levels and are complementary.

When to use each pattern (decision table)

Condition                                         Pattern
System handles multiple distinct request types    Router (front door)
Task needs external data then synthesis           Researcher-Writer
Task has 5+ sequential steps with dependencies    Planner-Executor
Output needs iterative quality improvement        Critic-Refiner (end of pipeline)
Simple question, all info in prompt               No pattern -- single LLM call

Decision flowchart

Multiple request types? ──YES──► ROUTER at the front
         |
         NO
         |
Needs external data? ──YES──► RESEARCHER-WRITER
         |
         NO
         |
Multi-step with deps? ──YES──► PLANNER-EXECUTOR
         |
         NO
         |
Quality threshold >= 8? ──YES──► CRITIC-REFINER
         |
         NO
         |
         ▼
    Single LLM call

Pattern comparison

Aspect        Planner-Executor           Researcher-Writer             Critic-Refiner                     Router
Core idea     Decompose + execute        Gather + synthesize           Generate + iterate                 Classify + dispatch
Agent count   2 (Planner + Executor)     2 (Researcher + Writer)       3 (Generator + Critic + Refiner)   1 + N handlers
Flow type     Linear with dependencies   Two-phase pipeline            Loop                               Fan-out
LLM calls     1 (plan) + N (steps)       1-5 (research) + 1 (write)    1 + 2 per iteration                1 (route) + 1 (handle)
Key output    Per-step results           Grounded content              Polished content                   Routed response
Temperature   0 / 0                      0 / 0.4                       0.7 / 0 / 0.3                      0 / varies by handler
Best for      Data pipelines, code gen   Reports, summaries            Code review, writing               Chatbots, API gateways

Combining patterns

Full production pipeline:

  ROUTER (classify) ──► PLANNER-EXECUTOR (orchestrate)
                              |
                              ├── Step 1: RESEARCHER (gather facts)
                              ├── Step 2: WRITER (synthesize)
                              └── Step 3: CRITIC-REFINER (polish)

Composition rules:

Pattern              Position in pipeline
Router               Always at the front (classifies and dispatches)
Planner-Executor     Orchestrator (breaks task into steps that invoke other patterns)
Researcher-Writer    Pipeline stage (research feeds into writing)
Critic-Refiner       Always at the end (polishes whatever upstream produced)

Temperature cheat sheet

Role         Temperature   Why
Planner      0             Plans must be deterministic
Executor     0             Execution must be precise
Researcher   0             Facts must be accurate
Writer       0.3-0.5       Needs creativity for prose
Generator    0.7           Creative exploration for first draft
Critic       0             Evaluation must be consistent
Refiner      0.3           Controlled improvement
Router       0             Classification must be deterministic

Rule of thumb: Agents that judge, plan, or route use temperature 0. Agents that create use 0.3-0.7.


Common gotchas

Gotcha                                          Why it hurts                                                     Fix
Using one temperature for all agents            Research is creative (inaccurate) or writing is flat (robotic)   Match temperature to role (see cheat sheet above)
Critic at temperature > 0                       Same content gets 5/10 one run, 9/10 the next                    Critic temperature must be 0
Vague Critic rubric                             Scores are subjective and inconsistent                           Add numbered criteria with explicit score anchors
No max iterations on Critic-Refiner             Infinite loop burning tokens                                     Always set maxIterations: 3 as safety net
Writer hallucinating beyond research            Report includes facts not in research data                       Add "Use ONLY the facts provided" + validate programmatically
Router using expensive model                    GPT-4o for classification at 100K/day = $300/day                 Use GPT-4o-mini for routing ($15/day)
Single source in Researcher output              Low diversity, biased or unreliable findings                     Validate unique sources >= 2 before passing to Writer
Free-text Planner output                        Executor cannot parse steps reliably                             Force JSON output with response_format: { type: "json_object" }
No fallback in Router                           Low-confidence messages get wrong handler                        Add three-tier fallback (specialist, disclaimer, general)
Skipping research validation                    Writer produces report from 1 fact                               Run validateResearch() before invoking the Writer
Not tracking Critic score progression           Cannot detect diminishing returns or quality drift               Log scores per iteration; stop if delta < 1 point
Separate Critic-Refiner for low-stakes content  Over-engineering: 3 agents for an internal note                  Use self-reflection (single agent) for low-stakes, separate agents for high-stakes

Quick mental model

Agent Design Patterns = 4 composable building blocks

  Planner-Executor      Decompose + Execute     (complex multi-step tasks)
  Researcher-Writer     Gather + Synthesize      (grounded content creation)
  Critic-Refiner        Generate + Iterate       (quality improvement loop)
  Router                Classify + Dispatch      (multi-capability systems)

Composition:
  Router (front) -> Planner (orchestrate) -> Researcher-Writer (pipeline) -> Critic-Refiner (polish)

Golden rules:
  1. Start with the simplest pattern that works
  2. Temperature 0 for judging/planning/routing, 0.3-0.7 for creating
  3. Structured JSON at every handoff (auditability)
  4. Two exit conditions on every loop (threshold + max iterations)
  5. Validate upstream output before passing downstream

End of 4.16 quick revision.