Episode 4 — Generative AI Engineering / 4.16 — Agent Design Patterns

4.16 -- Agent Design Patterns: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps -- reopen README.md then 4.16.a...4.16.d.
  3. Practice -- 4.16-Exercise-Questions.md.
  4. Polish answers -- 4.16-Interview-Questions.md.

Core vocabulary

Term                    One-liner
Agent design pattern    Reusable architecture for dividing complex AI work across multiple specialized agents
Planner-Executor        One agent decomposes a task into a structured plan; another executes each step
Researcher-Writer       One agent gathers facts from external sources; another synthesizes them into polished output
Critic-Refiner          A loop where one agent evaluates output quality and another improves it until a threshold is met
Router                  One agent classifies user intent and dispatches to the appropriate specialized handler
Dependency graph        Map of which plan steps depend on which others; determines execution order and parallelism
Re-planning             When a step fails, asking the Planner to create a new plan that works around the failure
Grounding constraint    Writer instruction: "Use ONLY the provided facts" -- prevents hallucination
Quality threshold       Minimum Critic score (e.g., 8/10) to exit the Critic-Refiner loop
Diminishing returns     When score improves < 1 point per iteration; signals it is time to stop the loop
Intent classification   The Router's core job: determine what type of request the user is making
Fallback routing        When Router confidence is low, dispatch to a general handler or ask for clarification
Handler registry        Map of intent -> specialized agent config; enables adding capabilities without changing Router logic
Self-reflection         Lighter Critic-Refiner variant where a single agent critiques its own output
Composability           The property that patterns can be stacked: Router -> Planner-Executor -> Researcher-Writer -> Critic-Refiner

Pattern 1: Planner-Executor

User Task ──► Planner Agent ──► Structured Plan (JSON) ──► Executor Agent ──► Results
              (temperature 0)   [step, tool, params,       (temperature 0)
               "Break this       depends_on]                "Execute each
                into steps"                                  step in order"

Planner outputs structured JSON:

{ "steps": [
    { "step_number": 1, "action": "load_csv",        "depends_on": []  },
    { "step_number": 2, "action": "clean_data",       "depends_on": [1] },
    { "step_number": 3, "action": "calculate_stats",  "depends_on": [2] },
    { "step_number": 4, "action": "detect_trends",    "depends_on": [3] },
    { "step_number": 5, "action": "generate_chart",   "depends_on": [3] },  <- parallel with 4
    { "step_number": 6, "action": "write_report",     "depends_on": [3, 4, 5] }
] }

Executor rules: check dependencies -> call tool -> capture output -> pass to dependents. Steps 4 and 5 run in parallel (both depend on 3, not on each other).
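The dependency check can be sketched as a small wave scheduler. This is an illustrative TypeScript sketch, not a library API: the `Step` interface mirrors the Planner JSON above, and `executionWaves` groups steps whose dependencies are all complete into batches that may run in parallel.

```typescript
// Hypothetical Step shape, mirroring the Planner's JSON output above.
interface Step {
  step_number: number;
  action: string;
  depends_on: number[];
}

// Group steps into "waves": every step in a wave has all of its
// dependencies already completed, so a wave can run in parallel.
function executionWaves(steps: Step[]): number[][] {
  const done = new Set<number>();
  const pending = [...steps];
  const waves: number[][] = [];
  while (pending.length > 0) {
    const ready = pending.filter(s => s.depends_on.every(d => done.has(d)));
    if (ready.length === 0) {
      throw new Error("Cycle or missing dependency in plan");
    }
    waves.push(ready.map(s => s.step_number));
    for (const s of ready) {
      done.add(s.step_number);
      pending.splice(pending.indexOf(s), 1);
    }
  }
  return waves;
}
```

For the six-step plan above this yields the waves [[1], [2], [3], [4, 5], [6]]: steps 4 and 5 share a wave because both depend only on step 3.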

Failure strategies:

Strategy              When to use                                        Cost
Skip dependents       Non-critical step fails                            Free (partial results)
Retry with backoff    Transient error (network, rate limit)              1-3 extra tool calls
Re-plan               Structural failure (wrong tool, wrong approach)    1 extra Planner LLM call
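The retry strategy can be sketched as a generic backoff wrapper. This is a hypothetical helper (`retryWithBackoff` is not a library function); the Executor would wrap each tool call that may fail transiently:

```typescript
// Retry a failing async operation with exponential backoff.
// Illustrative sketch: maxRetries and baseMs are tunable assumptions.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // give up: escalate to re-plan
      // Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
      await new Promise(res => setTimeout(res, baseMs * 2 ** attempt));
    }
  }
}
```

A structural failure (wrong tool, wrong approach) should not be retried this way; after exhausting retries the Executor escalates to re-planning instead.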

Pattern 2: Researcher-Writer

User Query ──► Researcher Agent ──► Raw Facts (JSON) ──► Writer Agent ──► Polished Output
               (temperature 0)      { facts[], stats[],   (temperature 0.4)
               tools: search,         sources[], gaps[] }  "Use ONLY the facts
               RAG, APIs                                    provided below"

Key design rules:

Researcher:
  - temperature 0 (factual precision)
  - Multiple sources (web + RAG + API)
  - Structured output: facts[], key_statistics[], gaps[]
  - Source attribution for every fact

Writer:
  - temperature 0.3-0.5 (readable prose)
  - No tools (grounding constraint)
  - "Use ONLY the facts provided" (critical instruction)
  - Cite sources inline

Validate research before writing:

validateResearch(research):
  facts.length >= 5?           (sufficient facts)
  key_statistics.length >= 2?  (sufficient data)
  unique sources >= 2?         (source diversity)
  gaps < facts?                (more found than missing)
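The checklist can be made executable. A minimal sketch, assuming the Researcher's structured output shape shown above (the `Research` interface here is an illustrative guess at the fields):

```typescript
// Hypothetical shape of the Researcher's structured output.
interface Research {
  facts: { text: string; source: string }[];
  key_statistics: string[];
  gaps: string[];
}

// Returns the list of failed checks; an empty array means the
// research is good enough to hand to the Writer.
function validateResearch(r: Research): string[] {
  const issues: string[] = [];
  if (r.facts.length < 5) issues.push("fewer than 5 facts");
  if (r.key_statistics.length < 2) issues.push("fewer than 2 key statistics");
  const uniqueSources = new Set(r.facts.map(f => f.source));
  if (uniqueSources.size < 2) issues.push("fewer than 2 unique sources");
  if (r.gaps.length >= r.facts.length) issues.push("more gaps than facts");
  return issues;
}
```

Run this before invoking the Writer; if any check fails, send the Researcher back for another pass instead of writing from thin data.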

Pattern 3: Critic-Refiner

Task ──► Generator ──► Draft v1 ──► Critic ──► Score + Issues
         (temp 0.7)                 (temp 0)         |
                                                     |  score < threshold?
                                                     |  YES ──► Refiner ──► Draft v2 ──► Critic again
                                                     |          (temp 0.3)
                                                     |  NO ──► Exit loop (quality met)
                                                     |
                               Max iterations reached? ──► Exit loop (safety)

Critic output format:

{
  "overall_score": 6,
  "issues": [
    { "severity": "critical", "location": "paragraph 2",
      "issue": "SQL injection vulnerability", "suggestion": "use parameterized queries" }
  ],
  "strengths": ["clear structure", "good examples"],
  "ready_to_publish": false
}

Two mandatory exit conditions:

Condition                            Purpose
overall_score >= qualityThreshold    Happy path: output is good enough
iteration >= maxIterations           Safety net: prevent infinite loops

Cost per iteration: ~$0.025 (critique + refine). Most tasks converge in 2-3 iterations.
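The loop with both exit conditions can be sketched as follows. `generate`, `critique`, and `refine` are stand-ins for the three LLM calls (the function names and `Critique` shape are assumptions for illustration):

```typescript
// Minimal Critique shape, matching the Critic output format above.
interface Critique {
  overall_score: number;
  issues: string[];
}

// Critic-Refiner loop skeleton. Both exit conditions are mandatory:
// the quality threshold (happy path) and maxIterations (safety net).
function criticRefinerLoop(
  generate: () => string,
  critique: (draft: string) => Critique,
  refine: (draft: string, c: Critique) => string,
  qualityThreshold = 8,
  maxIterations = 3,
): { draft: string; score: number; iterations: number } {
  let draft = generate();
  let c = critique(draft);
  let iteration = 0;
  while (c.overall_score < qualityThreshold && iteration < maxIterations) {
    draft = refine(draft, c); // fix the issues the Critic found
    c = critique(draft);      // re-score the refined draft
    iteration++;
  }
  return { draft, score: c.overall_score, iterations: iteration };
}
```

Logging the score per iteration (as returned here) is also what lets you detect diminishing returns.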


Pattern 4: Router

User Message ──► Router Agent ──┬──► Code Agent       (temp 0.2, tools: run_code, lint)
                 (gpt-4o-mini)  ├──► Data Agent       (temp 0,   tools: load_csv, stats)
                 (temp 0)       ├──► Creative Agent   (temp 0.9, tools: none)
                                ├──► Math Agent       (temp 0,   tools: calculator)
                                └──► Fallback Agent   (temp 0.5, general purpose)

Three classification approaches:

Approach                               Speed                  Accuracy   Cost
Keyword matching                       Fast (~1ms)            Medium     Free
LLM classification                     Slower (~500ms)        High       ~$0.00015/call
Hybrid (keyword first, LLM fallback)   Fast for clear cases   High       Minimal
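The hybrid approach's keyword tier might look like this sketch. The keyword table is purely illustrative, and a `null` result signals that the message should fall through to the (slower, more expensive) LLM classifier:

```typescript
// Hypothetical keyword table; real systems would tune these per intent.
const KEYWORDS: Record<string, string[]> = {
  code: ["function", "bug", "compile", "typescript"],
  data: ["csv", "average", "dataset"],
  math: ["calculate", "equation", "solve"],
};

// Tier-1 keyword match. Returns the intent for clear cases, or null
// to signal that the LLM classifier should be consulted instead.
function keywordClassify(message: string): string | null {
  const lower = message.toLowerCase();
  for (const [intent, words] of Object.entries(KEYWORDS)) {
    if (words.some(w => lower.includes(w))) return intent;
  }
  return null; // ambiguous: fall through to LLM classification
}
```

This is what makes the hybrid row in the table cheap: most messages never reach the LLM call.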

Three-tier fallback:

Confidence >= 0.8  ->  Route to specialist (Tier 1)
Confidence 0.5-0.8 ->  Route to specialist + add disclaimer (Tier 2)
Confidence < 0.5   ->  General fallback or ask for clarification (Tier 3)
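The tiering rule above is a simple threshold mapping; a sketch (the `Tier` names are illustrative labels, not a standard API):

```typescript
type Tier = "specialist" | "specialist_with_disclaimer" | "fallback";

// Map Router confidence to the three fallback tiers described above.
function routeTier(confidence: number): Tier {
  if (confidence >= 0.8) return "specialist";                 // Tier 1
  if (confidence >= 0.5) return "specialist_with_disclaimer"; // Tier 2
  return "fallback";                                          // Tier 3
}
```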

Router vs function calling:

Router:           Selects WHICH AGENT handles the request
Function calling: Selects WHICH TOOLS the agent uses
They operate at different levels and are complementary.

When to use each pattern (decision table)

Condition                                         Pattern
System handles multiple distinct request types    Router (front door)
Task needs external data then synthesis           Researcher-Writer
Task has 5+ sequential steps with dependencies    Planner-Executor
Output needs iterative quality improvement        Critic-Refiner (end of pipeline)
Simple question, all info in prompt               No pattern -- single LLM call

Decision flowchart

Multiple request types? ──YES──► ROUTER at the front
         |
         NO
         |
Needs external data? ──YES──► RESEARCHER-WRITER
         |
         NO
         |
Multi-step with deps? ──YES──► PLANNER-EXECUTOR
         |
         NO
         |
Quality threshold >= 8? ──YES──► CRITIC-REFINER
         |
         NO
         |
         ▼
    Single LLM call

Pattern comparison

Aspect        Planner-Executor           Researcher-Writer             Critic-Refiner                     Router
Core idea     Decompose + execute        Gather + synthesize           Generate + iterate                 Classify + dispatch
Agent count   2 (Planner + Executor)     2 (Researcher + Writer)       3 (Generator + Critic + Refiner)   1 + N handlers
Flow type     Linear with dependencies   Two-phase pipeline            Loop                               Fan-out
LLM calls     1 (plan) + N (steps)       1-5 (research) + 1 (write)    1 + 2 per iteration                1 (route) + 1 (handle)
Key output    Per-step results           Grounded content              Polished content                   Routed response
Temperature   0 / 0                      0 / 0.4                       0.7 / 0 / 0.3                      0 / varies by handler
Best for      Data pipelines, code gen   Reports, summaries            Code review, writing               Chatbots, API gateways

Combining patterns

Full production pipeline:

  ROUTER (classify) ──► PLANNER-EXECUTOR (orchestrate)
                              |
                              ├── Step 1: RESEARCHER (gather facts)
                              ├── Step 2: WRITER (synthesize)
                              └── Step 3: CRITIC-REFINER (polish)

Composition rules:

Pattern              Position in pipeline
Router               Always at the front (classifies and dispatches)
Planner-Executor     Orchestrator (breaks task into steps that invoke other patterns)
Researcher-Writer    Pipeline stage (research feeds into writing)
Critic-Refiner       Always at the end (polishes whatever upstream produced)

Temperature cheat sheet

Role         Temperature   Why
Planner      0             Plans must be deterministic
Executor     0             Execution must be precise
Researcher   0             Facts must be accurate
Writer       0.3-0.5       Needs creativity for prose
Generator    0.7           Creative exploration for first draft
Critic       0             Evaluation must be consistent
Refiner      0.3           Controlled improvement
Router       0             Classification must be deterministic

Rule of thumb: Agents that judge, plan, or route use temperature 0. Agents that create use 0.3-0.7.


Common gotchas

Gotcha                                          Why it hurts                                                     Fix
Using one temperature for all agents            Research is creative (inaccurate) or writing is flat (robotic)   Match temperature to role (see cheat sheet above)
Critic at temperature > 0                       Same content gets 5/10 one run, 9/10 the next                    Critic temperature must be 0
Vague Critic rubric                             Scores are subjective and inconsistent                           Add numbered criteria with explicit score anchors
No max iterations on Critic-Refiner             Infinite loop burning tokens                                     Always set maxIterations: 3 as safety net
Writer hallucinating beyond research            Report includes facts not in research data                       Add "Use ONLY the facts provided" + validate programmatically
Router using expensive model                    GPT-4o for classification at 100K/day = $300/day                 Use GPT-4o-mini for routing ($15/day)
Single source in Researcher output              Low diversity, biased or unreliable findings                     Validate unique sources >= 2 before passing to Writer
Free-text Planner output                        Executor cannot parse steps reliably                             Force JSON output with response_format: { type: "json_object" }
No fallback in Router                           Low-confidence messages get wrong handler                        Add three-tier fallback (specialist, disclaimer, general)
Skipping research validation                    Writer produces report from 1 fact                               Run validateResearch() before invoking the Writer
Not tracking Critic score progression           Cannot detect diminishing returns or quality drift               Log scores per iteration; stop if delta < 1 point
Separate Critic-Refiner for low-stakes content  Over-engineering: 3 agents for an internal note                  Use self-reflection (single agent) for low-stakes, separate agents for high-stakes

Quick mental model

Agent Design Patterns = 4 composable building blocks

  Planner-Executor      Decompose + Execute     (complex multi-step tasks)
  Researcher-Writer     Gather + Synthesize      (grounded content creation)
  Critic-Refiner        Generate + Iterate       (quality improvement loop)
  Router                Classify + Dispatch      (multi-capability systems)

Composition:
  Router (front) -> Planner (orchestrate) -> Researcher-Writer (pipeline) -> Critic-Refiner (polish)

Golden rules:
  1. Start with the simplest pattern that works
  2. Temperature 0 for judging/planning/routing, 0.3-0.7 for creating
  3. Structured JSON at every handoff (auditability)
  4. Two exit conditions on every loop (threshold + max iterations)
  5. Validate upstream output before passing downstream

End of 4.16 quick revision.