Episode 4 — Generative AI Engineering / 4.16 — Agent Design Patterns
4.16.c — Critic-Refiner Pattern
In one sentence: The Critic-Refiner pattern creates an iterative improvement loop — a Generator produces initial output, a Critic evaluates its quality and identifies specific issues, and a Refiner improves the output based on the criticism — repeating until a quality threshold is met or a maximum iteration count is reached.
Navigation: ← 4.16.b Researcher-Writer · 4.16.d — Router Agents →
1. What Is the Critic-Refiner Pattern?
The Critic-Refiner pattern is a self-improving loop where AI output is evaluated and refined iteratively:
- Generator — produces initial output (a draft, code, analysis, or any content).
- Critic — evaluates the output against quality criteria and produces specific, actionable feedback (not just "this is bad" but "paragraph 2 lacks a supporting example, the conclusion contradicts the introduction").
- Refiner — takes the original output plus the Critic's feedback and produces an improved version.
- Loop — the improved version goes back to the Critic. This continues until the Critic says the output is good enough or a maximum number of iterations is reached.
This pattern models the real-world process of drafting, reviewing, and revising — the same loop that writers, programmers, and designers use daily.
┌────────────────────────────────────────────────────────────────┐
│ CRITIC-REFINER LOOP │
│ │
│ User Task: "Write a technical blog post about caching" │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ GENERATOR │ Produces initial draft │
│ │ │ (temperature 0.7 for creative first draft) │
│ └──────┬───────┘ │
│ │ │
│ ▼ draft v1 │
│ ┌──────────────┐ │
│ │ CRITIC │ Evaluates quality │
│ │ │ Returns: │
│ │ │ - score: 6/10 │
│ │ │ - issues: ["no code examples", │
│ │ │ "intro too long", "missing │
│ │ │ performance comparison"] │
│ └──────┬───────┘ │
│ │ │
│ score < threshold? │
│ YES ──► continue loop │
│ │ │
│ ▼ feedback │
│ ┌──────────────┐ │
│ │ REFINER │ Improves based on criticism │
│ │ │ Produces draft v2 │
│ └──────┬───────┘ │
│ │ │
│ ▼ draft v2 │
│ ┌──────────────┐ │
│ │ CRITIC │ Re-evaluates │
│ │ │ score: 8/10 │
│ │ │ issues: ["comparison table could │
│ │ │ be more detailed"] │
│ └──────┬───────┘ │
│ │ │
│ score >= threshold (8)? │
│ YES ──► exit loop │
│ │ │
│ ▼ │
│ Final Output: draft v2 (quality: 8/10) │
└────────────────────────────────────────────────────────────────┘
2. When to Use This Pattern
| Use This Pattern When | Do NOT Use When |
|---|---|
| Output quality matters more than speed | Real-time responses needed (latency-critical) |
| First-draft quality is typically insufficient | Simple factual queries (no iteration needed) |
| Clear quality criteria exist (rubric, checklist) | Quality is purely subjective (no measurable criteria) |
| Cost of bad output is high (legal, medical, public-facing) | Budget is tight (each iteration = more API calls) |
| You want consistent quality across many outputs | One-shot tasks where "good enough" is fine |
Real-world use cases
- Code review and improvement — generate code, critique for bugs/style/performance, refine
- Writing quality — generate article, critique for clarity/accuracy/engagement, refine
- Data quality — generate data transformation, critique for correctness/completeness, refine
- Prompt optimization — generate prompt, critique for ambiguity/token efficiency, refine
- Translation quality — translate text, critique for accuracy/fluency/cultural fit, refine
- Email drafting — generate email, critique for tone/professionalism/clarity, refine
3. The Generator: Producing Initial Output
The Generator is the simplest agent — it produces the first draft based on the user's request.
const GENERATOR_SYSTEM_PROMPT = `You are a skilled content generator. Produce high-quality first drafts based on the user's request.
Focus on:
- Completeness — cover all aspects of the topic
- Structure — use clear headings and logical flow
- Accuracy — stick to facts you are confident about
- Clarity — write for the target audience
Do not self-critique or hedge. Produce the best draft you can in one pass.`;
4. The Critic: Evaluating Quality
The Critic is the most important agent in this pattern. A good Critic provides specific, actionable feedback — not vague opinions.
Critic system prompt
const CRITIC_SYSTEM_PROMPT = `You are a strict Quality Critic. You evaluate content against specific criteria and provide actionable feedback.
EVALUATION CRITERIA:
1. Accuracy — Are all facts correct? Any hallucinations or unsupported claims?
2. Completeness — Does the content cover the topic thoroughly? Any missing aspects?
3. Structure — Is the content logically organized? Good headings, flow, transitions?
4. Clarity — Is the language clear and appropriate for the audience?
5. Examples — Are there sufficient concrete examples and code samples?
6. Actionability — Can the reader apply what they learned?
SCORING:
- Rate each criterion from 1-10.
- Provide an overall score (1-10).
- A score of 8+ means the content is ready to publish.
- A score below 8 means it needs improvement.
OUTPUT FORMAT (strict JSON):
{
"overall_score": <1-10>,
"criteria_scores": {
"accuracy": <1-10>,
"completeness": <1-10>,
"structure": <1-10>,
"clarity": <1-10>,
"examples": <1-10>,
"actionability": <1-10>
},
"issues": [
{
"severity": "critical | major | minor",
"location": "Where in the content (section/paragraph)",
"issue": "What is wrong",
"suggestion": "Specific improvement to make"
}
],
"strengths": ["What the content does well"],
"ready_to_publish": <true|false>
}
RULES:
- Be specific. "The writing is unclear" is bad feedback. "Paragraph 3 uses jargon ('memoization') without defining it for the beginner audience" is good feedback.
- Be constructive. Every issue must have a concrete suggestion for improvement.
- Be honest. Do not inflate scores. If the content is mediocre, say so.
- Focus on the most impactful issues first (critical > major > minor).`;
Why the Critic must be strict
A lenient Critic defeats the purpose of the loop. If the Critic gives 9/10 to mediocre content, the loop exits immediately and no improvement happens. Tips for maintaining Critic strictness:
- Define clear criteria — numbered rubric with specific expectations
- Require structured output — JSON forces the Critic to be specific
- Set a realistic threshold — 8/10 is a good default (demanding but achievable)
- Include severity levels — helps the Refiner prioritize
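Even with structured output enforced, it pays to validate the Critic's JSON before trusting it in the loop. A minimal sketch — the field names mirror the schema above, but the fallback behaviors (clamping scores, defaulting missing arrays) are assumptions, not part of any official API:

```javascript
// Minimal validation for the Critic's JSON output. Field names mirror the
// schema in the Critic prompt above; the clamping and defaulting rules are
// illustrative assumptions.
function validateCritique(feedback) {
  const clamp = (n) => Math.min(10, Math.max(1, Number(n) || 1));
  return {
    overall_score: clamp(feedback.overall_score),
    criteria_scores: feedback.criteria_scores ?? {},
    issues: Array.isArray(feedback.issues) ? feedback.issues : [],
    strengths: Array.isArray(feedback.strengths) ? feedback.strengths : [],
    ready_to_publish: feedback.ready_to_publish === true,
  };
}
```

Running the parsed response through `validateCritique` instead of trusting the raw parse keeps the loop from crashing on a malformed or partial critique.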
5. The Refiner: Improving Based on Feedback
The Refiner receives the original content AND the Critic's feedback, and produces an improved version.
const REFINER_SYSTEM_PROMPT = `You are a Content Refiner. You receive content along with specific criticism and produce an improved version.
RULES:
1. Address EVERY issue raised by the Critic, starting with critical issues.
2. Preserve strengths identified by the Critic — do not break what already works.
3. Make minimal changes — do not rewrite sections that have no issues.
4. If the Critic suggests adding examples, add real, practical examples.
5. If the Critic identifies factual errors, correct them (or remove the claim if unsure).
6. Return the COMPLETE improved content, not just the changes.
Produce the improved content directly — no meta-commentary about what you changed.`;
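Rule 1 asks the Refiner to start with critical issues. Rather than relying on the model to prioritize, the orchestration code can pre-sort the feedback before building the Refiner's prompt. A small sketch — the three severity levels match the Critic's schema, while the numeric ranking is an assumption:

```javascript
// Order issues so the most severe appear first in the Refiner's prompt.
// Severity levels match the Critic's schema; unknown values sort last.
const SEVERITY_RANK = { critical: 0, major: 1, minor: 2 };

function sortIssuesBySeverity(issues) {
  return [...issues].sort(
    (a, b) => (SEVERITY_RANK[a.severity] ?? 3) - (SEVERITY_RANK[b.severity] ?? 3)
  );
}
```

Sorting in code rather than in the prompt makes prioritization deterministic and costs no extra tokens.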
6. Full Implementation: Iterative Content Improvement
import OpenAI from 'openai';
const openai = new OpenAI();
// ─────────────────────────────────────────────────────────
// AGENT DEFINITIONS
// ─────────────────────────────────────────────────────────
async function generate(task) {
console.log('\n=== GENERATOR ===');
console.log(`Task: "${task}"\n`);
const response = await openai.chat.completions.create({
model: 'gpt-4o',
temperature: 0.7,
messages: [
{
role: 'system',
content: `You are a skilled technical writer. Write a comprehensive, well-structured piece on the given topic. Include code examples in JavaScript where relevant. Target audience: intermediate developers.`,
},
{ role: 'user', content: task },
],
});
const content = response.choices[0].message.content;
console.log(` Generated: ${content.length} characters`);
return content;
}
async function critique(content, criteria) {
console.log('\n=== CRITIC ===');
const response = await openai.chat.completions.create({
model: 'gpt-4o',
temperature: 0,
response_format: { type: 'json_object' },
messages: [
{
role: 'system',
content: `You are a strict Quality Critic. Evaluate the content against these criteria and return structured feedback.
CRITERIA: ${criteria.join(', ')}
Return JSON:
{
"overall_score": <1-10>,
"criteria_scores": { <criterion>: <1-10>, ... },
"issues": [
{
"severity": "critical|major|minor",
"location": "...",
"issue": "...",
"suggestion": "..."
}
],
"strengths": ["..."],
"ready_to_publish": <boolean>
}
Be strict. 8+ means genuinely excellent. Most first drafts score 5-7.`,
},
{
role: 'user',
content: `Evaluate this content:\n\n${content}`,
},
],
});
const feedback = JSON.parse(response.choices[0].message.content);
console.log(` Score: ${feedback.overall_score}/10`);
console.log(` Issues: ${feedback.issues.length}`);
console.log(` Ready: ${feedback.ready_to_publish}`);
if (feedback.issues.length > 0) {
console.log(' Top issues:');
feedback.issues.slice(0, 3).forEach((issue) => {
console.log(` [${issue.severity}] ${issue.issue}`);
});
}
return feedback;
}
async function refine(content, feedback) {
console.log('\n=== REFINER ===');
const issueList = feedback.issues
.map((i) => `- [${i.severity}] ${i.location}: ${i.issue} → Suggestion: ${i.suggestion}`)
.join('\n');
const strengthList = feedback.strengths.map((s) => `- ${s}`).join('\n');
const response = await openai.chat.completions.create({
model: 'gpt-4o',
temperature: 0.3,
messages: [
{
role: 'system',
content: `You are a Content Refiner. You receive content and specific feedback. Produce an improved version that addresses all issues while preserving strengths. Return ONLY the improved content.`,
},
{
role: 'user',
content: `ORIGINAL CONTENT:
${content}
CRITIC SCORE: ${feedback.overall_score}/10
ISSUES TO FIX:
${issueList}
STRENGTHS TO PRESERVE:
${strengthList}
Produce the improved version:`,
},
],
});
const improved = response.choices[0].message.content;
console.log(` Refined: ${improved.length} characters`);
return improved;
}
// ─────────────────────────────────────────────────────────
// CRITIC-REFINER LOOP
// ─────────────────────────────────────────────────────────
async function criticRefinerLoop(task, config = {}) {
const {
qualityThreshold = 8,
maxIterations = 3,
criteria = ['accuracy', 'completeness', 'structure', 'clarity', 'examples', 'actionability'],
} = config;
console.log('========================================');
console.log(' CRITIC-REFINER LOOP');
console.log('========================================');
console.log(`Task: ${task}`);
console.log(`Threshold: ${qualityThreshold}/10`);
console.log(`Max iterations: ${maxIterations}`);
console.log(`Criteria: ${criteria.join(', ')}`);
// Track iteration history for analysis
const history = [];
// Step 1: Generate initial draft
let content = await generate(task);
history.push({ iteration: 0, type: 'generated', length: content.length });
// Step 2: Iterative critique-refine loop
for (let iteration = 1; iteration <= maxIterations; iteration++) {
console.log(`\n--- Iteration ${iteration}/${maxIterations} ---`);
// Critique
const feedback = await critique(content, criteria);
history.push({
iteration,
type: 'critique',
score: feedback.overall_score,
issueCount: feedback.issues.length,
ready: feedback.ready_to_publish,
});
// Check if quality threshold is met
if (feedback.overall_score >= qualityThreshold || feedback.ready_to_publish) {
console.log(`\nQuality threshold met! Score: ${feedback.overall_score}/10`);
return {
content,
finalScore: feedback.overall_score,
iterations: iteration,
history,
finalFeedback: feedback,
};
}
// Refine
content = await refine(content, feedback);
history.push({ iteration, type: 'refined', length: content.length });
}
// Max iterations reached — return best effort
const finalFeedback = await critique(content, criteria);
console.log(`\nMax iterations reached. Final score: ${finalFeedback.overall_score}/10`);
return {
content,
finalScore: finalFeedback.overall_score,
iterations: maxIterations,
history,
finalFeedback,
maxIterationsReached: true,
};
}
// ─────────────────────────────────────────────────────────
// RUN
// ─────────────────────────────────────────────────────────
const result = await criticRefinerLoop(
'Write a technical blog post explaining caching strategies for web applications (browser cache, CDN, server-side, database query cache) with JavaScript examples',
{
qualityThreshold: 8,
maxIterations: 3,
criteria: ['accuracy', 'completeness', 'structure', 'clarity', 'examples', 'actionability'],
}
);
console.log('\n=== FINAL RESULT ===');
console.log(`Score: ${result.finalScore}/10`);
console.log(`Iterations: ${result.iterations}`);
console.log(`Content length: ${result.content.length} characters`);
console.log('\nIteration history:');
result.history.forEach((h) => {
if (h.type === 'critique') {
console.log(` Iteration ${h.iteration}: Score ${h.score}/10, ${h.issueCount} issues`);
}
});
Expected output
========================================
CRITIC-REFINER LOOP
========================================
Task: Write a technical blog post explaining caching strategies...
Threshold: 8/10
Max iterations: 3
Criteria: accuracy, completeness, structure, clarity, examples, actionability
=== GENERATOR ===
Task: "Write a technical blog post..."
Generated: 4200 characters
--- Iteration 1/3 ---
=== CRITIC ===
Score: 6/10
Issues: 5
Ready: false
Top issues:
[critical] Missing code examples for CDN and database caching
[major] No comparison table between caching strategies
[major] Cache invalidation not addressed
=== REFINER ===
Refined: 5800 characters
--- Iteration 2/3 ---
=== CRITIC ===
Score: 8/10
Issues: 1
Ready: true
Quality threshold met! Score: 8/10
=== FINAL RESULT ===
Score: 8/10
Iterations: 2
Content length: 5800 characters
7. Self-Reflection Pattern
A special case of the Critic-Refiner pattern is self-reflection, where a single agent critiques its own output. Instead of separate Critic and Refiner agents, one agent generates, reflects, and improves.
async function selfReflect(task, maxIterations = 3) {
console.log('\n=== SELF-REFLECTION AGENT ===');
const messages = [
{
role: 'system',
content: `You are an AI agent that generates content and then critically reflects on it to improve.
PROCESS:
1. First, produce your best attempt at the task.
2. Then, critically analyze your output: What is good? What is weak? What is missing?
3. Produce an improved version addressing your self-critique.
4. Repeat until you are satisfied.
FORMAT each response as:
## Draft
[your content]
## Self-Critique
- Strengths: [list]
- Weaknesses: [list]
- Missing: [list]
- Score: X/10
## Improved Version
[improved content — only if score < 8]`,
},
{ role: 'user', content: task },
];
let finalContent = '';
for (let i = 1; i <= maxIterations; i++) {
console.log(`\n--- Reflection Round ${i} ---`);
const response = await openai.chat.completions.create({
model: 'gpt-4o',
temperature: 0.4,
messages,
});
const output = response.choices[0].message.content;
messages.push({ role: 'assistant', content: output });
// Keep the latest full response as a fallback in case the loop
// ends without ever reaching the threshold
finalContent = output;
// Check if the agent is satisfied (self-assessed score >= 8)
const scoreMatch = output.match(/Score:\s*(\d+)\/10/);
const score = scoreMatch ? parseInt(scoreMatch[1], 10) : 0;
console.log(` Self-assessed score: ${score}/10`);
if (score >= 8) {
// Extract the final draft content
const draftMatch = output.match(/## Draft\n([\s\S]*?)(?=\n## Self-Critique)/);
const improvedMatch = output.match(/## Improved Version\n([\s\S]*?)$/);
finalContent = (improvedMatch ? improvedMatch[1] : draftMatch?.[1]) || output;
console.log(' Self-reflection complete — satisfied with quality.');
break;
}
// Ask for another round
messages.push({
role: 'user',
content: 'Continue improving based on your self-critique. Follow the same format.',
});
}
return finalContent;
}
Self-reflection vs separate agents
| Aspect | Self-Reflection | Separate Critic + Refiner |
|---|---|---|
| LLM calls | 1-3 calls (one agent) | 2-6 calls (multiple agents) |
| Cost | Lower (fewer calls) | Higher (more calls) |
| Critique quality | Model may be lenient on its own work | Separate Critic can be configured to be strict |
| Best for | Quick iteration, low-stakes content | High-stakes content requiring rigorous review |
| Blind spots | The model may not see flaws in its own work | A separate Critic catches issues the Generator would miss |
8. Critic-Refiner for Code Review
One of the most powerful applications is automated code review and improvement.
async function codeReviewLoop(code, requirements) {
const CODE_CRITIC_PROMPT = `You are a senior code reviewer. Evaluate this code against best practices.
CHECK FOR:
1. Correctness — Does it handle edge cases? Any bugs?
2. Security — SQL injection, XSS, input validation?
3. Performance — O(n^2) when O(n) is possible? Memory leaks?
4. Readability — Clear names, comments where needed, consistent style?
5. Error handling — Are errors caught and handled gracefully?
6. Testing — Is the code testable? Are there obvious test cases?
Return JSON:
{
"overall_score": <1-10>,
"issues": [
{ "severity": "critical|major|minor", "line": "...", "issue": "...", "fix": "..." }
],
"strengths": ["..."],
"ready_for_production": <boolean>
}`;
const CODE_REFINER_PROMPT = `You are a senior developer. Refactor the code to address all review feedback.
Return ONLY the improved code — no explanations.`;
let currentCode = code;
for (let i = 1; i <= 3; i++) {
console.log(`\n--- Code Review Round ${i} ---`);
// Critique
const critiqueResponse = await openai.chat.completions.create({
model: 'gpt-4o',
temperature: 0,
response_format: { type: 'json_object' },
messages: [
{ role: 'system', content: CODE_CRITIC_PROMPT },
{ role: 'user', content: `Requirements: ${requirements}\n\nCode:\n\`\`\`javascript\n${currentCode}\n\`\`\`` },
],
});
const review = JSON.parse(critiqueResponse.choices[0].message.content);
console.log(` Score: ${review.overall_score}/10`);
console.log(` Issues: ${review.issues.length}`);
if (review.ready_for_production || review.overall_score >= 9) {
console.log(' Code is production-ready!');
return { code: currentCode, review, iterations: i };
}
// Refine
const issueList = review.issues
.map((issue) => `[${issue.severity}] ${issue.line}: ${issue.issue} -> Fix: ${issue.fix}`)
.join('\n');
const refineResponse = await openai.chat.completions.create({
model: 'gpt-4o',
temperature: 0,
messages: [
{ role: 'system', content: CODE_REFINER_PROMPT },
{
role: 'user',
content: `Original code:\n\`\`\`javascript\n${currentCode}\n\`\`\`\n\nReview feedback:\n${issueList}\n\nProduce improved code:`,
},
],
});
currentCode = refineResponse.choices[0].message.content
.replace(/^```(?:javascript|js)?\s*\n?/, '')
.replace(/\n?```\s*$/, '');
console.log(` Refined: ${currentCode.length} characters`);
}
return { code: currentCode, iterations: 3, maxReached: true };
}
// Usage
const reviewed = await codeReviewLoop(
`
function fetchUsers(db, name) {
const query = "SELECT * FROM users WHERE name = '" + name + "'";
const result = db.query(query);
return result;
}`,
'Fetch users by name from database. Must be secure, handle errors, and return typed results.'
);
After iteration 1, the Critic would flag the SQL injection vulnerability, and the Refiner would switch to parameterized queries. After iteration 2, the Critic might flag missing error handling, prompting the Refiner to add try/catch. By iteration 3, the code is typically production-ready.
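For illustration, the code after those review rounds might look like the sketch below. The parameterized `db.query(sql, params)` interface is a hypothetical placeholder — real drivers (pg, mysql2, etc.) differ in the details:

```javascript
// Hypothetical result of two refinement rounds. The db.query(sql, params)
// signature is an illustrative assumption, not a specific driver's API.
async function fetchUsers(db, name) {
  if (typeof name !== 'string' || name.trim() === '') {
    throw new TypeError('name must be a non-empty string');
  }
  try {
    // Parameterized query removes the SQL injection flagged in round 1
    return await db.query('SELECT * FROM users WHERE name = ?', [name]);
  } catch (err) {
    // Error handling added after round-2 feedback
    throw new Error(`fetchUsers failed: ${err.message}`);
  }
}
```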
9. Configuring the Loop
Choosing the quality threshold
| Content Type | Recommended Threshold | Why |
|---|---|---|
| Internal notes | 6/10 | Speed matters more than polish |
| Blog posts | 7-8/10 | Good quality, doesn't need perfection |
| Documentation | 8/10 | Accuracy and clarity are critical |
| Legal/medical | 9/10 | Errors have serious consequences |
| Production code | 8-9/10 | Bugs are expensive to fix later |
Choosing max iterations
1 iteration = 3 LLM calls (generate + critique + refine)
2 iterations = 5 LLM calls (generate + critique + refine + critique + refine)
3 iterations = 7 LLM calls (the recommended maximum for most use cases; hitting the cap triggers one extra final critique in the implementation above)
Cost per LLM call (GPT-4o, ~2000 input and ~2000 output tokens):
Input: ~2000 tokens × $2.50/1M = $0.005
Output: ~2000 tokens × $10.00/1M = $0.020
Per call: ~$0.025, so each critique + refine iteration costs ~$0.05
A full 3-iteration run (7 calls): ~$0.175
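These figures fold into a quick estimator. The sketch below counts the initial generate call plus one critique and one refine per iteration, and assumes ~2000 input and ~2000 output tokens per call at the GPT-4o prices quoted above:

```javascript
// Rough cost estimate for one full loop run. The token counts and the
// generate-plus-(critique+refine)-per-iteration call count are assumptions
// based on the figures above.
function estimateRunCost(iterations, tokensPerCall = 2000) {
  const calls = 1 + 2 * iterations;              // generate + per-iteration critique/refine
  const inputUSD = (tokensPerCall / 1e6) * 2.5;  // $2.50 per 1M input tokens
  const outputUSD = (tokensPerCall / 1e6) * 10;  // $10.00 per 1M output tokens
  return { calls, totalUSD: calls * (inputUSD + outputUSD) };
}
```

For example, `estimateRunCost(3)` reports 7 calls; halving `tokensPerCall` halves the dollar estimate.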
For high-volume applications, consider:
- 1 iteration for 80% of requests (good enough)
- 2 iterations for edge cases (Critic flagged critical issues)
- 3 iterations maximum (diminishing returns beyond this)
Diminishing returns detection
function shouldContinueIterating(history) {
if (history.length < 2) return true;
const lastTwo = history.filter((h) => h.type === 'critique').slice(-2);
if (lastTwo.length < 2) return true;
const scoreDelta = lastTwo[1].score - lastTwo[0].score;
// If improvement between last two iterations is < 1 point, stop
if (scoreDelta < 1) {
console.log(` Diminishing returns: score improved only ${scoreDelta} points. Stopping.`);
return false;
}
return true;
}
10. Design Considerations
Preventing infinite loops
Always have TWO exit conditions:
- Quality threshold met (the happy path)
- Maximum iterations reached (the safety net)
const MAX_ITERATIONS = 3; // Hard ceiling
const QUALITY_THRESHOLD = 8; // Exit when score >= this
const MIN_IMPROVEMENT = 0.5; // Exit if improvement per iteration < this
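All three conditions can live in one helper so the loop body stays readable. A sketch — the helper and its option names are illustrative, with defaults mirroring the constants above; `previousScore` is undefined on the first critique:

```javascript
// One place for all exit decisions. Defaults mirror the constants above;
// previousScore is undefined on the first critique, which skips the
// diminishing-returns check.
function shouldExit(
  { score, iteration, previousScore },
  { maxIterations = 3, qualityThreshold = 8, minImprovement = 0.5 } = {}
) {
  if (score >= qualityThreshold) return { exit: true, reason: 'threshold met' };
  if (iteration >= maxIterations) return { exit: true, reason: 'max iterations' };
  if (previousScore !== undefined && score - previousScore < minImprovement) {
    return { exit: true, reason: 'diminishing returns' };
  }
  return { exit: false, reason: null };
}
```

Returning a reason string alongside the boolean also gives the iteration history a clean record of why each run stopped.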
Critic consistency
The Critic should produce consistent scores. Run the same content through the Critic multiple times — if scores vary by more than 1 point, the Critic's prompt needs refinement.
// Test Critic consistency
async function testCriticConsistency(content, criteria, runs = 5) {
const scores = [];
for (let i = 0; i < runs; i++) {
const feedback = await critique(content, criteria);
scores.push(feedback.overall_score);
}
const avg = scores.reduce((a, b) => a + b) / scores.length;
const variance = scores.reduce((a, b) => a + (b - avg) ** 2, 0) / scores.length;
console.log(`Scores: ${scores.join(', ')}`);
console.log(`Average: ${avg.toFixed(1)}, Variance: ${variance.toFixed(2)}`);
// Variance > 1.0 suggests the Critic is too inconsistent
return { scores, avg, variance };
}
Combining with other patterns
The Critic-Refiner loop is often the final stage in a pipeline:
Planner-Executor → produces raw output
Researcher-Writer → produces a draft report
→ Critic-Refiner → polishes the draft to publication quality
11. Key Takeaways
- The Critic-Refiner pattern implements iterative improvement — generate, critique, refine, repeat. It models the human draft-review-revise cycle.
- The Critic must be strict and specific — vague feedback like "needs improvement" is useless. Good feedback names the exact location, issue, and suggested fix.
- Two exit conditions prevent infinite loops — quality threshold (happy path) and max iterations (safety net). Monitor for diminishing returns.
- Self-reflection is a lighter alternative — one agent critiques its own work. Cheaper but less rigorous than separate Critic and Refiner agents.
- Code review is a killer use case — the Critic finds bugs, security issues, and style violations; the Refiner produces improved code. Multiple rounds catch more issues than a single pass.
- Cost scales linearly with iterations — each iteration is 2 LLM calls (critique + refine). Most tasks converge in 2-3 iterations. Diminishing returns detection saves money.
Explain-It Challenge
- A team lead asks "why not just ask GPT to write a better article in one shot instead of doing this loop?" Explain the advantage of iterative improvement with a specific Critic.
- Your Critic gives the same content a score of 6 on one run and 9 on another run. What is wrong, and how do you fix it?
- You are building a legal document generator. Choose the quality threshold, max iterations, and explain your reasoning. What happens if you set the threshold too low? Too high?
Navigation: ← 4.16.b Researcher-Writer · 4.16.d — Router Agents →