Episode 4 — Generative AI Engineering / 4.3 — Prompt Engineering Fundamentals
4.3 — Prompt Engineering Fundamentals: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim before labs or interviews.
- Drill gaps — reopen README.md → 4.3.a…4.3.d.
- Practice — 4.3-Exercise-Questions.md.
- Polish answers — 4.3-Interview-Questions.md.
Core vocabulary
| Term | One-liner |
|---|---|
| Prompt engineering | Designing instructions that make LLMs produce reliable, useful output |
| System prompt | Stable instructions (persona, rules, format) — same across requests |
| User prompt | Per-request content (task, input data, specific question) |
| Zero-shot | No examples — only instructions |
| One-shot | One example before the actual task |
| Few-shot | Multiple examples (typically 3-7) before the task |
| Chain-of-thought (CoT) | Asking the model to reason step by step before answering |
| Persona/role | Assigning an identity to frame the model's expertise and vocabulary |
| Delimiters | Markers (XML tags, ---, ===) that separate sections in input/output |
| "Respond ONLY with..." | Pattern to prevent the model from adding extra text around structured output |
| Defense in depth | Prompt instructions + API format enforcement + code validation |
Six dimensions of a clear prompt
1. ROLE → Who is the model? (persona/expertise)
2. TASK → What should it do? (specific action verb)
3. CONTEXT → What background? (constraints, scope)
4. FORMAT → What shape is the output? (JSON, list, table)
5. TONE → How should it sound? (formal, casual, technical)
6. BOUNDARIES → What should it NOT do? (limits, exclusions)
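The six dimensions map naturally onto a small prompt builder; a minimal sketch in JavaScript (the field names and the example values are illustrative, not a standard API):

```javascript
// Assemble a system prompt from the six dimensions of a clear prompt.
// All field names and sample values below are illustrative.
function buildSystemPrompt({ role, task, context, format, tone, boundaries }) {
  return [
    `You are ${role}.`,              // ROLE
    `TASK: ${task}`,                 // TASK
    `CONTEXT: ${context}`,           // CONTEXT
    `OUTPUT FORMAT: ${format}`,      // FORMAT
    `TONE: ${tone}`,                 // TONE
    `DO NOT: ${boundaries}`,         // BOUNDARIES
  ].join("\n");
}

const prompt = buildSystemPrompt({
  role: "a senior support engineer",
  task: "Classify each ticket as billing, technical, or other",
  context: "Tickets come from a SaaS invoicing product",
  format: "One word: billing | technical | other",
  tone: "Neutral and terse",
  boundaries: "Invent new categories; guess when the ticket is empty",
});
```

Keeping the builder in code (rather than hand-editing strings) also makes step 8 of the iteration workflow — versioning the prompt — trivial.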
Do vs Don't cheat sheet
| Don't (weaker) | Do (stronger) |
|---|---|
| Don't be vague | Be specific — include numbers, names, dates |
| Don't be long | Keep under 50 words |
| Don't use jargon | Write at 8th-grade reading level |
| Don't guess | If uncertain, say "I'm not sure" |
| Don't hallucinate | Base claims on provided documents; cite source |
| Don't include extra text | Respond ONLY with JSON |
Best practice: combine both — "Do" for direction, "Don't" to close loopholes.
Few-shot at a glance
Zero-shot: "Classify as positive/negative" → model guesses format
One-shot: + 1 example → model sees the pattern once
Few-shot: + 3-7 examples → model sees pattern multiple times → reliable
DIMINISHING RETURNS (illustrative numbers):
0 examples → ~80% accuracy
3 examples → ~93% accuracy (+13%)
5 examples → ~95% accuracy (+2%)
10 examples → ~96% accuracy (+1%)
20 examples → ~96.5% (+0.5%) ← token waste
SWEET SPOT: 3-5 examples
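In chat APIs, few-shot examples are usually passed as alternating user/assistant turns before the real input; a sketch using the common OpenAI-style message shape (adjust for your SDK):

```javascript
// Build a few-shot chat: system instruction, then one user/assistant
// pair per example, then the actual input as the final user turn.
function fewShotMessages(examples, input) {
  const messages = [
    {
      role: "system",
      content:
        "Classify the review as positive, negative, or neutral. " +
        "Respond with ONLY the label.",
    },
  ];
  for (const ex of examples) {
    messages.push({ role: "user", content: ex.text });
    messages.push({ role: "assistant", content: ex.label });
  }
  messages.push({ role: "user", content: input });
  return messages;
}

const msgs = fewShotMessages(
  [
    { text: "Absolutely love it, works perfectly.", label: "positive" },
    { text: "Broke after two days. Refund requested.", label: "negative" },
    { text: "It arrived on Tuesday.", label: "neutral" }, // edge case: no sentiment
  ],
  "The battery life is shorter than advertised."
);
```

Note the three examples cover all three labels and include an ambiguous edge case — exactly the design checklist below.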
Effective example design
✓ Cover ALL output labels (positive, negative, neutral)
✓ Include edge cases (ambiguous inputs)
✓ Use diverse phrasing (not all the same structure)
✓ Keep examples concise (minimal tokens for max signal)
✓ Use consistent formatting (same delimiters everywhere)
✗ Biased label distribution (e.g., mostly one label)
✗ Overlong examples, or ones unrepresentative of real inputs
Chain-of-thought (CoT)
WHAT: Ask model to show reasoning steps before final answer
HOW: "Let's think step by step" or structured template
WHY: Each reasoning token provides context for the next step
| ✓ USE CoT for | ✗ SKIP CoT for |
|---|---|
| Math / calculations | Simple facts ("capital of?") |
| Logic / reasoning | Direct extraction |
| Multi-step problems | Clear-cut classification |
| Debugging code | Translation |
| Comparison / analysis | Formatting / rewriting |
| Ambiguous decisions | Token-tight budgets |
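A structured CoT template keeps the reasoning separate from a machine-readable final answer; a sketch (the tag names are a convention, not a requirement):

```javascript
// Wrap a question in a chain-of-thought template that isolates
// the final answer inside a parseable tag.
function cotPrompt(question) {
  return (
    `${question}\n\n` +
    "Think step by step inside <reasoning>...</reasoning>,\n" +
    "then give the final answer alone inside <answer>...</answer>."
  );
}

// Pull only the final answer out of a CoT response.
function extractAnswer(response) {
  const m = response.match(/<answer>([\s\S]*?)<\/answer>/);
  return m ? m[1].trim() : null;
}
```

Isolating the answer this way means downstream code never has to parse the reasoning text itself.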
CoT cost impact
Without CoT: ~10 output tokens/request
With CoT: ~150 output tokens/request
Cost multiplier: ~15x more output tokens
At scale (200K requests/month, $10/1M output tokens):
Without: $20/month
With: $300/month
STRATEGY: Selective CoT — only for complex requests
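The cost figures above follow directly from tokens × volume × price; a quick check:

```javascript
// Monthly output-token cost: tokens/request * requests/month,
// priced in dollars per 1M output tokens.
function monthlyCost(tokensPerRequest, requestsPerMonth, dollarsPerMillionTokens) {
  return (tokensPerRequest * requestsPerMonth / 1_000_000) * dollarsPerMillionTokens;
}

const withoutCoT = monthlyCost(10, 200_000, 10);  // $20/month
const withCoT = monthlyCost(150, 200_000, 10);    // $300/month
```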
Visible vs hidden reasoning
VISIBLE (standard CoT):
+ You can verify/debug each step
+ Users see the "why"
- More output tokens (costs more)
- Reasoning may be post-hoc rationalization
HIDDEN (reasoning models like o1/o3):
+ Often higher accuracy on complex tasks
+ Cleaner output
- Cannot inspect reasoning
- Harder to debug
Output formatting
The golden rule
IF CODE PARSES THE OUTPUT → YOU MUST SPECIFY FORMAT
JSON output checklist
□ Temperature 0
□ "Respond ONLY with valid JSON"
□ Show exact schema with key names and types
□ Specify null handling (not "N/A", not "", not "unknown")
□ Numbers as numbers (not "$12.99" → use 12.99)
□ Few-shot examples showing the exact JSON structure
□ response_format: json_object (or json_schema)
□ JSON.parse() in try-catch
□ Strip markdown code fences before parsing
□ Validate required fields and types in code
□ Retry logic for parse failures
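The last four checklist items live in code, not in the prompt; a minimal sketch of a tolerant parser (returning `null` so the caller can drive the retry logic):

```javascript
// Strip optional markdown code fences, then parse.
// Returns null on failure so the caller can retry instead of crashing.
function parseModelJson(raw) {
  const stripped = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "")  // leading ``` or ```json
    .replace(/\s*```$/, "");           // trailing ```
  try {
    return JSON.parse(stripped);
  } catch {
    return null; // signal the caller to retry the request
  }
}
```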
"Respond ONLY with..." variations
JSON: "Respond ONLY with valid JSON. No explanation. No code fences."
Label: "Respond with ONLY the category name. One word. No punctuation."
List: "Respond ONLY with the items, one per line. No bullets. No headers."
Code: "Respond ONLY with the code. No explanation. No markdown."
Number: "Respond with ONLY the number. No text. No units."
Defense in depth
Layer 1: PROMPT → "Respond ONLY with JSON" + schema + examples
Layer 2: API → response_format: json_schema (strict)
Layer 3: CODE → JSON.parse() + field validation + type checks
Each layer catches what the previous layer misses.
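Layer 3 can be a small hand-rolled schema check; a sketch (the order-extraction schema at the bottom is a made-up example):

```javascript
// Layer 3: validate required fields and types after parsing.
// schema maps field name -> expected typeof; null is allowed for any field.
function validateFields(obj, schema) {
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) {
    return ["expected a JSON object"];
  }
  const errors = [];
  for (const [key, type] of Object.entries(schema)) {
    if (!(key in obj)) {
      errors.push(`missing field: ${key}`);
    } else if (obj[key] !== null && typeof obj[key] !== type) {
      errors.push(`wrong type for ${key}: expected ${type}`);
    }
  }
  return errors;
}

// Illustrative schema for an order-extraction task:
const orderSchema = { customer: "string", total: "number", rush: "boolean" };
```

In production, a schema-validation library is the usual replacement for this hand-rolled check; the logic is the same.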
Common formatting failures
| Failure | Prevention |
|---|---|
| Wrapper text ("Here's the JSON:") | "Respond ONLY with..." in system prompt |
| Markdown code fences (```json) | "Do not use code fences" + strip in code |
| Wrong field names | Show EXACT keys in schema |
| Wrong types (string "30" vs number 30) | Specify types explicitly + validate |
| Array instead of object | "Return a JSON OBJECT (not array)" |
| Missing fields | "All fields required. Use null for missing." |
| Extra fields | "Do not add keys not in the schema" |
Prompt structure template
```js
// System prompt (stable across requests)
const system = `You are [PERSONA].

TASK: [What to do]

RULES:
- [Rule 1]
- [Rule 2]
- [Rule 3]

OUTPUT FORMAT:
[Exact schema or structure]

DO:
- [Positive instruction 1]
- [Positive instruction 2]

DON'T:
- [Negative instruction 1]
- [Negative instruction 2]

Respond ONLY with [format]. No extra text.`;

// User prompt (changes per request)
const user = `[Examples if using few-shot]

[Actual input/task for this request]`;
```
Cost optimization quick reference
| Technique | Token impact | Accuracy impact |
|---|---|---|
| Trim system prompt | -20% to -40% input | Usually none |
| Reduce few-shot (10→4) | -30% input | -1% to -2% accuracy |
| Remove CoT from simple tasks | -80% output | None on simple tasks |
| Model tiering (mini/full) | -60% to -80% cost | Varies by task |
| Dynamic example selection | Same tokens | +2% to +5% accuracy |
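Selective CoT and model tiering are both routing decisions made before the API call; a sketch of a trivial router (the keyword heuristic and model names are placeholders — real routers use better complexity signals):

```javascript
// Route simple requests to a cheap model with no CoT, and complex
// requests to a stronger model with CoT. The regex heuristic and
// model names below are placeholders, not recommendations.
function route(request) {
  const complex = /\b(why|compare|calculate|debug|step)\b/i.test(request);
  return {
    model: complex ? "big-model" : "small-model", // placeholder names
    prompt: complex
      ? `${request}\n\nThink step by step before answering.`
      : request,
  };
}
```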
Prompt iteration workflow
1. Write basic prompt
2. Run 5-10 times on diverse inputs
3. Identify most common failure
4. Fix with: more specificity / examples / format rules
5. Re-run 5-10 times
6. Repeat until consistent
7. Validate on held-out test set
8. Version the prompt in code
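Steps 2, 5, and 7 all amount to scoring a prompt against labeled inputs; a sketch of that scoring loop (`callModel` is a stand-in for your actual API call, made synchronous here for brevity):

```javascript
// Score a prompt function against labeled test cases.
// callModel is a stand-in for the real model call, supplied by the caller.
function accuracy(promptFn, cases, callModel) {
  let correct = 0;
  for (const { input, expected } of cases) {
    if (callModel(promptFn(input)).trim() === expected) correct++;
  }
  return correct / cases.length;
}
```

Re-running this after every prompt change (step 5) turns "repeat until consistent" into a number you can track across prompt versions.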
When to use what
| Situation | Technique |
|---|---|
| Model gives random format | Add output format specification |
| Model misunderstands the task | Add persona + clearer instructions |
| Model gets edge cases wrong | Add few-shot examples covering edges |
| Model makes reasoning errors | Add chain-of-thought |
| JSON parsing keeps failing | Add response_format + code validation |
| Output is inconsistent | Temperature 0 + few-shot + format spec |
| Cost is too high | Reduce examples, selective CoT, model tiering |
End of 4.3 quick revision.