Episode 4 — Generative AI Engineering / 4.3 — Prompt Engineering Fundamentals

4.3 — Prompt Engineering Fundamentals: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps — reopen README.md, sections 4.3.a–4.3.d.
  3. Practice with 4.3-Exercise-Questions.md.
  4. Polish answers with 4.3-Interview-Questions.md.

Core vocabulary

| Term | One-liner |
| --- | --- |
| Prompt engineering | Designing instructions that make LLMs produce reliable, useful output |
| System prompt | Stable instructions (persona, rules, format) — same across requests |
| User prompt | Per-request content (task, input data, specific question) |
| Zero-shot | No examples — only instructions |
| One-shot | One example before the actual task |
| Few-shot | Multiple examples (typically 3-7) before the task |
| Chain-of-thought (CoT) | Asking the model to reason step by step before answering |
| Persona/role | Assigning an identity to frame the model's expertise and vocabulary |
| Delimiters | Markers (XML tags, `---`, `===`) that separate sections in input/output |
| "Respond ONLY with..." | Pattern to prevent the model from adding extra text around structured output |
| Defense in depth | Prompt instructions + API format enforcement + code validation |

Six dimensions of a clear prompt

1. ROLE        → Who is the model? (persona/expertise)
2. TASK        → What should it do? (specific action verb)
3. CONTEXT     → What background? (constraints, scope)
4. FORMAT      → What shape is the output? (JSON, list, table)
5. TONE        → How should it sound? (formal, casual, technical)
6. BOUNDARIES  → What should it NOT do? (limits, exclusions)
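As a minimal sketch, the six dimensions can map one-to-one onto the lines of a system prompt. The support-bot scenario, wording, and policy numbers below are hypothetical, purely to show one line per dimension:

```javascript
// Hypothetical example: one line per dimension of a clear prompt.
const prompt = [
  "You are a senior billing-support agent.",                               // 1. ROLE
  "Summarize the customer's complaint in at most 2 sentences.",            // 2. TASK
  "The customer is on the Pro plan; refunds under $50 are auto-approved.", // 3. CONTEXT
  'Output a JSON object: { "summary": string, "refundEligible": boolean }.', // 4. FORMAT
  "Use a neutral, professional tone.",                                     // 5. TONE
  "Do NOT promise refunds over $50 or quote internal policy.",             // 6. BOUNDARIES
].join("\n");
```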

Do vs Don't cheat sheet

| Don't (weaker) | Do (stronger) |
| --- | --- |
| Don't be vague | Be specific — include numbers, names, dates |
| Don't be long | Keep under 50 words |
| Don't use jargon | Write at 8th-grade reading level |
| Don't guess | If uncertain, say "I'm not sure" |
| Don't hallucinate | Base claims on provided documents; cite source |
| Don't include extra text | Respond ONLY with JSON |

Best practice: combine both — "Do" for direction, "Don't" to close loopholes.


Few-shot at a glance

Zero-shot:   "Classify as positive/negative" → model guesses format
One-shot:    + 1 example → model sees the pattern once
Few-shot:    + 3-7 examples → model sees pattern multiple times → reliable

DIMINISHING RETURNS:
  0 examples → ~80% accuracy
  3 examples → ~93% accuracy (+13%)
  5 examples → ~95% accuracy (+2%)
  10 examples → ~96% accuracy (+1%)
  20 examples → ~96.5% (+0.5%) ← token waste

SWEET SPOT: 3-5 examples

Effective example design

✓ Cover ALL output labels (positive, negative, neutral)
✓ Include edge cases (ambiguous inputs)
✓ Use diverse phrasing (not all the same structure)
✓ Keep examples concise (minimal tokens for max signal)
✓ Use consistent formatting (same delimiters everywhere)
✗ Avoid biased label distribution
✗ Avoid examples too long or unrepresentative of real data
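One common way to pack few-shot examples into a chat request is as alternating user/assistant message pairs before the real input. A sketch, assuming OpenAI-style message roles; the sentiment examples are made up:

```javascript
// Sketch: few-shot sentiment classification as chat messages.
// One example per label so ALL output labels are covered.
const examples = [
  { input: "Love it, works perfectly!", label: "positive" },
  { input: "Broke after two days.",     label: "negative" },
  { input: "It arrived on time.",       label: "neutral"  },
];

const messages = [
  { role: "system", content: "Classify the review as positive, negative, or neutral. Respond with ONLY the label." },
  // Each example becomes a user/assistant pair so the model sees the exact pattern.
  ...examples.flatMap(({ input, label }) => [
    { role: "user", content: input },
    { role: "assistant", content: label },
  ]),
  { role: "user", content: "Decent, I guess." }, // the actual request
];
```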

Chain-of-thought (CoT)

WHAT:   Ask model to show reasoning steps before final answer
HOW:    "Let's think step by step" or structured template
WHY:    Each reasoning token provides context for the next step

USE CoT for:                SKIP CoT for:
  ✓ Math / calculations       ✗ Simple facts ("capital of?")
  ✓ Logic / reasoning         ✗ Direct extraction
  ✓ Multi-step problems       ✗ Clear-cut classification
  ✓ Debugging code            ✗ Translation
  ✓ Comparison / analysis     ✗ Formatting / rewriting
  ✓ Ambiguous decisions       ✗ Token-tight budgets
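The use/skip split above can be encoded as a simple gate: append the CoT trigger only for task types that benefit. The task-type strings and function name below are assumptions for illustration:

```javascript
// Sketch: gate chain-of-thought on a crude task-type heuristic.
const COT_TASKS = new Set(["math", "logic", "multi-step", "debugging", "analysis"]);

function buildInstruction(taskType, question) {
  const base = `Answer the following: ${question}`;
  return COT_TASKS.has(taskType)
    ? `${base}\nLet's think step by step, then give the final answer.`
    : base; // simple facts, extraction, translation: skip CoT to save tokens
}
```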

CoT cost impact

Without CoT:  ~10 output tokens/request
With CoT:     ~150 output tokens/request
Cost multiplier: ~15x more output tokens

At scale (200K requests/month, $10/1M output tokens):
  Without: $20/month
  With:    $300/month
  
STRATEGY: Selective CoT — only for complex requests
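The monthly figures above follow directly from tokens-per-request times volume times price. Reproducing the arithmetic:

```javascript
// Cost arithmetic: 200K requests/month at $10 per 1M output tokens.
const requests = 200_000;
const pricePerMTok = 10; // USD per 1,000,000 output tokens

const monthlyCost = (tokensPerRequest) =>
  (requests * tokensPerRequest / 1_000_000) * pricePerMTok;

const withoutCoT = monthlyCost(10);  // ~10 output tokens/request
const withCoT    = monthlyCost(150); // ~150 output tokens/request
```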

Visible vs hidden reasoning

VISIBLE (standard CoT):
  + You can verify/debug each step
  + Users see the "why"
  - More output tokens (costs more)
  - Reasoning may be post-hoc rationalization

HIDDEN (reasoning models like o1/o3):
  + Often higher accuracy on complex tasks
  + Cleaner output
  - Cannot inspect reasoning
  - Harder to debug

Output formatting

The golden rule

IF CODE PARSES THE OUTPUT → YOU MUST SPECIFY FORMAT

JSON output checklist

□ Temperature 0
□ "Respond ONLY with valid JSON"
□ Show exact schema with key names and types
□ Specify null handling (not "N/A", not "", not "unknown")
□ Numbers as numbers (not "$12.99" → use 12.99)
□ Few-shot examples showing the exact JSON structure
□ response_format: json_object (or json_schema)
□ JSON.parse() in try-catch
□ Strip markdown code fences before parsing
□ Validate required fields and types in code
□ Retry logic for parse failures
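The code-side items of the checklist (strip fences, parse in try-catch, validate required fields, signal for retry) can be sketched as one helper; the function name and return shape are assumptions:

```javascript
// Sketch: harden JSON parsing of model output.
function parseModelJson(raw, requiredKeys) {
  // Strip markdown code fences the model may add despite instructions.
  const cleaned = raw
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/```\s*$/, "")
    .trim();
  let data;
  try {
    data = JSON.parse(cleaned);
  } catch {
    return { ok: false, error: "invalid JSON" }; // caller can trigger retry logic
  }
  // Validate required fields in code, not just in the prompt.
  for (const key of requiredKeys) {
    if (!(key in data)) return { ok: false, error: `missing field: ${key}` };
  }
  return { ok: true, data };
}
```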

"Respond ONLY with..." variations

JSON:   "Respond ONLY with valid JSON. No explanation. No code fences."
Label:  "Respond with ONLY the category name. One word. No punctuation."
List:   "Respond ONLY with the items, one per line. No bullets. No headers."
Code:   "Respond ONLY with the code. No explanation. No markdown."
Number: "Respond with ONLY the number. No text. No units."

Defense in depth

Layer 1: PROMPT    → "Respond ONLY with JSON" + schema + examples
Layer 2: API       → response_format: json_schema (strict)
Layer 3: CODE      → JSON.parse() + field validation + type checks

Each layer catches what the previous layer misses.

Common formatting failures

| Failure | Prevention |
| --- | --- |
| Wrapper text ("Here's the JSON:") | "Respond ONLY with..." in system prompt |
| Markdown code fences (```json) | "Do not use code fences" + strip in code |
| Wrong field names | Show EXACT keys in schema |
| Wrong types (string "30" vs number 30) | Specify types explicitly + validate |
| Array instead of object | "Return a JSON OBJECT (not array)" |
| Missing fields | "All fields required. Use null for missing." |
| Extra fields | "Do not add keys not in the schema" |

Prompt structure template

// System prompt (stable across requests)
const system = `You are [PERSONA].

TASK: [What to do]

RULES:
- [Rule 1]
- [Rule 2]
- [Rule 3]

OUTPUT FORMAT:
[Exact schema or structure]

DO:
- [Positive instruction 1]
- [Positive instruction 2]

DON'T:
- [Negative instruction 1]
- [Negative instruction 2]

Respond ONLY with [format]. No extra text.`;

// User prompt (changes per request)
const user = `[Examples if using few-shot]

[Actual input/task for this request]`;
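Filling the template and assembling it into a request body might look like the following. The ticket-triage scenario is hypothetical, and the model name and `response_format` shape assume an OpenAI-style chat API:

```javascript
// Sketch: template filled in and packed into a chat request body.
const system = `You are a support-ticket triager.

TASK: Classify the ticket and extract the product name.

OUTPUT FORMAT:
{ "category": "billing" | "bug" | "other", "product": string | null }

Respond ONLY with valid JSON. No extra text.`;

const user = `Ticket: "I was double-charged for AcmeSync last month."`;

const requestBody = {
  model: "gpt-4o-mini",                      // model name is an assumption
  temperature: 0,                            // deterministic output for parsing
  response_format: { type: "json_object" },  // API-level format enforcement
  messages: [
    { role: "system", content: system },     // stable across requests
    { role: "user", content: user },         // changes per request
  ],
};
```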

Cost optimization quick reference

TECHNIQUE                    TOKEN IMPACT    ACCURACY IMPACT
Trim system prompt           -20-40% input   Usually none
Reduce few-shot (10→4)       -30% input      -1-2% accuracy
Remove CoT from simple tasks -80% output     None on simple tasks
Model tiering (mini/full)    -60-80% cost    Varies by task
Dynamic example selection    Same tokens     +2-5% accuracy
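Model tiering from the table above can be sketched as a router that sends only complex requests to the expensive tier. The complexity signals, threshold, and model names below are all assumptions:

```javascript
// Sketch: route simple requests to a cheaper model tier.
function pickModel(task) {
  const complex =
    task.needsReasoning ||    // multi-step / CoT-style work
    task.inputTokens > 4000;  // long documents (threshold is a guess)
  return complex ? "full-model" : "mini-model"; // placeholder model names
}
```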

Prompt iteration workflow

1. Write basic prompt
2. Run 5-10 times on diverse inputs
3. Identify most common failure
4. Fix with: more specificity / examples / format rules
5. Re-run 5-10 times
6. Repeat until consistent
7. Validate on held-out test set
8. Version the prompt in code

When to use what

| Situation | Technique |
| --- | --- |
| Model gives random format | Add output format specification |
| Model misunderstands the task | Add persona + clearer instructions |
| Model gets edge cases wrong | Add few-shot examples covering edges |
| Model makes reasoning errors | Add chain-of-thought |
| JSON parsing keeps failing | Add response_format + code validation |
| Output is inconsistent | Temperature 0 + few-shot + format spec |
| Cost is too high | Reduce examples, selective CoT, model tiering |

End of 4.3 quick revision.