Episode 4 — Generative AI Engineering / 4.3 — Prompt Engineering Fundamentals

4.3 — Exercise Questions: Prompt Engineering Fundamentals

Practice questions for all four subtopics in Section 4.3. A mix of conceptual, rewrite, and hands-on tasks.

How to use this material (instructions)

  1. Read lessons in order: README.md first, then 4.3.a through 4.3.d.
  2. Answer closed-book first — then compare to the matching lesson.
  3. Test with a real LLM — run the prompts you write and compare outputs.
  4. Interview prep: see 4.3-Interview-Questions.md.
  5. Quick review: see 4.3-Quick-Revision.md.

4.3.a — Writing Clear Instructions (Q1–Q14)

Q1. Name the six dimensions of a clear prompt. Give a one-line description of each.

Q2. Rewrite this vague prompt to be specific: "Tell me about databases." Target audience: a junior backend developer choosing between PostgreSQL and MongoDB.

Q3. Explain why persona assignment works. What changes in the model's output when you add "You are a senior security auditor" to the system prompt?

Q4. Convert this paragraph-style instruction into a numbered step-by-step format: "Look at this code and tell me what's wrong with it, what the time complexity is, and how to fix it, and also explain what it does."

Q5. Rewrite these "Don't" instructions as "Do" instructions:

  • "Don't be too technical."
  • "Don't include irrelevant details."
  • "Don't make up facts."

Q6. You are building a customer support chatbot for an e-commerce company. Write a complete system prompt that covers: persona, task scope, format, tone, and boundaries.

Q7. What is the difference between the system prompt and the user prompt? Give one example of something that belongs in each.

Q8. A prompt returns different output structures every time you run it. The same input sometimes gives a list, sometimes a paragraph, sometimes a table. What is missing from the prompt?

Q9. Write a prompt that asks the model to extract three fields (title, author, year) from a book description and return them as a JSON object. Include null handling for missing fields.

Q10. Why is "Write good code" a bad prompt? Rewrite it as a specific, actionable instruction for a model that generates JavaScript utility functions.

Q11. Your system prompt is 3,500 tokens long. The model seems to ignore instructions near the middle. What are two strategies to fix this without simply shortening the prompt?

Q12. Compare these two prompts and predict which produces better output:

  • A: "Summarize this article."
  • B: "Summarize this article in 3 bullet points. Each bullet: one sentence, under 25 words. Focus on: main claim, evidence, and conclusion."

Explain WHY B is better using prompt engineering principles.

Q13. Write a prompt for a model that grades student essays on a 1-10 scale. The prompt should produce consistent scores across multiple runs. Include all six dimensions.

Q14. Hands-on: Take any prompt you've used with ChatGPT or an API. Rewrite it applying at least 4 techniques from 4.3.a. Run both versions and compare the outputs.


4.3.b — Few-Shot Examples (Q15–Q28)

Q15. Define zero-shot, one-shot, and few-shot prompting. Give a one-sentence example of each for a sentiment classification task.

Q16. You have a classification task with 5 possible labels. You include 3 examples, all labeled "billing." What problem will this cause?

Q17. Why are edge case examples more valuable than clear-cut examples in a few-shot prompt? Illustrate with a sentiment analysis example.

Q18. Write a few-shot prompt (3 examples) for classifying emails into: urgent, normal, spam. Include at least one edge case.

Q19. Your few-shot prompt uses arrows (→) in examples 1 and 3, but colons (:) in example 2. What impact does this inconsistency have on the model's output?

Q20. Calculate the token cost difference between a zero-shot and a 5-shot version of a classification prompt. Assume: zero-shot instruction = 100 tokens, each example = 60 tokens, GPT-4o at $2.50/1M input tokens, 500,000 requests/month.
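The arithmetic Q20 asks for can be sketched as a small cost helper. The pricing and token counts below are the question's stated assumptions, not current rates:

```javascript
// Estimate monthly input-token cost for a prompt variant.
// tokensPerRequest and pricePerMillion come from the question's assumptions.
function monthlyInputCost(tokensPerRequest, requestsPerMonth, pricePerMillion) {
  const totalTokens = tokensPerRequest * requestsPerMonth;
  return (totalTokens / 1_000_000) * pricePerMillion;
}

const zeroShot = monthlyInputCost(100, 500_000, 2.5);          // $125
const fiveShot = monthlyInputCost(100 + 5 * 60, 500_000, 2.5); // $500
console.log(fiveShot - zeroShot);                              // $375/month extra
```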

Q21. What is the diminishing returns curve for few-shot examples? At what point should you stop adding examples?

Q22. Write a few-shot prompt for data extraction — extracting name, email, and phone from unstructured text. Include 3 examples demonstrating: all fields present, one field missing (null), and a tricky format.

Q23. When is it better to use detailed instructions instead of few-shot examples? Name three scenarios.

Q24. What are negative examples in few-shot prompting? Write a prompt that uses one positive and one negative example to teach the model not to include titles (Dr., Mr., etc.) in extracted names.

Q25. Explain dynamic example selection. Why would you select different examples for different inputs instead of using the same examples every time?

Q26. A colleague says "I'll just add 25 examples to be safe." What are the three problems with using too many examples?

Q27. Write a few-shot prompt for text transformation: converting informal addresses into standardized US postal format. Include 3 examples covering: abbreviations, missing zip codes, and all-caps input.

Q28. Hands-on: Build a classification prompt using zero-shot, then add 1 example, then 3, then 5. Test each on 10 inputs and track accuracy. At what point did accuracy stop improving?


4.3.c — Chain-of-Thought (Q29–Q40)

Q29. Explain chain-of-thought (CoT) prompting in one paragraph. Why does forcing the model to "show its work" improve accuracy?

Q30. What is the difference between zero-shot CoT (adding "Let's think step by step") and few-shot CoT (providing reasoning traces in examples)?

Q31. Write a CoT prompt for this problem: "A store sells apples for $1.50 each. A customer has a 20% off coupon and buys 8 apples. How much do they pay?" Show the expected reasoning structure.

Q32. Name three types of tasks where CoT significantly improves accuracy. For each, explain WHY reasoning steps help.

Q33. Name three types of tasks where CoT is wasteful. For each, explain why the model doesn't need intermediate reasoning.

Q34. Calculation: A CoT prompt generates ~150 output tokens per request vs ~10 tokens without CoT. At GPT-4o pricing ($10/1M output tokens), what is the monthly cost difference for 200,000 requests?

Q35. Explain the difference between visible reasoning (standard CoT) and hidden reasoning (reasoning models like o1). What can you do with visible reasoning that you cannot do with hidden reasoning?

Q36. Write a structured CoT template for debugging code. Use the format: SYMPTOMS, HYPOTHESIS, TRACE, ROOT CAUSE, FIX.

Q37. What is post-hoc rationalization in CoT? Why does it mean you shouldn't blindly trust the reasoning steps the model generates?

Q38. Design a selective CoT system for a customer support bot. Which types of questions should trigger CoT, and which should skip it?

Q39. Combine few-shot + CoT for this task: determining if a refund request is eligible based on a 30-day return policy. Write 2 examples with reasoning traces, then the prompt for a new case.

Q40. Hands-on: Ask a model to solve "What is 47 × 38?" with and without CoT. Then try "What is the capital of Germany?" with and without CoT. Compare accuracy and token usage for both.


4.3.d — Output Formatting Instructions (Q41–Q55)

Q41. Why does JSON.parse() fail when the model adds "Here's the JSON:" before the JSON object? How do you prevent this?

Q42. Write a system prompt for a JSON extraction API that prevents ALL of these failures: (a) wrapper text, (b) markdown code fences, (c) wrong field names, (d) string instead of number.

Q43. What does the "Respond ONLY with..." pattern do? Write three variations of this pattern for: JSON, a single category label, and a Markdown table.

Q44. Compare XML-style delimiters (<tag>content</tag>) with section markers (---SECTION---). When would you use each?

Q45. Your extraction pipeline returns {"price": "$12.99"} instead of {"price": 12.99}. Write the prompt instruction AND the code validation that would catch and prevent this.
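As a starting point for the code half of Q45, a minimal validation sketch. The field name `price` comes from the question; the throw-on-failure policy is one choice among several:

```javascript
// Validate that `price` came back as a JSON number, not a formatted string.
function validatePrice(parsed) {
  if (typeof parsed.price !== 'number' || Number.isNaN(parsed.price)) {
    throw new Error(`Expected numeric price, got: ${JSON.stringify(parsed.price)}`);
  }
  return parsed.price;
}

validatePrice({ price: 12.99 });    // → 12.99
// validatePrice({ price: '$12.99' }) throws, triggering a retry or repair step
```

On the prompt side, the matching instruction would state the type explicitly, e.g. "price must be a JSON number with no currency symbol or quotes."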

Q46. Explain the three layers of defense in depth for output formatting: prompt instructions, API-level enforcement, and code validation. What does each layer guarantee?

Q47. What is OpenAI's response_format: { type: 'json_object' } parameter? How is it different from response_format: { type: 'json_schema' } with strict mode?

Q48. Write a prompt that combines few-shot examples + format instructions for extracting event details (name, date, location, price) from promotional text. Include 2 examples and explicit null handling.

Q49. The model sometimes returns an array [{...}] when your code expects an object {items: [{...}]}. Write the prompt instruction that prevents this and the code that validates the structure.
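For Q49, one possible code-side check. Whether to wrap a bare array or reject it outright is a policy decision; this sketch wraps defensively:

```javascript
// Accept only the object shape { items: [...] }.
function validateItemsShape(parsed) {
  if (Array.isArray(parsed)) {
    // Model returned [{...}]: wrap it here; rejecting is equally valid.
    return { items: parsed };
  }
  if (!parsed || !Array.isArray(parsed.items)) {
    throw new Error('Expected an object with an "items" array');
  }
  return parsed;
}
```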

Q50. Why is temperature 0 especially important when the output will be parsed by code? What happens to JSON reliability at temperature 0.8?

Q51. Write a complete parsing function in JavaScript that handles all common formatting failures: wrapper text, code fences, trailing commas, and incomplete JSON.
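A starting-point sketch for the parser Q51 asks for. The repair steps and their order are one reasonable choice, not the only one; truly incomplete JSON is left to throw so a retry path can handle it:

```javascript
// Best-effort JSON recovery for common LLM formatting failures.
function parseModelJson(raw) {
  let text = raw.trim();

  // 1. Strip markdown code fences (```json ... ```).
  const fenced = text.match(/`{3}(?:json)?\s*([\s\S]*?)\s*`{3}/);
  if (fenced) text = fenced[1];

  // 2. Drop wrapper text: keep from the first { or [ to the last } or ].
  const start = Math.min(...['{', '['].map(c => {
    const i = text.indexOf(c);
    return i === -1 ? Infinity : i;
  }));
  const end = Math.max(text.lastIndexOf('}'), text.lastIndexOf(']'));
  if (start === Infinity || end === -1) throw new Error('No JSON found in output');
  text = text.slice(start, end + 1);

  // 3. Remove trailing commas before } or ].
  text = text.replace(/,\s*([}\]])/g, '$1');

  return JSON.parse(text); // incomplete JSON still throws; caller should retry
}
```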

Q52. Design a retry strategy for when the model returns unparseable output. Include: max retries, what to change between retries, and when to give up.
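The Q52 retry loop can be sketched with the model call injected as a plain function, so the policy is testable without an API. `callModel`, `parse`, and the `BASE_PROMPT` placeholder are illustrative names, not a real SDK:

```javascript
// Retry policy sketch: callModel is any (prompt) => string function.
function getParsedOutput(callModel, parse, { maxRetries = 2 } = {}) {
  let prompt = 'BASE_PROMPT'; // placeholder for the real task prompt
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = callModel(prompt);
    try {
      return parse(raw);
    } catch (err) {
      lastError = err;
      // Between retries: restate the format rule and feed back the parser error.
      prompt = 'BASE_PROMPT\nYour previous output was invalid (' + err.message + '). ' +
               'Respond ONLY with valid JSON, no prose, no code fences.';
    }
  }
  // Give up after maxRetries + 1 total attempts; surface the last parse error.
  throw new Error(`Gave up after ${maxRetries + 1} attempts: ${lastError.message}`);
}
```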

Q53. You need the model to return both a human-readable explanation AND machine-parseable data. Design a delimiter-based format that separates these two parts.

Q54. Write a system prompt for a Markdown-formatted code review that always uses these sections: Summary, Issues, Suggestions, Verdict. The Verdict must be exactly one of: APPROVE, REQUEST_CHANGES, REJECT.

Q55. Hands-on: Build a receipt extraction function that (a) uses a system prompt with format instructions, (b) includes 2 few-shot examples, (c) uses response_format: json_object, (d) validates the parsed output in code. Test it with 3 different receipt formats.


Answer Hints

| Q | Hint |
|---|------|
| Q1 | Role, Task, Context, Format, Tone, Boundaries |
| Q5 | Replace with positive equivalents: "Use simple language...", "Focus only on...", "Say 'I'm not sure' if..." |
| Q8 | Missing output format specification |
| Q11 | Front-load critical instructions; repeat key rules at the end ("lost in the middle" problem) |
| Q16 | Model develops bias toward the "billing" label — it under-predicts other categories |
| Q20 | Zero-shot: 100 tokens × 500K = 50M tokens × $2.50/1M = $125. Few-shot: (100 + 5 × 60) × 500K = 200M × $2.50/1M = $500. Difference: $375/month |
| Q21 | 3-5 examples provide ~90% of the benefit; beyond 10, returns are marginal |
| Q26 | Token cost, diminishing returns, and risk of the examples not matching the actual input distribution |
| Q34 | Without CoT: 200K × 10 × $10/1M = $20. With CoT: 200K × 150 × $10/1M = $300. Difference: $280/month |
| Q35 | Visible: you can read, verify, and debug each step. Hidden: you get the answer but cannot inspect the reasoning |
| Q41 | "Here's the JSON:" is not valid JSON — use "Respond ONLY with valid JSON" in the prompt |
| Q46 | Prompt: reduces likelihood of bad format. API: guarantees valid JSON structure. Code: validates field names, types, and business rules |
| Q50 | Temperature 0 = consistent output = reliable parsing. Temperature 0.8 = varied output = occasional JSON syntax errors |

← Back to 4.3 — Prompt Engineering Fundamentals (README)