Episode 4 — Generative AI Engineering / 4.10 — Error Handling in AI Applications
4.10 — Exercise Questions: Error Handling in AI Applications
Practice questions for all four subtopics in Section 4.10. Mix of conceptual, debugging, code analysis, and design tasks.
How to use this material
- Read lessons in order — README.md, then 4.10.a → 4.10.d.
- Answer closed-book first — then compare to the matching lesson.
- Build the code — implement the retry wrapper, JSON parser, and logger yourself before checking the lesson code.
- Interview prep — 4.10-Interview-Questions.md.
- Quick review — 4.10-Quick-Revision.md.
4.10.a — Handling Invalid JSON (Q1–Q10)
Q1. List five different ways an LLM can return invalid JSON, even when your prompt says "respond in JSON only." For each, give a one-line example of what the response looks like.
Q2. Why does JSON.parse('{"name": "Alice",}') throw an error? What specific JSON rule does it violate, and which programming language allows this syntax?
Q3. Write a function extractJsonFromText(text) that handles all three of these responses and returns the parsed JSON object:
1. 'Here is the data: {"name": "Alice"}'
2. '```json\n{"name": "Alice"}\n```'
3. '{"name": "Alice"}'
Q4. Your multi-layer JSON parser uses the "first { to last }" strategy. The LLM responds: 'Found {"count": 2} items in {"category": "books"} collection'. What does your parser extract, and why is it wrong? How would you improve the strategy?
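For reference while working Q4, here is a minimal sketch of the balanced-scan improvement the question points toward. The function name and the string-skipping details are illustrative choices, not the lesson's code:

```javascript
// Extract the first *balanced* {...} span instead of "first { to last }".
// Tracks brace depth, and skips braces that appear inside JSON strings,
// so a response containing two separate objects yields the first one whole.
function extractFirstBalancedJson(text) {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++;            // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // never balanced: likely truncated output
}
```

On the Q4 input this returns '{"count": 2}', whereas first-{-to-last-} returns an unparseable span covering both objects.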
Q5. Explain why response_format: { type: 'json_object' } in the OpenAI API does not eliminate the need for error handling. Name three failure modes that still occur.
Q6. Code review: This code handles LLM JSON responses. Find three bugs or problems:
```javascript
const content = response.choices[0].message.content;
const data = JSON.parse(content);
if (data.name) {
  saveUser(data);
}
```
Q7. Write a cleanJson(text) function that fixes: (a) trailing commas, (b) single quotes, (c) JavaScript-style comments, and (d) unquoted keys. Test it against these inputs:
1. "{'name': 'Alice', 'age': 30,}"
2. '{name: "Alice", // user data\n age: 30}'
Q8. Why is fixing unescaped quotes inside JSON strings (e.g., {"text": "She said "hello""}) much harder than fixing trailing commas? What makes it ambiguous?
Q9. You're building a JSON parser for LLM output. Should you try all repair strategies on every response, or should you attempt them in a specific order? Design the optimal order and explain why.
Q10. Hands-on: Build a parseLlmJson(text) function that returns { success, data, strategy, error }. Test it against at least 8 different malformed inputs from the failure pattern catalog in 4.10.a.
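Before checking the lesson code, the shape Q9 and Q10 describe can be sketched as a strategy-ordered parser. The particular strategy list and orderings here are illustrative assumptions, not the lesson's exact implementation:

```javascript
// Try the cheapest, least destructive strategy first and record which one
// finally produced valid JSON. Strategies compound: each transform runs on
// the output of the previous one.
function parseLlmJson(text) {
  const strategies = [
    ["direct", (t) => t],
    ["strip-code-fence", (t) => t.replace(/^`{3}(?:json)?\s*/i, "").replace(/\s*`{3}\s*$/, "")],
    ["first-to-last-brace", (t) => {
      const a = t.indexOf("{");
      const b = t.lastIndexOf("}");
      return a !== -1 && b > a ? t.slice(a, b + 1) : t;
    }],
    ["remove-trailing-commas", (t) => t.replace(/,\s*([}\]])/g, "$1")],
  ];
  let cleaned = text;
  let lastError = null;
  for (const [name, transform] of strategies) {
    cleaned = transform(cleaned);
    try {
      return { success: true, data: JSON.parse(cleaned), strategy: name, error: null };
    } catch (err) {
      lastError = err; // keep trying the next repair
    }
  }
  return { success: false, data: null, strategy: null, error: String(lastError) };
}
```

The `strategy` field is what makes this loggable: you can later count how often each repair fired (see 4.10.d).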
4.10.b — Partial Responses and Timeouts (Q11–Q20)
Q11. What is finish_reason in the OpenAI API response? List all possible values and explain what each means for your application.
Q12. A response comes back with HTTP 200 and finish_reason: "length". Is this a success or a failure? Why is this one of the most insidious error modes in LLM applications?
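As a reference point for Q11 and Q12, a small guard over finish_reason might look like the sketch below. The helper name and result shape are made up; the finish_reason values follow the OpenAI chat completions response:

```javascript
// Treat finish_reason as part of success checking: HTTP 200 plus
// finish_reason "length" is NOT a complete response.
function classifyCompletion(response) {
  const choice = response.choices[0];
  switch (choice.finish_reason) {
    case "stop":           return { complete: true,  reason: "stop" };
    case "length":         return { complete: false, reason: "truncated-by-max-tokens" };
    case "content_filter": return { complete: false, reason: "filtered" };
    case "tool_calls":     return { complete: true,  reason: "tool-call" };
    default:               return { complete: false, reason: "unknown:" + choice.finish_reason };
  }
}
```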
Q13. Your application sets max_tokens: 500 but the expected JSON output is typically 800 tokens. What happens? How should you determine the right max_tokens value?
Q14. Explain the difference between these three timeout scenarios: (a) client-side timeout (AbortController), (b) API server timeout (408/504), (c) model generation timeout. Which ones return partial data?
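Scenario (a) can be wired up in a few lines. In this sketch, `doCall` is a stand-in for any SDK or fetch call that accepts an abort signal, and the 30-second default is an illustrative value, not a recommendation:

```javascript
// Client-side timeout via AbortController: abort the in-flight call if it
// has not settled within `ms` milliseconds, and always clear the timer.
function callWithTimeout(doCall, ms = 30000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return doCall(controller.signal).finally(() => clearTimeout(timer));
}
```

Note that a client-side abort yields no partial data unless you are streaming; the connection is simply dropped.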
Q15. Write a function streamWithTimeout(messages, options) that streams an LLM response and aborts if: (a) no chunk arrives within 15 seconds, or (b) total streaming time exceeds 2 minutes. Return whatever content was received before the timeout.
Q16. Your repair function tries to close truncated JSON by counting open braces and adding closing ones. Given the truncated input '{"users": [{"name": "Ali', what does your repair produce? Is the result usable?
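For reference while working Q16, one naive repair sketch follows. It assumes we only need to track brackets and one open string, so a cut that lands after a key or a colon still produces invalid JSON:

```javascript
// Close any open string, then close brackets/braces in reverse order of
// opening. The output is structurally valid, but the last value may be a
// meaningless fragment (e.g. the cut-off name "Ali").
function repairTruncatedJson(text) {
  const stack = [];
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++;              // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") stack.push(ch);
    else if (ch === "}" || ch === "]") stack.pop();
  }
  let repaired = text;
  if (inString) repaired += '"';
  while (stack.length) repaired += stack.pop() === "{" ? "}" : "]";
  return repaired;
}
```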
Q17. Design a "continuation" strategy: when finish_reason is "length", send the partial output back to the model and ask it to continue. What are two risks of this approach?
Q18. Calculation: Your input uses 120,000 tokens on a model with a 128K context window. You set max_tokens: 4096. How many output tokens will the model actually produce? Why?
Q19. A user reports the chatbot "hangs forever" on complex questions. You check and find no timeout is configured. Design timeout values for three features: (a) quick data extraction, (b) document summarization, (c) code generation.
Q20. Hands-on: Build a robustLlmCall(messages, options) function that handles truncation (with auto-continuation), timeouts (with AbortController), and content filtering. Return a standardized result object with metadata.
4.10.c — Retry Mechanisms (Q21–Q32)
Q21. Classify each of these errors as retryable or non-retryable: (a) 429 Too Many Requests, (b) 401 Unauthorized, (c) 500 Internal Server Error, (d) 400 Bad Request, (e) ECONNRESET, (f) response is malformed JSON.
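One way to express the Q21 classification in code is sketched below. The error object's `status`/`code`/`kind` fields are illustrative, and treating malformed JSON (f) as retryable assumes you re-prompt rather than blindly resend:

```javascript
// Classify an error as worth retrying or not.
function isRetryable(error) {
  if (error.status === 429 || error.status === 500) return true;  // rate limit / server fault
  if (error.status === 401 || error.status === 400) return false; // auth or bad request won't heal
  if (error.code === "ECONNRESET") return true;                   // transient network failure
  if (error.kind === "malformed-json") return true;               // model may do better next time
  return false;                                                   // default: don't retry blindly
}
```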
Q22. Explain exponential backoff with an example. If the base delay is 1 second, what are the delays for attempts 0 through 5?
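The delay schedule Q22 asks for is one line of arithmetic; the 32-second cap below is an illustrative choice:

```javascript
// Exponential backoff: delay = base * 2^attempt, clamped to a maximum.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 32000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

For attempts 0 through 5 with a 1-second base this yields 1s, 2s, 4s, 8s, 16s, 32s.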
Q23. What is jitter and why is it important? Explain the thundering herd problem that occurs without jitter.
Q24. Compare equal jitter (delay = half exponential + random half) vs full jitter (delay = random between 0 and exponential). Which is recommended by AWS and why?
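The two jitter formulas in Q24 can be sketched on top of an exponential base delay (the function names are ours):

```javascript
// Equal jitter: keep half the exponential delay, randomize the other half.
function equalJitter(expDelayMs) {
  const half = expDelayMs / 2;
  return half + Math.random() * half;    // range [half, expDelayMs)
}
// Full jitter: randomize over the whole exponential delay.
function fullJitter(expDelayMs) {
  return Math.random() * expDelayMs;     // range [0, expDelayMs)
}
```

Full jitter spreads retries over the widest window, which is why it disperses a thundering herd better than equal jitter.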
Q25. Write a retryWithBackoff(fn, options) function that: (a) classifies errors as retryable/non-retryable, (b) uses exponential backoff with full jitter, (c) respects Retry-After headers, (d) has a max retry limit, (e) logs each attempt.
Q26. When the LLM returns valid JSON that fails schema validation, you have two retry strategies: (A) retry with the same prompt, or (B) retry with validation errors included in the prompt. When should you use each?
Q27. Cost calculation: A single LLM call costs $0.0075. Your system retries up to 3 times with error feedback (input grows by ~1500 tokens each retry). Calculate the worst-case cost of a single request that triggers all 3 retries.
Q28. What is the circuit breaker pattern? How does it prevent a failing API from consuming all your retry budget? Draw the state diagram (Closed → Open → Half-Open).
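The three states in Q28 fit in a small class. This is a minimal sketch with an injectable clock so the cooldown can be exercised without real waiting; thresholds are illustrative:

```javascript
// Closed -> Open after N consecutive failures; Open -> Half-Open after a
// cooldown; Half-Open -> Closed on one success, back to Open on failure.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.state = "closed";
    this.failures = 0;
    this.openedAt = 0;
  }
  canRequest() {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";           // let one probe request through
    }
    return this.state !== "open";
  }
  recordSuccess() {
    this.state = "closed";
    this.failures = 0;
  }
  recordFailure() {
    this.failures++;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";                // fail fast, stop burning retry budget
      this.openedAt = this.now();
    }
  }
}
```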
Q29. Design a fallback chain for an LLM application: primary model → fallback model → cached response → graceful degradation. For each level, specify when to fall through and what quality trade-offs the user experiences.
Q30. Your monitoring shows 15% of requests trigger at least one retry, and the average retry count is 1.8. Is this healthy? What thresholds would prompt you to investigate?
Q31. Hands-on: Build a retryWithFeedback(messages, schema, options) function that retries with validation error feedback. Test it by intentionally using a strict schema that the model sometimes violates (e.g., requiring an enum value).
Q32. A teammate proposes retrying every error up to 10 times "to maximize reliability." Write a response explaining why this is a bad idea, covering cost, latency, and non-retryable errors.
4.10.d — Logging AI Requests (Q33–Q44)
Q33. List eight fields that every LLM API call log should contain. For each field, explain why it's needed for debugging or monitoring.
Q34. Why is logging more critical for LLM applications than for traditional APIs? Give a specific debugging scenario that's impossible without logs.
Q35. Design a structured log schema (JSON format) for an LLM API call. Include request metadata, model configuration, response metrics, validation results, and error information.
Q36. Your application processes user messages that may contain emails, phone numbers, and Social Security numbers. Design a PII filter function that redacts these before logging. What are the trade-offs of aggressive PII filtering?
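A starting point for Q36 is sketched below. The regexes are deliberately simple (US-style SSN and phone formats) and will both over- and under-match; that imprecision is exactly the trade-off the question asks you to discuss:

```javascript
// Redact common PII patterns before a string reaches the logger.
// Order matters: SSNs are matched before the looser phone pattern.
function redactPii(text) {
  return text
    .replace(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, "[EMAIL]")
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]")
    .replace(/\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b/g, "[PHONE]");
}
```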
Q37. Propose a tiered logging strategy with at least three tiers (e.g., production, staging, development). For each tier, specify what content is logged, whether PII is filtered, and the retention period.
Q38. Name five key metrics for an LLM monitoring dashboard. For each metric, specify what values are healthy vs alarming.
Q39. Design four alert rules for an LLM application. For each, specify the condition, severity level, and recommended response action.
Q40. Your prompt v2.3 has a 96.2% success rate and v2.4 has a 98.7% success rate over ~5,400 requests each. How would you use logs to determine if this improvement is real or noise? What other metrics would you compare?
Q41. Explain how you would use logs to identify which prompt changes improved performance. What specific log fields enable A/B comparison?
Q42. A compliance officer asks: "If we log LLM responses, and the model generates something problematic, are we liable?" How does your logging policy address this?
Q43. Calculation: Your LLM service handles 50,000 requests/day. Each log entry is ~2KB. How much storage do you need per month? If you add full content logging (~10KB/entry), how does that change? At what point should you move to a tiered retention strategy?
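The Q43 arithmetic can be checked in a few lines (assuming 1 KB = 1,000 bytes for round numbers):

```javascript
// Monthly log storage in GB for a fixed request volume.
const requestsPerDay = 50000;
function monthlyGb(bytesPerEntry, days = 30) {
  return (requestsPerDay * bytesPerEntry * days) / 1e9;
}
const metadataOnly = monthlyGb(2000);   // 2 KB entries -> 3 GB/month
const fullContent = monthlyGb(10000);   // 10 KB entries -> 15 GB/month
```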
Q44. Hands-on: Build an InstrumentedLlmClient class that wraps the OpenAI SDK, automatically logs every call (with timing, token counts, and error classification), and provides a getMetrics() method that returns aggregated statistics.
Answer Hints
| Q | Hint |
|---|---|
| Q2 | JSON specification (RFC 8259) forbids trailing commas; JavaScript/TypeScript allows them |
| Q5 | Truncation (finish_reason: "length"), timeouts, wrong schema (valid JSON but wrong fields) |
| Q6 | No try/catch on JSON.parse, no null check on content, no schema validation |
| Q12 | HTTP 200 = success, but content is incomplete — it "looks like success" to naive error handling |
| Q18 | Available context = 128,000 − 120,000 = 8,000 tokens; since max_tokens (4,096) < 8,000, output is capped at 4,096; max_tokens is the binding constraint here |
| Q21 | Retryable: a, c, e, f. Non-retryable: b, d |
| Q22 | 1s, 2s, 4s, 8s, 16s, 32s (base × 2^attempt) |
| Q24 | Full jitter: lower average wait time, maximum request spreading |
| Q27 | Attempt 1: $0.0075, Attempt 2: ~$0.0113, Attempt 3: ~$0.0150, Attempt 4: ~$0.0188; Total: ~$0.0526 (7x single call) |
| Q28 | Closed (normal) → too many failures → Open (fail fast) → timer expires → Half-Open (try one) → success → Closed |
| Q30 | 15% retry rate is borderline high; investigate if >10% or if average retries >2.0 |
| Q33 | requestId, model, timestamp, latencyMs, promptTokens, completionTokens, finishReason, error |
| Q43 | 50,000 × 2KB × 30 = ~3GB/month (metadata); 50,000 × 10KB × 30 = ~15GB/month (with content) |