
6.2 — Exercise Questions: Building & Orchestrating Microservices

Practice questions for all five subtopics in Section 6.2. They mix conceptual, design, code, and scenario-based questions.

How to use this material (instructions)

  1. Read lessons in order: README.md, then 6.2.a through 6.2.e.
  2. Answer closed-book first — then compare to the matching lesson.
  3. Build the services — set up the three services from 6.2.a and run them.
  4. Interview prep: 6.2-Interview-Questions.md.
  5. Quick review: 6.2-Quick-Revision.md.

6.2.a — Independent Services Setup (Q1–Q10)

Q1. Define what makes a microservice "independent." List 5 criteria.

Q2. Compare monorepo vs polyrepo for microservices. Name 3 advantages and 2 disadvantages of each.

Q3. You have a User Service, Order Service, and Notification Service. Draw the project structure for a monorepo that includes separate package.json, .env, and Dockerfile for each.

Q4. Why should each microservice have its own database? What problems arise when two services share a database?

Q5. Write a health check endpoint for a service that verifies both the process and its database connection.

Q6. Code: Create a minimal Express service that runs on port 4004, has a /health endpoint, and a /products GET endpoint returning an in-memory array. Include the package.json.

Q7. Your Docker Compose file has USER_SERVICE_URL=http://localhost:4001 for the order service. This works locally but fails in Docker. Why? What should the URL be?

Q8. Explain the difference between ports and expose in Docker Compose. When do you use each?

Q9. Scenario: You are running 5 services locally with npm run dev in 5 terminal tabs. A colleague says "this doesn't scale." What tools or approaches help?

Q10. What is service discovery? Name 3 approaches and explain when you would use each.


6.2.b — API Gateway Pattern (Q11–Q20)

Q11. List 6 responsibilities of an API gateway. For each, explain why it belongs at the gateway level rather than in individual services.

Q12. Draw the request flow for POST /api/orders through a gateway that performs authentication, rate limiting, and proxying. Label each step.

Q13. Code: Write Express middleware that extracts a JWT from the Authorization header, verifies it, and injects X-User-Id and X-User-Role headers for downstream services.

Q14. Why should internal service-to-service calls bypass the gateway? What happens if Service A calls Service B through the gateway?

Q15. Your gateway rate-limits at 100 requests per 15 minutes per IP. A legitimate user behind a corporate proxy shares an IP with 200 people. They keep hitting the rate limit. How do you fix this?

Q16. Scenario: Your gateway is a single Express server. It handles 500 requests/second. The business expects 50,000 rps by next year. What are your options?

Q17. Explain the difference between expose: ["4001"] and ports: ["4001:4001"] in Docker Compose. Which should the gateway use? Which should backend services use?

Q18. Your API has 20 microservices. A developer adds a new service and must update the gateway routing configuration. What are three ways to make this easier?

Q19. Compare building a custom Express gateway vs using Kong vs using AWS API Gateway. Give a use case for each.

Q20. Code: Write a simple route configuration object that maps /api/users to http://user-service:4001/users and /api/products to http://product-service:4005/products. Write the loop that creates proxy routes from this configuration.


6.2.c — Retry, Timeout & Circuit Breaker (Q21–Q33)

Q21. Name 5 reasons a network call between two microservices can fail that would not happen in a monolith function call.

Q22. Explain the difference between simple retry, exponential backoff, and exponential backoff with jitter. When would you use each?

Q23. Code: Implement a retryWithBackoff function that takes an async function, max retries, and base delay. Use exponential backoff with jitter.

Q24. You set a retry count of 10 with no backoff. The downstream service is overloaded. Explain how your retries make the situation worse. What is this called?

Q25. Which HTTP status codes should you retry on? Which should you NOT retry on? Explain why for each.

Q26. What is the default HTTP timeout in Node.js? Why is relying on this default dangerous?

Q27. Code: Write a function that uses AbortController to make a fetch request with a 3-second timeout.

Q28. Draw the circuit breaker state machine. Label all three states and the transitions between them.

Q29. Scenario: Your circuit breaker has failureThreshold: 5 and resetTimeout: 30000. The downstream service fails 5 times. Walk through what happens for the next 60 seconds of calls.

Q30. What is the bulkhead pattern? Draw a diagram showing how it prevents one failing dependency from consuming all resources.

Q31. Code: Complete this fallback chain: try the primary service, then try a cached value from Redis, then return a default object.

Q32. You have a ResilientHttpClient class that combines retries, timeouts, and a circuit breaker. A junior developer asks "Why not just use plain axios?" Explain the risks of unprotected HTTP calls in a distributed system.

Q33. Scenario: Your order service calls the payment service with a 30-second timeout. The payment service starts taking 25 seconds per request (normally 200ms). Describe the cascade failure that occurs. How would a circuit breaker help?


6.2.d — Event-Driven Architecture (Q34–Q43)

Q34. Explain why a synchronous call from Order Service to Notification Service is problematic. What happens when Notification Service is down?

Q35. Draw the pub/sub pattern with a message broker. Show one publisher and three consumers.

Q36. Define these RabbitMQ terms: exchange, binding, queue, consumer, acknowledgment.

Q37. Compare fanout, direct, and topic exchanges. Give a real-world use case for each.

Q38. Code: Write a RabbitMQ publisher function that publishes a JSON event to a topic exchange with a routing key. Include persistent delivery mode.

Q39. Code: Write a consumer that subscribes to a queue, processes messages, and acknowledges them. Handle processing errors with nack and requeue.

Q40. What is a dead letter queue? Why is it critical for production systems? How do you configure one in RabbitMQ?

Q41. Scenario: Your notification service is down for maintenance for 2 hours. During that time, 50,000 order.placed events are published. What happens to those events? What happens when the service comes back?

Q42. Your team publishes events BEFORE saving to the database. The database write then fails. What problem does this create? What is the correct order?

Q43. Explain the prefetch setting in RabbitMQ. What happens if you set it to 1? What if you set it to 1000?


6.2.e — Event Payloads & Idempotency (Q44–Q55)

Q44. Design an event payload for user.email_verified. Include all required metadata fields. Explain why each field is necessary.

Q45. What is the correct naming convention for events? Explain why order.placed is better than createOrder or ORDER_CREATED.

Q46. Your event schema needs a new field (discountAmount). How do you add it without breaking existing consumers?

Q47. Explain why events can be delivered more than once. Describe three specific scenarios that cause duplicate delivery.

Q48. Code: Implement an idempotency check using Redis. The function should return true if the event has already been processed and false if it is new.

Q49. Code: Write a database INSERT statement that uses ON CONFLICT (event_id) DO NOTHING for idempotent event processing. Explain why this is stronger than a Redis-based check.

Q50. Define eventual consistency. Give a real-world example where it is acceptable and one where it is not.

Q51. Scenario: Events arrive out of order — order.shipped arrives before order.payment_received. How do you handle this? Describe two strategies.

Q52. What is event sourcing? How does it differ from traditional CRUD? When would you use it?

Q53. Your payment service processes order.placed, charges the customer, then crashes before acknowledging the message. The queue redelivers. Without idempotency, what happens? Write the code to prevent it.

Q54. Design the complete event flow for an e-commerce checkout. Starting from order.placed, list all downstream events and which services produce and consume them.

Q55. What are correlationId and causationId? How do they help with debugging in a distributed system?


Answer Hints

| Q | Hint |
|---|------|
| Q4 | Shared DB = hidden coupling: schema changes break multiple services, and the DB cannot be scaled independently. |
| Q7 | In Docker, services reference each other by service name: http://user-service:4001. Inside a container, localhost is the container itself. |
| Q8 | ports maps host:container (external access); expose makes the port available within the Docker network only. |
| Q15 | Rate limit by API key or user ID instead of IP. Use keyGenerator in express-rate-limit. |
| Q16 | Horizontal scaling (multiple gateway instances behind a load balancer), moving to Kong/Nginx, or AWS API Gateway. |
| Q24 | Retry storm / thundering herd. For example, 10 retries x 1000 clients = 10,000 additional requests hitting the overloaded service. |
| Q26 | Historically ~120 seconds (the HTTP server socket timeout before Node.js 13); newer versions apply no timeout by default, and outgoing http requests have none unless you set one. Either way, a slow dependency holds sockets and memory for a long time, pending requests pile up, and the failure cascades. |
| Q29 | Calls 1-5 fail normally. Call 6: circuit OPEN, rejected immediately. For the next 30s: all calls rejected instantly. At 30s: HALF_OPEN, one probe allowed. If the probe succeeds: CLOSED. If the probe fails: OPEN for another 30s. |
| Q41 | Events queue up in RabbitMQ (durable queues with persistent messages are written to disk). When the service restarts, it consumes the backlog. |
| Q42 | "Ghost events": consumers react to something that never actually happened. Always save, then publish. |
| Q43 | prefetch=1: one unacked message at a time (slow but evenly distributed). prefetch=1000: up to 1000 unacked messages buffered on one consumer (fast, but it uses more memory, starves other consumers, and a crash causes the whole unacked batch to be redelivered). |
| Q47 | 1) Consumer crashes after processing but before ack. 2) Network drops the ack. 3) Publisher retries after a timeout. |
| Q49 | A database constraint is atomic and durable, so it survives restarts. A Redis TTL might expire, allowing duplicates. |
| Q51 | Strategy 1: timestamp comparison (ignore older events). Strategy 2: state machine (only allow valid transitions). |
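
The Q29 hint walks through the CLOSED, OPEN, and HALF_OPEN states in words. The sketch below is one possible way to express that walkthrough in code, not the implementation from the lessons. The class name `CircuitBreaker`, the `call` method, and the injectable `now` clock are assumptions made for illustration; `failureThreshold` and `resetTimeout` follow the names used in Q29.

```javascript
// Minimal circuit breaker sketch (illustrative only; names are assumptions).
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeout = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.resetTimeout = resetTimeout;         // ms to stay OPEN before allowing a probe
    this.now = now;                           // injectable clock, handy for tests
    this.state = 'CLOSED';
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.resetTimeout) {
        // Fail fast: the downstream service is not contacted at all.
        throw new Error('Circuit is OPEN: call rejected');
      }
      this.state = 'HALF_OPEN'; // reset timeout elapsed, let a probe through
    }
    try {
      const result = await fn();
      // Any success (including a HALF_OPEN probe) closes the circuit.
      this.state = 'CLOSED';
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed probe, or too many consecutive failures, opens the circuit.
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

This version is deliberately simplified: it counts consecutive failures rather than a failure rate over a time window, and it does not limit HALF_OPEN to a single concurrent probe, which a production breaker would normally do.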

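The Q47 and Q51 hints mention duplicate delivery and the state-machine strategy for out-of-order events. As a rough sketch under stated assumptions, the two can be combined in one consumer-side handler. Everything here is hypothetical: the `OrderProjection` class, the `eventId` field, the transition table, and the in-memory `Set`, which stands in for a durable store such as the unique `event_id` column from Q49.

```javascript
// Hypothetical order lifecycle: which state may follow which.
const VALID_TRANSITIONS = {
  placed: ['payment_received'],
  payment_received: ['shipped'],
  shipped: [],
};

// Maps event types to the state they move the order into (assumed names).
const EVENT_TO_STATE = {
  'order.payment_received': 'payment_received',
  'order.shipped': 'shipped',
};

class OrderProjection {
  constructor() {
    this.state = 'placed';
    this.seenEventIds = new Set(); // in-memory stand-in for a durable dedup store
    this.parked = [];              // events that arrived before their predecessor
  }

  handle(event) {
    if (this.seenEventIds.has(event.eventId)) return 'duplicate';
    const next = EVENT_TO_STATE[event.type];
    if (next === undefined) return 'ignored'; // unknown event type
    if (!VALID_TRANSITIONS[this.state].includes(next)) {
      this.parked.push(event); // too early: hold it until the state catches up
      return 'parked';
    }
    this.seenEventIds.add(event.eventId);
    this.state = next;
    this.retryParked();
    return 'applied';
  }

  retryParked() {
    const waiting = this.parked;
    this.parked = [];
    for (const event of waiting) this.handle(event);
  }
}

// Usage: order.shipped arrives before order.payment_received.
const projection = new OrderProjection();
projection.handle({ eventId: 'evt-2', type: 'order.shipped' });          // 'parked'
projection.handle({ eventId: 'evt-1', type: 'order.payment_received' }); // 'applied'
projection.handle({ eventId: 'evt-1', type: 'order.payment_received' }); // 'duplicate'
console.log(projection.state); // shipped
```

Parking early events is only one option; a real consumer would also need to expire or dead-letter events that never become valid, and to persist both the state and the processed IDs in the same transaction.
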