6.2 — Building & Orchestrating Microservices: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps — reopen README.md then 6.2.a through 6.2.e.
  3. Practice: 6.2-Exercise-Questions.md.
  4. Polish answers: 6.2-Interview-Questions.md.

Core Vocabulary

| Term | One-liner |
|------|-----------|
| Microservice | Independently deployable service with its own codebase, database, and process |
| API Gateway | Single entry point for all external traffic: routes, authenticates, rate-limits |
| Service discovery | Mechanism for services to find each other's network addresses dynamically |
| Circuit breaker | Pattern that stops calls to a failing service (CLOSED / OPEN / HALF-OPEN) |
| Exponential backoff | Retry strategy where wait time doubles each attempt (1s, 2s, 4s, 8s) |
| Jitter | Random delay added to backoff to prevent thundering herd |
| Bulkhead | Isolates resources per dependency so one failure does not exhaust all capacity |
| Message queue | Broker that stores messages between publishers and consumers (RabbitMQ, Kafka) |
| Exchange | RabbitMQ component that receives messages and routes them to queues |
| Binding | Rule connecting an exchange to a queue (with routing key pattern) |
| Dead letter queue (DLQ) | Queue for messages that fail processing after max retries |
| Idempotency | Processing the same event N times = same result as 1 time |
| Eventual consistency | Data across services becomes consistent after a delay, not instantly |
| Event sourcing | Store every event (not just current state) and rebuild state by replaying |
| Correlation ID | Unique ID that traces a request across multiple services and events |

Service Setup Checklist

For EVERY microservice, ensure:
  [ ] Own package.json (separate dependencies)
  [ ] Own .env file (separate config, never shared)
  [ ] Own port (4001, 4002, 4003, ...)
  [ ] Own Dockerfile
  [ ] Own database (never share databases!)
  [ ] /health endpoint (liveness check)
  [ ] /ready endpoint (readiness check — DB connected?)
  [ ] Graceful shutdown handler
  [ ] Structured logging with service name
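
The /health, /ready, and graceful-shutdown items in the checklist can be sketched with Node's built-in http module. This is a minimal illustration, not a prescribed layout: `createApp`, `startServer`, and the `isDbConnected` probe are hypothetical names, and a real service would plug in an actual database ping.

```javascript
const http = require('node:http');

// Hypothetical factory: `isDbConnected` stands in for a real dependency check.
function createApp({ serviceName, isDbConnected }) {
  return function handler(req, res) {
    if (req.url === '/health') {
      // Liveness: the process is up and able to answer.
      res.writeHead(200, { 'Content-Type': 'application/json' });
      return res.end(JSON.stringify({ status: 'ok', service: serviceName }));
    }
    if (req.url === '/ready') {
      // Readiness: accept traffic only once dependencies are reachable.
      const ready = isDbConnected();
      res.writeHead(ready ? 200 : 503, { 'Content-Type': 'application/json' });
      return res.end(JSON.stringify({ ready, service: serviceName }));
    }
    res.writeHead(404, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'not found' }));
  };
}

function startServer(port, handler) {
  const server = http.createServer(handler);
  server.listen(port);
  // Graceful shutdown: stop accepting connections, let in-flight requests finish.
  process.once('SIGTERM', () => server.close(() => process.exit(0)));
  return server;
}
```

A service would call something like `startServer(4001, createApp({ serviceName: 'user-service', isDbConnected }))`. Keeping the handler separate from the listener makes the probes testable without opening a port.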

API Gateway Pattern

Client
  │
  ▼
┌──────────────────┐
│   API GATEWAY    │  :3000
│   1. Rate limit  │
│   2. Auth (JWT)  │
│   3. Route       │
│   4. Log         │
│   5. Proxy       │
└────┬──────┬──────┘
     ▼      ▼
  :4001   :4002    (internal only — not exposed externally)

Gateway responsibilities:
  ✓ Routing          /api/users → user-service:4001/users
  ✓ Authentication   Validate JWT, inject X-User-Id header
  ✓ Rate limiting    100 req / 15 min per IP or API key
  ✓ Logging          Request ID, timing, status
  ✓ CORS             Cross-origin policies
  ✓ SSL termination  HTTPS at edge, HTTP internally
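
Two of the responsibilities above, routing and rate limiting, are easy to sketch in isolation. The code below is an in-memory illustration under stated assumptions: `resolveUpstream` and `createRateLimiter` are hypothetical helpers, the route table holds only the one mapping shown above, and a production gateway would usually keep counters in a shared store rather than a process-local Map.

```javascript
// Route table from the list above: public prefix -> internal base URL.
const ROUTES = { '/api/users': 'http://user-service:4001/users' };

// Map a public path (without query string) to its internal URL, or null.
function resolveUpstream(path) {
  for (const [prefix, target] of Object.entries(ROUTES)) {
    if (path === prefix || path.startsWith(prefix + '/')) {
      return target + path.slice(prefix.length);
    }
  }
  return null; // no route: the gateway should answer 404
}

// Fixed-window limiter: at most `limit` requests per `windowMs` per key
// (key = client IP or API key). Defaults mirror "100 req / 15 min".
function createRateLimiter({ limit = 100, windowMs = 15 * 60 * 1000 } = {}) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key, now = Date.now()) {
    const w = windows.get(key);
    if (!w || now - w.start >= windowMs) {
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= limit;
  };
}
```

A fixed window is the simplest variant; it allows short bursts at window boundaries, which is usually acceptable for coarse per-client limits.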

Key rules:
  - Only the gateway is exposed to the internet
  - Internal service calls bypass the gateway
  - Services trust X-User-Id from gateway (never re-validate JWT)
  - Strip X-User-Id from client requests (prevent impersonation)
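
The last two rules work together: the gateway discards whatever identity header the client sent and writes its own. A minimal sketch, assuming `verifiedUserId` comes from a JWT the gateway has already validated (validation itself is not shown, and `buildUpstreamHeaders` is a hypothetical helper name):

```javascript
// Build the headers forwarded to an internal service.
function buildUpstreamHeaders(clientHeaders, verifiedUserId) {
  const headers = {};
  for (const [name, value] of Object.entries(clientHeaders)) {
    // Drop any client-supplied identity header, whatever its casing.
    if (name.toLowerCase() === 'x-user-id') continue;
    headers[name] = value;
  }
  // Only the gateway sets the identity header, and only after authentication.
  if (verifiedUserId) headers['x-user-id'] = String(verifiedUserId);
  return headers;
}
```

Because header names are case-insensitive, the comparison lowercases the name before filtering; otherwise `X-USER-ID` would slip through.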

Retry / Timeout / Circuit Breaker

TIMEOUT:
  Internal calls:   3-5 seconds
  External APIs:    5-10 seconds
  NEVER rely on the client default (often ~120s; axios has none at all): always set one explicitly
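
For calls whose client has no timeout option, a generic wrapper can enforce the budget. This is a sketch with a hypothetical helper name (`withTimeout`); note that it only stops waiting and does not cancel the underlying work, so for HTTP the client's own timeout setting (such as the axios `timeout` option) remains the better choice.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage would look like `await withTimeout(callInventoryService(), 3000)`, where `callInventoryService` is a placeholder for any promise-returning internal call.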

RETRY:
  Simple:           retry immediately (bad)
  Exp. backoff:     1s, 2s, 4s, 8s (better)
  Backoff + jitter: 1.3s, 2.7s, 5.1s (best — prevents thundering herd)

  Retry:    500, 502, 503, 504, 429, 408 (transient)
  No retry: 400, 401, 403, 404, 409     (client errors)
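
One possible shape for a `retryWithJitter(fn, maxRetries, baseDelay)` helper that follows the rules above. It is a sketch, not a reference implementation: it assumes axios-style errors (status at `err.response.status`), treats errors without a response as transient, and adds up to 50% random jitter, which is an arbitrary choice.

```javascript
const RETRYABLE = new Set([408, 429, 500, 502, 503, 504]);

// Assumption: axios-style errors. No response at all (network failure,
// timeout) is treated as transient; other statuses (400, 404, ...) are not retried.
function isRetryable(err) {
  const status = err?.response?.status;
  return status === undefined ? true : RETRYABLE.has(status);
}

// Delay before retry number `attempt` (0-based): base * 2^attempt + up to 50% jitter.
function backoffDelay(attempt, baseDelay, random = Math.random) {
  const exp = baseDelay * 2 ** attempt;
  return exp + random() * exp * 0.5;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fn` once, then retries up to `maxRetries` times on transient errors.
async function retryWithJitter(fn, maxRetries = 3, baseDelay = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await sleep(backoffDelay(attempt, baseDelay));
    }
  }
}
```

With `baseDelay = 1000` the waits fall in the ranges 1-1.5s, 2-3s, and 4-6s, so clients that failed at the same moment spread out instead of retrying in lockstep.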

CIRCUIT BREAKER:
  CLOSED     → normal, track failures
  OPEN       → reject all calls instantly (fail fast)
  HALF-OPEN  → allow 1 probe; succeed = CLOSED, fail = OPEN

  failureThreshold: 5      (open after 5 consecutive failures)
  resetTimeout:     30000  (try again after 30 seconds)

// Production-ready call pattern: the breaker wraps the retry, the retry wraps the timed call
const result = await breaker.call(() =>
  retryWithJitter(() =>
    axios.get(url, { timeout: 3000 }),
    3,    // maxRetries
    1000  // baseDelay
  )
);
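
The `breaker` object used above is not defined in this sheet. A minimal sketch of a class with that `call(fn)` shape, using the three states and the two parameters listed earlier, might look like this. The class name, the injectable `now` clock, and the error messages are assumptions made for illustration.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeout = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeout;
    this.now = now; // injectable clock, handy in tests
    this.state = 'CLOSED';
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === 'HALF-OPEN') {
      // A probe is already in flight; only one is allowed.
      throw new Error('circuit half-open: probe in flight');
    }
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.resetTimeout) {
        throw new Error('circuit open: failing fast');
      }
      this.state = 'HALF-OPEN'; // reset timeout elapsed: this call is the probe
    }
    try {
      const result = await fn();
      this.state = 'CLOSED'; // any success closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'HALF-OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Wrapping the retry inside the breaker means one exhausted retry sequence counts as a single failure, so the breaker opens only after several full retry cycles have failed.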

RabbitMQ Basics

ARCHITECTURE:
  Producer → Exchange → Binding → Queue → Consumer

EXCHANGE TYPES:
  Fanout:  Send to ALL bound queues (broadcast)
  Direct:  Send only if binding key = routing key exactly
  Topic:   Pattern match on dot-separated words (* = exactly one word, # = zero or more words)
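
To make the topic wildcards concrete, here is a small matcher that mimics the rule stated above. It is an illustration of the matching semantics only; RabbitMQ performs this inside the broker, and `topicMatches` is a hypothetical name.

```javascript
// Does `routingKey` match the topic binding `pattern`?
// '*' matches exactly one dot-separated word, '#' matches zero or more words.
function topicMatches(pattern, routingKey) {
  const p = pattern.split('.');
  const k = routingKey === '' ? [] : routingKey.split('.');

  function match(i, j) {
    if (i === p.length) return j === k.length; // pattern used up: key must be too
    if (p[i] === '#') {
      // Let '#' absorb 0, 1, 2, ... words and try the rest of the pattern.
      for (let n = j; n <= k.length; n++) {
        if (match(i + 1, n)) return true;
      }
      return false;
    }
    if (j === k.length) return false; // pattern expects a word, key has none left
    return (p[i] === '*' || p[i] === k[j]) && match(i + 1, j + 1);
  }

  return match(0, 0);
}
```

For example, a binding of `order.*` accepts `order.placed` but not `order.placed.eu`, while `order.#` accepts both.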

ACKNOWLEDGMENTS:
  ack(msg)              → success, remove from queue
  nack(msg, false, true)  → fail, requeue (retry)
  nack(msg, false, false) → fail, no requeue (goes to DLQ if a dead letter exchange is configured, otherwise dropped)
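
A consumer callback that applies these three outcomes could be sketched as below. It assumes an amqplib-style channel (`ack(msg)` and `nack(msg, allUpTo, requeue)`) and uses the message's `fields.redelivered` flag as a crude one-retry policy; the function name and that policy are illustrative choices, not the only option.

```javascript
// `channel` follows the amqplib shape; `processEvent` is the business handler.
async function handleDelivery(channel, msg, processEvent) {
  try {
    await processEvent(JSON.parse(msg.content.toString()));
    channel.ack(msg); // success: broker removes the message
  } catch (err) {
    if (msg.fields.redelivered) {
      // Already retried once: stop requeueing. The broker dead-letters the
      // message only if the queue was declared with a dead letter exchange.
      channel.nack(msg, false, false);
    } else {
      channel.nack(msg, false, true); // first failure: requeue for one retry
    }
  }
}
```

Malformed JSON follows the same path as a handler error: one requeue, then rejection, so a poison message cannot loop forever.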

PREFETCH:
  prefetch(1)   → process one at a time (safe, slow)
  prefetch(10)  → buffer 10 unacked messages (balanced)
  prefetch(100) → fast, but a crash redelivers up to 100 in-flight messages and one slow consumer can hoard work

DEAD LETTER QUEUE:
  Work queue declared with the x-dead-letter-exchange argument
  Receives messages rejected with requeue=false (or expired); counting retries is the consumer's job
  Monitor DLQ → alert if messages accumulate
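
The DLQ wiring and prefetch setting above could be declared roughly as follows, assuming an amqplib channel. The exchange and queue names are placeholders; `deadLetterExchange` is the amqplib option that sets the `x-dead-letter-exchange` argument.

```javascript
// Declares a work queue whose rejected messages are routed to a DLQ.
// `channel` is assumed to be an amqplib channel; all names are placeholders.
async function setupQueues(channel) {
  // Dead letter exchange plus the queue that collects failed messages.
  await channel.assertExchange('orders.dlx', 'fanout', { durable: true });
  await channel.assertQueue('orders.dlq', { durable: true });
  await channel.bindQueue('orders.dlq', 'orders.dlx', '');

  // Work queue: messages nack'd with requeue=false go to orders.dlx.
  await channel.assertQueue('orders.work', {
    durable: true,
    deadLetterExchange: 'orders.dlx',
  });

  // At most 10 unacknowledged messages per consumer on this channel.
  await channel.prefetch(10);
}
```

Declaring the dead letter exchange before the work queue keeps the setup order obvious, although the broker only needs the exchange to exist by the time a message is rejected.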

# Docker Compose for RabbitMQ
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"     # AMQP
      - "15672:15672"   # Management UI (guest/guest)

Event Design Rules

PAYLOAD STRUCTURE:
{
  "type":     "order.placed",          ← entity.action (past tense)
  "data":     { orderId, userId, ... },← business payload
  "metadata": {
    "eventId":       "evt_abc123",     ← unique ID (for idempotency)
    "timestamp":     "2026-04-...",    ← when it happened
    "source":        "order-service",  ← who published it
    "version":       "2.0",           ← schema version
    "correlationId": "req_xyz",        ← request trace
    "causationId":   "evt_previous"    ← what caused this event
  }
}
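
A small factory keeps every published event in this shape. The sketch below is one way to do it; `createEvent` is a hypothetical helper, the `evt_` prefix follows the example above, and the default version string is an assumption.

```javascript
const { randomUUID } = require('node:crypto');

// Wrap a business payload in the envelope shown above.
function createEvent(type, data, { source, correlationId, causationId = null, version = '1.0' }) {
  return {
    type,                                     // entity.action, past tense
    data,                                     // business payload
    metadata: {
      eventId: `evt_${randomUUID()}`,         // unique per event, used for idempotency
      timestamp: new Date().toISOString(),    // when it happened
      source,                                 // publishing service
      version,                                // schema version
      correlationId,                          // carried over from the incoming request
      causationId,                            // eventId of the event that triggered this one
    },
  };
}
```

A consumer that publishes a follow-up event would pass the incoming `correlationId` through unchanged and set `causationId` to the incoming `eventId`.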

NAMING: entity.action_past_tense
  ✓ order.placed, user.created, payment.failed
  ✗ createOrder, ORDER_CREATED, sendEmail
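
The shape of the naming rule (lowercase `entity.action`) can be checked mechanically at publish time. The regex below is an assumption about what counts as well-formed; it cannot tell whether the action is actually in the past tense, so that part still needs review.

```javascript
// Lowercase words (underscores allowed inside a word), exactly one dot.
const EVENT_NAME = /^[a-z]+(_[a-z]+)*\.[a-z]+(_[a-z]+)*$/;

function isValidEventName(name) {
  return EVENT_NAME.test(name);
}
```

This accepts `order.placed` and `payment.failed` and rejects `createOrder`, `ORDER_CREATED`, and `sendEmail`.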

VERSIONING: only ADD fields, never remove or rename
  v1.0: { orderId, totalAmount }
  v1.1: { orderId, totalAmount, currency }  ← old consumers ignore new fields
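
Additive versioning works because consumers read only the fields they know. A sketch of both sides, with hypothetical function names and an assumed default currency for payloads that predate the new field:

```javascript
// Consumer written against v1.0: destructuring ignores fields it does not know.
function readOrderV1({ orderId, totalAmount }) {
  return { orderId, totalAmount };
}

// Consumer written against v1.1: still accepts v1.0 payloads by defaulting
// the field that older publishers do not send ('USD' is an assumed default).
function readOrderV1_1({ orderId, totalAmount, currency = 'USD' }) {
  return { orderId, totalAmount, currency };
}
```

Renaming `totalAmount` would break both readers at once, which is why the rule allows additions only.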

PUBLISH ORDER: Save to DB first, THEN publish event
  ✗ publish → save (risk: "ghost event" if save fails)
  ✓ save → publish (risk: event lost if publish fails — mitigate with outbox pattern)
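
The outbox pattern mentioned above closes the "event lost" gap by saving the event in the same transaction as the business row and publishing it later from a relay. The sketch below uses plain arrays as a stand-in for database tables, so the "transaction" is only simulated; all names are hypothetical.

```javascript
// In-memory stand-in for a database. A real service would write both rows
// inside one SQL transaction so they commit or roll back together.
function createDb() {
  return { orders: [], outbox: [] };
}

function placeOrder(db, order) {
  const event = { id: `evt_${order.id}`, type: 'order.placed', data: order, published: false };
  db.orders.push(order);   // business row
  db.outbox.push(event);   // pending event, same "transaction"
}

// Relay: run periodically; publish pending rows and mark them only on success.
async function relayOutbox(db, publish) {
  for (const row of db.outbox.filter((r) => !r.published)) {
    await publish(row);    // if this throws, the row stays pending for the next run
    row.published = true;
  }
}
```

If the relay crashes after publishing but before marking the row, the event is sent again on the next run, so this gives at-least-once delivery and consumers still need to be idempotent.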

Idempotency Strategies

WHY: At-least-once delivery = events can be delivered multiple times
     Consumer crashes before ack → redeliver
     Network drops ack → redeliver
     Publisher retries → duplicate messages

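STRATEGIES (the Redis check is the weakest; the database-backed ones are stronger):

1. Redis event ID check (fast, simple)
   // SET ... NX claims the event atomically; a separate exists-then-set can race
   const claimed = await redis.set(`processed:${eventId}`, '1', 'EX', 86400, 'NX');
   if (!claimed) return;                      // duplicate delivery
   try { await processEvent(event); }
   catch (err) { await redis.del(`processed:${eventId}`); throw err; }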

2. Database unique constraint (strongest)
   INSERT INTO payments (order_id, amount, event_id)
   VALUES ($1, $2, $3)
   -- unique constraint on event_id prevents duplicates

3. Upsert / ON CONFLICT DO NOTHING
   INSERT INTO orders (id, ...) VALUES ($1, ...)
   ON CONFLICT (id) DO NOTHING

4. Conditional state update
   UPDATE orders SET status = 'shipped'
   WHERE id = $1 AND status = 'placed'
   -- only transitions from placed → shipped; safe to call multiple times
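
The effect of strategies 2 and 4 can be demonstrated without a database. In this sketch a Set plays the role of the unique constraint on `event_id` and a Map plays the orders table; the store and handler names are hypothetical.

```javascript
// Stand-in for the payments table with a unique constraint on event_id.
function createPaymentStore() {
  const seen = new Set();
  const payments = [];
  return {
    payments,
    // Returns false instead of inserting when event_id already exists.
    insertPayment(eventId, orderId, amount) {
      if (seen.has(eventId)) return false;
      seen.add(eventId);
      payments.push({ eventId, orderId, amount });
      return true;
    },
  };
}

// Consumer: a duplicate delivery is detected by the constraint and skipped.
function handlePaymentEvent(store, event) {
  const { eventId } = event.metadata;
  const { orderId, totalAmount } = event.data;
  return store.insertPayment(eventId, orderId, totalAmount) ? 'charged' : 'duplicate-skipped';
}

// Conditional state update: only placed -> shipped changes anything.
function markShipped(orders, id) {
  const order = orders.get(id);
  if (!order || order.status !== 'placed') return 0; // rows affected
  order.status = 'shipped';
  return 1;
}
```

In both cases the second delivery is a no-op, and the consumer should still acknowledge the message so the broker stops redelivering it.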

Common Gotchas

| Gotcha | Why |
|--------|-----|
| Shared database between services | Creates hidden coupling; defeats the purpose of microservices |
| No timeout on HTTP calls | A hung dependency holds connections for minutes; callers back up and the failure cascades |
| Retry without backoff | Overwhelms the already-failing service (retry storm) |
| Retry on 400/404 | Request is wrong; retrying won't fix it |
| Auth in every service | Duplicated logic; update one, forget others |
| Gateway for internal calls | Unnecessary hop; adds latency |
| Publish event before DB write | Risk of "ghost events": announcing something that never happened |
| No idempotency | Duplicate events = duplicate charges, emails, records |
| No DLQ | Failed messages lost forever; no way to recover |
| Events named as commands | createOrder is a command, not an event; use order.placed |
| Removing event fields | Breaks existing consumers; only add, never remove |
| No correlation ID | Impossible to trace a request across 10 services |

Architecture Cheat Sheet

┌──────────────────────────────────────────────────────────┐
│  EXTERNAL traffic:                                        │
│    Client → Gateway → Service                             │
│                                                           │
│  INTERNAL sync calls (need response):                     │
│    Service A → ResilientClient → Service B                │
│    (timeout + retry + circuit breaker + fallback)         │
│                                                           │
│  INTERNAL async (fire-and-forget):                        │
│    Service A → publish event → RabbitMQ → Service B       │
│    (idempotent consumer + DLQ + monitoring)               │
│                                                           │
│  RULE: Use sync when you NEED the response.               │
│        Use async when the action can happen EVENTUALLY.   │
└──────────────────────────────────────────────────────────┘
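
The "fallback" in the sync path is the one piece not shown elsewhere in this sheet. A minimal sketch, with a hypothetical helper name and return shape: the caller gets a degraded value instead of an exception when the resilient call ultimately fails.

```javascript
// Try the live call; on failure return `fallbackValue` and flag the result.
async function withFallback(call, fallbackValue) {
  try {
    return { value: await call(), degraded: false };
  } catch (err) {
    // Timeouts, exhausted retries, and an open circuit all end up here.
    return { value: fallbackValue, degraded: true };
  }
}
```

The `degraded` flag lets the caller log the event or show stale data with a warning rather than silently treating cached values as fresh.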

Key Numbers

| What | Value |
|------|-------|
| Internal HTTP timeout | 3-5 seconds |
| External API timeout | 5-10 seconds |
| Retry count | 2-3 attempts |
| Circuit breaker threshold | 3-5 consecutive failures |
| Circuit breaker reset timeout | 15-60 seconds |
| RabbitMQ prefetch | 10-50 messages |
| Idempotency TTL (Redis) | 24 hours |
| DLQ alert threshold | > 0 messages |

End of 6.2 quick revision.