6.2 — Building & Orchestrating Microservices: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

Skim before labs or interviews.
Drill gaps — reopen README.md then 6.2.a through 6.2.e.
Practice — 6.2-Exercise-Questions.md.
Polish answers — 6.2-Interview-Questions.md.

Core Vocabulary

Term	One-liner
Microservice	Independently deployable service with its own codebase, database, and process
API Gateway	Single entry point for all external traffic — routes, authenticates, rate-limits
Service discovery	Mechanism for services to find each other's network addresses dynamically
Circuit breaker	Pattern that stops calls to a failing service (CLOSED / OPEN / HALF-OPEN)
Exponential backoff	Retry strategy where wait time doubles each attempt (1s, 2s, 4s, 8s)
Jitter	Random delay added to backoff to prevent thundering herd
Bulkhead	Isolates resources per dependency so one failure does not exhaust all capacity
Message queue	Broker that stores messages between publishers and consumers (RabbitMQ, Kafka)
Exchange	RabbitMQ component that receives messages and routes to queues
Binding	Rule connecting an exchange to a queue (with routing key pattern)
Dead letter queue (DLQ)	Queue for messages that fail processing after max retries
Idempotency	Processing the same event N times = same result as 1 time
Eventual consistency	Data across services becomes consistent after a delay, not instantly
Event sourcing	Store every event (not just current state) and rebuild state by replaying
Correlation ID	Unique ID that traces a request across multiple services and events

Service Setup Checklist

For EVERY microservice, ensure:
  [ ] Own package.json (separate dependencies)
  [ ] Own .env file (separate config, never shared)
  [ ] Own port (4001, 4002, 4003, ...)
  [ ] Own Dockerfile
  [ ] Own database (never share databases!)
  [ ] /health endpoint (liveness check)
  [ ] /ready endpoint (readiness check — DB connected?)
  [ ] Graceful shutdown handler
  [ ] Structured logging with service name
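The last four checklist items can be sketched with Node's built-in http module alone. This is a minimal illustration, not a reference implementation: the service name, the `state.dbConnected` flag, the 10-second hard stop, and the helper names `probe`, `log`, `shutdown`, and `start` are all assumptions made for the example.

```javascript
const http = require('node:http');

const SERVICE_NAME = 'user-service';   // assumed name, for illustration
const state = { dbConnected: false };  // flip to true once the DB pool connects

// Structured log line: JSON with the service name on every entry.
function log(level, message, extra = {}) {
  console.log(JSON.stringify({ level, service: SERVICE_NAME, message, ...extra }));
}

// Pure routing decision, so it can be unit-tested without opening a socket.
function probe(url) {
  if (url === '/health') return { status: 200, body: { status: 'ok' } };
  if (url === '/ready') {
    return state.dbConnected
      ? { status: 200, body: { status: 'ready' } }
      : { status: 503, body: { status: 'not ready', reason: 'db disconnected' } };
  }
  return { status: 404, body: { error: 'not found' } };
}

const server = http.createServer((req, res) => {
  const { status, body } = probe(req.url);
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(body));
});

// Graceful shutdown: stop accepting connections, let in-flight requests finish.
function shutdown(signal) {
  log('info', 'shutting down', { signal });
  server.close(() => process.exit(0));
  setTimeout(() => process.exit(1), 10000).unref(); // hard stop after 10 s
}

function start(port = 4001) {
  process.once('SIGTERM', () => shutdown('SIGTERM'));
  process.once('SIGINT', () => shutdown('SIGINT'));
  server.listen(port, () => log('info', 'listening', { port }));
}
// start(); // uncomment to run as a standalone service
```

Keeping /health independent of the database means an orchestrator restarts the process only when the process itself is stuck, while /ready merely takes it out of rotation until its dependencies return.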

API Gateway Pattern

Client
  │
  ▼
┌──────────────────┐
│   API GATEWAY    │  :3000
│   1. Rate limit  │
│   2. Auth (JWT)  │
│   3. Route       │
│   4. Log         │
│   5. Proxy       │
└────┬──────┬──────┘
     ▼      ▼
  :4001   :4002    (internal only — not exposed externally)

Gateway responsibilities:
  ✓ Routing          /api/users → user-service:4001/users
  ✓ Authentication   Validate JWT, inject X-User-Id header
  ✓ Rate limiting    100 req / 15 min per IP or API key
  ✓ Logging          Request ID, timing, status
  ✓ CORS             Cross-origin policies
  ✓ SSL termination  HTTPS at edge, HTTP internally
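
The rate-limiting line can be sketched as a fixed-window counter. This is one possible approach under stated assumptions: counters live in process memory (a multi-instance gateway would need a shared store such as Redis), and `createRateLimiter` and `allow` are hypothetical names.

```javascript
// Fixed-window limiter: at most `limit` requests per `windowMs` per key
// (the key would be the client IP or API key).
function createRateLimiter({ limit = 100, windowMs = 15 * 60 * 1000 } = {}) {
  const windows = new Map(); // key -> { start, count }

  return function allow(key, now = Date.now()) {
    const w = windows.get(key);
    if (!w || now - w.start >= windowMs) {
      windows.set(key, { start: now, count: 1 }); // first request of a new window
      return true;
    }
    if (w.count >= limit) return false;           // over budget: respond 429
    w.count += 1;
    return true;
  };
}

// Usage: const allow = createRateLimiter();
//        if (!allow(clientIp)) { /* respond with 429 Too Many Requests */ }
```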

Key rules:
  - Only the gateway is exposed to the internet
  - Internal service calls bypass the gateway
  - Services trust X-User-Id from gateway (never re-validate JWT)
  - Strip X-User-Id from client requests (prevent impersonation)
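
The last two rules can be sketched as a single header-building step in the gateway. The function name `buildUpstreamHeaders` is hypothetical, and `verifiedUserId` is assumed to come from a JWT the gateway has already validated.

```javascript
// Build the headers forwarded to an internal service.
// `verifiedUserId` must come from a validated JWT, never from the client.
function buildUpstreamHeaders(clientHeaders, verifiedUserId) {
  const headers = {};
  for (const [name, value] of Object.entries(clientHeaders)) {
    const lower = name.toLowerCase();
    if (lower === 'x-user-id') continue; // drop any client-supplied identity
    headers[lower] = value;
  }
  if (verifiedUserId != null) headers['x-user-id'] = String(verifiedUserId);
  return headers;
}
```

Stripping first and injecting second means a forged header can never reach a service, even on routes that carry no authenticated user.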

Retry / Timeout / Circuit Breaker

TIMEOUT:
  Internal calls:   3-5 seconds
  External APIs:    5-10 seconds
  NEVER rely on the client default (axios has no timeout unless you set one); always set it explicitly

RETRY:
  Simple:           retry immediately (bad)
  Exp. backoff:     1s, 2s, 4s, 8s (better)
  Backoff + jitter: 1.3s, 2.7s, 5.1s (best — prevents thundering herd)

  Retry:    500, 502, 503, 504, 429, 408 (transient)
  No retry: 400, 401, 403, 404, 409     (client errors)
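
One possible shape for a `retryWithJitter` helper that follows the rules above. This is a sketch under assumptions: errors are expected to look like axios errors (`err.response.status`), the jitter is additive and capped at 50% of the exponential delay, and `isRetryable`, `backoffDelay`, and `sleep` are hypothetical helper names.

```javascript
const RETRYABLE = new Set([408, 429, 500, 502, 503, 504]);

function isRetryable(err) {
  const status = err && err.response && err.response.status;
  // No response at all (timeout, connection reset) is treated as transient.
  return status === undefined || RETRYABLE.has(status);
}

// Delay before retry n (0-based): baseDelay * 2^n plus up to 50% random jitter.
function backoffDelay(attempt, baseDelay, random = Math.random) {
  const exp = baseDelay * 2 ** attempt;
  return exp + random() * exp * 0.5;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn once, then retries up to maxRetries times on transient errors.
async function retryWithJitter(fn, maxRetries = 3, baseDelay = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await sleep(backoffDelay(attempt, baseDelay));
    }
  }
}
```

Because each client draws its own random offset, a fleet of callers that failed at the same instant spreads its retries out instead of hitting the recovering service in lockstep.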

CIRCUIT BREAKER:
  CLOSED     → normal, track failures
  OPEN       → reject all calls instantly (fail fast)
  HALF-OPEN  → allow 1 probe; succeed = CLOSED, fail = OPEN

  failureThreshold: 5      (open after 5 consecutive failures)
  resetTimeout:     30000  (try again after 30 seconds)
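
The three states and two settings above can be sketched as a small class. This is an illustrative sketch, not a library API: the class shape, the injectable `now` clock, and the error messages are assumptions.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeout = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeout;
    this.now = now;          // injectable clock, handy for tests
    this.state = 'CLOSED';
    this.failures = 0;       // consecutive failures while CLOSED
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === 'HALF_OPEN') {
      throw new Error('Circuit half-open: probe already in flight');
    }
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.resetTimeout) {
        throw new Error('Circuit open: failing fast');
      }
      this.state = 'HALF_OPEN'; // reset timeout elapsed: let one probe through
    }
    try {
      const result = await fn();
      this.state = 'CLOSED';    // success (normal or probe) closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';    // failed probe or threshold reached
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Callers should catch the fail-fast error and return a fallback (cached data, a default, or a clear 503) rather than letting it bubble up unhandled.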

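// Resilient call pattern: breaker wraps retry, retry wraps a timed request.
// `breaker` and `retryWithJitter` are helpers assumed to be defined elsewhere;
// `url` is the target endpoint. One exhausted retry chain counts as one breaker failure.
const axios = require('axios');

const result = await breaker.call(() =>
  retryWithJitter(
    () => axios.get(url, { timeout: 3000 }),  // 3 s per attempt
    3,    // maxRetries
    1000  // baseDelay in ms
  )
);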

RabbitMQ Basics

ARCHITECTURE:
  Producer → Exchange → Binding → Queue → Consumer

EXCHANGE TYPES:
  Fanout:  Send to ALL bound queues (broadcast)
  Direct:  Send only if binding key = routing key exactly
  Topic:   Pattern match on dot-separated words (* = exactly one word, # = zero or more words)
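
The topic wildcard rules can be checked with a small matcher. This sketch mirrors the matching semantics for illustration only. The broker does this itself, and `topicMatches` is a hypothetical helper.

```javascript
// Does a topic binding pattern match a routing key?
// `*` matches exactly one dot-separated word, `#` matches zero or more words.
function topicMatches(pattern, routingKey) {
  const p = pattern.split('.');
  const k = routingKey.split('.');

  function match(pi, ki) {
    if (pi === p.length) return ki === k.length; // pattern used up: key must be too
    if (p[pi] === '#') {
      // Let '#' swallow 0, 1, 2, ... words and try the rest of the pattern.
      for (let next = ki; next <= k.length; next++) {
        if (match(pi + 1, next)) return true;
      }
      return false;
    }
    if (ki === k.length) return false;           // pattern wants a word, key has none
    if (p[pi] === '*' || p[pi] === k[ki]) return match(pi + 1, ki + 1);
    return false;
  }

  return match(0, 0);
}

// topicMatches('order.*', 'order.placed')     → true
// topicMatches('order.*', 'order.placed.eu')  → false
// topicMatches('order.#', 'order.placed.eu')  → true
```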

ACKNOWLEDGMENTS:
  ack(msg)              → success, remove from queue
  nack(msg, false, true)  → fail, requeue (retry)
  nack(msg, false, false) → fail, no requeue (goes to DLQ if a dead-letter exchange is configured, otherwise dropped)

PREFETCH:
  prefetch(1)   → process one at a time (safe, slow)
  prefetch(10)  → buffer 10 unacked messages (balanced)
  prefetch(100) → fast, but a crash redelivers up to 100 in-flight messages (duplicates, not loss, with manual ack)

DEAD LETTER QUEUE:
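  Declare the work queue with the x-dead-letter-exchange argument; bind the DLQ to that exchange
  Catches messages rejected with requeue=false, expired by TTL, or dropped on queue overflow
  Retry counting is up to the consumer; a classic queue does not cap redeliveries by itself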
  Monitor DLQ → alert if messages accumulate
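
The acknowledgment, prefetch, and DLQ notes can be tied together in one consumer sketch. It assumes `channel` is an amqplib channel passed in by the caller. The queue and exchange names, and the "requeue once, then dead-letter" policy based on `msg.fields.redelivered`, are illustrative choices rather than anything the lessons prescribe.

```javascript
// `channel` is expected to be an amqplib channel (injected so this stays testable).
async function startOrderConsumer(channel, processEvent) {
  // Dead-letter side: an exchange plus a queue that collects rejected messages.
  await channel.assertExchange('orders.dlx', 'fanout', { durable: true });
  await channel.assertQueue('orders.dlq', { durable: true });
  await channel.bindQueue('orders.dlq', 'orders.dlx', '');

  // Work queue: messages rejected with requeue=false are routed to orders.dlx.
  await channel.assertQueue('orders', { durable: true, deadLetterExchange: 'orders.dlx' });
  await channel.prefetch(10); // at most 10 unacked messages per consumer

  await channel.consume('orders', async (msg) => {
    if (msg === null) return; // consumer was cancelled by the broker
    try {
      await processEvent(JSON.parse(msg.content.toString()));
      channel.ack(msg);
    } catch (err) {
      if (msg.fields.redelivered) {
        channel.nack(msg, false, false); // failed again: dead-letter it
      } else {
        channel.nack(msg, false, true);  // first failure: requeue for one retry
      }
    }
  });
}
```

The `redelivered` flag only distinguishes a first delivery from a later one, so this allows exactly one retry. A higher retry budget needs a counter carried in a message header or a delayed-retry queue.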

# Docker Compose for RabbitMQ (docker-compose.yml)
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"     # AMQP
      - "15672:15672"   # Management UI (guest/guest)

Event Design Rules

PAYLOAD STRUCTURE:
{
  "type":     "order.placed",          ← entity.action (past tense)
  "data":     { orderId, userId, ... },← business payload
  "metadata": {
    "eventId":       "evt_abc123",     ← unique ID (for idempotency)
    "timestamp":     "2026-04-...",    ← when it happened
    "source":        "order-service",  ← who published it
    "version":       "2.0",           ← schema version
    "correlationId": "req_xyz",        ← request trace
    "causationId":   "evt_previous"    ← what caused this event
  }
}
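
A small factory keeps every event in this shape. This is a sketch: `createEvent` and `createFollowUp` are hypothetical helpers, and the `evt_` prefix plus UUID format is an assumption about how IDs are generated.

```javascript
const { randomUUID } = require('node:crypto');

// Wrap a business payload in the standard envelope.
function createEvent(type, data, { source, correlationId, causationId = null, version = '1.0' }) {
  return {
    type,
    data,
    metadata: {
      eventId: `evt_${randomUUID()}`,      // unique per event, used for idempotency
      timestamp: new Date().toISOString(),
      source,
      version,
      correlationId,                       // same for every event of one request
      causationId,                         // eventId of the event that triggered this one
    },
  };
}

// A follow-up event keeps the correlationId and points causationId at its trigger.
function createFollowUp(cause, type, data, source) {
  return createEvent(type, data, {
    source,
    correlationId: cause.metadata.correlationId,
    causationId: cause.metadata.eventId,
  });
}
```

For example, a payment service reacting to `order.placed` would call `createFollowUp(orderPlaced, 'payment.completed', { ... }, 'payment-service')`, so the whole chain can be traced by one correlation ID.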

NAMING: entity.action_past_tense
  ✓ order.placed, user.created, payment.failed
  ✗ createOrder, ORDER_CREATED, sendEmail
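
The format half of this convention can be enforced at publish time. This is a rough sketch: the regex is an assumed reading of "entity.action" (lowercase, exactly two dot-separated segments), and a regex cannot verify past tense, so that part stays a code-review rule.

```javascript
// entity.action: lowercase, exactly two dot-separated segments.
const EVENT_NAME = /^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$/;

function isValidEventName(name) {
  return EVENT_NAME.test(name);
}

// isValidEventName('order.placed')   → true
// isValidEventName('createOrder')    → false
// isValidEventName('ORDER_CREATED')  → false
```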

VERSIONING: only ADD fields, never remove or rename
  v1.0: { orderId, totalAmount }
  v1.1: { orderId, totalAmount, currency }  ← old consumers ignore new fields

PUBLISH ORDER: Save to DB first, THEN publish event
  ✗ publish → save (risk: "ghost event" if save fails)
  ✓ save → publish (risk: event lost if publish fails — mitigate with outbox pattern)
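
The outbox pattern mentioned above can be sketched with an in-memory stand-in for the database. Everything here is illustrative: `FakeDb`, `placeOrder`, and `relayOutbox` are hypothetical names, and a real implementation would use an actual DB transaction and a polling or change-data-capture relay.

```javascript
// In-memory stand-in for a database with transactions (illustration only).
class FakeDb {
  constructor() {
    this.orders = [];
    this.outbox = [];
  }

  // Apply all writes or none, mimicking a single DB transaction.
  transaction(work) {
    const snapshot = {
      orders: [...this.orders],
      outbox: this.outbox.map((row) => ({ ...row })),
    };
    try {
      work(this);
    } catch (err) {
      Object.assign(this, snapshot); // roll back both tables
      throw err;
    }
  }
}

// The business row and the event row are written in the SAME transaction.
function placeOrder(db, order) {
  db.transaction((tx) => {
    tx.orders.push(order);
    tx.outbox.push({ id: tx.outbox.length + 1, type: 'order.placed', data: order, published: false });
  });
}

// Relay: runs periodically and marks a row only after the broker accepted it.
async function relayOutbox(db, publish) {
  for (const row of db.outbox.filter((r) => !r.published)) {
    await publish(row.type, row.data); // a throw here leaves the row for the next run
    row.published = true;
  }
}
```

If the relay crashes between publishing and marking the row, the event is sent again on the next run, so the outbox gives at-least-once delivery and consumers still need to be idempotent.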

Idempotency Strategies

WHY: At-least-once delivery = events can be delivered multiple times
     Consumer crashes before ack → redeliver
     Network drops ack → redeliver
     Publisher retries → duplicate messages

STRATEGIES (1 is the weakest; 2-4 lean on the database's atomicity):

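1. Redis event ID check (fast, simple)
   // Claim the ID atomically with NX; a separate exists-then-set lets two consumers both pass
   const claimed = await redis.set(`processed:${eventId}`, '1', 'EX', 86400, 'NX');
   if (claimed === null) return;                 // already claimed: skip the duplicate
   try { await processEvent(event); }
   catch (err) { await redis.del(`processed:${eventId}`); throw err; }  // release so a redelivery can retry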

2. Database unique constraint (strongest)
   INSERT INTO payments (order_id, amount, event_id)
   VALUES ($1, $2, $3)
   -- unique constraint on event_id prevents duplicates

3. Upsert / ON CONFLICT DO NOTHING
   INSERT INTO orders (id, ...) VALUES ($1, ...)
   ON CONFLICT (id) DO NOTHING

4. Conditional state update
   UPDATE orders SET status = 'shipped'
   WHERE id = $1 AND status = 'placed'
   -- only transitions from placed → shipped; safe to call multiple times
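
Why strategies 2-4 are safe under redelivery can be seen in an in-memory model. This is illustrative only: the `Set` plays the unique index on event_id, the `Map` plays the orders table, and the return values mimic "rows affected".

```javascript
// In-memory stand-ins for two tables (illustration only).
const processedEvents = new Set();                      // unique index on event_id
const orders = new Map([['o1', { status: 'placed' }]]); // orders table

// Strategies 2/3: the insert either claims the event ID or reports a duplicate.
function claimEvent(eventId) {
  if (processedEvents.has(eventId)) return false; // conflict: 0 rows inserted
  processedEvents.add(eventId);
  return true;
}

// Strategy 4: only the placed → shipped transition changes anything.
function markShipped(orderId) {
  const order = orders.get(orderId);
  if (!order || order.status !== 'placed') return 0; // 0 rows updated
  order.status = 'shipped';
  return 1;
}

// Delivering the same event twice:
// claimEvent('evt_1') → true, then false
// markShipped('o1')   → 1, then 0 (state unchanged the second time)
```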

Common Gotchas

Gotcha	Why
Shared database between services	Creates hidden coupling; defeats microservices purpose
No timeout on HTTP calls	Hung calls pile up and exhaust connections and memory = cascade failure
Retry without backoff	Overwhelms the already-failing service (retry storm)
Retry on 400/404	Request is wrong; retrying won't fix it
Auth in every service	Duplicated logic; update one, forget others
Gateway for internal calls	Unnecessary hop; adds latency
Publish event before DB write	Risk of "ghost events" — announcing something that never happened
No idempotency	Duplicate events = duplicate charges, emails, records
No DLQ	Failed messages lost forever; no way to recover
Events named as commands	`createOrder` is a command, not an event; use `order.placed`
Removing event fields	Breaks existing consumers; only add, never remove
No correlation ID	Impossible to trace a request across 10 services

Architecture Cheat Sheet

┌──────────────────────────────────────────────────────────┐
│  EXTERNAL traffic:                                        │
│    Client → Gateway → Service                             │
│                                                           │
│  INTERNAL sync calls (need response):                     │
│    Service A → ResilientClient → Service B                │
│    (timeout + retry + circuit breaker + fallback)         │
│                                                           │
│  INTERNAL async (fire-and-forget):                        │
│    Service A → publish event → RabbitMQ → Service B       │
│    (idempotent consumer + DLQ + monitoring)               │
│                                                           │
│  RULE: Use sync when you NEED the response.               │
│        Use async when the action can happen EVENTUALLY.   │
└──────────────────────────────────────────────────────────┘

Key Numbers

What	Value
Internal HTTP timeout	3-5 seconds
External API timeout	5-10 seconds
Retry count	2-3 attempts
Circuit breaker threshold	3-5 consecutive failures
Circuit breaker reset timeout	15-60 seconds
RabbitMQ prefetch	10-50 messages
Idempotency TTL (Redis)	24 hours
DLQ alert threshold	> 0 messages

End of 6.2 quick revision.