6.2 — Building & Orchestrating Microservices: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim before labs or interviews.
- Drill gaps: reopen README.md, then 6.2.a through 6.2.e.
- Practice: 6.2-Exercise-Questions.md.
- Polish answers: 6.2-Interview-Questions.md.
Core Vocabulary
| Term | One-liner |
|---|---|
| Microservice | Independently deployable service with its own codebase, database, and process |
| API Gateway | Single entry point for all external traffic — routes, authenticates, rate-limits |
| Service discovery | Mechanism for services to find each other's network addresses dynamically |
| Circuit breaker | Pattern that stops calls to a failing service (CLOSED / OPEN / HALF-OPEN) |
| Exponential backoff | Retry strategy where wait time doubles each attempt (1s, 2s, 4s, 8s) |
| Jitter | Random delay added to backoff to prevent thundering herd |
| Bulkhead | Isolates resources per dependency so one failure does not exhaust all capacity |
| Message queue | Broker that stores messages between publishers and consumers (RabbitMQ, Kafka) |
| Exchange | RabbitMQ component that receives messages and routes to queues |
| Binding | Rule connecting an exchange to a queue (with routing key pattern) |
| Dead letter queue (DLQ) | Queue for messages that fail processing after max retries |
| Idempotency | Processing the same event N times = same result as 1 time |
| Eventual consistency | Data across services becomes consistent after a delay, not instantly |
| Event sourcing | Store every event (not just current state) and rebuild state by replaying |
| Correlation ID | Unique ID that traces a request across multiple services and events |
Service Setup Checklist
For EVERY microservice, ensure:
[ ] Own package.json (separate dependencies)
[ ] Own .env file (separate config, never shared)
[ ] Own port (4001, 4002, 4003, ...)
[ ] Own Dockerfile
[ ] Own database (never share databases!)
[ ] /health endpoint (liveness check)
[ ] /ready endpoint (readiness check — DB connected?)
[ ] Graceful shutdown handler
[ ] Structured logging with service name
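The probe, shutdown and logging items above can be sketched with nothing but Node's built-in http module. This is a minimal sketch, not the course's implementation: the `state` object, the `probe` and `log` helpers, and the `user-service` fallback name are assumptions made for illustration.

```javascript
const http = require('node:http');

// Hypothetical service state; set dbConnected once the real DB client connects.
const state = { dbConnected: false, shuttingDown: false };

// Pure helper so the probe logic can be tested without a running server.
function probe(path, s) {
  if (path === '/health') return { status: 200, body: { status: 'ok' } }; // liveness
  if (path === '/ready') {
    const ready = s.dbConnected && !s.shuttingDown; // readiness
    return { status: ready ? 200 : 503, body: { ready } };
  }
  return { status: 404, body: { error: 'not found' } };
}

// Structured (JSON) log line that always carries the service name.
function log(level, message, extra = {}) {
  const service = process.env.SERVICE_NAME || 'user-service';
  console.log(JSON.stringify({ level, service, message, ...extra }));
}

function createServer() {
  return http.createServer((req, res) => {
    const { status, body } = probe(req.url, state);
    res.writeHead(status, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}

// Graceful shutdown: stop accepting connections, let in-flight requests finish.
function installShutdown(server) {
  process.on('SIGTERM', () => {
    state.shuttingDown = true; // /ready starts returning 503
    log('info', 'SIGTERM received, draining');
    server.close(() => process.exit(0));
  });
}

// Usage (not executed here):
//   const server = createServer();
//   installShutdown(server);
//   server.listen(Number(process.env.PORT) || 4001, () => log('info', 'listening'));
```

Keeping `probe` free of I/O means an orchestrator's probes and a unit test exercise exactly the same logic.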
API Gateway Pattern
        Client
          │
          ▼
┌──────────────────┐
│   API GATEWAY    │ :3000
│  1. Rate limit   │
│  2. Auth (JWT)   │
│  3. Route        │
│  4. Log          │
│  5. Proxy        │
└────┬──────┬──────┘
     ▼      ▼
   :4001  :4002   (internal only, not exposed externally)
Gateway responsibilities:
✓ Routing           /api/users → user-service:4001/users
✓ Authentication    Validate JWT, inject X-User-Id header
✓ Rate limiting     100 req / 15 min per IP or API key
✓ Logging           Request ID, timing, status
✓ CORS              Cross-origin policies
✓ SSL termination   HTTPS at edge, HTTP internally
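The rate-limiting line (100 requests per 15 minutes per IP or API key) could be implemented as a fixed-window counter. The sketch below is one simple option, assuming a single gateway instance with in-memory state; the class name `FixedWindowLimiter` and the injectable clock are illustrative, not taken from the course code.

```javascript
// Fixed-window limiter: at most `limit` requests per `windowMs` for each key
// (client IP or API key). State is in memory, so it is per gateway instance.
class FixedWindowLimiter {
  constructor(limit = 100, windowMs = 15 * 60 * 1000, now = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;            // injectable clock, handy for tests
    this.windows = new Map();  // key -> { start, count }
  }

  // Returns true if the request is allowed, false if it should get HTTP 429.
  allow(key) {
    const t = this.now();
    const w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      this.windows.set(key, { start: t, count: 1 }); // new window for this key
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}

// Usage inside a request handler (req.ip is an assumption about the framework):
//   if (!limiter.allow(req.ip)) { res.statusCode = 429; return res.end(); }
```

With several gateway instances the counters would have to live in a shared store such as Redis, otherwise each instance enforces its own separate limit.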
Key rules:
- Only the gateway is exposed to the internet
- Internal service calls bypass the gateway
- Services trust X-User-Id from gateway (never re-validate JWT)
- Strip X-User-Id from client requests (prevent impersonation)
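The routing and header rules above can be sketched as two pure functions. The route table is an assumption (only `user-service:4001` appears in this sheet; the `order-service:4002` entry is hypothetical), and `verifiedUserId` is assumed to come from a JWT the gateway has already validated.

```javascript
// Hypothetical route table: public path prefix -> internal service base URL.
const routes = {
  '/api/users': 'http://user-service:4001/users',
  '/api/orders': 'http://order-service:4002/orders',
};

// Map an incoming path to its upstream URL, or null if no route matches.
function resolveUpstream(path) {
  for (const [prefix, target] of Object.entries(routes)) {
    if (path === prefix || path.startsWith(prefix + '/')) {
      return target + path.slice(prefix.length);
    }
  }
  return null;
}

// Build the headers forwarded upstream. Any client-supplied X-User-Id is
// dropped first; the gateway then sets it from the already-verified JWT.
function buildUpstreamHeaders(clientHeaders, verifiedUserId) {
  const headers = {};
  for (const [name, value] of Object.entries(clientHeaders)) {
    if (name.toLowerCase() !== 'x-user-id') headers[name] = value;
  }
  headers['x-user-id'] = String(verifiedUserId);
  return headers;
}
```

Because the client's header is removed before the trusted one is set, a request that arrives with a forged `X-User-Id` cannot impersonate another user downstream.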
Retry / Timeout / Circuit Breaker
TIMEOUT:
Internal calls: 3-5 seconds
External APIs: 5-10 seconds
NEVER rely on a client default (it may be ~120s, or no timeout at all, as in axios); always set explicitly
RETRY:
Simple: retry immediately (bad)
Exp. backoff: 1s, 2s, 4s, 8s (better)
Backoff + jitter: 1.3s, 2.7s, 5.1s (best — prevents thundering herd)
Retry: 500, 502, 503, 504, 429, 408 (transient)
No retry: 400, 401, 403, 404, 409 (client errors)
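One possible shape for a retry helper that follows the rules above. This is a sketch under assumptions, not the course's `retryWithJitter`: the jitter is up to 50% of the exponential delay, errors are assumed to have the axios-style `err.response.status` shape, and errors with no response at all (network failures, timeouts) are treated as retryable.

```javascript
// Status codes this sheet treats as transient and worth retrying.
const RETRYABLE = new Set([500, 502, 503, 504, 429, 408]);

function isRetryable(err) {
  const status = err && err.response && err.response.status; // axios-style error
  // Assumption: no HTTP response means a network error or timeout, so retry.
  return status === undefined ? true : RETRYABLE.has(status);
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, plus up to
// 50% random jitter. `random` is injectable so the result can be tested.
function backoffDelay(attempt, baseDelay = 1000, random = Math.random) {
  const exp = baseDelay * 2 ** attempt;
  return Math.round(exp + random() * exp * 0.5);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs fn; on a retryable error waits and tries again, up to maxRetries times.
async function retryWithJitter(fn, maxRetries = 3, baseDelay = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await sleep(backoffDelay(attempt, baseDelay));
    }
  }
}
```

The jitter spreads out clients that failed at the same moment, so they do not all retry in the same instant.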
CIRCUIT BREAKER:
CLOSED → normal, track failures
OPEN → reject all calls instantly (fail fast)
HALF-OPEN → allow 1 probe; succeed = CLOSED, fail = OPEN
failureThreshold: 5 (open after 5 consecutive failures)
resetTimeout: 30000 (try again after 30 seconds)
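The three states can be captured in a small class. This is a minimal sketch assuming consecutive-failure counting and a single probe in HALF-OPEN; the `CircuitBreaker` class and its injectable `now` clock are illustrative, with only the `call`, `failureThreshold` and `resetTimeout` names taken from this sheet.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeout = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeout;
    this.now = now;        // injectable clock for tests
    this.state = 'CLOSED';
    this.failures = 0;     // consecutive failures
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.resetTimeout) {
        throw new Error('circuit open'); // fail fast, dependency is not called
      }
      this.state = 'HALF_OPEN'; // reset timeout elapsed: let one probe through
    } else if (this.state === 'HALF_OPEN') {
      throw new Error('circuit open'); // a probe is already in flight
    }
    try {
      const result = await fn();
      this.state = 'CLOSED'; // success (including a successful probe) closes it
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Callers usually catch the `circuit open` error and return a fallback (cached data, a default, or a clear 503) instead of waiting on a dependency that is known to be down.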
// Production-ready call pattern:
const result = await breaker.call(() =>
  retryWithJitter(
    () => axios.get(url, { timeout: 3000 }),
    3,    // maxRetries
    1000  // baseDelay (ms)
  )
);
RabbitMQ Basics
ARCHITECTURE:
Producer → Exchange → Binding → Queue → Consumer
EXCHANGE TYPES:
Fanout: Send to ALL bound queues (broadcast)
Direct: Send only if binding key = routing key exactly
Topic: Pattern match on dot-separated words (* = exactly one word, # = zero or more words)
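To make the three routing rules concrete, here is a small matcher that mimics them. It is an illustration of the semantics only, not how RabbitMQ is implemented; `topicMatches` and `exchangeRoutes` are hypothetical helper names.

```javascript
// Returns true if a topic binding `pattern` matches `routingKey`.
// Words are dot-separated; `*` matches exactly one word, `#` zero or more.
function topicMatches(pattern, routingKey) {
  const p = pattern.split('.');
  const k = routingKey.split('.');
  const match = (i, j) => {
    if (i === p.length) return j === k.length;
    if (p[i] === '#') {
      // Let this '#' consume 0, 1, 2, ... words and try the rest each time.
      for (let n = j; n <= k.length; n++) {
        if (match(i + 1, n)) return true;
      }
      return false;
    }
    if (j === k.length) return false;
    return (p[i] === '*' || p[i] === k[j]) && match(i + 1, j + 1);
  };
  return match(0, 0);
}

// Would a message with `routingKey` reach a queue bound with `bindingKey`?
function exchangeRoutes(type, bindingKey, routingKey) {
  if (type === 'fanout') return true;                     // keys are ignored
  if (type === 'direct') return bindingKey === routingKey; // exact match
  if (type === 'topic') return topicMatches(bindingKey, routingKey);
  throw new Error(`unsupported exchange type: ${type}`);
}
```

For example, a queue bound with `order.*` receives `order.placed` but not `order.placed.eu`, while a binding of `order.#` receives both.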
ACKNOWLEDGMENTS:
ack(msg) → success, remove from queue
nack(msg, false, true) → fail, requeue (retry)
nack(msg, false, false) → fail, no requeue (dead-lettered to the DLQ if a dead letter exchange is configured, otherwise discarded)
PREFETCH:
prefetch(1) → process one at a time (safe, slow)
prefetch(10) → buffer 10 unacked messages (balanced)
prefetch(100) → fast, but a consumer crash forces redelivery of up to 100 unacked messages
DEAD LETTER QUEUE:
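Set the x-dead-letter-exchange argument on the work queue; bind the DLQ to that exchange
Catches messages nack'd with requeue = false (e.g. after the consumer's retry limit), expired by TTL, or dropped by a queue length limit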
Monitor DLQ → alert if messages accumulate
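The ack, retry and dead-letter rules above can be combined in one consumer handler. This is a sketch against an amqplib-style channel (`ack`, `nack`, `sendToQueue`). Because a plain requeue carries no attempt counter, it tracks attempts in a hypothetical `x-retries` header; `handleDelivery` and `MAX_RETRIES` are illustrative names as well.

```javascript
const MAX_RETRIES = 3; // illustrative limit

// Handles one delivery on an amqplib-style channel.
// - success: ack
// - failure with retries left: send a copy back to the queue with
//   x-retries + 1, then ack the original
// - failure with no retries left: nack without requeue, so the broker
//   dead-letters the message (if the queue has a dead letter exchange)
async function handleDelivery(channel, queue, msg, processFn) {
  const headers = (msg.properties && msg.properties.headers) || {};
  const retries = Number(headers['x-retries'] || 0);
  try {
    await processFn(JSON.parse(msg.content.toString()));
    channel.ack(msg);
  } catch (err) {
    if (retries < MAX_RETRIES) {
      channel.sendToQueue(queue, msg.content, {
        persistent: true,
        headers: { ...headers, 'x-retries': retries + 1 },
      });
      channel.ack(msg);
    } else {
      channel.nack(msg, false, false); // allUpTo = false, requeue = false
    }
  }
}

// Usage with amqplib (queue, exchange and handler names are hypothetical):
//   await channel.assertQueue('payments', { durable: true, deadLetterExchange: 'payments.dlx' });
//   await channel.prefetch(10);
//   await channel.consume('payments', (msg) =>
//     msg && handleDelivery(channel, 'payments', msg, chargeCustomer));
```

Republishing with a counter trades strict ordering for a bounded number of attempts; a delay between attempts would need an extra mechanism such as a TTL-based retry queue.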
# Docker Compose for RabbitMQ
rabbitmq:
  image: rabbitmq:3-management
  ports:
    - "5672:5672"    # AMQP
    - "15672:15672"  # Management UI (guest/guest)
Event Design Rules
PAYLOAD STRUCTURE:
{
  "type": "order.placed",              ← entity.action (past tense)
  "data": { orderId, userId, ... },    ← business payload
  "metadata": {
    "eventId": "evt_abc123",           ← unique ID (for idempotency)
    "timestamp": "2026-04-...",        ← when it happened
    "source": "order-service",         ← who published it
    "version": "2.0",                  ← schema version
    "correlationId": "req_xyz",        ← request trace
    "causationId": "evt_previous"      ← what caused this event
  }
}
NAMING: entity.action_past_tense
✓ order.placed, user.created, payment.failed
✗ createOrder, ORDER_CREATED, sendEmail
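A small factory can enforce the payload shape and the naming rule in one place. This is a sketch under assumptions: `createEvent` and the `EVENT_TYPE` regex are hypothetical, the default version `'1.0'` is illustrative, and the regex can only check the lower-case `entity.action` shape, not whether the action is in the past tense.

```javascript
const { randomUUID } = require('node:crypto');

// Lower-case `entity.action`, e.g. order.placed. A regex can check the shape,
// not the tense, so "past tense" remains a review-time convention.
const EVENT_TYPE = /^[a-z]+(?:_[a-z]+)*\.[a-z]+(?:_[a-z]+)*$/;

// Wraps a business payload in the envelope described above.
function createEvent(type, data, { source, version = '1.0', correlationId, causationId = null }) {
  if (!EVENT_TYPE.test(type)) throw new Error(`invalid event type: ${type}`);
  return {
    type,
    data,
    metadata: {
      eventId: `evt_${randomUUID()}`,       // unique per event, used for idempotency
      timestamp: new Date().toISOString(),  // when it happened
      source,                               // publishing service
      version,                              // schema version
      correlationId,                        // copied from the incoming request
      causationId,                          // eventId of the event that caused this one
    },
  };
}

// Usage:
//   createEvent('order.placed', { orderId: 'o1' },
//     { source: 'order-service', correlationId: 'req_xyz' });
```

Generating `eventId` at creation time, rather than in each consumer, is what lets every consumer deduplicate on the same value.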
VERSIONING: only ADD fields, never remove or rename
v1.0: { orderId, totalAmount }
v1.1: { orderId, totalAmount, currency } ← old consumers ignore new fields
PUBLISH ORDER: Save to DB first, THEN publish event
✗ publish → save (risk: "ghost event" if save fails)
✓ save → publish (risk: event lost if publish fails — mitigate with outbox pattern)
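The outbox pattern mentioned above, in miniature. This sketch uses in-memory arrays as a stand-in for a database, so atomicity is only simulated; in a real service `orders` and `outbox` would be two tables written in one transaction. The names `db`, `placeOrder` and `relayOutbox` are hypothetical.

```javascript
// In-memory stand-in for one database with two tables.
const db = { orders: [], outbox: [] };

// Step 1: save the order and its event together (one transaction in a real DB).
function placeOrder(order) {
  db.orders.push(order);
  db.outbox.push({
    event: { type: 'order.placed', data: order },
    published: false,
  });
}

// Step 2: a relay, run periodically, publishes pending rows. A row is marked
// published only after `publish` succeeds, so a failed publish is retried on
// the next run (at-least-once, hence consumers must be idempotent).
function relayOutbox(publish) {
  let sent = 0;
  for (const row of db.outbox) {
    if (row.published) continue;
    try {
      publish(row.event);
      row.published = true;
      sent += 1;
    } catch (err) {
      break; // preserve ordering; try again on the next run
    }
  }
  return sent;
}
```

If the broker is down, the order is still saved and the event simply waits in the outbox, which removes the "event lost if publish fails" risk at the cost of a relay process.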
Idempotency Strategies
WHY: At-least-once delivery = events can be delivered multiple times
Consumer crashes before ack → redeliver
Network drops ack → redeliver
Publisher retries → duplicate messages
STRATEGIES:
1. Redis event ID check (fast, simple, weakest)
   if (await redis.exists(`processed:${eventId}`)) return;
   await processEvent(event);
   await redis.set(`processed:${eventId}`, '1', 'EX', 86400);
   // check and set are separate calls, so two concurrent deliveries can both pass the check
2. Database unique constraint (strongest)
   INSERT INTO payments (order_id, amount, event_id)
   VALUES ($1, $2, $3)
   -- unique constraint on event_id rejects the duplicate insert
3. Upsert / ON CONFLICT DO NOTHING
   INSERT INTO orders (id, ...) VALUES ($1, ...)
   ON CONFLICT (id) DO NOTHING
4. Conditional state update
   UPDATE orders SET status = 'shipped'
   WHERE id = $1 AND status = 'placed'
   -- only transitions from placed → shipped; safe to call multiple times
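Two of the strategies above, reduced to in-memory JavaScript so the behaviour under duplicate delivery is easy to see. This is an illustration only: a `Set` stands in for a unique index on `event_id`, a `Map` stands in for the orders table, and `handleOnce` and `markShipped` are hypothetical names.

```javascript
// In-memory stand-ins for a unique index and an orders table.
const processedEventIds = new Set();
const orders = new Map([['o1', { status: 'placed' }]]);

// Strategy 2 in miniature: record the event ID, skip if already recorded.
// If the handler throws, the ID is not recorded and a redelivery can retry.
function handleOnce(event, handler) {
  const id = event.metadata.eventId;
  if (processedEventIds.has(id)) return false; // duplicate delivery, skip
  handler(event);
  processedEventIds.add(id); // in SQL: same transaction as the handler's writes
  return true;
}

// Strategy 4 in miniature: the transition only applies from the expected state.
// Returns the number of "rows" updated, like an UPDATE ... WHERE would.
function markShipped(orderId) {
  const order = orders.get(orderId);
  if (!order || order.status !== 'placed') return 0;
  order.status = 'shipped';
  return 1;
}
```

In both cases the second delivery of the same event is a no-op, which is exactly the "N times = same result as 1 time" property.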
Common Gotchas
| Gotcha | Why |
|---|---|
| Shared database between services | Creates hidden coupling; defeats microservices purpose |
| No timeout on HTTP calls | Calls hang for the default (~120s or unlimited), pile up, and exhaust connections and workers; the failure cascades |
| Retry without backoff | Overwhelms the already-failing service (retry storm) |
| Retry on 400/404 | Request is wrong; retrying won't fix it |
| Auth in every service | Duplicated logic; update one, forget others |
| Gateway for internal calls | Unnecessary hop; adds latency |
| Publish event before DB write | Risk of "ghost events" — announcing something that never happened |
| No idempotency | Duplicate events = duplicate charges, emails, records |
| No DLQ | Failed messages lost forever; no way to recover |
| Events named as commands | createOrder is a command, not an event; use order.placed |
| Removing event fields | Breaks existing consumers; only add, never remove |
| No correlation ID | Impossible to trace a request across 10 services |
Architecture Cheat Sheet
┌──────────────────────────────────────────────────────────┐
│ EXTERNAL traffic:                                        │
│   Client → Gateway → Service                             │
│                                                          │
│ INTERNAL sync calls (need response):                     │
│   Service A → ResilientClient → Service B                │
│   (timeout + retry + circuit breaker + fallback)         │
│                                                          │
│ INTERNAL async (fire-and-forget):                        │
│   Service A → publish event → RabbitMQ → Service B       │
│   (idempotent consumer + DLQ + monitoring)               │
│                                                          │
│ RULE: Use sync when you NEED the response.               │
│       Use async when the action can happen EVENTUALLY.   │
└──────────────────────────────────────────────────────────┘
Key Numbers
| What | Value |
|---|---|
| Internal HTTP timeout | 3-5 seconds |
| External API timeout | 5-10 seconds |
| Retry count | 2-3 attempts |
| Circuit breaker threshold | 3-5 consecutive failures |
| Circuit breaker reset timeout | 15-60 seconds |
| RabbitMQ prefetch | 10-50 messages |
| Idempotency TTL (Redis) | 24 hours |
| DLQ alert threshold | > 0 messages |
End of 6.2 quick revision.