Episode 6 — Scaling Reliability Microservices Web3 / 6.5 — Scaling Concepts
6.5 -- Scaling Concepts: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim before labs or interviews.
- Drill gaps -- reopen README.md -> 6.5.a...6.5.c.
- Practice -- 6.5-Exercise-Questions.md.
- Polish answers -- 6.5-Interview-Questions.md.
Core Vocabulary
| Term | One-liner |
|---|---|
| Vertical scaling | Upgrade one machine (more CPU, more RAM) -- scale UP |
| Horizontal scaling | Add more machines behind a load balancer -- scale OUT |
| Load balancer | Distributes incoming traffic across multiple backend servers |
| Layer 4 LB | Operates on TCP/UDP packets -- sees IP and port only (NLB) |
| Layer 7 LB | Operates on HTTP requests -- sees URL, headers, cookies (ALB) |
| Health check | Periodic probe to determine if a server is alive and ready |
| Sticky session | LB routes same client to same server (session affinity) -- code smell |
| Stateless | Server stores no per-client data between requests |
| JWT | Self-contained authentication token -- no server-side session needed |
| Read replica | Copy of a database that serves read queries (scales reads) |
| Sharding | Splitting data across multiple databases by a shard key (scales writes) |
| Auto-scaling | Automatically add/remove instances based on metrics |
| Cluster module | Node.js built-in to fork worker processes per CPU core |
| PM2 | Production process manager for Node.js with cluster mode |
| SSL termination | LB handles HTTPS encryption; backends use plain HTTP |
Vertical vs Horizontal Comparison
VERTICAL (Scale Up) HORIZONTAL (Scale Out)
───────────────────── ─────────────────────
Bigger machine More machines
No code changes Must be stateless
Exponential cost curve Linear cost curve
Hard ceiling No ceiling
Single point of failure Fault tolerant (N-1 survive)
Downtime to resize Zero-downtime scaling
Strong consistency (1 DB) Needs distributed patterns
Best for: databases, quick fix Best for: APIs, web servers
| Factor | Vertical | Horizontal |
|---|---|---|
| Cost for 2x capacity | ~2.5-3x price | ~2x price |
| Code changes | None | Stateless required |
| Failure impact | Total outage | One instance lost |
| Max capacity | Largest available machine | Unlimited (budget) |
| Scale speed | Minutes (reboot) | Seconds (add to pool) |
Load Balancing Algorithms
ROUND ROBIN → 1, 2, 3, 1, 2, 3... Simple, even, ignores load
WEIGHTED RR → A(3x), B(2x), C(1x) Mixed instance sizes
LEAST CONNECTIONS → Server with fewest active connections Best default for web apps
IP HASH → Hash(IP) % N Built-in sticky sessions
LEAST RESP TIME → Fastest + fewest Smart but complex
Quick decision
Identical servers, uniform requests → Round Robin
Mixed instance sizes → Weighted Round Robin
Variable request durations (default) → Least Connections
Need sticky sessions (avoid if can) → IP Hash
Performance-sensitive, mixed backends → Least Response Time
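In code, the simplest three algorithms reduce to a few lines each. A toy sketch for intuition only (server names and the hash function are illustrative, not any real load balancer's implementation):

```javascript
// ROUND ROBIN: cycle through servers in order, ignoring load.
function roundRobin(servers) {
  let i = -1;
  return () => servers[(i = (i + 1) % servers.length)];
}

// LEAST CONNECTIONS: pick the server with the fewest in-flight requests.
// `servers` entries look like { name, activeConns } (hypothetical shape).
function leastConnections(servers) {
  return () =>
    servers.reduce((best, s) => (s.activeConns < best.activeConns ? s : best));
}

// IP HASH: same client IP always maps to the same server (sticky by design).
// Toy hash -- real LBs use a proper hash function, not char-code sums.
function ipHash(servers) {
  return (ip) => {
    let h = 0;
    for (const ch of ip) h = (h + ch.charCodeAt(0)) % servers.length;
    return servers[h];
  };
}

// Usage
const next = roundRobin(['A', 'B', 'C']);
console.log(next(), next(), next(), next()); // A B C A
```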
AWS Load Balancer Cheat Sheet
ALB (Application Load Balancer)
Layer: 7 (HTTP/HTTPS)
Use for: Web apps, REST APIs, gRPC, microservices
Features: Path routing, host routing, SSL termination, WebSocket
NLB (Network Load Balancer)
Layer: 4 (TCP/UDP)
Use for: Gaming, IoT, ultra-low latency, static IP
Features: Static IP, TCP passthrough, millions of packets/sec
CLB (Classic Load Balancer)
Status: LEGACY — do not use for new projects
Health Check Types
LIVENESS (/health) → "Am I running?"
- Fast, no dependency checks
- Returns 200 if process is alive
- Use for: should this process exist?
READINESS (/health/ready) → "Can I serve traffic?"
- Checks DB, Redis, external services
- Returns 503 if any dependency is down
- Use for: should the LB send me traffic?
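The two probes differ only in what they check. A framework-agnostic sketch of the handler logic (the `checks` map of dependency pingers is hypothetical, and the HTTP framework wiring is omitted):

```javascript
// Liveness: no dependency checks -- if this code runs, the process is alive.
function livenessHandler() {
  return { status: 200, body: { alive: true } };
}

// Readiness: probe every dependency; return 503 if any is down so the
// load balancer stops routing traffic to this instance.
async function readinessHandler(checks) {
  const results = {};
  let ready = true;
  for (const [name, ping] of Object.entries(checks)) {
    try {
      await ping();
      results[name] = 'up';
    } catch {
      results[name] = 'down';
      ready = false;
    }
  }
  return { status: ready ? 200 : 503, body: results };
}

// Usage sketch (stub pingers in place of real db.query / redis.ping calls):
// await readinessHandler({
//   postgres: () => db.query('SELECT 1'),
//   redis:    () => redis.ping(),
// });
```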
Stateless Design Checklist
STATE TYPE STATEFUL (BAD) STATELESS (GOOD)
───────────────── ──────────────────────── ─────────────────────
Sessions express-session MemoryStore Redis (connect-redis)
Authentication Server-side session lookup JWT token
File uploads multer({ dest: './uploads'}) S3 + pre-signed URLs
Caching const cache = new Map() Redis / Memcached
Background jobs setInterval(fn, ms) Bull queue (Redis)
Rate limiting In-memory counter Redis-based limiter
WebSocket state Local connection list Redis pub/sub adapter
Config Hardcoded / local file Environment variables
Session Migration Quick Reference
// BEFORE: Stateful -- sessions live in this process's memory (MemoryStore)
const session = require('express-session');
app.use(session({ secret: 'x' }));

// AFTER: Stateless -- sessions live in Redis, any instance can serve
const RedisStore = require('connect-redis').default;
const { createClient } = require('redis');

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

app.use(session({
  store: new RedisStore({ client: redis }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false,
}));
// Zero route changes needed — just swap the store
JWT vs Redis Sessions
JWT REDIS SESSIONS
───────────── ──────────────
Token on client Session ID on client, data in Redis
No server storage Redis lookup per request
Hard to revoke Easy to revoke (delete key)
Larger payload per request Small cookie (session ID only)
Best for: APIs, mobile, micro Best for: web apps needing logout
Node.js Scaling Progression
1 process on 1 core → 500 rps
cluster module (4 workers) → 2,000 rps
PM2 cluster mode → 2,000 rps (easier ops)
3 instances + ALB → 6,000 rps
Auto-scaling (2-20 instances) → 2,000-40,000 rps
Kubernetes (10-100 pods) → 20,000-200,000 rps
Numbers are illustrative — actual throughput depends on
endpoint complexity, database calls, and payload size.
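The PM2 cluster-mode step in the progression is usually driven by an ecosystem file rather than CLI flags. A minimal sketch (app name and script path are placeholders for your project):

```javascript
// ecosystem.config.js -- PM2 cluster mode, one worker per CPU core.
module.exports = {
  apps: [{
    name: 'api',                   // placeholder app name
    script: './server.js',         // placeholder entry point
    instances: 'max',              // fork one worker per core
    exec_mode: 'cluster',          // workers share a single port
    max_memory_restart: '512M',    // recycle a worker that leaks memory
    env_production: { NODE_ENV: 'production' },
  }],
};
```

Start it with `pm2 start ecosystem.config.js --env production`. This gives the same per-core parallelism as the raw `cluster` module, plus restarts, log management, and zero-downtime reloads.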
Database Scaling Progression
1. Bigger instance (vertical) → More CPU, RAM, IOPS
2. Connection pooling → PgBouncer, app-level pooling
3. Redis cache → Eliminate 90%+ of reads
4. Read replicas → 2-5 replicas for remaining reads
5. Sharding (LAST RESORT) → Split data by shard key
Most apps never need step 5.
Auto-Scaling Configuration
Metric: CPU Utilization (most common)
Scale out: > 70% for 2 minutes
Scale in: < 30% for 5 minutes (slower to avoid flapping)
Minimum: 2 instances (always running)
Maximum: 20 instances (cost cap)
Cooldown: 60s out, 300s in
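The same policy expressed as a pure decision function, assuming one CPU sample per minute to match the thresholds above (the function shape is illustrative, not any cloud provider's API):

```javascript
// cpuSamples: recent CPU % readings, one per minute, newest last.
// Returns 'scale-out', 'scale-in', or 'hold'.
function autoscaleDecision(cpuSamples, instances, { min = 2, max = 20 } = {}) {
  const last = (n) => cpuSamples.slice(-n);
  const allAbove = (n, t) => last(n).length === n && last(n).every((c) => c > t);
  const allBelow = (n, t) => last(n).length === n && last(n).every((c) => c < t);

  if (allAbove(2, 70) && instances < max) return 'scale-out'; // >70% for 2 min
  if (allBelow(5, 30) && instances > min) return 'scale-in';  // <30% for 5 min, slower
  return 'hold';
}

console.log(autoscaleDecision([80, 85], 4)); // scale-out
```

Note the asymmetry: two hot minutes trigger scale-out, but five cool minutes are required before scale-in -- that asymmetry (plus cooldowns) is what prevents flapping.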
Common Gotchas
| Gotcha | Why It Bites You |
|---|---|
| "We just need a bigger server" | Exponential cost + hard ceiling + single point of failure |
| In-memory sessions + LB | Session lost when request hits a different server |
| Sticky sessions as a "fix" | Uneven load, session loss on failure, cannot auto-scale |
| Local file uploads | Files exist on one server only |
| setInterval for jobs | Runs on EVERY server instance (N times instead of once) |
| Rate limiter in memory | Each server counts independently (user gets Nx the limit) |
| Node.js single thread | One process cannot use multiple cores — use cluster/PM2 |
| Sharding too early | Enormous complexity; exhaust caching + read replicas first |
| No health checks | LB sends traffic to dead servers |
| Scale-in too fast | Server removed during traffic spike (flapping) |
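The in-memory rate limiter gotcha is easy to demonstrate with a simulation: N per-instance counters versus one shared counter standing in for Redis INCR (all names and the round-robin routing are illustrative):

```javascript
const LIMIT = 100; // allowed requests per client

// N instances, each with its own private counter -- a LB round-robins
// requests across them, so no single counter ever sees the full total.
function localLimiters(n) {
  const counters = Array.from({ length: n }, () => new Map());
  let i = 0;
  return (clientId) => {
    const c = counters[i++ % n];
    const used = (c.get(clientId) || 0) + 1;
    c.set(clientId, used);
    return used <= LIMIT; // request allowed?
  };
}

// One shared counter (what Redis INCR gives you in production).
function sharedLimiter() {
  const counter = new Map();
  return (clientId) => {
    const used = (counter.get(clientId) || 0) + 1;
    counter.set(clientId, used);
    return used <= LIMIT;
  };
}

// 300 requests from one client against 3 instances:
const local = localLimiters(3);
const shared = sharedLimiter();
let localAllowed = 0, sharedAllowed = 0;
for (let r = 0; r < 300; r++) {
  if (local('client-1')) localAllowed++;
  if (shared('client-1')) sharedAllowed++;
}
console.log(localAllowed, sharedAllowed); // 300 100 -- 3x the limit slips through
```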
Scaling Decision Flowchart
Is the bottleneck CPU/Memory?
├── YES → Can you go stateless?
│ ├── YES → Scale out (horizontal) + auto-scaling
│ └── NO → Scale up (vertical) while refactoring for stateless
└── NO
└── Is the bottleneck the database?
├── Read-heavy? → Add read replicas + Redis cache
├── Write-heavy? → Bigger instance → queue writes → shard (last resort)
└── Connection limit? → Add connection pooling (PgBouncer)
One-liners for Interview Day
- "Scale up means bigger box; scale out means more boxes behind a load balancer."
- "Stateless does not mean no state -- state lives in Redis, S3, and the database, not on the server."
- "Sticky sessions are a symptom of stateful design. Fix the disease, not the symptom."
- "Least connections is the safest default LB algorithm -- it self-adjusts to variable request durations."
- "For Node.js, always use cluster mode or PM2 -- a single process wastes all cores but one."
- "Shard your database only after you have exhausted caching, read replicas, and vertical scaling."
End of 6.5 quick revision.