Episode 6 — Scaling, Reliability, Microservices, Web3 / 6.5 — Scaling Concepts

6.5 -- Scaling Concepts: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps -- reopen README.md -> 6.5.a...6.5.c.
  3. Practice -- 6.5-Exercise-Questions.md.
  4. Polish answers -- 6.5-Interview-Questions.md.

Core Vocabulary

Term                 One-liner
───────────────────  ──────────────────────────────────────────────────────────
Vertical scaling     Upgrade one machine (more CPU, more RAM) -- scale UP
Horizontal scaling   Add more machines behind a load balancer -- scale OUT
Load balancer        Distributes incoming traffic across multiple backend servers
Layer 4 LB           Operates on TCP/UDP packets -- sees IP and port only (NLB)
Layer 7 LB           Operates on HTTP requests -- sees URL, headers, cookies (ALB)
Health check         Periodic probe to determine if a server is alive and ready
Sticky session       LB routes same client to same server (session affinity) -- code smell
Stateless            Server stores no per-client data between requests
JWT                  Self-contained authentication token -- no server-side session needed
Read replica         Copy of a database that serves read queries (scales reads)
Sharding             Splitting data across multiple databases by a shard key (scales writes)
Auto-scaling         Automatically add/remove instances based on metrics
Cluster module       Node.js built-in to fork worker processes per CPU core
PM2                  Production process manager for Node.js with cluster mode
SSL termination      LB handles HTTPS encryption; backends use plain HTTP

Vertical vs Horizontal Comparison

VERTICAL (Scale Up)              HORIZONTAL (Scale Out)
─────────────────────            ─────────────────────
Bigger machine                   More machines
No code changes                  Must be stateless
Exponential cost curve           Linear cost curve
Hard ceiling                     No ceiling
Single point of failure          Fault tolerant (N-1 survive)
Downtime to resize               Zero-downtime scaling
Strong consistency (1 DB)        Needs distributed patterns
Best for: databases, quick fix   Best for: APIs, web servers
Factor               Vertical                    Horizontal
───────────────────  ──────────────────────────  ─────────────────────
Cost of 2x capacity  ~2.5-3x price               ~2x price
Code changes         None                        Stateless required
Failure impact       Total outage                One instance lost
Max capacity         Largest available machine   Unlimited (up to budget)
Scale speed          Minutes (reboot)            Seconds (add to pool)

Load Balancing Algorithms

ROUND ROBIN       → 1, 2, 3, 1, 2, 3...        Simple, even, ignores load
WEIGHTED RR       → A(3x), B(2x), C(1x)        Mixed instance sizes
LEAST CONNECTIONS → Fewest active connections  Best default for web apps
IP HASH           → Hash(IP) % N               Built-in sticky sessions
LEAST RESP TIME   → Fastest + fewest conns     Smart but complex
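
The selection rules above can be sketched in a few lines of JavaScript. This is illustrative only -- a real balancer (NGINX, HAProxy, an ALB) implements these for you, and `makeRoundRobin` / `leastConnections` are invented helper names:

```javascript
// Round robin: cycle through servers in order, ignoring current load.
function makeRoundRobin(servers) {
  let i = 0;
  return () => servers[i++ % servers.length];
}

// Least connections: pick the server with the fewest in-flight requests.
// Each entry is { name, activeConnections }.
function leastConnections(servers) {
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best);
}
```

For example, `makeRoundRobin(['a', 'b', 'c'])` returns a function that yields a, b, c, a, b, c, ... -- even distribution regardless of how loaded each server is, which is exactly why least connections is the safer default for variable request durations.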

Quick decision

Identical servers, uniform requests    → Round Robin
Mixed instance sizes                   → Weighted Round Robin
Variable request durations (default)   → Least Connections
Need sticky sessions (avoid if can)    → IP Hash
Performance-sensitive, mixed backends  → Least Response Time

AWS Load Balancer Cheat Sheet

ALB (Application Load Balancer)
  Layer: 7 (HTTP/HTTPS)
  Use for: Web apps, REST APIs, gRPC, microservices
  Features: Path routing, host routing, SSL termination, WebSocket

NLB (Network Load Balancer)
  Layer: 4 (TCP/UDP)
  Use for: Gaming, IoT, ultra-low latency, static IP
  Features: Static IP, TCP passthrough, millions of packets/sec

CLB (Classic Load Balancer)
  Status: LEGACY — do not use for new projects

Health Check Types

LIVENESS  (/health)        → "Am I running?"
  - Fast, no dependency checks
  - Returns 200 if process is alive
  - Use for: should this process exist?

READINESS (/health/ready)  → "Can I serve traffic?"
  - Checks DB, Redis, external services
  - Returns 503 if any dependency is down
  - Use for: should the LB send me traffic?
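
A minimal sketch of the two probes, assuming an Express app. The dependency probes (`checkDb`, `checkRedis`) are hypothetical functions you would supply; the decision logic is pulled into a plain `readiness` function so it can be tested without a running server:

```javascript
// Readiness decision: 200 only if every dependency check passed.
// `checks` maps dependency name -> boolean result of its probe.
function readiness(checks) {
  const failed = Object.entries(checks)
    .filter(([, ok]) => !ok)
    .map(([name]) => name);
  return failed.length === 0
    ? { code: 200, body: { status: 'ready' } }
    : { code: 503, body: { status: 'not ready', failed } };
}

// Express wiring (sketch -- assumes `app`, `checkDb`, `checkRedis` exist):
// app.get('/health', (req, res) =>
//   res.status(200).json({ status: 'alive' })); // liveness: no dependency checks
// app.get('/health/ready', async (req, res) => {
//   const { code, body } = readiness({ db: await checkDb(), redis: await checkRedis() });
//   res.status(code).json(body);
// });
```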

Stateless Design Checklist

STATE TYPE         STATEFUL (BAD)                 STATELESS (GOOD)
─────────────────  ─────────────────────────────  ─────────────────────
Sessions           express-session MemoryStore    Redis (connect-redis)
Authentication     Server-side session lookup     JWT token
File uploads       multer({ dest: './uploads' })  S3 + pre-signed URLs
Caching            const cache = new Map()        Redis / Memcached
Background jobs    setInterval(fn, ms)            Bull queue (Redis)
Rate limiting      In-memory counter              Redis-based limiter
WebSocket state    Local connection list          Redis pub/sub adapter
Config             Hardcoded / local file         Environment variables
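
As one worked row of the checklist, here is a hedged sketch of a fixed-window rate limiter backed by a shared counter store. Because the counter lives in the store rather than in process memory, every instance enforces one shared limit. It assumes a client exposing `incr`/`expire` (node-redis v4 provides both); `allowRequest` is an invented name:

```javascript
// Fixed-window rate limiter: one shared counter per client per time window.
async function allowRequest(store, clientId, limit, windowSeconds) {
  // Key includes the window index, so counts reset automatically each window.
  const window = Math.floor(Date.now() / 1000 / windowSeconds);
  const key = `rate:${clientId}:${window}`;
  const count = await store.incr(key);
  if (count === 1) await store.expire(key, windowSeconds); // first hit: start TTL
  return count <= limit;
}
```

With an in-memory store the same code would count per server, giving each user Nx the intended limit -- the exact gotcha listed later in this sheet.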

Session Migration Quick Reference

const session = require('express-session');

// BEFORE: Stateful (default MemoryStore — lives in one process)
app.use(session({ secret: 'x' }));

// AFTER: Stateless (Redis)
const { createClient } = require('redis');
const RedisStore = require('connect-redis').default;

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();
app.use(session({
  store: new RedisStore({ client: redis }),
  secret: process.env.SESSION_SECRET,
}));
// Zero route changes needed — just swap the store

JWT vs Redis Sessions

JWT                              REDIS SESSIONS
─────────────                    ──────────────
Token on client                  Session ID on client, data in Redis
No server storage                Redis lookup per request
Hard to revoke                   Easy to revoke (delete key)
Larger payload per request       Small cookie (session ID only)
Best for: APIs, mobile, micro    Best for: web apps needing logout

Node.js Scaling Progression

1 process on 1 core               → 500 rps
cluster module (4 workers)        → 2,000 rps
PM2 cluster mode                  → 2,000 rps (easier ops)
3 instances + ALB                 → 6,000 rps
Auto-scaling (2-20 instances)     → 2,000-40,000 rps
Kubernetes (10-100 pods)          → 20,000-200,000 rps

Numbers are illustrative — actual throughput depends on
endpoint complexity, database calls, and payload size.

Database Scaling Progression

1. Bigger instance (vertical)     → More CPU, RAM, IOPS
2. Connection pooling             → PgBouncer, app-level pooling
3. Redis cache                    → Eliminate 90%+ of reads
4. Read replicas                  → 2-5 replicas for remaining reads
5. Sharding (LAST RESORT)         → Split data by shard key

Most apps never need step 5.
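
Step 3 is usually the cache-aside pattern: check Redis first, fall back to the database on a miss, then populate the cache. A sketch with injected clients -- `findUser` and the TTL are illustrative, and the `set` options follow node-redis v4's `{ EX: seconds }` style:

```javascript
// Cache-aside read: the cache absorbs repeat reads; only misses hit the DB.
async function getUser(cache, db, userId, ttlSeconds = 300) {
  const key = `user:${userId}`;
  const hit = await cache.get(key);
  if (hit) return JSON.parse(hit); // cache hit: no database round trip
  const user = await db.findUser(userId); // cache miss: query the database
  await cache.set(key, JSON.stringify(user), { EX: ttlSeconds }); // expire stale data
  return user;
}
```

The TTL is the consistency knob: shorter means fresher data but more database reads; longer means the "eliminate 90%+ of reads" figure above, at the cost of staleness.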

Auto-Scaling Configuration

Metric:     CPU Utilization (most common)
Scale out:  > 70% for 2 minutes
Scale in:   < 30% for 5 minutes (slower to avoid flapping)
Minimum:    2 instances (always running)
Maximum:    20 instances (cost cap)
Cooldown:   60s out, 300s in
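
The same policy expressed as a pure decision function, for illustration only -- in practice the cloud platform (e.g. an ASG target-tracking policy) enforces this; `scalingDecision` is an invented helper. Note the asymmetry: scale out fast, scale in slow, which is the anti-flapping rule above:

```javascript
// `cpuSamples`: recent per-minute CPU utilization (%), newest last.
// `current`: current instance count.
function scalingDecision(cpuSamples, current, { min = 2, max = 20 } = {}) {
  const lastN = (n) => cpuSamples.slice(-n);
  const allAbove = (n, t) => lastN(n).length === n && lastN(n).every((c) => c > t);
  const allBelow = (n, t) => lastN(n).length === n && lastN(n).every((c) => c < t);
  if (allAbove(2, 70) && current < max) return 'scale-out'; // > 70% for 2 minutes
  if (allBelow(5, 30) && current > min) return 'scale-in';  // < 30% for 5 minutes
  return 'hold'; // within bounds, or at the min/max cap
}
```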

Common Gotchas

Gotcha                          Why It Bites You
──────────────────────────────  ──────────────────────────────────────────────
"We just need a bigger server"  Exponential cost + hard ceiling + single point of failure
In-memory sessions + LB         Session lost when request hits a different server
Sticky sessions as a "fix"      Uneven load, session loss on failure, cannot auto-scale
Local file uploads              Files exist on one server only
setInterval for jobs            Runs on EVERY server instance (N times instead of once)
Rate limiter in memory          Each server counts independently (user gets Nx the limit)
Node.js single thread           One process cannot use multiple cores — use cluster/PM2
Sharding too early              Enormous complexity; exhaust caching + read replicas first
No health checks                LB sends traffic to dead servers
Scale-in too fast               Server removed during traffic spike (flapping)

Scaling Decision Flowchart

Is the bottleneck CPU/Memory?
  ├── YES → Can you go stateless?
  │         ├── YES → Scale out (horizontal) + auto-scaling
  │         └── NO  → Scale up (vertical) while refactoring for stateless
  └── NO
      └── Is the bottleneck the database?
          ├── Read-heavy? → Add read replicas + Redis cache
          ├── Write-heavy? → Bigger instance → queue writes → shard (last resort)
          └── Connection limit? → Add connection pooling (PgBouncer)

One-liners for Interview Day

  • "Scale up means bigger box; scale out means more boxes behind a load balancer."
  • "Stateless does not mean no state -- state lives in Redis, S3, and the database, not on the server."
  • "Sticky sessions are a symptom of stateful design. Fix the disease, not the symptom."
  • "Least connections is the safest default LB algorithm -- it self-adjusts to variable request durations."
  • "For Node.js, always use cluster mode or PM2 -- a single process wastes all cores but one."
  • "Shard your database only after you have exhausted caching, read replicas, and vertical scaling."

End of 6.5 quick revision.