Episode 6 — Scaling, Reliability, Microservices, Web3 / 6.5 — Scaling Concepts
6.5.b -- Load Balancers
In one sentence: A load balancer is a traffic cop that sits between clients and your servers, distributing incoming requests across multiple instances so no single server gets overwhelmed -- and quietly removing unhealthy servers from the pool.
Navigation: <- 6.5.a -- Vertical vs Horizontal Scaling | 6.5.c -- Stateless Design ->
1. What Does a Load Balancer Do?
A load balancer accepts incoming network traffic and distributes it across a group of backend servers (called a target group, upstream, or backend pool).
Without a load balancer:              With a load balancer:

Client ──────> Server                         Client
                                                │
(If Server dies,                           ┌────┴────┐
 everything is down)                       │  Load   │
                                           │ Balancer│
                                           └────┬────┘
                                           ┌────┼────┐
                                           ▼    ▼    ▼
                                         Srv1 Srv2 Srv3

                                       (If Srv2 dies, LB stops
                                        sending traffic to it.
                                        Srv1 and Srv3 continue.)
Core responsibilities
- Traffic distribution -- spread requests across healthy servers.
- Health checking -- periodically ping servers; remove unhealthy ones from the pool.
- SSL termination -- handle HTTPS encryption/decryption so backend servers deal with plain HTTP.
- Connection management -- manage TCP connections, keepalives, and timeouts.
- High availability -- the LB itself is typically redundant (active-passive or active-active).
2. Layer 4 vs Layer 7 Load Balancing
Load balancers operate at different layers of the OSI network model. The two that matter for web applications are Layer 4 (Transport) and Layer 7 (Application).
┌──────────────────────────────────────────────────┐
│ OSI Layers │
│ │
│ Layer 7 (Application) HTTP, HTTPS, WebSocket │ ← ALB, Nginx, HAProxy
│ Layer 6 (Presentation) SSL/TLS │
│ Layer 5 (Session) Connection state │
│ Layer 4 (Transport) TCP, UDP │ ← NLB, bare TCP LB
│ Layer 3 (Network) IP │
│ Layer 2 (Data Link) Ethernet │
│ Layer 1 (Physical) Wire, fiber │
└──────────────────────────────────────────────────┘
Layer 4 load balancing
- Operates on TCP/UDP packets -- sees IP addresses and port numbers only.
- Does not inspect HTTP headers, URLs, cookies, or request bodies.
- Extremely fast -- minimal processing per packet.
- Cannot route based on URL path (e.g., `/api` vs `/static`).
- Use case: raw TCP traffic, database connections, gaming servers, any non-HTTP protocol.
Layer 7 load balancing
- Operates on HTTP/HTTPS requests -- sees the full request: URL, headers, cookies, body.
- Can route based on URL path (`/api` -> API servers, `/images` -> CDN).
- Can route based on HTTP headers (e.g., `Authorization`, `Accept-Language`).
- Can insert/modify headers (e.g., `X-Forwarded-For` with the client's real IP).
- Can do content-based routing, A/B testing, canary deployments.
- Slightly slower than L4 due to HTTP parsing, but the flexibility is worth it for web apps.
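Because a Layer 7 LB terminates the client connection, backend servers see the LB's IP on the socket; the client's real IP arrives in the `X-Forwarded-For` header. A minimal parsing sketch (in Express you would normally enable `trust proxy` and use `req.ip` instead of parsing by hand):

```javascript
// Extract the original client IP from an X-Forwarded-For header.
// The header accumulates one IP per proxy hop: "client, proxy1, proxy2",
// so the left-most entry is the original client.
// NOTE: only trust this header when your own load balancer sets it;
// clients can spoof it otherwise.
function clientIpFromXff(xffHeader, remoteAddr) {
  if (!xffHeader) return remoteAddr; // no proxy in front: use the socket address
  return xffHeader.split(',')[0].trim();
}
```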
Comparison table
| Feature | Layer 4 (Transport) | Layer 7 (Application) |
|---|---|---|
| Sees | IP + Port | Full HTTP request |
| Speed | Fastest | Fast (small overhead for parsing) |
| URL-based routing | No | Yes |
| Cookie/header routing | No | Yes |
| SSL termination | Pass-through only | Yes (offloads SSL from backends) |
| WebSocket support | Yes (TCP pass-through) | Yes (upgrade-aware) |
| Health checks | TCP connect (port open?) | HTTP GET (status 200?) |
| AWS service | NLB (Network Load Balancer) | ALB (Application Load Balancer) |
| Best for | Raw TCP, ultra-low latency | Web apps, APIs, microservices |
3. Load Balancing Algorithms
The algorithm determines which server receives the next request.
3.1 Round Robin
The simplest algorithm. Requests go to servers in order: 1, 2, 3, 1, 2, 3, ...
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (back to start)
Request 5 → Server B
...
Pros: Simple, even distribution when servers are identical. Cons: Ignores server load. If Server A is handling a slow database query, it still gets the next request.
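The rotation above is only a few lines of code -- a minimal sketch:

```javascript
// Round robin: cycle through the servers in order, wrapping at the end.
function createRoundRobin(servers) {
  let next = 0;
  return function pick() {
    const server = servers[next];
    next = (next + 1) % servers.length; // wrap back to the first server
    return server;
  };
}
```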
3.2 Weighted Round Robin
Like round robin, but servers with more capacity get more requests.
Server A (weight: 3) → gets 3 out of every 6 requests
Server B (weight: 2) → gets 2 out of every 6 requests
Server C (weight: 1) → gets 1 out of every 6 requests
Sequence: A, A, A, B, B, C, A, A, A, B, B, C, ...
Use case: Mixed instance sizes. An m5.xlarge (weight 4) should get 4x the traffic of a t3.small (weight 1).
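The simplest way to implement this is to expand each server into the cycle weight-many times, which produces exactly the A, A, A, B, B, C sequence above (production LBs such as Nginx use a smoother interleaving):

```javascript
// Weighted round robin: a server with weight w appears w times per cycle.
function createWeightedRoundRobin(servers) {
  const cycle = [];
  for (const { name, weight } of servers) {
    for (let i = 0; i < weight; i++) cycle.push(name);
  }
  let next = 0;
  return function pick() {
    const server = cycle[next];
    next = (next + 1) % cycle.length;
    return server;
  };
}
```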
3.3 Least Connections
Send each new request to the server with the fewest active connections.
Server A: 12 active connections
Server B: 3 active connections ← next request goes here
Server C: 8 active connections
Pros: Adapts to slow requests automatically. If Server A is handling long-running WebSocket connections, it naturally gets fewer new requests. Cons: Slightly more overhead to track connection counts. Best for: Applications with variable request durations (file uploads, WebSocket, streaming).
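A sketch of the bookkeeping, assuming callers report when each request starts and finishes:

```javascript
// Least connections: pick the server with the fewest in-flight requests.
function createLeastConnections(serverNames) {
  const active = new Map(serverNames.map((s) => [s, 0]));
  return {
    pick() {
      let best = null;
      for (const [server, count] of active) {
        if (best === null || count < active.get(best)) best = server;
      }
      active.set(best, active.get(best) + 1); // request starts
      return best;
    },
    release(server) {
      active.set(server, active.get(server) - 1); // request finished
    },
  };
}
```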
3.4 IP Hash
Hash the client's IP address to determine the server. The same client IP always goes to the same server.
// Simplified IP hash logic
// Tiny illustrative string hash; real LBs use stronger hash functions
function hashFunction(ip) {
  let h = 0;
  for (const ch of ip) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

function getServer(clientIP, servers) {
  const hash = hashFunction(clientIP);
  const index = hash % servers.length;
  return servers[index];
}

// The same client IP always hashes to the same index, so e.g.
// client 192.168.1.1 lands on the same server on every request.
Pros: Built-in session affinity without cookies or tokens. Cons: Uneven distribution if many clients share an IP (corporate NAT). If a server is removed, many clients get rehashed to different servers (cache misses). Best for: Simple sticky sessions when you cannot change the application.
3.5 Least Response Time
Send requests to the server with the fastest average response time combined with fewest connections. This is the "smartest" algorithm.
Pros: Dynamically adapts to server performance. Cons: Requires constant latency measurement. May cause thundering herd toward a newly-fast server.
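One way to sketch it: keep an exponentially weighted moving average of each server's latency and penalize busy servers (the scoring formula here is illustrative, not any vendor's exact algorithm):

```javascript
// Least response time: score each server by its average latency,
// scaled by how many requests it is currently handling.
function createLeastResponseTime(serverNames, alpha = 0.2) {
  const stats = new Map(serverNames.map((s) => [s, { avgMs: 0, active: 0 }]));
  return {
    pick() {
      let best = null;
      let bestScore = Infinity;
      for (const [server, s] of stats) {
        // Penalize servers that are both slow and busy.
        const score = s.avgMs * (s.active + 1);
        if (score < bestScore) { bestScore = score; best = server; }
      }
      stats.get(best).active += 1;
      return best;
    },
    done(server, latencyMs) {
      const s = stats.get(server);
      s.active -= 1;
      // Exponentially weighted moving average of observed latency.
      s.avgMs = s.avgMs === 0 ? latencyMs : alpha * latencyMs + (1 - alpha) * s.avgMs;
    },
  };
}
```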
Algorithm comparison
| Algorithm | Complexity | Best When | Watch Out For |
|---|---|---|---|
| Round Robin | Low | Identical servers, uniform requests | Ignores server load |
| Weighted RR | Low | Mixed instance sizes | Weights need manual tuning |
| Least Connections | Medium | Variable request durations | Slight tracking overhead |
| IP Hash | Low | Need sticky sessions | Uneven with shared IPs |
| Least Response Time | High | Mixed performance servers | Thundering herd risk |
4. Health Checks
A load balancer must know which servers are healthy. It does this by periodically sending health check requests.
Load Balancer Backend Servers
│ │
│── GET /health ──────────> Srv1 → 200 OK ✓
│── GET /health ──────────> Srv2 → 200 OK ✓
│── GET /health ──────────> Srv3 → TIMEOUT ✗
│ │
│ Srv3 marked unhealthy │
│ Traffic routed to │
│ Srv1 and Srv2 only │
│ │
│── GET /health ──────────> Srv3 → 200 OK ✓ (recovered)
│ Srv3 marked healthy │
│ Traffic includes Srv3 │
Implementing a health check endpoint in Express
const express = require('express');
const mongoose = require('mongoose');
const Redis = require('ioredis');
const app = express();
const redis = new Redis(process.env.REDIS_URL);
// Simple health check — just "I'm alive"
app.get('/health', (req, res) => {
res.status(200).json({ status: 'ok' });
});
// Deep health check — verify dependencies
app.get('/health/ready', async (req, res) => {
const checks = {};
// Check MongoDB connection
try {
await mongoose.connection.db.admin().ping();
checks.mongodb = 'ok';
} catch (err) {
checks.mongodb = 'failed';
}
// Check Redis connection
try {
await redis.ping();
checks.redis = 'ok';
} catch (err) {
checks.redis = 'failed';
}
// Check memory usage (fail if > 90% of limit)
const memUsage = process.memoryUsage();
const memLimit = 512 * 1024 * 1024; // 512 MB
checks.memory = memUsage.heapUsed < memLimit * 0.9 ? 'ok' : 'warning';
const allOk = Object.values(checks).every((v) => v === 'ok');
res.status(allOk ? 200 : 503).json({
status: allOk ? 'healthy' : 'degraded',
checks,
uptime: process.uptime(),
timestamp: new Date().toISOString(),
});
});
Health check configuration
| Parameter | Typical Value | Purpose |
|---|---|---|
| Interval | 10-30 seconds | How often to check |
| Timeout | 5 seconds | How long to wait for a response |
| Healthy threshold | 2-3 consecutive successes | Passes needed to mark healthy |
| Unhealthy threshold | 2-3 consecutive failures | Failures needed to mark unhealthy |
| Path | /health or /health/ready | Endpoint to check |
Important: The basic /health endpoint (liveness) should be fast and dependency-free. The deep /health/ready endpoint (readiness) checks all dependencies. Use liveness for "should this process be running?" and readiness for "should this server receive traffic?"
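The threshold logic from the table can be sketched as a tiny state machine (the actual HTTP check loop is omitted; names are illustrative):

```javascript
// Health state machine implementing the healthy/unhealthy thresholds:
// a server changes state only after N consecutive successes or failures,
// so a single flaky check does not flap it in and out of the pool.
function createHealthTracker({ healthyThreshold = 2, unhealthyThreshold = 3 } = {}) {
  let healthy = true; // assume healthy until checks say otherwise
  let streak = 0;     // consecutive results contradicting the current state
  return {
    record(checkPassed) {
      if (checkPassed === healthy) { streak = 0; return healthy; }
      streak += 1;
      const needed = checkPassed ? healthyThreshold : unhealthyThreshold;
      if (streak >= needed) { healthy = checkPassed; streak = 0; }
      return healthy;
    },
    isHealthy: () => healthy,
  };
}
```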
5. Session Affinity (Sticky Sessions)
Session affinity (sticky sessions) ensures that all requests from the same client go to the same server.
Without sticky sessions: With sticky sessions:
Req 1 (Alice) → Server A Req 1 (Alice) → Server A
Req 2 (Alice) → Server B ✗ Req 2 (Alice) → Server A ✓
Req 3 (Alice) → Server C ✗ Req 3 (Alice) → Server A ✓
(Session data is on A only!) (Always goes to A)
How sticky sessions work
- Cookie-based: The LB sets a cookie (e.g., `AWSALB=server-a-id`) on the first response. Subsequent requests include this cookie, and the LB routes to the same server.
- IP-based: The LB hashes the client IP (same as the IP hash algorithm).
- Application cookie: The application sets a cookie; the LB reads it to route.
Why sticky sessions are a code smell
Sticky sessions exist because the application is stateful -- it stores session data in memory. This is a temporary band-aid, not a solution:
- Uneven load -- one server may accumulate more active sessions, while others are idle.
- Failover breaks sessions -- if Server A dies, all users stuck to it lose their sessions.
- Scaling is impaired -- new servers receive no existing sessions; old servers carry all the load.
The real solution: Make the application stateless (see 6.5.c) and eliminate the need for sticky sessions entirely.
When sticky sessions are acceptable
- Legacy applications that cannot be refactored in the short term.
- WebSocket connections that maintain long-lived state on a specific server.
- In-memory caches where re-computing is expensive (but a shared cache like Redis is better).
6. AWS Load Balancer Comparison
| Feature | ALB (Application) | NLB (Network) | CLB (Classic) |
|---|---|---|---|
| OSI Layer | Layer 7 | Layer 4 | Layer 4 + 7 |
| Protocols | HTTP, HTTPS, gRPC | TCP, UDP, TLS | HTTP, HTTPS, TCP |
| Routing | Path, host, header, query | Port only | Basic |
| WebSocket | Native support | TCP pass-through | Not supported |
| SSL termination | Yes | Optional (TLS pass-through) | Yes |
| Sticky sessions | Cookie-based | No | Cookie-based |
| Static IP | No (use Global Accelerator) | Yes | No |
| Performance | Millions req/sec | Millions packets/sec | Lower |
| Price | ~$0.0225/hour + LCU | ~$0.0225/hour + NLCU | ~$0.025/hour |
| Use case | Web apps, APIs, microservices | Gaming, IoT, ultra-low latency | Legacy -- do not use for new projects |
Decision guide
Is it HTTP/HTTPS traffic?
├── YES → Use ALB
│ Need path-based routing? → ALB
│ Need gRPC? → ALB
│ Need WebSocket? → ALB
└── NO → Use NLB
Need static IP? → NLB
Need ultra-low latency? → NLB
Need TCP/UDP? → NLB
7. Nginx as a Load Balancer
Nginx is the most popular open-source load balancer / reverse proxy. It handles Layer 7 load balancing with excellent performance.
Basic load balancing configuration
# /etc/nginx/conf.d/api-loadbalancer.conf
# Define the backend server pool
upstream api_servers {
# Round robin (default)
server 10.0.1.10:3000;
server 10.0.1.11:3000;
server 10.0.1.12:3000;
}
server {
listen 80;
server_name api.example.com;
location / {
proxy_pass http://api_servers;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
}
}
Weighted round robin
upstream api_servers {
server 10.0.1.10:3000 weight=3; # Gets 3x traffic (bigger machine)
server 10.0.1.11:3000 weight=2; # Gets 2x traffic
server 10.0.1.12:3000 weight=1; # Gets 1x traffic (smallest)
}
Least connections
upstream api_servers {
least_conn;
server 10.0.1.10:3000;
server 10.0.1.11:3000;
server 10.0.1.12:3000;
}
IP hash (sticky sessions)
upstream api_servers {
ip_hash;
server 10.0.1.10:3000;
server 10.0.1.11:3000;
server 10.0.1.12:3000;
}
Health checks (passive)
upstream api_servers {
server 10.0.1.10:3000 max_fails=3 fail_timeout=30s;
server 10.0.1.11:3000 max_fails=3 fail_timeout=30s;
server 10.0.1.12:3000 max_fails=3 fail_timeout=30s;
# max_fails=3 → Mark unhealthy after 3 failed requests
# fail_timeout=30s → Try again after 30 seconds
}
Path-based routing (microservices)
# Route different paths to different service pools
upstream user_service {
server 10.0.2.10:3001;
server 10.0.2.11:3001;
}
upstream order_service {
server 10.0.3.10:3002;
server 10.0.3.11:3002;
}
upstream static_files {
server 10.0.4.10:8080;
}
server {
listen 80;
server_name app.example.com;
location /api/users {
proxy_pass http://user_service;
}
location /api/orders {
proxy_pass http://order_service;
}
location /static {
proxy_pass http://static_files;
}
}
8. HAProxy
HAProxy is another widely-used open-source load balancer, especially popular for Layer 4 (TCP) balancing and high-throughput scenarios. It is used by GitHub, Stack Overflow, and Reddit.
# Minimal HAProxy configuration
frontend http_front
bind *:80
default_backend api_servers
backend api_servers
balance roundrobin
option httpchk GET /health
server srv1 10.0.1.10:3000 check
server srv2 10.0.1.11:3000 check
server srv3 10.0.1.12:3000 check
Nginx vs HAProxy: Both are excellent. Nginx is more commonly used as a combined web server + reverse proxy + load balancer. HAProxy is a dedicated load balancer and excels at raw TCP performance. For most Node.js applications, either works well.
9. DNS-Based Load Balancing
DNS load balancing distributes traffic by returning different IP addresses for the same domain name.
Query: api.example.com
Response (round-robin DNS):
api.example.com → 52.1.1.1 (Server A, US-East)
api.example.com → 54.2.2.2 (Server B, US-West)
api.example.com → 13.3.3.3 (Server C, EU-West)
Client 1 resolves → 52.1.1.1
Client 2 resolves → 54.2.2.2
Client 3 resolves → 13.3.3.3
Client 4 resolves → 52.1.1.1 (back to start)
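From the client's point of view, round-robin DNS just means rotating through the returned A records. A minimal sketch (the record list is hard-coded here; in Node it could come from `dns.promises.resolve4(hostname)`):

```javascript
// Round-robin DNS seen from the client side: the resolver returns
// several A records and successive resolutions rotate through them.
const records = ['52.1.1.1', '54.2.2.2', '13.3.3.3'];
let cursor = 0;
function nextAddress() {
  const ip = records[cursor];
  cursor = (cursor + 1) % records.length; // rotate like round-robin DNS
  return ip;
}
```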
Limitations
- Slow failover -- DNS records are cached (TTL). If a server dies, clients may continue connecting to it for minutes or hours.
- No health checks -- basic DNS does not know if a server is healthy.
- Uneven distribution -- clients cache DNS responses; one IP may receive disproportionate traffic.
- No request awareness -- DNS cannot distribute based on server load or URL path.
When to use DNS load balancing
- Global distribution -- route users to the nearest region (geolocation-based DNS).
- Disaster recovery -- failover from one region to another by updating DNS.
- As a first layer -- DNS distributes across regions; each region has its own ALB/NLB for local balancing.
10. Global Load Balancing
For applications serving users worldwide, you need global load balancing that routes users to the closest healthy region.
AWS Route 53 (DNS-based global routing)
Route 53 routing policies:
1. Latency-based routing:
User in Tokyo → ap-northeast-1 (lowest latency)
User in London → eu-west-1 (lowest latency)
User in New York → us-east-1 (lowest latency)
2. Geolocation routing:
User in EU → eu-west-1 (comply with GDPR)
User in US → us-east-1
Default → us-east-1
3. Failover routing:
Primary: us-east-1
Secondary: us-west-2 (if primary health check fails)
4. Weighted routing:
90% → us-east-1 (production)
10% → us-east-2 (canary deployment)
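Weighted routing like the 90/10 canary split above boils down to a weighted random pick. A sketch (`weightedPick` is a hypothetical helper; Route 53 performs this selection server-side when answering DNS queries):

```javascript
// Pick a region with probability proportional to its weight.
// `rand` is injectable so the choice can be tested deterministically.
function weightedPick(targets, rand = Math.random) {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let r = rand() * total;
  for (const t of targets) {
    r -= t.weight;
    if (r < 0) return t.region;
  }
  return targets[targets.length - 1].region; // guard against float rounding
}
```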
AWS CloudFront (CDN + edge routing)
CloudFront is a CDN that also acts as a global load balancer:
User in Sydney
│
▼
CloudFront Edge (Sydney)
│
├── Static content (HTML, CSS, JS, images) → served from edge cache
│
└── API requests → routed to nearest origin:
├── Origin Group A: ALB in ap-southeast-2 (primary)
└── Origin Group B: ALB in us-west-2 (failover)
Multi-tier load balancing architecture
┌─────────────────────────────────────────────────────────┐
│ Global Layer │
│ │
│ Route 53 (DNS) ──> CloudFront (CDN/Edge) │
│ │
├─────────────────────┬───────────────────────────────────┤
│ US-East Region │ EU-West Region │
│ │ │
│ ALB │ ALB │
│ ┌──┐ ┌──┐ ┌──┐ │ ┌──┐ ┌──┐ ┌──┐ │
│ │S1│ │S2│ │S3│ │ │S4│ │S5│ │S6│ │
│ └──┘ └──┘ └──┘ │ └──┘ └──┘ └──┘ │
│ │ │ │ │
│ RDS Primary │ RDS Read Replica │
│ │ │
└─────────────────────┴───────────────────────────────────┘
11. Key Takeaways
- Load balancers are required for horizontal scaling -- you cannot have multiple servers without something to distribute traffic.
- Layer 7 (ALB) for web apps, Layer 4 (NLB) for raw TCP -- most Node.js applications want an ALB.
- Health checks are non-negotiable -- without them, the LB sends traffic to dead servers.
- Least connections is the safest default -- it adapts to variable request durations automatically.
- Sticky sessions are a sign of stateful design -- fix the root cause (make the app stateless) rather than relying on sticky sessions.
- Global load balancing combines DNS + CDN + regional LBs -- Route 53 for DNS routing, CloudFront for edge caching, ALB for regional distribution.
- Nginx is the Swiss Army knife -- reverse proxy, load balancer, static file server, and SSL terminator in one process.
Explain-It Challenge
- A colleague says "we don't need a load balancer -- we only have one server." Why should they set one up anyway? (Hint: think about what happens during deployments and failures.)
- Explain to a non-technical product manager why switching from sticky sessions to stateless design will improve reliability.
- You have a microservices architecture with three services: user-service, order-service, and image-service (handles large file uploads). Which load balancing algorithm would you use for each, and why?
Navigation: <- 6.5.a -- Vertical vs Horizontal Scaling | 6.5.c -- Stateless Design ->