Episode 6 — Scaling, Reliability, Microservices, Web3 / 6.5 — Scaling Concepts
6.5.b -- Load Balancers
In one sentence: A load balancer is a traffic cop that sits between clients and your servers, distributing incoming requests across multiple instances so no single server gets overwhelmed -- and quietly removing unhealthy servers from the pool.
Navigation: <- 6.5.a -- Vertical vs Horizontal Scaling | 6.5.c -- Stateless Design ->
1. What Does a Load Balancer Do?
A load balancer accepts incoming network traffic and distributes it across a group of backend servers (called a target group, upstream, or backend pool).
Without a load balancer:              With a load balancer:

Client ──────> Server                         Client
                                                │
(If Server dies,                           ┌────┴────┐
 everything is down)                       │  Load   │
                                           │ Balancer│
                                           └────┬────┘
                                           ┌────┼────┐
                                           ▼    ▼    ▼
                                         Srv1 Srv2 Srv3

                                       (If Srv2 dies, LB stops
                                        sending traffic to it.
                                        Srv1 and Srv3 continue.)
Core responsibilities
- Traffic distribution -- spread requests across healthy servers.
- Health checking -- periodically ping servers; remove unhealthy ones from the pool.
- SSL termination -- handle HTTPS encryption/decryption so backend servers deal with plain HTTP.
- Connection management -- manage TCP connections, keepalives, and timeouts.
- High availability -- the LB itself is typically redundant (active-passive or active-active).
2. Layer 4 vs Layer 7 Load Balancing
Load balancers operate at different layers of the OSI network model. The two that matter for web applications are Layer 4 (Transport) and Layer 7 (Application).
┌──────────────────────────────────────────────────┐
│ OSI Layers │
│ │
│ Layer 7 (Application) HTTP, HTTPS, WebSocket │ ← ALB, Nginx, HAProxy
│ Layer 6 (Presentation) SSL/TLS │
│ Layer 5 (Session) Connection state │
│ Layer 4 (Transport) TCP, UDP │ ← NLB, bare TCP LB
│ Layer 3 (Network) IP │
│ Layer 2 (Data Link) Ethernet │
│ Layer 1 (Physical) Wire, fiber │
└──────────────────────────────────────────────────┘
Layer 4 load balancing
- Operates on TCP/UDP packets -- sees IP addresses and port numbers only.
- Does not inspect HTTP headers, URLs, cookies, or request bodies.
- Extremely fast -- minimal processing per packet.
- Cannot route based on URL path (e.g., `/api` vs `/static`).
- Use case: raw TCP traffic, database connections, gaming servers, any non-HTTP protocol.
Layer 7 load balancing
- Operates on HTTP/HTTPS requests -- sees the full request: URL, headers, cookies, body.
- Can route based on URL path (`/api` -> API servers, `/images` -> CDN).
- Can route based on HTTP headers (e.g., `Authorization`, `Accept-Language`).
- Can insert/modify headers (e.g., `X-Forwarded-For` with the client's real IP).
- Can do content-based routing, A/B testing, canary deployments.
- Slightly slower than L4 due to HTTP parsing, but the flexibility is worth it for web apps.
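Because a Layer 7 LB terminates the client connection, backend servers see the LB's IP on the socket; the client's real IP arrives in the `X-Forwarded-For` header. A minimal parsing sketch (in Express you would normally enable `trust proxy` and use `req.ip` instead of parsing by hand):

```javascript
// Extract the original client IP from an X-Forwarded-For header.
// The header accumulates one IP per proxy hop: "client, proxy1, proxy2",
// so the left-most entry is the original client.
// NOTE: only trust this header when your own load balancer sets it;
// clients can spoof it otherwise.
function clientIpFromXff(xffHeader, remoteAddr) {
  if (!xffHeader) return remoteAddr; // no proxy in front: use the socket address
  return xffHeader.split(',')[0].trim();
}
```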
Comparison table
| Feature | Layer 4 (Transport) | Layer 7 (Application) |
|---|---|---|
| Sees | IP + Port | Full HTTP request |
| Speed | Fastest | Fast (small overhead for parsing) |
| URL-based routing | No | Yes |
| Cookie/header routing | No | Yes |
| SSL termination | Pass-through only | Yes (offloads SSL from backends) |
| WebSocket support | Yes (TCP pass-through) | Yes (upgrade-aware) |
| Health checks | TCP connect (port open?) | HTTP GET (status 200?) |
| AWS service | NLB (Network Load Balancer) | ALB (Application Load Balancer) |
| Best for | Raw TCP, ultra-low latency | Web apps, APIs, microservices |
3. Load Balancing Algorithms
The algorithm determines which server receives the next request.
3.1 Round Robin
The simplest algorithm. Requests go to servers in order: 1, 2, 3, 1, 2, 3, ...
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (back to start)
Request 5 → Server B
...
Pros: Simple, even distribution when servers are identical. Cons: Ignores server load. If Server A is handling a slow database query, it still gets the next request.
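The rotation above is only a few lines of code -- a minimal sketch:

```javascript
// Round robin: cycle through the servers in order, wrapping at the end.
function createRoundRobin(servers) {
  let next = 0;
  return function pick() {
    const server = servers[next];
    next = (next + 1) % servers.length; // wrap back to the first server
    return server;
  };
}
```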
3.2 Weighted Round Robin
Like round robin, but servers with more capacity get more requests.
Server A (weight: 3) → gets 3 out of every 6 requests
Server B (weight: 2) → gets 2 out of every 6 requests
Server C (weight: 1) → gets 1 out of every 6 requests
Sequence: A, A, A, B, B, C, A, A, A, B, B, C, ...
Use case: Mixed instance sizes. An m5.xlarge (weight 4) should get 4x the traffic of a t3.small (weight 1).
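The simplest way to implement this is to expand each server into the cycle weight-many times, which produces exactly the A, A, A, B, B, C sequence above (production LBs such as Nginx use a smoother interleaving):

```javascript
// Weighted round robin: a server with weight w appears w times per cycle.
function createWeightedRoundRobin(servers) {
  const cycle = [];
  for (const { name, weight } of servers) {
    for (let i = 0; i < weight; i++) cycle.push(name);
  }
  let next = 0;
  return function pick() {
    const server = cycle[next];
    next = (next + 1) % cycle.length;
    return server;
  };
}
```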
3.3 Least Connections
Send each new request to the server with the fewest active connections.
Server A: 12 active connections
Server B: 3 active connections ← next request goes here
Server C: 8 active connections
Pros: Adapts to slow requests automatically. If Server A is handling long-running WebSocket connections, it naturally gets fewer new requests. Cons: Slightly more overhead to track connection counts. Best for: Applications with variable request durations (file uploads, WebSocket, streaming).
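A sketch of the bookkeeping, assuming callers report when each request starts and finishes:

```javascript
// Least connections: pick the server with the fewest in-flight requests.
function createLeastConnections(serverNames) {
  const active = new Map(serverNames.map((s) => [s, 0]));
  return {
    pick() {
      let best = null;
      for (const [server, count] of active) {
        if (best === null || count < active.get(best)) best = server;
      }
      active.set(best, active.get(best) + 1); // request starts
      return best;
    },
    release(server) {
      active.set(server, active.get(server) - 1); // request finished
    },
  };
}
```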
3.4 IP Hash
Hash the client's IP address to determine the server. The same client IP always goes to the same server.
// Simplified IP hash logic
// Tiny illustrative string hash; real LBs use stronger hash functions
function hashFunction(ip) {
  let h = 0;
  for (const ch of ip) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

function getServer(clientIP, servers) {
  const hash = hashFunction(clientIP);
  const index = hash % servers.length;
  return servers[index];
}

// The same client IP always hashes to the same index, so e.g.
// client 192.168.1.1 lands on the same server on every request.
Pros: Built-in session affinity without cookies or tokens. Cons: Uneven distribution if many clients share an IP (corporate NAT). If a server is removed, many clients get rehashed to different servers (cache misses). Best for: Simple sticky sessions when you cannot change the application.
3.5 Least Response Time
Send requests to the server with the fastest average response time combined with fewest connections. This is the "smartest" algorithm.
Pros: Dynamically adapts to server performance. Cons: Requires constant latency measurement. May cause thundering herd toward a newly-fast server.
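One way to sketch it: keep an exponentially weighted moving average of each server's latency and penalize busy servers (the scoring formula here is illustrative, not any vendor's exact algorithm):

```javascript
// Least response time: score each server by its average latency,
// scaled by how many requests it is currently handling.
function createLeastResponseTime(serverNames, alpha = 0.2) {
  const stats = new Map(serverNames.map((s) => [s, { avgMs: 0, active: 0 }]));
  return {
    pick() {
      let best = null;
      let bestScore = Infinity;
      for (const [server, s] of stats) {
        // Penalize servers that are both slow and busy.
        const score = s.avgMs * (s.active + 1);
        if (score < bestScore) { bestScore = score; best = server; }
      }
      stats.get(best).active += 1;
      return best;
    },
    done(server, latencyMs) {
      const s = stats.get(server);
      s.active -= 1;
      // Exponentially weighted moving average of observed latency.
      s.avgMs = s.avgMs === 0 ? latencyMs : alpha * latencyMs + (1 - alpha) * s.avgMs;
    },
  };
}
```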
Algorithm comparison
| Algorithm | Complexity | Best When | Watch Out For |
|---|---|---|---|
| Round Robin | Low | Identical servers, uniform requests | Ignores server load |
| Weighted RR | Low | Mixed instance sizes | Weights need manual tuning |
| Least Connections | Medium | Variable request durations | Slight tracking overhead |
| IP Hash | Low | Need sticky sessions | Uneven with shared IPs |
| Least Response Time | High | Mixed performance servers | Thundering herd risk |
4. Health Checks
A load balancer must know which servers are healthy. It does this by periodically sending health check requests.
Load Balancer Backend Servers
│ │
│── GET /health ──────────> Srv1 → 200 OK ✓
│── GET /health ──────────> Srv2 → 200 OK ✓
│── GET /health ──────────> Srv3 → TIMEOUT ✗
│ │
│ Srv3 marked unhealthy │
│ Traffic routed to │
│ Srv1 and Srv2 only │
│ │
│── GET /health ──────────> Srv3 → 200 OK ✓ (recovered)
│ Srv3 marked healthy │
│ Traffic includes Srv3 │
Implementing a health check endpoint in Express
const express = require('express');
const mongoose = require('mongoose');
const Redis = require('ioredis');
const app = express();
const redis = new Redis(process.env.REDIS_URL);
// Simple health check — just "I'm alive"
app.get('/health', (req, res) => {
res.status(200).json({ status: 'ok' });
});
// Deep health check — verify dependencies
app.get('/health/ready', async (req, res) => {
const checks = {};
// Check MongoDB connection
try {
await mongoose.connection.db.admin().ping();
checks.mongodb = 'ok';
} catch (err) {
checks.mongodb = 'failed';
}
// Check Redis connection
try {
await redis.ping();
checks.redis = 'ok';
} catch (err) {
checks.redis = 'failed';
}
// Check memory usage (fail if > 90% of limit)
const memUsage = process.memoryUsage();
const memLimit = 512 * 1024 * 1024; // 512 MB
checks.memory = memUsage.heapUsed < memLimit * 0.9 ? 'ok' : 'warning';
const allOk = Object.values(checks).every((v) => v === 'ok');
res.status(allOk ? 200 : 503).json({
status: allOk ? 'healthy' : 'degraded',
checks,
uptime: process.uptime(),
timestamp: new Date().toISOString(),
});
});
Health check configuration
| Parameter | Typical Value | Purpose |
|---|---|---|
| Interval | 10-30 seconds | How often to check |
| Timeout | 5 seconds | How long to wait for a response |
| Healthy threshold | 2-3 consecutive successes | Passes needed to mark healthy |
| Unhealthy threshold | 2-3 consecutive failures | Failures needed to mark unhealthy |
| Path | /health or /health/ready | Endpoint to check |
Important: The basic /health endpoint (liveness) should be fast and dependency-free. The deep /health/ready endpoint (readiness) checks all dependencies. Use liveness for "should this process be running?" and readiness for "should this server receive traffic?"
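The threshold logic from the table can be sketched as a tiny state machine (the actual HTTP check loop is omitted; names are illustrative):

```javascript
// Health state machine implementing the healthy/unhealthy thresholds:
// a server changes state only after N consecutive successes or failures,
// so a single flaky check does not flap it in and out of the pool.
function createHealthTracker({ healthyThreshold = 2, unhealthyThreshold = 3 } = {}) {
  let healthy = true; // assume healthy until checks say otherwise
  let streak = 0;     // consecutive results contradicting the current state
  return {
    record(checkPassed) {
      if (checkPassed === healthy) { streak = 0; return healthy; }
      streak += 1;
      const needed = checkPassed ? healthyThreshold : unhealthyThreshold;
      if (streak >= needed) { healthy = checkPassed; streak = 0; }
      return healthy;
    },
    isHealthy: () => healthy,
  };
}
```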
5. Session Affinity (Sticky Sessions)
Session affinity (sticky sessions) ensures that all requests from the same client go to the same server.
Without sticky sessions: With sticky sessions:
Req 1 (Alice) → Server A Req 1 (Alice) → Server A
Req 2 (Alice) → Server B ✗ Req 2 (Alice) → Server A ✓
Req 3 (Alice) → Server C ✗ Req 3 (Alice) → Server A ✓
(Session data is on A only!) (Always goes to A)
How sticky sessions work
- Cookie-based: The LB sets a cookie (e.g., `AWSALB=server-a-id`) on the first response. Subsequent requests include this cookie, and the LB routes to the same server.
- IP-based: The LB hashes the client IP (same as the IP hash algorithm).
- Application cookie: The application sets a cookie; the LB reads it to route.
Why sticky sessions are a code smell
Sticky sessions exist because the application is stateful -- it stores session data in memory. This is a temporary band-aid, not a solution:
- Uneven load -- one server may accumulate more active sessions, while others are idle.
- Failover breaks sessions -- if Server A dies, all users stuck to it lose their sessions.
- Scaling is impaired -- new servers receive no existing sessions; old servers carry all the load.
The real solution: Make the application stateless (see 6.5.c) and eliminate the need for sticky sessions entirely.
When sticky sessions are acceptable
- Legacy applications that cannot be refactored in the short term.
- WebSocket connections that maintain long-lived state on a specific server.
- In-memory caches where re-computing is expensive (but a shared cache like Redis is better).
6. AWS Load Balancer Comparison
| Feature | ALB (Application) | NLB (Network) | CLB (Classic) |
|---|---|---|---|
| OSI Layer | Layer 7 | Layer 4 | Layer 4 + 7 |
| Protocols | HTTP, HTTPS, gRPC | TCP, UDP, TLS | HTTP, HTTPS, TCP |
| Routing | Path, host, header, query | Port only | Basic |
| WebSocket | Native support | TCP pass-through | Not supported |
| SSL termination | Yes | Optional (TLS pass-through) | Yes |
| Sticky sessions | Cookie-based | No | Cookie-based |
| Static IP | No (use Global Accelerator) | Yes | No |
| Performance | Millions req/sec | Millions packets/sec | Lower |
| Price | ~$0.0225/hour + LCU | ~$0.0225/hour + NLCU | ~$0.025/hour |
| Use case | Web apps, APIs, microservices | Gaming, IoT, ultra-low latency | Legacy -- do not use for new projects |
Decision guide
Is it HTTP/HTTPS traffic?
├── YES → Use ALB
│ Need path-based routing? → ALB
│ Need gRPC? → ALB
│ Need WebSocket? → ALB
└── NO → Use NLB
Need static IP? → NLB
Need ultra-low latency? → NLB
Need TCP/UDP? → NLB
7. Nginx as a Load Balancer
Nginx is the most popular open-source load balancer / reverse proxy. It handles Layer 7 load balancing with excellent performance.
Basic load balancing configuration
# /etc/nginx/conf.d/api-loadbalancer.conf
# Define the backend server pool
upstream api_servers {
# Round robin (default)
server 10.0.1.10:3000;
server 10.0.1.11:3000;
server 10.0.1.12:3000;
}
server {
listen 80;
server_name api.example.com;
location / {
proxy_pass http://api_servers;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
}
}
Weighted round robin
upstream api_servers {
server 10.0.1.10:3000 weight=3; # Gets 3x traffic (bigger machine)
server 10.0.1.11:3000 weight=2; # Gets 2x traffic
server 10.0.1.12:3000 weight=1; # Gets 1x traffic (smallest)
}
Least connections
upstream api_servers {
least_conn;
server 10.0.1.10:3000;
server 10.0.1.11:3000;
server 10.0.1.12:3000;
}
IP hash (sticky sessions)
upstream api_servers {
ip_hash;
server 10.0.1.10:3000;
server 10.0.1.11:3000;
server 10.0.1.12:3000;
}
Health checks (passive)
upstream api_servers {
server 10.0.1.10:3000 max_fails=3 fail_timeout=30s;
server 10.0.1.11:3000 max_fails=3 fail_timeout=30s;
server 10.0.1.12:3000 max_fails=3 fail_timeout=30s;
# max_fails=3 → Mark unhealthy after 3 failed requests
# fail_timeout=30s → Try again after 30 seconds
}
Path-based routing (microservices)
# Route different paths to different service pools
upstream user_service {
server 10.0.2.10:3001;
server 10.0.2.11:3001;
}
upstream order_service {
server 10.0.3.10:3002;
server 10.0.3.11:3002;
}
upstream static_files {
server 10.0.4.10:8080;
}
server {
listen 80;
server_name app.example.com;
location /api/users {
proxy_pass http://user_service;
}
location /api/orders {
proxy_pass http://order_service;
}
location /static {
proxy_pass http://static_files;
}
}
8. HAProxy
HAProxy is another widely-used open-source load balancer, especially popular for Layer 4 (TCP) balancing and high-throughput scenarios. It is used by GitHub, Stack Overflow, and Reddit.
# Minimal HAProxy configuration
frontend http_front
bind *:80
default_backend api_servers
backend api_servers
balance roundrobin
option httpchk GET /health
server srv1 10.0.1.10:3000 check
server srv2 10.0.1.11:3000 check
server srv3 10.0.1.12:3000 check
Nginx vs HAProxy: Both are excellent. Nginx is more commonly used as a combined web server + reverse proxy + load balancer. HAProxy is a dedicated load balancer and excels at raw TCP performance. For most Node.js applications, either works well.
9. DNS-Based Load Balancing
DNS load balancing distributes traffic by returning different IP addresses for the same domain name.
Query: api.example.com
Response (round-robin DNS):
api.example.com → 52.1.1.1 (Server A, US-East)
api.example.com → 54.2.2.2 (Server B, US-West)
api.example.com → 13.3.3.3 (Server C, EU-West)
Client 1 resolves → 52.1.1.1
Client 2 resolves → 54.2.2.2
Client 3 resolves → 13.3.3.3
Client 4 resolves → 52.1.1.1 (back to start)
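From the client's point of view, round-robin DNS just means rotating through the returned A records. A minimal sketch (the record list is hard-coded here; in Node it could come from `dns.promises.resolve4(hostname)`):

```javascript
// Round-robin DNS seen from the client side: the resolver returns
// several A records and successive resolutions rotate through them.
const records = ['52.1.1.1', '54.2.2.2', '13.3.3.3'];
let cursor = 0;
function nextAddress() {
  const ip = records[cursor];
  cursor = (cursor + 1) % records.length; // rotate like round-robin DNS
  return ip;
}
```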
Limitations
- Slow failover -- DNS records are cached (TTL). If a server dies, clients may continue connecting to it for minutes or hours.
- No health checks -- basic DNS does not know if a server is healthy.
- Uneven distribution -- clients cache DNS responses; one IP may receive disproportionate traffic.
- No request awareness -- DNS cannot distribute based on server load or URL path.
When to use DNS load balancing
- Global distribution -- route users to the nearest region (geolocation-based DNS).
- Disaster recovery -- failover from one region to another by updating DNS.
- As a first layer -- DNS distributes across regions; each region has its own ALB/NLB for local balancing.
10. Global Load Balancing
For applications serving users worldwide, you need global load balancing that routes users to the closest healthy region.
AWS Route 53 (DNS-based global routing)
Route 53 routing policies:
1. Latency-based routing:
User in Tokyo → ap-northeast-1 (lowest latency)
User in London → eu-west-1 (lowest latency)
User in New York → us-east-1 (lowest latency)
2. Geolocation routing:
User in EU → eu-west-1 (comply with GDPR)
User in US → us-east-1
Default → us-east-1
3. Failover routing:
Primary: us-east-1
Secondary: us-west-2 (if primary health check fails)
4. Weighted routing:
90% → us-east-1 (production)
10% → us-east-2 (canary deployment)
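Weighted routing like the 90/10 canary split above boils down to a weighted random pick. A sketch (`weightedPick` is a hypothetical helper; Route 53 performs this selection server-side when answering DNS queries):

```javascript
// Pick a region with probability proportional to its weight.
// `rand` is injectable so the choice can be tested deterministically.
function weightedPick(targets, rand = Math.random) {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let r = rand() * total;
  for (const t of targets) {
    r -= t.weight;
    if (r < 0) return t.region;
  }
  return targets[targets.length - 1].region; // guard against float rounding
}
```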
AWS CloudFront (CDN + edge routing)
CloudFront is a CDN that also acts as a global load balancer:
User in Sydney
│
▼
CloudFront Edge (Sydney)
│
├── Static content (HTML, CSS, JS, images) → served from edge cache
│
└── API requests → routed to nearest origin:
├── Origin Group A: ALB in ap-southeast-2 (primary)
└── Origin Group B: ALB in us-west-2 (failover)
Multi-tier load balancing architecture
┌─────────────────────────────────────────────────────────┐
│ Global Layer │
│ │
│ Route 53 (DNS) ──> CloudFront (CDN/Edge) │
│ │
├─────────────────────┬───────────────────────────────────┤
│ US-East Region │ EU-West Region │
│ │ │
│ ALB │ ALB │
│ ┌──┐ ┌──┐ ┌──┐ │ ┌──┐ ┌──┐ ┌──┐ │
│ │S1│ │S2│ │S3│ │ │S4│ │S5│ │S6│ │
│ └──┘ └──┘ └──┘ │ └──┘ └──┘ └──┘ │
│ │ │ │ │
│ RDS Primary │ RDS Read Replica │
│ │ │
└─────────────────────┴───────────────────────────────────┘
11. Key Takeaways
- Load balancers are required for horizontal scaling -- you cannot have multiple servers without something to distribute traffic.
- Layer 7 (ALB) for web apps, Layer 4 (NLB) for raw TCP -- most Node.js applications want an ALB.
- Health checks are non-negotiable -- without them, the LB sends traffic to dead servers.
- Least connections is the safest default -- it adapts to variable request durations automatically.
- Sticky sessions are a sign of stateful design -- fix the root cause (make the app stateless) rather than relying on sticky sessions.
- Global load balancing combines DNS + CDN + regional LBs -- Route 53 for DNS routing, CloudFront for edge caching, ALB for regional distribution.
- Nginx is the Swiss Army knife -- reverse proxy, load balancer, static file server, and SSL terminator in one process.
Explain-It Challenge
- A colleague says "we don't need a load balancer -- we only have one server." Why should they set one up anyway? (Hint: think about what happens during deployments and failures.)
- Explain to a non-technical product manager why switching from sticky sessions to stateless design will improve reliability.
- You have a microservices architecture with three services: user-service, order-service, and image-service (handles large file uploads). Which load balancing algorithm would you use for each, and why?
Navigation: <- 6.5.a -- Vertical vs Horizontal Scaling | 6.5.c -- Stateless Design ->