Episode 6 — Scaling, Reliability, Microservices, Web3 / 6.5 — Scaling Concepts

6.5.b -- Load Balancers

In one sentence: A load balancer is a traffic cop that sits between clients and your servers, distributing incoming requests across multiple instances so no single server gets overwhelmed -- and quietly removing unhealthy servers from the pool.

Navigation: <- 6.5.a -- Vertical vs Horizontal Scaling | 6.5.c -- Stateless Design ->


1. What Does a Load Balancer Do?

A load balancer accepts incoming network traffic and distributes it across a group of backend servers (called a target group, upstream, or backend pool).

Without a load balancer:              With a load balancer:

Client ──────> Server                 Client
                                        │
(If Server dies,                   ┌────┴─────┐
 everything is down)               │   Load   │
                                   │ Balancer │
                                   └────┬─────┘
                                   ┌────┼────┐
                                   ▼    ▼    ▼
                                 Srv1 Srv2 Srv3

                                 (If Srv2 dies, LB stops
                                  sending traffic to it.
                                  Srv1 and Srv3 continue.)

Core responsibilities

  1. Traffic distribution -- spread requests across healthy servers.
  2. Health checking -- periodically ping servers; remove unhealthy ones from the pool.
  3. SSL termination -- handle HTTPS encryption/decryption so backend servers deal with plain HTTP.
  4. Connection management -- manage TCP connections, keepalives, and timeouts.
  5. High availability -- the LB itself is typically redundant (active-passive or active-active).

2. Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the OSI network model. The two that matter for web applications are Layer 4 (Transport) and Layer 7 (Application).

┌────────────────────────────────────────────────────┐
│                     OSI Layers                     │
│                                                    │
│  Layer 7 (Application)   HTTP, HTTPS, WebSocket    │  ← ALB, Nginx, HAProxy
│  Layer 6 (Presentation)  SSL/TLS                   │
│  Layer 5 (Session)       Connection state          │
│  Layer 4 (Transport)     TCP, UDP                  │  ← NLB, bare TCP LB
│  Layer 3 (Network)       IP                        │
│  Layer 2 (Data Link)     Ethernet                  │
│  Layer 1 (Physical)      Wire, fiber               │
└────────────────────────────────────────────────────┘

Layer 4 load balancing

  • Operates on TCP/UDP packets -- sees IP addresses and port numbers only.
  • Does not inspect HTTP headers, URLs, cookies, or request bodies.
  • Extremely fast -- minimal processing per packet.
  • Cannot route based on URL path (e.g., /api vs /static).
  • Use case: raw TCP traffic, database connections, gaming servers, any non-HTTP protocol.

Layer 7 load balancing

  • Operates on HTTP/HTTPS requests -- sees the full request: URL, headers, cookies, body.
  • Can route based on URL path (/api -> API servers, /images -> CDN).
  • Can route based on HTTP headers (e.g., Authorization, Accept-Language).
  • Can insert/modify headers (e.g., X-Forwarded-For with the client's real IP).
  • Can do content-based routing, A/B testing, canary deployments.
  • Slightly slower than L4 due to HTTP parsing, but the flexibility is worth it for web apps.
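
For example, here is a minimal sketch (assuming Express running behind an L7 proxy) of how a backend reads the client's real IP from the X-Forwarded-For header mentioned above:

// Sketch: reading the client IP behind a Layer 7 load balancer (Express)
const express = require('express');
const app = express();

// Trust the first proxy hop so req.ip is taken from X-Forwarded-For
app.set('trust proxy', 1);

app.get('/', (req, res) => {
  // X-Forwarded-For is "client, proxy1, proxy2" -- leftmost is the client
  res.json({
    ip: req.ip,
    forwardedFor: req.headers['x-forwarded-for'] || null,
  });
});

app.listen(3000);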

Comparison table

Feature                 Layer 4 (Transport)           Layer 7 (Application)
Sees                    IP + Port                     Full HTTP request
Speed                   Fastest                       Fast (small overhead for parsing)
URL-based routing       No                            Yes
Cookie/header routing   No                            Yes
SSL termination         Pass-through only             Yes (offloads SSL from backends)
WebSocket support       Yes (TCP pass-through)        Yes (upgrade-aware)
Health checks           TCP connect (port open?)      HTTP GET (status 200?)
AWS service             NLB (Network Load Balancer)   ALB (Application Load Balancer)
Best for                Raw TCP, ultra-low latency    Web apps, APIs, microservices

3. Load Balancing Algorithms

The algorithm determines which server receives the next request.

3.1 Round Robin

The simplest algorithm. Requests go to servers in order: 1, 2, 3, 1, 2, 3, ...

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A  (back to start)
Request 5 → Server B
...

Pros: Simple, even distribution when servers are identical. Cons: Ignores server load. If Server A is handling a slow database query, it still gets the next request.
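
To make the mechanics concrete, here is a minimal Node.js sketch of the rotation logic (server names are illustrative):

// Minimal round-robin selector (illustrative sketch, not production code)
function createRoundRobin(servers) {
  let next = 0;
  return function getServer() {
    const server = servers[next];
    next = (next + 1) % servers.length; // wrap back to the first server
    return server;
  };
}

const getServer = createRoundRobin(['serverA', 'serverB', 'serverC']);
console.log(getServer(), getServer(), getServer(), getServer());
// → serverA serverB serverC serverA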

3.2 Weighted Round Robin

Like round robin, but servers with more capacity get more requests.

Server A (weight: 3) → gets 3 out of every 6 requests
Server B (weight: 2) → gets 2 out of every 6 requests
Server C (weight: 1) → gets 1 out of every 6 requests

Sequence: A, A, A, B, B, C, A, A, A, B, B, C, ...

Use case: Mixed instance sizes. An m5.xlarge (weight 4) should get 4x the traffic of a t3.small (weight 1).
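
The simplest way to see the ratio in code is to expand the pool by weight and rotate over the expanded list. The sketch below uses hypothetical hosts and weights; production balancers such as Nginx use a "smooth" variant that interleaves picks rather than hitting the same server three times in a row:

// Illustrative weighted round robin: expand the pool by weight, then rotate
function createWeightedRoundRobin(entries) {
  // [{ host: 'serverA', weight: 3 }, ...] → ['serverA', 'serverA', 'serverA', ...]
  const pool = entries.flatMap((e) => Array(e.weight).fill(e.host));
  let next = 0;
  return () => {
    const host = pool[next];
    next = (next + 1) % pool.length;
    return host;
  };
}

const getServer = createWeightedRoundRobin([
  { host: 'serverA', weight: 3 },
  { host: 'serverB', weight: 2 },
  { host: 'serverC', weight: 1 },
]);
// Over 6 calls: A, A, A, B, B, C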

3.3 Least Connections

Send each new request to the server with the fewest active connections.

Server A: 12 active connections
Server B:  3 active connections  ← next request goes here
Server C:  8 active connections

Pros: Adapts to slow requests automatically. If Server A is handling long-running WebSocket connections, it naturally gets fewer new requests. Cons: Slightly more overhead to track connection counts. Best for: Applications with variable request durations (file uploads, WebSocket, streaming).
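
Here is a sketch of the selection step (the counts mirror the example above; a real balancer would also increment the count when a request starts and decrement it when the request finishes):

// Illustrative least-connections pick: fewest in-flight requests wins
function pickLeastConnections(servers) {
  return servers.reduce((least, s) =>
    s.activeConnections < least.activeConnections ? s : least
  );
}

const servers = [
  { host: 'serverA', activeConnections: 12 },
  { host: 'serverB', activeConnections: 3 },
  { host: 'serverC', activeConnections: 8 },
];
console.log(pickLeastConnections(servers).host); // → serverB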

3.4 IP Hash

Hash the client's IP address to determine the server. The same client IP always goes to the same server.

// Simplified IP hash logic (Node.js)
const crypto = require('crypto');

// Derive a 32-bit unsigned integer from the client's IP address
function hashFunction(clientIP) {
  return crypto.createHash('md5').update(clientIP).digest().readUInt32BE(0);
}

function getServer(clientIP, servers) {
  const hash = hashFunction(clientIP);
  const index = hash % servers.length;
  return servers[index];
}

// Client 192.168.1.1 → always the same server (say, Server B)
// Client 192.168.1.2 → always the same server (say, Server C)
// Client 192.168.1.3 → always the same server (say, Server A)

Pros: Built-in session affinity without cookies or tokens. Cons: Uneven distribution if many clients share an IP (corporate NAT). If a server is removed, many clients get rehashed to different servers (cache misses). Best for: Simple sticky sessions when you cannot change the application.

3.5 Least Response Time

Send requests to the server with the fastest average response time combined with fewest connections. This is the "smartest" algorithm.

Pros: Dynamically adapts to server performance. Cons: Requires constant latency measurement, and can cause a thundering herd toward a server that suddenly reports fast response times.
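
There is no single standard formula; one plausible scoring rule (an assumption for illustration, not how any particular product works) is to weight measured latency by in-flight connections:

// Illustrative scoring: lower (latency × load) wins; fields are hypothetical
function pickLeastResponseTime(servers) {
  const score = (s) => s.avgLatencyMs * (s.activeConnections + 1);
  return servers.reduce((best, s) => (score(s) < score(best) ? s : best));
}

const servers = [
  { host: 'serverA', avgLatencyMs: 120, activeConnections: 4 },
  { host: 'serverB', avgLatencyMs: 45, activeConnections: 9 },
  { host: 'serverC', avgLatencyMs: 80, activeConnections: 2 },
];
console.log(pickLeastResponseTime(servers).host); // → serverC (80 × 3 = 240)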

Algorithm comparison

Algorithm             Complexity   Best When                             Watch Out For
Round Robin           Low          Identical servers, uniform requests   Ignores server load
Weighted RR           Low          Mixed instance sizes                  Weights need manual tuning
Least Connections     Medium       Variable request durations            Slight tracking overhead
IP Hash               Low          Need sticky sessions                  Uneven with shared IPs
Least Response Time   High         Mixed performance servers             Thundering herd risk

4. Health Checks

A load balancer must know which servers are healthy. It does this by periodically sending health check requests.

Load Balancer                Backend Servers
     │                            │
     │── GET /health ──────────> Srv1 → 200 OK ✓
     │── GET /health ──────────> Srv2 → 200 OK ✓
     │── GET /health ──────────> Srv3 → TIMEOUT ✗
     │                            │
     │   Srv3 marked unhealthy    │
     │   Traffic routed to        │
     │   Srv1 and Srv2 only       │
     │                            │
     │── GET /health ──────────> Srv3 → 200 OK ✓  (recovered)
     │   Srv3 marked healthy      │
     │   Traffic includes Srv3    │

Implementing a health check endpoint in Express

const express = require('express');
const mongoose = require('mongoose');
const Redis = require('ioredis');

const app = express();
const redis = new Redis(process.env.REDIS_URL);

// Simple health check — just "I'm alive"
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'ok' });
});

// Deep health check — verify dependencies
app.get('/health/ready', async (req, res) => {
  const checks = {};

  // Check MongoDB connection
  try {
    await mongoose.connection.db.admin().ping();
    checks.mongodb = 'ok';
  } catch (err) {
    checks.mongodb = 'failed';
  }

  // Check Redis connection
  try {
    await redis.ping();
    checks.redis = 'ok';
  } catch (err) {
    checks.redis = 'failed';
  }

  // Check memory usage (fail if > 90% of limit)
  const memUsage = process.memoryUsage();
  const memLimit = 512 * 1024 * 1024; // 512 MB
  checks.memory = memUsage.heapUsed < memLimit * 0.9 ? 'ok' : 'warning';

  const allOk = Object.values(checks).every((v) => v === 'ok');
  res.status(allOk ? 200 : 503).json({
    status: allOk ? 'healthy' : 'degraded',
    checks,
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  });
});

Health check configuration

Parameter             Typical Value               Purpose
Interval              10-30 seconds               How often to check
Timeout               5 seconds                   How long to wait for a response
Healthy threshold     2-3 consecutive successes   Passes needed to mark healthy
Unhealthy threshold   2-3 consecutive failures    Failures needed to mark unhealthy
Path                  /health or /health/ready    Endpoint to check

Important: The basic /health endpoint (liveness) should be fast and dependency-free. The deep /health/ready endpoint (readiness) checks all dependencies. Use liveness for "should this process be running?" and readiness for "should this server receive traffic?"


5. Session Affinity (Sticky Sessions)

Session affinity (sticky sessions) ensures that all requests from the same client go to the same server.

Without sticky sessions:           With sticky sessions:

Req 1 (Alice) → Server A          Req 1 (Alice) → Server A
Req 2 (Alice) → Server B  ✗       Req 2 (Alice) → Server A  ✓
Req 3 (Alice) → Server C  ✗       Req 3 (Alice) → Server A  ✓
(Session data is on A only!)       (Always goes to A)

How sticky sessions work

  1. Cookie-based: The LB sets a cookie (e.g., AWSALB=server-a-id) on the first response. Subsequent requests include this cookie, and the LB routes to the same server.
  2. IP-based: The LB hashes the client IP (same as IP hash algorithm).
  3. Application cookie: The application sets a cookie; the LB reads it to route.

Why sticky sessions are a code smell

Sticky sessions exist because the application is stateful -- it stores session data in memory. This is a temporary band-aid, not a solution:

  • Uneven load -- one server may accumulate more active sessions, while others are idle.
  • Failover breaks sessions -- if Server A dies, all users stuck to it lose their sessions.
  • Scaling is impaired -- new servers receive no existing sessions; old servers carry all the load.

The real solution: Make the application stateless (see 6.5.c) and eliminate the need for sticky sessions entirely.
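
As a taste of what that looks like, here is a minimal sketch that externalizes session state to Redis using ioredis (the cookie name, key format, and expiry are illustrative choices, not a standard):

// Sketch: session state in Redis, so any server can handle any request
const express = require('express');
const Redis = require('ioredis');
const crypto = require('crypto');

const app = express();
const redis = new Redis(process.env.REDIS_URL);

app.use(async (req, res, next) => {
  // Read the session ID from a cookie, or mint a new one
  const match = /sid=([a-f0-9]+)/.exec(req.headers.cookie || '');
  req.sessionId = match ? match[1] : crypto.randomBytes(16).toString('hex');
  res.setHeader('Set-Cookie', `sid=${req.sessionId}; HttpOnly; Path=/`);

  // Load session data from Redis -- works no matter which server got the request
  const raw = await redis.get(`session:${req.sessionId}`);
  req.session = raw ? JSON.parse(raw) : {};

  // Persist the session after the handler mutates it (1-hour expiry)
  req.saveSession = () =>
    redis.set(`session:${req.sessionId}`, JSON.stringify(req.session), 'EX', 3600);
  next();
});

app.get('/visits', async (req, res) => {
  req.session.visits = (req.session.visits || 0) + 1;
  await req.saveSession();
  res.json({ visits: req.session.visits });
});

app.listen(3000);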

When sticky sessions are acceptable

  • Legacy applications that cannot be refactored in the short term.
  • WebSocket connections that maintain long-lived state on a specific server.
  • In-memory caches where re-computing is expensive (but a shared cache like Redis is better).

6. AWS Load Balancer Comparison

Feature           ALB (Application)               NLB (Network)                    CLB (Classic)
OSI Layer         Layer 7                         Layer 4                          Layer 4 + 7
Protocols         HTTP, HTTPS, gRPC               TCP, UDP, TLS                    HTTP, HTTPS, TCP
Routing           Path, host, header, query       Port only                        Basic
WebSocket         Native support                  TCP pass-through                 Not supported
SSL termination   Yes                             Optional (TLS pass-through)      Yes
Sticky sessions   Cookie-based                    No                               Cookie-based
Static IP         No (use Global Accelerator)     Yes                              No
Performance       Millions req/sec                Millions packets/sec             Lower
Price             ~$0.0225/hour + LCU             ~$0.0225/hour + NLCU             ~$0.025/hour
Use case          Web apps, APIs, microservices   Gaming, IoT, ultra-low latency   Legacy -- do not use for new projects

Decision guide

Is it HTTP/HTTPS traffic?
  ├── YES → Use ALB
  │         Need path-based routing? → ALB
  │         Need gRPC? → ALB
  │         Need WebSocket? → ALB
  └── NO  → Use NLB
            Need static IP? → NLB
            Need ultra-low latency? → NLB
            Need TCP/UDP? → NLB

7. Nginx as a Load Balancer

Nginx is the most popular open-source load balancer / reverse proxy. It handles Layer 7 load balancing with excellent performance.

Basic load balancing configuration

# /etc/nginx/conf.d/api-loadbalancer.conf

# Define the backend server pool
upstream api_servers {
    # Round robin (default)
    server 10.0.1.10:3000;
    server 10.0.1.11:3000;
    server 10.0.1.12:3000;
}

server {
    listen 80;
    server_name api.example.com;

    location / {
        proxy_pass http://api_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
}

Weighted round robin

upstream api_servers {
    server 10.0.1.10:3000 weight=3;   # Gets 3x traffic (bigger machine)
    server 10.0.1.11:3000 weight=2;   # Gets 2x traffic
    server 10.0.1.12:3000 weight=1;   # Gets 1x traffic (smallest)
}

Least connections

upstream api_servers {
    least_conn;
    server 10.0.1.10:3000;
    server 10.0.1.11:3000;
    server 10.0.1.12:3000;
}

IP hash (sticky sessions)

upstream api_servers {
    ip_hash;
    server 10.0.1.10:3000;
    server 10.0.1.11:3000;
    server 10.0.1.12:3000;
}

Health checks (passive)

upstream api_servers {
    server 10.0.1.10:3000 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:3000 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:3000 max_fails=3 fail_timeout=30s;

    # max_fails=3    → Mark unhealthy after 3 failed requests
    # fail_timeout=30s → Try again after 30 seconds
}

Path-based routing (microservices)

# Route different paths to different service pools
upstream user_service {
    server 10.0.2.10:3001;
    server 10.0.2.11:3001;
}

upstream order_service {
    server 10.0.3.10:3002;
    server 10.0.3.11:3002;
}

upstream static_files {
    server 10.0.4.10:8080;
}

server {
    listen 80;
    server_name app.example.com;

    location /api/users {
        proxy_pass http://user_service;
    }

    location /api/orders {
        proxy_pass http://order_service;
    }

    location /static {
        proxy_pass http://static_files;
    }
}

8. HAProxy

HAProxy is another widely used open-source load balancer, especially popular for Layer 4 (TCP) balancing and high-throughput scenarios. It is used by GitHub, Stack Overflow, and Reddit.

# Minimal HAProxy configuration
frontend http_front
    bind *:80
    default_backend api_servers

backend api_servers
    balance roundrobin
    option httpchk GET /health
    server srv1 10.0.1.10:3000 check
    server srv2 10.0.1.11:3000 check
    server srv3 10.0.1.12:3000 check

Nginx vs HAProxy: Both are excellent. Nginx is more commonly used as a combined web server + reverse proxy + load balancer. HAProxy is a dedicated load balancer and excels at raw TCP performance. For most Node.js applications, either works well.


9. DNS-Based Load Balancing

DNS load balancing distributes traffic by returning different IP addresses for the same domain name.

Query: api.example.com
Response (round-robin DNS):
  api.example.com → 52.1.1.1    (Server A, US-East)
  api.example.com → 54.2.2.2    (Server B, US-West)
  api.example.com → 13.3.3.3    (Server C, EU-West)

Client 1 resolves → 52.1.1.1
Client 2 resolves → 54.2.2.2
Client 3 resolves → 13.3.3.3
Client 4 resolves → 52.1.1.1 (back to start)
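
You can observe this from Node.js: dns.resolve4 returns every A record the resolver hands back (the hostname below is a placeholder, and the order of answers depends on the DNS server):

// Inspecting round-robin DNS answers from Node.js
const dns = require('dns').promises;

async function main() {
  const addresses = await dns.resolve4('api.example.com');
  console.log(addresses); // e.g. [ '52.1.1.1', '54.2.2.2', '13.3.3.3' ]
}

main().catch(console.error);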

Limitations

  • Slow failover -- DNS records are cached (TTL). If a server dies, clients may continue connecting to it for minutes or hours.
  • No health checks -- basic DNS does not know if a server is healthy.
  • Uneven distribution -- clients cache DNS responses; one IP may receive disproportionate traffic.
  • No request awareness -- DNS cannot distribute based on server load or URL path.

When to use DNS load balancing

  • Global distribution -- route users to the nearest region (geolocation-based DNS).
  • Disaster recovery -- failover from one region to another by updating DNS.
  • As a first layer -- DNS distributes across regions; each region has its own ALB/NLB for local balancing.

10. Global Load Balancing

For applications serving users worldwide, you need global load balancing that routes users to the closest healthy region.

AWS Route 53 (DNS-based global routing)

Route 53 routing policies:

1. Latency-based routing:
   User in Tokyo     → ap-northeast-1 (lowest latency)
   User in London    → eu-west-1 (lowest latency)
   User in New York  → us-east-1 (lowest latency)

2. Geolocation routing:
   User in EU        → eu-west-1 (comply with GDPR)
   User in US        → us-east-1
   Default           → us-east-1

3. Failover routing:
   Primary: us-east-1
   Secondary: us-west-2 (if primary health check fails)

4. Weighted routing:
   90% → us-east-1 (production)
   10% → us-east-2 (canary deployment)

AWS CloudFront (CDN + edge routing)

CloudFront is a CDN that also acts as a global load balancer:

User in Sydney
    │
    ▼
CloudFront Edge (Sydney)
    │
    ├── Static content (HTML, CSS, JS, images) → served from edge cache
    │
    └── API requests → routed to nearest origin:
        ├── Origin Group A: ALB in ap-southeast-2 (primary)
        └── Origin Group B: ALB in us-west-2 (failover)

Multi-tier load balancing architecture

┌───────────────────────────────────────────────────────────┐
│                        Global Layer                       │
│                                                           │
│        Route 53 (DNS)  ──>  CloudFront (CDN/Edge)         │
│                                                           │
├─────────────────────────────┬─────────────────────────────┤
│      US-East Region         │       EU-West Region        │
│                             │                             │
│            ALB              │             ALB             │
│       ┌──┐ ┌──┐ ┌──┐        │       ┌──┐ ┌──┐ ┌──┐        │
│       │S1│ │S2│ │S3│        │       │S4│ │S5│ │S6│        │
│       └──┘ └──┘ └──┘        │       └──┘ └──┘ └──┘        │
│             │               │             │               │
│        RDS Primary          │       RDS Read Replica      │
│                             │                             │
└─────────────────────────────┴─────────────────────────────┘

11. Key Takeaways

  1. Load balancers are required for horizontal scaling -- you cannot have multiple servers without something to distribute traffic.
  2. Layer 7 (ALB) for web apps, Layer 4 (NLB) for raw TCP -- most Node.js applications want an ALB.
  3. Health checks are non-negotiable -- without them, the LB sends traffic to dead servers.
  4. Least connections is the safest default -- it adapts to variable request durations automatically.
  5. Sticky sessions are a sign of stateful design -- fix the root cause (make the app stateless) rather than relying on sticky sessions.
  6. Global load balancing combines DNS + CDN + regional LBs -- Route 53 for DNS routing, CloudFront for edge caching, ALB for regional distribution.
  7. Nginx is the Swiss Army knife -- reverse proxy, load balancer, static file server, and SSL terminator in one process.

Explain-It Challenge

  1. A colleague says "we don't need a load balancer -- we only have one server." Why should they set one up anyway? (Hint: think about what happens during deployments and failures.)
  2. Explain to a non-technical product manager why switching from sticky sessions to stateless design will improve reliability.
  3. You have a microservices architecture with three services: user-service, order-service, and image-service (handles large file uploads). Which load balancing algorithm would you use for each, and why?

Navigation: <- 6.5.a -- Vertical vs Horizontal Scaling | 6.5.c -- Stateless Design ->