Episode 6 — Scaling Reliability Microservices Web3 / 6.3 — AWS Cloud Native Deployment

6.3.c — Application Load Balancer

In one sentence: The Application Load Balancer (ALB) is a Layer 7 (HTTP/HTTPS) load balancer that distributes incoming traffic across your ECS tasks, enables path-based routing to different microservices, terminates HTTPS, and performs health checks to ensure traffic only reaches healthy containers.

Navigation: ← 6.3.b — ECS and Fargate · 6.3.d — VPC Networking and IAM →

1. What Does an ALB Do?

An ALB sits between the internet and your ECS services. It receives all incoming HTTP/HTTPS requests and routes them to the correct backend service based on rules you define.

┌─────────────────────────────────────────────────────────────────────┐
│                   TRAFFIC FLOW WITH ALB                              │
│                                                                     │
│  Users (Internet)                                                   │
│       │                                                             │
│       │  HTTPS (port 443)                                           │
│       ▼                                                             │
│  ┌──────────────────┐                                               │
│  │  Application      │  ← Single entry point                       │
│  │  Load Balancer    │  ← Terminates HTTPS (SSL certificate)       │
│  │  (ALB)            │  ← Routes based on URL path / host header   │
│  └────────┬─────────┘                                               │
│           │                                                         │
│     ┌─────┼──────────────────┐                                      │
│     │     │                  │                                      │
│     ▼     ▼                  ▼                                      │
│  ┌──────┐ ┌──────┐   ┌────────────┐                                │
│  │User  │ │Order │   │Payment     │                                │
│  │Svc   │ │Svc   │   │Svc         │                                │
│  │(x3)  │ │(x2)  │   │(x2)        │                                │
│  └──────┘ └──────┘   └────────────┘                                │
│                                                                     │
│  /api/users/*  /api/orders/*  /api/payments/*                       │
└─────────────────────────────────────────────────────────────────────┘

ALB vs NLB vs CLB

Feature	ALB (Application)	NLB (Network)	CLB (Classic)
OSI Layer	Layer 7 (HTTP/HTTPS)	Layer 4 (TCP/UDP)	Layer 4+7 (legacy)
Routing	Path, host, header, query string	Port-based only	Basic round-robin
Protocol	HTTP, HTTPS, WebSocket, gRPC	TCP, UDP, TLS	HTTP, HTTPS, TCP, SSL/TLS
Performance	Millions of requests/sec	Millions of packets/sec, ultra-low latency	Limited
Use case	Web APIs, microservices	Game servers, IoT, extreme perf	Legacy (avoid for new)
ECS integration	Excellent: dynamic port mapping, path routing	Good: dynamic port mapping, no path routing	Legacy; not supported for Fargate tasks

Rule of thumb: Use ALB for HTTP/HTTPS workloads. Use NLB for raw TCP/UDP or when you need static IPs and extreme low latency.

2. ALB Components

Listeners

A listener checks for incoming connections on a specific port and protocol. You typically have two listeners:

Listener 1:  Port 80  (HTTP)   → Redirect to HTTPS
Listener 2:  Port 443 (HTTPS)  → Forward to target groups based on rules

Rules

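Each listener has rules that match incoming requests and route them to target groups. Rules are evaluated from the lowest priority number to the highest, the first matching rule wins, and the default rule applies when nothing else matches.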

Rule 1 (priority 1):  IF path = /api/users/*    THEN forward → user-service-tg
Rule 2 (priority 2):  IF path = /api/orders/*   THEN forward → order-service-tg
Rule 3 (priority 3):  IF path = /api/payments/* THEN forward → payment-service-tg
Rule 4 (default):     ELSE                       THEN return 404
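
The priority-ordered matching above can be modeled in a few lines of JavaScript. This is a simplified sketch, not the ALB's actual implementation: the names `patternToRegExp` and `pickAction` and the action strings are made up for illustration. It assumes path patterns in which `*` matches zero or more characters and `?` matches exactly one.

```javascript
// Convert an ALB-style path pattern into an anchored regular expression.
function patternToRegExp(pattern) {
  // Escape regex metacharacters, but leave "*" and "?" for the wildcard step.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$');
}

// Return the action of the first matching rule, or the default action.
function pickAction(rules, defaultAction, path) {
  // Lower priority numbers are evaluated first; the first match wins.
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  for (const rule of ordered) {
    if (patternToRegExp(rule.pathPattern).test(path)) {
      return rule.action;
    }
  }
  return defaultAction;
}

const rules = [
  { priority: 1, pathPattern: '/api/users/*', action: 'forward:user-service-tg' },
  { priority: 2, pathPattern: '/api/orders/*', action: 'forward:order-service-tg' },
  { priority: 3, pathPattern: '/api/payments/*', action: 'forward:payment-service-tg' },
];

console.log(pickAction(rules, 'fixed-response:404', '/api/users/123')); // forward:user-service-tg
console.log(pickAction(rules, 'fixed-response:404', '/api/users'));     // fixed-response:404
console.log(pickAction(rules, 'fixed-response:404', '/unknown'));       // fixed-response:404
```

Note that `/api/users` without a trailing slash falls through to the default action. If that path should reach user-service, the rule needs a second pattern value such as `/api/users`.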

Target Groups

A target group is a collection of targets (ECS tasks, EC2 instances, or Lambda functions) that receive traffic from the ALB. Each microservice typically has its own target group.

Target Group: user-service-tg
  ├── Task 10.0.1.15:3000  (healthy)
  ├── Task 10.0.2.23:3000  (healthy)
  └── Task 10.0.1.42:3000  (healthy)

Target Group: order-service-tg
  ├── Task 10.0.1.88:3000  (healthy)
  └── Task 10.0.2.55:3000  (healthy)
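
To illustrate how traffic is spread inside one target group, here is a minimal round-robin sketch. It is a model for intuition only: `createRoundRobin` is a hypothetical helper, and one target is marked unhealthy purely to show that it gets skipped.

```javascript
// Returns a function that cycles through the currently healthy targets.
function createRoundRobin(targets) {
  let next = 0;
  return function pick() {
    const healthy = targets.filter((target) => target.healthy);
    // Simplification: the real ALB fails open and routes to all targets
    // when every target in the group is unhealthy.
    if (healthy.length === 0) return null;
    const chosen = healthy[next % healthy.length];
    next += 1;
    return chosen.address;
  };
}

const userServiceTargets = [
  { address: '10.0.1.15:3000', healthy: true },
  { address: '10.0.2.23:3000', healthy: false }, // skipped while unhealthy
  { address: '10.0.1.42:3000', healthy: true },
];

const pick = createRoundRobin(userServiceTargets);
console.log(pick()); // 10.0.1.15:3000
console.log(pick()); // 10.0.1.42:3000
console.log(pick()); // 10.0.1.15:3000
```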

3. Path-Based Routing

Path-based routing lets a single ALB serve multiple microservices. Each path prefix maps to a different ECS service.

Architecture example

Single ALB: api.myapp.com

  /api/users/*        →  user-service    (ECS Service, 3 tasks)
  /api/users/123      →  user-service
  /api/orders/*       →  order-service   (ECS Service, 2 tasks)
  /api/orders/456     →  order-service
  /api/payments/*     →  payment-service (ECS Service, 2 tasks)
  /health             →  health-check    (static 200 response)
  /*                  →  default-action  (return 404)

Why path-based routing?

Single domain — one DNS entry, one SSL certificate
Centralized access point — easier to add WAF, logging, rate limiting
Microservices behind one URL — clients don't need to know about individual services
Independent scaling — each target group (service) scales independently

Host-based routing alternative

You can also route by hostname (subdomain):

users.api.myapp.com    →  user-service-tg
orders.api.myapp.com   →  order-service-tg
payments.api.myapp.com →  payment-service-tg

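You can also combine path and host conditions in a single rule for more complex architectures. Keep certificate scope in mind: a wildcard certificate for *.myapp.com does not cover two-level names such as users.api.myapp.com, which need *.api.myapp.com.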

4. Setting Up ALB — Complete Walkthrough

Step 1: Create the ALB

# Create the ALB in public subnets
ALB_ARN=$(aws elbv2 create-load-balancer \
  --name my-app-alb \
  --subnets subnet-public-1a subnet-public-1b \
  --security-groups sg-alb-public \
  --scheme internet-facing \
  --type application \
  --ip-address-type ipv4 \
  --query 'LoadBalancers[0].LoadBalancerArn' \
  --output text \
  --region us-east-1)

echo "ALB ARN: $ALB_ARN"

Step 2: Create target groups

# Target group for user-service
USER_TG_ARN=$(aws elbv2 create-target-group \
  --name user-service-tg \
  --protocol HTTP \
  --port 3000 \
  --vpc-id vpc-0a1b2c3d \
  --target-type ip \
  --health-check-protocol HTTP \
  --health-check-path /health \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 5 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3 \
  --query 'TargetGroups[0].TargetGroupArn' \
  --output text \
  --region us-east-1)

# Target group for order-service
ORDER_TG_ARN=$(aws elbv2 create-target-group \
  --name order-service-tg \
  --protocol HTTP \
  --port 3000 \
  --vpc-id vpc-0a1b2c3d \
  --target-type ip \
  --health-check-protocol HTTP \
  --health-check-path /health \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 5 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3 \
  --query 'TargetGroups[0].TargetGroupArn' \
  --output text \
  --region us-east-1)

Important: For Fargate, --target-type must be ip (not instance), because Fargate tasks are identified by their private IP, not by EC2 instance IDs.

Step 3: Create HTTPS listener

# Create HTTPS listener (port 443)
aws elbv2 create-listener \
  --load-balancer-arn "$ALB_ARN" \
  --protocol HTTPS \
  --port 443 \
  --certificates CertificateArn=arn:aws:acm:us-east-1:123456789012:certificate/abc-123 \
  --default-actions '[{
    "Type": "fixed-response",
    "FixedResponseConfig": {
      "StatusCode": "404",
      "ContentType": "application/json",
      "MessageBody": "{\"error\": \"Not Found\"}"
    }
  }]' \
  --region us-east-1

Step 4: Create HTTP-to-HTTPS redirect

# Create HTTP listener that redirects to HTTPS
aws elbv2 create-listener \
  --load-balancer-arn "$ALB_ARN" \
  --protocol HTTP \
  --port 80 \
  --default-actions '[{
    "Type": "redirect",
    "RedirectConfig": {
      "Protocol": "HTTPS",
      "Port": "443",
      "StatusCode": "HTTP_301"
    }
  }]' \
  --region us-east-1
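
A redirect action that sets only the protocol and port keeps the host, path, and query string of the original request. The sketch below models that behavior with Node's built-in `URL` class. `httpsRedirect` is a hypothetical helper, and the exact `Location` string a real ALB emits may differ (for example, it may spell out the port).

```javascript
// Build the redirect response for a plain-HTTP request URL.
function httpsRedirect(requestUrl) {
  const url = new URL(requestUrl);
  url.protocol = 'https:';
  // 443 is the default port for https, so URL omits it when serializing.
  url.port = '443';
  return { statusCode: 301, location: url.toString() };
}

console.log(httpsRedirect('http://api.myapp.com/api/users/123?expand=orders').location);
// https://api.myapp.com/api/users/123?expand=orders
```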

Step 5: Add path-based routing rules

# Get the HTTPS listener ARN
LISTENER_ARN=$(aws elbv2 describe-listeners \
  --load-balancer-arn "$ALB_ARN" \
  --query 'Listeners[?Port==`443`].ListenerArn' \
  --output text \
  --region us-east-1)

# Rule: /api/users/* → user-service-tg
aws elbv2 create-rule \
  --listener-arn "$LISTENER_ARN" \
  --priority 1 \
  --conditions '[{
    "Field": "path-pattern",
    "PathPatternConfig": { "Values": ["/api/users/*"] }
  }]' \
  --actions "[{
    \"Type\": \"forward\",
    \"TargetGroupArn\": \"$USER_TG_ARN\"
  }]" \
  --region us-east-1

# Rule: /api/orders/* → order-service-tg
aws elbv2 create-rule \
  --listener-arn "$LISTENER_ARN" \
  --priority 2 \
  --conditions '[{
    "Field": "path-pattern",
    "PathPatternConfig": { "Values": ["/api/orders/*"] }
  }]' \
  --actions "[{
    \"Type\": \"forward\",
    \"TargetGroupArn\": \"$ORDER_TG_ARN\"
  }]" \
  --region us-east-1

5. Health Checks

Health checks are critical — they determine which tasks receive traffic. If a task fails its health check, the ALB stops sending traffic to it, and ECS may replace it.

Health check configuration

┌───────────────────────────────────────────────────────────┐
│                 ALB HEALTH CHECK FLOW                      │
│                                                           │
│  ALB sends:   GET /health  HTTP/1.1                       │
│  Every:       30 seconds (interval)                       │
│  Timeout:     5 seconds (must respond within)             │
│                                                           │
│  To become HEALTHY:                                       │
│    2 consecutive 200 responses (healthy threshold)        │
│                                                           │
│  To become UNHEALTHY:                                     │
│    3 consecutive failures (unhealthy threshold)           │
│                                                           │
│  Matcher: HTTP status code 200                            │
└───────────────────────────────────────────────────────────┘

Health check parameters explained

Parameter	Default	Recommended	Explanation
Path	`/`	`/health`	Endpoint ALB hits to check health
Protocol	HTTP	HTTP	Protocol for health check (HTTP, not HTTPS — traffic is internal)
Port	traffic-port	traffic-port	Use the same port the app listens on
Interval	30s	15-30s	Time between health checks
Timeout	5s	5s	Max time to wait for a response
Healthy threshold	5	2	Number of consecutive successes to mark healthy
Unhealthy threshold	2	3	Number of consecutive failures to mark unhealthy
Success codes	200	200	HTTP status codes that count as healthy
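
The two thresholds act as a small state machine driven by consecutive results. The sketch below is a simplified model using the recommended values from the table; `createHealthTracker` and the `initial` state label are illustrative names, and a timeout can be modeled by passing `null` as the status code.

```javascript
// Track one target's health from a stream of health check results.
function createHealthTracker({ healthyThreshold = 2, unhealthyThreshold = 3 } = {}) {
  let state = 'initial';
  let successes = 0;
  let failures = 0;
  return {
    record(statusCode) {
      if (statusCode === 200) {
        successes += 1;
        failures = 0; // a success breaks any run of failures
        if (successes >= healthyThreshold) state = 'healthy';
      } else {
        failures += 1;
        successes = 0; // a failure breaks any run of successes
        if (failures >= unhealthyThreshold) state = 'unhealthy';
      }
      return state;
    },
  };
}

const tracker = createHealthTracker();
const results = [200, 200, 503, 503, 503, 200, 200].map((code) => tracker.record(code));
console.log(results);
// [ 'initial', 'healthy', 'healthy', 'healthy', 'unhealthy', 'unhealthy', 'healthy' ]
```

With a 30-second interval, the three failures needed to mark a target unhealthy take roughly 90 seconds to accumulate, which is the trade-off behind the 15-30s recommendation.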

Node.js health check endpoint

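// Option A: basic health check that only confirms the process is serving requests.
// Assumes `app` is an existing Express application.
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy', timestamp: new Date().toISOString() });
});

// Option B: advanced health check that also verifies dependencies.
// Register only one of these handlers for /health; if both are registered,
// Express answers from the first and never reaches the second.
// Assumes `db` and `redis` are already-connected clients.
app.get('/health', async (req, res) => {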
  try {
    // Check database connection
    await db.query('SELECT 1');
    
    // Check Redis connection
    await redis.ping();
    
    res.status(200).json({
      status: 'healthy',
      dependencies: {
        database: 'connected',
        cache: 'connected'
      },
      uptime: process.uptime(),
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      error: error.message,
      timestamp: new Date().toISOString()
    });
  }
});

Design consideration: Should your health check include dependency checks? If /health queries the database, a database outage fails the check on every task at once, and ECS starts replacing tasks that were otherwise fine, turning a dependency problem into a full service outage. Keep /health lightweight and consider a separate /ready endpoint for deep checks.
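
One way to split the two concerns is sketched below in framework-free JavaScript, so it runs without Express. The function names `liveness` and `readiness` and the stub checks are hypothetical. The assumption is that the target group keeps probing the lightweight check, while the deep check feeds dashboards or alerts rather than the ALB.

```javascript
// Lightweight check: answers from memory, so a dependency outage cannot fail it.
function liveness() {
  return { statusCode: 200, body: { status: 'healthy', uptime: process.uptime() } };
}

// Deep check: runs every dependency probe with a timeout and reports each result.
async function readiness(checks, timeoutMs = 1000) {
  const dependencies = {};
  for (const [name, check] of Object.entries(checks)) {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('timed out')), timeoutMs);
    });
    try {
      await Promise.race([check(), timeout]);
      dependencies[name] = 'connected';
    } catch (error) {
      dependencies[name] = `failed: ${error.message}`;
    } finally {
      clearTimeout(timer); // avoid leaving a pending timer behind
    }
  }
  const ok = Object.values(dependencies).every((result) => result === 'connected');
  return { statusCode: ok ? 200 : 503, body: { status: ok ? 'ready' : 'degraded', dependencies } };
}

// Stub probes stand in for calls such as db.query('SELECT 1') and redis.ping().
readiness({
  database: async () => {},
  cache: async () => { throw new Error('connection refused'); },
}).then((result) => console.log(result.statusCode, result.body.dependencies));
// 503 { database: 'connected', cache: 'failed: connection refused' }
```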

6. HTTPS Termination at ALB

HTTPS termination means the ALB handles SSL/TLS encryption and decryption. Traffic from the ALB to your ECS tasks is plain HTTP over the private network.

Client → [HTTPS/443] → ALB → [HTTP/3000] → ECS Task

Why this is good:
  1. Your app doesn't manage SSL certificates
  2. ACM renews the certificate automatically, and the ALB keeps serving it
  3. Reduced CPU load on your containers
  4. Easier to manage — one certificate for the whole ALB
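
Because the task only ever sees plain HTTP, the application has to read the original protocol and client address from the `X-Forwarded-Proto` and `X-Forwarded-For` headers that the ALB adds. The sketch below shows one way to do that. `originalRequestInfo` is a hypothetical helper, and it assumes the ALB is the only proxy in front of the task and runs in its default mode, where it appends the connecting client's address as the last `X-Forwarded-For` entry.

```javascript
// Derive the original scheme and client IP from ALB-added headers.
// Node lowercases incoming header names, so lowercase keys are used here.
function originalRequestInfo(headers) {
  const proto = (headers['x-forwarded-proto'] || 'http').toLowerCase();
  const hops = (headers['x-forwarded-for'] || '')
    .split(',')
    .map((hop) => hop.trim())
    .filter(Boolean);
  return {
    secure: proto === 'https',
    // Earlier entries can be supplied by the client, so only the last one is trusted.
    clientIp: hops.length > 0 ? hops[hops.length - 1] : null,
  };
}

console.log(originalRequestInfo({
  'x-forwarded-proto': 'https',
  'x-forwarded-for': '198.51.100.9, 203.0.113.7',
}));
// { secure: true, clientIp: '203.0.113.7' }
```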

Setting up HTTPS with ACM

# Step 1: Request a certificate (free with ACM)
CERT_ARN=$(aws acm request-certificate \
  --domain-name api.myapp.com \
  --subject-alternative-names "*.myapp.com" \
  --validation-method DNS \
  --query 'CertificateArn' \
  --output text \
  --region us-east-1)

# Step 2: Validate the certificate (add DNS record — ACM shows you the record)
aws acm describe-certificate \
  --certificate-arn "$CERT_ARN" \
  --query 'Certificate.DomainValidationOptions' \
  --region us-east-1

# Step 3: Attach to ALB listener (done in Step 3 above)
# The --certificates flag in create-listener references this cert

Security policy

The ALB supports configurable TLS policies. Use the latest policy to disable outdated protocols:

# Use ELBSecurityPolicy-TLS13-1-2-2021-06 or newer, set with --ssl-policy on
# create-listener or modify-listener. It allows only TLS 1.2 and TLS 1.3 and drops weak ciphers.

7. Connection Draining (Deregistration Delay)

When a task is being removed (scaling in, deployment), the ALB doesn't cut connections immediately. Connection draining allows in-flight requests to complete.

Task is marked for removal:

  1. ALB stops sending NEW requests to the task
  2. Existing connections continue for up to [deregistration delay]
  3. After delay (or all connections close), task is deregistered
  4. ECS stops the task

Default deregistration delay: 300 seconds (5 minutes)

Configuring deregistration delay

# Set deregistration delay to 30 seconds (for fast deployments)
aws elbv2 modify-target-group-attributes \
  --target-group-arn "$USER_TG_ARN" \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30

Tuning guidelines:

Short-lived requests (REST APIs): 30-60 seconds
Long-lived connections (WebSocket): 300+ seconds
Batch processing: Match to max expected request duration
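
Draining on the ALB side only helps if the container also finishes its in-flight requests when ECS stops it. ECS sends SIGTERM first and follows with SIGKILL after the task's stop timeout (30 seconds by default). The sketch below shows one way to handle that in Node. `shutdownGracefully` is a hypothetical helper, and the 25-second deadline is an assumption chosen to stay under the default stop timeout.

```javascript
// Stop accepting new connections, let in-flight requests finish, then exit.
function shutdownGracefully(server, { timeoutMs = 25000, exit = process.exit } = {}) {
  // Force a non-zero exit if connections are still open at the deadline.
  const timer = setTimeout(() => exit(1), timeoutMs);
  timer.unref(); // the timer alone should not keep the process alive

  // server.close() refuses new connections and calls back once existing ones end.
  server.close(() => {
    clearTimeout(timer);
    exit(0);
  });
}

// Wiring, assuming `server` is the value returned by app.listen(...) or http.createServer(...):
// process.on('SIGTERM', () => shutdownGracefully(server));
```

Idle keep-alive connections can delay `server.close()` on some Node versions, which is why the deadline timer is there as a backstop.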

8. ALB + ECS Integration

When you create an ECS service with a load balancer, the integration is automatic:

ECS launches a new task → registers its IP with the target group
ALB starts health-checking the new task
Task passes health checks → ALB sends traffic to it
Task stops or fails health checks → ALB stops routing to it and ECS deregisters it
ECS replaces failed tasks → cycle repeats

How it looks in the ECS service definition

aws ecs create-service \
  --cluster production \
  --service-name user-service \
  --task-definition user-service:3 \
  --desired-count 3 \
  --launch-type FARGATE \
  --load-balancers '[
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/user-service-tg/abc123",
      "containerName": "user-service",
      "containerPort": 3000
    }
  ]' \
  --network-configuration '{
    "awsvpcConfiguration": {
      "subnets": ["subnet-private-1a", "subnet-private-1b"],
      "securityGroups": ["sg-ecs-tasks"],
      "assignPublicIp": "DISABLED"
    }
  }'

Security group chain

Internet → ALB (sg-alb)
  Inbound:  ports 80 and 443 from 0.0.0.0/0   (80 only serves the redirect to HTTPS)
  Outbound: port 3000 to sg-ecs-tasks

ALB → ECS Tasks (sg-ecs-tasks)
  Inbound:  port 3000 from sg-alb    ← ONLY the ALB can reach tasks
  Outbound: port 443 to 0.0.0.0/0    (for calling external APIs)

This ensures your ECS tasks are not directly accessible from the internet — all traffic must flow through the ALB.

9. ALB Access Logs

ALB can log every request to S3 for analysis and debugging:

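# Enable access logging (the S3 bucket must already exist, and its bucket policy
# must allow Elastic Load Balancing to write to it)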
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn "$ALB_ARN" \
  --attributes \
    Key=access_logs.s3.enabled,Value=true \
    Key=access_logs.s3.bucket,Value=my-alb-logs-bucket \
    Key=access_logs.s3.prefix,Value=alb/my-app

Log entries include: timestamp, client IP, request URL, status code, response time, target IP, bytes sent/received.
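
Each log entry is a single space-delimited line in which some fields, such as the request, are wrapped in double quotes. The sketch below parses the leading fields of one entry. `parseAlbLogLine` is a hypothetical helper, the sample line is made up for illustration, and only the first thirteen fields are handled; real entries continue with more fields after the request.

```javascript
// Names for the first thirteen fields of an ALB access log entry, in order.
const FIELDS = [
  'type', 'time', 'elb', 'client', 'target',
  'requestProcessingTime', 'targetProcessingTime', 'responseProcessingTime',
  'elbStatusCode', 'targetStatusCode', 'receivedBytes', 'sentBytes', 'request',
];

function parseAlbLogLine(line) {
  // A token is either a double-quoted string or a run of non-space characters.
  const tokens = [...line.matchAll(/"([^"]*)"|(\S+)/g)]
    .map((match) => (match[1] !== undefined ? match[1] : match[2]));
  const entry = {};
  FIELDS.forEach((name, index) => { entry[name] = tokens[index]; });
  return entry;
}

// Illustrative sample, not taken from a real load balancer.
const sampleLine =
  'https 2024-01-15T10:30:00.123456Z app/my-app-alb/50dc6c495c0c9188 ' +
  '203.0.113.7:51234 10.0.1.15:3000 0.001 0.042 0.000 200 200 120 512 ' +
  '"GET https://api.myapp.com:443/api/users/123 HTTP/1.1"';

const entry = parseAlbLogLine(sampleLine);
console.log(entry.client, entry.target, entry.elbStatusCode, entry.request);
// 203.0.113.7:51234 10.0.1.15:3000 200 GET https://api.myapp.com:443/api/users/123 HTTP/1.1
```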

10. Sticky Sessions

By default, the ALB uses round robin to spread requests across the healthy targets in a target group. Sticky sessions route a user's requests to the same target for a configured duration.

# Enable application-based stickiness
aws elbv2 modify-target-group-attributes \
  --target-group-arn "$USER_TG_ARN" \
  --attributes \
    Key=stickiness.enabled,Value=true \
    Key=stickiness.type,Value=app_cookie \
    Key=stickiness.app_cookie.cookie_name,Value=SESSIONID \
    Key=stickiness.app_cookie.duration_seconds,Value=3600

Best practice: Avoid sticky sessions if possible. They create uneven load distribution. Use shared session stores (Redis, DynamoDB) instead, so any task can handle any request.

11. Key Takeaways

ALB is a Layer 7 load balancer — it understands HTTP and can route based on path, host, headers, and query strings.
Path-based routing lets one ALB serve multiple microservices (/api/users/*, /api/orders/*).
Target groups group ECS tasks that serve the same service — each gets independent health checks.
Health checks are critical — misconfigured health checks cause tasks to be drained or never receive traffic.
HTTPS termination at the ALB — use ACM for free, auto-renewing SSL certificates. Backend traffic is HTTP.
Connection draining lets in-flight requests complete before a task is stopped.
Security group chain — internet talks to ALB, ALB talks to ECS tasks, tasks are NOT directly accessible.
Target type must be ip for Fargate — not instance.

Explain-It Challenge

Your service returns 502 Bad Gateway errors after a deployment. The old version worked fine. Walk through how the ALB, target group, and health checks could cause this.
Explain to a product manager why you need an ALB instead of just exposing each microservice directly to the internet.
A developer added a health check that queries the database. During a database outage, ALL tasks are marked unhealthy and the entire service goes down. How would you redesign the health check strategy?
