Episode 6 — Scaling Reliability Microservices Web3 / 6.3 — AWS Cloud Native Deployment
6.3.c — Application Load Balancer
In one sentence: The Application Load Balancer (ALB) is a Layer 7 (HTTP/HTTPS) load balancer that distributes incoming traffic across your ECS tasks, enables path-based routing to different microservices, terminates HTTPS, and performs health checks to ensure traffic only reaches healthy containers.
Navigation: ← 6.3.b — ECS and Fargate · 6.3.d — VPC Networking and IAM →
1. What Does an ALB Do?
An ALB sits between the internet and your ECS services. It receives all incoming HTTP/HTTPS requests and routes them to the correct backend service based on rules you define.
┌─────────────────────────────────────────────────────────────────────┐
│ TRAFFIC FLOW WITH ALB │
│ │
│ Users (Internet) │
│ │ │
│ │ HTTPS (port 443) │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Application │ ← Single entry point │
│ │ Load Balancer │ ← Terminates HTTPS (SSL certificate) │
│ │ (ALB) │ ← Routes based on URL path / host header │
│ └────────┬─────────┘ │
│ │ │
│ ┌─────┼──────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────┐ ┌──────┐ ┌────────────┐ │
│ │User │ │Order │ │Payment │ │
│ │Svc │ │Svc │ │Svc │ │
│ │(x3) │ │(x2) │ │(x2) │ │
│ └──────┘ └──────┘ └────────────┘ │
│ │
│ /api/users/* /api/orders/* /api/payments/* │
└─────────────────────────────────────────────────────────────────────┘
ALB vs NLB vs CLB
| Feature | ALB (Application) | NLB (Network) | CLB (Classic) |
|---|---|---|---|
| OSI Layer | Layer 7 (HTTP/HTTPS) | Layer 4 (TCP/UDP) | Layer 4+7 (legacy) |
| Routing | Path, host, header, query string | Port-based only | Basic round-robin |
| Protocol | HTTP, HTTPS, WebSocket, gRPC | TCP, UDP, TLS | HTTP, HTTPS, TCP, SSL |
| Performance | Millions of requests/sec | Millions of packets/sec, ultra-low latency | Limited |
| Use case | Web APIs, microservices | Game servers, IoT, extreme perf | Legacy (avoid for new) |
| ECS integration | Excellent — dynamic port mapping, path routing | Good — static port mapping | Deprecated for ECS |
Rule of thumb: Use ALB for HTTP/HTTPS workloads. Use NLB for raw TCP/UDP or when you need static IPs and extreme low latency.
2. ALB Components
Listeners
A listener checks for incoming connections on a specific port and protocol. You typically have two listeners:
Listener 1: Port 80 (HTTP) → Redirect to HTTPS
Listener 2: Port 443 (HTTPS) → Forward to target groups based on rules
Rules
Each listener has rules that match incoming requests and route them to target groups. Rules are evaluated in priority order.
Rule 1 (priority 1): IF path = /api/users/* THEN forward → user-service-tg
Rule 2 (priority 2): IF path = /api/orders/* THEN forward → order-service-tg
Rule 3 (priority 3): IF path = /api/payments/* THEN forward → payment-service-tg
Rule 4 (default): ELSE THEN return 404
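To make the priority ordering concrete, here is a minimal sketch of how rule evaluation could be modeled in plain Node.js. It is an illustration of the idea, not how the ALB is implemented: the `rules` table, the `route` function, and the `fixed-response-404` label are hypothetical names, and the wildcard handling assumes `*` matches any run of characters and `?` matches exactly one.

```javascript
// Convert an ALB-style path pattern into an anchored regular expression.
// Assumption: "*" matches any run of characters, "?" exactly one character.
function patternToRegExp(pattern) {
  // Escape regex metacharacters except the two wildcards.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  const source = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp(`^${source}$`);
}

// Hypothetical rule table mirroring the listing above (deliberately unsorted).
const rules = [
  { priority: 2, pattern: '/api/orders/*', target: 'order-service-tg' },
  { priority: 1, pattern: '/api/users/*', target: 'user-service-tg' },
  { priority: 3, pattern: '/api/payments/*', target: 'payment-service-tg' },
];

// Return the target of the lowest-numbered matching rule,
// or the default action label when nothing matches.
function route(path, ruleTable = rules) {
  const ordered = [...ruleTable].sort((a, b) => a.priority - b.priority);
  const hit = ordered.find((rule) => patternToRegExp(rule.pattern).test(path));
  return hit ? hit.target : 'fixed-response-404';
}

console.log(route('/api/users/123'));
console.log(route('/api/orders/456'));
console.log(route('/checkout'));
```

In this sketch `/api/users` (no trailing slash) falls through to the default action, because the pattern `/api/users/*` requires the slash. If you need both forms to reach the service, list both values in the rule's condition.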
Target Groups
A target group is a collection of targets (ECS tasks, EC2 instances, or Lambda functions) that receive traffic from the ALB. Each microservice typically has its own target group.
Target Group: user-service-tg
├── Task 10.0.1.15:3000 (healthy)
├── Task 10.0.2.23:3000 (healthy)
└── Task 10.0.1.42:3000 (healthy)
Target Group: order-service-tg
├── Task 10.0.1.88:3000 (healthy)
└── Task 10.0.2.55:3000 (healthy)
3. Path-Based Routing
Path-based routing lets a single ALB serve multiple microservices. Each path prefix maps to a different ECS service.
Architecture example
Single ALB: api.myapp.com
/api/users/* → user-service (ECS Service, 3 tasks)
/api/users/123 → user-service
/api/orders/* → order-service (ECS Service, 2 tasks)
/api/orders/456 → order-service
/api/payments/* → payment-service (ECS Service, 2 tasks)
/health → health-check (static 200 response)
/* → default-action (return 404)
Why path-based routing?
- Single domain — one DNS entry, one SSL certificate
- Centralized access point — easier to add WAF, logging, rate limiting
- Microservices behind one URL — clients don't need to know about individual services
- Independent scaling — each target group (service) scales independently
Host-based routing alternative
You can also route by hostname (subdomain):
users.api.myapp.com → user-service-tg
orders.api.myapp.com → order-service-tg
payments.api.myapp.com → payment-service-tg
Or combine both path and host routing for complex architectures.
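When a rule carries both a host condition and a path condition, every condition has to match for the rule to apply. The sketch below models that AND behavior in plain Node.js. The `combinedRules` table and the function names are hypothetical, and the host comparison is assumed to be case-insensitive while the path comparison is case-sensitive.

```javascript
// Wildcard match: "*" matches any run of characters, "?" exactly one.
function wildcardMatch(pattern, value, ignoreCase = false) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  const source = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp(`^${source}$`, ignoreCase ? 'i' : '').test(value);
}

// Hypothetical rules: a missing "host" or "path" means "no condition on it".
const combinedRules = [
  { priority: 1, host: 'api.myapp.com', path: '/api/users/*', target: 'user-service-tg' },
  { priority: 2, host: 'orders.api.myapp.com', target: 'order-service-tg' },
  { priority: 3, host: '*.api.myapp.com', path: '/api/payments/*', target: 'payment-service-tg' },
];

// A rule matches only when ALL of its conditions match (logical AND).
function ruleMatches(rule, request) {
  const hostOk = !rule.host || wildcardMatch(rule.host, request.host, true);
  const pathOk = !rule.path || wildcardMatch(rule.path, request.path);
  return hostOk && pathOk;
}

// Pick the first matching rule in priority order; null means "default action".
function pickTarget(request) {
  const ordered = [...combinedRules].sort((a, b) => a.priority - b.priority);
  const hit = ordered.find((rule) => ruleMatches(rule, request));
  return hit ? hit.target : null;
}

console.log(pickTarget({ host: 'API.MyApp.com', path: '/api/users/42' }));
console.log(pickTarget({ host: 'api.myapp.com', path: '/api/orders/1' }));
```

The second call returns `null`: the host matches rule 1 but the path does not, and no other rule covers the bare `api.myapp.com` host.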
4. Setting Up ALB — Complete Walkthrough
Step 1: Create the ALB
# Create the ALB in public subnets
ALB_ARN=$(aws elbv2 create-load-balancer \
--name my-app-alb \
--subnets subnet-public-1a subnet-public-1b \
--security-groups sg-alb-public \
--scheme internet-facing \
--type application \
--ip-address-type ipv4 \
--query 'LoadBalancers[0].LoadBalancerArn' \
--output text \
--region us-east-1)
echo "ALB ARN: $ALB_ARN"
Step 2: Create target groups
# Target group for user-service
USER_TG_ARN=$(aws elbv2 create-target-group \
--name user-service-tg \
--protocol HTTP \
--port 3000 \
--vpc-id vpc-0a1b2c3d \
--target-type ip \
--health-check-protocol HTTP \
--health-check-path /health \
--health-check-interval-seconds 30 \
--health-check-timeout-seconds 5 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3 \
--query 'TargetGroups[0].TargetGroupArn' \
--output text \
--region us-east-1)
# Target group for order-service
ORDER_TG_ARN=$(aws elbv2 create-target-group \
--name order-service-tg \
--protocol HTTP \
--port 3000 \
--vpc-id vpc-0a1b2c3d \
--target-type ip \
--health-check-protocol HTTP \
--health-check-path /health \
--health-check-interval-seconds 30 \
--health-check-timeout-seconds 5 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3 \
--query 'TargetGroups[0].TargetGroupArn' \
--output text \
--region us-east-1)
Important: For Fargate, --target-type must be ip (not instance), because Fargate tasks are identified by their private IP, not by EC2 instance IDs.
Step 3: Create HTTPS listener
# Create HTTPS listener (port 443)
aws elbv2 create-listener \
--load-balancer-arn "$ALB_ARN" \
--protocol HTTPS \
--port 443 \
--certificates CertificateArn=arn:aws:acm:us-east-1:123456789012:certificate/abc-123 \
--default-actions '[{
"Type": "fixed-response",
"FixedResponseConfig": {
"StatusCode": "404",
"ContentType": "application/json",
"MessageBody": "{\"error\": \"Not Found\"}"
}
}]' \
--region us-east-1
Step 4: Create HTTP-to-HTTPS redirect
# Create HTTP listener that redirects to HTTPS
aws elbv2 create-listener \
--load-balancer-arn "$ALB_ARN" \
--protocol HTTP \
--port 80 \
--default-actions '[{
"Type": "redirect",
"RedirectConfig": {
"Protocol": "HTTPS",
"Port": "443",
"StatusCode": "HTTP_301"
}
}]' \
--region us-east-1
Step 5: Add path-based routing rules
# Get the HTTPS listener ARN
LISTENER_ARN=$(aws elbv2 describe-listeners \
--load-balancer-arn "$ALB_ARN" \
--query 'Listeners[?Port==`443`].ListenerArn' \
--output text \
--region us-east-1)
# Rule: /api/users/* → user-service-tg
aws elbv2 create-rule \
--listener-arn "$LISTENER_ARN" \
--priority 1 \
--conditions '[{
"Field": "path-pattern",
"PathPatternConfig": { "Values": ["/api/users/*"] }
}]' \
--actions "[{
\"Type\": \"forward\",
\"TargetGroupArn\": \"$USER_TG_ARN\"
}]" \
--region us-east-1
# Rule: /api/orders/* → order-service-tg
aws elbv2 create-rule \
--listener-arn "$LISTENER_ARN" \
--priority 2 \
--conditions '[{
"Field": "path-pattern",
"PathPatternConfig": { "Values": ["/api/orders/*"] }
}]' \
--actions "[{
\"Type\": \"forward\",
\"TargetGroupArn\": \"$ORDER_TG_ARN\"
}]" \
--region us-east-1
5. Health Checks
Health checks are critical — they determine which tasks receive traffic. If a task fails its health check, the ALB stops sending traffic to it, and ECS may replace it.
Health check configuration
┌───────────────────────────────────────────────────────────┐
│ ALB HEALTH CHECK FLOW │
│ │
│ ALB sends: GET /health HTTP/1.1 │
│ Every: 30 seconds (interval) │
│ Timeout: 5 seconds (must respond within) │
│ │
│ To become HEALTHY: │
│ 2 consecutive 200 responses (healthy threshold) │
│ │
│ To become UNHEALTHY: │
│ 3 consecutive failures (unhealthy threshold) │
│ │
│ Matcher: HTTP status code 200 │
└───────────────────────────────────────────────────────────┘
Health check parameters explained
| Parameter | Default | Recommended | Explanation |
|---|---|---|---|
| Path | / | /health | Endpoint ALB hits to check health |
| Protocol | HTTP | HTTP | Protocol for health check (HTTP, not HTTPS — traffic is internal) |
| Port | traffic-port | traffic-port | Use the same port the app listens on |
| Interval | 30s | 15-30s | Time between health checks |
| Timeout | 5s | 5s | Max time to wait for a response |
| Healthy threshold | 5 | 2 | Number of consecutive successes to mark healthy |
| Unhealthy threshold | 2 | 3 | Number of consecutive failures to mark unhealthy |
| Success codes | 200 | 200 | HTTP status codes that count as healthy |
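The threshold behavior in the table can be sketched as a small state machine. This is a simplified model for reasoning about the settings, not the ALB's actual implementation: the `TargetHealth` class and `secondsToDetectFailure` helper are hypothetical, a `null` status code stands in for a timeout, and the detection-time figure ignores timeouts and check jitter.

```javascript
// Track one target's health using consecutive-result thresholds.
class TargetHealth {
  constructor({ healthyThreshold = 2, unhealthyThreshold = 3 } = {}) {
    this.healthyThreshold = healthyThreshold;
    this.unhealthyThreshold = unhealthyThreshold;
    this.state = 'initial'; // a newly registered target has no verdict yet
    this.streak = 0;        // consecutive results of the same kind
    this.lastOk = null;
  }

  // Record one check result; statusCode === null models a timeout.
  record(statusCode) {
    const ok = statusCode === 200; // matcher: only 200 counts as success
    this.streak = ok === this.lastOk ? this.streak + 1 : 1;
    this.lastOk = ok;
    if (ok && this.streak >= this.healthyThreshold) this.state = 'healthy';
    if (!ok && this.streak >= this.unhealthyThreshold) this.state = 'unhealthy';
    return this.state;
  }
}

// Rough time for a healthy target to be marked unhealthy after it breaks.
function secondsToDetectFailure(intervalSeconds, unhealthyThreshold) {
  return intervalSeconds * unhealthyThreshold;
}

const target = new TargetHealth();
const results = [200, 200, 503, null, 503, 200, 200];
console.log(results.map((code) => target.record(code)));
console.log(secondsToDetectFailure(30, 3)); // 90 with the recommended settings
```

With a 30-second interval and an unhealthy threshold of 3, a broken task can keep receiving traffic for roughly a minute and a half, which is the main reason to consider a shorter interval for latency-sensitive services.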
Node.js health check endpoint
// Express.js health check endpoint
const express = require('express');
const app = express();
app.get('/health', (req, res) => {
// Basic health check — app is running
res.status(200).json({ status: 'healthy', timestamp: new Date().toISOString() });
});
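// Advanced health check: verify dependencies.
// Register this INSTEAD of the basic handler above, not alongside it:
// Express answers from the first matching handler, so a second
// app.get('/health') would never run.
// Assumes `db` and `redis` are already-connected clients.
app.get('/health', async (req, res) => {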
try {
// Check database connection
await db.query('SELECT 1');
// Check Redis connection
await redis.ping();
res.status(200).json({
status: 'healthy',
dependencies: {
database: 'connected',
cache: 'connected'
},
uptime: process.uptime(),
timestamp: new Date().toISOString()
});
} catch (error) {
res.status(503).json({
status: 'unhealthy',
error: error.message,
timestamp: new Date().toISOString()
});
}
});
Design consideration: Should your health check include dependency checks? A failing database shouldn't necessarily make every container "unhealthy". If /health queries the database, a database outage fails the check on ALL tasks at once and ECS starts replacing them, turning a dependency problem into a total outage. Consider a separate /ready endpoint for deep checks and keep /health lightweight.
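One way to separate the two concerns is sketched below. It uses Node's built-in `http` module rather than Express so the example is self-contained. The `liveness`, `readiness`, and `createHealthServer` names are hypothetical, and `probes` is an assumed map of dependency names to async functions that throw on failure.

```javascript
const http = require('node:http');

// Liveness: answers without touching any dependency, so a database
// outage cannot make the load balancer's check fail.
function liveness() {
  return { statusCode: 200, body: { status: 'healthy', uptime: process.uptime() } };
}

// Readiness: runs every dependency probe and reports which ones failed.
async function readiness(probes) {
  const dependencies = {};
  let allOk = true;
  for (const [name, probe] of Object.entries(probes)) {
    try {
      await probe();
      dependencies[name] = 'connected';
    } catch (error) {
      dependencies[name] = `failed: ${error.message}`;
      allOk = false;
    }
  }
  return {
    statusCode: allOk ? 200 : 503,
    body: { status: allOk ? 'ready' : 'degraded', dependencies },
  };
}

// Wire both endpoints into a plain HTTP server (created, not started).
function createHealthServer(probes) {
  return http.createServer(async (req, res) => {
    let result = { statusCode: 404, body: { error: 'Not Found' } };
    if (req.url === '/health') result = liveness();
    else if (req.url === '/ready') result = await readiness(probes);
    res.writeHead(result.statusCode, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(result.body));
  });
}
```

A typical wiring, assuming `db` and `redis` clients like those in the example above, would be `createHealthServer({ database: () => db.query('SELECT 1'), cache: () => redis.ping() }).listen(3000)`. The target group keeps pointing at /health, while /ready is for dashboards, alerts, or deployment gates.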
6. HTTPS Termination at ALB
HTTPS termination means the ALB handles SSL/TLS encryption and decryption. Traffic from the ALB to your ECS tasks is plain HTTP over the private network.
Client → [HTTPS/443] → ALB → [HTTP/3000] → ECS Task
Why this is good:
1. Your app doesn't manage SSL certificates
2. ACM handles certificate renewal for the ALB
3. Reduced CPU load on your containers
4. Easier to manage — one certificate for the whole ALB
Setting up HTTPS with ACM
# Step 1: Request a certificate (free with ACM)
CERT_ARN=$(aws acm request-certificate \
--domain-name api.myapp.com \
--subject-alternative-names "*.myapp.com" \
--validation-method DNS \
--query 'CertificateArn' \
--output text \
--region us-east-1)
# Step 2: Validate the certificate (add DNS record — ACM shows you the record)
aws acm describe-certificate \
--certificate-arn "$CERT_ARN" \
--query 'Certificate.DomainValidationOptions' \
--region us-east-1
# Step 3: Attach to the ALB listener (see "Step 3: Create HTTPS listener" in section 4)
# Pass "$CERT_ARN" as CertificateArn in the --certificates flag of create-listener
Security policy
The ALB supports configurable TLS policies. Use the latest policy to disable outdated protocols:
# Use ELBSecurityPolicy-TLS13-1-2-2021-06 or newer
# This enforces TLS 1.2+ and disables weak ciphers
7. Connection Draining (Deregistration Delay)
When a task is being removed (scaling in, deployment), the ALB doesn't cut connections immediately. Connection draining allows in-flight requests to complete.
Task is marked for removal:
1. ALB stops sending NEW requests to the task
2. Existing connections continue for up to [deregistration delay]
3. After delay (or all connections close), task is deregistered
4. ECS stops the task
Default deregistration delay: 300 seconds (5 minutes)
Configuring deregistration delay
# Set deregistration delay to 30 seconds (for fast deployments)
aws elbv2 modify-target-group-attributes \
--target-group-arn "$USER_TG_ARN" \
--attributes Key=deregistration_delay.timeout_seconds,Value=30
Tuning guidelines:
- Short-lived requests (REST APIs): 30-60 seconds
- Long-lived connections (WebSocket): 300+ seconds
- Batch processing: Match to max expected request duration
8. ALB + ECS Integration
When you create an ECS service with a load balancer, the integration is automatic:
- ECS launches a new task → registers its IP with the target group
- ALB starts health-checking the new task
- Task passes health checks → ALB sends traffic to it
- Task stops or fails health checks → ECS deregisters it from the target group
- ECS replaces failed tasks → cycle repeats
How it looks in the ECS service definition
aws ecs create-service \
--cluster production \
--service-name user-service \
--task-definition user-service:3 \
--desired-count 3 \
--launch-type FARGATE \
--load-balancers '[
{
"targetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/user-svc-tg/abc123",
"containerName": "user-service",
"containerPort": 3000
}
]' \
--network-configuration '{
"awsvpcConfiguration": {
"subnets": ["subnet-private-1a", "subnet-private-1b"],
"securityGroups": ["sg-ecs-tasks"],
"assignPublicIp": "DISABLED"
}
}'
Security group chain
Internet → ALB (sg-alb)
Inbound: ports 80 and 443 from 0.0.0.0/0 (port 80 only serves the HTTPS redirect)
Outbound: port 3000 to sg-ecs-tasks
ALB → ECS Tasks (sg-ecs-tasks)
Inbound: port 3000 from sg-alb ← ONLY the ALB can reach tasks
Outbound: port 443 to 0.0.0.0/0 (for calling external APIs)
This ensures your ECS tasks are not directly accessible from the internet — all traffic must flow through the ALB.
9. ALB Access Logs
ALB can log every request to S3 for analysis and debugging:
# Enable access logging
aws elbv2 modify-load-balancer-attributes \
--load-balancer-arn "$ALB_ARN" \
--attributes \
Key=access_logs.s3.enabled,Value=true \
Key=access_logs.s3.bucket,Value=my-alb-logs-bucket \
Key=access_logs.s3.prefix,Value=alb/my-app
Log entries include: timestamp, client IP, request URL, status code, response time, target IP, bytes sent/received.
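As a rough illustration of working with these entries, the sketch below parses the leading fields of one access log line in Node.js. The field order reflects my understanding of the documented format and should be checked against the current AWS documentation. The sample line is made up for this example, and `tokenize`, `parseAccessLogLine`, and the camelCase field names are hypothetical.

```javascript
// Split a log line on spaces while keeping double-quoted fields together.
function tokenize(line) {
  const tokens = [];
  const pattern = /"((?:[^"\\]|\\.)*)"|(\S+)/g;
  let match;
  while ((match = pattern.exec(line)) !== null) {
    tokens.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return tokens;
}

// Leading fields only (assumed order); later fields are ignored here.
const FIELDS = [
  'type', 'time', 'elb', 'client', 'target',
  'requestProcessingTime', 'targetProcessingTime', 'responseProcessingTime',
  'elbStatusCode', 'targetStatusCode', 'receivedBytes', 'sentBytes',
  'request', 'userAgent',
];

function parseAccessLogLine(line) {
  const tokens = tokenize(line);
  const entry = {};
  FIELDS.forEach((name, index) => { entry[name] = tokens[index]; });
  // The request field looks like: METHOD URL PROTOCOL
  const [method, url, protocol] = (entry.request || '').split(' ');
  return {
    ...entry,
    elbStatusCode: Number(entry.elbStatusCode),
    targetProcessingTime: Number(entry.targetProcessingTime),
    method,
    url,
    protocol,
  };
}

// Made-up sample line for illustration only.
const sample = 'https 2024-01-15T10:30:00.123456Z app/my-app-alb/50dc6c495c0c9188 ' +
  '203.0.113.10:54321 10.0.1.15:3000 0.001 0.042 0.000 200 200 312 1024 ' +
  '"GET https://api.myapp.com:443/api/users/123 HTTP/1.1" "curl/8.4.0" ' +
  'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2';

console.log(parseAccessLogLine(sample));
```

For anything beyond ad hoc debugging, querying the logs in place (for example with Athena) is usually more practical than parsing them by hand.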
10. Sticky Sessions
By default, the ALB spreads requests across healthy targets using round robin. Sticky sessions route a user's requests to the same target for a configured duration.
# Enable application-based stickiness
aws elbv2 modify-target-group-attributes \
--target-group-arn "$USER_TG_ARN" \
--attributes \
Key=stickiness.enabled,Value=true \
Key=stickiness.type,Value=app_cookie \
Key=stickiness.app_cookie.cookie_name,Value=SESSIONID \
Key=stickiness.app_cookie.duration_seconds,Value=3600
Best practice: Avoid sticky sessions if possible. They create uneven load distribution. Use shared session stores (Redis, DynamoDB) instead, so any task can handle any request.
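The uneven-load effect is easy to see in a toy simulation. The sketch below is a simplified model, not the ALB's algorithm: the `StickyRoundRobin` class is hypothetical, session IDs stand in for the stickiness cookie, and cookie expiry is ignored.

```javascript
// Round robin with optional stickiness keyed by a session id.
class StickyRoundRobin {
  constructor(targets) {
    this.targets = targets;
    this.next = 0;
    this.assignments = new Map(); // session id -> pinned target
  }

  // Pass null as sessionId to model a request without a stickiness cookie.
  pick(sessionId) {
    if (sessionId && this.assignments.has(sessionId)) {
      return this.assignments.get(sessionId);
    }
    const target = this.targets[this.next];
    this.next = (this.next + 1) % this.targets.length;
    if (sessionId) this.assignments.set(sessionId, target);
    return target;
  }
}

// Count how many requests each target receives.
function simulate(balancer, sessionIds) {
  const counts = {};
  for (const sessionId of sessionIds) {
    const target = balancer.pick(sessionId);
    counts[target] = (counts[target] || 0) + 1;
  }
  return counts;
}

const targets = ['10.0.1.15', '10.0.2.23', '10.0.1.42'];
// One busy session sends 8 of the 10 requests.
const traffic = ['heavy', 'u2', 'u3', 'heavy', 'heavy', 'heavy', 'heavy', 'heavy', 'heavy', 'heavy'];

const sticky = simulate(new StickyRoundRobin(targets), traffic);
const plain = simulate(new StickyRoundRobin(targets), traffic.map(() => null));
console.log('sticky:', sticky);
console.log('round robin:', plain);
```

With stickiness, the busy session pins 8 of 10 requests to one task. Without it, the same traffic splits 4/3/3. Moving session state into a shared store removes the need for pinning in the first place.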
11. Key Takeaways
- ALB is a Layer 7 load balancer — it understands HTTP and can route based on path, host, headers, and query strings.
- Path-based routing lets one ALB serve multiple microservices (/api/users/*, /api/orders/*).
- Target groups group ECS tasks that serve the same service; each gets independent health checks.
- Health checks are critical — misconfigured health checks cause tasks to be drained or never receive traffic.
- HTTPS termination at the ALB — use ACM for free, auto-renewing SSL certificates. Backend traffic is HTTP.
- Connection draining lets in-flight requests complete before a task is stopped.
- Security group chain — internet talks to ALB, ALB talks to ECS tasks, tasks are NOT directly accessible.
- Target type must be ip for Fargate, not instance.
Explain-It Challenge
- Your service returns 502 Bad Gateway errors after a deployment. The old version worked fine. Walk through how the ALB, target group, and health checks could cause this.
- Explain to a product manager why you need an ALB instead of just exposing each microservice directly to the internet.
- A developer added a health check that queries the database. During a database outage, ALL tasks are marked unhealthy and the entire service goes down. How would you redesign the health check strategy?