6.9 -- Final Production Deployment: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

Skim before labs or interviews.
Drill gaps -- reopen README.md --> 6.9.a...6.9.c.
Practice -- 6.9-Exercise-Questions.md.
Polish answers -- 6.9-Interview-Questions.md.

Core Vocabulary

Term	One-liner
Multi-stage build	Dockerfile with separate build and runtime stages -- smaller, more secure images
Non-root user	Container runs as a limited user, not root -- limits blast radius
`.dockerignore`	Excludes files from Docker build context (like `.gitignore` for Docker)
Layer caching	Docker caches unchanged layers; order Dockerfile commands by change frequency
Named volume	Docker-managed persistent storage that survives container restarts
Bridge network	Default Docker network driver; on user-defined bridges (including Compose networks), containers reach each other by service name
Health check	Probe that tells Docker/orchestrator whether a container is actually healthy
PM2	Node.js production process manager -- clustering, auto-restart, zero-downtime reload
Elastic IP	Static public IP for EC2 that persists across instance stop/start
Route 53	AWS DNS service for managing domains and records
A record	DNS record pointing a domain to an IPv4 address
CNAME	DNS record pointing a domain to another domain name
ACM	AWS Certificate Manager -- free SSL certs for ALB/CloudFront, auto-renewing
Let's Encrypt	Free SSL certificate authority -- use Certbot for automation
SSL termination	Decrypting HTTPS at a specific layer (ALB or Nginx)
Reverse proxy	Nginx sits between internet and Node.js -- handles SSL, compression, static files
CI	Continuous Integration -- build + test on every push
CD (Delivery)	Every passing build is deploy-ready; human clicks deploy
CD (Deployment)	Every passing build auto-deploys to production
GitHub Actions	CI/CD platform built into GitHub -- YAML workflow files
Blue/Green	Two identical environments; instant switch from old to new
Canary	Route small % of traffic to new version, gradually increase
Circuit breaker	ECS deployment setting -- marks a deployment failed when new tasks keep failing, and rolls back automatically if rollback is enabled

Dockerfile Best Practices

# 1. Use Alpine base
FROM node:20-alpine AS builder

# 2. Set working directory
WORKDIR /app

# 3. Copy package files FIRST (layer caching)
COPY package.json package-lock.json ./

# 4. Use npm ci (deterministic)
RUN npm ci

# 5. Copy source code AFTER deps
COPY . .

# 6. Build
RUN npm run build

# 7. Second stage: production only
FROM node:20-alpine AS production
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev && npm cache clean --force
COPY --from=builder /app/dist ./dist

# 8. Non-root user
RUN addgroup -S app && adduser -S app -G app
USER app

# 9. Health check
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD wget --spider http://localhost:3000/health || exit 1

# 10. Start
EXPOSE 3000
CMD ["node", "dist/index.js"]

Image size impact:

node:20          --> ~1 GB base
node:20-alpine   --> ~130 MB base
+ multi-stage    --> ~100-200 MB final image
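
Because step 5 runs `COPY . .`, everything in the build context is sent to Docker unless `.dockerignore` excludes it. A minimal sketch that writes one from the shell; the entries are assumptions about a typical Node.js project, not a list taken from this course, so adjust them to your repository.

```shell
# Write a starter .dockerignore next to the Dockerfile.
# Entries are illustrative assumptions for a typical Node.js repo.
cat > .dockerignore <<'EOF'
node_modules
dist
.git
.env
*.log
coverage
EOF
```

Keeping `node_modules` and `.env` out of the context means host-built modules and local secrets never end up in an image layer.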

Docker Compose Template (Production)

version: "3.9"
services:
  app:
    image: ECR_REGISTRY/my-app:${IMAGE_TAG}
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "0.50"
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
    networks:
      - backend
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  postgres:
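    image: postgres:16-alpine
    environment:
      POSTGRES_USER: appuser                    # same user the health check probes
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}   # required by the image; set in the host environment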
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser"]
      interval: 10s
      retries: 5
    networks:
      - backend

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes --maxmemory 128mb
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      retries: 3
    networks:
      - backend

volumes:
  pg_data:
  redis_data:

networks:
  backend:
    driver: bridge

EC2 Setup Checklist

[ ] Choose instance type (t3.medium for most Node.js APIs)
[ ] Launch with key pair (SSH access)
[ ] Security group: ports 22, 80, 443 only (limit 22 to your own IP range)
[ ] Attach Elastic IP (static address)
[ ] SSH in and install:
    [ ] Node.js via nvm
    [ ] PM2 globally (npm i -g pm2)
    [ ] Nginx
    [ ] Certbot (Let's Encrypt)
[ ] Clone/deploy application
[ ] Start with PM2 cluster mode
[ ] Configure PM2 startup (pm2 startup, run the sudo command it prints, then pm2 save)
[ ] Configure Nginx reverse proxy
[ ] Set up SSL with Certbot
[ ] Verify: curl https://your-domain.com/health
[ ] Verify auto-renewal: certbot renew --dry-run
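
The verify step is more reliable as a short retry loop, since PM2 workers and Nginx can take a few seconds to come up after a deploy. A minimal sketch: `wait_until`, `RETRIES`, and `DELAY` are hypothetical names chosen for this example, and the probe is whatever command you pass in.

```shell
# wait_until CMD...: run CMD until it succeeds, up to RETRIES times, DELAY seconds apart.
# Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
  wu_retries="${RETRIES:-10}"
  wu_delay="${DELAY:-3}"
  wu_i=1
  while [ "$wu_i" -le "$wu_retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$wu_delay"
    wu_i=$((wu_i + 1))
  done
  return 1
}

# Example (not run here):
# wait_until curl -fsS https://your-domain.com/health
```

Passing the probe as arguments keeps the loop reusable for `pg_isready`, `redis-cli ping`, or any other command that signals health through its exit code.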

SSL / Domain Steps

1. DOMAIN SETUP
   Register domain (Route 53 or other registrar)
   Create hosted zone in Route 53
   Create A record: api.example.com --> EC2 Elastic IP
   OR Alias record: api.example.com --> ALB DNS name

2. SSL CERTIFICATE
   Option A (ALB):
     Request certificate in ACM
     Validate via DNS (add CNAME)
     Attach to ALB HTTPS listener (port 443)

   Option B (Server):
     Install Certbot
     Run: certbot --nginx -d api.example.com
     Auto-configures Nginx + auto-renewal

3. HTTPS REDIRECT
   Nginx: return 301 https://$host$request_uri;
   ALB: HTTP listener redirects to HTTPS

4. VERIFY
   curl https://api.example.com/health
   Check cert expiry: echo | openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates

Nginx Reverse Proxy Cheat Sheet

# HTTP --> HTTPS redirect
server {
    listen 80;
    server_name api.example.com;
    return 301 https://$host$request_uri;
}

# HTTPS server
server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    # Security headers
    add_header Strict-Transport-Security "max-age=63072000" always;
    add_header X-Frame-Options DENY always;
    add_header X-Content-Type-Options nosniff always;

    # Proxy to Node.js
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

CI/CD Pipeline Stages

+----------+     +--------+     +--------+     +--------+     +--------+
|   Lint   | --> |  Test  | --> | Build  | --> |  Scan  | --> | Deploy |
| ESLint   |     | Jest   |     | Docker |     | Trivy  |     | ECS /  |
| Prettier |     | Integ  |     | Image  |     |        |     | EC2    |
+----------+     +--------+     +--------+     +--------+     +--------+

Triggers:
  PR opened   --> Lint + Test only (no deploy)
  Push to main --> Full pipeline (Lint + Test + Build + Scan + Deploy)
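
The trigger rules above amount to a small decision on the event type and branch. A sketch of that decision in shell, using the `GITHUB_EVENT_NAME` and `GITHUB_REF` variables that GitHub Actions sets for every run; the `stages_for` function and the lint-and-test fallback for other events are assumptions for illustration.

```shell
# stages_for EVENT REF: print the pipeline stages that should run.
stages_for() {
  if [ "$1" = "push" ] && [ "$2" = "refs/heads/main" ]; then
    echo "lint test build scan deploy"   # full pipeline on main
  else
    echo "lint test"                     # PRs (and, by assumption, everything else)
  fi
}

# Inside a workflow step the inputs come from the runner's environment:
# stages_for "$GITHUB_EVENT_NAME" "$GITHUB_REF"
```

In a real workflow the same split is usually expressed declaratively with `on:` triggers and job-level `if:` conditions; the function only makes the rule explicit.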

GitHub Actions key patterns:

# Cache npm to speed up CI
- uses: actions/setup-node@v4
  with:
    cache: "npm"

# Service containers for integration tests
services:
  postgres:
    image: postgres:16-alpine

# Only run deploy on main branch
if: github.ref == 'refs/heads/main'

# Require approval for production
environment: production

# Access secrets
env:
  DB_URL: ${{ secrets.DATABASE_URL }}

Deployment Strategies Comparison

Strategy	How It Works	Rollback Speed	Risk	Cost	Complexity
Recreate	Stop old, start new	Minutes (downtime!)	High	1x	Very Low
Rolling update	Replace instances gradually	Minutes	Medium	1x	Low
Blue/Green	Run both, switch traffic	Seconds	Low	2x during deploy	Medium
Canary	5% traffic to new, then grow	Seconds	Very Low	~1x	High

When to use what:

Dev/Testing        --> Recreate (simplest)
Standard production --> Rolling update (ECS default)
Critical services   --> Blue/Green (instant rollback)
High-traffic APIs   --> Canary (limit blast radius)
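
The canary row can be read as a loop: shift a little traffic, check health, then either continue or back out. A minimal sketch in which the step sizes after 5% and the `canary_rollout` name are assumptions, and the health check is any command you pass in. The actual traffic shift is left as an `echo` because it depends on your load balancer.

```shell
# canary_rollout CHECK_CMD...: walk through traffic steps, stop at the first failed check.
canary_rollout() {
  for pct in 5 25 50 100; do
    echo "shift ${pct}% of traffic to new version"
    if ! "$@"; then
      echo "check failed at ${pct}%: roll back to 0%"
      return 1
    fi
  done
  echo "rollout complete"
}

# Example (not run here):
# canary_rollout curl -fsS https://api.example.com/health
```

Because a failure at the first step stops the loop, only the initial 5% of traffic has reached the new version by then.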

Docker Image Tagging Strategy

DO:
  my-api:abc1234       (git SHA -- unique, traceable)
  my-api:v1.2.3        (semver -- human-readable releases)

DON'T:
  my-api:latest        (mutable -- which version is running?)
  my-api:production    (overwritten on every deploy -- no rollback)
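
A git-SHA tag can be derived mechanically in CI. A sketch assuming a 7-character short SHA like the `abc1234` example above; `image_ref` is a hypothetical helper, and `ECR` stands in for your registry host as elsewhere in this sheet.

```shell
# image_ref REGISTRY NAME FULL_SHA: print REGISTRY/NAME:<first 7 characters of the SHA>
image_ref() {
  short_sha=$(printf '%s' "$3" | cut -c1-7)
  printf '%s/%s:%s\n' "$1" "$2" "$short_sha"
}

# In GitHub Actions the full commit SHA is available as $GITHUB_SHA:
# docker build -t "$(image_ref ECR my-api "$GITHUB_SHA")" .
```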

Secrets Management

DEVELOPMENT:
  .env file (in .gitignore)

CI/CD:
  GitHub Settings > Secrets > Actions
  Reference: ${{ secrets.MY_SECRET }}
  Scopes: organization, repository, environment (on a name clash the narrowest scope wins)

PRODUCTION:
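  ECS task definition `secrets` field (injects values from Parameter Store or Secrets Manager; plain `environment` entries are stored in clear text)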
  AWS Systems Manager Parameter Store
  AWS Secrets Manager (for rotation)
  Docker Secrets (Swarm mode)

NEVER:
  Hardcode in Dockerfile
  Commit .env to git
  Echo secrets in CI logs
  Use the same secrets across environments
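
The "never commit .env" rule is easy to enforce with a check that fails fast. A rough sketch: `env_is_ignored` is a hypothetical helper that only looks for an exact `.env` line in a gitignore file, which is an approximation of git's real matching rules.

```shell
# env_is_ignored [FILE]: succeed only if FILE (default .gitignore) has a line that is exactly ".env"
env_is_ignored() {
  gitignore="${1:-.gitignore}"
  [ -f "$gitignore" ] && grep -qxF '.env' "$gitignore"
}

# Example guard for a pre-commit hook or CI step (not run here):
# env_is_ignored || { echo ".env is not ignored" >&2; exit 1; }
```

Inside a git repository, `git check-ignore -q .env` asks git itself and also handles patterns such as `.env*`.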

Common Gotchas

Gotcha	Why It Hurts	Fix
Deploying `:latest` tag	Cannot trace which version is running	Use git SHA tags
`npm install` in CI	Non-deterministic; may install different versions	Use `npm ci`
Running as root in Docker	Compromised container = full access	Create a non-root user and add `USER app` in Dockerfile
No health checks	Orchestrator thinks broken container is healthy	Add `HEALTHCHECK`
`COPY . .` before `npm ci`	Busts cache on every code change	Copy package files first
SSL cert expiry	Site goes down with zero warning	ACM (auto-renew) or Certbot timer
Exposing DB ports in production	Database reachable from internet	Remove `ports:` in prod Compose
No circuit breaker	Bad deploy stays up, serving errors	Enable ECS circuit breaker
Same secrets in staging and prod	Staging bug can corrupt production data	Separate secrets per environment
devDependencies in the runtime stage	Larger image and attack surface	`npm ci --omit=dev` (replaces the deprecated `--only=production`)
Forgetting HTTP-to-HTTPS redirect	Users on HTTP see unencrypted site	Nginx `return 301` or ALB redirect
No resource limits	One container OOMs the host	Set `memory` and `cpus` limits

PM2 Quick Reference

pm2 start ecosystem.config.js   # Start app
pm2 list                         # View all processes
pm2 monit                        # Live dashboard
pm2 logs my-api                  # View logs
pm2 reload my-api                # Zero-downtime reload (cluster mode)
pm2 restart my-api               # Restart (brief downtime)
pm2 stop my-api                  # Stop
pm2 delete my-api                # Remove from PM2
pm2 startup systemd              # Auto-start on boot
pm2 save                         # Save current process list
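
`pm2 start ecosystem.config.js` expects a config file that this sheet does not show. A minimal sketch written from the shell: the app name `my-api` and the `dist/index.js` entry point follow the examples above, while the `instances` and `max_memory_restart` values are assumptions to tune for your instance size.

```shell
# Write a minimal PM2 ecosystem file for cluster mode.
cat > ecosystem.config.js <<'EOF'
module.exports = {
  apps: [
    {
      name: "my-api",
      script: "dist/index.js",
      exec_mode: "cluster",        // needed for zero-downtime pm2 reload
      instances: "max",            // one worker per CPU core (assumed choice)
      max_memory_restart: "300M",  // assumed limit; restarts a worker above it
      env: { NODE_ENV: "production" }
    }
  ]
};
EOF
```

With this file in place, `pm2 reload my-api` replaces workers one at a time instead of stopping them all at once.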

One-Command References

# Build and push Docker image
docker build -t my-api:abc1234 -t ECR/my-api:abc1234 . && docker push ECR/my-api:abc1234

# Scan for vulnerabilities
trivy image my-api:abc1234

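# Redeploy the service's current task definition on ECS
# (pulls a new image only for a mutable tag; with SHA tags, register a new revision and pass --task-definition)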
aws ecs update-service --cluster prod --service my-api --force-new-deployment

# Rollback ECS to previous version
aws ecs update-service --cluster prod --service my-api --task-definition my-api:41

# Check SSL certificate expiry
echo | openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates

# Test Nginx config, then reload only if the test passes
sudo nginx -t && sudo systemctl reload nginx

# Certbot renewal dry run
sudo certbot renew --dry-run

End of 6.9 quick revision.