
6.9 -- Final Production Deployment: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim before labs or interviews.
  2. Drill gaps -- reopen README.md --> 6.9.a...6.9.c.
  3. Practice -- 6.9-Exercise-Questions.md.
  4. Polish answers -- 6.9-Interview-Questions.md.

Core Vocabulary

Term              | One-liner
Multi-stage build | Dockerfile with separate build and runtime stages -- smaller, more secure images
Non-root user     | Container runs as a limited user, not root -- limits blast radius
.dockerignore     | Excludes files from the Docker build context (like .gitignore for Docker)
Layer caching     | Docker reuses unchanged layers; order Dockerfile commands from least to most frequently changing
Named volume      | Docker-managed persistent storage that survives container restarts and re-creation
Bridge network    | Docker's default network driver; on user-defined bridge networks (such as those Compose creates), containers reach each other by service name
Health check      | Probe that tells Docker/orchestrator whether a container is actually healthy
PM2               | Node.js production process manager -- clustering, auto-restart, zero-downtime reload
Elastic IP        | Static public IPv4 address for EC2 that persists across instance stop/start
Route 53          | AWS DNS service for managing domains and records
A record          | DNS record pointing a domain to an IPv4 address
CNAME             | DNS record pointing a domain to another domain name
ACM               | AWS Certificate Manager -- free public SSL/TLS certs for ALB/CloudFront, auto-renewing
Let's Encrypt     | Free SSL certificate authority -- use Certbot for automation
SSL termination   | Decrypting HTTPS at a specific layer (ALB or Nginx)
Reverse proxy     | Nginx sits between the internet and Node.js -- handles SSL, compression, static files
CI                | Continuous Integration -- build + test on every push
CD (Delivery)     | Every passing build is deploy-ready; a human clicks deploy
CD (Deployment)   | Every passing build auto-deploys to production
GitHub Actions    | CI/CD platform built into GitHub -- YAML workflow files
Blue/Green        | Two identical environments; instant switch from old to new
Canary            | Route a small % of traffic to the new version, then gradually increase it
Circuit breaker   | ECS deployment setting: stops a rollout whose new tasks keep failing to start or pass health checks and, with rollback enabled, reverts to the last completed deployment

Dockerfile Best Practices

# 1. Use Alpine base
FROM node:20-alpine AS builder

# 2. Set working directory
WORKDIR /app

# 3. Copy package files FIRST (layer caching)
COPY package.json package-lock.json ./

# 4. Use npm ci (deterministic)
RUN npm ci

# 5. Copy source code AFTER deps
COPY . .

# 6. Build
RUN npm run build

# 7. Second stage: production only
FROM node:20-alpine AS production
ENV NODE_ENV=production
WORKDIR /app
COPY package.json package-lock.json ./
# --omit=dev is the current form of the deprecated --only=production flag
RUN npm ci --omit=dev && npm cache clean --force
COPY --from=builder /app/dist ./dist

# 8. Non-root user
RUN addgroup -S app && adduser -S app -G app
USER app

# 9. Health check
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD wget --spider http://localhost:3000/health || exit 1

# 10. Start
EXPOSE 3000
CMD ["node", "dist/index.js"]
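The HEALTHCHECK above is only as useful as the `/health` route it probes. A minimal TypeScript sketch of the decision logic behind such a route, assuming the dependency checks have already run; `CheckResult` and `buildHealthResponse` are hypothetical names, and serving the result on port 3000 is not shown:

```typescript
// Sketch of the decision behind a /health route (hypothetical names).
// Assumption: each dependency check (DB ping, Redis ping, ...) has already run
// and reported a result; serving this on port 3000 is not shown.

interface CheckResult {
  name: string; // e.g. "postgres"
  ok: boolean;
}

interface HealthResponse {
  statusCode: number;
  body: string; // JSON text
}

// 200 only when every check passed; otherwise 503, a non-2xx status that makes
// `wget --spider` exit non-zero so Docker records a failed probe.
function buildHealthResponse(checks: CheckResult[]): HealthResponse {
  const failing = checks.filter((c) => !c.ok).map((c) => c.name);
  const healthy = failing.length === 0;
  return {
    statusCode: healthy ? 200 : 503,
    body: JSON.stringify({ status: healthy ? "ok" : "degraded", failing: failing }),
  };
}
```

Whether a database outage should mark the app container unhealthy is a design choice: failing the probe makes the orchestrator restart or replace the container, which does not help when the database itself is down.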

Image size impact:

node:20          --> ~1 GB base
node:20-alpine   --> ~130 MB base
+ multi-stage    --> ~100-200 MB final image

Docker Compose Template (Production)

services:
  app:
    image: ${ECR_REGISTRY}/my-app:${IMAGE_TAG}
    restart: unless-stopped
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    deploy:
      resources:
        limits:
          cpus: "0.50"
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
    networks:
      - backend
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

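  postgres:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}   # from .env or the host environment; never committed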
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser"]
      interval: 10s
      retries: 5
    networks:
      - backend

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --appendonly yes --maxmemory 128mb
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      retries: 3
    networks:
      - backend

volumes:
  pg_data:
  redis_data:

networks:
  backend:
    driver: bridge
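The `interval` and `retries` values in the healthcheck blocks above decide how quickly a broken container is flagged. A simplified TypeScript model of Docker's status rules (one passing probe makes a container healthy; `retries` consecutive failures make it unhealthy), assuming `start_period` is ignored and timed-out probes are already counted as failures; `healthAfterProbes` is a hypothetical name:

```typescript
// Simplified model of Docker's health status rules (hypothetical helper).
// Assumptions: start_period is ignored, and a probe that exceeds `timeout`
// has already been recorded as a failure (false).

type HealthStatus = "starting" | "healthy" | "unhealthy";

function healthAfterProbes(results: boolean[], retries: number): HealthStatus {
  let status: HealthStatus = "starting"; // initial state before any probe passes
  let consecutiveFailures = 0;
  for (const passed of results) {
    if (passed) {
      status = "healthy"; // a single passing probe is enough
      consecutiveFailures = 0;
    } else {
      consecutiveFailures += 1;
      // Only `retries` failures in a row flip the container to unhealthy.
      if (consecutiveFailures >= retries) status = "unhealthy";
    }
  }
  return status;
}
```

As a rough rule, detection takes on the order of `retries` times `interval` (about 90 seconds for the app service above); shorter intervals detect failures sooner at the cost of more probe traffic.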

EC2 Setup Checklist

[ ] Choose instance type (t3.medium is a common starting point for Node.js APIs)
[ ] Launch with key pair (SSH access)
[ ] Security group: inbound 22 (your IP only), 80, 443; nothing else
[ ] Attach Elastic IP (static address)
[ ] SSH in and install:
    [ ] Node.js via nvm
    [ ] PM2 globally (npm i -g pm2)
    [ ] Nginx
    [ ] Certbot (Let's Encrypt)
[ ] Clone/deploy application
[ ] Start with PM2 cluster mode
[ ] Configure PM2 startup: run pm2 startup, execute the sudo command it prints, then pm2 save
[ ] Configure Nginx reverse proxy
[ ] Set up SSL with Certbot
[ ] Verify: curl https://your-domain.com/health
[ ] Verify auto-renewal: certbot renew --dry-run

SSL / Domain Steps

1. DOMAIN SETUP
   Register domain (Route 53 or other registrar)
   Create hosted zone in Route 53
   Create A record: api.example.com --> EC2 Elastic IP
   OR Alias record: api.example.com --> ALB DNS name

2. SSL CERTIFICATE
   Option A (ALB):
     Request certificate in ACM
     Validate via DNS (add CNAME)
     Attach to ALB HTTPS listener (port 443)

   Option B (Server):
     Install Certbot
     Run: certbot --nginx -d api.example.com
     Auto-configures Nginx + auto-renewal

3. HTTPS REDIRECT
   Nginx: return 301 https://$host$request_uri;
   ALB: HTTP listener redirects to HTTPS

4. VERIFY
   curl https://api.example.com/health
   Check cert expiry: echo | openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates
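The VERIFY step prints `notBefore=`/`notAfter=` lines. For a scheduled expiry alert, a hypothetical TypeScript helper (`daysUntilExpiry` is not part of any tool) that parses the `notAfter=` line, assuming openssl's usual `Mon DD HH:MM:SS YYYY GMT` date format:

```typescript
// Hypothetical helper: days left on a certificate, from the `notAfter=` line
// printed by `openssl x509 -noout -dates`.
// Assumes openssl's usual "Mon DD HH:MM:SS YYYY GMT" format (e.g. "Jun  1 12:00:00 2025 GMT").

const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysUntilExpiry(notAfterLine: string, now: Date): number {
  const pattern = /^notAfter=(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s+(\d{4})\s+GMT$/;
  const match = pattern.exec(notAfterLine.trim());
  if (match === null) throw new Error(`Unrecognized notAfter line: ${notAfterLine}`);
  const [, mon, day, hh, mm, ss, year] = match;
  const monthIndex = MONTHS.indexOf(mon);
  if (monthIndex < 0) throw new Error(`Unknown month: ${mon}`);
  // The timestamp is GMT, so build it with Date.UTC to avoid local-time shifts.
  const expiresAt = Date.UTC(Number(year), monthIndex, Number(day), Number(hh), Number(mm), Number(ss));
  // Whole days remaining, rounded down; negative once the certificate has expired.
  return Math.floor((expiresAt - now.getTime()) / MS_PER_DAY);
}
```

Certbot renews certificates within 30 days of expiry by default, so on a Certbot-managed server a value that keeps falling below 30 suggests renewal is failing.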

Nginx Reverse Proxy Cheat Sheet

# Send "Connection: upgrade" only for WebSocket requests
# (map must sit in the http context, next to these server blocks)
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# HTTP --> HTTPS redirect
server {
    listen 80;
    server_name api.example.com;
    return 301 https://$host$request_uri;
}

# HTTPS server
server {
    # On nginx 1.25.1+ prefer "listen 443 ssl;" plus "http2 on;"
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    # Security headers
    add_header Strict-Transport-Security "max-age=63072000" always;
    add_header X-Frame-Options DENY always;
    add_header X-Content-Type-Options nosniff always;

    # Proxy to Node.js
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
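Because Nginx appends `$remote_addr` through `$proxy_add_x_forwarded_for`, only the right-hand end of `X-Forwarded-For` is trustworthy; entries further left can be forged by the client. A minimal TypeScript sketch of hop counting, with `clientIp` as a hypothetical helper (Express's `trust proxy` setting does comparable hop counting):

```typescript
// Hypothetical helper: recover the client IP behind Nginx's
// `X-Forwarded-For $proxy_add_x_forwarded_for`, trusting a fixed number of hops.
// Walk from the nearest hop (the socket peer, i.e. Nginx) outwards; anything
// beyond the trusted hops could have been written by the client and is ignored.

function clientIp(remoteAddr: string, xForwardedFor: string | undefined, trustedHops: number): string {
  const forwarded = (xForwardedFor ?? "")
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  // Nearest first: [socket peer, rightmost XFF entry, ..., leftmost XFF entry]
  const chain = [remoteAddr, ...forwarded.reverse()];
  // Clamp so a short chain falls back to the furthest address we have.
  const index = Math.min(Math.max(trustedHops, 0), chain.length - 1);
  return chain[index];
}
```

With a single Nginx in front of Node.js, `trustedHops = 1` (or simply `X-Real-IP`) is enough; an ALB in front of Nginx adds one more trusted hop.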

CI/CD Pipeline Stages

+--------+     +--------+     +--------+     +--------+     +--------+
|  Lint  | --> |  Test  | --> | Build  | --> |  Scan  | --> | Deploy |
| ESLint |     | Jest   |     | Docker |     | Trivy  |     | ECS /  |
|Prettier|     | Integ  |     | Image  |     |        |     | EC2    |
+--------+     +--------+     +--------+     +--------+     +--------+

Triggers:
  PR opened   --> Lint + Test only (no deploy)
  Push to main --> Full pipeline (Lint + Test + Build + Scan + Deploy)
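The trigger rules can be read as a small function from event to stages. A TypeScript sketch (`stagesFor` and `runPipeline` are hypothetical, and treating pushes to non-main branches like PRs is an assumption) that also shows each stage gating the next:

```typescript
// Hypothetical model of the trigger rules above. Treating pushes to branches
// other than main like PRs is an assumption.

type Stage = "lint" | "test" | "build" | "scan" | "deploy";
type CiEvent = { kind: "pull_request" } | { kind: "push"; ref: string };

const FULL_PIPELINE: Stage[] = ["lint", "test", "build", "scan", "deploy"];
const CHECKS_ONLY: Stage[] = ["lint", "test"];

function stagesFor(event: CiEvent): Stage[] {
  if (event.kind === "pull_request") return CHECKS_ONLY.slice(); // feedback only, never deploys
  return event.ref === "refs/heads/main" ? FULL_PIPELINE.slice() : CHECKS_ONLY.slice();
}

// Each stage gates the next: stop at the first failure so nothing broken ships.
function runPipeline(stages: Stage[], runStage: (stage: Stage) => boolean): Stage[] {
  const executed: Stage[] = [];
  for (const stage of stages) {
    executed.push(stage);
    if (!runStage(stage)) break;
  }
  return executed;
}
```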

GitHub Actions key patterns:

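# Cache npm to speed up CI
- uses: actions/setup-node@v4
  with:
    node-version: 20
    cache: "npm"

# Service containers for integration tests
services:
  postgres:
    image: postgres:16-alpine
    env:
      POSTGRES_PASSWORD: postgres   # throwaway CI-only password; the image will not start without one
    ports:
      - 5432:5432
    options: >-
      --health-cmd "pg_isready -U postgres"
      --health-interval 10s
      --health-retries 5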

# Only run deploy on main branch
if: github.ref == 'refs/heads/main'

# Gate deploys on the "production" environment (approval needs required reviewers configured on it)
environment: production

# Access secrets
env:
  DB_URL: ${{ secrets.DATABASE_URL }}

Deployment Strategies Comparison

Strategy       | How It Works                 | Rollback Speed      | Risk     | Cost             | Complexity
Recreate       | Stop old, start new          | Minutes (downtime!) | High     | 1x               | Very Low
Rolling update | Replace instances gradually  | Minutes             | Medium   | 1x               | Low
Blue/Green     | Run both, switch traffic     | Seconds             | Low      | 2x during deploy | Medium
Canary         | 5% traffic to new, then grow | Seconds             | Very Low | ~1x              | High

When to use what:

Dev/Testing         --> Recreate (simplest)
Standard production --> Rolling update (ECS default)
Critical services   --> Blue/Green (instant rollback)
High-traffic APIs   --> Canary (limit blast radius)
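A canary is a loop of shift traffic, observe, decide. A hypothetical TypeScript decision function (`nextCanaryStep`), assuming traffic steps of 5/25/50/100% and a 1% error-rate threshold, both illustrative rather than recommendations:

```typescript
// Hypothetical canary decision rule. The traffic steps and the 1% error-rate
// threshold are illustrative assumptions, not recommendations.

interface CanaryDecision {
  action: "promote" | "complete" | "rollback";
  weight: number; // % of traffic the new version should receive next
}

const CANARY_STEPS = [5, 25, 50, 100];

function nextCanaryStep(currentWeight: number, errorRate: number, maxErrorRate = 0.01): CanaryDecision {
  // Any breach sends all traffic back to the old version.
  if (errorRate > maxErrorRate) return { action: "rollback", weight: 0 };
  const remaining = CANARY_STEPS.filter((w) => w > currentWeight);
  // Reaching 100% means the new version has fully replaced the old one.
  if (remaining.length === 0 || remaining[0] === 100) return { action: "complete", weight: 100 };
  return { action: "promote", weight: remaining[0] };
}
```

On AWS the actual traffic split would typically come from weighted ALB target groups or CodeDeploy; this sketch only captures the promote-or-rollback rule.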

Docker Image Tagging Strategy

DO:
  my-api:abc1234       (git SHA -- unique, traceable)
  my-api:v1.2.3        (semver -- human-readable releases)

DON'T:
  my-api:latest        (mutable -- which version is running?)
  my-api:production    (overwritten on every deploy -- no rollback target)
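A CI step can enforce this rule before pushing. A hypothetical TypeScript check (`isTraceableTag`); the accepted forms (a 7-40 character hex git SHA, semver with an optional leading `v`, or an `@sha256:` digest) are assumptions to adapt to your own conventions:

```typescript
// Hypothetical CI guard for image references. The accepted forms are
// assumptions: a 7-40 char hex git SHA, semver (optional leading "v"), or a digest.

const GIT_SHA_TAG = /^[0-9a-f]{7,40}$/;
const SEMVER_TAG = /^v?\d+\.\d+\.\d+$/;
const DIGEST = /@sha256:[0-9a-f]{64}$/;

function isTraceableTag(imageRef: string): boolean {
  // A digest pins exact image content, so it is always traceable.
  if (DIGEST.test(imageRef)) return true;
  const colon = imageRef.lastIndexOf(":");
  if (colon < 0) return false; // untagged means Docker falls back to :latest
  const tag = imageRef.slice(colon + 1);
  // "localhost:5000/my-api": that colon belongs to a registry port, not a tag.
  if (tag.indexOf("/") >= 0) return false;
  return GIT_SHA_TAG.test(tag) || SEMVER_TAG.test(tag);
}
```

Pattern checks cannot prove a tag is never overwritten; enabling tag immutability on the ECR repository closes that gap.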

Secrets Management

DEVELOPMENT:
  .env file (in .gitignore)

CI/CD:
  GitHub Settings > Secrets > Actions
  Reference: ${{ secrets.MY_SECRET }}
  Levels: organization, repository, environment (on a name clash, the most specific level wins)

PRODUCTION:
  ECS task definition "secrets" field (injected from Parameter Store / Secrets Manager), not plain environment values
  AWS Systems Manager Parameter Store
  AWS Secrets Manager (for rotation)
  Docker Secrets (Swarm mode)

NEVER:
  Hardcode in Dockerfile
  Commit .env to git
  Echo secrets in CI logs
  Use the same secrets across environments
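In production, reading secrets once at startup and failing fast keeps a misconfigured container from half-starting. A minimal TypeScript sketch; `loadConfig` and the variable names are illustrative assumptions, and in Node.js you would call `loadConfig(process.env)`:

```typescript
// Hypothetical startup config loader. The variable names are illustrative;
// in Node.js, call it as loadConfig(process.env).

type Env = Record<string, string | undefined>;

interface AppConfig {
  databaseUrl: string;
  redisUrl: string;
  jwtSecret: string;
}

const REQUIRED_VARS = ["DATABASE_URL", "REDIS_URL", "JWT_SECRET"];

function loadConfig(env: Env): AppConfig {
  // Empty strings count as missing, which catches "KEY=" lines in .env files.
  const missing = REQUIRED_VARS.filter((key) => !env[key]);
  if (missing.length > 0) {
    // Name the missing keys but never echo any values into logs.
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    databaseUrl: env["DATABASE_URL"] as string,
    redisUrl: env["REDIS_URL"] as string,
    jwtSecret: env["JWT_SECRET"] as string,
  };
}
```

The error lists only variable names, so it is safe to surface in CI or container logs.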

Common Gotchas

Gotcha                            | Why It Hurts | Fix
Deploying :latest tag             | Cannot trace which version is running | Use git SHA tags
npm install in CI                 | Non-deterministic; may install different versions | Use npm ci
Running as root in Docker         | A compromised app has root inside the container, so any escape is far worse | USER app (non-root) in Dockerfile
No health checks                  | Orchestrator treats a broken container as healthy | Add HEALTHCHECK
COPY . . before npm ci            | Every code change invalidates the cached dependency layer | Copy package files first
SSL cert expiry                   | Clients get TLS errors as soon as the cert lapses, often with no warning | ACM (auto-renew) or Certbot timer
Exposing DB ports in production   | Database may be reachable from the internet (Docker-published ports bypass ufw) | Remove ports: from prod Compose
No circuit breaker                | Bad deploy stays up, serving errors | Enable the ECS deployment circuit breaker
Same secrets in staging and prod  | A staging bug can corrupt production data | Separate secrets per environment
Dev dependencies in prod image    | Larger image and bigger attack surface | npm ci --omit=dev (older form: --only=production)
Forgetting HTTP-to-HTTPS redirect | Users on HTTP get an unencrypted site | Nginx return 301 or ALB redirect
No resource limits                | One container can exhaust host memory/CPU and starve the rest | Set memory and cpus limits

PM2 Quick Reference

pm2 start ecosystem.config.js   # Start app
pm2 list                         # View all processes
pm2 monit                        # Live dashboard
pm2 logs my-api                  # View logs
pm2 reload my-api                # Zero-downtime reload (cluster mode)
pm2 restart my-api               # Restart (brief downtime)
pm2 stop my-api                  # Stop
pm2 delete my-api                # Remove from PM2
pm2 startup systemd              # Auto-start on boot (run the sudo command it prints)
pm2 save                         # Save current process list
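`pm2 reload` is only zero-downtime if each old worker finishes its in-flight requests when PM2 signals it (SIGINT by default, followed by SIGKILL after a configurable kill timeout). A hypothetical TypeScript tracker for that drain step; calling `tryBegin()`/`end()` around requests and `shutdown()` from a SIGINT handler is an assumption about how you wire it in:

```typescript
// Hypothetical in-flight request tracker for graceful shutdown under `pm2 reload`.
// Assumption: the server calls tryBegin()/end() around every request and
// shutdown() from its SIGINT handler, exiting inside the callback.

class InFlightTracker {
  private active = 0;
  private closing = false;
  private onDrained: (() => void) | null = null;

  // Returns false once shutdown has begun, so the caller can answer 503.
  tryBegin(): boolean {
    if (this.closing) return false;
    this.active += 1;
    return true;
  }

  // Call when a request finishes (successfully or not).
  end(): void {
    this.active = Math.max(0, this.active - 1);
    this.notifyIfDrained();
  }

  // Stop accepting work; `done` runs once, after the last in-flight request ends.
  shutdown(done: () => void): void {
    this.closing = true;
    this.onDrained = done;
    this.notifyIfDrained();
  }

  inFlight(): number {
    return this.active;
  }

  private notifyIfDrained(): void {
    if (this.closing && this.active === 0 && this.onDrained !== null) {
      const done = this.onDrained;
      this.onDrained = null; // guarantee a single call
      done();
    }
  }
}
```

In the SIGINT handler, call `tracker.shutdown(() => process.exit(0))`, and answer 503 to any request for which `tryBegin()` returns false.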

One-Command References

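# Build and push Docker image (abc1234 = short git SHA; ECR_REGISTRY = your registry URI)
aws ecr get-login-password --region "$AWS_REGION" | docker login --username AWS --password-stdin "$ECR_REGISTRY"
docker build -t my-api:abc1234 -t "$ECR_REGISTRY/my-api:abc1234" . && docker push "$ECR_REGISTRY/my-api:abc1234"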

# Scan for vulnerabilities
trivy image my-api:abc1234

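# Deploy to ECS: switch the service to the latest registered task definition revision
# (register a revision that references the new image tag first)
aws ecs update-service --cluster prod --service my-api --task-definition my-api

# Restart tasks on the current revision (does not pick up a new immutable tag)
aws ecs update-service --cluster prod --service my-api --force-new-deployment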

# Rollback ECS to a previous task definition revision (41 in this example)
aws ecs update-service --cluster prod --service my-api --task-definition my-api:41

# Check SSL certificate expiry
echo | openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates

# Test Nginx config
sudo nginx -t && sudo systemctl reload nginx

# Certbot renewal dry run
sudo certbot renew --dry-run

End of 6.9 quick revision.