6.9 -- Final Production Deployment: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim before labs or interviews.
- Drill gaps -- reopen README.md --> 6.9.a...6.9.c.
- Practice -- 6.9-Exercise-Questions.md.
- Polish answers -- 6.9-Interview-Questions.md.
Core Vocabulary
| Term | One-liner |
|---|---|
| Multi-stage build | Dockerfile with separate build and runtime stages -- smaller, secure images |
| Non-root user | Container runs as a limited user, not root -- limits blast radius |
| .dockerignore | Excludes files from Docker build context (like .gitignore for Docker) |
| Layer caching | Docker reuses unchanged layers; order Dockerfile instructions from least to most frequently changed |
| Named volume | Docker-managed persistent storage that survives container restarts |
| Bridge network | Default Docker network driver; on a user-defined bridge (such as one Compose creates), containers reach each other by service name |
| Health check | Probe that tells Docker/orchestrator whether a container is actually healthy |
| PM2 | Node.js production process manager -- clustering, auto-restart, zero-downtime reload |
| Elastic IP | Static public IP for EC2 that persists across instance stop/start |
| Route 53 | AWS DNS service for managing domains and records |
| A record | DNS record pointing a domain to an IPv4 address |
| CNAME | DNS record pointing a domain to another domain name |
| ACM | AWS Certificate Manager -- free SSL certs for ALB/CloudFront, auto-renewing |
| Let's Encrypt | Free SSL certificate authority -- use Certbot for automation |
| SSL termination | Decrypting HTTPS at a specific layer (ALB or Nginx) |
| Reverse proxy | Nginx sits between internet and Node.js -- handles SSL, compression, static files |
| CI | Continuous Integration -- build + test on every push |
| CD (Delivery) | Every passing build is deploy-ready; human clicks deploy |
| CD (Deployment) | Every passing build auto-deploys to production |
| GitHub Actions | CI/CD platform built into GitHub -- YAML workflow files |
| Blue/Green | Two identical environments; instant switch from old to new |
| Canary | Route small % of traffic to new version, gradually increase |
| Circuit breaker | ECS auto-rolls back if new tasks keep failing health checks |
Dockerfile Best Practices
# 1. Use Alpine base
FROM node:20-alpine AS builder
# 2. Set working directory
WORKDIR /app
# 3. Copy package files FIRST (layer caching)
COPY package.json package-lock.json ./
# 4. Use npm ci (deterministic)
RUN npm ci
# 5. Copy source code AFTER deps
COPY . .
# 6. Build
RUN npm run build
# 7. Second stage: production only
FROM node:20-alpine AS production
WORKDIR /app
COPY package.json package-lock.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev && npm cache clean --force
COPY --from=builder /app/dist ./dist
# 8. Non-root user
RUN addgroup -S app && adduser -S app -G app
USER app
# 9. Health check
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
CMD wget --spider http://localhost:3000/health || exit 1
# 10. Start
EXPOSE 3000
CMD ["node", "dist/index.js"]
Image size impact:
node:20 --> ~1 GB base
node:20-alpine --> ~130 MB base
+ multi-stage --> ~100-200 MB final image
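The HEALTHCHECK in the Dockerfile probes /health on port 3000, so the app has to answer there. A minimal sketch of such an endpoint, using only Node's built-in http module, might look like this. The names `healthStatus` and `createHealthServer` and the shape of the `checks` object are hypothetical, chosen for illustration only.

```javascript
const http = require("node:http");

// Map dependency check results (assumed shape: { name: boolean })
// to an HTTP status. Any failed check yields 503 so the probe fails.
function healthStatus(checks) {
  const failed = Object.keys(checks).filter((name) => !checks[name]);
  const healthy = failed.length === 0;
  return {
    statusCode: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "unhealthy", failed },
  };
}

// Build (but do not start) a server that answers /health.
// getChecks is an assumed callback returning the current check results.
function createHealthServer(getChecks) {
  return http.createServer((req, res) => {
    if (req.url === "/health") {
      const { statusCode, body } = healthStatus(getChecks());
      res.writeHead(statusCode, { "Content-Type": "application/json" });
      res.end(JSON.stringify(body));
      return;
    }
    res.writeHead(404);
    res.end();
  });
}
```

Calling `createHealthServer(() => ({ db: true })).listen(3000)` would expose the endpoint. A 503 response makes `wget --spider` exit non-zero, which is what marks the container unhealthy.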
Docker Compose Template (Production)
version: "3.9"
services:
  app:
    image: ECR_REGISTRY/my-app:${IMAGE_TAG}
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "0.50"
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
    networks:
      - backend
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
  postgres:
    image: postgres:16-alpine
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser"]
      interval: 10s
      retries: 5
    networks:
      - backend
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes --maxmemory 128mb
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      retries: 3
    networks:
      - backend
volumes:
  pg_data:
  redis_data:
networks:
  backend:
    driver: bridge
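Because all three services share the backend network, the app can reach the databases by service name instead of by IP address. The sketch below shows how connection URLs could be assembled on that basis. `serviceUrls` is a hypothetical helper, the template does not set `POSTGRES_USER`, `POSTGRES_PASSWORD`, or `POSTGRES_DB`, and the default database name `app` is an assumption.

```javascript
// Build connection URLs that use Compose service names as hostnames.
// The environment variable names and the "app" default are assumptions;
// "appuser" mirrors the user in the pg_isready health check.
function serviceUrls(env) {
  const user = encodeURIComponent(env.POSTGRES_USER || "appuser");
  const password = encodeURIComponent(env.POSTGRES_PASSWORD || "");
  const database = env.POSTGRES_DB || "app";
  return {
    // "postgres" and "redis" resolve through Docker's DNS on the backend network.
    postgres: `postgres://${user}:${password}@postgres:5432/${database}`,
    redis: "redis://redis:6379",
  };
}
```

Passing `process.env` keeps credentials out of the source; the password is percent-encoded so characters such as `@` do not break the URL.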
EC2 Setup Checklist
[ ] Choose instance type (t3.medium is a reasonable starting point for many Node.js APIs)
[ ] Launch with key pair (SSH access)
[ ] Security group: ports 22, 80, 443 only
[ ] Attach Elastic IP (static address)
[ ] SSH in and install:
    [ ] Node.js via nvm
    [ ] PM2 globally (npm i -g pm2)
    [ ] Nginx
    [ ] Certbot (Let's Encrypt)
[ ] Clone/deploy application
[ ] Start with PM2 cluster mode
[ ] Configure PM2 startup (pm2 startup && pm2 save)
[ ] Configure Nginx reverse proxy
[ ] Set up SSL with Certbot
[ ] Verify: curl https://your-domain.com/health
[ ] Verify auto-renewal: certbot renew --dry-run
SSL / Domain Steps
1. DOMAIN SETUP
   Register domain (Route 53 or other registrar)
   Create hosted zone in Route 53
   Create A record: api.example.com --> EC2 Elastic IP
   OR Alias record: api.example.com --> ALB DNS name
2. SSL CERTIFICATE
   Option A (ALB):
     Request certificate in ACM
     Validate via DNS (add CNAME)
     Attach to ALB HTTPS listener (port 443)
   Option B (Server):
     Install Certbot
     Run: certbot --nginx -d api.example.com
     Auto-configures Nginx + auto-renewal
3. HTTPS REDIRECT
   Nginx: return 301 https://$host$request_uri;
   ALB: HTTP listener redirects to HTTPS
4. VERIFY
   curl https://api.example.com/health
   Check cert expiry: echo | openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates
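The expiry check in step 4 can also be run from Node, for example in a scheduled monitoring job. The following is a sketch under assumptions: `daysUntil` and `certDaysLeft` are hypothetical names, and parsing the certificate's `valid_to` string with `Date` relies on the runtime's lenient date parser.

```javascript
const tls = require("node:tls");

// Whole days between `now` and `expiry` (negative once the cert has expired).
function daysUntil(expiry, now = new Date()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((expiry.getTime() - now.getTime()) / msPerDay);
}

// Connect to host:443, read the peer certificate, resolve with days left.
function certDaysLeft(host) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host, port: 443, servername: host }, () => {
      const cert = socket.getPeerCertificate();
      socket.end();
      // valid_to is a date string taken from the certificate.
      resolve(daysUntil(new Date(cert.valid_to)));
    });
    socket.on("error", reject);
  });
}
```

A job could call `certDaysLeft("api.example.com")` and alert when the result drops below a chosen threshold, as a backstop in case auto-renewal fails silently.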
Nginx Reverse Proxy Cheat Sheet
# HTTP --> HTTPS redirect
server {
    listen 80;
    server_name api.example.com;
    return 301 https://$host$request_uri;
}
# HTTPS server
server {
    listen 443 ssl http2;
    server_name api.example.com;
    ssl_certificate /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    # Security headers
    add_header Strict-Transport-Security "max-age=63072000" always;
    add_header X-Frame-Options DENY always;
    add_header X-Content-Type-Options nosniff always;
    # Proxy to Node.js
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
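Behind this proxy, Node sees every connection as coming from 127.0.0.1, so the original client details have to be read from the headers Nginx sets. The sketch below assumes a single trusted Nginx hop; `clientInfo` is a hypothetical helper name.

```javascript
// Extract client details from the proxy headers set in the Nginx config.
// Node lowercases incoming header names. Assumes exactly one trusted proxy,
// so the last X-Forwarded-For entry is the address Nginx appended.
function clientInfo(headers) {
  const forwarded = (headers["x-forwarded-for"] || "")
    .split(",")
    .map((part) => part.trim())
    .filter(Boolean);
  return {
    ip: headers["x-real-ip"] || forwarded[forwarded.length - 1] || null,
    proto: headers["x-forwarded-proto"] || "http",
    host: headers["host"] || null,
  };
}
```

Earlier entries in X-Forwarded-For are supplied by the client and can be spoofed, which is why the sketch prefers X-Real-IP and otherwise takes the last entry.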
CI/CD Pipeline Stages
+----------+     +----------+     +----------+     +----------+     +----------+
| Lint     | --> | Test     | --> | Build    | --> | Scan     | --> | Deploy   |
| ESLint   |     | Jest     |     | Docker   |     | Trivy    |     | ECS /    |
| Prettier |     | Integ    |     | Image    |     |          |     | EC2      |
+----------+     +----------+     +----------+     +----------+     +----------+
Triggers:
PR opened --> Lint + Test only (no deploy)
Push to main --> Full pipeline (Lint + Test + Build + Scan + Deploy)
GitHub Actions key patterns:
# Cache npm to speed up CI
- uses: actions/setup-node@v4
  with:
    cache: "npm"
# Service containers for integration tests
services:
  postgres:
    image: postgres:16-alpine
# Only run deploy on main branch
if: github.ref == 'refs/heads/main'
# Require approval for production
environment: production
# Access secrets
env:
  DB_URL: ${{ secrets.DATABASE_URL }}
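The trigger rules listed under "Triggers" can be expressed as a small decision function, which may help when reasoning about which stages a given event should run. This is an illustrative sketch, not part of any workflow file: `stagesFor` is a hypothetical name, and returning no stages for pushes to other branches is an assumption the source does not state.

```javascript
const ALL_STAGES = ["lint", "test", "build", "scan", "deploy"];

// Decide which pipeline stages run for a GitHub event name and git ref.
function stagesFor(eventName, ref) {
  if (eventName === "pull_request") {
    return ["lint", "test"]; // PRs never deploy
  }
  if (eventName === "push" && ref === "refs/heads/main") {
    return [...ALL_STAGES]; // full pipeline on main
  }
  return []; // assumption: pushes to other branches run nothing
}
```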
Deployment Strategies Comparison
| Strategy | How It Works | Rollback Speed | Risk | Cost | Complexity |
|---|---|---|---|---|---|
| Recreate | Stop old, start new | Minutes (downtime!) | High | 1x | Very Low |
| Rolling update | Replace instances gradually | Minutes | Medium | 1x | Low |
| Blue/Green | Run both, switch traffic | Seconds | Low | 2x during deploy | Medium |
| Canary | 5% traffic to new, then grow | Seconds | Very Low | ~1x | High |
When to use what:
Dev/Testing --> Recreate (simplest)
Standard production --> Rolling update (ECS default)
Critical services --> Blue/Green (instant rollback)
High-traffic APIs --> Canary (limit blast radius)
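Canary routing is normally done at the load balancer with weighted targets, but the idea of sending a small, stable slice of traffic to the new version can be illustrated at the application level. The sketch below hashes a user ID into one of 100 buckets; `inCanary` is a hypothetical helper, and hashing on user ID is an assumption made for illustration.

```javascript
const crypto = require("node:crypto");

// Deterministically place a user in bucket 0..99 and compare with the
// canary percentage, so the same user always sees the same version.
function inCanary(userId, percent) {
  const digest = crypto.createHash("sha256").update(String(userId)).digest();
  const bucket = digest.readUInt32BE(0) % 100;
  return bucket < percent;
}
```

Raising `percent` from 5 toward 100 only adds users to the canary group; nobody already on the new version is moved back, which keeps the rollout stable for each user.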
Docker Image Tagging Strategy
DO:
my-api:abc1234 (git SHA -- unique, traceable)
my-api:v1.2.3 (semver -- human-readable releases)
DON'T:
my-api:latest (mutable -- which version is running?)
my-api:production (overwritten on every deploy -- no rollback)
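A deploy script can enforce these rules before pushing anything. The following is a minimal sketch: `imageRef` is a hypothetical helper, and the accepted formats (a 7 to 40 character hex SHA, or `vMAJOR.MINOR.PATCH`) are assumptions derived from the examples above.

```javascript
const MUTABLE_TAGS = new Set(["latest", "production"]);

// Build a full image reference, rejecting mutable or unrecognized tags.
function imageRef(registry, repo, tag) {
  if (MUTABLE_TAGS.has(tag)) {
    throw new Error(`Refusing mutable tag "${tag}"`);
  }
  const isSha = /^[0-9a-f]{7,40}$/.test(tag);
  const isSemver = /^v\d+\.\d+\.\d+$/.test(tag);
  if (!isSha && !isSemver) {
    throw new Error(`Tag "${tag}" is neither a git SHA nor vMAJOR.MINOR.PATCH`);
  }
  return `${registry}/${repo}:${tag}`;
}
```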
Secrets Management
DEVELOPMENT:
.env file (in .gitignore)
CI/CD:
GitHub Settings > Secrets > Actions
Reference: ${{ secrets.MY_SECRET }}
Scope: organization > repository > environment
PRODUCTION:
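ECS task definition secrets (valueFrom Parameter Store or Secrets Manager; plain environment entries are visible in plain text in the task definition)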
AWS Systems Manager Parameter Store
AWS Secrets Manager (for rotation)
Docker Secrets (Swarm mode)
NEVER:
Hardcode in Dockerfile
Commit .env to git
Echo secrets in CI logs
Use the same secrets across environments
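Whichever store supplies the secrets, the app typically receives them as environment variables, and failing fast at startup when one is missing avoids a half-configured deploy. The sketch below assumes the `DB_URL` variable from the GitHub Actions example; `loadConfig` is a hypothetical helper name.

```javascript
// Validate required environment variables once, at startup.
function loadConfig(env, required = ["DB_URL"]) {
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report names only; never print secret values.
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const entries = required.map((name) => [name, env[name]]);
  return Object.freeze(Object.fromEntries(entries));
}
```

Calling `loadConfig(process.env)` as the first step of the entry point makes a missing secret show up as a failed start, which health checks and the ECS circuit breaker can then act on.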
Common Gotchas
| Gotcha | Why It Hurts | Fix |
|---|---|---|
| Deploying :latest tag | Cannot trace which version is running | Use git SHA tags |
| npm install in CI | Non-deterministic; may install different versions | Use npm ci |
| Running as root in Docker | Compromised container = full access | Create a non-root user and set USER app in Dockerfile |
| No health checks | Orchestrator thinks broken container is healthy | Add HEALTHCHECK |
| COPY . . before npm ci | Busts cache on every code change | Copy package files first |
| SSL cert expiry | Site goes down with zero warning | ACM (auto-renew) or Certbot timer |
| Exposing DB ports in production | Database reachable from internet | Remove ports: in prod Compose |
| No circuit breaker | Bad deploy stays up, serving errors | Enable ECS circuit breaker |
| Same secrets in staging and prod | Staging bug can corrupt production data | Separate secrets per environment |
| No --omit=dev | Production image has devDependencies | npm ci --omit=dev (replaces the deprecated --only=production) |
| Forgetting HTTP-to-HTTPS redirect | Users on HTTP see unencrypted site | Nginx return 301 or ALB redirect |
| No resource limits | One container OOMs the host | Set memory and cpus limits |
PM2 Quick Reference
pm2 start ecosystem.config.js # Start app
pm2 list # View all processes
pm2 monit # Live dashboard
pm2 logs my-api # View logs
pm2 reload my-api # Zero-downtime reload (cluster mode)
pm2 restart my-api # Restart (brief downtime)
pm2 stop my-api # Stop
pm2 delete my-api # Remove from PM2
pm2 startup systemd # Auto-start on boot
pm2 save # Save current process list
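The first command above expects an ecosystem.config.js file. A minimal sketch of one for cluster mode follows; the app name `my-api`, the script path `dist/index.js`, and port 3000 come from elsewhere in this sheet, while the `max_memory_restart` value is an assumption chosen to mirror the Compose memory limit.

```javascript
// ecosystem.config.js (sketch)
const ecosystem = {
  apps: [
    {
      name: "my-api",              // name used by pm2 reload / pm2 logs
      script: "dist/index.js",     // built entry point
      instances: "max",            // one worker per CPU core
      exec_mode: "cluster",        // required for zero-downtime reload
      max_memory_restart: "256M",  // assumed limit; restart a worker above it
      env: { NODE_ENV: "production", PORT: 3000 },
    },
  ],
};

module.exports = ecosystem;
```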
One-Command References
# Build and push Docker image (tag with the registry path so the push finds it)
docker build -t ECR/my-api:abc1234 . && docker push ECR/my-api:abc1234
# Scan for vulnerabilities
trivy image ECR/my-api:abc1234
# Deploy to ECS (restarts tasks on the current task definition; register a new revision to ship a new image tag)
aws ecs update-service --cluster prod --service my-api --force-new-deployment
# Rollback ECS to previous version
aws ecs update-service --cluster prod --service my-api --task-definition my-api:41
# Check SSL certificate expiry
echo | openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates
# Test Nginx config
sudo nginx -t && sudo systemctl reload nginx
# Certbot renewal dry run
sudo certbot renew --dry-run
End of 6.9 quick revision.