
6.9 -- Exercise Questions: Final Production Deployment

Practice questions for all three subtopics in Section 6.9. The questions mix conceptual, hands-on, and design tasks.

How to use this material (instructions)

  1. Read lessons in order -- README.md, then 6.9.a --> 6.9.c.
  2. Answer closed-book first -- then compare to the matching lesson.
  3. Try the hands-on tasks -- spin up Docker, configure Nginx, write a GitHub Actions file.
  4. Interview prep -- 6.9-Interview-Questions.md.
  5. Quick review -- 6.9-Quick-Revision.md.

6.9.a -- Docker Deployment (Q1--Q12)

Q1. What is a multi-stage build in Docker? Explain why a production Node.js image should use one.

Q2. Look at this Dockerfile snippet. Identify three problems and fix them:

FROM node:20
WORKDIR /app
COPY . .
RUN npm install
ENV DATABASE_URL=postgresql://admin:password123@db:5432/prod
EXPOSE 3000
CMD ["node", "src/index.js"]

Q3. Why should Docker containers run as a non-root user? What specific risk does running as root create?

Q4. Explain the purpose of .dockerignore. List five entries that should always be in a Node.js project's .dockerignore.

Q5. In the following Dockerfile, explain why the order of COPY and RUN matters for layer caching:

COPY package.json package-lock.json ./
RUN npm ci
COPY . .

What happens if you reverse the first and third lines?

Q6. Compare three of Docker's network drivers: bridge, host, and overlay. When would you use each?

Q7. What is the difference between a named volume and a bind mount? Which is appropriate for production database persistence?

Q8. Write a Docker Compose healthcheck for a PostgreSQL container that verifies the database is accepting connections.

Q9. A container keeps getting OOMKilled in production. What does this mean, and how do you fix it?

Q10. Explain the difference between docker compose up (development) and docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d (production). What does each Compose file typically contain?

Q11. You want to pass a database password to a Docker container without baking it into the image. Name three methods for injecting secrets at runtime.

Q12. Hands-on: Write a complete multi-stage Dockerfile for a TypeScript Express API that:

  • Uses node:20-alpine as the base
  • Installs dependencies with npm ci
  • Compiles TypeScript
  • Creates a non-root user
  • Includes a health check
  • Only contains production dependencies in the final image

6.9.b -- EC2 and SSL (Q13--Q24)

Q13. Compare t3 and m5 EC2 instance types. What is the "burstable" model and when might it be a problem?

Q14. What is PM2 and why should you use it instead of running node index.js directly in production? Name four features PM2 provides.

Q15. Explain the difference between pm2 restart and pm2 reload. Which one causes downtime?

Q16. What is an Elastic IP and why do you need one for a production EC2 instance?

Q17. Describe the steps to point a custom domain (e.g., api.example.com) to an EC2 instance using Route 53. What DNS record type do you use?

Q18. What is the difference between an A record and a CNAME record? When do you use each?

Q19. Explain why SSL/TLS is non-negotiable for production. Name four consequences of running a production API over plain HTTP.

Q20. Compare ACM (AWS Certificate Manager) and Let's Encrypt. When would you choose each?

Q21. What is SSL termination? Compare terminating SSL at the ALB vs at the Nginx server. What are the trade-offs?

Q22. Write an Nginx configuration (typically two server blocks) that:

  • Listens on port 80
  • Redirects all HTTP traffic to HTTPS
  • Listens on port 443 with SSL
  • Proxies requests to a Node.js app on port 3000
  • Sets proper proxy headers (X-Real-IP, X-Forwarded-For, etc.)

Q23. Let's Encrypt certificates expire after 90 days. How does Certbot handle renewal? What happens if renewal fails?

Q24. Hands-on: Write out the complete sequence of bash commands to go from a fresh EC2 instance to a live HTTPS API. Include: installing Node.js, deploying code, setting up PM2, configuring Nginx, and obtaining a Let's Encrypt certificate.


6.9.c -- CI/CD Pipelines (Q25--Q36)

Q25. Define Continuous Integration, Continuous Delivery, and Continuous Deployment. What is the key difference between Delivery and Deployment?

Q26. Name the five core stages of a CI/CD pipeline in order, and explain what each stage checks.

Q27. In a GitHub Actions workflow file, what is the difference between a job and a step? Can jobs run in parallel?

Q28. Why should you use npm ci instead of npm install in CI pipelines? What problem does npm ci solve?

Q29. Explain how GitHub Secrets work. Where do you configure them, and how do you reference them in a workflow file?

Q30. Why should you never deploy the :latest tag to production? What tagging strategy should you use instead?

Q31. What is the ECS deployment circuit breaker? How does it protect against bad deployments?

Q32. Compare blue/green, canary, and rolling update deployment strategies. Fill in a comparison table with: rollback speed, risk level, resource cost, and complexity.

Q33. Your CI pipeline takes 12 minutes. Name four strategies to speed it up.

Q34. A deployment to production fails at 4 PM. Walk through the rollback procedure step by step using ECS and ECR.

Q35. Explain the concept of environment promotion (staging --> production). Why should you never deploy directly to production without a staging step?

Q36. Hands-on: Write a complete GitHub Actions workflow file (.github/workflows/ci.yml) that:

  • Triggers on push to main and pull requests
  • Runs ESLint
  • Runs Jest tests with a PostgreSQL service container
  • Builds a Docker image
  • Pushes the image to ECR (only on main branch pushes)

Answer Hints

| Q | Hint |
| --- | --- |
| Q2 | Candidate problems (any three): single-stage build, npm install instead of npm ci, secret baked into the image via ENV, runs as root, COPY . . before the install defeats layer caching (and copies everything unless a .dockerignore filters it), no health check |
| Q5 | Reversing means every code change invalidates the npm ci layer, forcing a full reinstall on every build |
| Q9 | OOMKilled means the container was killed for exceeding its memory limit; fix by raising the limit or finding the memory leak |
| Q13 | t3 earns and spends CPU credits; m5 gives consistent performance; t3 can be throttled to its baseline once credits are exhausted |
| Q15 | restart stops then starts (brief downtime); reload starts new processes before stopping old ones (zero downtime, cluster mode only) |
| Q18 | An A record points to an IP address; a CNAME points to another domain name; use A for EC2 with an Elastic IP, CNAME (or a Route 53 alias record) for an ALB |
| Q21 | ALB termination: free ACM cert, offloads TLS work from the server; server termination: traffic stays encrypted all the way to the instance, but you need Certbot |
| Q25 | Delivery: a human approves the deploy; Deployment: automatic deploy when tests pass |
| Q28 | npm ci deletes node_modules and installs the exact versions from the lock file, failing if package.json and the lock file disagree; deterministic and usually faster |
| Q30 | :latest is mutable, so you cannot tell which version is running; use git SHA tags for traceability |
| Q31 | The circuit breaker detects a deployment whose tasks fail to start or fail health checks, stops it, and (if rollback is enabled) rolls back to the last successful deployment |
| Q33 | Cache dependencies (the npm cache, since npm ci deletes node_modules), run lint and tests in parallel, use a smaller runner image, skip unchanged services |
| Q34 | Identify the previous working image tag --> update the ECS service to the previous task definition --> verify health --> investigate the root cause |
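
The Q30 and Q34 hints can be illustrated with a small sketch. The TypeScript below is one possible way to model them, not a prescribed solution: the `Deployment` type, the `imageRef` and `rollbackTarget` helpers, the registry host, and the sample history are all hypothetical names invented for illustration, and nothing here calls the real ECS or ECR APIs.

```typescript
// Hypothetical record of one deployment; not part of any AWS SDK.
interface Deployment {
  imageTag: string; // short git SHA the image was built from
  healthy: boolean; // whether the deployment passed its health checks
}

// Q30: build an immutable image reference from a git commit SHA
// instead of relying on the mutable :latest tag.
function imageRef(registry: string, repo: string, gitSha: string): string {
  if (!/^[0-9a-f]{7,40}$/.test(gitSha)) {
    throw new Error(`not a git SHA: ${gitSha}`);
  }
  return `${registry}/${repo}:${gitSha.slice(0, 7)}`;
}

// Q34, first step: find the most recent healthy deployment before the
// current one. `history` is ordered oldest to newest, and the last
// entry is assumed to be the failing deploy.
function rollbackTarget(history: Deployment[]): Deployment | undefined {
  for (let i = history.length - 2; i >= 0; i--) {
    const candidate = history[i];
    if (candidate && candidate.healthy) return candidate;
  }
  return undefined;
}

// Made-up sample data for illustration only.
const history: Deployment[] = [
  { imageTag: "a1b2c3d", healthy: true },
  { imageTag: "e4f5a6b", healthy: false },
  { imageTag: "9c8d7e6", healthy: false }, // current, failing
];

console.log(
  imageRef("registry.example.com", "api", "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678"),
);
console.log(rollbackTarget(history)?.imageTag);
```

Because each tag is derived from a commit, the rollback target maps back to an exact revision of the code, which is the traceability the Q30 hint refers to.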
