6.5 -- Exercise Questions: Scaling Concepts

Practice questions for all three subtopics in Section 6.5. The set mixes conceptual, comparison, design, and hands-on tasks.

How to use this material

1. Read lessons in order -- README.md, then 6.5.a -> 6.5.c.
2. Answer closed-book first -- then compare to the matching lesson.
3. Draw diagrams -- scaling questions benefit enormously from visual thinking.
4. Interview prep -- 6.5-Interview-Questions.md.
5. Quick review -- 6.5-Quick-Revision.md.

6.5.a -- Vertical vs Horizontal Scaling (Q1--Q12)

Q1. Define vertical scaling and horizontal scaling in one sentence each. Which one requires code changes?

Q2. Your Express API runs on a t3.small (2 vCPU, 2 GB RAM) and CPU averages 85% during peak hours. The team debates whether to upgrade to a t3.xlarge (4 vCPU, 16 GB RAM) or add a second t3.small behind a load balancer. List two advantages and two disadvantages of each approach.

Q3. Vertical scaling costs are often said to grow faster than linearly at the high end, while horizontal scaling costs grow roughly linearly. Why? Explain using AWS instance pricing as an example.

Q4. A startup has a single PostgreSQL database that is the bottleneck. Should they scale the database vertically or horizontally? What factors should they consider before sharding?

Q5. Explain the concept of read replicas. Write pseudocode or real code showing how an application routes read queries to replicas and write queries to the primary.

Q6. Calculation: Your API handles 10 million requests per month. A single t3.medium (2 vCPU, 4 GB, $30/month) can handle 2 million requests/month. An m5.4xlarge (16 vCPU, 64 GB, $560/month) can handle 12 million. Compare the cost of vertical vs horizontal scaling to handle this load. Include ALB cost (~$20/month).

Q7. Explain the Node.js cluster module. Why does a single Node.js process fail to utilise a 4-core machine? Write the minimal code to fork one worker per CPU core.

Q8. Compare PM2 cluster mode to using the Node.js cluster module directly. When would you choose PM2 over writing your own cluster code?

Q9. What is auto-scaling? Describe the three components: metric (what to measure), threshold (when to trigger), and cooldown (prevent flapping). Give realistic values for an Express API.

Q10. Explain why database sharding is considered a "last resort" scaling strategy. Name three problems it introduces.

Q11. A video streaming company has two workloads: (a) a metadata API that serves JSON and (b) a transcoding service that converts video files. Which workload benefits more from vertical scaling and which from horizontal? Why?

Q12. Design exercise: You are architecting a system that must handle 1,000 requests/second today and 50,000 requests/second in 12 months. Outline a scaling plan that starts simple and evolves. Include database, application tier, and load balancing decisions at each stage.

6.5.b -- Load Balancers (Q13--Q24)

Q13. Explain what a load balancer does in three bullet points. Why is it essential for horizontal scaling?

Q14. Compare Layer 4 and Layer 7 load balancing. Give one use case where Layer 4 is better and one where Layer 7 is required.

Q15. Describe the round robin algorithm. What is its biggest weakness? When does it work well?

Q16. Explain the least connections algorithm. Why is it better than round robin for applications with variable request durations? Give an example scenario.

Q17. What is IP hash load balancing? When would you use it instead of round robin? What happens when you add or remove a server from the pool?

Q18. Write an Nginx configuration that load-balances across three backend servers using the least connections algorithm, with passive health checks that mark a server unhealthy after 3 failures.

Q19. Explain the difference between a liveness health check (/health) and a readiness health check (/health/ready). Write an Express endpoint for each.

Q20. Compare AWS ALB, NLB, and CLB. Which would you choose for: (a) a REST API, (b) a gaming server using raw TCP, (c) a gRPC microservice?

Q21. What is SSL termination? Why do most architectures terminate SSL at the load balancer rather than at each backend server?

Q22. Explain DNS-based load balancing. What are its three biggest limitations compared to a dedicated load balancer?

Q23. Draw a multi-tier load balancing architecture with: Route 53 (global DNS), CloudFront (CDN), ALBs (regional), and backend servers. Explain the role of each layer.

Q24. Scenario: Your API has three endpoints: /api/users (fast, 50ms), /api/reports (slow, 5 seconds), and /api/upload (file upload, 30 seconds). Should you use the same load balancing algorithm for all three? Propose a better strategy.

6.5.c -- Stateless Design (Q25--Q37)

Q25. Define "stateless server" in one sentence. Does stateless mean the application has no state at all?

Q26. List five types of state that a typical Express application might store on the server. For each, name the external store it should be moved to for stateless design.

Q27. Explain why stateless design is a prerequisite for horizontal scaling. Use a specific example of what goes wrong with stateful servers behind a load balancer.

Q28. Write the code to migrate an Express app from MemoryStore sessions to Redis-backed sessions using connect-redis. What changes are needed in the application code itself (routes, middleware)?

Q29. Compare JWT authentication and Redis-backed sessions for a stateless architecture. When would you choose one over the other? Address token revocation.

Q30. A developer says: "We use sticky sessions, so our stateful app scales horizontally just fine." Explain three problems with this statement.

Q31. Your application allows users to upload profile pictures. Currently, files are saved to /uploads/ on the local disk. Describe how to migrate to S3 while keeping the same API contract for the frontend.

Q32. Explain how WebSocket connections break the stateless model. How does the Redis pub/sub adapter for Socket.IO solve cross-server messaging?

Q33. Your application uses setInterval to run a cleanup job every 5 minutes. Why is this a problem in a horizontally scaled deployment? Propose a solution using a distributed job queue.

Q34. Hands-on: Write a stateless Express API with: (a) JWT authentication (no server-side session), (b) Redis-backed caching, and (c) an endpoint that works correctly regardless of which server instance handles the request.

Q35. Explain how rate limiting breaks in a stateful multi-instance deployment. If each server tracks requests independently and the limit is 100 req/15min, how many requests could a user actually make across 3 servers?

Q36. Use the stateless design checklist from 6.5.c to audit the following Express app. Identify all stateful patterns and propose fixes:

```js
const sessions = {};
const cache = {};
const express = require('express');
const multer = require('multer');
const app = express();

// Parse JSON bodies so req.body is populated in /login.
app.use(express.json());

app.post('/login', (req, res) => {
  const sessionId = Math.random().toString(36);
  sessions[sessionId] = { user: req.body.username };
  res.cookie('sid', sessionId);
  res.json({ ok: true });
});

app.get('/data', (req, res) => {
  if (cache['data']) return res.json(cache['data']);
  // Placeholder: fetchFromDB() is assumed to be defined elsewhere.
  const data = fetchFromDB();
  cache['data'] = data;
  res.json(data);
});

app.post('/upload', multer({ dest: './uploads' }).single('file'), (req, res) => {
  res.json({ path: req.file.path });
});

setInterval(() => { /* cleanup */ }, 60000);
```

Q37. Design challenge: Describe a complete stateless architecture for an e-commerce checkout flow that includes: user authentication, shopping cart, payment processing, and order confirmation email. Specify which external stores you would use and why.

Answer Hints

| Q | Hint |
|---|------|
| Q1 | Vertical = bigger machine (no code changes). Horizontal = more machines (the app must be stateless, which may require code changes). |
| Q6 | Horizontal: 5x t3.medium ($150) + ALB ($20) = $170. Vertical: 1x m5.4xlarge = $560. Horizontal is about 3.3x cheaper. |
| Q7 | Node.js runs JavaScript on a single thread, so one process uses one core. `cluster.fork()` creates worker processes that share the same port. |
| Q10 | Cross-shard queries, re-sharding complexity, loss of JOINs, operational overhead. |
| Q14 | L4: raw TCP gaming server. L7: URL-based routing for microservices. |
| Q17 | IP hash sends the same client to the same server. Adding or removing a server changes the mapping: with plain modulo hashing most clients move to a different server, while consistent hashing limits this to a fraction of clients. |
| Q20 | (a) ALB (HTTP, path routing). (b) NLB (raw TCP). (c) ALB (gRPC support). |
| Q25 | No server-side per-client data between requests. State still exists -- in external stores. |
| Q29 | JWT: fully stateless, hard to revoke. Redis sessions: easy revocation, needs network hop. |
| Q30 | Uneven load, session loss when an instance fails, and a poor fit for spot instances or auto-scaling because terminating an instance drops its sessions. |
| Q35 | 300 requests total (100 per server x 3 servers). Use a Redis-based rate limiter for accurate global counting. |
| Q36 | Four problems: in-memory sessions, in-memory cache, local file uploads, setInterval job. Fix with Redis, Redis, S3, Bull queue. |
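
The Q17 hint about adding a server can be checked with a small experiment. The sketch below assumes plain modulo hashing over a SHA-256 digest of the client IP, which is only one possible way to implement IP hash; the function names and the generated IP addresses are hypothetical and exist only for illustration.

```javascript
const crypto = require('node:crypto');

// Map a client IP to a server index using plain modulo hashing.
// Assumption: the first 4 bytes of a SHA-256 digest serve as the hash value.
function pickServer(ip, serverCount) {
  const digest = crypto.createHash('sha256').update(ip).digest();
  return digest.readUInt32BE(0) % serverCount;
}

// Fraction of clients whose assigned server changes when the pool size changes.
function remappedFraction(ips, before, after) {
  const moved = ips.filter((ip) => pickServer(ip, before) !== pickServer(ip, after));
  return moved.length / ips.length;
}

// Hypothetical client population: 10,000 distinct private addresses.
const ips = [];
for (let i = 0; i < 10000; i++) {
  ips.push(`10.0.${Math.floor(i / 250)}.${i % 250}`);
}

// Grow the pool from 3 to 4 servers and report how many clients moved.
console.log(`remapped: ${(remappedFraction(ips, 3, 4) * 100).toFixed(1)}%`);
```

With a uniform hash, a client keeps its server in the 3 to 4 case only when the hash gives the same remainder for both pool sizes, which happens for about one quarter of clients. The remaining three quarters or so land on a different server and lose any server-local state.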

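The Q35 arithmetic can also be reproduced with a short simulation. This is a minimal sketch, not a production limiter: the `WindowLimiter` class and `countAccepted` helper are hypothetical, window expiry is omitted, and a shared `Map` stands in for an external store such as Redis.

```javascript
// Counter-based limiter for a single window (expiry is omitted for brevity).
// The store is injectable: a private Map models per-process memory,
// a shared Map models an external store that all instances talk to.
class WindowLimiter {
  constructor(limit, store = new Map()) {
    this.limit = limit;
    this.store = store;
  }

  allow(clientId) {
    const used = this.store.get(clientId) ?? 0;
    if (used >= this.limit) return false;
    this.store.set(clientId, used + 1);
    return true;
  }
}

// Send `total` requests from one client, distributed round robin across
// the given instances, and count how many are accepted.
function countAccepted(limiters, total, clientId = 'user-1') {
  let accepted = 0;
  for (let i = 0; i < total; i++) {
    if (limiters[i % limiters.length].allow(clientId)) accepted++;
  }
  return accepted;
}

const LIMIT = 100;

// Stateful deployment: every instance keeps its own counter.
const isolated = [1, 2, 3].map(() => new WindowLimiter(LIMIT));

// Stateless deployment: all instances share one counter store.
const sharedStore = new Map();
const shared = [1, 2, 3].map(() => new WindowLimiter(LIMIT, sharedStore));

console.log(countAccepted(isolated, 500)); // 300
console.log(countAccepted(shared, 500));   // 100
```

A real shared limiter would also need atomic increments and key expiry, which the in-process `Map` does not model.
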