Episode 6 — Scaling, Reliability, Microservices & Web3 / 6.8 — Production Hardening
Interview Questions: Production Hardening
Model answers for rate limiting, CORS and secure headers, and DDoS protection.
How to use this material (instructions)
- Read lessons in order — README.md, then 6.8.a → 6.8.c.
- Practice out loud — definition → example → pitfall.
- Pair with exercises — 6.8-Exercise-Questions.md.
- Quick review — 6.8-Quick-Revision.md.
Beginner (Q1–Q4)
Q1. What is rate limiting and why do production APIs need it?
Why interviewers ask: Tests if you understand the most fundamental production security measure — every deployed API must have it.
Model answer:
Rate limiting restricts how many requests a client can make within a time window. Without it, a single client — whether a malicious attacker or a buggy script — can overwhelm your server, drive up cloud costs, brute-force authentication, or degrade service for everyone.
In production, you typically implement multiple layers: a global limit (e.g., 500 requests per 15 minutes per IP), endpoint-specific limits (e.g., 5 login attempts per 15 minutes), and tiered limits based on the caller's plan (free vs. paid). The de facto standard middleware for Express is express-rate-limit, and for multi-server deployments you back it with Redis so all instances share a single counter.
When a client exceeds the limit, the server returns HTTP 429 Too Many Requests along with headers like RateLimit-Limit, RateLimit-Remaining, and Retry-After so the client can self-regulate.
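The layered setup described above can be sketched with express-rate-limit. This is a minimal illustration, assuming the v7 option names (`limit` was `max` in v6); the route paths are examples, not from the source.

```javascript
// Minimal sketch — assumes express-rate-limit v7 option names.
import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

// Global layer: 500 requests per 15 minutes per IP.
app.use(rateLimit({
  windowMs: 15 * 60 * 1000,
  limit: 500,
  standardHeaders: true, // send RateLimit-* headers so clients can self-regulate
  legacyHeaders: false,  // drop the older X-RateLimit-* headers
  // For multi-server deployments, plug a Redis-backed store in here
  // so all instances share one counter.
}));

// Endpoint-specific layer: 5 login attempts per 15 minutes.
app.use('/api/auth/login', rateLimit({
  windowMs: 15 * 60 * 1000,
  limit: 5,
  skipSuccessfulRequests: true, // only failed attempts count toward the limit
}));
```

When the limit is exceeded, express-rate-limit responds with 429 by default, matching the behavior described below.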
Q2. What is CORS and why does the browser enforce it?
Why interviewers ask: CORS errors are one of the most common issues developers encounter — understanding the mechanism separates debugging from guessing.
Model answer:
CORS (Cross-Origin Resource Sharing) is a browser security mechanism built on top of the Same-Origin Policy (SOP). SOP prevents JavaScript on one origin (protocol + host + port) from reading responses from a different origin. Without SOP, any website you visit could silently call your bank's API using your cookies and steal data.
CORS allows servers to opt in to cross-origin access by sending specific response headers. For simple GET/POST requests, the server returns Access-Control-Allow-Origin with the allowed origin. For non-simple requests (PUT, DELETE, custom headers, JSON content-type), the browser sends an OPTIONS preflight first, and the server must respond with Access-Control-Allow-Methods and Access-Control-Allow-Headers.
```javascript
// Production CORS setup
import cors from 'cors';

app.use(cors({
  origin: ['https://app.example.com', 'https://admin.example.com'],
  credentials: true,
  methods: ['GET', 'POST', 'PUT', 'DELETE'],
  allowedHeaders: ['Content-Type', 'Authorization'],
  maxAge: 86400, // cache preflight responses for 24 hours
}));
```
The critical rule: never use Access-Control-Allow-Origin: * in production, especially with credentials. Whitelist specific origins.
Q3. What does Helmet.js do?
Why interviewers ask: Tests awareness of HTTP security headers — a simple middleware that prevents entire classes of attacks.
Model answer:
Helmet.js is an Express middleware that sets security-related HTTP headers with a single app.use(helmet()) call. It configures:
- Content-Security-Policy (CSP) — controls which scripts, styles, and resources the browser can load. The most powerful anti-XSS header.
- Strict-Transport-Security (HSTS) — forces browsers to use HTTPS only.
- X-Frame-Options — prevents other sites from embedding your page in an iframe (clickjacking protection). Helmet defaults to SAMEORIGIN; set it to DENY to block all framing.
- X-Content-Type-Options: nosniff — stops browsers from MIME-sniffing responses.
- Referrer-Policy — controls what referrer information is sent with requests.
It also removes X-Powered-By: Express, which leaks server technology information. Helmet is considered the minimum security baseline for any Express application in production.
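Beyond the one-liner, production apps usually tighten a few of these headers explicitly. A minimal sketch (the directive values here are illustrative choices, not Helmet's defaults):

```javascript
// Sketch: tightening Helmet's defaults for production.
import helmet from 'helmet';

app.use(helmet({
  contentSecurityPolicy: {
    directives: {
      defaultSrc: ["'self'"],
      scriptSrc: ["'self'"],   // no inline or third-party scripts
      objectSrc: ["'none'"],   // block plugins (Flash-era attack surface)
    },
  },
  hsts: {
    maxAge: 31536000,          // 1 year
    includeSubDomains: true,
  },
  frameguard: { action: 'deny' }, // X-Frame-Options: DENY
}));
```

The CSP directives are the part most likely to need per-app adjustment; a too-strict policy silently breaks legitimate scripts, so roll it out with `Content-Security-Policy-Report-Only` first.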
Q4. What are the three types of DDoS attacks?
Why interviewers ask: Verifies you understand the threat landscape and can reason about which defenses apply where.
Model answer:
Volumetric attacks (Layer 3/4) flood your network with massive data — UDP floods, ICMP floods, DNS amplification. Scale can reach terabits per second. Defense: CDN/CloudFront with massive edge bandwidth absorbs the traffic before it reaches your infrastructure.
Protocol attacks (Layer 3/4) exploit protocol weaknesses to exhaust connection-state resources. SYN floods send thousands of half-open TCP connections that fill your server's connection table. Defense: AWS Shield Standard handles these automatically.
Application-layer attacks (Layer 7) are the most dangerous for Node.js developers. They send seemingly legitimate HTTP requests — search queries, login attempts, API calls — that consume CPU, database connections, and memory. A Slowloris attack holds connections open by sending headers extremely slowly. Defense: WAF rate-based rules, application-level rate limiting, request timeouts, and queue-based processing for expensive endpoints.
Intermediate (Q5–Q8)
Q5. Compare token bucket and sliding window counter for rate limiting. When would you choose each?
Why interviewers ask: Shows depth beyond "I use express-rate-limit" — understanding algorithm trade-offs matters for system design.
Model answer:
Sliding window counter divides time into fixed windows and uses a weighted estimate from the previous window to provide smooth enforcement. It is simple, memory-efficient (stores only two counters per client), and prevents the boundary-burst problem of pure fixed windows.
```
Sliding window estimate = prev_count × (1 − window_progress) + current_count
Example (40% into the current window): 80 × 0.6 + 30 = 78 → under the 100 limit → allowed
```
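The weighted estimate can be illustrated with a small in-memory limiter. All names here are illustrative, not a library API:

```javascript
// Sliding window counter — minimal in-memory sketch (illustrative names).
function makeSlidingWindowLimiter(limit, windowMs) {
  const clients = new Map(); // key → { windowStart, prevCount, currCount }
  return function allow(key, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs;
    let c = clients.get(key);
    if (!c || c.windowStart !== windowStart) {
      // Carry the previous window's count forward only if it was adjacent.
      const prevCount =
        c && c.windowStart === windowStart - windowMs ? c.currCount : 0;
      c = { windowStart, prevCount, currCount: 0 };
      clients.set(key, c);
    }
    const progress = (now - windowStart) / windowMs; // 0..1 through the window
    const estimate = c.prevCount * (1 - progress) + c.currCount;
    if (estimate >= limit) return false; // would respond 429 here
    c.currCount += 1;
    return true;
  };
}
```

Because the estimate blends two adjacent windows, a burst right at a window boundary cannot double the effective rate, which is the fixed-window problem this algorithm exists to fix.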
Token bucket gives each client a bucket that refills at a steady rate. Requests consume tokens. An empty bucket means denial. The key difference: token bucket allows controlled bursts — a client that was idle for a while accumulates tokens and can briefly exceed the per-second rate.
```
Token bucket: capacity = 10, refill = 2 tokens/sec
After 5 s idle: 10 tokens accumulated → can send 10 requests instantly
Then: sustained rate of 2 requests/sec
```
Choose sliding window counter when you want steady, predictable enforcement (most REST APIs). Choose token bucket when bursts are acceptable and even desirable (Stripe, AWS — they want clients to be able to spike briefly after periods of low usage).
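A minimal token bucket sketch (illustrative names) showing the burst-then-sustain behavior described above:

```javascript
// Token bucket — minimal in-memory sketch (illustrative names).
function makeTokenBucket(capacity, refillPerSec) {
  const buckets = new Map(); // key → { tokens, last }
  return function allow(key, now = Date.now()) {
    let b = buckets.get(key);
    if (!b) {
      b = { tokens: capacity, last: now }; // new clients start with a full bucket
      buckets.set(key, b);
    }
    // Lazily refill based on elapsed time, capped at capacity.
    const elapsedSec = (now - b.last) / 1000;
    b.tokens = Math.min(capacity, b.tokens + elapsedSec * refillPerSec);
    b.last = now;
    if (b.tokens < 1) return false; // empty bucket → deny
    b.tokens -= 1;
    return true;
  };
}
```

The lazy-refill trick (computing tokens from elapsed time instead of running a timer) is what makes this cheap enough to keep one bucket per client.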
Q6. How do you handle CORS for a system with multiple frontends and a shared API?
Why interviewers ask: Tests practical CORS knowledge in a realistic multi-tenant or multi-app scenario.
Model answer:
Use a dynamic origin validation function instead of a static list:
```javascript
const allowedOrigins = new Set([
  process.env.WEB_APP_URL,     // https://app.example.com
  process.env.ADMIN_URL,       // https://admin.example.com
  process.env.MOBILE_WEB_URL,  // https://m.example.com
  process.env.PARTNER_URL,     // https://partner.thirdparty.com
].filter(Boolean));

app.use(cors({
  origin: (origin, callback) => {
    // Allow server-to-server (no Origin header) and native mobile apps
    if (!origin) return callback(null, true);
    if (allowedOrigins.has(origin)) {
      callback(null, true);
    } else {
      console.warn(`CORS blocked: ${origin}`);
      callback(new Error('Not allowed by CORS'));
    }
  },
  credentials: true,
  exposedHeaders: ['RateLimit-Limit', 'RateLimit-Remaining'],
  maxAge: 86400,
}));
```
Key considerations:
1. Store allowed origins in environment variables, not hardcoded.
2. Log blocked origins for debugging.
3. Never reflect the incoming Origin header without validation — that is equivalent to allowing everyone.
4. When using credentials (cookies), you cannot use `*` for origin — you must return the specific requesting origin.
5. Cache preflight responses with maxAge to reduce OPTIONS request volume.
6. Expose custom headers (RateLimit-*) via exposedHeaders, or the frontend cannot read them.
Q7. How do you protect an Express application against Slowloris attacks?
Why interviewers ask: Tests understanding of a non-obvious application-layer attack vector and Node.js-specific mitigations.
Model answer:
Slowloris opens many connections and sends HTTP headers extremely slowly — one byte every few seconds — keeping each connection alive without completing the request. Node.js has a limited connection pool, and these connections tie it up.
Mitigation involves server-level timeouts:
```javascript
const server = app.listen(3000);

// Max time to receive all headers (kills Slowloris)
server.headersTimeout = 20000;   // 20 seconds

// Max time to receive the complete request
server.requestTimeout = 30000;   // 30 seconds

// Close idle keep-alive connections
server.keepAliveTimeout = 5000;  // 5 seconds

// Overall socket inactivity timeout
server.timeout = 60000;          // 60 seconds
```
Additionally: (1) Connection limiting — track active connections and refuse new ones past a threshold. (2) Reverse proxy — put Nginx or an ALB in front of Express; they handle slow connections much better than Node.js. (3) CloudFront — terminates connections at the edge. (4) Rate limiting — limit total connections per IP.
The critical insight: headersTimeout is the Slowloris killer. Default Node.js headersTimeout is 60 seconds, which is far too generous. Set it to 10-20 seconds.
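The connection-limiting idea in point (1) can be sketched with Node's built-in `maxConnections` cap on `http.Server`. The threshold is an assumed example; per-IP tracking would sit on top of this:

```javascript
// Sketch: cap the connection pool alongside the timeouts above.
import http from 'node:http';

const server = http.createServer((req, res) => res.end('ok'));

// Node's built-in cap — sockets beyond this are dropped immediately,
// so a Slowloris swarm cannot exhaust the process's file descriptors.
server.maxConnections = 1000; // assumed threshold; tune to your capacity

// Short header window bounds how long any one slot can be held open.
server.headersTimeout = 15000;

// server.listen(3000) as usual.
```

Combined with the timeouts, this bounds total exposure: at most `maxConnections` sockets, each held for at most `headersTimeout` milliseconds before the slot frees up.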
Q8. Design a WAF rule set for an e-commerce API.
Why interviewers ask: Tests practical security architecture — combining multiple WAF rules into a coherent defense.
Model answer:
Priority order matters — AWS WAF evaluates rules in ascending priority and stops at the first rule with a terminating action (ALLOW or BLOCK); COUNT rules do not stop evaluation.
```
Rule 1 (Priority 1): IP Reputation Block
  → AWS Managed IP Reputation List
  → Block known malicious IPs, botnets, anonymizers
  → Action: BLOCK

Rule 2 (Priority 2): Rate Limiting
  → Rate-based rule: max 2000 requests / 5 minutes per IP
  → Action: BLOCK (auto-unblocks when rate drops)

Rule 3 (Priority 3): Geo-Restriction
  → Only allow countries where you have customers
  → Block: all countries NOT in (US, CA, GB, DE, FR, AU)
  → Action: BLOCK

Rule 4 (Priority 4): SQL Injection Protection
  → AWS Managed SQLi Rule Set
  → Inspects query strings, body, headers, URI
  → Action: BLOCK

Rule 5 (Priority 5): XSS Protection
  → AWS Managed XSS Rule Set
  → Action: BLOCK

Rule 6 (Priority 6): Known Bad Inputs
  → AWS Managed Known Bad Inputs Rule Set
  → Blocks Log4j, path traversal, etc.
  → Action: BLOCK

Rule 7 (Priority 7): Size Constraints
  → Block request body > 8KB (except /api/upload)
  → Block URI > 2KB
  → Action: BLOCK

Rule 8 (Priority 8): Bot Control
  → AWS Bot Control managed rule
  → Challenge/CAPTCHA suspected bots
  → Action: CAPTCHA

Default Action: ALLOW
```
This set is defense-in-depth: IP reputation catches known bad actors, rate limiting catches automated abuse, geo-blocking reduces attack surface, managed rules catch common web vulnerabilities, and size constraints prevent payload-based attacks.
Advanced (Q9–Q11)
Q9. Design a distributed rate limiting system that handles 100K requests per second across 50 server instances.
Why interviewers ask: Tests system design under extreme scale — rate limiting itself must not become the bottleneck.
Model answer:
At 100K req/s, a naive "check Redis on every request" approach creates a Redis bottleneck. The design uses local + global rate limiting:
Architecture:
```
      Client Request
            │
            ▼
┌───────────────────────┐
│  Local Token Bucket   │  Each server keeps in-memory token
│     (per server)      │  buckets and serves ~80% of its
└───────────┬───────────┘  per-server quota locally.
            │  Only when the local bucket is exhausted,
            │  or every N seconds for sync
            ▼
┌───────────────────────┐
│     Redis Cluster     │  Global counters. Servers sync
│    (shared state)     │  periodically (every 1-2 seconds),
│  Lua atomic scripts   │  not on every request.
└───────────────────────┘
```
Implementation strategy:
- Split the global limit across servers. If the limit is 1000/min and you have 50 servers, each server gets a local budget of 20/min.
- Local token bucket handles most requests purely in-memory — zero network overhead.
- Periodic sync (every 1-2 seconds) — each server reports its count to Redis and fetches the global count. If the global count is approaching the limit, servers reduce their local budgets.
- Lua scripts in Redis ensure atomic check-and-increment.
- Redis Cluster (not single Redis) for high availability and throughput.
- Approximate is acceptable — at this scale, being 5% over the limit occasionally is fine. The goal is protection, not perfect accounting.
Trade-off: This system allows brief over-limit bursts (the time between syncs) but eliminates Redis as a bottleneck. For 99% of use cases, this approximation is sufficient.
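A toy sketch of the local-budget idea (all names illustrative; the Redis round trip is reduced to a `sync` call the caller invokes on its 1-2 second timer):

```javascript
// Sketch: local budget with periodic global sync (illustrative names).
function makeLocalBudgetLimiter(globalLimit, serverCount) {
  let localBudget = Math.floor(globalLimit / serverCount); // this server's share
  let used = 0;

  return {
    // Hot path: purely in-memory, zero network overhead.
    allow() {
      if (used >= localBudget) return false; // real system would consult Redis here
      used += 1;
      return true;
    },
    // Cold path, called every 1-2 s: report `used` to Redis, read the
    // fleet-wide total back, and shrink the local share as the global
    // count approaches the limit.
    sync(globalUsed) {
      const remaining = Math.max(0, globalLimit - globalUsed);
      localBudget = Math.max(1, Math.floor(remaining / serverCount));
      used = 0;
    },
  };
}
```

The hot path never touches the network, which is the whole point: at 100K req/s the synchronization cost is amortized across thousands of requests per sync interval.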
Q10. A production incident: your API is returning 403 to some legitimate users. Logs show CORS errors, WAF blocks, and rate limit hits all at the same time. How do you triage?
Why interviewers ask: Tests incident response skills and understanding of how security layers interact and can cause false positives.
Model answer:
Step 1: Categorize the errors.
Check logs by error type:
- CORS errors (browser console, not server logs) — the browser blocked the response. Check if a new frontend origin was deployed without updating the CORS whitelist.
- WAF blocks (AWS WAF logs in CloudWatch/S3) — check which rule is triggering. Look at the terminatingRuleId. Common false positive: a legitimate request body that matches a SQL injection pattern (e.g., a product description containing "SELECT").
- Rate limit 429s (application logs) — check if a legitimate high-traffic client (corporate NAT, a partner integration) is hitting the limit.
Step 2: Correlate by client identity.
Group the errors by IP, user ID, and API key. If the same users are hitting multiple errors, the root cause is likely one layer affecting the others.
Step 3: Check recent deployments.
Did someone change the CORS whitelist, WAF rules, or rate limit thresholds? Roll back the change if it correlates with the incident timeline.
Step 4: Fix by priority.
- CORS: Add the missing origin to the whitelist. Deploy immediately.
- WAF false positive: Add an exception rule (scope it narrowly to the specific URI/pattern). Do NOT disable the entire rule set.
- Rate limit: Temporarily increase the limit for the affected client, then work with them on proper API key throttling.
Step 5: Post-mortem.
Add monitoring alerts for each layer: CORS rejection rate, WAF block rate, 429 response rate. Establish baselines so spikes trigger alerts before users complain.
Q11. How would you design a complete production hardening strategy for a new microservices platform launching to 1 million users?
Why interviewers ask: Tests the ability to synthesize all production hardening concepts into a coherent architecture.
Model answer:
Traffic Flow:
```
DNS (Route 53)
  → CloudFront (CDN + Shield Standard + WAF)
  → ALB (HTTPS termination, health checks)
  → API Gateway service (rate limiting, auth)
  → Microservices (business logic)
  → Databases (RDS, DynamoDB, Redis)
```
Layer 1 — Edge (CloudFront + Shield):
- Enable Shield Standard (free, automatic L3/L4 protection).
- CloudFront caches static assets and absorbs volumetric attacks.
- Consider Shield Advanced if revenue justifies the $3K/month.
Layer 2 — WAF:
- IP reputation list (managed rule).
- Rate-based rule: 3000 req/5min per IP.
- SQL injection + XSS managed rules.
- Known bad inputs managed rule.
- Geo-restriction to target markets.
- Bot Control for sensitive endpoints.
Layer 3 — API Gateway Service:
- Distributed rate limiting with Redis (tiered by plan: free/basic/pro/enterprise).
- Authentication and API key validation.
- Request ID injection for tracing.
Layer 4 — Each Microservice:
- Helmet.js for secure headers.
- CORS whitelist (only the gateway should call services; services reject direct browser access).
- Request size limits (1MB JSON, 5MB uploads).
- Request timeouts (30s default, 5s for health checks).
- Connection limits.
Layer 5 — Data:
- Database connection pooling with limits.
- Query timeouts.
- Read replicas for expensive queries.
Monitoring:
- CloudWatch dashboards for WAF blocks, Shield events, 4xx/5xx rates.
- Alerts on rate limit spike (>10% requests getting 429).
- Alert on WAF block spike.
- Runbook for each alert type.
```javascript
// Gateway service — core hardening middleware stack
// (helper middlewares and configs defined elsewhere in the codebase)
app.set('trust proxy', 1);               // behind ALB/CloudFront
app.use(enforceHttps);
app.use(helmet(productionHelmetConfig));
app.use(cors(productionCorsConfig));
app.use(express.json({ limit: '1mb' }));
app.use(globalRateLimiter);              // Redis-backed, 500/15min
app.use('/api/auth', authLimiter);       // 5/15min, skip successes
app.use(requestTimeout(30000));
app.use(requestIdMiddleware);
app.use(authMiddleware);
app.use('/api/', tieredApiLimiter);      // by API key plan
```
The key principle: defense in depth. Each layer assumes the previous layer might fail. The WAF assumes CloudFront might let something through. The rate limiter assumes the WAF might not catch it. The application code assumes the rate limiter might not be enough.
Quick-fire
| # | Question | One-line answer |
|---|---|---|
| 1 | HTTP status for rate limited? | 429 Too Many Requests |
| 2 | Which algorithm allows controlled bursts? | Token bucket |
| 3 | CORS wildcard with credentials? | Forbidden — browser blocks it |
| 4 | What does HttpOnly on a cookie prevent? | JavaScript access (XSS cookie theft) |
| 5 | AWS Shield Standard cost? | Free — included with all AWS services |
| 6 | What triggers a CORS preflight? | Non-simple requests (PUT, DELETE, custom headers, JSON content-type) |
| 7 | Most powerful anti-XSS header? | Content-Security-Policy |
| 8 | Slowloris killer setting? | server.headersTimeout (set to 10-20s) |
| 9 | What does app.disable('x-powered-by') do? | Removes X-Powered-By: Express header |
| 10 | WAF stands for? | Web Application Firewall |
← Back to 6.8 — Production Hardening (README)