Episode 6 — Scaling, Reliability, Microservices, Web3 / 6.6 — Caching in Production
6.6 — Interview Questions: Caching in Production
Model answers for Redis caching, cache invalidation, and TTL strategies.
How to use this material (instructions)
- Read lessons in order — README.md, then 6.6.a → 6.6.c.
- Practice out loud — definition, example, pitfall.
- Pair with exercises — 6.6-Exercise-Questions.md.
- Quick review — 6.6-Quick-Revision.md.
Beginner (Q1–Q4)
Q1. What is caching and why do we use it in production?
Why interviewers ask: Tests fundamental understanding. Engineers who cannot explain caching will struggle with performance optimization, which is a daily concern in production systems.
Model answer:
Caching is storing frequently accessed data in a fast, temporary storage layer (like Redis or in-memory) so that subsequent requests can be served without hitting the slower primary data source (database, external API).
We use caching to solve three production problems: latency (a Redis lookup takes < 1ms vs 50-500ms for a database query), throughput (Redis handles 100K+ operations/second vs a few thousand for most databases), and cost (fewer database queries = smaller database instance = lower cloud bills).
The fundamental tradeoff: caching trades memory and freshness for speed and scalability. Cached data may be slightly stale, and you need memory to store it, but the performance gains are typically 10-100x.
```javascript
// Without cache: every request hits the database
app.get('/api/products/:id', async (req, res) => {
  const product = await db.findOne({ _id: req.params.id }); // 50-200ms
  res.json(product);
});

// With cache: 95%+ of requests skip the database entirely
app.get('/api/products/:id', async (req, res) => {
  const cached = await redis.get(`product:${req.params.id}`); // < 1ms
  if (cached) return res.json(JSON.parse(cached));
  const product = await db.findOne({ _id: req.params.id });
  await redis.set(`product:${req.params.id}`, JSON.stringify(product), 'EX', 900);
  res.json(product);
});
```
Q2. What is Redis and why is it the industry standard for caching?
Why interviewers ask: Redis appears in virtually every production stack. Interviewers want to confirm you have practical experience with it, not just theoretical knowledge.
Model answer:
Redis (Remote Dictionary Server) is an open-source, in-memory data structure store. It keeps all data in RAM, which is why reads and writes complete in sub-millisecond times. Unlike a simple key-value store, Redis supports rich data structures: strings, hashes, lists, sets, sorted sets, and more.
Redis is the industry standard because it combines five qualities no other tool matches simultaneously:
- Speed — 100K-500K operations/second, < 1ms latency
- Data structures — not just get/set; hashes for partial object updates, sorted sets for leaderboards, lists for queues
- Built-in TTL — keys auto-expire, perfect for cache lifecycles
- Shared state — all app server instances read from the same cache (unlike in-process caches)
- Ecosystem — pub/sub for invalidation, Lua scripting for atomic operations, clustering for horizontal scaling
In production, Redis commonly serves as: a cache layer, a session store, a rate limiter, a pub/sub message broker, and a job queue — often all in the same deployment.
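Several of those roles reduce to a handful of Redis commands. As one concrete illustration, here is a minimal sketch of a fixed-window rate limiter; the helper name, key layout, and defaults are assumptions, and the client is injected so any object exposing `incr`/`expire` (node-redis v4 style) works:

```javascript
// Minimal sketch: a fixed-window rate limiter, one of the Redis roles above.
// `client` only needs incr/expire; it is injected rather than imported so the
// helper stays client-agnostic. Key layout and defaults are illustrative.
async function rateLimit(client, userId, limit = 100, windowSec = 60) {
  const window = Math.floor(Date.now() / 1000 / windowSec);
  const key = `ratelimit:${userId}:${window}`;
  const count = await client.incr(key);                  // atomic under concurrency
  if (count === 1) await client.expire(key, windowSec);  // the window self-destructs
  return count <= limit;                                 // true = allow the request
}
```

Because INCR is atomic, concurrent requests cannot undercount, and the EXPIRE means the counter disappears on its own — there is nothing to clean up.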
Q3. Explain the cache-aside pattern.
Why interviewers ask: Cache-aside is the most widely used caching pattern. Understanding it (and its limitations) is table-stakes knowledge for backend engineers.
Model answer:
Cache-aside (also called lazy loading) means the application is responsible for managing the cache. The cache is a passive store — it does not load data on its own.
Read flow:
- Application checks the cache for the requested key
- Cache HIT — return the cached data immediately
- Cache MISS — query the database, store the result in the cache (with a TTL), return the data
Write flow:
- Application writes the update to the database (source of truth)
- Application deletes the cache key (invalidation)
- The next read will miss the cache and re-populate it with fresh data
```javascript
// Read path
async function getProduct(id) {
  const cached = await redis.get(`product:${id}`);
  if (cached) return JSON.parse(cached); // HIT

  const product = await db.findOne({ _id: id }); // MISS
  if (product) {
    await redis.set(`product:${id}`, JSON.stringify(product), 'EX', 600);
  }
  return product;
}

// Write path
async function updateProduct(id, data) {
  await db.updateOne({ _id: id }, { $set: data });
  await redis.del(`product:${id}`); // Invalidate — don't update
}
```
Key detail: On writes, we delete the cache key rather than updating it. This is safer because it avoids race conditions where two concurrent writes could leave the cache in an inconsistent state. The next read always fetches the freshest data from the database.
Limitation: The first request after invalidation (or after a cold start) always hits the database. For very hot keys, this cold miss can be problematic (cache stampede).
Q4. What is TTL and how do you choose the right value?
Why interviewers ask: TTL selection reveals whether a candidate thinks about data freshness vs performance tradeoffs — a critical production skill.
Model answer:
TTL (Time To Live) is the duration in seconds that a cached value remains valid. When the TTL expires, Redis automatically deletes the key. The next request triggers a cache miss and a fresh database query.
Choosing the right TTL depends on the data's change frequency and the tolerance for staleness:
| Data Type | TTL | Reasoning |
|---|---|---|
| Static config / feature flags | 5-15 minutes | Rarely changes, but updates should propagate within minutes |
| Product catalog | 15-60 minutes | Changes infrequently, slight staleness is acceptable |
| Product prices | 1-5 minutes | Stale prices can cause customer complaints or overselling |
| Inventory / stock counts | 30-60 seconds | High-frequency changes, staleness causes overselling |
| User sessions | 24-48 hours | Long-lived by design, invalidated explicitly on logout |
| Real-time data (stock prices) | 5-15 seconds | Must be very fresh |
The golden rule: every cached key should have a TTL. Caching without expiration is a ticking time bomb — if invalidation logic has a bug, stale data persists forever. TTL is your safety net.
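The table and the golden rule can be enforced mechanically. A sketch, with TTL values taken from the table above (the map keys and helper name are assumptions):

```javascript
// Sketch: per-data-type TTLs from the table above, plus a setter that
// refuses to cache anything without an expiry — the golden rule in code.
const TTL_SECONDS = {
  config: 900,      // static config / feature flags (15 min)
  product: 3600,    // product catalog (60 min)
  price: 300,       // prices (5 min)
  stock: 60,        // inventory counts (60 s)
  session: 172800,  // user sessions (48 h)
};

async function cacheSet(redis, type, id, value) {
  const ttl = TTL_SECONDS[type];
  if (!ttl) throw new Error(`No TTL for "${type}" — refusing to cache forever`);
  await redis.set(`${type}:${id}`, JSON.stringify(value), 'EX', ttl);
}
```

Centralizing TTLs in one map also makes them easy to review and tune as data-change frequency shifts.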
Intermediate (Q5–Q8)
Q5. Explain cache invalidation and why it is considered one of the hardest problems in computer science.
Why interviewers ask: This separates engineers who have dealt with production caching bugs from those who have only read tutorials. Real-world invalidation is messy.
Model answer:
Cache invalidation is the process of removing or updating cached data when the source data changes. It is "hard" because of three fundamental challenges:
1. Distributed state. The cache and database are separate systems. There is no atomic "update both at the same time" operation. Between writing to the DB and invalidating the cache, there is always a window where they disagree.
2. Derived data. When a user changes their name, you need to invalidate not just user:123 but every cache key that contains that user's name: team pages, leaderboards, search indices, comment displays, email templates. Tracking all these dependencies is error-prone.
3. Concurrency. Two simultaneous writes can create a permanent stale state:
```
t=0  Request A reads user from DB (gets old data, slow query)
t=1  Request B updates user in DB
t=2  Request B deletes user from cache (correct invalidation)
t=3  Request A stores OLD data in cache (from its slow read at t=0)
     Cache now has stale data with no future invalidation scheduled
```
In production, we address this with layered defenses: event-based invalidation (delete on write) for immediate freshness, TTL as a safety net (stale data expires eventually), and version-based keys or locking for high-concurrency scenarios. No single strategy is perfect — you combine them based on the data's criticality.
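Of those defenses, version-based keys are the least obvious, so here is a minimal sketch (the key layout, the `findUser`/`updateUser` helper names, and the 600s TTL are assumptions). Writers bump a counter; readers embed it in the key, so the t=3 write above lands under an old key that no future read looks at:

```javascript
// Sketch: version-based keys. A racing reader that stores stale data stores
// it under the OLD version, which nobody reads again; the TTL garbage-
// collects abandoned versions. db.findUser/updateUser are assumed helpers.
async function getUserCached(redis, db, id) {
  const version = (await redis.get(`user:${id}:ver`)) || '0';
  const key = `user:${id}:v${version}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);
  const user = await db.findUser(id);
  await redis.set(key, JSON.stringify(user), 'EX', 600);
  return user;
}

async function updateUser(redis, db, id, data) {
  await db.updateUser(id, data);
  await redis.incr(`user:${id}:ver`); // invalidate by moving readers to a new key
}
```

The cost is one extra Redis round-trip per read (the version lookup), which is why this pattern is reserved for high-concurrency, correctness-critical data.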
Q6. What is cache stampede and how do you prevent it?
Why interviewers ask: Cache stampede is a real production incident trigger. Engineers who have handled traffic spikes know this pattern intimately.
Model answer:
Cache stampede (thundering herd) occurs when a popular cache key expires and many concurrent requests simultaneously discover the cache is empty. All of them query the database at the same time, potentially overwhelming it.
Example: A product page cached with a 10-minute TTL receives 1,000 requests/second. When the key expires, all 1,000 requests in that second miss the cache and each independently queries the database — instead of 1 query, you get 1,000 identical queries.
Three prevention strategies:
1. Locking (Mutex): Only one request fetches from the database. Others wait.
```javascript
async function getWithLock(key, fetchFn, ttl) {
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const lockAcquired = await redis.set(`lock:${key}`, '1', 'NX', 'EX', 10);
  if (lockAcquired) {
    try {
      const data = await fetchFn();
      await redis.set(key, JSON.stringify(data), 'EX', ttl);
      return data;
    } finally {
      await redis.del(`lock:${key}`); // release even if fetchFn throws
    }
  }

  // Another request holds the lock — wait briefly, then retry
  await new Promise(r => setTimeout(r, 100));
  return getWithLock(key, fetchFn, ttl);
}
```
2. Probabilistic early expiration: Randomly refresh the cache before it expires. As TTL decreases, the probability of a background refresh increases, spreading load over time instead of concentrating it at expiration.
3. Background refresh: A background job monitors hot keys and refreshes them before their TTL runs out, so the key never actually expires under traffic.
I use locking for most cases, and background refresh for keys that receive extremely high traffic (homepage, trending content).
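Strategy 2 is easy to describe but rarely shown in code, so here is a sketch (the `beta` weight and key handling are assumptions, loosely following the "XFetch" idea): the closer a key is to expiry, the likelier the current request triggers a background refresh.

```javascript
// Sketch of probabilistic early expiration: each cache hit rolls a weighted
// die; as the remaining TTL shrinks, an early background refresh becomes
// more likely, so refreshes spread out instead of piling up at expiry.
async function getWithEarlyRefresh(redis, key, fetchFn, ttl, beta = 0.1) {
  const cached = await redis.get(key);
  if (cached) {
    const remaining = await redis.ttl(key); // seconds until expiry
    if (remaining < ttl * beta * -Math.log(Math.random())) {
      fetchFn()
        .then(d => redis.set(key, JSON.stringify(d), 'EX', ttl))
        .catch(() => {}); // a failed early refresh just falls back to normal expiry
    }
    return JSON.parse(cached); // the caller never waits for the refresh
  }
  const data = await fetchFn();
  await redis.set(key, JSON.stringify(data), 'EX', ttl);
  return data;
}
```

Raising `beta` refreshes earlier (less stampede risk, more database load); lowering it defers refreshes closer to expiry.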
Q7. Compare HTTP caching (Cache-Control headers) vs application caching (Redis). When do you use each?
Why interviewers ask: Full-stack engineers need to understand caching at every layer. Many candidates only know Redis and overlook HTTP caching, which can eliminate server requests entirely.
Model answer:
| Aspect | HTTP Caching | Application Caching (Redis) |
|---|---|---|
| Where | Browser + CDN (before server) | Server-side (after server, before DB) |
| Benefit | Request never reaches the server | Server skips the database |
| Control | Response headers | Application code |
| Invalidation | TTL, ETag, 304 Not Modified | DEL, TTL, pub/sub |
| Scope | public (everyone) or private (one user) | All server instances |
| Best for | Static assets, public API responses | DB query results, computed data, sessions |
In production, you layer both:
Request → Browser Cache → CDN Cache → App Server → Redis → Database
Each layer catches requests before they reach the next layer.
```javascript
// Layer 1: HTTP headers (browser + CDN)
res.set('Cache-Control', 'public, max-age=60, s-maxage=300');

// Layer 2: Redis (application cache)
const cached = await redis.get('products:featured');
if (cached) return res.json(JSON.parse(cached));
```
HTTP caching is more impactful when it works — a browser cache hit means zero network traffic, zero server load. But it is harder to invalidate (you cannot "delete" a browser cache remotely). Redis is fully under your control but only helps after the request reaches your server.
Rule of thumb: Use HTTP caching for public, read-heavy content (product pages, images, API responses). Use Redis for user-specific data, computed results, and anything you need to invalidate precisely.
Q8. What is the stale-while-revalidate pattern?
Why interviewers ask: This pattern is increasingly used in modern web architectures (both HTTP and application-level). Understanding it shows awareness of modern best practices.
Model answer:
Stale-while-revalidate means serving expired (stale) cached data immediately to the user while fetching fresh data in the background. The user gets a fast response (even if slightly outdated), and the cache is updated for the next request.
HTTP level — using the Cache-Control directive:
```
Cache-Control: public, max-age=60, stale-while-revalidate=300
```
This means: data is fresh for 60 seconds. After that, for up to 300 more seconds, serve the stale data immediately while revalidating in the background. After 360 seconds total, the cache is truly expired.
Application level — using Redis with two keys:
```javascript
async function staleWhileRevalidate(key, fetchFn, freshTTL, staleTTL) {
  const fresh = await redis.get(`fresh:${key}`);
  if (fresh) return JSON.parse(fresh); // Still fresh

  const stale = await redis.get(`stale:${key}`);
  if (stale) {
    // Serve stale immediately, refresh in background
    fetchFn()
      .then(data => {
        redis.set(`fresh:${key}`, JSON.stringify(data), 'EX', freshTTL);
        redis.set(`stale:${key}`, JSON.stringify(data), 'EX', staleTTL);
      })
      .catch(() => {}); // a failed refresh just means we keep serving stale
    return JSON.parse(stale); // Return stale now
  }

  // No data at all — must wait
  const data = await fetchFn();
  await redis.set(`fresh:${key}`, JSON.stringify(data), 'EX', freshTTL);
  await redis.set(`stale:${key}`, JSON.stringify(data), 'EX', staleTTL);
  return data;
}
```
Why it matters: In a traditional cache miss, the user waits 200ms+ for a database query. With stale-while-revalidate, the user gets a response in < 1ms, and the data is "at most one cache cycle behind." For most content (product listings, profiles, feeds), this is an excellent tradeoff.
Advanced (Q9–Q11)
Q9. Design a multi-layer caching architecture for a high-traffic e-commerce site.
Why interviewers ask: This is a system design question that tests your ability to combine caching concepts into a coherent production architecture. Senior/staff-level question.
Model answer:
I would implement four caching layers, each with specific responsibilities:
Layer 1: Browser Cache
- What: static assets (JS, CSS, images) + public API responses
- TTL: static assets 1 year (immutable, hashed filenames); API responses 60s (max-age) + 300s (stale-while-revalidate)
- Headers: Cache-Control: public, max-age=60, stale-while-revalidate=300

Layer 2: CDN (CloudFront / Cloudflare)
- What: product pages, category pages, images, static assets
- TTL: s-maxage=300 for API responses, 1 year for hashed static assets
- Purge: API call on product update, deploy, or price change
- Benefit: sub-20ms responses from an edge server near the user

Layer 3: Redis Application Cache
- What: database query results, computed aggregations, sessions
- TTL: varies per data type (see table below)
- Pattern: cache-aside with event-based invalidation + TTL safety net
- Stampede: mutex locking for popular keys

Layer 4: Database Engine Cache
- What: hot data and index pages kept in RAM by the storage engine (e.g. WiredTiger in MongoDB)
- TTL: managed by the database engine
TTL strategy by endpoint:
| Endpoint | Browser | CDN | Redis | Invalidation |
|---|---|---|---|---|
| Product page | 60s | 300s | 1800s | On update + pub/sub |
| Search results | 30s | 60s | 300s | TTL only (too many combinations) |
| Cart | no-store | None | None | Always real-time |
| User profile | 60s (private) | None | 300s | On profile edit |
| Homepage featured | 60s | 300s | 600s | On feature flag change |
| Checkout/payment | no-store | None | None | Never cache |
Invalidation architecture:
```javascript
// Product Service publishes to Redis pub/sub
await redis.publish('cache:invalidation', JSON.stringify({
  entity: 'product', id: productId, action: 'updated'
}));

// Every service subscribes and invalidates its own cached data
subscriber.on('message', (channel, msg) => {
  const { entity, id } = JSON.parse(msg);
  redis.del(`${entity}:${id}`);         // Direct key
  redis.del(`${entity}:${id}:details`); // Derived keys
});
```
Monitoring: Track cache hit rate (target: > 90%), average latency (< 5ms for cache hit), and staleness (max age of served data). Alert when hit rate drops below 80%.
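A sketch of the hit-rate side of that monitoring, tracked in-process (a real deployment would export these as metrics, e.g. Prometheus counters; all names here are assumptions):

```javascript
// Sketch: count cache lookups and compute the hit rate the alerts key off.
const stats = { hits: 0, misses: 0 };

function recordLookup(hit) { hit ? stats.hits++ : stats.misses++; }

function hitRate() {
  const total = stats.hits + stats.misses;
  return total === 0 ? 1 : stats.hits / total; // no traffic = nothing to alert on
}

// Page the on-call when the rate sags below the 80% alert threshold above.
function shouldAlert() { return hitRate() < 0.8; }
```

Call `recordLookup(true)` on every cache hit and `recordLookup(false)` on every miss, and sample `hitRate()` periodically into your dashboard.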
Q10. You deployed a price change for a flash sale, but 40% of users still see the old price 10 minutes later. Walk through your debugging process.
Why interviewers ask: This is a real production incident scenario. Tests ability to systematically diagnose multi-layer caching issues under pressure.
Model answer:
I would check each caching layer from outermost (closest to user) to innermost:
Step 1: Browser cache. Check the Cache-Control headers on the product API response. If max-age=3600, the browser cached the old price for up to 1 hour and will not even ask the server for fresh data. Fix: Reduce max-age to 60-120 seconds for price-sensitive endpoints, or use no-cache with ETag validation.
Step 2: CDN cache. Check if the CDN is serving a cached version. Use curl -I to inspect response headers — look for CDN cache status headers (e.g., X-Cache: HIT). If the CDN still has the old price, purge the CDN cache for the affected URLs. Fix: Trigger CDN purge as part of the price update workflow.
Step 3: Redis application cache. Connect to Redis and check the key:
```
redis-cli GET product:12345
redis-cli TTL product:12345
```
If the key still contains the old price, the invalidation on write failed or was not triggered. Fix: Verify the update code calls redis.del() after the DB update. Check logs for Redis connection errors that might have caused a silent failure.
Step 4: Database. Verify the price was actually updated in the database:
```javascript
db.collection('products').findOne({ _id: '12345' }, { projection: { price: 1 } });
```
If the DB still has the old price, the update itself failed.
Step 5: Race condition. If some users see the new price and others see the old price, the issue is likely at the CDN or browser layer. Users whose browser cache hasn't expired still see old data. Immediate fix: Force a CDN purge and add cache-busting query parameters to the product API URL.
Prevention: For price-critical data, use short TTL (1-5 min) at every layer, trigger CDN purge on every price change, and consider no-cache with ETag for the product detail endpoint so every request revalidates.
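That prevention advice amounts to a single write path that invalidates every layer it can reach, in source-of-truth-first order. A sketch (the `cdn.purge` call stands in for your CDN's purge API — CloudFront and Cloudflare each expose their own — and the paths are illustrative):

```javascript
// Sketch: one price-update workflow touching each reachable cache layer.
// cdn.purge is a placeholder for a real CDN purge client; the browser layer
// is covered separately by short max-age / ETag headers on the endpoint.
async function updatePrice(db, redis, cdn, productId, newPrice) {
  await db.updateOne({ _id: productId }, { $set: { price: newPrice } }); // 1. database
  await redis.del(`product:${productId}`);                              // 2. Redis
  await cdn.purge([`/api/products/${productId}`]);                      // 3. CDN edge
}
```

Writing the database first matters: if the purge ran first, a request arriving in between would re-cache the old price.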
Q11. How would you design a caching system that handles 1 million requests per second with 99.99% availability?
Why interviewers ask: This is a staff/principal-level question testing deep infrastructure knowledge, failure mode thinking, and scaling patterns.
Model answer:
At 1M req/sec, a single Redis instance (max ~200-500K ops/sec) is not enough. Here is the architecture:
Redis Cluster with read replicas:
```
          ┌─────────────────────┐
          │  Application Layer  │
          │  (1000+ instances)  │
          └──────────┬──────────┘
                     │
          ┌──────────▼──────────┐
          │  Redis Proxy Layer  │
          │ (consistent hashing)│
          └──────────┬──────────┘
        ┌────────────┼────────────┐
        ▼            ▼            ▼
  ┌──────────┐ ┌──────────┐ ┌──────────┐
  │ Shard 1  │ │ Shard 2  │ │ Shard 3  │
  │ Primary  │ │ Primary  │ │ Primary  │
  ├──────────┤ ├──────────┤ ├──────────┤
  │ Replica  │ │ Replica  │ │ Replica  │
  │ Replica  │ │ Replica  │ │ Replica  │
  └──────────┘ └──────────┘ └──────────┘
```
Key design decisions:
1. Sharding: Redis Cluster automatically partitions keys across 16,384 hash slots distributed across multiple primary nodes. Each shard handles ~200-300K ops/sec, so 5-6 shards handle 1M+.
2. Read replicas: Each shard has 2 read replicas. Reads go to replicas (majority of traffic), writes go to the primary. This multiplies read throughput by 3x per shard.
3. Local in-process cache (L1): Add an in-process LRU cache (Node.js lru-cache) for the hottest keys. This absorbs 30-50% of reads before they even hit Redis, reducing network round-trips.
```javascript
import { LRUCache } from 'lru-cache';

const localCache = new LRUCache({ max: 10000, ttl: 5000 }); // 5s local TTL

async function get(key) {
  // L1: Local memory (0ms)
  const local = localCache.get(key);
  if (local) return local;

  // L2: Redis cluster (< 1ms)
  const remote = await redis.get(key);
  if (remote) {
    const parsed = JSON.parse(remote);
    localCache.set(key, parsed);
    return parsed;
  }
  return null; // Cache miss at all levels
}
```
4. Failure modes and 99.99% availability:
- Single shard failure: Automatic failover to replica (Redis Sentinel or Cluster auto-failover). Failover takes 1-5 seconds.
- Redis completely down: Application falls through to database with circuit breaker. Local cache serves hot keys for the 5-second TTL window.
- Network partition: Redis Cluster continues serving with available shards. Requests for unavailable shards fall through to DB.
- Thundering herd after recovery: Mutex locking + stale-while-revalidate prevents stampede when cache comes back online.
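The "Redis completely down" case deserves an explicit circuit breaker. A minimal sketch (thresholds and names are illustrative): after repeated Redis failures the breaker opens, and reads go straight to the database until a cool-down elapses, instead of paying a connection timeout on every request.

```javascript
// Sketch: a minimal circuit breaker guarding Redis reads. After
// `failureLimit` consecutive errors the breaker opens for `resetMs`,
// during which lookups skip Redis entirely and hit the database.
function makeBreaker(failureLimit = 5, resetMs = 10000) {
  let failures = 0, openedAt = 0;
  return {
    isOpen() { return failures >= failureLimit && Date.now() - openedAt < resetMs; },
    recordFailure() { failures++; if (failures >= failureLimit) openedAt = Date.now(); },
    recordSuccess() { failures = 0; },
  };
}

async function getWithFallback(redis, dbFetch, breaker, key) {
  if (!breaker.isOpen()) {
    try {
      const cached = await redis.get(key);
      breaker.recordSuccess();
      if (cached) return JSON.parse(cached);
    } catch {
      breaker.recordFailure(); // Redis is failing — fall through to the DB
    }
  }
  return dbFetch(); // the source of truth is the last resort
}
```

After `resetMs` the breaker half-closes: the next request probes Redis again, and a success resets the failure count.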
5. Monitoring: Track per-shard memory usage, replication lag, command latency p99, and cache hit ratio. Alert on: hit ratio < 85%, latency p99 > 5ms, replication lag > 1 second, memory > 80%.
Quick-fire
| # | Question | One-line answer |
|---|---|---|
| 1 | Redis stores data in... | RAM (in-memory), not disk |
| 2 | Most common caching pattern? | Cache-aside (lazy loading) |
| 3 | On write: delete or update cache? | Delete — safer, avoids race conditions |
| 4 | DB first or cache first on write? | Database first, then invalidate cache |
| 5 | What prevents cache stampede? | Mutex lock, probabilistic early expiry, or background refresh |
| 6 | no-cache vs no-store? | no-cache = cache but revalidate; no-store = never cache |
| 7 | What is an ETag? | Content fingerprint — enables 304 Not Modified responses |
| 8 | s-maxage vs max-age? | s-maxage = CDN TTL; max-age = browser TTL |
| 9 | Default Redis port? | 6379 |
| 10 | Every cache key should have? | A TTL — never cache without expiration |
← Back to 6.6 — Caching in Production (README)