Episode 9 — System Design / 9.9 — Core Infrastructure
9.9.a Caching Strategies
Why Caching Matters
Caching is one of the most impactful performance optimizations in system design. By storing frequently accessed data closer to the consumer, caching reduces latency, decreases load on databases, and improves throughput.
The numbers tell the story:
| Operation | Latency |
|---|---|
| L1 cache reference | ~0.5 ns |
| L2 cache reference | ~7 ns |
| RAM reference | ~100 ns |
| Read from SSD | ~150,000 ns |
| Read from HDD | ~10,000,000 ns |
| Network round trip (same datacenter) | ~500,000 ns |
| Network round trip (cross-continent) | ~150,000,000 ns |
A cache hit on Redis (~1 ms) versus a database query (~10-50 ms) is a 10-50x improvement. At scale, this difference can decide whether a system stays up under load.
Caching Patterns
1. Cache-Aside (Lazy Loading)
The application is responsible for reading and writing to the cache. The cache sits beside the database, not between.
Cache-Aside Pattern
Application                         Cache (Redis)
     |                                   |
     |--- 1. Check cache --------------->|
     |<-- 2. Cache MISS -----------------|
     |                                   |
     |--- 3. Query DB ------------------------> Database
     |<-- 4. Return data ----------------------|
     |                                   |
     |--- 5. Write to cache ------------>|
     |                                   |

NEXT REQUEST (Cache HIT):
     |--- 1. Check cache --------------->|
     |<-- 2. Return data ----------------|   (DB never touched)
How it works:
- Application checks the cache first
- On cache miss, application queries the database
- Application writes the result to cache
- Subsequent reads are served from cache
Pros:
- Only requested data is cached (no wasted memory)
- Cache failure does not break the system (graceful degradation)
- Simple to implement
Cons:
- First request is always a cache miss (cold start)
- Data can become stale (cache and DB can diverge)
- Application bears the complexity of cache management
When to use: Read-heavy workloads where stale data is tolerable for short periods.
```python
# Cache-Aside Implementation
def get_user(user_id):
    # Step 1: Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # Step 2: Cache miss - query database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # Step 3: Populate cache with a 1-hour TTL
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user
```
2. Write-Through
Every write goes to the cache AND the database synchronously. The cache is always consistent with the database.
Write-Through Pattern
Application            Cache              Database
     |                   |                    |
     |-- Write --------->|                    |
     |                   |-- Write ---------->|
     |                   |<-- ACK ------------|
     |<-- ACK -----------|                    |
     |                   |                    |
     |-- Read ---------->|                    |
     |<-- Data ----------|   (always fresh)   |
How it works:
- Application writes to the cache
- Cache synchronously writes to the database
- Write is acknowledged only after both succeed
- Reads always hit the cache (guaranteed fresh)
Pros:
- Data is never stale in cache
- Reads are always fast and consistent
- Simple mental model
Cons:
- Write latency is higher (two writes per operation)
- Cache is filled with data that may never be read
- If cache goes down, writes must bypass it
When to use: Systems where data consistency between cache and DB is critical, and write volume is moderate.
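The synchronous double write can be sketched with plain dicts standing in for both layers; `WriteThroughCache` and its method names are illustrative, not a library API.

```python
# Write-through sketch: both the cache and the database are plain dicts
# here; a real system would use Redis and a SQL client.

class WriteThroughCache:
    def __init__(self, db):
        self.db = db        # backing store (source of truth)
        self.cache = {}     # cache layer

    def write(self, key, value):
        # Write to cache and database synchronously; the call only
        # returns (ACKs) after both succeed.
        self.cache[key] = value
        self.db[key] = value

    def read(self, key):
        # Reads always come from the cache, which is guaranteed fresh.
        return self.cache.get(key)

db = {}
wt = WriteThroughCache(db)
wt.write("user:1", {"name": "Ada"})
assert wt.read("user:1") == {"name": "Ada"}
assert db["user:1"] == {"name": "Ada"}   # DB updated on the same write
```

The cost is visible in the shape of `write`: every call pays for two stores before it returns.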
3. Write-Behind (Write-Back)
Writes go to the cache immediately, and the cache asynchronously flushes to the database in batches.
Write-Behind Pattern
Application            Cache              Database
     |                   |                    |
     |-- Write --------->|                    |
     |<-- ACK -----------|   (immediate)      |
     |                   |                    |
     |                   |   ... time passes ...
     |                   |                    |
     |                   |-- Batch Write ---->|
     |                   |<-- ACK ------------|
     |                   |                    |

Writes are batched and flushed asynchronously
How it works:
- Application writes to the cache
- Cache immediately acknowledges the write
- Cache batches writes and flushes to DB asynchronously
- Reads are served from cache (always fast)
Pros:
- Extremely fast writes (no DB latency on write path)
- Batching reduces DB load
- Good for write-heavy workloads
Cons:
- Risk of data loss if cache crashes before flush
- Complexity in ensuring eventual consistency
- Harder to debug and reason about
When to use: Write-heavy workloads where slight data loss risk is acceptable (analytics, counters, session data).
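The buffering step can be sketched the same way; here `flush()` is called explicitly, where production systems run it on a timer or background thread. All names are illustrative.

```python
# Write-behind sketch: writes land in the cache and a pending buffer;
# flush() batch-writes the dirty entries to the (dict-based) database.

class WriteBehindCache:
    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.pending = {}   # dirty keys awaiting flush

    def write(self, key, value):
        # Acknowledge immediately; the DB write is deferred.
        self.cache[key] = value
        self.pending[key] = value

    def flush(self):
        # Batch-write all dirty entries to the database in one pass.
        self.db.update(self.pending)
        self.pending.clear()

db = {}
wb = WriteBehindCache(db)
wb.write("counter", 1)
wb.write("counter", 2)       # coalesced: only the latest value flushes
assert "counter" not in db   # DB not yet written
wb.flush()
assert db["counter"] == 2
```

Note the data-loss window the Cons list warns about: anything in `pending` is gone if the process dies before `flush()`.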
4. Read-Through
The cache sits between the application and the database. The cache itself is responsible for loading data from the database on a miss.
Read-Through Pattern
Application            Cache              Database
     |                   |                    |
     |-- Read ---------->|                    |
     |                   |-- (miss) --------->|
     |                   |<-- Data -----------|
     |                   |   (stores it)      |
     |<-- Data ----------|                    |
     |                   |                    |

The CACHE loads data from DB, not the application
How it works:
- Application always reads from the cache
- On cache miss, the cache itself queries the database
- Cache stores the result and returns it
- Application never talks to the database directly for reads
Pros:
- Application code is simpler (no cache management logic)
- Separation of concerns
- Easy to swap cache implementations
Cons:
- First request is still a cache miss
- Cache library/layer must understand the data model
- Less flexibility than cache-aside
When to use: When you want a clean abstraction and are using a caching library that supports read-through (e.g., Caffeine, NCache).
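The inversion of responsibility is easiest to see in code: a minimal sketch where the cache object owns the loader function (the loader stands in for a DB query; names are illustrative).

```python
# Read-through sketch: application code only ever calls cache.get();
# on a miss, the CACHE fetches from the backing store, not the app.

class ReadThroughCache:
    def __init__(self, loader):
        self.loader = loader   # any function mapping key -> value
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.loader(key)   # cache loads on miss
        return self.store[key]

calls = []
def load_from_db(key):
    calls.append(key)          # track how often the "DB" is hit
    return f"row-for-{key}"

cache = ReadThroughCache(load_from_db)
assert cache.get("user:42") == "row-for-user:42"
assert cache.get("user:42") == "row-for-user:42"
assert calls == ["user:42"]    # second read never touched the loader
```

This is essentially the shape of Caffeine's LoadingCache: the loader is configured once, and callers never see the database.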
Pattern Comparison Table
| Pattern | Write Path | Read Path | Consistency | Latency | Complexity | Best For |
|---|---|---|---|---|---|---|
| Cache-Aside | App writes DB, then invalidates/updates cache | App checks cache, then DB | Eventual | Medium | Low | General purpose, read-heavy |
| Write-Through | Cache writes DB synchronously | Always from cache | Strong | High (writes) | Medium | Consistency-critical reads |
| Write-Behind | Cache writes DB async | Always from cache | Eventual | Low (writes) | High | Write-heavy, loss-tolerant |
| Read-Through | App writes DB directly | Cache loads from DB on miss | Eventual | Medium | Medium | Clean abstraction layer |
Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, lists, sets, hashes, sorted sets, streams | Strings only (key-value) |
| Persistence | RDB snapshots + AOF | None (pure in-memory) |
| Replication | Master-replica, Redis Sentinel, Redis Cluster | None built-in |
| Pub/Sub | Built-in | Not available |
| Lua scripting | Yes | No |
| Multi-threaded | Single-threaded (I/O threads in 6.0+) | Multi-threaded |
| Memory efficiency | Higher overhead per key | More memory-efficient for simple k/v |
| Max value size | 512 MB | 1 MB |
| Clustering | Redis Cluster (auto-sharding) | Client-side sharding |
| Use case | Feature-rich caching, sessions, leaderboards, queues | Simple, high-throughput caching |
Rule of thumb:
- Use Redis when you need data structures, persistence, or pub/sub
- Use Memcached when you need raw speed for simple key-value caching with multi-threading
Cache Eviction Policies
When the cache is full, something must be removed to make space. The eviction policy determines what gets removed.
LRU (Least Recently Used)
Evicts the item that has not been accessed for the longest time.
Cache (capacity = 3):
Access: A, B, C, A, D
Step 1: [A] -- add A
Step 2: [A, B] -- add B
Step 3: [A, B, C] -- add C (full)
Step 4: [B, C, A] -- access A (move to end)
Step 5: [C, A, D] -- add D, evict B (least recent)
Best for: General-purpose caching. Most workloads have temporal locality.
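The trace above can be replayed with a minimal LRU built on `collections.OrderedDict` (a sketch for illustration; Redis's own LRU is approximate, sampling candidates rather than tracking exact order).

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # oldest entry sits at the front

    def access(self, key):
        if key in self.items:
            self.items.move_to_end(key)          # most recently used
        else:
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)   # evict least recent
            self.items[key] = True

lru = LRUCache(3)
for key in ["A", "B", "C", "A", "D"]:   # same trace as the walkthrough
    lru.access(key)
assert list(lru.items) == ["C", "A", "D"]   # B was evicted
```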
LFU (Least Frequently Used)
Evicts the item that has been accessed the fewest times.
Cache (capacity = 3):
Access: A, A, A, B, C, D
A: freq=3, B: freq=1, C: freq=1
Add D --> evict B or C (lowest frequency)
A stays because freq=3
Best for: Workloads where some items are permanently hot (e.g., trending content).
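A minimal LFU (with O(n) eviction, kept simple for clarity) replays the trace above; the tie between B and C is broken here by evicting the entry inserted first, which is one of several reasonable policies.

```python
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.freq = {}   # key -> access count (insertion-ordered dict)

    def access(self, key):
        if key in self.freq:
            self.freq[key] += 1
            return
        if len(self.freq) >= self.capacity:
            # Lowest frequency loses; min() keeps the first minimal key
            # it sees, so ties evict the oldest insertion.
            victim = min(self.freq, key=self.freq.get)
            del self.freq[victim]
        self.freq[key] = 1

lfu = LFUCache(3)
for key in ["A", "A", "A", "B", "C", "D"]:   # trace from above
    lfu.access(key)
assert lfu.freq["A"] == 3          # A survives: highest frequency
assert "B" not in lfu.freq         # B evicted (freq 1, oldest tie)
assert set(lfu.freq) == {"A", "C", "D"}
```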
TTL (Time To Live)
Items expire after a set duration regardless of access patterns.
```python
redis.setex("session:abc", 1800, data)  # Expires in 30 minutes
redis.setex("user:123", 3600, data)     # Expires in 1 hour
```
Best for: Session data, tokens, any data with a natural expiration.
Other Policies
| Policy | Description | Use Case |
|---|---|---|
| FIFO | First in, first out | Simple, predictable eviction |
| Random | Evict a random entry | Surprisingly effective in practice |
| LRU-K | Evict based on Kth-to-last access | Better than LRU for scan-resistant caching |
Cache Stampede Prevention
A cache stampede (thundering herd) occurs when a popular cache entry expires and many requests simultaneously hit the database.
Normal:
Request --> Cache HIT --> Return data
Stampede (cache entry expires):
Request 1 ---> Cache MISS ---> DB Query
Request 2 ---> Cache MISS ---> DB Query <-- ALL hit DB
Request 3 ---> Cache MISS ---> DB Query at once!
Request 4 ---> Cache MISS ---> DB Query
...
Request 1000 -> Cache MISS --> DB Query <-- DB overwhelmed
Prevention Strategies
1. Locking (Mutex)
Only one request is allowed to rebuild the cache. Others wait or get stale data.
```python
def get_with_lock(key):
    value = cache.get(key)
    if value:
        return value

    # Try to acquire the rebuild lock (NX: only if not already held,
    # EX: auto-expire in 10s so a crashed holder cannot wedge the key)
    if cache.set(f"lock:{key}", 1, nx=True, ex=10):
        # We hold the lock - rebuild the cache entry
        value = db.query(key)
        cache.set(key, value, ex=3600)
        cache.delete(f"lock:{key}")
        return value

    # Someone else is rebuilding - poll briefly for the fresh value
    for _ in range(50):
        time.sleep(0.1)
        value = cache.get(key)
        if value:
            return value
    return db.query(key)  # give up waiting and fall back to the DB
```
2. Early Expiration (Stale-While-Revalidate)
Cache entries contain a "soft TTL" before the "hard TTL". When soft TTL expires, one request rebuilds while others serve stale data.
|<---- Fresh data ---->|<-- Stale but served -->|<-- Hard expired -->
0s                    50s                      60s
                       ^
                       soft TTL hits: one request rebuilds the entry
                       while the others keep receiving the stale value
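The soft/hard TTL split can be sketched in memory; the explicit `now` parameter keeps the timeline testable, and the bare `rebuilding` flag stands in for what would be a proper lock or background refresh job in production. All names are illustrative.

```python
import time

class SwrCache:
    def __init__(self, soft_ttl, hard_ttl):
        self.soft_ttl, self.hard_ttl = soft_ttl, hard_ttl
        self.entries = {}       # key -> (value, stored_at)
        self.rebuilding = set() # keys currently being refreshed

    def put(self, key, value, now=None):
        self.entries[key] = (value, now if now is not None else time.time())
        self.rebuilding.discard(key)

    def get(self, key, now=None):
        now = now if now is not None else time.time()
        value, stored_at = self.entries[key]
        age = now - stored_at
        if age >= self.hard_ttl:
            raise KeyError(key)          # hard-expired: must rebuild
        if age >= self.soft_ttl and key not in self.rebuilding:
            self.rebuilding.add(key)
            return value, "rebuild"      # this caller refreshes
        return value, "serve"            # everyone else gets the value

c = SwrCache(soft_ttl=50, hard_ttl=60)
c.put("feed", "v1", now=0)
assert c.get("feed", now=10) == ("v1", "serve")     # fresh
assert c.get("feed", now=55) == ("v1", "rebuild")   # first stale reader
assert c.get("feed", now=56) == ("v1", "serve")     # others: stale data
```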
3. Probabilistic Early Expiration
Each request has a small probability of triggering a cache refresh before expiry. As expiry approaches, the probability increases.
```python
import math, random, time

# XFetch: probabilistic early recomputation
#   ttl   - seconds remaining until the cached entry expires
#   delta - how long one recompute takes, in seconds
#   beta  - aggressiveness (beta > 1 refreshes earlier)
def should_recompute(ttl, delta, beta=1.0):
    now = time.time()
    expiry = now + ttl
    # -log(U) for U ~ Uniform(0, 1) is exponentially distributed, so the
    # probability of an early refresh rises sharply as expiry approaches
    random_val = -delta * beta * math.log(random.random())
    return now + random_val >= expiry
```
Cache Invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton
Strategies
1. TTL-Based Expiration
- Set a time-to-live on every cache entry
- Simple and predictable
- Stale data window = TTL duration
2. Event-Based Invalidation
- When data changes, explicitly delete or update the cache entry
- Requires coupling between write path and cache
```python
def update_user(user_id, data):
    db.update("UPDATE users SET ... WHERE id = %s", user_id, data)
    cache.delete(f"user:{user_id}")  # Invalidate cache
```
3. Version-Based Invalidation
- Include a version number in the cache key
- When data changes, increment the version
- Old cache entries naturally expire (orphaned)
```python
version = db.get_version("users")
cache_key = f"user:{user_id}:v{version}"
```
4. Publish/Subscribe Invalidation
- Write events are published to a channel
- Cache nodes subscribe and invalidate locally
Writer --> Pub/Sub Channel --+--> Cache Node 1 (invalidate)
                             +--> Cache Node 2 (invalidate)
                             +--> Cache Node 3 (invalidate)
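The fan-out can be sketched with an in-memory channel; the `Channel` class stands in for Redis Pub/Sub or any message bus, and all names are illustrative.

```python
class Channel:
    def __init__(self):
        self.subscribers = []

    def publish(self, key):
        # Deliver the invalidation event to every subscribed node
        for callback in self.subscribers:
            callback(key)

class CacheNode:
    def __init__(self, channel):
        self.local = {}
        channel.subscribers.append(self.invalidate)

    def invalidate(self, key):
        self.local.pop(key, None)   # drop the local copy if present

channel = Channel()
nodes = [CacheNode(channel), CacheNode(channel), CacheNode(channel)]
for node in nodes:
    node.local["user:1"] = "stale-profile"

channel.publish("user:1")   # writer announces the change
assert all("user:1" not in node.local for node in nodes)
```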
Distributed Caching
When a single cache server is not enough, you distribute the cache across multiple nodes.
Consistent Hashing for Cache Distribution
Consistent Hash Ring

          Node A
         /      \
    Node D      Node B
         \      /
          Node C

key "user:42" --> hash --> lands on Node B
key "user:99" --> hash --> lands on Node D

If Node B fails:
- Only the keys on Node B are redistributed
- Nodes A, C, and D are unaffected
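A bare-bones ring makes the redistribution property concrete. This sketch omits virtual nodes (which production rings use for smoother balance), and md5 is an arbitrary choice of stable hash.

```python
import bisect, hashlib

class HashRing:
    def __init__(self, nodes):
        # Each node owns the arc ending at its hash position
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First node clockwise from the key's hash (wrapping around)
        hashes = [h for h, _ in self.ring]
        idx = bisect.bisect(hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

    def remove(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

ring = HashRing(["A", "B", "C", "D"])
keys = [f"user:{i}" for i in range(100)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("B")
after = {k: ring.node_for(k) for k in keys}

# Only keys that lived on B moved; everything else stayed put.
moved = [k for k in keys if before[k] != after[k]]
assert all(before[k] == "B" for k in moved)
```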
Cache Cluster Topologies
1. Client-Side Sharding
App --> hash(key) --> pick server
Server 1: keys A-M
Server 2: keys N-Z
Simple but inflexible. Adding/removing servers requires rehashing.
2. Proxy-Based (e.g., Twemproxy, Codis)
App --> Proxy --+--> Server 1
                +--> Server 2
                +--> Server 3
Proxy handles routing. Transparent to the application.
3. Redis Cluster
App --> Any Node --> Redirects to correct node
16384 hash slots distributed across nodes
Automatic failover with replicas
CDN Caching
CDN caching is a specialized form of caching that operates at the edge, close to users.
User (Tokyo) --> CDN Edge (Tokyo) --> Origin Server (US-East)
                      |
                      +-- Cache HIT?  Return immediately
                      +-- Cache MISS? Fetch from origin, cache, return
Cache Headers That Control CDN Behavior
| Header | Example | Meaning |
|---|---|---|
| Cache-Control | max-age=3600, public | Cache for 1 hour; any cache may store it |
| Cache-Control | private, no-cache | Do not store in shared caches; revalidate before use |
| ETag | "abc123" | Version identifier for conditional requests |
| Vary | Accept-Encoding | Cache separate versions per encoding |
| Surrogate-Control | max-age=86400 | CDN-specific cache directive |
Covered in more detail in 9.9.b CDN and Content Delivery.
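How an ETag makes revalidation cheap can be sketched with an illustrative origin handler; `respond` is not a real framework's API, just the shape of the conditional-request logic.

```python
def respond(request_headers, body, etag):
    # If the client (or CDN edge) already holds this version, answer
    # 304 Not Modified with no body instead of re-sending the content.
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    headers = {"ETag": etag, "Cache-Control": "max-age=3600, public"}
    return 200, headers, body

# First fetch: full response, cacheable for an hour
status, headers, body = respond({}, b"<html>...</html>", '"abc123"')
assert status == 200 and body

# Revalidation: the CDN presents the ETag it cached earlier
status, headers, body = respond(
    {"If-None-Match": '"abc123"'}, b"<html>...</html>", '"abc123"')
assert status == 304 and body == b""
```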
Multi-Level Caching Architecture
Real-world systems use multiple cache layers:
MULTI-LEVEL CACHING

Browser Cache (L1)
        |
        v
CDN Edge Cache (L2)
        |
        v
API Gateway Cache (L3)
        |
        v
Application-Level Cache (L4)   -- e.g., in-process Caffeine
        |
        v
Distributed Cache (L5)         -- e.g., Redis Cluster
        |
        v
Database Query Cache (L6)
        |
        v
Database (source of truth)
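A two-level slice of this stack shows the look-up-then-backfill flow; here an in-process dict plays L4 and a second dict stands in for a Redis cluster at L5. All names are illustrative.

```python
def multi_level_get(key, local, distributed, load_from_db):
    if key in local:                      # L4: in-process (fastest)
        return local[key]
    if key in distributed:                # L5: distributed cache
        local[key] = distributed[key]     # backfill the faster layer
        return local[key]
    value = load_from_db(key)             # source of truth
    distributed[key] = value              # backfill both layers
    local[key] = value
    return value

db_hits = []
def load(key):
    db_hits.append(key)
    return f"value-of-{key}"

local, dist = {}, {}
assert multi_level_get("k", local, dist, load) == "value-of-k"
assert db_hits == ["k"]                  # first read hit the DB
local.clear()                            # simulate a process restart
assert multi_level_get("k", local, dist, load) == "value-of-k"
assert db_hits == ["k"]                  # served from L5, not the DB
```

This is also where the "compounds complexity" warning bites: each layer is another place where stale data can hide after an invalidation.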
Caching Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Cache everything | Wastes memory, evicts hot data | Cache only frequently accessed data |
| No TTL | Stale data forever | Always set TTL, even if long |
| Cache and DB race | Write to DB and cache concurrently; inconsistency | Delete cache on write, let next read repopulate |
| Huge cache values | Slow serialization, network overhead | Break into smaller keys or compress |
| Ignoring thundering herd | DB overwhelmed on cache expiry | Use locking or early refresh |
Key Takeaways
- Cache-aside is the most common and flexible pattern -- start here
- Write-through guarantees consistency but adds write latency
- Write-behind is fast but risks data loss -- use for non-critical data
- Redis is the default choice unless you need Memcached's simplicity
- LRU is the default eviction policy -- only switch if you have data to justify it
- Cache stampede is a real production issue -- always plan for it
- Invalidation is hard -- TTL + event-based is the most common combination
- Multi-level caching compounds benefits but also compounds complexity
Explain-It Challenge
"You are designing a social media feed. A celebrity with 50 million followers posts an update. Explain how you would cache this post so that all followers see it quickly, without overwhelming your database, and without serving stale data to users who interact with it (likes, comments). Walk through your caching strategy layer by layer."