Episode 9 — System Design / 9.9 — Core Infrastructure

9.9.a Caching Strategies

Why Caching Matters

Caching is one of the highest-leverage performance optimizations in system design. By storing frequently accessed data closer to the consumer, caching reduces latency, decreases load on databases, and improves throughput.

The numbers tell the story:

  Operation                               Latency
  --------------------------------------  ----------------
  L1 cache reference                      ~0.5 ns
  L2 cache reference                      ~7 ns
  RAM reference                           ~100 ns
  Read from SSD                           ~150,000 ns
  Read from HDD                           ~10,000,000 ns
  Network round trip (same datacenter)    ~500,000 ns
  Network round trip (cross-continent)    ~150,000,000 ns

A cache hit on Redis (~1ms) vs a database query (~10-50ms) is a 10-50x improvement. At scale, this difference is existential.


Caching Patterns

1. Cache-Aside (Lazy Loading)

The application is responsible for reading and writing to the cache. The cache sits beside the database, not between.

             Cache-Aside Pattern

  Application                    Cache (Redis)
      |                              |
      |--- 1. Check cache ---------->|
      |<-- 2. Cache MISS ------------|
      |                              |
      |--- 3. Query DB ------------------> Database
      |<-- 4. Return data ---------------- Database
      |                              |
      |--- 5. Write to cache ------->|
      |                              |

  NEXT REQUEST (Cache HIT):
      |--- 1. Check cache ---------->|
      |<-- 2. Return data -----------|  (DB never touched)

How it works:

  1. Application checks the cache first
  2. On cache miss, application queries the database
  3. Application writes the result to cache
  4. Subsequent reads are served from cache

Pros:

  • Only requested data is cached (no wasted memory)
  • Cache failure does not break the system (graceful degradation)
  • Simple to implement

Cons:

  • First request is always a cache miss (cold start)
  • Data can become stale (cache and DB can diverge)
  • Application bears the complexity of cache management

When to use: Read-heavy workloads where stale data is tolerable for short periods.

# Cache-Aside Implementation
def get_user(user_id):
    # Step 1: Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    
    # Step 2: Cache miss - query database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    
    # Step 3: Populate cache with a 1-hour TTL.
    # Skip missing users so we don't cache a null.
    if user is not None:
        redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    
    return user

2. Write-Through

Every write goes to the cache AND the database synchronously. The cache is always consistent with the database.

             Write-Through Pattern
             
  Application        Cache           Database
      |                |                 |
      |-- Write ------>|                 |
      |                |-- Write ------->|
      |                |<-- ACK ---------|
      |<-- ACK --------|                 |
      |                |                 |
      |-- Read ------->|                 |
      |<-- Data -------|  (always fresh) |

How it works:

  1. Application writes to the cache
  2. Cache synchronously writes to the database
  3. Write is acknowledged only after both succeed
  4. Reads always hit the cache (guaranteed fresh)

Pros:

  • Data is never stale in cache
  • Reads are always fast and consistent
  • Simple mental model

Cons:

  • Write latency is higher (two writes per operation)
  • Cache is filled with data that may never be read
  • If cache goes down, writes must bypass it

When to use: Systems where data consistency between cache and DB is critical, and write volume is moderate.
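The synchronous double-write above can be sketched with in-memory stand-ins (a minimal sketch: `WriteThroughCache` and the dict-backed store and database are illustrative, not a real library):

```python
class WriteThroughCache:
    """Write-through sketch: every write goes to the cache AND the
    database synchronously, so reads can always be served from cache."""

    def __init__(self, db):
        self._store = {}   # stand-in for Redis
        self._db = db      # stand-in for the database

    def put(self, key, value):
        # 1. Write to the cache
        self._store[key] = value
        # 2. Synchronously write to the DB; only now is the write "done"
        self._db[key] = value

    def get(self, key):
        # 3. Reads hit the cache, which is guaranteed fresh
        return self._store.get(key)


db = {}
cache = WriteThroughCache(db)
cache.put("user:1", {"name": "Ada"})
assert cache.get("user:1") == db["user:1"]  # cache and DB agree
```

The write-latency cost is visible in `put`: the caller blocks on both writes before getting an acknowledgment.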


3. Write-Behind (Write-Back)

Writes go to the cache immediately, and the cache asynchronously flushes to the database in batches.

             Write-Behind Pattern
             
  Application        Cache           Database
      |                |                 |
      |-- Write ------>|                 |
      |<-- ACK --------|  (immediate)   |
      |                |                 |
      |                |  ... time passes ...
      |                |                 |
      |                |-- Batch Write ->|
      |                |<-- ACK ---------|
      |                |                 |
      
  Writes are batched and flushed asynchronously

How it works:

  1. Application writes to the cache
  2. Cache immediately acknowledges the write
  3. Cache batches writes and flushes to DB asynchronously
  4. Reads are served from cache (always fast)

Pros:

  • Extremely fast writes (no DB latency on write path)
  • Batching reduces DB load
  • Good for write-heavy workloads

Cons:

  • Risk of data loss if cache crashes before flush
  • Complexity in ensuring eventual consistency
  • Harder to debug and reason about

When to use: Write-heavy workloads where slight data loss risk is acceptable (analytics, counters, session data).
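The batching behavior can be sketched as follows (a minimal, single-threaded sketch with a dict standing in for the database; a production version would flush from a background thread and handle DB failures):

```python
import time

class WriteBehindCache:
    """Write-behind sketch: writes are acknowledged immediately and
    flushed to the database later in batches."""

    def __init__(self, db, flush_every=0.5):
        self._store = {}
        self._dirty = {}          # writes not yet persisted to the DB
        self._db = db
        self._flush_every = flush_every
        self._last_flush = time.time()

    def put(self, key, value):
        self._store[key] = value   # 1. write to cache
        self._dirty[key] = value   # 2. queue for the next batch
        # Caller is "ACKed" here; flush only when the interval elapses
        if time.time() - self._last_flush >= self._flush_every:
            self.flush()

    def flush(self):
        # 3. batch-write all pending entries to the DB in one pass
        self._db.update(self._dirty)
        self._dirty.clear()
        self._last_flush = time.time()

    def get(self, key):
        return self._store.get(key)  # reads never touch the DB
```

Note that anything still sitting in `_dirty` is lost if the cache process dies before `flush` runs, which is exactly the data-loss risk described above.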


4. Read-Through

The cache sits between the application and the database. The cache itself is responsible for loading data from the database on a miss.

             Read-Through Pattern
             
  Application        Cache           Database
      |                |                 |
      |-- Read ------->|                 |
      |                |-- (miss) ------>|
      |                |<-- Data --------|
      |                | (stores it)     |
      |<-- Data -------|                 |
      |                |                 |
      
  The CACHE loads data from DB, not the application

How it works:

  1. Application always reads from the cache
  2. On cache miss, the cache itself queries the database
  3. Cache stores the result and returns it
  4. Application never talks to the database directly for reads

Pros:

  • Application code is simpler (no cache management logic)
  • Separation of concerns
  • Easy to swap cache implementations

Cons:

  • First request is still a cache miss
  • Cache library/layer must understand the data model
  • Less flexibility than cache-aside

When to use: When you want a clean abstraction and are using a caching library that supports read-through (e.g., Caffeine, NCache).
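The key difference from cache-aside is that the loader lives inside the cache. A minimal sketch (`ReadThroughCache` and the lambda loader are illustrative, not a real library API):

```python
class ReadThroughCache:
    """Read-through sketch: the cache owns the loader, so the
    application never queries the database directly for reads."""

    def __init__(self, loader):
        self._store = {}
        self._loader = loader  # e.g. a function that queries the DB

    def get(self, key):
        if key not in self._store:
            # Cache miss: the CACHE loads from the DB, stores the
            # result, and returns it
            self._store[key] = self._loader(key)
        return self._store[key]


database = {"user:1": {"name": "Ada"}}
users = ReadThroughCache(loader=lambda k: database[k])
users.get("user:1")  # miss: loaded via the loader, then cached
users.get("user:1")  # hit: served from the cache
```

The application only ever calls `get`; swapping the cache implementation means changing one constructor argument.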


Pattern Comparison Table

Pattern        Write Path                     Read Path                    Consistency  Latency        Complexity  Best For
-------------  -----------------------------  ---------------------------  -----------  -------------  ----------  ---------------------------
Cache-Aside    App writes DB, then cache      App checks cache, then DB    Eventual     Medium         Low         General purpose, read-heavy
Write-Through  Cache writes DB synchronously  Always from cache            Strong       High (writes)  Medium      Consistency-critical reads
Write-Behind   Cache writes DB async          Always from cache            Eventual     Low (writes)   High        Write-heavy, loss-tolerant
Read-Through   App writes DB directly         Cache loads from DB on miss  Eventual     Medium         Medium      Clean abstraction layer

Redis vs Memcached

Feature            Redis                                                 Memcached
-----------------  ----------------------------------------------------  ------------------------------------
Data structures    Strings, lists, sets, hashes, sorted sets, streams    Strings only (key-value)
Persistence        RDB snapshots + AOF                                   None (pure in-memory)
Replication        Master-replica, Redis Sentinel, Redis Cluster         None built-in
Pub/Sub            Built-in                                              Not available
Lua scripting      Yes                                                   No
Threading          Single-threaded (I/O threads in 6.0+)                 Multi-threaded
Memory efficiency  Higher overhead per key                               More memory-efficient for simple k/v
Max value size     512 MB                                                1 MB
Clustering         Redis Cluster (auto-sharding)                         Client-side sharding
Use case           Feature-rich caching, sessions, leaderboards, queues  Simple, high-throughput caching

Rule of thumb:

  • Use Redis when you need data structures, persistence, or pub/sub
  • Use Memcached when you need raw speed for simple key-value caching with multi-threading

Cache Eviction Policies

When the cache is full, something must be removed to make space. The eviction policy determines what gets removed.

LRU (Least Recently Used)

Evicts the item that has not been accessed for the longest time.

  Cache (capacity = 3):
  
  Access: A, B, C, A, D
  
  Step 1: [A]           -- add A
  Step 2: [A, B]        -- add B
  Step 3: [A, B, C]     -- add C (full)
  Step 4: [B, C, A]     -- access A (move to end)
  Step 5: [C, A, D]     -- add D, evict B (least recent)

Best for: General-purpose caching. Most workloads have temporal locality.
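The walkthrough above maps directly onto a small implementation built on Python's `collections.OrderedDict` (a minimal sketch, not production code):

```python
from collections import OrderedDict

class LRUCache:
    """LRU sketch: the most recently used keys live at the end of the
    ordered dict; eviction pops from the front."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


cache = LRUCache(3)
for k in ["A", "B", "C"]:
    cache.put(k, k)
cache.get("A")       # A becomes most recent: order is B, C, A
cache.put("D", "D")  # evicts B, the least recently used
```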

LFU (Least Frequently Used)

Evicts the item that has been accessed the fewest times.

  Cache (capacity = 3):
  
  Access: A, A, A, B, C, D
  
  A: freq=3, B: freq=1, C: freq=1
  
  Add D --> evict B or C (lowest frequency)
  A stays because freq=3

Best for: Workloads where some items are permanently hot (e.g., trending content).
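The frequency-count bookkeeping can be sketched like this (a minimal sketch with ties broken arbitrarily; real LFU implementations such as Redis's approximated LFU are more sophisticated):

```python
class LFUCache:
    """LFU sketch: track an access count per key and evict the key
    with the lowest count when the cache is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}
        self._freq = {}

    def get(self, key):
        if key not in self._data:
            return None
        self._freq[key] += 1   # each access bumps the frequency
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Evict the least frequently used key
            victim = min(self._freq, key=self._freq.get)
            del self._data[victim]
            del self._freq[victim]
        self._data[key] = value
        self._freq[key] = self._freq.get(key, 0) + 1
```

Replaying the access pattern from the example (A three times, then B, C, D) evicts one of the frequency-1 keys while A survives with frequency 3.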

TTL (Time To Live)

Items expire after a set duration regardless of access patterns.

  redis.setex("session:abc", 1800, data)  # Expires in 30 minutes
  redis.setex("user:123", 3600, data)     # Expires in 1 hour

Best for: Session data, tokens, any data with a natural expiration.

Other Policies

Policy  Description                        Use Case
------  ---------------------------------  ------------------------------------------
FIFO    First in, first out                Simple, predictable eviction
Random  Evict a random entry               Surprisingly effective in practice
LRU-K   Evict based on Kth-to-last access  Better than LRU for scan-resistant caching

Cache Stampede Prevention

A cache stampede (thundering herd) occurs when a popular cache entry expires and many requests simultaneously hit the database.

  Normal:
  Request --> Cache HIT --> Return data
  
  Stampede (cache entry expires):
  Request 1 ---> Cache MISS ---> DB Query
  Request 2 ---> Cache MISS ---> DB Query    <-- ALL hit DB
  Request 3 ---> Cache MISS ---> DB Query        at once!
  Request 4 ---> Cache MISS ---> DB Query
  ...
  Request 1000 -> Cache MISS --> DB Query    <-- DB overwhelmed

Prevention Strategies

1. Locking (Mutex)

Only one request is allowed to rebuild the cache. Others wait or get stale data.

def get_with_lock(key):
    value = cache.get(key)
    if value is not None:
        return value
    
    # Try to acquire the rebuild lock (nx=True: set only if absent;
    # ex=10: auto-expire so a crashed holder cannot deadlock everyone)
    if cache.set(f"lock:{key}", 1, nx=True, ex=10):
        try:
            # We hold the lock - rebuild the cache entry
            value = db.query(key)
            cache.set(key, value, ex=3600)
        finally:
            cache.delete(f"lock:{key}")
        return value
    
    # Someone else is rebuilding - poll briefly for the fresh value
    for _ in range(20):
        time.sleep(0.1)
        value = cache.get(key)
        if value is not None:
            return value
    
    # Lock holder may have died; fall back to the database
    return db.query(key)

2. Early Expiration (Stale-While-Revalidate)

Cache entries contain a "soft TTL" before the "hard TTL". When soft TTL expires, one request rebuilds while others serve stale data.

  |<-- Fresh data -->|<-- Stale but served -->|<-- Hard expire -->|
  0s                 50s                       60s
                      ^
                      One request rebuilds
                      Others get stale data
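The soft/hard TTL split in the timeline above can be sketched as follows (a minimal, single-threaded sketch; `SwrEntry` and `get_swr` are illustrative names, and in production the rebuild would run in a background thread so the caller returns immediately):

```python
import time

class SwrEntry:
    """Cache entry with a soft TTL (serve stale after) and a hard TTL."""
    def __init__(self, value, soft_ttl, hard_ttl):
        now = time.time()
        self.value = value
        self.soft_expiry = now + soft_ttl   # e.g. 50s
        self.hard_expiry = now + hard_ttl   # e.g. 60s
        self.refreshing = False

def get_swr(cache, key, rebuild, soft_ttl=50, hard_ttl=60):
    entry = cache.get(key)
    now = time.time()
    if entry is None or now >= entry.hard_expiry:
        # Hard miss/expiry: must rebuild before responding
        entry = SwrEntry(rebuild(), soft_ttl, hard_ttl)
        cache[key] = entry
        return entry.value
    if now >= entry.soft_expiry and not entry.refreshing:
        # Soft-expired: this request rebuilds but is still served the
        # stale value (a background thread would do this in production)
        entry.refreshing = True
        stale = entry.value
        cache[key] = SwrEntry(rebuild(), soft_ttl, hard_ttl)
        return stale
    return entry.value
```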

3. Probabilistic Early Expiration

Each request has a small probability of triggering a cache refresh before expiry. As expiry approaches, the probability increases.

# XFetch: probabilistically recompute before expiry
#   ttl   - seconds remaining until the entry expires
#   delta - how long a recompute takes
#   beta  - aggressiveness (> 1 recomputes earlier)
def should_recompute(ttl, delta, beta=1.0):
    # -log(U) is exponentially distributed: most requests pass through,
    # but the chance of an early refresh rises as expiry approaches
    return -delta * beta * math.log(random.random()) >= ttl

Cache Invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

Strategies

1. TTL-Based Expiration

  • Set a time-to-live on every cache entry
  • Simple and predictable
  • Stale data window = TTL duration

2. Event-Based Invalidation

  • When data changes, explicitly delete or update the cache entry
  • Requires coupling between write path and cache
def update_user(user_id, data):
    db.update("UPDATE users SET ... WHERE id = %s", user_id, data)
    cache.delete(f"user:{user_id}")  # Invalidate cache

3. Version-Based Invalidation

  • Include a version number in the cache key
  • When data changes, increment the version
  • Old cache entries naturally expire (orphaned)
version = db.get_version("users")
cache_key = f"user:{user_id}:v{version}"

4. Publish/Subscribe Invalidation

  • Write events are published to a channel
  • Cache nodes subscribe and invalidate locally
  Writer --> Pub/Sub Channel --> Cache Node 1 (invalidate)
                             --> Cache Node 2 (invalidate)
                             --> Cache Node 3 (invalidate)
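The fan-out can be sketched with an in-process stand-in for the channel (a minimal sketch: `InvalidationBus` is a stub playing the role of a real pub/sub system such as Redis Pub/Sub, and the dicts play the role of per-node local caches):

```python
class InvalidationBus:
    """In-process stand-in for a pub/sub channel (e.g. Redis Pub/Sub)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key):
        # Fan the invalidation out to every subscribed cache node
        for callback in self._subscribers:
            callback(key)


bus = InvalidationBus()
node1 = {"user:1": "old"}
node2 = {"user:1": "old"}

# Each cache node subscribes and drops its local copy on invalidation
bus.subscribe(lambda key: node1.pop(key, None))
bus.subscribe(lambda key: node2.pop(key, None))

# The writer updates the DB, then publishes the invalidation
bus.publish("user:1")  # both nodes discard their stale entry
```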

Distributed Caching

When a single cache server is not enough, you distribute the cache across multiple nodes.

Consistent Hashing for Cache Distribution

          Consistent Hash Ring
          
              Node A
             /      \
            /        \
     Node D           Node B
            \        /
             \      /
              Node C
              
  key "user:42" --> hash --> lands on Node B
  key "user:99" --> hash --> lands on Node D
  
  If Node B fails:
  - Only keys on Node B are redistributed
  - Nodes A, C, D are unaffected

Cache Cluster Topologies

1. Client-Side Sharding

  App --> hash(key) --> pick server
  
  Server 1: keys A-M
  Server 2: keys N-Z

Simple but inflexible. Adding/removing servers requires rehashing.

2. Proxy-Based (e.g., Twemproxy, Codis)

  App --> Proxy --> Server 1
                --> Server 2
                --> Server 3

Proxy handles routing. Transparent to the application.

3. Redis Cluster

  App --> Any Node --> Redirects to correct node
  
  16384 hash slots distributed across nodes
  Automatic failover with replicas

CDN Caching

CDN caching is a specialized form of caching that operates at the edge, close to users.

  User (Tokyo) --> CDN Edge (Tokyo) --> Origin Server (US-East)
                   |
                   Cache HIT? Return immediately
                   Cache MISS? Fetch from origin, cache, return

Cache Headers That Control CDN Behavior

Header             Example               Meaning
-----------------  --------------------  ----------------------------------------------------
Cache-Control      max-age=3600, public  Cache for 1 hour; any cache may store
Cache-Control      private, no-cache     Only the browser may store; revalidate before each use
ETag               "abc123"              Version identifier for conditional requests
Vary               Accept-Encoding       Cache separate variants per encoding
Surrogate-Control  max-age=86400         CDN-specific cache directive

Covered in more detail in 9.9.b CDN and Content Delivery.


Multi-Level Caching Architecture

Real-world systems use multiple cache layers:

  +------------------------------------------------------------------+
  |                    MULTI-LEVEL CACHING                           |
  |                                                                  |
  |  Browser Cache (L1)                                              |
  |      |                                                           |
  |      v                                                           |
  |  CDN Edge Cache (L2)                                             |
  |      |                                                           |
  |      v                                                           |
  |  API Gateway Cache (L3)                                          |
  |      |                                                           |
  |      v                                                           |
  |  Application-Level Cache (L4) -- e.g., in-process Caffeine       |
  |      |                                                           |
  |      v                                                           |
  |  Distributed Cache (L5) -- e.g., Redis Cluster                   |
  |      |                                                           |
  |      v                                                           |
  |  Database Query Cache (L6)                                       |
  |      |                                                           |
  |      v                                                           |
  |  Database (source of truth)                                      |
  +------------------------------------------------------------------+
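The interplay between the application-level (L4) and distributed (L5) layers can be sketched like this (a minimal sketch: plain dicts stand in for an in-process cache and for Redis, and the crude eviction is illustrative only):

```python
class TwoLevelCache:
    """Sketch of an L4/L5 pair: a small per-process cache in front of
    a shared distributed cache (here a dict standing in for Redis)."""

    def __init__(self, shared, local_capacity=1000):
        self._local = {}          # L4: per-process, fastest
        self._shared = shared     # L5: shared across app servers
        self._capacity = local_capacity

    def get(self, key, load_from_db):
        if key in self._local:
            return self._local[key]            # L4 hit
        value = self._shared.get(key)
        if value is None:
            value = load_from_db(key)          # both levels missed
            self._shared[key] = value          # populate L5
        if len(self._local) >= self._capacity:
            # Crude FIFO eviction; a real L4 cache would use LRU
            self._local.pop(next(iter(self._local)))
        self._local[key] = value               # promote into L4
        return value
```

Each lookup falls through the levels in order and promotes the value upward on the way back, which is how the compounding benefit (and compounding invalidation complexity) arises.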

Caching Anti-Patterns

Anti-Pattern              Problem                                            Solution
------------------------  -------------------------------------------------  -----------------------------------------------
Cache everything          Wastes memory, evicts hot data                     Cache only frequently accessed data
No TTL                    Stale data lives forever                           Always set a TTL, even a long one
Cache and DB race         Concurrent writes to DB and cache; inconsistency   Delete cache on write, let next read repopulate
Huge cache values         Slow serialization, network overhead               Break into smaller keys or compress
Ignoring thundering herd  DB overwhelmed on cache expiry                     Use locking or early refresh

Key Takeaways

  1. Cache-aside is the most common and flexible pattern -- start here
  2. Write-through guarantees consistency but adds write latency
  3. Write-behind is fast but risks data loss -- use for non-critical data
  4. Redis is the default choice unless you need Memcached's simplicity
  5. LRU is the default eviction policy -- only switch if you have data to justify it
  6. Cache stampede is a real production issue -- always plan for it
  7. Invalidation is hard -- TTL + event-based is the most common combination
  8. Multi-level caching compounds benefits but also compounds complexity

Explain-It Challenge

"You are designing a social media feed. A celebrity with 50 million followers posts an update. Explain how you would cache this post so that all followers see it quickly, without overwhelming your database, and without serving stale data to users who interact with it (likes, comments). Walk through your caching strategy layer by layer."