Episode 9 — System Design / 9.9 — Core Infrastructure

9.9.a Caching Strategies

Why Caching Matters

Caching is one of the highest-leverage performance optimizations in system design. By storing frequently accessed data closer to the consumer, caching reduces latency, decreases load on databases, and improves throughput.

The numbers tell the story:

  Operation                               Latency
  --------------------------------------  ----------------
  L1 cache reference                      ~0.5 ns
  L2 cache reference                      ~7 ns
  RAM reference                           ~100 ns
  Read from SSD                           ~150,000 ns
  Read from HDD                           ~10,000,000 ns
  Network round trip (same datacenter)    ~500,000 ns
  Network round trip (cross-continent)    ~150,000,000 ns

A cache hit on Redis (~1ms) vs a database query (~10-50ms) is a 10-50x improvement. At scale, this difference is existential.


Caching Patterns

1. Cache-Aside (Lazy Loading)

The application is responsible for reading and writing to the cache. The cache sits beside the database, not between.

             Cache-Aside Pattern

  Application                    Cache (Redis)
      |                              |
      |--- 1. Check cache ---------->|
      |<-- 2. Cache MISS ------------|
      |                              |
      |--- 3. Query DB ------------------> Database
      |<-- 4. Return data ---------------- Database
      |                              |
      |--- 5. Write to cache ------->|
      |                              |

  NEXT REQUEST (Cache HIT):
      |--- 1. Check cache ---------->|
      |<-- 2. Return data -----------|  (DB never touched)

How it works:

  1. Application checks the cache first
  2. On cache miss, application queries the database
  3. Application writes the result to cache
  4. Subsequent reads are served from cache

Pros:

  • Only requested data is cached (no wasted memory)
  • Cache failure does not break the system (graceful degradation)
  • Simple to implement

Cons:

  • First request is always a cache miss (cold start)
  • Data can become stale (cache and DB can diverge)
  • Application bears the complexity of cache management

When to use: Read-heavy workloads where stale data is tolerable for short periods.

# Cache-Aside Implementation
def get_user(user_id):
    # Step 1: Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    
    # Step 2: Cache miss - query database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    
    # Step 3: Populate cache with a 1-hour TTL.
    # Skip missing users so we don't cache a null.
    if user is not None:
        redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    
    return user

2. Write-Through

Every write goes to the cache AND the database synchronously. The cache is always consistent with the database.

             Write-Through Pattern
             
  Application        Cache           Database
      |                |                 |
      |-- Write ------>|                 |
      |                |-- Write ------->|
      |                |<-- ACK ---------|
      |<-- ACK --------|                 |
      |                |                 |
      |-- Read ------->|                 |
      |<-- Data -------|  (always fresh) |

How it works:

  1. Application writes to the cache
  2. Cache synchronously writes to the database
  3. Write is acknowledged only after both succeed
  4. Reads always hit the cache (guaranteed fresh)

Pros:

  • Data is never stale in cache
  • Reads are always fast and consistent
  • Simple mental model

Cons:

  • Write latency is higher (two writes per operation)
  • Cache is filled with data that may never be read
  • If cache goes down, writes must bypass it

When to use: Systems where data consistency between cache and DB is critical, and write volume is moderate.
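The synchronous double-write above can be sketched with in-memory stand-ins (a minimal sketch: `WriteThroughCache` and the dict-backed store and database are illustrative, not a real library):

```python
class WriteThroughCache:
    """Write-through sketch: every write goes to the cache AND the
    database synchronously, so reads can always be served from cache."""

    def __init__(self, db):
        self._store = {}   # stand-in for Redis
        self._db = db      # stand-in for the database

    def put(self, key, value):
        # 1. Write to the cache
        self._store[key] = value
        # 2. Synchronously write to the DB; only now is the write "done"
        self._db[key] = value

    def get(self, key):
        # 3. Reads hit the cache, which is guaranteed fresh
        return self._store.get(key)


db = {}
cache = WriteThroughCache(db)
cache.put("user:1", {"name": "Ada"})
assert cache.get("user:1") == db["user:1"]  # cache and DB agree
```

The write-latency cost is visible in `put`: the caller blocks on both writes before getting an acknowledgment.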


3. Write-Behind (Write-Back)

Writes go to the cache immediately, and the cache asynchronously flushes to the database in batches.

             Write-Behind Pattern
             
  Application        Cache           Database
      |                |                 |
      |-- Write ------>|                 |
      |<-- ACK --------|  (immediate)   |
      |                |                 |
      |                |  ... time passes ...
      |                |                 |
      |                |-- Batch Write ->|
      |                |<-- ACK ---------|
      |                |                 |
      
  Writes are batched and flushed asynchronously

How it works:

  1. Application writes to the cache
  2. Cache immediately acknowledges the write
  3. Cache batches writes and flushes to DB asynchronously
  4. Reads are served from cache (always fast)

Pros:

  • Extremely fast writes (no DB latency on write path)
  • Batching reduces DB load
  • Good for write-heavy workloads

Cons:

  • Risk of data loss if cache crashes before flush
  • Complexity in ensuring eventual consistency
  • Harder to debug and reason about

When to use: Write-heavy workloads where slight data loss risk is acceptable (analytics, counters, session data).
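The batching behavior can be sketched as follows (a minimal, single-threaded sketch with a dict standing in for the database; a production version would flush from a background thread and handle DB failures):

```python
import time

class WriteBehindCache:
    """Write-behind sketch: writes are acknowledged immediately and
    flushed to the database later in batches."""

    def __init__(self, db, flush_every=0.5):
        self._store = {}
        self._dirty = {}          # writes not yet persisted to the DB
        self._db = db
        self._flush_every = flush_every
        self._last_flush = time.time()

    def put(self, key, value):
        self._store[key] = value   # 1. write to cache
        self._dirty[key] = value   # 2. queue for the next batch
        # Caller is "ACKed" here; flush only when the interval elapses
        if time.time() - self._last_flush >= self._flush_every:
            self.flush()

    def flush(self):
        # 3. batch-write all pending entries to the DB in one pass
        self._db.update(self._dirty)
        self._dirty.clear()
        self._last_flush = time.time()

    def get(self, key):
        return self._store.get(key)  # reads never touch the DB
```

Note that anything still sitting in `_dirty` is lost if the cache process dies before `flush` runs, which is exactly the data-loss risk described above.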


4. Read-Through

The cache sits between the application and the database. The cache itself is responsible for loading data from the database on a miss.

             Read-Through Pattern
             
  Application        Cache           Database
      |                |                 |
      |-- Read ------->|                 |
      |                |-- (miss) ------>|
      |                |<-- Data --------|
      |                | (stores it)     |
      |<-- Data -------|                 |
      |                |                 |
      
  The CACHE loads data from DB, not the application

How it works:

  1. Application always reads from the cache
  2. On cache miss, the cache itself queries the database
  3. Cache stores the result and returns it
  4. Application never talks to the database directly for reads

Pros:

  • Application code is simpler (no cache management logic)
  • Separation of concerns
  • Easy to swap cache implementations

Cons:

  • First request is still a cache miss
  • Cache library/layer must understand the data model
  • Less flexibility than cache-aside

When to use: When you want a clean abstraction and are using a caching library that supports read-through (e.g., Caffeine, NCache).
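The key difference from cache-aside is that the loader lives inside the cache. A minimal sketch (`ReadThroughCache` and the lambda loader are illustrative, not a real library API):

```python
class ReadThroughCache:
    """Read-through sketch: the cache owns the loader, so the
    application never queries the database directly for reads."""

    def __init__(self, loader):
        self._store = {}
        self._loader = loader  # e.g. a function that queries the DB

    def get(self, key):
        if key not in self._store:
            # Cache miss: the CACHE loads from the DB, stores the
            # result, and returns it
            self._store[key] = self._loader(key)
        return self._store[key]


database = {"user:1": {"name": "Ada"}}
users = ReadThroughCache(loader=lambda k: database[k])
users.get("user:1")  # miss: loaded via the loader, then cached
users.get("user:1")  # hit: served from the cache
```

The application only ever calls `get`; swapping the cache implementation means changing one constructor argument.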


Pattern Comparison Table

Pattern        Write Path                     Read Path                    Consistency  Latency        Complexity  Best For
-------------  -----------------------------  ---------------------------  -----------  -------------  ----------  ---------------------------
Cache-Aside    App writes DB, then cache      App checks cache, then DB    Eventual     Medium         Low         General purpose, read-heavy
Write-Through  Cache writes DB synchronously  Always from cache            Strong       High (writes)  Medium      Consistency-critical reads
Write-Behind   Cache writes DB async          Always from cache            Eventual     Low (writes)   High        Write-heavy, loss-tolerant
Read-Through   App writes DB directly         Cache loads from DB on miss  Eventual     Medium         Medium      Clean abstraction layer

Redis vs Memcached

Feature            Redis                                                 Memcached
-----------------  ----------------------------------------------------  ------------------------------------
Data structures    Strings, lists, sets, hashes, sorted sets, streams    Strings only (key-value)
Persistence        RDB snapshots + AOF                                   None (pure in-memory)
Replication        Master-replica, Redis Sentinel, Redis Cluster         None built-in
Pub/Sub            Built-in                                              Not available
Lua scripting      Yes                                                   No
Threading          Single-threaded (I/O threads in 6.0+)                 Multi-threaded
Memory efficiency  Higher overhead per key                               More memory-efficient for simple k/v
Max value size     512 MB                                                1 MB
Clustering         Redis Cluster (auto-sharding)                         Client-side sharding
Use case           Feature-rich caching, sessions, leaderboards, queues  Simple, high-throughput caching

Rule of thumb:

  • Use Redis when you need data structures, persistence, or pub/sub
  • Use Memcached when you need raw speed for simple key-value caching with multi-threading

Cache Eviction Policies

When the cache is full, something must be removed to make space. The eviction policy determines what gets removed.

LRU (Least Recently Used)

Evicts the item that has not been accessed for the longest time.

  Cache (capacity = 3):
  
  Access: A, B, C, A, D
  
  Step 1: [A]           -- add A
  Step 2: [A, B]        -- add B
  Step 3: [A, B, C]     -- add C (full)
  Step 4: [B, C, A]     -- access A (move to end)
  Step 5: [C, A, D]     -- add D, evict B (least recent)

Best for: General-purpose caching. Most workloads have temporal locality.
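The walkthrough above maps directly onto a small implementation built on Python's `collections.OrderedDict` (a minimal sketch, not production code):

```python
from collections import OrderedDict

class LRUCache:
    """LRU sketch: the most recently used keys live at the end of the
    ordered dict; eviction pops from the front."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


cache = LRUCache(3)
for k in ["A", "B", "C"]:
    cache.put(k, k)
cache.get("A")       # A becomes most recent: order is B, C, A
cache.put("D", "D")  # evicts B, the least recently used
```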

LFU (Least Frequently Used)

Evicts the item that has been accessed the fewest times.

  Cache (capacity = 3):
  
  Access: A, A, A, B, C, D
  
  A: freq=3, B: freq=1, C: freq=1
  
  Add D --> evict B or C (lowest frequency)
  A stays because freq=3

Best for: Workloads where some items are permanently hot (e.g., trending content).
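The frequency-count bookkeeping can be sketched like this (a minimal sketch with ties broken arbitrarily; real LFU implementations such as Redis's approximated LFU are more sophisticated):

```python
class LFUCache:
    """LFU sketch: track an access count per key and evict the key
    with the lowest count when the cache is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}
        self._freq = {}

    def get(self, key):
        if key not in self._data:
            return None
        self._freq[key] += 1   # each access bumps the frequency
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Evict the least frequently used key
            victim = min(self._freq, key=self._freq.get)
            del self._data[victim]
            del self._freq[victim]
        self._data[key] = value
        self._freq[key] = self._freq.get(key, 0) + 1
```

Replaying the access pattern from the example (A three times, then B, C, D) evicts one of the frequency-1 keys while A survives with frequency 3.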

TTL (Time To Live)

Items expire after a set duration regardless of access patterns.

  redis.setex("session:abc", 1800, data)  # Expires in 30 minutes
  redis.setex("user:123", 3600, data)     # Expires in 1 hour

Best for: Session data, tokens, any data with a natural expiration.

Other Policies

Policy  Description                        Use Case
------  ---------------------------------  ------------------------------------------
FIFO    First in, first out                Simple, predictable eviction
Random  Evict a random entry               Surprisingly effective in practice
LRU-K   Evict based on Kth-to-last access  Better than LRU for scan-resistant caching

Cache Stampede Prevention

A cache stampede (thundering herd) occurs when a popular cache entry expires and many requests simultaneously hit the database.

  Normal:
  Request --> Cache HIT --> Return data
  
  Stampede (cache entry expires):
  Request 1 ---> Cache MISS ---> DB Query
  Request 2 ---> Cache MISS ---> DB Query    <-- ALL hit DB
  Request 3 ---> Cache MISS ---> DB Query        at once!
  Request 4 ---> Cache MISS ---> DB Query
  ...
  Request 1000 -> Cache MISS --> DB Query    <-- DB overwhelmed

Prevention Strategies

1. Locking (Mutex)

Only one request is allowed to rebuild the cache. Others wait or get stale data.

def get_with_lock(key):
    value = cache.get(key)
    if value is not None:
        return value
    
    # Try to acquire the rebuild lock (nx=True: set only if absent;
    # ex=10: auto-expire so a crashed holder cannot deadlock everyone)
    if cache.set(f"lock:{key}", 1, nx=True, ex=10):
        try:
            # We hold the lock - rebuild the cache entry
            value = db.query(key)
            cache.set(key, value, ex=3600)
        finally:
            cache.delete(f"lock:{key}")
        return value
    
    # Someone else is rebuilding - poll briefly for the fresh value
    for _ in range(20):
        time.sleep(0.1)
        value = cache.get(key)
        if value is not None:
            return value
    
    # Lock holder may have died; fall back to the database
    return db.query(key)

2. Early Expiration (Stale-While-Revalidate)

Cache entries contain a "soft TTL" before the "hard TTL". When soft TTL expires, one request rebuilds while others serve stale data.

  |<-- Fresh data -->|<-- Stale but served -->|<-- Hard expire -->|
  0s                 50s                       60s
                      ^
                      One request rebuilds
                      Others get stale data
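The soft/hard TTL split in the timeline above can be sketched as follows (a minimal, single-threaded sketch; `SwrEntry` and `get_swr` are illustrative names, and in production the rebuild would run in a background thread so the caller returns immediately):

```python
import time

class SwrEntry:
    """Cache entry with a soft TTL (serve stale after) and a hard TTL."""
    def __init__(self, value, soft_ttl, hard_ttl):
        now = time.time()
        self.value = value
        self.soft_expiry = now + soft_ttl   # e.g. 50s
        self.hard_expiry = now + hard_ttl   # e.g. 60s
        self.refreshing = False

def get_swr(cache, key, rebuild, soft_ttl=50, hard_ttl=60):
    entry = cache.get(key)
    now = time.time()
    if entry is None or now >= entry.hard_expiry:
        # Hard miss/expiry: must rebuild before responding
        entry = SwrEntry(rebuild(), soft_ttl, hard_ttl)
        cache[key] = entry
        return entry.value
    if now >= entry.soft_expiry and not entry.refreshing:
        # Soft-expired: this request rebuilds but is still served the
        # stale value (a background thread would do this in production)
        entry.refreshing = True
        stale = entry.value
        cache[key] = SwrEntry(rebuild(), soft_ttl, hard_ttl)
        return stale
    return entry.value
```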

3. Probabilistic Early Expiration

Each request has a small probability of triggering a cache refresh before expiry. As expiry approaches, the probability increases.

# XFetch: probabilistically recompute before expiry
#   ttl   - seconds remaining until the entry expires
#   delta - how long a recompute takes
#   beta  - aggressiveness (> 1 recomputes earlier)
def should_recompute(ttl, delta, beta=1.0):
    # -log(U) is exponentially distributed: most requests pass through,
    # but the chance of an early refresh rises as expiry approaches
    return -delta * beta * math.log(random.random()) >= ttl

Cache Invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

Strategies

1. TTL-Based Expiration

  • Set a time-to-live on every cache entry
  • Simple and predictable
  • Stale data window = TTL duration

2. Event-Based Invalidation

  • When data changes, explicitly delete or update the cache entry
  • Requires coupling between write path and cache
def update_user(user_id, data):
    db.update("UPDATE users SET ... WHERE id = %s", user_id, data)
    cache.delete(f"user:{user_id}")  # Invalidate cache

3. Version-Based Invalidation

  • Include a version number in the cache key
  • When data changes, increment the version
  • Old cache entries naturally expire (orphaned)
version = db.get_version("users")
cache_key = f"user:{user_id}:v{version}"

4. Publish/Subscribe Invalidation

  • Write events are published to a channel
  • Cache nodes subscribe and invalidate locally
  Writer --> Pub/Sub Channel --> Cache Node 1 (invalidate)
                             --> Cache Node 2 (invalidate)
                             --> Cache Node 3 (invalidate)
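The fan-out can be sketched with an in-process stand-in for the channel (a minimal sketch: `InvalidationBus` is a stub playing the role of a real pub/sub system such as Redis Pub/Sub, and the dicts play the role of per-node local caches):

```python
class InvalidationBus:
    """In-process stand-in for a pub/sub channel (e.g. Redis Pub/Sub)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key):
        # Fan the invalidation out to every subscribed cache node
        for callback in self._subscribers:
            callback(key)


bus = InvalidationBus()
node1 = {"user:1": "old"}
node2 = {"user:1": "old"}

# Each cache node subscribes and drops its local copy on invalidation
bus.subscribe(lambda key: node1.pop(key, None))
bus.subscribe(lambda key: node2.pop(key, None))

# The writer updates the DB, then publishes the invalidation
bus.publish("user:1")  # both nodes discard their stale entry
```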

Distributed Caching

When a single cache server is not enough, you distribute the cache across multiple nodes.

Consistent Hashing for Cache Distribution

          Consistent Hash Ring
          
              Node A
             /      \
            /        \
     Node D           Node B
            \        /
             \      /
              Node C
              
  key "user:42" --> hash --> lands on Node B
  key "user:99" --> hash --> lands on Node D
  
  If Node B fails:
  - Only keys on Node B are redistributed
  - Nodes A, C, D are unaffected

Cache Cluster Topologies

1. Client-Side Sharding

  App --> hash(key) --> pick server
  
  Server 1: keys A-M
  Server 2: keys N-Z

Simple but inflexible. Adding/removing servers requires rehashing.

2. Proxy-Based (e.g., Twemproxy, Codis)

  App --> Proxy --> Server 1
                --> Server 2
                --> Server 3

Proxy handles routing. Transparent to the application.

3. Redis Cluster

  App --> Any Node --> Redirects to correct node
  
  16384 hash slots distributed across nodes
  Automatic failover with replicas

CDN Caching

CDN caching is a specialized form of caching that operates at the edge, close to users.

  User (Tokyo) --> CDN Edge (Tokyo) --> Origin Server (US-East)
                   |
                   Cache HIT? Return immediately
                   Cache MISS? Fetch from origin, cache, return

Cache Headers That Control CDN Behavior

Header             Example               Meaning
-----------------  --------------------  ----------------------------------------------------
Cache-Control      max-age=3600, public  Cache for 1 hour; any cache may store
Cache-Control      private, no-cache     Only the browser may store; revalidate before each use
ETag               "abc123"              Version identifier for conditional requests
Vary               Accept-Encoding       Cache separate variants per encoding
Surrogate-Control  max-age=86400         CDN-specific cache directive

Covered in more detail in 9.9.b CDN and Content Delivery.


Multi-Level Caching Architecture

Real-world systems use multiple cache layers:

  +------------------------------------------------------------------+
  |                    MULTI-LEVEL CACHING                           |
  |                                                                  |
  |  Browser Cache (L1)                                              |
  |      |                                                           |
  |      v                                                           |
  |  CDN Edge Cache (L2)                                             |
  |      |                                                           |
  |      v                                                           |
  |  API Gateway Cache (L3)                                          |
  |      |                                                           |
  |      v                                                           |
  |  Application-Level Cache (L4) -- e.g., in-process Caffeine       |
  |      |                                                           |
  |      v                                                           |
  |  Distributed Cache (L5) -- e.g., Redis Cluster                   |
  |      |                                                           |
  |      v                                                           |
  |  Database Query Cache (L6)                                       |
  |      |                                                           |
  |      v                                                           |
  |  Database (source of truth)                                      |
  +------------------------------------------------------------------+
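The interplay between the application-level (L4) and distributed (L5) layers can be sketched like this (a minimal sketch: plain dicts stand in for an in-process cache and for Redis, and the crude eviction is illustrative only):

```python
class TwoLevelCache:
    """Sketch of an L4/L5 pair: a small per-process cache in front of
    a shared distributed cache (here a dict standing in for Redis)."""

    def __init__(self, shared, local_capacity=1000):
        self._local = {}          # L4: per-process, fastest
        self._shared = shared     # L5: shared across app servers
        self._capacity = local_capacity

    def get(self, key, load_from_db):
        if key in self._local:
            return self._local[key]            # L4 hit
        value = self._shared.get(key)
        if value is None:
            value = load_from_db(key)          # both levels missed
            self._shared[key] = value          # populate L5
        if len(self._local) >= self._capacity:
            # Crude FIFO eviction; a real L4 cache would use LRU
            self._local.pop(next(iter(self._local)))
        self._local[key] = value               # promote into L4
        return value
```

Each lookup falls through the levels in order and promotes the value upward on the way back, which is how the compounding benefit (and compounding invalidation complexity) arises.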

Caching Anti-Patterns

Anti-Pattern              Problem                                            Solution
------------------------  -------------------------------------------------  -----------------------------------------------
Cache everything          Wastes memory, evicts hot data                     Cache only frequently accessed data
No TTL                    Stale data lives forever                           Always set a TTL, even a long one
Cache and DB race         Concurrent writes to DB and cache; inconsistency   Delete cache on write, let next read repopulate
Huge cache values         Slow serialization, network overhead               Break into smaller keys or compress
Ignoring thundering herd  DB overwhelmed on cache expiry                     Use locking or early refresh

Key Takeaways

  1. Cache-aside is the most common and flexible pattern -- start here
  2. Write-through guarantees consistency but adds write latency
  3. Write-behind is fast but risks data loss -- use for non-critical data
  4. Redis is the default choice unless you need Memcached's simplicity
  5. LRU is the default eviction policy -- only switch if you have data to justify it
  6. Cache stampede is a real production issue -- always plan for it
  7. Invalidation is hard -- TTL + event-based is the most common combination
  8. Multi-level caching compounds benefits but also compounds complexity

Explain-It Challenge

"You are designing a social media feed. A celebrity with 50 million followers posts an update. Explain how you would cache this post so that all followers see it quickly, without overwhelming your database, and without serving stale data to users who interact with it (likes, comments). Walk through your caching strategy layer by layer."