Episode 9 — System Design / 9.9 — Core Infrastructure

9.9 Quick Revision -- Core Infrastructure Cheat Sheet


Caching At a Glance

Patterns

Pattern       | How                               | Consistency | Speed       | Risk
Cache-Aside   | App checks cache, then DB on miss | Eventual    | Medium      | Stale reads
Write-Through | Write cache + DB synchronously    | Strong      | Slow writes | Cache bloat
Write-Behind  | Write cache, async flush to DB    | Eventual    | Fast writes | Data loss
Read-Through  | Cache loads from DB on miss       | Eventual    | Medium      | Cache dependency
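Cache-aside is the pattern most worth sketching from memory. A minimal sketch: plain dicts stand in for Redis and the database, and `load_user` is a hypothetical accessor.

```python
# Cache-aside sketch: plain dicts stand in for Redis and the database.
db = {"u1": {"name": "Ada"}}
cache = {}

def load_user(user_id):
    if user_id in cache:          # 1. check the cache first
        return cache[user_id]
    record = db.get(user_id)      # 2. miss: read from the database
    if record is not None:
        cache[user_id] = record   # 3. populate the cache for next time
    return record
```

The app owns the caching logic, which is why stale reads are the risk: nothing here updates the cache when the database row changes.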

Eviction Policies

Policy | Evicts                | Best For
LRU    | Least recently used   | General purpose (default)
LFU    | Least frequently used | Hot/cold data distinction
TTL    | Expired items         | Session data, tokens
FIFO   | Oldest item           | Simple, predictable
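LRU, the usual default, is easy to sketch with Python's `OrderedDict` (a sketch, not a production cache):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: evicts the least recently used key at capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)      # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
```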

Redis vs Memcached

Redis: Data structures + Persistence + Pub/Sub + Clustering
       --> Choose when you need more than simple key-value

Memcached: Simple key-value + Multi-threaded + Memory-efficient
           --> Choose for raw speed, simple caching

Cache Stampede Prevention

  1. Locking -- one request rebuilds, others wait
  2. Stale-while-revalidate -- serve stale, refresh in background
  3. Probabilistic early expiry -- random early refresh before TTL
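Option 3 can be sketched in a few lines. This follows the XFetch-style formula; `delta` (the observed cost of a rebuild) and `beta` (aggressiveness) are tuning parameters, and the function name is hypothetical.

```python
import math
import random

def should_refresh(now, expiry, delta, beta=1.0):
    """Refresh early with a probability that rises as expiry approaches."""
    # -log(random()) is positive and occasionally large, so a few requests
    # refresh slightly before the TTL instead of everyone at once.
    return now - delta * beta * math.log(random.random()) >= expiry
```

Far from expiry almost no request refreshes; past expiry every request does, and in between the load of rebuilding is spread out instead of stampeding.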

CDN At a Glance

User --> CDN Edge (nearby) --> Cache HIT: serve instantly
                           --> Cache MISS: fetch from origin, cache, serve

Pull vs Push

Aspect      | Pull CDN       | Push CDN
Loading     | On-demand      | Pre-uploaded
Best for    | Most use cases | Known static content
Origin load | Higher         | Lower

Key Cache Headers

Static assets:   Cache-Control: public, max-age=31536000, immutable
HTML pages:      Cache-Control: public, s-maxage=300, stale-while-revalidate=60
API responses:   Cache-Control: public, s-maxage=60
Personalized:    Cache-Control: private, no-store

Cache Invalidation

  1. Versioned URLs (best) -- app.abc123.js
  2. Purge API -- explicit invalidation
  3. Cache tags -- purge by tag/surrogate key
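Versioned URLs are usually generated at build time by hashing the file contents; a minimal sketch (the `versioned_name` helper is hypothetical):

```python
import hashlib

def versioned_name(filename: str, content: bytes) -> str:
    """Embed a short content hash: app.js -> app.<hash>.js.
    A new deploy changes the URL, so stale CDN copies are never
    requested again and no explicit purge is needed."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, _, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}"
```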

Load Balancing At a Glance

Layer 4 vs Layer 7

Layer 4: TCP/UDP level. Fast. Cannot inspect HTTP content.
         Use for: databases, raw TCP, maximum throughput

Layer 7: HTTP level. Content-aware. SSL termination.
         Use for: web apps, APIs, microservices, URL routing

Algorithms Quick Reference

Algorithm            | Key Property              | When to Use
Round Robin          | Equal distribution        | Homogeneous servers
Weighted Round Robin | Proportional distribution | Different server capacities
Least Connections    | Adaptive to load          | Variable request duration
IP Hash              | Session affinity          | Stateful without cookies
Consistent Hashing   | Minimal redistribution    | Caching, stateful services
Least Response Time  | Fastest server wins       | Variable server performance
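Consistent hashing is the one interviewers most often ask you to sketch. A minimal ring with virtual nodes (MD5 is used here only for illustration; real systems often use faster hashes):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing sketch. Virtual nodes smooth the
    distribution; adding/removing a server only remaps nearby keys."""
    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key: str) -> str:
        # First ring position clockwise from the key's hash (wraps around).
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]
```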

Health Checks

TCP check:   Is port open? (Layer 4, fast, shallow)
HTTP check:  Does /health return 200? (Layer 7, thorough)
Deep check:  Are DB, cache, and disk all OK? (Application-level)

Config: interval=10s, timeout=5s, unhealthy_threshold=3, healthy_threshold=2
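The thresholds exist to prevent flapping: one bad probe should not eject an instance. A sketch of that state machine (class and method names are hypothetical):

```python
class HealthState:
    """Flip health state only after N consecutive opposite probe results."""
    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.healthy = True
        self.streak = 0
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold

    def record(self, probe_ok: bool) -> bool:
        if probe_ok == self.healthy:
            self.streak = 0          # probe agrees with current state
            return self.healthy
        self.streak += 1
        needed = (self.unhealthy_threshold if self.healthy
                  else self.healthy_threshold)
        if self.streak >= needed:    # enough consecutive opposite probes
            self.healthy = probe_ok
            self.streak = 0
        return self.healthy
```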

Sticky Sessions -- AVOID

Problem:  Uneven load, server failure loses sessions, scaling issues
Solution: Stateless servers + shared session store (Redis)

API Gateway At a Glance

What It Does

Routing | Auth | Rate Limiting | Transformation | Aggregation | SSL | Logging | CORS

Gateway vs Load Balancer

API Gateway:    "Should this request be allowed? Which SERVICE handles it?"
Load Balancer:  "Which SERVER INSTANCE should handle this connection?"

Together: Client --> Gateway --> Load Balancer --> Server Instances

Rate Limiting Algorithms

Algorithm      | Description
Token Bucket   | Tokens refill at fixed rate; allows bursts
Leaky Bucket   | Requests processed at fixed rate; excess queued
Fixed Window   | Count per time window; burst at boundary
Sliding Window | Weighted overlap of windows; balanced
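Token bucket is the one most worth being able to write from memory. A minimal single-process sketch:

```python
import time

class TokenBucket:
    """Token-bucket sketch: tokens refill at `rate` per second up to
    `capacity`; a request is allowed if a whole token is available,
    so short bursts up to `capacity` are permitted."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A distributed version keeps the same state (tokens, last-refill timestamp) in Redis instead of process memory.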

BFF Pattern

Mobile App --> Mobile BFF --> Backend Services (optimized for mobile)
Web App    --> Web BFF    --> Backend Services (optimized for web)
Partner    --> Partner GW --> Backend Services (API keys, rate limits)

Message Queues At a Glance

Queue vs Topic

Queue (Point-to-Point):  Each message --> ONE consumer
Topic (Pub/Sub):         Each message --> ALL subscribers

RabbitMQ vs Kafka vs SQS

           | RabbitMQ       | Kafka             | SQS
Type       | Broker         | Streaming         | Managed queue
Throughput | 50K/s          | Millions/s        | Unlimited (std)
Retention  | Until consumed | Days/weeks        | Up to 14 days
Replay     | No             | Yes               | No
Ordering   | Per-queue      | Per-partition     | Best-effort / FIFO
Best for   | Routing, tasks | Events, analytics | Simple async, AWS

Delivery Guarantees

At-most-once:  May lose messages. Never duplicates.  (Metrics, logs)
At-least-once: Never loses. May duplicate.           (Most business ops)
Exactly-once:  Never loses. Never duplicates.        (Financial, critical)

Practical approach: At-least-once + Idempotent consumers
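The practical approach above can be sketched in a few lines: track processed message IDs so a redelivered message is acknowledged but not re-applied. In production the ID set would live in Redis or the database, not process memory; the message shape here is hypothetical.

```python
# At-least-once delivery + idempotent consumer sketch.
processed_ids = set()   # would be Redis/DB in production
balances = {}

def handle_payment(message):
    if message["id"] in processed_ids:
        return  # duplicate delivery: safe no-op
    acct = message["acct"]
    balances[acct] = balances.get(acct, 0) + message["amount"]
    processed_ids.add(message["id"])
```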

Dead Letter Queue

Main Queue --> Consumer fails 3x --> DLQ (investigate, fix, replay)
Always configure. Always monitor. Always alert on DLQ depth > 0.

Backpressure

Producer faster than consumer? Queue grows unbounded --> OOM!
Solutions: Bounded queue | Rate limit producer | Auto-scale consumers
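The bounded-queue option falls out of the standard library: `queue.Queue` with a `maxsize` refuses work instead of growing without limit (the `try_enqueue` helper is hypothetical).

```python
import queue

jobs = queue.Queue(maxsize=2)  # bounded: rejects work instead of OOM-ing

def try_enqueue(job) -> bool:
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        return False  # signal backpressure to the producer
```

The producer then decides what rejection means: retry with backoff, shed the request, or return 429 to the caller.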

Microservices At a Glance

Core Principles

1. Single responsibility per service
2. Database per service (no shared DB!)
3. Loose coupling, high cohesion
4. Independent deployment
5. Design for failure

Service Discovery

Client-side:  Client queries registry, picks instance (Eureka)
Server-side:  Client calls LB/router, it resolves (K8s Services)
Service mesh: Sidecar proxy handles discovery (Istio, Linkerd)

Communication

Sync (REST/gRPC): Need immediate response. Queries, validation.
Async (Events):   Fire-and-forget. Background tasks, notifications.

Distributed Transactions -- Saga Pattern

Choreography: Services publish events, others react. (Simple sagas, 3-4 steps)
Orchestration: Central coordinator drives the flow.  (Complex sagas, 5+ steps)

Each step has a compensating transaction for rollback.
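The orchestration variant is easy to sketch: a coordinator runs steps in order and, on failure, runs the compensating transactions of the completed steps in reverse (step and function names here are hypothetical).

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs.
    On any failure, run compensations for completed steps in reverse."""
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for comp in reversed(done):
                comp()          # undo completed steps, newest first
            return False
        done.append(compensate)
    return True
```

Note the failed step's own compensation never runs: it never completed, so there is nothing to undo.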

Deployment Strategies

Strategy   | Rollback | Downtime | Risk
Blue-Green | Instant  | None     | Low
Canary     | Fast     | None     | Very low
Rolling    | Slow     | None     | Medium
When Monolith > Microservices

- Team < 15-20 engineers
- Domain not well understood yet
- Strong consistency required
- Speed of development is priority
- Simple scaling needs (vertical or horizontal monolith)

Infrastructure Decision Flowchart

CACHING:
  Read-heavy? --> Cache-aside with Redis
  Write-heavy, loss-tolerant? --> Write-behind
  Consistency critical? --> Write-through
  
CDN:
  Static assets? --> Always CDN
  Global users? --> CDN for everything cacheable
  Personalized content? --> Skip CDN for that content
  
LOAD BALANCING:
  HTTP traffic? --> Layer 7 (ALB)
  Raw TCP / max performance? --> Layer 4 (NLB)
  Caching / stateful? --> Consistent hashing
  Variable request time? --> Least connections
  
API GATEWAY:
  Microservices? --> Always use a gateway
  Multiple client types? --> BFF pattern
  Third-party API access? --> Gateway with API keys + rate limiting
  
MESSAGE QUEUES:
  Async processing needed? --> Queue
  Event broadcasting? --> Topic
  High throughput events? --> Kafka
  Complex routing? --> RabbitMQ
  Simple + AWS? --> SQS
  
MICROSERVICES:
  Team > 20? --> Consider microservices
  Different scaling needs? --> Extract that service
  Need tech diversity? --> Extract that service
  Starting fresh? --> Start monolith, extract later

The Complete Picture

+------------------------------------------------------------------+
|                                                                  |
|  Users                                                           |
|    |                                                             |
|    v                                                             |
|  [DNS / GSLB] --> nearest region                                 |
|    |                                                             |
|    v                                                             |
|  [CDN] --> static assets served from edge                        |
|    |                                                             |
|    v  (cache miss / dynamic request)                             |
|  [API Gateway] --> auth, rate limit, route                       |
|    |                                                             |
|    v                                                             |
|  [Load Balancer] --> distribute to healthy instances             |
|    |                                                             |
|    v                                                             |
|  [App Servers] --> stateless, auto-scaled                        |
|    |         \                                                   |
|    v          v                                                  |
|  [Cache]    [Message Queue] --> async processing                 |
|    |          |                                                  |
|    v          v                                                  |
|  [Database] [Workers] --> background tasks                       |
|                                                                  |
+------------------------------------------------------------------+

Top Interview Tips

  1. Always name the specific technology -- "I would use Redis for caching" not just "I would add a cache."
  2. Justify every component -- "I am adding a CDN because users are globally distributed."
  3. Discuss trade-offs -- "Write-behind is fast but risks data loss."
  4. Know the numbers -- Redis ~1ms, DB ~10-50ms, cross-continent ~150ms.
  5. Start simple, add complexity -- Do not start with Kafka and microservices. Scale when needed.
  6. Address failure modes -- "If the cache goes down, we fall through to the database."
  7. Mention monitoring -- "We would monitor cache hit rate, queue depth, and p99 latency."