Episode 9 — System Design / 9.9 — Core Infrastructure

9.9 Quick Revision -- Core Infrastructure Cheat Sheet


Caching At a Glance

Patterns

Pattern       | How                               | Consistency | Speed       | Risk
Cache-Aside   | App checks cache, then DB on miss | Eventual    | Medium      | Stale reads
Write-Through | Write cache + DB synchronously    | Strong      | Slow writes | Cache bloat
Write-Behind  | Write cache, async flush to DB    | Eventual    | Fast writes | Data loss
Read-Through  | Cache loads from DB on miss       | Eventual    | Medium      | Cache dependency
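Cache-aside is the pattern most worth sketching from memory. A minimal sketch: plain dicts stand in for Redis and the database, and `load_user` is a hypothetical accessor.

```python
# Cache-aside sketch: plain dicts stand in for Redis and the database.
db = {"u1": {"name": "Ada"}}
cache = {}

def load_user(user_id):
    if user_id in cache:          # 1. check the cache first
        return cache[user_id]
    record = db.get(user_id)      # 2. miss: read from the database
    if record is not None:
        cache[user_id] = record   # 3. populate the cache for next time
    return record
```

The app owns the caching logic, which is why stale reads are the risk: nothing here updates the cache when the database row changes.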

Eviction Policies

Policy | Evicts                | Best For
LRU    | Least recently used   | General purpose (default)
LFU    | Least frequently used | Hot/cold data distinction
TTL    | Expired items         | Session data, tokens
FIFO   | Oldest item           | Simple, predictable
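LRU, the usual default, is easy to sketch with Python's `OrderedDict` (a sketch, not a production cache):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: evicts the least recently used key at capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)      # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
```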

Redis vs Memcached

Redis: Data structures + Persistence + Pub/Sub + Clustering
       --> Choose when you need more than simple key-value

Memcached: Simple key-value + Multi-threaded + Memory-efficient
           --> Choose for raw speed, simple caching

Cache Stampede Prevention

  1. Locking -- one request rebuilds, others wait
  2. Stale-while-revalidate -- serve stale, refresh in background
  3. Probabilistic early expiry -- random early refresh before TTL
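Option 3 can be sketched in a few lines. This follows the XFetch-style formula; `delta` (the observed cost of a rebuild) and `beta` (aggressiveness) are tuning parameters, and the function name is hypothetical.

```python
import math
import random

def should_refresh(now, expiry, delta, beta=1.0):
    """Refresh early with a probability that rises as expiry approaches."""
    # -log(random()) is positive and occasionally large, so a few requests
    # refresh slightly before the TTL instead of everyone at once.
    return now - delta * beta * math.log(random.random()) >= expiry
```

Far from expiry almost no request refreshes; past expiry every request does, and in between the load of rebuilding is spread out instead of stampeding.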

CDN At a Glance

User --> CDN Edge (nearby) --> Cache HIT: serve instantly
                           --> Cache MISS: fetch from origin, cache, serve

Pull vs Push

Aspect      | Pull CDN       | Push CDN
Loading     | On-demand      | Pre-uploaded
Best for    | Most use cases | Known static content
Origin load | Higher         | Lower

Key Cache Headers

Static assets:   Cache-Control: public, max-age=31536000, immutable
HTML pages:      Cache-Control: public, s-maxage=300, stale-while-revalidate=60
API responses:   Cache-Control: public, s-maxage=60
Personalized:    Cache-Control: private, no-store

Cache Invalidation

  1. Versioned URLs (best) -- app.abc123.js
  2. Purge API -- explicit invalidation
  3. Cache tags -- purge by tag/surrogate key
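Versioned URLs are usually generated at build time by hashing the file contents; a minimal sketch (the `versioned_name` helper is hypothetical):

```python
import hashlib

def versioned_name(filename: str, content: bytes) -> str:
    """Embed a short content hash: app.js -> app.<hash>.js.
    A new deploy changes the URL, so stale CDN copies are never
    requested again and no explicit purge is needed."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, _, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}"
```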

Load Balancing At a Glance

Layer 4 vs Layer 7

Layer 4: TCP/UDP level. Fast. Cannot inspect HTTP content.
         Use for: databases, raw TCP, maximum throughput

Layer 7: HTTP level. Content-aware. SSL termination.
         Use for: web apps, APIs, microservices, URL routing

Algorithms Quick Reference

Algorithm            | Key Property              | When to Use
Round Robin          | Equal distribution        | Homogeneous servers
Weighted Round Robin | Proportional distribution | Different server capacities
Least Connections    | Adaptive to load          | Variable request duration
IP Hash              | Session affinity          | Stateful without cookies
Consistent Hashing   | Minimal redistribution    | Caching, stateful services
Least Response Time  | Fastest server wins       | Variable server performance
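Consistent hashing is the one interviewers most often ask you to sketch. A minimal ring with virtual nodes (MD5 is used here only for illustration; real systems often use faster hashes):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing sketch. Virtual nodes smooth the
    distribution; adding/removing a server only remaps nearby keys."""
    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key: str) -> str:
        # First ring position clockwise from the key's hash (wraps around).
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]
```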

Health Checks

TCP check:   Is port open? (Layer 4, fast, shallow)
HTTP check:  Does /health return 200? (Layer 7, thorough)
Deep check:  Are DB, cache, and disk all OK? (Application-level)

Config: interval=10s, timeout=5s, unhealthy_threshold=3, healthy_threshold=2
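The thresholds exist to prevent flapping: one bad probe should not eject an instance. A sketch of that state machine (class and method names are hypothetical):

```python
class HealthState:
    """Flip health state only after N consecutive opposite probe results."""
    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.healthy = True
        self.streak = 0
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold

    def record(self, probe_ok: bool) -> bool:
        if probe_ok == self.healthy:
            self.streak = 0          # probe agrees with current state
            return self.healthy
        self.streak += 1
        needed = (self.unhealthy_threshold if self.healthy
                  else self.healthy_threshold)
        if self.streak >= needed:    # enough consecutive opposite probes
            self.healthy = probe_ok
            self.streak = 0
        return self.healthy
```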

Sticky Sessions -- AVOID

Problem:  Uneven load, server failure loses sessions, scaling issues
Solution: Stateless servers + shared session store (Redis)

API Gateway At a Glance

What It Does

Routing | Auth | Rate Limiting | Transformation | Aggregation | SSL | Logging | CORS

Gateway vs Load Balancer

API Gateway:    "Should this request be allowed? Which SERVICE handles it?"
Load Balancer:  "Which SERVER INSTANCE should handle this connection?"

Together: Client --> Gateway --> Load Balancer --> Server Instances

Rate Limiting Algorithms

Algorithm      | Description
Token Bucket   | Tokens refill at fixed rate; allows bursts
Leaky Bucket   | Requests processed at fixed rate; excess queued
Fixed Window   | Count per time window; burst at boundary
Sliding Window | Weighted overlap of windows; balanced
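Token bucket is the one most worth being able to write from memory. A minimal single-process sketch:

```python
import time

class TokenBucket:
    """Token-bucket sketch: tokens refill at `rate` per second up to
    `capacity`; a request is allowed if a whole token is available,
    so short bursts up to `capacity` are permitted."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A distributed version keeps the same state (tokens, last-refill timestamp) in Redis instead of process memory.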

BFF Pattern

Mobile App --> Mobile BFF --> Backend Services (optimized for mobile)
Web App    --> Web BFF    --> Backend Services (optimized for web)
Partner    --> Partner GW --> Backend Services (API keys, rate limits)

Message Queues At a Glance

Queue vs Topic

Queue (Point-to-Point):  Each message --> ONE consumer
Topic (Pub/Sub):         Each message --> ALL subscribers

RabbitMQ vs Kafka vs SQS

           | RabbitMQ       | Kafka             | SQS
Type       | Broker         | Streaming         | Managed queue
Throughput | 50K/s          | Millions/s        | Unlimited (std)
Retention  | Until consumed | Days/weeks        | Up to 14 days
Replay     | No             | Yes               | No
Ordering   | Per-queue      | Per-partition     | Best-effort / FIFO
Best for   | Routing, tasks | Events, analytics | Simple async, AWS

Delivery Guarantees

At-most-once:  May lose messages. Never duplicates.  (Metrics, logs)
At-least-once: Never loses. May duplicate.           (Most business ops)
Exactly-once:  Never loses. Never duplicates.        (Financial, critical)

Practical approach: At-least-once + Idempotent consumers
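The practical approach above can be sketched in a few lines: track processed message IDs so a redelivered message is acknowledged but not re-applied. In production the ID set would live in Redis or the database, not process memory; the message shape here is hypothetical.

```python
# At-least-once delivery + idempotent consumer sketch.
processed_ids = set()   # would be Redis/DB in production
balances = {}

def handle_payment(message):
    if message["id"] in processed_ids:
        return  # duplicate delivery: safe no-op
    acct = message["acct"]
    balances[acct] = balances.get(acct, 0) + message["amount"]
    processed_ids.add(message["id"])
```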

Dead Letter Queue

Main Queue --> Consumer fails 3x --> DLQ (investigate, fix, replay)
Always configure. Always monitor. Always alert on DLQ depth > 0.

Backpressure

Producer faster than consumer? Queue grows unbounded --> OOM!
Solutions: Bounded queue | Rate limit producer | Auto-scale consumers
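The bounded-queue option falls out of the standard library: `queue.Queue` with a `maxsize` refuses work instead of growing without limit (the `try_enqueue` helper is hypothetical).

```python
import queue

jobs = queue.Queue(maxsize=2)  # bounded: rejects work instead of OOM-ing

def try_enqueue(job) -> bool:
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        return False  # signal backpressure to the producer
```

The producer then decides what rejection means: retry with backoff, shed the request, or return 429 to the caller.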

Microservices At a Glance

Core Principles

1. Single responsibility per service
2. Database per service (no shared DB!)
3. Loose coupling, high cohesion
4. Independent deployment
5. Design for failure

Service Discovery

Client-side:  Client queries registry, picks instance (Eureka)
Server-side:  Client calls LB/router, it resolves (K8s Services)
Service mesh: Sidecar proxy handles discovery (Istio, Linkerd)

Communication

Sync (REST/gRPC): Need immediate response. Queries, validation.
Async (Events):   Fire-and-forget. Background tasks, notifications.

Distributed Transactions -- Saga Pattern

Choreography: Services publish events, others react. (Simple sagas, 3-4 steps)
Orchestration: Central coordinator drives the flow.  (Complex sagas, 5+ steps)

Each step has a compensating transaction for rollback.
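The orchestration variant is easy to sketch: a coordinator runs steps in order and, on failure, runs the compensating transactions of the completed steps in reverse (step and function names here are hypothetical).

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs.
    On any failure, run compensations for completed steps in reverse."""
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for comp in reversed(done):
                comp()          # undo completed steps, newest first
            return False
        done.append(compensate)
    return True
```

Note the failed step's own compensation never runs: it never completed, so there is nothing to undo.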

Deployment Strategies

Strategy   | Rollback | Downtime | Risk
Blue-Green | Instant  | None     | Low
Canary     | Fast     | None     | Very low
Rolling    | Slow     | None     | Medium
When Monolith > Microservices

- Team < 15-20 engineers
- Domain not well understood yet
- Strong consistency required
- Speed of development is priority
- Simple scaling needs (vertical or horizontal monolith)

Infrastructure Decision Flowchart

CACHING:
  Read-heavy? --> Cache-aside with Redis
  Write-heavy, loss-tolerant? --> Write-behind
  Consistency critical? --> Write-through
  
CDN:
  Static assets? --> Always CDN
  Global users? --> CDN for everything cacheable
  Personalized content? --> Skip CDN for that content
  
LOAD BALANCING:
  HTTP traffic? --> Layer 7 (ALB)
  Raw TCP / max performance? --> Layer 4 (NLB)
  Caching / stateful? --> Consistent hashing
  Variable request time? --> Least connections
  
API GATEWAY:
  Microservices? --> Always use a gateway
  Multiple client types? --> BFF pattern
  Third-party API access? --> Gateway with API keys + rate limiting
  
MESSAGE QUEUES:
  Async processing needed? --> Queue
  Event broadcasting? --> Topic
  High throughput events? --> Kafka
  Complex routing? --> RabbitMQ
  Simple + AWS? --> SQS
  
MICROSERVICES:
  Team > 20? --> Consider microservices
  Different scaling needs? --> Extract that service
  Need tech diversity? --> Extract that service
  Starting fresh? --> Start monolith, extract later

The Complete Picture

+------------------------------------------------------------------+
|                                                                  |
|  Users                                                           |
|    |                                                             |
|    v                                                             |
|  [DNS / GSLB] --> nearest region                                 |
|    |                                                             |
|    v                                                             |
|  [CDN] --> static assets served from edge                        |
|    |                                                             |
|    v  (cache miss / dynamic request)                             |
|  [API Gateway] --> auth, rate limit, route                       |
|    |                                                             |
|    v                                                             |
|  [Load Balancer] --> distribute to healthy instances             |
|    |                                                             |
|    v                                                             |
|  [App Servers] --> stateless, auto-scaled                        |
|    |         \                                                   |
|    v          v                                                  |
|  [Cache]    [Message Queue] --> async processing                 |
|    |          |                                                  |
|    v          v                                                  |
|  [Database] [Workers] --> background tasks                       |
|                                                                  |
+------------------------------------------------------------------+

Top Interview Tips

  1. Always name the specific technology -- "I would use Redis for caching" not just "I would add a cache."
  2. Justify every component -- "I am adding a CDN because users are globally distributed."
  3. Discuss trade-offs -- "Write-behind is fast but risks data loss."
  4. Know the numbers -- Redis ~1ms, DB ~10-50ms, cross-continent ~150ms.
  5. Start simple, add complexity -- Do not start with Kafka and microservices. Scale when needed.
  6. Address failure modes -- "If the cache goes down, we fall through to the database."
  7. Mention monitoring -- "We would monitor cache hit rate, queue depth, and p99 latency."