9.9 Quick Revision -- Core Infrastructure Cheat Sheet
Caching At a Glance
Patterns
| Pattern | How | Consistency | Speed | Risk |
|---|---|---|---|---|
| Cache-Aside | App checks cache, then DB on miss | Eventual | Medium | Stale reads |
| Write-Through | Write cache + DB synchronously | Strong | Slow writes | Cache bloat |
| Write-Behind | Write cache, async flush to DB | Eventual | Fast writes | Data loss |
| Read-Through | Cache loads from DB on miss | Eventual | Medium | Cache dependency |
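The cache-aside pattern above can be sketched in a few lines. This is a minimal illustration using plain dicts as stand-ins for Redis and the database; the key names and TTL are hypothetical.

```python
import time

# Hypothetical in-memory stand-ins for Redis and the primary database.
cache = {}                          # key -> (value, expires_at)
db = {"user:1": {"name": "Ada"}}
TTL_SECONDS = 300

def get_user(key):
    """Cache-aside: check the cache first, fall back to the DB on a miss."""
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                                   # cache hit
    value = db.get(key)                                   # cache miss: read the DB
    if value is not None:
        cache[key] = (value, time.time() + TTL_SECONDS)   # populate for next time
    return value
```

Note the "Stale reads" risk from the table: if the DB row changes before the TTL expires, this function keeps serving the old cached value.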
Eviction Policies
| Policy | Evicts | Best For |
|---|---|---|
| LRU | Least recently used | General purpose (default) |
| LFU | Least frequently used | Hot/cold data distinction |
| TTL | Expired items | Session data, tokens |
| FIFO | Oldest item | Simple, predictable |
Redis vs Memcached
Redis: Data structures + Persistence + Pub/Sub + Clustering
--> Choose when you need more than simple key-value
Memcached: Simple key-value + Multi-threaded + Memory-efficient
--> Choose for raw speed, simple caching
Cache Stampede Prevention
- Locking -- one request rebuilds, others wait
- Stale-while-revalidate -- serve stale, refresh in background
- Probabilistic early expiry -- random early refresh before TTL
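Probabilistic early expiry can be sketched with the XFetch-style check below: each reader occasionally decides to refresh shortly before the TTL, so rebuilds are spread out instead of all landing at expiry. The parameter names are illustrative; `delta` is roughly how long the last rebuild took.

```python
import math
import random
import time

def should_refresh(expires_at, delta, beta=1.0, now=None):
    """Probabilistic early expiry: return True if this caller should
    rebuild the cache entry now, with probability rising as the TTL nears.
    -log(random()) is exponentially distributed, so a few callers fire
    early and the rest keep serving the cached value."""
    now = time.time() if now is None else now
    return now - delta * beta * math.log(random.random()) >= expires_at
```

A larger `beta` shifts refreshes earlier (more conservative); past `expires_at` the check always fires.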
CDN At a Glance
User --> CDN Edge (nearby) --> Cache HIT: serve instantly
--> Cache MISS: fetch from origin, cache, serve
| Aspect | Pull CDN | Push CDN |
|---|---|---|
| Loading | On-demand | Pre-uploaded |
| Best for | Most use cases | Known static content |
| Origin load | Higher | Lower |
Key Cache Headers
Static assets: Cache-Control: public, max-age=31536000, immutable
HTML pages: Cache-Control: public, s-maxage=300, stale-while-revalidate=60
API responses: Cache-Control: public, s-maxage=60
Personalized: Cache-Control: private, no-store
Cache Invalidation
- Versioned URLs (best) -- app.abc123.js
- Purge API -- explicit invalidation
- Cache tags -- purge by tag/surrogate key
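Versioned URLs are usually generated by hashing the file's content at build time, so every new build gets a new URL and old CDN copies simply age out. A minimal sketch (the 8-character digest length is an arbitrary choice):

```python
import hashlib

def versioned_name(filename, content):
    """Content-hashed filename (e.g. app.abc123.js): a changed build
    produces a changed URL, so no explicit CDN purge is needed."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, ext = filename.rsplit(".", 1)
    return f"{stem}.{digest}.{ext}"
```

Because the content is immutable at that URL, it pairs naturally with the `max-age=31536000, immutable` header shown above.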
Load Balancing At a Glance
Layer 4 vs Layer 7
Layer 4: TCP/UDP level. Fast. Cannot inspect HTTP content.
Use for: databases, raw TCP, maximum throughput
Layer 7: HTTP level. Content-aware. SSL termination.
Use for: web apps, APIs, microservices, URL routing
Algorithms Quick Reference
| Algorithm | Key Property | When to Use |
|---|---|---|
| Round Robin | Equal distribution | Homogeneous servers |
| Weighted Round Robin | Proportional distribution | Different server capacities |
| Least Connections | Adaptive to load | Variable request duration |
| IP Hash | Session affinity | Stateful without cookies |
| Consistent Hashing | Minimal redistribution | Caching, stateful services |
| Least Response Time | Fastest server wins | Variable server performance |
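Consistent hashing's "minimal redistribution" property comes from placing servers (and virtual nodes) on a hash ring and routing each key to the next server clockwise. A toy ring, assuming MD5 as the hash and 100 virtual nodes per server:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes. Removing one
    server only remaps the keys that fell in its ring segments."""
    def __init__(self, servers, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get(self, key):
        # First ring position at or after the key's hash, wrapping at the end.
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]
```

Virtual nodes smooth out the load: without them, a ring of three servers can end up badly unbalanced.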
Health Checks
TCP check: Is port open? (Layer 4, fast, shallow)
HTTP check: Does /health return 200? (Layer 7, thorough)
Deep check: Are DB, cache, and disk all OK? (Application-level)
Config: interval=10s, timeout=5s, unhealthy_threshold=3, healthy_threshold=2
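The two thresholds exist to prevent flapping: one failed probe should not eject an instance, and one success should not readmit it. A sketch of the state machine implied by the config above (class and method names are illustrative):

```python
class HealthTracker:
    """Tracks consecutive probe results: unhealthy_threshold straight
    failures mark an instance down; healthy_threshold straight successes
    bring it back."""
    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0          # consecutive probes against the current state

    def record(self, probe_ok):
        if probe_ok == self.healthy:
            self._streak = 0      # probe agrees with current state: reset
            return self.healthy
        self._streak += 1
        needed = (self.healthy_threshold if not self.healthy
                  else self.unhealthy_threshold)
        if self._streak >= needed:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy
```

With `interval=10s` and `unhealthy_threshold=3`, a dead instance keeps receiving traffic for up to ~30 seconds before removal.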
Sticky Sessions -- AVOID
Problem: Uneven load, server failure loses sessions, scaling issues
Solution: Stateless servers + shared session store (Redis)
API Gateway At a Glance
What It Does
Routing | Auth | Rate Limiting | Transformation | Aggregation | SSL | Logging | CORS
Gateway vs Load Balancer
API Gateway: "Should this request be allowed? Which SERVICE handles it?"
Load Balancer: "Which SERVER INSTANCE should handle this connection?"
Together: Client --> Gateway --> Load Balancer --> Server Instances
Rate Limiting Algorithms
| Algorithm | Description |
|---|---|
| Token Bucket | Tokens refill at fixed rate; allows bursts |
| Leaky Bucket | Requests processed at fixed rate; excess queued |
| Fixed Window | Count per time window; burst at boundary |
| Sliding Window | Weighted overlap of windows; balanced |
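The token bucket row can be made concrete in a few lines. This is a single-process sketch (a real gateway would keep the bucket state in something like Redis); `rate` and `capacity` values are illustrative.

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at `rate` per second up to `capacity`,
    so short bursts up to the bucket size are allowed."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=1, capacity=3`, a client can burst three requests immediately, then is held to one per second.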
BFF Pattern
Mobile App --> Mobile BFF --> Backend Services (optimized for mobile)
Web App --> Web BFF --> Backend Services (optimized for web)
Partner --> Partner GW --> Backend Services (API keys, rate limits)
Message Queues At a Glance
Queue vs Topic
Queue (Point-to-Point): Each message --> ONE consumer
Topic (Pub/Sub): Each message --> ALL subscribers
RabbitMQ vs Kafka vs SQS
| | RabbitMQ | Kafka | SQS |
|---|---|---|---|
| Type | Broker | Streaming | Managed queue |
| Throughput | 50K/s | Millions/s | Unlimited (std) |
| Retention | Until consumed | Days/weeks | Up to 14 days |
| Replay | No | Yes | No |
| Ordering | Per-queue | Per-partition | Best-effort / FIFO |
| Best for | Routing, tasks | Events, analytics | Simple async, AWS |
Delivery Guarantees
At-most-once: May lose messages. Never duplicates. (Metrics, logs)
At-least-once: Never loses. May duplicate. (Most business ops)
Exactly-once: Never loses. Never duplicates. (Financial, critical)
Practical approach: At-least-once + Idempotent consumers
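"At-least-once + idempotent consumers" means the broker may redeliver, so the consumer dedupes on a message ID before applying the side effect. A minimal sketch with in-memory stand-ins (message fields and names are hypothetical; in production the dedupe record and the write would share one transaction):

```python
# In-memory stand-ins for a dedupe table and the business data.
processed_ids = set()
balances = {"acct-1": 100}

def handle_payment(message):
    """Idempotent consumer: apply the payment once, skip duplicates."""
    if message["id"] in processed_ids:
        return False                      # redelivery: no-op
    balances[message["acct"]] += message["amount"]
    processed_ids.add(message["id"])      # record only after the write succeeds
    return True
```

This is why at-least-once plus idempotency is usually preferred over true exactly-once delivery: the dedupe check is cheap and keeps the broker simple.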
Dead Letter Queue
Main Queue --> Consumer fails 3x --> DLQ (investigate, fix, replay)
Always configure. Always monitor. Always alert on DLQ depth > 0.
Backpressure
Producer faster than consumer? Queue grows unbounded --> OOM!
Solutions: Bounded queue | Rate limit producer | Auto-scale consumers
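The "bounded queue" option pushes the slowdown back onto the producer instead of growing memory without limit. A sketch using the standard library's `queue.Queue` with a maxsize (the tiny capacity is just for illustration):

```python
import queue

# Bounded in-process queue: capacity 2 for demonstration.
jobs = queue.Queue(maxsize=2)

def try_enqueue(job):
    """Non-blocking put: reject work when full instead of buffering
    without bound. The caller can retry, shed load, or slow down."""
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        return False
```

A blocking `jobs.put(job)` achieves the same backpressure by stalling the producer until a worker drains an item.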
Microservices At a Glance
Core Principles
1. Single responsibility per service
2. Database per service (no shared DB!)
3. Loose coupling, high cohesion
4. Independent deployment
5. Design for failure
Service Discovery
Client-side: Client queries registry, picks instance (Eureka)
Server-side: Client calls LB/router, it resolves (K8s Services)
Service mesh: Sidecar proxy handles discovery (Istio, Linkerd)
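Client-side discovery reduces to "query the registry, pick an instance." A toy sketch with a plain dict standing in for Eureka or Consul (service names and addresses are hypothetical):

```python
import random

# Hypothetical registry contents, as a discovery server would return them.
registry = {
    "user-service": ["10.0.0.1:8080", "10.0.0.2:8080"],
}

def resolve(service_name):
    """Client-side discovery: look up healthy instances, pick one.
    Random choice doubles as simple client-side load balancing."""
    instances = registry.get(service_name)
    if not instances:
        raise LookupError(f"no healthy instances for {service_name}")
    return random.choice(instances)
```

Server-side discovery moves this lookup behind a load balancer or router, so clients just call one stable address.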
Communication
Sync (REST/gRPC): Need immediate response. Queries, validation.
Async (Events): Fire-and-forget. Background tasks, notifications.
Distributed Transactions -- Saga Pattern
Choreography: Services publish events, others react. (Simple sagas, 3-4 steps)
Orchestration: Central coordinator drives the flow. (Complex sagas, 5+ steps)
Each step has a compensating transaction for rollback.
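The orchestrated variant can be sketched as a loop over (action, compensation) pairs: on failure, the coordinator runs the compensations for completed steps in reverse order. Step names below are illustrative.

```python
def run_saga(steps):
    """Orchestrated saga sketch: run each (action, compensate) pair in
    order; on any failure, compensate completed steps in reverse."""
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for undo in reversed(done):
                undo()                    # compensating transactions
            return False
    return True
```

In a real system each action and compensation is a call to a separate service, and the coordinator must persist its progress so it can resume after a crash.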
Deployment Strategies
| Strategy | Rollback | Downtime | Risk |
|---|---|---|---|
| Blue-Green | Instant | None | Low |
| Canary | Fast | None | Very low |
| Rolling | Slow | None | Medium |
When Monolith > Microservices
- Team < 15-20 engineers
- Domain not well understood yet
- Strong consistency required
- Speed of development is priority
- Simple scaling needs (vertical or horizontal monolith)
Infrastructure Decision Flowchart
CACHING:
Read-heavy? --> Cache-aside with Redis
Write-heavy, loss-tolerant? --> Write-behind
Consistency critical? --> Write-through
CDN:
Static assets? --> Always CDN
Global users? --> CDN for everything cacheable
Personalized content? --> Skip CDN for that content
LOAD BALANCING:
HTTP traffic? --> Layer 7 (ALB)
Raw TCP / max performance? --> Layer 4 (NLB)
Caching / stateful? --> Consistent hashing
Variable request time? --> Least connections
API GATEWAY:
Microservices? --> Always use a gateway
Multiple client types? --> BFF pattern
Third-party API access? --> Gateway with API keys + rate limiting
MESSAGE QUEUES:
Async processing needed? --> Queue
Event broadcasting? --> Topic
High throughput events? --> Kafka
Complex routing? --> RabbitMQ
Simple + AWS? --> SQS
MICROSERVICES:
Team > 20? --> Consider microservices
Different scaling needs? --> Extract that service
Need tech diversity? --> Extract that service
Starting fresh? --> Start monolith, extract later
The Complete Picture
+------------------------------------------------------------------+
| |
| Users |
| | |
| v |
| [DNS / GSLB] --> nearest region |
| | |
| v |
| [CDN] --> static assets served from edge |
| | |
| v (cache miss / dynamic request) |
| [API Gateway] --> auth, rate limit, route |
| | |
| v |
| [Load Balancer] --> distribute to healthy instances |
| | |
| v |
| [App Servers] --> stateless, auto-scaled |
| | \ |
| v v |
| [Cache] [Message Queue] --> async processing |
| | | |
| v v |
| [Database] [Workers] --> background tasks |
| |
+------------------------------------------------------------------+
Top Interview Tips
- Always name the specific technology -- "I would use Redis for caching" not just "I would add a cache."
- Justify every component -- "I am adding a CDN because users are globally distributed."
- Discuss trade-offs -- "Write-behind is fast but risks data loss."
- Know the numbers -- Redis ~1ms, DB ~10-50ms, cross-continent ~150ms.
- Start simple, add complexity -- Do not start with Kafka and microservices. Scale when needed.
- Address failure modes -- "If the cache goes down, we fall through to the database."
- Mention monitoring -- "We would monitor cache hit rate, queue depth, and p99 latency."