9.7 — System Design Foundations: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim top-to-bottom in one pass before quizzes or interviews.
- If a row feels fuzzy, open the matching lesson: README.md → 9.7.a…9.7.e.
- Deep practice: 9.7-Exercise-Questions.md.
- Polish phrasing: 9.7-Interview-Questions.md.
HLD vs LLD (master table)
| Aspect | HLD | LLD |
|---|---|---|
| Scope | Entire system | Single module/service |
| Decisions | Services, DBs, caches, protocols | Classes, methods, patterns |
| Diagrams | Architecture, data flow | Class, sequence, UML |
| Trade-offs | CAP, latency vs consistency | Coupling vs cohesion |
| Interview | "Design Twitter at scale" | "Design a Parking Lot" |
Building Blocks of HLD
| Block | Purpose | Examples |
|---|---|---|
| Services | Business logic units | Auth Service, Feed Service |
| Databases | Persistent storage | PostgreSQL, Cassandra, DynamoDB |
| Caches | Fast reads, reduce DB load | Redis, Memcached |
| Queues | Async processing, decoupling | Kafka, SQS, RabbitMQ |
| Load Balancers | Distribute traffic | Nginx, AWS ALB |
| CDN | Edge-cache static assets | CloudFront, Cloudflare |
| API Gateway | Routing, auth, rate limiting | Kong, AWS API Gateway |
Requirements Checklist (5 min)
1. What are the core features? (3-5 bullet points)
2. Who are the users? (mobile, web, API)
3. What scale? (1K, 1M, 1B users)
4. Read/write ratio? (100:1, 10:1, 1:1)
5. Quality attributes? (latency, availability, consistency)
6. What is out of scope? (explicitly exclude)
CAP Theorem
CP = Consistent + Partition-tolerant (rejects requests during partition)
AP = Available + Partition-tolerant (serves stale data during partition)
CA = Not practical in distributed systems (partitions are inevitable)
| Choice | Use When | Examples |
|---|---|---|
| CP | Banking, inventory, leader election | HBase, Zookeeper |
| AP | Social feeds, shopping carts, analytics | Cassandra, DynamoDB |
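
The CP/AP difference shows up only while a replica is cut off from its peers. A toy sketch of that behaviour, assuming a single in-memory replica; the `Replica` class and its `partitioned` flag are hypothetical and do not model any real database:

```python
class Replica:
    """Toy replica that behaves as CP or AP while partitioned (illustration only)."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = "v1"          # last value this replica saw
        self.partitioned = False   # True when cut off from the other replicas

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # AP (or a healthy replica): answer with the local, possibly stale, value.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
cp.partitioned = ap.partitioned = True

print(ap.read())                   # AP still serves its local value
try:
    cp.read()
except RuntimeError as err:
    print("CP replica:", err)
```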
Availability Targets
| Nines | Availability | Downtime/Year |
|---|---|---|
| 2 | 99% | 3.65 days |
| 3 | 99.9% | 8.76 hours |
| 4 | 99.99% | 52.6 minutes |
| 5 | 99.999% | 5.26 minutes |
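
The downtime column follows directly from the availability percentage. A minimal sketch, assuming a 365-day year; the function name `downtime_per_year` is illustrative:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, assuming a non-leap year


def downtime_per_year(nines: int) -> float:
    """Allowed downtime in seconds per year for N nines of availability."""
    unavailability = 10 ** (-nines)  # 2 nines -> 0.01, 3 nines -> 0.001, ...
    return SECONDS_PER_YEAR * unavailability


for n in range(2, 6):
    seconds = downtime_per_year(n)
    print(f"{n} nines: {seconds / 3600:.2f} hours ({seconds / 60:.1f} minutes)")
```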
Estimation Pipeline
Users → DAU → QPS → Storage → Bandwidth → Cache
Quick Math Constants
| Constant | Value |
|---|---|
| Seconds/day | ~86,400 (~10^5) |
| Seconds/month | ~2.6M (30 days) |
| Seconds/year | ~31.5M (~3 × 10^7) |
| 1M req/day | ~12 QPS |
| 1B req/day | ~12K QPS |
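
The QPS rows are just the daily volume divided by seconds per day. A quick check, with the helper name `requests_per_day_to_qps` chosen for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day_to_qps(requests_per_day: float) -> float:
    """Average queries per second for a given daily request volume."""
    return requests_per_day / SECONDS_PER_DAY


print(round(requests_per_day_to_qps(1_000_000), 1))   # 11.6, rounded to ~12 in the table
print(round(requests_per_day_to_qps(1_000_000_000)))  # 11574, rounded to ~12K in the table
```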
Powers of 2
| Power | Value | Name |
|---|---|---|
| 2^10 | ~1K | Kilo |
| 2^20 | ~1M | Mega |
| 2^30 | ~1G | Giga |
| 2^40 | ~1T | Tera |
Estimation Template
DAU = MAU * 0.2 (typical social app)
QPS = DAU * actions_per_user / 86,400
Peak = QPS * 3
Storage = daily_writes * bytes_per_write * 365 (yearly)
Cache = 20% of daily unique data (80-20 rule)
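
The template above can be turned into a small calculator. This is a sketch under stated assumptions: the input numbers are made up, the `estimate` function and `Estimate` fields are illustrative names, and "daily unique data" is approximated by the bytes written per day.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Estimate:
    dau: float
    avg_qps: float
    peak_qps: float
    storage_bytes_per_year: float
    cache_bytes: float


def estimate(mau, actions_per_user, writes_per_user, bytes_per_write,
             dau_ratio=0.2, peak_factor=3, cache_fraction=0.2) -> Estimate:
    """Apply the template: DAU -> QPS -> peak -> yearly storage -> cache size."""
    dau = mau * dau_ratio
    avg_qps = dau * actions_per_user / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    daily_bytes = dau * writes_per_user * bytes_per_write
    return Estimate(
        dau=dau,
        avg_qps=avg_qps,
        peak_qps=peak_qps,
        storage_bytes_per_year=daily_bytes * 365,
        # Assumption: daily unique data is approximated by daily written bytes.
        cache_bytes=daily_bytes * cache_fraction,
    )


# Hypothetical inputs: 10M MAU, 20 actions and 2 writes per user per day, 1 KB per write.
e = estimate(mau=10_000_000, actions_per_user=20, writes_per_user=2, bytes_per_write=1_000)
print(f"DAU {e.dau:,.0f}, avg {e.avg_qps:,.0f} QPS, peak {e.peak_qps:,.0f} QPS")
print(f"storage {e.storage_bytes_per_year / 1e12:.2f} TB/year, cache {e.cache_bytes / 1e9:.1f} GB")
```

With these inputs the average load is roughly 460 QPS and yearly storage roughly 1.5 TB, which is the level of precision an interview estimate needs.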
Latency Numbers
| Operation | Latency |
|---|---|
| L1 cache | ~1 ns |
| RAM | ~100 ns |
| SSD random read | ~100 us |
| HDD random read | ~10 ms |
| Same-datacenter RTT | ~0.5 ms |
| Cross-continent RTT | ~150 ms |
| Redis GET | ~1 ms |
| PostgreSQL query | ~1-5 ms |
Decomposition Heuristics
| Heuristic | Meaning |
|---|---|
| Single responsibility | One service, one business capability |
| Data ownership | Each service owns its database |
| Rate of change | Fast-changing modules separated from stable ones |
| Scaling needs | Different scaling profiles = different services |
| Bounded context (DDD) | Group by business domain |
Communication Patterns
| Pattern | When | Example |
|---|---|---|
| Sync (REST/gRPC) | Need immediate answer | Auth check, data fetch |
| Async (Queue/Event) | Caller doesn't wait | Email, video transcode, analytics |
| Hybrid | Critical path sync + side effects async | Post tweet (sync) + fan-out (async) |
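
The hybrid row can be sketched with a synchronous write followed by an enqueued side effect. This is a minimal in-process illustration: `queue.Queue` stands in for a real broker, and `sign_up`, `email_worker`, and the in-memory stores are hypothetical names.

```python
import queue
import threading

users = {}                    # stand-in for the primary database
sent_emails = []              # stand-in for an email provider
email_queue = queue.Queue()   # stand-in for a broker such as SQS or RabbitMQ


def sign_up(user_id, email):
    """Critical path (sync): persist the user, enqueue the side effect, return."""
    users[user_id] = email
    email_queue.put(("welcome", email))
    return {"status": "created", "user_id": user_id}


def email_worker():
    """Side effects (async): runs off the request path."""
    while True:
        job = email_queue.get()
        try:
            if job is None:   # sentinel: stop the worker
                return
            template, address = job
            sent_emails.append((template, address))
        finally:
            email_queue.task_done()


worker = threading.Thread(target=email_worker, daemon=True)
worker.start()

response = sign_up(1, "ada@example.com")   # returns without waiting for the email
email_queue.join()                         # demo only: wait for the worker to catch up
email_queue.put(None)
worker.join(timeout=1)
print(response, sent_emails)
```

The caller's latency covers only the database write and the enqueue; a slow or failing email provider no longer blocks sign-up.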
Database Selection
| Need | Choose |
|---|---|
| Transactions, joins, ACID | SQL (PostgreSQL, MySQL) |
| High write throughput, time-series | Cassandra |
| Flexible schema, documents | MongoDB, DynamoDB |
| Fast lookups by key | Redis, DynamoDB |
| Full-text search | Elasticsearch |
| Relationships, graphs | Neo4j |
Caching Strategies
| Strategy | How | When |
|---|---|---|
| Cache-aside | App checks cache; fills on miss | Most common; general purpose |
| Write-through | Write to cache + DB together | Strong consistency needed |
| Write-behind | Write to cache; async flush to DB | High write throughput |
| TTL-based | Entries expire after N seconds | Acceptable staleness |
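
Cache-aside and TTL expiry are often combined. A minimal in-memory sketch, assuming a single process; the `CacheAside` class, the `load_from_db` callback, and the injectable clock are illustrative and not the API of Redis or Memcached:

```python
import time


class CacheAside:
    """Check the cache first; on a miss, load from the source and fill the cache."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None:
            value, expires_at = entry
            if self._clock() < expires_at:
                return value              # hit
            del self._entries[key]        # expired: treat as a miss
        value = self._load(key)           # miss: read the source of truth
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Call after a write so the next read refills from the database."""
        self._entries.pop(key, None)


# Demo with a fake database and a fake clock so expiry is deterministic.
db = {"user:1": "Ada"}
db_reads = []


def load_from_db(key):
    db_reads.append(key)
    return db[key]


now = [0.0]
cache = CacheAside(load_from_db, ttl_seconds=30, clock=lambda: now[0])

cache.get("user:1")   # miss: first database read
cache.get("user:1")   # hit: served from the cache
now[0] = 31.0
cache.get("user:1")   # entry expired: second database read
print(len(db_reads))  # 2
```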
Fan-Out Models (Timeline)
| Model | Writes | Reads | Best For |
|---|---|---|---|
| Push (fan-out on write) | Expensive (write to all followers) | Cheap (pre-computed) | Most users |
| Pull (fan-out on read) | Cheap | Expensive (merge at read) | Celebrity accounts |
| Hybrid | Push for normal; pull for celebrities | Mix | Twitter-like systems |
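
The hybrid row can be illustrated with in-memory dictionaries: posts from normal authors are pushed into follower inboxes at write time, while celebrity posts are pulled and merged at read time. This is a sketch only; the threshold, the data structures, and all function names are hypothetical, and it ignores accounts that cross the threshold after posting.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; real systems would use a far larger value

followers = defaultdict(set)         # author -> users who follow them
following = defaultdict(set)         # user -> authors they follow
posts_by_author = defaultdict(list)  # author -> [(timestamp, text)]
inbox = defaultdict(list)            # user -> [(timestamp, author, text)], precomputed


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author):
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author, timestamp, text):
    posts_by_author[author].append((timestamp, text))
    if not is_celebrity(author):
        # Push: fan-out on write to every follower's inbox.
        for user in followers[author]:
            inbox[user].append((timestamp, author, text))


def timeline(user, limit=10):
    items = set(inbox[user])  # pre-computed part
    for author in following[user]:
        if is_celebrity(author):
            # Pull: merge celebrity posts at read time.
            items.update((ts, author, text) for ts, text in posts_by_author[author])
    return sorted(items, reverse=True)[:limit]  # newest first


follow("bob", "alice")
for fan in ("bob", "carol", "dave"):
    follow(fan, "star")

publish("alice", 1, "hi")        # pushed into bob's inbox
publish("star", 2, "big news")   # stored once, pulled at read time

print(timeline("bob"))
```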
Interview 5-Phase Framework
Phase 1: Requirements 5 min
Phase 2: Estimation 5 min
Phase 3: High-Level Design 15 min
Phase 4: Deep Dive 15 min
Phase 5: Wrap Up 5 min
Common Mistakes (Avoid These)
| Mistake | Fix |
|---|---|
| Skip requirements | Always ask 5+ questions first |
| Over-engineer | Match solution to scale |
| No trade-offs | Justify every decision |
| Monologue | Check in with interviewer |
| Deep dive too early | Draw full picture first |
| Ignore failures | Mention redundancy, retries |
| Skip estimation | Show your math |
Curveball Response Framework
1. ACKNOWLEDGE "That's a great point."
2. STATE IMPACT "If X happens, the impact would be..."
3. PROPOSE FIX "To handle this, I would..."
4. TRADE-OFF "The trade-off is..."
SLA / SLO / SLI
| Term | What | Example |
|---|---|---|
| SLI | Metric you measure | P99 latency = 350ms |
| SLO | Target for that metric | P99 latency < 500ms |
| SLA | Contract with consequences | Credit if P99 > 500ms for a month |
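
To make the SLI/SLO distinction concrete, the sketch below measures a P99 latency from samples (the SLI) and compares it with a target (the SLO). The sample data is invented, and `percentile` uses the simple nearest-rank definition; monitoring systems may interpolate differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(math.ceil(p * len(ordered) / 100), 1)  # 1-based rank
    return ordered[rank - 1]


SLO_P99_MS = 500                          # target: P99 latency < 500 ms

latencies_ms = [120] * 98 + [350, 900]    # hypothetical window of 100 requests
sli_p99 = percentile(latencies_ms, 99)    # the measured SLI
slo_met = sli_p99 < SLO_P99_MS

print(f"P99 = {sli_p99} ms, SLO met: {slo_met}")
```

The single 900 ms outlier does not break the SLO here because P99 tolerates the slowest 1% of requests.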
Whiteboard Layout
┌───────────────────┬──────────────────────┐
│ REQUIREMENTS      │ ESTIMATION           │
│ (top-left)        │ (top-right)          │
├───────────────────┴──────────────────────┤
│                                          │
│           ARCHITECTURE DIAGRAM           │
│                 (center)                 │
│                                          │
├───────────────────┬──────────────────────┤
│ API ENDPOINTS     │ TRADE-OFFS           │
│ (bottom-left)     │ (bottom-right)       │
└───────────────────┴──────────────────────┘