Episode 9 — System Design / 9.7 — System Design Foundations

9.7 — System Design Foundations: Quick Revision

Compact cheat sheet. Print-friendly.

How to use this material (instructions)

  1. Skim top-to-bottom in one pass before quizzes or interviews.
  2. If a row feels fuzzy, open the matching lesson: README.md, then 9.7.a through 9.7.e.
  3. Deep practice: 9.7-Exercise-Questions.md.
  4. Polish phrasing: 9.7-Interview-Questions.md.

HLD vs LLD (master table)

  Aspect        HLD                                 LLD
  Scope         Entire system                       Single module/service
  Decisions     Services, DBs, caches, protocols    Classes, methods, patterns
  Diagrams      Architecture, data flow             Class, sequence, UML
  Trade-offs    CAP, latency vs consistency         Coupling vs cohesion
  Interview     "Design Twitter at scale"           "Design a Parking Lot"

Building Blocks of HLD

  Block            Purpose                        Examples
  Services         Business logic units           Auth Service, Feed Service
  Databases        Persistent storage             PostgreSQL, Cassandra, DynamoDB
  Caches           Fast reads, reduce DB load     Redis, Memcached
  Queues           Async processing, decoupling   Kafka, SQS, RabbitMQ
  Load Balancers   Distribute traffic             Nginx, AWS ALB
  CDN              Edge-cache static assets       CloudFront, Cloudflare
  API Gateway      Routing, auth, rate limiting   Kong, AWS API Gateway

Requirements Checklist (5 min)

  1. What are the core features?          (3-5 bullet points)
  2. Who are the users?                   (mobile, web, API)
  3. What scale?                          (1K, 1M, 1B users)
  4. Read/write ratio?                    (100:1, 10:1, 1:1)
  5. Quality attributes?                  (latency, availability, consistency)
  6. What is out of scope?                (explicitly exclude)

CAP Theorem

  CP = Consistent + Partition-tolerant   (may reject or time out requests during a partition)
  AP = Available + Partition-tolerant    (keeps serving; may return stale data during a partition)
  CA = Not practical in distributed systems (network partitions are inevitable)

  Choice   Use When                                  Examples
  CP       Banking, inventory, leader election       HBase, ZooKeeper
  AP       Social feeds, shopping carts, analytics   Cassandra, DynamoDB

Availability Targets

  Nines   Availability   Downtime/Year
  2       99%            3.65 days
  3       99.9%          8.76 hours
  4       99.99%         52.6 minutes
  5       99.999%        5.26 minutes
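Each downtime figure is the unavailable fraction multiplied by one year. A minimal Python sketch that regenerates the rows, assuming a 365-day year (no leap days); the function names are illustrative only:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s; assumes a 365-day year


def nines_to_pct(nines: int) -> float:
    """Convert a count of nines (e.g. 3) into a percentage (e.g. 99.9)."""
    return 100 * (1 - 10 ** -nines)


def downtime_per_year(availability_pct: float) -> float:
    """Allowed downtime in seconds per year: unavailable fraction times one year."""
    return (1 - availability_pct / 100) * SECONDS_PER_YEAR


for n in range(2, 6):
    pct = nines_to_pct(n)
    secs = downtime_per_year(pct)
    print(f"{n} nines ({pct:g}%): {secs / 86_400:.2f} days = {secs / 3_600:.2f} h = {secs / 60:.1f} min")
```

Each extra nine cuts the downtime budget by 10x, which is why every additional nine usually costs far more redundancy.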

Estimation Pipeline

  Users → DAU → QPS → Storage → Bandwidth → Cache

Quick Math Constants

  Constant        Value
  Seconds/day     ~86,400 (~10^5)
  Seconds/month   ~2.5M
  Seconds/year    ~30M
  1M req/day      ~12 QPS
  1B req/day      ~12K QPS

Powers of 2

  Power   Value   Name
  2^10    ~1K     Kilo
  2^20    ~1M     Mega
  2^30    ~1G     Giga
  2^40    ~1T     Tera

Estimation Template

  DAU     = MAU * 0.2 (typical social app)
  QPS     = DAU * actions_per_user / 86,400
  Peak    = QPS * 3
  Storage = daily_writes * bytes_per_write * 365 (yearly)
  Cache   = 20% of daily unique data (80-20 rule)
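The template lines can be chained into a quick calculator. A sketch in Python, one function per template line; the 0.2 DAU ratio, 3x peak factor, and 20% hot fraction are the template's rules of thumb, and the workload numbers in the example (500M MAU, 12 actions and 2 writes per user per day, 1 KB per write) are hypothetical:

```python
SECONDS_PER_DAY = 86_400


def dau_from_mau(mau: float, ratio: float = 0.2) -> float:
    # Rule of thumb from the template: ~20% of monthly users are active daily.
    return mau * ratio


def avg_qps(dau: float, actions_per_user: float) -> float:
    return dau * actions_per_user / SECONDS_PER_DAY


def peak_qps(qps: float, factor: float = 3.0) -> float:
    return qps * factor


def yearly_storage_bytes(daily_writes: float, bytes_per_write: float) -> float:
    return daily_writes * bytes_per_write * 365


def cache_bytes(daily_unique_bytes: float, hot_fraction: float = 0.2) -> float:
    # 80-20 rule: ~20% of the data serves ~80% of reads, so cache that slice.
    return daily_unique_bytes * hot_fraction


# Hypothetical workload: 500M MAU, 12 actions/day per DAU (2 of them writes), 1 KB per write.
dau = dau_from_mau(500e6)                         # 100M
qps = avg_qps(dau, actions_per_user=12)           # ~13.9K
peak = peak_qps(qps)                              # ~41.7K
storage = yearly_storage_bytes(dau * 2, 1_000)    # 73 TB/year
cache = cache_bytes(dau * 2 * 1_000)              # assumes the day's writes ~ daily unique data -> 40 GB
print(f"DAU={dau:,.0f} QPS={qps:,.0f} peak={peak:,.0f} "
      f"storage={storage / 1e12:.0f} TB/yr cache={cache / 1e9:.0f} GB")
```

Treating "daily unique data" as the day's new writes is a simplification; for read-heavy systems, size the cache from the data actually read per day.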

Latency Numbers

  Operation             Latency
  L1 cache              ~1 ns
  RAM                   ~100 ns
  SSD random read       ~100 µs
  HDD random read       ~10 ms
  Same-datacenter RTT   ~0.5 ms
  Cross-continent RTT   ~150 ms
  Redis GET             ~1 ms
  PostgreSQL query      ~1-5 ms

Decomposition Heuristics

  Heuristic               Meaning
  Single responsibility   One service, one business capability
  Data ownership          Each service owns its database
  Rate of change          Fast-changing modules separated from stable ones
  Scaling needs           Different scaling profiles = different services
  Bounded context (DDD)   Group by business domain

Communication Patterns

  Pattern               When                                      Example
  Sync (REST/gRPC)      Need immediate answer                     Auth check, data fetch
  Async (Queue/Event)   Caller doesn't wait                       Email, video transcode, analytics
  Hybrid                Critical path sync + side effects async   Post tweet (sync) + fan-out (async)
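The hybrid row in miniature: the caller waits only for the post to be stored, while fan-out runs on a background worker fed by a queue. A Python sketch under loud assumptions: the in-memory dicts stand in for a database, `queue.Queue` stands in for a broker such as Kafka or SQS, and every name (`post_tweet`, `fanout_worker`, the sample users) is hypothetical:

```python
import queue
import threading

# Hypothetical in-memory stand-ins for a database and a message broker.
posts: dict[int, str] = {}
timelines: dict[str, list[int]] = {}
followers = {"alice": ["bob", "carol"]}
fanout_jobs: queue.Queue = queue.Queue()


def post_tweet(user: str, text: str) -> int:
    """Critical path (sync): persist the post and return its id immediately."""
    post_id = len(posts) + 1  # fine for this single-caller demo; not safe under concurrency
    posts[post_id] = text
    fanout_jobs.put((user, post_id))  # hand off the side effect; the caller does not wait
    return post_id


def fanout_worker() -> None:
    """Side effect (async): append the post id to each follower's timeline."""
    while True:
        job = fanout_jobs.get()
        if job is None:  # sentinel value: stop the worker
            fanout_jobs.task_done()
            break
        user, post_id = job
        for follower in followers.get(user, []):
            timelines.setdefault(follower, []).append(post_id)
        fanout_jobs.task_done()


worker = threading.Thread(target=fanout_worker)
worker.start()
pid = post_tweet("alice", "hello")  # returns as soon as the post is stored
fanout_jobs.join()                  # demo only: wait for fan-out so the result can be inspected
fanout_jobs.put(None)
worker.join()
print(pid, timelines)
```

In a real system the queue is durable, so a crashed worker can resume fan-out without the user ever seeing a slow or failed post.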

Database Selection

  Need                                 Choose
  Transactions, joins, ACID            SQL (PostgreSQL, MySQL)
  High write throughput, time-series   Cassandra
  Flexible schema, documents           MongoDB, DynamoDB
  Fast lookups by key                  Redis, DynamoDB
  Full-text search                     Elasticsearch
  Relationships, graphs                Neo4j

Caching Strategies

  Strategy        How                                 When
  Cache-aside     App checks cache; fills on miss     Most common; general purpose
  Write-through   Write to cache + DB together        Cache must stay in sync with DB
  Write-behind    Write to cache; async flush to DB   High write throughput (risks loss if cache fails before flush)
  TTL-based       Entries expire after N seconds      Staleness of up to N seconds is acceptable
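Cache-aside combined with a TTL, sketched in Python. The `TTLCache` class is a hypothetical in-process stand-in for Redis or Memcached, and `get_user`/`update_user` are illustrative names; the injectable clock only exists so the demo can fast-forward time:

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for Redis/Memcached)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: dict[str, tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:  # expired: drop it and report a miss
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self.clock() + self.ttl, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def get_user(user_id: str, cache: TTLCache, load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read: try the cache, fall back to the DB on a miss, then fill the cache."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(user_id, value)
    return value


def update_user(user_id: str, value: str, cache: TTLCache,
                save_to_db: Callable[[str, str], None]) -> None:
    """Cache-aside write: update the DB first, then invalidate so the next read refills."""
    save_to_db(user_id, value)
    cache.delete(user_id)


# Demo with a fake clock and a dict as the "database".
db = {"u1": "Ada"}
db_reads: list[str] = []
now = [0.0]


def load(uid: str) -> str:
    db_reads.append(uid)
    return db[uid]


cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
get_user("u1", cache, load)  # miss: reads the DB
get_user("u1", cache, load)  # hit: served from cache
now[0] = 61
get_user("u1", cache, load)  # expired: reads the DB again
print(len(db_reads))
```

Invalidating on write (rather than updating the cached value) is the usual cache-aside choice because it avoids racing writers leaving a stale value behind; the TTL bounds staleness if an invalidation is ever missed.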

Fan-Out Models (Timeline)

  Model                     Writes                                  Reads                       Best For
  Push (fan-out on write)   Expensive (write to all followers)      Cheap (pre-computed)        Most users
  Pull (fan-out on read)    Cheap                                   Expensive (merge at read)   Celebrity accounts
  Hybrid                    Push for normal; pull for celebrities   Mix                         Twitter-like systems
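The hybrid model sketched in Python: posts from ordinary authors are pushed into followers' precomputed timelines at write time, while celebrity posts are pulled and merged at read time. Everything here is hypothetical, including the deliberately tiny `CELEBRITY_THRESHOLD` (so the demo exercises both paths) and the in-memory dicts standing in for timeline storage:

```python
import heapq
import itertools
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical, tiny for the demo; real systems tune this

followers: dict[str, set[str]] = defaultdict(set)                # author -> their followers
following: dict[str, set[str]] = defaultdict(set)                # user -> authors they follow
timelines: dict[str, list[tuple[int, str]]] = defaultdict(list)  # push: precomputed per reader
own_posts: dict[str, list[tuple[int, str]]] = defaultdict(list)  # pull: each author's posts
_clock = itertools.count(1)  # increasing timestamps used for ordering


def follow(user: str, author: str) -> None:
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author: str) -> bool:
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author: str, text: str) -> None:
    post = (next(_clock), text)
    own_posts[author].append(post)
    if not is_celebrity(author):
        # Push (fan-out on write): copy the post into every follower's timeline now.
        for reader in followers[author]:
            timelines[reader].append(post)


def read_timeline(user: str, limit: int = 10) -> list[str]:
    # Pull (fan-out on read): merge celebrity posts into the precomputed timeline.
    pulled = [own_posts[a] for a in following[user] if is_celebrity(a)]
    merged = itertools.chain(timelines[user], *pulled)
    return [text for _, text in heapq.nlargest(limit, merged)]  # newest first


for u in ["a", "b", "c"]:
    follow(u, "celeb")   # 3 followers: handled by pull
follow("a", "friend")    # 1 follower: handled by push
publish("friend", "hi from friend")
publish("celeb", "hi from celeb")
print(read_timeline("a"))
```

This sketch ignores authors who cross the threshold after posting; a production design would need a migration or dedup step for that case.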

Interview 5-Phase Framework

  Phase 1: Requirements        5 min
  Phase 2: Estimation          5 min
  Phase 3: High-Level Design  15 min
  Phase 4: Deep Dive          15 min
  Phase 5: Wrap Up             5 min

Common Mistakes (Avoid These)

  Mistake               Fix
  Skip requirements     Always ask 5+ questions first
  Over-engineer         Match the solution to the scale
  No trade-offs         Justify every decision
  Monologue             Check in with the interviewer
  Deep dive too early   Draw the full picture first
  Ignore failures       Mention redundancy, retries
  Skip estimation       Show your math

Curveball Response Framework

  1. ACKNOWLEDGE    "That's a great point."
  2. STATE IMPACT   "If X happens, the impact would be..."
  3. PROPOSE FIX    "To handle this, I would..."
  4. TRADE-OFF      "The trade-off is..."

SLA / SLO / SLI

  Term   What                         Example
  SLI    Metric you measure           P99 latency = 350ms
  SLO    Target for that metric       P99 latency < 500ms
  SLA    Contract with consequences   Credit if P99 > 500ms for a month
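Computing the SLI and checking it against the SLO, sketched in Python with the nearest-rank percentile method (one of several percentile definitions; monitoring tools may interpolate instead). The latency samples are made up so that the result matches the 350 ms example:

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


SLO_P99_MS = 500.0  # the SLO target: P99 latency < 500 ms

# Hypothetical latency samples in ms: 98 fast requests and 2 slow ones.
latencies = [120.0] * 98 + [350.0, 900.0]
sli = percentile(latencies, 99)  # the SLI: measured P99
print(f"SLI: P99 = {sli} ms; SLO met: {sli < SLO_P99_MS}")
```

An SLA would then apply this same check over the contractual window (for example, a month) and trigger credits when it fails.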

Whiteboard Layout

  ┌───────────────────┬──────────────────────┐
  │ REQUIREMENTS      │ ESTIMATION           │
  │ (top-left)        │ (top-right)          │
  ├───────────────────┴──────────────────────┤
  │                                          │
  │         ARCHITECTURE DIAGRAM             │
  │              (center)                    │
  │                                          │
  ├──────────────────┬───────────────────────┤
  │ API ENDPOINTS    │ TRADE-OFFS            │
  │ (bottom-left)    │ (bottom-right)        │
  └──────────────────┴───────────────────────┘
