Episode 9 — System Design / 9.7 — System Design Foundations
9.7 — Interview Questions: System Design Foundations (HLD)
Practice questions with model answers for high-level design topics commonly asked in software engineering interviews at top tech companies.
How to use this material (instructions)
- Read lessons in order — `README.md`, then `9.7.a` through `9.7.e`.
- Practice out loud — 2-3 minutes per question; avoid reading the model answer mid-answer.
- Draw diagrams — for architecture questions, practice on a whiteboard or Excalidraw.
- Pair with exercises — use `9.7-Exercise-Questions.md` for drills.
- Quick review — skim `9.7-Quick-Revision.md` the night before.
Beginner (Q1–Q4)
Q1. What is High-Level Design (HLD) and how does it differ from Low-Level Design (LLD)?
Model answer:
HLD is the process of defining a system's architecture at the macro level: which services exist, how they communicate, what databases and caches are used, and how the system scales. It answers "How is the system organized?"
LLD focuses on the internal structure of a single service or module: classes, interfaces, design patterns, and method signatures. It answers "How is the code inside each service organized?"
| Aspect | HLD | LLD |
|---|---|---|
| Scope | Entire system | Single module/service |
| Decisions | Service boundaries, DB choices, protocols | Classes, patterns, interfaces |
| Diagrams | Architecture diagrams, data flow | Class diagrams, sequence diagrams |
| Interview | "Design Twitter at scale" | "Design classes for a Parking Lot" |
In a typical system design interview, you spend 35-40 minutes on HLD and may briefly touch LLD during a deep dive. Dedicated LLD rounds focus entirely on code-level design.
Q2. What are functional vs non-functional requirements? Why are both critical in system design?
Model answer:
Functional requirements describe what the system does — the features and behaviors visible to users. Examples for a URL shortener: create short URL, redirect to original URL, track click analytics.
Non-functional requirements describe how well the system performs — quality attributes that constrain the design. Examples: handle 100M URLs/month, redirect latency under 50ms, 99.9% availability, data durability.
Both are critical because:
- Functional requirements determine which components you need (a service for shortening, a database for mappings, an analytics pipeline).
- Non-functional requirements determine how you architect those components (caching for latency, replication for availability, sharding for scale).
A design that meets all functional requirements but collapses at 10x expected load is a failed design. Conversely, a perfectly scalable system that is missing a core feature is building the wrong thing.
In interviews, always spend the first 5 minutes clarifying both types. Write them on the board so they guide every subsequent decision.
Q3. Explain the CAP theorem. How does it influence system design decisions?
Model answer:
The CAP theorem states that in a distributed system experiencing a network partition, you can only guarantee two of three properties:
- Consistency (C): Every read returns the most recent write (all nodes see the same data at the same time).
- Availability (A): Every request receives a response (even if some nodes are down).
- Partition Tolerance (P): The system continues operating despite network partitions between nodes.
Since network partitions are inevitable in distributed systems, the real choice is between:
- CP (Consistency + Partition Tolerance): During a partition, some requests may be rejected to maintain consistency. Examples: HBase, MongoDB (default config), Zookeeper. Use for: banking, inventory management.
- AP (Availability + Partition Tolerance): During a partition, all requests get a response, but data may be temporarily inconsistent. Examples: Cassandra, DynamoDB, CouchDB. Use for: social media feeds, shopping carts, analytics.
In interviews, I explicitly state my CAP trade-off choice and justify it: "For a social media timeline, I choose AP because users seeing a slightly stale feed is acceptable, but the system being unavailable is not."
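The CP/AP trade-off surfaces concretely in quorum-replicated stores. As an illustration (not part of the lesson itself, and with names chosen for the example), a Dynamo-style store with N replicas, W write acknowledgments, and R read acknowledgments guarantees that reads see the latest write whenever R + W > N, because every read quorum then overlaps the latest write quorum:

```python
# Illustrative sketch: how quorum settings in a Dynamo-style replicated
# store map onto the CP-vs-AP trade-off. The function name and return
# strings are assumptions for this example, not a standard API.

def quorum_profile(n: int, r: int, w: int) -> str:
    """Classify a replica configuration as leaning CP or AP."""
    if r + w > n:
        # Every read quorum overlaps the most recent write quorum.
        return "CP-leaning: reads always overlap the latest write"
    # Quorums may miss each other, so stale reads are possible.
    return "AP-leaning: reads may return stale data during a partition"

print(quorum_profile(n=3, r=2, w=2))  # overlapping quorums
print(quorum_profile(n=3, r=1, w=1))  # fast, but possibly stale
```

Stores like Cassandra and DynamoDB expose this knob per request (consistency level), which is why "CP or AP" is often a tunable rather than a fixed property of the database.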
Q4. Walk through the 5-phase framework for a system design interview.
Model answer:
```
Phase 1: REQUIREMENTS (5 min)
─────────────────────────────
Ask clarifying questions. List functional requirements (P0, P1),
non-functional requirements (scale, latency, availability), and
explicitly state what is out of scope. Write these on the board.

Phase 2: ESTIMATION (5 min)
────────────────────────────
Convert user numbers into QPS, storage, bandwidth, and cache needs.
Show your math out loud. Highlight insights ("460K read QPS means
we need aggressive caching").

Phase 3: HIGH-LEVEL DESIGN (15 min)
────────────────────────────────────
Draw the architecture diagram: clients → load balancer → API gateway
→ services → data stores. Walk through the request path step by step.
Label every arrow. This is the main event.

Phase 4: DEEP DIVE (15 min)
────────────────────────────
The interviewer picks 1-2 components to explore deeper. Be prepared
to discuss: DB schema, caching strategy, sharding, fan-out, failure
handling. Discuss trade-offs for every decision.

Phase 5: WRAP UP (5 min)
─────────────────────────
Identify bottlenecks, single points of failure, and future improvements.
Mention monitoring and observability. Show awareness of what you would
do with more time or 10x scale.
```
The framework keeps you on track and ensures you cover all the signals interviewers look for: requirements gathering, quantitative reasoning, architectural thinking, depth, and trade-off analysis.
Intermediate (Q5–Q8)
Q5. How do you decide between SQL and NoSQL databases in a system design?
Model answer:
The choice depends on the data model, query patterns, scale, and consistency requirements.
| Factor | Choose SQL (PostgreSQL, MySQL) | Choose NoSQL (Cassandra, DynamoDB, MongoDB) |
|---|---|---|
| Data model | Structured, relational, joins needed | Flexible schema, denormalized, key-value or document |
| Consistency | ACID transactions required (banking, inventory) | Eventual consistency acceptable (feeds, analytics) |
| Query patterns | Complex queries, aggregations, ad-hoc analysis | Simple lookups by key, range scans |
| Scale | Vertical scaling first; read replicas for reads | Horizontal scaling built-in; distributed by design |
| Write volume | Moderate writes (< 10K QPS per instance) | Very high writes (Cassandra handles 100K+ QPS) |
My decision process in an interview:
- Start with SQL as the default — most systems benefit from ACID and structured data.
- Switch to NoSQL if: (a) write volume is very high, (b) data is naturally denormalized (chat messages by conversation), (c) I need horizontal scaling from day one, or (d) the schema evolves rapidly.
- Use both when appropriate — SQL for users/orders (relational, transactional), NoSQL for activity feeds/analytics (high volume, denormalized).
Example: For Twitter, I would use PostgreSQL for user profiles (structured, relational) and Cassandra for the tweet store (write-heavy, time-series, denormalized).
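The decision rules above can be sketched as a small helper. This is a hypothetical illustration: the function name, parameters, and the 10K-QPS threshold are assumptions chosen to mirror the table, not a standard formula.

```python
# Hypothetical helper encoding the SQL-vs-NoSQL decision rules above.
# Names and thresholds are illustrative assumptions, not a real API.

def pick_database(needs_acid: bool, write_qps: int,
                  denormalized: bool, schema_evolves_fast: bool) -> str:
    """Return a rough SQL-vs-NoSQL recommendation."""
    if needs_acid and write_qps < 10_000:
        return "SQL"    # default: ACID + structured data
    if write_qps >= 10_000 or denormalized or schema_evolves_fast:
        return "NoSQL"  # high writes / denormalized / fast-moving schema
    return "SQL"

# User profiles: relational, transactional, moderate writes
print(pick_database(True, 2_000, False, False))
# Tweet store: write-heavy, denormalized time-series
print(pick_database(False, 50_000, True, False))
```

In a real design the factors interact (e.g. ACID plus very high write volume may push you toward sharded SQL or sagas), so treat this as a first-pass heuristic, not a verdict.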
Q6. Explain push vs pull models for generating a social media timeline. When would you use each?
Model answer:
Push model (Fan-Out on Write): When Alice posts a tweet, the system immediately pushes it to every follower's pre-computed timeline cache.
```
Alice posts tweet → Look up Alice's 10K followers
                  → Write tweet to each follower's cache
                  → When Bob reads timeline, it's already there
```
- Pro: Reads are fast (pre-computed, O(1) cache read).
- Con: Writes are expensive for users with many followers. If a celebrity has 50M followers, one tweet triggers 50M cache writes.
Pull model (Fan-Out on Read): When Bob reads his timeline, the system fetches recent tweets from all accounts Bob follows and merges them.
```
Bob requests timeline → Look up Bob's 500 followed accounts
                      → Fetch latest tweets from each
                      → Merge, sort, return
```
- Pro: No write amplification — posting is cheap.
- Con: Reads are slow (many queries per timeline request).
Hybrid model (what Twitter uses):
- Regular users (< 10K followers): Push model.
- Celebrities (> 10K followers): Excluded from push. Their tweets are fetched at read time and merged with the pre-computed cache.
- This balances write cost and read performance.
I would choose the push model for most social apps (fast reads are critical for UX) and switch to hybrid once celebrity accounts create write-amplification problems.
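The hybrid model can be sketched in a few lines. This is a minimal in-memory illustration: the dicts stand in for the follower graph, the per-user timeline caches, and the tweet store, and all names are assumptions for the example.

```python
# Minimal sketch of hybrid fan-out: push for regular users, pull for
# celebrities. In-memory dicts stand in for real storage systems.
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000

followers = defaultdict(set)        # author -> set of follower ids
timeline_cache = defaultdict(list)  # user -> tweets pushed on write
author_tweets = defaultdict(list)   # author -> that author's tweets

def post_tweet(author: str, tweet: str) -> None:
    author_tweets[author].append(tweet)
    if len(followers[author]) < CELEBRITY_THRESHOLD:
        for f in followers[author]:          # fan-out on write
            timeline_cache[f].append(tweet)
    # Celebrities skip the push; their tweets are pulled at read time.

def read_timeline(user: str, following: list[str]) -> list[str]:
    tweets = list(timeline_cache[user])      # pre-computed portion
    for author in following:
        if len(followers[author]) >= CELEBRITY_THRESHOLD:
            tweets.extend(author_tweets[author])  # fan-out on read
    return tweets

followers["alice"] = {"bob"}
post_tweet("alice", "hello")
print(read_timeline("bob", ["alice"]))  # ['hello']
```

A production system would also merge and sort by timestamp and cap the timeline length; this sketch only shows where the write-time and read-time fan-out paths diverge.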
Q7. How do you estimate capacity for a system with 100M daily active users?
Model answer:
I follow a systematic pipeline: Users → QPS → Storage → Bandwidth → Cache.
Step 1: QPS

```
DAU: 100M
Assume each user performs 10 read actions and 2 write actions per day.
Read QPS  = 100M * 10 / 86,400 ≈ 11,574 → ~12K QPS
Write QPS = 100M * 2  / 86,400 ≈ 2,315  → ~2.3K QPS
Peak (3x): Read ~36K QPS, Write ~7K QPS
```

Step 2: Storage

```
Assume each write produces ~1 KB of data.
Daily:  200M writes * 1 KB = 200 GB/day
Yearly: 200 GB * 365 = 73 TB/year
With 3x replication: 219 TB/year
```

Step 3: Bandwidth

```
Egress = Read QPS * response size
       = 12K * 50 KB (typical page) = 600 MB/s
Peak: 1.8 GB/s
```

Step 4: Cache

```
Cache hottest 20% of daily unique data.
If 50M unique items accessed/day at 1 KB each:
  20% = 10M * 1 KB = 10 GB (fits in a single Redis instance)
With richer data: adjust up proportionally.
```
Key insight: These numbers tell me I need: multiple app servers behind a load balancer, a caching layer to protect the database, and eventually database sharding or read replicas. A single PostgreSQL instance would not survive 36K peak read QPS without a cache in front.
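The estimation pipeline above can be reproduced as a short script so the arithmetic is checkable. Same assumptions as the answer: 100M DAU, 10 reads and 2 writes per user per day, ~1 KB per write, a 3x peak factor, and 3x replication.

```python
# Capacity estimation: Users -> QPS -> Storage, using the assumptions
# stated in the answer above.

DAU = 100_000_000
SECONDS_PER_DAY = 86_400

read_qps = DAU * 10 / SECONDS_PER_DAY        # ≈ 11,574 -> ~12K QPS
write_qps = DAU * 2 / SECONDS_PER_DAY        # ≈ 2,315  -> ~2.3K QPS
peak_read_qps = 3 * read_qps                 # ≈ 35K QPS

daily_storage_gb = DAU * 2 * 1 / 1_000_000   # 200M writes * 1 KB
yearly_storage_tb = daily_storage_gb * 365 / 1_000
replicated_tb = 3 * yearly_storage_tb

print(f"Read QPS:  {read_qps:,.0f} (peak {peak_read_qps:,.0f})")
print(f"Write QPS: {write_qps:,.0f}")
print(f"Storage:   {daily_storage_gb:.0f} GB/day, "
      f"{replicated_tb:.0f} TB/year with 3x replication")
```

Keeping the assumptions as named constants makes it easy to re-run the numbers when the interviewer changes a parameter ("what if DAU is 1B?").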
Q8. What are the most common mistakes in system design interviews?
Model answer:
- Jumping into design without requirements. This leads to building the wrong thing or designing at the wrong scale. Fix: Spend the first 5 minutes asking clarifying questions.
- Over-engineering for small scale. Using Kubernetes, CQRS, event sourcing, and microservices for a 1K-user system. Fix: Match complexity to scale. A monolith with PostgreSQL is fine for small/medium systems.
- Not discussing trade-offs. Saying "I'll use NoSQL" without explaining why. Fix: For every decision, state the alternative and why you rejected it.
- Monologuing without checking in. Talking for 15 minutes without engaging the interviewer. Fix: Pause after each phase: "Does this direction make sense?"
- Going deep too early. Spending 10 minutes on database schema before drawing the overall architecture. Fix: Draw the complete high-level picture first, then dive into specifics.
- Ignoring failure scenarios. Assuming everything works perfectly. Fix: Proactively mention: "What if this service goes down? We have a replica. What if the cache fails? We fall back to the database."
- Skipping estimation. Not showing any math. Fix: Even rough numbers (12K QPS, 73 TB/year) demonstrate quantitative thinking and justify architectural choices.
Advanced (Q9–Q11)
Q9. How would you design a system that needs to handle a "thundering herd" problem — 1M users all requesting the same resource at the same time?
Model answer:
A thundering herd occurs when a cached item expires and millions of concurrent requests simultaneously hit the database for the same key. This can overload the database.
Solutions (layered):
- Request coalescing / single-flight: When the cache misses, only ONE request goes to the database. All other concurrent requests for the same key wait for that single fetch to complete, then all get the result from the newly populated cache.
- Cache stampede prevention (locking): When a cache entry expires, the first request acquires a distributed lock (Redis SETNX), fetches from DB, and refreshes the cache. Other requests wait or serve the stale value.
- Stale-while-revalidate: Serve the stale cached value immediately while asynchronously refreshing in the background. Users see slightly old data for a brief window but never hit the database simultaneously.
- Proactive cache warming: For known hot keys (trending topics, celebrity profiles), refresh the cache BEFORE it expires using a background job. The cache never goes empty.
- CDN / edge caching: For static or semi-static content, the CDN absorbs the thundering herd at the edge. The origin server sees minimal traffic.
Architecture:
```
     1M concurrent requests
              │
         ┌────▼─────┐      Only 1 request
         │  Cache   │──────────────────────►┌──────────┐
         │  (MISS)  │  request coalescing   │ Database │
         └────┬─────┘◄──────────────────────└──────────┘
              │              result
      All 1M requests
      served from newly
      populated cache
```
In an interview, I would mention all approaches and then choose based on the use case. For a social media trending page: CDN + stale-while-revalidate. For a flash sale e-commerce page: request coalescing + proactive warming.
Q10. Explain how you would shard a database. What are the trade-offs of different sharding strategies?
Model answer:
Sharding splits a large database into smaller, independent partitions (shards) across multiple machines. Each shard holds a subset of the data.
Sharding strategies:
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Range-based | Shard by range of values (users A-M on shard 1, N-Z on shard 2) | Simple, range queries work within a shard | Uneven distribution (hotspots); need rebalancing |
| Hash-based | Hash the shard key and mod by number of shards (hash(user_id) % N) | Even distribution | Range queries span all shards; adding shards requires rehashing |
| Consistent hashing | Hash key maps to a position on a ring; each shard owns a segment | Adding/removing shards moves minimal data | More complex implementation; virtual nodes needed for balance |
| Directory-based | A lookup table maps each key to a shard | Maximum flexibility; supports complex routing | Directory is a single point of failure; extra lookup latency |
Choosing a shard key: The shard key is the most important decision. A good shard key:
- Distributes data evenly across shards
- Allows most queries to hit a single shard (avoid scatter-gather)
- Does not create hotspots (avoid sharding by timestamp alone)
Example for Twitter:
- Shard tweets by `tweet_id` (hash-based): even distribution, but fetching "all tweets by user X" requires querying all shards.
- Shard tweets by `user_id`: all of a user's tweets live on one shard (efficient for user-centric queries), but celebrity users create hotspots.
- Hybrid: shard by `user_id` for the primary store; maintain a secondary index by `tweet_id` for direct lookups.
Trade-offs I always mention:
- Cross-shard queries are expensive (avoid joins across shards)
- Transactions across shards are very difficult (use sagas or eventual consistency)
- Rebalancing shards when adding capacity requires data migration
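Consistent hashing, the strategy that minimizes data movement, can be sketched briefly. This is a minimal illustration: real systems add virtual nodes for balance and replicate each key to the next few shards on the ring.

```python
# Minimal consistent-hashing ring. Each shard owns the arc of hash
# space ending at its position; a key belongs to the first shard
# clockwise from the key's hash. Illustrative only (no virtual nodes).
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, shards: list[str]) -> None:
        self._ring = sorted((_hash(s), s) for s in shards)
        self._positions = [h for h, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # First shard clockwise from the key's position (wrap around).
        i = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("user:42"))
```

Contrast with `hash(key) % N`: growing from 3 to 4 shards there remaps roughly 3/4 of all keys, while on the ring only the keys in the new shard's arc move.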
Q11. You are asked to design a system for global scale (users in 50+ countries). What architectural considerations change compared to a single-region design?
Model answer:
Global scale introduces latency, data residency, and consistency challenges that do not exist in single-region architectures.
Key considerations:
1. Multi-region deployment:
```
┌──────────┐     ┌──────────┐     ┌──────────┐
│ US-East  │     │ EU-West  │     │ AP-South │
│ Region   │◄───►│ Region   │◄───►│ Region   │
│          │     │          │     │          │
│ App + DB │     │ App + DB │     │ App + DB │
└──────────┘     └──────────┘     └──────────┘
     ▲                ▲                ▲
     │                │                │
 US users         EU users        Asia users
```
Route users to the nearest region using DNS-based routing (Route 53, Cloudflare) or anycast.
2. Data replication strategy:
- Active-passive: One primary region handles all writes; other regions have read replicas. Simple but writes have high latency for non-primary users.
- Active-active: Each region accepts writes. Requires conflict resolution (last-write-wins, CRDTs, or application-level merging). Complex but lower write latency globally.
3. Consistency trade-offs:
- Cross-region replication latency is 100-300ms. Strong consistency across regions means every write waits for global replication — high latency.
- Most global systems choose eventual consistency with causal consistency for critical flows (e.g., a user should see their own writes immediately).
4. Data residency and compliance:
- GDPR (EU), data localization laws (India, Russia, China) may require that certain user data stays in-region.
- Design data routing so EU user data stays in EU-West region.
5. CDN for static content:
- Deploy CDN edge locations in every region. Static assets (images, JS, CSS) should never cross continents.
6. Failure isolation:
- A regional outage should not bring down the entire system. Each region operates independently and can survive if another region goes offline.
- Use cell-based architecture: each region is a self-contained "cell" with its own app servers, databases, and caches.
In an interview, I would draw the multi-region architecture, explain the replication strategy, and discuss the consistency trade-off: "For a social media platform, I would use active-active with eventual consistency and last-write-wins conflict resolution. For a banking system, I would use active-passive with strong consistency, accepting higher write latency."
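The routing and residency considerations can be sketched together. This is an illustrative example only: the region names, country table, and function names are assumptions; real systems derive proximity from latency measurements (e.g. DNS latency-based routing) rather than a static map.

```python
# Illustrative sketch: route requests to the nearest region, but pin
# EU users' data to EU-West for residency. All tables are assumptions.

NEAREST_REGION = {
    "US": "us-east", "DE": "eu-west", "FR": "eu-west",
    "IN": "ap-south", "JP": "ap-south",
}
EU_COUNTRIES = {"DE", "FR"}
DEFAULT_REGION = "us-east"

def serving_region(country: str) -> str:
    """Region that answers the request (lowest latency)."""
    return NEAREST_REGION.get(country, DEFAULT_REGION)

def data_home_region(country: str) -> str:
    """Region where the user's data must reside."""
    if country in EU_COUNTRIES:
        return "eu-west"   # GDPR-style residency constraint
    return serving_region(country)
```

Separating "where is the request served" from "where does the data live" is the key point: a German user traveling in Japan may be served from AP-South, but their records still replicate only within EU-West.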
Quick-fire
| Question | Short Answer |
|---|---|
| Can you have CA in CAP? | Not practically — network partitions are inevitable |
| Push or pull for Twitter timelines? | Hybrid (push for regular users, pull for celebrities) |
| SQL or NoSQL for a chat app's messages? | NoSQL (Cassandra) — write-heavy, time-series, denormalized |
| How many nines is 99.99%? | Four nines (~52 min downtime/year) |
| Peak QPS vs average QPS? | Peak is 2-5x average; design for peak |
| First thing in a system design interview? | Clarify requirements (never start drawing immediately) |
Interview tips
- Lead with requirements. The first 5 minutes set the quality of the entire interview.
- Show your math. Even rough estimation demonstrates engineering maturity.
- Justify every decision with the trade-off you are making. "I chose X because Y; the alternative was Z."
- Draw big and label everything. The diagram is your primary communication tool.
- Engage the interviewer. "Does this direction make sense? Should I go deeper on caching?"
← Back to 9.7 — System Design Foundations (README)