9.11.d Design a Social Media Feed (Twitter / Instagram)
Problem Statement
Design the news feed system for a social media platform like Twitter or Instagram. Users post content and see a personalized feed of posts from accounts they follow. The system must handle celebrity accounts with millions of followers.
1. Requirements
Functional Requirements
- Users can create posts (text, images, videos)
- Users follow/unfollow other users
- Users see a personalized feed of posts from followed accounts
- Feed is ranked by relevance (not purely chronological)
- Users can like, comment, and share posts
- Support for celebrity accounts (millions of followers)
Non-Functional Requirements
- Feed generation latency: < 500ms
- Feed freshness: new posts appear within 30 seconds
- Support 300 million daily active users
- Each user follows ~200 accounts on average
- 99.9% availability
- Eventually consistent (slight feed delays are acceptable)
2. Capacity Estimation
Traffic
Daily active users: 300 million
New posts per day: 500 million
Feed refreshes/day: 300M users * 10 refreshes = 3 billion
Feed requests/second: 3B / 86,400 ~= 35,000/sec
Post writes/second: 500M / 86,400 ~= 5,800/sec
Storage
Average post size: 1 KB (text + metadata)
Media per post: Average 500 KB (images/thumbnails); assume ~50% of posts carry media
Daily post storage: 500M * 1 KB = 500 GB (metadata)
Daily media storage: 500M * 0.5 * 500 KB = 125 TB (media; 0.5 = share of posts with media)
Fan-out Numbers
Average followers: 200
Celebrity followers: 50 million (top accounts)
Total fan-out writes: 500M posts * 200 avg followers = 100 billion/day
Fan-out writes/sec: 100B / 86,400 ~= 1.16 million/sec
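The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 TB = 10^9 KB) and that half of all posts carry media; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 86_400

dau = 300_000_000
posts_per_day = 500_000_000
refreshes_per_user = 10
avg_followers = 200

# Traffic
feed_rps = dau * refreshes_per_user / SECONDS_PER_DAY
post_wps = posts_per_day / SECONDS_PER_DAY

# Fan-out
fanout_per_day = posts_per_day * avg_followers
fanout_wps = fanout_per_day / SECONDS_PER_DAY

# Storage (1 KB metadata per post; 50% of posts carry ~500 KB of media)
metadata_gb_per_day = posts_per_day * 1 / 1_000_000
media_tb_per_day = posts_per_day * 0.5 * 500 / 1_000_000_000

print(f"feed requests/sec  ~ {feed_rps:,.0f}")
print(f"post writes/sec    ~ {post_wps:,.0f}")
print(f"fan-out writes/sec ~ {fanout_wps:,.0f}")
print(f"metadata per day   = {metadata_gb_per_day:,.0f} GB")
print(f"media per day      = {media_tb_per_day:,.0f} TB")
```

The rounded figures in the text (35,000/sec, 5,800/sec, 1.16 million/sec) are these values rounded up for headroom.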
3. High-Level Architecture
+----------+     +-------------------+     +-----------------+
|  Client  |---->|   API Gateway     |---->|  Post Service   |
|  (App)   |     |  + Load Balancer  |     |  (Create/Read)  |
+----------+     +-------------------+     +--------+--------+
                          |                         |
                          |                  +------v------+
                          |                  | Post Store  |
                          |                  | (Cassandra) |
                 +--------v--------+         +------+------+
                 |  Feed Service   |                |
                 |  (Generation)   |         +------v------+
                 +--------+--------+         | Media Store |
                          |                  | (S3 + CDN)  |
                 +--------v--------+         +-------------+
                 |  Feed Cache     |
                 |  (Redis)        |         +--------------+
                 +-----------------+         |  Fan-out     |
                                             |  Service     |
                 +-----------------+         +------+-------+
                 |  Social Graph   |                |
                 |  Service        |         +------v-------+
                 |  (who follows   |         | Message Queue|
                 |   whom)         |         |   (Kafka)    |
                 +-----------------+         +--------------+
                          |
                 +-----------------+         +--------------+
                 |  Graph Store    |         |  Ranking     |
                 |  (Neo4j/Redis)  |         |  Service (ML)|
                 +-----------------+         +--------------+
4. API Design
POST /api/v1/posts
Headers: Authorization: Bearer <token>
Body: {
"content": "Check out this sunset!",
"media_ids": ["media_123", "media_456"],
"location": { "lat": 37.7749, "lng": -122.4194 },
"tags": ["sunset", "california"]
}
Response 201: { "post_id": "post_789", "created_at": "..." }
GET /api/v1/feed?cursor={cursor}&limit=20
Headers: Authorization: Bearer <token>
Response 200: {
"posts": [
{
"post_id": "post_789",
"author": { "user_id": "u_42", "username": "jane", "avatar": "..." },
"content": "Check out this sunset!",
"media": [{ "url": "...", "type": "image" }],
"likes_count": 1523,
"comments_count": 87,
"created_at": "2026-04-11T18:00:00Z",
"is_liked": false
}
],
"next_cursor": "eyJ0cyI6MTY4MTIwMDAwMH0="
}
POST /api/v1/users/{user_id}/follow
Response 200: { "following": true }
DELETE /api/v1/users/{user_id}/follow
Response 200: { "following": false }
POST /api/v1/posts/{post_id}/like
Response 200: { "liked": true, "likes_count": 1524 }
GET /api/v1/posts/{post_id}/comments?cursor={cursor}&limit=20
Response 200: { "comments": [...], "next_cursor": "..." }
5. Database Schema
Posts Table (Cassandra)
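CREATE TABLE posts (
  post_id     TIMEUUID PRIMARY KEY,
  author_id   UUID,
  content     TEXT,
  media_urls  LIST<TEXT>,
  location    MAP<TEXT, DOUBLE>,
  tags        SET<TEXT>,
  created_at  TIMESTAMP
);
-- Cassandra does not allow COUNTER columns in a table that also has
-- regular columns, so engagement counts live in a dedicated counter table.
CREATE TABLE post_counters (
  post_id         TIMEUUID PRIMARY KEY,
  likes_count     COUNTER,
  comments_count  COUNTER
);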
User Feed (Cassandra -- for fan-out on write)
CREATE TABLE user_feed (
user_id UUID,
post_id TIMEUUID,
author_id UUID,
created_at TIMESTAMP,
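  PRIMARY KEY (user_id, created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id DESC);
-- Each user's feed is one partition, sorted newest first
-- post_id is part of the clustering key so two posts with the same
-- created_at do not overwrite each other
-- Feed generation: read from this table, paginated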
Social Graph (Redis or Neo4j)
Redis Sets:
following:{user_id} -> SET of user_ids this user follows
followers:{user_id} -> SET of user_ids that follow this user
Operations:
SADD following:u1 u2 -- u1 follows u2 (also SADD followers:u2 u1)
SREM following:u1 u2 -- u1 unfollows u2 (also SREM followers:u2 u1)
SCARD followers:u2 -- count of u2's followers
SISMEMBER following:u1 u2 -- does u1 follow u2?
SMEMBERS followers:u2 -- all followers of u2 (prefer SSCAN for very large sets)
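The point that a follow has to touch both sets is easy to miss, so here is a minimal in-memory sketch of the same operations. `SocialGraph` is a hypothetical stand-in that uses plain Python sets instead of a Redis client; it only illustrates the bookkeeping.

```python
from collections import defaultdict


class SocialGraph:
    """In-memory stand-in for the two Redis set families (illustrative only)."""

    def __init__(self):
        self.following = defaultdict(set)  # following:{user_id}
        self.followers = defaultdict(set)  # followers:{user_id}

    def follow(self, follower, followee):
        # Mirrors SADD on both keys so the two views stay consistent.
        self.following[follower].add(followee)
        self.followers[followee].add(follower)

    def unfollow(self, follower, followee):
        # Mirrors SREM on both keys; discard() is a no-op if the edge is absent.
        self.following[follower].discard(followee)
        self.followers[followee].discard(follower)

    def is_following(self, follower, followee):
        # Mirrors SISMEMBER following:{follower} {followee}
        return followee in self.following.get(follower, set())

    def follower_count(self, user):
        # Mirrors SCARD followers:{user}
        return len(self.followers.get(user, set()))


graph = SocialGraph()
graph.follow("u1", "u2")
graph.follow("u3", "u2")
print(graph.follower_count("u2"), graph.is_following("u1", "u2"))
```

In a real deployment the two writes would likely be wrapped in a transaction or pipeline so the sets cannot drift apart; that detail is an assumption, not something the design above specifies.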
Social Graph (PostgreSQL -- source of truth)
CREATE TABLE follows (
follower_id UUID NOT NULL,
followee_id UUID NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX idx_followee ON follows(followee_id);
6. Deep Dive: Fan-Out Strategies
Fan-Out on Write (Push Model)
User A creates a post:
1. Store post in posts table
2. Fetch all followers of User A: [B, C, D, E, ...]
3. For each follower, insert post_id into their feed:
user_feed[B] <- post_id
user_feed[C] <- post_id
user_feed[D] <- post_id
...
When User B opens their feed:
1. Read from user_feed[B] (already pre-computed)
2. Fetch post details for each post_id
3. Return to client
Pros:
- Feed reads are FAST (pre-computed)
- Simple read path
Cons:
- Celebrity problem: user with 50M followers = 50M writes per post
- Wasted work for inactive users
- High write amplification
Fan-Out on Read (Pull Model)
User B opens their feed:
1. Fetch list of accounts B follows: [A, C, D, ...]
2. For each followed account, fetch recent posts
3. Merge and sort by timestamp/relevance
4. Return top N posts
No pre-computation needed.
Pros:
- No write amplification
- No wasted work for inactive users
- Celebrity posts are efficient
Cons:
- Feed reads are SLOW (fetching from many sources)
- High read latency at feed generation time
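For comparison, a pull-model read might look like the sketch below. It assumes each author's timeline is already stored newest first, and uses `heapq.merge` to do a k-way merge; `posts_by_author` and `following` are toy data, not part of the design above.

```python
import heapq
from itertools import islice

# author -> posts as (timestamp, post_id), newest first (toy data)
posts_by_author = {
    "A": [(105, "a2"), (100, "a1")],
    "C": [(103, "c1")],
    "D": [(104, "d2"), (101, "d1")],
}
following = {"B": ["A", "C", "D"]}


def pull_feed(user_id, limit=20):
    """Merge each followee's newest-first timeline and keep the top `limit`."""
    timelines = (posts_by_author.get(a, []) for a in following.get(user_id, []))
    # reverse=True tells merge that every input is sorted in descending order
    merged = heapq.merge(*timelines, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


print(pull_feed("B", limit=3))
```

The cost shows up in the number of timelines fetched: a user following 200 accounts needs 200 lookups before the merge can start, which is where the read latency comes from.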
Hybrid Approach (Recommended)
+-----------------------+ +------------------------+
| Regular Users | | Celebrity Users |
| (< 10K followers) | | (> 10K followers) |
| | | |
| Fan-out on WRITE | | Fan-out on READ |
| Push to follower feeds| | Pull at feed time |
+-----------------------+ +------------------------+
            \                           /
             \                         /
          +---- Feed Generation ----------+
          |                               |
          | 1. Read pre-computed feed     |
          |    (regular user posts)       |
          | 2. Fetch celebrity posts      |
          |    from followed celebrities  |
          | 3. Merge + Rank               |
          | 4. Return top N               |
          +-------------------------------+
Threshold: Users with > 10,000 followers are treated as celebrities.
When a celebrity posts:
- Post is stored in the posts table only (no per-follower copies)
- NO fan-out to follower feeds
- When followers fetch their feed, celebrity posts are pulled at read time
When a regular user posts:
- Fan-out to all followers' feeds
- Standard push model
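Putting the two paths together, the hybrid rule can be sketched as follows. Everything here is a simplified assumption: the dictionaries stand in for the user_feed table, the per-celebrity post list, and the social graph, and sorting by timestamp stands in for the ranking step.

```python
CELEBRITY_THRESHOLD = 10_000

follower_count = {"alice": 150, "star": 50_000_000}
following = {"bob": ["alice", "star"]}
precomputed_feed = {"bob": [(100, "alice_p1")]}  # filled by push fan-out
recent_posts = {"star": [(120, "star_p1")]}      # per-celebrity recent posts


def is_celebrity(user_id):
    return follower_count.get(user_id, 0) > CELEBRITY_THRESHOLD


def on_post(author_id, ts, post_id, followers):
    """Write path: celebrities skip fan-out, everyone else is pushed."""
    if is_celebrity(author_id):
        recent_posts.setdefault(author_id, []).append((ts, post_id))
    else:
        for follower_id in followers:
            precomputed_feed.setdefault(follower_id, []).append((ts, post_id))


def get_feed(user_id, limit=20):
    """Read path: pre-computed entries plus posts pulled from followed celebrities."""
    candidates = list(precomputed_feed.get(user_id, []))
    for followee in following.get(user_id, []):
        if is_celebrity(followee):
            candidates.extend(recent_posts.get(followee, []))
    candidates.sort(reverse=True)  # newest first; a ranker would replace this
    return [post_id for _, post_id in candidates[:limit]]


print(get_feed("bob"))
```

Note that the celebrity check happens on both paths, so the threshold must be evaluated consistently at write time and read time; otherwise a post could be both pushed and pulled, or neither.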
7. Deep Dive: Feed Ranking
Ranking Pipeline
Candidate   -->  Feature     -->  Scoring     -->  Re-ranking   -->  Feed
Generation       Extraction       Model            + Filtering
(500 posts)      (per post)       (ML model)       (diversity)       (20 posts)
Feature Categories
Post Features:
- Age of post (freshness)
- Media type (image/video/text)
- Engagement rate (likes/impressions)
- Content category
Author Features:
- Relationship closeness (interaction frequency)
- Author engagement rate
- Is author a close friend?
User Features:
- Interests and past engagement patterns
- Time of day preferences
- Device type
Cross Features:
- Has user liked similar posts before?
- Do user and author share mutual connections?
Scoring Formula (Simplified)
score = w1 * affinity_score
+ w2 * post_engagement_rate
+ w3 * freshness_decay(post_age)
+ w4 * content_type_preference
+ w5 * interaction_probability (ML model)
freshness_decay(age_hours) = 1 / (1 + age_hours / 12)
affinity_score = (
likes_on_author_posts * 0.4 +
comments_on_author_posts * 0.3 +
profile_views * 0.2 +
DM_interactions * 0.1
) / total_interactions
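The simplified formula translates directly into code. The sketch below is one possible reading: the weights in `WEIGHTS` are made-up placeholders for w1..w5, `Interactions` is a hypothetical container, and the zero-interaction case (which the formula leaves undefined) is assumed to score 0.

```python
from dataclasses import dataclass


@dataclass
class Interactions:
    likes: int = 0
    comments: int = 0
    profile_views: int = 0
    dms: int = 0


def freshness_decay(age_hours: float) -> float:
    # 1.0 for a brand-new post, 0.5 after 12 hours, approaching 0 over time
    return 1.0 / (1.0 + age_hours / 12.0)


def affinity_score(i: Interactions) -> float:
    total = i.likes + i.comments + i.profile_views + i.dms
    if total == 0:
        return 0.0  # assumption: no history with this author means no affinity
    weighted = 0.4 * i.likes + 0.3 * i.comments + 0.2 * i.profile_views + 0.1 * i.dms
    return weighted / total


WEIGHTS = (0.3, 0.2, 0.2, 0.1, 0.2)  # placeholder values for w1..w5


def score(affinity, engagement_rate, age_hours, type_pref, p_interact, w=WEIGHTS):
    return (
        w[0] * affinity
        + w[1] * engagement_rate
        + w[2] * freshness_decay(age_hours)
        + w[3] * type_pref
        + w[4] * p_interact
    )


history = Interactions(likes=6, comments=2, profile_views=2)
print(score(affinity_score(history), 0.05, age_hours=3, type_pref=0.8, p_interact=0.4))
```

Because the affinity weights sum to 1 and the result is divided by the total interaction count, `affinity_score` always lands between 0.1 and 0.4 for a user with any history; in practice the weights would be tuned or learned rather than fixed.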
8. Deep Dive: The Celebrity Problem
Problem: @taylorswift has 50 million followers. She posts once.
Fan-out on write: 50 million feed inserts = catastrophic
Solution:
1. Tag users with follower count > threshold as "celebrity"
2. Celebrity posts are NOT fanned out
3. Index celebrity posts per author in a separate "celebrity_posts" lookup (the post body stays in the posts table)
4. At feed generation time:
a. Fetch pre-computed feed (from regular followees)
b. Fetch recent posts from followed celebrities (small list)
c. Merge and rank
Celebrity post fetch is fast because:
- Average user follows ~5-10 celebrities
- Each celebrity has a small number of recent posts
- Celebrity posts are heavily cached
Caching Strategy for Celebrity Posts
Redis Cache:
Key: celebrity_posts:{user_id}
Value: Sorted set of recent post_ids (last 100)
TTL: 1 hour (refreshed on new post)
When a celebrity posts:
1. Add post_id to their sorted set in Redis
2. Trim set to last 100 entries
3. Invalidate CDN cache for their profile
Expected celebrity post cache hit rate: > 99%
(millions of followers request the same posts)
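The add-then-trim behavior of the sorted set can be illustrated without a Redis server. `CelebrityPostCache` below is a hypothetical pure-Python stand-in; the comments name the Redis commands (ZADD, ZREMRANGEBYRANK, ZREVRANGE) that each step roughly corresponds to, and TTL handling is left out.

```python
import bisect

MAX_RECENT = 100


class CelebrityPostCache:
    """Pure-Python stand-in for celebrity_posts:{user_id} sorted sets."""

    def __init__(self):
        self._sets = {}  # user_id -> ascending list of (timestamp, post_id)

    def add_post(self, user_id, ts, post_id):
        entries = self._sets.setdefault(user_id, [])
        bisect.insort(entries, (ts, post_id))  # like ZADD with score = ts
        del entries[:-MAX_RECENT]              # like ZREMRANGEBYRANK key 0 -101

    def recent(self, user_id, limit=20):
        entries = self._sets.get(user_id, [])
        newest = entries[-limit:] if limit > 0 else []
        return [post_id for _, post_id in reversed(newest)]  # like ZREVRANGE


cache = CelebrityPostCache()
for ts in range(150):
    cache.add_post("star", ts, f"p{ts}")
print(cache.recent("star", limit=3))
```

Trimming on every write keeps the set bounded, so a feed read for this author never has to scan more than 100 entries regardless of how long the account has existed.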
9. Feed Caching
Feed Cache Architecture:
        +-------------------+
        |  User opens feed  |
        +--------+----------+
                 |
        +--------v----------+
        | Check feed cache  |
        |      (Redis)      |
        +--------+----------+
                 |
            +----+----+
            |         |
           HIT       MISS
            |         |
            v         v
         Return    Generate feed
         cached    (hybrid fan-out)
         feed         |
                      v
                   Cache generated
                   feed in Redis
                      |
                      v
                   Return feed
Cache Key: feed:{user_id}
Cache Value: List of post_ids (last 200)
Cache TTL: 5 minutes
Invalidation: On new post from followed user (for push users)
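The hit/miss flow above is a cache-aside pattern. The sketch below models it in memory: `FeedCache` and the `generate_feed` callable are hypothetical, and the injectable clock exists only so expiry can be demonstrated without waiting five minutes.

```python
import time

FEED_TTL_SECONDS = 300   # 5 minutes
MAX_CACHED_POSTS = 200


class FeedCache:
    """In-memory stand-in for the feed:{user_id} keys in Redis."""

    def __init__(self, generate_feed, clock=time.monotonic):
        self._generate_feed = generate_feed  # callable: user_id -> list of post_ids
        self._clock = clock
        self._entries = {}  # user_id -> (expires_at, post_ids)

    def get(self, user_id):
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # HIT: serve the cached list
        # MISS: regenerate, cache with a fresh TTL, then return
        post_ids = self._generate_feed(user_id)[:MAX_CACHED_POSTS]
        self._entries[user_id] = (self._clock() + FEED_TTL_SECONDS, post_ids)
        return post_ids

    def invalidate(self, user_id):
        """Called when a followed (push-model) user publishes a new post."""
        self._entries.pop(user_id, None)


cache = FeedCache(lambda user_id: [f"{user_id}_post_{n}" for n in range(3)])
print(cache.get("u_42"))
```

Invalidation on write keeps pushed posts fresh, while the TTL bounds how stale a feed can get when no invalidation arrives.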
Feed Pagination with Cursors
First page: GET /feed?limit=20
-> Returns posts + cursor="eyJ0cyI6MTY4MTIwMDAwMCwic2NvcmUiOjAuODd9"
Next page: GET /feed?cursor=eyJ0cyI6MTY4MTIwMDAwMCwic2NvcmUiOjAuODd9&limit=20
-> Decode cursor: { "ts": 1681200000, "score": 0.87 }
-> Fetch posts with score < 0.87, or score = 0.87 and ts < 1681200000
Why cursors over offset/limit:
- New posts don't shift the pagination window
- No "duplicate post" problem on page 2
- More efficient (no OFFSET scan)
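A cursor of this kind is just an encoded position in the sort order. The following sketch assumes the feed is ordered by (score, ts) descending and that the cursor is base64-encoded JSON; the function names are illustrative, not part of the API above.

```python
import base64
import json


def encode_cursor(ts, score):
    raw = json.dumps({"ts": ts, "score": score}, separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))


def next_page(ranked_posts, cursor=None, limit=20):
    """ranked_posts: dicts with post_id, score, ts, sorted by (score, ts) descending."""
    if cursor:
        c = decode_cursor(cursor)
        last_seen = (c["score"], c["ts"])
        # Keyset filter: keep only posts strictly after the cursor position
        ranked_posts = [p for p in ranked_posts if (p["score"], p["ts"]) < last_seen]
    page = ranked_posts[:limit]
    has_more = len(ranked_posts) > limit
    new_cursor = encode_cursor(page[-1]["ts"], page[-1]["score"]) if has_more else None
    return page, new_cursor


posts = [
    {"post_id": "p0", "score": 0.90, "ts": 5},
    {"post_id": "p1", "score": 0.87, "ts": 9},
    {"post_id": "p2", "score": 0.87, "ts": 4},
    {"post_id": "p3", "score": 0.50, "ts": 8},
]
page, cursor = next_page(posts, limit=2)
print([p["post_id"] for p in page], decode_cursor(cursor))
```

Comparing the (score, ts) pair as a tuple is what makes ties on score safe: `p2` shares a score with `p1` but still appears exactly once, on the second page.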
10. Scaling Considerations
Fan-Out Service Scaling
Fan-out is the most write-intensive component:
Post rate: 5,800 posts/sec
Average fan-out: 200 followers
Total fan-out writes: 1.16M writes/sec
Fan-out workers: 100 workers, each handling ~12K writes/sec
Queue: Kafka partitioned by author_id (ensure ordering per author)
Worker processing:
1. Consume post event from Kafka
2. Fetch follower list (from Redis cache)
3. Batch insert into user_feed table (batches of 500)
4. Ack Kafka offset
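The worker loop can be sketched independently of Kafka and Cassandra. In this version `get_followers` and `insert_batch` are hypothetical callables injected by the caller, and the event is a plain dict; committing the offset is left to the caller once the function returns.

```python
from itertools import islice

BATCH_SIZE = 500


def batched(iterable, size):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def handle_post_event(event, get_followers, insert_batch):
    """Fan one post event out to follower feeds and return the rows written."""
    rows_written = 0
    for chunk in batched(get_followers(event["author_id"]), BATCH_SIZE):
        rows = [
            (follower_id, event["post_id"], event["author_id"], event["created_at"])
            for follower_id in chunk
        ]
        insert_batch(rows)  # one batched write into user_feed
        rows_written += len(rows)
    return rows_written


batch_sizes = []
written = handle_post_event(
    {"author_id": "u_1", "post_id": "post_789", "created_at": 1681200000},
    get_followers=lambda author_id: (f"f{n}" for n in range(1234)),
    insert_batch=lambda rows: batch_sizes.append(len(rows)),
)
print(written, batch_sizes)
```

Acking only after all batches succeed gives at-least-once delivery: a crashed worker replays the event, and the replayed inserts should be harmless as long as the feed row's primary key is the same on every attempt.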
Database Sharding
Posts: Shard by post_id (hash-based)
User Feed: Shard by user_id (each user's feed on one shard)
Social Graph: Shard by user_id
Feed reads only hit ONE shard (the user's shard) -> fast!
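Hash-based routing can be as small as the sketch below. The shard count of 64 and the choice of SHA-256 are assumptions for illustration; the only requirement is a hash that is stable across processes (Python's built-in `hash()` is not, because string hashing is randomized per process).

```python
import hashlib

NUM_SHARDS = 64  # assumed shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a post_id or user_id to a shard using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All rows for one user's feed route to the same shard
print(shard_for("u_42"), shard_for("u_42"), shard_for("post_789"))
```

Plain modulo hashing remaps most keys when the shard count changes, so a production system would more likely use consistent hashing or a fixed number of virtual shards; that choice is outside what this section specifies.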
Geographic Distribution
+------------------+ +------------------+
| US Region | | Asia Region |
| - Feed Service | sync | - Feed Service |
| - Feed Cache | <------> | - Feed Cache |
| - Post Store | Kafka | - Post Store |
+------------------+ Bridge +------------------+
Posts are replicated across regions.
Feed caches are region-local.
Users are pinned to their nearest region.
Cross-region follows add ~200ms latency to fan-out (acceptable).
11. Key Tradeoffs
| Decision | Option A | Option B | Our Choice |
|---|---|---|---|
| Fan-out strategy | Push (on write) | Pull (on read) | Hybrid |
| Celebrity threshold | 10K followers | 100K followers | 10K (configurable) |
| Feed ordering | Chronological | Ranked (ML) | Ranked |
| Feed cache TTL | Short (1 min) | Long (10 min) | 5 min |
| Social graph store | Relational DB | Graph DB | Redis + PostgreSQL |
| Pagination | Offset-based | Cursor-based | Cursor-based |
| Engagement counts | Real-time accurate | Approximate | Approximate |
12. Failure Scenarios and Mitigations
| Scenario | Mitigation |
|---|---|
| Fan-out service lag | Serve slightly stale feed from cache; celebrity posts fill the gap |
| Cache failure | Fall back to fan-out-on-read for all users |
| Kafka consumer lag | Scale up consumers; feed still serves from cache |
| Social graph Redis failure | Fall back to PostgreSQL (higher latency) |
| Ranking model failure | Fall back to chronological ordering |
| Celebrity detection delay | Conservative threshold; manual override |
| Post store unavailable | Circuit breaker; serve cached feed only |
Key Takeaways
- Hybrid fan-out is the widely used approach -- push for regular users, pull for celebrities. Neither pure push nor pure pull holds up at this scale.
- The celebrity problem is the defining challenge of social media feed design -- expect interviewers to probe this specifically.
- Feed ranking is a major differentiator -- even a simple scoring model can improve engagement over a purely chronological feed.
- Cursor-based pagination is the right fit for feeds where content is constantly being inserted -- offset-based pagination can show duplicated or skipped posts as the list shifts.
- Eventually consistent feeds are perfectly acceptable -- users do not notice a 5-30 second delay in seeing new posts.