9.11.d Design a Social Media Feed (Twitter / Instagram)

Problem Statement

Design the news feed system for a social media platform like Twitter or Instagram. Users post content and see a personalized feed of posts from accounts they follow. The system must handle celebrity accounts with millions of followers.

1. Requirements

Functional Requirements

Users can create posts (text, images, videos)
Users follow/unfollow other users
Users see a personalized feed of posts from followed accounts
Feed is ranked by relevance (not purely chronological)
Users can like, comment, and share posts
Support for celebrity accounts (millions of followers)

Non-Functional Requirements

Feed generation latency: < 500ms
Feed freshness: new posts appear within 30 seconds
Support 300 million daily active users
Each user follows ~200 accounts on average
99.9% availability
Eventually consistent (slight feed delays are acceptable)

2. Capacity Estimation

Traffic

Daily active users:     300 million
New posts per day:      500 million
Feed refreshes/day:     300M users * 10 refreshes = 3 billion
Feed requests/second:   3B / 86,400 ~= 35,000/sec
Post writes/second:     500M / 86,400 ~= 5,800/sec

Storage

Average post size:      1 KB (text + metadata)
Media per post:         Average 500 KB (images/thumbnails)
Daily post storage:     500M * 1 KB = 500 GB (metadata)
Daily media storage:    500M * 0.5 * 500 KB = 125 TB (media; assumes half of posts include media)

Fan-out Numbers

Average followers:       200
Celebrity followers:     50 million (top accounts)
Total fan-out writes:    500M posts * 200 avg followers = 100 billion/day
Fan-out writes/sec:      100B / 86,400 ~= 1.16 million/sec
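
The estimates above can be re-derived with a few lines of arithmetic. This is only a sanity check in decimal units; the constant names are illustrative, and MEDIA_FRACTION is an interpretation of the 0.5 factor in the media estimate (half of posts carrying media), not something the estimate states.

```python
# Back-of-the-envelope check of the capacity numbers (decimal units).
SECONDS_PER_DAY = 86_400

DAILY_ACTIVE_USERS = 300_000_000
POSTS_PER_DAY = 500_000_000
REFRESHES_PER_USER = 10
AVG_FOLLOWERS = 200
POST_SIZE_KB = 1
MEDIA_SIZE_KB = 500
MEDIA_FRACTION = 0.5  # assumption: the 0.5 factor means half of posts carry media


def per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


feed_reads_per_sec = per_second(DAILY_ACTIVE_USERS * REFRESHES_PER_USER)
post_writes_per_sec = per_second(POSTS_PER_DAY)
fanout_writes_per_sec = per_second(POSTS_PER_DAY * AVG_FOLLOWERS)

metadata_gb_per_day = POSTS_PER_DAY * POST_SIZE_KB / 1e6                 # KB -> GB
media_tb_per_day = POSTS_PER_DAY * MEDIA_FRACTION * MEDIA_SIZE_KB / 1e9  # KB -> TB

print(f"feed reads/sec:     {feed_reads_per_sec:,.0f}")
print(f"post writes/sec:    {post_writes_per_sec:,.0f}")
print(f"fan-out writes/sec: {fanout_writes_per_sec:,.0f}")
print(f"metadata GB/day:    {metadata_gb_per_day:,.0f}")
print(f"media TB/day:       {media_tb_per_day:,.0f}")
```

These are daily averages; peak traffic is typically several times higher, so real provisioning would need headroom on top of them.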

3. High-Level Architecture

+----------+     +-------------------+     +-----------------+
|  Client  |---->|   API Gateway     |---->| Post Service    |
|  (App)   |     |   + Load Balancer |     | (Create/Read)   |
+----------+     +-------------------+     +--------+--------+
                          |                         |
                          |                  +------v------+
                          |                  | Post Store  |
                          |                  | (Cassandra) |
                 +--------v--------+         +------+------+
                 | Feed Service    |                |
                 | (Generation)    |         +------v------+
                 +--------+--------+         | Media Store |
                          |                  | (S3 + CDN)  |
                 +--------v--------+         +-------------+
                 | Feed Cache      |
                 | (Redis)         |         +--------------+
                 +-----------------+         | Fan-out      |
                                             | Service      |
                 +-----------------+         +------+-------+
                 | Social Graph    |                |
                 | Service         |         +------v-------+
                 | (who follows    |         | Message Queue|
                 |  whom)          |         | (Kafka)      |
                 +-----------------+         +--------------+
                          |
                 +-----------------+         +--------------+
                 | Graph Store     |         | Ranking      |
                 | (Neo4j/Redis)   |         | Service (ML) |
                 +-----------------+         +--------------+

4. API Design

POST /api/v1/posts
  Headers: Authorization: Bearer <token>
  Body: {
    "content": "Check out this sunset!",
    "media_ids": ["media_123", "media_456"],
    "location": { "lat": 37.7749, "lng": -122.4194 },
    "tags": ["sunset", "california"]
  }
  Response 201: { "post_id": "post_789", "created_at": "..." }

GET /api/v1/feed?cursor={cursor}&limit=20
  Headers: Authorization: Bearer <token>
  Response 200: {
    "posts": [
      {
        "post_id": "post_789",
        "author": { "user_id": "u_42", "username": "jane", "avatar": "..." },
        "content": "Check out this sunset!",
        "media": [{ "url": "...", "type": "image" }],
        "likes_count": 1523,
        "comments_count": 87,
        "created_at": "2026-04-11T18:00:00Z",
        "is_liked": false
      }
    ],
    "next_cursor": "eyJ0cyI6MTY4MTIwMDAwMH0="
  }

POST /api/v1/users/{user_id}/follow
  Response 200: { "following": true }

DELETE /api/v1/users/{user_id}/follow
  Response 200: { "following": false }

POST /api/v1/posts/{post_id}/like
  Response 200: { "liked": true, "likes_count": 1524 }

GET /api/v1/posts/{post_id}/comments?cursor={cursor}&limit=20
  Response 200: { "comments": [...], "next_cursor": "..." }

5. Database Schema

Posts Table (Cassandra)

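CREATE TABLE posts (
    post_id       TIMEUUID PRIMARY KEY,
    author_id     UUID,
    content       TEXT,
    media_urls    LIST<TEXT>,
    location      MAP<TEXT, DOUBLE>,
    tags          SET<TEXT>,
    created_at    TIMESTAMP
);

-- Cassandra does not allow COUNTER columns in a table that also has
-- regular (non-key) columns, so engagement counts live in their own table.
CREATE TABLE post_counters (
    post_id        TIMEUUID PRIMARY KEY,
    likes_count    COUNTER,
    comments_count COUNTER
);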

User Feed (Cassandra -- for fan-out on write)

CREATE TABLE user_feed (
    user_id       UUID,
    post_id       TIMEUUID,
    author_id     UUID,
    created_at    TIMESTAMP,
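    PRIMARY KEY (user_id, created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id DESC);
-- Each user's feed is a partition sorted by time
-- post_id is part of the clustering key so that two posts with the same
-- created_at do not overwrite each other
-- Feed generation: read from this table, paginated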

Social Graph (Redis or Neo4j)

Redis Sets:
  following:{user_id}  -> SET of user_ids this user follows
  followers:{user_id}  -> SET of user_ids that follow this user

Operations:
  SADD    following:u1 u2        -- u1 follows u2
  SREM    following:u1 u2        -- u1 unfollows u2
  SCARD   followers:u2           -- count of u2's followers
  SISMEMBER following:u1 u2      -- does u1 follow u2?
  SMEMBERS followers:u2          -- all followers of u2 (prefer SSCAN for large sets)

Social Graph (PostgreSQL -- source of truth)

CREATE TABLE follows (
    follower_id   UUID NOT NULL,
    followee_id   UUID NOT NULL,
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);

CREATE INDEX idx_followee ON follows(followee_id);

6. Deep Dive: Fan-Out Strategies

Fan-Out on Write (Push Model)

User A creates a post:

1. Store post in posts table
2. Fetch all followers of User A: [B, C, D, E, ...]
3. For each follower, insert post_id into their feed:

   user_feed[B] <- post_id
   user_feed[C] <- post_id
   user_feed[D] <- post_id
   ...

When User B opens their feed:
1. Read from user_feed[B] (already pre-computed)
2. Fetch post details for each post_id
3. Return to client

Pros:
  - Feed reads are FAST (pre-computed)
  - Simple read path
  
Cons:
  - Celebrity problem: user with 50M followers = 50M writes per post
  - Wasted work for inactive users
  - High write amplification
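
The push model can be sketched in memory as follows. This is a minimal illustration, not a production design: the PushFeed class, its method names, and the feed_limit cap are hypothetical, and plain dictionaries stand in for the posts table and the per-user feed rows.

```python
from collections import defaultdict, deque


class PushFeed:
    """In-memory stand-in for the posts table and the pre-computed user feeds."""

    def __init__(self, feed_limit=200):
        self.posts = {}                    # post_id -> post record
        self.followers = defaultdict(set)  # author_id -> follower ids
        # Each feed keeps only the newest `feed_limit` post ids, newest first.
        self.user_feed = defaultdict(lambda: deque(maxlen=feed_limit))

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)

    def create_post(self, post_id, author_id, content):
        """Store the post, then write its id into every follower's feed."""
        self.posts[post_id] = {"post_id": post_id, "author_id": author_id, "content": content}
        targets = self.followers[author_id]
        for follower_id in targets:
            self.user_feed[follower_id].appendleft(post_id)
        return len(targets)  # number of feed writes caused by this post

    def read_feed(self, user_id, limit=20):
        """Read path: take pre-computed ids, then hydrate post details."""
        post_ids = list(self.user_feed[user_id])[:limit]
        return [self.posts[pid] for pid in post_ids]


feed = PushFeed()
feed.follow("B", "A")
feed.follow("C", "A")
writes = feed.create_post("post_1", "A", "hello")
print(writes, [p["post_id"] for p in feed.read_feed("B")])
```

The return value of create_post makes the write amplification visible: one post costs as many feed writes as the author has followers.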

Fan-Out on Read (Pull Model)

User B opens their feed:

1. Fetch list of accounts B follows: [A, C, D, ...]
2. For each followed account, fetch recent posts
3. Merge and sort by timestamp/relevance
4. Return top N posts

No pre-computation needed.

Pros:
  - No write amplification
  - No wasted work for inactive users
  - Celebrity posts are cheap to publish (one write, no fan-out)

Cons:
  - Feed reads are SLOW (fetching from many sources)
  - High read latency at feed generation time
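
A minimal sketch of the pull model, assuming each author's posts are already available as a list of (timestamp, post_id) pairs sorted newest first. The function name pull_feed and the posts_by_author mapping are hypothetical, and the merge is purely chronological (no relevance ranking).

```python
import heapq
from itertools import islice


def pull_feed(following, posts_by_author, limit=20):
    """Merge the per-author timelines of everyone the user follows.

    following:        iterable of author ids
    posts_by_author:  dict author_id -> [(ts, post_id), ...] sorted newest first
    """
    streams = (posts_by_author.get(author_id, []) for author_id in following)
    # heapq.merge does a lazy k-way merge; reverse=True expects each input
    # to be sorted from largest to smallest, which matches newest-first.
    merged = heapq.merge(*streams, key=lambda item: item[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


timelines = {
    "A": [(30, "a2"), (10, "a1")],
    "C": [(20, "c1")],
}
print(pull_feed(["A", "C"], timelines, limit=2))
```

The cost grows with the number of followed accounts, since every one of them contributes a read, which is where the read latency of this model comes from.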

Hybrid Approach (Recommended)

+-----------------------+          +------------------------+
| Regular Users         |          | Celebrity Users        |
| (<= 10K followers)    |          | (> 10K followers)      |
|                       |          |                        |
| Fan-out on WRITE      |          | Fan-out on READ        |
| Push to follower feeds|          | Pull at feed time      |
+-----------------------+          +------------------------+
         \                                  /
          \                                /
           +---- Feed Generation ----------+
           |                               |
           | 1. Read pre-computed feed     |
           |    (regular user posts)       |
           | 2. Fetch celebrity posts      |
           |    from followed celebrities  |
           | 3. Merge + Rank               |
           | 4. Return top N               |
           +-------------------------------+

Threshold: Users with > 10,000 followers are treated as celebrities.

When celebrity posts:
  - Post stored in posts table ONLY
  - NO fan-out to follower feeds
  - When followers fetch feed, we pull celebrity posts at read time

When regular user posts:
  - Fan-out to all followers' feeds
  - Standard push model
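
The hybrid rule can be sketched by combining both paths. This is an in-memory illustration under stated assumptions: the HybridFeed class and its fields are hypothetical, the celebrity check uses the live follower count, and a newest-first sort stands in for the ranking step.

```python
from collections import defaultdict


class HybridFeed:
    def __init__(self, celebrity_threshold=10_000):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)      # author_id -> follower ids
        self.following = defaultdict(set)      # user_id -> followed author ids
        self.pushed = defaultdict(list)        # user_id -> [(ts, post_id)] written at post time
        self.author_posts = defaultdict(list)  # author_id -> [(ts, post_id)] in posting order

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)
        self.following[follower_id].add(author_id)

    def is_celebrity(self, author_id):
        return len(self.followers[author_id]) > self.celebrity_threshold

    def publish(self, author_id, post_id, ts):
        """Returns the number of fan-out writes this post caused."""
        self.author_posts[author_id].append((ts, post_id))
        if self.is_celebrity(author_id):
            return 0  # pull path: no fan-out at write time
        for follower_id in self.followers[author_id]:
            self.pushed[follower_id].append((ts, post_id))
        return len(self.followers[author_id])

    def read(self, user_id, limit=20):
        # 1. Pre-computed entries from regular accounts (a set removes duplicates).
        candidates = set(self.pushed[user_id])
        # 2. Pull recent posts from followed celebrities.
        for author_id in self.following[user_id]:
            if self.is_celebrity(author_id):
                candidates.update(self.author_posts[author_id][-limit:])
        # 3. Merge; newest-first stands in for the ranking step.
        newest_first = sorted(candidates, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

The set in read matters when an account crosses the threshold: posts pushed before the switch would otherwise appear twice once the same account is also pulled.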

7. Deep Dive: Feed Ranking

Ranking Pipeline

Candidate       -->  Feature        -->  Scoring    -->  Re-ranking  -->  Feed
Generation           Extraction          Model           + Filtering      
(500 posts)          (per post)          (ML model)      (diversity)      (20 posts)

Feature Categories

Post Features:
  - Age of post (freshness)
  - Media type (image/video/text)
  - Engagement rate (likes/impressions)
  - Content category

Author Features:
  - Relationship closeness (interaction frequency)
  - Author engagement rate
  - Is author a close friend?

User Features:
  - Interests and past engagement patterns
  - Time of day preferences
  - Device type

Cross Features:
  - Has user liked similar posts before?
  - Do user and author share mutual connections?

Scoring Formula (Simplified)

score = w1 * affinity_score
      + w2 * post_engagement_rate
      + w3 * freshness_decay(post_age)
      + w4 * content_type_preference
      + w5 * interaction_probability (ML model)

freshness_decay(age_hours) = 1 / (1 + age_hours / 12)

affinity_score = (
    likes_on_author_posts * 0.4 +
    comments_on_author_posts * 0.3 +
    profile_views * 0.2 +
    DM_interactions * 0.1
) / total_interactions
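
The simplified formula translates directly into code. In this sketch the weight values for w1..w5 are placeholders chosen only for illustration, total_interactions is assumed to be the sum of the four interaction counts, and the feature dictionary keys are hypothetical names.

```python
def freshness_decay(age_hours: float) -> float:
    """1.0 for a brand-new post, 0.5 after 12 hours, approaching 0 over time."""
    return 1 / (1 + age_hours / 12)


def affinity_score(likes, comments, profile_views, dm_interactions):
    # Assumption: total_interactions is the sum of the four counts.
    total = likes + comments + profile_views + dm_interactions
    if total == 0:
        return 0.0  # no history with this author
    weighted = likes * 0.4 + comments * 0.3 + profile_views * 0.2 + dm_interactions * 0.1
    return weighted / total


# Placeholder values for w1..w5; real weights would be tuned or learned.
WEIGHTS = {
    "affinity": 0.30,
    "engagement": 0.20,
    "freshness": 0.25,
    "content_type": 0.05,
    "interaction": 0.20,
}


def score(post: dict, weights: dict = WEIGHTS) -> float:
    return (
        weights["affinity"] * post["affinity"]
        + weights["engagement"] * post["engagement_rate"]
        + weights["freshness"] * freshness_decay(post["age_hours"])
        + weights["content_type"] * post["content_type_preference"]
        + weights["interaction"] * post["interaction_probability"]
    )


def top_n(candidates, n=20):
    """Scoring stage of the pipeline: keep the n highest-scoring candidates."""
    return sorted(candidates, key=score, reverse=True)[:n]
```

With all other features equal, a post that is 1 hour old outranks one that is 24 hours old, because only the freshness term differs.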

8. Deep Dive: The Celebrity Problem

Problem: @taylorswift has 50 million followers. She posts once.

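Fan-out on write: 50 million feed inserts for a single post. At the ~1.16M writes/sec estimated earlier, that is about 43 seconds of the entire fan-out capacity, which already exceeds the 30-second freshness target.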

Solution:

1. Tag users with follower count > threshold as "celebrity"
2. Celebrity posts are NOT fanned out
3. Keep a per-author index of recent celebrity posts (a separate "celebrity_posts" table or cache); post bodies stay in the posts table
4. At feed generation time:
   a. Fetch pre-computed feed (from regular followees)
   b. Fetch recent posts from followed celebrities (small list)
   c. Merge and rank

Celebrity post fetch is fast because:
  - Average user follows ~5-10 celebrities
  - Each celebrity has a small number of recent posts
  - Celebrity posts are heavily cached

Caching Strategy for Celebrity Posts

Redis Cache:
  Key:   celebrity_posts:{user_id}
  Value: Sorted set of recent post_ids (last 100)
  TTL:   1 hour (refreshed on new post)

When celebrity posts:
1. Add post_id to their sorted set in Redis
2. Trim set to last 100 entries
3. Invalidate CDN cache for their profile

Celebrity post cache hit rate: > 99%
(millions of followers requesting the same posts)
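
The add-then-trim behaviour of the sorted set can be modelled in memory. The RecentPosts class below is a hypothetical stand-in for the Redis key, not a Redis client; against a real Redis the same steps would roughly map to ZADD for the insert, ZREMRANGEBYRANK for the trim, and ZREVRANGE for the read.

```python
import bisect


class RecentPosts:
    """Keeps the newest `max_size` (ts, post_id) pairs for one celebrity."""

    def __init__(self, max_size=100):
        self.max_size = max_size
        self._entries = []  # sorted ascending by (ts, post_id)

    def add(self, ts, post_id):
        # Insert in sorted position, then trim the oldest entries.
        bisect.insort(self._entries, (ts, post_id))
        overflow = len(self._entries) - self.max_size
        if overflow > 0:
            del self._entries[:overflow]

    def newest(self, n):
        """Return up to n post ids, newest first."""
        if n <= 0:
            return []
        return [post_id for _, post_id in reversed(self._entries[-n:])]

    def __len__(self):
        return len(self._entries)


cache = RecentPosts(max_size=3)
for ts in range(1, 6):
    cache.add(ts, f"p{ts}")
print(cache.newest(2), len(cache))
```

Because every follower reads the same small list, one cached entry per celebrity serves all of that celebrity's followers.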

9. Feed Caching

Feed Cache Architecture:

+-------------------+
| User opens feed   |
+--------+----------+
         |
+--------v----------+
| Check feed cache  |
| (Redis)           |
+--------+----------+
         |
    +----+----+
    |         |
  HIT       MISS
    |         |
    v         v
  Return   Generate feed
  cached   (hybrid fan-out)
  feed         |
               v
          Cache generated
          feed in Redis
               |
               v
          Return feed

Cache Key:    feed:{user_id}
Cache Value:  List of post_ids (last 200)
Cache TTL:    5 minutes
Invalidation: On new post from followed user (for push users)
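
The hit/miss flow above is a cache-aside pattern. A minimal sketch follows, with a dictionary standing in for Redis; the FeedCache class, get_feed, and the generate_feed callback are hypothetical names, and the clock is injectable only so that expiry can be exercised without waiting.

```python
import time


class FeedCache:
    def __init__(self, ttl_seconds=300, max_items=200, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_items = max_items
        self.clock = clock
        self._store = {}  # user_id -> (expires_at, post_ids)

    def get(self, user_id):
        entry = self._store.get(user_id)
        if entry is None:
            return None
        expires_at, post_ids = entry
        if self.clock() >= expires_at:
            del self._store[user_id]  # expired: treat as a miss
            return None
        return post_ids

    def put(self, user_id, post_ids):
        expires_at = self.clock() + self.ttl_seconds
        self._store[user_id] = (expires_at, list(post_ids)[: self.max_items])

    def invalidate(self, user_id):
        """Called when a followed (push) user publishes a new post."""
        self._store.pop(user_id, None)


def get_feed(user_id, cache, generate_feed):
    cached = cache.get(user_id)
    if cached is not None:
        return cached                   # HIT: return cached feed
    post_ids = generate_feed(user_id)   # MISS: hybrid fan-out generation
    cache.put(user_id, post_ids)
    return post_ids
```

A shorter TTL trades more regeneration work for fresher feeds; explicit invalidation lets the TTL stay long without hiding new posts from push users.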

Feed Pagination with Cursors

First page:   GET /feed?limit=20
              -> Returns posts + cursor="eyJ0cyI6MTY4MTIwMDAwMCwic2NvcmUiOjAuODd9"
              
Next page:    GET /feed?cursor=eyJ0cyI6MTY4MTIwMDAwMCwic2NvcmUiOjAuODd9&limit=20
              -> Decode cursor (base64 JSON): { "ts": 1681200000, "score": 0.87 }
              -> Fetch posts with score < 0.87, or score = 0.87 and ts < 1681200000

Why cursors over offset/limit:
  - New posts don't shift the pagination window
  - No "duplicate post" problem on page 2
  - More efficient (no OFFSET scan)
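
Cursor encoding and the keyset filter can be sketched as below. The helper names are hypothetical, the cursor is assumed to be base64-encoded compact JSON, and the (score, ts) pair is assumed to be unique per post; a real implementation would likely add post_id as a tie-breaker.

```python
import base64
import json


def encode_cursor(ts: int, score: float) -> str:
    raw = json.dumps({"ts": ts, "score": score}, separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(cursor))


def page(posts, limit=20, cursor=None):
    """Return (items, next_cursor), ordered by score then timestamp, descending."""
    ordered = sorted(posts, key=lambda p: (p["score"], p["ts"]), reverse=True)
    if cursor:
        c = decode_cursor(cursor)
        # Keyset condition: score < c.score OR (score == c.score AND ts < c.ts).
        # Tuple comparison expresses exactly that.
        ordered = [p for p in ordered if (p["score"], p["ts"]) < (c["score"], c["ts"])]
    items = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(items[-1]["ts"], items[-1]["score"]) if has_more else None
    return items, next_cursor


posts = [
    {"post_id": "p1", "score": 0.90, "ts": 100},
    {"post_id": "p2", "score": 0.87, "ts": 200},
    {"post_id": "p3", "score": 0.87, "ts": 150},
    {"post_id": "p4", "score": 0.50, "ts": 300},
]
first, cur = page(posts, limit=2)
posts.append({"post_id": "p0", "score": 0.95, "ts": 400})  # arrives between requests
second, _ = page(posts, limit=2, cursor=cur)
print([p["post_id"] for p in first], [p["post_id"] for p in second])
```

The post inserted between the two requests does not shift the second page, because the filter is anchored to the last item seen rather than to a row offset.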

10. Scaling Considerations

Fan-Out Service Scaling

Fan-out is the most write-intensive component:

Post rate:            5,800 posts/sec
Average fan-out:      200 followers
Total fan-out writes: 1.16M writes/sec

Fan-out workers: 100 workers, each handling ~12K writes/sec
Queue: Kafka partitioned by author_id (ensure ordering per author)

Worker processing:
1. Consume post event from Kafka
2. Fetch follower list (from Redis cache)
3. Batch insert into user_feed table (batches of 500)
4. Ack Kafka offset
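
The batching step of the worker can be sketched without any Kafka or Cassandra client. Here fan_out and batched are hypothetical helpers, write_batch is a caller-supplied callback standing in for the batch insert into user_feed, and consuming the event and acking the offset are left to the surrounding consumer loop.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def fan_out(post, follower_ids, write_batch, batch_size=500):
    """Write one feed row per follower, in batches. Returns rows written.

    The caller should ack the queue offset only after this returns, so a
    crash mid-way leads to a retry rather than silently missing followers.
    """
    rows_written = 0
    for chunk in batched(follower_ids, batch_size):
        rows = [
            (follower_id, post["created_at"], post["post_id"], post["author_id"])
            for follower_id in chunk
        ]
        write_batch(rows)  # stand-in for the batch insert into user_feed
        rows_written += len(rows)
    return rows_written


written_batches = []
post = {"post_id": "p1", "author_id": "a1", "created_at": 1681200000}
total = fan_out(post, range(1200), written_batches.append)
print(total, [len(b) for b in written_batches])
```

Acking after the writes gives at-least-once delivery, so the feed insert should be idempotent; keying rows by user and post makes a replayed batch overwrite rather than duplicate.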

Database Sharding

Posts:     Shard by post_id (hash-based)
User Feed: Shard by user_id (each user's feed on one shard)
Social Graph: Shard by user_id

Pre-computed feed reads hit only ONE shard (the user's shard) -> fast!
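
Hash-based routing can be sketched as follows. The shard_for function and the shard count of 16 are illustrative assumptions; MD5 is used here only as a stable, evenly distributed hash, not for security.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key (user_id or post_id) to a shard index in [0, num_shards).

    Python's built-in hash() is randomized per process for strings, so a
    stable digest is used to keep routing consistent across servers.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 16  # illustrative value
print(shard_for("u_42", NUM_SHARDS), shard_for("post_789", NUM_SHARDS))
```

Plain modulo routing remaps most keys when the shard count changes; consistent hashing or a fixed set of virtual shards is a common way to limit that movement.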

Geographic Distribution

+------------------+          +------------------+
| US Region        |          | Asia Region      |
| - Feed Service   |  sync    | - Feed Service   |
| - Feed Cache     | <------> | - Feed Cache     |
| - Post Store     | Kafka    | - Post Store     |
+------------------+ Bridge   +------------------+

Posts are replicated across regions.
Feed caches are region-local.
Users are pinned to their nearest region.
Cross-region follows add ~200ms latency to fan-out (acceptable).

11. Key Tradeoffs

Decision              Option A             Option B          Our Choice
-------------------------------------------------------------------------------
Fan-out strategy      Push (on write)      Pull (on read)    Hybrid
Celebrity threshold   10K followers        100K followers    10K (configurable)
Feed ordering         Chronological        Ranked (ML)       Ranked
Feed cache TTL        Short (1 min)        Long (10 min)     5 min
Social graph store    Relational DB        Graph DB          Redis + PostgreSQL
Pagination            Offset-based         Cursor-based      Cursor-based
Engagement counts     Real-time accurate   Approximate       Approximate

12. Failure Scenarios and Mitigations

Scenario                          Mitigation
------------------------------------------------------------------------
Fan-out service lag               Serve slightly stale feed from cache
                                  Celebrity posts fill the gap
Cache failure                     Fall back to fan-out-on-read for all users
Kafka consumer lag                Scale up consumers; feed still serves from cache
Social graph Redis failure        Fall back to PostgreSQL (higher latency)
Ranking model failure             Fall back to chronological ordering
Celebrity detection delay         Conservative threshold; manual override
Post store unavailable            Circuit breaker; serve cached feed only
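
As one example of these mitigations, the ranking fallback can be sketched as a wrapper around the ranker call. The function names and the created_at field are hypothetical, and a real service would likely add a timeout and metrics on top of this.

```python
import logging

logger = logging.getLogger("feed")


def chronological(candidates):
    """Fallback ordering: newest first."""
    return sorted(candidates, key=lambda post: post["created_at"], reverse=True)


def rank_with_fallback(candidates, ranker, limit=20):
    """Use the ranking service if it works, otherwise degrade gracefully."""
    try:
        ranked = ranker(candidates)
    except Exception:
        # Any ranker failure degrades to a chronological feed instead of an error.
        logger.warning("ranking failed, falling back to chronological order", exc_info=True)
        ranked = chronological(candidates)
    return ranked[:limit]
```

The same shape applies to the other rows: try the fast or preferred path, and on failure serve a degraded but still usable feed.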

Key Takeaways

Hybrid fan-out is the industry standard -- push for regular users, pull for celebrities. Pure push or pure pull both fail at scale.
The celebrity problem is the defining challenge of social media feed design -- expect interviewers to probe this specifically.
Feed ranking is a major differentiator -- even a simple scoring model can improve engagement over a purely chronological feed.
Cursor-based pagination is required for feeds where content is constantly being inserted -- offset-based pagination shifts the window and causes duplicate posts on later pages.
Eventually consistent feeds are perfectly acceptable -- users do not notice a 5-30 second delay in seeing new posts.