9.11.d Design a Social Media Feed (Twitter / Instagram)

Problem Statement

Design the news feed system for a social media platform like Twitter or Instagram. Users post content and see a personalized feed of posts from accounts they follow. The system must handle celebrity accounts with millions of followers.


1. Requirements

Functional Requirements

  • Users can create posts (text, images, videos)
  • Users follow/unfollow other users
  • Users see a personalized feed of posts from followed accounts
  • Feed is ranked by relevance (not purely chronological)
  • Users can like, comment, and share posts
  • Support for celebrity accounts (millions of followers)

Non-Functional Requirements

  • Feed generation latency: < 500ms
  • Feed freshness: new posts appear within 30 seconds
  • Support 300 million daily active users
  • Each user follows ~200 accounts on average
  • 99.9% availability
  • Eventually consistent (slight feed delays are acceptable)

2. Capacity Estimation

Traffic

Daily active users:     300 million
New posts per day:      500 million
Feed refreshes/day:     300M users * 10 refreshes = 3 billion
Feed requests/second:   3B / 86,400 ~= 35,000/sec
Post writes/second:     500M / 86,400 ~= 5,800/sec

Storage

Average post size:      1 KB (text + metadata)
Media per post:         ~500 KB average (images/thumbnails); assume 50% of posts include media
Daily post storage:     500M * 1 KB = 500 GB (metadata)
Daily media storage:    500M * 0.5 * 500 KB = 125 TB (media)

Fan-out Numbers

Average followers:       200
Celebrity followers:     50 million (top accounts)
Total fan-out writes:    500M posts * 200 avg followers = 100 billion/day
Fan-out writes/sec:      100B / 86,400 ~= 1.16 million/sec
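
The estimates above are plain arithmetic, so they are easy to re-derive. The sketch below is a minimal check, assuming decimal units (1 TB = 10^9 KB) and reading the 0.5 factor in the media line as the share of posts that carry media. All variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the estimates above
dau = 300_000_000
refreshes_per_user = 10
posts_per_day = 500_000_000
avg_followers = 200
post_size_kb = 1
media_size_kb = 500
media_fraction = 0.5  # assumption: half of all posts include media

# Traffic
feed_requests_per_sec = dau * refreshes_per_user / SECONDS_PER_DAY
post_writes_per_sec = posts_per_day / SECONDS_PER_DAY

# Fan-out
fanout_writes_per_day = posts_per_day * avg_followers
fanout_writes_per_sec = fanout_writes_per_day / SECONDS_PER_DAY

# Storage (decimal units: 1 GB = 10**6 KB, 1 TB = 10**9 KB)
metadata_gb_per_day = posts_per_day * post_size_kb / 10**6
media_tb_per_day = posts_per_day * media_fraction * media_size_kb / 10**9

print(f"feed requests/sec:  {feed_requests_per_sec:,.0f}")
print(f"post writes/sec:    {post_writes_per_sec:,.0f}")
print(f"fan-out writes/sec: {fanout_writes_per_sec:,.0f}")
print(f"metadata GB/day:    {metadata_gb_per_day:,.0f}")
print(f"media TB/day:       {media_tb_per_day:,.0f}")
```

These are daily averages. Peak traffic is usually a multiple of the average, so provisioning should leave headroom.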

3. High-Level Architecture

+----------+     +-------------------+     +-----------------+
|  Client  |---->|   API Gateway     |---->| Post Service    |
|  (App)   |     |   + Load Balancer |     | (Create/Read)   |
+----------+     +-------------------+     +--------+--------+
                          |                         |
                          |                  +------v------+
                          |                  | Post Store  |
                          |                  | (Cassandra) |
                 +--------v--------+         +------+------+
                 | Feed Service    |                |
                 | (Generation)    |         +------v------+
                 +--------+--------+         | Media Store |
                          |                  | (S3 + CDN)  |
                 +--------v--------+         +-------------+
                 | Feed Cache      |
                 | (Redis)         |         +--------------+
                 +-----------------+         | Fan-out      |
                                             | Service      |
                 +-----------------+         +------+-------+
                 | Social Graph    |                |
                 | Service         |         +------v-------+
                 | (who follows    |         | Message Queue|
                 |  whom)          |         | (Kafka)      |
                 +-----------------+         +--------------+
                          |
                 +-----------------+         +--------------+
                 | Graph Store     |         | Ranking      |
                 | (Neo4j/Redis)   |         | Service (ML) |
                 +-----------------+         +--------------+

4. API Design

POST /api/v1/posts
  Headers: Authorization: Bearer <token>
  Body: {
    "content": "Check out this sunset!",
    "media_ids": ["media_123", "media_456"],
    "location": { "lat": 37.7749, "lng": -122.4194 },
    "tags": ["sunset", "california"]
  }
  Response 201: { "post_id": "post_789", "created_at": "..." }

GET /api/v1/feed?cursor={cursor}&limit=20
  Headers: Authorization: Bearer <token>
  Response 200: {
    "posts": [
      {
        "post_id": "post_789",
        "author": { "user_id": "u_42", "username": "jane", "avatar": "..." },
        "content": "Check out this sunset!",
        "media": [{ "url": "...", "type": "image" }],
        "likes_count": 1523,
        "comments_count": 87,
        "created_at": "2026-04-11T18:00:00Z",
        "is_liked": false
      }
    ],
    "next_cursor": "eyJ0cyI6MTY4MTIwMDAwMH0="
  }

POST /api/v1/users/{user_id}/follow
  Response 200: { "following": true }

DELETE /api/v1/users/{user_id}/follow
  Response 200: { "following": false }

POST /api/v1/posts/{post_id}/like
  Response 200: { "liked": true, "likes_count": 1524 }

GET /api/v1/posts/{post_id}/comments?cursor={cursor}&limit=20
  Response 200: { "comments": [...], "next_cursor": "..." }

5. Database Schema

Posts Table (Cassandra)

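CREATE TABLE posts (
    post_id       TIMEUUID PRIMARY KEY,
    author_id     UUID,
    content       TEXT,
    media_urls    LIST<TEXT>,
    location      MAP<TEXT, DOUBLE>,
    tags          SET<TEXT>,
    created_at    TIMESTAMP
);

-- Cassandra does not allow COUNTER columns in a table that also has
-- regular (non-counter) columns, so engagement counts get their own table.
CREATE TABLE post_counters (
    post_id        TIMEUUID PRIMARY KEY,
    likes_count    COUNTER,
    comments_count COUNTER
);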

User Feed (Cassandra -- for fan-out on write)

CREATE TABLE user_feed (
    user_id       UUID,
    post_id       TIMEUUID,
    author_id     UUID,
    created_at    TIMESTAMP,
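    PRIMARY KEY (user_id, created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id DESC);
-- Each user's feed is a partition sorted by time
-- post_id is part of the clustering key so two posts with the same
-- created_at do not overwrite each other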
-- Feed generation: read from this table, paginated

Social Graph (Redis or Neo4j)

Redis Sets:
  following:{user_id}  -> SET of user_ids this user follows
  followers:{user_id}  -> SET of user_ids that follow this user

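Operations:
  SADD    following:u1 u2        -- u1 follows u2 (write both directions)
  SADD    followers:u2 u1
  SREM    following:u1 u2        -- u1 unfollows u2 (remove both directions)
  SREM    followers:u2 u1
  SCARD   followers:u2           -- count of u2's followers
  SISMEMBER following:u1 u2      -- does u1 follow u2?
  SMEMBERS followers:u2          -- all followers of u2 (prefer SSCAN for very large sets)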
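
The set operations above can be modeled in memory to see how the two directions of a follow edge stay in sync. This is a minimal sketch that uses plain Python sets instead of a Redis client. The class name `SocialGraph` and its methods are illustrative, not part of any library.

```python
from collections import defaultdict


class SocialGraph:
    """In-memory stand-in for the following:{id} / followers:{id} sets."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> accounts they follow
        self.followers = defaultdict(set)  # user -> accounts following them

    def follow(self, follower, followee):
        # Mirrors SADD on both sets
        self.following[follower].add(followee)
        self.followers[followee].add(follower)

    def unfollow(self, follower, followee):
        # Mirrors SREM on both sets; discard() is a no-op if absent
        self.following[follower].discard(followee)
        self.followers[followee].discard(follower)

    def is_following(self, follower, followee):
        # Mirrors SISMEMBER
        return followee in self.following[follower]

    def follower_count(self, user):
        # Mirrors SCARD
        return len(self.followers[user])
```

In a real deployment the two writes would need to be applied together (for example in one transaction or script) so that a crash between them cannot leave the sets inconsistent.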

Social Graph (PostgreSQL -- source of truth)

CREATE TABLE follows (
    follower_id   UUID NOT NULL,
    followee_id   UUID NOT NULL,
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);

CREATE INDEX idx_followee ON follows(followee_id);

6. Deep Dive: Fan-Out Strategies

Fan-Out on Write (Push Model)

User A creates a post:

1. Store post in posts table
2. Fetch all followers of User A: [B, C, D, E, ...]
3. For each follower, insert post_id into their feed:

   user_feed[B] <- post_id
   user_feed[C] <- post_id
   user_feed[D] <- post_id
   ...

When User B opens their feed:
1. Read from user_feed[B] (already pre-computed)
2. Fetch post details for each post_id
3. Return to client

Pros:
  - Feed reads are FAST (pre-computed)
  - Simple read path
  
Cons:
  - Celebrity problem: user with 50M followers = 50M writes per post
  - Wasted work for inactive users
  - High write amplification
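
The push model above can be sketched with in-memory dictionaries standing in for the posts and user_feed tables. This is a toy illustration under stated assumptions: the class name `PushFeedStore` and the 200-entry cap per feed are hypothetical choices, not something the design prescribes.

```python
from collections import defaultdict, deque

FEED_LIMIT = 200  # assumed cap on stored entries per user feed


class PushFeedStore:
    def __init__(self):
        self.posts = {}                    # post_id -> post record
        self.followers = defaultdict(set)  # author_id -> follower ids
        # Newest entries sit on the left; the oldest fall off the right.
        self.user_feed = defaultdict(lambda: deque(maxlen=FEED_LIMIT))

    def create_post(self, post_id, author_id, content):
        """Store the post, then push its id into every follower's feed."""
        self.posts[post_id] = {
            "post_id": post_id,
            "author_id": author_id,
            "content": content,
        }
        targets = self.followers[author_id]
        for follower_id in targets:
            self.user_feed[follower_id].appendleft(post_id)
        return len(targets)  # number of feed writes this post caused

    def read_feed(self, user_id, limit=20):
        """The read path is a lookup plus hydration of post details."""
        post_ids = list(self.user_feed[user_id])[:limit]
        return [self.posts[pid] for pid in post_ids]
```

The value returned by `create_post` is the write amplification for that post: it grows linearly with the author's follower count, which is exactly the cost the cons list describes.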

Fan-Out on Read (Pull Model)

User B opens their feed:

1. Fetch list of accounts B follows: [A, C, D, ...]
2. For each followed account, fetch recent posts
3. Merge and sort by timestamp/relevance
4. Return top N posts

No pre-computation needed.

Pros:
  - No write amplification
  - No wasted work for inactive users
  - Celebrity posts are efficient

Cons:
  - Feed reads are SLOW (fetching from many sources)
  - High read latency at feed generation time
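
The pull model is a k-way merge over per-author timelines. The sketch below assumes each author's posts are already stored newest-first as `(created_at, post_id)` tuples. The function name and the `per_author` cap are illustrative.

```python
import heapq
from itertools import islice


def read_feed_pull(user_id, following, posts_by_author, limit=20, per_author=50):
    """Build a feed at read time by merging followed authors' timelines.

    following:       user_id -> iterable of followed author ids
    posts_by_author: author_id -> list of (created_at, post_id), newest first
    """
    timelines = [
        posts_by_author.get(author_id, [])[:per_author]
        for author_id in following.get(user_id, ())
    ]
    # heapq.merge keeps the inputs lazy; reverse=True expects each input
    # to be sorted in descending order, which the timelines are.
    merged = heapq.merge(*timelines, key=lambda entry: entry[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

Every followed account costs one timeline lookup per read, so a user following 200 accounts triggers roughly 200 fetches before the merge even starts. That is where the read latency in the cons list comes from.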

Hybrid Approach (Recommended)

+-----------------------+          +------------------------+
| Regular Users         |          | Celebrity Users        |
| (<= 10K followers)    |          | (> 10K followers)      |
|                       |          |                        |
| Fan-out on WRITE      |          | Fan-out on READ        |
| Push to follower feeds|          | Pull at feed time      |
+-----------------------+          +------------------------+
         \                                  /
          \                                /
           +---- Feed Generation ----------+
           |                               |
           | 1. Read pre-computed feed     |
           |    (regular user posts)       |
           | 2. Fetch celebrity posts      |
           |    from followed celebrities  |
           | 3. Merge + Rank               |
           | 4. Return top N               |
           +-------------------------------+

Threshold: Users with > 10,000 followers are treated as celebrities.

When celebrity posts:
  - Post stored in posts table ONLY
  - NO fan-out to follower feeds
  - When followers fetch feed, we pull celebrity posts at read time

When regular user posts:
  - Fan-out to all followers' feeds
  - Standard push model
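
The routing rule for the hybrid approach can be sketched as two small functions: one that decides at write time whether to fan out, and one that merges both sources at read time. This is a minimal in-memory sketch. The function names are hypothetical, entries are `(created_at, post_id)` tuples, and posts are assumed to arrive in time order so each list stays newest-first.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 10_000  # followers; the threshold stated above


def is_celebrity(author_id, followers):
    return len(followers.get(author_id, ())) > CELEBRITY_THRESHOLD


def publish(author_id, entry, followers, user_feed, celebrity_posts):
    """Write path: push for regular authors, store-only for celebrities."""
    if is_celebrity(author_id, followers):
        celebrity_posts.setdefault(author_id, []).insert(0, entry)
        return 0  # no fan-out writes
    targets = followers.get(author_id, ())
    for follower_id in targets:
        user_feed.setdefault(follower_id, []).insert(0, entry)
    return len(targets)


def read_feed_hybrid(user_id, following, followers, user_feed,
                     celebrity_posts, limit=20):
    """Read path: pre-computed feed merged with pulled celebrity posts."""
    sources = [user_feed.get(user_id, [])]
    for author_id in following.get(user_id, ()):
        if is_celebrity(author_id, followers):
            sources.append(celebrity_posts.get(author_id, []))
    merged = heapq.merge(*sources, key=lambda entry: entry[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

The merge here is chronological only. In the full design the merged candidates would go through the ranking step before the top N are returned.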

7. Deep Dive: Feed Ranking

Ranking Pipeline

Candidate       -->  Feature        -->  Scoring    -->  Re-ranking  -->  Feed
Generation           Extraction          Model           + Filtering      
(500 posts)          (per post)          (ML model)      (diversity)      (20 posts)
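
The last two stages of the pipeline (scoring, then re-ranking for diversity) can be sketched as a sort followed by a greedy filter. The diversity rule used here, a cap of `max_per_author` posts per author, is an assumption chosen for illustration; the pipeline above does not specify which diversity rule is applied.

```python
from collections import Counter


def rank_feed(candidates, score_fn, limit=20, max_per_author=2):
    """Score candidates, then greedily pick the top posts with an author cap.

    candidates: list of dicts that each include an "author_id" key
    score_fn:   callable mapping a candidate to a numeric score
    """
    scored = sorted(candidates, key=score_fn, reverse=True)
    picked = []
    per_author = Counter()
    for post in scored:
        if per_author[post["author_id"]] >= max_per_author:
            continue  # diversity filter: skip over-represented authors
        picked.append(post)
        per_author[post["author_id"]] += 1
        if len(picked) == limit:
            break
    return picked
```

With roughly 500 candidates in and 20 posts out, the sort is cheap; in practice the expensive part is usually feature extraction and model inference, which `score_fn` stands in for here.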

Feature Categories

Post Features:
  - Age of post (freshness)
  - Media type (image/video/text)
  - Engagement rate (likes/impressions)
  - Content category

Author Features:
  - Relationship closeness (interaction frequency)
  - Author engagement rate
  - Is author a close friend?

User Features:
  - Interests and past engagement patterns
  - Time of day preferences
  - Device type

Cross Features:
  - Has user liked similar posts before?
  - Do user and author share mutual connections?

Scoring Formula (Simplified)

score = w1 * affinity_score
      + w2 * post_engagement_rate
      + w3 * freshness_decay(post_age)
      + w4 * content_type_preference
      + w5 * interaction_probability (ML model)

freshness_decay(age_hours) = 1 / (1 + age_hours / 12)

affinity_score = (
    likes_on_author_posts * 0.4 +
    comments_on_author_posts * 0.3 +
    profile_views * 0.2 +
    DM_interactions * 0.1
) / total_interactions
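
The simplified formula translates directly into code. In the sketch below the weights w1..w5 are hypothetical placeholders, since the formula does not give values, and `total_interactions` is assumed to be the sum of the four interaction counts.

```python
# Hypothetical weights w1..w5; real values would be tuned or learned.
DEFAULT_WEIGHTS = (0.30, 0.20, 0.20, 0.10, 0.20)


def freshness_decay(age_hours):
    """1.0 for a brand-new post, 0.5 at 12 hours, 0.25 at 36 hours."""
    return 1.0 / (1.0 + age_hours / 12.0)


def affinity_score(likes, comments, profile_views, dms):
    """Weighted share of the viewer's interactions with this author."""
    total = likes + comments + profile_views + dms  # assumed definition
    if total == 0:
        return 0.0  # no history with this author
    weighted = likes * 0.4 + comments * 0.3 + profile_views * 0.2 + dms * 0.1
    return weighted / total


def score(affinity, engagement_rate, age_hours, content_pref,
          interaction_prob, weights=DEFAULT_WEIGHTS):
    w1, w2, w3, w4, w5 = weights
    return (w1 * affinity
            + w2 * engagement_rate
            + w3 * freshness_decay(age_hours)
            + w4 * content_pref
            + w5 * interaction_prob)
```

Because the affinity weights sum to 1 and the result is divided by the interaction total, `affinity_score` always falls between 0.1 and 0.4 for any author the user has interacted with. It reflects the mix of interaction types, not their volume.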

8. Deep Dive: The Celebrity Problem

Problem: @taylorswift has 50 million followers. She posts once.

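Fan-out on write: 50 million feed inserts for a single post. At the ~1.16M writes/sec average estimated earlier, that is about 43 seconds of the whole system's fan-out throughput spent on one post.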

Solution:

1. Tag users with follower count > threshold as "celebrity"
2. Celebrity posts are NOT fanned out
3. Keep celebrity posts in the posts table and index the recent ones per author (the celebrity_posts:{user_id} cache described below)
4. At feed generation time:
   a. Fetch pre-computed feed (from regular followees)
   b. Fetch recent posts from followed celebrities (small list)
   c. Merge and rank

Celebrity post fetch is fast because:
  - Average user follows ~5-10 celebrities
  - Each celebrity has a small number of recent posts
  - Celebrity posts are heavily cached

Caching Strategy for Celebrity Posts

Redis Cache:
  Key:   celebrity_posts:{user_id}
  Value: Sorted set of recent post_ids (last 100)
  TTL:   1 hour (refreshed on new post)

When celebrity posts:
1. Add post_id to their sorted set in Redis
2. Trim set to last 100 entries
3. Invalidate CDN cache for their profile

Expected celebrity post cache hit rate: > 99%
(millions of followers requesting the same posts)
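
The add-then-trim behavior of the celebrity cache can be modeled without a Redis server. The sketch below keeps a bounded, time-sorted list per celebrity. In Redis the same effect is typically achieved with ZADD followed by ZREMRANGEBYRANK key 0 -101. The class name is illustrative and the TTL is left out for brevity.

```python
import bisect

MAX_RECENT = 100  # "last 100" from the cache layout above


class CelebrityPostCache:
    def __init__(self, max_recent=MAX_RECENT):
        self.max_recent = max_recent
        self._posts = {}  # user_id -> list of (ts, post_id), oldest first

    def add_post(self, user_id, ts, post_id):
        entries = self._posts.setdefault(user_id, [])
        bisect.insort(entries, (ts, post_id))  # keep sorted by timestamp
        # Trim: drop everything except the newest max_recent entries.
        del entries[:-self.max_recent]

    def recent(self, user_id, limit=20):
        """Newest-first post ids for one celebrity."""
        entries = self._posts.get(user_id, [])
        return [post_id for _, post_id in reversed(entries[-limit:])]
```

Because every follower of a celebrity asks for the same small list, one cached entry can serve a very large number of feed reads.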

9. Feed Caching

Feed Cache Architecture:

+-------------------+
| User opens feed   |
+--------+----------+
         |
+--------v----------+
| Check feed cache  |
| (Redis)           |
+--------+----------+
         |
    +----+----+
    |         |
  HIT       MISS
    |         |
    v         v
  Return   Generate feed
  cached   (hybrid fan-out)
  feed         |
               v
          Cache generated
          feed in Redis
               |
               v
          Return feed

Cache Key:    feed:{user_id}
Cache Value:  List of post_ids (last 200)
Cache TTL:    5 minutes
Invalidation: On new post from followed user (for push users)
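
The hit/miss flow above is the cache-aside pattern with a TTL. The sketch below is a minimal in-memory version. The class name `FeedCache` is illustrative, `generate` is a hypothetical callable standing in for hybrid feed generation, and the clock is injectable so expiry can be exercised without sleeping.

```python
import time

MAX_CACHED_POSTS = 200  # "last 200" from the cache layout above


class FeedCache:
    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # user_id -> (expires_at, post_ids)

    def get_feed(self, user_id, generate):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # HIT: serve the cached feed
        # MISS or expired: regenerate, cache, then return
        post_ids = generate(user_id)[:MAX_CACHED_POSTS]
        self._entries[user_id] = (now + self._ttl, post_ids)
        return post_ids

    def invalidate(self, user_id):
        """Called when a followed (push-model) user publishes a post."""
        self._entries.pop(user_id, None)
```

The TTL bounds staleness for anything that does not trigger an explicit invalidation, so it trades freshness against regeneration load.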

Feed Pagination with Cursors

First page:   GET /feed?limit=20
              -> Returns posts + cursor="eyJ0cyI6MTY4MTIwMDAwMCwic2NvcmUiOjAuODd9"

Next page:    GET /feed?cursor=eyJ0cyI6MTY4MTIwMDAwMCwic2NvcmUiOjAuODd9&limit=20
              -> Decode cursor: { "ts": 1681200000, "score": 0.87 }
              -> Fetch posts with score < 0.87,
                 or score = 0.87 and ts < 1681200000

Why cursors over offset/limit:
  - New posts don't shift the pagination window
  - No "duplicate post" problem on page 2
  - More efficient (no OFFSET scan)
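
Cursor pagination over a ranked feed amounts to encoding the sort key of the last returned post and filtering on it next time. The sketch below assumes the feed is ordered by (score, ts) descending and that the cursor is base64-encoded JSON. The function names are illustrative.

```python
import base64
import json


def encode_cursor(score, ts):
    raw = json.dumps({"ts": ts, "score": score}, separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def decode_cursor(cursor):
    data = json.loads(base64.urlsafe_b64decode(cursor))
    return data["score"], data["ts"]


def page(posts, cursor=None, limit=20):
    """Return (chunk, next_cursor) for posts ordered by score, then ts.

    posts: list of dicts with "post_id", "score" and "ts" keys
    """
    ordered = sorted(posts, key=lambda p: (p["score"], p["ts"]), reverse=True)
    if cursor is not None:
        last_key = decode_cursor(cursor)
        # Tuple comparison: score < s, or score == s and ts < t
        ordered = [p for p in ordered if (p["score"], p["ts"]) < last_key]
    chunk = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (
        encode_cursor(chunk[-1]["score"], chunk[-1]["ts"]) if has_more else None
    )
    return chunk, next_cursor
```

Two posts with identical score and timestamp would collide under this key, so a production cursor would likely add post_id as a final tie-breaker. A real store would also apply the filter as an indexed range query instead of sorting in memory.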

10. Scaling Considerations

Fan-Out Service Scaling

Fan-out is the most write-intensive component:

Post rate:            5,800 posts/sec
Average fan-out:      200 followers
Total fan-out writes: 1.16M writes/sec

Fan-out workers: 100 workers, each handling ~12K writes/sec
Queue: Kafka partitioned by author_id (ensure ordering per author)

Worker processing:
1. Consume post event from Kafka
2. Fetch follower list (from Redis cache)
3. Batch insert into user_feed table (batches of 500)
4. Ack Kafka offset
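
The worker loop above can be sketched independently of any Kafka or Cassandra client. In the sketch below, `get_followers` and `write_batch` are hypothetical callables standing in for the follower lookup and the batched insert. The caller is expected to commit the Kafka offset only after the function returns.

```python
from itertools import islice

BATCH_SIZE = 500  # batch size from the worker steps above


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def handle_post_event(event, get_followers, write_batch):
    """Fan one post event out to follower feeds in fixed-size batches.

    event: dict with "post_id", "author_id" and "created_at" keys
    Returns the number of feed rows written.
    """
    rows = (
        (follower_id, event["post_id"], event["author_id"], event["created_at"])
        for follower_id in get_followers(event["author_id"])
    )
    written = 0
    for batch in batched(rows, BATCH_SIZE):
        write_batch(batch)
        written += len(batch)
    return written
```

Acknowledging only after all batches succeed gives at-least-once delivery: a crash mid-fan-out causes the event to be reprocessed, so the feed insert should be idempotent.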

Database Sharding

Posts:     Shard by post_id (hash-based)
User Feed: Shard by user_id (each user's feed on one shard)
Social Graph: Shard by user_id

Feed reads only hit ONE shard (the user's shard) -> fast!
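
Hash-based shard routing needs a hash that is stable across processes and restarts. The sketch below is one simple way to do it, assuming string keys and a fixed shard count. The function name `shard_for` is illustrative.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key (user_id or post_id) to a shard index.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to keep routing deterministic.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Since every row of a user's feed is keyed by the same user_id, all of them land on one shard, which is why a feed read touches a single shard. Plain modulo routing remaps most keys when the shard count changes, so systems that reshard often tend to use consistent hashing instead.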

Geographic Distribution

+------------------+          +------------------+
| US Region        |          | Asia Region      |
| - Feed Service   |  sync    | - Feed Service   |
| - Feed Cache     | <------> | - Feed Cache     |
| - Post Store     | Kafka    | - Post Store     |
+------------------+ Bridge   +------------------+

Posts are replicated across regions.
Feed caches are region-local.
Users are pinned to their nearest region.
Cross-region follows add ~200ms latency to fan-out (acceptable).

11. Key Tradeoffs

Decision             Option A            Option B        Our Choice
------------------------------------------------------------------------
Fan-out strategy     Push (on write)     Pull (on read)  Hybrid
Celebrity threshold  10K followers       100K followers  10K (configurable)
Feed ordering        Chronological       Ranked (ML)     Ranked
Feed cache TTL       Short (1 min)       Long (10 min)   5 min
Social graph store   Relational DB       Graph DB        Redis + PostgreSQL
Pagination           Offset-based        Cursor-based    Cursor-based
Engagement counts    Real-time accurate  Approximate     Approximate

12. Failure Scenarios and Mitigations

Scenario                          Mitigation
------------------------------------------------------------------------
Fan-out service lag               Serve slightly stale feed from cache
                                  Celebrity posts fill the gap
Cache failure                     Fall back to fan-out-on-read for all users
Kafka consumer lag                Scale up consumers; feed still serves from cache
Social graph Redis failure        Fall back to PostgreSQL (higher latency)
Ranking model failure             Fall back to chronological ordering
Celebrity detection delay         Conservative threshold; manual override
Post store unavailable            Circuit breaker; serve cached feed only
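
Several mitigations in the table follow the same shape: try the primary path and degrade to a simpler one on failure. As one example, the ranking fallback can be sketched as below. The function name and the `rank_fn` callable are hypothetical, and posts are assumed to carry a `created_at` field.

```python
def rank_with_fallback(posts, rank_fn):
    """Use the ranking model when it works; otherwise go chronological.

    posts:   list of dicts with a "created_at" key
    rank_fn: callable that returns the posts in ranked order and may raise
    """
    try:
        return rank_fn(posts)
    except Exception:
        # Degraded mode: newest first, no model involved
        return sorted(posts, key=lambda p: p["created_at"], reverse=True)
```

A production version would typically also record the failure and enforce a timeout on the ranking call, so that a slow model degrades the same way a failing one does.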

Key Takeaways

  1. Hybrid fan-out is the usual answer at this scale -- push for regular users, pull for celebrities. Pure push breaks down on celebrity write amplification; pure pull makes every feed read slow.
  2. The celebrity problem is the defining challenge of social media feed design -- expect interviewers to probe this specifically.
  3. Feed ranking is a major differentiator -- even a simple scoring model can improve engagement over a purely chronological feed.
  4. Cursor-based pagination suits feeds where content is constantly being inserted -- offset-based pagination shows duplicates or skips posts when the list shifts between requests.
  5. Eventually consistent feeds are acceptable -- a 5-30 second delay before a new post appears is within the freshness requirement and rarely noticed by users.