Episode 9 — System Design / 9.11 — Real World System Design Problems

9.11.e Design a Video Streaming Platform (YouTube / Netflix)

Problem Statement

Design a video streaming platform that allows users to upload, transcode, store, and stream videos. The system must support adaptive bitrate streaming, a recommendation engine, and serve content globally via CDN.


1. Requirements

Functional Requirements

  • Users upload videos (up to 10 GB)
  • System transcodes videos into multiple resolutions and formats
  • Users stream videos with adaptive bitrate
  • Search videos by title, tags, description
  • Recommendation engine suggests videos
  • Like, comment, subscribe functionality
  • View counts and analytics dashboard for creators

Non-Functional Requirements

  • Video playback start time: < 2 seconds
  • Support 2 billion monthly active users
  • 500 hours of video uploaded per minute
  • 99.99% availability for streaming
  • Global content delivery (low latency worldwide)
  • Support mobile, web, smart TV clients

2. Capacity Estimation

Traffic

Monthly active users:    2 billion
Daily active users:      800 million
Average watch time:      40 minutes/day
Videos uploaded/minute:  500 hours
Videos uploaded/day:     500 * 60 * 24 = 720,000 hours = 43.2M minutes

Concurrent viewers:      ~50 million (peak)
Video plays/second:      800M * 5 plays/day / 86,400 ~= 46,000 plays/sec

Storage

Original upload size:     Average 500 MB per video (10 min avg)
Transcoded versions:      5 resolutions * 3 formats = 15 versions
Expansion factor:         ~3x original size (all versions combined)
Daily upload storage:     720,000 hours * 3 GB/hour * 3 = 6.5 PB/day
Annual storage:           ~2.4 EB (exabytes)

Bandwidth

Average video bitrate:    5 Mbps (1080p)
Concurrent streams:       50 million
Total egress bandwidth:   50M * 5 Mbps = 250 Tbps (served by CDN)
Upload bandwidth:         500 hours/min * 3 GB/hour / 60 = 25 GB/sec
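The estimates above can be reproduced with a short back-of-envelope script. The constants are just the averages assumed in this section, not measured values:

```python
# Back-of-envelope capacity estimation for the numbers above.
HOURS_UPLOADED_PER_MIN = 500
GB_PER_HOUR = 3                  # ~500 MB per 10-minute video
EXPANSION_FACTOR = 3             # all transcoded versions combined
DAU = 800_000_000
PLAYS_PER_USER_PER_DAY = 5
CONCURRENT_STREAMS = 50_000_000
AVG_BITRATE_MBPS = 5             # 1080p average

hours_per_day = HOURS_UPLOADED_PER_MIN * 60 * 24                         # 720,000
daily_storage_pb = hours_per_day * GB_PER_HOUR * EXPANSION_FACTOR / 1e6  # ~6.5 PB
plays_per_sec = DAU * PLAYS_PER_USER_PER_DAY / 86_400                    # ~46,000
egress_tbps = CONCURRENT_STREAMS * AVG_BITRATE_MBPS / 1e6                # 250 Tbps
ingest_gb_per_sec = HOURS_UPLOADED_PER_MIN * GB_PER_HOUR / 60            # 25 GB/s

print(f"{hours_per_day:,} hours/day, {daily_storage_pb:.1f} PB/day, "
      f"{plays_per_sec:,.0f} plays/s, {egress_tbps:.0f} Tbps egress")
```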

3. High-Level Architecture

+----------+     +-------------------+     +------------------+
|  Client  |---->|   API Gateway     |---->| Video Service    |
|          |     |   + Load Balancer |     | (Metadata CRUD)  |
+----+-----+     +-------------------+     +--------+---------+
     |                    |                         |
     |                    |                 +-------v---------+
     |                    |                 | Metadata Store  |
     |                    |                 | (PostgreSQL)    |
     |                    |                 +-----------------+
     |           +--------v--------+
     |           | Upload Service  |        +----------------+
     |           +--------+--------+        | Search Service |
     |                    |                 | (Elasticsearch)|
     |           +--------v--------+        +----------------+
     |           | Object Storage  |
     |           | (S3 - Original) |        +----------------+
     |           +--------+--------+        | Recommendation |
     |                    |                 | Engine (ML)    |
     |           +--------v--------+        +----------------+
     |           | Transcoding     |
     |           | Pipeline        |
     |           | (Distributed    |
     |           |  Workers)       |
     |           +--------+--------+
     |                    |
     |           +--------v--------+
     |           | Transcoded      |
     |           | Storage (S3)    |
     |           +--------+--------+
     |                    |
     |           +--------v--------+
     +---------->| CDN (CloudFront)|
      Stream     | Edge Servers    |
                 +-----------------+

4. API Design

POST /api/v1/videos/upload-url
  Headers: Authorization: Bearer <token>
  Body: {
    "filename": "vacation.mp4",
    "file_size": 524288000,
    "content_type": "video/mp4"
  }
  Response 200: {
    "upload_id": "upload_abc123",
    "presigned_url": "https://s3.../upload?signature=...",
    "chunk_size": 5242880,
    "total_chunks": 100
  }

PUT /api/v1/videos/upload/{upload_id}/chunk/{chunk_number}
  Body: Binary chunk data
  Response 200: { "chunk_number": 1, "status": "received" }

POST /api/v1/videos/upload/{upload_id}/complete
  Body: {
    "title": "My Vacation Video",
    "description": "Trip to Hawaii",
    "tags": ["travel", "hawaii"],
    "visibility": "public"
  }
  Response 202: {
    "video_id": "vid_789",
    "status": "processing",
    "estimated_time": 300
  }

GET /api/v1/videos/{video_id}
  Response 200: {
    "video_id": "vid_789",
    "title": "My Vacation Video",
    "description": "Trip to Hawaii",
    "author": { "channel_id": "ch_42", "name": "TravelJane" },
    "duration": 600,
    "view_count": 152300,
    "likes": 8700,
    "streams": {
      "dash": "https://cdn.example.com/vid_789/manifest.mpd",
      "hls": "https://cdn.example.com/vid_789/master.m3u8"
    },
    "thumbnails": { "default": "...", "medium": "...", "high": "..." },
    "created_at": "2026-04-11T10:00:00Z"
  }

GET /api/v1/feed/recommended?cursor={cursor}&limit=20
  Response 200: { "videos": [...], "next_cursor": "..." }

GET /api/v1/search?q=hawaii+travel&cursor={cursor}
  Response 200: { "results": [...], "next_cursor": "..." }

POST /api/v1/videos/{video_id}/like
POST /api/v1/videos/{video_id}/comments
  Body: { "text": "Great video!" }

5. Database Schema

Video Metadata (PostgreSQL)

CREATE TABLE videos (
    video_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    channel_id      UUID NOT NULL REFERENCES channels(channel_id),
    title           VARCHAR(500) NOT NULL,
    description     TEXT,
    duration_sec    INTEGER,
    visibility      VARCHAR(20) DEFAULT 'public',
    status          VARCHAR(20) DEFAULT 'processing',
    original_url    VARCHAR(2048),
    thumbnail_url   VARCHAR(2048),
    view_count      BIGINT DEFAULT 0,
    like_count      BIGINT DEFAULT 0,
    dislike_count   BIGINT DEFAULT 0,
    comment_count   BIGINT DEFAULT 0,
    tags            TEXT[],
    language        VARCHAR(10),
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    published_at    TIMESTAMP
);

CREATE INDEX idx_videos_channel ON videos(channel_id, created_at DESC);
CREATE INDEX idx_videos_status ON videos(status);

Transcoding Jobs (PostgreSQL)

CREATE TABLE transcoding_jobs (
    job_id          UUID PRIMARY KEY,
    video_id        UUID REFERENCES videos(video_id),
    resolution      VARCHAR(20),    -- '360p', '720p', '1080p', '4k'
    codec           VARCHAR(20),    -- 'h264', 'h265', 'vp9', 'av1'
    status          VARCHAR(20),    -- 'queued', 'processing', 'completed', 'failed'
    output_url      VARCHAR(2048),
    bitrate         INTEGER,
    file_size       BIGINT,
    started_at      TIMESTAMP,
    completed_at    TIMESTAMP,
    error_message   TEXT
);

View Events (ClickHouse -- analytics)

CREATE TABLE view_events (
    event_id       UUID,
    video_id       UUID,
    user_id        UUID,
    watch_duration INTEGER,
    total_duration INTEGER,
    quality        String,
    device_type    String,
    country        String,
    timestamp      DateTime
) ENGINE = MergeTree()
ORDER BY (video_id, timestamp);

6. Deep Dive: Video Upload and Transcoding Pipeline

Upload Flow (Resumable, Chunked)

Client                    Upload Service                  S3
  |                              |                         |
  |-- Request upload URL ------->|                         |
  |<-- presigned URLs -----------|                         |
  |                              |                         |
  |-- Upload chunk 1 ----------->|---- S3 multipart ------>|
  |<-- chunk 1 ack --------------|                         |
  |                              |                         |
  |-- Upload chunk 2 ----------->|---- S3 multipart ------>|
  |<-- chunk 2 ack --------------|                         |
  |                              |                         |
  |   (network failure)          |                         |
  |                              |                         |
  |-- Resume: chunk 3 ---------->|---- S3 multipart ------>|
  |<-- chunk 3 ack --------------|                         |
  |                              |                         |
  |-- Complete upload ---------->|-- Complete multipart -->|
  |<-- video_id, processing -----|                         |
  |                              |                         |
  |                              |-- Publish transcode     |
  |                              |   event to Kafka        |
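The client-side resume logic can be sketched as follows. This is a minimal illustration, not a production client: `put_chunk` stands in for a PUT against the presigned URL, and the `completed` set stands in for a hypothetical chunk-status endpoint the client would query on reconnect.

```python
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, matching chunk_size in the upload-url response

def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (chunk_number, chunk_bytes) pairs, 1-indexed like the API."""
    for offset in range(0, len(data), chunk_size):
        yield offset // chunk_size + 1, data[offset:offset + chunk_size]

def resumable_upload(data: bytes, put_chunk, completed: set) -> set:
    """Upload only the chunks the server has not acknowledged yet.

    put_chunk(n, chunk) is the transport (e.g. a PUT to a presigned URL);
    `completed` holds chunk numbers already stored server-side, so a retry
    after a network failure re-sends nothing that already got through.
    """
    for number, chunk in split_chunks(data):
        if number in completed:
            continue                  # survived the earlier attempt
        put_chunk(number, chunk)
        completed.add(number)
    return completed
```

A 12 MB file splits into two 5 MB chunks and one 2 MB chunk; resuming with `completed={1}` re-sends only chunks 2 and 3.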

Transcoding Pipeline

                     +-------------------+
                     | Kafka: transcode  |
                     | job queue         |
                     +--------+----------+
                              |
              +---------------+---------------+
              |               |               |
     +--------v------+ +-----v--------+ +----v---------+
     | Worker Pool 1 | | Worker Pool 2| | Worker Pool 3|
     | (360p + 480p) | | (720p+1080p) | | (1440p + 4K) |
     | (CPU-based)   | | (GPU-based)  | | (GPU-based)  |
     +--------+------+ +-----+--------+ +----+---------+
              |               |               |
              +-------+-------+-------+-------+
                      |               |
              +-------v-------+ +-----v---------+
              | Transcoded    | | Thumbnail     |
              | Storage (S3)  | | Generator     |
              +---------------+ +---------------+
                      |
              +-------v--------+
              | Manifest       |
              | Generator      |
              | (DASH + HLS)   |
              +-------+--------+
                      |
              +-------v--------+
              | CDN Origin     |
              | Push           |
              +----------------+

Transcoding Configuration

Resolution   Bitrate (video)   Codec    Target Device
---------------------------------------------------------
240p         400 Kbps          H.264    Low-end mobile
360p         800 Kbps          H.264    Mobile
480p         1.5 Mbps          H.264    SD screens
720p         3 Mbps            H.264    HD screens
1080p        6 Mbps            H.265    Full HD
1440p        12 Mbps           VP9      High-end
2160p (4K)   25 Mbps           VP9/AV1  4K displays

Each rendition is also segmented into 4-6 second chunks
for adaptive streaming.
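On upload completion, the pipeline fans each video out into one job per rendition. A sketch of that fan-out, using the ladder from the table above (the never-upscale rule is standard practice; the function itself and its field names, which mirror the transcoding_jobs schema, are illustrative):

```python
import uuid

# Rendition ladder from the table above: (resolution, video bitrate kbps, codec)
LADDER = [
    ("240p",   400,   "h264"),
    ("360p",   800,   "h264"),
    ("480p",   1500,  "h264"),
    ("720p",   3000,  "h264"),
    ("1080p",  6000,  "h265"),
    ("1440p",  12000, "vp9"),
    ("2160p",  25000, "av1"),
]

def fan_out_jobs(video_id: str, source_height: int) -> list:
    """Create one queued transcoding job per rendition at or below the
    source resolution -- the pipeline never upscales."""
    jobs = []
    for resolution, bitrate_kbps, codec in LADDER:
        if int(resolution.rstrip("p")) > source_height:
            continue  # skip renditions above the source quality
        jobs.append({
            "job_id": str(uuid.uuid4()),
            "video_id": video_id,
            "resolution": resolution,
            "codec": codec,
            "bitrate": bitrate_kbps,
            "status": "queued",
        })
    return jobs
```

Each job dict would be published to the Kafka transcode queue and consumed by the appropriate worker pool.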

7. Deep Dive: Adaptive Bitrate Streaming

How ABR Works

Client                          CDN Edge Server
  |                                  |
  |-- Request master.m3u8 --------->|
  |<-- Manifest (all qualities) ----|
  |                                  |
  |  Manifest contains:              |
  |  #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
  |  360p/playlist.m3u8
  |  #EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
  |  720p/playlist.m3u8
  |  #EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
  |  1080p/playlist.m3u8
  |                                  |
  |-- Start with 360p segment 0 --->|  (safe start)
  |<-- Segment data ----------------|
  |                                  |
  |  (bandwidth measurement: fast!)  |
  |                                  |
  |-- Switch to 720p segment 1 ---->|  (upgrade quality)
  |<-- Segment data ----------------|
  |                                  |
  |  (bandwidth drops)               |
  |                                  |
  |-- Drop to 480p segment 2 ------>|  (downgrade quality)
  |<-- Segment data ----------------|
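The switching decision above boils down to picking the highest rendition the measured throughput can sustain. A simplified selector sketch (real players such as hls.js combine throughput with buffer occupancy and switch hysteresis; the 0.8 safety factor is an assumption):

```python
# Quality ladder as (name, required bandwidth in bits/sec), per the manifest.
VARIANTS = [
    ("360p",  800_000),
    ("480p",  1_500_000),
    ("720p",  3_000_000),
    ("1080p", 6_000_000),
]

def pick_variant(measured_bps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition whose declared bandwidth fits within a
    safety margin of the measured throughput; the lowest rendition is the
    fallback when nothing fits (matching the 'safe start' above)."""
    usable = measured_bps * safety
    best = VARIANTS[0][0]
    for name, required_bps in VARIANTS:
        if required_bps <= usable:
            best = name
    return best
```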

HLS Manifest Structure

#EXTM3U
#EXT-X-VERSION:3

#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
https://cdn.example.com/vid_789/360p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
https://cdn.example.com/vid_789/480p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
https://cdn.example.com/vid_789/720p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
https://cdn.example.com/vid_789/1080p/playlist.m3u8

Segment Playlist (720p)

#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0

#EXTINF:6.0,
https://cdn.example.com/vid_789/720p/segment_000.ts
#EXTINF:6.0,
https://cdn.example.com/vid_789/720p/segment_001.ts
#EXTINF:6.0,
https://cdn.example.com/vid_789/720p/segment_002.ts
...
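The manifest-generator step in the pipeline can emit the master playlist above mechanically from the finished renditions. A sketch under the URL layout shown in the examples (not a full HLS implementation):

```python
def master_playlist(base_url: str, variants: list) -> str:
    """Render an HLS master playlist from (bandwidth_bps, "WxH") pairs,
    matching the layout of the manifest shown above."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for bandwidth, resolution in variants:
        height = resolution.split("x")[1]  # e.g. "640x360" -> "360"
        lines.append("")
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{base_url}/{height}p/playlist.m3u8")
    return "\n".join(lines) + "\n"
```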

8. Deep Dive: CDN Architecture

                    +------------------+
                    |   Origin Server  |
                    |   (S3 bucket)    |
                    +--------+---------+
                             |
           +-----------------+-----------------+
           |                 |                 |
   +-------v-------+ +-------v-------+ +-------v-------+
   | Regional PoP  | | Regional PoP  | | Regional PoP  |
   | US-East       | | EU-West       | | AP-South      |
   +-------+-------+ +-------+-------+ +-------+-------+
           |                 |                 |
       +---+---+         +---+---+         +---+---+
       |       |         |       |         |       |
    +--v--+ +--v--+   +--v--+ +--v--+   +--v--+ +--v--+
    |Edge | |Edge |   |Edge | |Edge |   |Edge | |Edge |
    |NYC  | |DC   |   |LON  | |FRA  |   |MUM  | |SIN  |
    +-----+ +-----+   +-----+ +-----+   +-----+ +-----+

Cache hierarchy:
  Edge (L1) -> Regional PoP (L2) -> Origin (S3)

Cache hit rates:
  Popular videos:  > 99% at edge
  Long tail:       ~60% at edge, ~90% at regional

CDN Cache Strategy

Popular videos (top 10%):
  - Pre-pushed to all edge servers
  - TTL: 30 days
  - Pinned in cache (never evicted)

Regular videos:
  - Cached on first request (pull-through)
  - TTL: 7 days
  - LRU eviction

Long-tail videos (bottom 50%):
  - Served from regional PoP or origin
  - Not cached at edge (too many, too infrequent)
  - TTL: 1 day at regional
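The pull-through tier for regular videos behaves like an LRU cache with a TTL in front of the next tier. A toy sketch of that behavior (capacity and TTL values are placeholders; `fetch_upstream` stands in for a request to the regional PoP or origin):

```python
import time
from collections import OrderedDict

class PullThroughCache:
    """Tiny LRU cache with TTL: on a miss or expired entry, fetch from
    the next tier (regional PoP or origin) and cache the result."""

    def __init__(self, fetch_upstream, capacity: int, ttl_sec: float):
        self.fetch_upstream = fetch_upstream
        self.capacity = capacity
        self.ttl_sec = ttl_sec
        self.entries = OrderedDict()        # key -> (stored_at, value)

    def get(self, key: str) -> bytes:
        entry = self.entries.get(key)
        if entry and time.monotonic() - entry[0] < self.ttl_sec:
            self.entries.move_to_end(key)   # refresh LRU position
            return entry[1]
        value = self.fetch_upstream(key)    # miss or expired: go upstream
        self.entries[key] = (time.monotonic(), value)
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value
```

Pinning the top 10% of videos would amount to exempting those keys from the eviction step.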

9. Recommendation Engine (Simplified)

+------------------+     +-------------------+     +------------------+
| User Activity    |     | Video Features    |     | Collaborative    |
| (watch history,  |---->| (tags, category,  |---->| Filtering        |
|  likes, searches)|     |  embeddings)      |     | (similar users)  |
+------------------+     +-------------------+     +--------+---------+
                                                            |
                                                   +--------v--------+
                                                   | Candidate       |
                                                   | Generation      |
                                                   | (~1000 videos)  |
                                                   +--------+--------+
                                                            |
                                                   +--------v--------+
                                                   | Ranking Model   |
                                                   | (predict watch  |
                                                   |  probability)   |
                                                   +--------+--------+
                                                            |
                                                   +--------v--------+
                                                   | Filtering       |
                                                   | (watched, NSFW, |
                                                   |  duplicates)    |
                                                   +--------+--------+
                                                            |
                                                   +--------v--------+
                                                   | Top 20 results  |
                                                   +-----------------+

Two-Stage Architecture

Stage 1: Candidate Generation (fast, broad)
  - Collaborative filtering: users who watched X also watched Y
  - Content-based: videos with similar tags/embeddings
  - Trending: popular videos in user's region
  - Pool: ~1,000 candidate videos

Stage 2: Ranking (slow, precise)
  - Deep neural network predicts P(click), P(watch>50%), P(like)
  - Features: user history, video metadata, context (time, device)
  - Score = weighted combination of predicted engagement metrics
  - Select top 20, ensure diversity (no 20 cat videos)
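A toy version of the two stages (a real system uses learned embeddings and a neural ranker; here simple co-occurrence stands in for collaborative filtering, and `score_fn` stands in for the ranking model):

```python
from collections import Counter

def candidates_cf(user_history: set, all_histories: list, k: int = 1000) -> list:
    """Stage 1 (collaborative filtering): videos co-watched by users who
    share at least one watched video with this user, minus videos the
    user has already seen."""
    counts = Counter()
    for other in all_histories:
        if user_history & other:
            counts.update(other - user_history)
    return [video for video, _ in counts.most_common(k)]

def rank(candidates: list, score_fn, top_n: int = 20) -> list:
    """Stage 2: score each candidate (a real system predicts watch
    probability with a learned model) and keep the top N."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_n]
```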

10. Scaling Considerations

Video Processing at Scale

500 hours uploaded per minute = 30,000 minutes of video per minute
At a 10-minute average video length: ~3,000 videos/minute
Each video -> ~16 transcoding jobs (resolution/codec combinations)
Total: 3,000 * 16 = 48,000 transcoding jobs/minute

Worker fleet:
  - GPU instances for HD/4K transcoding
  - CPU instances for lower resolutions
  - Auto-scaling based on queue depth
  - Spot/preemptible instances for cost savings (with retry logic)

Storage Optimization

1. Tiered storage:
   Hot (< 30 days):  S3 Standard
   Warm (30-90 days): S3 Infrequent Access
   Cold (> 90 days):  S3 Glacier (for rarely watched videos)

2. Deduplication:
   - Hash uploaded content
   - If duplicate exists, create reference instead of new copy
   - Saves 10-15% storage

3. Codec efficiency:
   - AV1 codec: roughly 30% smaller files than VP9, and on the order
     of 50% smaller than H.264, at the same quality
   - Gradually re-encode popular content to newer codecs
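The deduplication step (item 2 above) hinges on keying storage by content hash. A minimal sketch (in practice hashing happens chunk-wise during upload, and the video-to-blob mapping lives in the metadata store):

```python
import hashlib

class DedupStore:
    """Store uploads keyed by content hash; duplicate uploads become
    references to the existing blob instead of new copies."""

    def __init__(self):
        self.blobs = {}   # content_hash -> bytes
        self.refs = {}    # video_id -> content_hash

    def put(self, video_id: str, data: bytes) -> bool:
        """Return True if a new blob was stored, False if deduplicated."""
        digest = hashlib.sha256(data).hexdigest()
        self.refs[video_id] = digest
        if digest in self.blobs:
            return False              # duplicate: reference only
        self.blobs[digest] = data
        return True
```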

View Count Scaling

Problem: Millions of views per second for viral videos.
Solution: Approximate counting with periodic flush.

1. Client sends view event to Kafka
2. Stream processor aggregates per-video counts
3. Flush to database every 30 seconds
4. Cache approximate count in Redis

Real-time display: Redis counter (approximate)
Accurate analytics: ClickHouse (batch-updated hourly)
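The approximate-counting path can be sketched as an in-memory aggregator with a periodic flush. In the design above the aggregation runs in a stream processor over Kafka; here `flush_to_db` stands in for the batched database write:

```python
from collections import Counter

class ViewCounter:
    """Buffer view events in memory and flush aggregated deltas to the
    database periodically (every 30 seconds in the design above)."""

    def __init__(self, flush_to_db):
        self.flush_to_db = flush_to_db  # e.g. batched UPDATE ... view_count + delta
        self.pending = Counter()

    def record_view(self, video_id: str):
        self.pending[video_id] += 1     # O(1) in memory; no DB write per view

    def flush(self):
        if self.pending:
            self.flush_to_db(dict(self.pending))
            self.pending.clear()
```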

11. Key Tradeoffs

Decision             Option A               Option B               Our Choice
------------------------------------------------------------------------------
Upload method        Single upload          Chunked/resumable      Chunked
Streaming protocol   Progressive download   ABR (HLS/DASH)         HLS + DASH
Transcoding timing   On upload (eager)      On first play (lazy)   Eager
CDN caching          Push all to edge       Pull-through cache     Hybrid
Recommendation       Collaborative only     Hybrid (CF + content)  Hybrid
View counting        Exact (DB write/view)  Approximate (batched)  Approximate
Storage tiering      All hot storage        Hot/warm/cold tiers    Tiered

12. Failure Scenarios and Mitigations

Scenario                          Mitigation
------------------------------------------------------------------------
Transcoding worker failure        Retry from Kafka; idempotent jobs
Upload interrupted                Resumable upload; chunks already stored
CDN edge failure                  DNS failover to next nearest edge
Origin S3 outage                  Cross-region replication
Viral video spike                 CDN absorbs load; pre-warm popular content
Recommendation cold start         Show trending/popular for new users
Copyrighted content uploaded      Content ID fingerprinting during transcode

Key Takeaways

  1. Chunked resumable uploads are essential for large video files -- network interruptions are common, especially on mobile.
  2. Adaptive bitrate streaming (HLS/DASH) is the industry standard -- it adapts quality to network conditions to minimize rebuffering.
  3. CDN is the critical scaling layer -- without it, origin servers cannot handle the egress bandwidth of billions of streams.
  4. Transcoding is compute-intensive -- GPU-based workers with auto-scaling and spot instances keep costs manageable.
  5. Storage costs dominate at scale -- tiered storage and efficient codecs (AV1) provide significant savings over time.