Episode 9 — System Design / 9.11 — Real World System Design Problems

9.11.e Design a Video Streaming Platform (YouTube / Netflix)

Problem Statement

Design a video streaming platform that allows users to upload, transcode, store, and stream videos. The system must support adaptive bitrate streaming, a recommendation engine, and serve content globally via CDN.


1. Requirements

Functional Requirements

  • Users upload videos (up to 10 GB)
  • System transcodes videos into multiple resolutions and formats
  • Users stream videos with adaptive bitrate
  • Search videos by title, tags, description
  • Recommendation engine suggests videos
  • Like, comment, subscribe functionality
  • View counts and analytics dashboard for creators

Non-Functional Requirements

  • Video playback start time: < 2 seconds
  • Support 2 billion monthly active users
  • 500 hours of video uploaded per minute
  • 99.99% availability for streaming
  • Global content delivery (low latency worldwide)
  • Support mobile, web, smart TV clients

2. Capacity Estimation

Traffic

Monthly active users:    2 billion
Daily active users:      800 million
Average watch time:      40 minutes/day
Videos uploaded/minute:  500 hours
Videos uploaded/day:     500 * 60 * 24 = 720,000 hours = 43.2M minutes

Concurrent viewers:      ~50 million (peak)
Video plays/second:      800M * 5 plays/day / 86,400 ~= 46,000 plays/sec

Storage

Original upload size:     Average 500 MB per video (10 min avg)
Transcoded versions:      5 resolutions * 3 formats = 15 versions
Expansion factor:         ~3x original size (all versions combined)
Daily upload storage:     720,000 hours * 3 GB/hour * 3 = 6.5 PB/day
Annual storage:           ~2.4 EB (exabytes)

Bandwidth

Average video bitrate:    5 Mbps (1080p)
Concurrent streams:       50 million
Total egress bandwidth:   50M * 5 Mbps = 250 Tbps (served by CDN)
Upload bandwidth:         500 hours/min * 3 GB/hour / 60 = 25 GB/sec
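The estimates above can be reproduced with a short back-of-envelope script. The constants are just the averages assumed in this section, not measured values:

```python
# Back-of-envelope capacity estimation for the numbers above.
HOURS_UPLOADED_PER_MIN = 500
GB_PER_HOUR = 3                  # ~500 MB per 10-minute video
EXPANSION_FACTOR = 3             # all transcoded versions combined
DAU = 800_000_000
PLAYS_PER_USER_PER_DAY = 5
CONCURRENT_STREAMS = 50_000_000
AVG_BITRATE_MBPS = 5             # 1080p average

hours_per_day = HOURS_UPLOADED_PER_MIN * 60 * 24                         # 720,000
daily_storage_pb = hours_per_day * GB_PER_HOUR * EXPANSION_FACTOR / 1e6  # ~6.5 PB
plays_per_sec = DAU * PLAYS_PER_USER_PER_DAY / 86_400                    # ~46,000
egress_tbps = CONCURRENT_STREAMS * AVG_BITRATE_MBPS / 1e6                # 250 Tbps
ingest_gb_per_sec = HOURS_UPLOADED_PER_MIN * GB_PER_HOUR / 60            # 25 GB/s

print(f"{hours_per_day:,} hours/day, {daily_storage_pb:.1f} PB/day, "
      f"{plays_per_sec:,.0f} plays/s, {egress_tbps:.0f} Tbps egress")
```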

3. High-Level Architecture

+----------+     +-------------------+     +------------------+
|  Client  |---->|   API Gateway     |---->| Video Service    |
|          |     |   + Load Balancer |     | (Metadata CRUD)  |
+----+-----+     +-------------------+     +--------+---------+
     |                    |                         |
     |                    |                 +-------v---------+
     |                    |                 | Metadata Store  |
     |                    |                 | (PostgreSQL)    |
     |                    |                 +-----------------+
     |           +--------v--------+
     |           | Upload Service  |        +----------------+
     |           +--------+--------+        | Search Service |
     |                    |                 | (Elasticsearch)|
     |           +--------v--------+        +----------------+
     |           | Object Storage  |
     |           | (S3 - Original) |        +----------------+
     |           +--------+--------+        | Recommendation |
     |                    |                 | Engine (ML)    |
     |           +--------v--------+        +----------------+
     |           | Transcoding     |
     |           | Pipeline        |
     |           | (Distributed    |
     |           |  Workers)       |
     |           +--------+--------+
     |                    |
     |           +--------v--------+
     |           | Transcoded      |
     |           | Storage (S3)    |
     |           +--------+--------+
     |                    |
     |           +--------v--------+
     +---------->| CDN (CloudFront)|
      Stream     | Edge Servers    |
                 +-----------------+

4. API Design

POST /api/v1/videos/upload-url
  Headers: Authorization: Bearer <token>
  Body: {
    "filename": "vacation.mp4",
    "file_size": 524288000,
    "content_type": "video/mp4"
  }
  Response 200: {
    "upload_id": "upload_abc123",
    "presigned_url": "https://s3.../upload?signature=...",
    "chunk_size": 5242880,
    "total_chunks": 100
  }

PUT /api/v1/videos/upload/{upload_id}/chunk/{chunk_number}
  Body: Binary chunk data
  Response 200: { "chunk_number": 1, "status": "received" }

POST /api/v1/videos/upload/{upload_id}/complete
  Body: {
    "title": "My Vacation Video",
    "description": "Trip to Hawaii",
    "tags": ["travel", "hawaii"],
    "visibility": "public"
  }
  Response 202: {
    "video_id": "vid_789",
    "status": "processing",
    "estimated_time": 300
  }

GET /api/v1/videos/{video_id}
  Response 200: {
    "video_id": "vid_789",
    "title": "My Vacation Video",
    "description": "Trip to Hawaii",
    "author": { "channel_id": "ch_42", "name": "TravelJane" },
    "duration": 600,
    "view_count": 152300,
    "likes": 8700,
    "streams": {
      "dash": "https://cdn.example.com/vid_789/manifest.mpd",
      "hls": "https://cdn.example.com/vid_789/master.m3u8"
    },
    "thumbnails": { "default": "...", "medium": "...", "high": "..." },
    "created_at": "2026-04-11T10:00:00Z"
  }

GET /api/v1/feed/recommended?cursor={cursor}&limit=20
  Response 200: { "videos": [...], "next_cursor": "..." }

GET /api/v1/search?q=hawaii+travel&cursor={cursor}
  Response 200: { "results": [...], "next_cursor": "..." }

POST /api/v1/videos/{video_id}/like
POST /api/v1/videos/{video_id}/comments
  Body: { "text": "Great video!" }

5. Database Schema

Video Metadata (PostgreSQL)

CREATE TABLE videos (
    video_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    channel_id      UUID NOT NULL REFERENCES channels(channel_id),
    title           VARCHAR(500) NOT NULL,
    description     TEXT,
    duration_sec    INTEGER,
    visibility      VARCHAR(20) DEFAULT 'public',
    status          VARCHAR(20) DEFAULT 'processing',
    original_url    VARCHAR(2048),
    thumbnail_url   VARCHAR(2048),
    view_count      BIGINT DEFAULT 0,
    like_count      BIGINT DEFAULT 0,
    dislike_count   BIGINT DEFAULT 0,
    comment_count   BIGINT DEFAULT 0,
    tags            TEXT[],
    language        VARCHAR(10),
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    published_at    TIMESTAMP
);

CREATE INDEX idx_videos_channel ON videos(channel_id, created_at DESC);
CREATE INDEX idx_videos_status ON videos(status);

Transcoding Jobs (PostgreSQL)

CREATE TABLE transcoding_jobs (
    job_id          UUID PRIMARY KEY,
    video_id        UUID REFERENCES videos(video_id),
    resolution      VARCHAR(20),    -- '360p', '720p', '1080p', '4k'
    codec           VARCHAR(20),    -- 'h264', 'h265', 'vp9', 'av1'
    status          VARCHAR(20),    -- 'queued', 'processing', 'completed', 'failed'
    output_url      VARCHAR(2048),
    bitrate         INTEGER,
    file_size       BIGINT,
    started_at      TIMESTAMP,
    completed_at    TIMESTAMP,
    error_message   TEXT
);

View Events (ClickHouse -- analytics)

CREATE TABLE view_events (
    event_id       UUID,
    video_id       UUID,
    user_id        UUID,
    watch_duration INTEGER,
    total_duration INTEGER,
    quality        String,
    device_type    String,
    country        String,
    timestamp      DateTime
) ENGINE = MergeTree()
ORDER BY (video_id, timestamp);

6. Deep Dive: Video Upload and Transcoding Pipeline

Upload Flow (Resumable, Chunked)

Client                    Upload Service                  S3
  |                              |                         |
  |-- Request upload URL ------->|                         |
  |<-- presigned URLs -----------|                         |
  |                              |                         |
  |-- Upload chunk 1 ----------->|---- S3 multipart ------>|
  |<-- chunk 1 ack --------------|                         |
  |                              |                         |
  |-- Upload chunk 2 ----------->|---- S3 multipart ------>|
  |<-- chunk 2 ack --------------|                         |
  |                              |                         |
  |   (network failure)          |                         |
  |                              |                         |
  |-- Resume: chunk 3 ---------->|---- S3 multipart ------>|
  |<-- chunk 3 ack --------------|                         |
  |                              |                         |
  |-- Complete upload ---------->|-- Complete multipart -->|
  |<-- video_id, processing -----|                         |
  |                              |                         |
  |                              |-- Publish transcode     |
  |                              |   event to Kafka        |
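The client-side resume logic can be sketched as follows. This is a minimal illustration, not a production client: `put_chunk` stands in for a PUT against the presigned URL, and the `completed` set stands in for a hypothetical chunk-status endpoint the client would query on reconnect.

```python
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, matching chunk_size in the upload-url response

def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (chunk_number, chunk_bytes) pairs, 1-indexed like the API."""
    for offset in range(0, len(data), chunk_size):
        yield offset // chunk_size + 1, data[offset:offset + chunk_size]

def resumable_upload(data: bytes, put_chunk, completed: set) -> set:
    """Upload only the chunks the server has not acknowledged yet.

    put_chunk(n, chunk) is the transport (e.g. a PUT to a presigned URL);
    `completed` holds chunk numbers already stored server-side, so a retry
    after a network failure re-sends nothing that already got through.
    """
    for number, chunk in split_chunks(data):
        if number in completed:
            continue                  # survived the earlier attempt
        put_chunk(number, chunk)
        completed.add(number)
    return completed
```

A 12 MB file splits into two 5 MB chunks and one 2 MB chunk; resuming with `completed={1}` re-sends only chunks 2 and 3.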

Transcoding Pipeline

                     +-------------------+
                     | Kafka: transcode  |
                     | job queue         |
                     +--------+----------+
                              |
              +---------------+---------------+
              |               |               |
     +--------v------+ +-----v--------+ +----v---------+
     | Worker Pool 1 | | Worker Pool 2| | Worker Pool 3|
     | (360p + 480p) | | (720p+1080p) | | (1440p + 4K) |
     | (CPU-based)   | | (GPU-based)  | | (GPU-based)  |
     +--------+------+ +-----+--------+ +----+---------+
              |               |               |
              +-------+-------+-------+-------+
                      |               |
              +-------v-------+ +-----v---------+
              | Transcoded    | | Thumbnail     |
              | Storage (S3)  | | Generator     |
              +---------------+ +---------------+
                      |
              +-------v--------+
              | Manifest       |
              | Generator      |
              | (DASH + HLS)   |
              +-------+--------+
                      |
              +-------v--------+
              | CDN Origin     |
              | Push           |
              +----------------+

Transcoding Configuration

Resolution   Bitrate (video)   Codec    Target Device
---------------------------------------------------------
240p         400 Kbps          H.264    Low-end mobile
360p         800 Kbps          H.264    Mobile
480p         1.5 Mbps          H.264    SD screens
720p         3 Mbps            H.264    HD screens
1080p        6 Mbps            H.265    Full HD
1440p        12 Mbps           VP9      High-end
2160p (4K)   25 Mbps           VP9/AV1  4K displays

Each rendition is also segmented into 4-6 second chunks
for adaptive streaming.
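On upload completion, the pipeline fans each video out into one job per rendition. A sketch of that fan-out, using the ladder from the table above (the never-upscale rule is standard practice; the function itself and its field names, which mirror the transcoding_jobs schema, are illustrative):

```python
import uuid

# Rendition ladder from the table above: (resolution, video bitrate kbps, codec)
LADDER = [
    ("240p",   400,   "h264"),
    ("360p",   800,   "h264"),
    ("480p",   1500,  "h264"),
    ("720p",   3000,  "h264"),
    ("1080p",  6000,  "h265"),
    ("1440p",  12000, "vp9"),
    ("2160p",  25000, "av1"),
]

def fan_out_jobs(video_id: str, source_height: int) -> list:
    """Create one queued transcoding job per rendition at or below the
    source resolution -- the pipeline never upscales."""
    jobs = []
    for resolution, bitrate_kbps, codec in LADDER:
        if int(resolution.rstrip("p")) > source_height:
            continue  # skip renditions above the source quality
        jobs.append({
            "job_id": str(uuid.uuid4()),
            "video_id": video_id,
            "resolution": resolution,
            "codec": codec,
            "bitrate": bitrate_kbps,
            "status": "queued",
        })
    return jobs
```

Each job dict would be published to the Kafka transcode queue and consumed by the appropriate worker pool.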

7. Deep Dive: Adaptive Bitrate Streaming

How ABR Works

Client                          CDN Edge Server
  |                                  |
  |-- Request master.m3u8 --------->|
  |<-- Manifest (all qualities) ----|
  |                                  |
  |  Manifest contains:              |
  |  #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
  |  360p/playlist.m3u8
  |  #EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
  |  720p/playlist.m3u8
  |  #EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
  |  1080p/playlist.m3u8
  |                                  |
  |-- Start with 360p segment 0 --->|  (safe start)
  |<-- Segment data ----------------|
  |                                  |
  |  (bandwidth measurement: fast!)  |
  |                                  |
  |-- Switch to 720p segment 1 ---->|  (upgrade quality)
  |<-- Segment data ----------------|
  |                                  |
  |  (bandwidth drops)               |
  |                                  |
  |-- Drop to 480p segment 2 ------>|  (downgrade quality)
  |<-- Segment data ----------------|
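The switching decision above boils down to picking the highest rendition the measured throughput can sustain. A simplified selector sketch (real players such as hls.js combine throughput with buffer occupancy and switch hysteresis; the 0.8 safety factor is an assumption):

```python
# Quality ladder as (name, required bandwidth in bits/sec), per the manifest.
VARIANTS = [
    ("360p",  800_000),
    ("480p",  1_500_000),
    ("720p",  3_000_000),
    ("1080p", 6_000_000),
]

def pick_variant(measured_bps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition whose declared bandwidth fits within a
    safety margin of the measured throughput; the lowest rendition is the
    fallback when nothing fits (matching the 'safe start' above)."""
    usable = measured_bps * safety
    best = VARIANTS[0][0]
    for name, required_bps in VARIANTS:
        if required_bps <= usable:
            best = name
    return best
```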

HLS Manifest Structure

#EXTM3U
#EXT-X-VERSION:3

#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
https://cdn.example.com/vid_789/360p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
https://cdn.example.com/vid_789/480p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
https://cdn.example.com/vid_789/720p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
https://cdn.example.com/vid_789/1080p/playlist.m3u8

Segment Playlist (720p)

#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0

#EXTINF:6.0,
https://cdn.example.com/vid_789/720p/segment_000.ts
#EXTINF:6.0,
https://cdn.example.com/vid_789/720p/segment_001.ts
#EXTINF:6.0,
https://cdn.example.com/vid_789/720p/segment_002.ts
...
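The manifest-generator step in the pipeline can emit the master playlist above mechanically from the finished renditions. A sketch under the URL layout shown in the examples (not a full HLS implementation):

```python
def master_playlist(base_url: str, variants: list) -> str:
    """Render an HLS master playlist from (bandwidth_bps, "WxH") pairs,
    matching the layout of the manifest shown above."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for bandwidth, resolution in variants:
        height = resolution.split("x")[1]  # e.g. "640x360" -> "360"
        lines.append("")
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{base_url}/{height}p/playlist.m3u8")
    return "\n".join(lines) + "\n"
```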

8. Deep Dive: CDN Architecture

                    +------------------+
                    |   Origin Server  |
                    |   (S3 bucket)    |
                    +--------+---------+
                             |
           +-----------------+-----------------+
           |                 |                 |
   +-------v-------+ +-------v-------+ +-------v-------+
   | Regional PoP  | | Regional PoP  | | Regional PoP  |
   | US-East       | | EU-West       | | AP-South      |
   +-------+-------+ +-------+-------+ +-------+-------+
           |                 |                 |
       +---+---+         +---+---+         +---+---+
       |       |         |       |         |       |
    +--v--+ +--v--+   +--v--+ +--v--+   +--v--+ +--v--+
    |Edge | |Edge |   |Edge | |Edge |   |Edge | |Edge |
    |NYC  | |DC   |   |LON  | |FRA  |   |MUM  | |SIN  |
    +-----+ +-----+   +-----+ +-----+   +-----+ +-----+

Cache hierarchy:
  Edge (L1) -> Regional PoP (L2) -> Origin (S3)

Cache hit rates:
  Popular videos:  > 99% at edge
  Long tail:       ~60% at edge, ~90% at regional

CDN Cache Strategy

Popular videos (top 10%):
  - Pre-pushed to all edge servers
  - TTL: 30 days
  - Pinned in cache (never evicted)

Regular videos:
  - Cached on first request (pull-through)
  - TTL: 7 days
  - LRU eviction

Long-tail videos (bottom 50%):
  - Served from regional PoP or origin
  - Not cached at edge (too many, too infrequent)
  - TTL: 1 day at regional
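The pull-through tier for regular videos behaves like an LRU cache with a TTL in front of the next tier. A toy sketch of that behavior (capacity and TTL values are placeholders; `fetch_upstream` stands in for a request to the regional PoP or origin):

```python
import time
from collections import OrderedDict

class PullThroughCache:
    """Tiny LRU cache with TTL: on a miss or expired entry, fetch from
    the next tier (regional PoP or origin) and cache the result."""

    def __init__(self, fetch_upstream, capacity: int, ttl_sec: float):
        self.fetch_upstream = fetch_upstream
        self.capacity = capacity
        self.ttl_sec = ttl_sec
        self.entries = OrderedDict()        # key -> (stored_at, value)

    def get(self, key: str) -> bytes:
        entry = self.entries.get(key)
        if entry and time.monotonic() - entry[0] < self.ttl_sec:
            self.entries.move_to_end(key)   # refresh LRU position
            return entry[1]
        value = self.fetch_upstream(key)    # miss or expired: go upstream
        self.entries[key] = (time.monotonic(), value)
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value
```

Pinning the top 10% of videos would amount to exempting those keys from the eviction step.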

9. Recommendation Engine (Simplified)

+------------------+     +-------------------+     +------------------+
| User Activity    |     | Video Features    |     | Collaborative    |
| (watch history,  |---->| (tags, category,  |---->| Filtering        |
|  likes, searches)|     |  embeddings)      |     | (similar users)  |
+------------------+     +-------------------+     +--------+---------+
                                                            |
                                                   +--------v--------+
                                                   | Candidate       |
                                                   | Generation      |
                                                   | (~1000 videos)  |
                                                   +--------+--------+
                                                            |
                                                   +--------v--------+
                                                   | Ranking Model   |
                                                   | (predict watch  |
                                                   |  probability)   |
                                                   +--------+--------+
                                                            |
                                                   +--------v--------+
                                                   | Filtering       |
                                                   | (watched, NSFW, |
                                                   |  duplicates)    |
                                                   +--------+--------+
                                                            |
                                                   +--------v--------+
                                                   | Top 20 results  |
                                                   +-----------------+

Two-Stage Architecture

Stage 1: Candidate Generation (fast, broad)
  - Collaborative filtering: users who watched X also watched Y
  - Content-based: videos with similar tags/embeddings
  - Trending: popular videos in user's region
  - Pool: ~1,000 candidate videos

Stage 2: Ranking (slow, precise)
  - Deep neural network predicts P(click), P(watch>50%), P(like)
  - Features: user history, video metadata, context (time, device)
  - Score = weighted combination of predicted engagement metrics
  - Select top 20, ensure diversity (no 20 cat videos)
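A toy version of the two stages (a real system uses learned embeddings and a neural ranker; here simple co-occurrence stands in for collaborative filtering, and `score_fn` stands in for the ranking model):

```python
from collections import Counter

def candidates_cf(user_history: set, all_histories: list, k: int = 1000) -> list:
    """Stage 1 (collaborative filtering): videos co-watched by users who
    share at least one watched video with this user, minus videos the
    user has already seen."""
    counts = Counter()
    for other in all_histories:
        if user_history & other:
            counts.update(other - user_history)
    return [video for video, _ in counts.most_common(k)]

def rank(candidates: list, score_fn, top_n: int = 20) -> list:
    """Stage 2: score each candidate (a real system predicts watch
    probability with a learned model) and keep the top N."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_n]
```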

10. Scaling Considerations

Video Processing at Scale

500 hours uploaded per minute = 30,000 minutes of video per minute
At a 10-minute average video length: ~3,000 videos/minute
Each video -> ~16 transcoding jobs (resolution/codec combinations)
Total: 3,000 * 16 = 48,000 transcoding jobs/minute

Worker fleet:
  - GPU instances for HD/4K transcoding
  - CPU instances for lower resolutions
  - Auto-scaling based on queue depth
  - Spot/preemptible instances for cost savings (with retry logic)

Storage Optimization

1. Tiered storage:
   Hot (< 30 days):  S3 Standard
   Warm (30-90 days): S3 Infrequent Access
   Cold (> 90 days):  S3 Glacier (for rarely watched videos)

2. Deduplication:
   - Hash uploaded content
   - If duplicate exists, create reference instead of new copy
   - Saves 10-15% storage

3. Codec efficiency:
   - AV1 codec: roughly 30% smaller files than VP9, and on the order
     of 50% smaller than H.264, at the same quality
   - Gradually re-encode popular content to newer codecs
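The deduplication step (item 2 above) hinges on keying storage by content hash. A minimal sketch (in practice hashing happens chunk-wise during upload, and the video-to-blob mapping lives in the metadata store):

```python
import hashlib

class DedupStore:
    """Store uploads keyed by content hash; duplicate uploads become
    references to the existing blob instead of new copies."""

    def __init__(self):
        self.blobs = {}   # content_hash -> bytes
        self.refs = {}    # video_id -> content_hash

    def put(self, video_id: str, data: bytes) -> bool:
        """Return True if a new blob was stored, False if deduplicated."""
        digest = hashlib.sha256(data).hexdigest()
        self.refs[video_id] = digest
        if digest in self.blobs:
            return False              # duplicate: reference only
        self.blobs[digest] = data
        return True
```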

View Count Scaling

Problem: Millions of views per second for viral videos.
Solution: Approximate counting with periodic flush.

1. Client sends view event to Kafka
2. Stream processor aggregates per-video counts
3. Flush to database every 30 seconds
4. Cache approximate count in Redis

Real-time display: Redis counter (approximate)
Accurate analytics: ClickHouse (batch-updated hourly)
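The approximate-counting path can be sketched as an in-memory aggregator with a periodic flush. In the design above the aggregation runs in a stream processor over Kafka; here `flush_to_db` stands in for the batched database write:

```python
from collections import Counter

class ViewCounter:
    """Buffer view events in memory and flush aggregated deltas to the
    database periodically (every 30 seconds in the design above)."""

    def __init__(self, flush_to_db):
        self.flush_to_db = flush_to_db  # e.g. batched UPDATE ... view_count + delta
        self.pending = Counter()

    def record_view(self, video_id: str):
        self.pending[video_id] += 1     # O(1) in memory; no DB write per view

    def flush(self):
        if self.pending:
            self.flush_to_db(dict(self.pending))
            self.pending.clear()
```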

11. Key Tradeoffs

Decision             Option A               Option B               Our Choice
------------------------------------------------------------------------------
Upload method        Single upload          Chunked/resumable      Chunked
Streaming protocol   Progressive download   ABR (HLS/DASH)         HLS + DASH
Transcoding timing   On upload (eager)      On first play (lazy)   Eager
CDN caching          Push all to edge       Pull-through cache     Hybrid
Recommendation       Collaborative only     Hybrid (CF + content)  Hybrid
View counting        Exact (DB write/view)  Approximate (batched)  Approximate
Storage tiering      All hot storage        Hot/warm/cold tiers    Tiered

12. Failure Scenarios and Mitigations

Scenario                          Mitigation
------------------------------------------------------------------------
Transcoding worker failure        Retry from Kafka; idempotent jobs
Upload interrupted                Resumable upload; chunks already stored
CDN edge failure                  DNS failover to next nearest edge
Origin S3 outage                  Cross-region replication
Viral video spike                 CDN absorbs load; pre-warm popular content
Recommendation cold start         Show trending/popular for new users
Copyrighted content uploaded      Content ID fingerprinting during transcode

Key Takeaways

  1. Chunked resumable uploads are essential for large video files -- network interruptions are common, especially on mobile.
  2. Adaptive bitrate streaming (HLS/DASH) is the industry standard -- it adapts quality to network conditions to minimize rebuffering.
  3. CDN is the critical scaling layer -- without it, origin servers cannot handle the egress bandwidth of billions of streams.
  4. Transcoding is compute-intensive -- GPU-based workers with auto-scaling and spot instances keep costs manageable.
  5. Storage costs dominate at scale -- tiered storage and efficient codecs (AV1) provide significant savings over time.