Episode 9 — System Design / 9.11 — Real World System Design Problems
9.11.e Design a Video Streaming Platform (YouTube / Netflix)
Problem Statement
Design a video streaming platform that allows users to upload, transcode, store, and stream videos. The system must support adaptive bitrate streaming, a recommendation engine, and serve content globally via CDN.
1. Requirements
Functional Requirements
- Users upload videos (up to 10 GB)
- System transcodes videos into multiple resolutions and formats
- Users stream videos with adaptive bitrate
- Search videos by title, tags, description
- Recommendation engine suggests videos
- Like, comment, subscribe functionality
- View counts and analytics dashboard for creators
Non-Functional Requirements
- Video playback start time: < 2 seconds
- Support 2 billion monthly active users
- 500 hours of video uploaded per minute
- 99.99% availability for streaming
- Global content delivery (low latency worldwide)
- Support mobile, web, smart TV clients
2. Capacity Estimation
Traffic
Monthly active users: 2 billion
Daily active users: 800 million
Average watch time: 40 minutes/day
Videos uploaded/minute: 500 hours
Videos uploaded/day: 500 * 60 * 24 = 720,000 hours = 43.2M minutes
Concurrent viewers: ~50 million (peak)
Video plays/second: 800M * 5 plays/day / 86,400 ~= 46,000 plays/sec
Storage
Original upload size: Average 500 MB per video (10 min avg)
Transcoded versions: 5 resolutions * 3 formats = 15 versions
Expansion factor: ~3x original size (all versions combined)
Daily upload storage: 720,000 hours * 3 GB/hour * 3 = 6.5 PB/day
Annual storage: ~2.4 EB (exabytes)
Bandwidth
Average video bitrate: 5 Mbps (1080p)
Concurrent streams: 50 million
Total egress bandwidth: 50M * 5 Mbps = 250 Tbps (served by CDN)
Upload bandwidth: 500 hours/min * 3 GB/hour / 60 = 25 GB/sec
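The estimates above can be sanity-checked with a few lines of arithmetic. This is a sketch that simply mirrors the constants stated in this section (DAU, plays per user, GB/hour, expansion factor, concurrent streams):

```python
# Back-of-envelope check of the capacity numbers above,
# using the same assumptions stated in this section.

def capacity_estimates():
    dau = 800_000_000
    plays_per_user_per_day = 5
    plays_per_sec = dau * plays_per_user_per_day / 86_400      # ~46,000/sec

    upload_hours_per_day = 500 * 60 * 24                       # 720,000 hours
    gb_per_hour = 3
    expansion = 3                                              # all transcoded versions
    daily_storage_pb = (upload_hours_per_day * gb_per_hour
                        * expansion / 1_000_000)               # ~6.5 PB/day

    concurrent_streams = 50_000_000
    avg_bitrate_mbps = 5
    egress_tbps = concurrent_streams * avg_bitrate_mbps / 1_000_000  # 250 Tbps
    return plays_per_sec, daily_storage_pb, egress_tbps
```

Running this reproduces the rounded figures used in the rest of the design: ~46,000 plays/sec, ~6.5 PB/day of new storage, and 250 Tbps of egress.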
3. High-Level Architecture
+----------+ +-------------------+ +------------------+
| Client |---->| API Gateway |---->| Video Service |
| | | + Load Balancer | | (Metadata CRUD) |
+----+-----+ +-------------------+ +--------+---------+
| | |
| | +-------v--------+
| | | Metadata Store |
| | | (PostgreSQL) |
| | +----------------+
| +--------v--------+
| | Upload Service | +----------------+
| +--------+--------+ | Search Service |
| | | (Elasticsearch)|
| +--------v--------+ +----------------+
| | Object Storage |
| | (S3 - Original) | +----------------+
| +--------+--------+ | Recommendation |
| | | Engine (ML) |
| +--------v--------+ +----------------+
| | Transcoding |
| | Pipeline |
| | (Distributed |
| | Workers) |
| +--------+--------+
| |
| +--------v--------+
| | Transcoded |
| | Storage (S3) |
| +--------+--------+
| |
| +--------v--------+
+---------->| CDN (CloudFront)|
Stream | Edge Servers |
+-----------------+
4. API Design
POST /api/v1/videos/upload-url
Headers: Authorization: Bearer <token>
Body: {
"filename": "vacation.mp4",
"file_size": 524288000,
"content_type": "video/mp4"
}
Response 200: {
"upload_id": "upload_abc123",
"presigned_url": "https://s3.../upload?signature=...",
"chunk_size": 5242880,
"total_chunks": 100
}
PUT /api/v1/videos/upload/{upload_id}/chunk/{chunk_number}
Body: Binary chunk data
Response 200: { "chunk_number": 1, "status": "received" }
POST /api/v1/videos/upload/{upload_id}/complete
Body: {
"title": "My Vacation Video",
"description": "Trip to Hawaii",
"tags": ["travel", "hawaii"],
"visibility": "public"
}
Response 202: {
"video_id": "vid_789",
"status": "processing",
"estimated_time": 300
}
GET /api/v1/videos/{video_id}
Response 200: {
"video_id": "vid_789",
"title": "My Vacation Video",
"description": "Trip to Hawaii",
"author": { "channel_id": "ch_42", "name": "TravelJane" },
"duration": 600,
"view_count": 152300,
"likes": 8700,
"streams": {
"dash": "https://cdn.example.com/vid_789/manifest.mpd",
"hls": "https://cdn.example.com/vid_789/master.m3u8"
},
"thumbnails": { "default": "...", "medium": "...", "high": "..." },
"created_at": "2026-04-11T10:00:00Z"
}
GET /api/v1/feed/recommended?cursor={cursor}&limit=20
Response 200: { "videos": [...], "next_cursor": "..." }
GET /api/v1/search?q=hawaii+travel&cursor={cursor}
Response 200: { "results": [...], "next_cursor": "..." }
POST /api/v1/videos/{video_id}/like
POST /api/v1/videos/{video_id}/comments
Body: { "text": "Great video!" }
5. Database Schema
Video Metadata (PostgreSQL)
CREATE TABLE videos (
video_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
channel_id UUID NOT NULL REFERENCES channels(channel_id),
title VARCHAR(500) NOT NULL,
description TEXT,
duration_sec INTEGER,
visibility VARCHAR(20) DEFAULT 'public',
status VARCHAR(20) DEFAULT 'processing',
original_url VARCHAR(2048),
thumbnail_url VARCHAR(2048),
view_count BIGINT DEFAULT 0,
like_count BIGINT DEFAULT 0,
dislike_count BIGINT DEFAULT 0,
comment_count BIGINT DEFAULT 0,
tags TEXT[],
language VARCHAR(10),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
published_at TIMESTAMP
);
CREATE INDEX idx_videos_channel ON videos(channel_id, created_at DESC);
CREATE INDEX idx_videos_status ON videos(status);
Transcoding Jobs (PostgreSQL)
CREATE TABLE transcoding_jobs (
job_id UUID PRIMARY KEY,
video_id UUID REFERENCES videos(video_id),
resolution VARCHAR(20), -- '360p', '720p', '1080p', '4k'
codec VARCHAR(20), -- 'h264', 'h265', 'vp9', 'av1'
status VARCHAR(20), -- 'queued', 'processing', 'completed', 'failed'
output_url VARCHAR(2048),
bitrate INTEGER,
file_size BIGINT,
started_at TIMESTAMP,
completed_at TIMESTAMP,
error_message TEXT
);
View Events (ClickHouse -- analytics)
CREATE TABLE view_events (
event_id UUID,
video_id UUID,
user_id UUID,
watch_duration INTEGER,
total_duration INTEGER,
quality String,
device_type String,
country String,
timestamp DateTime
) ENGINE = MergeTree()
ORDER BY (video_id, timestamp);
6. Deep Dive: Video Upload and Transcoding Pipeline
Upload Flow (Resumable, Chunked)
Client Upload Service S3
| | |
|-- Request upload URL ----->| |
|<-- presigned URLs ---------| |
| | |
|-- Upload chunk 1 -------->|-----> S3 multipart -->|
|<-- chunk 1 ack -----------| |
| | |
|-- Upload chunk 2 -------->|-----> S3 multipart -->|
|<-- chunk 2 ack -----------| |
| | |
| (network failure) | |
| | |
|-- Resume: chunk 3 ------->|-----> S3 multipart -->|
|<-- chunk 3 ack -----------| |
| | |
|-- Complete upload -------->|-- Complete multipart->|
|<-- video_id, processing --| |
| | |
| |-- Publish to Kafka -->|
| | (transcode event) |
Transcoding Pipeline
+-------------------+
| Kafka: transcode |
| job queue |
+--------+----------+
|
+---------------+---------------+
| | |
+--------v------+ +-----v--------+ +----v---------+
| Worker Pool 1 | | Worker Pool 2| | Worker Pool 3|
| (360p + 480p) | | (720p+1080p) | | (1440p + 4K) |
| (CPU-based) | | (GPU-based) | | (GPU-based) |
+--------+------+ +-----+--------+ +----+---------+
| | |
+-------+-------+-------+-------+
| |
+-------v-------+ +-----v---------+
| Transcoded | | Thumbnail |
| Storage (S3) | | Generator |
+---------------+ +---------------+
|
+-------v--------+
| Manifest |
| Generator |
| (DASH + HLS) |
+-------+--------+
|
+-------v--------+
| CDN Origin |
| Push |
+----------------+
Transcoding Configuration
Resolution Bitrate (video) Codec Target Device
---------------------------------------------------------
240p 400 Kbps H.264 Low-end mobile
360p 800 Kbps H.264 Mobile
480p 1.5 Mbps H.264 SD screens
720p 3 Mbps H.264 HD screens
1080p 6 Mbps H.265 Full HD
1440p 12 Mbps VP9 High-end
2160p (4K) 25 Mbps VP9/AV1 4K displays
Each resolution is also segmented into 4-6 second chunks
for adaptive streaming.
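The fan-out from one upload into per-rung transcoding jobs can be sketched as below. The ladder mirrors the table above, the job fields follow the `transcoding_jobs` schema, and the rule of skipping rungs above the source resolution is an assumption (there is no point upscaling a 720p upload to 4K); publishing to the Kafka queue is left out.

```python
# Fan-out sketch: one uploaded video -> one transcoding job per ladder rung.

LADDER = [  # (resolution, video bitrate in Kbps, codec) -- the table above
    ("240p", 400, "h264"), ("360p", 800, "h264"), ("480p", 1500, "h264"),
    ("720p", 3000, "h264"), ("1080p", 6000, "h265"),
    ("1440p", 12000, "vp9"), ("2160p", 25000, "vp9"),
]

def make_jobs(video_id: str, source_height: int):
    """Create a job for every rung at or below the source resolution."""
    jobs = []
    for resolution, bitrate, codec in LADDER:
        height = int(resolution.rstrip("p"))
        if height <= source_height:
            jobs.append({
                "video_id": video_id,
                "resolution": resolution,
                "codec": codec,
                "bitrate": bitrate,
                "status": "queued",   # matches the transcoding_jobs schema
            })
    return jobs
```

A 1080p upload yields five jobs (240p through 1080p); only a 4K source fans out to all seven rungs.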
7. Deep Dive: Adaptive Bitrate Streaming
How ABR Works
Client CDN Edge Server
| |
|-- Request master.m3u8 --------->|
|<-- Manifest (all qualities) ----|
| |
| Manifest contains: |
| #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
| 360p/playlist.m3u8
| #EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
| 720p/playlist.m3u8
| #EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
| 1080p/playlist.m3u8
| |
|-- Start with 360p segment 0 --->| (safe start)
|<-- Segment data ----------------|
| |
| (bandwidth measurement: fast!) |
| |
|-- Switch to 720p segment 1 ---->| (upgrade quality)
|<-- Segment data ----------------|
| |
| (bandwidth drops) |
| |
|-- Drop to 480p segment 2 ----->| (downgrade quality)
|<-- Segment data ----------------|
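The rate-adaptation loop above boils down to one decision per segment: pick the highest rendition whose declared `BANDWIDTH` fits within a safety fraction of the measured throughput. Real players (hls.js, dash.js, ExoPlayer) use more sophisticated ABR rules with buffer-level signals; this sketch shows only the core throughput rule, with the rendition list taken from the manifest above.

```python
# Throughput-based rendition selection -- the core of the ABR flow above.

RENDITIONS = [  # (name, BANDWIDTH in bits/sec) from the master manifest
    ("360p", 800_000), ("480p", 1_500_000),
    ("720p", 3_000_000), ("1080p", 6_000_000),
]

def select_rendition(measured_bps: float, safety: float = 0.8) -> str:
    """Highest quality whose bitrate fits in safety * measured throughput.

    Falls back to the lowest rung on very poor networks, matching the
    "safe start" behaviour in the diagram.
    """
    budget = measured_bps * safety
    best = RENDITIONS[0][0]          # safe default: lowest rung
    for name, bandwidth in RENDITIONS:
        if bandwidth <= budget:
            best = name
    return best
```

On a 10 Mbps link this picks 1080p; when throughput drops to 4 Mbps it downgrades to 720p, exactly the upgrade/downgrade pattern shown in the sequence diagram.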
HLS Manifest Structure
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
https://cdn.example.com/vid_789/360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
https://cdn.example.com/vid_789/480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
https://cdn.example.com/vid_789/720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
https://cdn.example.com/vid_789/1080p/playlist.m3u8
Segment Playlist (720p)
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
https://cdn.example.com/vid_789/720p/segment_000.ts
#EXTINF:6.0,
https://cdn.example.com/vid_789/720p/segment_001.ts
#EXTINF:6.0,
https://cdn.example.com/vid_789/720p/segment_002.ts
...
8. Deep Dive: CDN Architecture
+------------------+
| Origin Server |
| (S3 bucket) |
+--------+---------+
|
+--------------+--------------+
| | |
+--------v------+ +----v-------+ +----v-------+
| Regional PoP | | Regional | | Regional |
| US-East | | EU-West | | AP-South |
+--------+------+ +----+-------+ +----+-------+
| | |
+------+------+ +---+---+ +-----+-----+
| | | | | |
+--v--+ +---v--+ +v---+ +v--+ +v---+ +--v--+
|Edge | |Edge | |Edge| |Edg| |Edge| |Edge |
|NYC | |DC | |LON | |FRA| |MUM | |SIN |
+-----+ +-----+ +----+ +---+ +----+ +-----+
Cache hierarchy:
Edge (L1) -> Regional PoP (L2) -> Origin (S3)
Cache hit rates:
Popular videos: > 99% at edge
Long tail: ~60% at edge, ~90% at regional
CDN Cache Strategy
Popular videos (top 10%):
- Pre-pushed to all edge servers
- TTL: 30 days
- Pinned in cache (never evicted)
Regular videos:
- Cached on first request (pull-through)
- TTL: 7 days
- LRU eviction
Long-tail videos (bottom 50%):
- Served from regional PoP or origin
- Not cached at edge (too many, too infrequent)
- TTL: 1 day at regional
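The "regular videos" policy above (pull-through on first request, LRU eviction) can be sketched in a few lines. This is a toy in-memory model, not a real CDN: `fetch_from_parent` stands in for the regional PoP or origin fetch, and segment TTLs are omitted.

```python
# Pull-through edge cache sketch: hit -> serve from edge; miss -> fetch
# from the parent tier, cache it, and evict the least-recently-used entry.

from collections import OrderedDict

class EdgeCache:
    def __init__(self, capacity: int, fetch_from_parent):
        self.capacity = capacity
        self.fetch = fetch_from_parent       # callable: key -> bytes
        self.store = OrderedDict()           # key -> bytes, in LRU order
        self.hits = self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)      # mark as recently used
            return self.store[key]
        self.misses += 1
        data = self.fetch(key)               # pull-through on miss
        self.store[key] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict the LRU segment
        return data
```

Pinning popular videos (the "never evicted" tier above) would be a small extension: mark those keys and skip them during eviction.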
9. Recommendation Engine (Simplified)
+------------------+ +-------------------+ +------------------+
| User Activity | | Video Features | | Collaborative |
| (watch history, |---->| (tags, category, |---->| Filtering |
| likes, searches)| | embeddings) | | (similar users) |
+------------------+ +-------------------+ +--------+---------+
|
+--------v--------+
| Candidate |
| Generation |
| (~1000 videos) |
+--------+--------+
|
+--------v--------+
| Ranking Model |
| (predict watch |
| probability) |
+--------+--------+
|
+--------v--------+
| Filtering |
| (watched, NSFW, |
| duplicates) |
+--------+--------+
|
+--------v--------+
| Top 20 results |
+-----------------+
Two-Stage Architecture
Stage 1: Candidate Generation (fast, broad)
- Collaborative filtering: users who watched X also watched Y
- Content-based: videos with similar tags/embeddings
- Trending: popular videos in user's region
- Pool: ~1,000 candidate videos
Stage 2: Ranking (slow, precise)
- Deep neural network predicts P(click), P(watch>50%), P(like)
- Features: user history, video metadata, context (time, device)
- Score = weighted combination of predicted engagement metrics
- Select top 20, ensure diversity (no 20 cat videos)
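The two stages above can be sketched end to end on toy data: content-based candidate generation by embedding similarity, then a ranking pass that blends similarity with a predicted engagement signal, filtering out already-watched videos. The 0.6/0.4 weights and the `predicted_watch_prob` field are illustrative stand-ins for the neural ranking model.

```python
# Toy two-stage recommendation sketch: candidate generation -> ranking.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recommend(user_embedding, catalog, watched, k=20):
    # Stage 1: broad candidate pool by embedding similarity,
    # filtering videos the user already watched.
    candidates = sorted(
        (v for v in catalog if v["video_id"] not in watched),
        key=lambda v: cosine(user_embedding, v["embedding"]),
        reverse=True,
    )[:1000]
    # Stage 2: precise ranking -- weighted blend of similarity and a
    # (stubbed) predicted watch probability from the ranking model.
    ranked = sorted(
        candidates,
        key=lambda v: 0.6 * cosine(user_embedding, v["embedding"])
                    + 0.4 * v["predicted_watch_prob"],
        reverse=True,
    )
    return [v["video_id"] for v in ranked[:k]]
```

In production the two stages run on different budgets: stage 1 uses approximate nearest-neighbor indexes over millions of videos, while the stage 2 model only ever scores the ~1,000 survivors.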
10. Scaling Considerations
Video Processing at Scale
500 hours uploaded per minute = 30,000 minutes of video per minute
At a 10-minute average video length: ~3,000 videos uploaded per minute
Each video -> ~16 transcoding jobs (the ladder's resolutions, most in two codecs)
Total: 3,000 videos/min * 16 jobs = 48,000 transcoding jobs/minute
Worker fleet:
- GPU instances for HD/4K transcoding
- CPU instances for lower resolutions
- Auto-scaling based on queue depth
- Spot/preemptible instances for cost savings (with retry logic)
Storage Optimization
1. Tiered storage:
Hot (< 30 days): S3 Standard
Warm (30-90 days): S3 Infrequent Access
Cold (> 90 days): S3 Glacier (for rarely watched videos)
2. Deduplication:
- Hash uploaded content
- If duplicate exists, create reference instead of new copy
- Saves 10-15% storage
3. Codec efficiency:
- AV1 codec: 30% smaller files than H.264 at same quality
- Gradually re-encode popular content to newer codecs
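The deduplication idea in point 2 can be sketched as a content-addressed store: hash the bytes, and if the hash already exists, record a reference instead of a second copy. This toy model omits what a real system needs (hashing during the chunked upload rather than after, and guarding against hash collisions before trusting a match).

```python
# Content-hash deduplication sketch: identical uploads share one blob.

import hashlib

class BlobStore:
    def __init__(self):
        self.blobs = {}   # sha256 hex digest -> bytes (stored once)
        self.refs = {}    # video_id -> sha256 hex digest

    def put(self, video_id: str, content: bytes) -> bool:
        """Store an upload. Returns True if a new blob was created,
        False if this upload was deduplicated against an existing one."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = content
        self.refs[video_id] = digest
        return is_new
```

Two uploads of the same bytes produce two metadata rows but only one stored blob, which is where the 10-15% savings comes from.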
View Count Scaling
Problem: Millions of views per second for viral videos.
Solution: Approximate counting with periodic flush.
1. Client sends view event to Kafka
2. Stream processor aggregates per-video counts
3. Flush to database every 30 seconds
4. Cache approximate count in Redis
Real-time display: Redis counter (approximate)
Accurate analytics: ClickHouse (batch-updated hourly)
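Steps 1-3 above can be sketched as a small in-memory aggregator: increments are buffered per video and flushed as one write per interval, so a viral video costs one database write every 30 seconds instead of one per view. The `flush_fn` callback stands in for the Redis/DB upsert; the Kafka transport and the timer are omitted.

```python
# Approximate view counting with periodic flush -- the pattern above.

from collections import Counter

class ViewCounter:
    def __init__(self, flush_fn):
        self.pending = Counter()      # video_id -> views since last flush
        self.flush_fn = flush_fn      # callable: dict -> None (DB/Redis write)

    def record_view(self, video_id: str):
        self.pending[video_id] += 1   # O(1) in memory, no database write

    def flush(self):
        """In a real deployment, called every ~30 seconds by a timer."""
        if self.pending:
            self.flush_fn(dict(self.pending))
            self.pending.clear()
```

The count shown to viewers is therefore up to one flush interval stale, which is the accepted tradeoff: exact real-time counts would require a write per view.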
11. Key Tradeoffs
| Decision | Option A | Option B | Our Choice |
|---|---|---|---|
| Upload method | Single upload | Chunked/resumable | Chunked |
| Streaming protocol | Progressive download | ABR (HLS/DASH) | HLS + DASH |
| Transcoding timing | On upload (eager) | On first play (lazy) | Eager |
| CDN caching | Push all to edge | Pull-through cache | Hybrid |
| Recommendation | Collaborative only | Hybrid (CF + content) | Hybrid |
| View counting | Exact (DB write/view) | Approximate (batched) | Approximate |
| Storage tiering | All hot storage | Hot/warm/cold tiers | Tiered |
12. Failure Scenarios and Mitigations
Scenario Mitigation
------------------------------------------------------------------------
Transcoding worker failure Retry from Kafka; idempotent jobs
Upload interrupted Resumable upload; chunks already stored
CDN edge failure DNS failover to next nearest edge
Origin S3 outage Cross-region replication
Viral video spike CDN absorbs load; pre-warm popular content
Recommendation cold start Show trending/popular for new users
Copyright content uploaded Content ID fingerprinting during transcode
Key Takeaways
- Chunked resumable uploads are essential for large video files -- network interruptions are common, especially on mobile.
- Adaptive bitrate streaming (HLS/DASH) is the industry standard -- it adapts to network conditions without buffering.
- CDN is the critical scaling layer -- without it, origin servers cannot handle the egress bandwidth of billions of streams.
- Transcoding is compute-intensive -- GPU-based workers with auto-scaling and spot instances keep costs manageable.
- Storage costs dominate at scale -- tiered storage and efficient codecs (AV1) provide significant savings over time.