Episode 9 — System Design / 9.11 — Real World System Design Problems
9.11.h Design a File Storage System (Google Drive / Dropbox)
Problem Statement
Design a cloud file storage service like Google Drive or Dropbox that allows users to upload, download, sync, and share files across devices. The system must support file versioning, deduplication, and real-time sync.
1. Requirements
Functional Requirements
- Upload and download files (up to 10 GB per file)
- Sync files across multiple devices in real time
- File and folder organization (create, move, rename, delete)
- Share files/folders with specific users or via public link
- File versioning (view and restore previous versions)
- Offline access with sync on reconnection
- Search files by name and content
Non-Functional Requirements
- Support 500 million registered users, 100 million DAU
- Storage: 15 GB free tier, up to 2 TB paid
- Upload/download speed: maximize available bandwidth
- Sync latency: < 10 seconds for small file changes
- Data durability: 99.999999999% (11 nines)
- 99.9% availability
2. Capacity Estimation
Traffic
Daily active users: 100 million
Files per user: average 200 files
Total files: 500 million registered users * 200 files = 100 billion
Average file size: 500 KB
File operations/day: 1 billion (assume ~10% uploads, ~40% downloads; rest are sync, rename, and metadata ops)
Uploads per second: 1B * 0.1 / 86,400 ~= 1,150 uploads/sec
Downloads per second: 1B * 0.4 / 86,400 ~= 4,600 downloads/sec
Storage
Total logical storage: 100B files * 500 KB = 50 PB
With 3x replication: 150 PB raw storage
Deduplication savings: ~30% (common files shared across users)
Effective storage: ~105 PB
Daily new data: 1B ops * 10% uploads * 500 KB = 50 TB/day
Bandwidth
Upload bandwidth: 1,150/sec * 500 KB = 575 MB/sec
Download bandwidth: 4,600/sec * 500 KB = 2.3 GB/sec
Peak (3x): ~7 GB/sec download
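The arithmetic above can be sanity-checked in a few lines (the operation mix percentages are the assumptions already used in this section):

```python
# Back-of-envelope check of the Section 2 estimates.
OPS_PER_DAY = 1_000_000_000     # total file operations per day
AVG_FILE_KB = 500               # average file size
SECONDS_PER_DAY = 86_400
UPLOAD_SHARE, DOWNLOAD_SHARE = 0.1, 0.4   # assumed operation mix

uploads_per_sec = OPS_PER_DAY * UPLOAD_SHARE / SECONDS_PER_DAY      # ~1,157
downloads_per_sec = OPS_PER_DAY * DOWNLOAD_SHARE / SECONDS_PER_DAY  # ~4,630
upload_mb_per_sec = uploads_per_sec * AVG_FILE_KB / 1_000           # ~579 MB/s
download_gb_per_sec = downloads_per_sec * AVG_FILE_KB / 1_000_000   # ~2.3 GB/s
daily_new_tb = OPS_PER_DAY * UPLOAD_SHARE * AVG_FILE_KB / 1e9       # 50 TB/day
```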
3. High-Level Architecture
+----------+ +-------------------+
| Client |----->| API Gateway |
| (Desktop | | + Load Balancer |
| App / | +---------+---------+
| Mobile) | |
+----+-----+ +-----------+-----------+
| | |
| +------v------+ +-------v-------+
| | Metadata | | Sync Service |
| | Service | | (WebSocket) |
| +------+------+ +-------+-------+
| | |
| +------v------+ +-------v-------+
| | Metadata DB | | Notification |
| | (PostgreSQL)| | Queue (Kafka) |
| +-------------+ +---------------+
|
| +--------------+ +---------------+
+-->| Upload | | Block/Chunk |
| | Service |----->| Storage |
| +--------------+ | (S3) |
| +---------------+
| +--------------+
+-->| Download | +---------------+
| Service |----->| CDN |
+--------------+ +---------------+
+----------------+ +---------------+
| Sharing | | Versioning |
| Service | | Service |
+----------------+ +---------------+
4. API Design
POST /api/v1/files/upload/init
Headers: Authorization: Bearer <token>
Body: {
"filename": "report.pdf",
"file_size": 10485760,
"parent_folder_id": "folder_123",
"content_hash": "sha256:abc123..."
}
Response 200: {
"upload_id": "up_456",
"chunk_size": 4194304,
"total_chunks": 3,
"chunks_to_upload": [0, 1, 2], // may be fewer with dedup
"upload_urls": [
{ "chunk": 0, "url": "https://upload.example.com/..." },
{ "chunk": 1, "url": "https://upload.example.com/..." },
{ "chunk": 2, "url": "https://upload.example.com/..." }
]
}
PUT /api/v1/files/upload/{upload_id}/chunk/{chunk_number}
Body: Binary chunk data
Headers: Content-SHA256: <chunk_hash>
Response 200: { "chunk": 0, "status": "received", "verified": true }
POST /api/v1/files/upload/{upload_id}/complete
Response 201: {
"file_id": "file_789",
"version": 1,
"created_at": "2026-04-11T10:00:00Z"
}
GET /api/v1/files/{file_id}/download
Response 302: Redirect to presigned CDN/S3 URL
GET /api/v1/files/{file_id}/versions
Response 200: {
"versions": [
{ "version": 3, "modified_at": "...", "size": 10485760, "modified_by": "..." },
{ "version": 2, "modified_at": "...", "size": 10400000, "modified_by": "..." },
{ "version": 1, "modified_at": "...", "size": 9800000, "modified_by": "..." }
]
}
POST /api/v1/files/{file_id}/share
Body: {
"shared_with": [
{ "user_id": "user_42", "permission": "editor" },
{ "email": "jane@example.com", "permission": "viewer" }
],
"link_sharing": { "enabled": true, "permission": "viewer" }
}
Response 200: {
"share_link": "https://drive.example.com/s/Xk9mP2"
}
GET /api/v1/sync/changes?cursor={cursor}
Response 200: {
"changes": [
{ "type": "modified", "file_id": "f_1", "version": 3, "timestamp": "..." },
{ "type": "created", "file_id": "f_2", "version": 1, "timestamp": "..." },
{ "type": "deleted", "file_id": "f_3", "timestamp": "..." }
],
"cursor": "new_cursor_value",
"has_more": false
}
5. Database Schema
File Metadata (PostgreSQL)
CREATE TABLE files (
file_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
owner_id UUID NOT NULL REFERENCES users(user_id),
parent_folder_id UUID REFERENCES files(file_id),
name VARCHAR(500) NOT NULL,
is_folder BOOLEAN DEFAULT FALSE,
size_bytes BIGINT DEFAULT 0,
mime_type VARCHAR(200),
current_version INTEGER DEFAULT 1,
content_hash VARCHAR(64),
is_deleted BOOLEAN DEFAULT FALSE,
deleted_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE (parent_folder_id, name, owner_id)
);
CREATE INDEX idx_files_parent ON files(parent_folder_id) WHERE NOT is_deleted;
CREATE INDEX idx_files_owner ON files(owner_id);
File Versions (PostgreSQL)
CREATE TABLE file_versions (
version_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
file_id UUID NOT NULL REFERENCES files(file_id),
version_number INTEGER NOT NULL,
size_bytes BIGINT NOT NULL,
content_hash VARCHAR(64) NOT NULL,
modified_by UUID NOT NULL REFERENCES users(user_id),
chunk_ids UUID[] NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE (file_id, version_number)
);
Chunks (PostgreSQL)
CREATE TABLE chunks (
chunk_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
chunk_hash VARCHAR(64) UNIQUE NOT NULL,
size_bytes INTEGER NOT NULL,
storage_path VARCHAR(500) NOT NULL,
reference_count INTEGER DEFAULT 1,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- reference_count tracks how many file versions use this chunk
-- When count reaches 0, chunk can be garbage collected
Sharing Permissions (PostgreSQL)
CREATE TABLE file_shares (
share_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
file_id UUID NOT NULL REFERENCES files(file_id),
shared_with_id UUID REFERENCES users(user_id), -- null for link sharing
permission VARCHAR(20) NOT NULL, -- 'viewer','commenter','editor','owner'
share_link VARCHAR(50) UNIQUE,
link_enabled BOOLEAN DEFAULT FALSE,
created_by UUID NOT NULL REFERENCES users(user_id),
expires_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Sync Log (PostgreSQL -- append-only)
CREATE TABLE sync_log (
log_id BIGSERIAL PRIMARY KEY,
user_id UUID NOT NULL,
file_id UUID NOT NULL,
operation VARCHAR(20) NOT NULL, -- 'create','update','delete','move'
version_number INTEGER,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_sync_user_ts ON sync_log(user_id, log_id);
-- Cursor-based sync: client stores last log_id they processed
6. Deep Dive: Chunking and Deduplication
File Chunking Strategy
File: report.pdf (10 MB)
Chunk size: 4 MB
+------------------+------------------+------------------+
| Chunk 0 | Chunk 1 | Chunk 2 |
| 4 MB | 4 MB | 2 MB |
| hash: abc123 | hash: def456 | hash: ghi789 |
+------------------+------------------+------------------+
Each chunk:
1. SHA-256 hash computed client-side
2. Client asks server: "Do you have chunk with hash abc123?"
3. If YES: skip upload (dedup hit!)
4. If NO: upload chunk
Benefits:
- Resumable uploads (re-upload only missing chunks)
- Deduplication (identical chunks stored once)
- Delta sync (only changed chunks re-uploaded)
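Steps 1-4 above can be sketched client-side with fixed-size chunks (function names are illustrative; the 4 MB size matches the upload-init response):

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MB, as in the upload-init response

def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file into fixed-size chunks and SHA-256 each one client-side."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def chunks_to_upload(hashes, known_hashes):
    """Return indexes of chunks the server does not already have (dedup check)."""
    return [i for i, h in enumerate(hashes) if h not in known_hashes]
```

On resume, the client recomputes the hashes and re-uploads only the indexes the server reports as missing.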
Content-Defined Chunking (Rabin Fingerprint)
Fixed-size chunking problem:
Inserting 1 byte at the start shifts ALL chunk boundaries.
Every chunk hash changes. Zero dedup benefit.
Original: [AAAA][BBBB][CCCC][DDDD]
+1 byte: [xAAA][ABBB][BCCC][CDDD]D <- all chunks changed!
Content-defined chunking (CDC):
Use rolling hash to find natural chunk boundaries.
Boundary = where hash(rolling_window) % target_size == magic_value
Original: [AAA|BBBBB|CCC|DDDD] <- boundaries based on content
+1 byte: [xAAA|BBBBB|CCC|DDDD] <- only first chunk changed!
Result: ~90% of chunks remain identical after small edits.
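A toy content-defined chunker illustrates the resynchronization property. For clarity it hashes a small sliding window with SHA-256 instead of a true Rabin rolling fingerprint, and the size parameters are deliberately tiny (real systems use megabyte-scale chunks):

```python
import hashlib

WINDOW = 16        # bytes of local context used for the boundary test
MIN_CHUNK = 64     # illustrative; real systems use ~1-4 MB
MAX_CHUNK = 4096   # hard cap so pathological data cannot make giant chunks
BOUNDARY_BYTE = 0  # cut when the window hash's first byte == 0 (~1/256 positions)

def cdc_chunks(data: bytes) -> list:
    """Toy content-defined chunker: boundaries depend only on the last
    WINDOW bytes, so an insert near the start shifts the first boundary
    but later boundaries land at the same content positions."""
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        cut = length >= MAX_CHUNK
        if not cut and length >= MIN_CHUNK and i + 1 >= WINDOW:
            window = data[i + 1 - WINDOW:i + 1]
            cut = hashlib.sha256(window).digest()[0] == BOUNDARY_BYTE
        if cut:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Prepending one byte changes the first chunk's hash, but the boundary test sees the same windows afterward, so the remaining chunk hashes are unchanged and dedup still applies.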
Deduplication Levels
Level 1: Whole-file dedup
- Hash entire file content
- If hash exists in system: create reference, skip upload
- Savings: 10-15% (identical files across users)
Level 2: Chunk-level dedup
- Hash each chunk independently
- Common chunks shared across files and users
- Savings: 25-35% (especially for similar documents)
Level 3: Cross-user dedup
- Chunks shared globally, not per-user
- reference_count tracks usage
- Privacy consideration: only hash, not content, is compared
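Reference counting for global dedup can be sketched in memory (ChunkStore is an illustrative name; production would use the reference_count column of the chunks table, updated transactionally):

```python
class ChunkStore:
    """In-memory sketch of global chunk dedup with reference counting."""

    def __init__(self):
        self.refs = {}   # chunk_hash -> reference_count

    def add_ref(self, chunk_hash: str) -> bool:
        """Record one more file version using this chunk.
        Returns True only when the chunk is new and must actually be uploaded."""
        self.refs[chunk_hash] = self.refs.get(chunk_hash, 0) + 1
        return self.refs[chunk_hash] == 1

    def release(self, chunk_hash: str) -> None:
        """Drop one reference; at zero the chunk becomes garbage-collectable."""
        self.refs[chunk_hash] -= 1
        if self.refs[chunk_hash] == 0:
            del self.refs[chunk_hash]
```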
7. Deep Dive: Sync Protocol
Real-Time Sync Architecture
Device A Sync Service Notification Device B
(editor) (server) Queue (Kafka) (syncing)
| | | |
|-- File modified ->| | |
| (delta chunks) | | |
| |-- Update metadata | |
| |-- Store new chunks | |
| | | |
| |-- Publish change ->| |
| | |-- Push via WS -->|
| | | |
| | | "file_123 |
| | | version 3" |
| | | |
| |<--- GET changes ---|<-- Client fetches|
| | since cursor | change details |
| | | |
| |--- Changed chunks->| |
| | (download only | |
| | new chunks) | |
| | | Apply changes |
Long Polling + WebSocket Hybrid
Primary: WebSocket connection for instant notification
Fallback: Long polling (for restrictive networks)
Backup: Periodic polling every 60 seconds (catch missed events)
Notification payload (lightweight -- just triggers sync):
{
"type": "file_changed",
"file_id": "file_123",
"version": 3,
"timestamp": 1681200000
}
Client then fetches full change details via REST API.
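On the receiving device, applying one page of GET /api/v1/sync/changes is mechanical. A minimal sketch, with local state modeled as a file_id -> version map:

```python
def apply_changes(local_files: dict, page: dict) -> str:
    """Apply one page of sync changes to a local file_id -> version map.
    Returns the new cursor, which the client persists for the next poll."""
    for change in page["changes"]:
        if change["type"] == "deleted":
            local_files.pop(change["file_id"], None)   # tolerate already-gone files
        else:  # "created" or "modified"
            local_files[change["file_id"]] = change["version"]
    return page["cursor"]
```

If has_more is true, the client immediately polls again with the returned cursor until it drains the backlog.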
Conflict Resolution
Scenario: User A and User B edit the same file simultaneously.
Timeline:
T1: Both A and B have file at version 2
T2: A saves changes -> version 3 created
T3: B saves changes (based on version 2) -> CONFLICT
Resolution strategies:
1. Last-writer-wins (LWW):
B's changes overwrite A's. A's changes lost.
Simple but data loss risk.
2. Automatic merge (for supported formats):
If changes are in different parts of the file, auto-merge.
Works well for text files, not for binary.
3. Conflict copy (Dropbox approach -- recommended):
Keep BOTH versions.
B's file saved as "report (B's conflicting copy).pdf"
User manually resolves.
4. Operational transform (Google Docs approach):
Real-time character-level merging.
Complex but seamless for collaborative editing.
Our approach: Conflict copy for general files,
operational transform for document editing.
Conflict Detection:
Client sends: { file_id, base_version: 2, new_content_hash: "xyz" }
Server checks: current_version == 2?
YES -> Accept, create version 3
NO -> Conflict! current_version is 3
Create conflict copy, notify user
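The version check above is a compare-and-swap. A sketch with the file record as a plain dict (in production this is a single SQL UPDATE guarded by WHERE current_version = %s):

```python
def commit_version(file_record: dict, base_version: int, new_hash: str) -> dict:
    """Optimistic concurrency: accept the write only if the client
    edited the version that is still current."""
    if file_record["current_version"] != base_version:
        # Another device committed first; caller creates a conflict copy.
        return {"status": "conflict",
                "current_version": file_record["current_version"]}
    file_record["current_version"] = base_version + 1
    file_record["content_hash"] = new_hash
    return {"status": "accepted", "version": file_record["current_version"]}
```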
8. Deep Dive: File Sharing
Permission Model
Permission hierarchy:
Owner > Editor > Commenter > Viewer
Owner: Full control, manage sharing, delete
Editor: Read, write, rename, move (within shared folder)
Commenter: Read, add comments
Viewer: Read only, download
Inheritance:
Sharing a folder shares all contents recursively.
Child files inherit parent folder permissions.
Explicit permissions override inherited ones.
Permission Check (Recursive with Caching)
def check_permission(user_id, file_id, required_permission):
    """Resolve the user's effective permission, walking up the folder tree."""
    cache_key = f"perm:{user_id}:{file_id}"
    cached = redis.get(cache_key)
    if cached:
        return has_permission(cached, required_permission)
    # 1. Direct share on this file?
    direct = db.query("""
        SELECT permission FROM file_shares
        WHERE file_id = %s AND shared_with_id = %s
    """, file_id, user_id)
    if direct:
        redis.setex(cache_key, 3600, direct.permission)  # cache for 1 hour
        return has_permission(direct.permission, required_permission)
    # 2. Otherwise inherit from the parent folder (recursive walk to the root)
    file = db.query("SELECT parent_folder_id FROM files WHERE file_id = %s", file_id)
    if file.parent_folder_id:
        return check_permission(user_id, file.parent_folder_id, required_permission)
    return False  # reached the root without finding a share
Public Link Sharing
Share link: https://drive.example.com/s/Xk9mP2
Server:
1. Look up share_link "Xk9mP2" in file_shares table
2. Check: is link_enabled?
3. Check: has link expired?
4. Check: permission level on link
5. Serve the file according to the link's permission level (e.g., viewer = read-only)
Link token: randomly generated, not guessable
Optional: password-protected links
Optional: expiration date on links
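An unguessable link token can be generated with the standard library's CSPRNG (a sketch; real deployments often use longer slugs):

```python
import secrets
import string

SLUG_ALPHABET = string.ascii_letters + string.digits   # 62 characters

def new_share_slug(length: int = 6) -> str:
    """Cryptographically random slug for /s/<slug> links.
    62^6 ~ 5.7e10 values, so slugs cannot practically be enumerated."""
    return "".join(secrets.choice(SLUG_ALPHABET) for _ in range(length))
```

Uniqueness is still enforced by the UNIQUE constraint on share_link; on the rare collision, just regenerate.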
9. Storage Architecture
Storage Tiers:
+------------------+ +------------------+ +------------------+
| Hot Storage | | Warm Storage | | Cold Storage |
| (S3 Standard) | | (S3 IA) | | (S3 Glacier) |
| | | | | |
| Recently active | | Accessed in last | | No access in |
| files (< 30 days)| | 30-90 days | | 90+ days |
| Lowest latency | | Slightly higher | | Minutes to hours |
| | | retrieval cost | | retrieval time |
+------------------+ +------------------+ +------------------+
Lifecycle policy:
- Files not accessed for 30 days -> move to Warm
- Files not accessed for 90 days -> move to Cold
- Metadata always stays in hot storage (fast listing)
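The lifecycle policy reduces to a small decision rule (a sketch; in practice this is delegated to S3 lifecycle rules rather than application code):

```python
def storage_tier(days_since_last_access: int) -> str:
    """Map last-access age to a tier per the lifecycle policy above."""
    if days_since_last_access < 30:
        return "hot"    # S3 Standard
    if days_since_last_access < 90:
        return "warm"   # S3 Infrequent Access
    return "cold"       # S3 Glacier
```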
Data Durability
S3 provides 11 nines of durability (99.999999999%).
Additional measures:
1. Cross-region replication for paid tiers
2. Chunk-level checksums verified on read
3. Background integrity scanner (weekly full scan)
4. Versioning prevents accidental deletion (30-day trash)
10. Scaling Considerations
Metadata Service Scaling
PostgreSQL with read replicas:
Primary: handles all writes (file creation, moves, renames)
Replicas: handle reads (file listing, search, sync queries)
Sharding strategy: shard by user_id
- All of a user's files on one shard
- File listing never crosses shards
- Sharing creates cross-shard references (resolved at app layer)
Upload/Download Scaling
Upload: Client -> Presigned URL -> Direct to S3 (bypasses our servers)
Download: Client -> CDN -> S3 (CDN absorbs repeat downloads)
Our servers only handle:
- Generating presigned URLs
- Updating metadata
- Coordinating sync
This reduces bandwidth through our servers by 95%+.
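The shape of a presigned URL can be sketched with a plain HMAC: the signature binds the path and expiry so neither can be tampered with. This is illustrative only, not AWS SigV4 (real S3 presigned URLs come from the SDK):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SIGNING_KEY = b"server-side-secret"   # illustrative; never shipped to clients

def presign(path: str, expires_in: int = 900, now=None) -> str:
    """Mint a time-limited URL: anyone holding it can fetch the object
    until it expires, without touching the application servers."""
    expires = (int(time.time()) if now is None else now) + expires_in
    msg = f"{path}:{expires}".encode()
    sig = hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()
    return f"https://cdn.example.com{path}?" + urlencode(
        {"expires": expires, "sig": sig})

def verify(path: str, expires: int, sig: str, now: int) -> bool:
    """Edge/CDN-side check before serving bytes."""
    if now > expires:
        return False
    msg = f"{path}:{expires}".encode()
    expected = hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```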
Sync Optimization
Debounce: Batch rapid changes (e.g., auto-save every 2 seconds)
Wait 5 seconds of no changes before syncing.
Delta compression: Only upload changed chunks, not entire file.
Average file edit touches < 10% of chunks.
Bandwidth throttling: Limit sync bandwidth on metered connections.
User-configurable in settings.
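The debounce rule above (sync only after 5 seconds of quiet) can be sketched as a small state machine; timestamps are passed in explicitly so the logic is testable without real clocks:

```python
class SyncDebouncer:
    """Start a sync only after a quiet period with no further edits
    (5 s, matching the text above)."""
    QUIET_SECONDS = 5.0

    def __init__(self):
        self._last_change = None

    def on_change(self, now: float) -> None:
        self._last_change = now          # every edit restarts the quiet timer

    def should_sync(self, now: float) -> bool:
        if self._last_change is None:
            return False                 # nothing pending
        if now - self._last_change >= self.QUIET_SECONDS:
            self._last_change = None     # fire once, then wait for new edits
            return True
        return False
```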
11. Key Tradeoffs
| Decision | Option A | Option B | Our Choice |
|---|---|---|---|
| Chunking method | Fixed-size | Content-defined (CDC) | CDC |
| Conflict resolution | Last-writer-wins | Conflict copy | Conflict copy |
| Sync notification | Polling | WebSocket | WS + polling |
| Deduplication scope | Per-user | Global | Global |
| File storage | Own storage cluster | Cloud object store | S3 |
| Permission check | Compute every time | Cache with invalidation | Cached |
| Versioning retention | Unlimited | 30 days / 100 versions | 30 days |
12. Failure Scenarios and Mitigations
Scenario Mitigation
------------------------------------------------------------------------
Upload interrupted Resumable chunked upload; retry missing chunks
Sync conflict Conflict copy created; user notified
Metadata DB failure Failover to replica; promote to primary
S3 regional outage Cross-region replication for paid users
WebSocket disconnect Auto-reconnect; periodic polling as backup
Chunk corruption SHA-256 checksum validation on read; re-fetch
Accidental file deletion 30-day trash with restore capability
Storage quota exceeded Block new uploads; notify user; grace period
Key Takeaways
- Content-defined chunking (Rabin fingerprint) is critical for efficient delta sync -- without it, a 1-byte insert invalidates every chunk.
- Deduplication at the chunk level saves 25-35% storage and dramatically reduces upload time when files are similar.
- Conflict copy is the safest default for file sync -- data loss from last-writer-wins is unacceptable for user files.
- Direct-to-S3 uploads with presigned URLs keep bandwidth off application servers, which only handle lightweight metadata operations.
- The sync log with cursor-based polling is the backbone of multi-device sync -- it gives clients reliable, resumable (at-least-once) delivery of change notifications, since a missed push is picked up on the next cursor poll.