Episode 9 — System Design / 9.11 — Real World System Design Problems

9.11.h Design a File Storage System (Google Drive / Dropbox)

Problem Statement

Design a cloud file storage service like Google Drive or Dropbox that allows users to upload, download, sync, and share files across devices. The system must support file versioning, deduplication, and real-time sync.


1. Requirements

Functional Requirements

  • Upload and download files (up to 10 GB per file)
  • Sync files across multiple devices in real-time
  • File and folder organization (create, move, rename, delete)
  • Share files/folders with specific users or via public link
  • File versioning (view and restore previous versions)
  • Offline access with sync on reconnection
  • Search files by name and content

Non-Functional Requirements

  • Support 500 million registered users, 100 million DAU
  • Storage: 15 GB free tier, up to 2 TB paid
  • Upload/download speed: maximize available bandwidth
  • Sync latency: < 10 seconds for small file changes
  • Data durability: 99.999999999% (11 nines)
  • 99.9% availability

2. Capacity Estimation

Traffic

Daily active users:     100 million
Files per user:         Average 200 files
Total files:            500M registered users * 200 ~= 100 billion
Average file size:      500 KB
File operations/day:    1 billion (upload, download, sync, rename)
Uploads per second:     1B * 0.1 / 86,400 ~= 1,150 uploads/sec
Downloads per second:   1B * 0.4 / 86,400 ~= 4,600 downloads/sec

Storage

Total logical storage:  100B files * 500 KB = 50 PB
With 3x replication:    150 PB raw storage
Deduplication savings:  ~30% (common files shared across users)
Effective storage:      ~105 PB
Daily new data:         1B ops * 0.1 uploads * 500 KB = 50 TB/day

Bandwidth

Upload bandwidth:       1,150/sec * 500 KB = 575 MB/sec
Download bandwidth:     4,600/sec * 500 KB = 2.3 GB/sec
Peak (3x):              ~7 GB/sec download
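
As a sanity check, the figures above can be reproduced with a few lines of arithmetic. The 10% upload / 40% download operation mix is the assumption already baked into the estimates:

```python
# Back-of-envelope check of the capacity figures above.
OPS_PER_DAY = 1_000_000_000
TOTAL_FILES = 100_000_000_000
AVG_FILE_KB = 500
SECONDS_PER_DAY = 86_400

uploads_per_sec = OPS_PER_DAY * 0.1 / SECONDS_PER_DAY    # 10% of ops are uploads
downloads_per_sec = OPS_PER_DAY * 0.4 / SECONDS_PER_DAY  # 40% are downloads

logical_pb = TOTAL_FILES * AVG_FILE_KB / 1e12  # KB -> PB (decimal units)
raw_pb = logical_pb * 3                        # 3x replication
effective_pb = raw_pb * 0.7                    # ~30% dedup savings

upload_mb_per_sec = uploads_per_sec * AVG_FILE_KB / 1000
download_gb_per_sec = downloads_per_sec * AVG_FILE_KB / 1e6
daily_new_tb = OPS_PER_DAY * 0.1 * AVG_FILE_KB / 1e9
```

The text rounds slightly (1,150 vs. the computed ~1,157 uploads/sec); the orders of magnitude are what matter.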

3. High-Level Architecture

+----------+      +-------------------+
| Client   |----->| API Gateway       |
| (Desktop |      | + Load Balancer   |
|  App /   |      +---------+---------+
|  Mobile) |                |
+----+-----+    +-----------+-----------+
     |          |                       |
     |   +------v------+       +-------v-------+
     |   | Metadata    |       | Sync Service  |
     |   | Service     |       | (WebSocket)   |
     |   +------+------+       +-------+-------+
     |          |                      |
     |   +------v------+       +-------v-------+
     |   | Metadata DB |       | Notification  |
     |   | (PostgreSQL)|       | Queue (Kafka) |
     |   +-------------+       +---------------+
     |
     |   +--------------+      +---------------+
     +-->| Upload       |      | Block/Chunk   |
     |   | Service      |----->| Storage       |
     |   +--------------+      | (S3)          |
     |                         +---------------+
     |   +--------------+
     +-->| Download     |      +---------------+
         | Service      |----->| CDN           |
         +--------------+      +---------------+

     +----------------+        +---------------+
     | Sharing        |        | Versioning    |
     | Service        |        | Service       |
     +----------------+        +---------------+

4. API Design

POST /api/v1/files/upload/init
  Headers: Authorization: Bearer <token>
  Body: {
    "filename": "report.pdf",
    "file_size": 10485760,
    "parent_folder_id": "folder_123",
    "content_hash": "sha256:abc123..."
  }
  Response 200: {
    "upload_id": "up_456",
    "chunk_size": 4194304,
    "total_chunks": 3,
    "chunks_to_upload": [0, 1, 2],   // may be fewer with dedup
    "upload_urls": [
      { "chunk": 0, "url": "https://upload.example.com/..." },
      { "chunk": 1, "url": "https://upload.example.com/..." },
      { "chunk": 2, "url": "https://upload.example.com/..." }
    ]
  }

PUT /api/v1/files/upload/{upload_id}/chunk/{chunk_number}
  Body: Binary chunk data
  Headers: Content-SHA256: <chunk_hash>
  Response 200: { "chunk": 0, "status": "received", "verified": true }

POST /api/v1/files/upload/{upload_id}/complete
  Response 201: {
    "file_id": "file_789",
    "version": 1,
    "created_at": "2026-04-11T10:00:00Z"
  }

GET /api/v1/files/{file_id}/download
  Response 302: Redirect to presigned CDN/S3 URL

GET /api/v1/files/{file_id}/versions
  Response 200: {
    "versions": [
      { "version": 3, "modified_at": "...", "size": 10485760, "modified_by": "..." },
      { "version": 2, "modified_at": "...", "size": 10400000, "modified_by": "..." },
      { "version": 1, "modified_at": "...", "size": 9800000, "modified_by": "..." }
    ]
  }

POST /api/v1/files/{file_id}/share
  Body: {
    "shared_with": [
      { "user_id": "user_42", "permission": "editor" },
      { "email": "jane@example.com", "permission": "viewer" }
    ],
    "link_sharing": { "enabled": true, "permission": "viewer" }
  }
  Response 200: {
    "share_link": "https://drive.example.com/s/Xk9mP2"
  }

GET /api/v1/sync/changes?cursor={cursor}
  Response 200: {
    "changes": [
      { "type": "modified", "file_id": "f_1", "version": 3, "timestamp": "..." },
      { "type": "created", "file_id": "f_2", "version": 1, "timestamp": "..." },
      { "type": "deleted", "file_id": "f_3", "timestamp": "..." }
    ],
    "cursor": "new_cursor_value",
    "has_more": false
  }

5. Database Schema

File Metadata (PostgreSQL)

CREATE TABLE files (
    file_id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    owner_id        UUID NOT NULL REFERENCES users(user_id),
    parent_folder_id UUID REFERENCES files(file_id),
    name            VARCHAR(500) NOT NULL,
    is_folder       BOOLEAN DEFAULT FALSE,
    size_bytes      BIGINT DEFAULT 0,
    mime_type       VARCHAR(200),
    current_version INTEGER DEFAULT 1,
    content_hash    VARCHAR(64),
    is_deleted      BOOLEAN DEFAULT FALSE,
    deleted_at      TIMESTAMP,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (parent_folder_id, name, owner_id)
);
-- Note: PostgreSQL treats NULLs as distinct, so this constraint does not
-- deduplicate root-level items (NULL parent_folder_id). Use a sentinel root
-- folder row, or enforce root-level name uniqueness at the application layer.

CREATE INDEX idx_files_parent ON files(parent_folder_id) WHERE NOT is_deleted;
CREATE INDEX idx_files_owner ON files(owner_id);

File Versions (PostgreSQL)

CREATE TABLE file_versions (
    version_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_id         UUID NOT NULL REFERENCES files(file_id),
    version_number  INTEGER NOT NULL,
    size_bytes      BIGINT NOT NULL,
    content_hash    VARCHAR(64) NOT NULL,
    modified_by     UUID NOT NULL REFERENCES users(user_id),
    chunk_ids       UUID[] NOT NULL,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (file_id, version_number)
);

Chunks (PostgreSQL)

CREATE TABLE chunks (
    chunk_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    chunk_hash      VARCHAR(64) UNIQUE NOT NULL,
    size_bytes      INTEGER NOT NULL,
    storage_path    VARCHAR(500) NOT NULL,
    reference_count INTEGER DEFAULT 1,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- reference_count tracks how many file versions use this chunk
-- When count reaches 0, chunk can be garbage collected

Sharing Permissions (PostgreSQL)

CREATE TABLE file_shares (
    share_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_id         UUID NOT NULL REFERENCES files(file_id),
    shared_with_id  UUID REFERENCES users(user_id),  -- null for link sharing
    permission      VARCHAR(20) NOT NULL,             -- 'viewer','editor','owner'
    share_link      VARCHAR(50) UNIQUE,
    link_enabled    BOOLEAN DEFAULT FALSE,
    created_by      UUID NOT NULL REFERENCES users(user_id),
    expires_at      TIMESTAMP,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Sync Log (PostgreSQL -- append-only)

CREATE TABLE sync_log (
    log_id          BIGSERIAL PRIMARY KEY,
    user_id         UUID NOT NULL,
    file_id         UUID NOT NULL,
    operation       VARCHAR(20) NOT NULL,   -- 'create','update','delete','move'
    version_number  INTEGER,
    timestamp       TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_sync_user_ts ON sync_log(user_id, log_id);
-- Cursor-based sync: client stores last log_id they processed

6. Deep Dive: Chunking and Deduplication

File Chunking Strategy

File: report.pdf (10 MB)
Chunk size: 4 MB

+------------------+------------------+------------------+
| Chunk 0          | Chunk 1          | Chunk 2          |
| 4 MB             | 4 MB             | 2 MB             |
| hash: abc123     | hash: def456     | hash: ghi789     |
+------------------+------------------+------------------+

Each chunk:
1. SHA-256 hash computed client-side
2. Client asks server: "Do you have chunk with hash abc123?"
3. If YES: skip upload (dedup hit!)
4. If NO: upload chunk

Benefits:
  - Resumable uploads (re-upload only missing chunks)
  - Deduplication (identical chunks stored once)
  - Delta sync (only changed chunks re-uploaded)
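
The dedup handshake in steps 1-4 can be sketched as follows; `split_and_hash` and `chunks_to_upload` are illustrative names, and a tiny chunk size is used so the example stays readable:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB in production; smaller in examples

def split_and_hash(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Client side: split into fixed-size chunks and SHA-256 each one."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]

def chunks_to_upload(hashed_chunks, server_hashes: set):
    """Server answers: which of these hashes are new? Only those get uploaded."""
    return [(h, c) for h, c in hashed_chunks if h not in server_hashes]
```

If the server already holds a chunk with the same hash, the client skips that upload entirely, which is both the dedup hit and the resumable-upload mechanism.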

Content-Defined Chunking (Rabin Fingerprint)

Fixed-size chunking problem:
  Inserting 1 byte at the start shifts ALL chunk boundaries.
  Every chunk hash changes. Zero dedup benefit.

  Original:  [AAAA][BBBB][CCCC][DDDD]
  +1 byte:   [xAAA][ABBB][BCCC][CDDD][D]  <- all chunks changed!

Content-defined chunking (CDC):
  Use rolling hash to find natural chunk boundaries.
  Boundary = where hash(rolling_window) % target_size == magic_value

  Original:  [AAA|BBBBB|CCC|DDDD]      <- boundaries based on content
  +1 byte:   [xAAA|BBBBB|CCC|DDDD]     <- only first chunk changed!

Result: ~90% of chunks remain identical after small edits.
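
A minimal sketch of CDC with a gear-style rolling hash. The gear table, the 10-bit mask (~1 KB average chunks), and the missing min/max chunk-size bounds are all simplifications; production chunkers tune all three:

```python
import random

# Hypothetical gear table; real implementations use a fixed, well-mixed table.
random.seed(42)
GEAR = [random.getrandbits(32) for _ in range(256)]
MASK = (1 << 10) - 1  # boundary roughly every 1 KB on average

def cdc_chunks(data: bytes) -> list:
    """Cut chunks where the rolling hash's low bits hit zero."""
    chunks, h, start = [], 0, 0
    for i, byte in enumerate(data):
        # The left shift pushes old bytes out of the masked low bits, so the
        # boundary decision depends only on a short trailing window of content.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Inserting a byte at the front disturbs at most the first chunk or two: the hash resynchronizes within the window, and every later boundary lands in the same place in the content.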

Deduplication Levels

Level 1: Whole-file dedup
  - Hash entire file content
  - If hash exists in system: create reference, skip upload
  - Savings: 10-15% (identical files across users)

Level 2: Chunk-level dedup
  - Hash each chunk independently
  - Common chunks shared across files and users
  - Savings: 25-35% (especially for similar documents)

Level 3: Cross-user dedup
  - Chunks shared globally, not per-user
  - reference_count tracks usage
  - Privacy consideration: only hash, not content, is compared
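
The reference-count bookkeeping behind global dedup can be sketched in memory; `ChunkStore` is an illustrative stand-in for the chunks table plus the object store:

```python
class ChunkStore:
    """In-memory sketch of global chunk dedup with reference counting."""

    def __init__(self):
        self.chunks = {}  # chunk_hash -> (data, reference_count)

    def put(self, chunk_hash: str, data: bytes) -> bool:
        """Return True if the chunk was new (upload needed), False on a dedup hit."""
        if chunk_hash in self.chunks:
            stored, refs = self.chunks[chunk_hash]
            self.chunks[chunk_hash] = (stored, refs + 1)
            return False
        self.chunks[chunk_hash] = (data, 1)
        return True

    def release(self, chunk_hash: str) -> None:
        """Drop one reference; at zero the chunk is garbage-collectable."""
        data, refs = self.chunks[chunk_hash]
        if refs == 1:
            del self.chunks[chunk_hash]
        else:
            self.chunks[chunk_hash] = (data, refs - 1)
```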

7. Deep Dive: Sync Protocol

Real-Time Sync Architecture

Device A           Sync Service         Notification       Device B
(editor)           (server)             Queue (Kafka)      (syncing)
   |                   |                    |                  |
   |-- File modified ->|                    |                  |
   |   (delta chunks)  |                    |                  |
   |                   |-- Update metadata  |                  |
   |                   |-- Store new chunks |                  |
   |                   |                    |                  |
   |                   |-- Publish change ->|                  |
   |                   |                    |-- Push via WS -->|
   |                   |                    |                  |
   |                   |                    |   "file_123      |
   |                   |                    |    version 3"    |
   |                   |                    |                  |
   |                   |<--- GET changes ---|<-- Client fetches|
   |                   |     since cursor   |   change details |
   |                   |                    |                  |
   |                   |--- Changed chunks->|                  |
   |                   |    (download only  |                  |
   |                   |     new chunks)    |                  |
   |                   |                    |    Apply changes |

Long Polling + WebSocket Hybrid

Primary: WebSocket connection for instant notification
Fallback: Long polling (for restrictive networks)
Backup: Periodic polling every 60 seconds (catch missed events)

Notification payload (lightweight -- just triggers sync):
{
  "type": "file_changed",
  "file_id": "file_123",
  "version": 3,
  "timestamp": 1681200000
}

Client then fetches full change details via REST API.
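
The cursor-based fetch that follows the notification can be sketched with in-memory stand-ins for the sync_log table and the GET /sync/changes endpoint; log_id doubles as the cursor, exactly as in the schema:

```python
# Illustrative in-memory change feed (stands in for sync_log + REST endpoint).
sync_log = [
    {"log_id": 1, "type": "created",  "file_id": "f_2", "version": 1},
    {"log_id": 2, "type": "modified", "file_id": "f_1", "version": 3},
    {"log_id": 3, "type": "deleted",  "file_id": "f_3"},
]

def get_changes(cursor: int, limit: int = 2) -> dict:
    """Server side: everything after `cursor`, oldest first, paginated."""
    batch = [e for e in sync_log if e["log_id"] > cursor][:limit]
    new_cursor = batch[-1]["log_id"] if batch else cursor
    return {"changes": batch, "cursor": new_cursor,
            "has_more": any(e["log_id"] > new_cursor for e in sync_log)}

def sync_all(cursor: int):
    """Client side: page through changes until the feed is drained."""
    applied = []
    while True:
        resp = get_changes(cursor)
        applied.extend(resp["changes"])
        cursor = resp["cursor"]
        if not resp["has_more"]:
            return applied, cursor
```

Because the client persists only its last cursor, a missed WebSocket push costs nothing: the next poll picks up every change since that cursor.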

Conflict Resolution

Scenario: User A and User B edit the same file simultaneously.

Timeline:
  T1: Both A and B have file at version 2
  T2: A saves changes -> version 3 created
  T3: B saves changes (based on version 2) -> CONFLICT

Resolution strategies:

1. Last-writer-wins (LWW):
   B's changes overwrite A's. A's changes lost.
   Simple but data loss risk.

2. Automatic merge (for supported formats):
   If changes are in different parts of the file, auto-merge.
   Works well for text files, not for binary.

3. Conflict copy (Dropbox approach -- recommended):
   Keep BOTH versions.
   B's file saved as "report (B's conflicting copy).pdf"
   User manually resolves.

4. Operational transform (Google Docs approach):
   Real-time character-level merging.
   Complex but seamless for collaborative editing.

Our approach: Conflict copy for general files,
              operational transform for document editing.

Conflict Detection:
  Client sends: { file_id, base_version: 2, new_content_hash: "xyz" }
  Server checks: current_version == 2?
    YES -> Accept, create version 3
    NO  -> Conflict! current_version is 3
           Create conflict copy, notify user
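
This version check is classic optimistic concurrency control; a minimal sketch, with `server_state` standing in for the metadata DB:

```python
def save_file(server_state: dict, file_id: str, base_version: int,
              new_hash: str) -> dict:
    """Accept the write only if the client based it on the current version."""
    current = server_state[file_id]["current_version"]
    if base_version == current:
        server_state[file_id]["current_version"] = current + 1
        server_state[file_id]["content_hash"] = new_hash
        return {"status": "ok", "version": current + 1}
    # Someone else committed first: keep both copies rather than lose data.
    return {"status": "conflict", "server_version": current,
            "conflict_copy": f"{file_id} (conflicting copy)"}
```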

8. Deep Dive: File Sharing

Permission Model

Permission hierarchy:
  Owner > Editor > Commenter > Viewer

  Owner:     Full control, manage sharing, delete
  Editor:    Read, write, rename, move (within shared folder)
  Commenter: Read, add comments
  Viewer:    Read only, download

Inheritance:
  Sharing a folder shares all contents recursively.
  Child files inherit parent folder permissions.
  Explicit permissions override inherited ones.

Permission Check (Recursive with Caching)

# Assumes a helper has_permission(granted, required) that compares levels
# in the hierarchy above (owner > editor > commenter > viewer).
def check_permission(user_id, file_id, required_permission):
    # 1. Cache first (avoids walking the folder tree on every request)
    cached = redis.get(f"perm:{user_id}:{file_id}")
    if cached is not None:
        return has_permission(cached, required_permission)

    # 2. Direct permission on this file
    direct = db.query("""
        SELECT permission FROM file_shares
        WHERE file_id = %s AND shared_with_id = %s
    """, file_id, user_id)

    if direct:
        redis.setex(f"perm:{user_id}:{file_id}", 3600, direct.permission)
        return has_permission(direct.permission, required_permission)

    # 3. Walk up to the parent folder (inherited permission)
    file = db.query("SELECT parent_folder_id FROM files WHERE file_id = %s", file_id)
    if file.parent_folder_id:
        return check_permission(user_id, file.parent_folder_id, required_permission)

    return False  # no direct or inherited permission

Public Link Sharing

Share link: https://drive.example.com/s/Xk9mP2

Server:
1. Look up share_link "Xk9mP2" in file_shares table
2. Check: is link_enabled?
3. Check: has link expired?
4. Check: permission level on link
5. Serve file according to the link's permission level (viewer = read-only)

Link token: randomly generated, not guessable
Optional: password-protected links
Optional: expiration date on links
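
These checks can be sketched as follows; `secrets.token_urlsafe` gives an unguessable slug, and the `share` dict stands in for a file_shares row:

```python
import datetime
import secrets

def new_share_link() -> str:
    """Unguessable link token (an 'Xk9mP2'-style slug would work the same way)."""
    return secrets.token_urlsafe(8)

def resolve_link(share: dict, now: datetime.datetime):
    """Return the link's permission level, or None if access is denied."""
    if not share["link_enabled"]:
        return None
    if share["expires_at"] is not None and now >= share["expires_at"]:
        return None
    return share["permission"]
```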

9. Storage Architecture

Storage Tiers:

+------------------+     +------------------+     +------------------+
| Hot Storage      |     | Warm Storage     |     | Cold Storage     |
| (S3 Standard)    |     | (S3 IA)          |     | (S3 Glacier)     |
|                  |     |                  |     |                  |
| Recently active  |     | Accessed in last |     | No access in     |
| files (< 30 days)|     | 30-90 days       |     | 90+ days         |
| Lowest latency   |     | Slightly higher  |     | Minutes to hours |
|                  |     | retrieval cost   |     | retrieval time   |
+------------------+     +------------------+     +------------------+

Lifecycle policy:
  - Files not accessed for 30 days -> move to Warm
  - Files not accessed for 90 days -> move to Cold
  - Metadata always stays in hot storage (fast listing)
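
The lifecycle policy reduces to a simple rule over days since last access; tier names map to the S3 classes in the diagram:

```python
def storage_tier(days_since_access: int) -> str:
    """The lifecycle thresholds above (30 / 90 days) expressed as a rule."""
    if days_since_access < 30:
        return "hot"    # S3 Standard
    if days_since_access < 90:
        return "warm"   # S3 Infrequent Access
    return "cold"       # S3 Glacier
```

In practice this is usually configured declaratively as an S3 lifecycle policy rather than computed in application code.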

Data Durability

S3 provides 11 nines of durability (99.999999999%).

Additional measures:
  1. Cross-region replication for paid tiers
  2. Chunk-level checksums verified on read
  3. Background integrity scanner (weekly full scan)
  4. Versioning prevents accidental deletion (30-day trash)

10. Scaling Considerations

Metadata Service Scaling

PostgreSQL with read replicas:
  Primary: handles all writes (file creation, moves, renames)
  Replicas: handle reads (file listing, search, sync queries)
  
Sharding strategy: shard by user_id
  - All of a user's files on one shard
  - File listing never crosses shards
  - Sharing creates cross-shard references (resolved at app layer)

Upload/Download Scaling

Upload: Client -> Presigned URL -> Direct to S3 (bypasses our servers)
Download: Client -> CDN -> S3 (CDN absorbs repeat downloads)

Our servers only handle:
  - Generating presigned URLs
  - Updating metadata
  - Coordinating sync

This reduces bandwidth through our servers by 95%+.
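
The presigned-URL idea can be illustrated with a generic HMAC-signed URL. This is not real S3 Signature V4, just the same shape: sign server-side, verify statelessly, expire by timestamp (names and the secret are illustrative):

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # illustrative; S3 uses SigV4 credentials

def presign(path: str, expires_at: int) -> str:
    """Issue a URL that grants access to `path` until `expires_at` (epoch secs)."""
    msg = f"{path}:{expires_at}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"https://cdn.example.com{path}?expires={expires_at}&sig={sig}"

def verify(path: str, expires_at: int, sig: str, now: int) -> bool:
    """Stateless check: unexpired, and signature matches path + expiry."""
    if now >= expires_at:
        return False
    expected = hmac.new(SECRET, f"{path}:{expires_at}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

The key property: the storage edge can verify the URL without calling back to the application servers, so file bytes never flow through them.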

Sync Optimization

Debounce: Batch rapid changes (e.g., auto-save every 2 seconds)
           Wait 5 seconds of no changes before syncing.

Delta compression: Only upload changed chunks, not entire file.
           Average file edit touches < 10% of chunks.

Bandwidth throttling: Limit sync bandwidth on metered connections.
           User-configurable in settings.
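
Debouncing reduces to "fire a sync once no change has arrived for the quiet period"; a sketch over a list of change timestamps, using the 5-second window from above:

```python
QUIET_PERIOD = 5.0  # seconds of silence before syncing (figure from the text)

def sync_points(event_times: list) -> list:
    """Given sorted change timestamps, return the times at which a sync fires."""
    fires = []
    for i, t in enumerate(event_times):
        nxt = event_times[i + 1] if i + 1 < len(event_times) else None
        # Sync fires once the quiet period after this event passes uninterrupted.
        if nxt is None or nxt - t >= QUIET_PERIOD:
            fires.append(t + QUIET_PERIOD)
    return fires
```

A burst of auto-saves thus collapses into a single sync, instead of one upload per keystroke.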

11. Key Tradeoffs

Decision              Option A             Option B                 Our Choice
------------------------------------------------------------------------------
Chunking method       Fixed-size           Content-defined (CDC)    CDC
Conflict resolution   Last-writer-wins     Conflict copy            Conflict copy
Sync notification     Polling              WebSocket                WS + polling
Deduplication scope   Per-user             Global                   Global
File storage          Own storage cluster  Cloud object store       S3
Permission check      Compute every time   Cache with invalidation  Cached
Versioning retention  Unlimited            30 days / 100 versions   30 days

12. Failure Scenarios and Mitigations

Scenario                          Mitigation
------------------------------------------------------------------------
Upload interrupted                Resumable chunked upload; retry missing chunks
Sync conflict                     Conflict copy created; user notified
Metadata DB failure               Failover to replica; promote to primary
S3 regional outage                Cross-region replication for paid users
WebSocket disconnect              Auto-reconnect; periodic polling as backup
Chunk corruption                  SHA-256 checksum validation on read; re-fetch
Accidental file deletion          30-day trash with restore capability
Storage quota exceeded            Block new uploads; notify user; grace period

Key Takeaways

  1. Content-defined chunking (Rabin fingerprint) is critical for efficient delta sync -- without it, a 1-byte insert invalidates every chunk.
  2. Deduplication at the chunk level saves 25-35% storage and dramatically reduces upload time when files are similar.
  3. Conflict copy is the safest default for file sync -- data loss from last-writer-wins is unacceptable for user files.
  4. Direct-to-S3 uploads with presigned URLs keep bandwidth off application servers, which only handle lightweight metadata operations.
  5. The sync log with cursor-based polling is the backbone of multi-device sync -- it gives clients at-least-once delivery of change events, which they apply idempotently by comparing version numbers.