Episode 9 — System Design / 9.11 — Real World System Design Problems

9.11.a Design a URL Shortener (TinyURL / Bitly)

Problem Statement

Design a URL shortening service like TinyURL or Bitly that takes long URLs and generates short, unique aliases. Users can click the short URL to be redirected to the original.


1. Requirements

Functional Requirements

  • Given a long URL, generate a short unique URL
  • When users access the short URL, redirect to the original URL
  • Users can optionally pick a custom short URL
  • URLs expire after a configurable time (default: 5 years)
  • Analytics: track click counts, referrer, geo-location

Non-Functional Requirements

  • Very low latency for redirection (< 50ms)
  • High availability (99.99% uptime)
  • Short URLs should not be predictable (security)
  • System is read-heavy (100:1 read to write ratio)

2. Capacity Estimation

Traffic

New URLs per month:     100 million
Reads per month:        10 billion (100:1 ratio)

Writes per second:      100M / (30 * 24 * 3600) ~= 40 URLs/sec
Reads per second:       40 * 100 = 4,000 redirections/sec
Peak reads per second:  ~20,000 redirections/sec

Storage (5-year horizon)

Total URLs:             100M * 12 * 5 = 6 billion URLs
Average URL size:       500 bytes (original URL) + 100 bytes (metadata)
Total storage:          6B * 600 bytes = 3.6 TB

Bandwidth

Write bandwidth:        40 * 600 bytes = 24 KB/s
Read bandwidth:         4,000 * 600 bytes = 2.4 MB/s

Memory (caching hot URLs -- 80/20 rule)

Daily reads:            4,000 * 86,400 = ~345 million/day
Cache 20% of daily:     345M * 0.2 * 600 bytes = ~41 GB

3. High-Level Architecture

                         +-------------------+
                         |   Load Balancer   |
                         +--------+----------+
                                  |
                    +-------------+-------------+
                    |                           |
             +------+------+           +-------+-------+
             | Write API   |           | Read/Redirect |
             | Service     |           | Service       |
             +------+------+           +-------+-------+
                    |                          |
                    |                   +------+------+
                    |                   |   Cache     |
                    |                   |  (Redis)    |
                    |                   +------+------+
                    |                          |
              +-----+--------------------------+-----+
              |         Database (Primary)           |
              |         + Read Replicas              |
              +-----+-------------------+------------+
                    |                   |
              +-----+-----+     +------+------+
              | Analytics  |     | Expiration  |
              | Service    |     | Service     |
              +------------+     +-------------+

4. API Design

REST Endpoints

POST /api/v1/urls
  Headers: Authorization: Bearer <token>
  Body: {
    "long_url": "https://example.com/very/long/path?query=params",
    "custom_alias": "my-link",       // optional
    "expiration_date": "2031-01-01"  // optional
  }
  Response 201: {
    "short_url": "https://short.ly/ab3Kx9",
    "long_url": "https://example.com/very/long/path?query=params",
    "created_at": "2026-04-11T10:00:00Z",
    "expires_at": "2031-01-01T00:00:00Z"
  }

GET /{short_code}
  Response 301/302: Redirect to original URL
  (301 = permanent, 302 = temporary; use 302 for analytics accuracy)

GET /api/v1/urls/{short_code}/stats
  Headers: Authorization: Bearer <token>
  Response 200: {
    "short_url": "https://short.ly/ab3Kx9",
    "total_clicks": 15230,
    "clicks_by_day": [...],
    "top_referrers": [...],
    "top_countries": [...]
  }

DELETE /api/v1/urls/{short_code}
  Headers: Authorization: Bearer <token>
  Response 204: No Content

5. Encoding Algorithm: Generating Short URLs

Option A: Base62 Encoding of Auto-Increment ID

Characters: [a-z, A-Z, 0-9] = 62 characters

Length 6: 62^6 = ~56.8 billion unique URLs
Length 7: 62^7 = ~3.5 trillion unique URLs

7 characters is sufficient for our 6 billion URL requirement.

Process:

1. Insert URL into DB, get auto-increment ID
2. Convert ID to base62

   Example: ID = 125348721
   
   125348721 / 62 = 2021753 remainder 55 -> 't'
   2021753   / 62 = 32609   remainder 15 -> 'p'
   32609     / 62 = 525     remainder 59 -> '7'
   525       / 62 = 8       remainder 29 -> 'D'
   8         / 62 = 0       remainder 8  -> 'i'
   
   Short URL: "iD7pt"

Pros: Simple, no collisions, deterministic. Cons: Predictable (sequential IDs), single point of failure for ID generation.

Option B: MD5/SHA256 Hash + Base62

1. Hash the long URL: MD5("https://example.com/...") = "5d41402abc4b..."
2. Take first 7 characters of base62-encoded hash
3. Check for collision; if collision, append counter and rehash

Pros: Not sequential, URL-dependent. Cons: Potential collisions, extra DB lookups.

Option C: Pre-Generated Key Service (Recommended)

+------------------+          +-------------------+
| Key Generation   | -------> | Key Database      |
| Service (KGS)    |          | (unused/used keys)|
+------------------+          +-------------------+
        |
        | Pre-fetch batch of keys
        v
+------------------+
| Application      |
| Server (in-mem)  |
+------------------+
1. KGS pre-generates millions of unique 7-char base62 strings
2. Stores them in a "keys" table with status: unused/used
3. Application servers request a batch (e.g., 1,000 keys) in advance
4. On URL creation, pick next unused key from local batch
5. When batch runs low, fetch another batch from KGS

Pros: No collision, no real-time computation, horizontally scalable. Cons: Key wastage if server crashes with unused keys (acceptable).


6. Database Schema

URL Table

CREATE TABLE urls (
    id            BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_code    VARCHAR(7) UNIQUE NOT NULL,
    original_url  VARCHAR(2048) NOT NULL,
    user_id       BIGINT,
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at    TIMESTAMP,
    click_count   BIGINT DEFAULT 0,
    is_active     BOOLEAN DEFAULT TRUE
);

-- Indexes
CREATE INDEX idx_short_code ON urls(short_code);
CREATE INDEX idx_user_id ON urls(user_id);
CREATE INDEX idx_expires_at ON urls(expires_at) WHERE is_active = TRUE;

Analytics Table

CREATE TABLE click_events (
    id            BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_code    VARCHAR(7) NOT NULL,
    clicked_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    referrer      VARCHAR(2048),
    user_agent    VARCHAR(512),
    ip_address    VARCHAR(45),
    country       VARCHAR(2),
    device_type   VARCHAR(20)
);

-- Partitioned by date for efficient querying
-- Index on short_code + clicked_at for analytics queries

Database Choice

OptionReason
NoSQLSimple key-value lookups, massive read throughput
DynamoDBAuto-scaling, single-digit ms latency, managed
CassandraHigh write throughput, linear scalability
PostgreSQLIf you need ACID for analytics, strong consistency

Recommendation: Use a NoSQL store (DynamoDB or Cassandra) for the URL mapping and a separate analytics store (ClickHouse or Cassandra) for events.


7. Deep Dive: Redirection Flow

Client                 LB              Read Service           Cache          DB
  |                    |                    |                    |             |
  |--- GET /ab3Kx9 --->|                    |                    |             |
  |                    |--- forward ------->|                    |             |
  |                    |                    |--- GET ab3Kx9 ---->|             |
  |                    |                    |                    |             |
  |                    |                    |<-- cache HIT ------|             |
  |                    |                    |    (original_url)  |             |
  |                    |                    |                    |             |
  |                    |<--- 302 Redirect --|                    |             |
  |<--- 302 ----------|                    |                    |             |
  |    Location: https://example.com/...   |                    |             |
  |                    |                    |                    |             |
  |                    |    (async)         |                    |             |
  |                    |                    |--- log click ----->| Analytics   |
  |                    |                    |    (message queue) | Service     |

301 vs 302 Redirect

301 Moved Permanently:
  - Browser caches the redirect
  - Fewer requests to our servers
  - PROBLEM: We lose analytics data on subsequent visits

302 Found (Temporary):
  - Browser does NOT cache
  - Every click hits our server
  - We get accurate analytics

Decision: Use 302 if analytics matter, 301 if reducing load matters.
         Bitly uses 301. Most analytics-focused services use 302.

8. Deep Dive: Caching Strategy

Cache Architecture:
+--------------------------------------------------+
|                  Cache Layer                      |
|                                                  |
|  +----------+  +----------+  +----------+        |
|  | Redis-1  |  | Redis-2  |  | Redis-3  |        |
|  | (shard)  |  | (shard)  |  | (shard)  |        |
|  +----------+  +----------+  +----------+        |
|                                                  |
|  Consistent hashing on short_code                |
|  TTL: 24 hours for standard, 1 hour for hot     |
+--------------------------------------------------+

Eviction Policy

- LRU (Least Recently Used) eviction
- Hot URLs (>100 clicks/hour) get pinned
- 80/20 rule: 20% of URLs get 80% of traffic
- Cache hit ratio target: > 90%

Cache Warming

On startup:
1. Load top 1,000 URLs by click count from DB
2. Pre-populate cache
3. Set longer TTL for these entries

9. Deep Dive: Analytics Pipeline

Click Event --> Kafka --> Stream Processor --> ClickHouse
                           (Flink/Spark)      (Analytics DB)
                               |
                               v
                         Real-time Counter
                          (Redis HyperLogLog
                           for unique visitors)

Unique Visitor Counting with HyperLogLog

PFADD short:ab3Kx9:visitors <user_fingerprint>
PFCOUNT short:ab3Kx9:visitors
-- Returns approximate unique count with < 1% error
-- Uses only 12 KB per counter regardless of cardinality

10. Scaling Considerations

Database Sharding

Strategy: Hash-based sharding on short_code

Shard key = hash(short_code) % num_shards

Shard 0: short_codes where hash % 4 == 0
Shard 1: short_codes where hash % 4 == 1
Shard 2: short_codes where hash % 4 == 2
Shard 3: short_codes where hash % 4 == 3

Each shard has its own primary + replicas.

Read Replicas

Primary DB: handles all writes (~40 writes/sec -- trivial)
Read replicas: handle redirections (~4,000 reads/sec)
Replicas per shard: 3 (for redundancy and load distribution)

Rate Limiting

- Limit URL creation: 100 URLs per hour per user
- Limit redirections: 1,000 per minute per IP (prevent abuse)
- Use token bucket algorithm at the API gateway

Geographic Distribution

+------------------+     +------------------+     +------------------+
|  US-East Region  |     |  EU-West Region  |     |  AP-South Region |
|  - App Servers   |     |  - App Servers   |     |  - App Servers   |
|  - Redis Cache   |     |  - Redis Cache   |     |  - Redis Cache   |
|  - DB Replica    |     |  - DB Replica    |     |  - DB Replica    |
+------------------+     +------------------+     +------------------+
         \                       |                       /
          \                      |                      /
           +--------- Primary DB Cluster --------------+

11. Key Tradeoffs

DecisionOption AOption BOur Choice
Short code generationHash-basedPre-generated keysPre-gen
Redirect type301 (cached)302 (trackable)302
DatabaseSQL (consistency)NoSQL (scale)NoSQL
AnalyticsSync (accurate)Async (fast redirect)Async
Custom aliasesSame tableSeparate tableSame table
ExpirationLazy deletionActive cleanupBoth

12. Failure Scenarios and Mitigations

Scenario                     Mitigation
------------------------------------------------------------------
KGS goes down                Local buffer of pre-fetched keys
Cache failure                Fall back to DB reads (higher latency)
DB primary fails             Promote read replica to primary
Hash collision               Retry with appended counter
Hot URL (viral)              Dedicated cache entry, CDN caching

Key Takeaways

  1. Read-heavy systems benefit enormously from caching (90%+ hit rate achievable).
  2. Pre-generated keys avoid runtime collision handling and simplify the write path.
  3. 302 redirects are preferred when analytics matter, despite higher server load.
  4. Async analytics keeps the redirect path fast (< 10ms from cache).
  5. The URL shortener is an excellent "warm-up" problem -- simple but covers caching, hashing, DB choice, and scaling fundamentals.