Episode 9 — System Design / 9.11 — Real World System Design Problems
9.11.a Design a URL Shortener (TinyURL / Bitly)
Problem Statement
Design a URL shortening service like TinyURL or Bitly that takes long URLs and generates short, unique aliases. Users can click the short URL to be redirected to the original.
1. Requirements
Functional Requirements
- Given a long URL, generate a short unique URL
- When users access the short URL, redirect to the original URL
- Users can optionally pick a custom short URL
- URLs expire after a configurable time (default: 5 years)
- Analytics: track click counts, referrer, geo-location
Non-Functional Requirements
- Very low latency for redirection (< 50ms)
- High availability (99.99% uptime)
- Short URLs should not be predictable (security)
- System is read-heavy (100:1 read to write ratio)
2. Capacity Estimation
Traffic
New URLs per month: 100 million
Reads per month: 10 billion (100:1 ratio)
Writes per second: 100M / (30 * 24 * 3600) ~= 40 URLs/sec
Reads per second: 40 * 100 = 4,000 redirections/sec
Peak reads per second: ~20,000 redirections/sec (assuming ~5x average during spikes)
Storage (5-year horizon)
Total URLs: 100M * 12 * 5 = 6 billion URLs
Average URL size: 500 bytes (original URL) + 100 bytes (metadata)
Total storage: 6B * 600 bytes = 3.6 TB
Bandwidth
Write bandwidth: 40 * 600 bytes = 24 KB/s
Read bandwidth: 4,000 * 600 bytes = 2.4 MB/s
Memory (caching hot URLs -- 80/20 rule)
Daily reads: 4,000 * 86,400 = ~345 million/day
Cache 20% of daily: 345M * 0.2 * 600 bytes = ~41 GB
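The estimates above are straightforward arithmetic; this Python sketch simply restates them (the 100:1 ratio, 600-byte record size, and 80/20 caching rule are the assumptions stated in this section):

```python
# Back-of-envelope numbers for the URL shortener (stated assumptions:
# 100M new URLs/month, 100:1 read/write ratio, ~600 bytes per record).
SECONDS_PER_MONTH = 30 * 24 * 3600            # ~2.59 million

new_urls_per_month = 100_000_000
writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH   # ~38.6 -> round to 40
reads_per_sec = writes_per_sec * 100                      # ~3,860 -> round to 4,000

record_bytes = 500 + 100                                  # original URL + metadata
total_urls = new_urls_per_month * 12 * 5                  # 6 billion over 5 years
storage_tb = total_urls * record_bytes / 1e12             # 3.6 TB

daily_reads = 4_000 * 86_400                              # ~345.6 million/day
cache_gb = daily_reads * 0.2 * record_bytes / 1e9         # ~41.5 GB (80/20 rule)
```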
3. High-Level Architecture
+-------------------+
| Load Balancer |
+--------+----------+
|
+-------------+-------------+
| |
+------+------+ +-------+-------+
| Write API | | Read/Redirect |
| Service | | Service |
+------+------+ +-------+-------+
| |
| +------+------+
| | Cache |
| | (Redis) |
| +------+------+
| |
+-----+--------------------------+-----+
| Database (Primary) |
| + Read Replicas |
+-----+-------------------+------------+
| |
+-----+-----+ +------+------+
| Analytics | | Expiration |
| Service | | Service |
+------------+ +-------------+
4. API Design
REST Endpoints
POST /api/v1/urls
Headers: Authorization: Bearer <token>
Body: {
"long_url": "https://example.com/very/long/path?query=params",
"custom_alias": "my-link", // optional
"expiration_date": "2031-01-01" // optional
}
Response 201: {
"short_url": "https://short.ly/ab3Kx9",
"long_url": "https://example.com/very/long/path?query=params",
"created_at": "2026-04-11T10:00:00Z",
"expires_at": "2031-01-01T00:00:00Z"
}
GET /{short_code}
Response 301/302: Redirect to original URL
(301 = permanent, 302 = temporary; use 302 for analytics accuracy)
GET /api/v1/urls/{short_code}/stats
Headers: Authorization: Bearer <token>
Response 200: {
"short_url": "https://short.ly/ab3Kx9",
"total_clicks": 15230,
"clicks_by_day": [...],
"top_referrers": [...],
"top_countries": [...]
}
DELETE /api/v1/urls/{short_code}
Headers: Authorization: Bearer <token>
Response 204: No Content
5. Encoding Algorithm: Generating Short URLs
Option A: Base62 Encoding of Auto-Increment ID
Characters: [a-z, A-Z, 0-9] = 62 characters
Length 6: 62^6 = ~56.8 billion unique URLs
Length 7: 62^7 = ~3.5 trillion unique URLs
Even 6 characters (~56.8 billion codes) would cover our 6 billion URL requirement; 7 characters gives ample headroom.
Process:
1. Insert URL into DB, get auto-increment ID
2. Convert ID to base62
Example: ID = 125348721 (alphabet [a-z, A-Z, 0-9], so a=0 ... z=25, A=26 ... Z=51, 0=52 ... 9=61)
125348721 / 62 = 2021753 remainder 35 -> 'J'
2021753 / 62 = 32608 remainder 57 -> '5'
32608 / 62 = 525 remainder 58 -> '6'
525 / 62 = 8 remainder 29 -> 'D'
8 / 62 = 0 remainder 8 -> 'i'
Reading remainders from last to first: Short URL = "iD65J"
Pros: Simple, no collisions, deterministic. Cons: Predictable (sequential IDs), single point of failure for ID generation.
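The division loop translates directly to code. A minimal sketch using the [a-z, A-Z, 0-9] alphabet in the order listed above:

```python
# Base62 alphabet in the order given above: a-z, then A-Z, then 0-9.
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def encode_base62(n: int) -> str:
    """Convert a non-negative auto-increment ID into a base62 short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])   # remainders come out least-significant first
    return "".join(reversed(digits))   # so reverse to read most-significant first
```

For ID 125348721 this yields "iD65J".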
Option B: MD5/SHA256 Hash + Base62
1. Hash the long URL with MD5 (or SHA-256) to get a fixed-size digest
2. Base62-encode the digest and take the first 7 characters
3. Check the DB for a collision; on collision, append a counter to the URL and rehash
Pros: Not sequential, URL-dependent. Cons: Potential collisions, extra DB lookups.
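A minimal sketch of the hash-based option, with a plain dict standing in for the DB collision check; it interprets the MD5 digest as an integer and takes 7 base62 digits:

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def _digest_to_base62(digest: bytes, length: int = 7) -> str:
    # Interpret the 128-bit digest as an integer, take `length` base62 digits.
    n = int.from_bytes(digest, "big")
    out = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(out)

def shorten(long_url: str, db: dict) -> str:
    """Hash, take 7 base62 chars, retry on collision.

    `db` (a dict here) stands in for the short_code -> original_url table.
    """
    counter = 0
    while True:
        key = long_url if counter == 0 else f"{long_url}#{counter}"
        code = _digest_to_base62(hashlib.md5(key.encode()).digest())
        if db.get(code) in (None, long_url):   # free slot, or same URL already stored
            db[code] = long_url
            return code
        counter += 1   # collision with a different URL: append counter and rehash
```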
Option C: Pre-Generated Key Service (Recommended)
+------------------+ +-------------------+
| Key Generation | -------> | Key Database |
| Service (KGS) | | (unused/used keys)|
+------------------+ +-------------------+
|
| Pre-fetch batch of keys
v
+------------------+
| Application |
| Server (in-mem) |
+------------------+
1. KGS pre-generates millions of unique 7-char base62 strings
2. Stores them in a "keys" table with status: unused/used
3. Application servers request a batch (e.g., 1,000 keys) in advance
4. On URL creation, pick next unused key from local batch
5. When batch runs low, fetch another batch from KGS
Pros: No collision, no real-time computation, horizontally scalable. Cons: Key wastage if server crashes with unused keys (acceptable).
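Steps 3-5 on the application-server side can be sketched as a small key-buffer client; `fetch_batch` here is a hypothetical stand-in for the real KGS call that atomically marks a batch of unused keys as used and returns them:

```python
from collections import deque

class KeyBatchClient:
    """App-server-side buffer of pre-fetched keys (steps 3-5 above).

    `fetch_batch(n)` is a hypothetical stand-in for the real KGS call that
    atomically marks n unused keys as used and returns them.
    """

    def __init__(self, fetch_batch, batch_size: int = 1000, low_water: int = 100):
        self._fetch = fetch_batch
        self._batch_size = batch_size
        self._low_water = low_water
        self._keys = deque()

    def next_key(self) -> str:
        # Refill before the buffer runs dry so URL creation never blocks on KGS.
        if len(self._keys) <= self._low_water:
            self._keys.extend(self._fetch(self._batch_size))
        return self._keys.popleft()
```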
6. Database Schema
URL Table
CREATE TABLE urls (
  id BIGSERIAL PRIMARY KEY,
  short_code VARCHAR(7) UNIQUE NOT NULL,
  original_url VARCHAR(2048) NOT NULL,
  user_id BIGINT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP,
  click_count BIGINT DEFAULT 0,
  is_active BOOLEAN DEFAULT TRUE
);
-- Indexes (PostgreSQL syntax; the UNIQUE constraint already indexes short_code)
CREATE INDEX idx_user_id ON urls(user_id);
CREATE INDEX idx_expires_at ON urls(expires_at) WHERE is_active = TRUE;
Analytics Table
CREATE TABLE click_events (
  id BIGSERIAL PRIMARY KEY,
short_code VARCHAR(7) NOT NULL,
clicked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
referrer VARCHAR(2048),
user_agent VARCHAR(512),
ip_address VARCHAR(45),
country VARCHAR(2),
device_type VARCHAR(20)
);
-- Partitioned by date for efficient querying
-- Index on short_code + clicked_at for analytics queries
Database Choice
The URL mapping is a simple key-value lookup at massive read throughput, which favors NoSQL stores:
| Option | Reason |
|---|---|
| DynamoDB | Auto-scaling, single-digit-ms latency, fully managed |
| Cassandra | High write throughput, linear scalability |
| PostgreSQL | ACID and strong consistency, if relational guarantees are needed |
Recommendation: Use a NoSQL store (DynamoDB or Cassandra) for the URL mapping and a separate analytics store (ClickHouse or Cassandra) for events.
7. Deep Dive: Redirection Flow
Client LB Read Service Cache DB
| | | | |
|--- GET /ab3Kx9 --->| | | |
| |--- forward ------->| | |
| | |--- GET ab3Kx9 ---->| |
| | | | |
| | |<-- cache HIT ------| |
| | | (original_url) | |
| | | | |
| |<--- 302 Redirect --| | |
|<--- 302 ----------| | | |
| Location: https://example.com/... | | |
| | | | |
| | (async) | | |
| | |--- log click ----->| Analytics |
| | | (message queue) | Service |
301 vs 302 Redirect
301 Moved Permanently:
- Browser caches the redirect
- Fewer requests to our servers
- PROBLEM: We lose analytics data on subsequent visits
302 Found (Temporary):
- Browser does NOT cache
- Every click hits our server
- We get accurate analytics
Decision: Use 302 if analytics matter, 301 if reducing load matters.
Bitly uses 301. Most analytics-focused services use 302.
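The redirect decision itself is framework-independent; this sketch models it as a pure function, with a dict standing in for the cache/DB lookup and 302 chosen per the decision above (a real service would return this as an HTTP response with a Location header and enqueue the click event asynchronously):

```python
def resolve_redirect(short_code: str, store: dict):
    """Return (status, location) for a redirect request.

    `store` stands in for the cache/DB lookup. 302 is used so every click
    reaches the server and analytics stay accurate.
    """
    long_url = store.get(short_code)
    if long_url is None:
        return 404, None          # unknown or expired code
    return 302, long_url          # temporary redirect: browser will not cache
```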
8. Deep Dive: Caching Strategy
Cache Architecture:
+--------------------------------------------------+
| Cache Layer |
| |
| +----------+ +----------+ +----------+ |
| | Redis-1 | | Redis-2 | | Redis-3 | |
| | (shard) | | (shard) | | (shard) | |
| +----------+ +----------+ +----------+ |
| |
| Consistent hashing on short_code |
| TTL: 24 hours standard; hot entries are pinned   |
+--------------------------------------------------+
Eviction Policy
- LRU (Least Recently Used) eviction
- Hot URLs (>100 clicks/hour) get pinned
- 80/20 rule: 20% of URLs get 80% of traffic
- Cache hit ratio target: > 90%
Cache Warming
On startup:
1. Load top 1,000 URLs by click count from DB
2. Pre-populate cache
3. Set longer TTL for these entries
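The read path above is a classic cache-aside pattern. A minimal sketch, with plain dicts standing in for Redis and the database, using the 24-hour standard TTL from the diagram:

```python
import time

CACHE_TTL_SECONDS = 24 * 3600   # standard TTL from the strategy above

def get_long_url(short_code, cache, db, now=time.time):
    """Cache-aside lookup: try the cache first, fall back to the DB and populate.

    `cache` maps short_code -> (long_url, expires_at); `db` is a plain dict
    standing in for the URL table. `now` is injectable for testing.
    """
    entry = cache.get(short_code)
    if entry is not None and entry[1] > now():
        return entry[0]                       # cache hit
    long_url = db.get(short_code)             # cache miss: read the DB
    if long_url is not None:
        cache[short_code] = (long_url, now() + CACHE_TTL_SECONDS)
    return long_url
```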
9. Deep Dive: Analytics Pipeline
Click Event --> Kafka --> Stream Processor --> ClickHouse
(Flink/Spark) (Analytics DB)
|
v
Real-time Counter
(Redis HyperLogLog
for unique visitors)
Unique Visitor Counting with HyperLogLog
PFADD short:ab3Kx9:visitors <user_fingerprint>
PFCOUNT short:ab3Kx9:visitors
-- Returns approximate unique count with < 1% error
-- Uses only 12 KB per counter regardless of cardinality
10. Scaling Considerations
Database Sharding
Strategy: Hash-based sharding on short_code
Shard key = hash(short_code) % num_shards
Shard 0: short_codes where hash % 4 == 0
Shard 1: short_codes where hash % 4 == 1
Shard 2: short_codes where hash % 4 == 2
Shard 3: short_codes where hash % 4 == 3
Each shard has its own primary + replicas.
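The shard-routing function is a one-liner, with one practical caveat: Python's built-in hash() is randomized per process for strings, so shard keys need a stable hash (MD5 here, but any stable hash works). NUM_SHARDS = 4 matches the example above:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(short_code: str) -> int:
    """Hash-based shard selection: shard = hash(short_code) % num_shards.

    Uses MD5 rather than Python's built-in hash(), which is randomized
    per process for strings and therefore unusable as a shard key.
    """
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```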
Read Replicas
Primary DB: handles all writes (~40 writes/sec -- trivial)
Read replicas: handle redirections (~4,000 reads/sec)
Replicas per shard: 3 (for redundancy and load distribution)
Rate Limiting
- Limit URL creation: 100 URLs per hour per user
- Limit redirections: 1,000 per minute per IP (prevent abuse)
- Use token bucket algorithm at the API gateway
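The token bucket mentioned above can be sketched in a few lines. This is an in-process version for illustration; a gateway limiter would typically keep the counters in Redis per user or per IP, and the injectable clock here exists only for testability:

```python
import time

class TokenBucket:
    """Token bucket: refills `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full: allow an initial burst
        self.now = now
        self.last = now()

    def allow(self) -> bool:
        # Refill lazily based on elapsed time, then try to spend one token.
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```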
Geographic Distribution
+------------------+ +------------------+ +------------------+
| US-East Region | | EU-West Region | | AP-South Region |
| - App Servers | | - App Servers | | - App Servers |
| - Redis Cache | | - Redis Cache | | - Redis Cache |
| - DB Replica | | - DB Replica | | - DB Replica |
+------------------+ +------------------+ +------------------+
\ | /
\ | /
+--------- Primary DB Cluster --------------+
11. Key Tradeoffs
| Decision | Option A | Option B | Our Choice |
|---|---|---|---|
| Short code generation | Hash-based | Pre-generated keys | Pre-gen |
| Redirect type | 301 (cached) | 302 (trackable) | 302 |
| Database | SQL (consistency) | NoSQL (scale) | NoSQL |
| Analytics | Sync (accurate) | Async (fast redirect) | Async |
| Custom aliases | Same table | Separate table | Same table |
| Expiration | Lazy deletion | Active cleanup | Both |
12. Failure Scenarios and Mitigations
Scenario Mitigation
------------------------------------------------------------------
KGS goes down Local buffer of pre-fetched keys
Cache failure Fall back to DB reads (higher latency)
DB primary fails Promote read replica to primary
Hash collision Retry with appended counter
Hot URL (viral) Dedicated cache entry, CDN caching
Key Takeaways
- Read-heavy systems benefit enormously from caching (90%+ hit rate achievable).
- Pre-generated keys avoid runtime collision handling and simplify the write path.
- 302 redirects are preferred when analytics matter, despite higher server load.
- Async analytics keeps the redirect path fast (< 10ms from cache).
- The URL shortener is an excellent "warm-up" problem -- simple but covers caching, hashing, DB choice, and scaling fundamentals.