Episode 9 — System Design / 9.11 — Real World System Design Problems
9.11.f Design an E-Commerce Platform (Amazon)
Problem Statement
Design a large-scale e-commerce platform supporting product catalog, search, shopping cart, checkout, payment processing, inventory management, and order fulfillment. The system must handle flash sales and holiday traffic spikes.
1. Requirements
Functional Requirements
- Product catalog with categories, search, and filtering
- Shopping cart (persistent across sessions)
- Checkout flow with address, shipping, payment
- Order placement and tracking
- Inventory management (real-time stock tracking)
- Seller portal for product listing management
- Reviews and ratings
- Wishlist and recommendations
Non-Functional Requirements
- Support 500 million products in catalog
- 100 million daily active users
- Handle 10x traffic during flash sales (e.g., Prime Day)
- 99.99% availability for checkout and payment
- Payment processing must be ACID-compliant
- Search latency: < 200ms
- Page load time: < 2 seconds
2. Capacity Estimation
Traffic
Daily active users: 100 million
Product views/day: 2 billion
Search queries/day: 500 million
Add-to-cart/day: 200 million
Orders/day: 10 million
Orders/second: 10M / 86,400 ~= 115 orders/sec
Peak orders/second: ~5,000 (flash sales)
Search queries/second: 500M / 86,400 ~= 5,800/sec
Storage
Products: 500 million * 10 KB avg = 5 TB (metadata)
Product images: 500M * 5 images * 500 KB = 1.25 PB
Orders (per year): 3.65 billion * 2 KB = 7.3 TB
User data: 500M users * 5 KB = 2.5 TB
Bandwidth
Product page (with images): Average 2 MB
Page views/sec: 2B / 86,400 ~= 23,000/sec
Bandwidth: 23,000 * 2 MB = 46 GB/sec (CDN-served)
3. High-Level Architecture
+----------+ +-------------------+
| Client |---->| API Gateway |
| (Web/ | | + Auth + Rate |
| Mobile)| | Limiting |
+----------+ +--------+----------+
|
+----------------+------------------+
| | |
+--------v------+ +-------v-------+ +-------v--------+
| Product | | Cart | | Order |
| Service | | Service | | Service |
+--------+------+ +-------+-------+ +-------+--------+
| | |
+--------v------+ +-------v-------+ +-------v--------+
| Product DB | | Cart Store | | Order DB |
| (PostgreSQL + | | (Redis) | | (PostgreSQL) |
| Elasticsearch)| +---------------+ +----------------+
+---------------+ |
+--------v---------+ +------------------+
| Checkout | | Inventory |
| Orchestrator |--->| Service |
+--------+---------+ +--------+---------+
| |
+--------v---------+ +--------v---------+
| Payment | | Inventory DB |
| Service | | (PostgreSQL) |
+--------+---------+ +------------------+
|
+--------v---------+ +------------------+
| Payment Gateway | | Notification |
| (Stripe/etc) | | Service |
+------------------+ +------------------+
+------------------+ +------------------+ +------------------+
| Search Service | | Recommendation | | Shipping/ |
| (Elasticsearch) | | Engine | | Fulfillment |
+------------------+ +------------------+ +------------------+
4. API Design
GET /api/v1/products?category=electronics&sort=price_asc&page=1&limit=20
Response 200: {
"products": [
{
"product_id": "prod_123",
"title": "Wireless Headphones",
"price": 79.99,
"currency": "USD",
"rating": 4.5,
"review_count": 2341,
"image_url": "https://cdn.example.com/products/prod_123/thumb.jpg",
"in_stock": true,
"seller": { "seller_id": "s_42", "name": "AudioTech" }
}
],
"total": 15420,
"page": 1,
"total_pages": 771
}
GET /api/v1/products/{product_id}
Response 200: Full product details with all images, specs, reviews
GET /api/v1/search?q=wireless+headphones&filters[price_min]=20&filters[brand]=Sony
Response 200: { "results": [...], "facets": { "brands": [...], "price_ranges": [...] } }
POST /api/v1/cart/items
Body: { "product_id": "prod_123", "quantity": 2 }
Response 200: Updated cart
GET /api/v1/cart
Response 200: { "items": [...], "subtotal": 159.98, "item_count": 2 }
DELETE /api/v1/cart/items/{product_id}
Response 200: Updated cart
POST /api/v1/orders/checkout
Body: {
"shipping_address_id": "addr_456",
"payment_method_id": "pm_789",
"shipping_method": "express"
}
Response 201: {
"order_id": "ord_abc",
"status": "confirmed",
"total": 172.97,
"estimated_delivery": "2026-04-14"
}
GET /api/v1/orders/{order_id}
Response 200: Order details with line items and tracking
5. Database Schema
Products (PostgreSQL)
CREATE TABLE products (
product_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
seller_id UUID NOT NULL REFERENCES sellers(seller_id),
title VARCHAR(500) NOT NULL,
description TEXT,
category_id UUID REFERENCES categories(category_id),
price DECIMAL(10, 2) NOT NULL,
currency VARCHAR(3) DEFAULT 'USD',
brand VARCHAR(200),
specifications JSONB,
image_urls TEXT[],
rating_avg DECIMAL(2, 1) DEFAULT 0,
review_count INTEGER DEFAULT 0,
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_products_category ON products(category_id, price);
CREATE INDEX idx_products_seller ON products(seller_id);
CREATE INDEX idx_products_brand ON products(brand);
Inventory (PostgreSQL -- strong consistency required)
CREATE TABLE inventory (
product_id UUID PRIMARY KEY REFERENCES products(product_id),
warehouse_id UUID NOT NULL,
quantity INTEGER NOT NULL DEFAULT 0,
reserved INTEGER NOT NULL DEFAULT 0,
available INTEGER GENERATED ALWAYS AS (quantity - reserved) STORED,
low_stock_threshold INTEGER DEFAULT 10,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
version INTEGER DEFAULT 0 -- optimistic locking
);
-- Critical: available = quantity - reserved
-- Use optimistic locking to prevent overselling
Orders (PostgreSQL)
CREATE TABLE orders (
order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(user_id),
status VARCHAR(30) NOT NULL DEFAULT 'pending',
subtotal DECIMAL(12, 2) NOT NULL,
shipping_cost DECIMAL(8, 2) DEFAULT 0,
tax DECIMAL(8, 2) DEFAULT 0,
total DECIMAL(12, 2) NOT NULL,
shipping_address JSONB NOT NULL,
payment_method_id VARCHAR(255),
payment_status VARCHAR(30) DEFAULT 'pending',
idempotency_key VARCHAR(255) UNIQUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_items (
order_item_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
order_id UUID NOT NULL REFERENCES orders(order_id),
product_id UUID NOT NULL,
seller_id UUID NOT NULL,
quantity INTEGER NOT NULL,
unit_price DECIMAL(10, 2) NOT NULL,
total_price DECIMAL(10, 2) NOT NULL
);
Cart (Redis)
Key: cart:{user_id}
Type: Hash
Fields: {product_id}: {quantity, added_at, price_at_add}
TTL: 30 days
Example:
HSET cart:u_42 prod_123 '{"qty":2,"price":79.99,"added":"2026-04-11"}'
HGETALL cart:u_42
6. Deep Dive: Checkout and Order Processing
SAGA Pattern for Distributed Transactions
Checkout is a multi-step process across multiple services.
We use the SAGA pattern (choreography-based) instead of 2PC.
Step 1: Reserve Inventory
Step 2: Process Payment
Step 3: Create Order
Step 4: Notify User
If any step fails, execute compensating actions:
Happy Path:
Reserve Inventory -> Process Payment -> Create Order -> Notify
OK OK OK OK
Failure at Payment:
Reserve Inventory -> Process Payment -> COMPENSATE: Release Inventory
OK FAIL
Orchestrator Flow:
Checkout Inventory Payment Order Notification
Orchestrator Service Service Service Service
| | | | |
|-- Reserve ---->| | | |
|<-- Reserved ---| | | |
| | | | |
|-- Charge ---------------------->| | |
|<-- Payment OK ------------------| | |
| | | | |
|-- Create Order --------------------------------->| |
|<-- Order Created --------------------------------| |
| | | | |
|-- Confirm Inventory -->| | | |
|<-- Confirmed ---------| | | |
| | | | |
|-- Send Notification ------------------------------------------>|
|<-- Sent ----------------------------------------------------------
Preventing Overselling (Optimistic Locking)
-- Reserve inventory for checkout
UPDATE inventory
SET reserved = reserved + :quantity,
version = version + 1,
updated_at = NOW()
WHERE product_id = :product_id
AND version = :current_version
AND (quantity - reserved) >= :quantity;
-- If 0 rows affected: either version conflict or insufficient stock
-- Retry with fresh version or return "out of stock"
Preventing Double Orders (Idempotency)
Client generates idempotency_key before checkout.
Server checks: if order with this key exists, return existing order.
POST /api/v1/orders/checkout
Headers: Idempotency-Key: "uuid-unique-per-attempt"
def checkout(request):
existing = orders.find_by_idempotency_key(request.idempotency_key)
if existing:
return existing # Idempotent: return same result
# ... proceed with new order
7. Deep Dive: Product Search
Elasticsearch Architecture
+-------------------+
| Product DB | CDC (Change Data Capture)
| (PostgreSQL) |------+
+-------------------+ |
v
+------+--------+
| Kafka |
| (product |
| changes) |
+------+--------+
|
+------v--------+
| ES Indexer |
| Service |
+------+--------+
|
+------------+------------+
| | |
+--------v---+ +-----v------+ +---v----------+
| ES Node 1 | | ES Node 2 | | ES Node 3 |
| (Primary) | | (Primary) | | (Primary) |
| Shard 0,3 | | Shard 1,4 | | Shard 2,5 |
+------------+ +------------+ +--------------+
Search Index Mapping
{
"mappings": {
"properties": {
"title": { "type": "text", "analyzer": "standard" },
"description": { "type": "text" },
"category": { "type": "keyword" },
"brand": { "type": "keyword" },
"price": { "type": "float" },
"rating": { "type": "float" },
"in_stock": { "type": "boolean" },
"tags": { "type": "keyword" },
"seller_id": { "type": "keyword" },
"created_at": { "type": "date" }
}
}
}
Search Query with Facets
{
"query": {
"bool": {
"must": [
{ "multi_match": {
"query": "wireless headphones",
"fields": ["title^3", "description", "tags"]
}}
],
"filter": [
{ "term": { "in_stock": true } },
{ "range": { "price": { "gte": 20, "lte": 200 } } }
]
}
},
"aggs": {
"brands": { "terms": { "field": "brand", "size": 20 } },
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{ "to": 25 }, { "from": 25, "to": 50 },
{ "from": 50, "to": 100 }, { "from": 100 }
]
}
}
},
"sort": [
{ "_score": "desc" },
{ "rating": "desc" }
]
}
8. Deep Dive: Handling Flash Sales
Problem: 10x normal traffic in seconds. 1 million users trying to buy 1,000 units.
Architecture for Flash Sales:
+-------------------+
| CDN (static pages)|
+--------+----------+
|
+--------v----------+
| Rate Limiter |
| (Token Bucket) |
+--------+----------+
|
+--------v----------+
| Queuing Layer |
| (Virtual Queue) |
+--------+----------+
|
+--------v----------+
| Order Processing |
| (limited workers) |
+--------+----------+
|
+--------v----------+
| Inventory Lock |
| (Redis Lua) |
+-------------------+
Virtual Queue System
1. User clicks "Buy Now" -> enters virtual queue
2. Queue position shown: "You are #4,523 in line"
3. Batches of 100 users released every second
4. Released users have 5-minute window to complete checkout
5. If they don't complete, slot released to next user
Redis implementation:
ZADD flash_queue:{sale_id} {timestamp} {user_id}
ZRANK flash_queue:{sale_id} {user_id} -> position
Redis-Based Inventory Lock for Flash Sales
-- Atomic decrement with check (Lua script)
local stock = tonumber(redis.call('GET', KEYS[1]))
if stock and stock > 0 then
redis.call('DECR', KEYS[1])
return 1 -- success
else
return 0 -- sold out
end
Pre-warming for Flash Sales
1. Pre-scale services 1 hour before sale
2. Pre-load product details into all cache layers
3. Pre-generate static product pages on CDN
4. Set up dedicated database connection pools
5. Enable circuit breakers on non-critical services
6. Disable non-essential features (recommendations, reviews)
9. Scaling Considerations
Service Isolation
Critical path (must not fail):
Cart -> Checkout -> Inventory -> Payment -> Order
Non-critical (graceful degradation):
Recommendations, Reviews, Wishlist, Analytics
Strategy: Circuit breakers isolate non-critical service failures
from affecting the checkout flow.
Database Strategy
Product Catalog: PostgreSQL (read replicas) + Elasticsearch (search)
Inventory: PostgreSQL (strong consistency, optimistic locking)
Orders: PostgreSQL (ACID required) sharded by user_id
Cart: Redis (speed, TTL-based expiration)
Sessions: Redis (fast access, TTL)
Analytics: ClickHouse (columnar, aggregations)
Caching Layers
Layer 1: CDN (product images, static pages)
Layer 2: Application cache (Redis -- product details, prices)
Layer 3: Database query cache (PostgreSQL shared buffers)
Cache invalidation:
- Price changes: immediate cache bust via pub/sub
- Product updates: CDC -> Kafka -> cache invalidation service
- Inventory: never cached (must be real-time for checkout)
10. Key Tradeoffs
| Decision | Option A | Option B | Our Choice |
|---|---|---|---|
| Transaction model | 2-Phase Commit | SAGA pattern | SAGA |
| Inventory lock | Pessimistic (DB lock) | Optimistic (version) | Optimistic |
| Cart storage | Database | Redis | Redis |
| Search | DB full-text search | Elasticsearch | Elasticsearch |
| Flash sale queue | First-come-first-served | Virtual queue | Virtual queue |
| Price consistency | Real-time everywhere | Eventual (cache TTL) | Eventual (5s) |
| Order ID generation | Auto-increment | UUID/Snowflake | Snowflake |
11. Failure Scenarios and Mitigations
Scenario Mitigation
------------------------------------------------------------------------
Payment succeeds, order DB fails SAGA compensating: refund payment
Inventory reserved, user abandons TTL on reservations (15 min auto-release)
Redis cart data loss Periodic backup to DB; rebuild from DB
Search index out of sync CDC pipeline ensures eventual consistency
Flash sale overselling Redis atomic decrement; DB-level constraint
Payment gateway timeout Retry with idempotency key; check status
Double charge Idempotency key on payment request
Key Takeaways
- SAGA pattern is the practical choice for distributed transactions in microservices -- 2PC is too slow and fragile at scale.
- Optimistic locking with version numbers prevents overselling while maintaining high throughput.
- Flash sales require special architecture -- virtual queues, pre-warming, and Redis-based inventory are essential.
- Search must be decoupled from the primary database -- Elasticsearch with CDC keeps search fast without impacting transactions.
- Cart in Redis provides sub-millisecond access and natural TTL expiration, but back it up to durable storage periodically.