Episode 9 — System Design / 9.11 — Real World System Design Problems

9.11.f Design an E-Commerce Platform (Amazon)

Problem Statement

Design a large-scale e-commerce platform supporting product catalog, search, shopping cart, checkout, payment processing, inventory management, and order fulfillment. The system must handle flash sales and holiday traffic spikes.


1. Requirements

Functional Requirements

  • Product catalog with categories, search, and filtering
  • Shopping cart (persistent across sessions)
  • Checkout flow with address, shipping, payment
  • Order placement and tracking
  • Inventory management (real-time stock tracking)
  • Seller portal for product listing management
  • Reviews and ratings
  • Wishlist and recommendations

Non-Functional Requirements

  • Support 500 million products in catalog
  • 100 million daily active users
  • Handle 10x traffic during flash sales (e.g., Prime Day)
  • 99.99% availability for checkout and payment
  • Payment processing must be ACID-compliant
  • Search latency: < 200ms
  • Page load time: < 2 seconds

2. Capacity Estimation

Traffic

Daily active users:     100 million
Product views/day:      2 billion
Search queries/day:     500 million
Add-to-cart/day:        200 million
Orders/day:             10 million
Orders/second:          10M / 86,400 ~= 115 orders/sec
Peak orders/second:     ~5,000 (flash sales)
Search queries/second:  500M / 86,400 ~= 5,800/sec

Storage

Products:               500 million * 10 KB avg = 5 TB (metadata)
Product images:         500M * 5 images * 500 KB = 1.25 PB
Orders (per year):      3.65 billion * 2 KB = 7.3 TB
User data:              500M users * 5 KB = 2.5 TB

Bandwidth

Product page (with images): Average 2 MB
Page views/sec:             2B / 86,400 ~= 23,000/sec
Bandwidth:                  23,000 * 2 MB = 46 GB/sec (CDN-served)

3. High-Level Architecture

+----------+     +-------------------+
|  Client  |---->|   API Gateway     |
|  (Web/   |     |   + Auth + Rate   |
|   Mobile)|     |     Limiting      |
+----------+     +--------+----------+
                          |
         +----------------+------------------+
         |                |                  |
+--------v------+ +-------v-------+ +-------v--------+
| Product       | | Cart          | | Order          |
| Service       | | Service       | | Service        |
+--------+------+ +-------+-------+ +-------+--------+
         |                |                  |
+--------v------+ +-------v-------+ +-------v--------+
| Product DB    | | Cart Store    | | Order DB       |
| (PostgreSQL + | | (Redis)       | | (PostgreSQL)   |
|  Elasticsearch)| +---------------+ +----------------+
+---------------+          |
                  +--------v---------+    +------------------+
                  | Checkout         |    | Inventory        |
                  | Orchestrator     |--->| Service          |
                  +--------+---------+    +--------+---------+
                           |                       |
                  +--------v---------+    +--------v---------+
                  | Payment          |    | Inventory DB     |
                  | Service          |    | (PostgreSQL)     |
                  +--------+---------+    +------------------+
                           |
                  +--------v---------+    +------------------+
                  | Payment Gateway  |    | Notification     |
                  | (Stripe/etc)     |    | Service          |
                  +------------------+    +------------------+

+------------------+    +------------------+    +------------------+
| Search Service   |    | Recommendation   |    | Shipping/        |
| (Elasticsearch)  |    | Engine           |    | Fulfillment      |
+------------------+    +------------------+    +------------------+

4. API Design

GET /api/v1/products?category=electronics&sort=price_asc&page=1&limit=20
  Response 200: {
    "products": [
      {
        "product_id": "prod_123",
        "title": "Wireless Headphones",
        "price": 79.99,
        "currency": "USD",
        "rating": 4.5,
        "review_count": 2341,
        "image_url": "https://cdn.example.com/products/prod_123/thumb.jpg",
        "in_stock": true,
        "seller": { "seller_id": "s_42", "name": "AudioTech" }
      }
    ],
    "total": 15420,
    "page": 1,
    "total_pages": 771
  }

GET /api/v1/products/{product_id}
  Response 200: Full product details with all images, specs, reviews

GET /api/v1/search?q=wireless+headphones&filters[price_min]=20&filters[brand]=Sony
  Response 200: { "results": [...], "facets": { "brands": [...], "price_ranges": [...] } }

POST /api/v1/cart/items
  Body: { "product_id": "prod_123", "quantity": 2 }
  Response 200: Updated cart

GET /api/v1/cart
  Response 200: { "items": [...], "subtotal": 159.98, "item_count": 2 }

DELETE /api/v1/cart/items/{product_id}
  Response 200: Updated cart

POST /api/v1/orders/checkout
  Body: {
    "shipping_address_id": "addr_456",
    "payment_method_id": "pm_789",
    "shipping_method": "express"
  }
  Response 201: {
    "order_id": "ord_abc",
    "status": "confirmed",
    "total": 172.97,
    "estimated_delivery": "2026-04-14"
  }

GET /api/v1/orders/{order_id}
  Response 200: Order details with line items and tracking

5. Database Schema

Products (PostgreSQL)

CREATE TABLE products (
    product_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    seller_id       UUID NOT NULL REFERENCES sellers(seller_id),
    title           VARCHAR(500) NOT NULL,
    description     TEXT,
    category_id     UUID REFERENCES categories(category_id),
    price           DECIMAL(10, 2) NOT NULL,
    currency        VARCHAR(3) DEFAULT 'USD',
    brand           VARCHAR(200),
    specifications  JSONB,
    image_urls      TEXT[],
    rating_avg      DECIMAL(2, 1) DEFAULT 0,
    review_count    INTEGER DEFAULT 0,
    status          VARCHAR(20) DEFAULT 'active',
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_products_category ON products(category_id, price);
CREATE INDEX idx_products_seller ON products(seller_id);
CREATE INDEX idx_products_brand ON products(brand);

Inventory (PostgreSQL -- strong consistency required)

CREATE TABLE inventory (
    product_id      UUID PRIMARY KEY REFERENCES products(product_id),
    warehouse_id    UUID NOT NULL,
    quantity         INTEGER NOT NULL DEFAULT 0,
    reserved        INTEGER NOT NULL DEFAULT 0,
    available       INTEGER GENERATED ALWAYS AS (quantity - reserved) STORED,
    low_stock_threshold INTEGER DEFAULT 10,
    updated_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    version         INTEGER DEFAULT 0   -- optimistic locking
);

-- Critical: available = quantity - reserved
-- Use optimistic locking to prevent overselling

Orders (PostgreSQL)

CREATE TABLE orders (
    order_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL REFERENCES users(user_id),
    status          VARCHAR(30) NOT NULL DEFAULT 'pending',
    subtotal        DECIMAL(12, 2) NOT NULL,
    shipping_cost   DECIMAL(8, 2) DEFAULT 0,
    tax             DECIMAL(8, 2) DEFAULT 0,
    total           DECIMAL(12, 2) NOT NULL,
    shipping_address JSONB NOT NULL,
    payment_method_id VARCHAR(255),
    payment_status  VARCHAR(30) DEFAULT 'pending',
    idempotency_key VARCHAR(255) UNIQUE,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    order_item_id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id        UUID NOT NULL REFERENCES orders(order_id),
    product_id      UUID NOT NULL,
    seller_id       UUID NOT NULL,
    quantity        INTEGER NOT NULL,
    unit_price      DECIMAL(10, 2) NOT NULL,
    total_price     DECIMAL(10, 2) NOT NULL
);

Cart (Redis)

Key:    cart:{user_id}
Type:   Hash
Fields: {product_id}: {quantity, added_at, price_at_add}
TTL:    30 days

Example:
  HSET cart:u_42 prod_123 '{"qty":2,"price":79.99,"added":"2026-04-11"}'
  HGETALL cart:u_42

6. Deep Dive: Checkout and Order Processing

SAGA Pattern for Distributed Transactions

Checkout is a multi-step process across multiple services.
We use the SAGA pattern (choreography-based) instead of 2PC.

Step 1: Reserve Inventory
Step 2: Process Payment
Step 3: Create Order
Step 4: Notify User

If any step fails, execute compensating actions:

Happy Path:
  Reserve Inventory -> Process Payment -> Create Order -> Notify
       OK                  OK                OK            OK

Failure at Payment:
  Reserve Inventory -> Process Payment -> COMPENSATE: Release Inventory
       OK                 FAIL
Orchestrator Flow:

Checkout          Inventory        Payment          Order           Notification
Orchestrator      Service          Service          Service         Service
    |                |                |                |                |
    |-- Reserve ---->|                |                |                |
    |<-- Reserved ---|                |                |                |
    |                |                |                |                |
    |-- Charge ---------------------->|                |                |
    |<-- Payment OK ------------------|                |                |
    |                |                |                |                |
    |-- Create Order --------------------------------->|                |
    |<-- Order Created --------------------------------|                |
    |                |                |                |                |
    |-- Confirm Inventory -->|        |                |                |
    |<-- Confirmed ---------|        |                |                |
    |                |                |                |                |
    |-- Send Notification ------------------------------------------>|
    |<-- Sent ----------------------------------------------------------

Preventing Overselling (Optimistic Locking)

-- Reserve inventory for checkout
UPDATE inventory
SET reserved = reserved + :quantity,
    version = version + 1,
    updated_at = NOW()
WHERE product_id = :product_id
  AND version = :current_version
  AND (quantity - reserved) >= :quantity;

-- If 0 rows affected: either version conflict or insufficient stock
-- Retry with fresh version or return "out of stock"

Preventing Double Orders (Idempotency)

Client generates idempotency_key before checkout.
Server checks: if order with this key exists, return existing order.

POST /api/v1/orders/checkout
  Headers: Idempotency-Key: "uuid-unique-per-attempt"

def checkout(request):
    existing = orders.find_by_idempotency_key(request.idempotency_key)
    if existing:
        return existing  # Idempotent: return same result
    
    # ... proceed with new order

7. Deep Dive: Product Search

Elasticsearch Architecture

+-------------------+
| Product DB        |  CDC (Change Data Capture)
| (PostgreSQL)      |------+
+-------------------+      |
                           v
                    +------+--------+
                    | Kafka         |
                    | (product      |
                    |  changes)     |
                    +------+--------+
                           |
                    +------v--------+
                    | ES Indexer    |
                    | Service       |
                    +------+--------+
                           |
              +------------+------------+
              |            |            |
     +--------v---+ +-----v------+ +---v----------+
     | ES Node 1  | | ES Node 2  | | ES Node 3    |
     | (Primary)  | | (Primary)  | | (Primary)     |
     | Shard 0,3  | | Shard 1,4  | | Shard 2,5    |
     +------------+ +------------+ +--------------+

Search Index Mapping

{
  "mappings": {
    "properties": {
      "title":        { "type": "text", "analyzer": "standard" },
      "description":  { "type": "text" },
      "category":     { "type": "keyword" },
      "brand":        { "type": "keyword" },
      "price":        { "type": "float" },
      "rating":       { "type": "float" },
      "in_stock":     { "type": "boolean" },
      "tags":         { "type": "keyword" },
      "seller_id":    { "type": "keyword" },
      "created_at":   { "type": "date" }
    }
  }
}

Search Query with Facets

{
  "query": {
    "bool": {
      "must": [
        { "multi_match": {
            "query": "wireless headphones",
            "fields": ["title^3", "description", "tags"]
        }}
      ],
      "filter": [
        { "term": { "in_stock": true } },
        { "range": { "price": { "gte": 20, "lte": 200 } } }
      ]
    }
  },
  "aggs": {
    "brands": { "terms": { "field": "brand", "size": 20 } },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 25 }, { "from": 25, "to": 50 },
          { "from": 50, "to": 100 }, { "from": 100 }
        ]
      }
    }
  },
  "sort": [
    { "_score": "desc" },
    { "rating": "desc" }
  ]
}

8. Deep Dive: Handling Flash Sales

Problem: 10x normal traffic in seconds. 1 million users trying to buy 1,000 units.

Architecture for Flash Sales:

                    +-------------------+
                    | CDN (static pages)|
                    +--------+----------+
                             |
                    +--------v----------+
                    | Rate Limiter      |
                    | (Token Bucket)    |
                    +--------+----------+
                             |
                    +--------v----------+
                    | Queuing Layer     |
                    | (Virtual Queue)   |
                    +--------+----------+
                             |
                    +--------v----------+
                    | Order Processing  |
                    | (limited workers) |
                    +--------+----------+
                             |
                    +--------v----------+
                    | Inventory Lock    |
                    | (Redis Lua)       |
                    +-------------------+

Virtual Queue System

1. User clicks "Buy Now" -> enters virtual queue
2. Queue position shown: "You are #4,523 in line"
3. Batches of 100 users released every second
4. Released users have 5-minute window to complete checkout
5. If they don't complete, slot released to next user

Redis implementation:
  ZADD flash_queue:{sale_id} {timestamp} {user_id}
  ZRANK flash_queue:{sale_id} {user_id}  -> position

Redis-Based Inventory Lock for Flash Sales

-- Atomic decrement with check (Lua script)
local stock = tonumber(redis.call('GET', KEYS[1]))
if stock and stock > 0 then
    redis.call('DECR', KEYS[1])
    return 1  -- success
else
    return 0  -- sold out
end

Pre-warming for Flash Sales

1. Pre-scale services 1 hour before sale
2. Pre-load product details into all cache layers
3. Pre-generate static product pages on CDN
4. Set up dedicated database connection pools
5. Enable circuit breakers on non-critical services
6. Disable non-essential features (recommendations, reviews)

9. Scaling Considerations

Service Isolation

Critical path (must not fail):
  Cart -> Checkout -> Inventory -> Payment -> Order

Non-critical (graceful degradation):
  Recommendations, Reviews, Wishlist, Analytics

Strategy: Circuit breakers isolate non-critical service failures
from affecting the checkout flow.

Database Strategy

Product Catalog:  PostgreSQL (read replicas) + Elasticsearch (search)
Inventory:        PostgreSQL (strong consistency, optimistic locking)
Orders:           PostgreSQL (ACID required) sharded by user_id
Cart:             Redis (speed, TTL-based expiration)
Sessions:         Redis (fast access, TTL)
Analytics:        ClickHouse (columnar, aggregations)

Caching Layers

Layer 1: CDN (product images, static pages)
Layer 2: Application cache (Redis -- product details, prices)
Layer 3: Database query cache (PostgreSQL shared buffers)

Cache invalidation:
  - Price changes: immediate cache bust via pub/sub
  - Product updates: CDC -> Kafka -> cache invalidation service
  - Inventory: never cached (must be real-time for checkout)

10. Key Tradeoffs

DecisionOption AOption BOur Choice
Transaction model2-Phase CommitSAGA patternSAGA
Inventory lockPessimistic (DB lock)Optimistic (version)Optimistic
Cart storageDatabaseRedisRedis
SearchDB full-text searchElasticsearchElasticsearch
Flash sale queueFirst-come-first-servedVirtual queueVirtual queue
Price consistencyReal-time everywhereEventual (cache TTL)Eventual (5s)
Order ID generationAuto-incrementUUID/SnowflakeSnowflake

11. Failure Scenarios and Mitigations

Scenario                          Mitigation
------------------------------------------------------------------------
Payment succeeds, order DB fails  SAGA compensating: refund payment
Inventory reserved, user abandons TTL on reservations (15 min auto-release)
Redis cart data loss              Periodic backup to DB; rebuild from DB
Search index out of sync          CDC pipeline ensures eventual consistency
Flash sale overselling            Redis atomic decrement; DB-level constraint
Payment gateway timeout           Retry with idempotency key; check status
Double charge                     Idempotency key on payment request

Key Takeaways

  1. SAGA pattern is the practical choice for distributed transactions in microservices -- 2PC is too slow and fragile at scale.
  2. Optimistic locking with version numbers prevents overselling while maintaining high throughput.
  3. Flash sales require special architecture -- virtual queues, pre-warming, and Redis-based inventory are essential.
  4. Search must be decoupled from the primary database -- Elasticsearch with CDC keeps search fast without impacting transactions.
  5. Cart in Redis provides sub-millisecond access and natural TTL expiration, but back it up to durable storage periodically.