Episode 9 — System Design / 9.11 — Real World System Design Problems

9.11.j Design a Ride-Sharing Platform (Uber)

Problem Statement

Design a ride-sharing platform like Uber that connects riders with nearby drivers. The system must handle real-time location tracking, driver-rider matching, ETA calculation, surge pricing, and trip management.


1. Requirements

Functional Requirements

  • Riders request rides by specifying pickup and dropoff locations
  • System matches riders with nearby available drivers
  • Real-time location tracking of drivers
  • ETA calculation for arrival and trip duration
  • Fare estimation before ride confirmation
  • Surge pricing during high demand periods
  • Trip management (start, in-progress, complete, cancel)
  • Payment processing at trip completion
  • Rating system for riders and drivers

Non-Functional Requirements

  • Match rider to driver within 10 seconds
  • Location updates every 3-5 seconds from active drivers
  • Support 20 million daily active riders
  • Support 5 million active drivers
  • 99.99% availability in active markets
  • Low latency for location queries (< 100ms)

2. Capacity Estimation

Traffic

Daily rides:            15 million
Rides per second:       15M / 86,400 ~= 175 rides/sec
Peak rides/sec:         ~1,000 rides/sec

Active drivers:         5 million sending location every 4 seconds
Location updates/sec:   5M / 4 = 1,250,000 updates/sec

Storage

Location update:        ~100 bytes (driver_id, lat, lng, timestamp, heading, speed)
Daily location data:    1.25M/sec * 86,400 * 100 bytes = 10.8 TB/day
Trip records:           15M/day * 1 KB = 15 GB/day
Trip records (yearly):  5.5 TB

Historical location storage (for analytics): retain 90 days = ~970 TB

Geospatial Query Volume

Nearby driver queries:   ~500,000/sec (ride requests + app opens)
Each query: find drivers within 5 km radius
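The back-of-envelope figures above can be reproduced with a few lines of arithmetic (a sketch; the constants mirror the estimates in this section):

```python
# Capacity estimation sketch -- constants taken from the estimates above.
DAILY_RIDES = 15_000_000
ACTIVE_DRIVERS = 5_000_000
UPDATE_INTERVAL_S = 4          # each driver reports every 4 seconds
LOCATION_BYTES = 100           # driver_id, lat, lng, timestamp, heading, speed
TRIP_RECORD_BYTES = 1_000

rides_per_sec = DAILY_RIDES / 86_400                            # ~175/sec
location_updates_per_sec = ACTIVE_DRIVERS / UPDATE_INTERVAL_S   # 1.25M/sec

daily_location_tb = location_updates_per_sec * 86_400 * LOCATION_BYTES / 1e12  # ~10.8 TB
daily_trip_gb = DAILY_RIDES * TRIP_RECORD_BYTES / 1e9           # 15 GB
yearly_trip_tb = daily_trip_gb * 365 / 1000                     # ~5.5 TB
retention_90d_tb = daily_location_tb * 90                       # ~970 TB

print(f"{rides_per_sec:.0f} rides/sec, {location_updates_per_sec:,.0f} location updates/sec")
print(f"{daily_location_tb:.1f} TB/day locations, {retention_90d_tb:.0f} TB for 90 days")
```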

3. High-Level Architecture

+----------+      +-------------------+
| Rider    |----->| API Gateway       |
| App      |      | + Load Balancer   |
+----------+      +--------+----------+
                           |
+----------+    +----------+----------+----------+
| Driver   |    |          |          |          |
| App      |    v          v          v          v
+----+-----+ +--+---+ +---+----+ +--+-----+ +--+-------+
     |       | Ride  | | Trip   | | Pricing| | Payment  |
     |       | Match | | Service| | Service| | Service  |
     |       +--+----+ +---+----+ +--------+ +----------+
     |          |          |
     |       +--v----------v--+    +------------------+
     |       | Geospatial     |    | Driver Location  |
     |       | Index Service  |    | Service          |
     |       +-------+--------+    +--------+---------+
     |               |                      |
     |       +-------v--------+    +--------v---------+
     |       | Geospatial DB  |    | Location Store   |
     |       | (Redis/custom) |    | (Redis + Kafka)  |
     |       +----------------+    +------------------+
     |
     +--- Location updates every 4s via WebSocket/UDP ---+
                                                         |
                                               +---------v---------+
                                               | Location Ingestion|
                                               | Service           |
                                               +-------------------+

+------------------+    +------------------+    +------------------+
| Notification     |    | Analytics        |    | ETA Service      |
| Service          |    | (Trip data, maps)|    | (ML-based)       |
+------------------+    +------------------+    +------------------+

4. API Design

POST /api/v1/rides/estimate
  Headers: Authorization: Bearer <rider_token>
  Body: {
    "pickup": { "lat": 37.7749, "lng": -122.4194 },
    "dropoff": { "lat": 37.7849, "lng": -122.4094 },
    "ride_type": "standard"    // standard, premium, pool
  }
  Response 200: {
    "fare_estimate": { "min": 12.50, "max": 16.00, "currency": "USD" },
    "surge_multiplier": 1.2,
    "eta_minutes": 4,
    "distance_km": 3.2,
    "duration_minutes": 12
  }

POST /api/v1/rides/request
  Body: {
    "pickup": { "lat": 37.7749, "lng": -122.4194, "address": "123 Market St" },
    "dropoff": { "lat": 37.7849, "lng": -122.4094, "address": "456 Mission St" },
    "ride_type": "standard",
    "payment_method_id": "pm_789"
  }
  Response 202: {
    "ride_id": "ride_abc",
    "status": "matching",
    "estimated_wait": 180
  }

GET /api/v1/rides/{ride_id}
  Response 200: {
    "ride_id": "ride_abc",
    "status": "in_progress",     // matching|driver_assigned|arriving|in_progress|completed|cancelled
    "driver": {
      "driver_id": "drv_42",
      "name": "John D.",
      "rating": 4.9,
      "vehicle": { "make": "Toyota", "model": "Camry", "plate": "ABC 1234" },
      "location": { "lat": 37.7755, "lng": -122.4180 }
    },
    "pickup": { ... },
    "dropoff": { ... },
    "eta_arrival": 3,
    "fare_estimate": { "min": 12.50, "max": 16.00 }
  }

POST /api/v1/drivers/location
  Headers: Authorization: Bearer <driver_token>
  Body: {
    "lat": 37.7752,
    "lng": -122.4185,
    "heading": 45,
    "speed": 30,
    "timestamp": 1681200000000
  }
  Response 200: { "ack": true }

POST /api/v1/rides/{ride_id}/complete
  Headers: Authorization: Bearer <driver_token>
  Response 200: {
    "ride_id": "ride_abc",
    "fare": 14.20,
    "distance_km": 3.2,
    "duration_minutes": 14,
    "surge_multiplier": 1.2
  }

POST /api/v1/rides/{ride_id}/rate
  Body: { "rating": 5, "comment": "Great driver!" }
  Response 200: { "rated": true }

5. Database Schema

Drivers Table (PostgreSQL)

CREATE TABLE drivers (
    driver_id       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL REFERENCES users(user_id),
    license_number  VARCHAR(50) UNIQUE NOT NULL,
    vehicle_make    VARCHAR(100),
    vehicle_model   VARCHAR(100),
    vehicle_year    INTEGER,
    license_plate   VARCHAR(20),
    vehicle_type    VARCHAR(20),       -- 'standard', 'premium', 'xl'
    rating_avg      DECIMAL(2,1) DEFAULT 5.0,
    total_trips     INTEGER DEFAULT 0,
    status          VARCHAR(20) DEFAULT 'offline',  -- online, offline, on_trip
    city_id         UUID,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Trips Table (PostgreSQL)

CREATE TABLE trips (
    trip_id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    rider_id          UUID NOT NULL REFERENCES users(user_id),
    driver_id         UUID REFERENCES drivers(driver_id),
    status            VARCHAR(20) NOT NULL DEFAULT 'requested',
    ride_type         VARCHAR(20) NOT NULL,
    pickup_lat        DOUBLE PRECISION NOT NULL,
    pickup_lng        DOUBLE PRECISION NOT NULL,
    pickup_address    VARCHAR(500),
    dropoff_lat       DOUBLE PRECISION NOT NULL,
    dropoff_lng       DOUBLE PRECISION NOT NULL,
    dropoff_address   VARCHAR(500),
    fare_amount       DECIMAL(10, 2),
    surge_multiplier  DECIMAL(3, 2) DEFAULT 1.00,
    distance_km       DECIMAL(8, 2),
    duration_minutes  INTEGER,
    payment_method_id VARCHAR(255),
    payment_status    VARCHAR(20) DEFAULT 'pending',
    requested_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    accepted_at       TIMESTAMP,
    pickup_at         TIMESTAMP,
    dropoff_at        TIMESTAMP,
    cancelled_at      TIMESTAMP,
    cancel_reason     VARCHAR(200)
);

CREATE INDEX idx_trips_rider ON trips(rider_id, requested_at DESC);
CREATE INDEX idx_trips_driver ON trips(driver_id, requested_at DESC);
CREATE INDEX idx_trips_status ON trips(status) WHERE status IN ('requested', 'accepted');

Driver Location (Redis)

Key:    driver_location:{driver_id}
Value:  {
    "lat": 37.7752,
    "lng": -122.4185,
    "heading": 45,
    "speed": 30,
    "status": "available",
    "vehicle_type": "standard",
    "timestamp": 1681200000000
}
TTL:    30 seconds (auto-expire stale locations)

Geospatial Index (Redis GEO):
  GEOADD drivers:available {lng} {lat} {driver_id}
  GEORADIUS drivers:available {lng} {lat} 5 km COUNT 20 ASC

6. Deep Dive: Geospatial Indexing

Approach 1: Geohash Grid

Geohash divides the world into a grid of cells.

Precision levels:
  Precision 4: ~39 km x 19 km cells
  Precision 5: ~5 km x 5 km cells   <-- good for nearby search
  Precision 6: ~1.2 km x 0.6 km cells
  Precision 7: ~150 m x 150 m cells

Location (37.7749, -122.4194) -> Geohash: "9q8yyk"

Nearby search: query the target cell + 8 neighboring cells

  +--------+--------+--------+
  | 9q8yym | 9q8yyn | 9q8yyp |
  +--------+--------+--------+
  | 9q8yy5 | 9q8yyk | 9q8yys |  <-- target cell
  +--------+--------+--------+
  | 9q8yy4 | 9q8yyh | 9q8yyj |
  +--------+--------+--------+

Redis implementation (cell membership is a Redis set, so SADD/SUNION):
  SADD drivers:cell:9q8yyk driver_1 driver_5 driver_12
  SADD drivers:cell:9q8yym driver_3 driver_8
  ...

  Find nearby: SUNION across the target cell + 8 neighbors,
  then filter candidates by exact distance and availability.
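For illustration, a minimal pure-Python encoder for the standard geohash scheme (interleaved longitude/latitude bisection bits, base32 output) reproduces the cell IDs used above:

```python
# Standard geohash base32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lng, precision=6):
    """Encode a point by repeatedly bisecting the lng/lat ranges,
    interleaving one longitude bit, then one latitude bit, and so on."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits = []
    even = True                      # even-indexed bits refine longitude
    while len(bits) < precision * 5: # 5 bits per base32 character
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits.append(1); lng_lo = mid
            else:
                bits.append(0); lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5): # pack each 5-bit group into one char
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)
```

Note the prefix property: a precision-4 hash is a prefix of the precision-6 hash for the same point, which is what makes "query the target cell + neighbors" work at any zoom level.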

Approach 2: Quadtree (Dynamic Grid)

Quadtree dynamically subdivides space based on driver density.

Dense area (Manhattan):
  +---+---+
  |1,2|3,4|  Split cells with > 10 drivers
  +---+---+
  | 5 |6,7|  Deeper splits in dense areas
  +---+---+

Rural area:
  +-----------+
  |     1     |  Single large cell (few drivers)
  +-----------+

Benefits:
  - Adapts to driver density
  - Dense areas: small cells for precise matching
  - Rural areas: large cells for broader search
  - In-memory tree, fast lookup O(log N)

Our Choice: Redis GEO (Recommended for Simplicity)

Redis GEO uses a sorted set with geohash-encoded scores.

Commands:
  GEOADD drivers:available -122.4194 37.7749 "driver_42"
  GEOADD drivers:available -122.4180 37.7755 "driver_13"
  
  GEORADIUS drivers:available -122.4194 37.7749 5 km
    COUNT 20 ASC WITHCOORD WITHDIST
  
  Returns (ASC = nearest first):
    1) "driver_42" - 0.00 km - (37.7749, -122.4194)
    2) "driver_13" - 0.15 km - (37.7755, -122.4180)
    ...

Performance: O(N + log(M)) where N = elements inside the radius, M = total elements.
Note: Redis 6.2+ deprecates GEORADIUS in favor of GEOSEARCH (equivalent semantics).
With Redis Cluster: shard by city/region.

7. Deep Dive: Driver-Rider Matching

Matching Algorithm

Rider requests ride at (37.7749, -122.4194):

Step 1: Find nearby available drivers
  GEORADIUS drivers:available ... 5 km COUNT 50 ASC

Step 2: Score each candidate driver
  score = w1 * (1 / distance)           // closer is better
        + w2 * driver_rating             // higher rated is better
        + w3 * (1 / eta_to_pickup)       // lower ETA is better
        + w4 * acceptance_rate           // reliable drivers preferred
        + w5 * vehicle_match             // matches ride type
  
  Weights: w1=0.3, w2=0.15, w3=0.3, w4=0.15, w5=0.1

Step 3: Offer ride to top-scored driver
  - Send push notification
  - Driver has 15 seconds to accept
  - If declined or timeout: offer to next driver
  - Maximum 3 attempts before telling rider "no drivers available"

Step 4: Driver accepts
  - Update driver status to "on_trip"
  - Remove from geospatial index
  - Create trip record
  - Notify rider with driver details and ETA
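The scoring step can be sketched as below. Two adjustments not specified above: the rating is normalized to [0, 1] so all terms are on comparable scales, and distance/ETA are clamped to avoid division blow-ups when a driver is essentially at the pickup point:

```python
# Weights from the scoring formula above.
WEIGHTS = {"distance": 0.3, "rating": 0.15, "eta": 0.3,
           "acceptance": 0.15, "vehicle": 0.1}

def score_driver(distance_km, rating, eta_min, acceptance_rate, vehicle_match):
    """Higher score is better. rating is out of 5; acceptance_rate in [0, 1]."""
    return (WEIGHTS["distance"] * (1 / max(distance_km, 0.1))      # closer is better
          + WEIGHTS["rating"] * (rating / 5.0)                     # higher rated is better
          + WEIGHTS["eta"] * (1 / max(eta_min, 1.0))               # lower ETA is better
          + WEIGHTS["acceptance"] * acceptance_rate                # reliable drivers preferred
          + WEIGHTS["vehicle"] * (1.0 if vehicle_match else 0.0))  # matches ride type

def rank_candidates(candidates):
    # candidates: dicts built from the geo query plus a driver-profile lookup
    return sorted(candidates, key=lambda c: score_driver(**c), reverse=True)
```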

Matching Flow

Rider                  Matching Service          Driver
  |                         |                      |
  |-- Request ride -------->|                      |
  |                         |-- Find nearby ------>|
  |                         |   drivers (Redis)    |
  |                         |                      |
  |                         |-- Score & rank ----->|
  |                         |                      |
  |                         |-- Offer to driver #1 -->|
  |                         |                         |
  |                         |        (15s timeout)    |
  |                         |                         |
  |                         |<-- ACCEPT --------------|
  |                         |                         |
  |<-- Driver assigned -----|                         |
  |    (driver info, ETA)   |                         |
  |                         |-- Update trip status -->|
  |                         |                         |

Handling Concurrent Ride Requests

Problem: Two riders near each other both get matched to the same driver.

Solution: atomic check-and-set via a Redis Lua script. Scripts execute atomically, so only one request can reserve a given driver.

  # Try to reserve the driver (the Lua script runs atomically in Redis)
  RESERVE_SCRIPT = """
    local status = redis.call('HGET', 'driver:'..KEYS[1], 'status')
    if status == 'available' then
      redis.call('HSET', 'driver:'..KEYS[1], 'status', 'reserved')
      redis.call('ZREM', 'drivers:available', KEYS[1])
      return 1
    end
    return 0
  """

  result = redis.eval(RESERVE_SCRIPT, 1, driver_id)  # numkeys=1, KEYS[1]=driver_id

  if result == 0:
      # driver already taken -- try the next driver in the ranked list

8. Deep Dive: ETA Calculation

ETA Components

Total ETA = Driver-to-pickup ETA + Pickup-to-dropoff ETA

Factors:
  - Distance (road distance, not straight line)
  - Current traffic conditions
  - Time of day
  - Road type (highway vs city streets)
  - Historical trip data for the route

ETA Architecture

                    +-------------------+
                    | ETA Service       |
                    +--------+----------+
                             |
              +--------------+--------------+
              |              |              |
     +--------v------+ +----v-------+ +---v----------+
     | Road Graph    | | Traffic    | | ML Model     |
     | (OSRM/Google  | | Data       | | (prediction) |
     |  Maps API)    | | (real-time)| |              |
     +---------------+ +------------+ +--------------+

Simplified ETA Calculation

1. Query routing engine for route distance and base duration
   OSRM: /route/v1/driving/{lng1},{lat1};{lng2},{lat2}
   Returns: distance = 5.2 km, base_duration = 12 min

2. Apply traffic multiplier
   traffic_multiplier = get_traffic_factor(route_segments, current_time)
   adjusted_duration = base_duration * traffic_multiplier
   
   traffic_multiplier ranges:
     1.0 = free flow
     1.5 = moderate traffic
     2.5 = heavy traffic
     4.0 = gridlock

3. Apply ML correction
   ml_correction = model.predict(route, time_of_day, day_of_week, weather)
   final_eta = adjusted_duration * ml_correction

4. Add pickup time buffer (2 minutes)
   total_eta = final_eta + 2 min
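Steps 2-4 compose into a simple pipeline. A sketch, with the routing-engine output (step 1), the traffic factor, and the ML correction passed in as plain numbers rather than fetched from live services:

```python
PICKUP_BUFFER_MIN = 2.0   # step 4: fixed pickup-time buffer

def estimate_eta(base_duration_min, traffic_multiplier, ml_correction=1.0):
    """base_duration_min is the free-flow duration from the routing engine
    (e.g. OSRM); traffic_multiplier and ml_correction come from steps 2-3."""
    adjusted = base_duration_min * traffic_multiplier   # step 2: live traffic
    corrected = adjusted * ml_correction                # step 3: learned correction
    return corrected + PICKUP_BUFFER_MIN                # step 4: pickup buffer

# Example from step 1: 12 min free-flow duration, moderate traffic,
# ML model shaving ~5% off the naive estimate.
eta = estimate_eta(12, traffic_multiplier=1.5, ml_correction=0.95)
```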

9. Deep Dive: Surge Pricing

Surge Pricing Algorithm

Input: Supply and demand in a geographic zone

Demand:  Number of ride requests in zone in last 5 minutes
Supply:  Number of available drivers in zone

Ratio = Demand / Supply

Surge multiplier mapping:
  ratio < 1.0:  multiplier = 1.0  (normal pricing)
  ratio 1.0-1.5: multiplier = 1.2
  ratio 1.5-2.0: multiplier = 1.5
  ratio 2.0-3.0: multiplier = 2.0
  ratio 3.0-5.0: multiplier = 2.5
  ratio > 5.0:   multiplier = 3.0 (cap)

Zone grid: city divided into ~1 km x 1 km hexagonal zones

Surge Computation Pipeline

Location Updates --> Aggregate by Zone --> Compute Ratios --> Update Surge Map

Every 2 minutes:
1. Count active ride requests per zone (last 5 min window)
2. Count available drivers per zone (current snapshot)
3. Compute demand/supply ratio
4. Map ratio to surge multiplier
5. Smooth transitions (max 0.5x change per cycle)
6. Publish updated surge map to all services
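Steps 3-5 can be sketched as a lookup table plus smoothing; the breakpoints mirror the mapping table above, and `next_multiplier` applies the 0.5x-per-cycle cap:

```python
# Upper-bound -> multiplier pairs from the surge mapping above; >= 5.0 hits the cap.
SURGE_TABLE = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.5), (3.0, 2.0), (5.0, 2.5)]
SURGE_CAP = 3.0
MAX_STEP = 0.5   # max multiplier change per 2-minute cycle (smoothing)

def raw_multiplier(demand, supply):
    ratio = demand / max(supply, 1)       # guard against zero available drivers
    for bound, mult in SURGE_TABLE:
        if ratio < bound:
            return mult
    return SURGE_CAP

def next_multiplier(demand, supply, previous):
    target = raw_multiplier(demand, supply)
    # Smooth: move at most MAX_STEP toward the target each cycle.
    delta = max(-MAX_STEP, min(MAX_STEP, target - previous))
    return round(previous + delta, 2)
```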

Surge Map (Redis):
  Key:   surge:{city_id}:{zone_id}
  Value: { "multiplier": 1.5, "updated_at": 1681200000 }
  TTL:   5 minutes (auto-expire if not refreshed)

Surge Display

Rider sees:
  +-----------------------------+
  |  Surge pricing in effect    |
  |  1.5x normal fare           |
  |                             |
  |  Estimated fare: $18-$24    |
  |  (normally $12-$16)         |
  |                             |
  |  [Accept Surge] [Wait]      |
  +-----------------------------+

10. Location Ingestion at Scale

5 million drivers * 1 update / 4 seconds = 1.25 million updates/sec

Ingestion Pipeline:

Driver App --> UDP/WebSocket --> Location Ingestion Service
                                        |
                              +---------+---------+
                              |                   |
                        Redis GEO             Kafka Topic
                        (real-time index)     (historical storage)
                                                  |
                                            +------v------+
                                            | Spark/Flink |
                                            | (analytics) |
                                            +------+------+
                                                   |
                                            +------v------+
                                            | Time Series |
                                            | DB (Druid)  |
                                            +-------------+

Batching and Compression

Instead of 1 update every 4 seconds:
Client batches 5 updates and sends every 20 seconds.

Batch: [
  { lat: 37.7749, lng: -122.4194, ts: 100 },
  { lat: 37.7751, lng: -122.4190, ts: 104 },
  { lat: 37.7755, lng: -122.4185, ts: 108 },
  { lat: 37.7760, lng: -122.4180, ts: 112 },
  { lat: 37.7765, lng: -122.4175, ts: 116 }
]

Reduces network calls by 5x.
Server processes batch and updates Redis with latest position.
Kafka receives all points for trip reconstruction.
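A sketch of the client-side batcher and the server-side handler that keeps only the latest fix for the real-time index; `send_batch` and the Kafka publish are stand-ins for the real network calls:

```python
import time

class LocationBatcher:
    """Client-side: accumulate GPS fixes, send one network call per N fixes."""
    def __init__(self, batch_size=5, send_batch=print):
        self.batch_size = batch_size
        self.send_batch = send_batch      # stand-in for the WebSocket/UDP send
        self.pending = []

    def record(self, lat, lng, ts=None):
        self.pending.append({"lat": lat, "lng": lng, "ts": ts or time.time()})
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.send_batch(self.pending)  # one call instead of batch_size calls
            self.pending = []

def handle_batch(batch):
    """Server-side: newest fix updates the real-time Redis GEO index;
    every fix would be published to Kafka for trip reconstruction (omitted)."""
    latest = max(batch, key=lambda p: p["ts"])
    return latest
```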

11. Scaling Considerations

Geographic Sharding

Shard by city/region:
  US-West:   San Francisco, Los Angeles, Seattle
  US-East:   New York, Boston, Miami
  Europe:    London, Paris, Berlin
  Asia:      Mumbai, Delhi, Singapore

Each region has:
  - Independent matching service
  - Local Redis cluster for driver locations
  - Local trip database
  - Independent surge computation

Cross-region: Only user accounts are global.
              Trips are always local to a city.

Redis Cluster for Geospatial

Partition by city:
  Redis Cluster 1: drivers:sf:available, drivers:la:available
  Redis Cluster 2: drivers:nyc:available, drivers:bos:available
  
Within a city, all available drivers in ONE Redis GEO set.
GEORADIUS is fast: ~1ms for 50 results in a 5km radius.

12. Key Tradeoffs

Decision            Option A                Option B               Our Choice
------------------------------------------------------------------------------
Location protocol   HTTP (simple)           WebSocket/UDP (fast)   WebSocket
Geospatial index    PostGIS                 Redis GEO              Redis GEO
Matching            Global optimization     Greedy nearest         Scored greedy
Surge computation   Real-time per request   Periodic batch         Periodic (2 min)
ETA                 Static road distance    ML + traffic           ML + traffic
Location batching   Every update (4s)       Batched (20s)          Batched
Trip database       Global single DB        Per-city sharding      Per-city

13. Failure Scenarios and Mitigations

Scenario                          Mitigation
------------------------------------------------------------------------
Redis GEO failure                 Fallback to PostgreSQL/PostGIS (higher latency)
Driver location stale             30-second TTL auto-removes stale entries
No drivers available              Expand search radius; show "no drivers" with ETA
Matching service overload         Queue ride requests; process FIFO
Payment failure after trip        Retry payment; hold trip in "payment_pending"
GPS drift / inaccurate location   Snap to road using map matching
Surge price controversy           Cap at 3x; show clear pricing before confirmation
Driver cancels after accepting    Penalize driver; re-match rider immediately

Key Takeaways

  1. Geospatial indexing (Redis GEO or geohash grid) is the core data structure -- efficient nearby-driver queries at 1M+ updates/sec.
  2. Matching is a scored greedy algorithm, not a global optimization -- speed matters more than theoretical optimality.
  3. Surge pricing uses zone-based supply/demand ratios computed periodically, not per-request -- this keeps it responsive and predictable.
  4. Location ingestion at 1.25M updates/sec requires batching, UDP/WebSocket, and writing to Redis (real-time) + Kafka (historical) in parallel.
  5. Per-city sharding is natural for ride-sharing -- trips never cross city boundaries, making horizontal scaling straightforward.