
Design Uber/Lyft - Ride-Sharing System

Explain Like I'm 5

Imagine you have a magical map that shows you all the toy cars nearby, and you can ask one of them to come pick you up and take you to the park! When you tap on your phone, it's like raising your hand and saying "I need a ride!" The map looks at all the drivers (like taxi drivers with their cars) and finds the closest one who isn't busy. Then that driver comes to pick you up, and you can watch them getting closer on your map! When you arrive at the park, you pay them some money, and everyone is happy! The tricky part is making sure the map works super fast for millions of people at the same time, and finding the closest driver in just a second!

Key Features

  • Request a ride from current location to destination
  • Real-time driver location tracking and ETA updates
  • Automatic matching of riders with nearby available drivers
  • Dynamic pricing (surge pricing) based on demand
  • In-app payment processing and receipt generation
  • Rating system for drivers and riders
  • Trip history and fare estimation

Requirements

Functional Requirements:

  • Riders can request rides with pickup and drop-off locations
  • System matches riders with nearest available drivers
  • Real-time location tracking for both riders and drivers
  • Fare calculation based on distance, time, and demand
  • In-app notifications for ride status updates
  • Payment processing and trip history

Non-Functional Requirements:

  • High availability (99.99% uptime)
  • Low latency for location updates (<1 second)
  • Scalable to handle millions of concurrent users
  • Accurate driver matching within 5 seconds
  • Support for surge pricing during peak hours

Capacity Estimation

Assumptions:

  • 100 million active users (riders + drivers)
  • 10 million daily active users (10% of total users)
  • 1 million concurrent rides at peak
  • Average 2 rides per active user per day
  • Each location update every 3 seconds

Storage Calculation:

  • User data: 100M users × 1KB = 100GB
  • Trip history: 20M trips/day × 5KB × 365 days = 36.5TB/year
  • Location data: 1M active drivers × 28,800 updates/day × 100 bytes = 2.8TB/day
  • Total storage per year: ~1PB (dominated by the raw location stream)

Bandwidth Calculation:

  • Location updates: 1M drivers × 1 update/3s × 100 bytes = 33MB/s
  • Rider updates: 1M riders × 1 update/3s × 100 bytes = 33MB/s
  • Total incoming: ~67MB/s (~533 Mbps)
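These estimates can be sanity-checked with a quick back-of-envelope calculation; the inputs below are just the assumptions stated above (3-second interval, 100-byte payloads, 1M drivers and 1M riders):

```java
/** Back-of-envelope check of the storage and bandwidth estimates. */
public class CapacityEstimate {

    /** Terabytes per day generated by location updates. */
    static double locationTerabytesPerDay(long senders, int intervalSec, int bytesPerUpdate) {
        long updatesPerDay = 24L * 3600 / intervalSec; // 28,800 at a 3s interval
        return senders * updatesPerDay * (double) bytesPerUpdate / 1e12;
    }

    /** Sustained inbound megabytes per second from location updates. */
    static double inboundMegabytesPerSec(long senders, int intervalSec, int bytesPerUpdate) {
        return senders / (double) intervalSec * bytesPerUpdate / 1e6;
    }

    public static void main(String[] args) {
        System.out.printf("Driver location data: %.2f TB/day%n",
                locationTerabytesPerDay(1_000_000L, 3, 100));
        double mbps = inboundMegabytesPerSec(2_000_000L, 3, 100); // drivers + riders
        System.out.printf("Total inbound: %.1f MB/s (~%.0f Mbps)%n", mbps, mbps * 8);
    }
}
```

Running this gives roughly 2.88 TB/day of driver location data and about 66.7 MB/s of combined inbound update traffic, i.e. on the order of hundreds of megabits per second, not gigabits.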

API Design

1. Request Ride:

POST /api/v1/rides/request
Request:
{
  "rider_id": 12345,
  "pickup_location": {
    "latitude": 37.7749,
    "longitude": -122.4194
  },
  "dropoff_location": {
    "latitude": 37.8044,
    "longitude": -122.2712
  },
  "ride_type": "uberX"
}

Response:
{
  "ride_id": "rid_abc123",
  "estimated_fare": 25.50,
  "estimated_time": 5,
  "status": "searching"
}

2. Update Location (WebSocket):

// Driver sends location every 3 seconds
{
  "type": "location_update",
  "driver_id": 67890,
  "latitude": 37.7750,
  "longitude": -122.4195,
  "timestamp": 1698765432
}

3. Get Ride Status:

GET /api/v1/rides/{ride_id}

Response:
{
  "ride_id": "rid_abc123",
  "status": "driver_assigned",
  "driver": {
    "id": 67890,
    "name": "John Smith",
    "rating": 4.8,
    "current_location": {
      "latitude": 37.7752,
      "longitude": -122.4190
    },
    "eta": 3
  }
}

Database Design

Users Table:

CREATE TABLE users (
  user_id BIGINT PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100) UNIQUE,
  phone VARCHAR(20),
  user_type ENUM('rider', 'driver', 'both'),
  rating DECIMAL(2,1),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_email (email)
);

Drivers Table:

CREATE TABLE drivers (
  driver_id BIGINT PRIMARY KEY,
  user_id BIGINT,
  vehicle_type VARCHAR(50),
  license_plate VARCHAR(20),
  current_latitude DECIMAL(10,8),
  current_longitude DECIMAL(11,8),
  is_available BOOLEAN DEFAULT TRUE,
  last_location_update TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(user_id),
  INDEX idx_location (current_latitude, current_longitude),
  INDEX idx_available (is_available)
);

Rides Table:

CREATE TABLE rides (
  ride_id VARCHAR(50) PRIMARY KEY,
  rider_id BIGINT,
  driver_id BIGINT,
  pickup_lat DECIMAL(10,8),
  pickup_lng DECIMAL(11,8),
  dropoff_lat DECIMAL(10,8),
  dropoff_lng DECIMAL(11,8),
  status ENUM('requested', 'driver_assigned', 'in_progress', 'completed', 'cancelled'),
  fare DECIMAL(10,2),
  distance_km DECIMAL(10,2),
  duration_minutes INT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  completed_at TIMESTAMP,
  FOREIGN KEY (rider_id) REFERENCES users(user_id),
  FOREIGN KEY (driver_id) REFERENCES drivers(driver_id),
  INDEX idx_rider (rider_id),
  INDEX idx_driver (driver_id),
  INDEX idx_status (status),
  INDEX idx_created (created_at)
);

High-Level Architecture


┌─────────────┐         ┌─────────────┐
│   Riders    │         │   Drivers   │
│  (Mobile)   │         │   (Mobile)  │
└──────┬──────┘         └──────┬──────┘
       │                       │
       │    WebSocket / HTTP   │
       └───────────┬───────────┘
                   │
            ┌──────▼──────┐
            │   CDN /     │
            │   Load      │
            │   Balancer  │
            └──────┬──────┘
                   │
       ┌───────────┼───────────┐
       │           │           │
┌──────▼──────┐ ┌─▼────────┐ ┌▼──────────┐
│   Ride      │ │ Location │ │  Payment  │
│   Service   │ │ Service  │ │  Service  │
└──────┬──────┘ └─┬────────┘ └┬──────────┘
       │           │           │
       └───────────┼───────────┘
                   │
       ┌───────────┼────────────┐
       │           │            │
┌──────▼──────┐ ┌─▼─────────┐ ┌▼──────────┐
│  PostgreSQL │ │   Redis   │ │  Kafka    │
│  (Rides DB) │ │  (Cache)  │ │  (Events) │
└─────────────┘ └───────────┘ └───────────┘
       │
┌──────▼──────────┐
│  QuadTree DB    │
│ (Geo-Spatial)   │
└─────────────────┘

Deep Dive: Key Components

1. Driver Matching Algorithm (QuadTree)

Finding the nearest available driver efficiently is critical. We use a QuadTree data structure to partition the map into regions and quickly query drivers within a radius.

DriverMatchingService.java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class DriverMatchingService {

    private QuadTree driverQuadTree;
    private RedisCache cache;

    /**
     * Finds the nearest available driver for a ride request.
     * Uses a QuadTree for efficient spatial search.
     */
    public Driver findNearestDriver(double lat, double lng, String rideType) {
        // 1. Search within an increasing radius until we find drivers
        double[] radiusSteps = {1.0, 2.0, 5.0, 10.0}; // in km
        for (double radius : radiusSteps) {
            List<Driver> nearbyDrivers = driverQuadTree.searchRadius(lat, lng, radius);

            // 2. Filter by availability and vehicle type
            List<Driver> availableDrivers = nearbyDrivers.stream()
                .filter(Driver::isAvailable)
                .filter(d -> d.getVehicleType().equals(rideType))
                .collect(Collectors.toList());

            if (!availableDrivers.isEmpty()) {
                // 3. Sort by distance and return the closest
                return availableDrivers.stream()
                    .min(Comparator.comparingDouble(d ->
                        calculateDistance(lat, lng, d.getLatitude(), d.getLongitude())))
                    .orElse(null);
            }
        }
        return null; // No drivers found within the largest radius
    }

    /**
     * Calculates the great-circle distance between two points (Haversine formula).
     */
    private double calculateDistance(double lat1, double lng1,
                                     double lat2, double lng2) {
        double earthRadius = 6371; // km
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return earthRadius * c;
    }

    public static void main(String[] args) {
        DriverMatchingService service = new DriverMatchingService();
        // Example: find a driver near San Francisco
        Driver nearestDriver = service.findNearestDriver(37.7749, -122.4194, "uberX");
        if (nearestDriver != null) {
            System.out.println("Found driver: " + nearestDriver.getName());
        } else {
            System.out.println("No drivers available");
        }
    }
}
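The service above assumes a `QuadTree` with a `searchRadius` method. As an illustration, here is a minimal point-quadtree sketch; the node capacity, the bounding box, and the degrees-per-km pruning constants are assumptions of this sketch, not part of the original design:

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal point-quadtree sketch: leaves hold up to CAPACITY points, then split into four quadrants. */
public class QuadTree {
    public static class Point {
        public final long id;
        public final double lat, lng;
        public Point(long id, double lat, double lng) { this.id = id; this.lat = lat; this.lng = lng; }
    }

    private static final int CAPACITY = 4;
    private final double minLat, minLng, maxLat, maxLng;
    private final List<Point> points = new ArrayList<>();
    private QuadTree[] children; // null until this node subdivides

    public QuadTree(double minLat, double minLng, double maxLat, double maxLng) {
        this.minLat = minLat; this.minLng = minLng; this.maxLat = maxLat; this.maxLng = maxLng;
    }

    public boolean insert(Point p) {
        if (p.lat < minLat || p.lat > maxLat || p.lng < minLng || p.lng > maxLng) return false;
        if (children == null && points.size() < CAPACITY) { points.add(p); return true; }
        if (children == null) subdivide();
        for (QuadTree child : children) if (child.insert(p)) return true;
        return false;
    }

    private void subdivide() {
        double midLat = (minLat + maxLat) / 2, midLng = (minLng + maxLng) / 2;
        children = new QuadTree[] {
            new QuadTree(minLat, minLng, midLat, midLng), new QuadTree(minLat, midLng, midLat, maxLng),
            new QuadTree(midLat, minLng, maxLat, midLng), new QuadTree(midLat, midLng, maxLat, maxLng)
        };
        for (Point p : points) for (QuadTree child : children) if (child.insert(p)) break;
        points.clear();
    }

    /** Collects all points within radiusKm of (lat, lng), pruning non-overlapping quadrants. */
    public List<Point> searchRadius(double lat, double lng, double radiusKm) {
        List<Point> result = new ArrayList<>();
        double latDelta = radiusKm / 111.0; // ~111 km per degree of latitude
        double lngDelta = radiusKm / (111.0 * Math.max(0.01, Math.cos(Math.toRadians(lat))));
        search(lat - latDelta, lng - lngDelta, lat + latDelta, lng + lngDelta, lat, lng, radiusKm, result);
        return result;
    }

    private void search(double loLat, double loLng, double hiLat, double hiLng,
                        double lat, double lng, double radiusKm, List<Point> out) {
        if (hiLat < minLat || loLat > maxLat || hiLng < minLng || loLng > maxLng) return; // no overlap
        for (Point p : points) if (haversineKm(lat, lng, p.lat, p.lng) <= radiusKm) out.add(p);
        if (children != null)
            for (QuadTree c : children) c.search(loLat, loLng, hiLat, hiLng, lat, lng, radiusKm, out);
    }

    static double haversineKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1), dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 6371 * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```

`searchRadius` first prunes quadrants whose bounding box cannot intersect the search circle, then filters the survivors with the exact Haversine distance.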

2. Real-Time Location Tracking (WebSocket)

Drivers send location updates every 3 seconds. We use WebSocket for bidirectional real-time communication and Redis for fast location storage.

LocationTrackingService.java
import java.util.List;

public class LocationTrackingService {

    private WebSocketServer wsServer;
    private RedisCache redis;
    private QuadTree driverQuadTree;
    private KafkaProducer kafkaProducer;
    private RideRepository rideRepository;

    /**
     * Handles incoming location updates from drivers.
     * Updates the QuadTree and broadcasts to riders tracking this driver.
     */
    public void handleLocationUpdate(LocationUpdate update) {
        long driverId = update.getDriverId();
        double lat = update.getLatitude();
        double lng = update.getLongitude();

        // 1. Update the driver location in Redis (fast cache)
        String cacheKey = "driver:location:" + driverId;
        redis.setWithExpiry(cacheKey, update, 10); // 10-second TTL

        // 2. Update the QuadTree for spatial queries
        driverQuadTree.updateLocation(driverId, lat, lng);

        // 3. Find all riders tracking this driver (active rides)
        List<Long> trackingRiders = getActiveRidersForDriver(driverId);

        // 4. Broadcast the location to tracking riders via WebSocket
        for (Long riderId : trackingRiders) {
            wsServer.sendToClient(riderId, new LocationBroadcast(
                driverId, lat, lng, update.getTimestamp()));
        }

        // 5. Publish to Kafka for analytics and trip history
        kafkaProducer.send("location-updates", update);
    }

    /**
     * WebSocket connection handler for riders and drivers.
     */
    public void onClientConnect(WebSocketConnection conn, long userId, String userType) {
        System.out.println(userType + " " + userId + " connected");

        if (userType.equals("driver")) {
            // Start receiving location updates
            conn.onMessage(message -> {
                LocationUpdate update = parseLocationUpdate(message);
                handleLocationUpdate(update);
            });
        } else if (userType.equals("rider")) {
            // Send the driver's last known location if a ride is active
            Ride activeRide = getActiveRide(userId);
            if (activeRide != null && activeRide.getDriverId() != null) {
                String cacheKey = "driver:location:" + activeRide.getDriverId();
                LocationUpdate lastLocation = redis.get(cacheKey);
                if (lastLocation != null) {
                    conn.send(new LocationBroadcast(
                        activeRide.getDriverId(),
                        lastLocation.getLatitude(),
                        lastLocation.getLongitude(),
                        lastLocation.getTimestamp()));
                }
            }
        }
    }

    private List<Long> getActiveRidersForDriver(long driverId) {
        // Query rides with status 'driver_assigned' or 'in_progress'
        return rideRepository.findActiveRidersByDriver(driverId);
    }
}

3. Dynamic Pricing (Surge Pricing)

Fare calculation considers base fare, distance, time, and demand-based surge multiplier.

PricingService.java
import java.util.List;

public class PricingService {

    private static final double BASE_FARE = 2.50;
    private static final double COST_PER_KM = 1.50;
    private static final double COST_PER_MINUTE = 0.30;
    private static final double MIN_FARE = 5.00;

    private RedisCache redis;
    private RideRepository rideRepository;
    private QuadTree driverQuadTree;

    /**
     * Calculates the fare with surge pricing based on the supply-demand ratio.
     */
    public double calculateFare(double distanceKm, int durationMinutes,
                                double pickupLat, double pickupLng) {
        // 1. Base fare calculation
        double baseFare = BASE_FARE
                + (distanceKm * COST_PER_KM)
                + (durationMinutes * COST_PER_MINUTE);

        // 2. Apply the minimum fare
        baseFare = Math.max(baseFare, MIN_FARE);

        // 3. Calculate the surge multiplier based on demand
        double surgeMultiplier = calculateSurgeMultiplier(pickupLat, pickupLng);

        // 4. Apply surge pricing
        double finalFare = baseFare * surgeMultiplier;

        // 5. Round to 2 decimal places
        return Math.round(finalFare * 100.0) / 100.0;
    }

    /**
     * Calculates the surge multiplier from the supply-demand ratio in the area.
     * Returns 1.0 (no surge) to 3.0 (3x surge).
     */
    private double calculateSurgeMultiplier(double lat, double lng) {
        // 1. Demand: number of ride requests in the last 5 minutes
        int demand = getDemandInArea(lat, lng, 5.0); // 5km radius

        // 2. Supply: number of available drivers in the area
        int supply = getAvailableDriversInArea(lat, lng, 5.0);

        // 3. Demand/supply ratio
        if (supply == 0) {
            return 3.0; // Maximum surge if no drivers
        }
        double ratio = (double) demand / supply;

        // 4. Map the ratio to a surge multiplier
        if (ratio < 1.0) {
            return 1.0; // No surge: more supply than demand
        } else if (ratio < 2.0) {
            return 1.0 + (ratio - 1.0) * 0.5; // 1.0x to 1.5x
        } else if (ratio < 4.0) {
            return 1.5 + (ratio - 2.0) * 0.5; // 1.5x to 2.5x
        } else {
            return Math.min(3.0, 2.5 + (ratio - 4.0) * 0.1); // 2.5x to 3.0x
        }
    }

    /**
     * Gets the number of ride requests in the area in the last N minutes.
     */
    private int getDemandInArea(double lat, double lng, double radiusKm) {
        String cacheKey = "demand:" + lat + ":" + lng + ":" + radiusKm;
        Integer cached = redis.get(cacheKey);
        if (cached != null) {
            return cached;
        }
        // Query recent ride requests from the database
        int demand = rideRepository.countRecentRequestsInArea(
                lat, lng, radiusKm, 5); // last 5 minutes
        redis.setWithExpiry(cacheKey, demand, 60); // Cache for 1 minute
        return demand;
    }

    /**
     * Gets the number of available drivers in the area.
     */
    private int getAvailableDriversInArea(double lat, double lng, double radiusKm) {
        List<Driver> drivers = driverQuadTree.searchRadius(lat, lng, radiusKm);
        return (int) drivers.stream()
                .filter(Driver::isAvailable)
                .count();
    }

    public static void main(String[] args) {
        PricingService service = new PricingService();
        // Example: 10km trip, 20 minutes, from downtown SF
        double fare = service.calculateFare(10.0, 20, 37.7749, -122.4194);
        System.out.println("Estimated fare: $" + fare);
    }
}

Trade-offs and Optimizations

1. QuadTree vs Geohash vs H3

  • QuadTree: dynamic regions that adapt to varying driver density; rebalancing as drivers move adds complexity.
  • Geohash: simple prefix-encoded fixed grid that stores well in a key-value store, but cell sizes vary with latitude and boundary cases need neighbor lookups.
  • H3: Uber's choice; a hexagonal grid whose cells have more uniform center-to-center distances.
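To make the Geohash option concrete, here is a standard base-32 geohash encoder (the helper class is ours, written for illustration):

```java
/** Standard base-32 geohash encoder (illustrative helper, not from the original design). */
public class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Interleaves longitude/latitude bisection bits and packs them 5 at a time into base-32. */
    public static String encode(double lat, double lng, int precision) {
        double latMin = -90, latMax = 90, lngMin = -180, lngMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // a geohash starts with a longitude bit
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            if (evenBit) {
                double mid = (lngMin + lngMax) / 2;
                if (lng >= mid) { ch = (ch << 1) | 1; lngMin = mid; } else { ch <<= 1; lngMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; } else { ch <<= 1; latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }
}
```

`encode(37.7749, -122.4194, 5)` lands in the `9q...` cell covering the San Francisco area, and a longer hash only refines the same prefix, so "nearby drivers" becomes a cheap prefix range scan.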

2. WebSocket vs Server-Sent Events (SSE)

  • WebSocket: bidirectional, so drivers can push locations and the server can push ride-status updates over the same connection.
  • SSE: simpler, but one-way (server to client) only.

We use WebSocket for location tracking.

3. SQL vs NoSQL for Rides

  • SQL (PostgreSQL): ACID transactions, a good fit for payments and ride state.
  • NoSQL (Cassandra): better for high-write-rate location and event data.

A hybrid approach is recommended.

Optimizations:

  • Use Redis for caching driver locations and surge multipliers
  • Partition database by geographic regions (sharding by location)
  • Use Kafka for async processing of completed rides
  • Implement connection pooling for WebSocket connections
  • Use CDN for static assets and driver profile images
  • Implement circuit breakers for payment service failures
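The circuit-breaker item above can be sketched as a small state machine that stops calling the payment service after repeated failures and probes it again after a cooldown; the thresholds here are illustrative:

```java
/** Minimal circuit-breaker sketch; failure threshold and cooldown are illustrative values. */
public class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt;
    private final int failureThreshold;
    private final long openMillis;

    public CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Returns true if the caller may attempt the downstream call right now. */
    public synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN; // cooldown elapsed: let one probe through
        }
        return state != State.OPEN;
    }

    /** A successful call resets the failure count and closes the breaker. */
    public synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    /** A failure trips the breaker once the threshold is reached (or on a failed probe). */
    public synchronized void recordFailure(long nowMillis) {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = nowMillis;
        }
    }
}
```

The ride service would check `allowRequest` before each payment call and fall back (e.g. queue the charge for later) while the breaker is open, instead of piling up timeouts.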

Follow-up Interview Questions

Q: How do you handle driver cancellations?

A: When a driver cancels, immediately re-run the matching algorithm to find another driver. Store cancellation in database for analytics and potential driver rating impact. Notify rider via push notification and WebSocket.

Q: How do you prevent drivers from gaming the surge pricing system?

A: Implement surge pricing calculation on server-side (never trust client). Monitor driver patterns (e.g., going offline when surge starts). Use ML to detect anomalous behavior. Apply penalties for consistent gaming behavior.

Q: How do you handle poor network connectivity for location updates?

A: Implement exponential backoff for retries. Buffer location updates on client and send in batch when connection restores. Show 'last known location' to riders with timestamp. Use GPS interpolation to estimate position during connectivity loss.
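The exponential backoff described here can be as simple as the sketch below; the base delay and cap are illustrative, and a production client would add random jitter so many reconnecting drivers do not retry in lockstep:

```java
/** Exponential backoff sketch for retrying location uploads after a network failure. */
public class RetryBackoff {

    /** Delay before retry attempt n (0-based): base * 2^n, capped at capMillis. */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // clamp the shift to avoid overflow
        return Math.min(delay, capMillis);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.printf("attempt %d -> wait %d ms%n",
                    attempt, delayMillis(attempt, 500, 30_000));
        }
    }
}
```

With a 500 ms base and a 30 s cap, the waits grow 500, 1000, 2000, 4000 ms and so on until they hit the cap.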

Q: How would you scale this to 100 countries?

A: Deploy regional clusters (US-East, US-West, EU, Asia). Use geographic routing at DNS/CDN level. Partition database by country. Implement country-specific pricing rules. Handle currency conversion and local payment methods.

Q: How do you ensure payment security?

A: Use PCI-DSS compliant payment processors (Stripe, Braintree). Never store raw credit card data. Implement tokenization for saved cards. Use HTTPS for all API calls. Implement fraud detection with ML. Support 3D Secure for additional verification.

Real-World Implementation

Uber uses a combination of:

  • H3 hexagonal grid system for geospatial indexing
  • Apache Kafka for event streaming and location updates
  • Microservices architecture with 2000+ services
  • PostgreSQL and Cassandra for different data types
  • Redis for caching driver locations
  • Machine learning for ETAs, pricing, and fraud detection