Design WhatsApp - Messaging Platform

Explain Like I'm 5

Imagine you have a magical walkie-talkie that can send messages to your friends instantly, no matter where they are in the world! You can type a message like "Hi!" and press send, and BOOM - your friend gets it right away, even if they're in another country! It's like having a super-fast mailman who delivers your letters in less than a second! You can also send pictures of your drawings, voice messages where you talk, and even see a checkmark that tells you if your friend has read your message! The magic part is that your messages are locked in a special box (encryption) so only you and your friend can read them - not even the mailman can peek inside!

Key Features

  • Send and receive text messages in real-time
  • Send photos, videos, voice messages, and documents
  • End-to-end encryption for privacy
  • Message delivery status (sent, delivered, read receipts)
  • Group chats with up to 256 participants
  • Online/offline status and last seen
  • Message history stored on device and backed up

Requirements

Functional Requirements:

  • One-to-one messaging with real-time delivery
  • Group messaging with multiple participants
  • Send multimedia content (images, videos, voice)
  • Message status indicators (sent ✓, delivered ✓✓, read 🔵🔵)
  • End-to-end encryption for all messages
  • Offline message storage and sync when online

Non-Functional Requirements:

  • Low latency (<200ms message delivery)
  • High availability (99.99% uptime)
  • Scalable to 2 billion+ users
  • Minimal data usage for users with limited bandwidth
  • Support for offline message queuing

Capacity Estimation

Assumptions:

  • 2 billion total users worldwide
  • 500 million daily active users
  • Each user sends 40 messages per day on average
  • 20% of messages contain media (images, videos)
  • Average message size: 100 bytes (text), 500KB (media)

Storage Calculation:

  • Daily messages: 500M users × 40 msgs = 20 billion messages/day
  • Text storage: 20B × 0.8 × 100 bytes = 1.6TB/day
  • Media storage: 20B × 0.2 × 500KB = 2PB/day
  • Total per year: (1.6TB + 2PB) × 365 ≈ 730PB/year

Bandwidth Calculation:

  • Messages per second: 20B / 86400 ≈ 230,000 msgs/sec
  • Peak traffic (3x average): ~700,000 msgs/sec
  • Bandwidth: 700K × (100 bytes text + 100KB media, i.e. 20% × 500KB averaged over all messages) ≈ 70GB/sec
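
The arithmetic above can be reproduced with a small program. This is a minimal sketch that only encodes the stated assumptions; the class and method names are illustrative, and sizes use decimal units (1KB = 1,000 bytes).

```java
/** Back-of-the-envelope capacity numbers derived from the assumptions above. */
final class CapacityEstimate {
    static final long DAILY_ACTIVE_USERS = 500_000_000L;
    static final long MESSAGES_PER_USER_PER_DAY = 40;
    static final double MEDIA_FRACTION = 0.20;
    static final long TEXT_BYTES = 100;
    static final long MEDIA_BYTES = 500_000; // 500KB
    static final long SECONDS_PER_DAY = 86_400;
    static final int PEAK_FACTOR = 3;

    static long messagesPerDay() {
        return DAILY_ACTIVE_USERS * MESSAGES_PER_USER_PER_DAY;
    }

    static double textBytesPerDay() {
        return messagesPerDay() * (1 - MEDIA_FRACTION) * TEXT_BYTES;
    }

    static double mediaBytesPerDay() {
        return messagesPerDay() * MEDIA_FRACTION * MEDIA_BYTES;
    }

    static double averageMessagesPerSecond() {
        return (double) messagesPerDay() / SECONDS_PER_DAY;
    }

    /** Peak bytes per second, with media size averaged over all messages. */
    static double peakBytesPerSecond() {
        double avgBytesPerMessage =
                (1 - MEDIA_FRACTION) * TEXT_BYTES + MEDIA_FRACTION * MEDIA_BYTES;
        return averageMessagesPerSecond() * PEAK_FACTOR * avgBytesPerMessage;
    }

    public static void main(String[] args) {
        System.out.printf("Messages/day:      %d%n", messagesPerDay());
        System.out.printf("Text TB/day:       %.2f%n", textBytesPerDay() / 1e12);
        System.out.printf("Media PB/day:      %.2f%n", mediaBytesPerDay() / 1e15);
        System.out.printf("Avg msgs/sec:      %.0f%n", averageMessagesPerSecond());
        System.out.printf("Peak GB/sec:       %.1f%n", peakBytesPerSecond() / 1e9);
    }
}
```

Using the unrounded message rate, the peak comes out slightly under 70GB/sec, which is consistent with the rounded figure above.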

API Design

1. Send Message:

POST /api/v1/messages/send
Request:
{
  "sender_id": 123456789,
  "receiver_id": 987654321,
  "message_type": "text",
  "content": "Hello! How are you?",
  "client_message_id": "msg_abc123",
  "timestamp": 1698765432000
}

Response:
{
  "message_id": "msg_server_xyz789",
  "status": "sent",
  "timestamp": 1698765432100
}
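
The client_message_id field in the request lets the server recognize a retried send. The sketch below shows one way this could work, assuming an in-memory map; the class names are hypothetical, and a real service would keep the mapping in a shared store with an expiry.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical handler that makes "send message" safe to retry. */
final class IdempotentSendHandler {

    /** Minimal stand-in for the JSON response shown above. */
    static final class SendResponse {
        final String messageId;
        final String status;
        final long timestamp;

        SendResponse(String messageId, String status, long timestamp) {
            this.messageId = messageId;
            this.status = status;
            this.timestamp = timestamp;
        }
    }

    // Keyed by "senderId:clientMessageId".
    private final Map<String, SendResponse> seen = new ConcurrentHashMap<>();
    private final AtomicInteger storedCount = new AtomicInteger();

    SendResponse send(long senderId, String clientMessageId, String content) {
        String key = senderId + ":" + clientMessageId;
        // computeIfAbsent runs the lambda at most once per key,
        // so a retry receives the original response instead of a new message.
        return seen.computeIfAbsent(key, k -> {
            storedCount.incrementAndGet(); // stands in for persisting `content`
            return new SendResponse(
                    "msg_server_" + UUID.randomUUID(), "sent", System.currentTimeMillis());
        });
    }

    int storedCount() {
        return storedCount.get();
    }
}
```

Calling send twice with the same sender and client_message_id returns the same message_id and stores the message once.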

2. Receive Messages (WebSocket):

// Client subscribes to WebSocket
WS /api/v1/messages/subscribe?user_id=987654321

// Server pushes new messages
{
  "message_id": "msg_server_xyz789",
  "sender_id": 123456789,
  "receiver_id": 987654321,
  "message_type": "text",
  "content": "Hello! How are you?",
  "timestamp": 1698765432100,
  "encrypted": true
}

3. Update Delivery Status:

POST /api/v1/messages/status
Request:
{
  "message_id": "msg_server_xyz789",
  "receiver_id": 987654321,
  "status": "delivered"  // or "read"
}

Response:
{
  "success": true,
  "updated_at": 1698765433000
}

4. Send Media:

POST /api/v1/media/upload
Content-Type: multipart/form-data

Request (multipart form fields, shown as JSON for readability):
{
  "sender_id": 123456789,
  "file": <binary data>,
  "file_type": "image/jpeg"
}

Response:
{
  "media_id": "media_abc123",
  "media_url": "https://cdn.whatsapp.com/...",
  "thumbnail_url": "https://cdn.whatsapp.com/thumb/...",
  "file_size": 524288
}

Database Design

Users Table (PostgreSQL):

CREATE TABLE users (
  user_id BIGINT PRIMARY KEY,
  phone_number VARCHAR(20) UNIQUE NOT NULL, -- UNIQUE already creates an index
  username VARCHAR(100),
  profile_photo_url VARCHAR(255),
  status_message TEXT,
  is_online BOOLEAN DEFAULT FALSE,
  last_seen TIMESTAMP,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- PostgreSQL declares secondary indexes outside CREATE TABLE
CREATE INDEX idx_online ON users (is_online);

Messages Table (Cassandra - High Write Throughput):

CREATE TABLE messages (
  message_id UUID,
  sender_id BIGINT,
  receiver_id BIGINT,
  conversation_id UUID,
  message_type TEXT, -- 'text', 'image', 'video', 'voice'
  content TEXT,
  media_url TEXT,
  encrypted_content BLOB,
  status TEXT, -- 'sent', 'delivered', 'read'
  timestamp TIMESTAMP,
  -- A table has exactly one primary key: partition by conversation,
  -- cluster by time, and use message_id to break timestamp ties
  PRIMARY KEY ((conversation_id), timestamp, message_id)
) WITH CLUSTERING ORDER BY (timestamp DESC, message_id ASC);

-- Query messages by conversation efficiently
-- Recent messages appear first

Conversations Table:

-- CQL (Cassandra): collection columns are indexed with a separate statement
CREATE TABLE conversations (
  conversation_id UUID PRIMARY KEY,
  participant_ids LIST<BIGINT>,
  conversation_type TEXT, -- 'one_to_one', 'group'
  last_message_id UUID,
  last_message_timestamp TIMESTAMP,
  created_at TIMESTAMP
);

CREATE INDEX idx_participants ON conversations (participant_ids);

Groups Table:

-- CQL (Cassandra): text columns take no length, indexes are created separately
CREATE TABLE groups (
  group_id UUID PRIMARY KEY,
  group_name TEXT,
  group_icon_url TEXT,
  admin_ids LIST<BIGINT>,
  member_ids LIST<BIGINT>,
  created_at TIMESTAMP
);

CREATE INDEX idx_members ON groups (member_ids);

High-Level Architecture


┌──────────────┐       ┌──────────────┐
│   Client 1   │       │   Client 2   │
│   (Mobile)   │       │   (Mobile)   │
└──────┬───────┘       └──────┬───────┘
       │                      │
       │   WebSocket / HTTP   │
       └──────────┬───────────┘
                  │
           ┌──────▼──────┐
           │     Load    │
           │   Balancer  │
           └──────┬──────┘
                  │
       ┌──────────┼──────────┐
       │          │          │
┌──────▼──────┐ ┌▼─────────┐ ┌▼──────────┐
│  WebSocket  │ │ Message  │ │   Media   │
│   Server    │ │ Service  │ │  Service  │
│  (Real-time)│ │  (API)   │ │ (Upload)  │
└──────┬──────┘ └┬─────────┘ └┬──────────┘
       │         │            │
       └─────────┼────────────┘
                 │
       ┌─────────┼────────────┐
       │         │            │
┌──────▼──────┐ ┌▼────────┐ ┌▼──────────┐
│   Redis     │ │Cassandra│ │PostgreSQL │
│  (Online    │ │(Messages│ │  (Users)  │
│  Presence)  │ │ Storage)│ │           │
└─────────────┘ └┬────────┘ └───────────┘
                 │
           ┌─────▼─────┐
           │  Kafka    │
           │ (Events)  │
           └─────┬─────┘
                 │
         ┌───────┴────────┐
         │                │
    ┌────▼─────┐    ┌────▼────┐
    │   S3     │    │  CDN    │
    │ (Media   │    │ (Media  │
    │ Storage) │    │Delivery)│
    └──────────┘    └─────────┘

Deep Dive: Key Components

1. Message Delivery System (Real-Time)

Messages must be delivered instantly when the recipient is online, or stored for later delivery when offline.

MessageDeliveryService.java
import java.nio.charset.StandardCharsets;
import java.security.PublicKey;
import java.util.List;
import java.util.UUID;

// WebSocketConnectionPool, MessageQueue, CassandraClient, RedisCache, CryptoUtil
// and getPublicKey are application code assumed to exist elsewhere.
public class MessageDeliveryService {
    private WebSocketConnectionPool wsPool;
    private MessageQueue offlineQueue;
    private CassandraClient cassandra;
    private RedisCache redis;
    private KafkaProducer kafkaProducer;

    /**
     * Sends a message from sender to receiver.
     * Delivers immediately if online, queues if offline.
     */
    public MessageResponse sendMessage(SendMessageRequest request) {
        // 1. Generate unique message ID
        String messageId = UUID.randomUUID().toString();
        long timestamp = System.currentTimeMillis();

        // 2. Encrypt message content (end-to-end encryption)
        byte[] encryptedContent = encryptMessage(
            request.getContent(),
            request.getReceiverId()
        );

        // 3. Create message object
        Message message = new Message(
            messageId,
            request.getSenderId(),
            request.getReceiverId(),
            request.getMessageType(),
            encryptedContent,
            timestamp
        );

        // 4. Store message in Cassandra (persistent storage)
        cassandra.insertMessage(message);

        // 5. Check if receiver is online (a missing key means offline)
        boolean isReceiverOnline =
            Boolean.TRUE.equals(redis.get("user:online:" + request.getReceiverId()));

        if (isReceiverOnline) {
            // 6a. Receiver is ONLINE - send via WebSocket immediately
            WebSocketConnection receiverConnection = wsPool.getConnection(
                request.getReceiverId()
            );
            if (receiverConnection != null && receiverConnection.isOpen()) {
                receiverConnection.send(message);
                // Simplification: marked delivered on push; a stricter design
                // waits for the client's acknowledgement
                message.setStatus("delivered");
                // Update status in database
                cassandra.updateMessageStatus(messageId, "delivered");
            } else {
                // Connection dropped, queue for later
                queueOfflineMessage(request.getReceiverId(), messageId);
                message.setStatus("sent");
            }
        } else {
            // 6b. Receiver is OFFLINE - queue message
            queueOfflineMessage(request.getReceiverId(), messageId);
            message.setStatus("sent");
        }

        // 7. Send delivery receipt to sender
        notifySender(request.getSenderId(), messageId, message.getStatus());

        // 8. Publish event to Kafka for analytics
        kafkaProducer.send("message-sent", new MessageEvent(
            messageId, request.getSenderId(), request.getReceiverId(), timestamp
        ));

        return new MessageResponse(messageId, message.getStatus(), timestamp);
    }

    /**
     * Handles user coming online - delivers queued messages.
     */
    public void handleUserOnline(long userId) {
        System.out.println("User " + userId + " came online");

        // 1. Mark user as online in Redis
        redis.set("user:online:" + userId, true, 3600); // 1 hour TTL

        // 2. Get all queued messages for this user
        List<String> queuedMessageIds = offlineQueue.getMessages(userId);
        if (queuedMessageIds.isEmpty()) {
            return;
        }
        System.out.println("Delivering " + queuedMessageIds.size() +
            " queued messages to user " + userId);

        // 3. Get WebSocket connection
        WebSocketConnection connection = wsPool.getConnection(userId);
        if (connection == null || !connection.isOpen()) {
            System.err.println("Failed to get connection for user " + userId);
            return; // messages stay queued for the next attempt
        }

        // 4. Fetch messages from Cassandra and deliver
        for (String messageId : queuedMessageIds) {
            Message message = cassandra.getMessage(messageId);
            if (message != null) {
                // Send message via WebSocket
                connection.send(message);
                // Update status to delivered
                cassandra.updateMessageStatus(messageId, "delivered");
                // Notify sender about delivery
                notifySender(message.getSenderId(), messageId, "delivered");
            }
            // Remove from both places queueOfflineMessage wrote to
            offlineQueue.removeMessage(userId, messageId);
            redis.zrem("offline:messages:" + userId, messageId);
        }
    }

    /**
     * Queues message for offline user.
     */
    private void queueOfflineMessage(long userId, String messageId) {
        offlineQueue.addMessage(userId, messageId);
        // Also store in Redis sorted set with timestamp for quick access
        redis.zadd("offline:messages:" + userId,
            System.currentTimeMillis(),
            messageId);
    }

    /**
     * Encrypts message using receiver's public key (end-to-end encryption).
     */
    private byte[] encryptMessage(String content, long receiverId) {
        // In reality: use Signal Protocol or similar E2E encryption, performed
        // on the sender's device so the server only ever sees ciphertext.
        // For demo, simplified
        PublicKey receiverPublicKey = getPublicKey(receiverId);
        return CryptoUtil.encrypt(
            content.getBytes(StandardCharsets.UTF_8), receiverPublicKey);
    }

    /**
     * Notifies sender about message status update.
     */
    private void notifySender(long senderId, String messageId, String status) {
        WebSocketConnection senderConnection = wsPool.getConnection(senderId);
        if (senderConnection != null && senderConnection.isOpen()) {
            senderConnection.send(new StatusUpdate(messageId, status));
        }
    }
}

2. Group Messaging

Group messages must be delivered to multiple recipients efficiently. Store the message once, then use a fan-out pattern to deliver it to every member except the sender.

GroupMessagingService.java
import com.google.common.collect.Lists; // Guava, for Lists.partition
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class GroupMessagingService {
    private static final long SYSTEM_SENDER_ID = 0;

    private MessageDeliveryService messageDelivery;
    private CassandraClient cassandra;
    private RedisCache redis;
    // Used by fanOutSync; these were missing from the original field list
    private WebSocketConnectionPool wsPool;
    private MessageQueue offlineQueue;
    // Shared pool, so a new set of threads is not created per message
    private final ExecutorService fanOutExecutor = Executors.newFixedThreadPool(10);

    /**
     * Sends a message to a group chat.
     * Fans out message to all group members.
     */
    public GroupMessageResponse sendGroupMessage(GroupMessageRequest request) {
        String groupId = request.getGroupId();
        long senderId = request.getSenderId();

        // 1. Verify sender is a member of the group
        //    (system messages are exempt, otherwise they would be rejected)
        Group group = getGroup(groupId);
        if (group == null) {
            throw new IllegalArgumentException("Unknown group: " + groupId);
        }
        if (senderId != SYSTEM_SENDER_ID && !group.getMembers().contains(senderId)) {
            throw new UnauthorizedException("User not in group");
        }

        // 2. Generate message ID and timestamp
        String messageId = UUID.randomUUID().toString();
        long timestamp = System.currentTimeMillis();

        // 3. Store group message once
        GroupMessage groupMessage = new GroupMessage(
            messageId,
            groupId,
            senderId,
            request.getContent(),
            timestamp
        );
        cassandra.insertGroupMessage(groupMessage);

        // 4. Fan out to all group members (except sender)
        List<Long> recipients = group.getMembers().stream()
            .filter(memberId -> memberId != senderId)
            .collect(Collectors.toList());
        System.out.println("Fanning out message to " + recipients.size() + " members");

        // 5. Use parallel processing for large groups
        if (recipients.size() > 50) {
            // Large group: use async processing
            fanOutAsync(messageId, recipients, groupMessage);
        } else {
            // Small group: send synchronously
            fanOutSync(messageId, recipients, groupMessage);
        }

        // 6. Return success to sender
        return new GroupMessageResponse(messageId, "sent", timestamp);
    }

    /**
     * Synchronous fan-out for small groups.
     */
    private void fanOutSync(String messageId, List<Long> recipients,
                            GroupMessage message) {
        for (Long recipientId : recipients) {
            try {
                // Check if recipient is online (a missing key means offline)
                boolean isOnline =
                    Boolean.TRUE.equals(redis.get("user:online:" + recipientId));
                WebSocketConnection conn =
                    isOnline ? wsPool.getConnection(recipientId) : null;
                if (conn != null && conn.isOpen()) {
                    // Deliver via WebSocket
                    conn.send(message);
                } else {
                    // Offline or connection dropped: queue for later delivery
                    offlineQueue.addMessage(recipientId, messageId);
                }
            } catch (Exception e) {
                System.err.println("Failed to deliver to " + recipientId + ": " +
                    e.getMessage());
            }
        }
    }

    /**
     * Asynchronous fan-out for large groups (>50 members).
     */
    private void fanOutAsync(String messageId, List<Long> recipients,
                             GroupMessage message) {
        // Split recipients into batches of 100
        int batchSize = 100;
        List<List<Long>> batches = Lists.partition(recipients, batchSize);

        // Process batches in parallel on the shared thread pool
        for (List<Long> batch : batches) {
            fanOutExecutor.submit(() -> fanOutSync(messageId, batch, message));
        }
    }

    /**
     * Gets group information from cache or database.
     */
    private Group getGroup(String groupId) {
        // Try cache first
        String cacheKey = "group:" + groupId;
        Group cached = redis.get(cacheKey);
        if (cached != null) {
            return cached;
        }

        // Cache miss - fetch from database
        Group group = cassandra.getGroup(groupId);
        if (group != null) {
            // Cache for 1 hour
            redis.setWithExpiry(cacheKey, group, 3600);
        }
        return group;
    }

    /**
     * Adds member to group.
     */
    public void addMemberToGroup(String groupId, long userId, long adminId) {
        Group group = getGroup(groupId);
        if (group == null) {
            throw new IllegalArgumentException("Unknown group: " + groupId);
        }

        // Verify admin permissions
        if (!group.getAdmins().contains(adminId)) {
            throw new UnauthorizedException("Only admins can add members");
        }

        // Add member to group
        group.getMembers().add(userId);
        cassandra.updateGroup(group);

        // Invalidate cache
        redis.delete("group:" + groupId);

        // Send system message to group
        sendSystemMessage(groupId, userId + " joined the group");
    }

    /**
     * Sends system message to group (e.g., "Alice added Bob").
     */
    private void sendSystemMessage(String groupId, String content) {
        GroupMessageRequest systemMsg = new GroupMessageRequest(
            groupId,
            SYSTEM_SENDER_ID,
            "system",
            content
        );
        sendGroupMessage(systemMsg);
    }
}

3. Message Status & Read Receipts

Track message delivery and read status with checkmarks (✓ sent, ✓✓ delivered, 🔵🔵 read).

MessageStatusService.java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class MessageStatusService {
    private CassandraClient cassandra;
    private WebSocketConnectionPool wsPool;
    private RedisCache redis;

    /**
     * Updates message status when user receives/reads a message.
     */
    public void updateMessageStatus(String messageId, long userId,
                                    MessageStatus newStatus) {
        // 1. Fetch message from database
        Message message = cassandra.getMessage(messageId);
        if (message == null) {
            System.err.println("Message not found: " + messageId);
            return;
        }

        // 2. Verify user is the receiver
        if (message.getReceiverId() != userId) {
            System.err.println("User " + userId + " not authorized for message " +
                messageId);
            return;
        }

        // 3. Update status (only allow forward progression)
        MessageStatus currentStatus = message.getStatus();
        if (!canTransition(currentStatus, newStatus)) {
            System.err.println("Invalid status transition: " + currentStatus +
                " -> " + newStatus);
            return;
        }

        // 4. Update in database
        message.setStatus(newStatus);
        message.setStatusUpdatedAt(System.currentTimeMillis());
        cassandra.updateMessage(message);

        // 5. Send status update to sender (blue checkmarks!)
        notifySenderOfStatusChange(message.getSenderId(), messageId, newStatus);

        // 6. Update last seen if status is "read"
        if (newStatus == MessageStatus.READ) {
            updateLastSeen(userId);
        }
    }

    /**
     * Checks if status transition is valid.
     * Status progression: SENT -> DELIVERED -> READ
     */
    private boolean canTransition(MessageStatus current, MessageStatus next) {
        // Status hierarchy
        int currentLevel = getStatusLevel(current);
        int nextLevel = getStatusLevel(next);
        // Can only move forward, not backward
        return nextLevel > currentLevel;
    }

    private int getStatusLevel(MessageStatus status) {
        if (status == null) {
            return 0; // switching on a null enum would throw
        }
        switch (status) {
            case SENT: return 1;
            case DELIVERED: return 2;
            case READ: return 3;
            default: return 0;
        }
    }

    /**
     * Sends status update notification to sender.
     * This is how the blue checkmarks appear!
     */
    private void notifySenderOfStatusChange(long senderId, String messageId,
                                            MessageStatus status) {
        WebSocketConnection senderConn = wsPool.getConnection(senderId);
        if (senderConn != null && senderConn.isOpen()) {
            // Send status update message
            StatusUpdateNotification notification = new StatusUpdateNotification(
                messageId,
                status.toString().toLowerCase(),
                System.currentTimeMillis()
            );
            senderConn.send(notification);
            System.out.println("Sent status update to sender " + senderId + ": " +
                messageId + " is now " + status);
        } else {
            // Sender offline - they'll see the updated status when they come online
            System.out.println("Sender " + senderId + " offline, will sync later");
        }
    }

    /**
     * Marks multiple messages as read at once (bulk operation).
     */
    public void markConversationAsRead(long userId, String conversationId) {
        // 1. Get all unread messages in conversation
        List<Message> unreadMessages = cassandra.getUnreadMessages(
            conversationId,
            userId
        );
        if (unreadMessages.isEmpty()) {
            return;
        }
        System.out.println("Marking " + unreadMessages.size() +
            " messages as read in conversation " + conversationId);

        // 2. Batch update all messages to READ status
        List<String> messageIds = unreadMessages.stream()
            .map(Message::getMessageId)
            .collect(Collectors.toList());
        cassandra.batchUpdateStatus(messageIds, MessageStatus.READ);

        // 3. Notify senders (group by sender to avoid duplicate notifications)
        Map<Long, List<String>> messagesBySender = unreadMessages.stream()
            .collect(Collectors.groupingBy(
                Message::getSenderId,
                Collectors.mapping(Message::getMessageId, Collectors.toList())
            ));
        for (Map.Entry<Long, List<String>> entry : messagesBySender.entrySet()) {
            long senderId = entry.getKey();
            List<String> senderMessageIds = entry.getValue();
            // Send single notification with all message IDs
            notifyBulkStatusChange(senderId, senderMessageIds, MessageStatus.READ);
        }

        // 4. Update last seen
        updateLastSeen(userId);
    }

    /**
     * Sends bulk status update for multiple messages.
     */
    private void notifyBulkStatusChange(long senderId, List<String> messageIds,
                                        MessageStatus status) {
        WebSocketConnection conn = wsPool.getConnection(senderId);
        if (conn != null && conn.isOpen()) {
            BulkStatusUpdate update = new BulkStatusUpdate(
                messageIds,
                status.toString().toLowerCase(),
                System.currentTimeMillis()
            );
            conn.send(update);
        }
    }

    /**
     * Updates user's last seen timestamp.
     */
    private void updateLastSeen(long userId) {
        long timestamp = System.currentTimeMillis();
        // Update in Redis for fast access
        redis.set("user:last_seen:" + userId, timestamp);
        // Also update in database (async)
        CompletableFuture.runAsync(() -> {
            cassandra.updateUserLastSeen(userId, timestamp);
        });
    }

    /**
     * Gets user's online status and last seen.
     */
    public UserStatus getUserStatus(long userId) {
        // Check if online (a missing key means offline)
        boolean isOnline = Boolean.TRUE.equals(redis.get("user:online:" + userId));
        if (isOnline) {
            return new UserStatus(userId, true, "online");
        } else {
            // Get last seen from cache
            Long lastSeen = redis.get("user:last_seen:" + userId);
            if (lastSeen == null) {
                // Cache miss - fetch from database
                lastSeen = cassandra.getUserLastSeen(userId);
                if (lastSeen != null) {
                    redis.set("user:last_seen:" + userId, lastSeen);
                }
            }
            return new UserStatus(userId, false, formatLastSeen(lastSeen));
        }
    }

    /**
     * Formats last seen timestamp (e.g., "5 minutes ago", "yesterday").
     */
    private String formatLastSeen(Long timestamp) {
        if (timestamp == null) {
            return "last seen a long time ago";
        }
        long now = System.currentTimeMillis();
        long diffSeconds = (now - timestamp) / 1000;
        if (diffSeconds < 60) {
            return "last seen just now";
        } else if (diffSeconds < 3600) {
            return "last seen " + (diffSeconds / 60) + " minutes ago";
        } else if (diffSeconds < 86400) {
            return "last seen " + (diffSeconds / 3600) + " hours ago";
        } else {
            return "last seen " + (diffSeconds / 86400) + " days ago";
        }
    }

    enum MessageStatus {
        SENT,
        DELIVERED,
        READ
    }
}

Trade-offs and Optimizations

1. WebSocket vs Long Polling

WebSocket: True real-time, persistent connection, better for active users. Long Polling: Fallback for restricted networks, higher latency. Use WebSocket with long polling fallback.

2. Store-and-Forward vs Direct Delivery

Store-and-Forward: Always persist message first (WhatsApp approach), guarantees delivery even if server crashes. Direct Delivery: Faster but can lose messages. WhatsApp uses store-and-forward.

3. Message Storage Duration

Store forever: Users can search old messages, expensive storage. Store limited time: Cheaper, but users lose history. WhatsApp stores messages on device, optional cloud backup.

4. Group Message Fan-out

Synchronous: Deliver to all immediately, slower for large groups. Asynchronous: Queue for background workers, faster response but delayed delivery. Use async for groups >50 members.

Optimizations:

  • Use Redis for online presence with a short TTL that heartbeats refresh (the key expires on its own soon after the user disconnects)
  • Batch status updates (send one notification for 10 read receipts)
  • Use Cassandra for messages (optimized for write-heavy workloads)
  • Implement message compression (savings depend on the payload: text compresses well, already-compressed media hardly at all)
  • Use CDN for media delivery (images, videos cached at edge)
  • Connection pooling for WebSocket servers (handle millions of connections)
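
The presence optimization in the list above can be sketched without Redis. This is a minimal in-memory illustration, assuming the client sends periodic heartbeats; the class name, the injectable clock, and the TTL value are all hypothetical, and in production the expiry would be handled by the Redis key TTL itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical in-memory model of TTL-based presence. */
final class PresenceTracker {
    private final Map<Long, Long> expiresAtMillis = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so the behavior is testable

    PresenceTracker(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Called on connect and on every heartbeat; pushes the expiry forward. */
    void heartbeat(long userId) {
        expiresAtMillis.put(userId, clock.getAsLong() + ttlMillis);
    }

    /** A clean disconnect removes the entry immediately. */
    void disconnect(long userId) {
        expiresAtMillis.remove(userId);
    }

    /** A user is online only while the latest heartbeat has not expired. */
    boolean isOnline(long userId) {
        Long expiresAt = expiresAtMillis.get(userId);
        if (expiresAt == null) {
            return false;
        }
        if (clock.getAsLong() >= expiresAt) {
            expiresAtMillis.remove(userId, expiresAt); // lazily drop the stale entry
            return false;
        }
        return true;
    }
}
```

The point of the short TTL is that a crashed client or a dropped network never sends a disconnect; the missing heartbeat is what takes the user offline.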

Follow-up Interview Questions

Q: How do you implement end-to-end encryption?

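A: Use the Signal Protocol. Each device generates identity and prekey pairs; the private keys never leave the device, and only the public halves are uploaded to the server so that other users can start a session. The two parties run a Diffie-Hellman key agreement (X3DH) to derive a shared secret, and the Double Ratchet then derives a fresh symmetric key for every message. This gives forward secrecy: compromising one key does not expose earlier messages. The server only ever relays ciphertext.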
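
To make the core idea concrete, the sketch below shows a single Diffie-Hellman key agreement followed by authenticated symmetric encryption, using only the standard Java cryptography APIs (Java 11 or later for X25519). It is deliberately not the Signal Protocol: there are no prekeys, no ratchet, and the key derivation is a plain SHA-256 hash where a real protocol would use a proper KDF. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyAgreement;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Simplified key agreement + AES-GCM; NOT a Signal Protocol implementation. */
final class E2eSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int NONCE_BYTES = 12;
    private static final int TAG_BITS = 128;

    static KeyPair newKeyPair() {
        try {
            return KeyPairGenerator.getInstance("X25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Each side combines its own private key with the peer's public key. */
    static SecretKey deriveKey(PrivateKey mine, PublicKey theirs) {
        try {
            KeyAgreement agreement = KeyAgreement.getInstance("X25519");
            agreement.init(mine);
            agreement.doPhase(theirs, true);
            byte[] shared = agreement.generateSecret();
            // Simplification: hash the shared secret instead of using HKDF.
            byte[] keyBytes = MessageDigest.getInstance("SHA-256").digest(shared);
            return new SecretKeySpec(keyBytes, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns a random 12-byte nonce followed by the AES-GCM ciphertext. */
    static byte[] encrypt(SecretKey key, String plaintext) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            RANDOM.nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[NONCE_BYTES + ciphertext.length];
            System.arraycopy(nonce, 0, out, 0, NONCE_BYTES);
            System.arraycopy(ciphertext, 0, out, NONCE_BYTES, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the nonce prefix, then decrypts and verifies the rest. */
    static String decrypt(SecretKey key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, message, 0, NONCE_BYTES));
            byte[] plaintext =
                    cipher.doFinal(message, NONCE_BYTES, message.length - NONCE_BYTES);
            return new String(plaintext, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both parties end up with the same AES key without it ever crossing the network, which is why a relay server that sees only public keys and ciphertext cannot read the message.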

Q: How do you handle messages when both users are offline?

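A: A sender who is offline cannot reach the server, so the client stores the message locally with a 'sending' status and uploads it on reconnect. The server then persists it in Cassandra, acknowledges it to the sender ('sent'), and adds it to the receiver's offline queue in Redis. When the receiver comes online, the queued message is delivered and the sender is notified ('delivered'), either immediately if connected or on the next sync.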

Q: How would you implement message search?

A: Local search: Index messages on device using SQLite FTS (Full-Text Search). Server search: Use Elasticsearch for backed-up messages, shard by user_id. Challenge with E2E encryption: Can't search encrypted content on server. Solution: Client-side decryption + search, or encrypt search index with user's key.

Q: How do you handle network failures during message send?

A: Implement a retry mechanism with exponential backoff. Store pending messages locally with a 'sending' status and keep retrying (after 2s, then 4s, 8s, and so on, capped at 30s). If sending still fails after 5 minutes, show a 'failed' icon and let the user retry manually. Use client_message_id for deduplication so a retried send is not stored twice.
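
The retry schedule described in this answer can be written down directly. A minimal sketch, with the constants taken from the answer and the class and method names chosen for illustration:

```java
/** Hypothetical client-side retry schedule: 2s, 4s, 8s, ... capped at 30s. */
final class RetrySchedule {
    static final long INITIAL_DELAY_MS = 2_000;
    static final long MAX_DELAY_MS = 30_000;
    static final long GIVE_UP_AFTER_MS = 5 * 60_000;

    /** Delay before retry number {@code attempt} (1-based). */
    static long delayMillis(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        // Bound the shift so the doubling cannot overflow for large attempts.
        int shift = Math.min(attempt - 1, 20);
        return Math.min(INITIAL_DELAY_MS << shift, MAX_DELAY_MS);
    }

    /** Number of retries that fit inside the give-up window. */
    static int retriesBeforeGivingUp() {
        long waited = 0;
        int attempt = 0;
        while (waited + delayMillis(attempt + 1) <= GIVE_UP_AFTER_MS) {
            attempt++;
            waited += delayMillis(attempt);
        }
        return attempt;
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 6; attempt++) {
            System.out.println("retry " + attempt + " after " + delayMillis(attempt) + " ms");
        }
        System.out.println("retries before giving up: " + retriesBeforeGivingUp());
    }
}
```

Many clients also add random jitter to each delay so that devices reconnecting after an outage do not retry in lockstep; that is omitted here to keep the schedule deterministic.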

Q: How would you scale WebSocket servers to millions of connections?

A: Use multiple WebSocket servers behind load balancer with consistent hashing (route user_id to specific server). Each server handles ~100K connections. Use Redis Pub/Sub for server-to-server communication (route messages between servers). Implement connection pooling and keep-alive pings. Monitor CPU/memory, auto-scale based on active connections.
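
The consistent-hashing step in this answer can be illustrated with a small hash ring. This is a sketch under assumptions: the server names, the virtual-node count, and the use of MD5 as the ring hash are arbitrary choices for the example, not part of the design above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical hash ring that maps user IDs to WebSocket servers. */
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Places several virtual nodes per server to even out the load. */
    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server + "#" + i), server);
        }
    }

    /** Walks clockwise from the user's position to the next virtual node. */
    String serverFor(long userId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash("user:" + userId));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** First 8 bytes of an MD5 digest, used only for placement, not security. */
    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

When a server is removed, only the users that were mapped to it move to a different server; everyone else keeps their existing connection target, which is the property that makes scaling the fleet up or down cheap.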

Real-World Implementation

WhatsApp's publicly described architecture (some details have changed over time):

  • Erlang for messaging servers (handles millions of concurrent connections)
  • XMPP protocol (modified) for real-time communication
  • Signal Protocol for end-to-end encryption
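  • Mnesia (Erlang's built-in distributed database) for server-side state such as sessions and routing data
  • Undelivered messages held on the server only until the recipient's device receives them (the Cassandra and Redis layers in this design are this article's choices, not confirmed WhatsApp components)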
  • FreeBSD servers optimized for network throughput
  • Client-side SQLite for local message storage
  • Media stored in Facebook's infrastructure (S3-like)