Design Twitter/X Feed
Explain Like I'm 5...
Imagine a magical bulletin board where your friends post messages, and you can see all their messages in order, newest first! That's what a Twitter/X feed is!
What is a Twitter/X Feed?
A personalized stream of posts (tweets) from people you follow, shown in reverse chronological order (newest first).
Example:
You follow Alice, Bob, and Charlie. When they post tweets, you see them in your feed!
How It Works (Simple Version)
- 1. You follow people (like subscribing to their posts)
- 2. When they tweet, their post goes to all their followers' feeds
- 3. Your feed shows all tweets from people you follow, newest first
- 4. You can scroll down to see older tweets!
Key Features to Design
- Post Tweet: Users can create and publish tweets
- Follow/Unfollow: Users can follow other users
- Timeline/Feed: Show tweets from followed users
- Like & Retweet: Interact with tweets
- Notifications: Alert users about interactions
Requirements
Functional Requirements:
- Users can post tweets (280 characters max)
- Users can follow/unfollow other users
- Users see a timeline of tweets from followed users
- Users can like, retweet, and reply to tweets
- Real-time updates for new tweets
Non-Functional Requirements:
- High availability (99.99% uptime)
- Low latency (feed loads in <200ms)
- Scalability (500M users, 200M daily active)
- Handle 6,000 tweets per second
Capacity Estimation
Assumptions:
- 500 million total users
- 200 million daily active users (DAU)
- Each user follows 200 people on average
- Each active user posts 2 tweets per day
- Each tweet is ~280 characters (~500 bytes with metadata)
Calculations:
- Daily tweets: 200M users * 2 tweets = 400M tweets/day
- Tweets per second: 400M / 86,400 ≈ 4,630 tweets/sec
- Peak tweets/sec (assume 2x): 2 * 4,630 ≈ 9,260, rounded up to ~10,000 tweets/sec
- Feed requests: 200M users * 20 reads/day = 4B reads/day
- Reads per second: 4B / 86,400 ≈ 46,000 reads/sec
- Storage per day: 400M * 500 bytes = 200 GB/day
- Storage per year: 200 GB * 365 = ~73 TB/year
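The arithmetic above can be reproduced in a few lines. This is only a sketch: the class and method names are made up for illustration, and the inputs are the assumptions listed above (200M DAU, 2 tweets and 20 reads per user per day, 500 bytes per tweet).

```java
// Back-of-the-envelope capacity numbers from the stated assumptions.
// Class and method names are hypothetical, chosen for illustration only.
class CapacityEstimate {
    static final long SECONDS_PER_DAY = 86_400L;

    // Total events per day for a given number of active users.
    static long perDay(long dailyActiveUsers, long eventsPerUserPerDay) {
        return dailyActiveUsers * eventsPerUserPerDay;
    }

    // Average events per second, rounded to the nearest whole number.
    static long perSecond(long eventsPerDay) {
        return Math.round((double) eventsPerDay / SECONDS_PER_DAY);
    }

    // Storage in decimal gigabytes (1 GB = 10^9 bytes).
    static double storageGb(long items, long bytesPerItem) {
        return items * (double) bytesPerItem / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        long dau = 200_000_000L;
        long tweetsPerDay = perDay(dau, 2);
        long readsPerDay = perDay(dau, 20);
        double gbPerDay = storageGb(tweetsPerDay, 500);
        System.out.println("tweets/sec (avg):  " + perSecond(tweetsPerDay));
        System.out.println("tweets/sec (peak): " + 2 * perSecond(tweetsPerDay));
        System.out.println("reads/sec (avg):   " + perSecond(readsPerDay));
        System.out.println("storage GB/day:    " + gbPerDay);
        System.out.println("storage TB/year:   " + gbPerDay * 365 / 1000.0);
    }
}
```

Reads outnumber writes by roughly 10 to 1 under these assumptions, which is why the rest of the design focuses on making feed reads cheap.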
API Design
// 1. Post Tweet
POST /api/v1/tweets
Request Body:
{
  "userId": 12345,
  "content": "Hello World! #Java",
  "mediaUrls": ["https://cdn.example.com/image.jpg"]
}
Response:
{
  "tweetId": 987654321,
  "userId": 12345,
  "content": "Hello World! #Java",
  "createdAt": "2024-01-15T10:30:00Z",
  "likes": 0,
  "retweets": 0
}

// 2. Get User Timeline (Feed)
GET /api/v1/timeline?userId=12345&page=1&pageSize=20
Response:
{
  "tweets": [
    {
      "tweetId": 999,
      "userId": 67890,
      "username": "alice",
      "content": "Excited about system design!",
      "createdAt": "2024-01-15T11:00:00Z",
      "likes": 42,
      "retweets": 5
    }
    // ... more tweets
  ],
  "nextPage": 2,
  "hasMore": true
}

// 3. Follow User
POST /api/v1/follow
Request Body:
{
  "followerId": 12345,
  "followeeId": 67890
}

// 4. Like Tweet
POST /api/v1/tweets/{tweetId}/like
Request Body:
{
  "userId": 12345
}

// 5. Retweet
POST /api/v1/tweets/{tweetId}/retweet
Request Body:
{
  "userId": 12345,
  "comment": "Great insight!" // optional
}
Database Schema
-- Users Table
CREATE TABLE users (
    user_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    bio TEXT,
    profile_image_url VARCHAR(500),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    followers_count INT DEFAULT 0,
    following_count INT DEFAULT 0,
    INDEX idx_username (username)
);

-- Tweets Table (Use NoSQL like Cassandra in production)
CREATE TABLE tweets (
    tweet_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id BIGINT NOT NULL,
    content VARCHAR(280) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    likes_count INT DEFAULT 0,
    retweets_count INT DEFAULT 0,
    replies_count INT DEFAULT 0,
    FOREIGN KEY (user_id) REFERENCES users(user_id),
    INDEX idx_user_created (user_id, created_at DESC)
);

-- Followers Table (Relationships)
CREATE TABLE followers (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id),
    FOREIGN KEY (follower_id) REFERENCES users(user_id),
    FOREIGN KEY (followee_id) REFERENCES users(user_id),
    INDEX idx_followee (followee_id)
);

-- Timeline/Feed Cache Table (Pre-computed feeds)
CREATE TABLE user_timeline (
    user_id BIGINT NOT NULL,
    tweet_id BIGINT NOT NULL,
    tweet_created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (user_id, tweet_created_at, tweet_id),
    INDEX idx_user_timeline (user_id, tweet_created_at DESC)
);

-- Likes Table
CREATE TABLE likes (
    user_id BIGINT NOT NULL,
    tweet_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, tweet_id),
    FOREIGN KEY (user_id) REFERENCES users(user_id),
    FOREIGN KEY (tweet_id) REFERENCES tweets(tweet_id),
    INDEX idx_tweet_likes (tweet_id)
);
Feed Generation Strategies
Strategy 1: Fan-out on Write (Push Model)
When a user tweets, immediately push the tweet to all of their followers' feeds
Pros:
- Fast read (feed is pre-computed)
- Simple to implement
Cons:
- Slow write for users with many followers (celebrities)
- Wastes space for inactive users
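To make the push model concrete, here is a minimal in-memory sketch. It assumes plain maps in place of a real database or cache, and every name in it (PushFeed, follow, postTweet, readFeed) is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory sketch of fan-out on write. All names are hypothetical.
class PushFeed {
    // followee ID -> IDs of users who follow them
    private final Map<Long, Set<Long>> followers = new HashMap<>();
    // user ID -> pre-computed feed of tweet IDs, newest first
    private final Map<Long, Deque<Long>> feeds = new HashMap<>();

    void follow(long followerId, long followeeId) {
        followers.computeIfAbsent(followeeId, k -> new HashSet<>()).add(followerId);
    }

    // Write path: one feed insert per follower, so cost grows with follower count.
    // Returns the number of feeds that were updated.
    int postTweet(long authorId, long tweetId) {
        Set<Long> targets = followers.getOrDefault(authorId, Set.of());
        for (long followerId : targets) {
            feeds.computeIfAbsent(followerId, k -> new ArrayDeque<>()).addFirst(tweetId);
        }
        return targets.size();
    }

    // Read path: the feed already exists, so just take the first `limit` entries.
    List<Long> readFeed(long userId, int limit) {
        List<Long> page = new ArrayList<>();
        for (long tweetId : feeds.getOrDefault(userId, new ArrayDeque<>())) {
            if (page.size() == limit) {
                break;
            }
            page.add(tweetId);
        }
        return page;
    }

    public static void main(String[] args) {
        PushFeed feed = new PushFeed();
        feed.follow(1, 10);
        feed.follow(2, 10);
        System.out.println("feeds updated: " + feed.postTweet(10, 100));
        System.out.println("user 1 feed:   " + feed.readFeed(1, 20));
    }
}
```

The loop in postTweet is where the celebrity problem shows up: a single tweet from an account with millions of followers turns into millions of feed writes.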
Strategy 2: Fan-out on Read (Pull Model)
Generate feed on-demand by querying followed users' tweets
Pros:
- No wasted computation for inactive users
- Works well for users with many followers
Cons:
- Slow read (must query many users)
- High latency
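For contrast, a pull-model sketch under the same assumptions (in-memory maps, hypothetical names). Posting is a single append; all of the work moves to the read.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory sketch of fan-out on read. All names are hypothetical.
class PullFeed {
    // Minimal tweet: an ID plus its creation time in epoch milliseconds.
    static final class Tweet {
        final long id;
        final long createdAtMillis;

        Tweet(long id, long createdAtMillis) {
            this.id = id;
            this.createdAtMillis = createdAtMillis;
        }
    }

    // follower ID -> IDs of users they follow
    private final Map<Long, Set<Long>> following = new HashMap<>();
    // author ID -> that author's tweets
    private final Map<Long, List<Tweet>> tweetsByAuthor = new HashMap<>();

    void follow(long followerId, long followeeId) {
        following.computeIfAbsent(followerId, k -> new HashSet<>()).add(followeeId);
    }

    // Write path: a single append, independent of follower count.
    void postTweet(long authorId, long tweetId, long createdAtMillis) {
        tweetsByAuthor.computeIfAbsent(authorId, k -> new ArrayList<>())
                .add(new Tweet(tweetId, createdAtMillis));
    }

    // Read path: collect tweets from every followee, sort newest first, truncate.
    List<Long> readFeed(long userId, int limit) {
        List<Tweet> merged = new ArrayList<>();
        for (long followeeId : following.getOrDefault(userId, Set.of())) {
            merged.addAll(tweetsByAuthor.getOrDefault(followeeId, List.of()));
        }
        merged.sort(Comparator.comparingLong((Tweet t) -> t.createdAtMillis).reversed());
        List<Long> ids = new ArrayList<>();
        for (Tweet t : merged.subList(0, Math.min(limit, merged.size()))) {
            ids.add(t.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        PullFeed feed = new PullFeed();
        feed.follow(1, 10);
        feed.follow(1, 20);
        feed.postTweet(10, 100, 1_000);
        feed.postTweet(20, 200, 2_000);
        System.out.println("user 1 feed: " + feed.readFeed(1, 20));
    }
}
```

With 200 followees per user, every feed request here touches 200 authors' tweets, which is the read latency cost listed above.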
Strategy 3: Hybrid Approach (Recommended)
Combine both approaches based on user type
- Regular users: Fan-out on write (push to followers)
- Celebrities (>1M followers): Fan-out on read
- Cache popular tweets in Redis
- Use message queue for async processing
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class FeedGenerationService {
    // Accounts with more followers than this are treated as celebrities (pull model).
    private static final int CELEBRITY_THRESHOLD = 1_000_000;
    private static final int TIMELINE_CACHE_TTL_SECONDS = 300; // 5 min TTL

    private final TweetRepository tweetRepository;
    private final FollowerRepository followerRepository;
    private final CacheService cacheService;
    private final MessageQueueService messageQueue;

    public FeedGenerationService(TweetRepository tweetRepository,
                                 FollowerRepository followerRepository,
                                 CacheService cacheService,
                                 MessageQueueService messageQueue) {
        this.tweetRepository = tweetRepository;
        this.followerRepository = followerRepository;
        this.cacheService = cacheService;
        this.messageQueue = messageQueue;
    }

    /**
     * Post tweet and fan out to followers (Hybrid approach)
     */
    public Tweet postTweet(long userId, String content) {
        // 1. Save tweet to database
        Tweet tweet = tweetRepository.save(new Tweet(userId, content));

        // 2. Check user's follower count
        if (isCelebrity(userId)) {
            // Celebrity: Fan-out on read (pull when requested)
            // Just cache the tweet for fast access
            cacheService.cacheTweet(tweet);
        } else {
            // Regular user: Fan-out on write (push to followers)
            fanoutOnWrite(tweet);
        }
        return tweet;
    }

    /**
     * Fan-out on write: Push tweet to all followers' feeds
     */
    private void fanoutOnWrite(Tweet tweet) {
        // Publish to message queue for async processing.
        // Workers will consume and push to followers' feeds.
        // This prevents blocking the API response.
        messageQueue.publish("fanout-queue", tweet);
    }

    /**
     * Get user timeline (feed)
     */
    public List<Tweet> getTimeline(long userId, int page, int pageSize) {
        // Include pageSize in the key so different page sizes do not collide
        String cacheKey = "timeline:" + userId + ":" + page + ":" + pageSize;

        // 1. Check cache first
        List<Tweet> cachedFeed = cacheService.get(cacheKey);
        if (cachedFeed != null) {
            return cachedFeed;
        }

        // 2. Get list of users this user follows
        List<Long> followingIds = followerRepository.getFollowing(userId);

        // 3. Check if any following are celebrities
        List<Long> celebrities = filterCelebrities(followingIds);
        List<Long> regularUsers = filterRegularUsers(followingIds);

        List<Tweet> feed = new ArrayList<>();

        // 4. For regular users: Get from pre-computed timeline
        if (!regularUsers.isEmpty()) {
            feed.addAll(getPrecomputedTimeline(userId, page, pageSize));
        }

        // 5. For celebrities: Fan-out on read (pull their latest tweets)
        if (!celebrities.isEmpty()) {
            feed.addAll(getLatestTweetsFromCelebrities(celebrities, pageSize));
        }

        // 6. Merge and sort by timestamp (most recent first)
        feed.sort((a, b) -> b.getCreatedAt().compareTo(a.getCreatedAt()));

        // 7. Take only pageSize tweets (copy, so we do not cache a subList view)
        feed = new ArrayList<>(feed.subList(0, Math.min(pageSize, feed.size())));

        // 8. Cache the result
        cacheService.set(cacheKey, feed, TIMELINE_CACHE_TTL_SECONDS);
        return feed;
    }

    // Single definition of "celebrity" so the write and read paths always agree.
    // The original used < 1M on write and <= 1M on read, which dropped tweets
    // from users with exactly 1M followers.
    private boolean isCelebrity(long userId) {
        return followerRepository.getFollowerCount(userId) > CELEBRITY_THRESHOLD;
    }

    private List<Long> filterCelebrities(List<Long> userIds) {
        return userIds.stream()
                .filter(this::isCelebrity)
                .collect(Collectors.toList());
    }

    private List<Long> filterRegularUsers(List<Long> userIds) {
        return userIds.stream()
                .filter(id -> !isCelebrity(id))
                .collect(Collectors.toList());
    }

    private List<Tweet> getPrecomputedTimeline(long userId, int page, int pageSize) {
        int offset = (page - 1) * pageSize;
        return tweetRepository.getUserTimeline(userId, offset, pageSize);
    }

    private List<Tweet> getLatestTweetsFromCelebrities(List<Long> celebrityIds, int limit) {
        // Pull latest tweets from celebrities
        return tweetRepository.getLatestTweets(celebrityIds, limit);
    }
}

/**
 * Fanout Worker - Consumes from message queue
 */
public class FanoutWorker {
    private final FollowerRepository followerRepository;
    private final TimelineRepository timelineRepository;
    private final MessageQueueService messageQueue;

    public FanoutWorker(FollowerRepository followerRepository,
                        TimelineRepository timelineRepository,
                        MessageQueueService messageQueue) {
        this.followerRepository = followerRepository;
        this.timelineRepository = timelineRepository;
        this.messageQueue = messageQueue;
    }

    public void start() {
        messageQueue.consume("fanout-queue", this::processFanout);
    }

    private void processFanout(Tweet tweet) {
        // Get all followers of the user who posted the tweet
        List<Long> followerIds = followerRepository.getFollowers(tweet.getUserId());

        // Push tweet to each follower's timeline.
        // This issues one insert per follower; batch the inserts in production.
        for (Long followerId : followerIds) {
            timelineRepository.addToTimeline(followerId, tweet.getTweetId());
        }

        System.out.println("Fanned out tweet " + tweet.getTweetId()
                + " to " + followerIds.size() + " followers");
    }
}
High-Level System Design
                ┌──────────────┐
                │   Clients    │
                │ (Web/Mobile) │
                └──────┬───────┘
                       │
                ┌──────▼───────┐
                │Load Balancer │
                └──────┬───────┘
                       │
       ┌───────────────┼───────────────┐
       │               │               │
┌──────▼─────┐   ┌─────▼──────┐   ┌────▼──────┐
│    API     │   │    API     │   │   API     │
│  Server 1  │   │  Server 2  │   │ Server 3  │
└──────┬─────┘   └─────┬──────┘   └────┬──────┘
       │               │               │
       └───────────────┼───────────────┘
                       │
    ┌──────────────────┼──────────────────┐
    │                  │                  │
┌───▼────┐      ┌──────▼──────┐     ┌─────▼─────┐
│ Tweet  │      │  Timeline   │     │  Fanout   │
│Service │      │  Service    │     │  Service  │
└───┬────┘      └──────┬──────┘     └─────┬─────┘
    │                  │                  │
    │           ┌──────▼──────┐           │
    │           │   Cache     │           │
    │           │  (Redis)    │           │
    │           └─────────────┘           │
    │                                     │
    │           ┌──────────────┐          │
    └──────────►│   Message    │◄─────────┘
                │    Queue     │
                │   (Kafka)    │
                └──────┬───────┘
                       │
                ┌──────▼───────┐
                │   Fanout     │
                │   Workers    │
                └──────┬───────┘
                       │
    ┌──────────────────┴──────────────────┐
    │                                     │
┌───▼────────┐                    ┌───────▼──────┐
│  User DB   │                    │  Tweet DB    │
│  (MySQL)   │                    │ (Cassandra)  │
│            │                    │              │
│ - Users    │                    │ - Tweets     │
│ - Followers│                    │ - Timeline   │
└────────────┘                    └──────────────┘
Components:
- Load Balancer: Distribute incoming traffic
- API Servers: Handle user requests
- Tweet Service: Create and store tweets
- Timeline Service: Generate user feeds
- Fanout Service: Distribute tweets to followers
- Cache (Redis): Store hot feeds and tweets
- Message Queue (Kafka): Async fanout processing
- Databases: User data, tweets, relationships
Data Flow:
Post Tweet Flow:
- 1. User posts tweet → API Server
- 2. Tweet Service saves to database
- 3. Fanout Service publishes to message queue
- 4. Workers consume and push to followers' feeds
- 5. Feeds cached in Redis for fast access
Read Feed Flow:
- 1. User requests feed → API Server
- 2. Timeline Service checks Redis cache
- 3. If cache hit → return cached feed
- 4. If cache miss → query database
- 5. Merge results and cache for future
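The read flow is the cache-aside pattern. A minimal sketch, assuming a HashMap in place of Redis and a function in place of the database query (both stand-ins, and all names hypothetical; TTL handling is left out to keep it short):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Cache-aside read path. The map stands in for Redis and the function
// stands in for the database query; both are assumptions for illustration.
class CacheAsideTimeline {
    private final Map<Long, List<Long>> cache = new HashMap<>();
    private final Function<Long, List<Long>> database;
    private int databaseReads = 0;

    CacheAsideTimeline(Function<Long, List<Long>> database) {
        this.database = database;
    }

    List<Long> getFeed(long userId) {
        // Steps 2-3: check the cache and return immediately on a hit.
        List<Long> cached = cache.get(userId);
        if (cached != null) {
            return cached;
        }
        // Step 4: cache miss, so fall back to the database.
        databaseReads++;
        List<Long> fresh = database.apply(userId);
        // Step 5: store the result so the next read is a hit.
        cache.put(userId, fresh);
        return fresh;
    }

    // Exposed only so the effect of caching can be observed.
    int databaseReads() {
        return databaseReads;
    }

    public static void main(String[] args) {
        CacheAsideTimeline timeline = new CacheAsideTimeline(id -> List.of(999L, 998L));
        timeline.getFeed(12345);
        timeline.getFeed(12345);
        System.out.println("database reads: " + timeline.databaseReads());
    }
}
```

The second call for the same user never reaches the database, which is what keeps feed latency low for active users.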
Deep Dive Topics
Database Sharding
Partition data across multiple databases by user ID for scalability
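A minimal sketch of routing by user ID, assuming simple modulo sharding. The ShardRouter name and the shard count are illustrative, not part of the design above.

```java
// Modulo-based shard routing by user ID. Names and shard count are hypothetical.
class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // floorMod keeps the result in [0, shardCount) even for negative IDs.
    int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        System.out.println("user 12345 -> shard " + router.shardFor(12345));
        System.out.println("user 67890 -> shard " + router.shardFor(67890));
    }
}
```

Plain modulo is easy to reason about, but changing the shard count remaps most users. Consistent hashing is a common alternative when shards are added or removed often.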
Caching Strategy
Cache user feeds (last 1000 tweets) and popular tweets in Redis with TTL
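One way to picture a capped feed with a TTL is the sketch below. It is in-memory only, the names are hypothetical, and the clock is passed in explicitly so the behavior is easy to follow. In Redis the same idea is usually built from LPUSH, LTRIM and EXPIRE on a per-user list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// One user's cached feed: capped length plus a TTL. Names are hypothetical.
class BoundedFeedCache {
    private final int maxEntries;
    private final long ttlMillis;
    private final Deque<Long> tweetIds = new ArrayDeque<>(); // newest first
    private long lastWriteMillis;

    BoundedFeedCache(int maxEntries, long ttlMillis) {
        this.maxEntries = maxEntries;
        this.ttlMillis = ttlMillis;
    }

    // Add the newest tweet at the head and drop the oldest once over the cap.
    void push(long tweetId, long nowMillis) {
        tweetIds.addFirst(tweetId);
        if (tweetIds.size() > maxEntries) {
            tweetIds.removeLast();
        }
        lastWriteMillis = nowMillis;
    }

    // Return the cached feed, or an empty list if the entry has expired.
    List<Long> read(long nowMillis) {
        if (nowMillis - lastWriteMillis > ttlMillis) {
            tweetIds.clear();
        }
        return new ArrayList<>(tweetIds);
    }

    public static void main(String[] args) {
        BoundedFeedCache cache = new BoundedFeedCache(1000, 300_000);
        cache.push(999, 0);
        System.out.println("fresh read:   " + cache.read(1_000));
        System.out.println("expired read: " + cache.read(600_000));
    }
}
```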
Feed Ranking Algorithm
Sort by engagement score = likes + retweets + replies, with time decay
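The text does not say which decay function to use, so the sketch below assumes exponential decay with a configurable half-life. Both the half-life parameter and the FeedRanking name are illustrative choices.

```java
// Engagement score with exponential time decay.
// The half-life form of the decay is an assumption; the text only says "time decay".
class FeedRanking {
    // score = (likes + retweets + replies) * 0.5^(ageHours / halfLifeHours)
    static double score(int likes, int retweets, int replies,
                        double ageHours, double halfLifeHours) {
        double engagement = likes + retweets + replies;
        return engagement * Math.pow(0.5, ageHours / halfLifeHours);
    }

    public static void main(String[] args) {
        // Same engagement, different ages, 6-hour half-life.
        System.out.println("new tweet:     " + score(42, 5, 3, 0, 6));
        System.out.println("6h old tweet:  " + score(42, 5, 3, 6, 6));
        System.out.println("12h old tweet: " + score(42, 5, 3, 12, 6));
    }
}
```

A tweet loses half its score every half-life, so an older tweet needs proportionally more engagement to outrank a fresh one. Note that a tweet with zero engagement always scores zero under this formula, so a real ranker would likely add a base term.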
Real-time Updates
Use WebSockets or Server-Sent Events for live feed updates
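If Server-Sent Events are chosen, each update is written to a long-lived HTTP response (content type text/event-stream) as a small text frame. The sketch below only formats one frame; the event name "tweet" and the JSON payload are assumptions for illustration, and the HTTP server itself is left out.

```java
// Formats a single Server-Sent Events frame. Event name and payload are hypothetical.
class SseFormat {
    static String event(String eventName, String id, String data) {
        StringBuilder sb = new StringBuilder();
        sb.append("event: ").append(eventName).append('\n');
        sb.append("id: ").append(id).append('\n');
        // Multi-line payloads are sent as one "data:" line per line of text.
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        // A blank line terminates the event.
        sb.append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(event("tweet", "999", "{\"tweetId\":999,\"username\":\"alice\"}"));
    }
}
```

SSE is one-directional (server to client), which is enough for pushing new tweets into a feed. WebSockets are the better fit when the client also needs to send frequent messages over the same connection.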
Trade-offs & Decisions
Consistency vs Availability:
Choose eventual consistency (AP in CAP theorem) for better availability
Push vs Pull vs Hybrid:
Hybrid gives the best balance between read and write performance
SQL vs NoSQL:
SQL: For user data and relationships (strong consistency)
NoSQL (Cassandra): For tweets and feeds (high write throughput)
Optimizations
- CDN: Cache media (images/videos) globally
- Lazy Loading: Load tweets as user scrolls (pagination)
- Read Replicas: Scale read operations
- Rate Limiting: Prevent spam and abuse
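Rate limiting is the easiest of these to show in code. Below is a token-bucket sketch, which is one common way to implement it and not necessarily what any production system uses. The names, capacity, and refill rate are all illustrative, and time is passed in explicitly to keep the example deterministic.

```java
// Token-bucket rate limiter for one user. Names and limits are hypothetical.
class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full so short bursts are allowed
        this.lastRefillMillis = nowMillis;
    }

    // Returns true if the request is allowed, consuming one token.
    synchronized boolean tryAcquire(long nowMillis) {
        double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Allow bursts of 2 tweets, refilling 1 token per second.
        TokenBucket bucket = new TokenBucket(2, 1, 0);
        System.out.println(bucket.tryAcquire(0));
        System.out.println(bucket.tryAcquire(0));
        System.out.println(bucket.tryAcquire(0));
        System.out.println(bucket.tryAcquire(1_000));
    }
}
```

In a multi-server deployment the bucket state would have to live in shared storage such as Redis rather than in process memory.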
Follow-up Questions
Q: How would you handle trending topics?
A: Use a separate service to track hashtag frequency in real-time. Keep top trending topics in cache, update every 5 minutes.
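The counting part of that answer can be sketched as follows. This is a single-process toy: the names are hypothetical, and the 5-minute windowing and cache refresh are left out, so the counts here cover everything recorded so far.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts hashtags and returns the top k. Names are hypothetical; windowing is omitted.
class TrendingTopics {
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");
    private final Map<String, Integer> counts = new HashMap<>();

    // Extract every hashtag from the tweet text and bump its count.
    void record(String tweetContent) {
        Matcher m = HASHTAG.matcher(tweetContent);
        while (m.find()) {
            counts.merge(m.group(1).toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
    }

    // Top k hashtags by count; ties are broken alphabetically for stable output.
    List<String> top(int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(k, entries.size()))) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        TrendingTopics trending = new TrendingTopics();
        trending.record("Hello World! #Java");
        trending.record("Learning #java and #Kafka");
        System.out.println(trending.top(2));
    }
}
```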
Q: How to implement @mentions and replies?
A: Store mentions in separate table with tweet_id and mentioned_user_id. Send notification via message queue when mentioned.
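Before mentions can be stored, they have to be pulled out of the tweet text. A rough sketch of that step follows; the regular expression and the MentionParser name are assumptions, with the 50-character cap borrowed from the username column in the schema. Resolving usernames to mentioned_user_id and publishing the notification are not shown.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts @mentions from tweet text. The pattern is an illustrative assumption.
class MentionParser {
    // "@" must not follow a word character or another "@", so that
    // email addresses such as bob@example.com are not treated as mentions.
    private static final Pattern MENTION = Pattern.compile("(?<![\\w@])@(\\w{1,50})");

    // Returns distinct mentioned usernames, lower-cased, in order of first appearance.
    static Set<String> extract(String content) {
        Set<String> mentions = new LinkedHashSet<>();
        Matcher m = MENTION.matcher(content);
        while (m.find()) {
            mentions.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return mentions;
    }

    public static void main(String[] args) {
        System.out.println(extract("@alice great talk! cc @Bob"));
    }
}
```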
Q: How to prevent spam and bots?
A: Rate limiting (max tweets/hour), CAPTCHA for suspicious activity, ML models to detect spam patterns.
Q: How would you implement search?
A: Use Elasticsearch for full-text search on tweets. Index tweets asynchronously. Support hashtag and user search.
Real-World Architecture
- Twitter uses Manhattan (distributed database) and Gizmoduck (user service)
- Facebook uses TAO (distributed data store) and memcached for social graph
- Instagram uses Cassandra for feeds and Redis for caching