Design Twitter/X Feed

🎯 Explain Like I'm 5...

Imagine a magical bulletin board where your friends post messages, and you can see all their messages in order, newest first! That's what a Twitter/X feed is!

What is a Twitter/X Feed?

A personalized stream of posts (tweets) from people you follow, shown in reverse chronological order (newest first).

Example:

You follow Alice, Bob, and Charlie. When they post tweets, you see them in your feed!

🔨 How It Works (Simple Version)

1. You follow people (like subscribing to their posts)
2. When they tweet, their post goes to all their followers' feeds
3. Your feed shows all tweets from people you follow, newest first
4. You can scroll down to see older tweets!

🎯 Key Features to Design

✓ Post Tweet: Users can create and publish tweets
✓ Follow/Unfollow: Users can follow other users
✓ Timeline/Feed: Show tweets from followed users
✓ Like & Retweet: Interact with tweets
✓ Notifications: Alert users about interactions

📋 Requirements

Functional Requirements:

• Users can post tweets (280 characters max)
• Users can follow/unfollow other users
• Users see a timeline of tweets from followed users
• Users can like, retweet, and reply to tweets
• Real-time updates for new tweets

Non-Functional Requirements:

• High availability (99.99% uptime)
• Low latency (feed loads in <200ms)
• Scalability (500M users, 200M daily active)
• Handle about 5,000 tweets per second on average and about 10,000 at peak, matching the capacity estimate below

📊 Capacity Estimation

Assumptions:

• 500 million total users
• 200 million daily active users (DAU)
• Each user follows 200 people on average
• Each active user posts 2 tweets per day on average
• Each tweet is ~280 characters (~500 bytes with metadata)

Calculations:

• Daily tweets: 200M DAU * 2 tweets = 400M tweets/day
• Average tweets per second: 400M / 86,400 ≈ 4,630 tweets/sec
• Peak tweets/sec (assume 2x average): ≈ 9,260, rounded up to ~10,000 tweets/sec
• Feed requests: 200M DAU * 20 reads/day = 4B reads/day
• Average reads per second: 4B / 86,400 ≈ 46,300 reads/sec
• Storage per day: 400M * 500 bytes = 200 GB/day (text and metadata only, excluding media)
• Storage per year: 200 GB * 365 ≈ 73 TB/year
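
The arithmetic above can be double-checked with a few lines of Java. This is a minimal sketch: the class and method names are invented for illustration, and the inputs are the assumptions listed in this section (integer division truncates, so the per-second figures come out slightly below the rounded values).

```java
final class CapacityEstimate {
    // Inputs taken from the assumptions in this section.
    static final long DAU = 200_000_000L;
    static final long TWEETS_PER_USER_PER_DAY = 2;
    static final long READS_PER_USER_PER_DAY = 20;
    static final long BYTES_PER_TWEET = 500;
    static final long SECONDS_PER_DAY = 86_400;

    static long dailyTweets() { return DAU * TWEETS_PER_USER_PER_DAY; }

    static long avgTweetsPerSecond() { return dailyTweets() / SECONDS_PER_DAY; }

    static long dailyReads() { return DAU * READS_PER_USER_PER_DAY; }

    static long avgReadsPerSecond() { return dailyReads() / SECONDS_PER_DAY; }

    static long storageBytesPerDay() { return dailyTweets() * BYTES_PER_TWEET; }

    static long storageBytesPerYear() { return storageBytesPerDay() * 365; }

    public static void main(String[] args) {
        System.out.println("Tweets/day:       " + dailyTweets());
        System.out.println("Avg tweets/sec:   " + avgTweetsPerSecond());
        System.out.println("Avg reads/sec:    " + avgReadsPerSecond());
        System.out.println("Storage GB/day:   " + storageBytesPerDay() / 1_000_000_000L);
        System.out.println("Storage TB/year:  " + storageBytesPerYear() / 1_000_000_000_000L);
    }
}
```

Reads outnumber writes by roughly 10 to 1 here, which is why the rest of the design optimizes the read path first.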

🔌 API Design

API Endpoints

java

// 1. Post Tweet
POST /api/v1/tweets
Request Body:
{
  "userId": 12345,
  "content": "Hello World! #Java",
  "mediaUrls": ["https://cdn.example.com/image.jpg"]
}
Response:
{
  "tweetId": 987654321,
  "userId": 12345,
  "content": "Hello World! #Java",
  "createdAt": "2024-01-15T10:30:00Z",
  "likes": 0,
  "retweets": 0
}
// 2. Get User Timeline (Feed)
GET /api/v1/timeline?userId=12345&page=1&pageSize=20
Response:
{
  "tweets": [
    {
      "tweetId": 999,
      "userId": 67890,
      "username": "alice",
      "content": "Excited about system design!",
      "createdAt": "2024-01-15T11:00:00Z",
      "likes": 42,
      "retweets": 5
    }
    // ... more tweets
  ],
  "nextPage": 2,
  "hasMore": true
}
// 3. Follow User
POST /api/v1/follow
Request Body:
{
  "followerId": 12345,
  "followeeId": 67890
}
// 4. Like Tweet
POST /api/v1/tweets/{tweetId}/like
Request Body:
{
  "userId": 12345
}
// 5. Retweet
POST /api/v1/tweets/{tweetId}/retweet
Request Body:
{
  "userId": 12345,
  "comment": "Great insight!" // optional
}
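
Before the Post Tweet endpoint saves anything, the server would typically enforce the 280-character limit from the requirements. A minimal sketch, assuming the limit is counted in Unicode code points (so an emoji counts once); the counting rule and the `TweetValidator` name are assumptions for illustration, not something this design specifies.

```java
final class TweetValidator {
    static final int MAX_TWEET_LENGTH = 280;

    /** Returns true when the content is non-blank and at most 280 code points long. */
    static boolean isValid(String content) {
        if (content == null || content.trim().isEmpty()) {
            return false;
        }
        // codePointCount treats a surrogate pair (e.g. most emoji) as one character,
        // whereas String.length() would count it as two.
        int length = content.codePointCount(0, content.length());
        return length <= MAX_TWEET_LENGTH;
    }
}
```

An API server could call `TweetValidator.isValid(request.content)` and answer with HTTP 400 when it returns false, before the tweet reaches the database.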

💾 Database Schema

Database Schema (SQL)

sql

-- Users Table
CREATE TABLE users (
    user_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    bio TEXT,
    profile_image_url VARCHAR(500),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    followers_count INT DEFAULT 0,
    following_count INT DEFAULT 0,
    INDEX idx_username (username)
);
-- Tweets Table (Use NoSQL like Cassandra in production)
CREATE TABLE tweets (
    tweet_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id BIGINT NOT NULL,
    content VARCHAR(280) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    likes_count INT DEFAULT 0,
    retweets_count INT DEFAULT 0,
    replies_count INT DEFAULT 0,
    FOREIGN KEY (user_id) REFERENCES users(user_id),
    INDEX idx_user_created (user_id, created_at DESC)
);
-- Followers Table (Relationships)
CREATE TABLE followers (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id),
    FOREIGN KEY (follower_id) REFERENCES users(user_id),
    FOREIGN KEY (followee_id) REFERENCES users(user_id),
    INDEX idx_followee (followee_id)
);
-- Timeline/Feed Cache Table (Pre-computed feeds)
CREATE TABLE user_timeline (
    user_id BIGINT NOT NULL,
    tweet_id BIGINT NOT NULL,
    tweet_created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (user_id, tweet_created_at, tweet_id),
    INDEX idx_user_timeline (user_id, tweet_created_at DESC)
);
-- Likes Table
CREATE TABLE likes (
    user_id BIGINT NOT NULL,
    tweet_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, tweet_id),
    FOREIGN KEY (user_id) REFERENCES users(user_id),
    FOREIGN KEY (tweet_id) REFERENCES tweets(tweet_id),
    INDEX idx_tweet_likes (tweet_id)
);

Feed Generation Strategies

Strategy 1: Fan-out on Write (Push Model)

When a user tweets, the tweet is immediately pushed into the precomputed feed of every follower

Pros:

• Fast read (feed is pre-computed)
• Simple to implement

Cons:

• Slow write for users with many followers (celebrities)
• Wastes space for inactive users
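
The push model can be sketched in memory as follows. This is an illustration only: `PushFeed` and its methods are hypothetical names, plain maps stand in for the database and cache, and each timeline is capped so it cannot grow without bound.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PushFeed {
    private final int maxTimelineSize;
    // followee ID -> IDs of the users who follow them
    private final Map<Long, Set<Long>> followersOf = new HashMap<>();
    // user ID -> precomputed timeline of tweet IDs, newest first
    private final Map<Long, Deque<Long>> timelines = new HashMap<>();

    PushFeed(int maxTimelineSize) {
        this.maxTimelineSize = maxTimelineSize;
    }

    void follow(long followerId, long followeeId) {
        followersOf.computeIfAbsent(followeeId, k -> new HashSet<>()).add(followerId);
    }

    /** Write path: the cost grows linearly with the author's follower count. */
    void postTweet(long authorId, long tweetId) {
        Set<Long> followers = followersOf.getOrDefault(authorId, Collections.emptySet());
        for (long followerId : followers) {
            Deque<Long> timeline =
                timelines.computeIfAbsent(followerId, k -> new ArrayDeque<>());
            timeline.addFirst(tweetId);
            if (timeline.size() > maxTimelineSize) {
                timeline.removeLast(); // drop the oldest entry
            }
        }
    }

    /** Read path: no merging needed, the feed is already assembled. */
    List<Long> timeline(long userId) {
        return new ArrayList<>(timelines.getOrDefault(userId, new ArrayDeque<>()));
    }
}
```

The loop in `postTweet` is the write amplification listed under Cons: one tweet from an account with a million followers becomes a million timeline inserts.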

Strategy 2: Fan-out on Read (Pull Model)

Generate the feed on demand by querying the recent tweets of every followed user and merging them

Pros:

• No wasted computation for inactive users
• No write amplification for users with many followers (a tweet is stored once)

Cons:

• Slow reads (each request must query every followed user)
• Read latency grows with the number of accounts followed
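
The pull model boils down to a k-way merge of per-author tweet lists. The sketch below assumes each author's tweets are already sorted newest first and uses a priority queue to pick the next most recent tweet; `PullFeed`, its nested `Tweet` class, and the in-memory map are hypothetical stand-ins for the real storage layer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class PullFeed {
    /** Minimal tweet: an ID plus creation time in epoch milliseconds. */
    static final class Tweet {
        final long id;
        final long createdAtMillis;

        Tweet(long id, long createdAtMillis) {
            this.id = id;
            this.createdAtMillis = createdAtMillis;
        }
    }

    /** Position inside one author's newest-first tweet list. */
    private static final class Cursor {
        final List<Tweet> tweets;
        int index;

        Cursor(List<Tweet> tweets) {
            this.tweets = tweets;
        }

        Tweet current() {
            return tweets.get(index);
        }
    }

    /** Merges the followed authors' lists and returns the newest `limit` tweets. */
    static List<Tweet> timeline(List<Long> followingIds,
                                Map<Long, List<Tweet>> tweetsByAuthor,
                                int limit) {
        // Max-heap on the timestamp of each cursor's current tweet.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingLong((Cursor c) -> c.current().createdAtMillis).reversed());
        for (Long authorId : followingIds) {
            List<Tweet> tweets = tweetsByAuthor.get(authorId);
            if (tweets != null && !tweets.isEmpty()) {
                heap.add(new Cursor(tweets));
            }
        }
        List<Tweet> feed = new ArrayList<>();
        while (feed.size() < limit && !heap.isEmpty()) {
            Cursor cursor = heap.poll();
            feed.add(cursor.current());
            cursor.index++;
            if (cursor.index < cursor.tweets.size()) {
                heap.add(cursor); // re-queue with its next tweet
            }
        }
        return feed;
    }
}
```

Every followed author contributes at least one lookup per request, which is where the read latency listed under Cons comes from.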

Strategy 3: Hybrid Approach (Recommended)

Combine both approaches based on user type

• Regular users: Fan-out on write (push to followers)
• Celebrities (>1M followers): Fan-out on read
• Cache popular tweets in Redis
• Use message queue for async processing

Feed Generation Service (Java)

java

// FeedGenerationService.java
// TweetRepository, FollowerRepository, TimelineRepository, CacheService,
// MessageQueueService, and Tweet are application types that are not shown here.
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class FeedGenerationService {
    // Accounts with more followers than this are treated as celebrities.
    // The write path and the read path must share one threshold; otherwise
    // tweets from an account sitting exactly on the boundary are neither
    // pushed nor pulled.
    private static final int CELEBRITY_THRESHOLD = 1_000_000;

    private final TweetRepository tweetRepository;
    private final FollowerRepository followerRepository;
    private final CacheService cacheService;
    private final MessageQueueService messageQueue;

    public FeedGenerationService(TweetRepository tweetRepository,
                                 FollowerRepository followerRepository,
                                 CacheService cacheService,
                                 MessageQueueService messageQueue) {
        this.tweetRepository = tweetRepository;
        this.followerRepository = followerRepository;
        this.cacheService = cacheService;
        this.messageQueue = messageQueue;
    }

    /**
     * Post tweet and fan out to followers (hybrid approach)
     */
    public Tweet postTweet(long userId, String content) {
        // 1. Save tweet to database
        Tweet tweet = tweetRepository.save(new Tweet(userId, content));
        // 2. Pick the fan-out strategy from the author's follower count
        if (isCelebrity(userId)) {
            // Celebrity: fan-out on read (followers pull when they request a feed)
            // Just cache the tweet for fast access
            cacheService.cacheTweet(tweet);
        } else {
            // Regular user: fan-out on write (push to followers)
            fanoutOnWrite(tweet);
        }
        return tweet;
    }

    /**
     * Fan-out on write: push tweet to all followers' feeds
     */
    private void fanoutOnWrite(Tweet tweet) {
        // Publish to message queue for async processing.
        // Workers consume the message and push to followers' feeds,
        // so the API response is not blocked by the fan-out.
        messageQueue.publish("fanout-queue", tweet);
    }

    /**
     * Get user timeline (feed). Pages are 1-based.
     */
    public List<Tweet> getTimeline(long userId, int page, int pageSize) {
        // pageSize is part of the key so different page sizes do not collide
        String cacheKey = "timeline:" + userId + ":" + page + ":" + pageSize;
        // 1. Check cache first
        List<Tweet> cachedFeed = cacheService.get(cacheKey);
        if (cachedFeed != null) {
            return cachedFeed;
        }
        // 2. Get list of users this user follows
        List<Long> followingIds = followerRepository.getFollowing(userId);
        // 3. Split out celebrities; tweets from everyone else were already
        //    pushed into the pre-computed timeline
        List<Long> celebrities = filterCelebrities(followingIds);
        // Fetch enough from each source to cover all pages up to the requested
        // one, so the merged pages line up. Deep pages get expensive this way;
        // cursor-based pagination avoids that.
        int limit = page * pageSize;
        List<Tweet> feed = new ArrayList<>();
        // 4. For regular users: read from the pre-computed timeline
        if (celebrities.size() < followingIds.size()) {
            feed.addAll(getPrecomputedTimeline(userId, limit));
        }
        // 5. For celebrities: fan-out on read (pull their latest tweets)
        if (!celebrities.isEmpty()) {
            feed.addAll(getLatestTweetsFromCelebrities(celebrities, limit));
        }
        // 6. Merge and sort by timestamp (most recent first)
        feed.sort((a, b) -> b.getCreatedAt().compareTo(a.getCreatedAt()));
        // 7. Slice out the requested page (copied, because subList is only a view)
        int from = Math.min((page - 1) * pageSize, feed.size());
        int to = Math.min(from + pageSize, feed.size());
        List<Tweet> result = new ArrayList<>(feed.subList(from, to));
        // 8. Cache the result
        cacheService.set(cacheKey, result, 300); // 5 min TTL
        return result;
    }

    private boolean isCelebrity(long userId) {
        return followerRepository.getFollowerCount(userId) > CELEBRITY_THRESHOLD;
    }

    private List<Long> filterCelebrities(List<Long> userIds) {
        // One follower-count lookup per followed user; batch or cache this in production
        return userIds.stream()
            .filter(this::isCelebrity)
            .collect(Collectors.toList());
    }

    private List<Tweet> getPrecomputedTimeline(long userId, int limit) {
        return tweetRepository.getUserTimeline(userId, 0, limit);
    }

    private List<Tweet> getLatestTweetsFromCelebrities(
            List<Long> celebrityIds, int limit) {
        // Pull latest tweets from celebrities
        return tweetRepository.getLatestTweets(celebrityIds, limit);
    }
}

// FanoutWorker.java
import java.util.List;

/**
 * Fanout Worker - consumes from message queue
 */
public class FanoutWorker {
    private final FollowerRepository followerRepository;
    private final TimelineRepository timelineRepository;
    private final MessageQueueService messageQueue;

    public FanoutWorker(FollowerRepository followerRepository,
                        TimelineRepository timelineRepository,
                        MessageQueueService messageQueue) {
        this.followerRepository = followerRepository;
        this.timelineRepository = timelineRepository;
        this.messageQueue = messageQueue;
    }

    public void start() {
        messageQueue.consume("fanout-queue", this::processFanout);
    }

    private void processFanout(Tweet tweet) {
        // Get all followers of the user who posted the tweet
        List<Long> followerIds = followerRepository.getFollowers(tweet.getUserId());
        // Push tweet to each follower's timeline.
        // This issues one insert per follower; batch these writes in production.
        for (Long followerId : followerIds) {
            timelineRepository.addToTimeline(followerId, tweet.getTweetId());
        }
        System.out.println("Fanned out tweet " + tweet.getTweetId() +
                           " to " + followerIds.size() + " followers");
    }
}

🏛️ High-Level System Design


                    ┌──────────────┐
                    │   Clients    │
                    │ (Web/Mobile) │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │Load Balancer │
                    └──────┬───────┘
                           │
           ┌───────────────┼───────────────┐
           │               │               │
    ┌──────▼─────┐  ┌─────▼──────┐  ┌────▼──────┐
    │   API      │  │    API     │  │    API    │
    │ Server 1   │  │  Server 2  │  │ Server 3  │
    └──────┬─────┘  └─────┬──────┘  └────┬──────┘
           │               │               │
           └───────────────┼───────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
    ┌───▼────┐      ┌──────▼──────┐    ┌─────▼─────┐
    │ Tweet  │      │  Timeline   │    │  Fanout   │
    │Service │      │   Service   │    │  Service  │
    └───┬────┘      └──────┬──────┘    └─────┬─────┘
        │                  │                  │
        │           ┌──────▼──────┐          │
        │           │   Cache     │          │
        │           │  (Redis)    │          │
        │           └─────────────┘          │
        │                                    │
        │           ┌──────────────┐         │
        └──────────►│   Message    │◄────────┘
                    │    Queue     │
                    │   (Kafka)    │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │   Fanout     │
                    │   Workers    │
                    └──────┬───────┘
                           │
        ┌──────────────────┴──────────────────┐
        │                                     │
    ┌───▼────────┐                    ┌───────▼──────┐
    │  User DB   │                    │   Tweet DB   │
    │  (MySQL)   │                    │ (Cassandra)  │
    │            │                    │              │
    │ - Users    │                    │ - Tweets     │
    │ - Followers│                    │ - Timeline   │
    └────────────┘                    └──────────────┘

Components:

• Load Balancer: Distribute incoming traffic
• API Servers: Handle user requests
• Tweet Service: Create and store tweets
• Timeline Service: Generate user feeds
• Fanout Service: Distribute tweets to followers
• Cache (Redis): Store hot feeds and tweets
• Message Queue (Kafka): Async fanout processing
• Databases: User data, tweets, relationships

Data Flow:

Post Tweet Flow:

1. User posts tweet → API Server
2. Tweet Service saves to database
3. Fanout Service publishes to message queue
4. Workers consume and push to followers' feeds
5. Feeds cached in Redis for fast access

Read Feed Flow:

1. User requests feed → API Server
2. Timeline Service checks Redis cache
3. If cache hit → return cached feed
4. If cache miss → query database
5. Merge results and cache for future

🔍 Deep Dive Topics

Database Sharding

Partition data across multiple databases by user ID for scalability
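
A minimal sketch of routing by user ID, assuming a fixed number of shards and plain modulo hashing. `ShardRouter` is a hypothetical name; a production system would more likely use consistent hashing or a lookup table so that adding shards does not remap most users.

```java
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Maps a user ID to a shard index in the range [0, shardCount). */
    int shardFor(long userId) {
        // Long.hashCode folds the upper and lower 32 bits together;
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(Long.hashCode(userId), shardCount);
    }
}
```

Because all of a user's rows land on one shard, reading that user's tweets or timeline touches a single database. The flip side is hot shards when one user is far more active than the rest.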

Caching Strategy

Cache user feeds (last 1000 tweets) and popular tweets in Redis with TTL
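
The TTL part of this strategy can be illustrated with a small in-memory cache. This is only a sketch of the expiry logic: in the actual design Redis handles expiry itself, and `TtlCache` is a hypothetical class that takes the current time as a parameter so its behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();

    /** Stores a value that stays readable for ttlMillis after nowMillis. */
    void put(K key, V value, long ttlMillis, long nowMillis) {
        entries.put(key, new Entry<>(value, nowMillis + ttlMillis));
    }

    /** Returns the value if present and not expired; expired entries are evicted lazily. */
    Optional<V> get(K key, long nowMillis) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (nowMillis >= entry.expiresAtMillis) {
            entries.remove(key);
            return Optional.empty();
        }
        return Optional.of(entry.value);
    }
}
```

A short TTL bounds how stale a cached feed can get, at the cost of more cache misses that fall through to the database.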

Feed Ranking Algorithm

Sort by engagement score = likes + retweets + replies, with time decay
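
One way to express "engagement with time decay" is to halve the score every fixed number of hours. The half-life form below is an assumption chosen for illustration (the text above only says the score decays with time), and `FeedRanker` is a hypothetical name.

```java
final class FeedRanker {
    /**
     * score = (likes + retweets + replies) * 0.5^(ageHours / halfLifeHours)
     *
     * With halfLifeHours = 6, a tweet keeps half of its score after 6 hours
     * and a quarter after 12.
     */
    static double score(long likes, long retweets, long replies,
                        double ageHours, double halfLifeHours) {
        if (halfLifeHours <= 0) {
            throw new IllegalArgumentException("halfLifeHours must be positive");
        }
        double engagement = likes + retweets + replies;
        return engagement * Math.pow(0.5, ageHours / halfLifeHours);
    }
}
```

With a 6-hour half-life, a day-old tweet with 100 interactions scores 6.25, so a brand-new tweet with 10 interactions outranks it. A tweet with zero interactions always scores 0 under this formula; adding a small constant to the engagement term is a common way to let fresh tweets surface.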

Real-time Updates

Use WebSockets or Server-Sent Events for live feed updates

⚖️ Trade-offs & Decisions

Consistency vs Availability:

Choose eventual consistency (AP in CAP theorem) for better availability

Push vs Pull vs Hybrid:

Hybrid gives the best balance between read and write performance

SQL vs NoSQL:

SQL: For user data and relationships (strong consistency)

NoSQL (Cassandra): For tweets and feeds (high write throughput)

🚀 Optimizations

→ CDN: Cache media (images/videos) globally
→ Lazy Loading: Load tweets as user scrolls (pagination)
→ Read Replicas: Scale read operations
→ Rate Limiting: Prevent spam and abuse

🤔 Follow-up Questions

Q: How would you handle trending topics?

A: Use a separate service to track hashtag frequency in real-time. Keep top trending topics in cache, update every 5 minutes.
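
A minimal in-memory sketch of the hashtag counting described in this answer. It assumes hashtags match `#` followed by word characters and are case-insensitive; `TrendingTracker` is a hypothetical name, and a real service would count over a stream (for example per Kafka partition) rather than in one map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class TrendingTracker {
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");
    private final Map<String, Integer> counts = new HashMap<>();

    /** Counts every hashtag occurrence in the tweet text. */
    void record(String tweetContent) {
        Matcher matcher = HASHTAG.matcher(tweetContent);
        while (matcher.find()) {
            String tag = matcher.group(1).toLowerCase(Locale.ROOT);
            counts.merge(tag, 1, Integer::sum);
        }
    }

    /** Returns up to k hashtags, most frequent first (ties broken alphabetically). */
    List<String> top(int k) {
        return counts.entrySet().stream()
            .sorted((a, b) -> {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            })
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    /** Starts a new counting window, e.g. every 5 minutes. */
    void reset() {
        counts.clear();
    }
}
```

Calling `top(10)` on a timer and writing the result to the cache, followed by `reset()`, gives the 5-minute refresh mentioned above.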

Q: How to implement @mentions and replies?

A: Store mentions in separate table with tweet_id and mentioned_user_id. Send notification via message queue when mentioned.
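
Before mentions can be stored, they have to be pulled out of the tweet text. A minimal sketch, assuming a mention is `@` followed by word characters and not preceded by a word character (so email addresses are skipped); the pattern and the `MentionExtractor` name are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MentionExtractor {
    // The negative lookbehind rejects "@" inside words such as "x@y.com".
    private static final Pattern MENTION = Pattern.compile("(?<![\\w@])@(\\w+)");

    /** Returns the distinct mentioned usernames in order of first appearance. */
    static List<String> extract(String tweetContent) {
        Set<String> usernames = new LinkedHashSet<>();
        Matcher matcher = MENTION.matcher(tweetContent);
        while (matcher.find()) {
            usernames.add(matcher.group(1));
        }
        return new ArrayList<>(usernames);
    }
}
```

Each extracted username would then be resolved to a `mentioned_user_id`, written to the mentions table alongside the `tweet_id`, and published to the notification queue.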

Q: How to prevent spam and bots?

A: Rate limiting (max tweets/hour), CAPTCHA for suspicious activity, ML models to detect spam patterns.
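
The "max tweets/hour" rule can be sketched as a fixed-window counter per user. This is one of several possible algorithms (token bucket and sliding window are common alternatives); `TweetRateLimiter` is a hypothetical name, and the current time is passed in so the logic is easy to trace.

```java
import java.util.HashMap;
import java.util.Map;

final class TweetRateLimiter {
    private final int maxPerWindow;
    private final long windowMillis;
    // userId -> {window start in millis, tweets counted in that window}
    private final Map<Long, long[]> state = new HashMap<>();

    TweetRateLimiter(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true and counts the tweet if the user is still under the limit. */
    synchronized boolean tryAcquire(long userId, long nowMillis) {
        long[] window = state.computeIfAbsent(userId, k -> new long[] {nowMillis, 0});
        if (nowMillis - window[0] >= windowMillis) {
            // The previous window is over: start a new one.
            window[0] = nowMillis;
            window[1] = 0;
        }
        if (window[1] >= maxPerWindow) {
            return false;
        }
        window[1]++;
        return true;
    }
}
```

For example, `new TweetRateLimiter(100, 3_600_000L)` allows 100 tweets per user per hour. With several API servers the counters would have to live in a shared store such as Redis rather than in process memory.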

Q: How would you implement search?

A: Use Elasticsearch for full-text search on tweets. Index tweets asynchronously. Support hashtag and user search.

🌍 Real-World Architecture

• Twitter uses Manhattan (distributed database) and Gizmoduck (user service)
• Facebook uses TAO (distributed data store) and memcached for social graph
• Instagram uses Cassandra for feeds and Redis for caching