Design Rate Limiter

Explain Like I'm 5

Imagine you have a water fountain at school, and EVERYONE wants to drink at the same time! If everyone pushes and rushes, the fountain breaks and nobody gets water. So what do you do? You make a rule: "Each person can only take 5 sips per minute!" This way, everyone gets a fair turn, and the fountain doesn't break. That's exactly what a rate limiter does - it makes sure nobody uses too much of something (like a website or API) so it stays working for everyone! It's like a teacher saying "One question per student every 5 minutes" so the teacher doesn't get overwhelmed!

Key Features

  • Limit number of requests per user/IP per time window
  • Protect servers from being overwhelmed (DDoS protection)
  • Prevent abuse and ensure fair usage
  • Different limits for different user tiers (free vs premium)
  • Return 429 (Too Many Requests) when limit exceeded
  • Track usage and provide headers showing remaining quota

Requirements

Functional Requirements:

  • Limit requests per user based on time windows (per second, minute, hour)
  • Support multiple rate limiting strategies
  • Return appropriate error when limit exceeded
  • Allow different limits for different API endpoints
  • Support whitelisting/blacklisting of IPs

Non-Functional Requirements:

  • Low latency (<10ms overhead)
  • High availability - rate limiter failure shouldn't block traffic
  • Distributed rate limiting across multiple servers
  • Accurate counting even under high load

Rate Limiting Algorithms

1. Token Bucket

Bucket holds tokens. Each request consumes a token. Tokens refill at fixed rate.

Pros: Handles bursts, simple implementation, memory efficient

Cons: Can allow bursts that overwhelm downstream services

2. Leaky Bucket

Requests enter a queue (the bucket) and are processed at a fixed rate (the leak). When the bucket is full, new requests are rejected.

Pros: Smooth output rate, prevents bursts

Cons: A burst fills the queue with older requests, so newer requests get dropped even though the stale ones are still processed
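As a concrete illustration, here is a minimal single-process sketch of the leaky bucket used as a meter: the "water level" rises with each request and drains continuously at a fixed rate. The class and method names are illustrative, not from any library.

```java
// Leaky-bucket sketch: the water level rises per request and drains at a
// fixed rate; a full bucket rejects new requests. Names are illustrative.
public class LeakyBucket {
    private final int capacity;      // Max queued requests (bucket size)
    private final double leakPerMs;  // Drain rate, requests per millisecond
    private double water = 0;        // Current queue depth
    private long lastLeakMs;

    public LeakyBucket(int capacity, double leaksPerSecond) {
        this.capacity = capacity;
        this.leakPerMs = leaksPerSecond / 1000.0;
        this.lastLeakMs = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        leak();
        if (water + 1 <= capacity) {
            water += 1;   // Request fits in the bucket
            return true;
        }
        return false;     // Bucket full -> reject
    }

    // Drain the bucket in proportion to elapsed time
    private void leak() {
        long now = System.currentTimeMillis();
        water = Math.max(0, water - (now - lastLeakMs) * leakPerMs);
        lastLeakMs = now;
    }
}
```

With capacity 2 and a leak rate of 1/s, two back-to-back requests pass and a third is rejected until the bucket drains.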

3. Fixed Window Counter

Count requests in fixed time windows (e.g., 0-60s, 60-120s). Reset counter at window boundary.

Pros: Simple, memory efficient, easy to understand

Cons: Bursts at window boundaries; a client can send up to 2x the limit by straddling the edge of a window
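A minimal in-memory sketch of the fixed window counter, keyed by (user, window index); names are illustrative, and a production version would keep the counter in Redis with an expiry instead of an ever-growing map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Fixed-window sketch: one counter per (user, window) pair.
// In production, stale window keys must be expired (e.g. via Redis EXPIRE);
// this in-memory map grows without cleanup and is for illustration only.
public class FixedWindowLimiter {
    private final long maxRequests;
    private final long windowMillis;
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    public FixedWindowLimiter(long maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowSeconds * 1000;
    }

    public boolean allowRequest(String userId) {
        long window = System.currentTimeMillis() / windowMillis; // window index
        String key = userId + ":" + window;
        long count = counters.computeIfAbsent(key, k -> new AtomicLong())
                             .incrementAndGet();
        return count <= maxRequests;
    }
}
```

The boundary problem is visible here: the counter resets the instant the window index changes, so a burst just before and just after that instant can total 2x the limit.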

4. Sliding Window Log

Store timestamp of each request. Count requests in sliding time window.

Pros: Very accurate, no boundary issues

Cons: Memory intensive (store all timestamps)
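A single-process sketch of the sliding window log (names are illustrative); the memory cost is explicit here, since one timestamp is stored per allowed request:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window-log sketch: store every request timestamp, evict old ones.
// Memory grows linearly with the request rate, which is this algorithm's
// main drawback.
public class SlidingWindowLog {
    private final int maxRequests;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLog(int maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowSeconds * 1000;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Evict timestamps that fell out of the sliding window
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() <= now - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}
```

This is the same idea the Redis sorted-set implementation later in the article uses, with the deque replaced by a ZSET scored by timestamp.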

5. Sliding Window Counter

Hybrid of the fixed window counter and the sliding window log. Estimates the count in the sliding window from a weighted average of the previous and current fixed-window counts.

Pros: Memory efficient, more accurate than fixed window

Cons: Approximation (not 100% accurate)
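The weighted-average estimate can be sketched as follows (a single-process illustration; names are mine). The previous window's count is weighted by how much of that window still overlaps the sliding window:

```java
// Sliding-window-counter sketch: estimate the sliding-window count as
//   previousCount * overlapFraction + currentCount
// using only two counters instead of a full timestamp log.
public class SlidingWindowCounter {
    private final long maxRequests;
    private final long windowMillis;
    private long currentWindowStart;
    private long currentCount = 0;
    private long previousCount = 0;

    public SlidingWindowCounter(long maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowSeconds * 1000;
        long now = System.currentTimeMillis();
        this.currentWindowStart = now - (now % windowMillis);
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        rollWindows(now);
        // Fraction of the previous window still inside the sliding window
        double overlap = 1.0 - (double) (now - currentWindowStart) / windowMillis;
        double estimated = previousCount * overlap + currentCount;
        if (estimated + 1 <= maxRequests) {
            currentCount++;
            return true;
        }
        return false;
    }

    // Advance window state when time crosses a window boundary
    private void rollWindows(long now) {
        long aligned = now - (now % windowMillis);
        if (aligned == currentWindowStart) return;
        // Moved exactly one window forward: current becomes previous;
        // moved further: both counts are stale and reset to zero
        previousCount = (aligned - currentWindowStart == windowMillis)
                ? currentCount : 0;
        currentCount = 0;
        currentWindowStart = aligned;
    }
}
```

The estimate assumes requests were spread evenly across the previous window, which is why this algorithm is approximate rather than exact.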

System Architecture


┌─────────────┐
│   Clients   │
└──────┬──────┘
       │
       │  HTTP Requests
       │
┌──────▼─────────────┐
│   API Gateway      │
│  (Rate Limiter)    │
└──────┬─────────────┘
       │
       │  Check Rate Limit
       │
┌──────▼─────────────┐
│      Redis         │
│  (Distributed      │
│   Counter Store)   │
└────────────────────┘
       │
       │  If limit OK
       │
┌──────▼─────────────┐
│  Application       │
│    Servers         │
└────────────────────┘

Deep Dive: Implementations

1. Token Bucket Algorithm

Most popular algorithm. Allows bursts while maintaining average rate.

TokenBucketRateLimiter.java
public class TokenBucketRateLimiter {

    private final long capacity;        // Max tokens in bucket
    private final long refillRate;      // Tokens added per second
    private long availableTokens;       // Current tokens
    private long lastRefillTimestamp;   // Last refill time (ms)
    private final Object lock = new Object();

    public TokenBucketRateLimiter(long capacity, long refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.availableTokens = capacity;
        this.lastRefillTimestamp = System.currentTimeMillis();
    }

    /**
     * Attempts to consume tokens for a request.
     * Returns true if allowed, false if rate limit exceeded.
     */
    public boolean allowRequest(long tokens) {
        synchronized (lock) {
            // 1. Refill tokens based on time elapsed
            refill();
            // 2. Check if enough tokens available
            if (availableTokens >= tokens) {
                availableTokens -= tokens;
                return true;   // Request allowed
            }
            return false;      // Rate limit exceeded
        }
    }

    /**
     * Refills tokens based on elapsed time.
     */
    private void refill() {
        long now = System.currentTimeMillis();
        long elapsedSeconds = (now - lastRefillTimestamp) / 1000;
        if (elapsedSeconds > 0) {
            // Calculate tokens to add
            long tokensToAdd = elapsedSeconds * refillRate;
            // Add tokens, but don't exceed capacity
            availableTokens = Math.min(capacity, availableTokens + tokensToAdd);
            // Advance by whole seconds only, so fractional time isn't lost
            // between refills
            lastRefillTimestamp += elapsedSeconds * 1000;
        }
    }

    /**
     * Gets current token count and capacity info.
     */
    public RateLimitInfo getInfo() {
        synchronized (lock) {
            refill();
            return new RateLimitInfo(
                availableTokens,
                capacity,
                (capacity - availableTokens),   // tokens used
                availableTokens > 0 ? 0 : calculateRetryAfter()
            );
        }
    }

    private long calculateRetryAfter() {
        // Seconds until the next token is available
        return (long) Math.ceil(1.0 / refillRate);
    }

    public static void main(String[] args) throws InterruptedException {
        // Create rate limiter: 10 requests per second, burst of 10
        TokenBucketRateLimiter limiter = new TokenBucketRateLimiter(10, 10);

        // Simulate requests
        System.out.println("Sending 15 requests rapidly...");
        for (int i = 1; i <= 15; i++) {
            boolean allowed = limiter.allowRequest(1);
            System.out.println("Request " + i + ": " +
                (allowed ? "✓ ALLOWED" : "✗ BLOCKED (rate limit)"));
        }

        // Wait 1 second for refill
        System.out.println("\nWaiting 1 second for token refill...");
        Thread.sleep(1000);

        // Try again
        System.out.println("\nSending 5 more requests...");
        for (int i = 16; i <= 20; i++) {
            boolean allowed = limiter.allowRequest(1);
            System.out.println("Request " + i + ": " +
                (allowed ? "✓ ALLOWED" : "✗ BLOCKED"));
        }
    }

    static class RateLimitInfo {
        long remaining;
        long limit;
        long used;
        long retryAfter;

        RateLimitInfo(long remaining, long limit, long used, long retryAfter) {
            this.remaining = remaining;
            this.limit = limit;
            this.used = used;
            this.retryAfter = retryAfter;
        }
    }
}

2. Distributed Rate Limiter (Redis)

For multiple servers, use Redis to store counters with atomic operations.

DistributedRateLimiter.java
public class DistributedRateLimiter {

    private final RedisClient redis;
    private final long maxRequests;
    private final long windowSeconds;

    public DistributedRateLimiter(RedisClient redis, long maxRequests, long windowSeconds) {
        this.redis = redis;
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
    }

    /**
     * Checks if request is allowed using a sliding window log in Redis.
     * Note: steps 1-5 below are separate round trips and not atomic; under
     * heavy contention, run them as a single Lua script (EVAL) to avoid races.
     */
    public boolean allowRequest(String userId) {
        String key = "rate_limit:" + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - (windowSeconds * 1000);
        try {
            // Use a Redis Sorted Set for the sliding window:
            // score = timestamp, member = unique request ID

            // 1. Remove old entries outside the window
            redis.zremrangeByScore(key, 0, windowStart);
            // 2. Count requests in current window
            long currentCount = redis.zcard(key);
            // 3. Check if under limit
            if (currentCount < maxRequests) {
                // 4. Add current request
                String requestId = UUID.randomUUID().toString();
                redis.zadd(key, now, requestId);
                // 5. Set expiry on key (cleanup)
                redis.expire(key, windowSeconds + 1);
                return true;   // Request allowed
            }
            return false;      // Rate limit exceeded
        } catch (Exception e) {
            // Fail open: allow request if Redis is down
            System.err.println("Redis error, allowing request: " + e.getMessage());
            return true;
        }
    }

    /**
     * Gets rate limit info for user.
     */
    public RateLimitInfo getInfo(String userId) {
        String key = "rate_limit:" + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - (windowSeconds * 1000);
        try {
            // Clean old entries
            redis.zremrangeByScore(key, 0, windowStart);
            // Get current count
            long currentCount = redis.zcard(key);
            long remaining = Math.max(0, maxRequests - currentCount);

            // Calculate retry-after
            long retryAfter = 0;
            if (currentCount >= maxRequests) {
                // Timestamp of the oldest request still in the window
                List<String> oldest = redis.zrange(key, 0, 0);
                if (!oldest.isEmpty()) {
                    long oldestTimestamp = redis.zscore(key, oldest.get(0)).longValue();
                    retryAfter = Math.max(0,
                        ((oldestTimestamp + (windowSeconds * 1000)) - now) / 1000);
                }
            }
            return new RateLimitInfo(remaining, maxRequests, currentCount, retryAfter);
        } catch (Exception e) {
            System.err.println("Redis error: " + e.getMessage());
            return new RateLimitInfo(maxRequests, maxRequests, 0, 0);
        }
    }

    /**
     * Alternative: fixed window counter (simpler but less accurate).
     */
    public boolean allowRequestFixedWindow(String userId) {
        String key = "rate_limit:" + userId;
        long windowStart = System.currentTimeMillis() / (windowSeconds * 1000);
        String windowKey = key + ":" + windowStart;
        try {
            // Increment counter atomically
            long count = redis.incr(windowKey);
            // Set expiry on first request in window
            if (count == 1) {
                redis.expire(windowKey, windowSeconds);
            }
            return count <= maxRequests;
        } catch (Exception e) {
            System.err.println("Redis error: " + e.getMessage());
            return true;   // Fail open
        }
    }

    static class RateLimitInfo {
        long remaining;
        long limit;
        long used;
        long retryAfter;

        RateLimitInfo(long remaining, long limit, long used, long retryAfter) {
            this.remaining = remaining;
            this.limit = limit;
            this.used = used;
            this.retryAfter = retryAfter;
        }
    }
}

3. Rate Limiter Middleware

Integrate rate limiter into API Gateway to check every request.

RateLimiterMiddleware.java
138
public class RateLimiterMiddleware {

    private final RedisClient redis;
    private final Map<String, RateLimitConfig> endpointConfigs;

    public RateLimiterMiddleware(RedisClient redis) {
        this.redis = redis;
        this.endpointConfigs = new HashMap<>();
        // Configure different limits per endpoint
        endpointConfigs.put("/api/login",
            new RateLimitConfig(5, 60));      // 5 requests per minute
        endpointConfigs.put("/api/search",
            new RateLimitConfig(100, 60));    // 100 requests per minute
        endpointConfigs.put("/api/upload",
            new RateLimitConfig(10, 3600));   // 10 requests per hour
    }

    /**
     * Processes incoming HTTP request and applies rate limiting.
     */
    public HttpResponse handleRequest(HttpRequest request) {
        // 1. Extract user identifier (user ID, IP address, API key)
        String userId = extractUserId(request);
        if (userId == null) {
            userId = request.getClientIP();   // Fall back to IP
        }

        // 2. Get rate limit config for this endpoint
        String endpoint = request.getPath();
        RateLimitConfig config = getConfigForEndpoint(endpoint);

        // 3. Create rate limiter for this config
        //    (cheap to construct: all counter state lives in Redis)
        DistributedRateLimiter limiter = new DistributedRateLimiter(
            redis, config.maxRequests, config.windowSeconds
        );

        // 4. Check rate limit
        String rateLimitKey = userId + ":" + endpoint;
        boolean allowed = limiter.allowRequest(rateLimitKey);
        if (!allowed) {
            // 5. Rate limit exceeded: return 429 error
            RateLimitInfo info = limiter.getInfo(rateLimitKey);
            return createRateLimitResponse(info);
        }

        // 6. Request allowed: add rate limit headers
        RateLimitInfo info = limiter.getInfo(rateLimitKey);
        HttpResponse response = processRequest(request);   // Continue to app
        addRateLimitHeaders(response, info);
        return response;
    }

    /**
     * Returns 429 Too Many Requests response.
     */
    private HttpResponse createRateLimitResponse(RateLimitInfo info) {
        HttpResponse response = new HttpResponse();
        response.setStatus(429);   // Too Many Requests
        response.setHeader("Content-Type", "application/json");
        response.setHeader("X-RateLimit-Limit", String.valueOf(info.limit));
        response.setHeader("X-RateLimit-Remaining", "0");
        response.setHeader("X-RateLimit-Reset", String.valueOf(info.retryAfter));
        response.setHeader("Retry-After", String.valueOf(info.retryAfter));
        String body = String.format(
            "{\"error\": \"Rate limit exceeded\", " +
            "\"message\": \"Too many requests. Retry after %d seconds.\", " +
            "\"retryAfter\": %d}",
            info.retryAfter, info.retryAfter
        );
        response.setBody(body);
        return response;
    }

    /**
     * Adds rate limit headers to successful response.
     */
    private void addRateLimitHeaders(HttpResponse response, RateLimitInfo info) {
        response.setHeader("X-RateLimit-Limit", String.valueOf(info.limit));
        response.setHeader("X-RateLimit-Remaining", String.valueOf(info.remaining));
        response.setHeader("X-RateLimit-Used", String.valueOf(info.used));
        if (info.retryAfter > 0) {
            response.setHeader("X-RateLimit-Reset", String.valueOf(info.retryAfter));
        }
    }

    /**
     * Extracts user ID from JWT token or session.
     */
    private String extractUserId(HttpRequest request) {
        String authHeader = request.getHeader("Authorization");
        if (authHeader != null && authHeader.startsWith("Bearer ")) {
            String token = authHeader.substring(7);
            try {
                JWTPayload payload = JWTValidator.decode(token);
                return payload.getUserId();
            } catch (Exception e) {
                return null;
            }
        }
        return null;
    }

    /**
     * Gets rate limit config for endpoint (with defaults).
     */
    private RateLimitConfig getConfigForEndpoint(String endpoint) {
        // Check for exact match
        if (endpointConfigs.containsKey(endpoint)) {
            return endpointConfigs.get(endpoint);
        }
        // Check for prefix match (e.g., /api/*)
        for (Map.Entry<String, RateLimitConfig> entry : endpointConfigs.entrySet()) {
            if (endpoint.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        // Default: 100 requests per minute
        return new RateLimitConfig(100, 60);
    }

    static class RateLimitConfig {
        long maxRequests;
        long windowSeconds;

        RateLimitConfig(long maxRequests, long windowSeconds) {
            this.maxRequests = maxRequests;
            this.windowSeconds = windowSeconds;
        }
    }
}

Trade-offs and Best Practices

Token Bucket vs Fixed Window: Token bucket allows bursts (better UX); fixed window is simpler and uses less memory.

Local vs Distributed: Local counters are faster but inconsistent across servers; distributed counters are consistent but add a Redis dependency.

Fail Open vs Fail Closed: Failing open (allow on error) favors availability; failing closed (block on error) favors security.

Per User vs Per IP: Per-user limits are more accurate but require authentication; per-IP limits are simpler but can block legitimate users behind a shared NAT.

Best Practices

  • Use Redis with atomic operations (INCR, ZADD) for accuracy
  • Implement graceful degradation (fail open when Redis is down)
  • Return clear error messages with a Retry-After header
  • Add rate limit headers (X-RateLimit-*) to all responses
  • Use different limits for different user tiers (free, pro, enterprise)
  • Monitor rate limit metrics (rejection rate, burst patterns)
  • Implement IP whitelisting for internal services
  • Consider geolocation-based limits (stricter for certain regions)

Real-World Examples

  • GitHub API: 5,000 requests/hour for authenticated, 60/hour for unauthenticated
  • Twitter API: 300 requests per 15-minute window
  • Stripe API: 100 requests per second with token bucket
  • Cloudflare: Layer 7 rate limiting with custom rules