Design Rate Limiter
Explain Like I'm 5
Imagine you have a water fountain at school, and EVERYONE wants to drink at the same time! If everyone pushes and rushes, the fountain breaks and nobody gets water. So what do you do? You make a rule: "Each person can only take 5 sips per minute!" This way, everyone gets a fair turn, and the fountain doesn't break. That's exactly what a rate limiter does - it makes sure nobody uses too much of something (like a website or API) so it stays working for everyone! It's like a teacher saying "One question per student every 5 minutes" so the teacher doesn't get overwhelmed!
Key Features
- Limit the number of requests per user/IP per time window
- Protect servers from being overwhelmed (DDoS protection)
- Prevent abuse and ensure fair usage
- Different limits for different user tiers (free vs. premium)
- Return 429 (Too Many Requests) when the limit is exceeded
- Track usage and provide headers showing the remaining quota
Requirements
Functional Requirements:
- Limit requests per user based on time windows (per second, minute, hour)
- Support multiple rate limiting strategies
- Return an appropriate error when the limit is exceeded
- Allow different limits for different API endpoints
- Support whitelisting/blacklisting of IPs
Non-Functional Requirements:
- Low latency (<10 ms overhead per request)
- High availability: a rate limiter failure shouldn't block traffic
- Distributed rate limiting across multiple servers
- Accurate counting even under high load
Rate Limiting Algorithms
1. Token Bucket
Bucket holds tokens. Each request consumes a token. Tokens refill at fixed rate.
Pros: Handles bursts, simple implementation, memory efficient
Cons: Can allow bursts that overwhelm downstream services
2. Leaky Bucket
Requests enter bucket. Process at fixed rate (leak). Bucket overflows = reject request.
Pros: Smooth output rate, prevents bursts
Cons: Recent requests may be dropped even if old requests are processed
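The leak can be simulated lazily instead of running a background drain thread. A minimal in-memory sketch, with class and field names of my own choosing (not from this article's later code):

```java
// Leaky bucket sketch: the "water level" is the number of queued requests.
// It drains at a fixed rate; an arriving request that would overflow is rejected.
public class LeakyBucket {
    private final int capacity;          // Max queued requests before overflow
    private final double leakRatePerMs;  // Requests drained per millisecond
    private double water = 0;            // Current level
    private long lastLeakTime;

    public LeakyBucket(int capacity, double leakRatePerSecond) {
        this.capacity = capacity;
        this.leakRatePerMs = leakRatePerSecond / 1000.0;
        this.lastLeakTime = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        leak();
        if (water + 1 <= capacity) {
            water += 1;      // Queue this request
            return true;
        }
        return false;        // Bucket overflowed: reject
    }

    private void leak() {
        // Drain the bucket for the time elapsed since the last call
        long now = System.currentTimeMillis();
        water = Math.max(0, water - (now - lastLeakTime) * leakRatePerMs);
        lastLeakTime = now;
    }
}
```

Note the output side is smooth by construction: no matter how bursty arrivals are, the downstream rate never exceeds the leak rate.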
3. Fixed Window Counter
Count requests in fixed time windows (e.g., 0-60s, 60-120s). Reset counter at window boundary.
Pros: Simple, memory efficient, easy to understand
Cons: Burst at window boundaries (can get 2x limit at edges)
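A fixed window counter needs only one integer per window, which is why it is so memory efficient. A minimal in-memory sketch (names are illustrative; the Redis-backed variant appears later in this article):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Fixed window counter sketch: one counter per window index, reset by eviction.
public class FixedWindowCounter {
    private final long maxRequests;
    private final long windowMillis;
    private final Map<Long, AtomicLong> counters = new ConcurrentHashMap<>();

    public FixedWindowCounter(long maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowSeconds * 1000;
    }

    public boolean allowRequest() {
        long window = System.currentTimeMillis() / windowMillis;  // Current window index
        long count = counters
                .computeIfAbsent(window, w -> new AtomicLong())
                .incrementAndGet();
        counters.keySet().removeIf(w -> w < window);  // Drop expired windows
        return count <= maxRequests;
    }
}
```

The boundary weakness follows directly from the reset: a client can spend its full quota in the last seconds of one window and again in the first seconds of the next, achieving up to 2x the intended rate across the boundary.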
4. Sliding Window Log
Store timestamp of each request. Count requests in sliding time window.
Pros: Very accurate, no boundary issues
Cons: Memory intensive (store all timestamps)
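The log is just an ordered collection of timestamps, evicted as they age out of the window. A minimal single-process sketch (illustrative names; the Redis sorted-set version later in this article is the distributed equivalent):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding window log sketch: store every request timestamp, count what remains
// inside the window. Exact, but memory grows with the request rate.
public class SlidingWindowLog {
    private final long maxRequests;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLog(long maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowSeconds * 1000;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Evict timestamps that have slid out of the window
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= now - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(now);
            return true;
        }
        return false;  // Window is full
    }
}
```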
5. Sliding Window Counter
Hybrid of fixed window and sliding window. Estimate count based on weighted average.
Pros: Memory efficient, more accurate than fixed window
Cons: Approximation (not 100% accurate)
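The weighted average works like this: estimated count = current window's count + previous window's count x the fraction of the previous window that still overlaps the sliding window. A minimal sketch under that assumption (names are my own):

```java
// Sliding window counter sketch: keep only two counters (previous and current
// fixed window) and weight the previous one by its remaining overlap.
public class SlidingWindowCounter {
    private final long maxRequests;
    private final long windowMillis;
    private long currentWindow = -1;
    private long currentCount = 0;
    private long previousCount = 0;

    public SlidingWindowCounter(long maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowSeconds * 1000;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        long window = now / windowMillis;
        if (window != currentWindow) {
            // Roll over: the adjacent window's count becomes "previous" (else 0)
            previousCount = (window == currentWindow + 1) ? currentCount : 0;
            currentCount = 0;
            currentWindow = window;
        }
        // Fraction of the previous window still inside the sliding window
        double overlap = 1.0 - (double) (now % windowMillis) / windowMillis;
        double estimated = currentCount + previousCount * overlap;
        if (estimated < maxRequests) {
            currentCount++;
            return true;
        }
        return false;
    }
}
```

The approximation assumes requests were spread evenly across the previous window, which is why it is not 100% accurate, yet it needs only two counters per key regardless of traffic volume.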
System Architecture
┌─────────────┐
│ Clients │
└──────┬──────┘
│
│ HTTP Requests
│
┌──────▼─────────────┐
│ API Gateway │
│ (Rate Limiter) │
└──────┬─────────────┘
│
│ Check Rate Limit
│
┌──────▼─────────────┐
│ Redis │
│ (Distributed │
│ Counter Store) │
└────────────────────┘
│
│ If limit OK
│
┌──────▼─────────────┐
│ Application │
│ Servers │
└────────────────────┘

Deep Dive: Implementations
1. Token Bucket Algorithm
Most popular algorithm. Allows bursts while maintaining average rate.
```java
public class TokenBucketRateLimiter {
    private final long capacity;        // Max tokens in bucket
    private final long refillRate;      // Tokens added per second
    private long availableTokens;       // Current tokens
    private long lastRefillTimestamp;   // Last refill time
    private final Object lock = new Object();

    public TokenBucketRateLimiter(long capacity, long refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.availableTokens = capacity;
        this.lastRefillTimestamp = System.currentTimeMillis();
    }

    /**
     * Attempts to consume tokens for a request.
     * Returns true if allowed, false if the rate limit is exceeded.
     */
    public boolean allowRequest(long tokens) {
        synchronized (lock) {
            // 1. Refill tokens based on time elapsed
            refill();

            // 2. Check if enough tokens are available
            if (availableTokens >= tokens) {
                availableTokens -= tokens;
                return true;   // Request allowed
            }
            return false;      // Rate limit exceeded
        }
    }

    /** Refills tokens based on elapsed time. */
    private void refill() {
        long now = System.currentTimeMillis();
        long elapsedSeconds = (now - lastRefillTimestamp) / 1000;
        if (elapsedSeconds > 0) {
            // Calculate tokens to add
            long tokensToAdd = elapsedSeconds * refillRate;

            // Add tokens, but don't exceed capacity
            availableTokens = Math.min(capacity, availableTokens + tokensToAdd);
            lastRefillTimestamp = now;
        }
    }

    /** Gets the current token count and capacity info. */
    public RateLimitInfo getInfo() {
        synchronized (lock) {
            refill();
            return new RateLimitInfo(
                availableTokens,
                capacity,
                capacity - availableTokens,   // tokens used
                availableTokens > 0 ? 0 : calculateRetryAfter()
            );
        }
    }

    private long calculateRetryAfter() {
        // Seconds until the next token becomes available
        return (long) Math.ceil(1.0 / refillRate);
    }

    public static void main(String[] args) throws InterruptedException {
        // Create rate limiter: 10 tokens per second, burst capacity of 10
        TokenBucketRateLimiter limiter = new TokenBucketRateLimiter(10, 10);

        // Simulate requests
        System.out.println("Sending 15 requests rapidly...");
        for (int i = 1; i <= 15; i++) {
            boolean allowed = limiter.allowRequest(1);
            System.out.println("Request " + i + ": "
                + (allowed ? "✓ ALLOWED" : "✗ BLOCKED (rate limit)"));
        }

        // Wait 1 second for refill
        System.out.println("\nWaiting 1 second for token refill...");
        Thread.sleep(1000);

        // Try again
        System.out.println("\nSending 5 more requests...");
        for (int i = 16; i <= 20; i++) {
            boolean allowed = limiter.allowRequest(1);
            System.out.println("Request " + i + ": "
                + (allowed ? "✓ ALLOWED" : "✗ BLOCKED"));
        }
    }

    static class RateLimitInfo {
        long remaining;
        long limit;
        long used;
        long retryAfter;

        RateLimitInfo(long remaining, long limit, long used, long retryAfter) {
            this.remaining = remaining;
            this.limit = limit;
            this.used = used;
            this.retryAfter = retryAfter;
        }
    }
}
```

2. Distributed Rate Limiter (Redis)
For multiple servers, use Redis to store counters with atomic operations.
```java
import java.util.List;
import java.util.UUID;

public class DistributedRateLimiter {
    private final RedisClient redis;
    private final long maxRequests;
    private final long windowSeconds;

    public DistributedRateLimiter(RedisClient redis, long maxRequests, long windowSeconds) {
        this.redis = redis;
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
    }

    /**
     * Checks if the request is allowed using a sliding window log in Redis.
     */
    public boolean allowRequest(String userId) {
        String key = "rate_limit:" + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - (windowSeconds * 1000);
        try {
            // Use a Redis sorted set for the sliding window:
            // score = timestamp, member = unique request ID

            // 1. Remove old entries outside the window
            redis.zremrangeByScore(key, 0, windowStart);

            // 2. Count requests in the current window
            long currentCount = redis.zcard(key);

            // 3. Check if under the limit
            if (currentCount < maxRequests) {
                // 4. Add the current request
                String requestId = UUID.randomUUID().toString();
                redis.zadd(key, now, requestId);

                // 5. Set expiry on the key (cleanup)
                redis.expire(key, windowSeconds + 1);
                return true;   // Request allowed
            }
            return false;      // Rate limit exceeded
        } catch (Exception e) {
            // Fail open: allow the request if Redis is down
            System.err.println("Redis error, allowing request: " + e.getMessage());
            return true;
        }
    }

    /**
     * Gets rate limit info for a user.
     */
    public RateLimitInfo getInfo(String userId) {
        String key = "rate_limit:" + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - (windowSeconds * 1000);
        try {
            // Clean old entries
            redis.zremrangeByScore(key, 0, windowStart);

            // Get the current count
            long currentCount = redis.zcard(key);
            long remaining = Math.max(0, maxRequests - currentCount);

            // Calculate retry-after from the oldest request in the window
            long retryAfter = 0;
            if (currentCount >= maxRequests) {
                List<String> oldest = redis.zrange(key, 0, 0);
                if (!oldest.isEmpty()) {
                    long oldestTimestamp = redis.zscore(key, oldest.get(0)).longValue();
                    retryAfter = ((oldestTimestamp + (windowSeconds * 1000)) - now) / 1000;
                }
            }
            return new RateLimitInfo(remaining, maxRequests, currentCount, retryAfter);
        } catch (Exception e) {
            System.err.println("Redis error: " + e.getMessage());
            return new RateLimitInfo(maxRequests, maxRequests, 0, 0);
        }
    }

    /**
     * Alternative: fixed window counter (simpler but less accurate).
     */
    public boolean allowRequestFixedWindow(String userId) {
        String key = "rate_limit:" + userId;
        long windowStart = System.currentTimeMillis() / (windowSeconds * 1000);
        String windowKey = key + ":" + windowStart;
        try {
            // Increment the counter atomically
            long count = redis.incr(windowKey);

            // Set expiry on the first request in the window
            if (count == 1) {
                redis.expire(windowKey, windowSeconds);
            }
            return count <= maxRequests;
        } catch (Exception e) {
            System.err.println("Redis error: " + e.getMessage());
            return true;   // Fail open
        }
    }

    static class RateLimitInfo {
        long remaining;
        long limit;
        long used;
        long retryAfter;

        RateLimitInfo(long remaining, long limit, long used, long retryAfter) {
            this.remaining = remaining;
            this.limit = limit;
            this.used = used;
            this.retryAfter = retryAfter;
        }
    }
}
```

3. Rate Limiter Middleware
Integrate the rate limiter into the API Gateway so every request is checked before it reaches the application servers.
```java
import java.util.HashMap;
import java.util.Map;

public class RateLimiterMiddleware {
    private final RedisClient redis;
    private final Map<String, RateLimitConfig> endpointConfigs;

    public RateLimiterMiddleware(RedisClient redis) {
        this.redis = redis;
        this.endpointConfigs = new HashMap<>();

        // Configure different limits per endpoint
        endpointConfigs.put("/api/login", new RateLimitConfig(5, 60));      // 5 requests per minute
        endpointConfigs.put("/api/search", new RateLimitConfig(100, 60));   // 100 requests per minute
        endpointConfigs.put("/api/upload", new RateLimitConfig(10, 3600));  // 10 requests per hour
    }

    /**
     * Processes an incoming HTTP request and applies rate limiting.
     */
    public HttpResponse handleRequest(HttpRequest request) {
        // 1. Extract user identifier (user ID, IP address, API key)
        String userId = extractUserId(request);
        if (userId == null) {
            userId = request.getClientIP();   // Fall back to IP
        }

        // 2. Get the rate limit config for this endpoint
        String endpoint = request.getPath();
        RateLimitConfig config = getConfigForEndpoint(endpoint);

        // 3. Create a rate limiter for this config
        DistributedRateLimiter limiter = new DistributedRateLimiter(
            redis, config.maxRequests, config.windowSeconds);

        // 4. Check the rate limit
        String rateLimitKey = userId + ":" + endpoint;
        boolean allowed = limiter.allowRequest(rateLimitKey);

        if (!allowed) {
            // 5. Rate limit exceeded: return a 429 error
            DistributedRateLimiter.RateLimitInfo info = limiter.getInfo(rateLimitKey);
            return createRateLimitResponse(info);
        }

        // 6. Request allowed: add rate limit headers
        DistributedRateLimiter.RateLimitInfo info = limiter.getInfo(rateLimitKey);
        HttpResponse response = processRequest(request);   // Continue to application
        addRateLimitHeaders(response, info);
        return response;
    }

    /**
     * Returns a 429 Too Many Requests response.
     */
    private HttpResponse createRateLimitResponse(DistributedRateLimiter.RateLimitInfo info) {
        HttpResponse response = new HttpResponse();
        response.setStatus(429);   // Too Many Requests
        response.setHeader("Content-Type", "application/json");
        response.setHeader("X-RateLimit-Limit", String.valueOf(info.limit));
        response.setHeader("X-RateLimit-Remaining", "0");
        response.setHeader("X-RateLimit-Reset", String.valueOf(info.retryAfter));
        response.setHeader("Retry-After", String.valueOf(info.retryAfter));

        String body = String.format(
            "{\"error\": \"Rate limit exceeded\", "
            + "\"message\": \"Too many requests. Retry after %d seconds.\", "
            + "\"retryAfter\": %d}",
            info.retryAfter, info.retryAfter
        );
        response.setBody(body);
        return response;
    }

    /**
     * Adds rate limit headers to a successful response.
     */
    private void addRateLimitHeaders(HttpResponse response, DistributedRateLimiter.RateLimitInfo info) {
        response.setHeader("X-RateLimit-Limit", String.valueOf(info.limit));
        response.setHeader("X-RateLimit-Remaining", String.valueOf(info.remaining));
        response.setHeader("X-RateLimit-Used", String.valueOf(info.used));
        if (info.retryAfter > 0) {
            response.setHeader("X-RateLimit-Reset", String.valueOf(info.retryAfter));
        }
    }

    /**
     * Extracts the user ID from a JWT token or session.
     */
    private String extractUserId(HttpRequest request) {
        String authHeader = request.getHeader("Authorization");
        if (authHeader != null && authHeader.startsWith("Bearer ")) {
            String token = authHeader.substring(7);
            try {
                JWTPayload payload = JWTValidator.decode(token);
                return payload.getUserId();
            } catch (Exception e) {
                return null;
            }
        }
        return null;
    }

    /**
     * Gets the rate limit config for an endpoint (with defaults).
     */
    private RateLimitConfig getConfigForEndpoint(String endpoint) {
        // Check for an exact match
        if (endpointConfigs.containsKey(endpoint)) {
            return endpointConfigs.get(endpoint);
        }

        // Check for a prefix match (e.g., /api/*)
        for (Map.Entry<String, RateLimitConfig> entry : endpointConfigs.entrySet()) {
            if (endpoint.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }

        // Default: 100 requests per minute
        return new RateLimitConfig(100, 60);
    }

    static class RateLimitConfig {
        long maxRequests;
        long windowSeconds;

        RateLimitConfig(long maxRequests, long windowSeconds) {
            this.maxRequests = maxRequests;
            this.windowSeconds = windowSeconds;
        }
    }
}
```

Trade-offs and Best Practices
- Token Bucket vs. Fixed Window: token bucket absorbs bursts (better UX); fixed window is simpler and uses less memory
- Local vs. Distributed: local counters are faster but inconsistent across servers; distributed counters are consistent but add a Redis dependency
- Fail Open vs. Fail Closed: failing open (allow on error) favors availability; failing closed (block on error) favors security
- Per User vs. Per IP: per-user limits are more accurate but require authentication; per-IP limits are simpler but can block legitimate users behind a shared NAT
Best Practices
- Return 429 with a Retry-After header so clients know when to retry
- Expose X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers on every response
- Fail open when the counter store is unavailable, so a rate limiter outage doesn't block all traffic
- Apply different limits per endpoint and per user tier (free vs. premium)
- Use atomic operations (e.g., Redis INCR) so concurrent checks across servers don't race
Real-World Examples
- GitHub API: 5,000 requests/hour for authenticated users, 60/hour for unauthenticated
- Twitter API: 300 requests per 15-minute window
- Stripe API: 100 requests per second with a token bucket
- Cloudflare: Layer 7 rate limiting with custom rules