Design Rate Limiter

Explain Like I'm 5

Imagine you have a water fountain at school, and EVERYONE wants to drink at the same time! If everyone pushes and rushes, the fountain breaks and nobody gets water. So what do you do? You make a rule: "Each person can only take 5 sips per minute!" This way, everyone gets a fair turn, and the fountain doesn't break. That's exactly what a rate limiter does - it makes sure nobody uses too much of something (like a website or API) so it stays working for everyone! It's like a teacher saying "One question per student every 5 minutes" so the teacher doesn't get overwhelmed!

Key Features

  • Limit number of requests per user/IP per time window
  • Protect servers from being overwhelmed (DDoS protection)
  • Prevent abuse and ensure fair usage
  • Different limits for different user tiers (free vs premium)
  • Return 429 (Too Many Requests) when limit exceeded
  • Track usage and provide headers showing remaining quota

Requirements

Functional Requirements:

  • Limit requests per user based on time windows (per second, minute, hour)
  • Support multiple rate limiting strategies
  • Return appropriate error when limit exceeded
  • Allow different limits for different API endpoints
  • Support whitelisting/blacklisting of IPs

Non-Functional Requirements:

  • Low latency (<10ms overhead)
  • High availability - rate limiter failure shouldn't block traffic
  • Distributed rate limiting across multiple servers
  • Accurate counting even under high load

Rate Limiting Algorithms

1. Token Bucket

Bucket holds tokens. Each request consumes a token. Tokens refill at fixed rate.

Pros: Handles bursts, simple implementation, memory efficient

Cons: Can allow bursts that overwhelm downstream services

2. Leaky Bucket

Requests enter a queue (the bucket) and are processed at a fixed rate (the leak). When the bucket is full, new requests are rejected.

Pros: Smooth output rate, prevents bursts

Cons: A burst fills the queue with older requests, so newer requests get dropped even though the stale ones are still processed
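As a concrete illustration, here is a minimal single-process sketch of the leaky bucket used as a meter: the "water level" rises with each request and drains continuously at a fixed rate. The class and method names are illustrative, not from any library.

```java
// Leaky-bucket sketch: the water level rises per request and drains at a
// fixed rate; a full bucket rejects new requests. Names are illustrative.
public class LeakyBucket {
    private final int capacity;      // Max queued requests (bucket size)
    private final double leakPerMs;  // Drain rate, requests per millisecond
    private double water = 0;        // Current queue depth
    private long lastLeakMs;

    public LeakyBucket(int capacity, double leaksPerSecond) {
        this.capacity = capacity;
        this.leakPerMs = leaksPerSecond / 1000.0;
        this.lastLeakMs = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        leak();
        if (water + 1 <= capacity) {
            water += 1;   // Request fits in the bucket
            return true;
        }
        return false;     // Bucket full -> reject
    }

    // Drain the bucket in proportion to elapsed time
    private void leak() {
        long now = System.currentTimeMillis();
        water = Math.max(0, water - (now - lastLeakMs) * leakPerMs);
        lastLeakMs = now;
    }
}
```

With capacity 2 and a leak rate of 1/s, two back-to-back requests pass and a third is rejected until the bucket drains.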

3. Fixed Window Counter

Count requests in fixed time windows (e.g., 0-60s, 60-120s). Reset counter at window boundary.

Pros: Simple, memory efficient, easy to understand

Cons: Bursts at window boundaries; a client can send up to 2x the limit by straddling the edge of a window
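A minimal in-memory sketch of the fixed window counter, keyed by (user, window index); names are illustrative, and a production version would keep the counter in Redis with an expiry instead of an ever-growing map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Fixed-window sketch: one counter per (user, window) pair.
// In production, stale window keys must be expired (e.g. via Redis EXPIRE);
// this in-memory map grows without cleanup and is for illustration only.
public class FixedWindowLimiter {
    private final long maxRequests;
    private final long windowMillis;
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    public FixedWindowLimiter(long maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowSeconds * 1000;
    }

    public boolean allowRequest(String userId) {
        long window = System.currentTimeMillis() / windowMillis; // window index
        String key = userId + ":" + window;
        long count = counters.computeIfAbsent(key, k -> new AtomicLong())
                             .incrementAndGet();
        return count <= maxRequests;
    }
}
```

The boundary problem is visible here: the counter resets the instant the window index changes, so a burst just before and just after that instant can total 2x the limit.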

4. Sliding Window Log

Store timestamp of each request. Count requests in sliding time window.

Pros: Very accurate, no boundary issues

Cons: Memory intensive (store all timestamps)
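A single-process sketch of the sliding window log (names are illustrative); the memory cost is explicit here, since one timestamp is stored per allowed request:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window-log sketch: store every request timestamp, evict old ones.
// Memory grows linearly with the request rate, which is this algorithm's
// main drawback.
public class SlidingWindowLog {
    private final int maxRequests;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLog(int maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowSeconds * 1000;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Evict timestamps that fell out of the sliding window
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() <= now - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}
```

This is the same idea the Redis sorted-set implementation later in the article uses, with the deque replaced by a ZSET scored by timestamp.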

5. Sliding Window Counter

Hybrid of the fixed window counter and the sliding window log. Estimates the count in the sliding window from a weighted average of the previous and current fixed-window counts.

Pros: Memory efficient, more accurate than fixed window

Cons: Approximation (not 100% accurate)
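The weighted-average estimate can be sketched as follows (a single-process illustration; names are mine). The previous window's count is weighted by how much of that window still overlaps the sliding window:

```java
// Sliding-window-counter sketch: estimate the sliding-window count as
//   previousCount * overlapFraction + currentCount
// using only two counters instead of a full timestamp log.
public class SlidingWindowCounter {
    private final long maxRequests;
    private final long windowMillis;
    private long currentWindowStart;
    private long currentCount = 0;
    private long previousCount = 0;

    public SlidingWindowCounter(long maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowSeconds * 1000;
        long now = System.currentTimeMillis();
        this.currentWindowStart = now - (now % windowMillis);
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        rollWindows(now);
        // Fraction of the previous window still inside the sliding window
        double overlap = 1.0 - (double) (now - currentWindowStart) / windowMillis;
        double estimated = previousCount * overlap + currentCount;
        if (estimated + 1 <= maxRequests) {
            currentCount++;
            return true;
        }
        return false;
    }

    // Advance window state when time crosses a window boundary
    private void rollWindows(long now) {
        long aligned = now - (now % windowMillis);
        if (aligned == currentWindowStart) return;
        // Moved exactly one window forward: current becomes previous;
        // moved further: both counts are stale and reset to zero
        previousCount = (aligned - currentWindowStart == windowMillis)
                ? currentCount : 0;
        currentCount = 0;
        currentWindowStart = aligned;
    }
}
```

The estimate assumes requests were spread evenly across the previous window, which is why this algorithm is approximate rather than exact.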

System Architecture


┌─────────────┐
│   Clients   │
└──────┬──────┘
       │
       │  HTTP Requests
       │
┌──────▼─────────────┐
│   API Gateway      │
│  (Rate Limiter)    │
└──────┬─────────────┘
       │
       │  Check Rate Limit
       │
┌──────▼─────────────┐
│      Redis         │
│  (Distributed      │
│   Counter Store)   │
└────────────────────┘
       │
       │  If limit OK
       │
┌──────▼─────────────┐
│  Application       │
│    Servers         │
└────────────────────┘

Deep Dive: Implementations

1. Token Bucket Algorithm

Most popular algorithm. Allows bursts while maintaining average rate.

TokenBucketRateLimiter.java
public class TokenBucketRateLimiter {

    private final long capacity;        // Max tokens in bucket
    private final long refillRate;      // Tokens added per second
    private long availableTokens;       // Current tokens
    private long lastRefillTimestamp;   // Last refill time (ms)
    private final Object lock = new Object();

    public TokenBucketRateLimiter(long capacity, long refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.availableTokens = capacity;
        this.lastRefillTimestamp = System.currentTimeMillis();
    }

    /**
     * Attempts to consume tokens for a request.
     * Returns true if allowed, false if rate limit exceeded.
     */
    public boolean allowRequest(long tokens) {
        synchronized (lock) {
            // 1. Refill tokens based on time elapsed
            refill();
            // 2. Check if enough tokens available
            if (availableTokens >= tokens) {
                availableTokens -= tokens;
                return true;   // Request allowed
            }
            return false;      // Rate limit exceeded
        }
    }

    /**
     * Refills tokens based on elapsed time.
     */
    private void refill() {
        long now = System.currentTimeMillis();
        long elapsedSeconds = (now - lastRefillTimestamp) / 1000;
        if (elapsedSeconds > 0) {
            // Calculate tokens to add
            long tokensToAdd = elapsedSeconds * refillRate;
            // Add tokens, but don't exceed capacity
            availableTokens = Math.min(capacity, availableTokens + tokensToAdd);
            // Advance by whole seconds only, so fractional time isn't lost
            // between refills
            lastRefillTimestamp += elapsedSeconds * 1000;
        }
    }

    /**
     * Gets current token count and capacity info.
     */
    public RateLimitInfo getInfo() {
        synchronized (lock) {
            refill();
            return new RateLimitInfo(
                availableTokens,
                capacity,
                (capacity - availableTokens),   // tokens used
                availableTokens > 0 ? 0 : calculateRetryAfter()
            );
        }
    }

    private long calculateRetryAfter() {
        // Seconds until the next token is available
        return (long) Math.ceil(1.0 / refillRate);
    }

    public static void main(String[] args) throws InterruptedException {
        // Create rate limiter: 10 requests per second, burst of 10
        TokenBucketRateLimiter limiter = new TokenBucketRateLimiter(10, 10);

        // Simulate requests
        System.out.println("Sending 15 requests rapidly...");
        for (int i = 1; i <= 15; i++) {
            boolean allowed = limiter.allowRequest(1);
            System.out.println("Request " + i + ": " +
                (allowed ? "✓ ALLOWED" : "✗ BLOCKED (rate limit)"));
        }

        // Wait 1 second for refill
        System.out.println("\nWaiting 1 second for token refill...");
        Thread.sleep(1000);

        // Try again
        System.out.println("\nSending 5 more requests...");
        for (int i = 16; i <= 20; i++) {
            boolean allowed = limiter.allowRequest(1);
            System.out.println("Request " + i + ": " +
                (allowed ? "✓ ALLOWED" : "✗ BLOCKED"));
        }
    }

    static class RateLimitInfo {
        long remaining;
        long limit;
        long used;
        long retryAfter;

        RateLimitInfo(long remaining, long limit, long used, long retryAfter) {
            this.remaining = remaining;
            this.limit = limit;
            this.used = used;
            this.retryAfter = retryAfter;
        }
    }
}

2. Distributed Rate Limiter (Redis)

For multiple servers, use Redis to store counters with atomic operations.

DistributedRateLimiter.java
public class DistributedRateLimiter {

    private final RedisClient redis;
    private final long maxRequests;
    private final long windowSeconds;

    public DistributedRateLimiter(RedisClient redis, long maxRequests, long windowSeconds) {
        this.redis = redis;
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
    }

    /**
     * Checks if request is allowed using a sliding window log in Redis.
     * Note: steps 1-5 below are separate round trips and not atomic; under
     * heavy contention, run them as a single Lua script (EVAL) to avoid races.
     */
    public boolean allowRequest(String userId) {
        String key = "rate_limit:" + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - (windowSeconds * 1000);
        try {
            // Use a Redis Sorted Set for the sliding window:
            // score = timestamp, member = unique request ID

            // 1. Remove old entries outside the window
            redis.zremrangeByScore(key, 0, windowStart);
            // 2. Count requests in current window
            long currentCount = redis.zcard(key);
            // 3. Check if under limit
            if (currentCount < maxRequests) {
                // 4. Add current request
                String requestId = UUID.randomUUID().toString();
                redis.zadd(key, now, requestId);
                // 5. Set expiry on key (cleanup)
                redis.expire(key, windowSeconds + 1);
                return true;   // Request allowed
            }
            return false;      // Rate limit exceeded
        } catch (Exception e) {
            // Fail open: allow request if Redis is down
            System.err.println("Redis error, allowing request: " + e.getMessage());
            return true;
        }
    }

    /**
     * Gets rate limit info for user.
     */
    public RateLimitInfo getInfo(String userId) {
        String key = "rate_limit:" + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - (windowSeconds * 1000);
        try {
            // Clean old entries
            redis.zremrangeByScore(key, 0, windowStart);
            // Get current count
            long currentCount = redis.zcard(key);
            long remaining = Math.max(0, maxRequests - currentCount);

            // Calculate retry-after
            long retryAfter = 0;
            if (currentCount >= maxRequests) {
                // Timestamp of the oldest request still in the window
                List<String> oldest = redis.zrange(key, 0, 0);
                if (!oldest.isEmpty()) {
                    long oldestTimestamp = redis.zscore(key, oldest.get(0)).longValue();
                    retryAfter = Math.max(0,
                        ((oldestTimestamp + (windowSeconds * 1000)) - now) / 1000);
                }
            }
            return new RateLimitInfo(remaining, maxRequests, currentCount, retryAfter);
        } catch (Exception e) {
            System.err.println("Redis error: " + e.getMessage());
            return new RateLimitInfo(maxRequests, maxRequests, 0, 0);
        }
    }

    /**
     * Alternative: fixed window counter (simpler but less accurate).
     */
    public boolean allowRequestFixedWindow(String userId) {
        String key = "rate_limit:" + userId;
        long windowStart = System.currentTimeMillis() / (windowSeconds * 1000);
        String windowKey = key + ":" + windowStart;
        try {
            // Increment counter atomically
            long count = redis.incr(windowKey);
            // Set expiry on first request in window
            if (count == 1) {
                redis.expire(windowKey, windowSeconds);
            }
            return count <= maxRequests;
        } catch (Exception e) {
            System.err.println("Redis error: " + e.getMessage());
            return true;   // Fail open
        }
    }

    static class RateLimitInfo {
        long remaining;
        long limit;
        long used;
        long retryAfter;

        RateLimitInfo(long remaining, long limit, long used, long retryAfter) {
            this.remaining = remaining;
            this.limit = limit;
            this.used = used;
            this.retryAfter = retryAfter;
        }
    }
}

3. Rate Limiter Middleware

Integrate rate limiter into API Gateway to check every request.

RateLimiterMiddleware.java
138
public class RateLimiterMiddleware {

    private final RedisClient redis;
    private final Map<String, RateLimitConfig> endpointConfigs;

    public RateLimiterMiddleware(RedisClient redis) {
        this.redis = redis;
        this.endpointConfigs = new HashMap<>();
        // Configure different limits per endpoint
        endpointConfigs.put("/api/login",
            new RateLimitConfig(5, 60));      // 5 requests per minute
        endpointConfigs.put("/api/search",
            new RateLimitConfig(100, 60));    // 100 requests per minute
        endpointConfigs.put("/api/upload",
            new RateLimitConfig(10, 3600));   // 10 requests per hour
    }

    /**
     * Processes incoming HTTP request and applies rate limiting.
     */
    public HttpResponse handleRequest(HttpRequest request) {
        // 1. Extract user identifier (user ID, IP address, API key)
        String userId = extractUserId(request);
        if (userId == null) {
            userId = request.getClientIP();   // Fall back to IP
        }

        // 2. Get rate limit config for this endpoint
        String endpoint = request.getPath();
        RateLimitConfig config = getConfigForEndpoint(endpoint);

        // 3. Create rate limiter for this config
        //    (cheap to construct: all counter state lives in Redis)
        DistributedRateLimiter limiter = new DistributedRateLimiter(
            redis, config.maxRequests, config.windowSeconds
        );

        // 4. Check rate limit
        String rateLimitKey = userId + ":" + endpoint;
        boolean allowed = limiter.allowRequest(rateLimitKey);
        if (!allowed) {
            // 5. Rate limit exceeded: return 429 error
            RateLimitInfo info = limiter.getInfo(rateLimitKey);
            return createRateLimitResponse(info);
        }

        // 6. Request allowed: add rate limit headers
        RateLimitInfo info = limiter.getInfo(rateLimitKey);
        HttpResponse response = processRequest(request);   // Continue to app
        addRateLimitHeaders(response, info);
        return response;
    }

    /**
     * Returns 429 Too Many Requests response.
     */
    private HttpResponse createRateLimitResponse(RateLimitInfo info) {
        HttpResponse response = new HttpResponse();
        response.setStatus(429);   // Too Many Requests
        response.setHeader("Content-Type", "application/json");
        response.setHeader("X-RateLimit-Limit", String.valueOf(info.limit));
        response.setHeader("X-RateLimit-Remaining", "0");
        response.setHeader("X-RateLimit-Reset", String.valueOf(info.retryAfter));
        response.setHeader("Retry-After", String.valueOf(info.retryAfter));
        String body = String.format(
            "{\"error\": \"Rate limit exceeded\", " +
            "\"message\": \"Too many requests. Retry after %d seconds.\", " +
            "\"retryAfter\": %d}",
            info.retryAfter, info.retryAfter
        );
        response.setBody(body);
        return response;
    }

    /**
     * Adds rate limit headers to successful response.
     */
    private void addRateLimitHeaders(HttpResponse response, RateLimitInfo info) {
        response.setHeader("X-RateLimit-Limit", String.valueOf(info.limit));
        response.setHeader("X-RateLimit-Remaining", String.valueOf(info.remaining));
        response.setHeader("X-RateLimit-Used", String.valueOf(info.used));
        if (info.retryAfter > 0) {
            response.setHeader("X-RateLimit-Reset", String.valueOf(info.retryAfter));
        }
    }

    /**
     * Extracts user ID from JWT token or session.
     */
    private String extractUserId(HttpRequest request) {
        String authHeader = request.getHeader("Authorization");
        if (authHeader != null && authHeader.startsWith("Bearer ")) {
            String token = authHeader.substring(7);
            try {
                JWTPayload payload = JWTValidator.decode(token);
                return payload.getUserId();
            } catch (Exception e) {
                return null;
            }
        }
        return null;
    }

    /**
     * Gets rate limit config for endpoint (with defaults).
     */
    private RateLimitConfig getConfigForEndpoint(String endpoint) {
        // Check for exact match
        if (endpointConfigs.containsKey(endpoint)) {
            return endpointConfigs.get(endpoint);
        }
        // Check for prefix match (e.g., /api/*)
        for (Map.Entry<String, RateLimitConfig> entry : endpointConfigs.entrySet()) {
            if (endpoint.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        // Default: 100 requests per minute
        return new RateLimitConfig(100, 60);
    }

    static class RateLimitConfig {
        long maxRequests;
        long windowSeconds;

        RateLimitConfig(long maxRequests, long windowSeconds) {
            this.maxRequests = maxRequests;
            this.windowSeconds = windowSeconds;
        }
    }
}

Trade-offs and Best Practices

Token Bucket vs Fixed Window: Token bucket allows bursts (better UX); fixed window is simpler and uses less memory.

Local vs Distributed: Local counters are faster but inconsistent across servers; distributed counters are consistent but add a Redis dependency.

Fail Open vs Fail Closed: Failing open (allow on error) favors availability; failing closed (block on error) favors security.

Per User vs Per IP: Per-user limits are more accurate but require authentication; per-IP limits are simpler but can block legitimate users behind a shared NAT.

Best Practices

  • Use Redis with atomic operations (INCR, ZADD) for accuracy
  • Implement graceful degradation (fail open when Redis is down)
  • Return clear error messages with a Retry-After header
  • Add rate limit headers (X-RateLimit-*) to all responses
  • Use different limits for different user tiers (free, pro, enterprise)
  • Monitor rate limit metrics (rejection rate, burst patterns)
  • Implement IP whitelisting for internal services
  • Consider geolocation-based limits (stricter for certain regions)

Real-World Examples

  • GitHub API: 5,000 requests/hour for authenticated, 60/hour for unauthenticated
  • Twitter API: 300 requests per 15-minute window
  • Stripe API: 100 requests per second with token bucket
  • Cloudflare: Layer 7 rate limiting with custom rules