Rate Limiting Algorithms: A Deep Dive

Introduction
Rate limiting is a technique used to control the number of requests a system processes within a specific time frame. It helps prevent abuse, ensures fair usage, protects against DDoS attacks, and maintains system stability.
This blog explores the most commonly used rate-limiting algorithms, their advantages and disadvantages, and how to implement them in Java.
1️⃣ Token Bucket Algorithm
How It Works
- A bucket holds a fixed number of tokens.
- Tokens are added to the bucket at a constant rate.
- Each request consumes a token.
- If the bucket is empty, excess requests are denied until new tokens are added.
Pros & Cons
✅ Allows short bursts of traffic while controlling overall request rate.
✅ Efficient for distributed systems.
❌ If the bucket drains quickly, requests may be blocked until tokens are refilled.
Java Implementation
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class TokenBucketRateLimiter {
    private final Semaphore tokens;
    private final int capacity;

    public TokenBucketRateLimiter(int capacity, int refillRatePerSecond) {
        this.capacity = capacity;
        this.tokens = new Semaphore(capacity);
        // Refill tokens once per second on a background daemon thread.
        Thread refiller = new Thread(() -> {
            while (true) {
                tokens.release(refillRatePerSecond);
                // Never let the available tokens exceed the bucket's capacity.
                if (tokens.availablePermits() > capacity) {
                    tokens.drainPermits();
                    tokens.release(capacity);
                }
                try {
                    TimeUnit.SECONDS.sleep(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break; // stop refilling if the thread is interrupted
                }
            }
        });
        refiller.setDaemon(true);
        refiller.start();
    }

    public boolean allowRequest() {
        // Consume one token if available; otherwise reject the request.
        return tokens.tryAcquire();
    }
}
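For example (a quick usage sketch; the class name TokenBucketDemo and the parameter values are illustrative, not from the original post): with a capacity of 10 and a refill rate of 5 tokens per second, the first 10 calls succeed immediately and later calls succeed only as tokens are refilled.
public class TokenBucketDemo {
    public static void main(String[] args) {
        // Allow bursts of up to 10 requests, refilled at 5 tokens per second.
        TokenBucketRateLimiter limiter = new TokenBucketRateLimiter(10, 5);
        for (int i = 0; i < 12; i++) {
            System.out.println("Request " + i + " allowed? " + limiter.allowRequest());
        }
    }
}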
2️⃣ Leaky Bucket Algorithm
How It Works
- Requests enter a queue (bucket).
- Requests are processed at a fixed rate (like water leaking from a bucket).
- If the queue overflows, extra requests are discarded.
Pros & Cons
✅ Ensures a steady flow of requests.
✅ Prevents sudden traffic spikes from overloading the system.
❌ Can introduce delays if the queue is full.
Java Implementation
import java.util.LinkedList;
import java.util.Queue;

class LeakyBucketRateLimiter {
    private final Queue<Long> queue;       // pending requests (the bucket)
    private final int capacity;
    private final long leakIntervalMillis; // time for one request to leak out
    private long lastLeakTime;

    public LeakyBucketRateLimiter(int capacity, int leakRatePerSecond) {
        this.capacity = capacity;
        this.leakIntervalMillis = 1000L / leakRatePerSecond;
        this.queue = new LinkedList<>();
        this.lastLeakTime = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Leak at the fixed rate: remove one queued request per elapsed leak interval.
        long leaks = (now - lastLeakTime) / leakIntervalMillis;
        for (long i = 0; i < leaks && !queue.isEmpty(); i++) {
            queue.poll();
        }
        if (leaks > 0) {
            lastLeakTime += leaks * leakIntervalMillis;
        }
        if (queue.size() < capacity) {
            queue.add(now); // accept the request into the bucket
            return true;
        }
        return false; // bucket is full, reject
    }
}
3️⃣ Fixed Window Counter Algorithm
How It Works
- A counter tracks the number of requests per fixed time window (e.g., per minute).
- If the count exceeds the allowed limit, further requests are rejected until the next window.
Pros & Cons
✅ Easy to implement.
✅ Works well for predictable traffic patterns.
❌ Can allow bursts of up to twice the limit around window boundaries (requests bunched at the end of one window and the start of the next).
Java Implementation
import java.util.concurrent.atomic.AtomicInteger;

class FixedWindowRateLimiter {
    private final int limit;
    private final long windowSizeMillis;
    private long windowStart;
    private final AtomicInteger requestCount;

    public FixedWindowRateLimiter(int limit, int windowSizeSeconds) {
        this.limit = limit;
        this.windowSizeMillis = windowSizeSeconds * 1000L;
        this.windowStart = System.currentTimeMillis();
        this.requestCount = new AtomicInteger(0);
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Start a fresh window once the current one has expired.
        if (now - windowStart >= windowSizeMillis) {
            windowStart = now;
            requestCount.set(0);
        }
        return requestCount.incrementAndGet() <= limit;
    }
}
4️⃣ Sliding Window Counter Algorithm
How It Works
- Uses smaller sub-windows within a fixed window.
- Gives a more accurate picture of the request rate and spreads allowed requests more evenly across window boundaries.
Pros & Cons
✅ More accurate than Fixed Window Counter.
✅ Reduces request bursts at window boundaries.
❌ More complex to implement.
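Java Implementation
The original post does not include code for this algorithm, so the following is a minimal sketch of one common variant: it keeps just two sub-windows (the current and the previous fixed window) and weights the previous window's count by how much of it still overlaps the sliding window. The class and field names are illustrative.
class SlidingWindowCounterRateLimiter {
    private final int limit;
    private final long windowSizeMillis;
    private long currentWindowStart;
    private int currentCount;
    private int previousCount;

    public SlidingWindowCounterRateLimiter(int limit, int windowSizeSeconds) {
        this.limit = limit;
        this.windowSizeMillis = windowSizeSeconds * 1000L;
        this.currentWindowStart = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        long elapsed = now - currentWindowStart;
        if (elapsed >= windowSizeMillis) {
            // Roll over to a new fixed window; if more than one full window has
            // passed, the previous window's count is effectively zero.
            previousCount = (elapsed < 2 * windowSizeMillis) ? currentCount : 0;
            currentCount = 0;
            currentWindowStart = now - (elapsed % windowSizeMillis);
            elapsed = now - currentWindowStart;
        }
        // Estimate the sliding-window count by weighting the previous window's
        // count by the fraction of it that still overlaps the sliding window.
        double previousWeight = 1.0 - (double) elapsed / windowSizeMillis;
        double estimated = previousCount * previousWeight + currentCount + 1;
        if (estimated <= limit) {
            currentCount++;
            return true;
        }
        return false;
    }
}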
5️⃣ Sliding Window Log Algorithm
How It Works
- Stores timestamps of each request.
- Removes timestamps outside the allowed time window.
Pros & Cons
✅ High accuracy in rate limiting.
❌ High memory usage due to storing request timestamps.
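Java Implementation
A minimal sketch of this approach (names are illustrative, not from the original post): each accepted request's timestamp is appended to a deque, and timestamps that have fallen outside the window are dropped before the request count is checked.
import java.util.ArrayDeque;
import java.util.Deque;

class SlidingWindowLogRateLimiter {
    private final int limit;
    private final long windowSizeMillis;
    private final Deque<Long> requestLog = new ArrayDeque<>();

    public SlidingWindowLogRateLimiter(int limit, int windowSizeSeconds) {
        this.limit = limit;
        this.windowSizeMillis = windowSizeSeconds * 1000L;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Drop timestamps that are no longer inside the sliding window.
        while (!requestLog.isEmpty() && requestLog.peekFirst() <= now - windowSizeMillis) {
            requestLog.pollFirst();
        }
        if (requestLog.size() < limit) {
            requestLog.addLast(now); // record this request's timestamp
            return true;
        }
        return false;
    }
}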
6️⃣ Adaptive Rate Limiting
How It Works
- Uses machine learning or heuristics to adjust rate limits dynamically.
- Can consider factors like server load, request patterns, and user behavior.
Pros & Cons
✅ Dynamically adjusts limits for better efficiency.
❌ Complex implementation.
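Java Implementation
The post describes this technique only at a high level, so the sketch below is purely illustrative rather than the author's design: a simple heuristic that shrinks a per-second limit as a caller-supplied load metric (for example, CPU utilization between 0.0 and 1.0) rises. The loadSupplier parameter and the linear scaling rule are assumptions made for the example.
import java.util.function.DoubleSupplier;

class AdaptiveRateLimiter {
    private final int baseLimitPerSecond;
    private final DoubleSupplier loadSupplier; // e.g. current CPU utilization in [0.0, 1.0]
    private long windowStart = System.currentTimeMillis();
    private int requestCount;

    public AdaptiveRateLimiter(int baseLimitPerSecond, DoubleSupplier loadSupplier) {
        this.baseLimitPerSecond = baseLimitPerSecond;
        this.loadSupplier = loadSupplier;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Count requests in a simple one-second fixed window.
        if (now - windowStart >= 1000) {
            windowStart = now;
            requestCount = 0;
        }
        // Shrink the effective limit linearly as load approaches 1.0, but never below 1.
        double load = Math.min(1.0, Math.max(0.0, loadSupplier.getAsDouble()));
        int effectiveLimit = Math.max(1, (int) Math.round(baseLimitPerSecond * (1.0 - load)));
        return ++requestCount <= effectiveLimit;
    }
}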