Rate Limiting Algorithms - System Design

Last Updated : 4 May, 2026

Rate Limiting Algorithms are mechanisms designed to control the rate at which requests are processed or served by a system. These algorithms are crucial in various domains such as web services, APIs, network traffic management, and distributed systems to ensure stability, fairness, and protection against abuse.

**Example:** An API allows only 100 requests per minute per user; if the limit is exceeded, further requests are temporarily blocked.
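The per-user limit in this example can be sketched as a small counter keyed by user. This is only an illustration: the class name `PerUserLimiter`, its parameters, and the explicit `now` timestamp argument (used instead of a real clock so the behaviour is easy to follow) are assumptions, not part of any particular API.

```python
class PerUserLimiter:
    """Hypothetical per-user counter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.state = {}  # user_id -> (window_start, request_count)

    def allow(self, user_id, now):
        start, count = self.state.get(user_id, (now, 0))
        # Start a fresh window for this user once the old one has expired
        if now - start >= self.window:
            start, count = now, 0
        allowed = count < self.limit
        if allowed:
            count += 1
        self.state[user_id] = (start, count)
        return allowed


limiter = PerUserLimiter(limit=100, window=60.0)
results = [limiter.allow("alice", now=0.0) for _ in range(101)]
print(results.count(True), results[-1])  # 100 False
print(limiter.allow("bob", now=0.0))     # True: each user has a separate budget
print(limiter.allow("alice", now=60.0))  # True: allowed again after one minute
```

Each user gets an independent counter, so one heavy user cannot exhaust the quota of another.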

Real-World Application

Some real-world examples where rate limiting can be used:

1. Token Bucket Algorithm

The token bucket algorithm adds tokens to a bucket at a steady rate, up to a fixed capacity, and each request must consume a token to be processed. If a token is available, the request is allowed; otherwise, it is denied. Because unused tokens accumulate, the algorithm tolerates short bursts while still enforcing a defined average rate.

**Example:** A video streaming service where data is sent in bursts when enough tokens are available.

Benefits

Highlights the advantages of the token bucket algorithm in handling traffic efficiently.

Challenges

Describes the limitations and considerations when using the token bucket algorithm.

Working

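Tokens are added to the bucket at a fixed rate until it reaches capacity. Each incoming request removes one token; when the bucket is empty, the request is denied until new tokens arrive.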

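**Implementation**

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate                     # tokens added per second
        self.capacity = capacity             # maximum tokens the bucket can hold
        self.tokens = capacity               # start with a full bucket
        self.last_refill = time.monotonic()  # monotonic clock ignores wall-clock changes

    def allow_request(self):
        # Refill in proportion to the elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens += (now - self.last_refill) * self.rate
        self.tokens = min(self.tokens, self.capacity)
        self.last_refill = now

        # Each request consumes one token
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```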
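To see how a token bucket admits a burst and then settles to its refill rate, here is a minimal sketch driven by caller-supplied timestamps. The class name `SimulatedTokenBucket` and the `now` and `start` parameters are assumptions introduced for illustration; a real limiter would normally read a clock itself.

```python
class SimulatedTokenBucket:
    """Token bucket that takes the current time as an argument (illustrative only)."""

    def __init__(self, rate, capacity, start=0.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.last_refill = start

    def allow_request(self, now):
        # Refill for the time elapsed since the previous call, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = SimulatedTokenBucket(rate=2, capacity=5)

# Ten requests arrive at t=0: only the first five (the capacity) are allowed.
burst = [bucket.allow_request(now=0.0) for _ in range(10)]
print(burst.count(True))  # 5

# One second later, 2 tokens (rate * 1 second) have been refilled.
later = [bucket.allow_request(now=1.0) for _ in range(10)]
print(later.count(True))  # 2
```

The capacity bounds the size of a burst, while the rate bounds the long-run average.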

2. Leaky Bucket Algorithm

The leaky bucket approach controls request flow by processing data at a constant rate while storing incoming requests in a fixed-size bucket. If the bucket becomes full, additional requests are rejected. It ensures a steady and predictable output rate.

**Example:** API rate limiting where requests are handled at a steady rate.

Benefits

Shows how this approach helps manage traffic in a controlled and efficient way.

Challenges

Outlines the trade-offs and potential limitations in real-world usage.

Working

Incoming requests are added to the bucket while there is room. The bucket drains at a constant leak rate, and requests that arrive when it is full are rejected.

**Implementation**

```python
import time


class LeakyBucket:
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity              # Maximum capacity of the bucket
        self.leak_rate = leak_rate            # Rate at which the bucket leaks (units per second)
        self.bucket_size = 0                  # Current size of the bucket
        self.last_updated = time.monotonic()  # Last time the bucket was updated

    def add_data(self, data_size):
        # Calculate time elapsed since last update
        current_time = time.monotonic()
        time_elapsed = current_time - self.last_updated
        self.last_updated = current_time

        # Leak the bucket (remove data according to the leak rate), never below empty
        self.bucket_size = max(0, self.bucket_size - self.leak_rate * time_elapsed)

        # Accept the new data only if it fits in the remaining space
        if self.bucket_size + data_size <= self.capacity:
            self.bucket_size += data_size
            return True
        return False


# Example usage:
bucket = LeakyBucket(capacity=10, leak_rate=1)  # Capacity of 10 units, leaks 1 unit per second
data_to_send = 5  # Example data size to send
if bucket.add_data(data_to_send):
    print(f"Data of size {data_to_send} sent successfully.")
else:
    print(f"Bucket overflow. Unable to send data of size {data_to_send}.")
```
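The leaky bucket is also often described as a queue: requests wait in a fixed-size buffer and are released at a constant rate. The following is a minimal sketch of that queue-based variant under stated assumptions. The class name `LeakyBucketQueue`, the `submit` method, the `processed` list, and the explicit `now` timestamps are all hypothetical and chosen for illustration.

```python
from collections import deque


class LeakyBucketQueue:
    """Queue-based leaky bucket driven by caller-supplied timestamps (illustrative)."""

    def __init__(self, capacity, leak_rate, start=0.0):
        self.capacity = capacity    # maximum number of queued requests
        self.leak_rate = leak_rate  # requests released per second
        self.queue = deque()
        self.processed = []         # requests that have leaked out, in order
        self.last_leak = start

    def _leak(self, now):
        # Release as many whole requests as the elapsed time allows
        drained = int((now - self.last_leak) * self.leak_rate)
        if drained > 0:
            for _ in range(min(drained, len(self.queue))):
                self.processed.append(self.queue.popleft())
            self.last_leak += drained / self.leak_rate
        if not self.queue:
            # An idle bucket does not bank leak credit for later
            self.last_leak = now

    def submit(self, request, now):
        self._leak(now)
        if len(self.queue) >= self.capacity:
            return False  # bucket is full: reject the request
        self.queue.append(request)
        return True


bucket = LeakyBucketQueue(capacity=3, leak_rate=1)
accepted = [bucket.submit(f"req-{i}", now=0.0) for i in range(5)]
print(accepted)            # [True, True, True, False, False]

bucket.submit("req-5", now=2.0)
print(bucket.processed)    # ['req-0', 'req-1']
print(list(bucket.queue))  # ['req-2', 'req-5']
```

Unlike a token bucket, the output here never exceeds the leak rate, regardless of how bursty the input is.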

3. Fixed Window Algorithm

The fixed window algorithm counts the requests made in a fixed time window and resets the counter when the window expires. If the limit is exceeded, further requests are blocked until the window resets. It is simple, but in traditional implementations with globally aligned windows the counter reset allows bursts near window boundaries: a client can use its full quota at the end of one window and again at the start of the next, sending up to twice the limit in a short span.

**Example:** A login system allows 5 attempts per minute; if the limit is exceeded, further attempts are blocked until the next minute starts.

Benefits

Highlights why this approach is useful for basic rate limiting scenarios.

Challenges

Explains the limitations when dealing with dynamic traffic patterns.

Working

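A counter is incremented for each request in the current window. Requests are allowed while the counter is below the limit, and the counter returns to zero when a new window begins.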

**Implementation**

```python
import time


class FixedWindow:
    def __init__(self, window_size, max_requests):
        self.window_size = window_size         # window length in seconds
        self.max_requests = max_requests       # requests allowed per window
        self.requests = 0                      # requests seen in the current window
        self.window_start = time.monotonic()

    def allow_request(self):
        # Start a new window (and reset the counter) once the current one expires
        now = time.monotonic()
        if now - self.window_start >= self.window_size:
            self.requests = 0
            self.window_start = now

        if self.requests < self.max_requests:
            self.requests += 1
            return True
        return False
```
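The boundary burst that fixed windows permit can be reproduced with a small sketch. It assumes globally aligned windows (window index computed as `now // window_size`) and takes timestamps as arguments. The class name `AlignedFixedWindow` and the `now` parameter are hypothetical.

```python
class AlignedFixedWindow:
    """Fixed window aligned to multiples of window_size (illustrative only)."""

    def __init__(self, window_size, max_requests):
        self.window_size = window_size
        self.max_requests = max_requests
        self.current_window = None
        self.count = 0

    def allow_request(self, now):
        # Requests in the same aligned interval share one counter
        window = int(now // self.window_size)
        if window != self.current_window:
            self.current_window = window
            self.count = 0
        if self.count < self.max_requests:
            self.count += 1
            return True
        return False


limiter = AlignedFixedWindow(window_size=60, max_requests=5)

# Five attempts just before the minute boundary, five just after it.
late = [limiter.allow_request(now=59.0) for _ in range(5)]
early = [limiter.allow_request(now=60.0) for _ in range(5)]
print(sum(late) + sum(early))  # 10
```

All ten requests are accepted within about one second, even though the nominal limit is 5 per minute.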

4. Sliding Window Algorithm

The sliding window algorithm limits the number of requests within a continuously moving time frame rather than a fixed interval. It combines the advantages of the fixed window and leaky bucket approaches, providing smoother and more accurate rate control, and it removes the boundary bursts that fixed windows allow.

**Example:** A messaging system allows 20 messages in any rolling 1-minute window, instead of resetting the count every fixed minute.

Benefits

Explains why this method is preferred for handling dynamic traffic scenarios.

Challenges

Highlights the added complexity and resource requirements.

Working

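The timestamp of each accepted request is stored. On every new request, timestamps older than the window are discarded, and the request is allowed only if fewer than the maximum number remain.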

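**Implementation**

```python
import time
from collections import deque


class SlidingWindow:
    def __init__(self, window_size, max_requests):
        self.window_size = window_size    # window length in seconds
        self.max_requests = max_requests  # requests allowed per rolling window
        self.requests = deque()           # timestamps of accepted requests, oldest first

    def allow_request(self):
        # Drop timestamps that have fallen out of the rolling window
        now = time.monotonic()
        while self.requests and self.requests[0] <= now - self.window_size:
            self.requests.popleft()

        if len(self.requests) < self.max_requests:
            self.requests.append(now)
            return True
        return False
```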
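Storing one timestamp per request can use a lot of memory at high request rates. A commonly used alternative, sketched here as an assumption rather than as part of the approach above, is a sliding window counter. It keeps only two counters and estimates the rolling total by weighting the previous window's count by how much of it still overlaps the rolling window. The class name `SlidingWindowCounter` and the explicit `now` argument are hypothetical.

```python
class SlidingWindowCounter:
    """Approximate sliding window using two fixed-window counters (illustrative)."""

    def __init__(self, window_size, max_requests):
        self.window_size = window_size
        self.max_requests = max_requests
        self.current_window = 0
        self.current_count = 0
        self.previous_count = 0

    def allow_request(self, now):
        window = int(now // self.window_size)
        if window != self.current_window:
            # Keep the old count only if the new window directly follows it
            adjacent = window == self.current_window + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_window = window
            self.current_count = 0

        # Share of the previous window still inside the rolling window
        elapsed_fraction = (now % self.window_size) / self.window_size
        estimated = self.previous_count * (1 - elapsed_fraction) + self.current_count
        if estimated < self.max_requests:
            self.current_count += 1
            return True
        return False


limiter = SlidingWindowCounter(window_size=60, max_requests=20)
first = [limiter.allow_request(now=59.0) for _ in range(20)]
second = [limiter.allow_request(now=60.0) for _ in range(20)]
print(sum(first), sum(second))          # 20 0
print(limiter.allow_request(now=90.0))  # True
```

The estimate assumes requests in the previous window were evenly spread, so it is an approximation that trades exactness for constant memory.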

Selecting the Best Rate Limiting Strategy

Choosing the right rate limiting algorithm depends on several factors:

Traffic Pattern

Helps understand how traffic behaves in the system.

Implementation Complexity

Defines how easy or difficult the algorithm is to implement.

Performance Requirements

Ensures the system meets performance and latency needs.

Scalability

Focuses on handling increasing traffic and users.

Flexibility

Allows adaptation to changing traffic conditions.

Handling Bursts and Spikes

Handling bursts and spikes efficiently is crucial for maintaining system stability: