Rate Limiting Strategies
RateThrottle implements five proven rate limiting algorithms, each suited for different use cases.
Overview
Rate limiting strategies determine how requests are counted and when limits reset. The choice of strategy affects:
How bursts are handled
How smoothly limits are enforced
Memory and computational overhead
Behavior at window boundaries
Available Strategies
Strategy |
Best For |
Allows Bursts |
Complexity |
|---|---|---|---|
Token Bucket |
APIs with variable load |
Yes (configurable) |
Medium |
Leaky Bucket |
Constant-rate processing |
No |
Medium |
Fixed Window |
Simple high-volume APIs |
Yes (at boundaries) |
Low |
Sliding Window Counter |
Accurate rate limiting |
Minimal |
Low |
Sliding Window |
Smooth enforcement |
Partially |
High |
Token Bucket Strategy
How It Works
Imagine a bucket that holds tokens. Tokens are added at a constant rate, and each request consumes one token. If no tokens are available, the request is blocked.
rule = RateThrottleRule(
name="api_burst",
limit=100, # Refill rate: 100 tokens per window
window=60, # Window: 60 seconds (= ~1.67 tokens/sec)
strategy="token_bucket",
burst=150 # Bucket capacity: 150 tokens
)
Key Characteristics:
Tokens refill at rate:
limit / windowper secondMaximum tokens:
burst(defaults tolimit)Allows traffic bursts up to
burstcapacitySmooth long-term rate limiting
Example Behavior
Given: limit=100, window=60, burst=150
Time | Tokens | Request | Result
--------|--------|---------|--------
0s | 150 | 50 | ✓ (100 remaining)
1s | 101.67 | 50 | ✓ (51.67 remaining)
2s | 53.34 | 60 | ✗ Blocked (need 60, have 53.34)
30s | 103.34 | 100 | ✓ (3.34 remaining)
60s | 53.34 | 50 | ✓ (3.34 remaining)
Best For
APIs that allow occasional bursts
Services with varying load patterns
User-facing APIs where responsiveness matters
Scenarios where some burst is acceptable
Implementation
from ratethrottle import RateThrottleCore, RateThrottleRule
limiter = RateThrottleCore()
# Standard configuration
rule = RateThrottleRule(
name="api_standard",
limit=100,
window=60,
strategy="token_bucket",
burst=120 # 20% burst capacity
)
limiter.add_rule(rule)
# Strict configuration (no burst)
strict_rule = RateThrottleRule(
name="api_strict",
limit=100,
window=60,
strategy="token_bucket",
burst=100 # Same as limit = no burst
)
Leaky Bucket Strategy
How It Works
Imagine a bucket with a hole at the bottom that leaks at a constant rate. Requests fill the bucket. If the bucket overflows, requests are rejected.
rule = RateThrottleRule(
name="api_steady",
limit=100, # Leak rate: 100 requests per window
window=60, # Window: 60 seconds
strategy="leaky_bucket"
)
Key Characteristics:
Processes requests at constant rate:
limit / windowper secondQueue capacity:
limitrequestsEnforces steady, predictable rate
No bursts allowed
Example Behavior
Given: limit=100, window=60 (leak rate = 1.67/sec)
Time | Queue | Request | Result
--------|--------|---------|--------
0s | 0 | 10 | ✓ (queue: 10)
1s | 8.33 | 10 | ✓ (queue: 18.33)
5s | 10 | 95 | ✗ Blocked (overflow)
10s | 0 | 50 | ✓ (queue: 50)
60s | 0 | 10 | ✓ (queue: 10)
Best For
Background processing systems
Message queues and workers
Rate-sensitive third-party API calls
Ensuring predictable backend load
Implementation
# Constant-rate API calls
api_rule = RateThrottleRule(
name="external_api",
limit=60, # 60 requests
window=60, # per minute = 1 req/sec
strategy="leaky_bucket"
)
# Database operations
db_rule = RateThrottleRule(
name="database",
limit=1000, # 1000 operations
window=60, # per minute
strategy="leaky_bucket"
)
Fixed Window Strategy
How It Works
Counts requests in fixed time windows. When a window ends, the counter resets to zero.
rule = RateThrottleRule(
name="api_simple",
limit=100,
window=60,
strategy="fixed_window"
)
Key Characteristics:
Simple counter per time window
Resets at window boundaries
Fast and memory-efficient
Subject to boundary condition issues
Example Behavior
Given: limit=100, window=60
Time | Window | Count | Request | Result
----------|-------------|-------|---------|--------
00:00:00 | 00:00-00:59 | 0 | 60 | ✓ (60/100)
00:00:30 | 00:00-00:59 | 60 | 40 | ✓ (100/100)
00:00:45 | 00:00-00:59 | 100 | 10 | ✗ Blocked
00:01:00 | 00:01-01:59 | 0 | 60 | ✓ (60/100) - Reset!
00:01:01 | 00:01-01:59 | 60 | 60 | ✓ (120/100)
Boundary Condition Problem
A client can make up to 2 * limit requests in window seconds:
Window 1: [00:00:00 - 00:00:59]
- At 00:00:59: Make 100 requests ✓
Window 2: [00:01:00 - 00:01:59]
- At 00:01:00: Make 100 requests ✓
Total: 200 requests in 2 seconds!
Best For
High-performance scenarios where speed is critical
Internal APIs with trusted clients
Cases where slight overage is acceptable
Simple, straightforward rate limiting
Implementation
# High-volume public API
public_rule = RateThrottleRule(
name="public_api",
limit=10000,
window=60,
strategy="fixed_window"
)
# Per-user limit
user_rule = RateThrottleRule(
name="user_api",
limit=1000,
window=3600, # 1 hour
strategy="fixed_window",
scope="user"
)
Sliding Window Counter Strategy
How It Works
A hybrid approach that combines the efficiency of Fixed Window with the accuracy of Sliding Window. Uses weighted average of current and previous window counters.
rule = RateThrottleRule(
name="api_balanced",
limit=100,
window=60,
strategy="sliding_counter"
)
Key Characteristics:
Uses only 2 counters (current + previous window)
Calculates weighted average based on time elapsed
Solves Fixed Window boundary problem
Memory: O(1) like Fixed Window
Accuracy: ~95-98% of true Sliding Window
Formula:
weighted_count = (previous_count × (1 - window_progress)) + current_count
Where window_progress = time_elapsed_in_current_window / window_size
Example Behavior
Given: limit=100, window=60
Previous window (0-60s): 80 requests
Current window (60-120s): 40 requests
Current time: 90s (30 seconds into window = 50% progress)
Weighted count = (80 × (1 - 0.5)) + 40
= (80 × 0.5) + 40
= 40 + 40
= 80 requests
Remaining: 100 - 80 = 20 requests allowed
Time progression example:
Time | Progress | Previous × Weight | Current | Weighted Total
-------|----------|-------------------|---------|---------------
60s | 0% | 80 × 1.0 = 80 | 0 | 80 (block new)
75s | 25% | 80 × 0.75 = 60 | 5 | 65
90s | 50% | 80 × 0.5 = 40 | 15 | 55
105s | 75% | 80 × 0.25 = 20 | 30 | 50
119s | 98% | 80 × 0.02 = 1.6 | 50 | 51.6
Solves Boundary Problem
Unlike Fixed Window, Sliding Window Counter prevents burst abuse at boundaries:
Fixed Window Problem:
├─ Window 1 (0-60s): 100 requests at t=59s ✓
├─ Window 2 (60-120s): 100 requests at t=61s ✓
└─ Total: 200 requests in 2 seconds! ✗
Sliding Window Counter Solution:
├─ Window 1 (0-60s): 100 requests at t=59s ✓
├─ At t=61s (1.67% into window 2):
│ └─ Weighted: (100 × 0.983) + 0 = 98.3 ≈ 99
│ └─ Limit: 100
│ └─ Can only make ~1 request ✓
└─ Prevents burst! ✓
Best For
Most production APIs - Best all-around choice
APIs needing better accuracy than Fixed Window
When memory is limited (can’t store all timestamps)
High-traffic applications where performance matters
Recommended default strategy
Advantages over alternatives:
More accurate than Fixed Window (no boundary bursts)
Much faster than Sliding Window (O(1) vs O(n))
Less memory than Sliding Window (2 counters vs n timestamps)
Industry standard (used by Cloudflare, Kong, AWS)
Implementation
from ratethrottle import RateThrottleCore, RateThrottleRule
limiter = RateThrottleCore()
# Standard configuration (recommended)
rule = RateThrottleRule(
name="api_default",
limit=100,
window=60,
strategy="sliding_counter"
)
limiter.add_rule(rule)
# High-traffic API
high_traffic = RateThrottleRule(
name="public_api",
limit=10000,
window=60,
strategy="sliding_counter"
)
# Per-user limit
user_rule = RateThrottleRule(
name="user_api",
limit=1000,
window=3600, # 1 hour
strategy="sliding_counter",
scope="user"
)
When to Use vs Other Strategies
Choose Sliding Window Counter when:
├─ You need accuracy better than Fixed Window
├─ You can't afford Sliding Window's memory cost
├─ You want industry-proven algorithm
└─ You want the best general-purpose strategy
Choose Fixed Window instead when:
├─ You need absolute maximum performance
└─ Boundary bursts are acceptable
Choose Sliding Window instead when:
├─ You need perfect accuracy (SLA/legal requirements)
├─ Memory cost is not a concern
└─ You can tolerate slower performance
Sliding Window Strategy
How It Works
Uses a sliding time window that moves with each request. Provides smooth rate limiting without boundary issues.
rule = RateThrottleRule(
name="api_smooth",
limit=100,
window=60,
strategy="sliding_window"
)
Key Characteristics:
Tracks timestamps of individual requests
Window slides with current time
No boundary condition issues
Higher memory usage
Example Behavior
Given: limit=100, window=60
Current Time: 00:01:00
Looking back 60 seconds: 00:00:00 - 00:01:00
Requests in window:
- 00:00:05: 20 requests
- 00:00:30: 40 requests
- 00:00:55: 30 requests
Total: 90 requests
New request at 00:01:00: ✓ Allowed (90/100)
Current Time: 00:01:30
Looking back 60 seconds: 00:00:30 - 00:01:30
Requests in window:
- 00:00:30: 40 requests (still in window)
- 00:00:55: 30 requests
- 00:01:00: 1 request
Total: 71 requests
Note: Requests at 00:00:05 are now outside window
Best For
Premium APIs requiring fair enforcement
Financial applications
Rate limits with legal/SLA requirements
Scenarios where accuracy is critical
Implementation
# Precise rate limiting
premium_rule = RateThrottleRule(
name="premium_api",
limit=5000,
window=3600,
strategy="sliding_window",
scope="user"
)
# Payment endpoints
payment_rule = RateThrottleRule(
name="payment",
limit=10,
window=60,
strategy="sliding_window"
)
Strategy Comparison
Performance
Strategy |
Speed |
Memory Usage |
Accuracy |
|---|---|---|---|
Fixed Window |
★★★★★ |
★★★★★ |
★★★☆☆ |
Sliding Window Counter |
★★★★★ |
★★★★★ |
★★★★☆ |
Token Bucket |
★★★★☆ |
★★★★☆ |
★★★★☆ |
Leaky Bucket |
★★★★☆ |
★★★★☆ |
★★★★☆ |
Sliding Window |
★★★☆☆ |
★★★☆☆ |
★★★★★ |
Burst Handling
# Token Bucket: Full burst support
limiter.check_rate_limit("client", "token_bucket_rule")
# Can burst up to 'burst' parameter
# Leaky Bucket: No bursts
limiter.check_rate_limit("client", "leaky_bucket_rule")
# Strictly enforces constant rate
# Fixed Window: Bursts at boundaries
limiter.check_rate_limit("client", "fixed_window_rule")
# Can burst at window edges (up to 2x limit)
# Sliding Window Counter: Minimal bursts
limiter.check_rate_limit("client", "sliding_counter_rule")
# Prevents boundary bursts, allows small variance
# Sliding Window: Limited bursts
limiter.check_rate_limit("client", "sliding_window_rule")
# Smooth enforcement prevents large bursts
Choosing a Strategy
Decision Tree
Need burst support?
├─ Yes → Do you need precise control?
│ ├─ Yes → Token Bucket
│ └─ No → Fixed Window (simpler, faster)
│
└─ No → Need perfect accuracy?
├─ Yes → Sliding Window (most accurate)
└─ No → Need good balance?
├─ Yes → Sliding Window Counter (recommended)
└─ No → Leaky Bucket (constant rate)
Use Case Matrix
Use Case |
Recommended Strategy |
|---|---|
Public REST API |
Sliding Window Counter (best balance) |
Internal microservices |
Fixed Window (fast, simple) |
Third-party API client |
Leaky Bucket (respects their limits) |
Payment processing |
Sliding Window (perfect accuracy) |
Search/autocomplete |
Token Bucket (responsive bursts) |
File uploads |
Leaky Bucket (constant rate) |
GraphQL API |
Sliding Window Counter (accurate, fast) |
WebSocket connections |
Token Bucket (bursty traffic) |
High-volume API |
Sliding Window Counter (efficient, accurate) |
SaaS applications |
Sliding Window Counter (fair, performant) |
Mobile app backend |
Sliding Window Counter (handles bursts better) |
Combining Strategies
You can use multiple strategies for different endpoints:
from ratethrottle import RateThrottleCore, RateThrottleRule
limiter = RateThrottleCore()
# Public endpoints: Best balance of accuracy and performance
public_rule = RateThrottleRule(
name="public",
limit=100,
window=60,
strategy="sliding_counter"
)
# Authenticated: Smooth limiting
auth_rule = RateThrottleRule(
name="authenticated",
limit=1000,
window=60,
strategy="sliding_counter"
)
# Expensive operations: Constant rate
expensive_rule = RateThrottleRule(
name="expensive",
limit=10,
window=60,
strategy="leaky_bucket"
)
# Background jobs: Simple counting
background_rule = RateThrottleRule(
name="background",
limit=10000,
window=3600,
strategy="fixed_window"
)
# Burst-friendly endpoints: Allow traffic spikes
burst_rule = RateThrottleRule(
name="search",
limit=100,
window=60,
strategy="token_bucket",
burst=150
)
limiter.add_rule(public_rule)
limiter.add_rule(auth_rule)
limiter.add_rule(expensive_rule)
limiter.add_rule(background_rule)
limiter.add_rule(burst_rule)
Advanced Configuration
Token Bucket Fine-Tuning
# Generous burst for good UX
rule = RateThrottleRule(
name="user_facing",
limit=100,
window=60,
strategy="token_bucket",
burst=200 # 2x burst capacity
)
# Strict: minimal burst
rule = RateThrottleRule(
name="api_strict",
limit=100,
window=60,
strategy="token_bucket",
burst=105 # Only 5% burst
)
# No burst: same as limit
rule = RateThrottleRule(
name="no_burst",
limit=100,
window=60,
strategy="token_bucket",
burst=100
)
Testing Strategies
You can test different strategies to find what works best:
import time
from ratethrottle import RateThrottleCore, RateThrottleRule
def test_strategy(strategy_name):
limiter = RateThrottleCore()
rule = RateThrottleRule(
name="test",
limit=10,
window=60,
strategy=strategy_name,
burst=15 if strategy_name == "token_bucket" else None
)
limiter.add_rule(rule)
# Make burst of requests
allowed = 0
for i in range(20):
status = limiter.check_rate_limit("test_client", "test")
if status.allowed:
allowed += 1
print(f"{strategy_name}: {allowed}/20 requests allowed in burst")
# Wait and try again
time.sleep(10)
status = limiter.check_rate_limit("test_client", "test")
print(f" After 10s: {'Allowed' if status.allowed else 'Blocked'}")
# Test all strategies
strategies = [
"fixed_window",
"sliding_counter",
"token_bucket",
"leaky_bucket",
"sliding_window"
]
for strategy in strategies:
test_strategy(strategy)
Next Steps
Configure Storage Backends backends for your strategy
Learn about Configuration options
Explore Analytics and Reporting to monitor strategy performance