Rate Limiting Strategies

RateThrottle implements five proven rate limiting algorithms, each suited for different use cases.

Overview

Rate limiting strategies determine how requests are counted and when limits reset. The choice of strategy affects:

  • How bursts are handled

  • How smoothly limits are enforced

  • Memory and computational overhead

  • Behavior at window boundaries

Available Strategies

Strategy

Best For

Allows Bursts

Complexity

Token Bucket

APIs with variable load

Yes (configurable)

Medium

Leaky Bucket

Constant-rate processing

No

Medium

Fixed Window

Simple high-volume APIs

Yes (at boundaries)

Low

Sliding Window Counter

Accurate rate limiting

Minimal

Low

Sliding Window

Smooth enforcement

Partially

High

Token Bucket Strategy

How It Works

Imagine a bucket that holds tokens. Tokens are added at a constant rate, and each request consumes one token. If no tokens are available, the request is blocked.

rule = RateThrottleRule(
    name="api_burst",
    limit=100,        # Refill rate: 100 tokens per window
    window=60,        # Window: 60 seconds (= ~1.67 tokens/sec)
    strategy="token_bucket",
    burst=150         # Bucket capacity: 150 tokens
)

Key Characteristics:

  • Tokens refill at rate: limit / window per second

  • Maximum tokens: burst (defaults to limit)

  • Allows traffic bursts up to burst capacity

  • Smooth long-term rate limiting

Example Behavior

Given: limit=100, window=60, burst=150

Time    | Tokens | Request | Result
--------|--------|---------|--------
0s      | 150    | 50      | ✓ (100 remaining)
1s      | 101.67 | 50      | ✓ (51.67 remaining)
2s      | 53.34  | 60      | ✗ Blocked (need 60, have 53.34)
30s     | 103.34 | 100     | ✓ (3.34 remaining)
60s     | 53.34  | 50      | ✓ (3.34 remaining)

Best For

  • APIs that allow occasional bursts

  • Services with varying load patterns

  • User-facing APIs where responsiveness matters

  • Scenarios where some burst is acceptable

Implementation

from ratethrottle import RateThrottleCore, RateThrottleRule

limiter = RateThrottleCore()

# Standard configuration
rule = RateThrottleRule(
    name="api_standard",
    limit=100,
    window=60,
    strategy="token_bucket",
    burst=120  # 20% burst capacity
)
limiter.add_rule(rule)

# Strict configuration (no burst)
strict_rule = RateThrottleRule(
    name="api_strict",
    limit=100,
    window=60,
    strategy="token_bucket",
    burst=100  # Same as limit = no burst
)

Leaky Bucket Strategy

How It Works

Imagine a bucket with a hole at the bottom that leaks at a constant rate. Requests fill the bucket. If the bucket overflows, requests are rejected.

rule = RateThrottleRule(
    name="api_steady",
    limit=100,        # Leak rate: 100 requests per window
    window=60,        # Window: 60 seconds
    strategy="leaky_bucket"
)

Key Characteristics:

  • Processes requests at constant rate: limit / window per second

  • Queue capacity: limit requests

  • Enforces steady, predictable rate

  • No bursts allowed

Example Behavior

Given: limit=100, window=60 (leak rate = 1.67/sec)

Time    | Queue  | Request | Result
--------|--------|---------|--------
0s      | 0      | 10      | ✓ (queue: 10)
1s      | 8.33   | 10      | ✓ (queue: 18.33)
5s      | 10     | 95      | ✗ Blocked (overflow)
10s     | 0      | 50      | ✓ (queue: 50)
60s     | 0      | 10      | ✓ (queue: 10)

Best For

  • Background processing systems

  • Message queues and workers

  • Rate-sensitive third-party API calls

  • Ensuring predictable backend load

Implementation

# Constant-rate API calls
api_rule = RateThrottleRule(
    name="external_api",
    limit=60,          # 60 requests
    window=60,         # per minute = 1 req/sec
    strategy="leaky_bucket"
)

# Database operations
db_rule = RateThrottleRule(
    name="database",
    limit=1000,        # 1000 operations
    window=60,         # per minute
    strategy="leaky_bucket"
)

Fixed Window Strategy

How It Works

Counts requests in fixed time windows. When a window ends, the counter resets to zero.

rule = RateThrottleRule(
    name="api_simple",
    limit=100,
    window=60,
    strategy="fixed_window"
)

Key Characteristics:

  • Simple counter per time window

  • Resets at window boundaries

  • Fast and memory-efficient

  • Subject to boundary condition issues

Example Behavior

Given: limit=100, window=60

Time      | Window      | Count | Request | Result
----------|-------------|-------|---------|--------
00:00:00  | 00:00-00:59 | 0     | 60      | ✓ (60/100)
00:00:30  | 00:00-00:59 | 60    | 40      | ✓ (100/100)
00:00:45  | 00:00-00:59 | 100   | 10      | ✗ Blocked
00:01:00  | 00:01-01:59 | 0     | 60      | ✓ (60/100) - Reset!
00:01:01  | 00:01-01:59 | 60    | 60      | ✓ (120/100)

Boundary Condition Problem

A client can make up to 2 * limit requests in window seconds:

Window 1: [00:00:00 - 00:00:59]
- At 00:00:59: Make 100 requests ✓

Window 2: [00:01:00 - 00:01:59]
- At 00:01:00: Make 100 requests ✓

Total: 200 requests in 2 seconds!

Best For

  • High-performance scenarios where speed is critical

  • Internal APIs with trusted clients

  • Cases where slight overage is acceptable

  • Simple, straightforward rate limiting

Implementation

# High-volume public API
public_rule = RateThrottleRule(
    name="public_api",
    limit=10000,
    window=60,
    strategy="fixed_window"
)

# Per-user limit
user_rule = RateThrottleRule(
    name="user_api",
    limit=1000,
    window=3600,  # 1 hour
    strategy="fixed_window",
    scope="user"
)

Sliding Window Counter Strategy

How It Works

A hybrid approach that combines the efficiency of Fixed Window with the accuracy of Sliding Window. Uses weighted average of current and previous window counters.

rule = RateThrottleRule(
    name="api_balanced",
    limit=100,
    window=60,
    strategy="sliding_counter"
)

Key Characteristics:

  • Uses only 2 counters (current + previous window)

  • Calculates weighted average based on time elapsed

  • Solves Fixed Window boundary problem

  • Memory: O(1) like Fixed Window

  • Accuracy: ~95-98% of true Sliding Window

Formula:

weighted_count = (previous_count × (1 - window_progress)) + current_count

Where window_progress = time_elapsed_in_current_window / window_size

Example Behavior

Given: limit=100, window=60

Previous window (0-60s): 80 requests
Current window (60-120s): 40 requests
Current time: 90s (30 seconds into window = 50% progress)

Weighted count = (80 × (1 - 0.5)) + 40
               = (80 × 0.5) + 40
               = 40 + 40
               = 80 requests

Remaining: 100 - 80 = 20 requests allowed

Time progression example:

Time   | Progress | Previous × Weight | Current | Weighted Total
-------|----------|-------------------|---------|---------------
60s    | 0%       | 80 × 1.0 = 80    | 0       | 80 (block new)
75s    | 25%      | 80 × 0.75 = 60   | 5       | 65
90s    | 50%      | 80 × 0.5 = 40    | 15      | 55
105s   | 75%      | 80 × 0.25 = 20   | 30      | 50
119s   | 98%      | 80 × 0.02 = 1.6  | 50      | 51.6

Solves Boundary Problem

Unlike Fixed Window, Sliding Window Counter prevents burst abuse at boundaries:

Fixed Window Problem:
├─ Window 1 (0-60s): 100 requests at t=59s ✓
├─ Window 2 (60-120s): 100 requests at t=61s ✓
└─ Total: 200 requests in 2 seconds! ✗

Sliding Window Counter Solution:
├─ Window 1 (0-60s): 100 requests at t=59s ✓
├─ At t=61s (1.67% into window 2):
│   └─ Weighted: (100 × 0.983) + 0 = 98.3 ≈ 99
│   └─ Limit: 100
│   └─ Can only make ~1 request ✓
└─ Prevents burst! ✓

Best For

  • Most production APIs - Best all-around choice

  • APIs needing better accuracy than Fixed Window

  • When memory is limited (can’t store all timestamps)

  • High-traffic applications where performance matters

  • Recommended default strategy

Advantages over alternatives:

  • More accurate than Fixed Window (no boundary bursts)

  • Much faster than Sliding Window (O(1) vs O(n))

  • Less memory than Sliding Window (2 counters vs n timestamps)

  • Industry standard (used by Cloudflare, Kong, AWS)

Implementation

from ratethrottle import RateThrottleCore, RateThrottleRule

limiter = RateThrottleCore()

# Standard configuration (recommended)
rule = RateThrottleRule(
    name="api_default",
    limit=100,
    window=60,
    strategy="sliding_counter"
)
limiter.add_rule(rule)

# High-traffic API
high_traffic = RateThrottleRule(
    name="public_api",
    limit=10000,
    window=60,
    strategy="sliding_counter"
)

# Per-user limit
user_rule = RateThrottleRule(
    name="user_api",
    limit=1000,
    window=3600,  # 1 hour
    strategy="sliding_counter",
    scope="user"
)

When to Use vs Other Strategies

Choose Sliding Window Counter when:
├─ You need accuracy better than Fixed Window
├─ You can't afford Sliding Window's memory cost
├─ You want industry-proven algorithm
└─ You want the best general-purpose strategy

Choose Fixed Window instead when:
├─ You need absolute maximum performance
└─ Boundary bursts are acceptable

Choose Sliding Window instead when:
├─ You need perfect accuracy (SLA/legal requirements)
├─ Memory cost is not a concern
└─ You can tolerate slower performance

Sliding Window Strategy

How It Works

Uses a sliding time window that moves with each request. Provides smooth rate limiting without boundary issues.

rule = RateThrottleRule(
    name="api_smooth",
    limit=100,
    window=60,
    strategy="sliding_window"
)

Key Characteristics:

  • Tracks timestamps of individual requests

  • Window slides with current time

  • No boundary condition issues

  • Higher memory usage

Example Behavior

Given: limit=100, window=60

Current Time: 00:01:00
Looking back 60 seconds: 00:00:00 - 00:01:00

Requests in window:
- 00:00:05: 20 requests
- 00:00:30: 40 requests
- 00:00:55: 30 requests
Total: 90 requests

New request at 00:01:00: ✓ Allowed (90/100)

Current Time: 00:01:30
Looking back 60 seconds: 00:00:30 - 00:01:30

Requests in window:
- 00:00:30: 40 requests (still in window)
- 00:00:55: 30 requests
- 00:01:00: 1 request
Total: 71 requests

Note: Requests at 00:00:05 are now outside window

Best For

  • Premium APIs requiring fair enforcement

  • Financial applications

  • Rate limits with legal/SLA requirements

  • Scenarios where accuracy is critical

Implementation

# Precise rate limiting
premium_rule = RateThrottleRule(
    name="premium_api",
    limit=5000,
    window=3600,
    strategy="sliding_window",
    scope="user"
)

# Payment endpoints
payment_rule = RateThrottleRule(
    name="payment",
    limit=10,
    window=60,
    strategy="sliding_window"
)

Strategy Comparison

Performance

Strategy

Speed

Memory Usage

Accuracy

Fixed Window

★★★★★

★★★★★

★★★☆☆

Sliding Window Counter

★★★★★

★★★★★

★★★★☆

Token Bucket

★★★★☆

★★★★☆

★★★★☆

Leaky Bucket

★★★★☆

★★★★☆

★★★★☆

Sliding Window

★★★☆☆

★★★☆☆

★★★★★

Burst Handling

# Token Bucket: Full burst support
limiter.check_rate_limit("client", "token_bucket_rule")
# Can burst up to 'burst' parameter

# Leaky Bucket: No bursts
limiter.check_rate_limit("client", "leaky_bucket_rule")
# Strictly enforces constant rate

# Fixed Window: Bursts at boundaries
limiter.check_rate_limit("client", "fixed_window_rule")
# Can burst at window edges (up to 2x limit)

# Sliding Window Counter: Minimal bursts
limiter.check_rate_limit("client", "sliding_counter_rule")
# Prevents boundary bursts, allows small variance

# Sliding Window: Limited bursts
limiter.check_rate_limit("client", "sliding_window_rule")
# Smooth enforcement prevents large bursts

Choosing a Strategy

Decision Tree

Need burst support?
├─ Yes → Do you need precise control?
│        ├─ Yes → Token Bucket
│        └─ No  → Fixed Window (simpler, faster)
│
└─ No  → Need perfect accuracy?
         ├─ Yes → Sliding Window (most accurate)
         └─ No  → Need good balance?
                  ├─ Yes → Sliding Window Counter (recommended)
                  └─ No  → Leaky Bucket (constant rate)

Use Case Matrix

Use Case

Recommended Strategy

Public REST API

Sliding Window Counter (best balance)

Internal microservices

Fixed Window (fast, simple)

Third-party API client

Leaky Bucket (respects their limits)

Payment processing

Sliding Window (perfect accuracy)

Search/autocomplete

Token Bucket (responsive bursts)

File uploads

Leaky Bucket (constant rate)

GraphQL API

Sliding Window Counter (accurate, fast)

WebSocket connections

Token Bucket (bursty traffic)

High-volume API

Sliding Window Counter (efficient, accurate)

SaaS applications

Sliding Window Counter (fair, performant)

Mobile app backend

Sliding Window Counter (handles bursts better)

Combining Strategies

You can use multiple strategies for different endpoints:

from ratethrottle import RateThrottleCore, RateThrottleRule

limiter = RateThrottleCore()

# Public endpoints: Best balance of accuracy and performance
public_rule = RateThrottleRule(
    name="public",
    limit=100,
    window=60,
    strategy="sliding_counter"
)

# Authenticated: Smooth limiting
auth_rule = RateThrottleRule(
    name="authenticated",
    limit=1000,
    window=60,
    strategy="sliding_counter"
)

# Expensive operations: Constant rate
expensive_rule = RateThrottleRule(
    name="expensive",
    limit=10,
    window=60,
    strategy="leaky_bucket"
)

# Background jobs: Simple counting
background_rule = RateThrottleRule(
    name="background",
    limit=10000,
    window=3600,
    strategy="fixed_window"
)

# Burst-friendly endpoints: Allow traffic spikes
burst_rule = RateThrottleRule(
    name="search",
    limit=100,
    window=60,
    strategy="token_bucket",
    burst=150
)

limiter.add_rule(public_rule)
limiter.add_rule(auth_rule)
limiter.add_rule(expensive_rule)
limiter.add_rule(background_rule)
limiter.add_rule(burst_rule)

Advanced Configuration

Token Bucket Fine-Tuning

# Generous burst for good UX
rule = RateThrottleRule(
    name="user_facing",
    limit=100,
    window=60,
    strategy="token_bucket",
    burst=200  # 2x burst capacity
)

# Strict: minimal burst
rule = RateThrottleRule(
    name="api_strict",
    limit=100,
    window=60,
    strategy="token_bucket",
    burst=105  # Only 5% burst
)

# No burst: same as limit
rule = RateThrottleRule(
    name="no_burst",
    limit=100,
    window=60,
    strategy="token_bucket",
    burst=100
)

Testing Strategies

You can test different strategies to find what works best:

import time
from ratethrottle import RateThrottleCore, RateThrottleRule

def test_strategy(strategy_name):
    limiter = RateThrottleCore()
    rule = RateThrottleRule(
        name="test",
        limit=10,
        window=60,
        strategy=strategy_name,
        burst=15 if strategy_name == "token_bucket" else None
    )
    limiter.add_rule(rule)

    # Make burst of requests
    allowed = 0
    for i in range(20):
        status = limiter.check_rate_limit("test_client", "test")
        if status.allowed:
            allowed += 1

    print(f"{strategy_name}: {allowed}/20 requests allowed in burst")

    # Wait and try again
    time.sleep(10)
    status = limiter.check_rate_limit("test_client", "test")
    print(f"  After 10s: {'Allowed' if status.allowed else 'Blocked'}")

# Test all strategies
strategies = [
    "fixed_window",
    "sliding_counter",
    "token_bucket",
    "leaky_bucket",
    "sliding_window"
]

for strategy in strategies:
    test_strategy(strategy)

Next Steps