NGINX sits at the front door of a lot of services. One of its most useful—yet frequently misunderstood—capabilities is rate limiting: capping request throughput so login forms aren’t brute-forced, upstreams don’t melt during traffic spikes, and “natural” browser bursts don’t turn into backlog and 5xx storms. Configured correctly, rate limiting smooths load without punishing real users.

Heads-up: NGINX enforces rate, burst, and delay with millisecond-resolution accounting (a sliding window), not coarse per-second buckets. For exact semantics, always check the official docs on nginx.org.


The leaky bucket in HTTP terms

NGINX implements the classic leaky bucket algorithm:

  • Incoming water → incoming requests.
  • Bucket / FIFO → the queue that holds excess requests.
  • Leak rate → the maximum throughput the server will accept.
  • Overflow → requests dropped (rejected) when the queue is full.

Practically, this keeps browser “shotgun” fetches (HTML, CSS, JS, images) from hammering the origin, slows down brute-force attempts, and narrows blast radius during marketing peaks or cache-busting mishaps.
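To build intuition, here is a rough Python sketch of the leaky-bucket idea. This is an illustration only, not NGINX's actual implementation (which keeps per-key state in shared memory with millisecond timestamps); the class name and numbers are made up:

```python
from collections import deque

# Toy leaky bucket: illustration only, not NGINX's real implementation.
class LeakyBucket:
    def __init__(self, rate_per_s, queue_size):
        self.interval_ms = 1000.0 / rate_per_s   # leak one request per interval
        self.queue_size = queue_size             # bucket capacity (the burst)
        self.queue = deque()
        self.last_leak_ms = float("-inf")

    def offer(self, now_ms):
        """Classify a request arriving at now_ms: pass, queued, or rejected."""
        # Leak: release queued requests whose release time has come.
        while self.queue and now_ms - self.last_leak_ms >= self.interval_ms:
            self.queue.popleft()
            self.last_leak_ms += self.interval_ms
        if not self.queue and now_ms - self.last_leak_ms >= self.interval_ms:
            self.last_leak_ms = now_ms
            return "pass"        # under the leak rate: forward immediately
        if len(self.queue) < self.queue_size:
            self.queue.append(now_ms)
            return "queued"      # excess water: held in the bucket
        return "rejected"        # overflow: NGINX would answer 503
```

With a 10 r/s leak rate and 20 slots, 22 simultaneous requests split exactly as described below: one passes, twenty queue, one overflows.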


Two directives that matter: limit_req_zone and limit_req

You configure rate limiting with:

  1. limit_req_zone – defines key, shared memory zone, and base rate. Typically placed in the http block so it’s reusable.
  2. limit_req – applies a zone's policy to a server or location.

Minimal example to protect /login/ at 10 requests per second per client IP:

limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

server {
    location /login/ {
        limit_req zone=mylimit;
        proxy_pass http://my_upstream;
    }
}

Key points:

  • Key: $binary_remote_addr groups requests by client IP in binary form (saves memory vs $remote_addr).
  • Zone: mylimit:10m reserves 10 MiB of shared memory; as a rule of thumb ~1 MiB ≈ 16,000 IPs, so 10 MiB stores ~160,000 entries. When space is exhausted, NGINX evicts old entries; if it still can’t insert, it returns 503.
  • Rate: 10r/s means 1 request per ~100 ms with sliding millisecond accounting. Requests that break that cadence are rejected unless you configure a burst.
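As a sanity check on those numbers, a quick back-of-envelope calculation (the 16,000-states-per-MiB figure is the rule of thumb above, not an exact guarantee; the headroom factor is an assumption):

```python
import math

# Rule of thumb from above: ~1 MiB of zone memory holds ~16,000 IP states.
STATES_PER_MIB = 16_000

def zone_capacity(zone_mib):
    """Approximate number of client states a zone of this size can track."""
    return zone_mib * STATES_PER_MIB

def zone_mib_for(peak_unique_ips, headroom=1.5):
    """Suggested zone size in MiB for an expected peak, with 50% headroom."""
    return math.ceil(peak_unique_ips * headroom / STATES_PER_MIB)

print(zone_capacity(10))       # 160000 — matches the ~160,000 entries above
print(zone_mib_for(500_000))   # 47 MiB for half a million peak unique IPs
```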

Bursts you can live with: burst to avoid punishing browsers

Without a burst cushion, two back-to-back requests within ~100 ms make the second fail with 503. Real browsing is naturally bursty, so add a queue:

location /login/ {
    limit_req zone=mylimit burst=20;
    proxy_pass http://my_upstream;
}
  • burst=20 allocates 20 queue slots above the base rate. Early requests queue and NGINX releases them at the configured cadence (≈1 every 100 ms here).
  • If 21 arrive at once, the first goes through immediately, 20 queue, and the 22nd is rejected.

Trade-off: spacing 20 requests at 100 ms means the tail waits ~2 s—often too long to be useful.
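The tail-latency arithmetic is simple enough to check directly (numbers from the example above):

```python
# Worst-case queueing delay for a full burst: the last queued request waits
# burst_slots × interval behind the leak rate.
rate_per_s = 10               # base rate from the zone (10r/s)
burst = 20                    # queue slots from burst=20
interval_s = 1.0 / rate_per_s
tail_wait_s = burst * interval_s
print(f"last queued request waits ~{tail_wait_s:.1f} s")   # ~2.0 s
```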


Bursts without added latency: nodelay

Use nodelay to avoid injecting artificial delay while still enforcing the rate:

location /login/ {
    limit_req zone=mylimit burst=20 nodelay;
    proxy_pass http://my_upstream;
}
  • With nodelay, NGINX sends queued requests immediately as long as there are free burst slots, and marks those slots as occupied for the interval (≈100 ms here).
  • If 21 requests arrive at once, all 21 are forwarded immediately; 20 burst slots are marked and released one every ~100 ms.
  • If another 20 arrive 101 ms later, only 1 slot is free → 1 is forwarded, 19 are rejected.

Net effect: the effective rate is preserved without adding delay to the allowed burst.
Practical default: combine burst + nodelay unless you want intentional spacing.
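One way to model the nodelay accounting is an "excess" counter that decays at the base rate — a simplified sketch of the behavior described above, not NGINX's source code:

```python
# Simplified model of limit_req with nodelay: a per-client excess counter
# decays at the base rate; each forwarded request adds one. Illustration only.
class NoDelayLimiter:
    def __init__(self, rate_per_s, burst):
        self.rate_per_ms = rate_per_s / 1000.0
        self.burst = burst
        self.excess = 0.0
        self.last_ms = None

    def offer(self, now_ms):
        if self.last_ms is not None:
            elapsed = now_ms - self.last_ms
            # Burst slots free up at the configured cadence (~1 per 100 ms here).
            self.excess = max(0.0, self.excess - elapsed * self.rate_per_ms)
        self.last_ms = now_ms
        if self.excess > self.burst:
            return "rejected"       # no free slot: 503
        self.excess += 1            # forwarded immediately, slot marked busy
        return "forwarded"
```

With rate=10r/s and burst=20, 21 simultaneous requests are all forwarded; 20 more arriving 101 ms later get one forwarded and 19 rejected — matching the walkthrough above.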


Two-stage limiting: delay for “free pass, then throttle”

Since NGINX 1.15.7, delay lets you accept the first X excess requests immediately, then throttle up to the burst cap, and finally reject beyond it. Typical page pattern (4–12 resources) with a base rate of 5 r/s:

limit_req_zone $binary_remote_addr zone=ip:10m rate=5r/s;

server {
    location / {
        limit_req zone=ip burst=12 delay=8;
        proxy_pass http://website;
    }
}
  • First 8 above the base rate pass without delay.
  • Next 4 are delayed to honor 5 r/s.
  • Further excess requests are rejected until slots free up.

Remember: enforcement is ms-granular with a sliding window.
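Using the same excess-counter mental model, the three outcomes of burst=12 delay=8 can be sketched as a classification by position in the window (a simplification of the accounting; parameters are those of the example above):

```python
# Classify a request by how many "excess" requests are already ahead of it,
# under burst=12, delay=8. The nth simultaneous request has n-1 ahead of it.
def classify(excess, burst=12, delay=8):
    if excess > burst:
        return "rejected"       # beyond the burst cap
    if excess > delay:
        return "delayed"        # paced out at the base rate (5 r/s)
    return "immediate"          # free pass

results = [classify(n - 1) for n in range(1, 17)]   # 16 requests at once
```

That yields 9 immediate (1 within the base rate plus the delay=8 free passes), 4 delayed, and 3 rejected — matching the bullet list above.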


Allowlists and conditional keys with geo and map

You can exempt trusted subnets (or give them a higher cap) by making the key conditional:

geo $limit {
    default        1;
    10.0.0.0/8     0;
    192.168.0.0/24 0;
}

map $limit $limit_key {
    0 "";
    1 $binary_remote_addr;
}

limit_req_zone $limit_key zone=req_zone:10m rate=5r/s;

server {
    location / {
        limit_req zone=req_zone burst=10 nodelay;
        # ...
    }
}
  • Trusted IPs get $limit=0, so the map sets an empty string as the key.
  • When the zone key is empty, the limit does not apply.
  • Everyone else is keyed by IP and capped at 5 r/s (+10 burst).
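The geo/map logic is easy to mirror in a few lines if you want to sanity-check your subnet list outside NGINX (a hypothetical helper; the subnets are the ones from the snippet):

```python
import ipaddress

# Mirror of the geo/map pair above: trusted subnets get an empty key,
# and an empty key means limit_req_zone does not count the request.
TRUSTED = [ipaddress.ip_network("10.0.0.0/8"),
           ipaddress.ip_network("192.168.0.0/24")]

def limit_key(client_ip):
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in TRUSTED):
        return b""               # exempt: empty key, no rate limiting
    return addr.packed           # stand-in for $binary_remote_addr (4 bytes)

print(limit_key("10.1.2.3"))     # empty -> exempt
print(limit_key("203.0.113.9"))  # packed key -> limited at 5 r/s (+10 burst)
```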

Multiple limit_req per location: the most restrictive wins

You can stack several limits in a single location. All matching limits apply; the longest delay or any reject wins.

Example: give allowlisted IPs a looser cap while keeping a stricter global cap:

http {
    limit_req_zone $limit_key           zone=req_zone:10m     rate=5r/s;
    limit_req_zone $binary_remote_addr  zone=req_zone_wl:10m  rate=15r/s;

    server {
        location / {
            limit_req zone=req_zone     burst=10 nodelay;   # everyone
            limit_req zone=req_zone_wl  burst=20 nodelay;   # allowlist
        }
    }
}
  • Allowlisted IPs don’t match req_zone (empty key) but do match req_zone_wl → 15 r/s applies.
  • Non-allowlisted IPs match both zones → the stricter 5 r/s prevails.
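The combination rule can be stated as a tiny function (a sketch of the described semantics, not NGINX code):

```python
# "Most restrictive wins": any rejection rejects; otherwise the longest
# delay imposed by any matching zone applies. Illustration only.
def combine(outcomes):
    """outcomes: list of ("reject", None), ("delay", seconds), ("pass", 0)."""
    if any(kind == "reject" for kind, _ in outcomes):
        return ("reject", None)
    delays = [value for kind, value in outcomes if kind == "delay"]
    return ("delay", max(delays)) if delays else ("pass", 0)
```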

Logging, response codes, and related knobs

Log entries. Rejections are logged at error level by default; delayed requests one level lower (warn). Example:

YYYY/MM/DD HH:MM:SS [error] … limiting requests, excess: 1.000 by zone "mylimit", client: 192.0.2.10, request: "GET / HTTP/1.1"
  • excess shows how many requests over the configured rate the client is at that moment, tracked with millisecond precision (here, 1.000 request over).
  • Change level with limit_req_log_level:
location /login/ {
    limit_req zone=mylimit burst=20 nodelay;
    limit_req_log_level warn;
}

Response code. Default is 503. Override with limit_req_status:

location /login/ {
    limit_req zone=mylimit burst=20 nodelay;
    limit_req_status 444;  # drop without a formal response
}

Deny a path outright. If you really want it closed:

location /foo.php { deny all; }

Zone sizing and memory behavior

Keep an eye on zone size: ~1 MiB ≈ 16k IPs. High cardinality (many unique clients or distributed scans) will exhaust a tiny zone and cause 503s unrelated to your app.

When inserting a new entry:

  • NGINX evicts up to two entries unused in the last 60 s.
  • If there’s still no space, it returns 503.

Diagnostic hint: if 503s appear only during heavy traffic and cache hit ratio is fine, check zone size before blaming the upstream.


Where to apply limits (and where not)

Good candidates:

  • Sensitive endpoints: /login/, password reset, auth APIs, /wp-login.php, /xmlrpc.php.
  • “Hot” endpoints: search, promo landing pages, DB-heavy operations.
  • Feeds/listings hit by scrapers.

Avoid strict limits on static assets or third-party CDNs—don’t throttle bytes you can cache.


Common footguns

  1. No burst → real users get 503s on normal browser bursts. Add burst and usually nodelay.
  2. Unrealistic rates → e.g., 5 r/s might be fine for login, not for a homepage that triggers ~12 fetches. Tune rate, burst, possibly delay.
  3. Tiny zones → 1–2 MiB won’t last on busy sites; size for expected IP cardinality.
  4. Global hammer → apply per location/server; broad limits punish the wrong traffic.
  5. No logs → adjust blind. Use limit_req_log_level and ship the zone name with your logs.
  6. Code mismatch → defaults return 503; if your business logic expects 429 (Too Many Requests), translate at a higher layer or set limit_req_status 429.

Three-step rollout

1) Measure

  • Count resources per typical page (4–6 common, sometimes up to 12).
  • Identify abused endpoints (login, search, APIs).
  • Estimate peak unique IPs to size the zone.

2) Configure

  • Set limit_req_zone in http with a realistic rate and sufficient memory.
  • Apply limit_req on sensitive paths with burst + nodelay.
  • For multi-asset pages, consider delay to give a short free pass then throttle.

3) Observe & adjust

  • Watch 503s by zone, excess values, and user-visible complaints.
  • Tune rate, burst, or zone size based on real traffic.
  • Add allowlists with geo/map where justified.

Complete example with critical paths and trust differentiation

http {
    # Trust map: exempt internal ranges
    geo $is_trusted {
        default        1;
        10.0.0.0/8     0;
        192.168.0.0/24 0;
    }

    map $is_trusted $rate_key {
        0 "";
        1 $binary_remote_addr;
    }

    # Zones
    limit_req_zone $rate_key           zone=zone_login:10m  rate=5r/s;
    limit_req_zone $binary_remote_addr zone=zone_public:20m rate=10r/s;

    server {
        listen 80;

        # Login: slow brute-force without adding latency to legitimate bursts
        location /login/ {
            limit_req          zone=zone_login  burst=8  nodelay;
            limit_req_log_level warn;
            proxy_pass         http://auth_upstream;
        }

        # Main pages: larger burst, no delay to avoid perceived slowness
        location / {
            limit_req  zone=zone_public burst=20 nodelay;
            proxy_pass http://app_upstream;
        }

        # Keep PHP out of the edge if it shouldn't be there
        location ~* \.php$ { deny all; }
    }
}

This caps login at 5 r/s (+8 burst, no added delay), public routes at 10 r/s (+20 burst), and blocks direct PHP execution at the edge. Trusted subnets skip the login limit via empty key.


FAQ

How do I pick a reasonable requests-per-second cap for login in NGINX?
Start with ~5 r/s and burst 8–12 nodelay, then refine using logs. Consider the number of auxiliary calls your login flow makes (captcha, telemetry, SSO). Increase burst (not base rate) if you see legit clients tripping the cap.

What’s the practical difference between burst and nodelay?
burst allocates queue slots above the base rate. Without nodelay, NGINX spaces queued requests (adds delay). With nodelay, it sends immediately while “consuming” burst slots for the interval—preserving the effective rate without visible latency.

When should I use delay (two-stage limiting)?
When a typical page triggers several parallel fetches (e.g., up to 12 assets). delay lets the first X pass immediately (e.g., 8), then spaces the rest to respect the base rate (e.g., 5 r/s) before rejecting beyond the burst cap.

How do I log and diagnose rate-limit rejections without drowning in logs?
Use limit_req_log_level (e.g., set to warn), and include zone in your log fields. Track 503s per zone, excess values, and correlate with user reports. If rejections cluster only during peaks, check zone size and burst; if they appear at low traffic, your rate is probably too low or a specific client deserves its own rule.

source: blog.nginx.org
