NGINX sits at the front door of a lot of services. One of its most useful—yet frequently misunderstood—capabilities is rate limiting: capping request throughput so login forms aren’t brute-forced, upstreams don’t melt during traffic spikes, and “natural” browser bursts don’t turn into backlog and 5xx storms. Configured correctly, rate limiting smooths load without punishing real users.

Heads-up: NGINX enforces rate, burst, and delay with millisecond-resolution accounting (a sliding window), not coarse per-second buckets. For exact semantics, always check the official docs on nginx.org.


The leaky bucket in HTTP terms

NGINX implements the classic leaky bucket algorithm:

  • Incoming water → incoming requests.
  • Bucket / FIFO → the queue that holds excess requests.
  • Leak rate → the maximum throughput the server will accept.
  • Overflow → requests dropped (rejected) when the queue is full.

Practically, this keeps browser “shotgun” fetches (HTML, CSS, JS, images) from hammering the origin, slows down brute-force attempts, and narrows blast radius during marketing peaks or cache-busting mishaps.
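To build intuition, here is a rough Python sketch of the leaky-bucket idea. This is an illustration only, not NGINX's actual implementation (which keeps per-key state in shared memory with millisecond timestamps); the class name and numbers are made up:

```python
from collections import deque

# Toy leaky bucket: illustration only, not NGINX's real implementation.
class LeakyBucket:
    def __init__(self, rate_per_s, queue_size):
        self.interval_ms = 1000.0 / rate_per_s   # leak one request per interval
        self.queue_size = queue_size             # bucket capacity (the burst)
        self.queue = deque()
        self.last_leak_ms = float("-inf")

    def offer(self, now_ms):
        """Classify a request arriving at now_ms: pass, queued, or rejected."""
        # Leak: release queued requests whose release time has come.
        while self.queue and now_ms - self.last_leak_ms >= self.interval_ms:
            self.queue.popleft()
            self.last_leak_ms += self.interval_ms
        if not self.queue and now_ms - self.last_leak_ms >= self.interval_ms:
            self.last_leak_ms = now_ms
            return "pass"        # under the leak rate: forward immediately
        if len(self.queue) < self.queue_size:
            self.queue.append(now_ms)
            return "queued"      # excess water: held in the bucket
        return "rejected"        # overflow: NGINX would answer 503
```

With a 10 r/s leak rate and 20 slots, 22 simultaneous requests split exactly as described below: one passes, twenty queue, one overflows.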


Two directives that matter: limit_req_zone and limit_req

You configure rate limiting with:

  1. limit_req_zone – defines key, shared memory zone, and base rate. Typically placed in the http block so it’s reusable.
  2. limit_req – applies a zone's policy to a server or location.

Minimal example to protect /login/ at 10 requests per second per client IP:

limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

server {
    location /login/ {
        limit_req zone=mylimit;
        proxy_pass http://my_upstream;
    }
}

Key points:

  • Key: $binary_remote_addr groups requests by client IP in binary form (saves memory vs $remote_addr).
  • Zone: mylimit:10m reserves 10 MiB of shared memory; as a rule of thumb ~1 MiB ≈ 16,000 IPs, so 10 MiB stores ~160,000 entries. When space is exhausted, NGINX evicts old entries; if it still can’t insert, it returns 503.
  • Rate: 10r/s means 1 request per ~100 ms with sliding millisecond accounting. Requests that break that cadence are rejected unless you configure a burst.
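As a sanity check on those numbers, a quick back-of-envelope calculation (the 16,000-states-per-MiB figure is the rule of thumb above, not an exact guarantee; the headroom factor is an assumption):

```python
import math

# Rule of thumb from above: ~1 MiB of zone memory holds ~16,000 IP states.
STATES_PER_MIB = 16_000

def zone_capacity(zone_mib):
    """Approximate number of client states a zone of this size can track."""
    return zone_mib * STATES_PER_MIB

def zone_mib_for(peak_unique_ips, headroom=1.5):
    """Suggested zone size in MiB for an expected peak, with 50% headroom."""
    return math.ceil(peak_unique_ips * headroom / STATES_PER_MIB)

print(zone_capacity(10))       # 160000 — matches the ~160,000 entries above
print(zone_mib_for(500_000))   # 47 MiB for half a million peak unique IPs
```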

Bursts you can live with: burst to avoid punishing browsers

Without a burst cushion, two back-to-back requests within ~100 ms make the second fail with 503. Real browsing is naturally bursty, so add a queue:

location /login/ {
    limit_req zone=mylimit burst=20;
    proxy_pass http://my_upstream;
}
  • burst=20 allocates 20 queue slots above the base rate. Early requests queue and NGINX releases them at the configured cadence (≈1 every 100 ms here).
  • If 21 arrive at once, the first goes through immediately, 20 queue, and the 22nd is rejected.

Trade-off: spacing 20 requests at 100 ms means the tail waits ~2 s—often too long to be useful.
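The tail-latency arithmetic is simple enough to check directly (numbers from the example above):

```python
# Worst-case queueing delay for a full burst: the last queued request waits
# burst_slots × interval behind the leak rate.
rate_per_s = 10               # base rate from the zone (10r/s)
burst = 20                    # queue slots from burst=20
interval_s = 1.0 / rate_per_s
tail_wait_s = burst * interval_s
print(f"last queued request waits ~{tail_wait_s:.1f} s")   # ~2.0 s
```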


Bursts without added latency: nodelay

Use nodelay to avoid injecting artificial delay while still enforcing the rate:

location /login/ {
    limit_req zone=mylimit burst=20 nodelay;
    proxy_pass http://my_upstream;
}
  • With nodelay, NGINX sends queued requests immediately as long as there are free burst slots, and marks those slots as occupied for the interval (≈100 ms here).
  • If 21 requests arrive at once, all 21 are forwarded immediately; 20 burst slots are marked and released one every ~100 ms.
  • If another 20 arrive 101 ms later, only 1 slot is free → 1 is forwarded, 19 are rejected.

Net effect: the effective rate is preserved without adding delay to the allowed burst.
Practical default: combine burst + nodelay unless you want intentional spacing.
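One way to model the nodelay accounting is an "excess" counter that decays at the base rate — a simplified sketch of the behavior described above, not NGINX's source code:

```python
# Simplified model of limit_req with nodelay: a per-client excess counter
# decays at the base rate; each forwarded request adds one. Illustration only.
class NoDelayLimiter:
    def __init__(self, rate_per_s, burst):
        self.rate_per_ms = rate_per_s / 1000.0
        self.burst = burst
        self.excess = 0.0
        self.last_ms = None

    def offer(self, now_ms):
        if self.last_ms is not None:
            elapsed = now_ms - self.last_ms
            # Burst slots free up at the configured cadence (~1 per 100 ms here).
            self.excess = max(0.0, self.excess - elapsed * self.rate_per_ms)
        self.last_ms = now_ms
        if self.excess > self.burst:
            return "rejected"       # no free slot: 503
        self.excess += 1            # forwarded immediately, slot marked busy
        return "forwarded"
```

With rate=10r/s and burst=20, 21 simultaneous requests are all forwarded; 20 more arriving 101 ms later get one forwarded and 19 rejected — matching the walkthrough above.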


Two-stage limiting: delay for “free pass, then throttle”

Since NGINX 1.15.7, delay lets you accept the first X excess requests immediately, then throttle up to the burst cap, and finally reject beyond it. Typical page pattern (4–12 resources) with a base rate of 5 r/s:

limit_req_zone $binary_remote_addr zone=ip:10m rate=5r/s;

server {
    location / {
        limit_req zone=ip burst=12 delay=8;
        proxy_pass http://website;
    }
}
  • First 8 above the base rate pass without delay.
  • Next 4 are delayed to honor 5 r/s.
  • Further excess requests are rejected until slots free up.

Remember: enforcement is ms-granular with a sliding window.
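Using the same excess-counter mental model, the three outcomes of burst=12 delay=8 can be sketched as a classification by position in the window (a simplification of the accounting; parameters are those of the example above):

```python
# Classify a request by how many "excess" requests are already ahead of it,
# under burst=12, delay=8. The nth simultaneous request has n-1 ahead of it.
def classify(excess, burst=12, delay=8):
    if excess > burst:
        return "rejected"       # beyond the burst cap
    if excess > delay:
        return "delayed"        # paced out at the base rate (5 r/s)
    return "immediate"          # free pass

results = [classify(n - 1) for n in range(1, 17)]   # 16 requests at once
```

That yields 9 immediate (1 within the base rate plus the delay=8 free passes), 4 delayed, and 3 rejected — matching the bullet list above.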


Allowlists and conditional keys with geo and map

You can exempt trusted subnets (or give them a higher cap) by making the key conditional:

geo $limit {
    default        1;
    10.0.0.0/8     0;
    192.168.0.0/24 0;
}

map $limit $limit_key {
    0 "";
    1 $binary_remote_addr;
}

limit_req_zone $limit_key zone=req_zone:10m rate=5r/s;

server {
    location / {
        limit_req zone=req_zone burst=10 nodelay;
        # ...
    }
}
  • Trusted IPs get $limit=0, so the map sets an empty string as the key.
  • When the zone key is empty, the limit does not apply.
  • Everyone else is keyed by IP and capped at 5 r/s (+10 burst).
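The geo/map logic is easy to mirror in a few lines if you want to sanity-check your subnet list outside NGINX (a hypothetical helper; the subnets are the ones from the snippet):

```python
import ipaddress

# Mirror of the geo/map pair above: trusted subnets get an empty key,
# and an empty key means limit_req_zone does not count the request.
TRUSTED = [ipaddress.ip_network("10.0.0.0/8"),
           ipaddress.ip_network("192.168.0.0/24")]

def limit_key(client_ip):
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in TRUSTED):
        return b""               # exempt: empty key, no rate limiting
    return addr.packed           # stand-in for $binary_remote_addr (4 bytes)

print(limit_key("10.1.2.3"))     # empty -> exempt
print(limit_key("203.0.113.9"))  # packed key -> limited at 5 r/s (+10 burst)
```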

Multiple limit_req per location: the most restrictive wins

You can stack several limits in a single location. All matching limits apply; the longest delay or any reject wins.

Example: give allowlisted IPs a looser cap while keeping a stricter global cap:

http {
    limit_req_zone $limit_key           zone=req_zone:10m     rate=5r/s;
    limit_req_zone $binary_remote_addr  zone=req_zone_wl:10m  rate=15r/s;

    server {
        location / {
            limit_req zone=req_zone     burst=10 nodelay;   # everyone
            limit_req zone=req_zone_wl  burst=20 nodelay;   # allowlist
        }
    }
}
  • Allowlisted IPs don’t match req_zone (empty key) but do match req_zone_wl → 15 r/s applies.
  • Non-allowlisted IPs match both zones → the stricter 5 r/s prevails.
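The combination rule can be stated as a tiny function (a sketch of the described semantics, not NGINX code):

```python
# "Most restrictive wins": any rejection rejects; otherwise the longest
# delay imposed by any matching zone applies. Illustration only.
def combine(outcomes):
    """outcomes: list of ("reject", None), ("delay", seconds), ("pass", 0)."""
    if any(kind == "reject" for kind, _ in outcomes):
        return ("reject", None)
    delays = [value for kind, value in outcomes if kind == "delay"]
    return ("delay", max(delays)) if delays else ("pass", 0)
```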

Logging, response codes, and related knobs

Log entries. Rejections are logged at error level by default; delayed requests one level lower (warn). Example:

YYYY/MM/DD HH:MM:SS [error] … limiting requests, excess: 1.000 by zone "mylimit", client: 192.0.2.10, request: "GET / HTTP/1.1"
  • excess shows how many requests over the configured rate the client is at that moment, tracked with millisecond precision (here, 1.000 request over).
  • Change level with limit_req_log_level:
location /login/ {
    limit_req zone=mylimit burst=20 nodelay;
    limit_req_log_level warn;
}

Response code. Default is 503. Override with limit_req_status:

location /login/ {
    limit_req zone=mylimit burst=20 nodelay;
    limit_req_status 444;  # drop without a formal response
}

Deny a path outright. If you really want it closed:

location /foo.php { deny all; }

Zone sizing and memory behavior

Keep an eye on zone size: ~1 MiB ≈ 16k IPs. High cardinality (many unique clients or distributed scans) will exhaust a tiny zone and cause 503s unrelated to your app.

When inserting a new entry:

  • NGINX evicts up to two entries unused in the last 60 s.
  • If there’s still no space, it returns 503.

Diagnostic hint: if 503s appear only during heavy traffic and cache hit ratio is fine, check zone size before blaming the upstream.


Where to apply limits (and where not)

Good candidates:

  • Sensitive endpoints: /login/, password reset, auth APIs, /wp-login.php, /xmlrpc.php.
  • “Hot” endpoints: search, promo landing pages, DB-heavy operations.
  • Feeds/listings hit by scrapers.

Avoid strict limits on static assets or third-party CDNs—don’t throttle bytes you can cache.


Common footguns

  1. No burst → real users get 503s on normal browser bursts. Add burst and usually nodelay.
  2. Unrealistic rates → e.g., 5 r/s might be fine for login, not for a homepage that triggers ~12 fetches. Tune rate, burst, possibly delay.
  3. Tiny zones → 1–2 MiB won’t last on busy sites; size for expected IP cardinality.
  4. Global hammer → apply per location/server; broad limits punish the wrong traffic.
  5. No logs → adjust blind. Use limit_req_log_level and ship the zone name with your logs.
  6. Code mismatch → defaults return 503; if your business logic expects 429 (Too Many Requests), translate at a higher layer or set limit_req_status 429.

Three-step rollout

1) Measure

  • Count resources per typical page (4–6 common, sometimes up to 12).
  • Identify abused endpoints (login, search, APIs).
  • Estimate peak unique IPs to size the zone.

2) Configure

  • Set limit_req_zone in http with a realistic rate and sufficient memory.
  • Apply limit_req on sensitive paths with burst + nodelay.
  • For multi-asset pages, consider delay to give a short free pass then throttle.

3) Observe & adjust

  • Watch 503s by zone, excess values, and user-visible complaints.
  • Tune rate, burst, or zone size based on real traffic.
  • Add allowlists with geo/map where justified.

Complete example with critical paths and trust differentiation

http {
    # Trust map: exempt internal ranges
    geo $is_trusted {
        default        1;
        10.0.0.0/8     0;
        192.168.0.0/24 0;
    }

    map $is_trusted $rate_key {
        0 "";
        1 $binary_remote_addr;
    }

    # Zones
    limit_req_zone $rate_key           zone=zone_login:10m  rate=5r/s;
    limit_req_zone $binary_remote_addr zone=zone_public:20m rate=10r/s;

    server {
        listen 80;

        # Login: slow brute-force without adding latency to legitimate bursts
        location /login/ {
            limit_req          zone=zone_login  burst=8  nodelay;
            limit_req_log_level warn;
            proxy_pass         http://auth_upstream;
        }

        # Main pages: larger burst, no delay to avoid perceived slowness
        location / {
            limit_req  zone=zone_public burst=20 nodelay;
            proxy_pass http://app_upstream;
        }

        # Keep PHP out of the edge if it shouldn't be there
        location ~* \.php$ { deny all; }
    }
}

This caps login at 5 r/s (+8 burst, no added delay), public routes at 10 r/s (+20 burst), and blocks direct PHP execution at the edge. Trusted subnets skip the login limit via empty key.


FAQ

How do I pick a reasonable requests-per-second cap for login in NGINX?
Start with ~5 r/s and burst 8–12 nodelay, then refine using logs. Consider the number of auxiliary calls your login flow makes (captcha, telemetry, SSO). Increase burst (not base rate) if you see legit clients tripping the cap.

What’s the practical difference between burst and nodelay?
burst allocates queue slots above the base rate. Without nodelay, NGINX spaces queued requests (adds delay). With nodelay, it sends immediately while “consuming” burst slots for the interval—preserving the effective rate without visible latency.

When should I use delay (two-stage limiting)?
When a typical page triggers several parallel fetches (e.g., up to 12 assets). delay lets the first X pass immediately (e.g., 8), then spaces the rest to respect the base rate (e.g., 5 r/s) before rejecting beyond the burst cap.

How do I log and diagnose rate-limit rejections without drowning in logs?
Use limit_req_log_level (e.g., set to warn), and include zone in your log fields. Track 503s per zone, excess values, and correlate with user reports. If rejections cluster only during peaks, check zone size and burst; if they appear at low traffic, your rate is probably too low or a specific client deserves its own rule.

source: blog.nginx.org
