NGINX sits at the front door of a lot of services. One of its most useful—yet frequently misunderstood—capabilities is rate limiting: capping request throughput so login forms aren’t brute-forced, upstreams don’t melt during traffic spikes, and “natural” browser bursts don’t turn into backlog and 5xx storms. Configured correctly, rate limiting smooths load without punishing real users.
Heads-up. The community has clarified that burst and delay are enforced with millisecond-resolution sliding windows, not coarse per-second averages. For exact semantics, always check the official docs on nginx.org.
The leaky bucket in HTTP terms
NGINX implements the classic leaky bucket algorithm:
- Incoming water → incoming requests.
- Bucket / FIFO → the queue that holds excess requests.
- Leak rate → the maximum throughput the server will accept.
- Overflow → requests dropped (rejected) when the queue is full.
Practically, this keeps browser “shotgun” fetches (HTML, CSS, JS, images) from hammering the origin, slows down brute-force attempts, and narrows blast radius during marketing peaks or cache-busting mishaps.
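To make the mapping concrete, here is a minimal Python sketch of leaky-bucket accounting, loosely modeled on how limit_req tracks per-key excess. The function name and exact arithmetic are illustrative assumptions, not NGINX source:

```python
def classify(arrivals, rate, burst=0, delay=0):
    """Toy leaky bucket: one verdict per request given arrival times (seconds).
    rate: leak rate in r/s; burst: queue slots above the base rate;
    delay: excess requests allowed through undelayed (delay=burst ~ nodelay)."""
    excess, last, verdicts = 0.0, None, []
    for t in arrivals:
        if last is None:                       # first request starts the bucket
            last = t
            verdicts.append("forwarded")
            continue
        # Leak since the previous request, then account for this one.
        tentative = max(0.0, excess - (t - last) * rate + 1)
        last = t
        if tentative > burst:                  # bucket overflow -> rejected (503)
            verdicts.append("rejected")
        else:
            excess = tentative
            verdicts.append("delayed" if excess > delay else "forwarded")
    return verdicts

# At 10 r/s with no burst, a second request 50 ms later is rejected,
# but one arriving a full 100 ms later is fine.
print(classify([0.0, 0.05], rate=10))   # ['forwarded', 'rejected']
print(classify([0.0, 0.10], rate=10))   # ['forwarded', 'forwarded']
```

The same toy model reproduces the burst behavior discussed below: with burst=20, a batch of 22 simultaneous requests yields 1 forwarded, 20 queued, 1 rejected.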
Two directives that matter: limit_req_zone and limit_req
You configure rate limiting with:
- limit_req_zone – defines the key, the shared memory zone, and the base rate. Typically placed in the http block so it can be reused.
- limit_req – applies a zone's policy to a server or location.
Minimal example to protect /login/ at 10 requests per second per client IP:
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

server {
    location /login/ {
        limit_req zone=mylimit;
        proxy_pass http://my_upstream;
    }
}
Key points:
- Key – $binary_remote_addr groups requests by client IP in binary form (saves memory vs $remote_addr).
- Zone – mylimit:10m reserves 10 MiB of shared memory; as a rule of thumb, ~1 MiB ≈ 16,000 IPs, so 10 MiB stores ~160,000 entries. When space runs out, NGINX evicts old entries; if it still can't insert a new one, it returns 503.
- Rate – 10r/s ≈ 1 request per 100 ms with millisecond-resolution accounting. Requests that break that cadence are rejected unless you configure a burst.
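A quick sanity check of those numbers (the 16k-IPs-per-MiB figure is the article's rule of thumb, not an exact NGINX constant):

```python
entries_per_mib = 16_000          # rule-of-thumb capacity per MiB of zone memory
zone_mib = 10                     # mylimit:10m
rate_rps = 10                     # rate=10r/s

max_tracked_ips = zone_mib * entries_per_mib   # ~160,000 client entries
min_interval_ms = 1000 / rate_rps              # one request allowed every 100 ms
print(max_tracked_ips, min_interval_ms)        # 160000 100.0
```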
Bursts you can live with: burst to avoid punishing browsers
Without a burst cushion, two back-to-back requests arriving within ~100 ms of each other will make the second one fail with a 503. Real browsing is inherently bursty, so add a queue:
location /login/ {
    limit_req zone=mylimit burst=20;
    proxy_pass http://my_upstream;
}
- burst=20 allocates 20 queue slots above the base rate. Excess requests queue, and NGINX releases them at the configured cadence (≈1 every 100 ms here).
- If 21 requests arrive at once, the first is forwarded immediately and the other 20 queue; a 22nd simultaneous request would be rejected.
Trade-off: spacing 20 requests at 100 ms means the tail waits ~2 s—often too long to be useful.
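The ~2 s figure follows directly from the release cadence; a small worked example:

```python
rate_rps, burst = 10, 20
# Queued request i leaves the queue i intervals (1/rate each) after the burst.
release_s = [i / rate_rps for i in range(1, burst + 1)]
tail_wait_s = release_s[-1]       # the 20th queued request waits 2.0 s
print(tail_wait_s)                # 2.0
```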
Bursts without added latency: nodelay
Use nodelay to avoid injecting artificial delay while still enforcing the rate:
location /login/ {
    limit_req zone=mylimit burst=20 nodelay;
    proxy_pass http://my_upstream;
}
- With nodelay, NGINX forwards queued requests immediately as long as free burst slots remain, marking each used slot as occupied for one interval (≈100 ms here).
- If 21 requests arrive at once, all 21 are forwarded immediately; the 20 burst slots are marked occupied and released one every ~100 ms.
- If another 20 requests arrive 101 ms later, only 1 slot has been freed → 1 is forwarded and 19 are rejected.
Net effect: the effective rate is preserved without adding delay to the allowed burst.
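The slot accounting in that example can be sketched as follows (a toy model; `second_wave` is an invented helper, not an NGINX API):

```python
def second_wave(burst, rate_rps, elapsed_s, arriving):
    """After a full burst, how many of `arriving` requests pass once
    `elapsed_s` have gone by: slots free at the base cadence, 1 per 1/rate."""
    freed = min(burst, int(elapsed_s * rate_rps))
    forwarded = min(arriving, freed)
    return forwarded, arriving - forwarded   # (forwarded, rejected)

# 101 ms after 20 slots filled at 10 r/s, exactly one slot has been freed:
print(second_wave(20, 10, 0.101, 20))   # (1, 19)
```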
Practical default: combine burst + nodelay unless you want intentional spacing.
Two-stage limiting: delay for “free pass, then throttle”
Since NGINX 1.15.7, delay lets you accept the first X excess requests immediately, then throttle up to the burst cap, and finally reject beyond it. Typical page pattern (4–12 resources) with a base rate of 5 r/s:
limit_req_zone $binary_remote_addr zone=ip:10m rate=5r/s;

server {
    location / {
        limit_req zone=ip burst=12 delay=8;
        proxy_pass http://website;
    }
}
- First 8 above the base rate pass without delay.
- Next 4 are delayed to honor 5 r/s.
- Further excess requests are rejected until slots free up.
Remember: enforcement is ms-granular with a sliding window.
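For the page pattern above, a self-contained sketch of the two-stage verdicts (an illustrative model, not NGINX source):

```python
from collections import Counter

def two_stage(n_simultaneous, burst, delay):
    """Verdicts for n simultaneous requests under limit_req burst+delay.
    The first request starts the bucket; each later one raises excess by 1."""
    verdicts, excess = ["forwarded"], 0
    for _ in range(n_simultaneous - 1):
        if excess + 1 > burst:
            verdicts.append("rejected")        # beyond the burst cap
        elif excess + 1 > delay:
            excess += 1
            verdicts.append("delayed")         # throttled to the base rate
        else:
            excess += 1
            verdicts.append("forwarded")       # within the free pass

    return verdicts

# burst=12 delay=8: 1 base + 8 free pass, 4 delayed, the rest rejected.
print(Counter(two_stage(14, burst=12, delay=8)))
```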
Allowlists and conditional keys with geo and map
You can exempt trusted subnets (or give them a higher cap) by making the key conditional:
geo $limit {
    default         1;
    10.0.0.0/8      0;
    192.168.0.0/24  0;
}

map $limit $limit_key {
    0 "";
    1 $binary_remote_addr;
}

limit_req_zone $limit_key zone=req_zone:10m rate=5r/s;

server {
    location / {
        limit_req zone=req_zone burst=10 nodelay;
        # ...
    }
}
- Trusted IPs get $limit = 0, so map sets the empty string as the key.
- When the zone key is empty, the limit does not apply.
- Everyone else is keyed by IP and capped at 5 r/s (+10 burst).
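The same key selection can be mimicked in Python to reason about it (the stdlib ipaddress module stands in for geo, and ip.packed for $binary_remote_addr):

```python
import ipaddress

TRUSTED = [ipaddress.ip_network("10.0.0.0/8"),
           ipaddress.ip_network("192.168.0.0/24")]

def limit_key(client_ip):
    """geo+map in miniature: trusted clients get "" (limit skipped),
    everyone else is keyed by the packed binary address."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in net for net in TRUSTED):
        return ""                 # empty key -> limit_req_zone ignores it
    return ip.packed              # 4 bytes, like $binary_remote_addr for IPv4

print(limit_key("10.1.2.3"))      # exempt: empty key
print(limit_key("203.0.113.7"))   # rate limited, keyed per IP
```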
Multiple limit_req per location: the most restrictive wins
You can stack several limits in a single location. All matching limits apply; the longest delay or any reject wins.
Example: give allowlisted IPs a looser cap while keeping a stricter global cap:
http {
    limit_req_zone $limit_key zone=req_zone:10m rate=5r/s;
    limit_req_zone $binary_remote_addr zone=req_zone_wl:10m rate=15r/s;

    server {
        location / {
            limit_req zone=req_zone    burst=10 nodelay;  # everyone
            limit_req zone=req_zone_wl burst=20 nodelay;  # allowlist
        }
    }
}
- Allowlisted IPs don't match req_zone (empty key) but do match req_zone_wl → 15 r/s.
- Non-allowlisted IPs match both → the stricter 5 r/s prevails.
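A small sketch of which zone binds for a given client (the numbers mirror the example; the helper is invented for illustration):

```python
def binding_rate(is_allowlisted):
    """A request must satisfy every zone it matches, so the lowest
    matching rate is the one the client actually experiences."""
    matched = {"req_zone_wl": 15}        # keyed by IP: matches everyone
    if not is_allowlisted:
        matched["req_zone"] = 5          # empty key skips allowlisted IPs
    return min(matched.values())

print(binding_rate(True))    # 15  (allowlisted)
print(binding_rate(False))   # 5   (the stricter zone prevails)
```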
Logging, response codes, and related knobs
Log entries. Rejections are logged at error level by default; delayed requests one level lower (warn). Example:
YYYY/MM/DD HH:MM:SS [error] … limiting requests, excess: 1.000 by zone "mylimit", client: 192.0.2.10, request: "GET / HTTP/1.1"
- excess shows how far over the configured rate the request is (tracked with millisecond granularity).
- Change the level with limit_req_log_level:
location /login/ {
    limit_req zone=mylimit burst=20 nodelay;
    limit_req_log_level warn;
}
Response code. Default is 503. Override with limit_req_status:
location /login/ {
    limit_req zone=mylimit burst=20 nodelay;
    limit_req_status 444;  # drop the connection without a formal response
}
Deny a path outright. If you really want it closed:
location /foo.php { deny all; }
Zone sizing and memory behavior
Keep an eye on zone size: ~1 MiB ≈ 16k IPs. High cardinality (many unique clients or distributed scans) will exhaust a tiny zone and cause 503s unrelated to your app.
When inserting a new entry:
- NGINX evicts up to two entries unused in the last 60 s.
- If there’s still no space, it returns 503.
Diagnostic hint: if 503s appear only during heavy traffic and cache hit ratio is fine, check zone size before blaming the upstream.
Where to apply limits (and where not)
Good candidates:
- Sensitive endpoints: /login/, password reset, auth APIs, /wp-login.php, /xmlrpc.php.
- "Hot" endpoints: search, promo landing pages, DB-heavy operations.
- Feeds/listings hit by scrapers.
Avoid strict limits on static assets or third-party CDNs—don’t throttle bytes you can cache.
Common footguns
- No burst → real users get 503s on normal browser bursts. Add burst and usually nodelay.
- Unrealistic rates → e.g., 5 r/s might be fine for login, but not for a homepage that triggers ~12 fetches. Tune rate, burst, possibly delay.
- Tiny zones → 1–2 MiB won't last on busy sites; size for expected IP cardinality.
- Global hammer → apply limits per location/server; broad limits punish the wrong traffic.
- No logs → you're adjusting blind. Use limit_req_log_level and ship the zone name with your logs.
- Code mismatch → defaults return 503; if your business logic expects 429 (Too Many Requests), translate at a higher layer or set limit_req_status 429.
Three-step rollout
1) Measure
- Count resources per typical page (4–6 common, sometimes up to 12).
- Identify abused endpoints (login, search, APIs).
- Estimate peak unique IPs to size the zone.
2) Configure
- Set limit_req_zone in http with a realistic rate and sufficient memory.
- Apply limit_req on sensitive paths with burst + nodelay.
- For multi-asset pages, consider delay to give a short free pass, then throttle.
3) Observe & adjust
- Watch 503s by zone, excess values, and user-visible complaints.
- Tune rate, burst, or zone size based on real traffic.
- Add allowlists with geo/map where justified.
Complete example with critical paths and trust differentiation
http {
    # Trust map: exempt internal ranges
    geo $is_trusted {
        default         1;
        10.0.0.0/8      0;
        192.168.0.0/24  0;
    }

    map $is_trusted $rate_key {
        0 "";
        1 $binary_remote_addr;
    }

    # Zones
    limit_req_zone $rate_key zone=zone_login:10m rate=5r/s;
    limit_req_zone $binary_remote_addr zone=zone_public:20m rate=10r/s;

    server {
        listen 80;

        # Login: slow brute-force without adding latency to legitimate bursts
        location /login/ {
            limit_req zone=zone_login burst=8 nodelay;
            limit_req_log_level warn;
            proxy_pass http://auth_upstream;
        }

        # Main pages: larger burst, no delay to avoid perceived slowness
        location / {
            limit_req zone=zone_public burst=20 nodelay;
            proxy_pass http://app_upstream;
        }

        # Keep PHP out of the edge if it shouldn't be there
        location ~* \.php$ { deny all; }
    }
}
This caps login at 5 r/s (+8 burst, no added delay), public routes at 10 r/s (+20 burst), and blocks direct PHP execution at the edge. Trusted subnets skip the login limit via empty key.
FAQ
How do I pick a reasonable requests-per-second cap for login in NGINX?
Start with ~5 r/s and burst 8–12 nodelay, then refine using logs. Consider the number of auxiliary calls your login flow makes (captcha, telemetry, SSO). Increase burst (not base rate) if you see legit clients tripping the cap.
What's the practical difference between burst and nodelay?
burst allocates queue slots above the base rate. Without nodelay, NGINX spaces queued requests out (adding delay). With nodelay, it forwards them immediately while "consuming" burst slots for the interval, preserving the effective rate without visible latency.
When should I use delay (two-stage limiting)?
When a typical page triggers several parallel fetches (e.g., up to 12 assets). delay lets the first X pass immediately (e.g., 8), then spaces the rest to respect the base rate (e.g., 5 r/s) before rejecting beyond the burst cap.
How do I log and diagnose rate-limit rejections without drowning in logs?
Use limit_req_log_level (e.g., set to warn), and include zone in your log fields. Track 503s per zone, excess values, and correlate with user reports. If rejections cluster only during peaks, check zone size and burst; if they appear at low traffic, your rate is probably too low or a specific client deserves its own rule.
source: blog.nginx.org
