Killing Bots at the Gate: Detecting Malicious Crawlers with Nginx

Bots are a fact of life on the internet.
Some are helpful—like search engine crawlers.
Others scrape your data, spam your forms, or brute-force your login pages.
If you’re self-hosting with Nginx, you don’t need a pricey SaaS WAF to stop them.
Here's how to detect and destroy malicious bots using good ol’ Nginx, a few scripts, and some zip-bomb flavor.
1. Start with Logs — Always
Nginx logs tell the full story. Make sure you're capturing User-Agent, IP, and paths.
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent"';
access_log /var/log/nginx/access.log main;
Now dig through logs for patterns:
# Top IPs by request volume
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
# Suspicious User-Agents
grep -iE 'curl|wget|python|scrapy|bot|crawler|headless' /var/log/nginx/access.log | less
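If the one-liners get unwieldy, the same two reports can be sketched in a short script. This is a minimal sketch that assumes the exact log_format shown earlier; the names LINE_RE, SUSPICIOUS_UA, summarize, and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumes the log_format defined earlier: IP, user, time, request,
# status, bytes sent, referer, user agent.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Same keyword list as the grep command.
SUSPICIOUS_UA = re.compile(
    r"curl|wget|python|scrapy|bot|crawler|headless", re.IGNORECASE
)


def summarize(lines, top=10):
    """Return (top IPs by request count, top suspicious user agents)."""
    ip_counts = Counter()
    ua_counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the expected format
        ip_counts[match["ip"]] += 1
        if SUSPICIOUS_UA.search(match["agent"]):
            ua_counts[match["agent"]] += 1
    return ip_counts.most_common(top), ua_counts.most_common(top)


# Hypothetical sample lines; in practice, pass an open log file instead.
SAMPLE = [
    '203.0.113.7 - - [10/May/2025:10:00:00 +0000] "GET /wp-login.php HTTP/1.1" '
    '404 153 "-" "python-requests/2.31.0"',
    '203.0.113.7 - - [10/May/2025:10:00:01 +0000] "GET /xmlrpc.php HTTP/1.1" '
    '404 153 "-" "python-requests/2.31.0"',
    '198.51.100.4 - - [10/May/2025:10:00:02 +0000] "GET / HTTP/1.1" '
    '200 5120 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    top_ips, top_agents = summarize(SAMPLE)
    print(top_ips)
    print(top_agents)
```

To run it against a real log, something like `summarize(open("/var/log/nginx/access.log"))` should work, since the function only iterates over lines.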
Want real-time views? Try GoAccess for a terminal dashboard.
2. Identify Suspicious Behavior
Things that scream “bot”:
- Blank or obviously fake User-Agent headers
- High request volume from a single IP
- Frequent hits to /wp-login.php, /xmlrpc.php, /admin, or random paths
- Unusual Referer headers or none at all
- Crawlers hitting endpoints that no normal user would
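The signals in this list can be folded into a rough per-IP score. The sketch below is only one way to do it: the weights, the thresholds, and the names PROBE_PATHS and suspicion_score are all assumptions for illustration, not tuned values.

```python
# Paths from the list above that ordinary visitors rarely request.
PROBE_PATHS = ("/wp-login.php", "/xmlrpc.php", "/admin")


def suspicion_score(requests, volume_threshold=300):
    """Score one IP's traffic.

    requests: list of (path, referer, user_agent) tuples seen from that IP.
    The weights and thresholds are arbitrary and only illustrate the idea.
    """
    total = len(requests)
    if total == 0:
        return 0

    blank_ua = sum(1 for _, _, ua in requests if ua in ("", "-"))
    probes = sum(1 for path, _, _ in requests if path.startswith(PROBE_PATHS))
    no_referer = sum(1 for _, ref, _ in requests if ref in ("", "-"))

    score = 0
    if blank_ua:
        score += 2  # blank User-Agent
    if total > volume_threshold:
        score += 2  # high request volume
    if probes >= 3:
        score += 3  # repeated hits on login/admin paths
    if no_referer / total > 0.9:
        score += 1  # almost never sends a Referer
    return score


if __name__ == "__main__":
    scanner = [("/wp-login.php", "-", "-")] * 3 + [("/xmlrpc.php", "-", "-")]
    print(suspicion_score(scanner))
```

A score like this is best used to decide which IPs to look at first, not as an automatic ban trigger.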
Bonus: check your logs against public bot signature lists like MitchellKrogza’s bad bot list.
3. Block the Obvious Stuff with Nginx
Create a quick and dirty User-Agent filter:
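# The map block belongs in the http context, outside any server block.
map $http_user_agent $bad_bot {
    default 0;
    # Case-insensitive match. Note that "bot" also matches well-behaved
    # crawlers such as Googlebot.
    ~*(curl|wget|python|scrapy|bot|Go-http-client) 1;
}
server {
    if ($bad_bot) {
        return 403;
    }
}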
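Before deploying a filter like this, it helps to check which User-Agent strings the pattern actually catches. The sketch below reuses the same alternation with Python's re module, which should behave the same as Nginx's PCRE for a pattern this simple. The sample agents and the is_bad_bot helper are illustrative.

```python
import re

# Same alternation as the map block, matched case-insensitively like ~*.
BAD_BOT = re.compile(r"(curl|wget|python|scrapy|bot|Go-http-client)", re.IGNORECASE)


def is_bad_bot(user_agent):
    """Return True if the filter would flag this User-Agent."""
    return BAD_BOT.search(user_agent) is not None


# Illustrative User-Agent strings.
AGENTS = [
    "curl/8.5.0",
    "python-requests/2.31.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "",
]

if __name__ == "__main__":
    for agent in AGENTS:
        print(f"{is_bad_bot(agent)!s:5}  {agent!r}")
```

Two things stand out: Googlebot is flagged because its name contains "bot", and a blank User-Agent passes because nothing matches. If search traffic matters, a narrower pattern or an explicit allow entry for known crawlers is probably worth adding.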
And rate limit abusive IPs:
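# limit_req_zone belongs in the http context.
# 10m of shared memory for per-IP state, 5 requests per second per IP.
limit_req_zone $binary_remote_addr zone=abusers:10m rate=5r/s;
server {
    location / {
        # Allow bursts of up to 10 extra requests without delaying them;
        # anything beyond that is rejected (503 by default).
        limit_req zone=abusers burst=10 nodelay;
        ...
    }
}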
Also check out Nginx rate limiting docs.
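To get a feel for what rate=5r/s with burst=10 nodelay lets through, here is a rough simulation of leaky-bucket accounting for a single client. It is a simplified model written from the documented behavior, not Nginx's actual code, and the class name LimitReqModel is made up.

```python
class LimitReqModel:
    """Simplified leaky-bucket model for one client key (times in seconds)."""

    def __init__(self, rate=5.0, burst=10):
        self.rate = rate      # requests drained per second
        self.burst = burst    # how many excess requests are tolerated
        self.excess = 0.0     # requests currently "in the bucket"
        self.last = None      # time of the last accepted request

    def allow(self, now):
        """Return True if a request arriving at time `now` is accepted."""
        if self.last is None:
            self.last = now
            return True
        elapsed = max(0.0, now - self.last)
        # Drain the bucket for the elapsed time, then add this request.
        excess = max(0.0, self.excess - self.rate * elapsed + 1.0)
        if excess > self.burst:
            return False  # over the burst allowance; state is left unchanged
        self.excess = excess
        self.last = now
        return True


if __name__ == "__main__":
    model = LimitReqModel(rate=5.0, burst=10)
    first_wave = sum(model.allow(0.0) for _ in range(30))
    second_wave = sum(model.allow(1.0) for _ in range(30))
    print(first_wave, second_wave)
```

In this model, a client that fires 30 requests at once gets the first request plus the burst of 10 accepted, and one second later only 5 more slots have opened up.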
4. Use Fail2Ban to Auto-Ban IPs
Install Fail2Ban and wire it to your Nginx logs:
Jail config (/etc/fail2ban/jail.local):
[nginx-badbots]
enabled = true
filter = nginx-badbots
logpath = /var/log/nginx/access.log
maxretry = 5
findtime = 600
bantime = 3600
Filter (/etc/fail2ban/filter.d/nginx-badbots.conf):
[Definition]
failregex = ^<HOST> -.*"(GET|POST).*HTTP.*"(curl|wget|python|scrapy|bot|Go-http-client)
ignoreregex =
Once this is running, any IP that matches the filter 5 times within 10 minutes (maxretry and findtime) is banned automatically for an hour (bantime).
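The interplay of maxretry, findtime, and bantime is easier to see in a small simulation. This is a sketch of the sliding-window idea only, not Fail2Ban's implementation; the Jail class and its methods are hypothetical, and times are plain seconds.

```python
from collections import defaultdict, deque


class Jail:
    """Toy model of a jail: ban after `maxretry` failures within `findtime`."""

    def __init__(self, maxretry=5, findtime=600, bantime=3600):
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.failures = defaultdict(deque)  # ip -> timestamps of recent matches
        self.banned_until = {}              # ip -> time the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0) > now

    def record_failure(self, ip, now):
        """Register one filter match; return True if it triggers a ban."""
        if self.is_banned(ip, now):
            return False
        window = self.failures[ip]
        window.append(now)
        # Forget matches that fell out of the findtime window.
        while window and window[0] <= now - self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned_until[ip] = now + self.bantime
            window.clear()
            return True
        return False


if __name__ == "__main__":
    jail = Jail()
    for t in range(0, 50, 10):
        print(t, jail.record_failure("203.0.113.7", t))
```

One consequence worth noticing: a bot that spaces its requests more than findtime / maxretry apart never accumulates enough matches, so slow scanners slip past these settings.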
5. Use Better Tools for Smarter Bots
If you're seeing more sophisticated attacks, try:
- CrowdSec: Open-source tool that shares a dynamic IP reputation list and applies bans
- ModSecurity: Full WAF, works with Nginx
- OpenResty: Extend Nginx with Lua scripting (e.g., custom captcha, behavior analysis)
If you’re open to a proxy layer:
- Cloudflare free tier: Blocks a lot of trash automatically
- Fastly Bot Protection: Advanced but paid
Bonus: Serve Zip Bombs to Dumb Bots (⚠️ Handle with care)
This blog post by Idiallo shows how he turned bot detection into punishment.
The method? Serve them a compressed zip bomb.
To generate one:
dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz
This creates a ~10MB file that decompresses to 10GB of zeros.
A bot that decompresses the response without checking its size has to deal with all 10GB at once, and it usually chokes.
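The roughly 1000:1 ratio is easy to verify at a safer scale. The sketch below compresses 10 MiB of zeros in memory instead of 10GB on disk; the function name make_zero_gzip and the chunk size are illustrative choices.

```python
import gzip
import io


def make_zero_gzip(total_bytes, chunk_size=1024 * 1024):
    """Gzip `total_bytes` of zeros in memory and return the compressed bytes."""
    buffer = io.BytesIO()
    chunk = bytes(chunk_size)  # a reusable block of zero bytes
    with gzip.GzipFile(fileobj=buffer, mode="wb", compresslevel=9) as gz:
        remaining = total_bytes
        while remaining > 0:
            n = min(remaining, chunk_size)
            gz.write(chunk[:n])
            remaining -= n
    return buffer.getvalue()


if __name__ == "__main__":
    size = 10 * 1024 * 1024  # 10 MiB, a scaled-down stand-in for 10GB
    payload = make_zero_gzip(size)
    print(f"{size} bytes of zeros -> {len(payload)} bytes gzipped")
```

Writing in chunks keeps memory use flat, so the same function would also work for much larger sizes if the output went to a file instead of a BytesIO buffer.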
Then detect and serve it:
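if (ipIsBlackListed() || isMalicious()) {
    // Tell the client the body is compressed so it tries to decompress it.
    header("Content-Encoding: deflate, gzip");
    header("Content-Length: " . filesize(ZIP_BOMB_FILE_10G));
    // Stream the pre-built file as-is and stop processing the request.
    readfile(ZIP_BOMB_FILE_10G);
    exit;
}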
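For readers not on PHP, the same branching can be sketched as a tiny WSGI app. Everything here is an assumption for illustration: the BLACKLIST set, the make_app helper, and the small in-memory payload standing in for the real file. It also sends a plain "gzip" encoding header, not the "deflate, gzip" value used in the PHP snippet.

```python
import gzip

# Hypothetical stand-ins: a tiny payload and a hard-coded blacklist.
BOMB = gzip.compress(bytes(1024 * 1024))  # 1 MiB of zeros, gzipped
BLACKLIST = {"203.0.113.7"}


def make_app(bomb_bytes, blacklist):
    """Build a WSGI app that serves the payload to blacklisted IPs only."""

    def app(environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        if ip in blacklist:
            # Label the body as gzip so a naive client inflates it.
            start_response("200 OK", [
                ("Content-Encoding", "gzip"),
                ("Content-Length", str(len(bomb_bytes))),
            ])
            return [bomb_bytes]
        body = b"hello\n"
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


app = make_app(BOMB, BLACKLIST)
```

Any WSGI server can host `app`. Behind a reverse proxy, REMOTE_ADDR will be the proxy's address, so the client IP would need to come from a trusted forwarded header instead.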
He explains that when traffic spikes, he swaps in a 1MB variant.
It’s a great deterrent for low-effort bots.
Heuristics like repeated scanning and double-visits from spam IPs helped him fine-tune this method.