Trying to ban bots

1. The problem I’m having:

From time to time, my site (a bibliography database with 15,000+ entries) gets overwhelmed by bot requests and stops responding, despite blocking them all through robots.txt.
I found a recipe for Apache (on the French site Docs Evolix, under "Gestion des bots"):

    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} "FooBot" [NC]
    RewriteRule ^ - [F,L]

but I can’t ‘translate’ this into a Caddyfile entry.
Apologies if this sounds trivial …

2. Error messages and/or full log output:

3. Caddy version:

v2.11.1 h1:C7sQpsFOC5CH+31KqJc7EoOf8mXrOEkFyYd6GpIqm/s=

4. How I installed and ran Caddy:

Installation through Debian package manager

a. System environment:

Debian GNU/Linux 13.3

b. Command:

c. Service/unit/compose file:

d. My complete Caddy config:

(common) {
	header {
		Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
		X-Xss-Protection "1; mode=block"
		X-Content-Type-Options "nosniff"
		X-Frame-Options "DENY"
		Content-Security-Policy "upgrade-insecure-requests"
		Referrer-Policy "strict-origin-when-cross-origin"
		Cache-Control "public, max-age=15, must-revalidate"
		Feature-Policy "accelerometer 'none'; ambient-light-sensor 'none'; autoplay 'self'; camera 'none'; encrypted-media 'none'; fullscreen 'self'; geolocation 'none'; gyroscope 'none'; magnetometer 'none'; microphone 'none'; midi 'none'; payment 'none'; picture-in-picture *; speaker 'none'; sync-xhr 'none'; usb 'none'; vr 'none'"
	}
	encode gzip
}

www.bobc.uni-bonn.de {
	root * /var/www/wikindx
	root /adminer* /usr/share/adminer
	file_server
	import common
	log {
		output file /var/log/caddy/access.log
	}
	php_fastcgi unix//run/php/php8.4-fpm.sock
}

www.bobc.uni-bonn.de:8088 {
	reverse_proxy localhost:3000
}

5. Links to relevant resources:


That would just be a header matcher rule on the User-Agent header, with that matcher applied to an abort or error handler.
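For example, a minimal sketch translating the Apache rule above (the matcher name is arbitrary; the leading and trailing * make it a substring match, since real user-agent strings contain more than just the bot name):

example.com {
    @foobot header User-Agent *FooBot*
    abort @foobot
}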


While this has been answered already, I’d like to expand on the existing answer. The following block of code might help get rid of some bots:

example.com {
    # The * matcher is required here; a bare path after `root`
    # would be parsed as a request matcher, not the root path.
    root * /path/to/root

    # Wildcards on both sides give substring matching, since
    # user-agent strings contain more than just the bot name.
    @forbidden_bots {
        header User-Agent *GPTBot*
        header User-Agent *ClaudeBot*
        header User-Agent *AnyOtherBot*
    }
    respond @forbidden_bots "Forbidden bots" 403

    # other config
}

There are other examples at Request matchers (Caddyfile) — Caddy Documentation

I’d personally recommend rate_limit for such bots.
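A hedged sketch of that, assuming Caddy has been rebuilt (e.g. with xcaddy) to include the third-party mholt/caddy-ratelimit plugin, since rate_limit is not part of the standard build; the zone name and limits here are arbitrary:

example.com {
    rate_limit {
        zone bots {
            # Limit each client IP to 30 requests per minute.
            key {remote_host}
            events 30
            window 1m
        }
    }
}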


A much longer list is at GitHub - ai-robots-txt/ai.robots.txt: A list of AI agents and robots to block. There is a Caddyfile there defining a named matcher @aibots which can be referenced. The README outlines which user agents are included.

I also recommend using abort. This way, Caddy doesn’t bother waiting for handshakes to complete to actually answer any such client, it just drops the connection.
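Putting the two together, one possible shape (the file path is an assumption; it stands for wherever you save the repository’s Caddyfile snippet, which defines the @aibots matcher):

example.com {
    import /etc/caddy/ai-robots-txt.caddyfile
    abort @aibots
}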


Technically the TLS handshake still happens (HTTP handling runs after TLS is done), but no HTTP response gets written when you use abort. So yeah, still more efficient.
