Trying to ban bots

1. The problem I’m having:

From time to time, my site (a bibliography database with 15,000+ entries) gets overwhelmed by bot requests and stops responding, even though I block them all via robots.txt.
I found a recipe for Apache (at the French site Docs Evolix - Gestion des bots):

    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} "FooBot" [NC]
    RewriteRule ^ - [F,L]

but I can’t ‘translate’ this into a Caddyfile entry.
Apologies if this sounds trivial …

2. Error messages and/or full log output:

3. Caddy version:

v2.11.1 h1:C7sQpsFOC5CH+31KqJc7EoOf8mXrOEkFyYd6GpIqm/s=

4. How I installed and ran Caddy:

Installation through Debian package manager

a. System environment:

Debian GNU/Linux 13.3

b. Command:

c. Service/unit/compose file:

d. My complete Caddy config:

(common) {
	header {
		Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
		X-Xss-Protection "1; mode=block"
		X-Content-Type-Options "nosniff"
		X-Frame-Options "DENY"
		Content-Security-Policy "upgrade-insecure-requests"
		Referrer-Policy "strict-origin-when-cross-origin"
		Cache-Control "public, max-age=15, must-revalidate"
		Feature-Policy "accelerometer 'none'; ambient-light-sensor 'none'; autoplay 'self'; camera 'none'; encrypted-media 'none'; fullscreen 'self'; geolocation 'none'; gyroscope 'none'; magnetometer 'none'; microphone 'none'; midi 'none'; payment 'none'; picture-in-picture *; speaker 'none'; sync-xhr 'none'; usb 'none'; vr 'none'"
	}
	encode gzip
}

www.bobc.uni-bonn.de {
	root * /var/www/wikindx
	root /adminer* /usr/share/adminer
	file_server
	import common
	log {
		output file /var/log/caddy/access.log
	}
	php_fastcgi unix//run/php/php8.4-fpm.sock
}

www.bobc.uni-bonn.de:8088 {
	reverse_proxy localhost:3000
}

5. Links to relevant resources:

That would just be a header matcher on the User-Agent header, with that matcher applied to an abort or error handler.

While this has been answered already, I’d like to expand on the existing answer. The following block might help get rid of some bots…

example.com {
    root /path/to/root

    @forbidden_bots {
        # substring match with *…*, since real user agents embed
        # the bot name in a longer string like "Mozilla/5.0 … GPTBot/1.1"
        header User-Agent *GPTBot*
        header User-Agent *ClaudeBot*
        header User-Agent *AnyOtherBot*
    }
    respond @forbidden_bots "Forbidden bots" 403

    # other config
}

There are other examples at Request matchers (Caddyfile) — Caddy Documentation

I’d personally recommend rate_limit for such bots.
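For reference, a minimal sketch of what that could look like with the mholt/caddy-ratelimit plugin; the zone name, key, and limits here are placeholder assumptions, so adjust them to your traffic:

    example.com {
        rate_limit {
            # one bucket per client IP; at most 20 requests per 10s window
            zone bots {
                key {remote_host}
                events 20
                window 10s
            }
        }
    }

Note that rate_limit is not in the standard Caddy build; you would need to build Caddy with that plugin included (e.g. via xcaddy).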

A much longer list is at GitHub - ai-robots-txt/ai.robots.txt: A list of AI agents and robots to block. There is a Caddyfile there defining a named matcher @aibots which can be imported and referenced. The README outlines which user agents are included.

I also recommend using abort. This way, Caddy doesn’t bother waiting for handshakes to complete to actually answer any such client, it just drops the connection.

Technically the TLS handshake still happens (HTTP handling runs after TLS is done), but no HTTP response gets written when you use abort. So yeah, still more efficient.
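Putting the matcher and abort together, a minimal sketch (FooBot is a placeholder user agent):

    example.com {
        # match any request whose User-Agent contains "FooBot"
        @bots header User-Agent *FooBot*
        # drop the connection without writing an HTTP response
        abort @bots

        # normal site config below
        root * /var/www/site
        file_server
    }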

Thank you all for your helpful answers. I’ve used this (with the list @techjedialex mentioned) but tried to replace respond with abort. I’m not sure I got the abort syntax right; the simple line

abort @forbidden_bots

doesn’t seem to work; I’m still seeing bots from that list in my log files.

Please post the complete Caddyfile as of now.

Here’s a working example of a snippet; import it inside your site block:

(blocking) {
    @aiuseragents header_regexp User-Agent "AddSearchBot|AI2Bot|AI2Bot-DeepResearchEval|Ai2Bot-Dolma|aiHitBot|amazon-kendra|Amazonbot|AmazonBuyForMe|Amzn-SearchBot|Amzn-User|Andibot|Anomura|anthropic-ai|ApifyBot|ApifyWebsiteContentCrawler|Applebot|Applebot-Extended|Aranet-SearchBot|atlassian-bot|Awario|AzureAI-SearchBot|bedrockbot|bigsur.ai|Bravebot|Brightbot\ 1.0|BuddyBot|Bytespider|CCBot|Channel3Bot|ChatGLM-Spider|ChatGPT\ Agent|ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot|Cloudflare-AutoRAG|CloudVertexBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawl4AI|Crawlspace|Datenbank\ Crawler|DeepSeekBot|Devin|Diffbot|DuckAssistBot|Echobot\ Bot|EchoboxBot|ExaBot|FacebookBot|facebookexternalhit|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Gemini-Deep-Research|Google-Agent|Google-CloudVertexBot|Google-Extended|Google-Firebase|Google-NotebookLM|GoogleAgent-Mariner|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iAskBot|iaskspider|iaskspider/2.0|IbouBot|ICC-Crawler|ImagesiftBot|imageSpider|img2dataset|ISSCyberRiskCrawler|kagi-fetcher|Kangaroo\ Bot|KlaviyoAIBot|KunatoCrawler|laion-huggingface-processor|LAIONDownloader|LCC|LinerBot|Linguee\ Bot|LinkupBot|Manus-User|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|meta-webindexer|MistralAI-User|MistralAI-User/1.0|MyCentralAIScraperBot|netEstate\ Imprint\ Crawler|NotebookLM|NovaAct|OAI-SearchBot|omgili|omgilibot|OpenAI|Operator|PanguBot|Panscient|panscient.com|Perplexity-User|PerplexityBot|PetalBot|PhindBot|Poggio-Citations|Poseidon\ Research\ Crawler|QualifiedBot|QuillBot|quillbot.com|SBIntuitionsBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|ShapBot|Sidetrade\ indexer\ bot|Spider|TavilyBot|TerraCotta|Thinkbot|TikTokSpider|Timpibot|TwinAgent|VelenPublicWebCrawler|WARDBot|Webzio-Extended|webzio-extended|wpbot|WRTNBot|YaK|YandexAdditional|YandexAdditionalBot|YouBot|ZanistaBot"
    handle @aiuseragents {
        abort
    }
}
import blocking

For your purpose, you can also consider blocking ASNs with plugins like caddy-defender or geo-blocking with caddy-maxmind-geolocation.
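With caddy-maxmind-geolocation, the matcher looks roughly like this (directive names per that plugin’s README; the database path and country list are placeholders):

    example.com {
        # match requests NOT coming from the allowed countries
        @blocked {
            not maxmind_geolocation {
                db_path "/usr/share/GeoIP/GeoLite2-Country.mmdb"
                allow_countries DE FR
            }
        }
        abort @blocked
    }

Like other third-party matchers, this requires a Caddy build that includes the plugin, plus a MaxMind GeoLite2 database on disk.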

Another interesting approach is to block bots by their JA4 TLS fingerprint. This approach is under-discussed; I could only find one plugin, matt-/caddy-ja4, using this method.