Trying to ban bots

1. The problem I’m having:

From time to time, my site (a bibliography database with 15,000+ entries) gets overwhelmed by bot requests and stops responding, despite blocking them all through robots.txt.
I found a recipe for Apache (on the French site Docs Evolix, under "Gestion des bots"):

    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} "FooBot" [NC]
    RewriteRule ^ - [F,L]

but I can’t ‘translate’ this into a Caddyfile entry.
Apologies if this sounds trivial …

2. Error messages and/or full log output:

3. Caddy version:

v2.11.1 h1:C7sQpsFOC5CH+31KqJc7EoOf8mXrOEkFyYd6GpIqm/s=

4. How I installed and ran Caddy:

Installation through Debian package manager

a. System environment:

Debian GNU/Linux 13.3

b. Command:

c. Service/unit/compose file:

d. My complete Caddy config:

(common) {
	header {
		Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
		X-Xss-Protection "1; mode=block"
		X-Content-Type-Options "nosniff"
		X-Frame-Options "DENY"
		Content-Security-Policy "upgrade-insecure-requests"
		Referrer-Policy "strict-origin-when-cross-origin"
		Cache-Control "public, max-age=15, must-revalidate"
		Feature-Policy "accelerometer 'none'; ambient-light-sensor 'none'; autoplay 'self'; camera 'none'; encrypted-media 'none'; fullscreen 'self'; geolocation 'none'; gyroscope 'none'; magnetometer 'none'; microphone 'none'; midi 'none'; payment 'none'; picture-in-picture *; speaker 'none'; sync-xhr 'none'; usb 'none'; vr 'none'"
	}
	encode gzip
}

www.bobc.uni-bonn.de {
	root * /var/www/wikindx
	root /adminer* /usr/share/adminer
	file_server
	import common
	log {
		output file /var/log/caddy/access.log
	}
	php_fastcgi unix//run/php/php8.4-fpm.sock
}

www.bobc.uni-bonn.de:8088 {
	reverse_proxy localhost:3000
}

5. Links to relevant resources:


That would just be a header matcher rule on the User-Agent header, with that matcher applied to an abort or error handler.
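For example, a minimal sketch translating the Apache rule above (the matcher name is arbitrary; the leading and trailing * make it a substring match, since real user-agent strings contain more than just the bot name):

example.com {
    @foobot header User-Agent *FooBot*
    abort @foobot
}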


While this has been answered already, I’d like to expand on the existing answer. The following block of code might help get rid of some bots:

example.com {
    # The * matcher is required here; a bare path after `root`
    # would be parsed as a request matcher, not the root path.
    root * /path/to/root

    # Wildcards on both sides give substring matching, since
    # user-agent strings contain more than just the bot name.
    @forbidden_bots {
        header User-Agent *GPTBot*
        header User-Agent *ClaudeBot*
        header User-Agent *AnyOtherBot*
    }
    respond @forbidden_bots "Forbidden bots" 403

    # other config
}

There are other examples at Request matchers (Caddyfile) — Caddy Documentation

I’d personally recommend rate_limit for such bots.
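A hedged sketch of that, assuming Caddy has been rebuilt (e.g. with xcaddy) to include the third-party mholt/caddy-ratelimit plugin, since rate_limit is not part of the standard build; the zone name and limits here are arbitrary:

example.com {
    rate_limit {
        zone bots {
            # Limit each client IP to 30 requests per minute.
            key {remote_host}
            events 30
            window 1m
        }
    }
}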


A much longer list is at GitHub - ai-robots-txt/ai.robots.txt: A list of AI agents and robots to block. There is a Caddyfile there defining a named matcher @aibots which can be referenced. The README outlines which user agents are included.

I also recommend using abort. This way, Caddy doesn’t bother waiting for handshakes to complete to actually answer any such client, it just drops the connection.
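Putting the two together, one possible shape (the file path is an assumption; it stands for wherever you save the repository’s Caddyfile snippet, which defines the @aibots matcher):

example.com {
    import /etc/caddy/ai-robots-txt.caddyfile
    abort @aibots
}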


Technically the TLS handshake still happens (HTTP handling runs after TLS is done), but no HTTP response gets written when you use abort. So yeah, still more efficient.
