1. The problem I’m having:
I’m trying to block a couple of aggressive crawlers with the config below. When I test it locally, however, it doesn’t work; the request just returns 200:
➜ curl -vH 'User-Agent: gAhrefsBotg' https://test.localhost/
* Trying [::1]:443...
* Connected to test.localhost (::1) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* CAfile: /etc/ssl/cert.pem
* CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256
* ALPN: server accepted h2
* Server certificate:
* subject: [NONE]
* start date: Jul 20 12:32:21 2024 GMT
* expire date: Jul 21 00:32:21 2024 GMT
* subjectAltName: host "test.localhost" matched cert's "test.localhost"
* issuer: CN=Caddy ECC Intermediate
* SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://test.localhost/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: test.localhost]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [user-agent: AhrefsBot]
> GET / HTTP/2
> Host: test.localhost
> Accept: */*
> User-Agent: AhrefsBot
>
< HTTP/2 200
< alt-svc: h3=":443"; ma=2592000
< server: Caddy
< content-length: 0
< date: Sat, 20 Jul 2024 14:40:01 GMT
<
* Connection #0 to host test.localhost left intact
2. Error messages and/or full log output:
Access log produced by cURL’s request:
{
  "level": "info",
  "ts": 1721486672.1151009,
  "logger": "http.log.access.log0",
  "msg": "NOP",
  "request": {
    "remote_ip": "::1",
    "remote_port": "54028",
    "client_ip": "::1",
    "proto": "HTTP/1.1",
    "method": "GET",
    "host": "test.localhost",
    "uri": "/",
    "headers": {
      "User-Agent": [
        "gAhrefsBotg"
      ],
      "Connection": [
        "close"
      ]
    },
    "tls": {
      "resumed": false,
      "version": 772,
      "cipher_suite": 4865,
      "proto": "",
      "server_name": "test.localhost"
    }
  },
  "bytes_read": 0,
  "user_id": "",
  "duration": 0.000026041,
  "size": 0,
  "status": 0,
  "resp_headers": {
    "Server": [
      "Caddy"
    ],
    "Alt-Svc": [
      "h3=\":443\"; ma=2592000"
    ]
  }
}
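One way to get more detail about what happened to this request (a sketch, not verified for this case) is Caddy’s `debug` global option. It raises the default logger to DEBUG level, so handlers such as `reverse_proxy` emit extra messages that should show whether the request fell through to the proxy instead of being answered by `respond`. These messages go to stderr, which is easiest to watch by running `caddy run` in the foreground instead of `caddy start`. The placeholder comment inside the site block stands in for the existing config.

```caddyfile
{
	# Global options block; it must be the first block in the Caddyfile.
	# `debug` sets the default logger to DEBUG level (written to stderr by default).
	debug
}

test.localhost {
	# ... existing site config unchanged ...
}
```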
3. Caddy version:
v2.8.4 h1:q3pe0wpBj1OcHFZ3n/1nl4V4bxBrYoSoab7rL9BMYNk=
4. How I installed and ran Caddy:
a. System environment:
macOS 14
b. Command:
sudo caddy start
(run from /etc/caddy)
c. Service/unit/compose file:
n/a
d. My complete Caddy config:
test.localhost {
	log {
		output file /var/log/caddy/access.log {
			roll_size 50mb
			roll_keep 20
			roll_keep_for 720h
		}
	}
	@badbots {
		header 'User-Agent' *AhrefsBot*
		header 'User-Agent' *Amazonbot*
		header 'User-Agent' *Bytespider*
	}
	respond @badbots 'Nope' 403
	reverse_proxy localhost:8000
}
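A possible variant to compare against, assuming the single quotes are the culprit: the Caddyfile lexer only treats double quotes and backticks as quoting, so `'User-Agent'` is kept as a token with the apostrophes included and never matches the real `User-Agent` header (and `'Nope'` would be sent with its quotes). This sketch drops the quotes and leaves the `log` block out for brevity. It has not been tested against this exact setup, and the commented `header_regexp` line is an untested alternative.

```caddyfile
test.localhost {
	# Assumption: only "double quotes" and `backticks` quote tokens in a Caddyfile;
	# single quotes would become part of the header name, so none are used here.
	@badbots {
		# Repeated values for the same header field are OR-ed together
		header User-Agent *AhrefsBot*
		header User-Agent *Amazonbot*
		header User-Agent *Bytespider*
	}
	respond @badbots "Nope" 403

	# Untested alternative: replace the @badbots block above with a single
	# case-insensitive regular expression on the same header.
	# @badbots header_regexp User-Agent (?i)(AhrefsBot|Amazonbot|Bytespider)

	reverse_proxy localhost:8000
}
```

Running `caddy adapt --config Caddyfile --pretty` on the original file should show how the matcher was parsed, including whether the quotes ended up in the header name.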
5. Links to relevant resources:
I was inspired by this comment, but the approach didn’t work for me.