Blocking certain user agent headers

1. The problem I’m having:

I’m trying to block a few aggressive crawlers with the config below. When I test it locally, however, it doesn’t work: the request just returns 200.

➜ curl -vH 'User-Agent: gAhrefsBotg' https://test.localhost/
*   Trying [::1]:443...
* Connected to test.localhost (::1) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: [NONE]
*  start date: Jul 20 12:32:21 2024 GMT
*  expire date: Jul 21 00:32:21 2024 GMT
*  subjectAltName: host "test.localhost" matched cert's "test.localhost"
*  issuer: CN=Caddy ECC Intermediate
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://test.localhost/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: test.localhost]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [user-agent: AhrefsBot]
> GET / HTTP/2
> Host: test.localhost
> Accept: */*
> User-Agent: AhrefsBot
> 
< HTTP/2 200 
< alt-svc: h3=":443"; ma=2592000
< server: Caddy
< content-length: 0
< date: Sat, 20 Jul 2024 14:40:01 GMT
< 
* Connection #0 to host test.localhost left intact

2. Error messages and/or full log output:

Access log entry produced by the cURL request:

{
  "level": "info",
  "ts": 1721486672.1151009,
  "logger": "http.log.access.log0",
  "msg": "NOP",
  "request": {
    "remote_ip": "::1",
    "remote_port": "54028",
    "client_ip": "::1",
    "proto": "HTTP/1.1",
    "method": "GET",
    "host": "test.localhost",
    "uri": "/",
    "headers": {
      "User-Agent": [
        "gAhrefsBotg"
      ],
      "Connection": [
        "close"
      ]
    },
    "tls": {
      "resumed": false,
      "version": 772,
      "cipher_suite": 4865,
      "proto": "",
      "server_name": "test.localhost"
    }
  },
  "bytes_read": 0,
  "user_id": "",
  "duration": 0.000026041,
  "size": 0,
  "status": 0,
  "resp_headers": {
    "Server": [
      "Caddy"
    ],
    "Alt-Svc": [
      "h3=\":443\"; ma=2592000"
    ]
  }
}

3. Caddy version:

v2.8.4 h1:q3pe0wpBj1OcHFZ3n/1nl4V4bxBrYoSoab7rL9BMYNk=

4. How I installed and ran Caddy:

a. System environment:

macOS 14

b. Command:

sudo caddy start, run from /etc/caddy

c. Service/unit/compose file:

n/a

d. My complete Caddy config:

test.localhost {
        log {
                output file /var/log/caddy/access.log {
                        roll_size 50mb
                        roll_keep 20
                        roll_keep_for 720h
                }
        }

        @badbots {
                header 'User-Agent' *AhrefsBot*
                header 'User-Agent' *Amazonbot*
                header 'User-Agent' *Bytespider*
        }
        respond @badbots 'Nope' 403

        reverse_proxy localhost:8000
}

5. Links to relevant resources:

I was inspired by this comment, but it didn’t quite work out.

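Single quotes (') are not string delimiters in the Caddyfile; tokens are quoted with double quotes or backticks. See the Caddyfile Concepts page of the Caddy documentation. Because of this, your matcher looks for a header literally named 'User-Agent' (quote characters included), which no request sends, so it never matches. In your case you don’t need quotes at all; just write header User-Agent *AhrefsBot*, for example.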
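For example, your site block would become something like the following. This is a sketch, not tested here: it keeps the site address and backend from your config, leaves out the log block for brevity, and also drops the single quotes around Nope, which would otherwise be sent as part of the response body.

```caddyfile
test.localhost {
	# No quotes: the field name is plain User-Agent.
	# A * at both ends of the value makes it a substring match,
	# and repeated header lines for the same field are OR'ed.
	@badbots {
		header User-Agent *AhrefsBot*
		header User-Agent *Amazonbot*
		header User-Agent *Bytespider*
	}
	# respond is ordered before reverse_proxy by default,
	# so matched requests never reach the backend.
	respond @badbots "Nope" 403

	reverse_proxy localhost:8000
}
```

With this in place, your curl test with an AhrefsBot user agent should get a 403 instead of a 200.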

You could also use abort instead of respond: it drops the connection without writing a response.
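Roughly, that means swapping the respond line for abort. A sketch, assuming the same unquoted matcher and the same site address and backend as in your config (log block omitted):

```caddyfile
test.localhost {
	@badbots {
		header User-Agent *AhrefsBot*
		header User-Agent *Amazonbot*
		header User-Agent *Bytespider*
	}
	# Close the connection without writing any HTTP response.
	abort @badbots

	reverse_proxy localhost:8000
}
```

A blocked crawler then receives no HTTP status at all, not even a 403.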

Oops, thanks! Will try it without quotes.
