Route malicious bot requests to separate log (or suppress entirely)

High-level goals

  1. Thwart automated bots
  2. Maintain clean access logs by keeping malicious/irrelevant requests separate

What I have tried

Caddyfile contents:

site.example.com {
    root /web/site.example.com
    internal /forbidden
    gzip
    log /log/site-access.log
    errors {
        log /log/site-error.log
        404 not-found.html
    }
    rewrite {
        r ^/(?:blog/)?(?:wp|wordpress|xmlrpc\.php|wp-admin|wp-login\.php)
        to /forbidden
    }
}

Desired result is to choose, on a per-site basis, whether to:

  1. log these malicious requests to a separate log file (e.g., for further analysis or action via Fail2Ban)
  2. suppress these requests from appearing in any log files

Observed result

Nothing appears in site-error.log. Malicious requests appear in site-access.log as:

82.74.17.2 - [05/Jan/2017:08:00:24 -0800] "GET /forbidden HTTP/1.1" 404 14
5.37.159.63 - [05/Jan/2017:08:03:32 -0800] "GET /forbidden HTTP/1.1" 404 14
190.92.106.171 - [05/Jan/2017:08:05:21 -0800] "GET /forbidden HTTP/1.1" 404 14
111.91.110.224 - [05/Jan/2017:08:08:10 -0800] "GET /forbidden HTTP/1.1" 404 14
95.90.204.57 - [05/Jan/2017:08:08:12 -0800] "GET /forbidden HTTP/1.1" 404 14

Anyone have any suggestions that would get me closer to either (preferably both) of the desired result options I mentioned above?

Other notes

Equivalent configuration (at least for wholesale suppression option) in Nginx:

location ~* ^/(?:blog/)?(?:wp|wordpress|xmlrpc\.php|wp-admin|wp-login\.php) {
            access_log off;
            log_not_found off;
            return 404;
}

Temporary workaround for manual inspection of “legitimate” requests with 404 responses:

tail -n 100 /log/access.log | grep -v forbidden | grep 404

Relevant links

The logging middleware could probably be extended to customize what kinds of requests are logged. It’s just not implemented now.

Thanks for the reply, Matt. I think that would be helpful. For what it’s worth, I also tried adding the following, which doesn’t seem to have any effect (everything is still logged to access.log):

log /forbidden /log/forbidden.log

As a matter of fact, I can’t seem to get anything logged to /log/error.log either. Any idea why all 404 errors are being logged in access.log instead of the specified error.log?

4xx responses indicate client errors. The server received and responded to the request as expected (although with 404, for example, it obviously did not find what the client requested, so it appears the client made a mistaken request).

5xx responses indicate server errors and the actual generated errors should be logged in the error logs.

edit: I’m incorrect and >=400 should be logged as indicated (according to the errors docs).

Does /log/site-error.log exist and is it writable by Caddy?

It does indeed exist and should have appropriate permissions:

drwxrwxr-x 4.0K Jan  5 19:45 ./
drwx------ 4.0K Jan  5 19:40 ../
-rw-rw-r-- 4.4K Jan  5 19:51 site-access.log
-rw-rw-r--    0 Jan  5 19:45 site-error.log
-rw-rw-r--    0 Jan  5 19:45 site-forbidden.log

I’m rather perplexed by this behavior. Any suggestions?

And the ownership of site-error.log is identical to site-access.log?

I don’t think it should be a cause of issue, but what about ownership of ./?

Maybe try with a simplified Caddyfile - put up a test site at localhost with errors /log/site-error-test.log, make some bad requests to it, and see what it puts out.

I tried this out myself… Caddy version 0.9.4 on Ubuntu 14.04

whitestrake at endymion in ~
→ printf ":2015\nerrors error.log\nlog access.log" | caddy -conf stdin &
[1] 3033 3034
Activating privacy features... done.
http://:2015

whitestrake at endymion in ~
→ l
total 4.0K
drwxrwxr-x 3 whitestrake whitestrake 4.0K Jul 25 00:56 ~
-rw-r--r-- 1 whitestrake whitestrake    0 Jan  6 04:10 access.log
-rw-r--r-- 1 whitestrake whitestrake    0 Jan  6 04:10 error.log

whitestrake at endymion in ~
→ curl localhost:2015/badrequest
404 Not Found

whitestrake at endymion in ~
→ l
total 8.0K
drwxrwxr-x 3 whitestrake whitestrake 4.0K Jul 25 00:56 ~
-rw-r--r-- 1 whitestrake whitestrake   70 Jan  6 04:12 access.log
-rw-r--r-- 1 whitestrake whitestrake    0 Jan  6 04:12 error.log

Seems like Caddy is not logging 404s to error.log. However it did log a 502:

whitestrake at endymion in ~
→ printf ":2015\nerrors error.log\nlog access.log\nproxy / badproxy:9999" | caddy -conf stdin &
[1] 3204 3205
Activating privacy features... done.
http://:2015

whitestrake at endymion in ~
→ curl localhost:2015/stuff
502 Bad Gateway

whitestrake at endymion in ~
→ l
total 12K
drwxrwxr-x 3 whitestrake whitestrake 4.0K Jul 25 00:56 ~
-rw-r--r-- 1 whitestrake whitestrake  140 Jan  6 04:20 access.log
-rw-r--r-- 1 whitestrake whitestrake  103 Jan  6 04:20 error.log

whitestrake at endymion in ~
→ cat error.log
06/Jan/2017:04:20:02 +0000 [ERROR 502 /stuff] dial tcp: lookup badproxy on 172.31.0.2:53: no such host

The error log won’t log 404s because those are bad requests, not errors in the server. If you look in the access log, the 404 will (should!) be there.

That’s exactly what I thought further up this thread, but the docs specify >=400. Shall we update the docs to specify >=500 instead?

Incidentally, the access log is functioning as expected and includes the 4xx responses.

Oops. I can see why that’s confusing.

It’s actually more complicated than just status codes. Each HTTP handler (middleware) returns two values: a status code integer and an error value: httpserver package - github.com/mholt/caddy/caddyhttp/httpserver - pkg.go.dev - if an error value is returned the errors middleware logs it. It doesn’t actually care what the status code is. Generally, middleware should NOT return an error value unless it is a 5xx error, but I suppose in some cases there are 4xx errors where a server operator might want an error message in the log (other than just “Not found” or something like that – something more useful that doesn’t get shown publicly).

The log middleware (access log) just logs requests, but it actually shouldn’t care what the response code is. It should just log it anyway. Does the access log not have those in there?

1 Like

Don’t know about @justin, but I’m not seeing any issues with the access logs at all, it’s working fine. The errors log just doesn’t seem to mesh up with how it’s described in the errors directive documentation on caddyserver.com.

I’ll remember to update and clarify those docs with the new site then.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.