Cross-vhost namespace pollution?

1. Caddy version (caddy version):

[root@webserver caddy]# caddy version 
v2.2.0 h1:sMUFqTbVIRlmA8NkFnNt9l7s0e+0gw+7GPIrhty905A=

2. How I run Caddy:

a. System environment:

I run Caddy in a dedicated CentOS 8 LXC container under the current Proxmox VE. I installed it from the RPM repository and manage it with the included systemd unit files. Sometimes I do a quick “caddy reload” with the same Caddyfile that systemd uses. Other than that, I don’t touch the API, and so far I have ignored the JSON config file that is there somewhere.

b. Command:

[root@webserver caddy]# systemctl start caddy

c. Service/unit/compose file:

[root@webserver system]# pwd
/usr/lib/systemd/system
[root@webserver system]# cat caddy.service
# caddy.service
#
# For using Caddy with a config file.
#
# Make sure the ExecStart and ExecReload commands are correct
# for your installation.
#
# See https://caddyserver.com/docs/install for instructions.
#
# WARNING: This service does not use the --resume flag, so if you
# use the API to make changes, they will be overwritten by the
# Caddyfile next time the service is restarted. If you intend to
# use Caddy's API to configure it, add the --resume flag to the
# `caddy run` command or use the caddy-api.service file instead.

[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target

[Service]
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target
[root@webserver system]# 
...
[root@webserver caddy.service.d]# pwd
/etc/systemd/system/caddy.service.d
[root@webserver caddy.service.d]# cat override.conf
[Unit]
After=remote-fs.target
[root@webserver caddy.service.d]# 

The override is there to ensure that Caddy does not start until the read-only NFS share holding all the content is mounted. (Caddy apparently isn’t the least bit amused when the mount shows up after it is already running.)

d. My complete Caddyfile or JSON config:

[root@webserver caddy]# cat Caddyfile
# The Caddyfile is an easy way to configure your Caddy web server.
#
# Unless the file starts with a global options block, the first
# uncommented line is always the address of your site.
#
# To use your own domain name (with automatic HTTPS), first make
# sure your domain's A/AAAA DNS records are properly pointed to
# this machine's public IP, then replace the line below with your
# domain name.

{

        #default_sni www.fotw.info
        debug

}

www.fotw.info {

        # Set this path to your site's directory.
        root * /web/www.fotw.info/data

        redir / /flags/index.html
        redir /index.html /flags/index.html

        file_server {
                index index.html
        }

        header {
                X-Frame-Options SAMEORIGIN
                X-XSS-Protection 1;mode=block
                X-Content-Type-Options nosniff
                Strict-Transport-Security max-age=31622400
                Referrer-Policy strict-origin-when-cross-origin
                Feature-Policy "geolocation 'none'; midi 'none'; microphone 'none'; camera 'none'; speaker 'none'; vibrate 'none'; payment 'none'"
                Cache-Control "public, max-age=259200"
        }

        log {
                output file /var/log/caddy/www.fotw.info.log
        }

}


download.fotw.info {

        root * /web/www.fotw.info/mirrors

        file_server browse

        header {
                X-Frame-Options SAMEORIGIN
                X-XSS-Protection 1;mode=block
                X-Content-Type-Options nosniff
                Strict-Transport-Security max-age=31622400
                Referrer-Policy strict-origin-when-cross-origin
                Feature-Policy "geolocation 'none'; midi 'none'; microphone 'none'; camera 'none'; speaker 'none'; vibrate 'none'; payment 'none'"
                Cache-Control "public, max-age=259200"
        }

        log {
                output file /var/log/caddy/download.fotw.info.log
        }

}

fotw.info {

        redir https://www.fotw.info{uri} permanent

        header {
                Strict-Transport-Security max-age=31622400
                Cache-Control "public, max-age=259200"
        }

        log {
                output file /var/log/caddy/fotw.info.log
        }

}

cbfa.vexillology.info {

        # Set this path to your site's directory.
        root * /web/cbfa.vexillology.info/data

        file_server {
                index index.html
        }

        header {
                X-Frame-Options SAMEORIGIN
                X-XSS-Protection 1;mode=block
                X-Content-Type-Options nosniff
                Strict-Transport-Security max-age=31622400
                Referrer-Policy strict-origin-when-cross-origin
                Feature-Policy "geolocation 'none'; midi 'none'; microphone 'none'; camera 'none'; speaker 'none'; vibrate 'none'; payment 'none'"
                Cache-Control "public, max-age=259200"
        }

        log {
                output file /var/log/caddy/cbfa.vexillology.info.log
        }

}




# Refer to the Caddy docs for more information:
# https://caddyserver.com/docs/caddyfile
[root@webserver caddy]# 

3. The problem I’m having:

I’m running multiple virtual hosts that share a single IPv4 address and, mostly, each have their own IPv6 address. I’ve been migrating sites one at a time from a venerable Apache install, as every indication is that Caddy will be less irritating for a bunch of relatively low-volume, completely static websites.

There is one disturbing behavior in the logs that is new since the migration to Caddy. A number of bots are sending their usual streams of GETs, but URIs that exist on www.fotw.info are showing up in the logs for download.fotw.info and cbfa.vexillology.info. They get 404 responses, because those paths do not exist on the latter two sites and, to the best of my knowledge, never have. For example:

{"level":"error","ts":1604183986.0887687,"logger":"http.log.access.log3","msg":"handled request","request":{"remote_addr":"114.119.139.198:30866","proto":"HTTP/1.1","method":"GET","host":"cbfa.vexillology.info","uri":"/flags/co-bolci.html","headers":{"User-Agent":["Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://aspiegel.com/petalbot)"],"Accept-Language":["en,zh;q=0.1"],"Accept-Encoding":["gzip,deflate"],"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"],"Connection":["Close"]},"tls":{"resumed":false,"version":771,"cipher_suite":49196,"proto":"","proto_mutual":true,"server_name":"cbfa.vexillology.info"}},"common_log":"114.119.139.198 - - [31/Oct/2020:22:39:46 +0000] \"GET /flags/co-bolci.html HTTP/1.1\" 404 0","duration":0.000087677,"size":0,"status":404,"resp_headers":{"Referrer-Policy":["strict-origin-when-cross-origin"],"Strict-Transport-Security":["max-age=31622400"],"X-Content-Type-Options":["nosniff"],"X-Frame-Options":["SAMEORIGIN"],"X-Xss-Protection":["1;mode=block"],"Cache-Control":["public, max-age=259200"],"Server":["Caddy"],"Feature-Policy":["geolocation 'none'; midi 'none'; microphone 'none'; camera 'none'; speaker 'none'; vibrate 'none'; payment 'none'"]}}

This entry shows a URI, /flags/co-bolci.html, that definitely exists on www.fotw.info but not on cbfa.vexillology.info.
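A rough way to measure how widespread this is: take the 404 entries from one site’s JSON access log and check which of those paths actually exist under the other site’s document root. The sketch below assumes the log format shown above (one JSON object per line with `status`, `request.host` and `request.uri`) and the docroots from the Caddyfile; the function name and the command-line usage are made up for illustration.

```python
import json
import sys
from pathlib import Path
from urllib.parse import unquote, urlsplit


def cross_site_404s(log_lines, other_root):
    """Return (host, uri) pairs for 404s whose path exists under other_root.

    log_lines: iterable of Caddy JSON access-log lines (one object per line).
    other_root: docroot of the *other* site, e.g. /web/www.fotw.info/data.
    """
    root = Path(other_root).resolve()
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("status") != 404:
            continue
        req = entry.get("request", {})
        uri = req.get("uri", "")
        # Drop any query string, percent-decode, and make the path relative.
        rel = unquote(urlsplit(uri).path).lstrip("/")
        candidate = (root / rel).resolve()
        # Skip anything that would escape the docroot (e.g. "/../etc/passwd").
        if root not in candidate.parents:
            continue
        if candidate.is_file():
            hits.append((req.get("host"), uri))
    return hits


# Hypothetical CLI: python cross404.py <access log> <other docroot>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as f:
        for host, uri in cross_site_404s(f, sys.argv[2]):
            print(host, uri)
```

For example, `python cross404.py /var/log/caddy/cbfa.vexillology.info.log /web/www.fotw.info/data` would list the cbfa.vexillology.info 404s that correspond to real www.fotw.info files.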

I see no evidence of this in the logs of the previous Apache server, so I don’t think the bots are all simply re-spidering bad information.

I can’t replicate this from a browser or from a monitoring system that sends GETs and validates some text on the page.

This leads me to the following questions:

  • Am I botching something in the Caddyfile?
  • Are a variety of bots known to do something odd with SNI and the Host: header that I could work around?
  • Might this be a bug?
  • Suggestions on how to look at the actual request, given that they’re all encrypted on the wire?

Many thanks.

4. Error messages and/or full log output:

5. What I already tried:

One thing I was curious about was whether HTTP connections were being kept open and then abused in some fashion. However, searching all the logs for the remote_addr shows only one request ever received from one of these IP/port combinations:

[root@webserver caddy]# grep 114.119.139.198:30866 *log
cbfa.vexillology.info.log:{"level":"error","ts":1604183986.0887687,"logger":"http.log.access.log3","msg":"handled request","request":{"remote_addr":"114.119.139.198:30866","proto":"HTTP/1.1","method":"GET","host":"cbfa.vexillology.info","uri":"/flags/co-bolci.html","headers":{"User-Agent":["Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://aspiegel.com/petalbot)"],"Accept-Language":["en,zh;q=0.1"],"Accept-Encoding":["gzip,deflate"],"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"],"Connection":["Close"]},"tls":{"resumed":false,"version":771,"cipher_suite":49196,"proto":"","proto_mutual":true,"server_name":"cbfa.vexillology.info"}},"common_log":"114.119.139.198 - - [31/Oct/2020:22:39:46 +0000] \"GET /flags/co-bolci.html HTTP/1.1\" 404 0","duration":0.000087677,"size":0,"status":404,"resp_headers":{"Referrer-Policy":["strict-origin-when-cross-origin"],"Strict-Transport-Security":["max-age=31622400"],"X-Content-Type-Options":["nosniff"],"X-Frame-Options":["SAMEORIGIN"],"X-Xss-Protection":["1;mode=block"],"Cache-Control":["public, max-age=259200"],"Server":["Caddy"],"Feature-Policy":["geolocation 'none'; midi 'none'; microphone 'none'; camera 'none'; speaker 'none'; vibrate 'none'; payment 'none'"]}}
[root@webserver caddy]# 
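The same check can be widened from one ip:port to every connection: group all access-log entries by `remote_addr` and report any pair that carried requests for more than one Host. This is a hypothetical helper that assumes the JSON field names shown in the log lines above; client ports get reused over time, so a hit is a lead to investigate rather than proof of connection reuse.

```python
import json
from collections import defaultdict


def multi_host_connections(log_lines):
    """Map client "ip:port" -> set of Host values, keeping only multi-host pairs.

    An ip:port pair roughly identifies one TCP connection (until the client
    reuses the port), so several Hosts on one pair would hint that a kept-alive
    connection was used for more than one site.
    """
    hosts = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        req = json.loads(line).get("request", {})
        hosts[req.get("remote_addr")].add(req.get("host"))
    return {addr: h for addr, h in hosts.items() if len(h) > 1}
```

It can be fed every log at once, for instance with `fileinput.input(glob.glob("/var/log/caddy/*.log"))`; an empty result would support the observation that each connection only ever served one site.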

6. Links to relevant resources:

Looks like you’re getting hit by PetalBot, the crawler run by Aspiegel, a Huawei subsidiary. Interesting.

I don’t see anything wrong with what you’re doing; it must just be misbehaving crawlers. You should be able to configure a robots.txt to tell them to kindly f-off 🙂 See their docs: Aspiegel

It is common, though, for bots to hit your IP without SNI because they’re just probing all over the place. I see you tried playing with default_sni, which is one way to set a default when that happens, but frankly you can probably just ignore connections without SNI, because they typically come from very old devices/clients or from bots.
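Whether SNI is missing or disagrees with the Host header can be checked directly from the access logs, since Caddy records both `request.host` and `request.tls.server_name` (as the entries in this thread show). A minimal sketch with a hypothetical function name; it only compares names and ignores plain-HTTP entries that have no `tls` object.

```python
import json
from urllib.parse import urlsplit


def sni_host_mismatches(log_lines):
    """Yield (remote_addr, sni, host, uri) where TLS SNI and Host disagree.

    An empty SNI means the client sent none during the TLS handshake.
    """
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        req = json.loads(line).get("request", {})
        tls = req.get("tls")
        if tls is None:  # plain-HTTP request: no SNI to compare against
            continue
        sni = (tls.get("server_name") or "").lower()
        # Host may carry a port ("name:443"); urlsplit strips it and lowercases.
        host = urlsplit("//" + req.get("host", "")).hostname or ""
        if sni != host:
            yield req.get("remote_addr"), sni, host, req.get("uri")
```

If the bot traffic in question never shows up here, the misdirected requests are arriving with a matching SNI and Host, which points away from an SNI-handling problem on the server side.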

I may indeed resort to robots.txt across all the sites for the most confused bots, but there are some, such as Yandex and Bing, that I’d just as soon be indexed by. What confuses me most is how widespread this appears to be; if it were just one wacky bot, I’d write it off as an odd failure to follow standards and/or common norms.
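For the selective approach, a robots.txt that shuts out PetalBot while leaving Yandex and Bing alone can be sanity-checked offline with Python’s `urllib.robotparser`. The rules below are an illustrative assumption, not something verified against the live bots; robots.txt is fetched per host, so each site would need its own copy, and it only helps with crawlers that honor it. Python’s parser matches on the product token (e.g. `PetalBot`), so that, rather than the full User-Agent string, is what gets passed in.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out PetalBot only, and leave every other
# crawler (YandexBot, bingbot, ...) unrestricted via the catch-all entry.
ROBOTS_TXT = """\
User-agent: PetalBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://cbfa.vexillology.info/flags/co-bolci.html"
    for bot in ("PetalBot", "YandexBot", "bingbot"):
        print(bot, rp.can_fetch(bot, url))
```

Run as-is, this prints `PetalBot False`, `YandexBot True` and `bingbot True`.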

But there are an awful lot of failures involving Yandex too, such as this one from the log for cbfa.vexillology.info:

{"level":"error","ts":1604188534.8034434,"logger":"http.log.access.log3","msg":"handled request","request":{"remote_addr":"77.88.5.17:54826","proto":"HTTP/1.1","method":"GET","host":"cbfa.vexillology.info","uri":"/flags/co-calro.html","headers":{"Accept":["*/*"],"Connection":["Keep-Alive"],"User-Agent":["Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"],"From":["support@search.yandex.ru"],"Accept-Encoding":["gzip,deflate"]},"tls":{"resumed":false,"version":772,"cipher_suite":4867,"proto":"","proto_mutual":true,"server_name":"cbfa.vexillology.info"}},"common_log":"77.88.5.17 - - [31/Oct/2020:23:55:34 +0000] \"GET /flags/co-calro.html HTTP/1.1\" 404 0","duration":0.0000998,"size":0,"status":404,"resp_headers":{"X-Frame-Options":["SAMEORIGIN"],"X-Xss-Protection":["1;mode=block"],"Server":["Caddy"],"Cache-Control":["public, max-age=259200"],"Feature-Policy":["geolocation 'none'; midi 'none'; microphone 'none'; camera 'none'; speaker 'none'; vibrate 'none'; payment 'none'"],"Referrer-Policy":["strict-origin-when-cross-origin"],"Strict-Transport-Security":["max-age=31622400"],"X-Content-Type-Options":["nosniff"]}}

at the same time as other requests from the same IP address work fine, as seen in the log for www.fotw.info:

www.fotw.info.log:{"level":"info","ts":1604188744.8055608,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"77.88.5.17:48576","proto":"HTTP/1.1","method":"GET","host":"www.fotw.info","uri":"/flags/ao.html","headers":{"Connection":["Keep-Alive"],"User-Agent":["Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"],"From":["support@search.yandex.ru"],"Accept-Encoding":["gzip,deflate"],"Accept":["*/*"]},"tls":{"resumed":false,"version":772,"cipher_suite":4867,"proto":"","proto_mutual":true,"server_name":"www.fotw.info"}},"common_log":"77.88.5.17 - - [31/Oct/2020:23:59:04 +0000] \"GET /flags/ao.html HTTP/1.1\" 200 16666","duration":0.00124108,"size":16666,"status":200,"resp_headers":{"Server":["Caddy"],"X-Xss-Protection":["1;mode=block"],"Feature-Policy":["geolocation 'none'; midi 'none'; microphone 'none'; camera 'none'; speaker 'none'; vibrate 'none'; payment 'none'"],"Etag":["\"qe3h2qcuy\""],"X-Content-Type-Options":["nosniff"],"Content-Type":["text/html; charset=utf-8"],"Last-Modified":["Sun, 26 Jul 2020 21:01:38 GMT"],"Accept-Ranges":["bytes"],"X-Frame-Options":["SAMEORIGIN"],"Cache-Control":["public, max-age=259200"],"Referrer-Policy":["strict-origin-when-cross-origin"],"Strict-Transport-Security":["max-age=31622400"],"Content-Length":["16666"]}}
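To see how a single client’s requests interleave across the vhosts, the entries for one IP can be pulled from all the logs and put on one UTC timeline. The `ts` field is a Unix timestamp; for example, 1604188534.8 is 31/Oct/2020:23:55:34 UTC, matching the `common_log` field above. A hypothetical helper, assuming the JSON field names shown in these entries:

```python
import json
from datetime import datetime, timezone


def timeline_for_ip(log_lines, ip):
    """Return (utc_time, host, uri, status) rows for one client IP, oldest first."""
    rows = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        req = entry.get("request", {})
        # remote_addr is "ip:port" (IPv6 as "[addr]:port"); keep just the IP.
        addr = req.get("remote_addr", "").rsplit(":", 1)[0].strip("[]")
        if addr != ip:
            continue
        when = datetime.fromtimestamp(entry["ts"], tz=timezone.utc)
        rows.append((when, req.get("host"), req.get("uri"), entry.get("status")))
    rows.sort(key=lambda r: r[0])
    return rows
```

Reading the result for 77.88.5.17 side by side would show whether the cbfa.vexillology.info 404s and the www.fotw.info 200s arrive in separate bursts or interleaved on different ports.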

Many thanks for the confirmation that my config file looks “normal”.

Two more data points that don’t leave me any closer to enlightenment:

  • I downloaded a newer Caddy executable (caddy_linux_amd64) from the download page and dropped it into /usr/bin on the server in place of the file installed by the RPM package. This bumped me up from 2.2.0 to 2.2.1.
[root@webserver caddy]# caddy version
v2.2.1 h1:Q62GWHMtztnvyRU+KPOpw6fNfeCD3SkwH7SfT1Tgt2c=

The problem persists.

  • I signed up for an account with Dr. Link Check, https://www.drlinkcheck.com/, to see what a spidering bot sees on its side. The log for cbfa.vexillology.info shows a constant stream of www.fotw.info URIs with 404 responses, yet Dr. Link Check does not report receiving any such 404 responses.
