Core Issue
I’ve only had this issue on one of my development servers, but until I understand it, it’s making me paranoid about any production server running Caddy.
I’m using systemd to run Caddy as a service. At random, the Caddy service simply stops. Sometimes it runs for a week or two before going down, and sometimes it goes down within a day of being restarted.
The service logs are never helpful. Here’s the output from “service caddy status”:
```
● caddy.service - Caddy HTTP/2 web server
   Loaded: loaded (/etc/systemd/system/caddy.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2017-04-24 03:36:13 EDT; 5h 47min ago
     Docs: Welcome — Caddy Documentation
  Process: 17870 ExecStart=/usr/local/bin/caddy -log stdout -agree=true -conf=/etc/caddy/Caddyfile -root=/var/tmp (code=killed, signal=PIPE)
 Main PID: 17870 (code=killed, signal=PIPE)

Apr 24 00:36:13 dev-server caddy[17870]: 2017/04/24 00:36:13 [INFO] Done checking OCSP staples
Apr 24 00:44:59 dev-server caddy[17870]: 2017/04/24 00:44:59 http: TLS handshake error from 71.6.202.196:60000: EOF
Apr 24 00:49:26 dev-server caddy[17870]: 2017/04/24 00:49:26 [INFO] www.pj85896.com - No such site at :80 (Remote: 23.247.72.83, Referer: )
Apr 24 01:36:13 dev-server caddy[17870]: 2017/04/24 01:36:13 [INFO] Scanning for stale OCSP staples
Apr 24 01:36:13 dev-server caddy[17870]: 2017/04/24 01:36:13 [INFO] Done checking OCSP staples
Apr 24 01:55:41 dev-server caddy[17870]: 2017/04/24 01:55:41 [INFO] - No such site at :80 (Remote: 222.144.186.189, Referer: )
Apr 24 02:36:13 dev-server caddy[17870]: 2017/04/24 02:36:13 [INFO] Scanning for expiring certificates
Apr 24 02:36:13 dev-server caddy[17870]: 2017/04/24 02:36:13 [INFO] Done checking certificates
Apr 24 02:36:13 dev-server caddy[17870]: 2017/04/24 02:36:13 [INFO] Scanning for stale OCSP staples
Apr 24 02:36:13 dev-server caddy[17870]: 2017/04/24 02:36:13 [INFO] Done checking OCSP staples
```
Note that the “www.pj85896.com” domain is not one I own, nor is it hosted on this server. I see many similar requests throughout the day for websites that don’t exist on my server. I assume it’s either a script fuzzing the web server, or this IP address once belonged to those domains and some automated task still hasn’t been updated.
The only clue I have is that it always crashes one hour after the last “Done checking OCSP staples” entry, which suggests it crashes during the next “Scanning for stale OCSP staples” step.
Additional Info
Here is my Caddyfile:
```
import vhosts/*
```
My vhosts/ directory contains six files for different development subdomains, such as “client1.clients.example.com” and “client2.clients.example.com”.
Each of these files (e.g. vhosts/client1.clients.example.com) has the same layout:
```
client1.clients.example.com {
    header / Strict-Transport-Security "max-age=1814400; includeSubDomains; preload"
    header /css/ Cache-Control "max-age=2592000"
    header /js/ Cache-Control "max-age=2592000"
    header /img/ Cache-Control "max-age=2592000"
    root /mnt/dev-code/client1/public
    fastcgi / /var/run/php/php7.0-fpm.sock php
    log /var/log/caddy/access/client1.log
    log /var/log/caddy/errors/client2.log
    gzip
    rewrite / {
        to {uri} {uri}/ /index.php?{query}
    }
}
```
The access and error logs give no hints around the time of the crash, although, amusingly, they are flooded with requests for /wp-admin.php and similar paths.
Running “journalctl -u caddy” doesn’t give any useful information either: the last few lines are always identical to the last few lines of “service caddy status”, which itself didn’t tell me much.
My dev server has 4 GB of RAM, and even when I’m actively using the web server and downloading multiple files, memory usage almost never exceeds 500 MB, most of which is used by MySQL.