1. Caddy version:
v2.6.3
2. How I installed and run Caddy:
Installed via apt using the commands from the official docs.
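For reference, the commands were roughly the ones on the install page (keyring, repo, then the package); I'm listing them from memory, so the exact URLs may differ slightly:

$ sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
$ curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
$ curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
$ sudo apt update
$ sudo apt install caddy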
a. System environment:
EC2 instance running Ubuntu 22.04
b. Command:
systemctl reload caddy
c. Service/unit/compose file:
Using the stock systemd service unit file that ships with the package
d. My complete Caddy config:
psymetricstest.com {
    @notStatic {
        not {
            path /staticfiles/*
        }
    }

    handle_path /staticfiles/* {
        root * /opt/app_repo/static/
        file_server
    }

    reverse_proxy @notStatic unix//run/gunicorn.sock {
        header_up Host {host}
    }

    log {
        output file /opt/app_repo/caddy.access.log {
            roll_size 1gb
            roll_keep 5
            roll_keep_for 720h
        }
    }
}
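As far as I can tell the config itself is fine; at least validating it locally doesn't complain:

$ caddy validate --config /etc/caddy/Caddyfile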
3. The problem I’m having:
I have an EC2 instance running a Django app via gunicorn, with Caddy sitting in front of it. The domain is hosted in Route53 with an A record pointing to the instance's IP address.
For completeness, here are my gunicorn unit files as well:
# gunicorn.service
[Unit]
Description=gunicorn daemon
Requires=gunicorn.socket
After=network.target
[Service]
User=root
Group=root
WorkingDirectory=/opt/app_repo
Restart=always
ExecStart=/opt/app_repo/venv/bin/gunicorn \
--access-logfile /opt/app_repo/gunicorn.access.log \
--error-logfile /opt/app_repo/gunicorn.error.log \
--timeout 600 \
--workers 5 \
--bind unix:/run/gunicorn.sock \
--log-level DEBUG \
--capture-output \
app_repo.wsgi:application
[Install]
WantedBy=multi-user.target
# gunicorn.socket
[Unit]
Description=gunicorn socket
[Socket]
ListenStream=/run/gunicorn.sock
[Install]
WantedBy=sockets.target
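(One sanity check I can do from the box is to hit the app directly over this socket, bypassing Caddy entirely, e.g.:

$ curl --unix-socket /run/gunicorn.sock http://localhost/

which should confirm whether the socket-activation side is working on its own.)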
The problem is that the site is reported as unreachable by our monitoring tool (and confirmed by some clients as well) for 5-10 minutes every day, with no apparent pattern to the timing. Whenever I SSH back onto the server, the gunicorn and caddy services are both up and running (checked via systemctl status). Checking journalctl doesn't yield any helpful details either:
4. Error messages and/or full log output:
$ journalctl -u gunicorn --boot
Feb 14 18:27:50 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Stopping gunicorn daemon...
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: gunicorn.service: Deactivated successfully.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Stopped gunicorn daemon.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: gunicorn.service: Consumed 1h 15min 13.075s CPU time.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
Feb 15 13:16:52 ip-172-31-3-73 systemd[1]: Stopping gunicorn daemon...
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: gunicorn.service: Deactivated successfully.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: Stopped gunicorn daemon.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: gunicorn.service: Consumed 39.035s CPU time.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
$ # grepped around the time of the most recent outage
$ journalctl -u caddy --boot | grep "Feb 16" | grep "error"
Feb 16 03:10:09 ip-172-31-3-73 caddy[5328]: {"level":"error","ts":1676517009.8251915,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","error":"http2: stream closed"}
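For the next outage I plan to pull both units' logs for just that window rather than grepping, e.g. for the Feb 16 one it would be something like:

$ journalctl -u caddy -u gunicorn --since "2023-02-16 03:00" --until "2023-02-16 03:20"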
grep-ing dmesg for gunicorn and caddy doesn't turn up anything either, as far as I can tell:
$ dmesg | grep caddy
$ dmesg | grep gunicorn
[ 2.972213] systemd[1]: Configuration file /etc/systemd/system/gunicorn.socket is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
[ 2.984758] systemd[1]: Configuration file /etc/systemd/system/gunicorn.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
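(The world-writable warning looks unrelated to the outages, but I'll tighten up the permissions regardless, along the lines of:

$ sudo chmod 644 /etc/systemd/system/gunicorn.service /etc/systemd/system/gunicorn.socket
$ sudo systemctl daemon-reload
)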
5. What I already tried:
Aside from looking at the logs in step #4, I've also been watching htop to see if there's any clue I can find, but unfortunately I've never had it open while an outage is actually happening because the timeframe varies from day to day.
I know Caddy is only one of the moving parts in my setup and this could very well be a non-Caddy issue, but at this point I don't know how to confirm that either way. Any help is very much appreciated!
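Since the window varies, my next step is to leave a small probe running on the box so the next outage at least tells me which layer stops responding: Caddy (the public URL) or the app behind the socket. A rough sketch of what I have in mind (the domain and socket path are from my setup above; the probe.log path is just a placeholder):

#!/usr/bin/env bash
# Probe both layers once a minute: the public site (through Caddy) and the
# Django app directly via the gunicorn socket, bypassing Caddy.
# Note: curl reports 000 as the http_code when it can't connect at all.
while true; do
    ts=$(date --iso-8601=seconds)
    caddy_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://psymetricstest.com/)
    app_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
        --unix-socket /run/gunicorn.sock http://localhost/)
    echo "$ts caddy=$caddy_code app=$app_code" >> /opt/app_repo/probe.log
    sleep 60
done

My thinking is: if the app still answers over the socket while the public URL times out, that points at Caddy (or TLS/networking in front of it); if both stop answering, it's probably gunicorn or the instance itself. Does that sound like a reasonable way to narrow it down, or is there something better built into Caddy for this?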