1. My Caddy version (`caddy version`):

v2.0.0-beta.17
2. How I run Caddy:
a. System environment:

Ubuntu, running Docker

b. Command:

`docker-compose up -d`
c. Service/unit/compose file:

```yaml
reverse-proxy:
  container_name: reverse-proxy
  image: caddy/caddy:v2.0.0-beta.17
  restart: unless-stopped
  ports:
    - "80:80"
    - "443:443"
  user: root
  volumes:
    - ./Caddyfile.prod:/etc/caddy/Caddyfile
    - caddy-config:/root/.config/caddy
    - caddy-data:/root/.local/share/caddy
```
d. My complete Caddyfile or JSON config:

```
{
    email admin@29th.org
    # acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}

29th.org {
    redir https://www.{host}{uri} permanent
}

www.29th.org {
    reverse_proxy homepage:80
}

personnel.29th.org {
    reverse_proxy app:8080
}

api.29th.org {
    reverse_proxy api:80
}

forums.29th.org {
    reverse_proxy forums:80 {
        header_up X-Forwarded-Proto {http.request.scheme}
    }
}

portainer.29th.org {
    reverse_proxy portainer:9000
}

bitwarden.29th.org {
    encode gzip
    reverse_proxy /notifications/hub/negotiate bitwarden:80
    reverse_proxy /notifications/hub bitwarden:3012
    reverse_proxy bitwarden:80
}
```
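In case it's useful, I can confirm what Caddy actually loaded by dumping the running config over the admin API (just a sketch; this assumes the Alpine-based image ships BusyBox wget and that the admin endpoint is on its default localhost:2019):

```sh
# Hypothetical check: print the JSON config Caddy is currently running,
# i.e. the adapted form of the Caddyfile above.
docker exec reverse-proxy wget -qO- http://localhost:2019/config/
```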
3. The problem I’m having:
I run docker-compose in production. It’s worked fine for months, but twice this week the site has gone down with a “Connection refused” error. Upon investigation, it appears the Caddy container is in a zombie-like state and no longer handling requests. It also appears that one of the containers being reverse-proxied (`app`) is down. Perhaps that container went down first and `reverse-proxy` entered a zombie state after failing to reach it? Just guessing…
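Next time I'll try to confirm that ordering from Docker's event log instead of guessing (a sketch; the timestamps are placeholders for the incident window):

```sh
# Hypothetical: list container lifecycle events around the outage to see
# whether `app` died before `reverse-proxy` stopped serving.
docker events --since "2020-03-21T00:00:00" --until "2020-03-21T02:00:00" \
  --filter type=container \
  --filter event=die \
  --filter event=oom \
  --filter event=kill
```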
4. Error messages and/or full log output:
```
root@dockerprod:/usr/local/src# docker-compose ps
    Name                   Command                 State                            Ports
------------------------------------------------------------------------------------------------------------------
api             docker-php-entrypoint apac ...   Up             80/tcp
app             docker-entrypoint.sh npm r ...   Up             8080/tcp
bitwarden       /bitwarden_rs                    Up (healthy)   3012/tcp, 80/tcp
forums          docker-php-entrypoint apac ...   Up             80/tcp
homepage        nginx -g daemon off;             Up             80/tcp
portainer       /portainer --admin-passwor ...   Up             9000/tcp
reverse-proxy   caddy run --config /etc/ca ...   Up             2019/tcp, 0.0.0.0:443->443/tcp, 0.0.0.0:80->80/tcp
```
(When I began, `app` had a state of `Exit 127`, I believe. That output is now gone from my terminal history since I restarted it.)
```
root@dockerprod:/usr/local/src# docker-compose top reverse-proxy
Traceback (most recent call last):
  File "bin/docker-compose", line 6, in <module>
  File "compose/cli/main.py", line 71, in main
  File "compose/cli/main.py", line 127, in perform_command
  File "compose/cli/main.py", line 941, in top
TypeError: 'NoneType' object is not iterable
[1073] Failed to execute script docker-compose
root@dockerprod:/usr/local/src# docker container top reverse-proxy
UID    PID    PPID    C    STIME    TTY    TIME    CMD
```
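Note that `docker container top` lists no processes at all. Next time I want to check from the host whether the container's main process is still around, and whether it's a true kernel zombie (a sketch):

```sh
# Hypothetical check: find the container's main PID as seen by the host,
# then look at its process state ("Z" in STAT would mean a zombie process).
PID=$(docker inspect --format '{{.State.Pid}}' reverse-proxy)
ps -o pid,stat,cmd -p "$PID"
```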
Here I am attempting to stop it. I’ve tried `docker-compose stop` as well, with the same effect. `kill` also has the same effect.
```
root@dockerprod:/usr/local/src# docker container stop reverse-proxy
reverse-proxy
root@dockerprod:/usr/local/src# docker container ps
CONTAINER ID   IMAGE                        COMMAND                  CREATED      STATUS                PORTS                                                NAMES
5cd5e1fd4025   29th/forums:latest           "docker-php-entrypoi…"   2 days ago   Up 2 days             80/tcp                                               forums
a2f964bb67eb   29th/personnel-api:latest    "docker-php-entrypoi…"   2 days ago   Up 2 days             80/tcp                                               api
c6d1d5103282   portainer/portainer          "/portainer --admin-…"   2 days ago   Up 2 days             9000/tcp                                             portainer
1250c050644b   bitwardenrs/server-mysql     "/bitwarden_rs"          2 days ago   Up 2 days (healthy)   80/tcp, 3012/tcp                                     bitwarden
2f024f1d0daf   caddy/caddy:v2.0.0-beta.17   "caddy run --config …"   2 days ago   Up 2 days             0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp, 2019/tcp   reverse-proxy
49122372dda6   nginx:1.17.7                 "nginx -g 'daemon of…"   2 days ago   Up 2 days             80/tcp                                               homepage
2902a6eb4170   29th/personnel-app:latest    "docker-entrypoint.s…"   2 days ago   Up 33 minutes         8080/tcp                                             app
```
The first time this happened, I tried `docker rm -f reverse-proxy`, which did successfully remove it, but when I then tried to bring it back up with `docker-compose up -d reverse-proxy`, I got an error about port 443 already being allocated (presumably by the zombie container process). I had to reboot the server to fix that.
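If the port stays allocated again, I plan to check what is actually holding 443 before resorting to a reboot (a sketch):

```sh
# Hypothetical: show which host process still owns port 443
# (likely docker-proxy or a leftover containerd-shim if the container is wedged).
ss -ltnp | grep ':443'
```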
```
root@dockerprod:/usr/local/src# docker-compose exec reverse-proxy bash
cannot exec in a stopped state: unknown
root@dockerprod:/usr/local/src# docker container update --restart=no reverse-proxy
Error response from daemon: Cannot update container 2f024f1d0dafa6473b557655a9e3685029bd53deae6f8459413b526738ec7243: cannot update a stopped container: unknown
```
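I haven't looked at the Docker daemon's own logs yet; next time I'll check them around the time of the hang for shim/runtime errors (a sketch, assuming systemd on Ubuntu; the time window is a placeholder):

```sh
# Hypothetical: inspect dockerd logs around the incident.
journalctl -u docker.service --since "2020-03-21 00:00" --until "2020-03-21 02:00"
```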
The most recent logs from the `reverse-proxy` container are from 13 hours ago, right around the time the site went down. But I always see TLS handshake errors like these in production, so nothing here looks unusual to me:
```
reverse-proxy | 2020/03/21 01:13:34 http: TLS handshake error from 157.55.39.23:7879: no certificate available for ''
reverse-proxy | 2020/03/21 01:13:34 http: TLS handshake error from 157.55.39.23:8043: tls: client offered only unsupported versions: [302 301]
reverse-proxy | 2020/03/21 01:13:34 http: TLS handshake error from 157.55.39.23:8100: tls: client offered only unsupported versions: [301]
reverse-proxy | 2020/03/21 01:13:34 http: TLS handshake error from 157.55.39.23:8137: EOF
reverse-proxy | 2020/03/21 01:18:07 http: TLS handshake error from 184.105.247.195:34238: no certificate available for ''
reverse-proxy | 2020/03/21 01:18:46 http: TLS handshake error from 71.175.49.17:49655: EOF
reverse-proxy | 2020/03/21 01:18:46 http: TLS handshake error from 71.175.49.17:49651: EOF
reverse-proxy | 2020/03/21 01:19:42 http: TLS handshake error from 71.232.250.251:54046: EOF
reverse-proxy | 2020/03/21 01:29:55 http: TLS handshake error from 71.175.49.17:49854: EOF
reverse-proxy | 2020/03/21 01:29:55 http: TLS handshake error from 71.175.49.17:49855: EOF
reverse-proxy | 2020/03/21 01:29:55 http: TLS handshake error from 71.175.49.17:49856: EOF
reverse-proxy | 2020/03/21 01:30:20 http: TLS handshake error from 202.107.226.3:23559: no certificate available for 'www.google-analytics.com'
reverse-proxy | 2020/03/21 01:33:58 http: TLS handshake error from 185.94.219.160:52720: EOF
reverse-proxy | 2020/03/21 01:34:08 http: TLS handshake error from 186.251.10.90:43489: EOF
reverse-proxy | 2020/03/21 01:35:44 http: TLS handshake error from 135.23.214.137:50012: EOF
reverse-proxy | 2020/03/21 01:35:44 http: TLS handshake error from 135.23.214.137:50011: EOF
```
5. What I already tried:
I was originally using the alpine
image (before the official image used versioned tags) from a month or so ago. When this issue happened earlier this week I switched to the most recent tagged image, v2.0.0-beta.17
and the issue happened again a couple days later.
To fix it the first time, I force removed the image and then had to reboot the server because port 443 was still allocated. This time, I rebooted the server without force killing the image. docker-compose ps
then showed the reverse-proxy
container in a state of Exit 255
. I ran docker-compose restart reverse-proxy
and the site came back up.
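As a stopgap until the root cause is clear, I'm considering an external liveness probe so I at least get alerted, or auto-recover, when the container goes zombie (a sketch I'd run from cron on the host; untested):

```sh
# Hypothetical probe: restart the reverse proxy if HTTPS stops answering
# (though restart may hang if the container is truly wedged).
curl -sSf --max-time 10 https://www.29th.org/ > /dev/null \
  || docker restart reverse-proxy
```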
6. Links to relevant resources:
I found an issue on moby/moby that sounds similar.