Hello! I had the same problem as someone else trying to use Caddy behind a (Hetzner) load balancer, and wanted to share my solution.
The problem is that TCP load balancers can take a few seconds to spot that a service has been (re)started, and won’t forward traffic straight away - in my case it seemed to take 30-60s.
So if you start Caddy behind such a load balancer, with the LB forwarding ports 80 & 443, it might be that Caddy immediately starts trying to provision a certificate, and fails because the Let’s Encrypt servers can’t call back. If you’re unlucky, Caddy fails 3 times, trips the rate limit and you can’t try again for an hour.
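As an aside, while testing a setup like this it can help to point Caddy at Let's Encrypt's staging endpoint, so failed attempts don't count against the production rate limits. In Caddy v2 this is the `acme_ca` global option in the Caddyfile (the URL below is Let's Encrypt's published staging directory):

```
{
	# use the staging CA while debugging; remove for real certificates
	acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}
```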
As part of my app deployment, I was already building a Caddy image based on library/caddy, so I made a couple of changes to make the startup reliable.
First, I put this script into the new image to gate the startup:
```sh
#!/bin/sh
# start-caddy-after-port-80-connect.sh
#
# If CADDY_TEST_URL is set, answer one health check on port 80 and wait
# until the external URL becomes reachable, then hand the port to Caddy.
if [ -n "$CADDY_TEST_URL" ] ; then
  echo "Listening on port 80 and testing $CADDY_TEST_URL"
  ( printf 'HTTP/1.1 200 OK\r\n\r\nPort 80 is reachable' | nc -l -p 80 -q 0 ) &
  # -f: treat an HTTP error from the load balancer as "not routed yet"
  while ! curl -sf -o /dev/null "$CADDY_TEST_URL" ; do
    sleep 1
  done
  # race condition here: nc has exited and Caddy hasn't bound port 80 yet
fi

echo "Starting caddy"
exec caddy run --config /etc/caddy/Caddyfile --adapter caddyfile
```
Then my Dockerfile has this as a separate build stage:
```dockerfile
FROM library/caddy AS caddy
COPY start-caddy-after-port-80-connect.sh /
RUN chmod +x /start-caddy-after-port-80-connect.sh
RUN /sbin/apk add netcat-openbsd curl
COPY Caddyfile /etc/caddy/Caddyfile
# this is just the static content from my app
COPY --from=app /srv/app/public /usr/share/caddy/
CMD ["/start-caddy-after-port-80-connect.sh"]
```
So when I start Caddy as part of my stack, the script listens on port 80 and probes the URL in $CADDY_TEST_URL, which is the external load balancer URL. The probe fails a few times until the external load balancer starts routing traffic. Only then does the script start Caddy, which can now provision certificates safely.
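One thing to note: the probe loop as written will spin forever if the load balancer never comes up. A small variant with a retry cap would let the container exit (and be restarted by the orchestrator) instead of hanging. This is just a sketch, and `wait_for_url` is a name I made up:

```shell
# wait_for_url URL [MAX_TRIES]: poll URL via curl once a second,
# giving up after MAX_TRIES attempts (default 60).
wait_for_url() {
  url=$1
  max=${2:-60}
  i=0
  while ! curl -sf -o /dev/null "$url"; do
    i=$((i + 1))
    if [ "$i" -ge "$max" ]; then
      return 1   # URL never became reachable
    fi
    sleep 1
  done
  return 0
}
```

The script would then call `wait_for_url "$CADDY_TEST_URL" || exit 1` instead of the bare while loop.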
This has a theoretical race in it - between nc exiting and Caddy binding port 80, nothing is listening - but one that’s very unlikely to be tripped given the load balancer behaviour. Of course it’s a grubby shim, but I’m well into a Docker workflow, so it barely registers.
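The window could be narrowed by replacing the one-shot nc with a responder that keeps listening until it’s explicitly killed just before exec’ing Caddy. A sketch, assuming the same netcat-openbsd flags as above (`serve_ok_loop` is my own name):

```shell
# serve_ok_loop: answer each connection with a 200 and listen again,
# so health checks keep passing until we deliberately stop it.
# The loop ends if nc itself fails (e.g. the port is taken).
serve_ok_loop() {
  while printf 'HTTP/1.1 200 OK\r\n\r\nready' | nc -l -p 80 -q 0; do
    :
  done
}

# In the startup script it would replace the one-shot nc:
#   serve_ok_loop &
#   RESPONDER=$!
#   ...same curl probe loop...
#   kill "$RESPONDER" 2>/dev/null   # free port 80 just before exec caddy
```

There’s still a tiny gap between the kill and Caddy binding the port, but it’s far smaller than waiting out a one-shot listener.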
Is there a better way of doing this? The only alternative I could see was the DNS challenge, but my DNS provider isn’t supported.
I don’t know whether Caddy could help out a bit more - e.g. it’d be lovely if it could self-test connectivity before reaching out to Let’s Encrypt. But even some control over timing & retries would be enough to make it reliable.
I couldn’t really follow the v2 docs in the way that I could for v1, so I may well be missing something - any improvements or comments would be appreciated.