Caddy behind a TCP load balancer - making sure SSL provisioning works on startup

Hello! I had the same problem as someone else trying to use Caddy behind a (Hetzner) load balancer, and wanted to share my solution.

The problem is that TCP load balancers can take a few seconds to spot that a service has been (re)started, and won’t forward traffic straight away - in my case it seemed to take 30-60s.

So if you start Caddy behind such a load balancer, with the LB forwarding ports 80 & 443, Caddy immediately starts trying to provision a certificate and fails, because the Let’s Encrypt servers can’t reach it back through the LB to validate the challenge. If you’re unlucky, Caddy fails 3 times, trips Let’s Encrypt’s rate limit, and you can’t try again for an hour.

As part of my app deployment, I was already creating a Caddy package based off library/caddy, so I made a couple of changes to make the startup reliable.

Firstly, I put this script into the new package, to stage the startup:

#!/bin/sh
if [ -n "$CADDY_TEST_URL" ] ; then
  echo "Listening on port 80 and testing $CADDY_TEST_URL"
  ( printf "HTTP/1.1 200 OK\n\nPort 80 is reachable" | nc -l -p 80 -q 0 ) &
  while ! curl -s -o /dev/null "$CADDY_TEST_URL" ; do
    sleep 1
  done
  # race condition here: nc exits after serving one request, so port 80
  # is briefly unbound until caddy takes over below
fi
echo "Starting caddy"
exec caddy run --config /etc/caddy/Caddyfile --adapter caddyfile

Then my Dockerfile has this as a separate build step:

FROM library/caddy AS caddy

# the script's path was mangled in this post; /start.sh below is a stand-in
COPY start.sh /start.sh
RUN chmod +x /start.sh
RUN /sbin/apk add netcat-openbsd curl

COPY Caddyfile /etc/caddy/Caddyfile
# this is just the static content from my app
COPY --from=app /srv/app/public /usr/share/caddy/
CMD ["/start.sh"]

So when I start caddy as part of my stack, the script listens on port 80 and probes the URL specified in $CADDY_TEST_URL - which is the external load balancer URL. This fails a few times until the external load balancer starts routing traffic. Then it starts Caddy, which can now provision certificates safely.
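In case it helps anyone, the wiring in Compose would look something like this (the service layout and URL are placeholders, not from my actual stack):

services:
  caddy:
    build: .
    environment:
      # external load balancer URL to probe before starting Caddy
      CADDY_TEST_URL: "https://example.com/"
    ports:
      - "80:80"
      - "443:443"

If CADDY_TEST_URL is unset, the script skips the probe and just execs Caddy directly, so the same image works without a load balancer in front.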

This has a theoretical race in it, but one that’s very unlikely to be tripped given the load balancer behaviour. Of course it’s a grubby shim, but I’m well into a Docker workflow so it barely registers :grinning:

Is there a better way of doing this? I could only see DNS auth, but I don’t use a supported DNS provider.

I don’t know whether Caddy could help out a bit more - e.g. it’d be lovely if it could self-test connectivity before reaching out to Let’s Encrypt. But even some control over timing & retries would be enough to make it reliable.

I couldn’t really follow the v2 docs in the way that I could for v1, so I wasn’t sure if I might be missing something - so any improvements / comments would be appreciated.

Caddy v2 uses safer logic for Let’s Encrypt than v1: if the first attempt to issue a cert fails, it then attempts issuance against LE’s staging environment until it gets a good response, and only then tries the production environment again to get a real cert.
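(As an aside, if you want to test a setup like this without risking production rate limits at all, you can point Caddy at the staging CA yourself with the acme_ca global option at the top of your Caddyfile, then remove it once things work:)

{
  acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}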

I’m thinking your shim might not be necessary now with Caddy v2, compared to Caddy v1 (if I’m understanding correctly, you’re just now upgrading from v1 to v2?)

This behaviour is described in this section of the docs:

Also, I moved your post to a different category, the Wiki category is meant for evergreen guides for using Caddy and such.

Yes! Thanks for the fast reply. I did just upgrade from v1, as it had been disappeared :slight_smile: and this was the first time I’d had to work on it in a few months.

But I didn’t spot the logic had changed, and it sounds much smarter, thanks for the pointer. I will gingerly remove this hack.

