Running docker exec caddy caddy reload --force --config /caddy/Caddyfile is the correct way to reload external certs, afaik. However, this a) drops existing websocket connections and b) seems to cause socket read errors for some normal HTTP requests (tested with wrk).
Is there a better way to hot reload external certs?
Websocket connections must be reconnected so that they use the new config. If the connections were not closed on reload, Caddy would leak memory by holding references to the old copy of the config.
Make sure your client-side websockets code has a reconnection loop to recover gracefully from being disconnected. That’s important in general, not just for handling config reloads, because network conditions on the client-side can change at any time.
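A minimal sketch of such a reconnection loop in TypeScript, assuming a browser-style WebSocket. The names `keepConnected`, `backoffDelay` and `RetryOptions` are hypothetical, and capped exponential backoff with jitter is one common choice, not something Caddy requires. `connectOnce` is expected to open a socket and resolve when it closes, reporting whether it ever opened:

```typescript
// Hypothetical reconnection helper; names and default values are illustrative assumptions.

interface RetryOptions {
  baseDelayMs: number; // upper bound of the first retry delay
  maxDelayMs: number; // cap so a long outage doesn't grow the delay forever
  random?: () => number; // injectable for testing; defaults to Math.random
}

// Capped exponential backoff with "full jitter": a random delay in [0, min(max, base * 2^attempt)).
function backoffDelay(attempt: number, opts: RetryOptions): number {
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  const random = opts.random ?? Math.random;
  return Math.floor(random() * ceiling);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Re-runs `connectOnce` after every disconnect until `shouldStop` returns true.
// `connectOnce` resolves when the socket closes, with `true` if it had opened successfully;
// a rejection is treated like a failed attempt.
async function keepConnected(
  connectOnce: () => Promise<boolean>,
  opts: RetryOptions,
  shouldStop: () => boolean = () => false,
): Promise<void> {
  let attempt = 0;
  while (!shouldStop()) {
    let wasOpen = false;
    try {
      wasOpen = await connectOnce();
    } catch {
      // failed to connect at all; counts as a failed attempt
    }
    if (shouldStop()) break;
    // A connection that actually opened (e.g. one dropped by a server reload) resets the backoff.
    attempt = wasOpen ? 0 : attempt + 1;
    await sleep(backoffDelay(attempt, opts));
  }
}

// Example browser usage (the URL is a placeholder):
// keepConnected(
//   () =>
//     new Promise<boolean>((resolve) => {
//       const ws = new WebSocket("wss://example.com/ws");
//       let opened = false;
//       ws.onopen = () => { opened = true; };
//       ws.onclose = () => resolve(opened); // browsers fire close after error too
//     }),
//   { baseDelayMs: 500, maxDelayMs: 30_000 },
// );
```

Resetting the attempt counter after a successful connection means a client dropped by a config reload comes back within one base delay, while a server that stays down gets retried less and less often.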
Small thing: you can simplify your docker command by mounting your Caddyfile at /etc/caddy/Caddyfile, which is where the container's default command loads the config from. Then you don't need to specify caddy run --config. See the "Keep Caddy Running" page of the Caddy documentation for an example using docker-compose (it can be adapted to plain docker commands if you prefer, although IMO using compose is almost always better).
About the websockets, I understand, but it would be nice to have a more graceful way to reload just the certs: old connections should be able to stay on the old certs for at least a while, perhaps with a configurable timeout.
And given that regular requests get errors too, it seems like Caddy wasn't built with external certs in mind. Caddy is such a nice piece of software, though, that it would be great to see some love for external certs.
How often are you reloading that you need to worry about this? If you're just reloading because certs changed, that should be extremely infrequent (every ~60 days maybe?), so I think it's totally fine to have websocket connections reconnect at that point.
If you enable the debug global option and watch the logs, can you use grep to find anything relating to those errors? Try something like docker logs <container-id> -f 2>&1 | grep 'error' (the 2>&1 matters because Caddy writes its logs to stderr, which a plain pipe would bypass).
8 errors in 2.7 million requests seems like a totally acceptable amount.
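For scale, that works out to roughly three failed requests per million:

```latex
\frac{8}{2.7 \times 10^{6}} \approx 2.96 \times 10^{-6} \approx 0.0003\%
```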
Yes, certs will normally reload every 60 days. The concern isn't that websocket connections are expected to live for 60 days, but that one may be created just before the inevitable reload happens. The apps I run are stateful, so aborting a websocket connection will impact user experience, even with reconnecting.
That sounds like buggy code, then. A reconnect should be graceful on the client side.
Your connection wrapper should handle queuing up messages to send while not connected, and remember any “subscribed channels” (or whatever, making assumptions here) to re-subscribe to on reconnect.
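A rough sketch of that kind of wrapper in TypeScript. `ResilientChannel`, the `Transport` interface and the `{"type":"subscribe"}` message shape are all hypothetical; the idea is to call `onConnected` from the socket's open handler and `onDisconnected` from its close handler (for example inside a reconnection loop):

```typescript
// Hypothetical client-side wrapper: names and the subscribe message format are assumptions.

// Anything with a send method, e.g. an open browser WebSocket.
interface Transport {
  send(data: string): void;
}

// Assumed wire format for subscribing to a channel; adapt to your server's protocol.
function subscribeMessage(channel: string): string {
  return JSON.stringify({ type: "subscribe", channel });
}

class ResilientChannel {
  private transport: Transport | null = null; // null while disconnected
  private readonly pending: string[] = []; // messages queued while offline
  private readonly subscriptions = new Set<string>(); // channels to restore on reconnect

  // Send immediately when connected; otherwise queue for the next connection.
  send(message: string): void {
    if (this.transport) {
      this.transport.send(message);
    } else {
      this.pending.push(message);
    }
  }

  // Remember the channel so it can be re-subscribed after every reconnect.
  subscribe(channel: string): void {
    this.subscriptions.add(channel);
    if (this.transport) {
      this.transport.send(subscribeMessage(channel));
    }
    // When offline, onConnected() sends it along with the other subscriptions.
  }

  // Call from the socket's open handler with the newly opened socket.
  onConnected(transport: Transport): void {
    this.transport = transport;
    // Restore server-side subscriptions first...
    this.subscriptions.forEach((channel) => transport.send(subscribeMessage(channel)));
    // ...then flush queued messages in the order they were sent.
    for (const message of this.pending.splice(0)) {
      transport.send(message);
    }
  }

  // Call from the socket's close handler.
  onDisconnected(): void {
    this.transport = null;
  }
}
```

Subscriptions are replayed before the queued messages so those messages land in channels the server already knows about. This only covers messages sent while disconnected; a message handed to a socket just before it dropped can still be lost, so a stateful app may also need application-level acknowledgements.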
So my reading from this is that reloading cancels in-flight requests.