Hot reload external cert

1. Output of caddy version:

v2.6.2 h1:wKoFIxpmOJLGl3QXoo6PNbYvGW4xLEgo32GPBEjWL8o=

2. How I run Caddy:

a. System environment:

Ubuntu 22.04 / Docker 20.10.21

b. Command:

docker run -d --net host --name caddy -v $(pwd):/caddy caddy:2 caddy run --config /caddy/Caddyfile

d. My complete Caddy config:

{
	auto_https disable_redirects
}

:443 {
	tls /caddy/cert.pem /caddy/cert.pem
	reverse_proxy unix//caddy/caddy.sock
}

3. The problem I’m having:

Running docker exec caddy caddy reload --force --config /caddy/Caddyfile is, as far as I know, the correct way to reload external certs. However, this a) drops existing websocket connections, and b) seems to cause socket read errors for some normal HTTP requests (tested with wrk).

Is there a better way to hot reload external certs?

Nope, that’s the best way to do it.

Websocket connections must be reconnected so that they use the new config. If the connections were not closed on reload, Caddy would leak memory by having references to the old copy of the config.

Make sure your client-side websockets code has a reconnection loop to recover gracefully from being disconnected. That’s important in general, not just for handling config reloads, because network conditions on the client-side can change at any time.

Small thing: you can simplify your docker command by mounting your Caddyfile at /etc/caddy/Caddyfile, which is where the container’s default command loads the config from. Then you don’t need to specify caddy run --config. See Keep Caddy Running — Caddy Documentation for an example using docker-compose (it can be adapted to plain docker commands if you prefer, although IMO using compose is almost always better).
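A minimal docker-compose sketch along those lines — the volume paths are assumptions mirroring the docker run command above:

```yaml
services:
  caddy:
    image: caddy:2
    network_mode: host
    restart: unless-stopped
    volumes:
      # Caddyfile at the default location, so no --config flag is needed
      - ./Caddyfile:/etc/caddy/Caddyfile
      # cert.pem and caddy.sock, as in the docker run command above
      - .:/caddy
```

Reloading then becomes `docker compose exec caddy caddy reload --force --config /etc/caddy/Caddyfile`.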


Thanks for the reply,

About the websockets: I understand, but it would be nice to have a more graceful way to reload just the certs. Old connections should be able to stay on the old certs for at least a while, perhaps with a configurable timeout.

And with the regular requests getting errors, it seems like Caddy wasn’t built with external certs in mind. Caddy is such a nice piece of software, though, that it would be nice to see external certs get some love.

I’d like to know more about this. Config changes should be graceful. How can we minimally reproduce the errors you’re seeing? (without docker)

Okay, so I’ve reproduced it using bare caddy: ./caddy_linux_amd64 run

Simplified config

{
        auto_https disable_redirects
}

:4443 {
        tls cert.pem cert.pem
        respond "Hello"
}
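For anyone reproducing this: the `tls cert.pem cert.pem` line passes the same file as both the certificate and the key, which works when the file contains both PEM blocks. A self-signed test cert in that combined form can be generated like this (a sketch; the filenames are just examples):

```shell
# Create a self-signed certificate and private key for localhost,
# then concatenate them into a single cert.pem so the same file can
# be passed as both arguments of the tls directive.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -subj "/CN=localhost" \
  -keyout key.pem -out crt.pem
cat crt.pem key.pem > cert.pem
```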

./wrk -t2 -c10 -d30s https://localhost:4443 (GitHub - wg/wrk: Modern HTTP benchmarking tool)

Output of wrk

Running 30s test @ https://localhost:4443
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   186.24us  507.78us  17.91ms   95.82%
    Req/Sec    48.00k    10.16k   71.56k    73.00%
  2865687 requests in 30.01s, 461.87MB read
Requests/sec:  95500.02
Transfer/sec:     15.39MB

Output of wrk when running ./caddy_linux_amd64 reload --force once during the 30s test.

Running 30s test @ https://localhost:4443
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   183.79us  476.33us  12.13ms   95.75%
    Req/Sec    46.80k     9.14k   73.48k    70.50%
  2794342 requests in 30.00s, 450.37MB read
  Socket errors: connect 0, read 8, write 0, timeout 0
Requests/sec:  93137.17
Transfer/sec:     15.01MB

As you can see there are 8 read errors.


How often are you reloading that you need to worry about this? If you’re just reloading because certs changed, that should be extremely infrequent (every ~60 days maybe?) so I think it’s totally fine to have websocket connections reconnect at that point.

If you enable the debug global option and watch the logs, can you use grep to find anything relating to those errors? Try something like docker logs <container-id> -f | grep 'error'

8 errors in 2.7 million requests seems like a totally acceptable amount.

Certs will normally reload every 60 days, yes. However, it’s not that websocket connections are expected to stay alive for 60 days; rather, one may be created just before the inevitable reload is about to happen. The apps I run are stateful, so aborting a websocket connection will impact user experience, even with reconnecting.

Tried it, nothing out of the ordinary.

It’s not about the successful requests. If we instead run a forked version of wrk (GitHub - giltene/wrk2: A constant throughput, correct latency recording variant of wrk), which allows specifying a request rate, we can see these results:

./wrk -t1 -c1 -d10s -R1 https://localhost:4443

Running 10s test @ https://localhost:4443
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.81ms  478.07us   2.70ms   70.00%
    Req/Sec       -nan      -nan   0.00      0.00%
  11 requests in 10.00s, 1.82KB read
  Socket errors: connect 0, read 1, write 0, timeout 0
Requests/sec:      1.10
Transfer/sec:     185.86B

That sounds like buggy code, then. A reconnect should be graceful on the client side.

Your connection wrapper should handle queuing up messages to send while not connected, and remember any “subscribed channels” (or whatever, making assumptions here) to re-subscribe to on reconnect.

So my reading from this is that reloading cancels in-flight requests.

You might want to try playing with the grace_period global option, which might make this more seamless for you: Global options (Caddyfile) — Caddy Documentation
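For reference, that option goes in the global options block; a Caddyfile sketch (the 10s value is just an example):

```
{
	grace_period 10s
}
```

After the grace period elapses, any remaining connections on the old config are forcefully closed.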

It’s not up to me; they are 3rd party.

The default grace_period seems to be infinite, as can be seen in the logs:

DEBUG http servers shutting down with eternal grace period

For completeness, I also tried setting an explicit grace_period of 20s, without improvement.

DEBUG http servers shutting down; grace period initiated {"duration": 20}

This topic was automatically closed after 30 days. New replies are no longer allowed.