About multiple hosts sharing the same TLS certificates storage

My situation is similar to this post: Storing certificates in a NFS mount

We’re running a swarm mode cluster with Caddy as the frontend reverse proxy, which also handles the TLS certificates (Let’s Encrypt).

To minimize downtime, we want to run multiple instances of Caddy, for example with docker service create --replicas 3 ... , and let the Docker swarm mode ingress routing mesh handle the load balancing for us.

Since all the instances serve the same domain(s), we don’t want each of them to request certificates for the same domain separately. We plan to use NFS-like storage to share the certificates across the hosts in the swarm cluster, so that all the Caddy reverse proxies use the same certificates.

Will this setup work for renewing the certificates?

With this setup, will all three instances send the renewal request at the same time? Is it possible that, after one instance fetches the new certificate, the others notice that the certificate is new and skip sending a renewal request?
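
A minimal sketch, in Go, of the kind of check asked about here: before renewing, an instance looks at the expiry of the certificate currently in shared storage and skips the renewal if the certificate is still outside the renewal window. The names needsRenewal, renewWindow, and exampleNow are hypothetical, and the 30-day window is an assumption for illustration, not a value taken from Caddy's source.

```go
package main

import (
	"fmt"
	"time"
)

// renewWindow is an assumed threshold: renew when less than this much
// validity remains. It is illustrative, not read from Caddy.
const renewWindow = 30 * 24 * time.Hour

// exampleNow is a fixed reference time so the example is deterministic.
var exampleNow = time.Date(2018, time.January, 1, 0, 0, 0, 0, time.UTC)

// needsRenewal reports whether a certificate expiring at notAfter has
// less than window of validity left at time now.
func needsRenewal(notAfter, now time.Time, window time.Duration) bool {
	return notAfter.Sub(now) < window
}

func main() {
	// Certificate found in shared storage with 10 days left: renew it.
	old := exampleNow.Add(10 * 24 * time.Hour)
	fmt.Println(needsRenewal(old, exampleNow, renewWindow)) // prints true

	// Another instance already renewed it (89 days left): skip.
	fresh := exampleNow.Add(89 * 24 * time.Hour)
	fmt.Println(needsRenewal(fresh, exampleNow, renewWindow)) // prints false
}
```

A check like this only helps if an instance re-reads the certificate from shared storage right before renewing. Two instances that check at the same moment would still both see the old certificate.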

I checked the source code here:
https://github.com/mholt/caddy/blob/e49474a4f555d2b8ebfac504fb9a2d3bad08730e/caddytls/maintain.go#L50

and here:

https://github.com/mholt/caddy/blob/e49474a4f555d2b8ebfac504fb9a2d3bad08730e/caddytls/maintain.go#L23

It seems Caddy checks certificates for renewal every 12 hours. Does that mean that if those 3 Caddy instances start at (almost) the same time, they will all send the renewal request at the same time, since none of them has finished the renewal procedure yet?

Is it possible to add some randomness to the interval, so that the 3 Caddy instances run their checks at different times even if they start at the same time? That would leave enough time for one instance to finish the renewal, and the remaining instances would then not need to request a renewal.
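
To illustrate the idea of a randomized interval, here is a rough Go sketch. It is not Caddy code: jitteredInterval, newRNG, and baseInterval are hypothetical names, and the 25% jitter fraction is an arbitrary assumption. Each instance would add a random extra delay to the 12-hour base, so that checks drift apart even when the instances start together.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// baseInterval mirrors the 12-hour check interval discussed above.
const baseInterval = 12 * time.Hour

// newRNG returns a seeded random source. Each instance should use a
// different seed (for example, its start time in nanoseconds).
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// jitteredInterval returns base plus a random extra delay chosen
// uniformly from [0, base*fraction].
func jitteredInterval(base time.Duration, fraction float64, r *rand.Rand) time.Duration {
	maxExtra := int64(float64(base) * fraction)
	if maxExtra <= 0 {
		return base // no jitter requested
	}
	return base + time.Duration(r.Int63n(maxExtra+1))
}

func main() {
	r := newRNG(time.Now().UnixNano())
	// Three sample intervals, as three instances might pick them.
	for i := 0; i < 3; i++ {
		fmt.Println(jitteredInterval(baseInterval, 0.25, r))
	}
}
```

With a 25% fraction, each wait falls between 12 and 15 hours. Jitter lowers the chance of simultaneous renewals but does not rule them out.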

Technically doable, but as you note, there’s a decent chance of a race to renew the same certificate. At the moment there’s no way to configure Caddy with a random cert renewal interval. You could stagger the startup of your replicas to help reduce the likelihood.

The best solution would be a TLS asset storage provider. From another thread regarding clustering Caddy:

Because we had the same issue, I wrote a plugin for Caddy that uses Consul’s KV store (which is itself clustered too) as the TLS storage backend. Please have a look here: GitHub - pteich/caddy-tlsconsul: 🔒 Consul K/V storage for Caddy Web Server / Certmagic TLS data

I also maintain a Docker image for a recent Caddy build with included Consul TLS plugin here:
https://hub.docker.com/r/pteich/caddy-tlsconsul/

We have been using this configuration and Docker image in production for many months now without any problems.

Cool – I hope we can make TLS storage plugins more 1st class, the main blocker is just taking the time to finalize the design of the Storage interface. Keep an eye out in case it changes (sorry if that’s the case). But then you’ll be able to add your plugin to the Caddy website just like every other one.