For my stack I have Docker Swarm configured to start Caddy as a service. So every time I change the Caddyfile, this is what happens:
- CI pipeline produces a new Docker image with the changed Caddyfile.
- CI starts a service update procedure on the swarm.
- One of the 3 running Caddy containers is stopped and removed from load balancing.
- A new caddy container is started with the new image. This new container is then watched with a healthcheck for a configurable amount of time. If it does not respond within this deadline, the update process is considered a failure (this image must have a bug) and everything is rolled back.
- When the first new container starts listening, a second old one is stopped and a new one is created, which must also pass the healthcheck. This repeats until all Caddy containers are using the new image.
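For reference, the rolling-update behavior above roughly corresponds to a compose file like this (the image name, replica count, and healthcheck URL are placeholders, not my actual values):

```yaml
version: "3.7"
services:
  caddy:
    image: my-registry/caddy:latest   # built by CI with the Caddyfile baked in
    deploy:
      replicas: 3
      update_config:
        parallelism: 1            # replace one container at a time
        monitor: 30s              # watch each new task for this long
        failure_action: rollback  # roll everything back if it never gets healthy
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:2015/"]
      interval: 10s
      timeout: 5s
      retries: 3
```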
This works great for zero-downtime deployments, and allows me to update Caddy, change plugins, etc., without issues. The problem is obtaining TLS certificates:
If I add too many new domains in a single update (say… 5 domains), obtaining the certificates takes long enough that the update process may fail (note: previously obtained certificates are preserved between updates). This means I have to set a long deadline (1 minute?) before the healthcheck is allowed to fail. With such a long deadline, the update process becomes slow (5 minutes?) to complete, and if some day I need to add 10 domains in one go, it will still fail. Not good.
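Concretely, the knob I keep having to stretch is the update monitor window; with the Docker CLI the tradeoff looks like this (values illustrative):

```shell
# Give each replaced task up to 60s to become healthy before the
# update is declared failed and rolled back. Long enough for cert
# issuance = painfully slow updates; short enough for fast updates
# = spurious rollbacks when new domains are added.
docker service update \
  --update-monitor 60s \
  --update-failure-action rollback \
  --image my-registry/caddy:new-tag \
  caddy
```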
My first attempt at a solution was to use On-Demand TLS for all hosts, which means certificates are obtained lazily. This is ok-ish, since I can visit each site manually myself (actually, the uptime monitoring tool visits the sites frequently, forcing them to always have a ready certificate). The problem is that when I do that, Caddy seems not to take into account that most of the certs were already obtained and are sitting in the .caddy directory. It does not load certs from there, and obtains a new cert even for a domain we have had for quite a while. Not good.
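For clarity, this is roughly what I mean by On-Demand TLS in the Caddyfile (Caddy v1 syntax; `max_certs` enables on-demand issuance, the site address and limit here are just examples):

```
example.com {
    # On-demand TLS: defer issuance until the first TLS handshake
    # for the hostname, capped at max_certs new certificates.
    tls {
        max_certs 10
    }
    proxy / backend:8080
}
```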
The dream solution would be for the “Activating privacy features…” phase to be non-blocking. If a vhost can start serving, let it serve as soon as possible. Of course, a vhost without a cert would have to wait (or maybe be served with an incorrect cert), but that’s OK. Is this possible in the current design?
Thanks!