Hi folks. I’m looking to start using Caddy in our environment to enable us to switch all our customers’ custom domains over to SSL, without using thousands of IP addresses and updating all those certificates (!)
So my plan is to have our existing Cisco load balancer offloading the majority of our heavy SSL due to the dedicated hardware it contains to help with that, and then load-balance all the other traffic to two Caddy servers. These in turn will be set up as transparent proxy servers to our Varnish cache servers, which also do all sorts of crazy logic to work out what backend servers to send traffic to.
So, to cut a long story short: would I have to replicate the certificates to each node? Otherwise, the first time a node sees a domain, it will try to fetch a new cert from Let's Encrypt, right? What will LE do if it gets at least two requests for each cert?
Can I even just replicate the files in /opt/caddy/ssl/acme/acme-v01.api.letsencrypt.org/sites/ to each node?
Thanks folks. Caddy could go a long way towards solving a LOT of our problems.
UPDATE: This is a very popular topic in search results, but it’s also over 3 years old. Caddy now works very well in a cluster to coordinate certificate management and share assets. Please refer to the latest official documentation and relevant wiki articles for information.
Hey Mark, I’ve got good news and bad news, and more good news.
Good news: Caddy’s TLS asset storage is designed to be pluggable, meaning you can plug in a TLS storage provider that takes care of the replication and syncing between Caddy instances. That's especially useful if the storage is a shared resource.
Bad news: It’s not fully developed yet; you’ll have to change some of the code and compile Caddy yourself, after writing the storage plugin you want.
More good news though: In a little while, we’ll be launching a subscription product for companies who rely on Caddy, to ensure their feature requests get considered first and their bugs get fixed before others', as well as guaranteed continued development and private support.
You could do this, but replication may not be enough. Once Caddy loads a certificate, it will try to manage it, including renewing it. You don’t want each node doing that independently.
Yes. So, replication could solve that problem, but…
It will count against your rate limit. Which is bad. If you only replicate, each Caddy will attempt renewals, instead of just one of them. This is why the storage plugin is necessary: it coordinates management of the TLS assets too.
Maybe I can get the load balancer to decide the Caddy node based on hostname instead, for now. That way no domain should be visible to two Caddy nodes, apart from during occasional outages etc. Depends on whether the load balancer looks at the SNI header or not…
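For what it's worth, the hostname-to-node pinning could be as simple as a stable hash, so whatever does the routing always picks the same node for a given domain (and therefore the same node handles that domain's ACME traffic). A minimal sketch in Go; the node names are made up:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// nodeFor deterministically maps a hostname to one of the Caddy nodes,
// so every request (and every cert renewal) for a given domain lands on
// the same instance. The node names here are placeholders.
func nodeFor(hostname string, nodes []string) string {
	h := fnv.New32a()
	h.Write([]byte(hostname))
	return nodes[h.Sum32()%uint32(len(nodes))]
}

func main() {
	nodes := []string{"caddy-1", "caddy-2"}
	for _, d := range []string{"shop.customer-a.example", "blog.customer-b.example"} {
		fmt.Println(d, "->", nodeFor(d, nodes))
	}
}
```

The usual caveat with plain modulo hashing applies: adding or removing a node reshuffles most domains, which is exactly the "occasional outage" case where two nodes might briefly see the same cert.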
Perhaps I could build some sort of custom directive so I can set a caddy node to do_not_renew_certs. That would achieve a sort of master/slave cluster methodology.
Time to grab the code and start rummaging about. I already want to create a custom log format that outputs in a format I can send to Elasticsearch.
Aren’t the cert and key individual files, though? Looking at the docs, I guess the load option might let it point to a central store, although that doesn’t solve the problem of all nodes trying to renew the same certs once they realise renewal is due.
dir is a directory from which to load certificates and keys. The entire directory and its subfolders will be walked in search of .pem files. Each .pem file must contain the PEM-encoded certificate (chain) and key blocks, concatenated together.
I’ve been thinking about this bit, and was wondering: when Caddy starts up, does it read the certificate from storage and then just cache the expiry date, so it can try to auto-renew some time before then? Does it ever reload the cert from disk to re-check expiry dates?
Just looking through renew.go: if RenewDurationBefore were configurable, and there were a way to flush the in-memory certificate cache (with a USR1 signal?), then it would be possible to give one cluster node a RenewDurationBefore a couple of days shorter than the rest, so it renews any pending certs it knows about, and then all nodes get a “reload” message.
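To be clear, Caddy has no such signal handler today; this is just a sketch of the flush idea, with made-up names: a SIGUSR1 handler empties an in-memory cert cache so the next lookup falls back to on-disk storage, which the "renew leader" node may have just refreshed.

```go
package main

import (
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// certCache is a stand-in for an in-memory certificate cache.
type certCache struct {
	mu    sync.Mutex
	certs map[string][]byte
}

// Flush drops every cached certificate, forcing the next lookup to
// reload from the shared on-disk store.
func (c *certCache) Flush() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.certs = make(map[string][]byte)
}

// watchUSR1 flushes the cache whenever the process receives SIGUSR1.
func watchUSR1(c *certCache) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR1)
	go func() {
		for range ch {
			c.Flush()
		}
	}()
}

func main() {
	c := &certCache{certs: make(map[string][]byte)}
	watchUSR1(c)
	// `kill -USR1 <pid>` from another shell would now flush the cache.
}
```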
Using healthcheck URLs, I could even perform a full rolling restart of all Caddy nodes every night if I need to.
I should clarify that Caddy will only try to renew certificates that it is managing. BYOC (bring your own cert) is different: as with other web servers, it’s all up to you to renew the cert and reload the server yourself.
To answer your question for managed certificates: Caddy will load the certificate once and then just run expiry scans every 12 hours. It doesn’t reload a certificate once it has been loaded.
A clustered system would need to be designed in a way that accounts for failure of any cluster member.
Go that way and you will arrive at a distributed webserver. If you don’t want to make synchronization and so on part of it, you’d use a database such as CoreOS’ etcd.
A distributed webserver, even just a “clustered” one, needs far more than a shared certificate storage and some kind of locking to prevent two members from renewing the same certificate. For example:
synchronize any TLS state data (or design for shared-nothing)
internal routing to backends (CGI)
internal routing to assets, or a separate storage layer (NFS), or something better (hyperconverged design)
I no longer contribute to Caddy, and one reason is that I am writing such an email and web server. But competitors, for example Nginx, are moving in that direction too, though they settled on a leader-follower (formerly called master-slave) design. Wait a few months and such webserver designs will be the new mainstream.
This gets a little trickier actually, when you consider that behind a load balancer, only one node will get the pingback from Let’s Encrypt, depending on how it decides which Caddy server to send the request to.
It’s almost like one node needs to make the renewal request, but then all nodes need to be able to deal with the callback from LE.
For now, until minds smarter than mine build something clever, I may have to settle for a single node with a hot-standby. If I occasionally have to re-request some certs when the node changes, I don’t think that will be a problem.
You’re right, but by using the DNS challenge, no nodes need to handle any sort of request or handshake. It just has to be synchronized so that only one node initiates the challenge; it sets the record with the DNS provider, waits for it to be verified, then clears the record when done, all without hitting the local network.