Not everyone is comfortable enabling the DNS challenge, so I am currently looking into possible workarounds for the “challenge doesn’t arrive at the right server” issue.
The main idea would be to have one Caddy server act as the “master”, which generates the certs and writes/updates them in a shared location (currently a persistent volume).
The other dynamically spun-up Caddy servers could be seen as slaves: same config, but only reading the certs from storage.
Now that’s where I hit some roadblocks.
My idea would be to filter out the Let’s Encrypt challenge request via a user-agent check or something similar. Would that be possible? If it works, the current method would be to rewrite the request to a specific path, which then gets proxied to the master server. That raises the question: is there a conditional clause in the proxy plugin in v0.9 that would avoid the ugly rewrite and the extra server hop?
DNS might be at a different provider, be locked down so that access from a webserver is not allowed, or use another method to push DNS records, such as going through a git repo first (for auditing, rollback, etc.).
Ah perfect. So basically one just has to proxy /.well-known/acme-challenge/* to the master server.
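A minimal Caddyfile sketch of that idea, assuming a made-up internal hostname for the master (and assuming the proxy directive actually gets to see this path, which Caddy’s built-in challenge handling may prevent):

```
example.com {
    # forward ACME HTTP challenge requests to the cert-generating instance
    proxy /.well-known/acme-challenge caddy-master.internal:80
}
```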
Awesome. That solves most of the issues with running Caddy with automatic TLS on Kubernetes in HA mode.
Fair points. Although the DNS records that are created are extremely ephemeral, lasting for a minute at most, and are simply TXT records that don’t affect other services, so I would be surprised if that was a barrier.
Anyway, keep in mind that Caddy treats the HTTP challenge path specially in order to guarantee that it can serve the challenge. It can’t proxy ACME HTTP challenges. It possibly could with some minor internal changes, although there might be undesired side effects.
Not ideal for HA setups, then. It would be nice to override that behaviour with a proxy rule; that seems like the most logical solution to me. Another option would be a masterURL field or something inside the TLS config, but that seems like duplicating the proxy logic there.
Thanks for considering it. It’s definitely a nice-to-have for such a setup (it would be used for the new owncloud/nextcloud demo).
I agree. Would there be any reasonable way to support something like master/slave without going down the middleware chain? Or does it sound reasonable to add a command-line flag that pushes this path down the middleware chain? It’s quite a useful feature for the ingress plugin, and in general for using Caddy for TLS termination and load balancing.
Would love to help out, but will start with another plugin first, when time permits.
Probably best to know what makes sense before taking the time to write a whole proposal.
The native way for master/slave without the DNS challenge would need to proxy the challenge request to the master server and be able to read certs from storage (already supported by specifying a directory; it should not generate certs when one is specified, right?)
→ Might be harder and not as straightforward.
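For the reading-from-storage half, a sketch of how every instance could point at the same persistent volume via the CADDYPATH environment variable (the mount path is just an example):

```
# master and slaves mount the same volume and point Caddy's asset folder at it
export CADDYPATH=/mnt/shared/.caddy
caddy -conf /etc/Caddyfile
```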
Middleware way for master/slave could use a command line flag to push the handling of .well-known down the middleware chain.
→ Enables custom plugins to handle challenges.
→ No special handling for this infrastructure case, just middleware plugins.
Yeah, the general TLS configuration is done in the Caddyfile, but adding an exception to automatic TLS via the Caddyfile makes it easier for people to shoot themselves in the foot. That’s why the idea was to make the non-default behaviour a command-line flag, so it reads as more of an advanced config.
I think a command-line flag makes it just as easy to break things as a Caddyfile directive, FWIW. I’m not convinced a command-line flag is the right way to approach this; TLS config is scoped to each individual site (where the Go standard library allows it), and I don’t like the idea of pulling TLS configuration out into a command-line flag.
I don’t believe master/slave with a proxy to the challenge path is a good way at all. Firstly, that’s not really HA unless you do leader election, which can be difficult to implement natively (e.g. Raft/Paxos) as opposed to using something like ZooKeeper, Consul, etcd, etc.
I think the approach that has been discussed, shared storage for certs and challenge details, is the way to be truly HA. Whether you round-robin your DNS or do end-to-end encryption with a load balancer or haproxy-type setup in the middle, the point is that the ACME protocol, and Let’s Encrypt specifically, gives you no way to choose which server the HTTP challenge requests will go to. Therefore, all servers in a truly HA setup need to be able to respond to any request from the CA and need to be able to auto-renew at will. There can be no concept of a “master”, because it’s not HA when the master can die (again, unless you have complicated leader election built in or use an external service).
It’s only master/slave for the certificates, and that service could handle a failing master by re-electing or restarting it (or something else). Renewal has enough overlap time to not be failure-sensitive.
If challenge data could be shared, that might be another interesting solution. The thought behind using the proxy was ease of use and implementation (until the default behaviour got in the way of that).
Yes, in fact. In the code, Caddy supports pluggable TLS asset storage providers, thanks to some work by contributors. I’m refining it, trying to get it just right; we’ll eventually make it a first-class plugin that you can select on the download page. (First up is a Kubernetes implementation, which is already mostly done.)