Any SSL Solution for Multi-Server Setup Behind Same Domain

Per the docs on automatic SSL:

Load balancing

If Caddy is behind a load balancer, it may have trouble validating the domain and obtaining certificates. It is your responsibility, then, to ensure SSL certificates are obtained and properly set up on all machines.

It doesn’t appear much effort has been put into “federating” the certificate or making this easier for users. Many sites, for HA reasons or otherwise, may have something as simple as round-robin DNS with multiple A records for a single name/zone pointing to multiple servers.

What is the current state-of-the-art solution for this today? Grab $HOME/.caddy/acme/<ca hostname>/sites/<site hostname> and manually copy it? Is there a plugin out there?

If not, would there be any interest in a Vault plugin that stores the cert there for sharing across servers? Does the plugin API allow you to change where/how the cert is stored? And what about the fact that the ACME server can randomly pick any of the A record values to contact? That means I would need to keep the challenge information in a shared space too, correct?

Ok, a quick perusal of the code shows that storage would have to be abstracted. I figure a storage option that would default to disk if not provided, and a plugin handler, e.g. caddy.RegisterStorageType, for registering new storage types. Would a PR of this form be accepted, or am I going down the wrong path?

Hi @cretz, thanks for your feedback.

That’ll work. And I don’t know of a plugin that does this already, it’s very specific to your environment. You’d probably have to write a new plugin. I have plans to write one, but haven’t gotten around to it yet.

No. But it’s something that I’d be willing to do, given a thoroughly considered and detailed proposal!

That shouldn’t be a problem for the DNS challenge, which Caddy 0.9 supports. The DNS challenge should work fine when Caddy is behind a load balancer; that’s the main benefit of it.

I like this thinking so far; can you develop it a little more into something more concrete? Would be happy to consider it.

Sure, I’ll just start jotting down points here, stream-of-consciousness style (I personally would love to not have to write it myself :slight_smile: but I don’t mind).

Basically, in the Caddyfile you’d have your normal TLS conf for obvious backwards-compatibility reasons. But if a cert, key, and load dir are not set, then storage could be set like so (the mutual exclusivity of these values would be validated):

tls {
    storage vault {
        user someuser
        pass somepass
    }
}

This custom directive information for storage would be passed the same way as for a new server type: you would call RegisterTLSStorageType the same way you currently call RegisterServerType. Granted, if you feel this is a more global thing, the directive can be moved out of the tls section up to the top level.

In the code I would replace the current storage handling with a new dir (i.e. package) called storage, containing a storage.go that defines an interface. The interface would contain all of the reading/writing operations needed; right now storage just returns filenames, which is not abstract enough. So I would have it do things like PersistUser(user *caddytls.User) error and remove the user-saving code from user.go. The same goes for everything in caddytls that uses storage but does its file IO inline next to the rest of the code. Of course storage/file.go would hold the default implementation that does only the file IO done today. A factory of sorts (StorageFor) would accept the storage type and dispatch based on the set of registered types (defaulting to file as today).

This way, you achieve the following goals:

  • 100% backwards compatibility with no upgrade pains or breaking changes
  • Support for custom storage which is important for those that might want to share state or whatever
  • The custom storage plugins are not baked into the codebase allowing for more extensibility

The possible reasons for rejection:

  • Fear of such a change simply due to lack of confidence in being able to test and confirm the backwards compat
  • Against the principles of the project, i.e. no shared state or custom storage
  • Not worth the time

In my use case, and I’m sure I’m not unique, automatic renewal, registration, etc. should not be restricted to single-server setups. I suppose I could leverage some NFS share, but it feels limiting.


Thanks for the proposal! I appreciate it since I wasn’t planning on working on this soon, and it will help speed up the process and even make it possible for others to discuss and implement if that’s what we choose to do.

The notion of storage right now is caddytls-specific, but it doesn’t have to be that way. As far as the config goes, it’s pretty good, but we do want to avoid nesting. In fact, the parser doesn’t really support nesting inside directives. So perhaps the idea of “storage” should move up to the top level like you suggested. I’m not sure how that would work yet… but your next paragraph is one possibility. I do want to avoid one large interface and instead use multiple small interfaces as needed. Maybe even just two interfaces for Store() and Retrieve() or Load()? We’ll have to look at this.

:+1: Yay extensibility.

I’m not worried about backwards compatibility quite yet; not until 1.0. As long as we document the breaking changes.

I’m working on a detached, general-purpose certificate manager on the side, and eventually it will have a Caddy plugin which frankly would need to use this storage adapter, so I do think this change is good.

@cretz What if we abstract the file system instead? Most of the uses of storage are to read and write files, so what if we let the user plug in a key-value store? That way we only need ~two methods in the interface (Get and Set).

Actually this is idealized a bit too much; some operations involve scanning the storage to find files with certain properties (e.g. most recently modified). It might be tricky no matter how we do it.

Works for me. Even if you have to add a third method for “FindWhatever” or “ScanWhatever”, it would still be worth it. Those of us without a supported DNS challenge provider who want multi-server capability with automated renewal almost have to have shared storage. I personally don’t have a ton of immediate time to devote to this problem atm, but will at some point.


I’m going to transfer this to an issue then, since it’s a specific dev-related task. Thanks!

Edit: link to issue, #906

Not everyone is comfortable enabling the DNS challenge, so I am currently looking into the possibilities to work around the “challenge doesn’t arrive at the right server” issue.

The main idea would be to have one Caddy server be the “master”, which generates the certs and writes/updates them in a shared location (currently a persistent volume).
The other dynamically spun-up Caddy servers could be seen as slaves: same config, but only reading the certs from storage.

Now that’s where I hit some roadblocks.

The concept, in my opinion, would be to filter out the Let’s Encrypt challenge request via a user-agent check or something similar. Would that be possible? Furthermore, if that works, the current method would be to rewrite the request to a specific path, which then gets proxied to the master server. This leads to the question of whether there is a conditional clause within the proxy plugin in v0.9 that would avoid the need for an ugly rewrite and additional server hops.

client → master/slave:80 → redirect → :443
client → master/slave:443 → serve
LE → slave:80 → proxy → master:80
LE → master:80 → generate cert

Why’s that?

For the HTTP challenge? Maybe, but might as well use the path; challenge requests are to /.well-known/acme-challenge/*

After that I got a little lost.


DNS might be at a different provider, be locked down so access from a webserver is not allowed, or use another method to push DNS records, such as going through a git repo first (for auditing, rollback, etc.).

Ah perfect. So basically one just has to proxy /.well-known/acme-challenge/* to the master server.

Awesome. That solves most of the issues with running Caddy with automatic TLS on Kubernetes in HA mode.
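Hypothetically, the slave config would amount to something like this (the master address is a placeholder, and this assumes Caddy lets the challenge path reach the proxy middleware at all):

```
:80 {
    proxy /.well-known/acme-challenge http://master.internal:80
}
```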

Fair points. Although the DNS records that are created are extremely ephemeral, lasting for a minute at most, and are simply TXT records that don’t affect other services, so I would be surprised if that was a barrier.

Anyway, keep in mind that Caddy treats the HTTP challenge path specially in order to guarantee that it can serve the challenge. It can’t proxy ACME HTTP challenges. It possibly could with some minor internal changes, although there might be side-effects there that are not desired.

The DNS records might be ephemeral, but it still needs authentication/access to the provider. Most don’t provide ACLs as fine-grained as needed for this.

That’s a bummer. I thought it might work out of the box when I proxy the challenge path. Even in v0.9, would it be handled differently and not proxy this path?

That’s correct. It is nice because it guarantees, hassle-free, that the HTTP challenge can be served when it hits the socket. With some careful review we can look at revising it in the future though.

Not perfect for HA setups, then. It would be nice to override that behaviour with a proxy rule; that would be the most logical solution, I think. Another option would be to have a masterURL field or something inside the TLS config, but that seems like duplicating the proxy logic there.

Thanks for considering it. It’s definitely a nice-to-have for such a setup (it would be used for the new owncloud/nextcloud demo then).

It’s just scary to have the HTTP challenge request go down the middleware chain. People could shoot themselves in the foot way more easily then.

I agree. Would there be any reasonable way to support something like master/slave without going down the middleware chain? Or does it sound reasonable to add a command line flag to push this path down the middleware chain? It’s a feature quite useful for the ingress plugin and in general using caddy for TLS termination and LB.

Would love to help out, but will start with another plugin first, when the time permits.

Perhaps, with a more detailed proposal! I don’t really know what that entails right now.

Probably best to know what makes sense before taking the time to write a whole proposal.

The native way for master/slave without the DNS challenge would need to proxy the challenge request to the master server and be able to read certs from storage (already supported by specifying a directory; it should not generate certs when one is specified, right?)
→ Might be harder and not as straightforward.

Middleware way for master/slave could use a command line flag to push the handling of .well-known down the middleware chain.
→ Enables custom plugins to handle challenges.
→ No special handling for this infrastructure case, just middleware plugins.

Is there a particular reason you want to use a command line flag? TLS is generally configured in the Caddyfile.