Fault Tolerance, clustering, docker

Let me start off by saying I love Caddy; it's so simple and gets all my sites up and running in no time.

Occasionally, the host I'm running Caddy on dies or stops, and I push to a new host and carry on… but it's not automatic. And while I could run multiple instances of Caddy, or make it participate in a swarm, that still leaves one part I can't solve: how the external request to port 443 (for example) gets forwarded to my Caddy instance.

Manually, I handle this by having some internal server names all pointing to the IP where my Caddy instance lives.


So any request to an internal name is pointed at my Caddy instance, which has that certificate and redirects to a sub-page.

Externally, I have a port 443 forward to that machine.

But here's where I'm stuck: if I lose that machine and have to move to a new one.

Ideally I'd like Caddy to have a single address, always: regardless of the host running Caddy, it presents the same IP or address. Even in swarm mode with multiple managers, you still have to point external references at one of the manager hosts, which doesn't give me a lot of benefit.

I see people talking about load balancers, but surely that has the same problem as the swarm managers, in that you now have multiple endpoints that you have to manage.

Any ideas on how I might achieve a single address for any number of Caddy instances (OK, I only really need 2), so that if one is down, the other is used? And have one address in my DNS and port forwarding setup?

This is literally what a load balancer is/does. A single point through which you can access an arbitrary number of upstream hosts, usually depending on availability / health. Caddy is quite capable of load balancing and health checking, so you could use Caddy as a load balancer to two Caddy web servers. Caddy also has a Docker plugin that acts as a Caddyfile loader, so you can configure it to dynamically proxy to Docker containers (e.g. based on container metadata).
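A minimal sketch of that idea, assuming Caddy v1's `proxy` directive and two hypothetical upstream Caddy instances at `10.0.0.11` and `10.0.0.12` (addresses, port, and health endpoint are placeholders, not anything from this thread):

```
# Front-end Caddyfile: load-balance across two upstream Caddy servers.
example.com {
    proxy / 10.0.0.11:8080 10.0.0.12:8080 {
        policy round_robin    # rotate requests between upstreams
        health_check /health  # only use upstreams that answer this path
        fail_timeout 10s      # how long a failed upstream is held out of rotation
        transparent           # pass the original Host header and client IP upstream
    }
}
```

If one upstream stops responding, requests fail over to the other until it recovers.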

You’re asking fundamental reliability / availability questions, and ultimately this comes down to how you design your cluster. If you want a single IP, always, then you need some highly reliable hardware on that IP that won’t go down - it’s that simple.

If you want an arbitrary number of Caddy servers available through a single IP, you need a load balancer to handle that. And since it’s at a single IP, it needs to be highly reliable hardware that won’t go down. Alternately, use DNS / service discovery so that when your server changes IP, your network can still find it.

Yeah, I figured that at some point I need a single thing that's able to keep track of the multiples.

I think I'd prefer a DNS-type option on my router, using dnsmasq with the IPs of the swarm nodes, on the basis that if the router went down, I've got bigger problems to worry about.

Whereas anything I run Caddy on - a Pi, a VM - could potentially be affected by some downtime.

If you can configure your router to dynamically update its DNS entries based on the presence of Docker containers, that’d be an option. That would be, effectively, service discovery (via DNS).
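As a rough illustration of the DNS side, assuming the router runs dnsmasq: publishing one A record per swarm node under a single name (the hostname and IPs below are made up). Whether clients actually fail over between the returned addresses depends on the client, so treat this as a sketch:

```
# /etc/dnsmasq.conf (hypothetical entries)
# One A record per swarm node, all under the same name,
# so resolving ingress.home.lan returns every node's address.
host-record=ingress.home.lan,192.168.1.10
host-record=ingress.home.lan,192.168.1.11
```

Updating these entries when containers move (e.g. from a script watching Docker events) would be the "service discovery via DNS" part.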

Ultimately, though, there are always going to be critical points in the path - especially when you've got a complex system like multi-tenanted containers.

The trick is to keep those critical points as simple and reliable as possible. A pi running nothing but Caddy is pretty damn reliable, for example.

I’m curious as to why you don’t simply have swarm run Caddy, clustered, in one container per node. If they’re all pulling Caddyfile information from the Docker Caddyfile loader, they’ll all be dynamically updated with the correct container info, and any one of them can be used, so you can just add A records for every node to your DNS zone and be done with it.
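A sketch of that setup, assuming Docker swarm's global service mode (the image name and network are hypothetical placeholders for a Caddy image built with the Docker Caddyfile loader):

```
# Run exactly one Caddy container on every swarm node,
# publishing 443 directly on each host (bypassing the ingress mesh
# so each node answers on its own IP).
docker service create \
  --name caddy \
  --mode global \
  --publish mode=host,target=443,published=443 \
  --network caddy_net \
  your-caddy-with-docker-loader-image
```

With an A record per node, any node's IP then serves the same dynamically generated configuration.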

I guess because technically there's no real point, since for the port forwarding I can't actually use the name - I have to use a single IP address.

So regardless of a single instance or a swarm, I still have to choose one instance to point the port forwarding at. TBH I'll just stick with the instance on a Pi… I might have a play with GlusterFS or something to replicate the cert store, since it doesn't yet look like the TLS storage stuff is ready for prime time.

Caddy’s net server type can proxy arbitrary traffic - effectively “port forwarding” - to a DNS host, e.g. proxy :443 ingress.example.net where the ingress subdomain could return IPs of your swarm nodes.

Still gotta port forward to it through the router, presumably, but it makes the Caddy instance on the Pi a little simpler, and because it’s passing through traffic it’s not an additional proxy layer.
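The suggestion above might look roughly like this, assuming a Caddy v1 build that includes the `net` server type plugin (the subdomain is the example from the post; the exact syntax should be checked against the plugin's docs):

```
# Caddyfile for the net server type (run with: caddy -type=net)
# Forwards raw TCP arriving on :443 to whatever addresses
# the ingress subdomain currently resolves to.
proxy :443 ingress.example.net
```

Since it forwards at the TCP level, TLS terminates at the swarm nodes, not on the Pi.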

You can use a simple NFS mount from somewhere, or the Redis and Consul TLS storage plugins are available. I've seen a few people around here using Redis so far, from memory.
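For the NFS route, a minimal sketch, assuming Caddy v1's default asset location (`$CADDYPATH`, falling back to `~/.caddy`); the NFS server, export path, and mount point below are placeholders:

```
# Mount a shared export over Caddy's certificate store...
sudo mount -t nfs nas.local:/export/caddy /home/caddy/.caddy

# ...or point Caddy at a shared location explicitly before starting it:
export CADDYPATH=/mnt/shared/caddy
```

Both instances then read and write the same certificates, so whichever one is live can serve your sites without re-obtaining them.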