Caddy as clustered load balancer - how's that gonna work?

Tenou · May 19, 2021, 7:32am

Hey there,

first of all my sincere apologies for deleting the beautiful help-template - it just didn’t make sense in this case. Hope you aren’t too sad

I’m looking forward to learning about the capabilities of caddy to act as a clustered load balancer - or in a cluster configuration in general, with redis as a shared storage backend.

I imagine, using caddy in a clustered configuration with a load balancer in front of it would be rather simple - same storage backend, config through the API, and external load balancing between caddy’s healthy nodes.

However, especially since the documentation lacks a bit in terms of these larger configurations, I struggle to understand how something like this would work when caddy itself is supposed to act as a clustered load balancer with multiple nodes.

Is there an implementation to add a virtual IP managed by caddy itself, like most major load balancers use, so that availability and a well-distributed workload across all caddy nodes are ensured at all times, or is that something that caddy is just not made to do?

Hope you all stay safe and have a great day!

francislavoie · May 19, 2021, 8:38am

Storage is clustered, but config is not.

That means that Caddy instances that share the same storage will be able to solve ACME challenges without contention because any of them can start, continue or complete the process.

If you plan to use the config API, then you need to make sure you send the updated config to all your Caddy instances yourself. Caddy doesn’t attempt to sync config at this time.
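
As a rough illustration of "send the updated config to all your Caddy instances yourself", here's a minimal Python sketch. It assumes each instance's admin API is reachable from wherever the script runs and uses the admin API's POST /load endpoint. The hostnames in the usage note are hypothetical, and the `send` parameter exists only so the function can be exercised without a network.

```python
import json
import urllib.request


def http_send(url, body):
    """POST a JSON body to url and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def push_config(config, admin_urls, send=http_send):
    """Send the same config to every Caddy admin endpoint.

    Returns a dict mapping each admin URL to True (accepted) or False.
    """
    body = json.dumps(config).encode("utf-8")
    results = {}
    for base in admin_urls:
        try:
            status = send(base.rstrip("/") + "/load", body)
            results[base] = 200 <= status < 300
        except OSError:
            # Unreachable or rejecting node: record the failure and keep going.
            results[base] = False
    return results
```

Usage would look something like `push_config(cfg, ["http://caddy-1.internal:2019", "http://caddy-2.internal:2019"])`. Keep in mind the admin endpoint listens on localhost:2019 by default, so reaching it from another machine means changing the listen address and protecting it properly.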

Since v2.4 though, you may configure Caddy to pull an initial config from an external source like an internal HTTP endpoint. There’s also an issue open discussing approaches to polling for updated configurations.
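
On the initial config pull mentioned above: a small Python sketch that generates such a bootstrap config might look like this. It assumes the `admin.config.load` section with the `http` loader module, as I understand it from the v2.4 JSON docs. The config URL and the `bootstrap_config` helper are made up for illustration.

```python
import json


def bootstrap_config(config_url, admin_listen="localhost:2019"):
    """Build a minimal bootstrap config telling Caddy where to pull its real config from."""
    return {
        "admin": {
            "listen": admin_listen,
            "config": {
                "load": {
                    # Assumption: the HTTP config loader introduced in v2.4.
                    "module": "http",
                    "url": config_url,
                }
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical internal endpoint that serves the full Caddy JSON config.
    print(json.dumps(bootstrap_config("http://config.internal/caddy.json"), indent=2))
```

Every instance could start with the same small file and fetch the shared config from one place, which keeps the nodes consistent at startup but does not by itself push later changes.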

Hopefully that answers your questions. If not, could you clarify?

Tenou · May 19, 2021, 9:37am

It certainly answers it to an extent, thank you!

I assume, when caddy doesn’t sync the config between multiple instances it won’t be able to share a virtual IP natively to load balance between multiple nodes either. So, when using caddy as a load balancer for multiple webservers, I’d still need to put a separate load balancer in front of the caddy cluster to distribute the load between the different instances.

This would introduce new challenges, like (as funny as it sounds) “unbalanced” load balancing by the different caddy nodes, since they don’t communicate with each other about which of the instances sends how much traffic to each webserver backend.

To sum it up (and please correct me if I’m wrong!), from what I could gather from the documentation and your comment, Caddy is a great load balancer when used as a single instance and supports clustered backend storage for certificates, so that multiple caddy instances can be utilized to gain maximum availability. But for a setup where a clustered load balancer is necessary for which a VIP is being managed by the load balancers themselves and all nodes are aware of each other, I’d still have to rely on a 3rd party solution.

francislavoie · May 19, 2021, 2:39pm

I’m not so sure what you mean by virtual IP. That sounds like something outside the context of Caddy though, and outside my expertise. I’ve never worked on systems of a scale where that topic has come up.

What I can say is that you could use DNS round robin or something to distribute the load across your Caddy instances.
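
To make the DNS round robin idea concrete: the name resolves to several addresses (one per Caddy instance) and the client ends up on one of them. Here's a hedged Python sketch of a client that walks the list and falls back to the next address when one is down. Real clients differ in whether and how they retry, so treat this as an illustration only. The `connect` parameter is there purely so it can be tried without a network.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IP address the name resolves to, in resolver order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in seen:
            seen.append(ip)
    return seen


def connect_first_reachable(addresses, port, connect=socket.create_connection):
    """Try each address in turn and return (ip, connection) for the first that answers."""
    for ip in addresses:
        try:
            return ip, connect((ip, port), 3)
        except OSError:
            continue  # this instance is down; try the next record
    raise ConnectionError("no address was reachable")
```

Note that nothing here knows how busy each instance is. The distribution is only as even as the resolver's ordering happens to make it.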

That’s probably accurate based on what you wrote.

Another thing to note: Caddy is an HTTP server, so it can’t proxy TCP/UDP transparently with the standard modules. The caddy-l4 plugin can, though.

Tenou · May 19, 2021, 3:32pm

Thank you very much, that answers all my questions!

Regarding the virtual IP: With larger scale load balancers, which are configured in clusters of multiple nodes, they usually share a virtual IP. But instead of having that virtual IP failover to another node in case the active one goes down - usually using an external service like pacemaker for this purpose - the clustered nodes are fully aware of it and “share” it.
You can think of that virtual IP like an entry door, behind which the (as shown in my drawing) three load balancers sit and “take” the traffic based on their individual load and availability.

It essentially ensures that you can point your DNS to a single (virtual) IP which is being shared by the whole cluster of load balancers and will always be available, even when some nodes fail.

This way, you gain maximum uptime and, since it’s implemented into the load balancers themselves, gain the best distribution over all nodes possible.

I hope I explained that at least somewhat comprehensibly, it’s been a long day. Thank you again for your time and thorough answers

francislavoie · May 19, 2021, 4:40pm

Yeah, makes sense.

I think that aspect is probably out of scope of Caddy. I think it would introduce a lot of complexity to introduce a system like that, so I don’t think we could justify having it built-in to Caddy’s standard distribution. And it would probably be too opinionated.

But there’s probably room for those sorts of features via plugins. I can’t say we’ll have the time anytime soon to work on that sort of stuff, nor do I think @matt or I have the expertise for it. If you have the resources to offer to work on that stuff, that’d be cool

Tenou · May 19, 2021, 5:44pm

As much as I’d love to, my speciality unfortunately is system integration and not software development. I wouldn’t consider this an urgent (or necessary) requirement either, it just would’ve been a nice to have. Maybe someday! Until then, I’ll stick with 3rd party load balancing and use caddy where it really shines - as a webserver and reverse proxy

matt · May 19, 2021, 6:04pm

I haven’t had time to read this thread in detail, but I fail to understand why Caddy can’t be used as a load balancer?

Tenou · May 20, 2021, 5:42am

Hey there - no worries, here’s the TLDR:

I wanted to use Caddy in an Active/Active/Active-Cluster config as a Load balancer and reverse proxy.
Since, besides sharing the TLS certificates, multiple caddy nodes won’t communicate with each other, some requirements for that setup aren’t met by caddy. For example, least-connection load balancing (since it would require all nodes to know how many connections they all originate to which backend) and a managed virtual IP through which the incoming connections would be balanced between the individual caddy nodes.

I realize now that caddy is not (yet?) capable of this, which is why I’ll have to rely on 3rd party load balancers and use caddy as a reverse proxy / webserver solely.

With round-robin load balancing, an active/passive cluster or in a single node configuration, caddy is obviously more than capable of acting as a great load balancer and all-in-one solution in general!

francislavoie · May 20, 2021, 9:00am

In case you missed it, Caddy does have a least_conn load balancing policy, but you’re right that multiple instances of Caddy won’t communicate this to each other.
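
A toy Python simulation of why per-instance least-connection counts can add up to an uneven global picture. This is not Caddy's implementation. The class and backend names are made up, and breaking ties by taking the first backend is a simplifying assumption that shows the worst case.

```python
class LocalLeastConn:
    """A balancer that only knows about the connections it opened itself."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        # Least connections by local count; ties go to the first backend listed.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend


def global_load(balancers):
    """Sum the per-backend connection counts across all balancers."""
    totals = {}
    for lb in balancers:
        for backend, count in lb.active.items():
            totals[backend] = totals.get(backend, 0) + count
    return totals


backends = ["web1", "web2", "web3"]
nodes = [LocalLeastConn(backends) for _ in range(3)]

# Each node receives one long-lived connection.
for node in nodes:
    node.pick()

print(global_load(nodes))  # {'web1': 3, 'web2': 0, 'web3': 0}
```

Each node believes it made a perfectly fair choice, yet all three connections land on web1. With shared counts the second and third picks would have gone to web2 and web3.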

Gorian · May 21, 2021, 6:31pm

I also plan to cluster Caddy, but I’m not sure why you would put another load balancer in front of Caddy instead of using Caddy as the load-balancer? You can share the configs either via NFS, or using the new dynamic configuration. HA can be achieved with either a VIP or BGP.

A VIP btw, is a Virtual IP - it’s an IP that isn’t assigned to a “physical” interface, but instead “floats” between nodes, and is managed by a service such as keepalived that is configured with rules and communicates with other keepalived instances on the other nodes to determine which server currently gets the IP. Then your NAT, firewall, or whatever, will just port forward port 443 to that IP rather than the assigned IP of any individual node.
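
To illustrate the "which server currently gets the IP" part, here's a heavily simplified Python sketch of a priority-based election, roughly in the spirit of what VRRP-style tools do. It is not keepalived's actual logic (which involves advertisements, timers and preemption settings), and the node names and priorities are invented.

```python
def vip_holder(nodes):
    """Return the name of the healthy node with the highest priority, or None."""
    healthy = [n for n in nodes if n["healthy"]]
    if not healthy:
        return None
    return max(healthy, key=lambda n: n["priority"])["name"]


nodes = [
    {"name": "caddy-1", "priority": 150, "healthy": True},
    {"name": "caddy-2", "priority": 100, "healthy": True},
    {"name": "caddy-3", "priority": 50, "healthy": True},
]

print(vip_holder(nodes))  # caddy-1

# caddy-1 fails its health check, so the VIP floats to the next node.
nodes[0]["healthy"] = False
print(vip_holder(nodes))  # caddy-2
```

Only one node holds the address at any moment, so this gives failover rather than load sharing.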

What I would really like to see is:

- the ability to use a database like mysql, redis, consul, etc. for storing the live config
- some central management plane for managing multiple nodes

If you use consul for service discovery and use consul SRV records for load balancing with caddy, you can have a very dynamic, clustered, load balancer.

Tenou · May 21, 2021, 8:30pm

> I also plan to cluster Caddy, but I’m not sure why you would put another load balancer in front of Caddy instead of using Caddy as the load-balancer? You can share the configs either via NFS, or using the new dynamic configuration. HA can be achieved with either a VIP or BGP.

Whilst that would work flawlessly for an HA configuration, it still would not provide an active/active cluster scenario where multiple nodes are active at the same time. It’ll always be limited to a single node - the one currently running the VIP.

I was, however, specifically looking for the latter, where n nodes would work simultaneously, sharing the same VIP, instead of assigning it to a specific node.

Gorian · May 21, 2021, 8:31pm

This is true. Any reason you NEED active-active?

As far as sharing an IP, the only solution I know of that will do that is BGP.

Gorian · May 21, 2021, 8:35pm

I suppose that if you just want to use a DNS round-robin, you can use a consul DNS service entry with health-checks, that might be adequate too - but it wouldn’t help in situations in which you want to port forward from a firewall to a specific IP address.

packeteer · May 25, 2021, 11:53am

DNS round robin is the way to go, and it’s the approach most modern load balancers use.

Gorian · May 26, 2021, 5:20pm

The problem, as stated before, is that you can’t exactly port forward from a NAT’ing router/firewall to a DNS record, it’s via IP, which is where BGP or something like keepalived would be required to provide HA. Once you have multiple sites port-forwarded to internally redundant Caddy instances, then you could use DNS round-robin to provide public HA on top of it.