Clustering Caddy


(Francislavoie) #21

Just giving my 2c: I read this thread a while back and decided that the best solution for me was to use Caddy as my load balancer and terminate TLS there, then set up my backend Caddy instances using self-signed certs, with insecure_skip_verify and a health check along with it. I filed a couple of bug reports and a PR for this use case, so it works now in Caddy. I don’t know if it totally fits your use case, but it might help? Maybe a single instance of Caddy as a reverse proxy isn’t fast enough for you; I don’t know.
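
For anyone who wants to check by hand what such a setup depends on, here is a minimal sketch of a health probe against backends that serve self-signed certs. The /health path, the backend URLs, and the CURL override variable are assumptions for illustration, not part of the setup described above; curl's --insecure plays the same role as insecure_skip_verify.

```shell
#!/bin/sh
# Sketch: probe backends that serve self-signed certificates.
# Assumptions: the URLs passed in and the CURL override are hypothetical.

CURL="${CURL:-curl}"   # override with a stub to test without a network

# Succeeds if the URL answers with a non-error HTTP status within 2 seconds.
# --insecure skips certificate verification; --fail maps HTTP >= 400 to failure.
backend_healthy() {
    "$CURL" --insecure --silent --fail --max-time 2 --output /dev/null "$1"
}

# Print the first healthy backend from the argument list; fail if none is.
first_healthy() {
    for url in "$@"; do
        if backend_healthy "$url"; then
            echo "$url"
            return 0
        fi
    done
    return 1
}
```

Calling `first_healthy https://backend1/health https://backend2/health` (hypothetical hosts) prints whichever backend answers first, which is roughly the decision the proxy makes for you.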


(Mark Hanford) #22

It’s not the performance so much, but rather the complexity. We have a high-throughput Cisco load balancer “on the internet”, which currently offloads our SSL, then talks to two Varnish Cache servers, which also make all sorts of fairly complex routing decisions based on business rules, and in turn talk to various pools of web servers.

LoadBalancer1 -> Cache1|Cache2 -> Web1|Web2|...|WebX|

I have added a Caddy server to the mix so that customers who use custom domains for our product can have SSL portals, which makes the flow now:

LoadBalancer1 -> Caddy1 -> Cache1|Cache2 -> Web1|Web2|...|WebX|

The problem with this is that now, when I need to do any sort of maintenance on the Caddy server, it takes out the entire environment. That is currently considered Not Ideal™, and I need to resolve that before really ramping up adoption.

Adding in more Caddy servers would make that an ever deeper proxying stack, and still have a single point of failure.

LoadBalancer1 -> Caddy1 -> Caddy2|Caddy3 -> Cache1|Cache2 -> Web1|Web2|...|WebX|

I don’t need load balancing, just some way of having a hot-standby Caddy instance that the Load Balancer can swap to when it detects an issue. I think the synch-the-certs method might get me just far enough. I could even just let the second Caddy node request more certs. I don’t think the occasional duplicate LetsEncrypt call will cause massive problems.


(Mark Hanford) #23

Hmm, I might be flogging a dead horse here at this point, but what happens if I copy all of the storage paths /opt/caddy/ssl/acme/ and /opt/caddy/ssl/ocsp/ from one Caddy server to another, and then restart that second Caddy?


(Mark Hanford) #24

Well it worked on an initial test.

caddy1 is live, so has certs in /opt/caddy/ssl
caddy2 is fresh and essentially unreachable, so nothing is in /opt/caddy/ssl yet

  • I hit https://www.example.com on caddy1, it all works as expected.
  • I simulate a failover
  • I hit https://www.example.com on caddy2, I get SSL errors, and caddy2 complains about an invalid certificate while fetching things from LetsEncrypt (as it should)

If I then sync* the files over from caddy1 to caddy2:

rsync -v -e ssh --perms --recursive /opt/caddy/ssl/acme/* caddy@caddy2:/opt/caddy/ssl/acme
rsync -v -e ssh --perms --recursive /opt/caddy/ssl/ocsp/* caddy@caddy2:/opt/caddy/ssl/ocsp

And then on caddy2 reload the config:

sudo pkill -USR1 caddy

I can then visit https://www.example.com and it works perfectly!

Just need to schedule the sync to happen regularly and I should be good to go live.


*I set up certificate-based auth for the caddy user, so the rsync and ssh commands can be issued without interaction.
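
The sync and reload steps above could be wrapped in a small script for scheduling. This is only a sketch under assumptions: the function names and the RSYNC/SSH override variables are made up for illustration, and the reload step assumes the caddy user is allowed to run sudo pkill on caddy2 without a password prompt.

```shell
#!/bin/sh
# Sketch: push Caddy's certificate storage from the primary to the standby.
# Assumptions: function names and the RSYNC/SSH overrides are illustrative.

RSYNC="${RSYNC:-rsync}"                  # override with a stub for a dry run
SSH="${SSH:-ssh}"
SSL_ROOT="${SSL_ROOT:-/opt/caddy/ssl}"   # storage path used in this thread
STANDBY="${STANDBY:-caddy@caddy2}"       # user@host of the standby node

# Sync one storage subdirectory (acme or ocsp) to the standby.
# The trailing slash copies the directory contents, including dotfiles,
# which a /* shell glob would skip.
sync_dir() {
    "$RSYNC" -v -e ssh --perms --recursive \
        "$SSL_ROOT/$1/" "$STANDBY:$SSL_ROOT/$1"
}

# Sync both directories; stop at the first failure.
sync_certs() {
    sync_dir acme && sync_dir ocsp
}

# Ask the standby Caddy to reload so it picks up the synced files.
# Assumption: passwordless sudo for pkill is configured on the standby.
reload_standby() {
    "$SSH" "$STANDBY" sudo pkill -USR1 caddy
}
```

With `sync_certs && reload_standby` added as the last line, a crontab entry on caddy1 such as `*/15 * * * * /usr/local/bin/sync-caddy-certs.sh` (hypothetical path and interval) would cover the regular schedule.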


(Matt Holt) #25

I suppose that will be fine if the failover is only temporarily running, but otherwise you’ll have 2 Caddy instances managing those certificates and OCSP staples, causing duplicate requests to the CA and OCSP responders.


(Mark Hanford) #26

I’ll consider it a failover server only, so it should only be in service during maintenance and failures.
Not sure yet how to stop it trying to maintain certs as they approach expiry. I could just block its access to LE domains I guess.


(Matt Holt) #27

The primary Caddy should renew the cert ~30 days out, so if your failover is less than 30 days, that one won’t need to do anything with them.


(Mark Hanford) #28

But if a cert turns 30, it’ll do that on both nodes, so I still need to prevent the backup node from trying to renew it at the same time as the primary does.
Ideally there would just be a setting I could use to tell the backup to renew at 32 days old, and the primary at the default 30. That would solve the problem too.
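
To see which certs have entered a given window on either node, something like the following sketch could help. It only reports; it does not change when Caddy renews. It relies on openssl's -checkend option, and the assumption that the synced storage holds PEM files ending in .crt is mine, so adjust the pattern to whatever is actually on disk.

```shell
#!/bin/sh
# Sketch: report certificates that are inside a renewal window.
# Assumption: certificates are PEM files named *.crt under the given directory.

# Convert a window in days to seconds.
window_seconds() {
    echo $(( $1 * 86400 ))
}

# Succeeds (exit 0) if the certificate in $1 expires within $2 days.
# openssl's -checkend exits 0 when the cert is still valid after N seconds,
# so the result is inverted here. An unreadable file also counts as expiring.
expires_within() {
    ! openssl x509 -checkend "$(window_seconds "$2")" -noout -in "$1" >/dev/null 2>&1
}

# Print every *.crt under directory $1 that expires within $2 days.
list_expiring() {
    find "$1" -name '*.crt' | while IFS= read -r cert; do
        if expires_within "$cert" "$2"; then
            echo "$cert"
        fi
    done
}
```

Running `list_expiring /opt/caddy/ssl/acme 30` on each node shows which certs both instances would consider due at the same moment.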


(Anders Norrback Bornholm) #29

Basically this keeps happening to us every time the certificates expire. All of our caddy nodes start renewing simultaneously and we hit the rate limit (7 domains x 3 nodes).

So did you find a way to make caddy not renew automatically?


(Matt Holt) #30

@osirisguitar You should upgrade to Caddy 0.10.12 (scroll down a bit) :wink:


(Anders Norrback Bornholm) #31

Ah, the problem was that we didn’t have a shared folder apparently (I didn’t do the setup). Thanks!
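
For anyone landing here with the same problem, a minimal sketch of starting each node against one shared folder follows. The mount point /mnt/caddy-shared and the Caddyfile path are hypothetical, and the sketch assumes a pre-2.0 Caddy that reads its storage location from the CADDYPATH environment variable; check your version's docs before relying on it.

```shell
#!/bin/sh
# Sketch: start Caddy with its storage on a folder shared by all nodes.
# Assumptions: /mnt/caddy-shared and the Caddyfile path are hypothetical.

SHARED_DIR="${SHARED_DIR:-/mnt/caddy-shared}"
CADDYFILE="${CADDYFILE:-/etc/caddy/Caddyfile}"

# Succeeds if $1 is an existing directory this user can write to.
storage_ready() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Refuse to start if the shared folder is missing, so a node never falls
# back to private local storage and starts requesting its own certs.
start_caddy() {
    if ! storage_ready "$SHARED_DIR"; then
        echo "shared storage $SHARED_DIR is not writable" >&2
        return 1
    fi
    # CADDYPATH is where Caddy (pre-2.0) keeps its acme and ocsp folders.
    export CADDYPATH="$SHARED_DIR"
    exec caddy -conf "$CADDYFILE"
}
```

The preflight check is the design point here: a node that silently starts without the shared mount would behave like the three independent nodes described above.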