Introduction
There are three issues I’m trying to address here. If you think I should split them into separate posts (or move them to another category), let me know.
I’m using Kubernetes on GKE, which throws up some interesting problems with automated certificates, pod restarts, multiple Caddy front-ends managing certificates, etc.
1. Sharing Certificates
I assume that it’s best for all the servers to have the same certificates. I believe it’s not mandatory, but it would be odd (and inefficient on the client) not to do this.
2. Managing Certificates
This is easy enough: put /root/.caddy/acme on a shared disk/volume (a gcePersistentDisk on GKE). But then you have multiple Caddy servers each checking the age of the certificate and requesting a renewal when it gets close to its expiry date.
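Roughly what I mean, as a minimal sketch (the image, disk, and label names are placeholders; I’m assuming an image that keeps Caddy’s ACME data under /root/.caddy):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: caddy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: caddy
  template:
    metadata:
      labels:
        app: caddy
    spec:
      containers:
      - name: caddy
        image: abiosoft/caddy          # placeholder image
        volumeMounts:
        - name: caddy-storage
          mountPath: /root/.caddy      # Caddy's default certificate store
      volumes:
      - name: caddy-storage
        gcePersistentDisk:
          pdName: caddy-certs          # pre-created GCE disk (placeholder)
          fsType: ext4
```

(Caveat: a gcePersistentDisk can only be attached read-write to one node at a time, so replicas sharing it read-write all have to land on that node.)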
One way around this would be to have a single pod that is configured to automate certificate requests, and to configure all the others with a static entry pointing at those files (as in the documented syntax for using Caddy with your own certificate and key: `tls cert key`).
Of course, then the certificate-management pod will have to signal all the others to reload when a new certificate is installed… I haven’t worked that out yet.
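One possibility, with some big assumptions baked in: the follower pods pin the shared files with a static `tls /path/to/cert.pem /path/to/key.pem` line, and the certificate-management pod (or a Job like the sketch below) sends USR1 to the others, which Caddy treats as a graceful reload. The label selector, image, and RBAC permissions here are all made up, and I haven’t tested it:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: caddy-reload
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: reload
        image: bitnami/kubectl       # placeholder: any image with kubectl
        command: ["/bin/sh", "-c"]
        args:
        - |
          # Send USR1 (Caddy's graceful-reload signal) to PID 1 of every
          # pod labelled app=caddy. Needs RBAC permission to exec into pods.
          for p in $(kubectl get pods -l app=caddy -o name); do
            kubectl exec "$p" -- kill -USR1 1
          done
```

This assumes Caddy runs as PID 1 in each container and that a Caddyfile reload is enough to pick up the new certificate files.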
(There are other, more complicated solutions that would involve ConfigMaps, stdin, and Secrets, but I’ve not looked into them; they don’t solve any of the issues mentioned above.)
3. Restarts near renewal time
This is a known problem (not with Caddy per se, more with orchestration, pre-emption, and bad setups), and it’s not limited to Kubernetes.
If a server keeps restarting (e.g. in a crash loop) close to renewal time, it can run into Let’s Encrypt’s rate limits. How can one guard against this?
Backing up the current certificate would certainly be beneficial (so you don’t end up with no certificate whatsoever); then, if it hasn’t expired, it could at least be put back in place. The question is: can this functionality be placed within Caddy? (Should it? And how would one mark that certificate renewal can’t happen for a week?)
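As a starting point, an initContainer could snapshot the ACME directory on the shared disk before Caddy starts, so a pod that loses or corrupts its certificate can have it put back by hand rather than triggering a fresh issuance. Again, paths and names are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: caddy
spec:
  initContainers:
  - name: backup-certs
    image: busybox
    command: ["/bin/sh", "-c"]
    args:
    - |
      # Snapshot the ACME data if it exists. A real version should be
      # smarter about not clobbering a good backup with a broken state
      # during a crash loop.
      if [ -d /root/.caddy/acme ]; then
        rm -rf /root/.caddy/acme.bak
        cp -a /root/.caddy/acme /root/.caddy/acme.bak
      fi
    volumeMounts:
    - name: caddy-storage
      mountPath: /root/.caddy
  containers:
  - name: caddy
    image: abiosoft/caddy            # placeholder image
    volumeMounts:
    - name: caddy-storage
      mountPath: /root/.caddy
  volumes:
  - name: caddy-storage
    gcePersistentDisk:
      pdName: caddy-certs            # the shared disk from above
      fsType: ext4
```

This only preserves a known-good certificate for manual recovery; it doesn’t stop a crash-looping Caddy from re-attempting issuance, which is the part that actually hits the rate limit.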
How do other people solve this? And yes, I have hit this myself: I overloaded my Kubernetes cluster, which put the pod into a crash loop.