Internal CA - certificate renewal does not refresh intermediate cert properly

Strange. So it sounds like the acme_server directive still had the old intermediate cert in memory when performing renewal? We might be missing something that reloads the certs used by the ACME server when the intermediate is renewed. I'll try to take a look at the code soon to see if anything stands out.

Exactly. That’s the impression I got.

Thanks for having a look at the code. :heart:

Hm, that is suspicious. Worth a look. Not many people are using that feature yet, so this is a good opportunity. Thanks!

Alright, well, as a workaround I recommend force-reloading Caddy at least once a week (daily is even better) for the time being, so that it has a chance to refresh the intermediate cert in memory before the ACME server tries to issue certs. You can do this with docker-compose exec -w /etc/caddy caddy caddy reload --force I think. You can put this in a cron job or something.
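If it helps, a crontab entry for that could look roughly like this (just a sketch; it assumes your compose project lives in /path/to/project and that the Caddy service is named caddy, so adjust both to your setup; the -T flag stops docker-compose from trying to allocate a TTY, which cron doesn't provide):

# Force-reload Caddy nightly so it re-reads the (possibly renewed) intermediate from storage
0 3 * * * cd /path/to/project && docker-compose exec -T -w /etc/caddy caddy caddy reload --force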

At a glance, it looks like the fix would be a bit complicated because of how the actual CA renewal process and the ACME server are decoupled. I have some work in progress to implement an event system in Caddy, and the event system would make this much easier to resolve (i.e. CA intermediate cert renewal would trigger an event, the ACME server could subscribe to that event and update the intermediate cert at that point).

The other option is we could add a timer inside the ACME server to reload the intermediate cert from storage daily-ish which might be good enough.

Thanks for the tip with the cron job. That seems like a nice workaround for the time being.

Thanks for looking into that, and @ronau, thanks for reporting this in detail.

I think I mentioned the same issue a while ago thinking I would get into it shortly after. Unfortunately a black hole appeared and it’s eating all my time!

My workaround is to kill the front-end Caddy (caddy stop does not work, as it looks like it is waiting for all connections to close), stop the back-end Caddy, and delete the certificate folder. Then I restart both Caddy instances and have new certificates issued.

Should I create a bug on GitHub for this so that we can keep track of it?

Sure @ronau, an issue would help remind us.

Issue has been created:


FYI @ronau @Rob789, I’ve opened a PR to fix this issue once and for all:

If you could try out a build from this branch to confirm it does indeed fix it, that'd be awesome.

Let me know if you need help with getting/making a build.

@francislavoie, great news! I'd like to try it immediately.

If you don't mind giving me some pointers on how to go from the standard installation to the correct build branch, I'll get started right away.

Depends on your OS/arch and installation method.

If you’re using linux/amd64 and running on your host machine, then you can grab a build artifact from here

If running in Docker, then you can write a Dockerfile using the builder image variant and run xcaddy build pki-intermediate-fix to build from that branch (there's a Dockerfile sketch after these options)

If using another OS/arch and not running in Docker, you can build with these instructions: Build from source — Caddy Documentation
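For the Docker route, the Dockerfile could look roughly like this (a sketch following the usual builder-image pattern; pick whatever image tag matches what you're running now):

# Build Caddy from the pki-intermediate-fix branch instead of a tagged release
FROM caddy:2.4.6-builder AS builder
RUN xcaddy build pki-intermediate-fix

# Copy the custom binary into the standard runtime image
FROM caddy:2.4.6
COPY --from=builder /usr/bin/caddy /usr/bin/caddy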

Wow, that is great news. Thanks so much!!

I'm running a build from this branch now in front of my Nextcloud. I know, probably not the wisest thing to do. However, it was the quickest way for me to put it to the test. So I will definitely notice if something breaks after a while.

As for the issue, we have to wait for about a week, I guess, to see if it’s gone.

Yeah, exactly. More like 2/3 of a week, since that's when Caddy will renew the intermediate. You can check in daily-ish starting ~4 days from now to see if the certs were issued with the right intermediate, i.e. it should start issuing from the new one. You can check the expiry of the signing intermediate; your browser should let you check that (Firefox lets me, anyway).
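If you'd rather check from a shell than in the browser, something like this should work (a sketch; it assumes the default /data storage location used by the official Docker image, your-backend:443 is a placeholder for wherever the backend is reachable, and openssl needs to be available wherever you run it):

# On the front-end (CA) Caddy: when does the current signing intermediate expire?
openssl x509 -in /data/caddy/pki/authorities/local/intermediate.crt -noout -subject -dates

# Against the back-end: dump the chain it serves into cert1.pem (leaf), cert2.pem (intermediate), ...
openssl s_client -connect your-backend:443 -showcerts </dev/null 2>/dev/null \
  | awk '/-----BEGIN CERTIFICATE-----/{n++; c=1} c{print > ("cert" n ".pem")} /-----END CERTIFICATE-----/{c=0}'

# ... then check the intermediate's validity dates
openssl x509 -in cert2.pem -noout -subject -dates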

Thanks! Got your build running.

At the time I replaced the official Caddy build, the certificates were expired. Once I started the new Caddy version, it renewed them.

Ok, so this morning I can see that the backend has received a new cert chain, again with an expired intermediate.

First of all, to be on the same page: I’m running the custom build from the PR only on the frontend Caddy (where the local CA is running). The backend Caddy (which is requesting the certificates from the frontend Caddy) is still running the official 2.4.6 build. However, I guess that makes sense. The issue happens on the frontend Caddy, right?

Frontend Caddy (with local CA): v2.5.0-beta.1.0.20220329213420-65f9064b2142 h1:J0tojrW5ZTYuiQB/THa8Lzjbx5T8WqZSCDEYhGC7QJ4=
Backend Caddy: v2.4.6 h1:HGkGICFGvyrodcqOOclHKfvJC0qTU7vny/7FhYp9hNw=

On the frontend/proxy side:

The custom build has been running since: 2022-03-31 22:58 (UTC)

A few hours later the intermediate seems to have been renewed on the frontend side:

/data/caddy/pki/authorities/local # date +"%Z %z"
UTC +0000
/data/caddy/pki/authorities/local # ls -l
total 16
-rw-------    1 root     root           676 Apr  1 02:08 intermediate.crt
-rw-------    1 root     root           227 Apr  1 02:08 intermediate.key
-rw-------    1 root     root           627 Oct  3 18:12 root.crt
-rw-------    1 root     root           227 Oct  3 18:12 root.key

Screenshot 2022-04-03 at 11.30.47

On the backend/nextcloud side:

New certificate chain fetched this morning at 05:41 UTC:

ncweb_1    | 2022-04-03T05:41:05.872059399Z {"level":"info","ts":1648964465.8717616,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"ncweb","challenge_type":"tls-alpn-01","ca":"https://proxy/acme/local/directory"}
ncweb_1    | 2022-04-03T05:41:05.886892228Z {"level":"info","ts":1648964465.8866944,"logger":"tls","msg":"served key authentication certificate","server_name":"ncweb","challenge":"tls-alpn-01","remote":"172.18.0.2:45398","distributed":false}
ncweb_1    | 2022-04-03T05:41:06.157146585Z {"level":"info","ts":1648964466.156908,"logger":"tls.issuance.acme.acme_client","msg":"validations succeeded; finalizing order","order":"https://proxy/acme/local/order/rMO9eYyq2IcyGPP9MunUOFXK1XdFo7eh"}
ncweb_1    | 2022-04-03T05:41:06.198324546Z {"level":"info","ts":1648964466.1980906,"logger":"tls.issuance.acme.acme_client","msg":"successfully downloaded available certificate chains","count":1,"first_url":"https://proxy/acme/local/certificate/GRPFLq1N0TY2yHaRRZ6z02FSD4ElSPkb"}
ncweb_1    | 2022-04-03T05:41:06.199836540Z {"level":"info","ts":1648964466.1996422,"logger":"tls.renew","msg":"certificate renewed successfully","identifier":"ncweb"}
ncweb_1    | 2022-04-03T05:41:06.200036095Z {"level":"info","ts":1648964466.1999474,"logger":"tls.renew","msg":"releasing lock","identifier":"ncweb"}
ncweb_1    | 2022-04-03T05:41:06.200654222Z {"level":"info","ts":1648964466.2005339,"logger":"tls","msg":"reloading managed certificate","identifiers":["ncweb"]}
ncweb_1    | 2022-04-03T05:41:06.202216753Z {"level":"warn","ts":1648964466.2020655,"logger":"tls","msg":"stapling OCSP","error":"no OCSP stapling for [ncweb]: no OCSP server specified in certificate"}
ncweb_1    | 2022-04-03T05:41:06.202372789Z {"level":"info","ts":1648964466.2022657,"logger":"tls.cache","msg":"replaced certificate in cache","subjects":["ncweb"],"new_expiration":1649007665}
/data/caddy/certificates/proxy-acme-local-directory/ncweb # ls -l
total 12
-rw-------    1 root     root          1380 Apr  3 05:41 ncweb.crt
-rw-------    1 root     root           130 Apr  3 05:41 ncweb.json
-rw-------    1 root     root           227 Apr  3 05:41 ncweb.key

Host cert looks fine:
Screenshot 2022-04-03 at 11.38.19

But the intermediate cert in the acquired cert chain expired on 2 April (yesterday):
Screenshot 2022-04-03 at 11.39.41

So apparently the issue is still there.

Should I do something else in addition to just starting up the PR build? Maybe purge the certificate caches (i.e. the Docker volumes) or something like that?

I could give it another try: reboot both the frontend and the backend containers and wait a few days to see if it happens again.
I rebooted both containers (because I needed my Nextcloud to work again) and had to delete the certificates on the backend Caddy. After startup, it fetched a new certificate chain including the current (valid) intermediate. I'll check again in a few days to see whether a renewed intermediate cert has been handed over to the backend Caddy by then.

:grimacing:

From what I can tell, you did everything right. I'm not sure why it didn't work :cry:

I guess I'll try to add a bunch of debug logs to the build throughout the acme_server stuff to try and figure out what's going on. In the meantime, yeah, I guess you'll need to reload the frontend, delete the cert(s) from the backend, and reload it (to redo issuance).
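For the record, that could look something like this (a sketch; it assumes the front-end compose service is called proxy and the back-end ncweb, matching your logs above, plus the default /data storage location, so adjust to your service names, volumes, and paths):

# Force the front-end CA to reload so it picks up the current intermediate
docker-compose exec -w /etc/caddy proxy caddy reload --force

# Remove the cached cert(s) on the back-end and restart it so issuance happens again
docker-compose exec ncweb rm -rf /data/caddy/certificates/proxy-acme-local-directory
docker-compose restart ncweb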

Yep, that’s what I did.