Internal CA - certificate renewal does not refresh intermediate cert properly

Continuing the discussion from Internal CA - automatic renewal of intermediate cert:

1. Caddy version (caddy version):

2.4.6

2. How I run Caddy:

a. System environment:

I’m trying to implement this here at my home: Caddy reverse proxy + Nextcloud + Collabora + Bitwarden_rs with local HTTPS

  • Docker on Raspi 4 (hostname: rowena) behind private DSL (ports 80 and 443 forwarded to the Pi, DynDNS etc. all working fine)
  • Caddy container to be used as reverse proxy and local CA
  • Nextcloud as backend service
  • Caddy and Nextcloud service connected via Docker network

Goal: TLS between the frontend (reverse proxy) and the backend (Nextcloud)

b. Command:

Please see original topic.

c. Service/unit/compose file:

Please see original topic.

d. My complete Caddyfile or JSON config:

Please see original topic.

3. The problem I’m having:

Sometimes (I haven't been able to pin down exactly when), during certificate renewal (the ACME challenge?), the Caddy CA hands out a fresh server certificate together with an outdated intermediate certificate.

When I try to connect to the backend service (Nextcloud) via the reverse proxy, I get an HTTP 502 Bad Gateway. The proxy cannot connect to the backend service.

4. Error messages and/or full log output:

In the logs of the reverse proxy (frontend) I find this:

2021-12-17T12:49:48.362131078Z {
    "level": "error",
    "ts": 1639745388.3616397,
    "logger": "http.log.error",
    "msg": "x509: certificate has expired or is not yet valid: current time 2021-12-17T12:49:48Z is after 2021-12-17T00:05:59Z",
    "request": {
        "remote_addr": "192.168.112.1:35064",
        "proto": "HTTP/1.1",
        "method": "GET",
        "host": "cloud.domain.net",
        "uri": "/status.php",
        "headers": {
            "Authorization": [
                "Basic cm9uOnNSWEJZLWJmREJxLUw5QzlyLThnRG5YLWpRRXlD"
            ],
            "User-Agent": [
                "Mozilla/5.0 (Macintosh) mirall/3.3.3git (build 7266) (Nextcloud, osx-19.6.0 ClientArchitecture: x86_64 OsArchitecture: x86_64)"
            ],
            "Accept": [
                "*/*"
            ],
            "Accept-Encoding": [
                "gzip, deflate"
            ],
            "X-Request-Id": [
                "f6c588b6-4810-4adc-a3e2-1dda773869ff"
            ],
            "Cookie": [
                "__Host-nc_sameSiteCookielax=true; __Host-nc_sameSiteCookiestrict=true; oc_sessionPassphrase=djT%2BHwbwuxMQNf2tZhxhSuHesJb645xba7gJ29fkX6oCBdoTH4THe9n%2FqEuYtnSOZPr1afF0ts5iofubGXZKgpPKc6Uh7FLf48mV9itKoMOk3oaSX8VyjDuQR3dLOLba; ocgsgfzlngt6=6cec6238a0db064eecfa0b01e6dd8198"
            ],
            "Connection": [
                "Keep-Alive"
            ],
            "Accept-Language": [
                "en-DE,*"
            ]
        },
        "tls": {
            "resumed": false,
            "version": 772,
            "cipher_suite": 4867,
            "proto": "",
            "proto_mutual": true,
            "server_name": "cloud.domain.net"
        }
    },
    "duration": 0.01587291,
    "status": 502,
    "err_id": "8gps69ehw",
    "err_trace": "reverseproxy.statusError (reverseproxy.go:886)"
}

In the logs of the backend service (Caddy serving static Nextcloud content and forwarding to PHP-FPM):

2021-12-17T12:52:54.834094820Z {
    "level": "debug",
    "ts": 1639745574.8334348,
    "logger": "tls.handshake",
    "msg": "choosing certificate",
    "identifier": "ncweb",
    "num_choices": 1
}
2021-12-17T12:52:54.834297486Z {
    "level": "debug",
    "ts": 1639745574.8335903,
    "logger": "tls.handshake",
    "msg": "default certificate selection results",
    "identifier": "ncweb",
    "subjects": [
        "ncweb"
    ],
    "managed": true,
    "issuer_key": "proxy-acme-local-directory",
    "hash": "bea83ae0fe0757f68dccd9b96d52e374932ffe8c1fa11464e303f4d0dd24b061"
}
2021-12-17T12:52:54.834363449Z {
    "level": "debug",
    "ts": 1639745574.833675,
    "logger": "tls.handshake",
    "msg": "matched certificate in cache",
    "subjects": [
        "ncweb"
    ],
    "managed": true,
    "expiration": 1639772631,
    "hash": "bea83ae0fe0757f68dccd9b96d52e374932ffe8c1fa11464e303f4d0dd24b061"
}
2021-12-17T12:52:54.843031261Z {
    "level": "debug",
    "ts": 1639745574.84263,
    "logger": "http.stdlib",
    "msg": "http: TLS handshake error from 192.168.112.2:36548: remote error: tls: bad certificate"
}

I checked the certificate on the backend Caddy:

/data/caddy # ls -lsa certificates/proxy-acme-local-directory/ncweb/
total 20
     4 drwx------    2 root     root          4096 Dec 10 00:23 .
     4 drwx------    3 root     root          4096 Dec 10 00:23 ..
     4 -rw-------    1 root     root          1384 Dec 17 08:23 ncweb.crt
     4 -rw-------    1 root     root           130 Dec 17 08:23 ncweb.json
     4 -rw-------    1 root     root           227 Dec 17 08:23 ncweb.key
/data/caddy # less certificates/proxy-acme-local-directory/ncweb/ncweb.crt

-----BEGIN CERTIFICATE-----
MIIB2zCCAYCgAwIBAgIRANVd1ul2Rv7MDtgvt3IJIcAwCgYIKoZIzj0EAwIwMzEx
MC8GA1UEAxMoQ2FkZHkgTG9jYWwgQXV0aG9yaXR5IC0gRUNDIEludGVybWVkaWF0
ZTAeFw0yMTEyMTcwODIyNTFaFw0yMTEyMTcyMDIzNTFaMAAwWTATBgcqhkjOPQIB
BggqhkjOPQMBBwNCAAR1kj/IKJwp2NeJNwwYf2xaxzFrOThXme71rinZi4DgpSLN
PocozRuv+OMAq+Cazfh6mNlrQgrEjl/9kv3eI/yao4GnMIGkMA4GA1UdDwEB/wQE
AwIHgDAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwHQYDVR0OBBYEFOH8
aNuL2O/TVS9aLunfrRCf4hulMB8GA1UdIwQYMBaAFOG9Kn54M1sWnsh5T9fztSAj
Kmz8MBMGA1UdEQEB/wQJMAeCBW5jd2ViMB4GDCsGAQQBgqRkxihAAQQOMAwCAQYE
BWxvY2FsBAAwCgYIKoZIzj0EAwIDSQAwRgIhAMSyE9GLKFFi7LBk3hrBjJ+nBUBY
Vbgmt+ZI0bSbTprgAiEAhv9MLm4cH5/Y/POic1VmquS1WXynP8ifPP4y5UYOdOM=
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIByDCCAW6gAwIBAgIRAIGYbQD5IZsV0lmMlMXo1B8wCgYIKoZIzj0EAwIwMDEu
MCwGA1UEAxMlQ2FkZHkgTG9jYWwgQXV0aG9yaXR5IC0gMjAyMSBFQ0MgUm9vdDAe
Fw0yMTEyMTAwMDA1NTlaFw0yMTEyMTcwMDA1NTlaMDMxMTAvBgNVBAMTKENhZGR5
IExvY2FsIEF1dGhvcml0eSAtIEVDQyBJbnRlcm1lZGlhdGUwWTATBgcqhkjOPQIB
BggqhkjOPQMBBwNCAAQH7ICO0FDnHJuemLDqFw2mC5VJh/kHmGc0khgoc9wJ3FpZ
mxTuWq45eWj2C5UaFYb0XZ9Gvm8ViSaG7Ypcxmyno2YwZDAOBgNVHQ8BAf8EBAMC
AQYwEgYDVR0TAQH/BAgwBgEB/wIBADAdBgNVHQ4EFgQU4b0qfngzWxaeyHlP1/O1
ICMqbPwwHwYDVR0jBBgwFoAUre9VgslnP8aqOZmSGci7xbdcAjQwCgYIKoZIzj0E
AwIDSAAwRQIgN6NqwBkqLJXxQfQVru2YJY0zepqMDBTAj+VTPsa/c2ICIQDw3XNX
8xVPzbNBFDI4mUoUgFw0qwqCKxbROMF3cQFpaw==
-----END CERTIFICATE-----

So apparently at 8:22 this morning the backend Caddy retrieved a new certificate from the frontend Caddy, but it did not receive the new intermediate certificate (the intermediate it got had already expired at the time of renewal). I suppose the ACME client normally receives the full certificate chain, which means the CA on the frontend Caddy must have delivered the chain with the outdated intermediate certificate.
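
For reference, the dates in a chain file like this can also be checked on the command line; this is just a sketch (it assumes openssl is available wherever you copy the file to):

# print subject and expiry of every certificate in the bundle
openssl crl2pkcs7 -nocrl -certfile ncweb.crt \
  | openssl pkcs7 -print_certs -noout -text \
  | grep -E 'Subject:|Not After'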

On the frontend Caddy (the CA), the intermediate certificate had already been renewed on 15 December:

# on frontend caddy

/data/caddy # ls -lsa pki/authorities/local/
total 24
     4 drwx------    2 root     root          4096 Oct  3 18:12 .
     4 drwx------    3 root     root          4096 Oct  3 18:12 ..
     4 -rw-------    1 root     root           676 Dec 15 14:39 intermediate.crt
     4 -rw-------    1 root     root           227 Dec 15 14:39 intermediate.key
     4 -rw-------    1 root     root           627 Oct  3 18:12 root.crt
     4 -rw-------    1 root     root           227 Oct  3 18:12 root.key

I don't know if there's anything else I could provide from my setup.

Strange. So it sounds like the acme_server directive still had the old intermediate cert in memory when performing renewal? We might be missing a mechanism to reload the certs used by the ACME server when the intermediate is renewed. I'll try to take a look at the code soon to see if anything stands out.

Exactly. That’s the impression I got.

Thanks for having a look at the code. :heart:

Hm, that is suspicious. Worth a look. Not many people are using that feature yet, so this is a good opportunity. Thanks!

Alright, as a workaround I recommend force-reloading Caddy at least once a week (daily is even better) for the time being, so that it has a chance to refresh the intermediate cert in memory before the ACME server tries to issue certs. You can do this with docker-compose exec -w /etc/caddy caddy caddy reload --force, I think. You can put this in a cron job or something.
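
A crontab entry along these lines might work (untested sketch; the /srv/caddy path and the caddy service name are assumptions about your compose setup, and -T avoids the TTY allocation that fails under cron):

# force-reload the frontend Caddy every night at 03:00 so a renewed
# intermediate is picked up before the ACME server issues new certs
0 3 * * * cd /srv/caddy && docker-compose exec -T -w /etc/caddy caddy caddy reload --force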

At a glance, it looks like the fix would be a bit complicated because of how the actual CA renewal process and the ACME server are decoupled. I have some work in progress to implement an event system in Caddy, and the event system would make this much easier to resolve (i.e. CA intermediate cert renewal would trigger an event, the ACME server could subscribe to that event and update the intermediate cert at that point).

The other option is that we could add a timer inside the ACME server to reload the intermediate cert from storage daily-ish, which might be good enough.

Thanks for the tip with the cron job. That seems like a nice workaround for the time being.

Thanks for looking into that, and @ronau, thanks for reporting this in detail.

I think I mentioned the same issue a while ago thinking I would get into it shortly after. Unfortunately a black hole appeared and it’s eating all my time!

My workaround is to kill the frontend Caddy (caddy stop does not work; it looks like it waits for all connections to close), stop the backend Caddy, and delete the certificate folder. Then I restart both Caddy instances and have new certificates issued.

Should I create a bug report on GitHub for this so that we can keep track of it?

Sure @ronau, an issue would help remind us.

The issue has been created:

FYI @ronau @Rob789, I’ve opened a PR to fix this issue once and for all:

If you could try out a build from this branch to confirm it does indeed fix the issue, that'd be awesome.

Let me know if you need help with getting/making a build.

@francislavoie, great news! I'd like to try it immediately.

If you don't mind giving me some pointers on how to go from the standard installation to the correct build branch, I'll get started right away.

It depends on your OS/arch and installation method.

If you're using linux/amd64 and running on your host machine, you can grab a build artifact from here.

If you're running in Docker, you can write a Dockerfile using the builder image variant and run xcaddy build pki-intermediate-fix to build from that branch; see the sketch after the next item.

If you're on another OS/arch and not using Docker, you can build from source with these instructions: Build from source — Caddy Documentation
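
For the Docker route, a Dockerfile roughly like this should work (a sketch only; the 2.4.6 image tags are an assumption based on the version you're currently running):

# build stage: compile Caddy from the PR branch
FROM caddy:2.4.6-builder AS builder
RUN xcaddy build pki-intermediate-fix

# final stage: swap the custom binary into the standard image
FROM caddy:2.4.6
COPY --from=builder /usr/bin/caddy /usr/bin/caddy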

Wow, that is great news. Thanks so much!!

I'm running a build from this branch now in front of my Nextcloud. I know, probably not the wisest thing to do. However, it was the quickest way for me to put it to the test, so I will definitely notice if something breaks after a while.

As for the issue, we have to wait for about a week, I guess, to see if it’s gone.

Yeah, exactly. More like 2/3 of a week, since that's when Caddy will renew the intermediate. You can check in daily-ish, starting ~4 days from now, to see if the certs were issued with the right intermediate, i.e. it should start issuing from the new one. You can check the expiry of the signing intermediate; your browser should let you inspect that (Firefox lets me, anyway).
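
If you'd rather check from the command line, the expiry of the currently active intermediate on the CA side can be read straight from storage, roughly like this (a sketch; it assumes openssl is available where you run it and uses the path shown earlier in this thread):

# inside (or against a copy of) the frontend CA's data volume
openssl x509 -in /data/caddy/pki/authorities/local/intermediate.crt -noout -subject -enddate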

Thanks! I got your build running.

At the time I replaced the official Caddy build, the certificates had expired. Once I started the new Caddy version, it renewed them.

Ok, so this morning I can see that the backend has received a new cert chain, again with an expired intermediate.

First of all, to make sure we're on the same page: I'm running the custom build from the PR only on the frontend Caddy (where the local CA is running). The backend Caddy (which requests the certificates from the frontend Caddy) is still running the official 2.4.6 build. However, I guess that makes sense; the issue happens on the frontend Caddy, right?

Frontend Caddy (with local CA): v2.5.0-beta.1.0.20220329213420-65f9064b2142 h1:J0tojrW5ZTYuiQB/THa8Lzjbx5T8WqZSCDEYhGC7QJ4=
Backend Caddy: v2.4.6 h1:HGkGICFGvyrodcqOOclHKfvJC0qTU7vny/7FhYp9hNw=

On the frontend/proxy side:

The custom build has been running since 2022-03-31 22:58 (UTC).

A few hours later the intermediate seems to have been renewed on the frontend side:

/data/caddy/pki/authorities/local # date +"%Z %z"
UTC +0000
/data/caddy/pki/authorities/local # ls -l
total 16
-rw-------    1 root     root           676 Apr  1 02:08 intermediate.crt
-rw-------    1 root     root           227 Apr  1 02:08 intermediate.key
-rw-------    1 root     root           627 Oct  3 18:12 root.crt
-rw-------    1 root     root           227 Oct  3 18:12 root.key

[Screenshot: details of the renewed intermediate certificate]

On the backend/nextcloud side:

New certificate chain fetched this morning at 05:41 UTC:

ncweb_1    | 2022-04-03T05:41:05.872059399Z {"level":"info","ts":1648964465.8717616,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"ncweb","challenge_type":"tls-alpn-01","ca":"https://proxy/acme/local/directory"}
ncweb_1    | 2022-04-03T05:41:05.886892228Z {"level":"info","ts":1648964465.8866944,"logger":"tls","msg":"served key authentication certificate","server_name":"ncweb","challenge":"tls-alpn-01","remote":"172.18.0.2:45398","distributed":false}
ncweb_1    | 2022-04-03T05:41:06.157146585Z {"level":"info","ts":1648964466.156908,"logger":"tls.issuance.acme.acme_client","msg":"validations succeeded; finalizing order","order":"https://proxy/acme/local/order/rMO9eYyq2IcyGPP9MunUOFXK1XdFo7eh"}
ncweb_1    | 2022-04-03T05:41:06.198324546Z {"level":"info","ts":1648964466.1980906,"logger":"tls.issuance.acme.acme_client","msg":"successfully downloaded available certificate chains","count":1,"first_url":"https://proxy/acme/local/certificate/GRPFLq1N0TY2yHaRRZ6z02FSD4ElSPkb"}
ncweb_1    | 2022-04-03T05:41:06.199836540Z {"level":"info","ts":1648964466.1996422,"logger":"tls.renew","msg":"certificate renewed successfully","identifier":"ncweb"}
ncweb_1    | 2022-04-03T05:41:06.200036095Z {"level":"info","ts":1648964466.1999474,"logger":"tls.renew","msg":"releasing lock","identifier":"ncweb"}
ncweb_1    | 2022-04-03T05:41:06.200654222Z {"level":"info","ts":1648964466.2005339,"logger":"tls","msg":"reloading managed certificate","identifiers":["ncweb"]}
ncweb_1    | 2022-04-03T05:41:06.202216753Z {"level":"warn","ts":1648964466.2020655,"logger":"tls","msg":"stapling OCSP","error":"no OCSP stapling for [ncweb]: no OCSP server specified in certificate"}
ncweb_1    | 2022-04-03T05:41:06.202372789Z {"level":"info","ts":1648964466.2022657,"logger":"tls.cache","msg":"replaced certificate in cache","subjects":["ncweb"],"new_expiration":1649007665}
/data/caddy/certificates/proxy-acme-local-directory/ncweb # ls -l
total 12
-rw-------    1 root     root          1380 Apr  3 05:41 ncweb.crt
-rw-------    1 root     root           130 Apr  3 05:41 ncweb.json
-rw-------    1 root     root           227 Apr  3 05:41 ncweb.key

Host cert looks fine:
[Screenshot: host certificate details]

But the intermediate cert in the acquired cert chain expired on 2 April (yesterday):
[Screenshot: details of the expired intermediate certificate]

So apparently the issue is still there.

Should I do something else in addition to just starting up the PR build? Maybe purge the certificate caches (i.e. the Docker volumes) or something like that?

I could give it another try: reboot both the frontend and the backend containers and wait a few days to see if it happens again.

I rebooted both containers (because I needed my Nextcloud to work again) and had to delete the certificates on the backend Caddy. After startup, it fetched a new certificate chain including the current (valid) intermediate. I'll check again in a few days to see whether a renewed intermediate cert has been handed to the backend Caddy by then.

:grimacing:

From what I can tell, you did everything right. I'm not sure why it didn't work :cry:

I guess I'll try to add a bunch of debug logs to the build throughout the acme_server stuff to try and figure out what's going on. In the meantime, yeah, I guess you'll need to reload the frontend, delete the cert(s) from the backend, and reload it (to redo issuance); a possible sequence is sketched below.
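
For completeness, that reset could look roughly like this (a sketch only; the proxy and ncweb service names and the storage path are taken from the listings above, so adjust them to your compose file):

# force the frontend (CA) to reload, refreshing the intermediate it has in memory
docker-compose exec -w /etc/caddy proxy caddy reload --force

# drop the backend's cached certificate chain so it requests a fresh one
docker-compose exec ncweb rm -rf /data/caddy/certificates/proxy-acme-local-directory
docker-compose restart ncweb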