Internal CA - certificate renewal does not refresh intermediate cert properly

Continuing the discussion from Internal CA - automatic renewal of intermediate cert:

1. Caddy version (caddy version):

2.4.6

2. How I run Caddy:

a. System environment:

I’m trying to implement this here at my home: Caddy reverse proxy + Nextcloud + Collabora + Bitwarden_rs with local HTTPS

  • Docker on Raspi 4 (hostname: rowena) behind private DSL (port 80 and 443 forwarded to Pi, DynDNS etc. all working fine)
  • Caddy container to be used as reverse proxy and local CA
  • Nextcloud as backend service
  • Caddy and Nextcloud service connected via Docker network

Goal: TLS between Frontend (reverse proxy) and backend (Nextcloud)

b. Command:

Please see original topic.

c. Service/unit/compose file:

Please see original topic.

d. My complete Caddyfile or JSON config:

Please see original topic.

3. The problem I’m having:

Sometimes (I can't tell how often), during certificate renewal (via ACME?), the Caddy CA hands out a fresh server certificate chained to an outdated intermediate certificate.

When I try to connect to the backend service (Nextcloud) via the reverse proxy, I get an HTTP 502 Bad Gateway: the proxy cannot connect to the backend service.

4. Error messages and/or full log output:

In the logs of the reverse proxy (frontend) I find this:

2021-12-17T12:49:48.362131078Z {
    "level": "error",
    "ts": 1639745388.3616397,
    "logger": "http.log.error",
    "msg": "x509: certificate has expired or is not yet valid: current time 2021-12-17T12:49:48Z is after 2021-12-17T00:05:59Z",
    "request": {
        "remote_addr": "192.168.112.1:35064",
        "proto": "HTTP/1.1",
        "method": "GET",
        "host": "cloud.domain.net",
        "uri": "/status.php",
        "headers": {
            "Authorization": [
                "Basic cm9uOnNSWEJZLWJmREJxLUw5QzlyLThnRG5YLWpRRXlD"
            ],
            "User-Agent": [
                "Mozilla/5.0 (Macintosh) mirall/3.3.3git (build 7266) (Nextcloud, osx-19.6.0 ClientArchitecture: x86_64 OsArchitecture: x86_64)"
            ],
            "Accept": [
                "*/*"
            ],
            "Accept-Encoding": [
                "gzip, deflate"
            ],
            "X-Request-Id": [
                "f6c588b6-4810-4adc-a3e2-1dda773869ff"
            ],
            "Cookie": [
                "__Host-nc_sameSiteCookielax=true; __Host-nc_sameSiteCookiestrict=true; oc_sessionPassphrase=djT%2BHwbwuxMQNf2tZhxhSuHesJb645xba7gJ29fkX6oCBdoTH4THe9n%2FqEuYtnSOZPr1afF0ts5iofubGXZKgpPKc6Uh7FLf48mV9itKoMOk3oaSX8VyjDuQR3dLOLba; ocgsgfzlngt6=6cec6238a0db064eecfa0b01e6dd8198"
            ],
            "Connection": [
                "Keep-Alive"
            ],
            "Accept-Language": [
                "en-DE,*"
            ]
        },
        "tls": {
            "resumed": false,
            "version": 772,
            "cipher_suite": 4867,
            "proto": "",
            "proto_mutual": true,
            "server_name": "cloud.domain.net"
        }
    },
    "duration": 0.01587291,
    "status": 502,
    "err_id": "8gps69ehw",
    "err_trace": "reverseproxy.statusError (reverseproxy.go:886)"
}

In the logs of the backend service (Caddy serving static Nextcloud content and forwarding to PHP-FPM):

2021-12-17T12:52:54.834094820Z {
    "level": "debug",
    "ts": 1639745574.8334348,
    "logger": "tls.handshake",
    "msg": "choosing certificate",
    "identifier": "ncweb",
    "num_choices": 1
}
2021-12-17T12:52:54.834297486Z {
    "level": "debug",
    "ts": 1639745574.8335903,
    "logger": "tls.handshake",
    "msg": "default certificate selection results",
    "identifier": "ncweb",
    "subjects": [
        "ncweb"
    ],
    "managed": true,
    "issuer_key": "proxy-acme-local-directory",
    "hash": "bea83ae0fe0757f68dccd9b96d52e374932ffe8c1fa11464e303f4d0dd24b061"
}
2021-12-17T12:52:54.834363449Z {
    "level": "debug",
    "ts": 1639745574.833675,
    "logger": "tls.handshake",
    "msg": "matched certificate in cache",
    "subjects": [
        "ncweb"
    ],
    "managed": true,
    "expiration": 1639772631,
    "hash": "bea83ae0fe0757f68dccd9b96d52e374932ffe8c1fa11464e303f4d0dd24b061"
}
2021-12-17T12:52:54.843031261Z {
    "level": "debug",
    "ts": 1639745574.84263,
    "logger": "http.stdlib",
    "msg": "http: TLS handshake error from 192.168.112.2:36548: remote error: tls: bad certificate"
}
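
As a quick sanity check on these logs, the epoch values can be converted to UTC. This is a minimal Python sketch that uses only the "ts" and "expiration" values copied from the entries above; the helper name utc is my own.

```python
from datetime import datetime, timezone


def utc(epoch_seconds: float) -> datetime:
    """Convert a Unix timestamp from the Caddy JSON logs to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Values copied from the log entries above.
proxy_error_ts = 1639745388.3616397   # "ts" of the 502 on the frontend proxy
leaf_expiration = 1639772631          # "expiration" of the cached ncweb cert

error_time = utc(proxy_error_ts)
leaf_not_after = utc(leaf_expiration)

print("502 logged at:    ", error_time.isoformat(timespec="seconds"))
print("leaf cert expires:", leaf_not_after.isoformat(timespec="seconds"))
print("leaf still valid: ", error_time < leaf_not_after)
```

The cached ncweb leaf certificate is valid until 20:23:51 UTC, well after the 502 at 12:49:48 UTC, so the expiry the proxy complains about (00:05:59Z) has to belong to a different certificate in the chain.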

I checked the certificate on the backend Caddy:

/data/caddy # ls -lsa certificates/proxy-acme-local-directory/ncweb/
total 20
     4 drwx------    2 root     root          4096 Dec 10 00:23 .
     4 drwx------    3 root     root          4096 Dec 10 00:23 ..
     4 -rw-------    1 root     root          1384 Dec 17 08:23 ncweb.crt
     4 -rw-------    1 root     root           130 Dec 17 08:23 ncweb.json
     4 -rw-------    1 root     root           227 Dec 17 08:23 ncweb.key
/data/caddy # less certificates/proxy-acme-local-directory/ncweb/ncweb.crt

-----BEGIN CERTIFICATE-----
MIIB2zCCAYCgAwIBAgIRANVd1ul2Rv7MDtgvt3IJIcAwCgYIKoZIzj0EAwIwMzEx
MC8GA1UEAxMoQ2FkZHkgTG9jYWwgQXV0aG9yaXR5IC0gRUNDIEludGVybWVkaWF0
ZTAeFw0yMTEyMTcwODIyNTFaFw0yMTEyMTcyMDIzNTFaMAAwWTATBgcqhkjOPQIB
BggqhkjOPQMBBwNCAAR1kj/IKJwp2NeJNwwYf2xaxzFrOThXme71rinZi4DgpSLN
PocozRuv+OMAq+Cazfh6mNlrQgrEjl/9kv3eI/yao4GnMIGkMA4GA1UdDwEB/wQE
AwIHgDAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwHQYDVR0OBBYEFOH8
aNuL2O/TVS9aLunfrRCf4hulMB8GA1UdIwQYMBaAFOG9Kn54M1sWnsh5T9fztSAj
Kmz8MBMGA1UdEQEB/wQJMAeCBW5jd2ViMB4GDCsGAQQBgqRkxihAAQQOMAwCAQYE
BWxvY2FsBAAwCgYIKoZIzj0EAwIDSQAwRgIhAMSyE9GLKFFi7LBk3hrBjJ+nBUBY
Vbgmt+ZI0bSbTprgAiEAhv9MLm4cH5/Y/POic1VmquS1WXynP8ifPP4y5UYOdOM=
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIByDCCAW6gAwIBAgIRAIGYbQD5IZsV0lmMlMXo1B8wCgYIKoZIzj0EAwIwMDEu
MCwGA1UEAxMlQ2FkZHkgTG9jYWwgQXV0aG9yaXR5IC0gMjAyMSBFQ0MgUm9vdDAe
Fw0yMTEyMTAwMDA1NTlaFw0yMTEyMTcwMDA1NTlaMDMxMTAvBgNVBAMTKENhZGR5
IExvY2FsIEF1dGhvcml0eSAtIEVDQyBJbnRlcm1lZGlhdGUwWTATBgcqhkjOPQIB
BggqhkjOPQMBBwNCAAQH7ICO0FDnHJuemLDqFw2mC5VJh/kHmGc0khgoc9wJ3FpZ
mxTuWq45eWj2C5UaFYb0XZ9Gvm8ViSaG7Ypcxmyno2YwZDAOBgNVHQ8BAf8EBAMC
AQYwEgYDVR0TAQH/BAgwBgEB/wIBADAdBgNVHQ4EFgQU4b0qfngzWxaeyHlP1/O1
ICMqbPwwHwYDVR0jBBgwFoAUre9VgslnP8aqOZmSGci7xbdcAjQwCgYIKoZIzj0E
AwIDSAAwRQIgN6NqwBkqLJXxQfQVru2YJY0zepqMDBTAj+VTPsa/c2ICIQDw3XNX
8xVPzbNBFDI4mUoUgFw0qwqCKxbROMF3cQFpaw==
-----END CERTIFICATE-----

So apparently at 8:22 this morning, the backend Caddy retrieved a new certificate from the frontend Caddy but did not receive the new intermediate certificate (the old one had already expired at the time of the renewal). I suppose that an ACME issuance normally returns the full cert chain, which means that the CA on the frontend Caddy must have delivered the chain with the outdated intermediate certificate.
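
To see which certificate in a chain file like this has expired, the validity dates can be read straight out of the PEM. The following is a rough Python sketch, not a proper X.509 implementation: the standard library has no certificate parser, so it walks just enough of the DER structure by hand to reach the validity field, and it does no signature or chain verification. All function names are my own; in practice openssl or a real X.509 library would be the usual tool.

```python
import base64
import re
from datetime import datetime, timezone

PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.S
)


def read_tlv(buf: bytes, pos: int):
    """Read one DER element at pos; return (tag, value_start, value_end)."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, pos, pos + length


def parse_time(tag: int, raw: bytes) -> datetime:
    """Decode UTCTime (tag 0x17) or GeneralizedTime (tag 0x18), both in 'Z' form."""
    fmt = "%y%m%d%H%M%SZ" if tag == 0x17 else "%Y%m%d%H%M%SZ"
    return datetime.strptime(raw.decode("ascii"), fmt).replace(tzinfo=timezone.utc)


def validity(der: bytes):
    """Return (not_before, not_after) of one DER-encoded certificate."""
    _, pos, _ = read_tlv(der, 0)        # enter Certificate SEQUENCE
    _, pos, _ = read_tlv(der, pos)      # enter tbsCertificate SEQUENCE
    tag, _, end = read_tlv(der, pos)
    if tag == 0xA0:                     # optional explicit [0] version field
        pos = end
    for _ in range(3):                  # skip serialNumber, signature, issuer
        _, _, pos = read_tlv(der, pos)
    _, pos, _ = read_tlv(der, pos)      # enter validity SEQUENCE
    tag1, s1, e1 = read_tlv(der, pos)   # notBefore
    tag2, s2, e2 = read_tlv(der, e1)    # notAfter
    return parse_time(tag1, der[s1:e1]), parse_time(tag2, der[s2:e2])


def chain_validity(pem_text: str):
    """Yield (index, not_before, not_after) for each certificate in a PEM bundle."""
    for i, match in enumerate(PEM_RE.finditer(pem_text)):
        der = base64.b64decode(match.group(1))  # whitespace/newlines are ignored
        yield (i, *validity(der))


def expired(pem_text: str, now: datetime):
    """Return the indexes of certificates that are not valid at `now`."""
    return [i for i, nb, na in chain_validity(pem_text) if not nb <= now <= na]
```

Called on the contents of ncweb.crt with the time of the 502, expired(...) flags index 1, the intermediate (valid from 2021-12-10 00:05:59 to 2021-12-17 00:05:59 UTC), while the leaf at index 0 (valid from 08:22:51 to 20:23:51 UTC on 2021-12-17) is fine.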

On the frontend caddy (CA), the intermediate certificate has been renewed on 15 December already:

# on frontend caddy

/data/caddy # ls -lsa pki/authorities/local/
total 24
     4 drwx------    2 root     root          4096 Oct  3 18:12 .
     4 drwx------    3 root     root          4096 Oct  3 18:12 ..
     4 -rw-------    1 root     root           676 Dec 15 14:39 intermediate.crt
     4 -rw-------    1 root     root           227 Dec 15 14:39 intermediate.key
     4 -rw-------    1 root     root           627 Oct  3 18:12 root.crt
     4 -rw-------    1 root     root           227 Oct  3 18:12 root.key

Don’t know if there’s anything else that I could provide from my setup.

Strange. So it sounds like the acme_server directive still had the old intermediate cert in memory when performing the renewal? There might be something missing to reload the certs used by the ACME server when the intermediate is renewed. I’ll try to take a look at the code soon to see if anything stands out.

Exactly. That’s the impression I got.

Thanks for having a look at the code. :heart:

Hm, that is suspicious. Worth a look. Not many people using that feature yet so this is a good opportunity. Thanks!

Alright, well, as a workaround I recommend force-reloading Caddy at least once a week (daily is even better) for the time being, so that it has a chance to refresh the intermediate cert in memory before the ACME server tries to issue certs. I think you can do this with docker-compose exec -w /etc/caddy caddy caddy reload --force. You can put this in a cron job or something.

At a glance, it looks like the fix would be a bit complicated because of how the actual CA renewal process and the ACME server are decoupled. I have some work in progress to implement an event system in Caddy, and the event system would make this much easier to resolve (i.e. CA intermediate cert renewal would trigger an event, the ACME server could subscribe to that event and update the intermediate cert at that point).

The other option is that we could add a timer inside the ACME server to reload the intermediate cert from storage daily-ish, which might be good enough.
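
Purely as an illustration of that second idea (this is not Caddy's code, which is written in Go), here is a small Python sketch of an age-based reload. It is a lazy variant rather than a background timer: the cached intermediate is re-read from storage on access once it is older than a maximum age. The class name, the load callback and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Optional


class ReloadingIntermediate:
    """Cache an intermediate cert and re-read it once the cached copy is too old."""

    def __init__(
        self,
        load: Callable[[], bytes],            # hypothetical: reads intermediate.crt from storage
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._max_age = max_age_seconds
        self._clock = clock
        self._cached: Optional[bytes] = None
        self._loaded_at = 0.0

    def get(self) -> bytes:
        """Return the cached cert, reloading it first if it is missing or stale."""
        now = self._clock()
        if self._cached is None or now - self._loaded_at >= self._max_age:
            self._cached = self._load()
            self._loaded_at = now
        return self._cached
```

The trade-off is visible in the sketch: after the intermediate is renewed on disk, issuance can keep using the old one for up to max_age_seconds, so the interval has to be comfortably shorter than the gap between renewal of the intermediate and its expiry. An event-based approach would not have that window.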

Thanks for the tip with the cron job. That seems like a nice workaround for the time being.

Thanks for looking into that and @ronau thanks for reporting this in detail.

I think I mentioned the same issue a while ago thinking I would get into it shortly after. Unfortunately a black hole appeared and it’s eating all my time!

My workaround is to kill the front-end Caddy (caddy stop does not work, as it seems to wait for all connections to close), stop the back-end Caddy, and delete the certificate folder. I then restart both Caddy instances and have new certificates issued.

Should I create a bug on Github for this so that we can keep track of it?

Sure @ronau, an issue would help remind us.

Issue has been created:
