Caddy removes expired cert resource and breaks auto renewal

1. Caddy version (caddy version):

v2.5.0 h1:eRHzZ4l3X6Ag3kUt8nj5IxATprhqKq/wToP7OHlXWA0=

2. How I run Caddy:

I run caddy start with a Caddyfile at the root of the project, and I access the site via https://localhost/

a. System environment:

Windows 10 21H2, 19044.1645

b. Command:

caddy start

c. Service/unit/compose file:


d. My complete Caddyfile or JSON config:

{
        admin localhost:3019
}

localhost {
        reverse_proxy 127.0.0.1:58806
}

3. The problem I’m having:

I gave Caddy a try last month, and at the time things were working fine. I think Caddy generated a cert that expired on 4/29. Today, 5/9, I tried to run it again. It looks like Caddy removed the localhost cert because it had already expired, then tried to renew it and reported an error that the key file can’t be found.

4. Error messages and/or full log output:

I used Caddy a few weeks back, and at the time the certs worked properly. On the most recent run, I saw the logs and error below.

What are the recommended steps to restore this setup to working order?

I had to remove a few lines from the log before I was allowed to post, mainly the Windows paths of related resources. For the full log, please refer to the original GitHub post. (Or: Gist )

2022/05/09 12:10:04.504 INFO    admin   admin endpoint started  {"address": "tcp/localhost:3019", "enforce_origin": false, "origins": ["//localhost:3019", "//[::1]:3019", "//127.0.0.1:3019"]}
2022/05/09 12:10:04.505 INFO    tls.cache.maintenance   started background certificate maintenance      {"cache": "0xc000024cb0"}
2022/05/09 12:10:04.506 INFO    http    server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS {"server_name": "srv0", "https_port": 443}
2022/05/09 12:10:04.506 INFO    http    enabling automatic HTTP->HTTPS redirects        {"server_name": "srv0"}
2022/05/09 12:10:04.562 INFO    pki.ca.local    root certificate is already trusted by system   {"path": "storage:pki/authorities/local/root.crt"}
2022/05/09 12:10:04.562 INFO    pki     intermediate expires soon; renewing     {"ca": "local", "time_remaining": -888682.5628871}
2022/05/09 12:10:04.568 INFO    pki     renewed intermediate    {"ca": "local", "new_expiration": "2022/05/16 12:10:04.000"}
2022/05/09 12:10:04.570 INFO    http    enabling automatic TLS certificate management   {"domains": ["localhost"]}
2022/05/09 12:10:04.595 WARN    tls     stapling OCSP   {"error": "no OCSP stapling for [localhost]: no OCSP server specified in certificate", "identifiers": ["localhost"]}
2022/05/09 12:10:04.598 INFO    admin.api       load complete
2022/05/09 08:10:04 [INFO] Certificate certificates/local/localhost/localhost.crt expired 370h41m40.6010362s ago; cleaning up
2022/05/09 08:10:04 [INFO] Deleting certificates/local/localhost/localhost.crt because resource expired
2022/05/09 08:10:04 [INFO] Deleting certificates/local/localhost/localhost.key because resource expired
2022/05/09 08:10:04 [INFO] Deleting certificates/local/localhost/localhost.json because resource expired
2022/05/09 08:10:04 [INFO] Deleting certificates/local/localhost because key is empty

Here the localhost key resources are being removed.

2022/05/09 12:10:04.603 INFO    tls     finished cleaning storage units
2022/05/09 12:10:04.605 INFO    admin   stopped previous server {"address": "tcp/localhost:2019"}
2022/05/09 12:10:04.608 INFO    tls.renew       acquiring lock  {"identifier": "localhost"}
2022/05/09 12:10:04.609 INFO    tls.renew       lock acquired   {"identifier": "localhost"}

Here is the error when it’s used in renewal.

5. What I already tried:

I tried running caddy untrust and caddy trust a few times, but it didn’t get me anywhere.

6. Links to relevant resources:

Interesting; that seems like a really rare edge case. Does it happen repeatedly / consistently? Or did it just happen the one time?

I have used it only twice. The first time, it created the cert. The second time, the cert had already expired, and that is when I saw the issue above. I’m not sure whether this is a rare case.

Any suggestions for what I should do next? What should I test or look for?

Is Caddy expected to run at least once more before the cert expires?

I think it was just a rare race condition, never seen it before. Basically storage cleanup happened at approximately the same time as trying to renew the resource that was getting cleaned up. Usually, very very expired resources are no longer used, so it’s safe to just clean them up. It’s very unlikely that an extremely-expired resource is being renewed at the exact same time as it’s being cleaned up, so I don’t think I bothered to implement locking there.
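To make the race described above concrete, here is a minimal Go sketch of one way cleanup and renewal could be serialized. It is not Caddy's or CertMagic's actual code: the store type, its methods, and the expired flag are all hypothetical stand-ins. The idea is that both operations take the same per-name lock, cleanup only deletes resources that are still expired, and renewal tolerates a missing key by creating a fresh one.

```go
package main

import (
	"fmt"
	"sync"
)

// resource is a hypothetical stand-in for a stored key/cert file.
type resource struct{ expired bool }

// store is a hypothetical in-memory stand-in for certificate storage.
type store struct {
	mu    sync.Mutex             // guards both maps
	locks map[string]*sync.Mutex // one lock per certificate name
	files map[string]resource
}

func newStore() *store {
	return &store{locks: map[string]*sync.Mutex{}, files: map[string]resource{}}
}

// lockFor returns the mutex for name, creating it on first use.
func (s *store) lockFor(name string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	l, ok := s.locks[name]
	if !ok {
		l = &sync.Mutex{}
		s.locks[name] = l
	}
	return l
}

// cleanup deletes the key for name, but only if it is still expired.
func (s *store) cleanup(name string) {
	l := s.lockFor(name)
	l.Lock()
	defer l.Unlock()
	s.mu.Lock()
	defer s.mu.Unlock()
	if r, ok := s.files[name+".key"]; ok && r.expired {
		delete(s.files, name+".key")
	}
}

// renew replaces the key for name with a fresh one. It reports "renewed"
// if an old key was present and "obtained" if cleanup had already removed it.
func (s *store) renew(name string) string {
	l := s.lockFor(name)
	l.Lock()
	defer l.Unlock()
	s.mu.Lock()
	defer s.mu.Unlock()
	_, existed := s.files[name+".key"]
	s.files[name+".key"] = resource{expired: false}
	if existed {
		return "renewed"
	}
	return "obtained"
}

// fresh reports whether a non-expired key exists for name.
func (s *store) fresh(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.files[name+".key"]
	return ok && !r.expired
}

func main() {
	s := newStore()
	s.files["localhost.key"] = resource{expired: true}

	var wg sync.WaitGroup
	var result string
	wg.Add(2)
	go func() { defer wg.Done(); s.cleanup("localhost") }()
	go func() { defer wg.Done(); result = s.renew("localhost") }()
	wg.Wait()

	// The order of the two goroutines is not deterministic, so result may be
	// either value, but a fresh key is present in both cases.
	fmt.Println(result, s.fresh("localhost"))
}
```

Whichever goroutine wins, the sketch ends with a usable key: if cleanup runs first, renewal finds nothing and starts over; if renewal runs first, cleanup sees a non-expired resource and leaves it alone. The trade-off is an extra lock acquisition on every cleanup pass, which may be why it was not considered worth it for such an unlikely overlap.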

Definitely let me know if you see this again!