Unable to start caddy with prod let's encrypt acme_ca -- `caching certificate: decoding certificate metadata: unexpected end of JSON input`

1. The problem I’m having:

My server ran out of disk space. I have since cleared space and now have several GB free. My certificates are currently valid.

I can’t get caddy to start cleanly.

If I set acme_ca in the global options to the Let’s Encrypt staging endpoint, caddy starts cleanly, stays up, and serves traffic. My users still can’t use the site, though, because the domain is enrolled in HSTS.

acme_ca https://acme-staging-v02.api.letsencrypt.org/directory

When I remove the acme_ca directive and let caddy use the default Let’s Encrypt prod ACME endpoint, the server fails to start with the error that follows.

2. Error messages and/or full log output:

Mar 04 20:05:48 h1.org.tld caddy[1119233]: Error: loading initial config: loading new config: http app module: start: finalizing automatic HTTPS: managing certificates for [www.org.tld org.tld lists.org.tld org.tld2 www.org.tld2]: automate: manage [www.org.tld org.tld lists.org.tld org.tld2 www.org.tld2]: www.org.tld: caching certificate: decoding certificate metadata: unexpected end of JSON input

3. Caddy version:

v2.8.4 h1:q3pe0wpBj1OcHFZ3n/1nl4V4bxBrYoSoab7rL9BMYNk=

4. How I installed and ran Caddy:

caddy_ansible

Run as systemd unit.

a. System environment:

Description:    Debian GNU/Linux 11 (bullseye)

b. Command:

c. Service/unit/compose file:

ExecStart="/usr/local/bin/caddy" run --environ --config "/etc/caddy/Caddyfile"
ExecReload="/usr/local/bin/caddy" reload --config "/etc/caddy/Caddyfile"

d. My complete Caddy config:

5. Links to relevant resources:

  1. Similar issue, same error messages
  • I have confirmed all paths are www-data r/w/x as appropriate.
  • All of the per-domain subdirectories of /etc/ssl/caddy contain .key, .crt, and complete, valid JSON files (syntactically valid and matching in keys).
  • I’ve confirmed the certs are still valid by inspecting them with openssl x509 -in ... -noout -text.
  2. Similar. Based on this, I moved and re-created the certificates directory ./acme in /etc/ssl/caddy/, ensuring the users data was still present but removing the certs. Same errors.

Separately, I am seeing these errors on the staging endpoint. I don’t think these errors indicate rate limiting from the staging endpoint, but it’s possible.

I’ve only replaced the tld/domain/subdomains with sed.

Mar 04 21:03:09 h1.org.tld caddy[1124301]: {"level":"error","ts":1741122189.1064963,"logger":"tls","msg":"failed updating renewal info from ACME CA","identifiers":["org.tld2"],"cert_hash":"8f0ca6aba21f95c0fe510f97957e65ec75094c01c4d1526e2be74181bbf33c70","ari_unique_id":"_EbRAUNfu3umPTBorhG64LxtydM.LDXXYo2kcTyiKn3nijwPmXmm","cert_expiry":1748835147,"issuer":"acme-staging-v02.api.letsencrypt.tld2-directory","error":"provisioning client: HTTP 0 urn:ietf:params:acme:error:serverInternal - The service is down for maintenance or had an internal error. Check https://letsencrypt.status.io/ for more details."}
Mar 04 21:03:09 h1.org.tld caddy[1124301]: {"level":"warn","ts":1741122189.106569,"logger":"tls","msg":"ARI window or selected renewal time changed","identifiers":["org.tld2"],"cert_hash":"8f0ca6aba21f95c0fe510f97957e65ec75094c01c4d1526e2be74181bbf33c70","ari_unique_id":"_EbRAUNfu3umPTBorhG64LxtydM.LDXXYo2kcTyiKn3nijwPmXmm","cert_expiry":1748835147,"prev_start":1746157917,"next_start":-6795364578.8713455,"prev_end":1746330717,"next_end":-6795364578.8713455,"prev_selected_time":1746267104,"next_selected_time":-6795364578.8713455,"explanation_url":""}
Mar 04 21:03:09 h1.org.tld caddy[1124301]: {"level":"error","ts":1741122189.1065896,"logger":"tls","msg":"updating ARI upon managing","error":"could not fully update ACME renewal info: either no issuer supporting ARI is configured for certificate, or all such failed (make sure the ACME CA that issued the certificate is configured)"}
Mar 04 21:03:09 h1.org.tld caddy[1124301]: {"level":"error","ts":1741122189.6950767,"logger":"tls","msg":"failed updating renewal info from ACME CA","identifiers":["www.org.tld2"],"cert_hash":"44c01b949539d41c0f06e4380ea222489eabef42a18b4a23530ec85382c3e33b","ari_unique_id":"_EbRAUNfu3umPTBorhG64LxtydM.LLDDvykgcKNW9kV_h21OsNvY","cert_expiry":1748835147,"issuer":"acme-staging-v02.api.letsencrypt.tld2-directory","error":"provisioning client: HTTP 0 urn:ietf:params:acme:error:serverInternal - The service is down for maintenance or had an internal error. Check https://letsencrypt.status.io/ for more details."}
Mar 04 21:03:09 h1.org.tld caddy[1124301]: {"level":"warn","ts":1741122189.698189,"logger":"tls","msg":"ARI window or selected renewal time changed","identifiers":["www.org.tld2"],"cert_hash":"44c01b949539d41c0f06e4380ea222489eabef42a18b4a23530ec85382c3e33b","ari_unique_id":"_EbRAUNfu3umPTBorhG64LxtydM.LLDDvykgcKNW9kV_h21OsNvY","cert_expiry":1748835147,"prev_start":1746157917,"next_start":-6795364578.8713455,"prev_end":1746330717,"next_end":-6795364578.8713455,"prev_selected_time":1746268173,"next_selected_time":-6795364578.8713455,"explanation_url":""}
Mar 04 21:03:09 h1.org.tld caddy[1124301]: {"level":"error","ts":1741122189.6982853,"logger":"tls","msg":"updating ARI upon managing","error":"could not fully update ACME renewal info: either no issuer supporting ARI is configured for certificate, or all such failed (make sure the ACME CA that issued the certificate is configured)"}
Mar 04 21:03:10 h1.org.tld caddy[1124301]: {"level":"error","ts":1741122190.2832773,"logger":"tls","msg":"failed updating renewal info from ACME CA","identifiers":["www.org.tld"],"cert_hash":"90dd25632472f62a21f37eed1b4fa3bb176af3311f6cba9e390776391659e581","ari_unique_id":"_EbRAUNfu3umPTBorhG64LxtydM.LM4FioBb6zvWYz5xh2TjW-HI","cert_expiry":1748835146,"issuer":"acme-staging-v02.api.letsencrypt.tld2-directory","error":"provisioning client: HTTP 0 urn:ietf:params:acme:error:serverInternal - The service is down for maintenance or had an internal error. Check https://letsencrypt.status.io/ for more details."}
Mar 04 21:03:10 h1.org.tld caddy[1124301]: {"level":"warn","ts":1741122190.2833793,"logger":"tls","msg":"ARI window or selected renewal time changed","identifiers":["www.org.tld"],"cert_hash":"90dd25632472f62a21f37eed1b4fa3bb176af3311f6cba9e390776391659e581","ari_unique_id":"_EbRAUNfu3umPTBorhG64LxtydM.LM4FioBb6zvWYz5xh2TjW-HI","cert_expiry":1748835146,"prev_start":1746157916,"next_start":-6795364578.8713455,"prev_end":1746330716,"next_end":-6795364578.8713455,"prev_selected_time":1746185606,"next_selected_time":-6795364578.8713455,"explanation_url":""}
Mar 04 21:03:10 h1.org.tld caddy[1124301]: {"level":"error","ts":1741122190.2834167,"logger":"tls","msg":"updating ARI upon managing","error":"could not fully update ACME renewal info: either no issuer supporting ARI is configured for certificate, or all such failed (make sure the ACME CA that issued the certificate is configured)"}
Mar 04 21:03:10 h1.org.tld caddy[1124301]: {"level":"error","ts":1741122190.8802137,"logger":"tls","msg":"failed updating renewal info from ACME CA","identifiers":["org.tld"],"cert_hash":"46e645fb5e032d4c776559960c25e5c59bdfd48ccc75acca2e176d1b2e4a0cd0","ari_unique_id":"_EbRAUNfu3umPTBorhG64LxtydM.LFX0hdkk__YoNKaLRkWQcfFH","cert_expiry":1748835208,"issuer":"acme-staging-v02.api.letsencrypt.tld2-directory","error":"provisioning client: HTTP 0 urn:ietf:params:acme:error:serverInternal - The service is down for maintenance or had an internal error. Check https://letsencrypt.status.io/ for more details."}
Mar 04 21:03:10 h1.org.tld caddy[1124301]: {"level":"warn","ts":1741122190.8824391,"logger":"tls","msg":"ARI window or selected renewal time changed","identifiers":["org.tld"],"cert_hash":"46e645fb5e032d4c776559960c25e5c59bdfd48ccc75acca2e176d1b2e4a0cd0","ari_unique_id":"_EbRAUNfu3umPTBorhG64LxtydM.LFX0hdkk__YoNKaLRkWQcfFH","cert_expiry":1748835208,"prev_start":1746157978,"next_start":-6795364578.8713455,"prev_end":1746330778,"next_end":-6795364578.8713455,"prev_selected_time":1746263942,"next_selected_time":-6795364578.8713455,"explanation_url":""}
Mar 04 21:03:10 h1.org.tld caddy[1124301]: {"level":"error","ts":1741122190.8828819,"logger":"tls","msg":"updating ARI upon managing","error":"could not fully update ACME renewal info: either no issuer supporting ARI is configured for certificate, or all such failed (make sure the ACME CA that issued the certificate is configured)"}
Mar 04 21:03:11 h1.org.tld caddy[1124301]: {"level":"error","ts":1741122191.4679193,"logger":"tls","msg":"failed updating renewal info from ACME CA","identifiers":["lists.org.tld"],"cert_hash":"c72a017bf7c474baf7949c9977a4bee13538b652bf708f9d6e49e7dccdeadbd7","ari_unique_id":"oXQaBm1Qt4YtSizBfrSNiElszRY.LOIU7-o0Vd20yu_IB1CbyfDT","cert_expiry":1748835147,"issuer":"acme-staging-v02.api.letsencrypt.tld2-directory","error":"provisioning client: HTTP 0 urn:ietf:params:acme:error:serverInternal - The service is down for maintenance or had an internal error. Check https://letsencrypt.status.io/ for more details."}
Mar 04 21:03:11 h1.org.tld caddy[1124301]: {"level":"warn","ts":1741122191.468003,"logger":"tls","msg":"ARI window or selected renewal time changed","identifiers":["lists.org.tld"],"cert_hash":"c72a017bf7c474baf7949c9977a4bee13538b652bf708f9d6e49e7dccdeadbd7","ari_unique_id":"oXQaBm1Qt4YtSizBfrSNiElszRY.LOIU7-o0Vd20yu_IB1CbyfDT","cert_expiry":1748835147,"prev_start":1746157917,"next_start":-6795364578.8713455,"prev_end":1746330717,"next_end":-6795364578.8713455,"prev_selected_time":1746317724,"next_selected_time":-6795364578.8713455,"explanation_url":""}
Mar 04 21:03:11 h1.org.tld caddy[1124301]: {"level":"error","ts":1741122191.4680257,"logger":"tls","msg":"updating ARI upon managing","error":"could not fully update ACME renewal info: either no issuer supporting ARI is configured for certificate, or all such failed (make sure the ACME CA that issued the certificate is configured)"}
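
For anyone reading log lines like these, here is a minimal sketch for summarizing them. It assumes each journald line ends with one JSON object, as in the output above. The function names are my own and not part of Caddy.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def parse_caddy_journal_line(line):
    """Return the JSON payload of a journald line, or None if it has none."""
    start = line.find("{")  # the syslog-style prefix contains no "{"
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def summarize(lines):
    """Count log entries by (level, msg), skipping lines without JSON."""
    counts = Counter()
    for line in lines:
        entry = parse_caddy_journal_line(line)
        if entry is not None:
            counts[(entry.get("level"), entry.get("msg"))] += 1
    return counts


def epoch_to_utc(ts):
    """Convert an epoch-seconds field such as cert_expiry to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
```

Feeding it the output of journalctl for the caddy unit, one line at a time, gives a quick count of how many entries each message produced. epoch_to_utc makes fields such as cert_expiry readable.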

Resolution

Despite the CADDYPATH (=/etc/ssl/caddy) configured in the caddy.service environment directives, it appears that the data path actually in use is under www-data’s configured home directory, at /home/caddy/.local/share/caddy/....

Resolved by removing the certificates in www-data’s home directory, under /home/caddy/.local/share/caddy, and allowing caddy to re-request them.

Report from debugging

I had trouble strace-ing this because I was running it under systemd.

I determined the core issue by running sudo caddy run ... (as root, instead of as the www-data user and group) and seeing that the default data directory was /root/.local/share/caddy/....
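
To show why the path moved with the user, here is a rough sketch of how I understand Caddy v2 picks its default data directory on Linux: $XDG_DATA_HOME/caddy if that variable is set, otherwise $HOME/.local/share/caddy. The function name is hypothetical. Raising an error when HOME is missing is my simplification, not what Caddy itself does.

```python
import posixpath


def default_caddy_data_dir(env):
    """Approximate Caddy v2's default data directory on Linux for a given environment.

    CADDYPATH is deliberately not consulted: it is a v1 setting.
    """
    xdg = env.get("XDG_DATA_HOME", "")
    if xdg:
        return posixpath.join(xdg, "caddy")
    home = env.get("HOME", "")
    if not home:
        # Assumption for this sketch only; Caddy has its own fallback.
        raise ValueError("neither XDG_DATA_HOME nor HOME is set")
    return posixpath.join(home, ".local", "share", "caddy")
```

With the environment of the www-data user from this thread (HOME=/home/caddy, CADDYPATH=/etc/ssl/caddy), this returns /home/caddy/.local/share/caddy. As root it returns /root/.local/share/caddy, which matches what I saw.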

Requests

I would request that

  • caddy report the path that it fails to read. It was needlessly confusing to debug this because the file path and configured caddypath in use were not reported on error, nor on startup.
  • It would have been helpful to have a --debug flag to add, e.g. caddy run --debug ...

Extra debug context

root@h1:~# grep www-data /etc/passwd
www-data:x:33:33:www-data:/home/caddy:/usr/sbin/nologin

caddy.service:

[Unit]
Description=Caddy HTTP/2 web server
Documentation=https://caddyserver.com/docs
After=network-online.target
Wants=network-online.target systemd-networkd-wait-online.service
StartLimitIntervalSec=86400
StartLimitBurst=2

[Service]
Restart=on-failure

User=www-data
Group=www-data

Environment=CADDYPATH=/etc/ssl/caddy

ExecStart="/usr/local/bin/caddy" run --environ --config "/etc/caddy/Caddyfile"
ExecReload="/usr/local/bin/caddy" reload --config "/etc/caddy/Caddyfile"


LimitNOFILE=1048576

PrivateTmp=true
PrivateDevices=true
ProtectHome=false
ProtectSystem=full
ReadWriteDirectories=/etc/ssl/caddy /var/log/caddy

CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

The linked topic reports using Caddy v1.0.4, while now we’re at v2.9.1. As mentioned on GitHub, CADDYPATH hasn’t been a thing since v2, which was released in 2020. Where did you get it from? The only mention of CADDYPATH in the docs is in the upgrade guide.

The fix for the reported error is to delete the corrupted certificate file. We discovered an edge case that could corrupt files in the file-system storage. This was addressed in 2.9.0 by making the writes atomic, so it should not happen again.
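
One way to locate the corrupted file is to walk the storage directory and try to parse every .json file. The sketch below does that. It assumes the corruption shows up as an empty or truncated metadata file, which is what "unexpected end of JSON input" suggests. The function name is made up for illustration, and it only reports problems, it deletes nothing.

```python
import json
from pathlib import Path


def find_corrupt_metadata(storage_root):
    """Return (path, reason) pairs for .json files that are empty or fail to parse."""
    problems = []
    for path in sorted(Path(storage_root).rglob("*.json")):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError) as exc:
            problems.append((path, f"unreadable: {exc}"))
            continue
        if not text.strip():
            # A zero-length file is a plausible result of a full disk.
            problems.append((path, "empty file"))
            continue
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((path, f"invalid JSON: {exc.msg}"))
    return problems
```

Run it as the same user Caddy runs as, against the data directory Caddy actually uses, for example find_corrupt_metadata("/home/caddy/.local/share/caddy") for the setup described in this thread.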


Just delete every certificate under the directory /var/lib/caddy/.local/share/caddy/certificates/acme-v02.api.letsencrypt.org-directory/<yoursite> and let Caddy re-issue the certificates. That will do the job.