SSL Errors csr_cn_is_invalid and "error while checking if stored certificate is also expiring soon"

1. The problem I’m having:

We run Caddy in a cluster with shared S3 storage. We’ve been doing this without any issues or interruptions for more than a year. Certificates are no longer renewing using the ZeroSSL API.

This looks similar to "ZeroSSL certificate query API error after upgrade to Caddy v2.8.4" (#10 by Antonin), but they ran into a 64-character limit on the ZeroSSL API side, which is not the issue we’re having.

It is also similar to "Zerossl issuance error after upgrade to Caddy v2.8.0" (#11 by whizzygeeks), but I haven’t seen much movement there.

2. Error messages and/or full log output:

{
    "dt": "2024-09-04T03:07:30.600294855+00:00",
    "log": {
        "attempt": 2,
        "elapsed": 64.623345006,
        "error": "[www.ginlab.io] Renew: creating certificate: POST https://api.zerossl.com/certificates?access_key=redacted: HTTP 200: API error 2836: csr_cn_is_invalid (details=map[]) (raw={\"success\":false,\"error\":{\"code\":2836,\"type\":\"csr_cn_is_invalid\"}} decode_error=json: unknown field \"success\")",
        "level": "error",
        "logger": "tls.renew",
        "max_duration": 2592000,
        "msg": "will retry",
        "retrying_in": 120,
        "ts": 1725419249.8697977
    }
}

3. Caddy version:

v2.8.4

4. How I installed and ran Caddy:

Using the package from ports (FreshPorts: www/caddy) and then replacing the binary with our own.

Xcaddy generated with:

a. System environment:

FreeBSD 14.0 and 14.1

b. Command:

See the package freebsd-ports/www/caddy/files/caddy.in at main · freebsd/freebsd-ports · GitHub

/usr/bin/su -m www -c /usr/bin/caddy start --config /usr/local/etc/caddy/Caddyfile --pidfile /var/run/caddy/caddy.pid >> /var/caddy/caddy.log

d. My complete Caddy config:

This is not my complete Caddyfile/caddy config as it spans dozens of nested files. Most importantly, it has not changed since the last successful renewal.

{
        order coraza_waf first 
        order cache before rewrite
        storage s3 {
                host "redacted"
                bucket "certs"
                access_id "redacted"
                secret_key "redacted"
                prefix "ssl"
                insecure false #disables SSL if true
        }
        email noc@skip2.net
        cert_issuer zerossl redacted
        log default {
                format json
                level info
                output file /var/log/caddy/caddy.log
        }
        cache {
                cache_name Souin
                log_level info
                key {
                        hide
                }
                redis {
                        url redacted
                }
                allowed_http_verbs GET POST PATCH
                ttl 10s
        }
        servers {
                trusted_proxies static redacted
                client_ip_headers X-Forwarded-For X-Real-IP
                metrics
        }
}
https://www.skip2.net {
    encode zstd br gzip
    log
    import default
    import cto
    import xfo SAMEORIGIN
    import ref same-origin
    import hsts "max-age=90; includeSubDomains"
    redir /whoami https://{system.hostname}.pop.skip2.net/whoami 301
    redir /dashboard https://{system.hostname}.pop.skip2.net/dashboard 301
    reverse_proxy /blog* cname.vercel-dns.com {
        header_up Host skip2.net
        import intercept-errors
    }
    reverse_proxy * skip2.netlify.app {
        header_up Host skip2.netlify.app
        import intercept-errors
    }
    import rm-thirdpty-headers
}

5. Links to relevant resources:

n/a

My certificates expired 2 hours ago, no idea when they stopped renewing.
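
For anyone who wants to catch this earlier: a minimal sketch of an external expiry check, using only the Python standard library. The function names and the idea of running it from cron are assumptions for illustration, not something from our setup. Note that with the default TLS context an already-expired certificate fails the handshake with a verification error, which is itself a useful alert.

```python
import socket
import ssl
import time


def seconds_until_expiry(not_after, now=None):
    """Seconds left before a certificate's notAfter time (negative if expired).

    not_after uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return expires - now


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host and return the served certificate's notAfter string.

    Raises ssl.SSLCertVerificationError if the certificate is already
    expired or otherwise invalid.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Something like `seconds_until_expiry(fetch_not_after("www.skip2.net")) < 14 * 86400` could then trigger a warning well before expiry; the 14-day threshold is arbitrary.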

I was able to mostly work around this by doing one or both of the following:

  • deleting the /var/db/caddy/data (equivalent of /var/lib/caddy on Linux) folder on every node in the cluster and running ‘caddy restart’ (‘caddy reload’ did not work).
  • creating a new empty shared S3 storage destination and using that instead

After doing both, certificates started renewing right away for most domains. A few are still failing, and I’m also seeing a new error in the logs:

{
    "dt": "2024-09-04T04:13:51.599758603+00:00",
    "log": {
        "error": "file does not exist",
        "identifiers": [
            "kord5001.pop.skip2.net"
        ],
        "level": "warn",
        "logger": "tls.cache.maintenance",
        "msg": "error while checking if stored certificate is also expiring soon",
        "ts": 1725423230.7948484
    }
}

Hmm, odd. I’ll look into it. I might have more questions soon.

Anything you need at all.
I can provide a backup from the /var/db/caddy folder and I still have the original shared certificate storage preserved.

Still not able to get that kord5001.pop.skip2.net certificate, same error.

Thank you.

That error is probably expected – is it still renewing those certs at least?

Curious about that one domain that isn’t working…

I’ll commit a debug log to CertMagic that should emit the contents of the CSR so we can see why the CN is invalid. You’ll have to build with the latest commit of CertMagic (let me know if you would like a sample command) and enable debug logs. They can be quite noisy so you’ll want to enable them for as long as you need then turn them off.

Thank you for your help with this, Matt.

Regarding "That error is probably expected – is it still renewing those certs at least?":

Which error? I assume "error": "file does not exist". Either way, both errors resulted in the certificate not being renewed. They would just get requeued for renewal.

Yesterday we ran really routine OS patches & package updates across the cluster and rebooted each node. The last 2 expired certificates that were hanging with the "error": "file does not exist" started renewing after this reboot.

At this point all certificates have been renewed but I’m gonna have nightmares about seeing csr_cn_is_invalid in the logs again. Of course, ZeroSSL isn’t able to troubleshoot without seeing the CSR or more logs from us.

I’m not sure how to replicate the issue to troubleshoot further since we’re not getting either error anymore. I set up alerts on the error so if it happens again I’ll know before the certs expire. Lemme know if I can do something else to help.
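
A minimal sketch of what such an alert could look like, assuming one JSON object per log line and the shape of the entries quoted earlier in this thread (either wrapped in a "log" object or plain Caddy JSON). The function name and the pattern list are hypothetical; how the result gets turned into a notification is left out.

```python
import json

# Substrings to alert on, taken from the two log entries quoted in this thread.
PATTERNS = (
    "csr_cn_is_invalid",
    "error while checking if stored certificate is also expiring soon",
)


def find_tls_alerts(lines, patterns=PATTERNS):
    """Return (logger, matched_pattern, text) for each matching log line."""
    alerts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(record, dict):
            continue
        # Our log shipper wraps entries in "log"; plain Caddy output does not.
        entry = record.get("log", record)
        if not isinstance(entry, dict):
            continue
        text = f'{entry.get("error", "")} {entry.get("msg", "")}'.strip()
        for pattern in patterns:
            if pattern in text:
                alerts.append((entry.get("logger", ""), pattern, text))
                break
    return alerts
```

Usage would be along the lines of `find_tls_alerts(open("/var/log/caddy/caddy.log"))`, run periodically over new log lines.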

Sounds good. Thanks for your patience!

If it happens again (after the next release), I’ll have a better sense of things since it will be in debug logs.
