Perfect, thanks! (sigh, you still redacted stuff, but in this case it fortunately has little to do with TLS – usually this gets in the way of our debugging though, please don’t redact in the future!)
CertMagic currently holds a lock so that only one certificate operation happens at a time. This will go away in the future and be replaced by a rate limiter that allows bursts of requests but still enforces a hard limit over a sliding window. So that lock is acquired, and then…
It simply seems that that function never returns. It might be worth filing a bug report with lego, because it should probably emit more logs here to help us understand what is going on.
I forgot to mention that I also tried restarting the Caddy instances (removing and recreating the Docker containers, so a completely fresh state), but the result was still the same.
Also, make sure you are testing against the staging endpoint described in Let's Encrypt's "Staging Environment" documentation; otherwise you'll definitely hit rate limits very quickly if anything is wrong.
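The logs further down show requests going to acme-v02.api.letsencrypt.org, which is the production endpoint. For a custom Go build, keeping the directory URL configurable makes it easy to switch to staging while debugging. A minimal sketch: the two URLs are Let's Encrypt's ACME v2 directory endpoints, while the ACME_STAGING environment variable and the directoryURL helper are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

const (
	// Let's Encrypt ACME v2 directory endpoints.
	letsEncryptProduction = "https://acme-v02.api.letsencrypt.org/directory"
	letsEncryptStaging    = "https://acme-staging-v02.api.letsencrypt.org/directory"
)

// directoryURL picks the staging endpoint when staging is true. Staging
// issues untrusted test certificates but has much higher rate limits, so
// repeated failures while debugging don't exhaust the production limits.
func directoryURL(staging bool) string {
	if staging {
		return letsEncryptStaging
	}
	return letsEncryptProduction
}

func main() {
	// Hypothetical switch: ACME_STAGING=1 selects the staging CA.
	staging := os.Getenv("ACME_STAGING") == "1"
	fmt.Println("using ACME directory:", directoryURL(staging))
}
```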
I'm getting logs now. For some reason, Caddy doesn't print all ACME error logs for me unless I run it with -logs stdout, so I turned that on.
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
http: TLS handshake error from 127.0.0.1:59836: EOF
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 1/3; challenge=tls-alpn-01)
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 2/3; challenge=tls-alpn-01)
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 3/3; challenge=tls-alpn-01)
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 1/3; challenge=http-01)
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 2/3; challenge=http-01)
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 3/3; challenge=http-01)
http: TLS handshake error from 152.115.135.58:55802: failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url:
http: TLS handshake error from 3.83.121.68:48048: certificate for hostname '' not allowed; non-2xx status code 403 returned from http://ask-caddy-api/query
So it looks like it conflicts with the ask setting in the tls directive?
Some of the log lines above may not be related to the certificate request; they could be mixed in with other requests.
I saw ldez's answer. Is he referring to the user's key?
If that's the issue, can I just delete the key and have Caddy automatically get a new one, or should I do something else? Will it affect renewals of the certificates I've already obtained?
A one-time account switch shouldn’t hurt anything, no. You should be able to delete the users folder (assuming there is just the one user – if not, delete the right user account folder inside that one) and Caddy will just register a new account. Don’t do this yet though, read the rest of my post.
Ideally, though, you could just generate a new key for the account, but I’m not sure if lego supports that and I’m waiting for ldez to get back to us about that. If the cause is key corruption, being able to simply use a new key would be the best solution.
Before you delete anything, can you post the contents of the account’s .json file here? And can you also verify that the .key file is a valid EC PRIVATE KEY in PEM format?
I swapped the user .key/.json files between my test setup and my prod setup. Now the test setup has the errors and the prod setup works, so something is definitely wrong with the user files.
I'm not really sure how to package all of this. I currently build Caddy with a few plugins into a Docker image and run that. I have no Go knowledge.
But shouldn't the JWS always be logged with errors like this? Or would you rather do a custom build with extra logging for this specific incident?