ACME auto-ssl suddenly stopped working

Caddyfile:

:80 {
  proxyprotocol x.x.x.x/16

  redir 301 {
    if {path} not /caddyping
    https://{host}{uri}
  }

  status 200 /caddyping
}

:443 {
  proxyprotocol x.x.x.x/16

  tls {
    ask http://ask-caddy-api/query
  }

  proxy / proxyservice:80 {
    transparent
    websocket
  }
}

env:

CADDY_CLUSTERING=redis
CADDY_CLUSTERING_REDIS_TLS=true
CADDY_CLUSTERING_REDIS_PORT=xxxx
CADDY_CLUSTERING_REDIS_HOST=xxxx
CADDY_CLUSTERING_REDIS_PASSWORD=xxxx

command:
caddy -email EMAIL -agree

Perfect, thanks! (sigh, you still redacted stuff, but in this case it fortunately has little to do with TLS – usually this gets in the way of our debugging though, please don’t redact in the future!)

CertMagic currently has a lock so that only 1 cert operation happens at a time (this will go away in the future, replaced instead by a rate limiter that allows bursts of requests but still has a hard limit over a sliding window), so that lock is acquired, and then…

The log lines [*] acme: Obtaining bundled SAN certificate are emitted directly by lego, here: lego/certificates.go at commit bc4b57accc090b9c61bde051c99fcb14e952f6e6 (go-acme/lego on GitHub).

It simply seems that that function is never returning. It might be worth filing a bug report with lego because it should probably emit more logs here to help us understand what is going on.
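
As an aside on the rate limiter mentioned above (bursts allowed, hard limit over a sliding window), the general idea can be sketched roughly as follows. This is only an illustration of the technique, not CertMagic's actual or planned implementation; the class name, parameters, and the limit of 3 operations per 10 seconds are all made up for the example.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_events` operations in any `window` seconds.

    Hypothetical helper for illustration; not part of CertMagic or lego.
    """

    def __init__(self, max_events, window):
        self.max_events = max_events
        self.window = window
        self._stamps = deque()  # timestamps of recently allowed operations

    def allow(self, now=None):
        """Return True (and record the event) if it fits within the limit."""
        if now is None:
            now = time.monotonic()
        # Forget events that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.max_events:
            self._stamps.append(now)
            return True
        return False


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_events=3, window=10)
    for t in (0, 1, 2, 5, 10, 11):
        print(t, limiter.allow(now=t))
```

Unlike a single lock, a burst of requests goes through immediately until the limit is reached; after that, each new request has to wait for an older one to leave the window.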

Yeah, some error logging would help.

I forgot to mention that I tried to restart the caddy instances as well (removing and creating new containers in docker, so completely fresh state), still the same.

I’ll file a bug report with lego.

Thank you!

Also, make sure you are testing using the staging endpoint (see the Let’s Encrypt “Staging Environment” docs) – otherwise you’ll definitely hit rate limits really fast if anything is wrong.

I’ll use the staging endpoint if I do any tests, thanks.

https://github.com/go-acme/lego/issues/1001

I’m getting logs now. For some reason Caddy doesn’t print all ACME error logs for me without -log stdout, so I turned that on.

[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
http: TLS handshake error from 127.0.0.1:59836: EOF
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 1/3; challenge=tls-alpn-01)
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 2/3; challenge=tls-alpn-01)
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 3/3; challenge=tls-alpn-01)
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 1/3; challenge=http-01)
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 2/3; challenge=http-01)
[INFO] [mydomain.com] acme: Obtaining bundled SAN certificate
[ERROR][mydomain.com] failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url: (attempt 3/3; challenge=http-01)
http: TLS handshake error from 152.115.135.58:55802: failed to obtain certificate: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url:
http: TLS handshake error from 3.83.121.68:48048: certificate for hostname '' not allowed; non-2xx status code 403 returned from http://ask-caddy-api/query

So it looks like it conflicts with the tls ask function?
Some of the logs above may not be about the certificate request; they could be mixed in with other requests.

No, it looks like that is working. It appears that lego is sending a malformed JWS. Almost certainly not a bug in Caddy…
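
For context on the last log line: a 403 for an empty hostname is the kind of answer an ask endpoint would be expected to give. A minimal sketch of such an endpoint is below. It assumes the hostname arrives in a `domain` query parameter and that any 2xx response means "allowed"; the allowlist and the names `ALLOWED_DOMAINS`, `ask_status`, and `AskHandler` are hypothetical, and this is not the code behind http://ask-caddy-api/query.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlsplit

# Hypothetical allowlist; a real service would likely query a database.
ALLOWED_DOMAINS = {"mydomain.com", "www.mydomain.com"}


def ask_status(query_string, allowed=ALLOWED_DOMAINS):
    """Return the HTTP status to send for an ask request's query string."""
    # Assumption: the hostname is passed as ?domain=<name>.
    domain = parse_qs(query_string).get("domain", [""])[0].strip().lower()
    # A missing or empty hostname is rejected, like any unknown domain.
    return 200 if domain in allowed else 403


class AskHandler(BaseHTTPRequestHandler):
    """Answers GET requests with 200 (issue a cert) or 403 (refuse)."""

    def do_GET(self):
        self.send_response(ask_status(urlsplit(self.path).query))
        self.end_headers()
```

With this sketch, `ask_status("domain=mydomain.com")` gives 200, while a request with no hostname (`ask_status("domain=")`) gives 403.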

Okay, so I should make an issue on the lego repo then?

Yeah, if you could that would be great. I’ll follow it and help them in debugging it if necessary.

Thank you!

https://github.com/go-acme/lego/issues/1006

I saw ldez’s answer. Is he referring to the user’s key?

If that’s the issue, can I just delete the key and have caddy automatically get a new one? Or should I do something else? Will it affect renewals of my currently fetched certificates?

A one-time account switch shouldn’t hurt anything, no. You should be able to delete the users folder (assuming there is just the one user – if not, delete the right user account folder inside that one) and Caddy will just register a new account. Don’t do this yet though, read the rest of my post. :)

Ideally, though, you could just generate a new key for the account, but I’m not sure if lego supports that and I’m waiting for ldez to get back to us about that. If the cause is key corruption, being able to simply use a new key would be the best solution.

Before you delete anything, can you post the contents of the account’s .json file here? And can you also verify that the .key file is a valid EC PRIVATE KEY in PEM format?

If it is, we should re-open that issue on GitHub.
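
If openssl isn't at hand, a rough structural check can be sketched like this. It only looks at the PEM armor, the base64 body, and the start of the DER structure (a SEQUENCE containing version 1 followed by an OCTET STRING, as laid out in RFC 5915); it does not verify the key cryptographically. The function name is made up for this example.

```python
import base64

BEGIN = "-----BEGIN EC PRIVATE KEY-----"
END = "-----END EC PRIVATE KEY-----"


def looks_like_ec_private_key(pem_text):
    """Structural check only: PEM armor, base64 body, ECPrivateKey prefix."""
    lines = [ln.strip() for ln in pem_text.strip().splitlines() if ln.strip()]
    if BEGIN not in lines or END not in lines:
        return False
    start, stop = lines.index(BEGIN), lines.index(END)
    if stop <= start + 1:
        return False  # no body between the armor lines
    try:
        der = base64.b64decode("".join(lines[start + 1:stop]), validate=True)
    except ValueError:
        return False  # body is not valid base64
    if len(der) < 2 or der[0] != 0x30:  # must start with a DER SEQUENCE
        return False
    if der[1] < 0x80:  # short-form length
        length, offset = der[1], 2
    else:  # long-form length: low 7 bits give the number of length bytes
        n = der[1] & 0x7F
        if n == 0 or len(der) < 2 + n:
            return False
        length, offset = int.from_bytes(der[2:2 + n], "big"), 2 + n
    body = der[offset:]
    # Expect INTEGER version = 1, then the OCTET STRING holding the key.
    return (len(body) == length
            and body[:3] == b"\x02\x01\x01"
            and body[3:4] == b"\x04")


if __name__ == "__main__":
    with open("key.pem") as f:  # path is an assumption; use your .key file
        print(looks_like_ec_private_key(f.read()))
```

A True result here only means the file is shaped like an EC private key; a full parse with openssl is still the stronger check.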

I only have one user.

I’ll see if I can get the json file and also verify the .key file.

Thanks!

I’m not too comfortable revealing the email address used. I hope that’s okay.

json:

{
	"Email": "mail@mydomain.com",
	"Registration": {
		"body": {
			"status": "valid",
			"contact": [
				"mailto:mail@mydomain.com"
			]
		},
		"uri": "https://acme-v02.api.letsencrypt.org/acme/acct/70549132"
	}
}

I’m a bit unsure how to validate the key. It has BEGIN EC PRIVATE KEY and END EC PRIVATE KEY with characters in the middle, which seems right.

If I run openssl ec -in key.pem -text -noout it seems to read it fine.

I’m a bit out of my depth here :)

That’s good enough for me, everything appears to be in order. I’ll reopen the issue and see if we can drill down further.

I switched the user key/json files between my test setup and prod setup. Now the test setup has the errors, and the prod setup works. So something is definitely wrong with the user files.

Interesting. Are both using the production Let’s Encrypt endpoint?

Yes. Both use production.

@mxrlkn As per the lego issue (JWS verification error, #1006 on go-acme/lego), do you want any tips for adding log lines and recompiling with those changes?

I don’t really have a good idea about packaging it all. I currently build Caddy with a few plugins into a Docker image and run that. I have no Go knowledge.

But shouldn’t the JWS always be logged with errors like this? Or would you rather do a custom build with logging for this specific incident?