Systematic failures on ultimately successful LE certificate retrieval

Note: this question is not about something that does not work, but about something that does, without my understanding why.

1. Caddy version (caddy version):

v2.1.1 h1:X9k1+ehZPYYrSqBvf/ocUgdLSRIuiNiMo7CvyGUQKeA=

2. How I run Caddy:

Docker container with official image. Relevant Caddyfile:

{
	admin 0.0.0.0:2015
	email m@mymail
	# acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}

(lan) {
	log {
		level ERROR
		format single_field common_log
	}
	@internet {
		not remote_ip 192.168.10.0/24 192.168.20.0/24 172.19.0.0/16
	}
	@local {
		remote_ip 192.168.10.0/24 192.168.20.0/24 172.19.0.0/16
	}
	reverse_proxy @local {args.0}
	respond @internet 200
}


https://nifi.my.domain {
	import lan nifi:8080
}

3. The problem I’m having:

When requesting a new certificate (after adding a site and restarting Caddy), I always get the same sequence, as shown in the log below:

2020/08/28 12:32:09 [INFO][nifi.my.domain] Obtain certificate; acquiring lock...
2020/08/28 12:32:09 [INFO][nifi.my.domain] Obtain: Lock acquired; proceeding...
2020/08/28 12:32:09 [INFO] [nifi.my.domain] acme: Obtaining bundled SAN certificate given a CSR
2020/08/28 12:32:09 [INFO][nifi.my.domain] Waiting on rate limiter...
2020/08/28 12:32:09 [INFO][nifi.my.domain] Done waiting
2020/08/28 12:32:11 [INFO] [nifi.my.domain] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/6827782498
2020/08/28 12:32:11 [INFO] [nifi.my.domain] acme: Could not find solver for: tls-alpn-01
2020/08/28 12:32:11 [INFO] [nifi.my.domain] acme: use http-01 solver
2020/08/28 12:32:11 [INFO] [nifi.my.domain] acme: Trying to solve HTTP-01
2020/08/28 12:32:26 [ERROR] error: one or more domains had a problem:
[nifi.my.domain] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://nifi.my.domain/.well-known/acme-challenge/N...redacted...r0: Timeout during connect (likely firewall problem), url:
 (challenge=http-01 remaining=[tls-alpn-01])
2020/08/28 12:32:29 [INFO][nifi.my.domain] Served key authentication certificate (TLS-ALPN challenge)
2020/08/28 12:32:30 [INFO][nifi.my.domain] Served key authentication certificate (TLS-ALPN challenge)
2020/08/28 12:32:30 [INFO][nifi.my.domain] Served key authentication certificate (TLS-ALPN challenge)
2020/08/28 12:32:30 [INFO][nifi.my.domain] Served key authentication certificate (TLS-ALPN challenge)
2020/08/28 12:32:33 [INFO] [nifi.my.domain] The server validated our request
2020/08/28 12:32:33 [INFO] [nifi.my.domain] acme: Validations succeeded; requesting certificates
2020/08/28 12:32:35 [INFO] [nifi.my.domain] Server responded with a certificate.
2020/08/28 12:32:35 [INFO][nifi.my.domain] Certificate obtained successfully
2020/08/28 12:32:35 [INFO][nifi.my.domain] Obtain: Releasing lock

It is always the same sequence, with the same error, and it is always followed by a successful retrieval.

I have a hard time understanding how the solver is chosen when I compare this sequence with the documentation at https://caddyserver.com/docs/automatic-https:

  • first "Could not find solver for: tls-alpn-01" ← I do not understand this one; it looks like there is nothing to configure (short of making port 443 available to LE)
  • then "use http-01 solver" → this one fails, which is OK since port 80 is not exposed (I do not know, though, how to disable that challenge so that it does not show up in the logs)
  • and then a series of successful TLS-ALPN messages, ending with an LE cert.

My questions:

  • what does "Could not find solver for: tls-alpn-01" mean?
  • how do I disable the HTTP challenge?

Yeah, this one’s kinda tricky to explain. But to preface: it’s completely harmless and expected on Caddy v2.1.1.

So, since forever ago, Caddy has used https://github.com/go-acme/lego as the library providing ACME support. This used to work great and it was a pretty symbiotic relationship, but eventually that project was more or less taken over by the Traefik development team, so as Caddy itself changed, lego didn’t stay as flexible or as configurable for Caddy’s needs.

One thing Caddy does to be more robust is to randomly pick one of the available ACME challenges (i.e. HTTP-01 or TLS-ALPN-01) so that it doesn’t depend on either one. This is pretty important for making Caddy work reliably without hammering the LE servers when one challenge type is down (which has happened in the past).

Unfortunately, the lego lib did not offer a good way to randomize the order in which the challenges are tried, so Caddy had to implement a hack to do its own picking: it picks one challenge, then disables the other one so that lego won’t pick it. I think lego’s default order was ALPN first, then HTTP. So if Caddy decided to use HTTP, lego would first look for the ALPN “solver”, but because Caddy had disabled it, lego “can’t find it”. Then it would move on to HTTP.

In Caddy v2.2+, we now use our own ACME lib that @matt wrote himself (with many parts borrowed from lego, of course). It is meant to be much more flexible about how Caddy actually wants to use it. It’s here: https://github.com/mholt/acmez. With this lib, that “could not find solver” message will never come up again.

As for disabling the HTTP challenge: unfortunately, I don’t think it’s possible to do that from the Caddyfile right now, but it is possible in the JSON config:

This might come as a feature in a later version, but no promises. Actually, @matt just told me that this will be configurable in the Caddyfile in v2.2! 🎉


Great news, thanks!
