Force retry certificate fetch after dns challenge failed

davebain · August 11, 2022, 9:04am

Relates to: Process for handling retries for failed certificate generation
I wanted to add more details after further testing but I'm blocked from editing the original post, so apologies for the duplicate.

1. Output of `caddy version`:

v2.5.2

2. How I run Caddy:

brew

a. System environment:

mac

b. Command:

sudo caddy run

d. My complete Caddy config:

{
	acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
	admin :2020
}

alpha-four.davidbain.me {
	reverse_proxy 127.0.0.1:8090

	tls {
		dns route53 {
			aws_profile "default"
			max_retries 10
		}
		dns_challenge_override_domain _acme-challenge.davidbainchallenges.me
	}

	log {
		output file /var/log/caddy/access.log
	}
}

3. The problem I’m having:

I’m onboarding clients who provide a CNAME pointing to my DNS zone so that I can pass the DNS challenge. It looks something like this:
_acme-challenge.alpha-four.davidbain.me > _acme-challenge.davidbainchallenges.me
Then, using the route53 plugin, I create the challenge record on davidbainchallenges.me

This works well, but I want to catch the scenario where the certificate fails to generate, e.g. when the client sets up the CNAME record incorrectly.
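One way to catch a misconfigured CNAME before asking Caddy to retry is a pre-flight check in the app. The sketch below is not part of Caddy: `challenge_record` and `cname_delegates` are hypothetical helper names, and the observed CNAME target is assumed to come from whatever DNS resolver library the app already uses.

```python
ACME_PREFIX = "_acme-challenge."


def normalize(name):
    """Lower-case a DNS name and drop surrounding spaces and the trailing root dot."""
    return name.strip().rstrip(".").lower()


def challenge_record(domain):
    """Return the DNS-01 challenge record name for a domain.

    A wildcard such as *.example.com is validated at the base name,
    so the leading "*." is removed first.
    """
    name = normalize(domain)
    if name.startswith("*."):
        name = name[2:]
    return ACME_PREFIX + name


def cname_delegates(observed_target, expected_target):
    """True if the CNAME target the client configured matches the expected one."""
    return normalize(observed_target) == normalize(expected_target)
```

With the names from this post, the app would look up the CNAME for `challenge_record("alpha-four.davidbain.me")` and pass the answer to `cname_delegates` together with `_acme-challenge.davidbainchallenges.me`. Only a match would enable the ‘Retry’ button.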

In my app, I essentially want a button so that the user can fix their issue and then click ‘Retry’, which would trigger Caddy to retry getting the certificate. Currently the only way I can do this is to stop the server and then start it again. That triggers a new attempt at the DNS challenge, and the certificate is issued successfully.

Two questions:

Is there any way to replicate this with the API (i.e. force a retry of getting the cert via the DNS challenge)?
How long does Caddy wait before retrying, and is this something we can control? (The only config I see for route53 is max_retries, but I’m not sure about the interval.)

4. Error messages and/or full log output:

-- Challenge initially fails (on purpose to test scenario)
2022/08/11 08:26:11.863 ERROR   tls.issuance.acme.acme_client   cleaning up solver       {"identifier": "alpha-four.davidbain.me", "challenge_type": "dns-01", "error": "authorization failed: HTTP 403 urn:ietf:params:acme:error:unauthorized - No TXT record found at _acme-challenge.alpha-four.davidbain.me"}

-- After posting config reload, it shows the domain in the logs, but doesn't retry a dns challenge
2022/08/11 08:27:20.920 INFO    http    enabling automatic TLS certificate management {"domains": ["alpha-four.davidbain.me"]}

-- Trying caddy reload --config ... --force (again doesn't retry challenge)
2022/08/11 08:37:33.117 ERROR   tls     job failed      {"error": "alpha-four.davidbain.me: obtaining certificate: unable to acquire lock 'issue_cert_alpha-four.davidbain.me': context canceled"}
2022/08/11 08:37:33.118 INFO    tls.obtain      acquiring lock  {"identifier": "alpha-four.davidbain.me"}

-- caddy stop and caddy run again (it works, retries the challenge and issues the cert)
2022/08/11 08:45:36.842 INFO    tls.issuance.acme.acme_client   trying to solve challenge        {"identifier": "alpha-four.davidbain.me", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
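
To surface a failed challenge in the app, the log lines above could be scanned for the dns-01 error. This is a minimal sketch that assumes the console log format shown in this post (date, time, level, then a JSON object at the end of the line). The function names are hypothetical, and a production setup might prefer Caddy's JSON log output instead.

```python
import json


def parse_console_line(line):
    """Return (level, fields) for a console-format log line, or None.

    Only the level (third whitespace-separated token) and the trailing
    JSON object are extracted; the logger name and message are ignored.
    """
    brace = line.find("{")
    if brace == -1:
        return None
    head = line[:brace].split()
    if len(head) < 3:
        return None
    try:
        fields = json.loads(line[brace:])
    except json.JSONDecodeError:
        return None
    return head[2], fields


def dns_challenge_error(line, domain):
    """Return the error text if the line reports a failed dns-01 challenge
    for the given domain, otherwise None."""
    parsed = parse_console_line(line)
    if parsed is None:
        return None
    level, fields = parsed
    if (
        level == "ERROR"
        and fields.get("identifier") == domain
        and fields.get("challenge_type") == "dns-01"
    ):
        return fields.get("error")
    return None
```

Fed the first ERROR line above with the domain `alpha-four.davidbain.me`, `dns_challenge_error` returns the "No TXT record found" message, which the app could show next to the ‘Retry’ button. The "job failed" line has no `challenge_type` field, so it is ignored.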

5. What I already tried:

Posting to the API /load endpoint with the Cache-Control header set to must-revalidate.
Running caddy reload --config … --force (in practice I won’t be running commands, but in testing this seems to have the same effect as the API load).
Posting to /load without the domain, then posting again with the domain included. This shows a context canceled error but still doesn’t retrigger the challenge the second time.
Stopping and starting Caddy. This works, but it is a manual command-line process and would also cause downtime.
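
For reference, the forced reload from the first item can be sketched as follows. It assumes the admin endpoint is reachable at `http://localhost:2020` (from `admin :2020` in the config above) and that the config is sent as a Caddyfile. The function names are hypothetical. As described in this post, the request reloads the config but did not retrigger the failed DNS challenge.

```python
import urllib.request


def build_force_reload_request(caddyfile_text, admin="http://localhost:2020"):
    """Build a POST /load request that reloads even if the config is unchanged."""
    return urllib.request.Request(
        url=admin.rstrip("/") + "/load",
        data=caddyfile_text.encode("utf-8"),
        method="POST",
        headers={
            # Tell the admin API to run the body through the Caddyfile adapter.
            "Content-Type": "text/caddyfile",
            # Without this header an identical config is not reloaded.
            "Cache-Control": "must-revalidate",
        },
    )


def force_reload(caddyfile_text, admin="http://localhost:2020"):
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError if Caddy rejects the config.
    """
    request = build_force_reload_request(caddyfile_text, admin)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```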

Whitestrake · August 12, 2022, 12:09am

Sorry to leave your main questions to others, but I wonder if you couldn’t simply start a second Caddy to initiate challenges? As long as the TLS storage is the same, the two Caddy servers will coordinate as a cluster. You could start the second server explicitly to kick off challenges, watch its logs to check for success, and then add the configuration to the primary Caddy once the certificate has been issued.

francislavoie · August 12, 2022, 1:29am

Caddy itself should never stop trying to issue the cert (it gives up only after 30 days of failing, that state is kept only in memory, and reloading the config probably resets it). As I mentioned in your other topic, see the Automatic HTTPS page in the Caddy documentation.

I’m pretty sure max_retries is internal to the route53 plugin specifically. So if you’re having issues, I doubt they would be resolved by kicking Caddy; it’s probably something wrong with the plugin’s config or something like that.

system · September 10, 2022, 9:04am

This topic was automatically closed after 30 days. New replies are no longer allowed.