Relates to: Process for handling retries for failed certificate generation
I wanted to provide more details after further testing, but I’m blocked from editing the original post, so apologies for the duplicate.
1. Output of caddy version:
v2.5.2
2. How I run Caddy:
brew
a. System environment:
mac
b. Command:
sudo caddy run
d. My complete Caddy config:
{
    acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
    admin :2020
}
alpha-four.davidbain.me {
    reverse_proxy 127.0.0.1:8090
    tls {
        dns route53 {
            aws_profile "default"
            max_retries 10
        }
        dns_challenge_override_domain _acme-challenge.davidbainchallenges.me
    }
    log {
        output file /var/log/caddy/access.log
    }
}
3. The problem I’m having:
I’m onboarding clients who each provide a CNAME record pointing into my DNS zone, so that I can pass the DNS challenge on their behalf. It looks something like this:
_acme-challenge.alpha-four.davidbain.me > _acme-challenge.davidbainchallenges.me
Then, using the route53 plugin, I create the challenge record on davidbainchallenges.me.
This works well, but I want to handle the scenario where the certificate fails to generate, e.g. because the client set up the CNAME record incorrectly.
In my app, I essentially want a button so the user can fix their issue and then click ‘Retry’, which would trigger Caddy to retry obtaining the certificate. Currently the only way I can do this is to stop the server and start it again. That triggers a new attempt at the DNS challenge, and the certificate is obtained successfully.
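To make the expected delegation concrete, here is a minimal Python sketch of a pre-check my app could run before asking Caddy to retry. It is only an illustration: the helper names (challenge_record_name, is_delegated) are hypothetical, and the actual CNAME lookup is passed in as a function because the resolver to use is an assumption left open here.

```python
from typing import Callable, Optional

# Target taken from dns_challenge_override_domain in the config above.
CHALLENGE_TARGET = "_acme-challenge.davidbainchallenges.me"


def normalize(name: str) -> str:
    """Lowercase a DNS name and drop surrounding whitespace and the trailing dot."""
    return name.strip().rstrip(".").lower()


def challenge_record_name(domain: str) -> str:
    """Return the record name the client must create for the DNS-01 challenge."""
    return "_acme-challenge." + normalize(domain)


def is_delegated(
    domain: str,
    lookup_cname: Callable[[str], Optional[str]],
    target: str = CHALLENGE_TARGET,
) -> bool:
    """Check that the client's challenge record is a CNAME to the expected target.

    lookup_cname is a hypothetical callable: it takes a record name and
    returns the CNAME target, or None when no CNAME exists.
    """
    found = lookup_cname(challenge_record_name(domain))
    return found is not None and normalize(found) == normalize(target)
```

With a real resolver plugged in as lookup_cname, the ‘Retry’ button could refuse to proceed until is_delegated returns True. That still leaves the question of how to make Caddy retry.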
Two questions:
- Is there any way to replicate this with the API (i.e. force a retry of obtaining the cert via the DNS challenge)?
- How long does Caddy wait before retrying, and is this something we can control? (The only retry-related option I see for route53 is max_retries, but I’m not sure about the interval.)
4. Error messages and/or full log output:
-- Challenge initially fails (on purpose to test scenario)
2022/08/11 08:26:11.863 ERROR tls.issuance.acme.acme_client cleaning up solver {"identifier": "alpha-four.davidbain.me", "challenge_type": "dns-01", "error": "authorization failed: HTTP 403 urn:ietf:params:acme:error:unauthorized - No TXT record found at _acme-challenge.alpha-four.davidbain.me"}
-- After posting config reload, it shows the domain in the logs, but doesn't retry a dns challenge
2022/08/11 08:27:20.920 INFO http enabling automatic TLS certificate management {"domains": ["alpha-four.davidbain.me"]}
-- Trying caddy reload --config ... --force (again doesn't retry challenge)
2022/08/11 08:37:33.117 ERROR tls job failed {"error": "alpha-four.davidbain.me: obtaining certificate: unable to acquire lock 'issue_cert_alpha-four.davidbain.me': context canceled"}
2022/08/11 08:37:33.118 INFO tls.obtain acquiring lock {"identifier": "alpha-four.davidbain.me"}
-- caddy stop and caddy run again (it works, retries the challenge and issues the cert)
2022/08/11 08:45:36.842 INFO tls.issuance.acme.acme_client trying to solve challenge {"identifier": "alpha-four.davidbain.me", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
5. What I already tried:
- Posting to the API’s /load endpoint with the Cache-Control header set to must-revalidate.
- Running caddy reload --config … --force (in practice I won’t be running commands, but in testing this seems to have the same effect as the API load).
- Reposting to /load without the domain included, then posting again with the domain included. This shows a context canceled error but still doesn’t retrigger the challenge the second time.
- Stopping and starting Caddy. This works, but it is a manual command-line process and would also create downtime.
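For reference, the /load attempt looks roughly like the Python sketch below. It only reproduces what I tried and, as described, does not retrigger the challenge. The function name build_reload_request is hypothetical, and the admin URL assumes the admin endpoint from my config (admin :2020) is reachable on localhost.

```python
import urllib.request

# Assumption: the admin endpoint from the config (admin :2020) is reachable locally.
ADMIN_URL = "http://localhost:2020"


def build_reload_request(
    caddyfile: str, admin_url: str = ADMIN_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST to the admin API's /load endpoint.

    Cache-Control: must-revalidate asks Caddy to reload even when the
    posted config is identical to the one already running.
    """
    return urllib.request.Request(
        url=admin_url.rstrip("/") + "/load",
        data=caddyfile.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/caddyfile",
            "Cache-Control": "must-revalidate",
        },
    )
```

Sending it is a matter of passing the result to urllib.request.urlopen with the contents of the Caddyfile. In my testing the reload is accepted, but no new DNS challenge is attempted.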