Caddy won't renew certs (again)

Hi, about one month ago I opened an issue on GH regarding Caddy failing to renew my certs.

Then suddenly the problem went away before I managed to really understand the root cause. But as of a few days ago, the issue is back. Caddy is definitely failing to auto-renew the certs, for some reason related to DNS failure.
A few days ago I received an email from expiry@letsencrypt.org:

Your certificate (or certificates) for the names listed below will expire in x days (on x date). Please make sure to renew your certificate before then, or visitors to your web site will encounter errors.

In the Caddy logs I can see the related errors:

ERROR   tls.renew       could not get certificate from issuer   {"identifier": "mydomain.fr", "issuer": "acme-v02.api.letsencrypt.org-directory", "error": "[mydomain.fr] solving challenges: waiting for solver certmagic.solverWrapper to be ready: timed out waiting for record to fully propagate; verify DNS provider configuration is correct - last error: <nil> (order=https://acme-v02.api.letsencrypt.org/acme/order/1197323837/345101897055) (ca=https://acme-v02.api.letsencrypt.org/directory)"}
ERROR   tls.renew       will retry      {"error": "[mydomain.fr] Renew: [mydomain.fr] solving challenges: waiting for solver certmagic.solverWrapper to be ready: timed out waiting for record to fully propagate; verify DNS provider configuration is correct - last error: <nil> (order=https://acme-v02.api.letsencrypt.org/acme/order/1234/1234) (ca=https://acme-v02.api.letsencrypt.org/directory)", "attempt": 1, "retrying_in": 60, "elapsed": 123.673620015, "max_duration": 2592000}

### it even ended up being rate limited ###

ERROR   tls.renew       will retry      {"error": "[mydomain.fr] Renew: [mydomain.fr] creating new order: attempt 1: https://acme-v02.api.letsencrypt.org/acme/new-order: HTTP 429 urn:ietf:params:acme:error:rateLimited - too many failed authorizations (5) for \"mydomain.fr\" in the last 1h0m0s, retry after 2025-01-16 23:23:23 UTC: see https://letsencrypt.org/docs/rate-limits/#authorization-failures-per-hostname-per-account (ca=https://acme-v02.api.letsencrypt.org/directory)", "attempt": 4, "retrying_in": 300, "elapsed": 316.281999645, "max_duration": 2592000}

As a reminder, I’m running Caddy as a Docker container on a Linux host. There is nothing special about the host DNS setup: it just uses the classic raw /etc/resolv.conf with some nameserver. Same for the Docker DNS config; basically it’s whatever the default is.
The only DNS-related thing is that I also run and expose my own DNS server (blocky) in a Docker container on the same host, mapped to port 53 at the host level. But it is not the DNS server the host uses for DNS query resolution.

My Caddyfile (simplified)

{
	email me@me.me

	dynamic_dns {
		provider cloudflare {env.CLOUDFLARE_API_TOKEN}
		domains {
			mydomain.fr
		}
		check_interval 10m
		versions ipv4
		dynamic_domains
	}

	crowdsec {
		...
	}
	order crowdsec first
}

*.mydomain.fr, mydomain.fr {
	tls {
		dns cloudflare {env.CLOUDFLARE_API_TOKEN}
	}
	...
}

Any idea?

caddy v2.9.1 h1:OEYiZ7DbCzAWVb6TNEkjRcSCRGHVoZsJinoDR/n9oaY=
using Cloudflare with caddy-dns/cloudflare (the dns.providers.cloudflare module, which manages DNS zones and records)
also using mholt/caddy-dynamicdns (a Caddy app that keeps your DNS records (A/AAAA) pointed at itself)

Have you checked your certificates? You aren’t sharing the real domain, so we can’t check the certificate transparency log, but it’s very possible Caddy couldn’t renew the cert from Let’s Encrypt and got one from ZeroSSL instead.

The error is from Caddy saying it cannot see the DNS record propagated properly. In the DNS-01 challenge, Caddy sets a TXT record with some value, and it tells the CA to check that record for verification. However, for robustness, Caddy doesn’t tell the CA to check until Caddy itself can see the record by querying DNS for that specific TXT record. What’s happening is that Caddy doesn’t see the record, so it thinks DNS failed to propagate properly. It keeps checking for a certain duration, then gives up after a predefined (though configurable) timeout.

The default value of propagation_timeout is 2 minutes. You can increase that value. You can also configure propagation_delay to tell Caddy to wait before checking DNS for propagation; otherwise Caddy checks immediately.
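For illustration, here is a minimal sketch of how both options could look in the tls block of the Caddyfile posted above. The 30s and 10m values are assumptions picked for the example, not recommendations; adjust them to how long your DNS provider takes to publish the TXT record.

```
*.mydomain.fr, mydomain.fr {
	tls {
		dns cloudflare {env.CLOUDFLARE_API_TOKEN}
		# Wait this long before the first propagation check
		# (by default Caddy checks immediately). Example value.
		propagation_delay 30s
		# Keep checking for the TXT record for up to this long
		# before giving up (default: 2 minutes). Example value.
		propagation_timeout 10m
	}
}
```

Note that the two options do different things: propagation_delay only postpones the first check, while propagation_timeout is the one that controls how long Caddy keeps waiting before reporting "timed out waiting for record to fully propagate".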

This is more of a combination of DNS and network issues, though there has been an increase in these reports. I wonder if we should increase the default timeout or change the default delay.

Hi @Mohammed90, and thanks for the help.
So I increased the timeout duration to 10m (not sure what kind of value I’m supposed to use, so let me know) and even set the DNS resolvers used by the challenge:

	tls {
		dns cloudflare {env.CLOUDFLARE_API_TOKEN}
		resolvers 1.1.1.1
		propagation_delay 10m
	}

Then I reloaded Caddy to take the changes into account.

But after some time, the issue does not seem to be fixed, as I still see the errors in the logs:

2025/01/17 22:06:14     INFO    tls.renew       renewing certificate    {"identifier": "my.fr", "remaining": 918732.316684333}
2025/01/17 22:06:14     INFO    tls.issuance.acme       using ACME account      {"account_id": "https://acme-staging-v02.api.letsencrypt.org/acme/acct/1234", "account_contact": ["mailto:me@me.me"]}
2025/01/17 22:06:15     INFO    authorization finalized {"identifier": "my.fr", "authz_status": "valid"}
2025/01/17 22:06:15     INFO    validations succeeded; finalizing order {"order": "https://acme-staging-v02.api.letsencrypt.org/acme/order/12234/1234"}
2025/01/17 22:06:19     INFO    got renewal info        {"names": ["my.fr"], "window_start": "2025/03/17 21:27:15", "window_end": "2025/03/19 21:27:15", "selected_time": "2025/03/17 23:06:36", "recheck_after": "2025/01/18 04:06:19", "explanation_url": ""}
2025/01/17 22:06:19     INFO    got renewal info        {"names": ["my.fr"], "window_start": "2025/03/17 21:27:15", "window_end": "2025/03/19 21:27:15", "selected_time": "2025/03/18 22:10:39", "recheck_after": "2025/01/18 04:06:19", "explanation_url": ""}
2025/01/17 22:06:19     INFO    successfully downloaded available certificate chains    {"count": 2, "first_url": "https://acme-staging-v02.api.letsencrypt.org/acme/cert/1234"}
2025/01/17 22:06:19     INFO    tls.issuance.acme       waiting on internal rate limiter        {"identifiers": ["my.fr"], "ca": "https://acme-v02.api.letsencrypt.org/directory", "account": "me@me.me"}
2025/01/17 22:06:19     INFO    tls.issuance.acme       done waiting on internal rate limiter   {"identifiers": ["my.fr"], "ca": "https://acme-v02.api.letsencrypt.org/directory", "account": "me@me.me"}
2025/01/17 22:06:19     INFO    tls.issuance.acme       using ACME account      {"account_id": "https://acme-v02.api.letsencrypt.org/acme/acct/1234", "account_contact": ["mailto:me@me.me"]}
2025/01/17 22:06:20     INFO    trying to solve challenge       {"identifier": "my.fr", "challenge_type": "dns-01", "ca": "https://acme-v02.api.letsencrypt.org/directory"}
2025/01/17 22:08:07     INFO    dynamic_dns     Loaded dynamic domains  {"domains": ["*.my.fr", "my.fr"]}
2025/01/17 22:08:07     INFO    dynamic_dns     Adding dynamic domain   {"domain": "*"}
2025/01/17 22:08:07     INFO    dynamic_dns     Adding dynamic domain   {"domain": "@"}
2025/01/17 22:08:22     ERROR   tls.renew       could not get certificate from issuer   {"identifier": "my.fr", "issuer": "acme-v02.api.letsencrypt.org-directory", "error": "[my.fr] solving challenges: waiting for solver certmagic.solverWrapper to be ready: timed out waiting for record to fully propagate; verify DNS provider configuration is correct - last error: <nil> (order=https://acme-v02.api.letsencrypt.org/acme/order/1234/1234) (ca=https://acme-v02.api.letsencrypt.org/directory)"}
2025/01/17 22:08:22     INFO    tls.renew       releasing lock  {"identifier": "my.fr"}
2025/01/17 22:08:22     ERROR   tls     job failed      {"error": "[my.fr] [my.fr] Renew: [my.fr] solving challenges: waiting for solver certmagic.solverWrapper to be ready: timed out waiting for record to fully propagate; verify DNS provider configuration is correct - last error: <nil> (order=https://acme-v02.api.letsencrypt.org/acme/order/1234/1234) (ca=https://acme-v02.api.letsencrypt.org/directory)"}

Ah, finally it seems that it worked (after a few more unsuccessful attempts):

INFO    tls.renew       certificate renewed successfully