Verify DNS provider configuration is correct - dns-01 challenge fails - cloudflare

Yeah, if you don’t provide resolvers, Caddy will check for /etc/resolv.conf. If it’s absent or has nothing, it falls back to 1.1.1.1, 1.0.0.1, 8.8.8.8, and 8.8.4.4.
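To make that fallback order concrete, here is a minimal sketch of the selection logic as described above. It is not certmagic's actual code: the function name `pick_resolvers` and its parameters are made up for illustration, and it takes the contents of /etc/resolv.conf as a string so it can be tried without touching the system.

```python
# Fallback list as stated in the post above.
DEFAULT_RESOLVERS = ["1.1.1.1", "1.0.0.1", "8.8.8.8", "8.8.4.4"]


def pick_resolvers(configured, resolv_conf_text):
    """Hypothetical helper: choose resolvers in the order described above.

    configured       -- resolvers set explicitly in the config (may be empty/None)
    resolv_conf_text -- contents of /etc/resolv.conf, or None if the file is absent
    """
    # 1. Explicitly configured resolvers win.
    if configured:
        return list(configured)

    # 2. Otherwise use the "nameserver" entries from resolv.conf, if any.
    nameservers = []
    for line in (resolv_conf_text or "").splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            nameservers.append(parts[1])

    # 3. File absent or without nameservers: fall back to the public defaults.
    return nameservers or list(DEFAULT_RESOLVERS)
```

For example, `pick_resolvers(None, "nameserver 10.0.0.1\n")` picks the local resolver, while `pick_resolvers(None, None)` falls through to the four public ones.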

Nope, I don’t believe it’s a permission thing. It’s actually why I asked about SELinux. It doesn’t need any other privileges.

I’ve soft-forked certmagic to add extra logging around the DNS propagation check and to exit that check early. You can build a custom Caddy that uses it with this command:

xcaddy build ff137d1 --with github.com/caddyserver/certmagic@master=github.com/mohammed90/certmagic@master --output caddy-custom-certmagic

You can see the changes I made here: “exit checkDNSPropagation early with additional logging” (mohammed90/certmagic@24f20f1 on GitHub). Can you run the custom build and check the logs?

Thanks… I built this, and I think I need to change my Caddyfile? I notice the build doesn’t include github.com/caddy-dns/cloudflare … I get a startup error saying it can’t find the dns.providers.cloudflare module.

I tried adding a --with for the cloudflare package to the build, but it errors out, saying it found a package in multiple modules. I assume your fork has some of the same things as the cloudflare provider package? Tbh I don’t think I understand enough!

Make sure to add your --with for cloudflare before --output (I think it matters)

This one works:

xcaddy build ff137d1 --with github.com/caddyserver/caddy/v2@v2.4.0=github.com/caddyserver/caddy/v2@v2.4.6 --with github.com/caddy-dns/cloudflare --with github.com/caddyserver/certmagic@master=github.com/mohammed90/certmagic@master --output caddy-custom-certmagic

Well… I built, moved to /usr/bin… restarted…

Feb 19 10:27:51 caddy caddy[2555]: {"level":"info","ts":1645266471.547268,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"zanj.cc","challenge_type":"dns-01","ca":"https://acme-v02.api.letsencrypt.org/directory"}
Feb 19 10:28:04 caddy caddy[2555]: {"level":"info","ts":1645266484.0333877,"logger":"tls.issuance.acme.acme_client","msg":"validations succeeded; finalizing order","order":"https://acme-v02.api.letsencrypt.org/acme/order/411599010/65192308520"}
Feb 19 10:28:05 caddy caddy[2555]: {"level":"info","ts":1645266485.11281,"logger":"tls.issuance.acme.acme_client","msg":"successfully downloaded available certificate chains","count":2,"first_url":"https://acme-v02.api.letsencrypt.org/acme/cert/0329cdfd4e779a609ae24c75a3f388d8a667"}
Feb 19 10:28:05 caddy caddy[2555]: {"level":"info","ts":1645266485.1133802,"logger":"tls.obtain","msg":"certificate obtained successfully","identifier":"zanj.cc"}
Feb 19 10:28:05 caddy caddy[2555]: {"level":"info","ts":1645266485.1134732,"logger":"tls.obtain","msg":"releasing lock","identifier":"zanj.cc"}

It worked!!!

Mohammed, that is brilliant, thank you so much! I have no idea what is different. I thought we were just looking for additional logs…! If there’s anything I can do with this to further troubleshoot (if this is a bug and it’s not just me…!), let me know!

Huh, well that means something between 2.4.6 and the latest code on the master branch is what fixed it. So 2.5.0 should work for you.

But yeah, it would be good to track down exactly what fixed it, because there’s a lot of changes since 2.4.6.

Would you be willing to do a bisect to track down which commit was the fix?

Here’s the list of all the commits between 2.4.6 and the one you tried today (top one is oldest)

So basically the process will be:

  1. Turn off Caddy

  2. Clear out Caddy’s storage

  3. Make sure Caddy is configured to use Let’s Encrypt’s ACME staging endpoint so you don’t accidentally run into rate limits

  4. Pick a commit roughly halfway through the list linked above, and use that commit hash to make a build of Caddy

  5. Test out that build:

    5a. If it works, go back to step 4, but try a commit that’s higher on the list (older), roughly halfway through the commits still in question

    5b. If it doesn’t work, go back to step 4, but try a commit that’s lower on the list (newer), roughly halfway through the commits still in question

Rinse & repeat until you find which commit was the difference between it working and not working.
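The steps above are a binary search over the commit list. A rough sketch of the bookkeeping, assuming the list is ordered oldest first, the newest commit is known to work, and every commit before the fix fails while every commit from the fix onward works. `build_works` is a hypothetical callback standing in for steps 1 to 5 (stop Caddy, clear storage, build that commit, test it); it is not part of any Caddy tooling.

```python
def find_fixing_commit(commits, build_works):
    """Return the oldest commit for which build_works(commit) is True.

    commits     -- commit hashes ordered oldest first (top of the list first)
    build_works -- hypothetical callback: build and test that commit by hand,
                   return True if the certificate was obtained
    Assumes the newest commit works and that builds flip from failing to
    working exactly once along the list.
    """
    lo, hi = 0, len(commits) - 1  # the fix lies somewhere in commits[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if build_works(commits[mid]):
            hi = mid        # works: the fix is this commit or an older one (5a)
        else:
            lo = mid + 1    # fails: the fix is a newer commit (5b)
    return commits[lo]
```

With a list of N commits this needs about log2(N) builds, so even a few hundred commits only take eight or nine rounds.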

Nope, I short-circuited the lookup as well. The logic used to look up the TXT record using the default resolver, then check for full propagation by looking up the TXT record on all the nameservers managing the domain name.
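As a rough illustration of that two-phase check (based only on the description above, not on the certmagic source), the sketch below takes the DNS lookups as injected functions so it runs without network access. All names here (`txt_propagated`, `lookup_default`, `list_nameservers`, `lookup_at`) are hypothetical.

```python
def txt_propagated(fqdn, expected, lookup_default, list_nameservers, lookup_at):
    """Hypothetical two-phase propagation check.

    lookup_default(fqdn)   -- TXT values as seen by the default resolver
    list_nameservers(fqdn) -- nameservers managing the domain
    lookup_at(ns, fqdn)    -- TXT values as seen by one specific nameserver
    """
    # Phase 1: the record must be visible through the default resolver.
    if expected not in lookup_default(fqdn):
        return False
    # Phase 2: the record must be visible on every managing nameserver.
    return all(expected in lookup_at(ns, fqdn) for ns in list_nameservers(fqdn))
```

Short-circuiting, in these terms, means returning before either phase can report a failure, which is why a build with the check bypassed can succeed where the stock check never saw the record.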

Like Francis said, we’ll need a bisect to find the exact commit. Since you report it worked earlier, then stopped working, and now works when short-circuited, the traversal has to go backwards.

Ok that all seems straightforward enough. I’ll get cracking with this when I get home after the weekend. Thanks both!

@Mohammed90, you are a wizard.

So we have two wizards, Mohammed and @francislavoie
