Caddy DNS Validation vs pfSense

1. Output of caddy version:

2.5.2

2. How I run Caddy:

As a reverse proxy.

a. System environment:

TrueNAS

b. Command:

c. Service/unit/compose file:

d. My complete Caddy config:

3. The problem I’m having:

I am wondering how Caddy performs DNS validation. On my pfSense box I have NAT rules forcing all DNS traffic to my pfSense DNS server. While this rule is active, Caddy cannot complete DNS validation. But (and here is the question) pfSense itself is able to obtain a proper cert with DNS validation.
The only difference I can see is that pfSense uses both the API token and the zone ID for Cloudflare, while Caddy only uses the token.
I have been able to get it to work by creating a NAT rule that excludes Caddy.

But the question is, what is different between the two?
Why can't Caddy obtain a proper cert via DNS validation without a rule allowing it to reach other DNS servers, while pfSense can?
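One plausible angle on the token-vs-zone-ID difference, suggested by the later "could not determine zone for domain" log line: a client that is given a zone ID already knows which Cloudflare zone to write to, while a client given only a token has to work out the zone first. A DNS-based lookup typically walks up the name's parent domains (for example, asking for an SOA record at each level), which depends on a working resolver. The sketch below only generates those candidate zones; `candidate_zones` is a hypothetical helper, not Caddy's actual code.

```python
def candidate_zones(fqdn: str) -> list:
    """Return the parent names a DNS-based zone lookup might try, most specific first.

    Hypothetical helper for illustration only. A real lookup would query each
    candidate through a resolver, so an intercepting or broken resolver can
    make zone discovery fail before any record is written.
    """
    labels = fqdn.rstrip(".").split(".")
    # Stop before the bare TLD: "ca" on its own is never the user's zone.
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1)]


print(candidate_zones("_acme-challenge.demo1.my-domain.ca."))
```

With a zone ID in hand, none of these lookups are needed, which may be why pfSense is unaffected by the DNS redirect.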

4. Error messages and/or full log output:

5. What I already tried:

6. Links to relevant resources:

Please use v2.6.1!

What do you mean by “DNS validation”? That’s not a term we use.

Are you talking about the ACME DNS challenge? Automatic HTTPS — Caddy Documentation

I’m not sure I understand what you mean here. What do you mean by “proper cert”? Do you mean a “publicly trusted certificate”? (i.e. signed by a public CA, like Let’s Encrypt)

And again, what do you mean by “DNS validation”?

As far as I know, pfSense doesn’t have an ACME client to automate certificate management. But I don’t use it, so I’m not certain.

Please completely fill out the help topic template. There are holes here, so we can’t really know what you’re talking about.

We need to know your config and what’s in your logs. Please be as specific as possible.

Sorry for not posting my logs and such, but the question I have is not related to that.

Caddy works perfectly for me with one caveat. I am currently forcing my network to use only local DNS (127.0.0.1) instead of public resolvers (1.1.1.1, 8.8.8.8, etc.).
pfSense has no issues obtaining and renewing certs (yes, it does do ACME challenges) with this setup.

Caddy fails to obtain and renew until I create a rule that allows Caddy to query Cloudflare's nameservers directly on port 53 for the DNS challenge.

Why is pfSense fine without a DNS bypass rule, when Caddy needs one to work?

Thanks

Okay, super quick rundown:

  1. Caddy reaches out to the ACME provider to initiate an order
  2. ACME provider supplies a TXT record
  3. Caddy reaches out to the DNS provider to append the TXT record to the zone
  4. Caddy attempts to resolve that brand new record, waiting until it appears
  5. When Caddy can see the record, it instructs the ACME provider to proceed with the challenge
  6. The ACME provider resolves the domain, inspects the TXT record, and issues the certificate
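Steps 3 to 5 above can be modelled in a few lines. This is a simplified sketch, not Caddy's implementation: the dict `zone` stands in for the DNS provider's zone, `resolve_txt` stands in for whatever resolver the client actually reaches, and the function name is hypothetical. Steps 1, 2 and 6 involve the CA and are left out.

```python
from typing import Callable, Dict, Optional


def present_and_wait(
    domain: str,
    txt_value: str,
    zone: Dict[str, str],
    resolve_txt: Callable[[str], Optional[str]],
    max_checks: int = 5,
) -> bool:
    """Publish the TXT record (step 3), then poll until it is visible (step 4).

    Returns True when the client would tell the CA to proceed (step 5),
    False if it gave up waiting. A real client sleeps between checks;
    that is omitted here.
    """
    name = f"_acme-challenge.{domain}"
    zone[name] = txt_value                  # step 3: DNS provider API call
    for _ in range(max_checks):             # step 4: propagation check
        if resolve_txt(name) == txt_value:
            return True                     # step 5: ask the CA to validate
    return False


zone = {}
print(present_and_wait("demo1.my-domain.ca", "abc", zone, lambda n: zone.get(n)))
print(present_and_wait("demo1.my-domain.ca", "abc", {}, lambda n: None))
```

The second call shows the failure mode in question: the record was written, but the resolver the client uses never returns it, so step 5 is never reached.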

My hunch is that the issue appears at step 4. This isn't strictly a pfSense issue so much as a captive + cached DNS issue in general. If your pfSense is intercepting DNS requests and answering them itself, and it has cached the result, it won't return the fresh TXT record to Caddy. Caddy then never proceeds with the challenge to complete the order, and waits forever.
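To illustrate the caching part of this hunch, here is a toy forwarding resolver that remembers answers, including "no record", for a fixed TTL. It is a hypothetical model, not how pfSense's resolver works internally; real resolvers follow their own caching rules (negative answers, for instance, are usually cached according to the zone's SOA settings). The class name and the 300-second TTL are assumptions for the example.

```python
import time


class CachingResolver:
    """Toy resolver that caches TXT answers (including None) for ttl_seconds."""

    def __init__(self, upstream, ttl_seconds, clock=time.monotonic):
        self.upstream = upstream      # dict standing in for the authoritative zone
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}               # name -> (answer, expiry time)

    def resolve_txt(self, name):
        now = self.clock()
        hit = self.cache.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]             # served from cache; upstream not consulted
        answer = self.upstream.get(name)
        self.cache[name] = (answer, now + self.ttl)
        return answer


fake_now = [0.0]
zone = {}
resolver = CachingResolver(zone, 300, clock=lambda: fake_now[0])
name = "_acme-challenge.demo1.my-domain.ca"
print(resolver.resolve_txt(name))   # nothing published yet, and that is now cached
zone[name] = "token"                # the record is published upstream
print(resolver.resolve_txt(name))   # still the stale cached answer
fake_now[0] = 301.0
print(resolver.resolve_txt(name))   # cache expired, fresh answer
```

If the client's polling window is shorter than the cached entry's lifetime, it never sees the new record.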

However, the logs would show this if the above were the issue. Don't ever sleep on the logs. The developers of Caddy have put a lot of effort into ensuring that Caddy emits readable output that can help you (and us) figure out the cause of issues. Please post your logs. They will confirm or deny my hunch, and there are workarounds we can attempt in that case.

{"level":"info","ts":1664154076.1610212,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"demo1.my-domain.ca","challenge_type":"dns-01","ca":"https://acme-v02.api.letsencrypt.org/directory"}
{"level":"error","ts":1664154079.1317897,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"demo1.my-domain.ca","issuer":"acme-v02.api.letsencrypt.org-directory","error":"[demo1.my-domain.ca] solving challenges: waiting for solver certmagic.solverWrapper to be ready: checking DNS propagation of _acme-challenge.demo1.my-domain.ca: NS beth.ns.cloudflare.com. returned REFUSED for _acme-challenge.demo1.my-domain.ca. (order=https://acme-v02.api.letsencrypt.org/acme/order/735040181/128897308937) (ca=https://acme-v02.api.letsencrypt.org/directory)"}

Ok, here is a test I just did.

It seems it's not even writing the TXT record to my Cloudflare zone. I keep reloading and it's not showing up…

I’m betting that beth.ns.cloudflare.com. didn’t actually return REFUSED and that your pfSense resolver did this via NAT rule.

If it got to propagation check stage, then it definitely should’ve received confirmation from the Cloudflare API that it did indeed append the required record. Bear in mind that Caddy will clean up after itself once it fails this check, so the record should only be there very briefly.


To resolve this issue, ideally, you should configure pfSense’s resolver to properly resolve the TXT record at the _acme-challenge subdomain and pass it back to Caddy.

Failing that, your NAT bypass to allow Caddy to connect directly to Cloudflare’s nameserver for resolution would work, as you’ve already observed.

Failing that, you can provide a workaround for each site that needs to complete the DNS challenge:

  tls {
    issuer acme {
      propagation_timeout -1
    }
    issuer zerossl {
      propagation_timeout -1
    }
  }

This configures both Let's Encrypt and ZeroSSL to skip the propagation checks entirely. This should be OK with Cloudflare because its DNS updates are usually very fast, but you might occasionally run into a case where the ACME provider checks before the record has actually propagated. That should be rare, and Caddy should simply try again without issue.
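The trade-off can be sketched as a polling loop with a deadline, where a negative timeout skips the loop altogether. This is a simplified model of what a setting like `propagation_timeout` controls, not Caddy's code; the function name, the 2-second interval, and the injectable clock/sleep are assumptions made so the behaviour is easy to test.

```python
import time


def wait_for_propagation(check, timeout, interval=2.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or `timeout` seconds have passed.

    A negative timeout skips checking entirely, so the caller goes straight
    to asking the CA to validate (and relies on the CA's own lookup).
    """
    if timeout < 0:
        return True
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


print(wait_for_propagation(lambda: False, -1))  # skipped, never checked
```

Skipping removes the dependency on the local resolver, at the cost of occasionally asking the CA to validate before the record is visible to it.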


I have now set it up so my domain resolves to my public IP, and I am now getting a SERVFAIL and a "could not determine zone for domain" error.

And I checked the token. It is scoped correctly.

What is ‘it’? What did you change, exactly?

At what step of the process? This response was returned from which server, and to which client, at which stage?


Basically, this is my setup.

  1. All of my internal network is forced to use local DNS (127.0.0.1).
  2. Even my-domain.ca queries are resolved using local DNS. This is what I changed: queries for *.my-domain.ca now resolve to my public IP.

{"level":"error","ts":1664156422.2947662,"logger":"tls.obtain","msg":"will retry","error":"[bitwarden.my-domain.ca] Obtain: [bitwarden.my-domain.ca] solving challenges: waiting for solver certmagic.solverWrapper to be ready: checking DNS propagation of _acme-challenge.bitwarden.my-domain.ca: read udp 192.168.1.137:55900->172.64.35.154:53: read: connection refused (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/68839304/4254889764) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)","attempt":1,"retrying_in":60,"elapsed":6.139948015,"max_duration":2592000}

This error occurs when Caddy can only use local DNS.

{"level":"error","ts":1664155278.1745617,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"demo1.my-domain.ca","issuer":"acme-staging-v02.api.letsencrypt.org-directory","error":"[demo1.my-domain.ca] solving challenges: presenting for challenge: could not determine zone for domain \"_acme-challenge.demo1.my-domain.ca\": unexpected response code 'SERVFAIL' for _acme-challenge.demo1.my-domain.ca. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/68839304/4254647294) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)"}

And this is the SERVFAIL error. I'm not sure at which point it actually happens.

127.0.0.1 is the loopback address. If all clients were using this, wouldn't DNS resolution be broken for every client that isn't running its own DNS server locally?

"presenting for challenge" is, I think, the step of submitting the TXT record to Cloudflare for publishing.
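To answer "at which point does it happen" for any given log line, a small heuristic can look for the stage phrases that appear in the error messages quoted in this thread. The mapping of "presenting for challenge" to writing the TXT record follows the reading above and is an assumption, as is the function name; Caddy may use other phrasing for other failures, which this sketch reports as "unknown".

```python
import json

# Substrings taken from the Caddy error messages quoted in this thread.
STAGES = [
    ("presenting for challenge", "presenting: publishing the TXT record via the DNS provider"),
    ("checking DNS propagation", "propagation check: resolving the new TXT record"),
]


def classify_caddy_error(log_line: str) -> str:
    """Heuristically map a Caddy JSON log line to the DNS-01 stage its error mentions."""
    entry = json.loads(log_line)
    error = entry.get("error", "")
    for needle, stage in STAGES:
        if needle in error:
            return stage
    return "unknown"
```

Applied to the two errors above, the SERVFAIL one lands in the presenting stage (before any record exists), while the "connection refused" one lands in the propagation check.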

That said, I think we're venturing out of the realm of troubleshooting Caddy and into the realm of troubleshooting networking, which is a little beyond the scope of these forums. Suffice it to say, a DNS request to Cloudflare shouldn't return SERVFAIL. And when Caddy is checking propagation, it needs to receive a proper response from the DNS server it's connecting to (as opposed to a connection refused / RST,ACK from your firewall). Your pfSense configuration is what specifically needs to be addressed to ensure both. You could make some DNS checks directly against Cloudflare's nameservers to see if their services are producing bad results, but I'd consider that unlikely, and if they are, I wouldn't expect them to take very long to resolve the problem.
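For the direct checks against Cloudflare's nameservers, the standard library is enough to send a single TXT query and report whether the answer was NOERROR, SERVFAIL, REFUSED, or a connection error. This is a minimal sketch following the RFC 1035 message format; it only reads the response code and does not parse the answer records, and the function names are hypothetical. Running it from the Caddy host, once with the NAT rule active and once without, should show whether the REFUSED and "connection refused" results come from Cloudflare or from something in between.

```python
import random
import socket
import struct
from typing import Optional

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_txt_query(name: str, query_id: Optional[int] = None) -> bytes:
    """Build a minimal DNS query packet asking for the TXT records of `name`."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (recursion not requested, since the target is an
    # authoritative nameserver), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 16, 1)  # QTYPE=TXT, QCLASS=IN


def rcode_name(response: bytes) -> str:
    """Return the response code from the flags field of a DNS response header."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE {code}")


def query_txt(server_ip: str, name: str, timeout: float = 3.0) -> str:
    """Send one UDP TXT query to server_ip:53 and describe the outcome."""
    packet = build_txt_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((server_ip, 53))
        try:
            sock.send(packet)
            response = sock.recv(4096)
        except ConnectionRefusedError:
            return "connection refused (rejected before reaching a DNS server)"
        except socket.timeout:
            return "timed out"
    if response[:2] != packet[:2]:
        return "mismatched query ID"
    return rcode_name(response)
```

For example, `query_txt("172.64.35.154", "_acme-challenge.bitwarden.my-domain.ca")` targets the nameserver address from the log above; a NOERROR result there, combined with a failure through pfSense, would point at the local resolver rather than at Cloudflare.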


Thanks. I’ll keep investigating that end.


Sorry, I meant to say my local DNS server (192.168.0.253).

