Forcing certmagic to use system resolvers for propagation checks?

1. Caddy version (caddy version):

v2.5.0-rc.1 h1:d/ivzqaW+ht8J4yD+XI9omgCDIbQCDOD5AzKPTwkwWk=

2. How I run Caddy:

root@AGW-see-caddy01:/etc/caddy# /usr/local/bin/caddy-custom-latest run

a. System environment:

Devuan LXC

b. Command:

root@AGW-see-caddy01:/etc/caddy# /usr/local/bin/caddy-custom-latest run

c. Service/unit/compose file:

n/a

d. My complete Caddyfile or JSON config:

{
	acme_dns acmedns /etc/acmedns/clientstorage.json
}


sa.see.trosint.ovh {
	tls {
		resolvers 10.128.20.1 1.1.1.1
		dns acmedns /etc/acmedns/clientstorage.json
	}
	reverse_proxy http://a.see.trosint.ovh:9000
}

3. The problem I’m having:

Internal, protected system: it only has proxied HTTPS access to Let's Encrypt/ZeroSSL and acmedns.io, plus the local (firewall) DNS resolver.

CertMagic seems to ignore the configured resolvers and goes straight to the authoritative source instead of using the system resolvers.

4. Error messages and/or full log output:

2022/04/18 20:59:13.018 INFO tls.issuance.acme waiting on internal rate limiter {"identifiers": ["sa.see.trosint.ovh"], "ca": "https://acme-v02.api.letsencrypt.org/directory", "account": ""}
2022/04/18 20:59:13.018 INFO tls.issuance.acme done waiting on internal rate limiter {"identifiers": ["sa.see.trosint.ovh"], "ca": "https://acme-v02.api.letsencrypt.org/directory", "account": ""}
2022/04/18 20:59:14.028 INFO tls.issuance.acme.acme_client trying to solve challenge {"identifier": "sa.see.trosint.ovh", "challenge_type": "dns-01", "ca": "https://acme-v02.api.letsencrypt.org/directory"}
2022/04/18 21:00:17.937 ERROR tls.obtain could not get certificate from issuer {"identifier": "sa.see.trosint.ovh", "issuer": "acme-v02.api.letsencrypt.org-directory", "error": "[sa.see.trosint.ovh] solving challenges: waiting for solver certmagic.solverWrapper to be ready: checking DNS propagation of _acme-challenge.sa.see.trosint.ovh: dial tcp 46.4.128.227:53: i/o timeout (order=https://acme-v02.api.letsencrypt.org/acme/order/503426720/81301885640) (ca=https://acme-v02.api.letsencrypt.org/directory)"}

5. What I already tried:

Added resolvers (I wish it was globally enforceable).

6. Links to relevant resources:

Just so I understand, you’re asking to use a different resolver for propagation checks than for resolving requests to your DNS provider?

I don't think that's possible right now; Caddy uses the same resolver config for both, I think.

Well, I want it to always use my system resolver, but Caddy/CertMagic seems to try to be smart by doing the resolution itself, taking on the work of the resolver and querying directly instead of going through the system resolver.

I’m confused. You configured resolvers here though:

Leave this out, and Caddy will use the system resolver by default.
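
i.e. something like this (a rough sketch, reusing the hostnames and paths from the config above):

sa.see.trosint.ovh {
	tls {
		# no "resolvers" line, so the default (system) resolver is used
		dns acmedns /etc/acmedns/clientstorage.json
	}
	reverse_proxy http://a.see.trosint.ovh:9000
}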


No, it doesn’t.

It does the “smart thing” and tries to do direct DNS.
I have firewall logs showing that it DOES do the wrong “smart thing”, EVEN with those resolvers configured (10.128.20.1 is the system resolver), and 1.1.1.1 isn't even asked.

The other part that I realize should be asked: to NOT track the DNS at all. Querying the internal DNS will yield a different answer from the external DNS, because only the external DNS (the one the ACME server queries) needs the CNAME/TXT records; the internal DNS doesn't, as it contains only the internal IPs and nothing that's “external”.

Let's do some logs and tcpdumps to show this. The Caddyfile I'll test/use:

{
        acme_dns acmedns /etc/acmedns/clientstorage.json
        debug
}

*.sa.see.trosint.ovh sa.see.trosint.ovh {
        respond "Hello SSL"
}

Do note: the EXTERNAL DNS domain is configured correctly (i.e. when you, or ACME/LE/ZeroSSL, query it, you'll get a response for _acme-challenge.sa.see.trosint.ovh). I am not going to change that. I'm only going to change the “internal”/protected DNS and the firewall rules.

FIRST: the internal DNS doesn't have _acme-challenge.sa.see.trosint.ovh configured, and *.sa.see.trosint.ovh points to sa.see.trosint.ovh, so _acme-challenge would also point to sa.see.trosint.ovh.

Caddy log:
https://plik.cloudoffice.co.za/file/YKZRxQ80AOx2qWDW/mojOcNIZpZzWeCAp/AGWsurf20-internal-no_cname.caddy.log
tcpdump of port 53:
https://plik.cloudoffice.co.za/file/5PnGnWnkeUwnWm4U/UtnTSHYnHijSBrkB/STDIN

The relevant portions (for me):
From the DNS side, showing that it queries the _acme-challenge record and gets “nothing” (as expected; an internal client doesn't need the external records):

 10.128.20.11.43474 > 10.128.20.1.53: [bad udp cksum 0x3d64 -> 0x68b1!] 932+ [1au] SOA? _acme-challenge.sa.see.trosint.ovh. ar: . OPT UDPsize=4096 (63)
06:22:04.856588 IP (tos 0x0, ttl 64, id 61757, offset 0, flags [DF], proto UDP (17), length 174)
    10.128.20.1.53 > 10.128.20.11.43474: [udp sum ok] 932 q: SOA? _acme-challenge.sa.see.trosint.ovh. 1/1/1 _acme-challenge.sa.see.trosint.ovh. CNAME sa.see.trosint.ovh. ns: see.trosint.ovh. SOA ns1.mwprox.in. sysadmin.hevis.co.za. 2022041901 28800 7200 604800 600 ar: . OPT UDPsize=4096 (146)
06:22:04.856754 IP (tos 0x0, ttl 64, id 9315, offset 0, flags [DF], proto UDP (17), length 75)
    10.128.20.11.34178 > 10.128.20.1.53: [bad udp cksum 0x3d54 -> 0xcd5a!] 28823+ [1au] SOA? sa.see.trosint.ovh. ar: . OPT UDPsize=4096 (47)
06:22:04.865809 IP (tos 0x0, ttl 64, id 62013, offset 0, flags [DF], proto UDP (17), length 144)
    10.128.20.1.53 > 10.128.20.11.34178: [udp sum ok] 28823 q: SOA? sa.see.trosint.ovh. 0/1/1 ns: see.trosint.ovh. SOA ns1.mwprox.in. sysadmin.hevis.co.za. 2022041901 28800 7200 604800 600 ar: . OPT UDPsize=4096 (116)

From the caddy log:

{"level":"info","ts":1650349324.3459597,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"sa.see.trosint.ovh","challenge_type":"dns-01","ca":"https://acme-v02.api.letsencrypt.org/directory"}
{"level":"error","ts":1650349355.0679355,"logger":"tls.issuance.acme.acme_client","msg":"cleaning up solver","identifier":"*.sa.see.trosint.ovh","challenge_type":"dns-01","error":"no memory of presenting a DNS record for sa.see.trosint.ovh (probably OK if presenting failed)"}

No matter how many times I try, it doesn't seem to remember “presenting” that DNS record, which I guess is related to it “wanting” to check the record itself rather than trusting the push? Is this perhaps related to the “fun” with delegating the domain to update/check?

So, the next test, to show that Caddy/CertMagic/whoever does direct DNS and ignores the system resolver:

I added, in the INTERNAL DNS, an _acme-challenge.sa.see.trosint.ovh CNAME to 3b13c262-628e-4576-8a38-3b5f52a77896.auth.acme-dns.io.

Caddy debug output: https://plik.cloudoffice.co.za/file/D8ZfCKEJFOb2IDEg/K5ZCOLwpLvMJQSo0/AGWsurf20-internal-internal_cname.caddy.log

the interesting part:

{"level":"error","ts":1650353308.67106,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"*.sa.see.trosint.ovh","issuer":"acme-v02.api.letsencrypt.org-directory","error":"[*.sa.see.trosint.ovh] solving challenges: waiting for solver certmagic.solverWrapper to be ready: checking DNS propagation of _acme-challenge.sa.see.trosint.ovh: dial tcp 46.4.128.227:53: i/o timeout (order=https://acme-v02.api.letsencrypt.org/acme/order/504113820/81429230560) (ca=https://acme-v02.api.letsencrypt.org/directory)"}

The tcpdump -a port 53 output: https://plik.cloudoffice.co.za/file/q85KTCvEegM0RHIu/3jNyyIlDyl2LRwwq/STDIN

The interesting lines, confirming my statement that CertMagic does the DNS queries directly and not via the system resolver: first UDP, and then TCP, on port 53:

07:28:08.517783 IP 10.128.20.11.33532 > 46.4.128.227.53: 15495 [1au] TXT? 3b13c262-628e-4576-8a38-3b5f52a77896.auth.acme-dns.io. (82)
07:28:18.518476 IP 10.128.20.11.43088 > 10.128.20.1.53: 22295+ AAAA? ns.auth.acme-dns.io. (37)
07:28:18.518537 IP 10.128.20.11.56272 > 10.128.20.1.53: 144+ A? ns.auth.acme-dns.io. (37)
07:28:18.527488 IP 10.128.20.1.53 > 10.128.20.11.56272: 144 1/0/0 A 46.4.128.227 (53)
07:28:18.543452 IP 10.128.20.1.53 > 10.128.20.11.43088: 22295 0/0/0 (37)
07:28:18.543646 IP 10.128.20.11.60674 > 46.4.128.227.53: Flags [S], seq 3673031871, win 64240, options [mss 1460,sackOK,TS val 2349545146 ecr 0,nop,wscale 7], length 0
07:28:19.578246 IP 10.128.20.11.60674 > 46.4.128.227.53: Flags [S], seq 3673031871, win 64240, options [mss 1460,sackOK,TS val 2349546180 ecr 0,nop,wscale 7], length 0
07:28:21.594252 IP 10.128.20.11.60674 > 46.4.128.227.53: Flags [S], seq 3673031871, win 64240, options [mss 1460,sackOK,TS val 2349548196 ecr 0,nop,wscale 7], length 0
07:28:25.846248 IP 10.128.20.11.60674 > 46.4.128.227.53: Flags [S], seq 3673031871, win 64240, options [mss 1460,sackOK,TS val 2349552448 ecr 0,nop,wscale 7], length 0

So the questions:

(i) If there is no resolvers line, why does CertMagic/Caddy do direct DNS to acmedns.io?
(ii) How can I tell CertMagic/Caddy to explicitly trust the HTTP POST update and not check the TXT records?
(iii) How do I tell CertMagic/Caddy to instead wait/delay a configurable time for the records (rather than checking) and then proceed with solving?

What more do I have to produce to show my issues/problems inside a protected internal environment?


I’m not sure I understand what you mean here. What’s the “smart thing”, exactly?

I’m a bit lost in the size of your comment frankly :sweat_smile:

If you don’t specify a resolver, it’ll use Go’s default which is outlined here:

That's not for Caddy/CertMagic to control, I think; it's up to the acmedns plugin to control how that's resolved.

I agree there should be a way to turn off propagation checks, we’ll add this soon.

Yeah, we could probably add a delay thing, but it's probably not needed; we can just poll the issuer to see if it's done yet.

PLEASE :slight_smile:

Okay, here you are contradicting the ACME_DNS plugin authors:

As noted in the Caddy forum thread, DNS propagation check is done by certmagic. Any option to resolve records differently or even to disable the propagation check would have to be implemented there, this plugin doesn’t have that kind of control. Closing the issue.

By the “smart thing” I meant that it tries to go direct instead of using the system resolvers.

WHY does the Go resolver then go direct to acmedns.io? See the tcpdump. I believe it's CertMagic that is at fault here, not the Go resolver.

I'm pointing at the evidence ;(

The propagation check is done by certmagic (i.e. checking for _acme-challenge.sa.see.trosint.ovh), but that’s not the same as resolving the acmedns.io domain (i.e. the acmedns server’s domain name).

Any communication done with the acmedns server to set the TXT record is done by the plugin. This is done here, via an HTTP request to your configured acmedns server:

I might be misunderstanding this discussion, but the propagation check is resolving <something>.acme-dns.io because certmagic is following a CNAME record (as it should):

_acme-challenge.sa.see.trosint.ovh. 3600 IN CNAME 3b13c262-628e-4576-8a38-3b5f52a77896.auth.acme-dns.io.

Not because of something specific that the acmedns plugin does. Relevant certmagic code.

But I think @Hendrik_Visage has a problem with the fact that certmagic sends a DNS query to the ns.auth.acme-dns.io authoritative nameservers directly instead of getting the TXT record from their DNS server. If I read the code correctly, certmagic tries to follow CNAME records when the first attempt to fetch the TXT record from the default DNS servers fails: code here.
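
You can roughly reproduce that lookup chain by hand with the names from the tcpdump above (just a sketch of the path, assuming ns.auth.acme-dns.io is the zone's authoritative server as the tcpdump suggests, not exactly what certmagic does):

dig -t CNAME _acme-challenge.sa.see.trosint.ovh. @10.128.20.1
dig -t NS auth.acme-dns.io. @10.128.20.1
dig -t TXT 3b13c262-628e-4576-8a38-3b5f52a77896.auth.acme-dns.io. @ns.auth.acme-dns.io

From inside the protected network, the last query goes straight to 46.4.128.227:53 and is exactly the one the firewall blocks, which matches the i/o timeout in the error above.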

@Hendrik_Visage , could you try restarting Caddy (forcing it to try to get a certificate) and then running

dig _acme-challenge.sa.see.trosint.ovh -t TXT

Does it find any TXT records?


@Hendrik_Visage
I think this is similar to a problem I experienced before. My default DNS resolver just wasn’t returning the TXT records in time and Caddy couldn’t verify the propagation (even though Let’s Encrypt would have been able to verify them). For me the solution was to use Cloudflare’s resolvers.

To check if this is the case with your resolver:

  1. Run Caddy so that it creates ACME DNS records.
  2. See if you can see these TXT records with your default DNS resolver in your restricted system:
    dig -t TXT _acme-challenge.sa.see.trosint.ovh.
  3. See if you can see TXT records using Cloudflare resolvers (outside your restricted system):
    dig -t TXT _acme-challenge.sa.see.trosint.ovh. @1.1.1.1

If you see the TXT records with Cloudflare but you can’t see the records with your default DNS, there’s your problem - Caddy can’t find the records because your DNS resolver can’t. If that’s the case, the only solution would be to disable propagation checks.
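
(For reference, with a Caddy build that includes the propagation options that landed later in this thread, disabling the check looks roughly like this in the tls block, reusing the provider and path from the configs above:)

tls {
	dns acmedns /etc/acmedns/clientstorage.json
	propagation_timeout -1
}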


And right there is the problem I’ve been trying to state in the subject field ;(

i.e. CertMagic might NOT be able to check it, as it's an OUTSIDE DNS record, while the internal DNS does NOT necessarily have, nor need, the entries CertMagic tries to check.

Certbot goes to sleep and “hopes” the record updates. I want something similar for CertMagic, specifically where I could tell it to wait a configurable period and then continue with the solving.

Exactly!!

The EXTERNAL DNS successfully finds that record, yes; the acmedns plugin works correctly and as expected.

The INTERNAL DNS I would need to coerce, as it's not set up for that, but it could be cloned… not ideal for the dual/split DNS deployments I'm targeting, but doable.

The current CertMagic “solution” with 2.5.0-rc1: cut yet another hole in the firewall for direct DNS ;(

I’m basically looking at falling back to Certbot+ACMEDNS and then Caddy set to no auto-https ;(

I was hoping not to, but CertMagic trying to “be smart” is the problem here ;(

For a system that is out there on the Big Bad Internet, doing what CertMagic does is… well… nothing to worry about. But when you are inside a trusted and protected environment, doing things like that is destined for failure, as firewalls etc. need to protect the network, and direct queries like the ones CertMagic makes are a definite no-no.


Thank you @francislavoie !!!

The working config (with a custom xcaddy build, since this was just pushed to master :wink:):

*.silo1.pint.ovh silo1.pint.ovh {
	tls {
		issuer acme {
			dns acmedns /etc/acmedns/clientstorage.json
			propagation_delay 30s
			propagation_timeout -1
		}
	}
	reverse_proxy http://a.see.trosint.ovh:9000
}
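
For the record: propagation_delay 30s adds a fixed wait for the record to propagate, and propagation_timeout -1 disables the propagation check entirely; together that is what makes this work behind a firewall that blocks direct DNS. (As far as I know, newer Caddy releases also accept these options directly in the tls block, without the issuer acme wrapper.)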

This topic was automatically closed after 30 days. New replies are no longer allowed.