New DNS Plugin that doesn't require any API

Hello fellow Go programmers!

I have created a new DNS provider module for Caddy.
It’s quite different from the other DNS providers because it does not use any API; instead, it simply makes Caddy serve the challenge records itself.
To get this to work, only one additional DNS record is required: an NS record for _acme-challenge.
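To make the delegation concrete, here is a small sketch of which names are involved. It assumes the standard DNS-01 naming from RFC 8555 (the TXT record lives at `_acme-challenge.<domain>`, and a wildcard like `*.example.com` is validated on the base name). `challengeName`, `delegationRecord` and the `nsHost` parameter are hypothetical helpers for illustration, not code from the plugin.

```go
package main

import "strings"

// challengeName returns the owner name of the TXT record that an ACME
// DNS-01 challenge for domain is checked against. For a wildcard
// identifier ("*.example.com") the challenge sits on the base domain,
// so the "*." prefix is stripped first.
func challengeName(domain string) string {
	domain = strings.TrimSuffix(domain, ".")
	domain = strings.TrimPrefix(domain, "*.")
	return "_acme-challenge." + domain + "."
}

// delegationRecord renders the single NS record the parent zone needs so
// that queries for the challenge name are sent to the host running Caddy.
// nsHost is a hypothetical name for that host (it needs its own A/AAAA).
func delegationRecord(domain, nsHost string) string {
	return challengeName(domain) + " IN NS " + strings.TrimSuffix(nsHost, ".") + "."
}
```

For example, `delegationRecord("example.com", "ns.example.net")` yields `_acme-challenge.example.com. IN NS ns.example.net.`, which is the one record added to the real zone; everything below that name is then answered by Caddy.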

I’ve tested it against Caddy’s own embedded ACME CA and against Let’s Encrypt staging, and it works, but it’s not production ready yet.

There are a few reasons for this:

  • I’m not actually a Go programmer (this is the first thing I’ve written in it)
  • It relies on deprecated & undocumented Caddy behavior (forgive me)
  • It’s only been tested manually so far
  • dns_challenge_override_domain / override_domain isn’t supported, and neither is the acme_dns Caddyfile global option

I’d be happy if you guys could take a look at it and give me feedback, especially about whether I’m using Go right.

I’d also like to know whether the caddy.Replacer stuff in Provision() makes any sense at all (to be honest, I just copied that from another DNS provider module and don’t really know what it does).

If anyone has a lot of experience with DNS, I’m also curious about the error codes I’ve chosen to return for the various cases (in the make_handler() function): is it okay to reply with REFUSED to all non-TXT queries?
Realistically, this handler will only be running for a second or so, while the challenge is being completed, but it should still be correct (and in my testing I also saw the LE staging CA query for CAA records).
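For comparison, one common convention for a small authoritative server is: REFUSED only for names outside the zone it serves, NXDOMAIN for names inside the zone that don't exist, and NOERROR with an empty answer ("NODATA") when the name exists but has no record of the requested type (for example a CAA query against the challenge name). The sketch below only picks the response code and answers; `respond` and the record map layout are hypothetical, and a real server would also set the AA bit and, per RFC 2308, include the zone's SOA in the authority section of negative answers.

```go
package main

import "strings"

// Numeric values from the DNS RCODE and RR TYPE registries.
const (
	rcodeNoError  = 0
	rcodeNXDomain = 3
	rcodeRefused  = 5

	qtypeTXT = 16
)

// respond chooses an RCODE and TXT answers for a query. zone is the apex
// this server is authoritative for; records maps lowercase owner names to
// the TXT values currently published.
func respond(zone string, records map[string][]string, qname string, qtype uint16) (int, []string) {
	qname = strings.ToLower(qname)
	zone = strings.ToLower(zone)
	// Not our zone at all: refuse instead of pretending to know.
	if qname != zone && !strings.HasSuffix(qname, "."+zone) {
		return rcodeRefused, nil
	}
	// Inside our zone, but no such name (the apex always exists).
	if _, ok := records[qname]; !ok && qname != zone {
		return rcodeNXDomain, nil
	}
	if qtype == qtypeTXT {
		return rcodeNoError, records[qname]
	}
	// The name exists but has no records of this type: NODATA.
	return rcodeNoError, nil
}
```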

Finally, there’s the issue with implementing acmez.Solver instead of certmagic.ACMEDNSProvider with the libdns interfaces.
I chose to do this because I think the acmez.Solver interface fits a lot better, “semantically”, for what this module provides. In practical terms, it would probably work just as well to launch the server thread and do the Present() stuff in AppendRecords() and then do the CleanUp() in DeleteRecords()… but to me that seemed worse than disregarding the deprecation.
I suppose if framed as “this is a server that runs as long as it has records to serve”, it could make sense as a libdns-style provider, too.
If nothing else works out, this could always be implemented as a certmagic.Issuer under tls.issuance instead of dns.providers, but that would involve copying almost all of the code from the tls.issuance.acme issuer.
Since even the libdns interfaces are still experimental/WIP, I thought it’d be best to just do it like this for now and find out whether this is even useful or practical at all.
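The “server that runs as long as it has records to serve” framing can be expressed as a reference-counted record store: the first added record starts the listener, and removing the last one stops it. This is a sketch under that assumption; `recordSet`, `Append` and `Delete` are hypothetical names that only loosely echo the AppendRecords()/DeleteRecords() idea and do not use libdns’s actual types or signatures.

```go
package main

import "sync"

// recordSet starts its server when the first record is added and stops
// it when the last one is removed. Identical values are counted, so two
// overlapping challenges for the same name don't cancel each other.
type recordSet struct {
	mu      sync.Mutex
	records map[string]map[string]int // name -> value -> refcount
	total   int
	start   func() error
	stop    func() error
}

func newRecordSet(start, stop func() error) *recordSet {
	return &recordSet{records: map[string]map[string]int{}, start: start, stop: stop}
}

func (s *recordSet) Append(name, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.total == 0 {
		if err := s.start(); err != nil {
			return err
		}
	}
	if s.records[name] == nil {
		s.records[name] = map[string]int{}
	}
	s.records[name][value]++
	s.total++
	return nil
}

func (s *recordSet) Delete(name, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.records[name][value] == 0 {
		return nil // nothing to delete
	}
	s.records[name][value]--
	if s.records[name][value] == 0 {
		delete(s.records[name], value)
	}
	if len(s.records[name]) == 0 {
		delete(s.records, name)
	}
	s.total--
	if s.total == 0 {
		return s.stop()
	}
	return nil
}
```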

Ok, here’s the link (but remember: not production ready):

I’m curious to see what you think! :slight_smile:

Regards,
xaos


If you’re running a DNS server, I’d suggest making your plugin primarily a caddy.App that runs the DNS server, plus a DNS provider module that communicates with the App.

An app can live for the duration of the config, whereas a DNS module is only invoked when used. That would make it easier to push libdns actions to the App for it to manage in-memory.
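A rough sketch of that split, assuming the app owns an in-memory table for the lifetime of the config and provider modules only call into it. `dnsApp`, `AddTXT` and `LookupTXT` are hypothetical; only the `Start() error` / `Stop() error` pair matches the method set caddy.App requires. How a module obtains the app instance is left out here.

```go
package main

import (
	"errors"
	"sync"
)

// dnsApp stands in for the suggested caddy.App: it lives as long as the
// config and owns the record table (and, in a real plugin, the listener).
type dnsApp struct {
	mu      sync.RWMutex
	running bool
	txt     map[string][]string
}

func (a *dnsApp) Start() error {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.txt = map[string][]string{}
	a.running = true
	return nil
}

func (a *dnsApp) Stop() error {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.running = false
	a.txt = nil
	return nil
}

// AddTXT is what each DNS provider module would call instead of binding
// its own socket, so several modules never contend over one address.
func (a *dnsApp) AddTXT(name, value string) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if !a.running {
		return errors.New("dns app is not running")
	}
	a.txt[name] = append(a.txt[name], value)
	return nil
}

// LookupTXT is what the app's DNS handler would consult when answering.
func (a *dnsApp) LookupTXT(name string) []string {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return append([]string(nil), a.txt[name]...)
}
```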

I think with your current approach, you might run into problems if your module is configured more than once because there might be contention over which module “owns” the DNS server.

I haven’t read through the code closely yet, just skimmed it, but I wanted to point out some ideas that might unblock you.


Hey Francis, great point about the contention!
My idea was for each instance of the module to have its own independent and totally ephemeral server, but I made a mistake in the way I initialized the server, which meant they all shared some global state.
I’ve fixed that now, so each instance should be completely independent. Unfortunately, as you point out, that’s not quite as useful as it seems, because it means that they will contend over being able to bind to the address.
I’m not quite sure what to do about that, but it seems clear I do need some global state to synchronize the access to the network ports (or rather, address + port combinations).
I have a basic idea of how that would work, but since the different instances could be trying to bind to different address + port combos and I get those as strings, I think I’ll need a way to “canonicalize” these address strings, or at least some way of comparing two address strings and determining whether they are actually different or just different representations of the same address. Do you know a good way to do that?
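For IP literals, Go’s standard library can do most of the canonicalization: `net.SplitHostPort` separates host and port, and `net/netip` normalizes IPv6 spellings and unmaps IPv4-mapped addresses. The sketch below (hypothetical `canonicalAddr`) assumes numeric ports and rejects hostnames, since a name like `localhost` can resolve to several addresses. Note that string equality still won’t catch a wildcard bind (`:53` or `0.0.0.0:53`) overlapping a specific one like `127.0.0.1:53`; that needs a separate rule.

```go
package main

import (
	"net"
	"net/netip"
	"strconv"
)

// canonicalAddr normalizes "host:port" so that different spellings of the
// same IP and port compare equal, e.g. "[0:0::1]:53" and "[::1]:53", or
// "[::ffff:127.0.0.1]:53" and "127.0.0.1:53". An empty host (":53") means
// "all interfaces" and is kept as-is.
func canonicalAddr(hostport string) (string, error) {
	host, port, err := net.SplitHostPort(hostport)
	if err != nil {
		return "", err
	}
	p, err := strconv.ParseUint(port, 10, 16) // numeric ports only
	if err != nil {
		return "", err
	}
	if host == "" {
		return ":" + strconv.FormatUint(p, 10), nil
	}
	ip, err := netip.ParseAddr(host) // fails for hostnames like "localhost"
	if err != nil {
		return "", err
	}
	return netip.AddrPortFrom(ip.Unmap(), uint16(p)).String(), nil
}
```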

I hadn’t considered implementing it as a caddy.App yet, I’ll look into that, thanks!

Yeah – the app is how you would set up global state.

Caddy has functions for managing listeners; see the caddy package documentation (github.com/caddyserver/caddy/v2) on Go Packages.
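To illustrate the general idea behind shared listeners (this is not Caddy’s API), here is a minimal reference-counted pool of UDP sockets: several users can acquire the same address and share one socket, which is closed only when the last user releases it. `packetPool` is hypothetical, and keys are assumed to be normalized already so that equivalent spellings map to the same entry.

```go
package main

import (
	"net"
	"sync"
)

type pooledConn struct {
	conn net.PacketConn
	refs int
}

// packetPool hands out one shared UDP socket per address.
type packetPool struct {
	mu    sync.Mutex
	conns map[string]*pooledConn
}

func (p *packetPool) Acquire(addr string) (net.PacketConn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if pc, ok := p.conns[addr]; ok {
		pc.refs++
		return pc.conn, nil
	}
	conn, err := net.ListenPacket("udp", addr)
	if err != nil {
		return nil, err
	}
	if p.conns == nil {
		p.conns = map[string]*pooledConn{}
	}
	p.conns[addr] = &pooledConn{conn: conn, refs: 1}
	return conn, nil
}

// Release drops one reference and closes the socket when none remain.
func (p *packetPool) Release(addr string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	pc, ok := p.conns[addr]
	if !ok {
		return nil
	}
	pc.refs--
	if pc.refs > 0 {
		return nil
	}
	delete(p.conns, addr)
	return pc.conn.Close()
}
```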

Hey!

So I rewrote almost the entire thing: now it’s a caddy.App, and it can serve any DNS record you define in the config.
The provider is properly implemented as a certmagic.ACMEDNSProvider now, it adds the TXT records by sending them to the app over a Go channel.
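One way to structure the channel approach is to let a single goroutine own the record table, with providers sending updates and the DNS handler sending lookups, so no mutex is needed. This is a sketch, not the plugin’s code; `recordUpdate`, `lookup`, `runRecordLoop` and `send` are hypothetical. `send` waits for an acknowledgement, so the TXT record is already being served when the provider call returns.

```go
package main

type recordUpdate struct {
	name, value string
	add         bool          // false means "remove this value"
	done        chan struct{} // closed once the update is applied
}

type lookup struct {
	name  string
	reply chan []string
}

// runRecordLoop is the only goroutine that touches txt.
func runRecordLoop(updates <-chan recordUpdate, lookups <-chan lookup, quit <-chan struct{}) {
	txt := map[string][]string{}
	for {
		select {
		case u := <-updates:
			if u.add {
				txt[u.name] = append(txt[u.name], u.value)
			} else {
				vals := txt[u.name]
				for i, v := range vals {
					if v == u.value {
						txt[u.name] = append(vals[:i], vals[i+1:]...)
						break
					}
				}
				if len(txt[u.name]) == 0 {
					delete(txt, u.name)
				}
			}
			if u.done != nil {
				close(u.done)
			}
		case l := <-lookups:
			l.reply <- append([]string(nil), txt[l.name]...) // send a copy
		case <-quit:
			return
		}
	}
}

// send hands an update to the loop and blocks until it has been applied.
func send(updates chan<- recordUpdate, name, value string, add bool) {
	done := make(chan struct{})
	updates <- recordUpdate{name: name, value: value, add: add, done: done}
	<-done
}
```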
The contention issue with the previous version is gone now, but I’ve introduced another limitation, temporarily, in its stead: there can only be one DNS server, and it can only bind to a single address.
There are a couple of other issues with it still (I added a “Limitations & Bugs” section to the readme), but I’d be happy if you (or whoever else is interested) could take another look.

I have a couple questions:

Is it OK to use dns as the namespace for the app? The providers already use dns.providers, so it seems like the two would overlap, but it hasn’t been an issue in my testing.

I’m really happy with the tests I added: it’s awesome to be able to just write a Caddyfile and run it with tester.InitServer() in the test. I think the DNS server part is reasonably well covered now, but the provider and adding records dynamically aren’t covered at all.
Since Caddy has the built-in ACME server, I was thinking it might be possible to have complete “end-to-end” testing, by just spinning up 2 Caddies, one as the server and one doing the challenge, which would be really cool. However I think I’m missing two pretty big pieces: a way to override the ACME-Caddy DNS resolution so that it queries the solver-Caddy, and a way for the solver-Caddy to trust the (self-signed) TLS cert of the ACME-Caddy.
Getting that to work might be harder than just testing the provider part in isolation though.

I’m not sure why, but I’ve only been able to get the challenge to work by disabling the propagation checks with propagation_timeout -1. Both the internal server & LE staging have no problem querying the record and issuing a certificate, but the Caddy that’s solving won’t even ask them to, if the propagation checks are enabled.
Actually, I just tried it again on the server that’s in the real DNS root (as opposed to pretending to be example.com via /etc/hosts) and it seems it might be failing because there’s no SOA record being served. I’m still not sure why that record would be necessary (and again, it works if the checks are disabled).
I’ll have to look into it some more.

Finally, I’m thinking this plugin/package might need to be rebranded.
While it’s not a full-featured DNS server by any means, now that it can serve whatever records, it’s also not a “stub” anymore, either. I’ve already renamed the provider module to dns.providers.internal, which I prefer, but I don’t know what to do for the package, especially considering there are quite a few “Caddy & DNS” related packages already.


Yeah, I’d say that’s probably too generic a name. Try to come up with something more creative, and it’ll avoid concerns of naming overlap.

There is a resolvers option for DNS challenges in the tls config, if that’s what you mean. If not, I’m not sure what you’re referring to.

When you spin up the ACME server, it generates a root.crt in its data storage. You can point it to that.

Probably :man_shrugging: that’s more effort than I’m willing to put into testing :joy:

Did you configure resolvers to allow Caddy to query the right DNS server? It’ll look to see that it can confirm the TXT record does appear before allowing the rest of the process to continue.

But to be honest, I think we should disable propagation checks by default; they’re a speedbump that doesn’t really save any processing time. The propagation delay does make sense, though, if we know the DNS server is slow to propagate (e.g. GoDaddy’s had been known to have issues with that).

Propagation also has the “multi-perspective” problem, because Caddy itself is not necessarily going to be able to see the same DNS query results that the ACME issuer does. Configuring resolvers can help with that in most cases by using something like 1.1.1.1 or whatever, but it’s flaky in general.

Related: Multi-Perspective Validation Improves Domain Validation Security - Let's Encrypt

Oh yeah maybe. SOA records are looked up to find the “zone” for the domain, since the TXT records are written to the root of the zone. I think. @matt can probably confirm that point, I’m a bit rusty on the details there.


Uh, I’m no DNS expert either :sweat_smile:

But yes that sounds about right. SOA helps us know what zone to look in, in an attempt to mimic what an ACME server would do.
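The zone search can be pictured as walking up the labels of the challenge name and asking for an SOA at each step; the first name with an SOA is the zone apex. The sketch below only generates the candidate names and performs no lookups; `zoneCandidates` is hypothetical and not certmagic’s code. With an NS record delegating `_acme-challenge.<domain>`, the very first candidate is the delegated zone itself, so that SOA query is answered (or refused) by the plugin’s own server.

```go
package main

import "strings"

// zoneCandidates lists the names whose SOA a client could query, from most
// to least specific, to find the zone containing fqdn.
func zoneCandidates(fqdn string) []string {
	name := strings.TrimSuffix(fqdn, ".")
	var out []string
	for name != "" {
		out = append(out, name+".")
		i := strings.IndexByte(name, '.')
		if i < 0 {
			break
		}
		name = name[i+1:]
	}
	return out
}
```

For `_acme-challenge.sub.example.com.` this yields `_acme-challenge.sub.example.com.`, `sub.example.com.`, `example.com.` and `com.`.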


No, I meant for the verification part.
I was imagining the perfect integration test: I just write 2 Caddyfiles and launch them both with caddytest.Tester.InitServer. One of them uses the built-in ACME server, to “host” the challenge and the other one solves it.

```caddyfile
# Caddy A
acme {
	tls internal
	acme_server # ← This guy would need to use a different resolver, i.e. 127.0.0.123
}
# Caddy B
{
	acme_ca https://acme/acme/local/directory
	# Needs to trust Caddy A's certificate to do the HTTPS here
	dns 127.0.0.123:53 {
		record "example.com. A 127.0.0.123"
	}
}

example.com {
	bind 127.0.0.123
	tls {
		dns internal
		propagation_timeout -1
	}
	respond "Works"
}
```

… Or something like that. I just thought it would be really cool to get the Caddies to test each other :smiley:. But yeah, it’s probably not possible without modifying a bunch of stuff.
(I guess I’d also need to figure out how to check that Caddy B is serving HTTPS with a cert that’s signed by Caddy A.)
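With Go’s standard library, one way to do that check is to trust only Caddy A’s root certificate (the root.crt its ACME server generates in its data storage) and perform a TLS handshake against Caddy B; the handshake fails unless the served chain verifies against that root for the given server name. `verifyServedBy` is a hypothetical helper, and reading root.crt from disk is left to the caller.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
)

// verifyServedBy connects to addr and succeeds only if the certificate
// presented for serverName chains to a CA in rootPEM.
func verifyServedBy(addr, serverName string, rootPEM []byte) error {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(rootPEM) {
		return errors.New("no CA certificates found in rootPEM")
	}
	conn, err := tls.Dial("tcp", addr, &tls.Config{
		RootCAs:    pool,       // trust Caddy A's root only, not the system roots
		ServerName: serverName, // hostname the certificate must be valid for
	})
	if err != nil {
		return err
	}
	return conn.Close()
}
```

In the hypothetical test above, that would be something like `verifyServedBy("127.0.0.123:443", "example.com", rootPEM)`.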

Anyway, I just realized I should probably rename the provider module again, it’s too similar to the tls internal directive for self-signing certificates with the internal PKI.

No, I figured since it’s in the actual DNS root, any resolver should be able to find it.

Yeah, before I spotted those SOA queries being rejected, I thought it might be a firewall issue: that because the system is querying itself, the packets wouldn’t count as INPUT, and different rules would apply.

Yeah, maybe that’s the right call. I’m not really qualified to say :sweat_smile:. For now, I think I’ll read up a bit more on SOA records, and I’ll try to implement them as a special case, so that they get served automatically and don’t prevent the listener from shutting down if they’re the only record it has to serve.

Oddly, I’ve never seen the LE staging CA query for SOAs, they seem to go straight for the TXT records. Maybe they only query those for the outer zones (I put an NS record pointing to my server for the _acme-challenge “sub-zone” in my testing subdomain).

Do you mean for acme_server to resolve the DNS challenge?

Yeah that’s fair, we should probably look into that. If you want to open an issue on github to track that, it’d be helpful.
