Hello Caddy Community,
I’d like to start a discussion and propose an enhancement to Caddy’s built-in service discovery capabilities, specifically regarding its integration with Consul for services running on dynamic ports. My goal is to achieve a fully self-contained Caddy setup that does not depend on host system DNS configurations, which I believe is a common and important cloud-native pattern.
The Journey: From a Simple Goal to a Deep Insight
My journey began with a simple objective: use Caddy’s dynamic srv upstreams to reverse proxy to my backend services, which are registered in Consul and run on dynamic ports.
1. The Initial Configuration & Problem
I started with what seemed to be the correct Caddyfile, pointing the resolvers option at my Consul agent:
reverse_proxy {
    dynamic srv _tenant._tcp.service.consul {
        resolvers 192.168.50.119:8600
    }
}
However, this configuration only worked if I also configured my host’s system DNS to forward .consul queries to the Consul agent. Without that system DNS configuration, Caddy failed with a 502 error.
2. The Evidence: The Two-Step DNS Lookup
After some debugging, the Caddy error log provided the crucial clue:
ERROR http.log.error dial tcp: lookup c0a832fb.addr.dc1.consul. on 127.0.0.53:53: no such host
This log shows that Caddy was falling back to the system resolver (127.0.0.53:53) for the second part of the lookup.
A dig query confirmed the two-step nature of the process:
$ dig @192.168.50.119 -p 8600 _tenant._tcp.service.consul SRV
;; ANSWER SECTION:
_tenant._tcp.service.consul. 0 IN SRV 1 1 43019 c0a832fb.addr.dc1.consul.
;; ADDITIONAL SECTION:
c0a832fb.addr.dc1.consul. 0 IN A 192.168.50.251
This revealed the core issue:
- Step 1 (SRV Lookup): Caddy correctly uses the specified resolvers to query for the SRV record.
- Step 2 (A-Record Lookup): When it gets the target hostname (c0a832fb.addr.dc1.consul.) from the SRV answer, it ignores the resolvers directive and falls back to the system DNS to resolve this hostname to an IP.
3. The Deeper Insight: The Unused ADDITIONAL SECTION
The dig output also shows that Consul already provides the necessary A record in the ADDITIONAL SECTION of the same DNS response. An efficient DNS client could use this information to avoid the second network lookup entirely. It appears that Caddy’s current SRV resolver implementation does not take advantage of this.
Proposal for Enhancement
Based on these findings, I believe the out-of-the-box experience for this common use case can be significantly improved. I would like to propose two potential enhancements for the Caddy development team to consider:
Suggestion 1 (Ideal & Most Efficient): Enhance Caddy’s DNS client to read and trust the ADDITIONAL SECTION in DNS responses. When an SRV query is resolved, if the A/AAAA record for the target host is already present in the additional section, Caddy should use it directly instead of initiating a new query. This would be more performant and would solve the issue entirely.
Suggestion 2 (Alternative): Make the resolvers directive “sticky” throughout the resolution chain. If a resolver is defined within the reverse_proxy block, it should be used for all subsequent DNS lookups required by that proxy, including the A/AAAA lookups that follow an SRV query.
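As a rough illustration of what “sticky” could mean for Suggestion 2, the Go sketch below sends both the SRV query and the follow-up address lookups through one resolver pinned to the Consul agent. It uses only the standard net package and is not taken from Caddy’s source. The helper names pinnedResolver and resolveUpstreams are my own, and the timeouts are arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strconv"
	"time"
)

// pinnedResolver returns a resolver that sends every query to addr
// (for example a Consul agent's DNS port) instead of the system resolver.
func pinnedResolver(addr string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in DNS client so Dial below is honored
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			// Ignore the server address Go picked and always dial addr.
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, addr)
		},
	}
}

// resolveUpstreams performs the SRV lookup and the follow-up address
// lookups through the same resolver and returns dialable "ip:port" strings.
func resolveUpstreams(ctx context.Context, r *net.Resolver, name string) ([]string, error) {
	// Empty service and proto make LookupSRV query name as given.
	_, srvs, err := r.LookupSRV(ctx, "", "", name)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, s := range srvs {
		ips, err := r.LookupHost(ctx, s.Target)
		if err != nil {
			return nil, err
		}
		for _, ip := range ips {
			out = append(out, net.JoinHostPort(ip, strconv.Itoa(int(s.Port))))
		}
	}
	return out, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	r := pinnedResolver("192.168.50.119:8600") // Consul agent from my setup
	addrs, err := resolveUpstreams(ctx, r, "_tenant._tcp.service.consul")
	fmt.Println(addrs, err)
}
```

The point of the sketch is only that the second lookup reuses r rather than the default resolver, so nothing on the host needs to know how to resolve .consul names.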
Conclusion
I believe implementing either of these suggestions would make Caddy an even more powerful and self-contained tool for modern, dynamic infrastructures. It would remove the “hidden” dependency on system-level DNS configuration, making deployments cleaner, more portable, and less surprising for users.
I’m opening this topic to share my findings, understand if this is a known limitation, and hear the thoughts of the community and developers.
Thank you for your incredible work on this project!