Discussion & Proposal: Improving built-in Consul discovery for dynamic ports (A deep-dive on SRV resolver behavior)

Hello Caddy Community,

I’d like to start a discussion and propose an enhancement to Caddy’s built-in service discovery capabilities, specifically regarding its integration with Consul for services running on dynamic ports. My goal is to achieve a fully self-contained Caddy setup that does not depend on host system DNS configurations, which I believe is a common and important cloud-native pattern.

The Journey: From a Simple Goal to a Deep Insight

My journey began with a simple objective: use Caddy’s dynamic srv upstreams in reverse_proxy to reach my backend services, which are registered in Consul and run on dynamic ports.

1. The Initial Configuration & Problem

I started with what seemed to be the correct Caddyfile, pointing the resolvers to my Consul agent:

reverse_proxy {
    dynamic srv _tenant._tcp.service.consul {
        resolvers 192.168.50.119:8600
    }
}

However, this configuration only worked if I also configured my host’s system DNS to forward .consul queries to the Consul agent. Without the system DNS configuration, Caddy would fail with a 502 error.

2. The Evidence: The Two-Step DNS Lookup

After some debugging, the Caddy error log provided the crucial clue:

ERROR   http.log.error  dial tcp: lookup c0a832fb.addr.dc1.consul. on 127.0.0.53:53: no such host

This log shows that Caddy was falling back to the system resolver (127.0.0.53:53) to resolve the hostname returned by the SRV lookup, rather than using the resolver I had configured.

A dig query confirmed the two-step nature of the process:

$ dig @192.168.50.119 -p 8600 _tenant._tcp.service.consul SRV

;; ANSWER SECTION:
_tenant._tcp.service.consul. 0  IN      SRV     1 1 43019 c0a832fb.addr.dc1.consul.

;; ADDITIONAL SECTION:
c0a832fb.addr.dc1.consul. 0     IN      A       192.168.50.251

This revealed the core issue:

  • Step 1 (SRV Lookup): Caddy correctly uses the specified resolvers to query for the SRV record.
  • Step 2 (A-Record Lookup): When it gets the target hostname (c0a832fb.addr.dc1.consul.) from the SRV answer, it ignores the resolvers directive and falls back to the system DNS to resolve this hostname to an IP.

3. The Deeper Insight: The Unused ADDITIONAL SECTION

The dig output also shows that Consul is already providing the necessary A record in the ADDITIONAL SECTION of the same DNS response. An efficient DNS client could use this information to completely avoid the second network lookup. It appears Caddy’s current SRV resolver implementation does not leverage this optimization.

Proposal for Enhancement

Based on these findings, I believe the out-of-the-box experience for this common use case can be significantly improved. I would like to propose two potential enhancements for the Caddy development team to consider:

Suggestion 1 (Ideal & Most Efficient): Enhance Caddy’s DNS client to read and trust the ADDITIONAL SECTION in DNS responses. When an SRV query is resolved, if the A/AAAA record for the target host is already present in the additional section, Caddy should use it directly instead of initiating a new query. This would be more performant and would solve the issue entirely.

Suggestion 2 (Alternative): Make the resolvers directive “sticky” throughout the resolution chain. If a resolver is defined within the reverse_proxy block, it should be used for all subsequent DNS lookups required by that proxy, including the A/AAAA lookups that follow an SRV query.

Conclusion

I believe implementing either of these suggestions would make Caddy an even more powerful and self-contained tool for modern, dynamic infrastructures. It would remove the “hidden” dependency on system-level DNS configuration, making deployments cleaner, more portable, and less surprising for users.

I’m opening this topic to share my findings, understand if this is a known limitation, and hear the thoughts of the community and developers.

Thank you for your incredible work on this project!