1. Caddy version (caddy version):
v2.4.1 h1:kAJ0JB5Xk5gPdTH/27S5cyoMGqD5lBAe9yZ8zTjVJa0=
2. How I run Caddy:
Running caddy run in my terminal emulator.
a. System environment:
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
Linux: 5.4.0-70-generic
b. Command:
caddy run
c. Service/unit/compose file:
Not using Docker, systemd, Kubernetes, make, etc.
d. My complete Caddyfile or JSON config:
{
    acme_dns cloudflare <cloudflare_token>
    storage consul {
        address "127.0.0.1:8500"
        token "<consul_token>"
        timeout 10
        prefix "caddytls"
        value_prefix "caddysite"
        aes_key "<aes_key>"
        tls_enabled "false"
        tls_insecure "true"
    }
}

<my-domain> {
    reverse_proxy {
        to srv+http://explorer-dashboard.service.dc1.consul srv+http://explorer-dashboard.service.dc2.consul
        lb_try_duration 2s
        lb_policy round_robin
        fail_duration 5s
        max_fails 2
        unhealthy_status 5xx
        unhealthy_request_count 2
    }
}
3. The problem I’m having:
I am trying to achieve service discovery and failover for my app using Caddy and Consul DNS. I have two Consul DCs and two instances of my app (one per DC). Each service instance has a Consul health check. If the health check fails, Consul excludes the failed instance from its DNS responses (this is by design in Consul). For example, if explorer-dashboard in dc1 fails, a DNS query for explorer-dashboard.service.dc1.consul returns nothing. I expected Caddy to handle this.
That is, I expected it to behave like this: "Resolving explorer-dashboard.dc1: OK. Resolving explorer-dashboard.dc2: FAILED. So the only good upstream for proxying is explorer-dashboard.dc1. I will route all traffic to it, and once explorer-dashboard.dc2 comes back up I will resume load balancing." But Caddy doesn't work as I expect. It returns HTTP 502 on every second request because one instance of my app is down, e.g.:
the first request, curl https://my-domain.com, returns 200
the second request, curl https://my-domain.com, returns 502
It tries to resolve the DNS SRV record of the failed instance and receives an error from the DNS resolver.
How can I exclude instances that cannot be resolved from traffic routing?
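To make the behaviour I'm expecting concrete, here is a minimal Go sketch of the filtering step: resolve each SRV name and keep only the ones that currently resolve. This is not how Caddy works internally, just an illustration. The names resolveFunc, lookupSRV, availableUpstreams and errNoSuchHost are hypothetical, and the address 10.0.2.15:8080 is made up for the simulated DNS state.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// resolveFunc returns the "host:port" targets behind one SRV name.
// Hypothetical seam so the filtering can be tried without real DNS.
type resolveFunc func(name string) ([]string, error)

// errNoSuchHost stands in for the resolver error from my log output.
var errNoSuchHost = errors.New("no such host")

// lookupSRV resolves name with the standard library. Empty service and
// proto make net.LookupSRV query the name directly.
func lookupSRV(name string) ([]string, error) {
	_, records, err := net.LookupSRV("", "", name)
	if err != nil {
		return nil, err
	}
	targets := make([]string, 0, len(records))
	for _, r := range records {
		host := strings.TrimSuffix(r.Target, ".")
		targets = append(targets, net.JoinHostPort(host, strconv.Itoa(int(r.Port))))
	}
	return targets, nil
}

// availableUpstreams keeps only the targets of names that currently resolve.
func availableUpstreams(resolve resolveFunc, names []string) []string {
	var upstreams []string
	for _, name := range names {
		targets, err := resolve(name)
		if err != nil {
			continue // e.g. Consul removed the failed instance from DNS
		}
		upstreams = append(upstreams, targets...)
	}
	return upstreams
}

func main() {
	// Simulated DNS state: dc1 is down, dc2 answers (address is made up).
	fake := func(name string) ([]string, error) {
		if name == "explorer-dashboard.service.dc2.consul" {
			return []string{"10.0.2.15:8080"}, nil
		}
		return nil, errNoSuchHost
	}
	names := []string{
		"explorer-dashboard.service.dc1.consul",
		"explorer-dashboard.service.dc2.consul",
	}
	fmt.Println(availableUpstreams(fake, names)) // prints [10.0.2.15:8080]
}
```

Passing lookupSRV instead of the fake resolver would run the same filtering against real DNS. In this sketch an unresolvable name is simply skipped, so it never gets a turn in the rotation, which is the behaviour I'd like from the proxy.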
4. Error messages and/or full log output:
ERROR http.log.error making dial info: lookup explorer-dashboard.service.dc1.consul on 127.0.0.53:53: no such host
5. What I already tried:
I tried tuning the reverse_proxy health checks, but the problem doesn't seem to be in the health checks (or I am a noob and missed something in the docs).
6. Links to relevant resources:
–