Global Health Status for Load Balanced Endpoints

1. The problem I’m having:

I have two load balanced endpoints:

http://10.136.133.148
http://10.136.133.149

They run a service that listens on ports 35997 and 35998. Each endpoint also exposes an API at :8080/healthcheck that verifies the service is working properly. The health check is “global” for its endpoint, meaning the response to http://10.136.133.148:8080/healthcheck represents the health of both:

http://10.136.133.148:35997
and
http://10.136.133.148:35998

The way this is set up now, Caddy hits :8080/healthcheck twice every 10 seconds. Is there a way to apply the health status of port 35997 to port 35998 so that the health check only runs once every 10 seconds?

Here is my Caddyfile:

my.hc1node.com:35997 {
        reverse_proxy http://10.136.133.148:35997 http://10.136.133.149:35997 {
                header_up Host {upstream_hostport}
                lb_policy random
                health_interval 10s
                health_timeout 5s
                health_uri /healthcheck
                health_port 8080
                health_status 2xx
        }
}

my.hc1node.com:35998 {
        reverse_proxy http://10.136.133.148:35998 http://10.136.133.149:35998 {
                lb_policy random
                health_interval 10s
                health_timeout 5s
                health_uri /healthcheck
                health_port 8080
                health_status 2xx
        }
}

2. Error messages and/or full log output:

No errors are produced

3. Caddy version:

v2.6.4 h1:2hwYqiRwk1tf3VruhMpLcYTg+11fCdr8S3jhNAdnPy8=

4. How I installed and ran Caddy:

a. System environment:

Distributor ID: Ubuntu
Description: Ubuntu 22.10
Release: 22.10
Codename: kinetic
DigitalOcean: 2vcpu-2gb-intel

b. Command:

systemctl start caddy

c. Service/unit/compose file:

[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target network-online.target
Requires=network-online.target

[Service]
Type=notify
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile --force
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateDevices=yes
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

d. My complete Caddy config:

{
        debug
}

my.hc1node.com:35997 {
        reverse_proxy http://10.136.133.148:35997 http://10.136.133.149:35997 {
                header_up Host {upstream_hostport}
                lb_policy random
                health_interval 10s
                health_timeout 5s
                health_uri /healthcheck
                health_port 8080
                health_status 2xx
        }
}

my.hc1node.com:35998 {
        reverse_proxy http://10.136.133.148:35998 http://10.136.133.149:35998 {
                lb_policy random
                health_interval 10s
                health_timeout 5s
                health_uri /healthcheck
                health_port 8080
                health_status 2xx
        }
}

5. Links to relevant resources:

Active health checks are designed to be per upstream address. What you’re asking for isn’t a supported use case. It’s also the first time I’ve heard of anyone trying to do this, so it’s not something we designed for.

Why is it a problem to run health checks more than once every 10 seconds? Health checks should always be super fast/instantaneous to run.

Understood. I think I have a unique use case where the endpoint is a node in a blockchain. The node is not always in sync due to latency across the network, which can make the node’s health check respond more slowly than expected. Maybe I need to rethink how I do the health check so it’s always fast.

Thank you for looking at my question.
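
For anyone landing here with the same problem, one way to make the health check "always fast" is to run the slow sync check in the background and have /healthcheck return only the last cached result. The following is a minimal sketch under assumptions, not something taken from this thread: `CachedHealth`, `make_handler`, `serve`, `node_is_in_sync`, and the 5-second refresh interval are all hypothetical, and the probe is a placeholder for whatever the node actually exposes.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CachedHealth:
    """Runs a slow probe in the background and remembers the last result."""

    def __init__(self, probe, interval=5.0):
        self._probe = probe        # hypothetical callable returning True/False
        self._interval = interval  # seconds between background probes (assumed value)
        self._healthy = False      # report unhealthy until the first probe finishes
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def refresh(self):
        """Run the probe once; an exception counts as unhealthy."""
        try:
            ok = bool(self._probe())
        except Exception:
            ok = False
        with self._lock:
            self._healthy = ok

    def healthy(self):
        """Return the cached status without running the probe."""
        with self._lock:
            return self._healthy

    def start(self):
        """Refresh the cached status in a daemon thread until stop() is called."""
        def loop():
            while not self._stop.is_set():
                self.refresh()
                self._stop.wait(self._interval)
        threading.Thread(target=loop, daemon=True).start()

    def stop(self):
        self._stop.set()


def make_handler(health):
    """Build a request handler that answers /healthcheck from the cache."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthcheck":
                self.send_error(404)
                return
            ok = health.healthy()
            body = b"ok\n" if ok else b"unhealthy\n"
            # 200 matches `health_status 2xx`; 503 marks the upstream unhealthy.
            self.send_response(200 if ok else 503)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return Handler


def node_is_in_sync():
    # Placeholder: replace with the real (possibly slow) sync check.
    return True


def serve(probe, port=8080, interval=5.0):
    """Start the background refresher and serve /healthcheck until interrupted."""
    health = CachedHealth(probe, interval)
    health.start()
    ThreadingHTTPServer(("0.0.0.0", port), make_handler(health)).serve_forever()
```

Calling `serve(node_is_in_sync)` on each node would answer on :8080/healthcheck. Because the handler only reads a cached boolean, the response time no longer depends on how long the sync check takes, so two checks per interval cost almost nothing. The trade-off is that the reported status can be up to one refresh interval stale.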
