Can you pass in the full upstream to an active health check?

1. The problem I’m having:

Note that this is quite similar to a previously answered question of mine: HOW TO: Pass health_uri the specific upstream endpoint as a query parameter. I believe it’s different enough to warrant asking again, though, since this might be possible and I’m just not seeing how at this time. If this also doesn’t make sense, a simple “no” is fine; you don’t need to spend time responding.

Is there a way to reference an upstream during active health checks? Right now you can check what the Host header is during an active health check:

Host: 5.0.0.17:53336

which is great, but I’d like to use the original, full upstream rather than the address with the health-check port substituted in. The original upstream being:

5.0.0.17:25857

Is there a way to pass that in as an argument to the active health check?

2. Error messages and/or full log output:

NA

3. Caddy version:

v2.8.2

4. How I installed and ran Caddy:

a. System environment:

Ubuntu 22.04, systemd

b. Command:

sudo systemctl start caddy.service

c. Service/unit/compose file:

[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target network-online.target
Requires=network-online.target

[Service]
Type=notify
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile --force
TimeoutStopSec=5s
Restart=always
RuntimeMaxSec=43200
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

d. My complete Caddy config:

(active-lb) {
        lb_policy ip_hash
        lb_retries 2
        health_uri /health
        health_port 53336
        health_interval 10s
        health_timeout 15s
        health_status 2xx
        health_headers {
            Full-Upstream {args[0]}
        }
}

:1000 {
  reverse_proxy 5.0.0.17:25857 5.0.0.17:25858 {
    import active-lb upstream
  }
}

5. Links to relevant resources:

NA

Your config doesn’t make sense to me. The point of health_port is to provide a way to query a different port on a given host to report the health of that upstream.

But in your case, both your upstreams use the same IP address, so the health check’s result is basically valid for both.

If you use different IP addresses for each upstream, then this would be easier to deal with.

What you’re trying to do seems strange though. Why isn’t your actual upstream self-reporting its health status?

Currently there isn’t one, because there hasn’t been a need for it, but we could wire up the replacer for health_headers to include a placeholder with the current upstream’s dial address.

But as above, I’m not really convinced what you’re trying to do makes sense, fundamentally.


Hey @francislavoie,

Thanks for the reply. I agree this is a strange need.

Why isn’t your actual upstream self-reporting its health status?

They do; however, their self-reporting isn’t especially reliable or trustworthy. To flesh out the reasoning a bit: this is for relatively high-performance computing where uptime is crucial. The basic health check works right now, but it’s insufficient.

My thought is to write an intermediate service that does a more thorough health check on each service on the local server. Say you have 3 services on the same server, each using a different port:

5.0.0.1:25857
5.0.0.1:25858
5.0.0.1:25859

You can check the health of each of these using /health. /health reports the last moment at which it was working, as well as a general “am I working?” response. You’d think that if the last “working time” and “now” diverged, it would report as not working, but it doesn’t.

This is where the intermediary service comes in. It can use the /status endpoint (and other checks) to do a more thorough examination of each upstream. In the example above, the intermediary service sits on port 53336.

My issue right now is that there’s no way to communicate the port of each upstream to the intermediary service. In this example, all 4 services sit on the same server.


For what it’s worth, here’s how it currently works, which is as you suggest:

mywebsite.com { 
  reverse_proxy 5.0.3.3:12857 5.0.4.2:12857 {
    import lb-config
  }
}

(lb-config) {
	lb_policy client_ip_hash
	lb_retries 3
	health_uri /cosmos/base/tendermint/v1beta1/syncing
	health_interval 30s
	health_timeout 5s
	health_status 2xx
	health_body syncing"\s*:\s*false
}

And in doing this, I can use the intermediate service because I can indeed pass in the necessary ports, since the services sit on different IPs, as you say. The problem really arises when many instances of the same service sit on the same server, but that is something of a requirement.

In this example, the intermediate service sits alongside each corresponding upstream. In the original post example, there would be a single intermediate service sitting alongside 3 upstreams.

I think this should do what you need:

Basically you’d do:

        health_headers {
            Full-Upstream {http.reverse_proxy.active.target_host}
        }

I’m not sure if that’s the exact placeholder name we’ll go with, but that’s the idea.

I didn’t test it; if you’d like to build from that branch and confirm that it does what you want, that’d be great.
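
To make the mechanics concrete, here is a toy illustration of how such a placeholder would flow into header values. This mimics Caddy’s replacer in spirit only; it is not Caddy’s actual implementation, and the placeholder name is the tentative one from above:

```go
package main

import (
	"fmt"
	"strings"
)

// expandHeaders substitutes the active-health-check placeholder into
// configured header values before the check request is sent.
func expandHeaders(headers map[string]string, networkAddr string) map[string]string {
	vars := map[string]string{
		"{http.reverse_proxy.active.target_host}": networkAddr,
	}
	out := make(map[string]string, len(headers))
	for name, value := range headers {
		for placeholder, actual := range vars {
			value = strings.ReplaceAll(value, placeholder, actual)
		}
		out[name] = value
	}
	return out
}

func main() {
	h := expandHeaders(map[string]string{
		"Full-Upstream": "{http.reverse_proxy.active.target_host}",
	}, "5.0.0.17:25857")
	fmt.Println(h["Full-Upstream"]) // 5.0.0.17:25857
}
```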


Hey @francislavoie thanks for this, I greatly appreciate it.

Unfortunately it’s not quite what I’m looking for, and it’s probably worth reverting out of the codebase, as the Host header provides the same information:

+Host: 5.0.0.17:53335
User-Agent: Go-http-client/1.1
+Full-Upstream: 5.0.0.17:53335
Accept-Encoding: gzip

I played around with building/rebuilding Caddy with different options locally, and I suspect health_port replaces the upstream’s port too early in the stack: caddy/modules/caddyhttp/reverseproxy/healthchecks.go at master · caddyserver/caddy · GitHub.

Given that the port change happens further up the stack, I’m not sure there’s a good place to add the header while following Caddy’s paradigms.

Can you try this?

repl.Set("http.reverse_proxy.active.target_dial", upstream.Dial)

I think networkAddr (line 279) is the value we want (it’s passed through the replacer, so some placeholders, e.g. env vars, would have already been replaced). I think we can pass it as another arg to doActiveHealthCheck and use it for the headers.


Sure thing, I’ll try them first thing in the morning.

upstream.Dial works!

5.0.0.33 - - [09/Jul/2024 15:35:04] "GET /health HTTP/1.1" 200 -
+Host: 5.0.0.33:53336
User-Agent: Go-http-client/1.1
+Full-Upstream: 5.0.0.33:8545
Port: 8545
Accept-Encoding: gzip

Should I throw up a PR to update that value?

Sure, if you’d like :+1: I think it would be best to wire up networkAddr (by adding another arg to doActiveHealthCheck), just in case people use a placeholder for the upstream address in their config though.


Got it, I’ll test that out and submit a PR shortly.

Thanks again for your help.


PR submitted:

And confirmation of functionality:

5.0.0.33 - - [09/Jul/2024 16:57:37] "GET /health HTTP/1.1" 200 -
+Host: 5.0.0.33:53336
User-Agent: Go-http-client/1.1
+Full-Upstream: 5.0.0.33:8545
Accept-Encoding: gzip

@francislavoie, as a fast follow-up (this may need to be a new forum post): is there a way to set the hostAddr to localhost?

I was inspired by: addr.IsUnixNetwork.

If I can do this, I can sit the intermediate health_check service alongside caddy, then pass the upstream to the health_check service using the header option above.

Using the previous output as an example of what I’m saying:

5.0.0.33 - - [09/Jul/2024 16:57:37] "GET /health HTTP/1.1" 200 -
+Host: 127.0.0.1:53336
User-Agent: Go-http-client/1.1
+Full-Upstream: 5.0.0.33:8545
Accept-Encoding: gzip

The benefit here would just be that fewer health_check services need to exist. Right now, for every caddy process running, there are ~30 servers, so the ratio is 1:30 caddy:health_checker; if I could set localhost as the hostAddr, it could be 1:1.

I don’t think there’s a way to do that now. Maybe a health_host config option could be wired up to override it. We’d document that it should be used in conjunction with health_headers to pass the original upstream along, I guess.

Alternatively (and more flexibly), we could make active health checking modular, i.e. an interface through which we pass the upstream to some plugin module that is called at regular intervals; that way health checking could be done with custom logic instead of via the configured transport.


Alternatively (more flexibly)…

This sounds like quite a bit of a rework.

Are you opposed to me implementing health_host as you suggest and adding documentation for it?

Yeah, we can go with health_host for now. Actually, maybe we call it health_upstream, because it doesn’t configure only the host but the whole upstream (i.e. host:port). It would override health_port etc.


Beauty. I’ll do health_upstream.


Actually, does health_host make more sense than health_upstream? Adding health_upstream could create confusion if someone tries to set both:

        lb_policy ip_hash
        lb_retries 2
        health_uri /health
        health_host 127.0.0.1
        health_port 53336
        health_status 2xx
        health_headers {
            Full-Upstream {args[0]}
        }

vs

        lb_policy ip_hash
        lb_retries 2
        health_uri /health
        health_upstream 127.0.0.1:53336
        health_status 2xx
        health_headers {
            Full-Upstream {args[0]}
        }

…Actually, health_upstream might be cleaner.

We can just throw an error if they use health_port when using health_upstream.
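
That mutual-exclusion check could look roughly like this (the function and parameter names are illustrative, not Caddy’s actual fields):

```go
package main

import (
	"errors"
	"fmt"
)

// validateActiveHealth enforces the rule suggested above: health_port is
// redundant (and likely a mistake) once health_upstream pins the full
// host:port, so configuring both is an error.
func validateActiveHealth(healthUpstream, healthPort string) error {
	if healthUpstream != "" && healthPort != "" {
		return errors.New("health_port cannot be used when health_upstream is set")
	}
	return nil
}

func main() {
	// Both set: rejected.
	fmt.Println(validateActiveHealth("127.0.0.1:53336", "53336"))
	// health_upstream alone: fine.
	fmt.Println(validateActiveHealth("127.0.0.1:53336", ""))
}
```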
