HOW TO: Pass health_uri the specific upstream endpoint as a query parameter

1. The problem I’m having:

Is there a way to pass in an upstream to health_uri?

reverse_proxy 1.1.1.1:26657 2.2.2.2:26657 {
    lb_policy round_robin
    lb_retries 2
    health_uri localhost:1000/health/?endpoint={UPSTREAM}
    health_interval 5s
    health_timeout 15s
    health_status 2xx
    health_body syncing"\s*:\s*false
}

Basically, the idea is that I want to be able to query a local script and pass in the endpoint, so the script can check the health of that node.

Is there any way to achieve this?

Why?

Some nodes I am using don't have a health endpoint that can be queried directly. For example, a JSON-RPC endpoint:

curl -s -m2 -N -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' "https://dymension-jsonrpc.lavenderfive.com:443"

results:

{"jsonrpc":"2.0","id":1,"result":false}

Using health_uri, I could pass in the upstream to a script that would check the health status and return the appropriate value.
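For illustration, here's a minimal sketch (in Go) of the kind of local checker script described above: it takes the upstream as a query parameter, sends the eth_syncing JSON-RPC call from the curl example, and responds 200 with a body the health_body regex can match only when the node is not syncing. The :1000 listen address, /health/ path, ?endpoint= parameter name, and https:// scheme are assumptions carried over from the snippet above.

package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/health/", func(w http.ResponseWriter, r *http.Request) {
		// The upstream to check arrives as ?endpoint=host:port (assumed convention).
		endpoint := r.URL.Query().Get("endpoint")
		if endpoint == "" {
			http.Error(w, "missing endpoint", http.StatusBadRequest)
			return
		}

		// Same JSON-RPC call as the curl example above.
		payload := []byte(`{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}`)
		client := &http.Client{Timeout: 2 * time.Second}
		resp, err := client.Post("https://"+endpoint, "application/json", bytes.NewReader(payload))
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()

		var rpc struct {
			Result json.RawMessage `json:"result"`
		}
		if err := json.NewDecoder(resp.Body).Decode(&rpc); err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}

		// eth_syncing returns false when fully synced, a status object otherwise.
		if string(rpc.Result) == "false" {
			w.Write([]byte(`{"syncing": false}`)) // matches the health_body regex
			return
		}
		http.Error(w, "node is syncing", http.StatusServiceUnavailable)
	})
	log.Fatal(http.ListenAndServe("localhost:1000", nil))
}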

2. Error messages and/or full log output:

NA

3. Caddy version:

v2.7.6 h1:w0NymbG2m9PcvKWsrXO6EEkY9Ru4FJK8uQbYcev1p3A=

4. How I installed and ran Caddy:

Downloaded from caddyserver.com

Deleted most of this as this is a pretty open-ended question.


Looking at the code, it does NOT appear this is possible.

However, there is an “up” variable that may make this achievable.

This doesn’t make sense. health_uri is supposed to be a request path+query to request on the current upstream. You could change the health_port to use a different port on the same upstream’s IP, but not a totally different upstream.

Active health checks are meant to allow upstreams to self-report their health. If it can’t be reached and confirmed healthy, then it’s marked unhealthy.

You could write your own dynamic upstreams provider which can perform its own health checks.
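For what it's worth, a rough sketch of what such a module could look like, built on Caddy's reverseproxy.UpstreamSource interface. The module ID, field names, and ?endpoint= checker convention here are all illustrative assumptions, and a real module would want to cache results rather than check on every request:

package healthyupstreams

import (
	"fmt"
	"net/http"
	"time"

	"github.com/caddyserver/caddy/v2"
	"github.com/caddyserver/caddy/v2/modules/caddyhttp/reverseproxy"
)

func init() {
	caddy.RegisterModule(HealthyUpstreams{})
}

// HealthyUpstreams filters a static list of candidate upstreams through
// an external checker script, handing Caddy only the healthy ones.
type HealthyUpstreams struct {
	// Candidates are the dial addresses to consider, e.g. "1.1.1.1:26657".
	Candidates []string `json:"candidates,omitempty"`

	// CheckURL is the local checker; the candidate address is appended
	// as a query parameter (hypothetical convention from this thread).
	CheckURL string `json:"check_url,omitempty"`
}

// CaddyModule returns the Caddy module information.
func (HealthyUpstreams) CaddyModule() caddy.ModuleInfo {
	return caddy.ModuleInfo{
		ID:  "http.reverse_proxy.upstreams.healthy",
		New: func() caddy.Module { return new(HealthyUpstreams) },
	}
}

// GetUpstreams implements reverseproxy.UpstreamSource. Note: this naive
// version re-checks every candidate on every request; a real module
// should check in the background and cache.
func (h HealthyUpstreams) GetUpstreams(_ *http.Request) ([]*reverseproxy.Upstream, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	var healthy []*reverseproxy.Upstream
	for _, addr := range h.Candidates {
		resp, err := client.Get(fmt.Sprintf("%s?endpoint=%s", h.CheckURL, addr))
		if err != nil {
			continue // checker unreachable: treat candidate as unhealthy
		}
		resp.Body.Close()
		if resp.StatusCode >= 200 && resp.StatusCode < 300 {
			healthy = append(healthy, &reverseproxy.Upstream{Dial: addr})
		}
	}
	return healthy, nil
}

// Interface guard
var _ reverseproxy.UpstreamSource = (*HealthyUpstreams)(nil)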

Maybe we should add support for sending a POST + request body for health checking.


This doesn’t make sense.

Yeah, I totally agree. This was certainly a gross workaround, trying to mold Caddy into doing something that ought not be done.

I looked into doing a dynamic upstream, but the docs suggested active health checks don’t work with them. reverse_proxy (Caddyfile directive) — Caddy Documentation

Maybe we should add support for sending a POST + request body for health checking.

This would be supremely helpful.


Ultimately what I ended up doing was utilizing the health_port option in addition to args, like so:

(lb-config) {
    lb_policy round_robin
    lb_retries 2
    health_uri /cosmos/base/tendermint/v1beta1/syncing
    health_port {args[0]}
    health_interval 5s
    health_timeout 15s
    health_status 2xx
    health_body syncing"\s*:\s*false
}

The service on the new port implements the logic I originally wanted to achieve with a different upstream. This isn't ideal, since the health logic now has to be maintained across 200+ different upstreams that all need the same checks, but it works.


This is why being able to change the upstream would be helpful: right now the same script is duplicated 200+ times to act as a health-check proxy.

If you could change the upstream, a single script could be passed the upstream and port and perform the same logic for all 200+ duplications.


Thanks for your insight.


Yeah, you’d implement your own active health checks in your dynamic upstreams plugins (only handing Caddy the upstreams that are healthy). Gives you full control over how you want to do health checking.

Holy moly, 200 upstreams???

😳

As for POST + request body support for health checks: probably something we will need to do. PRs welcome if you need it sooner rather than later. Shouldn't be too complicated.

If you want it to be prioritized, consider sponsoring

Yeah, you’d implement your own active health checks in your dynamic upstreams plugins (only handing Caddy the upstreams that are healthy). Gives you full control over how you want to do health checking.

Ahhh I didn’t realize that’s how it’d work. I’ll look deeper into it.

Holy moly, 200 upstreams???

Yeah, trust me, it’s a doozy of a directory of Caddyfiles. Ansible to the rescue.

If you want it to be prioritized, consider sponsoring

I already am! However - this isn’t high enough priority for me to press the point. Thanks again for your input.


Just catching up on this thread; interesting use case. We could probably expand the capabilities of the active health checker. I'll talk to Francis and others and see what the best way to go about this is.

Following up here -

Turns out this solution of redirecting to a different port with a health checker sitting on it is only partially effective.

grpc.website.com {
    reverse_proxy {
        transport http {
            versions h2c 2
            dial_timeout 3s
        }
        to IP:22390
        import lb-config 22317
    }
}

Due to the network being upgraded to h2c:// for gRPC, passing in the port doesn't work, as there's no way to change the transport method back to http://.

So close and yet so far.

@matt can you speak to how difficult it would be to add the ability to change the transport for active health checks, from, for example, h2c:// to http://?

Mild to moderate. Right now the transport used is the same as that of the reverse_proxy handler, so if it's designed to use h2c, we'd have to make a separate transport configurable for active health checks. (I originally considered that, but it's not really a great idea, because then your health check uses an entirely different protocol than what you're actually proxying.)

So, difficult? Not really… Good idea? Also not sure 🙃
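To make the mismatch concrete, here's a small self-contained Go sketch (not Caddy's internal code) of the two client transports involved: the h2c transport the proxy handler is configured with, and the plain HTTP/1.1 transport a sidecar health checker would need instead. A separate health-check transport option would essentially mean carrying both around:

package transportdemo

import (
	"crypto/tls"
	"net"
	"net/http"

	"golang.org/x/net/http2"
)

// Transports returns the two RoundTrippers at issue: one speaking
// HTTP/2 over cleartext (h2c) for the gRPC upstream, and one speaking
// ordinary HTTP/1.1 for a plain health-check endpoint.
func Transports() (proxy, health http.RoundTripper) {
	proxy = &http2.Transport{
		AllowHTTP: true, // permit http:// URLs over HTTP/2
		DialTLS: func(network, addr string, _ *tls.Config) (net.Conn, error) {
			return net.Dial(network, addr) // plain TCP, no TLS handshake
		},
	}
	health = &http.Transport{} // default HTTP/1.1 transport
	return proxy, health
}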

Thanks for the quick answer.

I guess to round out the discussion: is there currently a way to do active health checks on h2c/gRPC endpoints using Caddy?

I’m not very familiar with gRPC — can you help me get a sense of what a health check looks like in that scenario?

Sure!

For gRPC health checks, you generally must provide the proto service, method, and content, like so:

Service Name: Query
Proto Method: Params
Proto Content:

syntax = "proto3";
package cosmos.bank.v1beta1;

service Query {
  rpc Params(QueryParamsRequest) returns (QueryParamsResponse) {
    option (cosmos.query.v1.module_query_safe) = true;
    option (google.api.http).get               = "/cosmos/bank/v1beta1/params";
  }
}

message QueryParamsRequest {}

message QueryParamsResponse {
  Params params = 1 [(gogoproto.nullable) = false, (amino.dont_omitempty) = true];
}

message Params {
  option (amino.name) = "cosmos-sdk/x/bank/Params";
  repeated SendEnabled send_enabled         = 1 [deprecated = true];
  bool                 default_send_enabled = 2;
}

message SendEnabled {
  option (gogoproto.equal) = true;
  string denom             = 1;
  bool   enabled           = 2;
}

Then from there the message is built to make requests. As part of the gRPC standard there is a health checking service (grpc.health.v1.Health) to call, but again, it requires building the query in the manner outlined above.
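For comparison, here's a minimal sketch in Go of a standalone probe using that standard gRPC health checking protocol. The address is a placeholder, and per the discussion above, this is what an external checker could run, not something Caddy does today:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Placeholder address for the h2c gRPC upstream.
	conn, err := grpc.Dial("1.1.1.1:22390", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// An empty Service name asks about the server's overall health.
	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{Service: ""})
	if err != nil {
		log.Fatal(err) // e.g. Unimplemented if the server lacks the health service
	}
	fmt.Println("status:", resp.GetStatus()) // SERVING when healthy
}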

If we circle back around to the caddy active health checks, in theory it would look like:

  health_uri /health
  health_interval 5s
  health_timeout 15s
  health_status 2xx
  health_body syncing"\s*:\s*false

But again that’d miss all the proto methods that have to be tied in.

To be clear, my question about whether Caddy supports gRPC health checks wasn't intended to be snide; it was more about covering bases, so I understand whether I need to expend additional effort investigating whether Caddy supports them. It sounds like not.

Circling back around, this is why I think being able to swap out the scheme may be the easiest route to supporting gRPC health checks.
