Load balance health check based on underlying state

hstove · February 14, 2023, 4:02pm

1. Caddy version:

2.5.2

2. How I installed, and run Caddy:

I’m using Caddy with Docker.

a. System environment:

Docker on an Alpine Linux VM, hosted on fly.io.

b. Command:

caddy run --config /etc/caddy/Caddyfile

c. Service/unit/compose file:

FROM caddy:2.5.2-alpine

COPY Caddyfile /etc/caddy/Caddyfile

CMD ["caddy", "run", "--config", "/etc/caddy/Caddyfile"]

d. My complete Caddy config:

My Caddyfile is:

Note that PROXY_HOST is a string with (sometimes) multiple endpoints, like http://server-a.com http://server-b.com. I am doing this as a more manual way of getting the behavior I am asking about in this post.

:8080 {
	respond /health-check "OK" 200

	reverse_proxy * {$PROXY_HOST} {
		lb_policy least_conn
		health_uri /v2/info
		health_interval 60s
	}
}

3. The problem I’m having:

I am running multiple p2p services (blockchain nodes) behind my load balancer, and it’s common that they are not perfectly “in sync” with the external network. For example, my underlying servers return JSON similar to this under /status:

{
  "tip": 1000
}

Where “tip” is essentially the highest “block height” that the node has synced to.

My goal is to have my load balancer only send traffic to nodes that are at the highest “tip” - I don’t want to send traffic to nodes that are less synced than another. This would be especially helpful in cases where one node goes down or restarts, and might be well behind sync with the others.

I would prefer not to modify the source code of the underlying server itself, but I can run a custom http server in the same VM that can expose a custom “health check” endpoint to compare the sync height of the local VM against the others behind my load balancer. And then use Caddy within that VM to reverse proxy a custom health check endpoint to my custom service.

Some pseudocode explaining what this approach might look like:

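async function checkHealth(localNode: string, otherNodes: string[]): Promise<boolean> {
  // Fetch the local height and every peer height concurrently.
  // (await inside a forEach callback would not be waited on.)
  const [myNodeHeight, otherHeights] = await Promise.all([
    getHeight(localNode),
    Promise.all(otherNodes.map((node) => getHeight(node))),
  ]);

  // Healthy only if no other node is ahead of the local node.
  const highestOtherNode = Math.max(0, ...otherHeights);
  return myNodeHeight >= highestOtherNode;
}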
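For comparison, here is a minimal Go sketch of the same sidecar check, written as an HTTP handler that Caddy's active health check could poll. The /status path and the "tip" field come from the description above; the names heightFunc, httpHeight, inSync and checkHealth are hypothetical, and answering 503 when behind and skipping unreachable peers are assumptions of this sketch (as far as I know, Caddy treats any 2xx response as healthy unless health_status says otherwise).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// heightFunc returns the current tip of a node given its base URL.
type heightFunc func(node string) (int64, error)

// httpHeight reads {"tip": N} from <node>/status.
func httpHeight(node string) (int64, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(node + "/status")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var body struct {
		Tip int64 `json:"tip"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return 0, err
	}
	return body.Tip, nil
}

// inSync reports whether local is at least as high as every reachable peer.
// Peers that fail to answer are skipped (an assumption of this sketch).
func inSync(local string, peers []string, height heightFunc) bool {
	mine, err := height(local)
	if err != nil {
		return false
	}
	for _, p := range peers {
		if h, err := height(p); err == nil && h > mine {
			return false
		}
	}
	return true
}

// checkHealth answers 200 when the local node is in sync and 503 otherwise.
func checkHealth(local string, peers []string, height heightFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if !inSync(local, peers, height) {
			http.Error(w, "behind", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "OK")
	}
}

func main() {
	// Stubbed heights so the demo runs without a network;
	// a real sidecar would pass httpHeight instead of stub.
	tips := map[string]int64{"http://a": 1000, "http://b": 998}
	stub := func(n string) (int64, error) { return tips[n], nil }
	peers := []string{"http://a", "http://b"}

	for _, local := range peers {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest("GET", "/check_health", nil)
		checkHealth(local, peers, stub)(rec, req)
		fmt.Println(local, rec.Code)
	}
}
```

Passing the height lookup in as a function keeps the comparison logic testable without real nodes; a deployed sidecar would register checkHealth with httpHeight on whatever port Caddy's health_uri is pointed at.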

I assume an approach like this can work. My questions are:

Is it possible to write a custom module that can do this within my Caddy reverse proxy, so that I don’t need to include a custom service in each underlying VM?
Can I URL encode my PROXY_HOST string into my “health check” endpoint in my Caddyfile? This would allow my internal “health checker” to dynamically check each endpoint, vs having to configure each of my VMs manually.
Can I access and expose the individual underlying host in the health_uri property?

Something like:

reverse_proxy * {$PROXY_HOST} {
  health_uri "/check_health?nodes={encodeUriComponent($PROXY_HOST)}&self={REVERSE_PROXY_HOST}"
}

4. Error messages and/or full log output:

5. What I already tried:

At the moment, I’m only using one node behind my load balancer, and being diligent with monitoring to ensure that the node is in good health. I would like to scale this out, but without sending traffic to less synced nodes.

6. Links to relevant resources:

francislavoie · February 15, 2023, 12:14am

Please upgrade to v2.6.4.

You don’t need the CMD line in your Dockerfile. The base image’s default CMD is retained, so it’s not necessary.

What you’d want is to write a dynamic upstreams module, instead of relying on health checks. That would give you a lot more control over which upstreams Caddy will connect to. See the reverse_proxy (Caddyfile directive) page in the Caddy documentation.

You’ll want to implement a module that conforms to the UpstreamSource interface, and the module ID should look like http.reverse_proxy.upstreams.<your-module>

Essentially you’d feed the list of upstreams to your module via config, then your module would take care of periodically fetching the state it needs from your upstreams, sort them, and only return the top one at any given time. You can read the source code for the SRV and A modules to get an idea of how it should be structured.
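As a rough illustration of the "track state, then return only the top upstreams" part, here is a stdlib-only Go sketch of a small cache that a background poller could update and a dynamic upstreams module could read from. The tipTracker type and its methods are hypothetical names, not part of Caddy. Returning every address tied for the highest tip (rather than exactly one) is a choice made here so that load balancing still has something to balance. Wiring this into a real module is left out: to my understanding, the module's GetUpstreams method would turn each address from Best into an upstream with that dial address.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// tipTracker caches the last known tip per upstream address.
// It is safe for concurrent use by a poller and request handlers.
type tipTracker struct {
	mu   sync.RWMutex
	tips map[string]int64
}

func newTipTracker() *tipTracker {
	return &tipTracker{tips: make(map[string]int64)}
}

// Update records the latest tip reported by addr.
func (t *tipTracker) Update(addr string, tip int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.tips[addr] = tip
}

// Remove forgets addr, for example after it fails to report.
func (t *tipTracker) Remove(addr string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.tips, addr)
}

// Best returns every address at the highest known tip, sorted so the
// result is stable across calls. Tips are assumed to be non-negative.
func (t *tipTracker) Best() []string {
	t.mu.RLock()
	defer t.mu.RUnlock()
	var best []string
	highest := int64(-1)
	for addr, tip := range t.tips {
		switch {
		case tip > highest:
			highest, best = tip, []string{addr}
		case tip == highest:
			best = append(best, addr)
		}
	}
	sort.Strings(best)
	return best
}

func main() {
	tr := newTipTracker()
	tr.Update("server-a.com:80", 1000)
	tr.Update("server-b.com:80", 997)
	tr.Update("server-c.com:80", 1000)
	fmt.Println(tr.Best()) // server-a and server-c share the highest tip
}
```

Keeping the polling separate from the lookup matters because the upstream list is consulted on the request path, so Best only reads cached values and never performs network calls.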

hstove · February 27, 2023, 6:30am

Thanks for your help! I’ll have to dig in to this, but it seems like dynamic upstreams is exactly what I need.
