Load Balancing Not Working


(Gyver Chang) #1

Hi there, I am currently load balancing a web app to 3 different instances as seen below:

To test this setup, I deliberately took the first host, at port 3000, down. In theory, all the traffic should then be routed to the other two hosts at ports 3002 and 3003; however, traffic is still forwarded to the dead host.

Is there any special setup needed for the health check functions to work?

Thank you!


(Matthew Fay) #2

Health check functions are configured with the health_check, health_check_port, health_check_interval, and health_check_timeout subdirectives.

https://caddyserver.com/docs/proxy

That said, with a fail_timeout and a try_duration configured, Caddy should avoid routing your traffic to the down instance for the timeout duration, and should be able to move to a working instance for the triggering request.
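For example, combining those subdirectives might look like this (a sketch only; the /health path, ports, and intervals here are placeholders, not taken from your setup):

:2015 {
  proxy / :3000 :3002 :3003 {
    policy round_robin
    health_check /health
    health_check_interval 10s
    health_check_timeout 3s
  }
}

With something like that, Caddy periodically requests /health on each upstream and takes an upstream out of rotation while the check fails.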

I tried with the following Caddyfile:

:2015 {
  proxy / :2016 :2017 :2018 {
    policy round_robin
    fail_timeout 1m
    try_duration 1s
    max_fails 1
  }
  log / stdout "{upstream}"
}

:2017 {
  status 200 /
}

:2018 {
  status 200 /
}

(Naturally with no server running at :2016).

After a series of requests to localhost:2015, I see the following output from Caddy on stdout:

http://:2017
http://:2018
http://:2017
http://:2018
http://:2017
http://:2018
http://:2017
http://:2018
http://:2017
http://:2018
http://:2017
http://:2018

There were no failed requests, indicating that the failure handling is working as expected. My Caddy version is 0.11.0.
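The behaviour in that output can be modelled with a tiny round-robin picker that skips an upstream for a timeout window after it fails. This is a sketch of the general max_fails/fail_timeout technique, not Caddy's actual code:

```python
import time

class RoundRobin:
    """Round-robin upstream picker that skips hosts marked as failed."""

    def __init__(self, upstreams, max_fails=1, fail_timeout=60.0):
        self.upstreams = upstreams
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = {u: 0 for u in upstreams}        # recorded failures
        self.failed_at = {u: 0.0 for u in upstreams}  # time of last failure
        self.i = 0

    def available(self, u, now):
        # An upstream is skipped while it has >= max_fails failures
        # recorded within the fail_timeout window.
        if self.fails[u] < self.max_fails:
            return True
        if now - self.failed_at[u] >= self.fail_timeout:
            self.fails[u] = 0  # window expired: give it another chance
            return True
        return False

    def pick(self, now=None):
        now = time.monotonic() if now is None else now
        for _ in range(len(self.upstreams)):
            u = self.upstreams[self.i % len(self.upstreams)]
            self.i += 1
            if self.available(u, now):
                return u
        return None  # every upstream is currently marked down

    def mark_failed(self, u, now=None):
        now = time.monotonic() if now is None else now
        self.fails[u] += 1
        self.failed_at[u] = now
```

After marking :2016 failed once (max_fails 1), successive picks alternate between :2017 and :2018 until the fail_timeout window expires, which matches the alternating log output above.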


(Gyver Chang) #3

Thanks for the explanation Matthew!

I did the same experiment that you did and got the same results, very interesting.

I will do more testing with different configurations to see what I can come up with.

Regarding the ip_hash policy: if the backend that the user is “assigned” to fails, does Caddy automatically switch to an alternative backend?

Also, do the health check functions require specific feature support on the backend servers, or are they universal?

Thank you!


(Matthew Fay) #4

No worries!

That’s a good question, actually! I’m afraid I don’t know, and the documentation doesn’t specify.

It’s useful information, so it might be worth opening an issue on GitHub requesting that the docs be updated to explain this.

It requires only a standard, functional HTTP server that returns a status code. You can of course implement your own behaviour here to respond with different status codes under different circumstances, but I expect simply health checking / will have good results for most backends. It’s useful for detecting a backend that is returning 500-series errors, for example, because a 500-series error is still a valid HTTP response and won’t trip the fail_timeout (only a non-responsive server would).
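To illustrate that, here is a minimal backend health endpoint that answers 200 when things are fine and 503 when some dependency is down. This is a hypothetical sketch: the dependencies_ok function is a placeholder, and the assumption (based on the usual health-check convention) is that the proxy treats a non-200 response as unhealthy.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def dependencies_ok():
    # Placeholder: check your database, cache, etc. here.
    return True

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # 200 signals the backend is usable; 503 signals it is not
            # (assumption: the health checker treats non-200 as down).
            status = 200 if dependencies_ok() else 503
        else:
            status = 404
        body = b"OK" if status == 200 else b"unavailable"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the server quiet

if __name__ == "__main__":
    HTTPServer(("", 8080), HealthHandler).serve_forever()
```

Pointing health_check at a path like /health then lets the backend decide for itself when it should be taken out of rotation, rather than relying only on connection failures.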