To test this setup, I deliberately took the first host, at port 3000, down. By rights, all traffic should then be routed to the other two hosts at ports 3002 and 3003. However, traffic is still being forwarded to the dead host.
Is there any special setup needed for the health checks to work?
That said, with a fail_timeout and a try_duration configured, Caddy should avoid routing your traffic to the down instance for the timeout duration, and should be able to move to a working instance for the triggering request.
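To make that behaviour concrete, here is a rough Python sketch of the idea. It is not Caddy's actual implementation. The names `Upstream`, `forward` and `send` are made up for illustration, and the default durations are arbitrary. Unlike a real load-balancing policy, it simply tries the hosts in order.

```python
import time

class Upstream:
    def __init__(self, name):
        self.name = name
        self.down_until = 0.0  # clock time until which this host is skipped

    def available(self, now):
        return now >= self.down_until

def forward(upstreams, send, fail_timeout=10.0, try_duration=5.0,
            clock=time.monotonic):
    """Send one request, moving on to the next host if one does not respond.

    `send(upstream)` is a hypothetical callable that raises ConnectionError
    when the host is unreachable.
    """
    deadline = clock() + try_duration
    for upstream in upstreams:
        now = clock()
        if now > deadline:
            break  # spent longer than try_duration looking for a host
        if not upstream.available(now):
            continue  # still marked down from an earlier failure
        try:
            return send(upstream)
        except ConnectionError:
            # Skip this host on later requests until fail_timeout has passed.
            upstream.down_until = now + fail_timeout
    raise ConnectionError("no upstream responded within try_duration")
```

In this sketch the first request after a host dies still hits it once, then moves on to a working host within the same request. Later requests skip the dead host until the timeout expires.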
That’s a good question, actually! I’m afraid I don’t know, and the documentation doesn’t specify.
It’s useful information, so it might be worth opening an issue on GitHub asking for the docs to be updated to explain it.
It only requires a standard, functioning HTTP server that returns a status code. You can of course implement your own behaviour here to respond with different status codes under different circumstances, but I expect simply health checking / will give good results for most backends. It’s useful for detecting a backend that is returning 500-series errors, for example, because a 500-series error is still a valid response and won’t trip the fail_timeout (only a non-responsive server would).
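If you do want custom behaviour, a backend endpoint along these lines would probably be enough. This is only a sketch using Python's standard library. The `/health` path, the `healthy` switch and the default port are assumptions for illustration, not anything Caddy requires.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical switch: clear it when a dependency (database, cache, ...) is down.
healthy = threading.Event()
healthy.set()

def health_status(path, is_healthy):
    """Pick the status code for a GET request to `path`."""
    if path != "/health":  # hypothetical path; "/" would work just as well
        return 404
    return 200 if is_healthy else 503

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = health_status(self.path, healthy.is_set())
        body = f"{status}\n".encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def make_server(port=3000):
    """Build the backend; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), HealthHandler)
```

Calling `healthy.clear()` makes the endpoint answer 503 while the process itself stays up, which is the kind of failure a health check can catch and the fail_timeout cannot.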