Make sure to configure an `ask` endpoint; using on_demand without one is dangerous.
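For example, something like this (the endpoint URL here is just a placeholder; your `ask` endpoint should respond with 200 only for domains you actually want certificates issued for):

```
{
	on_demand_tls {
		ask http://localhost:5555/check
	}
}

https:// {
	tls {
		on_demand
	}
	reverse_proxy backend:8080
}
```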
There’s no silver bullet; it depends entirely on your setup and performance characteristics.
But that seems fine.
With that, if dialing fails then it will retry at most once more: the dial takes up to 600ms, then the 1s wait interval puts the retry at 1.6s; another 600ms dial plus a 1s wait lands at 3.2s, which is outside the try_duration window. So you may want to tweak the numbers if you want to retry more than once.
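For reference, a config matching that timing might look like this (assuming a try duration of 2.5s; the upstream addresses are placeholders):

```
reverse_proxy backend1:8080 backend2:8080 {
	lb_try_duration 2.5s
	lb_try_interval 1s
	transport http {
		dial_timeout 600ms
	}
}
```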
The fail_duration option turns on passive health checks. Setting it to 20s means that if one of the backends fails to connect (for example, the dial timeout is reached), it’ll be marked unhealthy and won’t be retried for 20s. I think 20s is pretty long, because it means that if all your backends go down, it could take up to 20s for them to become available again after being fixed. You can adjust the numbers if you run into problems.
Also, you may want to configure max_fails if you don’t want a single dial problem on a backend to immediately trigger the 20s wait; it could probably be tried a few more times before it’s actually taken out.
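Put together, the passive health check settings might look like this (the value of 3 for max_fails is just an example):

```
reverse_proxy backend1:8080 backend2:8080 {
	fail_duration 20s
	max_fails 3
}
```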
You might also want to turn on active health checks, which may notice a backend problem earlier. The benefit depends on your traffic level though: if you have pretty low/infrequent traffic, periodic active health checks give you a baseline rate of requests to each backend, so problems are noticed even when no real requests are coming in.
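An active health check config could look like this (the `/healthz` path and intervals are placeholders; your backends need to actually serve that endpoint):

```
reverse_proxy backend1:8080 backend2:8080 {
	health_uri /healthz
	health_interval 10s
	health_timeout 2s
}
```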
@francislavoie ma man, thank you for reaching out. It says max_fails is the number of failed requests within fail_timeout. I can’t seem to find anything on fail_timeout. Is that the same as fail_duration?
Hey, I am just curious… Does the ip_hash setting mean that server assignment is initially done in a round-robin fashion, but once a user gets a server assigned, they’re pretty much served from that server only?
The docs show the syntax: `lb_policy cookie [<name> [<secret>]]`. It takes an optional name (the `[ ]` mean optional) for the cookie that will be written back to the client, and an optional secret used to HMAC the cookie value. The name defaults to `lb`.
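So a config using it might look like this (the cookie name and secret here are just example values):

```
reverse_proxy backend1:8080 backend2:8080 {
	lb_policy cookie mylb somesecret
}
```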
Random at first: the name of the chosen upstream is hashed and set as a cookie, then the request is proxied. If the cookie is present in a request, it loops through each upstream, hashing each one to compare against the cookie value. If a match is found, the request goes there. If no match, it picks randomly again.
@matt @francislavoie I have updated our prod settings to use the cookie setting, and it is working well! I also noticed there is another sticky session setting, namely ip_hash. There is not much documentation on that, so please let me know if I am understanding it correctly.
cookie is set in the client’s browser. If the client happens to delete the cookie, another server might be assigned to them on the next page load.
ip_hash is based on the client’s IP. So even if the client deletes all their cookies, they’re still kept on the same server (unless their IP changes).
That’s right. It’s done mathematically though (no memory). If the upstream server that would normally be picked goes down, it’ll fall back to another. If that server comes back online, it won’t stay on the fallback; it’ll revert back to the first one. If you add additional upstreams, there’s no guarantee which server will be used for a given client.
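To illustrate the “mathematical, no memory” part, here’s a rough sketch of the idea in Python (this is not Caddy’s actual implementation, just a model of hashing an IP to an upstream with in-order fallback):

```python
import hashlib

def pick_upstream(client_ip, upstreams, healthy):
    """Deterministically map an IP to an upstream, with in-order fallback.

    No state is stored between calls, so when the preferred upstream
    recovers, the same IP automatically maps back to it.
    """
    if not upstreams:
        return None
    # Hash the IP to a starting index.
    start = int(hashlib.sha256(client_ip.encode()).hexdigest(), 16) % len(upstreams)
    # Walk the list from that index until a healthy upstream is found.
    for offset in range(len(upstreams)):
        candidate = upstreams[(start + offset) % len(upstreams)]
        if candidate in healthy:
            return candidate
    return None
```

Because the choice is recomputed from scratch on every request, taking the preferred server out of the healthy set shifts the client to a fallback, and restoring it shifts the client right back. Changing the length of the upstream list changes the modulus, which is why adding upstreams can reshuffle assignments.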