I’m not sure how you would define or determine “become available” (returning a 200 OK status?) or “hold requests”.
As a very simple starting point, setting lb_policy to round_robin would seem useful. You might also experiment with lb_try_duration or lb_retries on top of that, and perhaps lb_try_interval depending on the nature of synchronization.
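For example, a minimal Caddyfile along these lines (the addresses and durations are just placeholders to adjust for your setup, and lb_retries needs a fairly recent Caddy release if I recall correctly):

```
example.com {
	reverse_proxy localhost:8081 localhost:8082 localhost:8083 {
		lb_policy round_robin
		lb_try_duration 5s
		lb_try_interval 250ms
		lb_retries 3
	}
}
```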
I don’t think Caddy has any built-in way to force requests to be processed serially, so if the COM wrapper doesn’t have at least some synchronization logic this is not likely to work well.
It’s been quite a while since I’ve done anything with COM, so I’m not sure what kind of threading issues you’re dealing with there. But presumably there’s some way to detect that a particular worker process is already handling a request, and you’d need to make sure it returns an appropriate HTTP error that Caddy’s load balancer will recognize and use to try another backend. You’d also need to set fail_duration in the Caddyfile so the passive health check logic is turned on, and perhaps unhealthy_status. In principle, this should give you the best utilization and fairness, because requests are shuffled around with small delays until a currently-unused backend can handle them immediately.
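Roughly something like this, assuming the wrapper can be made to return 503 when its worker is already busy (the status code and durations are guesses you’d need to tune):

```
reverse_proxy localhost:8081 localhost:8082 {
	lb_policy round_robin
	lb_try_duration 10s
	lb_try_interval 250ms
	fail_duration 2s
	unhealthy_status 503
}
```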
Alternatively, just setting long timeouts, round-robin backend selection, and a mutex or similar within the wrapper to force serialization would get you almost as far: some proportion of requests get delayed until the previous one finishes, but the delays are spread out fairly evenly. (It’s possible the COM component already does this.) This does have the potential to leave some backends under-utilized even under heavy load, though. See the sketch below for the Caddy side of that.
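On the Caddy side, the “long timeouts” part would mostly be about giving the backend plenty of time to respond, e.g. via the HTTP transport’s response_header_timeout (the value is just illustrative; the mutex itself would live in the wrapper):

```
reverse_proxy localhost:8081 localhost:8082 {
	lb_policy round_robin
	transport http {
		response_header_timeout 2m
	}
}
```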
In either case, obviously, request latencies and even error rates can get really bad in the long tail if you’re close to your sustained throughput limits.
unhealthy_request_count marks an upstream “unhealthy” (meaning “don’t pick me for new requests”) if it has N or more requests already in flight. Using lb_try_duration has Caddy hold requests for up to N seconds if all upstreams are already handling requests.
Note that lb_try_duration is not a queue, though; it’s a polling retry, so there’s a chance a new request can skip the line and get handled before one that’s been waiting through lb_try_duration gets its turn. How much that matters depends on how much throughput you’re trying to handle. Obviously you could scale out with more upstreams to increase the number of requests you can serve concurrently.
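Combining the two would look something like this (a limit of 1 concurrent request per upstream matches the one-request-at-a-time scenario; addresses and durations are placeholders):

```
reverse_proxy localhost:8081 localhost:8082 localhost:8083 {
	lb_policy round_robin
	lb_try_duration 10s
	lb_try_interval 250ms
	unhealthy_request_count 1
}
```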
And the reverse_proxy module’s JSON Config Structure page in the Caddy documentation describes try_duration and try_interval, which seem identical in purpose to lb_try_duration and lb_try_interval.
I’m referencing Caddyfile config. Caddyfile does map to JSON config (Caddyfile is an adapter, its job is to produce JSON config). It’s named differently in JSON because it’s structured within a load_balancing object, but it’s flat in the Caddyfile with lb_ as a prefix.
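For instance, a Caddyfile block with lb_policy round_robin and lb_try_duration 5s adapts to JSON roughly like this (simplified to the relevant part of the reverse_proxy handler; the dial addresses are placeholders):

```
{
	"handler": "reverse_proxy",
	"load_balancing": {
		"selection_policy": { "policy": "round_robin" },
		"try_duration": "5s",
		"try_interval": "250ms"
	},
	"upstreams": [
		{ "dial": "localhost:8081" },
		{ "dial": "localhost:8082" }
	]
}
```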