Config updates acquire a lock so that only one update happens at a time. The majority of the time spent in a config reload is shutting down HTTP servers gracefully: that means starting the new listeners (sometimes the OS takes many milliseconds for that system call), waiting for in-flight requests to finish on the old listeners, and then finally closing out the old listeners. We have to block during all of this because otherwise it'd be impossible to know whether the reload was successful. And we can't have multiple reloads happening at a time because, well, data races and other awful things.
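If it helps to see the shape of it, here's a rough sketch (not Caddy's actual code; all the names are made up and error handling is simplified) of what a blocking, lock-guarded reload looks like:

```go
// Package reloadsketch is a rough sketch, not Caddy's implementation. It only
// illustrates why a config reload blocks: bind the new listeners, drain the
// old ones, and only then can we say whether the reload succeeded.
package reloadsketch

import (
	"context"
	"net"
	"net/http"
	"sync"
	"time"
)

var (
	configMu    sync.Mutex     // only one reload may run at a time
	oldServers  []*http.Server // servers belonging to the currently-loaded config
	gracePeriod = 5 * time.Second
)

func reload(newServers []*http.Server, addrs []string) error {
	configMu.Lock()
	defer configMu.Unlock()

	// Bind the new listeners first. These listen() syscalls are a big chunk
	// of the wall-clock time, and if one fails the reload is reported as failed.
	for i, srv := range newServers {
		ln, err := net.Listen("tcp", addrs[i])
		if err != nil {
			return err
		}
		go srv.Serve(ln)
	}

	// Gracefully drain the old servers: stop accepting, wait up to the grace
	// period for in-flight requests to finish, then close the old listeners.
	ctx, cancel := context.WithTimeout(context.Background(), gracePeriod)
	defer cancel()
	for _, srv := range oldServers {
		srv.Shutdown(ctx)
	}

	oldServers = newServers
	return nil
}
```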
While Caddy itself can handle hundreds of config reloads per second (I tested this by disabling all graceful features and network listeners then hammering it really hard), it’s the OS that’s too slow to keep up. Those system calls are buggers.
To ease contention, you could make reloads less graceful, for example by shortening the grace period (see the JSON Config Structure page in the Caddy documentation).
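For example, something like this sets a short grace period through the admin API (assuming the default admin address of localhost:2019 and that your loaded config already has the http app):

```go
// A sketch of shortening the grace period via the admin API.
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// POST to a config path sets the value at that path. The body must be
	// valid JSON, so the duration string is quoted.
	resp, err := http.Post(
		"http://localhost:2019/config/apps/http/grace_period",
		"application/json",
		strings.NewReader(`"1s"`),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```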
However, what I'd recommend instead is batching your API updates. Like I said, Caddy itself can handle frequent config updates fine, but you have to cater to what your OS is capable of. So if you're sending it 100 config updates in quick succession and your OS can't keep up (or you can't gracefully cycle servers fast enough), you can simply combine them into one update. What that looks like depends on what your API calls are. If you're just adding hostnames, it's really, really simple.
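Here's a sketch of what batching hostname additions could look like: read the current host matcher once, append all the new names, and write the list back in a single request. The config path is an assumption (it presumes a server named srv0 whose first route has a host matcher), so adjust it for your actual config:

```go
// A sketch of batching hostname additions into one config update, so 100 new
// names cost one reload instead of 100.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// hostPath is an assumption; point it at the host matcher in your own config.
const hostPath = "http://localhost:2019/config/apps/http/servers/srv0/routes/0/match/0/host"

func addHostnames(newNames []string) error {
	// Read the current list of hostnames.
	resp, err := http.Get(hostPath)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	var hosts []string
	if err := json.NewDecoder(resp.Body).Decode(&hosts); err != nil {
		return err
	}

	// Append all the new names locally...
	hosts = append(hosts, newNames...)

	// ...then replace the whole list with one PATCH, which triggers a single
	// reload no matter how many hostnames were added.
	body, err := json.Marshal(hosts)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPatch, hostPath, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp2, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp2.Body.Close()
	if resp2.StatusCode >= 300 {
		return fmt.Errorf("caddy returned %s", resp2.Status)
	}
	return nil
}

func main() {
	if err := addHostnames([]string{"a.example.com", "b.example.com"}); err != nil {
		panic(err)
	}
}
```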
As you noticed, something else to consider is certificate operations. These can take anywhere from a few seconds to a few minutes. To avoid leaking resources and to avoid spamming CAs with transactions, we cancel them when the associated config is unloaded. We have to cancel them because there's no way to know whether they'll still be needed by the new config. We could wait and find out, but that requires yet another goroutine that – you guessed it – waits until the new config is fully loaded, does some sort of diff of the two configs, and then applies only the delta. We'd have to do that for potentially every single config parameter. And the goroutine that waits is itself another resource, so frequent reloads would still end up leaking resources.
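The mechanics here are basically context cancellation. This is a simplified sketch, not the real code (obtainCertificate is a made-up stand-in): work started for a config derives from that config's context, and unloading the config cancels it.

```go
// Simplified sketch of why cert operations die with their config: anything
// long-running started for a config runs under that config's context, and
// unloading the config cancels the context so nothing leaks.
package certsketch

import (
	"context"
	"fmt"
	"time"
)

type loadedConfig struct {
	cancel context.CancelFunc
}

func load() *loadedConfig {
	ctx, cancel := context.WithCancel(context.Background())

	// An ACME transaction can take seconds to minutes, so it runs in the
	// background under the config's context.
	go obtainCertificate(ctx, "example.com")

	return &loadedConfig{cancel: cancel}
}

func (cfg *loadedConfig) unload() {
	// Cancelling the context aborts any in-flight work for this config.
	cfg.cancel()
}

func obtainCertificate(ctx context.Context, name string) {
	select {
	case <-time.After(30 * time.Second): // pretend this is the ACME transaction
		fmt.Println("obtained cert for", name)
	case <-ctx.Done():
		fmt.Println("aborted cert for", name, ":", ctx.Err())
	}
}
```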
So, the only “queueing” that Caddy is doing is the natural blocking of HTTP requests on a mutex over the config value. I guess that certainly works as a queue, but if your config updates come in bursts of 100, I would batch them instead.
If you’re sending 100 concurrent config updates continuously (like, 100 every second or something, over the lifetime of the server, rather than in bursts), then you should find a way to limit or coalesce them on your side before they reach Caddy.
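One way to do that limiting on your side (again, just a sketch with made-up names) is to coalesce pending changes and flush them to Caddy at most once per interval:

```go
// Sketch of client-side coalescing: accumulate pending hostname additions and
// flush them at most once per tick, so a continuous stream of changes becomes
// a bounded rate of config reloads.
package updatelimiter

import (
	"log"
	"time"
)

// coalesce queues incoming hostnames and calls flush with the whole batch at
// most once per interval. flush is whatever applies the batch to Caddy.
func coalesce(changes <-chan string, interval time.Duration, flush func([]string) error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var pending []string
	for {
		select {
		case name := <-changes:
			pending = append(pending, name) // just queue it; no API call yet
		case <-ticker.C:
			if len(pending) == 0 {
				continue
			}
			// One config update (and therefore one reload) per interval,
			// no matter how many changes arrived in the meantime.
			if err := flush(pending); err != nil {
				log.Println("config update failed:", err)
				continue // keep the batch and retry on the next tick
			}
			pending = nil
		}
	}
}
```

You'd pass something like the addHostnames helper from the earlier sketch as the flush function.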
I think with more developer resources we can probably find ways to make config reloads even faster under pressure, but it’d be mostly heuristics and really specific optimizations for the most common use cases, like finding ways to correctly and safely reuse HTTP listeners. (Remember that to Caddy, a config is mostly a black box. It’s up to each individual module to provision itself and clean up after itself. And the HTTP server is one such module.)