We run a site farm with up to a thousand hosted websites on different domains. Sites are frequently published (serving static files specific to each site), unpublished (serving static files from a global default folder), etc. In both cases, TLS is handled by Let's Encrypt.
When publishing/unpublishing only one site, it all goes well.
But when doing bulk actions on sites, things can become a bit more complex.
Consider doing things sequentially: bulk publishing 20 sites would result in 20 API calls to Caddy's API endpoint (one such per-site call is sketched after the list below).
So 20 API calls lead to 20 config reloads, hence:
20 × websocket connection losses (websockets auto-reconnect, but still, loading spinners everywhere :D)
20 × calls to Let's Encrypt
Eventually, Caddy will time out if too many API calls are made
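For context, here is a minimal sketch of what one of those per-site "publish" calls might look like, assuming each site is appended as a `file_server` route under an HTTP server named `srv0` (the server name, admin address, and route shape are assumptions for illustration, not taken from this thread):

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
)

// publishSite appends one file_server route for the given domain.
// Caddy reloads its config on every such call, which is where the
// "20 calls = 20 reloads" comes from.
func publishSite(domain, root string) error {
	route := fmt.Sprintf(`{
		"match": [{"host": [%q]}],
		"handle": [{"handler": "file_server", "root": %q}]
	}`, domain, root)

	// POST on an array path in the admin API appends an element.
	resp, err := http.Post(
		"http://localhost:2019/config/apps/http/servers/srv0/routes",
		"application/json",
		bytes.NewBufferString(route),
	)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("publish %s: unexpected status %s", domain, resp.Status)
	}
	return nil
}

func main() {
	// Bulk publishing 20 sites this way = 20 API calls = 20 config reloads.
	for i := 1; i <= 20; i++ {
		domain := fmt.Sprintf("site%d.example.com", i)
		if err := publishSite(domain, "/srv/sites/"+domain); err != nil {
			log.Fatal(err)
		}
	}
}
```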
What I would have loved is:
1st: Call a Caddy API to disable auto-reload (maybe with a config lock to prevent concurrent updates?)
2nd: Do all my config changes sequentially as before
3rd: Call a Caddy API to commit the changes and re-enable config auto-reload
(Yes, think of it like a DB transaction.)
I know I can just pull the whole JSON config, make my changes, and then push the whole config back, but that has a "too much power" and error-prone feel to it.
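That pull-everything/push-everything approach would look roughly like this sketch (admin address assumed; the mutation step is left as a placeholder):

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
)

func main() {
	// 1. Pull the entire running config as JSON.
	resp, err := http.Get("http://localhost:2019/config/")
	if err != nil {
		log.Fatal(err)
	}
	cfg, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		log.Fatal(err)
	}

	// 2. ...mutate cfg locally (add/remove all 20 site routes in one pass)...

	// 3. Push the whole config back in a single call, i.e. a single reload.
	res, err := http.Post("http://localhost:2019/load", "application/json", bytes.NewReader(cfg))
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(res.Body)
		log.Fatalf("reload failed: %s: %s", res.Status, body)
	}
}
```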
Anyhow, thanks for Caddy, its admin API, automatic Let's Encrypt, etc. It's a very handy web server!!!
Config reloads by themselves are quite lightweight. They do tidy up to avoid leaking resources, but this can often be minimized with the right config.
I don't really know a good way to solve this yet. The only alternative I know of is to leave the WS dangling, but that of course leaks resources. What if the new config doesn't have that WS endpoint, for example? What if the way the connection is established has changed in the new config? A config change signals that you don't want clients to continue using the old config, so having them reconnect is the only way I know of to enforce that.
Outstanding/in-progress ACME transactions will be canceled to avoid leaking resources. If you're seeing a lot of ACME transactions every time you load config, you might consider on-demand TLS instead.
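A minimal sketch of enabling on-demand TLS through the admin API, assuming an `ask` endpoint hosted on your admin site (the URL and admin address are placeholders):

```go
package main

import (
	"bytes"
	"log"
	"net/http"
)

func main() {
	// With on-demand TLS, certificates are obtained lazily at handshake time
	// instead of eagerly at config load, so bulk publishes no longer trigger
	// a burst of ACME transactions.
	automation := []byte(`{
		"on_demand": {
			"ask": "https://admin.internal.example/allowed"
		},
		"policies": [{"on_demand": true}]
	}`)

	resp, err := http.Post(
		"http://localhost:2019/config/apps/tls/automation",
		"application/json",
		bytes.NewReader(automation),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}
```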
This should not be the case. How can we reproduce this?
If the changes only happen within a specific scope (/config/foo/bar/... path), you can scope your changes to only that part of the config to limit inadvertent harm done. I don't really understand how doing a batch change on your end is "too much power" given that the whole point of the API is to give you the control/power over your server.
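As an illustration of such a scoped change: if each site's route was created with an `@id` field (say `site_example_com`, purely an assumed naming convention), unpublishing that one site can target just that object without touching the rest of the config:

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// DELETE the single route tagged with "@id": "site_example_com";
	// everything else in the running config is left alone.
	req, err := http.NewRequest(http.MethodDelete,
		"http://localhost:2019/id/site_example_com", nil)
	if err != nil {
		log.Fatal(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("unpublish status:", resp.Status)
}
```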
Well, it seems that everything for my use case is actually already possible!
On-demand TLS seems like a perfect fit (and may even remove my need for frequent config changes). The only thing is that my admin site is now a critical dependency for Caddy to run consistently over time (for the automation ask param).
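For reference, the ask endpoint my admin site needs to expose would be something like this sketch: Caddy calls it with the hostname in the `domain` query parameter before issuing a certificate and only proceeds on a 2xx response (`isPublished` is a placeholder for the real lookup against the site farm's database):

```go
package main

import (
	"log"
	"net/http"
)

// isPublished is a placeholder: query your own datastore here.
func isPublished(domain string) bool {
	return domain == "example.com"
}

func main() {
	http.HandleFunc("/allowed", func(w http.ResponseWriter, r *http.Request) {
		domain := r.URL.Query().Get("domain")
		if isPublished(domain) {
			w.WriteHeader(http.StatusOK) // 200 = allow certificate issuance
			return
		}
		http.Error(w, "unknown domain", http.StatusForbidden) // non-2xx = deny
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```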
Regarding websocket connection losses, I have read other similar topics about it. I understand it's a complex thing, and it's absolutely not a blocker for me, just a small annoyance for my service's admin users.
Regarding timeouts, I tried to publish a few hundred sites simultaneously. All in all, the limiting factor quickly became Let's Encrypt's rate limiting, of course. That was a few weeks ago, so maybe the timeouts weren't on Caddy's side but from my HTTP client, now that you mention it. I can't remember exactly.