Dynamic rate limits with rate_limit plugin across multiple Caddy replicas

1. The problem I’m having:

I’m exploring using Caddy with the rate limit module as a rate-limiting reverse proxy. Our API is built around tasks (what those tasks are isn’t relevant to this discussion, but if you’re curious, it’s the IETF’s Distributed Aggregation Protocol), and its request paths generally look like tasks/{task-id}/resource, where {task-id} is a URL-safe Base64 blob unique to each task. We want to apply different rate limits to different tasks. So far so good: I can craft a Caddyfile containing rate limits with a path or path_regexp matcher that matches a particular {task-id} value.

I also want to run multiple instances of the rate-limiter and have them do what I’ve seen called “global rate-limiting”. No problem: the rate limiting plugin supports what it calls “distributed” rate limits using a few different storage backends like Redis.

We also have the perhaps unusual goal of being able to dynamically set per-task rate limits. Caddy’s administration API seems like a good fit for this: I can do PATCH /config/[path-to-rate-limits] to add or modify rate limit entries.
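To make the admin API idea concrete, here is a minimal Python sketch that builds (but does not send) such a request. It rests on assumptions: the zone fields (match, key, window, max_events) follow my reading of the rate_limit module's README, the task ID, zone name, and limits are made up, and the config path is left as a placeholder because it depends on where the handler sits in your config. As I understand the admin API docs, PATCH replaces a value that already exists, while POST or PUT is what creates a new one, so the method is a parameter.

```python
import json
import urllib.request


def build_zone_request(admin_base: str, config_path: str, zone: dict,
                       method: str = "PATCH") -> urllib.request.Request:
    """Build (but do not send) an admin API request that sets one config value."""
    url = f"{admin_base.rstrip('/')}/config/{config_path.strip('/')}"
    body = json.dumps(zone).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Hypothetical zone for a single task; every value here is illustrative.
zone = {
    "match": [{"path": ["/tasks/EXAMPLE-TASK-ID/*"]}],
    "key": "static",
    "window": "1m",
    "max_events": 100,
}

# "path-to-rate-limits" is a placeholder, not a real config path.
req = build_zone_request(
    "http://localhost:2019", "path-to-rate-limits/task_example", zone
)
# urllib.request.urlopen(req) would send it to that one instance's admin endpoint.
```

Note that this only touches the single instance behind admin_base, which is exactly the limitation described next.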

Where I get stuck is combining dynamic rate limits with distributed rate limits. It seems like the admin API governs a single instance of Caddy, so to update rate limits I’d have to hit each Caddy replica’s admin API and apply the change one replica at a time. That seems risky since the rate_limit module’s docs caution that “[i]n order for [distributed rate limits] to work, all instances in the cluster must have the exact same RL zone configurations.”
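One way to at least observe the inconsistency window is to fingerprint the zone config each replica currently holds and compare. The Python sketch below is only an illustration: the replica names and zone values are hypothetical, and it assumes the per-replica zone configs were already fetched somehow (for example with a GET against each replica's admin API).

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(zones: dict) -> str:
    """Hash a zone config in canonical form so key order does not matter."""
    canonical = json.dumps(zones, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_by_config(replica_zones: dict) -> dict:
    """Map each distinct fingerprint to the sorted list of replicas holding it."""
    groups = defaultdict(list)
    for replica, zones in replica_zones.items():
        groups[fingerprint(zones)].append(replica)
    return {fp: sorted(names) for fp, names in groups.items()}


# Hypothetical snapshot of three replicas in the middle of a rollout.
old = {"task_a": {"window": "1m", "max_events": 100}}
new = {"task_a": {"max_events": 200, "window": "1m"}}
snapshot = {"caddy-0": new, "caddy-1": old, "caddy-2": new}

groups = group_by_config(snapshot)
in_sync = len(groups) == 1  # False here: caddy-1 still has the old limits
```

This does not prevent drift; it only tells you when all replicas have converged on the same zone configuration.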

One way forward I can see would be to have the Caddy replicas get their RL zone configurations from a common config store. In my case, I am running all this in Kubernetes, so they would all get their Caddyfile from the same ConfigMap. So then updating rate limits means updating that ConfigMap and then restarting/reloading the Caddy replicas so they pick up the new configuration. I think this approach has the same problem with replicas having different configs loaded while a config change is being rolled out, though.

I think ultimately my question is about managing configuration across replicas of Caddy and not specifically about rate limiting. Is there some prior art or a plugin I can use for this?

On the other hand, am I reading too much into the rate_limit module’s caution about consistent rate limit zone configurations? Is it OK if replicas briefly have an inconsistent view of RL config during updates?

2. Error messages and/or full log output:

No error messages (yet); this is an architecture question.

3. Caddy version:


4. How I installed and ran Caddy:

n/a, I haven’t tried this yet

a. System environment:

b. Command:


c. Service/unit/compose file:


d. My complete Caddy config:


5. Links to relevant resources:

My first thought is to use Caddy’s dynamic config loading functionality. Standard Caddy comes with an HTTP config loader module.

Inside the admin>config key, there’s a field called load_delay that tells Caddy to re-load the config from the dynamic config loader module after the specified duration.


@tgeoghegan This sounds like it would be good to hop on a call to discuss your requirements and questions – and if we get you set up with a sponsorship we can better help support your business. Want to book a time at https://matt.chat and we can work through your questions?

Interesting, thank you for the pointer to the config loader! However, as far as I can tell, this will load config once when Caddy starts and never again, so I’d still be responsible for restarting Caddy or otherwise making it reload its config should it change. That’s about the same as the situation where I plumb in the config as a file.

@matt: Thank you, I’d love to talk. I’ve scheduled a time slot via the link you provided. Talk to you soon!

Gah, there’s a bug in our docs backend that’s causing a config property to not show here:

Along with persist and load, there’s also load_delay, which will cause Caddy to load the config after a specified time. And if the pulled config also has that set, it will continue the cycle. Maybe that will help?
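As a rough sketch of what that cycle could look like, the Python below adds a polling admin section to whatever config the central server hands out. The load_delay field comes from the posts above; the shape of the load object (module and url) is my understanding of the HTTP config loader and should be checked against the docs, and the URL and 30s delay are placeholders.

```python
import copy
import json


def with_config_polling(config: dict, url: str, delay: str) -> dict:
    """Return a copy of config whose admin section re-pulls from url after delay."""
    out = copy.deepcopy(config)
    admin_config = out.setdefault("admin", {}).setdefault("config", {})
    # Assumed field names for the HTTP config loader; verify before relying on them.
    admin_config["load"] = {"module": "http", "url": url}
    # Per the thread, the pulled config must carry this too or the cycle stops.
    admin_config["load_delay"] = delay
    return out


# Hypothetical base config and config server URL.
base = {"apps": {"http": {"servers": {}}}}
served = with_config_polling(
    base, "http://config-server.example.internal/caddy.json", "30s"
)
print(json.dumps(served, indent=2))
```

Each replica would presumably poll on its own timer, so replicas could still disagree for up to roughly one delay period after a change.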

Absolutely – thanks for booking! We’ll go over what’s in this post and talk about sponsorship options to help support your business more going forward.