Dynamically adding multiple domains and its limit

1. Caddy version (caddy version):

v2.0.0 h1:pQSaIJGFluFvu8KDGDODV8u4/QRED/OPyIR+MWYYse8=

2. How I run Caddy:

a. System environment:

Ubuntu 18.04 LTS

b. Command:

nohup caddy run --resume > /dev/null 2>&1 & 

d. My complete Caddyfile or JSON config:

beta1.tripkindle.com

root * /home/tripkindle
encode gzip zstd
try_files {uri} {uri}/ /index.html
file_server

3. The problem I’m having:

My use case for Caddy is that I will be serving an Angular Single Page Application which is white-labelled based on the domain that is pointing to the Caddy server. The white-labelling logic lives inside the Angular app.

There will be a signup form where I will collect custom domains that point to this Caddy server, and the list can grow over time. Currently, I have a Node.js API living on Heroku that SSHs into the Ubuntu machine running the Caddy server → gets the current Caddy config with curl "http://localhost:2019/config/" → checks that the new domains to be added aren’t already in the hosts array → runs the following command to update its config (where domainsToAdd is, for example, ["beta2.tripkindle.com", "beta3.tripkindle.com"]):

curl -X POST -H "Content-Type: application/json" -d '${domainsToAdd}' "http://localhost:2019/config/apps/http/servers/srv0/routes/0/match/0/host/..."

This adds the new domains to those known by Caddy and successfully provisions SSL certificates for them.
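The check-then-append step described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper names `new_hosts`, `build_append_request` and `sync` are hypothetical, the admin API is assumed to listen on localhost:2019, and the route path is the one used in this post.

```python
import json
import urllib.request

# Path taken from the post above; assumes the admin API listens on localhost:2019.
HOSTS_URL = "http://localhost:2019/config/apps/http/servers/srv0/routes/0/match/0/host"


def new_hosts(current, requested):
    """Return the requested domains that are not yet in `current`.

    Names are lower-cased, input order is kept, and duplicates are dropped.
    """
    seen = {h.lower() for h in current}
    result = []
    for domain in requested:
        d = domain.strip().lower()
        if d and d not in seen:
            seen.add(d)
            result.append(d)
    return result


def build_append_request(domains):
    """Build (but do not send) the POST that appends `domains` to the host array."""
    return urllib.request.Request(
        HOSTS_URL + "/...",
        data=json.dumps(domains).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def sync(requested):
    """Fetch the current hosts and append only the missing ones (needs a running Caddy)."""
    with urllib.request.urlopen(HOSTS_URL) as resp:
        current = json.load(resp)
    to_add = new_hosts(current, requested)
    if to_add:
        with urllib.request.urlopen(build_append_request(to_add)):
            pass
    return to_add
```

Calling `sync(["beta2.tripkindle.com", "beta3.tripkindle.com"])` on the Caddy machine would then do the same thing as the two curl calls, without the manual comparison.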

Although it’s working fine, I want to confirm some points:

  1. Is this solution scalable? How many domains can be added to the host array? (Possibly a maximum value?)

  2. How can I be sure of certificate renewals?

  3. Is the weekly certificate rate limit increased to 50 for Caddy as per Rate Limits - Let's Encrypt?

  4. How can I make use of wildcard domain names for certificates? I tried using *.tripkindle.com at the first line of my Caddyfile but that does not work while specifying the complete domain name works.

  5. I read that Automatic HTTPS — Caddy Documentation could be useful for my use case since I won’t need to specify domain names but based on the increased handshake time and probable future deprecation I am not keen on using it. Is there any other way to make my use case easier?

  6. How can I revert to running the last JSON config? I am aware that caddy run --resume loads it when starting up Caddy, but how can I roll back to the previously running config while Caddy is running?

  7. How can I utilise the log rolling for all access? I tried adding logs: {} to the JSON config after loading my initial Caddyfile but it only takes into account beta1.tripkindle.com mentioned in the Caddyfile for the logs and not the dynamically added domains.

  8. Are there any sane defaults for security and caching/performance when serving the SPA that are applied or if not what should I be reading about most?

  9. I have also implemented an endpoint to delete a domain from the hosts array with curl -X DELETE "http://localhost:2019/config/apps/http/servers/srv0/routes/0/match/0/host/${indexToDelete}". Is there a way to delete multiple hosts at once? I tried PATCH and it works, but it needs to replace the complete hosts array, so I was looking for an alternative.

  10. What happens if a domain is deleted from the hosts array? Is the certificate renewal stopped?

  11. Is there a way to convert Caddy JSON config to Caddyfile?

I also wanted to add that it has been an absolute breeze working with Caddy 2 and, given its young age, the documentation is wonderful!

Thank you Matt and everyone else involved!

  1. I wouldn’t recommend it, it’s more brittle than using on_demand.

  2. Caddy renews the certs for all the domains it knows use HTTPS. Nothing extra to do.

  3. Yes I believe so. Caddy is careful to back-off if it thinks it’s close to hitting rate limits.

  4. Two ways. You can either get wildcard certs by setting up the DNS challenge which requires compiling Caddy with a plugin that adds support for your DNS provider (easy with Docker, see the readme on docker hub for an explanation of how to use the builder image). Or, you can enable on demand TLS for the site which will issue a certificate for a domain the first time Caddy sees a request for it and keeps it renewed thereafter. For on demand, you should set up the ask global option with a simple backend service that will respond to requests from Caddy with a yes or no whether the domain is one you want to allow (i.e. that you have a customer for already). Caddy only lets you use * if you have either of those two things enabled.

  5. The future deprecation is quite unlikely, it’s just a warning that Caddy can’t guarantee that LE won’t add some limitations in the future that make it less viable. You would still need to specify domains, but via the “ask” option instead of hardcoded in the config. It would let you avoid making JSON config changes as well. Many companies use this in production already.

  6. You would need to either keep a backup yourself of the autosave.json file in the config storage directory before you push changes, or GET the config from the API and store it somewhere beforehand. FYI Caddy will automatically reject a config change if the syntax is broken and continue to use the old config.

  7. I think you would need to also add the domain to the logging section of your config as well. You might be able to use a wildcard in the logging config. I’m not too sure though.

  8. Right now Caddy doesn’t do any caching but there are a couple plugins out there that implement some caching functionality. Click on the wiki section of the forums (top-right in the top-nav on desktop), there’s a thread with the list of known plugins.

  9. I don’t think there is right now unfortunately. If you need to remove a bunch, you could replace the whole array (GET the whole list, modify it, then push it back up all together).

  10. Yeah Caddy will stop renewing the cert. I think it’s kept around in the storage as long as it’s valid. I’m not sure if Caddy removes domains it doesn’t have listed in its config from the storage actually. @matt could clarify.

  11. Nope. It’s a one-way adapter. The Caddyfile needs to do too much magic that doing the inverse is not feasible.
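For answer 9, the "GET the whole list, modify it, then push it back" approach could look something like this in Python. Treat it as a sketch under assumptions: the helper names are hypothetical, the admin API is assumed to be on localhost:2019, and PATCH on the host array is used because the original post reports that it replaces the array.

```python
import json
import urllib.request

# Path taken from the original post; assumes the admin API listens on localhost:2019.
HOSTS_URL = "http://localhost:2019/config/apps/http/servers/srv0/routes/0/match/0/host"


def without_hosts(current, to_remove):
    """Return `current` minus every name in `to_remove` (case-insensitive), order kept."""
    drop = {d.lower() for d in to_remove}
    return [h for h in current if h.lower() not in drop]


def build_replace_request(hosts):
    """Build (but do not send) the PATCH that replaces the whole host array."""
    return urllib.request.Request(
        HOSTS_URL,
        data=json.dumps(hosts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def remove_hosts(to_remove):
    """GET the array, filter it, and PATCH it back in one go (needs a running Caddy)."""
    with urllib.request.urlopen(HOSTS_URL) as resp:
        current = json.load(resp)
    remaining = without_hosts(current, to_remove)
    if remaining != current:
        with urllib.request.urlopen(build_replace_request(remaining)):
            pass
    return remaining
```

Filtering by name rather than by index also avoids the problem of indexes shifting after each single DELETE.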

Thank you for answering @francislavoie, that’s really helpful. I’ll try on_demand TLS as well. Just to confirm, if I use the following Caddyfile, it should enable on-demand TLS, right?

beta1.tripkindle.com

{
    email myemail@example.com
}

tls {
    on_demand
}

root * /home/tripkindle
encode gzip zstd
try_files {uri} {uri}/ /index.html
file_server

Also, is there a limit to the number of hostnames that can be added to the following array for SSL certificates?
http://localhost:2019/config/apps/http/servers/srv0/routes/0/match/0/host/...

Yes, except the global options need to come before the site label. See the Caddyfile structure in the Caddy documentation.

Also, I think you should use {path} rather than {uri}, because you don’t need the query params when checking for files on disk.
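To illustrate the difference, here is a tiny Python sketch. It uses Python's own URL parsing as a stand-in, not Caddy's placeholder implementation, and the example request target is made up.

```python
from urllib.parse import urlsplit


def path_only(uri):
    """Strip the query string, leaving only the part that maps to a file on disk."""
    return urlsplit(uri).path


print(path_only("/dashboard/settings?tab=billing"))  # prints /dashboard/settings
```

The query string never corresponds to a file name, so matching on the path alone is enough for the try_files check.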

I don’t think so. You might start to see performance issues eventually since host matching is O(n) I believe. But otherwise, Caddy doesn’t impose any limits on matchers.

Thank you again! I’ve come up with this Caddyfile now:
To confirm, at the ask endpoint (Node.js API) I can parse the domain parameter from the query string and return HTTP status 200 if the domain is allowed or 500 if it is not, right?

{
    email myemail@example.com
}

beta1.tripkindle.com

tls {
    on_demand
    ask {
        https://example-api.com/checkDomain
    }
}

root * /home/tripkindle
encode gzip zstd
try_files {path} {path}/ /index.html
file_server

This should now work for any domain that points to the Ubuntu machine running Caddy that the ask endpoint allows?

ask is actually a global option, not part of the tls directive.

Probably more like this:

{
    email myemail@example.com
    on_demand_tls {
        ask https://example-api.com/checkDomain
    }
}

beta1.tripkindle.com

tls {
    on_demand
}

root * /home/tripkindle
encode gzip zstd
try_files {path} {path}/ /index.html
file_server

Yeah that’s right. 400 would probably be better than 500 though, I’d say. But Caddy only expects a 200 for a good domain, anything else and it rejects the connection and doesn’t issue a cert for that domain.
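A minimal version of such an ask endpoint could look like this. It is written in Python only for illustration (the poster's real API is Node.js); the allow-list, the handler name and the port are hypothetical, and it assumes the domain arrives in a `domain` query parameter as discussed above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow-list; in practice this would come from the customer database.
ALLOWED_DOMAINS = {"beta1.tripkindle.com", "customer-site.example"}


def ask_status(request_target, allowed=ALLOWED_DOMAINS):
    """Return 200 if the `domain` query parameter is on the allow-list, otherwise 400."""
    values = parse_qs(urlsplit(request_target).query).get("domain", [])
    if len(values) == 1 and values[0].lower() in allowed:
        return 200
    return 400


class AskHandler(BaseHTTPRequestHandler):
    """Answers Caddy's check request with a bare status code and no body."""

    def do_GET(self):
        self.send_response(ask_status(self.path))
        self.end_headers()


def serve(port=9123):
    """Run the check endpoint; the port is an arbitrary choice for this sketch."""
    HTTPServer(("127.0.0.1", port), AskHandler).serve_forever()
```

A missing or repeated `domain` parameter is treated the same as an unknown domain, so anything unexpected results in a rejection rather than a certificate.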

I’d also like to add that you can run the caddy adapt and/or caddy validate commands with your config to check if it’s good before running Caddy with it. You’ll be told if the syntax isn’t acceptable.

Thank you so much!

For generating logs if I add

log {
	output file /home/logs/access.log
}

I hope it won’t just be restricted to say beta1.tripkindle.com that was mentioned in the initial Caddyfile

Oh also you’ll probably want to use *.tripkindle.com as the site label instead of beta1.tripkindle.com, because otherwise Caddy would only accept requests to beta1.

And yeah, that’s right for the logs. It’ll log all requests to sites that match the site label.

That sounds great, thank you!

If I leave beta1.tripkindle.com, on_demand wouldn’t provision any certificate for say beta2.tripkindle.com? In that case I’ll change it to wildcard.

Also, if there needs to be a change in future, I can transition from on_demand to putting domains into the http://localhost:2019/config/apps/http/servers/srv0/routes/0/match/0/host/... array and it would work fine, right?

Yeah - I wouldn’t mix and match though. On-demand should fulfill your needs.

Certainly, will only keep on_demand then. Just need to be sure in case there are changes with LE and possible deprecation of the feature, that we can move the connected domains to the hosts array and they’ll continue to work fine.

Also, does Caddy restart automatically if I first run it with:

nohup caddy run --resume > /dev/null 2>&1 & 

I was having issues with the Caddyfile I put before for any domain pointing to my server. It only worked with tripkindle.com subdomains and not any other domain pointing to it. I have since replaced the wildcard and hostname (*.tripkindle.com) with :443 and now have the following, although I still see no logs being generated:

{
    email my_email@domain.com
    on_demand_tls {
        ask https://verifyDomain.com/checkIfAllowed
    }
}

:443

tls {
    on_demand
}

log {
    output file /home/logs/access.log
}

root * /home/tripkindle
encode gzip zstd
try_files {path} {path}/ /index.html
file_server

EDIT: I can now see the logs, maybe something was wrong with the earlier Caddyfile?

I recommend running Caddy as a service instead; it’ll be more reliable.

Also with your current caddy run command you’re just throwing away the stdout output from Caddy… don’t do that, there are very important messages in there that you should have access to.

I am late to this thread, and @francislavoie has done a great job with the discussion. I just want to chime in on a few points anyway, to add clarification where there was some uncertainty before.

I know of some Caddy instances that have tens of thousands of domains in this array, and it’s just fine so long as your machine has enough memory for the certificates. You can probably get up to the millions. If you don’t control the domain’s A/AAAA records, you should try to only add them to the list after you know that the customer has set their A/AAAA records properly. If you are not in control of them, then On-Demand TLS is better, as Francis suggested; but I recommend On-Demand only in that situation – it is usually better to just add the domains Caddy should manage certs for into your config directly if possible.

You can’t, because you (probably?) don’t control the domains or the other external infrastructure required to get validations from Let’s Encrypt (or whatever CA you’re using). Just keep an eye on the logs.

Which rate limit are you referring to? There are no special exceptions for Caddy, if that’s what you’re asking. If you need an exceptionally large number of certificates in an exceptionally short amount of time, you should ask Let’s Encrypt for an exemption. (This is true regardless of ACME client you’re using.)

On-Demand TLS can be more brittle in practice because it takes place during the first TLS handshake; however, if a client has reached your Caddy instance through the domain name, then that means its DNS has likely already propagated, so the ACME challenge should probably succeed, and historically we’ve seen these challenges take less than 5-10 seconds. Every other handshake should be just as fast as usual.

Still, I recommend using on-demand TLS only in situations where you don’t control the nameservers.

Just hold onto the previous config you gave Caddy, I suppose. Why do you need to do this? Caddy won’t run a config if it fails to load, it rolls back an invalid config automatically. If you need to do it yourself for some reason, though, just export the config before you change it. Caddy doesn’t keep the last N config loads. Just whatever is currently running.
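Exporting the config before a change, and pushing it back later if needed, could be sketched in Python like this. The helper names and the backup file naming are hypothetical, and the admin API is assumed to be on its default localhost:2019 address; GET /config/ is the call used in the original post and POST /load replaces the running config.

```python
import json
import time
import urllib.request
from pathlib import Path

ADMIN = "http://localhost:2019"  # assumed default admin address


def save_backup(config, directory):
    """Write `config` to a timestamped JSON file and return its path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"caddy-config-{time.strftime('%Y%m%dT%H%M%S')}.json"
    path.write_text(json.dumps(config, indent=2))
    return path


def build_load_request(config):
    """Build (but do not send) a POST to /load that replaces the running config."""
    return urllib.request.Request(
        ADMIN + "/load",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def export_running_config():
    """Fetch whatever config Caddy is running right now (needs a running Caddy)."""
    with urllib.request.urlopen(ADMIN + "/config/") as resp:
        return json.load(resp)


def restore(path):
    """Load a previously saved backup back into Caddy (needs a running Caddy)."""
    config = json.loads(Path(path).read_text())
    with urllib.request.urlopen(build_load_request(config)):
        pass
```

Running `save_backup(export_running_config(), "backups")` before each change gives you something to hand to `restore` afterwards, since Caddy itself only keeps the currently running config.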

That’s the best way to do it.

Yes.

Just the virtual memory limits of your machine and physical storage space.

Yep, but we can get that down to O(log(n)) easily enough. Let me know if you/anyone encounters slowness here and we can implement a binary search. (Wildcards excepted. I think…)
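For a feel of the difference, here is a small Python sketch comparing a linear scan with a binary search over exact hostnames. It is not Caddy's code, just an illustration of the two complexities; wildcard entries would need separate handling and are left out.

```python
from bisect import bisect_left


def linear_match(hosts, name):
    """O(n): compare the name against every configured host in turn."""
    return any(h == name for h in hosts)


def binary_match(sorted_hosts, name):
    """O(log n): binary search over a pre-sorted list of exact hostnames."""
    i = bisect_left(sorted_hosts, name)
    return i < len(sorted_hosts) and sorted_hosts[i] == name


# Hypothetical host list; it must be sorted once up front for the binary search.
hosts = sorted(f"customer{i}.example" for i in range(10_000))
print(linear_match(hosts, "customer9999.example"))  # prints True
print(binary_match(hosts, "customer9999.example"))  # prints True
```

With ten thousand hosts the binary search needs about 14 comparisons where the linear scan may need up to ten thousand.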

As far as certificate management goes, though, that all happens in the background, so you’re limited primarily by memory and storage.

No; if Caddy is not running it cannot start itself. That’s the job of your OS or process supervisor or service manager (or whatever it’s called).

Hi Matt, thank you so much for taking time out for the valuable clarifications.

That sounds really promising. For the DNS records, we will not be in control of when they are set by our customers but can be sure that they will at some point, be pointing their A records or a CNAME to beta1.tripkindle.com, which is already pointing to the server, for the domains the customer’s asked us to add.

I was just wondering if there was a dry run that I could do to make sure certificates will be renewed for the hostnames in the host array.

I read in some posts (like "Too many certificates" using caddy and docker - #4 by Patches - Help - Let's Encrypt Community Support) that the limit on new certificates, and therefore on new domains that can be added to the hosts array, is 20 per week, but I can see that the rate limit is now 50 (Rate Limits - Let's Encrypt), so Caddy will be able to provision 50 new certificates per week, right?

Does this mean that when using on_demand TLS, it’s only the very first time that a domain pointing to Caddy is accessed that it takes longer to serve and will be smooth after that? I’m not feeling too sure if I should use the on-demand TLS feature or keep adding new domains to the hosts array programmatically as that might be a stable thing for the longer run?

This was more of a curiosity that might be useful if something feels wrong in the loaded config so all good here, thank you!

Is there a way that Caddy can roll logs generated by nohup caddy run --resume & so nohup.out doesn’t get too large? I’m using caddy run --resume since caddy start doesn’t load the programmatically added hosts that were not part of the initial Caddyfile (with the beta1.tripkindle.com domain) that was used to configure it. I was thinking of adding nohup caddy run --resume & to /etc/rc.local so that it runs every time the system boots. Is there another way you would recommend?

I’ve also experienced that when I specify a single domain in the first loaded config and programmatically add hosts, the logs aren’t taking all of the hosts into account. It might be something wrong with my config; I will test it out and post back if it doesn’t work.

Thank you so much @matt!

The only “dry run” is to actually run it. Generally speaking, any kind of test that isn’t the real thing won’t fully emulate the real thing and won’t turn up all the problems that might occur.

You could try setting up something to run validations against the ACME staging server first, but Caddy will retry against the staging server anyway if it encounters any errors on production.

To be clear - this is 50 domains per week per registered domain.

That means you could requisition separate certificates for sub1.example.com through sub50.example.com in one week, but going for sub51.example.com will get rate limited until next week. (This is one pretty good use case for a wildcard!)

If you’re requesting certificates for different registered domains, the rate limit you’ll probably want to note is this one:

For users of the ACME v2 API you can create a maximum of 300 New Orders per account per 3 hours.
Rate Limits - Let's Encrypt

But Caddy v2 should keep on top of this for you without any major issues. Basically if every customer has a single unique registered domain, you could onboard ≤100 customers per hour indefinitely. Just throw the lot at Caddy, it’ll handle it all.
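The arithmetic behind those numbers, as a quick Python sketch. The figures are the ones quoted in this thread and Let's Encrypt may change them; the function names are made up, and the weekly limit is treated as simple weekly batches rather than a sliding window.

```python
import math

# Figures quoted in this thread; Let's Encrypt may change them over time.
CERTS_PER_REGISTERED_DOMAIN_PER_WEEK = 50
NEW_ORDERS_PER_WINDOW = 300
WINDOW_HOURS = 3


def sustained_orders_per_hour():
    """Average new orders per hour that stays within the per-account limit."""
    return NEW_ORDERS_PER_WINDOW / WINDOW_HOURS


def weeks_for_subdomains(count):
    """Rough number of weekly batches needed to issue separate certificates
    for `count` subdomains of a single registered domain."""
    return math.ceil(count / CERTS_PER_REGISTERED_DOMAIN_PER_WEEK)


print(sustained_orders_per_hour())  # prints 100.0
print(weeks_for_subdomains(50))     # prints 1
print(weeks_for_subdomains(51))     # prints 2
```

So sub1 through sub50 fit in the first week, sub51 spills into the second, and customers on their own registered domains are bounded by the 100-per-hour average instead.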

That’s exactly correct. First handshake is slow, the rest are as normal since Caddy will have the certificate on hand after that.

Adding and removing programmatically has the benefit of a certain level of permanence and Caddy knows to continue maintaining the cert for the domain you’ve chosen. On-Demand TLS certificates will drop out if Caddy stops getting requests for them, I believe. Francis would probably set up On-Demand TLS, based on his posts; Matt appears to recommend the programmatic approach.

I myself like the simplicity of not needing extra software external to Caddy to perform the programmatic updating, so On-Demand TLS is appealing to me in that regard. But the programmatic approach also has its merits. I’d definitely strongly weigh the latter if I was already writing any kind of API interaction for my back end systems.

I don’t believe so. Running as some kind of service will direct stdout to a logger, e.g. journalctl for systemd.

If you’re opposed to a process supervisor, yeah this’ll probably do the job - although the aforementioned stdout output won’t roll. I’ll invite anyone else on the forums with ideas to chip in here, though.

Very curious! If you can get a repro, we’d love to investigate that one.

Excellent answer.

@matt @Whitestrake
Thank you so much. I have a few more points I was wondering to clear out:

Our customers will be filling out a form to enter domain names that they would be pointing to our Caddy server and currently these would be added to Caddy config with curl -X POST -H "Content-Type: application/json" -d '${domainsToAdd}' "http://localhost:2019/config/apps/http/servers/srv0/routes/0/match/0/host/..." once they submit that form. I am now considering if I should use On-Demand TLS instead.

  1. If I go with the programmatic approach of adding domains to the hosts array, is there any disadvantage if I add their domains before their DNS records are pointing to the Caddy Server?
  2. For On-Demand TLS, are certificates only renewed if Caddy sees a request for the domain close to the certificate expiry date, and do they otherwise work reliably? What if the certificate expires and Caddy then sees a request for that domain?
  3. Is On-Demand TLS, a scalable solution as well?
  4. If we move from On-Demand TLS to the programmatic approach of adding domains in future, will it be possible to do so without causing issues with the SSL certificates? (Given that we will maintain a record of the whitelisted domain names Caddy should serve, for example in cloud storage.)

The obvious one is that Caddy might try (and then fail, predictably) to requisition a certificate.

The rest of your questions are starting to get really in-depth at this stage - lots of nuance to consider, highly technical and of course it’s all fairly subjective to your needs. At this point I’d strongly recommend looking at Caddy’s business level support: https://caddyserver.com/business

Engaging with professionals to consult on the specifics will allow them to really answer questions like “is XYZ a scalable solution?”, which is more or less impossible to answer generically with confidence (everything is scalable, to an extent, and then no further - nothing scales forever). While they’re at it they should be able to give you very precise answers as to any internal mechanisms of the software you might need understanding / guarantees for.