How to handle more than 20k domains in Caddy?

  • I want to migrate 70k domains (parked at a registrar) to Caddy. They currently run on nginx+certbot.
  • I’m running tests on a separate server, copied the SSL certificates, and generated a JSON config for Caddy using a script. I’d like to use the existing certificates without reissuing them, having Caddy only renew them as they expire. This is because if we try to reissue them all at once, we’ll hit the Let’s Encrypt certificate issuance limit for a single IP.
  • It worked for a few test domains - old certificates were used, and new ones were issued for test domains without existing certs. For these test domains, the A record was pointed to the test server to allow certificate issuance.
  • With 70k domains, Caddy took a long time to start and logged errors about being unable to renew certificates that were due for renewal (which makes sense, as the DNS A records point to the production server, not the test one). It appeared to start, but domains wouldn’t load, showing an SSL cipher error.
  • I added the pberkel/caddy-storage-redis module.
  • Added to the JSON:
    "storage": {
      "module": "redis",
      "address": ["localhost:6379"],
      "db": 0,
      "key_prefix": "caddy_",
      "timeout": "30"
    },
  • Confirmed it works with a small number of domains. Sites load, and certificate records are added to Redis.
  • It didn’t work with 70k domains.
  • I had a theory that it wasn’t working because some domains had expired certificates that Caddy couldn’t renew. But it worked fine with 1k domains - I added a test domain to my local hosts file, and the placeholder from the test server with HTTPS and the old certificate loaded.
  • I started testing to see how many domains it could handle: 1k, 5k, 10k still work. At 20k domains, it can’t cope, and the browser shows an ERR_SSL_VERSION_OR_CIPHER_MISMATCH error instead of loading the domain.
  • According to TOP, the server still has resources available (it’s a Vultr VM with 8GB RAM for testing). TOP shows low load average, with 2.6GB of free memory.
  • How can I figure out what exactly is the bottleneck, and how can I increase the number of domains Caddy can handle?
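
One way to narrow this down without editing the hosts file for each domain is to probe the test server directly, sending each domain as SNI and recording what the handshake returns. The following is a minimal sketch using only the Python standard library. The function name `probe_sni` and the IP address in the usage comment are hypothetical. It only shows which names fail the handshake and with what error, and does not by itself identify the bottleneck inside Caddy.

```python
import socket
import ssl


def probe_sni(server_ip, hostname, port=443, timeout=5.0):
    """Connect to server_ip, send `hostname` as SNI, and report the result."""
    # The default context verifies the chain and the hostname, so an expired
    # or mismatched certificate shows up as an error rather than a success.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((server_ip, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=hostname) as tls:
                cert = tls.getpeercert()
                return {
                    "ok": True,
                    "protocol": tls.version(),
                    "not_after": cert.get("notAfter"),
                    "sans": [v for k, v in cert.get("subjectAltName", ()) if k == "DNS"],
                }
    except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


# Example usage (203.0.113.10 is a placeholder for the test server's IP):
# for name in ["testingsnow.top", "equinoxgroup.co.uk"]:
#     print(name, probe_sni("203.0.113.10", name))
```

Running this over a sample of the domains at 10k and again at 20k would show whether every name fails or only some of them, which is useful to include alongside the Caddy logs.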

Can you please post a portion of your Caddyfile? How do you know that the domains are using the old certificates? Are you manually specifying that in the site block?

Welcome, both of you!

If you have that many certs, it’s helpful to enable on-demand TLS, which will only load certificates when needed, instead of all at once at config-load-time. Caddy should start pretty much instantly then.
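
Since the config in this thread is generated by a script, here is a rough sketch of what a generator for an on-demand config might look like. It is based on my understanding of the Caddy 2.8+ JSON structure (`on_demand` on the automation policy, plus a `permission` module with an `endpoint`; older versions used an `ask` URL instead), so please check the field names against the JSON docs for your version. The permission endpoint URL and the function name are hypothetical, and the upstream is borrowed from the setup described above.

```python
import json


def build_on_demand_config(upstream, permission_endpoint):
    """Build a Caddy JSON config that obtains/loads certs at handshake time."""
    return {
        "apps": {
            "tls": {
                "automation": {
                    # Caddy asks this endpoint whether a name is allowed
                    # before issuing a certificate for it (assumed 2.8+ form).
                    "on_demand": {
                        "permission": {"module": "http", "endpoint": permission_endpoint}
                    },
                    # No "subjects": this policy is the catch-all.
                    "policies": [{"on_demand": True}],
                }
            },
            "http": {
                "servers": {
                    "myserver": {
                        "listen": [":443"],
                        # Empty connection policy so the server speaks TLS
                        # even though no route lists any hostnames.
                        "tls_connection_policies": [{}],
                        "routes": [
                            {
                                "handle": [
                                    {
                                        "handler": "reverse_proxy",
                                        "upstreams": [{"dial": upstream}],
                                    }
                                ]
                            }
                        ],
                    }
                }
            },
        }
    }


if __name__ == "__main__":
    cfg = build_on_demand_config("localhost:9000", "http://localhost:9001/check")
    print(json.dumps(cfg, indent=2))
```

Note that the generated config no longer lists 70k hostnames at all, which is the point: the config stays the same size no matter how many domains are served. Running `caddy validate --config` on the output before loading it is a cheap sanity check.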

But yes, we’ll need to see your config file, any error messages you’re getting, etc.

PS. If you haven’t already, we recommend signing up for a sponsorship once you start deploying Caddy at that scale (or really any business use case)!

This is a working JSON config for 4 domains, 2 of them with old certs and 2 without, using Redis storage.

{
  "storage": {
    "module": "redis",
    "address": ["localhost:6379"],
    "db": 0,
    "key_prefix": "caddy_",
    "timeout": "30"
  },
  "apps": {
    "tls": {
      "certificates": {
        "load_files": [
          {
            "certificate": "/etc/imported-certificates/testingsnow.top/testingsnow.top.crt",
            "key": "/etc/imported-certificates/testingsnow.top/testingsnow.top.key"
          },
          {
            "certificate": "/etc/imported-certificates/equinoxgroup.co.uk/equinoxgroup.co.uk.crt",
            "key": "/etc/imported-certificates/equinoxgroup.co.uk/equinoxgroup.co.uk.key"
          }
        ]
      },
      "automation": {
        "policies": [
          {
            "subjects": ["testingsnow.top", "equinoxgroup.co.uk", "caddy1.aka-root.com", "caddy2.aka-root.com"],
            "issuers": [
              {
                "module": "acme"
              }
            ]
          }
        ]
      }
    },
    "http": {
      "servers": {
        "myserver": {
          "listen": [":443"],
          "routes": [
            {
              "match": [{"host": ["testingsnow.top"]}],
              "handle": [
                {
                  "handler": "reverse_proxy",
                  "upstreams": [{"dial": "localhost:9000"}],
                  "headers": {
                    "request": {
                      "set": {
                        "X-Forwarded-For": ["{http.request.remote.host}"],
                        "X-Real-IP": ["{http.request.remote.host}"]
                      }
                    }
                  }
                }
              ]
            },
            {
              "match": [{"host": ["equinoxgroup.co.uk"]}],
              "handle": [
                {
                  "handler": "reverse_proxy",
                  "upstreams": [{"dial": "localhost:9000"}],
                  "headers": {
                    "request": {
                      "set": {
                        "X-Forwarded-For": ["{http.request.remote.host}"],
                        "X-Real-IP": ["{http.request.remote.host}"]
                      }
                    }
                  }
                }
              ]
            },
            {
              "match": [{"host": ["caddy1.aka-root.com"]}],
              "handle": [
                {
                  "handler": "reverse_proxy",
                  "upstreams": [{"dial": "localhost:9000"}],
                  "headers": {
                    "request": {
                      "set": {
                        "X-Forwarded-For": ["{http.request.remote.host}"],
                        "X-Real-IP": ["{http.request.remote.host}"]
                      }
                    }
                  }
                }
              ]
            },
            {
              "match": [{"host": ["caddy2.aka-root.com"]}],
              "handle": [
                {
                  "handler": "reverse_proxy",
                  "upstreams": [{"dial": "localhost:9000"}],
                  "headers": {
                    "request": {
                      "set": {
                        "X-Forwarded-For": ["{http.request.remote.host}"],
                        "X-Real-IP": ["{http.request.remote.host}"]
                      }
                    }
                  }
                }
              ]
            }
          ]
        }
      }
    }
  }
}

So one thing to consider is to move those certs into your configured storage, in the paths that Caddy expects, with the metadata JSON files next to them, so that Caddy will just start using them. Otherwise it will load those certs at load-time, instead of when needed, slowing down your startup times and config reloads.

(We could probably document this better; but if you find 1 cert+key+metadata set in storage for an automated certificate, basically mimic that structure for all your certs.)
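
To illustrate the "mimic that structure" idea, here is a small sketch that computes where the three files for one certificate would live, relative to the storage root. It assumes the layout I have seen in the default file storage, `certificates/<issuer>/<domain>/<domain>.{crt,key,json}`, with the issuer directory derived from the ACME directory URL. The helper names are hypothetical, wildcard names are not handled, and how a Redis storage module maps these keys (including `key_prefix`) is not covered, so compare the output against a certificate that Caddy itself has stored before relying on it.

```python
from urllib.parse import urlparse


def issuer_key(ca_directory_url):
    """Assumed issuer folder name: URL host, then the path with '/' as '-'."""
    parsed = urlparse(ca_directory_url)
    path = parsed.path.replace("/", "-").strip("-")
    return parsed.netloc + ("-" + path if path else "")


def storage_keys(domain, ca_directory_url):
    """Return the assumed storage keys for a domain's cert, key and metadata."""
    name = domain.lower()
    base = f"certificates/{issuer_key(ca_directory_url)}/{name}/{name}"
    return {"cert": base + ".crt", "key": base + ".key", "meta": base + ".json"}


if __name__ == "__main__":
    ca = "https://acme-v02.api.letsencrypt.org/directory"
    for kind, key in storage_keys("testingsnow.top", ca).items():
        print(kind, key)
```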

Thanks for the advice!
I created the proper file structure and generated the .json files, but it didn’t work.
I couldn’t figure out how to generate “_uniqueIdentifier”, so I just made it pseudo-random.
This is a generated sample result; all other fields were extracted automatically from the certs.

{
  "sans": [
    "testingsnow.top"
  ],
  "issuer_data": {
    "url": "https://acme-v02.api.letsencrypt.org/acme/cert/048161B7D0C9D3A165207C4D1C069FC17450",
    "ca": "https://acme-v02.api.letsencrypt.org/directory",
    "renewal_info": {
      "suggestedWindow": {
        "start": "2024-10-03T12:13:32Z",
        "end": "2024-10-05T12:13:32Z"
      },
      "_uniqueIdentifier": "migrated-048161B7D0C9D3A165207C4D1C069FC17450",
      "_retryAfter": "2024-08-27T22:22:57.318632Z",
      "_selectedTime": "2024-10-04T00:13:32Z"
    }
  }
}

Thanks for this advice, on-demand TLS works perfectly!

I also added the caddy-dns/powerdns module (dns.providers.powerdns) to issue certs.
It takes around 4 seconds for a new domain to come alive with HTTPS!

I hope we’ll start using Caddy in prod soon.

Can you show the file structure, etc, in relation to the other managed certs?

I wonder if the “renewal_info” object can be omitted, actually.

I no longer have a file structure to show: certs go directly to Redis, and Caddy doesn’t create files when Redis storage is enabled.
We decided to skip the old certs and just move the domains over slowly to avoid the Let’s Encrypt limits, generating new certs from Caddy.
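
A gradual move like this can be planned with a few lines of script. The sketch below splits a domain list into fixed-size batches spaced a fixed interval apart. The batch size and interval in the usage example are placeholders, not Let’s Encrypt’s actual limits, so pick values from their published rate-limit documentation. The function name is hypothetical, and the script only produces a schedule: switching DNS for each batch is left to whatever tooling is already in place.

```python
from datetime import datetime, timedelta, timezone


def plan_batches(domains, batch_size, interval, start):
    """Yield (scheduled_time, batch) pairs, one batch per interval."""
    for i in range(0, len(domains), batch_size):
        when = start + (i // batch_size) * interval
        yield when, domains[i:i + batch_size]


if __name__ == "__main__":
    # Placeholder values: 5 example domains, 2 per batch, one batch every 3 hours.
    names = [f"d{i}.example" for i in range(5)]
    start = datetime(2024, 9, 1, tzinfo=timezone.utc)
    for when, batch in plan_batches(names, 2, timedelta(hours=3), start):
        print(when.isoformat(), batch)
```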

I’m not sure I follow this train of thought/actions. It would be helpful if your company can get a sponsorship so I can assist you personally over email or a call or something.

But, if it’s working for you (?) then that’s good. :100:
