How to force renewal of Let's Encrypt certificates

I’m in the same position. I really hate the whole thing, including the communication from Let’s Encrypt.

We use version 2.3 with custom modules and Redis storage. Upgrading to 2.4 within 2 days while serving 10 req/s doesn’t leave much room for testing, if any at all.

  1. If we upgrade and restart, will all certificates renew automatically? What if we have 1000 certificates? Will that trigger some rate limits on Let’s Encrypt?

  2. We have multiple servers and want to take one server out of the load balancer, upgrade it to 2.4.3, and test it with a few accounts. After upgrading (I guess it’s just replacing the executable and restarting the service), will all 1000+ certificates start to renew at once? If yes, will the 2.3.0 version on the other servers work with certificates issued by the 2.4.3 version, as they share the same Redis storage?

Regarding deleting the certs, does this mean that our users will have a delay until they get renewed?

Also, in our redis storage we have entries for a domain:

  1. “caddytls/ocsp/domain”
  2. “caddytls/certificates/acme-v02.api.letsencrypt.org-directory/domain/domain.key”
  3. “caddytls/certificates/acme-v02.api.letsencrypt.org-directory/domain/domain.crt”
  4. “caddytls/certificates/acme-v02.api.letsencrypt.org-directory/domain/domain.json”

Should we delete all of them? Isn’t there a way to force renewal without deleting them and having downtime?
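
For reference, the four entries above follow a predictable pattern, so they can be generated per domain. This is only a sketch: the `caddytls` prefix and the issuer directory are copied from the listing above and may differ in other setups, and `caddy_keys_for_domain` is a made-up helper name, not anything Caddy or the Redis plugin provides.

```shell
# Print the storage keys listed above for one domain.
# Assumption: the key layout is exactly as in the listing (prefix "caddytls",
# issuer directory "acme-v02.api.letsencrypt.org-directory").
caddy_keys_for_domain() {
  domain="$1"
  prefix="caddytls"
  issuer="acme-v02.api.letsencrypt.org-directory"
  printf '%s/ocsp/%s\n' "$prefix" "$domain"
  for ext in key crt json; do
    printf '%s/certificates/%s/%s/%s.%s\n' "$prefix" "$issuer" "$domain" "$domain" "$ext"
  done
}
```

Running `caddy_keys_for_domain example.com` prints the four keys, one per line, which makes it easier to inspect them (for example with `redis-cli EXISTS <key>`) before deciding what to remove.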

I would recommend taking a look at JSON Config Structure › apps › tls › automation › policies › renewal_window_ratio.

By setting apps->tls->automation->policies[...]->renewal_window_ratio to a high value you can start renewing all your certificates immediately. As always YMMV but I managed to start renewing a bunch of certificates we happen to use.

what value should we set to renew them right away?

I wish there were a command in Caddy to manually trigger renewal.

I think this should be the best approach as it won’t take our users website down until it renews… some users have lots of domains with traffic on them, it’s not easy to delete all of them and hope everything works.

If we could trigger or force a renewal without messing with existing certs, that would be amazing.

Same here. Running Caddy v2.2.2 h1:Ha3bvEvkb/GLGEX648/qI5zTt6uJCnfQhZHmZBxhzDY= for a SaaS business with some custom-built modules and a little bit below 100 custom domains going through caddy. Some domains are not owned by us. Certificates are stored on AWS EFS to share it between loadbalanced caddy instances.

$ ls -la /mnt/efs/fs1/caddy/certificates/acme-v02.api.letsencrypt.org-directory/ | wc -l
81

Luckily we use “on_demand_tls” so I do not expect any downtime except an initial slow request per domain. In our case those single domains are not frequently visited (5-10 reqs/min).

I tested one domain and it worked like a charm.

$ mv /mnt/efs/fs1/caddy/certificates/acme-v02.api.letsencrypt.org-directory/random-domain-from.com /backup/caddy/random-domain-from.com
$ service caddy stop
$ service caddy start

The first request against this (and only this domain) performed the certificate renewal.
Will delete all certificates now and let Caddy do its amazing job!
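
Moving the directories instead of deleting them keeps a way back if reissuance fails. A minimal sketch that generalizes the single-domain test above to every domain directory; `backup_cert_dirs` is a hypothetical helper and both paths are passed in, so nothing here is specific to Caddy.

```shell
# Move every per-domain certificate directory out of the storage directory
# into a backup directory, so each one can be restored later if needed.
# Args: source_dir backup_dir
backup_cert_dirs() {
  src="$1"
  dst="$2"
  mkdir -p "$dst" || return 1
  for dir in "$src"/*/; do
    [ -d "$dir" ] || continue          # the glob matched nothing
    mv "${dir%/}" "$dst"/ || return 1  # strip the trailing slash before moving
  done
}
```

With the paths from this post, that would be `backup_cert_dirs /mnt/efs/fs1/caddy/certificates/acme-v02.api.letsencrypt.org-directory /backup/caddy`, followed by the same stop and start as above.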

@matt thank you so much for bringing this amazing piece of stable and reliable software!

(Was sleeping)

Everyone… it can take a few days for Caddy to detect the revocation and do the replacement. Caddy will DEFINITELY not let them expire; it won’t even let their OCSP staple expire. OCSP staples are usually valid for just a few days to a week, and Caddy refreshes them halfway through, so it will see the Revoked status at most a few days after they are revoked. In the meantime Caddy will continue to serve Good OCSP staples.

You should be fine even if it takes a few days. Remember that Let’s Encrypt won’t revoke for another day or so yet.

It’s A-okay to be proactive and take extra initiative, of course, especially if your users have clients that may not honor valid, signed OCSP responses. In that case just delete the certificates from Caddy’s storage and reload, and Caddy will get new certificates right away.

(I’m of course only talking about Caddy instances from the last ~6 months or so since this feature was released. V2.4.2 or higher. If you’re not already on the latest version, please upgrade!)

That is one way to do it, but the right value depends on you. You can set it to 1 and it will renew every time it scans certificates. Of course, in production this will quickly run you up against the duplicate certificate rate limits enforced by Let’s Encrypt. But 1 will guarantee it renews them all. Just be sure to remove the custom value right away afterwards.
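
To see why 1 forces renewal: as far as I understand it, a certificate counts as due once its remaining lifetime is less than `renewal_window_ratio` times its total lifetime. The sketch below works through that rule with plain integer arithmetic. The function name is made up, the ratio is passed as a fraction to avoid floating point, and the 1/3 used in the example is my assumption about the default, not something stated in this thread.

```shell
# Print "yes" if a certificate is inside its renewal window, else "no".
# Assumed rule: due when remaining < ratio * lifetime.
# Args: not_before not_after now (any consistent unit) ratio_num ratio_den
in_renewal_window() {
  not_before="$1"; not_after="$2"; now="$3"; num="$4"; den="$5"
  lifetime=$((not_after - not_before))
  remaining=$((not_after - now))
  # Compare remaining/lifetime < num/den without doing a division.
  if [ $((remaining * den)) -lt $((lifetime * num)) ]; then
    echo yes
  else
    echo no
  fi
}
```

For a 90-day certificate on day 30, `in_renewal_window 0 90 30 1 3` prints `no` (60 days left against a 30-day window), while `in_renewal_window 0 90 30 1 1` prints `yes`, because with a ratio of 1 the window covers the whole lifetime.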

Thanks for chiming in.
Our clients require a high uptime rate, and with this many certificates we can’t risk anything. We have often seen “too many requests” errors where new certificates weren’t issued for hours; we can’t let this happen to clients whose sales are directly affected by it.

How often are the certificates scanned? If I set it to 1, when can we expect the renewal to be triggered?

We have 1000+ certificates which should be renewed by Friday. We also can’t afford downtime.

Hard to say, but it’s unlikely. They’ll renew as their OCSP staples become stale. OCSP staples are scanned every hour. If they were all obtained at the same time or stapled OCSP at the same time, then yes they’d all renew at the same time, but I figure this is unlikely.

Btw some LE rate limits don’t apply to renewals.

Yes, but they won’t know to renew the cert so they won’t even look in storage. You can reload them though and they’ll get the new certs. Still, please keep Caddy up to date in the future.

Ok, thanks.
So this is what I’ll do:

  1. take one server out of the load balancer and upgrade Caddy to the latest version
  2. do some tests to make sure everything works, including custom modules
  3. if it works, upgrade caddy on all servers
  4. once that’s done, set renewal_window_ratio=1 on one of the servers, reload Caddy, and wait 1-2 hours
  5. pray that 1000+ clients websites that host their checkout pages don’t go offline and won’t get myself and our clients screwed

OCSP staples are scanned every hour. But if you change config (by setting that value to 1, for example) Caddy will immediately renew the certificate regardless of scan time, because it always checks when the config is first loaded.

Only that many? That’s fine. Remember that downtime is a function of server, client, and CA when it comes to certificates. Caddy will do its best to not let your sites go down. There are a number of factors, though, that affect this and that you may not be able to control. To name just two:

  • CA rate limits (1000 won’t be too much of a problem, maybe 300 orders every 3 hours, but Caddy will just retry until it can keep getting more, including trying ZeroSSL as a fallback)
  • Clients that may ignore or reject signed, valid OCSP staples – not much you can do about their trust decisions unless they make it configurable (unlikely)
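
As a rough back-of-the-envelope check of the first bullet, taking the “maybe 300 orders every 3 hours” figure at face value (it is a guess, not a confirmed limit) and assuming every certificate needs a fresh order with no retries and no fallback CA, the spread looks like this. `renewal_batches` is a hypothetical helper for the arithmetic only.

```shell
# Estimate how many rate-limit windows a batch of orders spans, and how many
# hours pass before the last batch can start.
# Args: total_orders orders_per_window window_hours
renewal_batches() {
  total="$1"; per_window="$2"; hours="$3"
  windows=$(( (total + per_window - 1) / per_window ))   # ceiling division
  wait_hours=$(( (windows - 1) * hours ))
  echo "$windows windows, last batch starts after ${wait_hours}h"
}
```

`renewal_batches 1000 300 3` prints `4 windows, last batch starts after 9h`, so under these assumptions 1000 certificates would be spread over roughly half a day rather than failing outright.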

Seeing a lot of users that rely on Caddy for their business. That’s awesome! I’d recommend sponsoring the project so we can better help you. This feature was only possible in the first place thanks to sponsors like ZeroSSL and several others.

Thanks. Just to be extra cautious: if I change renewal_window_ratio to 1 and do a sudo systemctl reload caddy, is there any chance of breaking existing certificates?

I mean, “any chance” is a pretty wide descriptor so I hesitate to say absolutely “no” but I think in the general case, you’d be fine; just be sure to remove that value afterward.

I upgraded to 2.4.3 and the server I took out of the load balancer is down; it doesn’t serve requests anymore. After I replaced Caddy with the new version and reloaded the service, I saw this error:

6 [ERROR] While deleting old OCSP staples, unable to load staple file: unable to decrypt data for ocsp/
many of them for each domain.

After that, I couldn’t access any domain on the server. The logs don’t show any error. I even tried putting back the old 2.3 version; not working. I restarted Caddy; not working.

btw, we use proxy-protocol and redis storage if that helps

Just wondering… why not upgrade to the latest version?

Sounds like a problem from the redis plugin. You’ll have to ask the author about that, I have no clue. GitHub - gamalan/caddy-tlsredis: Redis Storage using for Caddy TLS Data

Looks like caddy is working though

yes, that seems to be the issue. will use 2.3 until i find a fix. i’ll set renewal_window_ratio=1 on the old caddy version.

Beginner question: how do I set this value using Caddyfile not json? Or this is not available on 2.3.0? Unfortunately i can’t use other version because of redis issue

It’s not configurable via Caddyfile at this time (neither on 2.3.0 nor on latest) because it has never been relevant for anyone to change.

Did you try to delete the OCSP data from Redis? If they’re deleted from storage, then Caddy will refresh them next time it runs that process.

i deleted 2 of them 30 minutes ago, will check 1 hour later.

Later edit: more than 1 hour has passed and still nothing in the logs. Can it take more than 1 hour on version 2.3.0?

Btw, looking at journal log I’m seeing this for each domain every 2 days “advancing OCSP staple”. I hope that’s not when the OCSP is checked.

Isn’t it possible to push that configuration via the API?

Yep, absolutely. Even if you’re using a Caddyfile.

Warning: I haven’t tested these instructions and I probably got them slightly wrong, so be sure to test in a staging or dev environment first.

First, do a GET /config/ to get Caddy’s current config:

curl "http://localhost:2019/config/"

Then identify the path to the relevant TLS automation policy. It should be at apps/tls/automation/policies/N, where N is an index into the array (0-based as usual).

If you find it (look at subjects for your affected domain name) then you can do:

curl -X PUT \
	-H "Content-Type: application/json" \
	-d '1' \
	"http://localhost:2019/config/apps/tls/automation/policies/N/renewal_window_ratio"

If you don’t see any policies, it’s probably implicit, just create one:

curl -X POST \
	-H "Content-Type: application/json" \
	-d '{"subjects": ["example.com"], "renewal_window_ratio": 1}' \
	"http://localhost:2019/config/apps/tls/automation/policies"

Then be sure to clear the setting when you are done:

curl -X DELETE \
	-H "Content-Type: application/json" \
	"http://localhost:2019/config/apps/tls/automation/policies/N/renewal_window_ratio"
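
Since forgetting the cleanup step is the main risk, the PUT and DELETE calls above can be paired in one small wrapper. This is just as untested as the commands above; `ratio_url` and `force_renewal` are hypothetical helper names, and the admin address is the `localhost:2019` used above unless `ADMIN` is set.

```shell
# Hypothetical helpers wrapping the PUT/DELETE calls above.
# ADMIN is Caddy's admin endpoint; localhost:2019 is the address used above.
ADMIN="${ADMIN:-http://localhost:2019}"

# Build the config URL for the policy at index N.
ratio_url() {
  printf '%s/config/apps/tls/automation/policies/%s/renewal_window_ratio\n' "$ADMIN" "$1"
}

# Set the ratio to 1, wait, then remove the setting again.
# If the PUT fails, nothing was changed and the function returns early.
# Args: policy_index wait_seconds
force_renewal() {
  url="$(ratio_url "$1")"
  curl -fsS -X PUT -H "Content-Type: application/json" -d '1' "$url" || return 1
  sleep "$2"
  curl -fsS -X DELETE "$url"
}
```

For example, `force_renewal 0 3600` would apply the setting to policy 0, wait an hour, and then clear it. Check afterwards with a GET on `/config/` that the value is really gone.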

Thanks a lot @matt. Once this is over I’ll try to convince my company to donate. Right now the only thing that has worked is to delete the certificate completely, restart Caddy, and visit the domain so that a new one is issued.

All other tries with deleting the OCSP and waiting failed (waited 3 hours).