Force cert renewal?

goldenratio · May 2, 2018, 10:31pm

I moved a working Caddy server behind Cloudflare’s CDN. As a result, I don’t expect the HTTP or TLS-SNI challenges to keep working, so I’ve added the relevant config to use the DNS challenge instead:

tls {
    dns cloudflare
}

I’ve also set the environment variables CLOUDFLARE_EMAIL and CLOUDFLARE_API_KEY per the automatic-https#dns-challenge doc (which oddly I cannot link to here).
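A minimal sketch of setting those variables in a POSIX shell, assuming Caddy is started from that same shell (a systemd service would need them in its unit environment instead, since an interactive `export` doesn’t carry over). The email and key are placeholders, not real values:

```shell
# Placeholder credentials: substitute your Cloudflare account email and API key.
export CLOUDFLARE_EMAIL="you@example.com"
export CLOUDFLARE_API_KEY="your-cloudflare-api-key"

# In a script, abort with an error message if either variable is unset or empty.
: "${CLOUDFLARE_EMAIL:?CLOUDFLARE_EMAIL must be set}"
: "${CLOUDFLARE_API_KEY:?CLOUDFLARE_API_KEY must be set}"
```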

So… I think everything should be configured correctly, but… how do I test this without waiting for the certificates to expire naturally? Is there a way to force certificate renewal? I don’t really want to wait ~60 days to find out something’s wrong.

Whitestrake · May 2, 2018, 11:22pm

The best-practice way to test this is to set up a separate test Caddyfile:

subdomain.example.com {
  tls {
    dns cloudflare
  }
}

Then run Caddy manually, specifying the Let’s Encrypt staging endpoint and the test Caddyfile:

caddy -agree -log stdout -ca https://acme-staging-v02.api.letsencrypt.org/directory -conf /path/to/test/Caddyfile

It will retrieve an untrusted certificate using the same process as the production endpoint, which will highlight any potential DNS/server configuration issues. Test to your heart’s content; there are greatly relaxed rate limits on the staging server.

Staging Environment - Let's Encrypt
https://caddyserver.com/docs/cli#flags

Looks like the system moderator was a little overzealous and flagged posts containing links to the same domain (caddyserver.com) in a number of places. It flagged some of your older posts, too, but they should be cleared up now.

goldenratio · May 3, 2018, 7:21pm

Thanks for the reply @Whitestrake. The example test command was helpful, especially the bit about the staging server.

I did as suggested, using a copy of the Caddyfile for testing. 9 out of 11 domains obtained certificates via the DNS challenge without incident, but the other 2 domains had what seemed like transient issues with Cloudflare’s API:

2018/05/02 21:13:33 [example1.com] failed to get certificate: acme: Error 403 - urn:ietf:params:acme:error:unauthorized - No TXT record found at _acme-challenge.example1.com

2018/05/02 21:17:36 [www.example2.com] failed to get certificate: acme: Error 400 - urn:ietf:params:acme:error:dns - DNS problem: NXDOMAIN looking up TXT for _acme-challenge.www.example2.com

At this point, Caddy failed to start and dropped me back into my shell. Since this was only a test from the CLI, I re-ran the command, and it obtained those 2 certificates without issue.
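That manual re-run can be scripted, assuming Caddy exits with a non-zero status when it fails to obtain a certificate at startup (as it did here). A minimal POSIX shell sketch; `retry` is a hypothetical helper, not something Caddy provides:

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (commented out): retry the staging run up to 3 times, 60 seconds apart.
# retry 3 60 caddy -agree -log stdout \
#   -ca https://acme-staging-v02.api.letsencrypt.org/directory \
#   -conf /path/to/test/Caddyfile
```

When Caddy does start successfully it keeps running in the foreground, so the loop only repeats after a failure. Under systemd, a `Restart=` setting in the unit plays a similar role.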

However, this has not been the most reassuring test. 2 out of 11 domains failed on the first run, due to issues I can’t really control, and Caddy apparently just stops, as opposed to serving traffic for the 9 domains it obtained certs for.

Also, at this point, my production instance of Caddy is still using the existing certs obtained with HTTP or TLS-SNI challenges. And you seem to be telling me there’s nothing I can do about that (using Caddy’s toolset) until the certs expire? This seems problematic for a few reasons:

1. I haven’t truly tested the production infrastructure. For example, I used the ACME staging servers.
2. I’d like to know that the production configuration works now, while all this is fresh in my mind, not in 60 days.
3. If a problem does arise when the production configuration hits the natural renewal cycle, I won’t be sitting in front of the server watching logs, and from my test, it seems like Caddy will fail and stop serving traffic for all domains. (I will enable monitoring, but let’s treat that as an aside.)

For #3, perhaps systemd will restart the service? I’m not sure, and this seems like another reason to test the production configuration now.

So is there really no way to force renewal of the certificates? Certbot has this option via the “--force-renewal” argument, which I’ve successfully used in the past. I am aware of the caveats with regard to rate limits.

I think I can probably hack around this by deleting the existing certificates on disk, but I’d rather not resort to that if possible.

Whitestrake · May 3, 2018, 11:38pm

Caddy will never take your sites offline if it’s already serving them. Because you configured it to manage your certs and it ran into a problem, it failed straight away, and refused to start. This decision is designed to highlight critical issues so the administrator can fix them now rather than have them cause problems later.

NXDOMAIN means the name server indicated the domain does not exist. That sounds like a bit more than a transient Cloudflare API issue; that’s Let’s Encrypt telling you a DNS lookup totally failed on their end.

The missing TXT record for the first domain is interesting, since Caddy normally waits for DNS propagation before it tells Let’s Encrypt to proceed. The log you posted is missing the lines that describe that process.
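One way to see what Let’s Encrypt sees is to query the challenge record yourself while a run is in progress. A minimal sketch, assuming `dig` is installed; `wait_for_txt` is a hypothetical helper, and asking Cloudflare’s public resolver (1.1.1.1) is an arbitrary choice. Caddy’s own propagation check may query different servers:

```shell
# wait_for_txt NAME ATTEMPTS DELAY_SECONDS
# Hypothetical helper: polls for a TXT record until one appears or attempts run out.
wait_for_txt() {
  name=$1
  attempts=$2
  delay=$3
  i=1
  while :; do
    # +short prints only the record data; empty output means no TXT record yet
    # (an NXDOMAIN answer also produces empty output).
    records=$(dig +short TXT "$name" @1.1.1.1)
    if [ -n "$records" ]; then
      printf '%s\n' "$records"
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      echo "no TXT record found for $name after $attempts attempts" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example (commented out): watch the record Let's Encrypt complained about.
# wait_for_txt _acme-challenge.example1.com 10 15
```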

To be unambiguous, there’s no flag or config you can use to tell Caddy to renew certificates on disk that are currently valid. It might have merit as a feature request.

As for having only tested against staging: that’s what the staging server is there for. It’s identical to the production endpoint, except that it issues untrusted certificates, for test purposes.

Once you’ve worked out the kinks in your last two domains on the staging server, run Caddy manually again against the production endpoint to generate the live certificates you need.
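For the production run, the only change from the staging command is the CA directory URL. A minimal sketch of switching between the two; `acme_directory` is a hypothetical helper, https://acme-v02.api.letsencrypt.org/directory is Let’s Encrypt’s production ACMEv2 directory (the counterpart of the staging URL above), and /etc/caddy/Caddyfile is a placeholder path:

```shell
# Hypothetical helper: print the Let's Encrypt ACME directory URL for an environment.
acme_directory() {
  case "$1" in
    staging)    echo "https://acme-staging-v02.api.letsencrypt.org/directory" ;;
    production) echo "https://acme-v02.api.letsencrypt.org/directory" ;;
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Example (commented out so this snippet doesn't start a server):
# caddy -agree -log stdout -ca "$(acme_directory production)" -conf /etc/caddy/Caddyfile
```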

As for your third point: Caddy won’t ever bring your sites down (even if the certificates start failing to renew).

goldenratio · May 4, 2018, 6:36pm

> Caddy will never take your sites offline if it’s already serving them. Because you configured it to manage your certs and it ran into a problem, it failed straight away, and refused to start. This decision is designed to highlight critical issues so the administrator can fix them now rather than have them cause problems later.

> Caddy won’t ever bring your sites down (even if the certificates start failing to renew).

This all makes sense. Thanks for clarifying.

–

> NXDOMAIN means the name server indicated the domain does not exist. That sounds like a bit more than a transient Cloudflare API issue; that’s Let’s Encrypt telling you a DNS lookup totally failed on their end.

Except that everything worked on the second manual run, only a minute or two later. I don’t have a good explanation for the NXDOMAIN error either.

–

> Once you’ve worked out the kinks in your last two domains on the staging server, run Caddy manually again against the production endpoint to generate the live certificates you need.

Oh… ha. That’s exactly what I needed. I just didn’t realize a manual run would do that. Thanks!

I think I’m all set now. The only odd thing I ran into was that 1 of the 11 domains would not get past DNS validation until I manually created a TXT record on Cloudflare named something like “_acme-challenge.host.example.com”. After that, it worked fine. I suspect there are some hiccups with the Cloudflare API.

Appreciate all your help @Whitestrake!

system · August 2, 2018, 6:36pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.