ZeroSSL + DNS Challenge failing often (Route53 plugin)

jjanyan · October 6, 2021, 11:18am

I don’t. From what I can tell, you suggested it.

I think this quoting feature is partly to blame for the confusion going on here. Leaving out surrounding context makes paragraphs and sentences mean entirely different things. In this situation, the immediately preceding paragraph outlined that I needed to switch from EC to RSA for better integration with Microsoft services and how I did that using DNS challenges.

I appreciate everyone’s time and the caddy project. I’ll work it out with ZeroSSL, figure out the DNS auth challenge myself, or find another solution.

Cheers

jjanyan · October 6, 2021, 3:26pm

Update: I forked certmagic, changed the 90-second timeout to 300 seconds (excessive, I know), and built a custom Caddy binary with xcaddy. It worked on the first try. I’ve tried it on a few more domains, and so far so good. I believe increasing that timeout would also take LE DNS challenges from a 90% success rate to 100%.

So my question is: is it possible (and reasonable) to expose that 90-second timeout from certmagic in the Caddyfile, so it can be overridden if necessary?

Cheers

matt · October 6, 2021, 5:49pm

Fascinating. We can probably make it configurable. Until then though, maybe a timeout of 300s is a fair enough change from 90s. Did your TLS handshake really block that long? How long did it take on average?

jjanyan · October 6, 2021, 7:15pm

I just finished regenerating all of them, but I did time two of them: 113.28 seconds and 96.33 seconds. Maybe 1-2 seconds of that is my application responding.

So 90 seconds isn’t too far off with this small sample. I think 90s is fair when using HTTP challenges. For DNS challenges, maybe 180s is a good default?

The only issues I had just now, when regenerating all the certs with ZeroSSL, were when the ACME TXT records hadn’t been cleaned up properly from previous attempts. I believe this is also due to the 90s timeout not giving the Route53 plugin a chance to clean things up properly.

That’s a problem waiting to happen even with increased timeouts. If previous attempts time out (maybe the ACME provider is having serious issues) and leave a mess, it won’t be able to renew until someone manually intervenes and deletes the old record.

Maybe always give the plugin time to cleanup after the timeout is exceeded?

@matt, for the timeout specifically, should I file a ticket on GitHub? If so, on certmagic or caddy?

Cheers

matt · October 6, 2021, 7:59pm

Cool, thanks – I’ll update to 180s, but yeesh, that is a long time.

CleanUp is invoked, as the logs show, but it’s possible that the route53 client immediately aborts on a cancelled context instead of performing what was asked of it: route53/client.go at commit 39e18f642387609ed98576a63f3f1789758b6851 in libdns/route53 on GitHub.

That is just a guess. I’m not familiar enough with it to be sure (I didn’t write that code, nor do I use Route53). If that is verified to be correct, though, maybe we should be passing a new, uncancelled context to CleanUp.

Eh, true, but that might depend on the implementation of the individual DNS provider in libdns, or the provider’s API.

The timeout is in certmagic, and I’ll push the update soon.

system · November 3, 2021, 12:36am

This topic was automatically closed after 30 days. New replies are no longer allowed.