ZeroSSL + DNS Challenge failing often (Route53 plugin)

jjanyan · October 4, 2021, 12:35am

1. Caddy version (`caddy version`):

v2.4.3 h1:Y1FaV2N4WO3rBqxSYA8UZsZTQdN+PwcoOcAiZTM8C0I= (linux version)
and
v2.4.5 h1:P1mRs6V2cMcagSPn+NWpD+OEYUYLIf6ecOa48cFGeUg= (macos version)

2. How I run Caddy:

systemd

[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target network-online.target
Requires=network-online.target

[Service]
Type=notify
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

a. System environment:

$ lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.2 LTS
Release:	20.04
Codename:	focal

b. Command:

sudo systemctl start caddy.service

c. Service/unit/compose file:

see above

d. My complete Caddyfile or JSON config:

{
    debug
        on_demand_tls {
                ask http://127.0.0.1:5000/_domain_check
        }
    acme_ca https://acme.zerossl.com/v2/DV90
    acme_eab {
        key_id  nope
        mac_key nope-nope
    }

}

:443 {

    # API RELATED CONFIGS

    # allow password reset
    @api_password_reset {
        header_regexp apihost Host api\.(.*)
        path /password_reset*
    }
    handle @api_password_reset {
        redir https://www.{re.apihost.1}{uri} permanent
    }

    # allow activate
    @api_activate {
        header_regexp apihost Host api\.(.*)
        path /activate*
    }
    handle @api_activate {
        redir https://www.{re.apihost.1}{uri} permanent
    }

    # allow admin
    @api_admin {
        header Host api.*
        path /admin*
    }
    handle @api_admin {
        reverse_proxy 127.0.0.1:5001
    }
    # allow crossbar-api-clients
    @api_client {
        header Host api.*
        header User-Agent crossbar-api-client
    }
    handle @api_client {
        reverse_proxy 127.0.0.1:5001 {
                fail_duration 0s
                max_fails 100000
                unhealthy_status 5xx
        }
    }

    # api static assets
    @api_static {
        header Host api.*
        path /static*
    }
    handle @api_static {
        file_server /static/* {
            root /var/www/cb/api/api/
        }
    }

    # api media assets
    @api_media {
        header Host api.*
        path /media*
    }
    handle @api_static {
        file_server /media/* {
            root /var/www/cb/api/api/
        }
    }
    # send non crossbar-api-clients to 403 - must go last!
    @api_the_rest {
        header Host api.*
    }
    handle @api_the_rest {
        header {
            Content-Type "text/html; charset=UTF-8"
        }
        respond "Forbidden 禁止の" 403
    }

    @marketing_app header Host a.crossbar.org
    handle @marketing_app {
        reverse_proxy 127.0.0.1:5003
    }

    @www_app header Host crossbar.org
    handle @www_app {
        file_server /static/* {
            root /var/www/cb/www/
        }
    }
    handle @www_app {
        reverse_proxy 127.0.0.1:5002
    }

    # APP RELATED CONFIGS
    @app header Host www.*

    # serve static files
    handle @app {
        file_server /static/* {
            root /var/www/cb/app/
        }
    }
    # proxy to uwsgi server and/or redirec to www
    handle @app {
        reverse_proxy 127.0.0.1:5000
    }

    # send non-www to www
    @needs_www {
        not header Host api.*
    }
    handle @needs_www {
        redir https://www.{host}{uri}
    }

    # old domain redirects
    @crossbarhq_root header Host crossbarhq.com
    handle @crossbarhq_root {
        redir https://crossbar.org{uri} permanent
    }
    @crossbarhq_www header Host www.crossbarhq.com
    handle @crossbarhq_www {
        redir https://crossbar.org{uri} permanent
    }


    tls josh.anyan@nope.org {
        on_demand
        dns route53 {
            max_retries 10
            aws_profile "default"
        }
        ciphers TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    }

    log {
        output file /tmp/caddy.log {
            roll_size 100MiB
            roll_keep 10
            roll_keep_for 336h
        }
    }
}

3. The problem I’m having:

When using ZeroSSL + DNS challenge it often fails to generate a cert.

4. Error messages and/or full log output:

2021/10/04 00:18:32.803	DEBUG	tls.handshake	no matching certificates and no custom selection logic	{"identifier": "claytonhockey.com"}
2021/10/04 00:18:32.803	DEBUG	tls.handshake	no matching certificates and no custom selection logic	{"identifier": "*.com"}
2021/10/04 00:18:32.803	DEBUG	tls.handshake	no matching certificates and no custom selection logic	{"identifier": "*.*"}
2021/10/04 00:18:32.862	INFO	tls.on_demand	obtaining new certificate	{"server_name": "claytonhockey.com"}
2021/10/04 00:18:32.863	INFO	tls.obtain	acquiring lock	{"identifier": "claytonhockey.com"}
2021/10/04 00:18:32.888	INFO	tls.obtain	lock acquired	{"identifier": "claytonhockey.com"}
2021/10/04 00:18:33.252	DEBUG	tls.obtain	trying issuer 1/2	{"issuer": "acme.zerossl.com-v2-DV90"}
2021/10/04 00:18:33.252	INFO	tls.issuance.acme	waiting on internal rate limiter	{"identifiers": ["claytonhockey.com"], "ca": "https://acme.zerossl.com/v2/DV90", "account": "josh.anyan@crossbar.org"}
2021/10/04 00:18:33.252	INFO	tls.issuance.acme	done waiting on internal rate limiter	{"identifiers": ["claytonhockey.com"], "ca": "https://acme.zerossl.com/v2/DV90", "account": "josh.anyan@crossbar.org"}
2021/10/04 00:18:33.516	DEBUG	tls.issuance.acme.acme_client	http request	{"method": "HEAD", "url": "https://acme.zerossl.com/v2/DV90/newNonce", "headers": {"User-Agent":["Caddy/2.4.5 CertMagic acmez (darwin; amd64)"]}, "response_headers": {"Access-Control-Allow-Origin":["*"],"Cache-Control":["max-age=-1"],"Content-Type":["application/octet-stream"],"Date":["Mon, 04 Oct 2021 00:18:33 GMT"],"Link":["<https://acme.zerossl.com/v2/DV90>;rel=\"index\""],"Replay-Nonce":["ubEBj5DN-kpRAL6whc-cak7LBSpVuPggnYf3y11ifqI"],"Server":["nginx"],"Strict-Transport-Security":["max-age=15552000"]}, "status_code": 200}
2021/10/04 00:18:33.734	DEBUG	tls.issuance.acme.acme_client	http request	{"method": "POST", "url": "https://acme.zerossl.com/v2/DV90/newOrder", "headers": {"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.4.5 CertMagic acmez (darwin; amd64)"]}, "response_headers": {"Access-Control-Allow-Origin":["*"],"Cache-Control":["max-age=0, no-cache, no-store","max-age=-1"],"Content-Length":["279"],"Content-Type":["application/json"],"Date":["Mon, 04 Oct 2021 00:18:33 GMT"],"Location":["https://acme.zerossl.com/v2/DV90/order/2xsKEI-V4fsgqVS7uCBshw"],"Replay-Nonce":["4GxHoc_bjLW-D-2Rm7rQ6JJHUnSv1asOwwb_DsVFKNA"],"Server":["nginx"],"Status":[""],"Strict-Transport-Security":["max-age=15552000"]}, "status_code": 201}
2021/10/04 00:18:33.840	DEBUG	tls.issuance.acme.acme_client	http request	{"method": "POST", "url": "https://acme.zerossl.com/v2/DV90/authz/Yh7h7t9bNfGFKSsAU797xg", "headers": {"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.4.5 CertMagic acmez (darwin; amd64)"]}, "response_headers": {"Access-Control-Allow-Origin":["*"],"Cache-Control":["max-age=-1"],"Content-Length":["447"],"Content-Type":["application/json"],"Date":["Mon, 04 Oct 2021 00:18:33 GMT"],"Link":["<https://acme.zerossl.com/v2/DV90>;rel=\"index\""],"Replay-Nonce":["3Y4WgomJf6n009oZIdj0Xrw1hRcEAK_z_7JPHkgGT3A"],"Retry-After":["5"],"Server":["nginx"],"Strict-Transport-Security":["max-age=15552000"]}, "status_code": 200}
2021/10/04 00:18:33.840	DEBUG	tls.issuance.acme.acme_client	no solver configured	{"challenge_type": "http-01"}
2021/10/04 00:18:33.840	INFO	tls.issuance.acme.acme_client	trying to solve challenge	{"identifier": "claytonhockey.com", "challenge_type": "dns-01", "ca": "https://acme.zerossl.com/v2/DV90"}
2021/10/04 00:19:13.165	DEBUG	tls.issuance.acme.acme_client	http request	{"method": "POST", "url": "https://acme.zerossl.com/v2/DV90/chall/gCI5gZaVLmEpOIhVGm_UsA", "headers": {"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.4.5 CertMagic acmez (darwin; amd64)"]}, "response_headers": {"Access-Control-Allow-Origin":["*"],"Cache-Control":["max-age=-1"],"Content-Length":["163"],"Content-Type":["application/json"],"Date":["Mon, 04 Oct 2021 00:19:13 GMT"],"Link":["<https://acme.zerossl.com/v2/DV90>;rel=\"index\"","<https://acme.zerossl.com/v2/DV90/authz/Yh7h7t9bNfGFKSsAU797xg>;rel=\"up\""],"Replay-Nonce":["30O3m42KlBPurhyjzzFj0AvCod-01OAwC-5oDFG5VvI"],"Retry-After":["10"],"Server":["nginx"],"Strict-Transport-Security":["max-age=15552000"]}, "status_code": 200}
2021/10/04 00:19:13.166	DEBUG	tls.issuance.acme.acme_client	challenge accepted	{"identifier": "claytonhockey.com", "challenge_type": "dns-01"}
2021/10/04 00:19:13.515	DEBUG	tls.issuance.acme.acme_client	http request	{"method": "POST", "url": "https://acme.zerossl.com/v2/DV90/authz/Yh7h7t9bNfGFKSsAU797xg", "headers": {"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.4.5 CertMagic acmez (darwin; amd64)"]}, "response_headers": {"Access-Control-Allow-Origin":["*"],"Cache-Control":["max-age=-1"],"Content-Length":["316"],"Content-Type":["application/json"],"Date":["Mon, 04 Oct 2021 00:19:13 GMT"],"Link":["<https://acme.zerossl.com/v2/DV90>;rel=\"index\""],"Replay-Nonce":["frQj5OfSJ7SDptFl_JJlxerwESaZ8agS1_CVrYvfSJQ"],"Retry-After":["5"],"Server":["nginx"],"Strict-Transport-Security":["max-age=15552000"]}, "status_code": 200}
2021/10/04 00:20:02.861	ERROR	tls.issuance.acme.acme_client	cleaning up solver	{"identifier": "claytonhockey.com", "challenge_type": "dns-01"}
2021/10/04 00:20:02.861	INFO	tls.issuance.acme.acme_client	validations succeeded; finalizing order	{"order": "https://acme.zerossl.com/v2/DV90/order/2xsKEI-V4fsgqVS7uCBshw"}
2021/10/04 00:20:02.862	WARN	tls.issuance.acme.acme_client	HTTP request failed; retrying	{"url": "https://acme.zerossl.com/v2/DV90/order/2xsKEI-V4fsgqVS7uCBshw/finalize", "error": "performing request: Post \"https://acme.zerossl.com/v2/DV90/order/2xsKEI-V4fsgqVS7uCBshw/finalize\": context deadline exceeded"}
2021/10/04 00:20:02.862	ERROR	tls.obtain	could not get certificate from issuer	{"identifier": "claytonhockey.com", "issuer": "acme.zerossl.com-v2-DV90", "error": "[claytonhockey.com] finalizing order https://acme.zerossl.com/v2/DV90/order/2xsKEI-V4fsgqVS7uCBshw: attempt 1: https://acme.zerossl.com/v2/DV90/order/2xsKEI-V4fsgqVS7uCBshw/finalize: context deadline exceeded (ca=https://acme.zerossl.com/v2/DV90)"}
2021/10/04 00:20:02.862	DEBUG	tls.obtain	trying issuer 2/2	{"issuer": "acme.zerossl.com-v2-DV90"}
2021/10/04 00:20:02.862	INFO	tls.issuance.acme	waiting on internal rate limiter	{"identifiers": ["claytonhockey.com"], "ca": "https://acme.zerossl.com/v2/DV90", "account": "josh.anyan@crossbar.org"}
2021/10/04 00:20:02.862	ERROR	tls.obtain	could not get certificate from issuer	{"identifier": "claytonhockey.com", "issuer": "acme.zerossl.com-v2-DV90", "error": "context canceled"}
2021/10/04 00:20:02.862	INFO	tls.obtain	releasing lock	{"identifier": "claytonhockey.com"}
2021/10/04 00:20:02.863	DEBUG	http.stdlib	http: TLS handshake error from 127.0.0.1:50027: [claytonhockey.com] Obtain: context canceled

5. What I already tried:

Different ways of authenticating with ZeroSSL. Previously I tried email only. I’m currently using the EAB method.

Although I don’t think the way I’m authenticating with ZeroSSL is the issue, but it was something easy to try.

It almost seems like a timing or timeout issue with respect to DNS challenge. I had a similar problem with Let’s Encrypt, but it would succeed maybe 90% of the time. With ZeroSSL it seems to succeed about 10% of the time. Even worse, with certain domains it seems to never succeed, no many how many times I try. The one above, claytonhockey.com, it is continually failing, but www.claytonhockey.com worked on the second try.

I’m currently trying this locally on my Mac, so the version above calls that out. I’m using the same config as production. I’m just running this manually right now sudo caddy run but I’m worried this is going to fail in production as well.

6. Links to relevant resources

francislavoie · October 4, 2021, 8:40am

Hmm, I don’t think it makes sense to configure the DNS challenge at the same time as on_demand.

The on_demand feature is meant for situations where you don’t control the DNS for the domains you need certificates for, e.g. customers pointing their own domains to your server.

The DNS challenge is meant for when you can’t solve the HTTP or TLS-ALPN challenges because your server is not directly accessible, or if you need wildcard certificates.

These two usecases are generally mutually exclusive.

Could you explain why you’re trying to use these two together?

jjanyan · October 4, 2021, 9:44am

We provide a service to manage our customers’ website. We sign-up new customers frequently enough that we are attempting to automate on boarding, so there’s no manual configuration of certs, configs, etc. That is why we chose on-demand.

99% of the time, these are pre-existing sites with traffic and we want to provide a zero downtime experience when migrating their domain to us, which leads us to the DNS challenge.

So our setup is that we automatically migrate their DNS records to our DNS provider, the customer then updates their registrar to point their name servers to our DNS provider, we detect that, then begin cert generation (and other setup). When the customer is ready, they more or less click ‘switch’ and they are done. Automatic setup and zero downtime.

jjanyan · October 4, 2021, 1:03pm

@francislavoie

My current theory is there’s a timeout issue. Where caddy is aborting the process too early for zerossl + dns challenge specifically.

Can I either increase the overall Caddy context timeout or decrease the time required for generating a cert?

I don’t see where to increase Caddy’s overall timeout, so I’m trying to decrease the default times for generating a cert. I’m stumbling over the Caddyfile format for it though.

I get
run: adapting config using caddyfile: parsing caddyfile tokens for 'tls': Caddyfile:157 - Error during parsing: cannot mix issuer subdirective (explicit issuers) with other issuer-specific subdirectives (implicit issuers) The line number it is complaining about is the final brace for the tls directive.

Any suggestions?

    tls josh.anyan@nope.com {
        on_demand
        # forcing RSA ciphers and certs
        key_type rsa2048
        dns route53 {
            max_retries 10
            aws_profile "default"
        }
        ca https://acme.zerossl.com/v2/DV90
        issuer zerossl {
            email josh.anyan@nope.com
            propagation_timeout 30s
            timeout 120s
        }
        ciphers TLS_RSA_WITH_AES_128_CBC_SHA TLS_RSA_WITH_AES_256_CBC_SHA TLS_RSA_WITH_AES_128_GCM_SHA256 TLS_RSA_WITH_AES_256_GCM_SHA384 TLS_AES_128_GCM_SHA256 TLS_AES_256_GCM_SHA384 TLS_CHACHA20_POLY1305_SHA256 TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256

    } # this is line 157 from the error above

francislavoie · October 4, 2021, 4:08pm

This error is telling you that some options you configured in tls block need to be moved to inside your issuer zerossl block, because otherwise there’s a conflict.

I think specifically, the ca option can simply be removed, it’s redundant, since you’re already configuring issuer zerossl which already uses that CA path.

As for the timeout issue, you could try adjusting propagation_timeout, to make Caddy wait longer before giving up on DNS propagation.

I’ve asked @matt to take a look at this one when he finds the time, I’m not sure what problem you’re running into otherwise.

jjanyan · October 4, 2021, 7:55pm

I figured out the correct format for the tls directive for zerossl. truncated down for others to use.

i tried various timeouts and nothing seemed to impact the results. from propagation_timeout of 20s and timeout of 30s on up to what’s below.

:443 {
    tls {
        on_demand
        # forcing RSA ciphers and certs
        key_type rsa2048
        issuer zerossl {
            email josh.anyan@nope.com
            propagation_timeout 120s
            timeout 240s
            dns route53 {
                max_retries 10
                aws_profile "default"
            }
        }
    }
}

@matt any suggestions?

matt · October 4, 2021, 8:35pm

My troll answer is to simply use both so you have 90%+10% = 100% success

I’ve seen quite a few errors on “finalizing order” before with the ZeroSSL endpoints. It looks like “context cancelled” is happening, and that usually happens when things take too long (especially with on-demand, which is blocking, so it can’t take very long). If ZeroSSL isn’t sending a response within 2 minutes, I’m not sure what to tell you other than to contact ZeroSSL support… I’d be happy to work with their team if they have any logs or other information useful for debugging if it is a server issue.

Also, make sure you use curl when testing as I don’t know what kind of funkiness browsers do with regards to TLS connections.

jjanyan · October 4, 2021, 8:52pm

Re: using both. We would like to move 100% to ZeroSSL for better client support.

We have some users on old versions of MacOS that are unable to connect to the current LE cert. Since these users are our customer’s customers, we can’t tell them to just upgrade their computer.

Re: curl, yep, I am. It’s how we’re pre-generating certs ahead of time for the switch over. For those that might find it useful…

generate_cert.sh

curl -v --resolve "$1":443:127.0.0.1 https://"$1"

Then with Caddy running locally (make sure you sudo):

bash generate_cert.sh my_domain.com

You can switch out the 127 IP for whatever IP is running Caddy. This works best with DNS challenges since you can do it regardless of your DNS routing and accessibility.

Thanks @matt I just filed a request with ZeroSSL and linked to this thread. I’ll post back when I hear something.

matt · October 4, 2021, 8:57pm

Thanks, good to know.

Of course it is possible it’s a bug in Caddy, but it’s weird that only a certain domain always fails, others work on the second try, and others work fine… we also have several other users configured for ZeroSSL at a fairly large scale and I haven’t heard of any ongoing issues.

On-demand TLS ops are limited to 90 seconds because they block TLS handshakes, and blocking a connection for more than 90 seconds seems bad, even 90 is probably way too long.

Here’s the relevant code:

github.com

caddyserver/certmagic/blob/6b2f5f9b1b9e5feb653b48eb5467aa3b61780221/handshake.go#L434-L457

    
      
          	if log != nil {
          		log.Info("obtaining new certificate", zap.String("server_name", name))
          	}
          
          
	// TODO: use a proper context; we use one with timeout because retries are enabled because interactive is false
          	ctx, cancel := context.WithTimeout(context.TODO(), 90*time.Second)
          	defer cancel()
          
          
	// Obtain the certificate
          	err = cfg.ObtainCertAsync(ctx, name)
          
          
	// immediately unblock anyone waiting for it; doing this in
          	// a defer would risk deadlock because of the recursive call
          	// to getCertDuringHandshake below when we return!
          	unblockWaiters()
          
          
	if err != nil {
          		// shucks; failed to solve challenge on-demand
          		return Certificate{}, err
          	}

This file has been truncated. show original

jjanyan · October 4, 2021, 9:17pm

Thanks for the follow up.

Are we the odd duck here with using on-demand and DNS challenge? Based on some questions about how we’re doing things (see above for example), I get the feeling we’re treading a less worn path.

I’ve considered forking the necessary repos to test out overriding the 90 seconds to something like 3 minutes. Given that we’re DNS based, I’m not terribly concerned with the time it takes.

I’ve very much a golang novice, so I wasn’t sure how much of a rabbit hole that would lead me down. I see about 25 references to throughout caddy and the modules. Push comes to shove, change those references and alias to certmagic?

Hopefully ZeroSSL can provide some guidance before I go down that path, but it’s always good to have more paths to a solution.

Cheers and thanks for an awesome project!
Josh A

matt · October 4, 2021, 10:25pm

Yeah, definitely. I think most service providers advise pointing an A or CNAME record, rather than NS record, to their server.

I would be interested to know if that helps.

Just add a replace to your go.mod file to point to your fork and you should be good to go. No other changes needed to use your fork. Go Modules Reference - The Go Programming Language

Why do you need the DNS challenge, again? For wildcards?

jjanyan · October 4, 2021, 11:00pm

Sorry, the odd duck is in reference to combining Caddy on-demand and DNS challenge, not name servers. We use name servers over CNAME because we manage more than just their website.

I outlined the reason we’ve gone down on-demand & DNS-challenge path above: ZeroSSL + DNS Challenge failing often (Route53 plugin) - #3 by jjanyan

Thanks for the replace idea, I’ll see how it goes with ZeroSSL and might give that a try.

Cheers

matt · October 5, 2021, 12:22am

Understood, but that is still unusual.

Right, but I still don’t fully understand why you need DNS challenge. You say this:

but you don’t explain what that path that “leads you to” the DNS challenge actually is. The DNS challenge is useful if:

you need wildcard certs, or
you cannot open/use one of ports 80 or 443

and that’s usually about it. What is your requirement?

So if the DNS challenge is what is causing the timeouts, one way around this might be to use the default challenge types if you can.

Still, let me know what you find out.

jjanyan · October 5, 2021, 12:06pm

Our main reason for going down the DNS path was to eliminate any chance of downtime when we cut users over to our platform. This could be due to let’s encrypt downtime or maintenance while our system (which has the goal of being 100% automated) is in the process of migrating a customer.

So we looked and it seemed like a “well, DNS is an option that seems like we can ensure 100% cut over success” because we can pre-fetch the certs and verify everything is working with them in place before transitioning. 100% bulletproof automation.

Unfortunately, it seems the DNS challenge is the less worn path and has some kinks to be worked out.

Now, we didn’t predict this ahead of time, but an unexpected benefit was when I needed to switch from EC certs to RSA for Microsoft’s calendar integration. If I understand it correctly, to get a new cert with a different cipher, you need the proper config in place, then remove the current cert, then reload caddy to force the cert out of memory and kick the cert generation process off.

Since I solved via DNS challenge, I’m not 100% sure how this looks via the HTTP challenge path. I believe with HTTP challenge I’d need to spin up a new server and migrate domains from one to the other. About ten domains per minute because of the, reasonable, rate limiting. Is that correct? Not a huge ask, but sort of a bummer to babysit that and there’s some increased risk, however small.

With the DNS auth, I generated a list of domains and subdomains, spun up Caddy locally, executed a script similar to the one I posted above, and went to bed. The next morning I copied the certs to production and restarted the server. No babysitting a process and no risk. Seems great to me.

If the DNS challenge was more bulletproof in cert generation, I’d be shouting the praises of it over HTTP challenges. HTTP is great, but if you can bother with the DNS setup, I’d go with DNS challenge every time. Am I missing something?

francislavoie · October 5, 2021, 2:02pm

If you share the storage between your Caddy instances, you wouldn’t need to re-issue any certificates.

The rate limit in v2.4.5 is currently 20 per minute, but will be increased in the next release to 10 per 10 seconds (effectively 60 per minute). This rate limit was kept more aggressive earlier due to concerns and apprehension that it would be too fast and floor ACME CAs, but now that Caddy supports two issuers by default, that concern is lessened.

The DNS challenge generally works pretty well, but it’s totally dependent on the DNS provider’s API not sucking (a lot do, e.g. Namecheap) or that they even have an API (e.g. Google DNS, no API at all) and that they propagate records quickly enough. I don’t know where Route53 falls in all that, I don’t use AWS and I haven’t spoken to enough users who use it to say.

But HTTP/TLS-ALPN doesn’t depend on any external entities, so it’s more reliable. You just need to have your server be publicly reachable on ports 80/443.

jjanyan · October 5, 2021, 2:57pm

Thanks for the response @francislavoie

If I have a server currently configured to generate and use EC certs (the default) and I spin up a new server that’s configured to generate and use RSA certs and those servers share storage, there won’t be a problem? I don’t see anything in the cert file path that would differentiate between RSA and EC certs, so I’m curious how that would work.
I would think there would be a problem on one server or the other when it attempts to fetch the cert for d0main.com and it’s the wrong type, right? If a request comes in on A (EC) and it sees an EC cert, it should use it. A request comes in on B (RSA) and it sees an EC cert, I’d imagine it’d stop, go get an RSA cert, then overwrite the cert on the file system. Repeating an infinite loop until something breaks.
Maybe either A or B will serve whatever cert is available and the config is just for cert generation, not serving? In which case, I’d still need to delete the cert out from both servers and hope traffic goes to server B first to generate a RSA cert, which would effectively be migrating domains a group at a time.
I’m very curious what I’m misunderstanding, either with certs in general or Caddy.

I’m not sure what all entails not sucking beyond not having an API. AWS has an API and I guess it works well enough? I tried configuring the propagation timeout to 2 minutes and I was still having issues. I’d usually see the proper value from Route53 (AWS DNS) in about 20 seconds or less (via dig), so I think ZeroSSL/LE should have plenty of time to see it. For ZeroSSL I even considered DNS caching could be an issue, so I waited a day (very extreme I know) before trying a given domain again but with no success. But the more I think about it, the less I’m sure it’s anything Route53 related. LE worked with pretty good reliability. Worst case, I’d try generating a cert 2x before everything worked. Based on that, it seems to either be an issue on ZeroSSL’s end or a subtle misunderstanding of the dance (and the timing) between Caddy and ZeroSSL.

I really appreciate the questions and answers. It’s been a great learning experience.

Cheers

matt · October 6, 2021, 12:19am

That’s possible, and if it turns out to be the case, we’ll get it fixed.

Huh, me too… one would overwrite or at least take precedence over the other, that’s for sure – but an ambiguous configuration like that sounds like a bad idea in the first place.

I’m not sure about that. It’d probably just use it without even thinking about the key type. Basically, it would ignore your preference for RSA in that hypothetical scenario.

In general, it’s not a good idea to mix configurations that share storage so that they conflict with each other. I still don’t understand why this is necessary, it’s definitely the first time I’ve heard of something like this. Half the cluster serving RSA and the other half serving EC. Can you help me understand the motivation for that and how you came to that being the best solution? (I’m learning here too.)

jjanyan · October 6, 2021, 12:30am

The replies you quoted were in response to @francislavoie suggestion (as I understood it), not what I’m doing.

francislavoie · October 6, 2021, 3:23am

Matt’s point still stands though – why do you need Caddy instances with different configurations that may serve the same domains? It’s not clear why you brought up the issue of certificate migration.

If you need to have multiple instances serving the same domains, then make them share the same storage. If you need to use RSA for client compatibility, then make sure all the instances are configured to use RSA keys.

matt · October 6, 2021, 6:13am

Yeah, I know. I just thought you brought up some good thinking points. I really want to understand how our business users are using Caddy.