Website errors, while renewing many domain SSL certificates

1. Output of caddy version:

v2.5.1

2. How I run Caddy:

Systemd as caddy user

a. System environment:

AWS EC2 Instance pool
Linux 5.4.0-1072-aws #77~18.04.1-Ubuntu x86_64 x86_64 x86_64 GNU/Linux

b. Command:

service caddy start

c. Service/unit/compose file:

[Unit]
Description=Caddy HTTP/2 web server
Documentation=https://caddyserver.com/docs
After=network-online.target remote-fs.target nginx.service
Wants=network-online.target systemd-networkd-wait-online.service
Requires=nginx.service
BindsTo=nginx.service

[Service]
Type=notify
Restart=on-abort

; User and group the process will run as.
User=caddy
Group=caddy

; Letsencrypt-issued certificates will be written to this directory.
Environment=CADDYPATH=/etc/ssl/caddy/v2

; Always set "-root" to something safe in case it gets forgotten in the Caddyfile.
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile_v2 -watch
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile_v2 -watch

; Use graceful shutdown with a reasonable timeout
KillMode=mixed
KillSignal=SIGQUIT
TimeoutStopSec=5s

; Limit the number of file descriptors; see `man systemd.exec` for more limit settings.
LimitNOFILE=1048576
; Unmodified caddy is not expected to use more than that.
LimitNPROC=2048

; Use private /tmp and /var/tmp, which are discarded after caddy stops.
PrivateTmp=true
; Use a minimal /dev (May bring additional security if switched to 'true', but it may not work on Raspberry Pi's or other devices, so it has been disabled in this dist.)
PrivateDevices=false
; Hide /home, /root, and /run/user. Nobody will steal your SSH-keys.
ProtectHome=true
; Make /usr, /boot, /etc and possibly some more folders read-only.
ProtectSystem=full
; … except /etc/ssl/caddy, because we want Letsencrypt-certificates there.
;   This merely retains r/w access rights, it does not add any new. Must still be writable on the host!
ReadWriteDirectories=/etc/ssl/caddy/v2

; The following additional security directives only work with systemd v229 or later.
; They further restrict privileges that can be gained by caddy. Uncomment if you like.
; Note that you may have to add capabilities required by any plugins in use.
;CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
;NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

d. My complete Caddy config:

# Global Options
# https://caddyserver.com/docs/caddyfile/options
{
        admin off

        #Order for rate limiting
        order rate_limit before basicauth

        # TLS Options
        email ssl@example.com
        on_demand_tls {
                ask https://sites.example.com/allowed
                interval 2m
                burst 25
        }

        grace_period 10m

        #Set Certificate Location
        storage file_system /etc/ssl/caddy/v2
}

:443 {
        tls {
                on_demand
        }

        reverse_proxy @geofilter http://127.0.0.1:7777

        #Set GeoBlocking
        @geofilter {
                maxmind_geolocation {
                        db_path /etc/caddy/GeoLite2-Country.mmdb
                        deny_countries RU RO VN
                }
        }

        #Set Rate Limiting
        rate_limit {
                zone dynamic {
                        key {remote_host}
                        events 60
                        window 120s
                }
        }
        #respond "I'm behind the rate limiter!"

        #CDN Setting
         uri replace s3.amazonaws.com/example static.example.com

        #Set Headers
        header {
                Strict-Transport-Security max-age=31536000;
                X-Content-Type-Options nosniff
                X-XSS-Protection "1; mode=block"
        }

        #Enable Compression
        encode {
                #gzip 9
                gzip 9

                #match {
                #header Content-Type text/*
                #}
        }

        #Define Logs
        #https://caddyserver.com/docs/caddyfile/directives/log
        log {
                output file /var/log/caddy/access.log {
                        roll_size 25
                        roll_keep 10
                        roll_keep_for 7d
                }
                format filter {
                        wrap console {
                                time_format rfc3339
                                time_key timestamp
                        }
                        fields {
                                request>headers>Accept delete
                        }
                }
        }
}

#Define Snippet for On-Demand TLS
(onDemand) {
        tls {
                on_demand
        }
}

#Include Redirs
import redir2/*
import redir2MultiDomain/*

3. The problem I’m having:

We host a number of custom sites for our customers (about 600 domains). When moved over from Caddy v1 to v2 a few months back, we tried to stagger the initial SSL requests, but as all our domains are monitored by Uptime Robot, a big bulk of those initial certs were obtained in a fairly small window.
When the first renewal cycle came around, we had a number of sites being flagged as down by Uptime Robot as well as giving SSL errors in the browser. Unfortunately as there were all on spot instances we were not able to grab any useful logs in time.

Again we’ve hit another big renewal batch, but manage to piece together a timeline of events to delve into. We’ve had to mask the domains, but the batch of domains and sub www domains was around 150 or so.

  1. Ask endpoint rate limit
    As can be seen in the logs, the 1st bottleneck we’ve hit was the API ask endpoint rate limit, this we will have a look at and increase. Using this domain as an example, this took around 12min before the lock was obtained.

  2. CA rate limit
    The next hurdle is Caddy’s internal CA rate limit, this varies from domain to domain. From the domains we’ve looked at this is somewhere under 90s or so.

  3. WWW sub-domain
    In the case of this domain, there is a redirect from the Apex to the WWW sub-domain, which essentially kicks off the same loop as above. Another 10min on the ASK endpoint.

  4. Cert renewed
    Some 26min from the start of the automatic TLS certificate management process, both certificates for this domain are renewed.

Unfortunately the Nginx access and error logs do not provide any useful info.
All we have is the fact that Uptime Robot reported these as down

Log below is only from 1 server in the pool

4. Error messages and/or full log output:

16:03 Monitor is DOWN: Example Domain ( https://WWW.EXAMPLE.com//common/uptime.txt ) - Reason: Connection Timeout
16:24 Monitor is UP: Example Domain ( https://WWW.EXAMPLE.com//common/uptime.txt ). It was down for 21 minutes and 6 seconds.

Aug 19 15:20:46 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1660922446.3377583,"logger":"http","msg":"enabling automatic TLS certificate management","domains":["
Aug 22 16:00:18 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184018.0249126,"logger":"tls.on_demand","msg":"renewing certificate on-demand failed","subjects":["EXAMPLE.COM"],"not_after":1663804799,"error":"on-demand rate limit exceeded"}
Aug 22 16:00:18 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184018.0222106,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620780.977792053,"revoked":false}
Aug 22 16:00:45 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184045.1811464,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:00:45 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184045.178184,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620753.821817846,"revoked":false}
Aug 22 16:01:18 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184078.46461,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:01:18 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184078.4616654,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620720.538336619,"revoked":false}
Aug 22 16:01:23 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184083.2170634,"logger":"tls.on_demand","msg":"renewing certificate on-demand failed","subjects":["WWW.EXAMPLE.com"],"not_after":1663804799,"error":"on-demand rate limit exceeded"}
Aug 22 16:01:23 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184083.2292068,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:01:23 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184083.2625551,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:01:23 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184083.4164937,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:01:23 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184083.2019935,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"WWW.EXAMPLE.com","subjects":["WWW.EXAMPLE.com"],"expiration":1663804799,"remaining":2620715.798008921,"revoked":false}
Aug 22 16:01:23 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184083.2264144,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620715.773587356,"revoked":false}
Aug 22 16:01:23 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184083.258924,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620715.741083728,"revoked":false}
Aug 22 16:01:23 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184083.413288,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620715.586714863,"revoked":false}
Aug 22 16:01:35 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184095.0239494,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:01:35 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184095.0210218,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620703.97898065,"revoked":false}
Aug 22 16:02:35 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184155.0601783,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:02:35 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184155.0574512,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620643.942550814,"revoked":false}


16:03 Monitor is DOWN: Example Domain ( https://WWW.EXAMPLE.com//common/uptime.txt ) - Reason: Connection Timeout


Aug 22 16:03:12 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184192.174023,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:03:12 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184192.3026228,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:03:12 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184192.171337,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620606.828665069,"revoked":false}
Aug 22 16:03:12 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184192.2996438,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620606.700358146,"revoked":false}
Aug 22 16:03:16 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184196.3063202,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:03:16 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184196.302489,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620602.697513473,"revoked":false}
Aug 22 16:03:18 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184198.0657535,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:03:18 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184198.062622,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620600.937379828,"revoked":false}
Aug 22 16:03:25 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184205.4106278,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:03:25 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184205.5437033,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:03:25 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184205.404376,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620593.595626509,"revoked":false}
Aug 22 16:03:25 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184205.5409274,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620593.459075128,"revoked":false}
Aug 22 16:04:28 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184268.9135776,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:04:28 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184268.910612,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620530.089389536,"revoked":false}
Aug 22 16:04:29 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184269.614886,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:04:29 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184269.612051,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620529.387950983,"revoked":false}
Aug 22 16:04:35 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184275.4369092,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:04:35 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184275.4322422,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620523.567759653,"revoked":false}
Aug 22 16:04:36 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184276.252381,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:04:36 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184276.2480426,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620522.751960206,"revoked":false}
Aug 22 16:04:38 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184278.2076626,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:10:29 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184629.4556766,




"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:10:29 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184629.4662507,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:10:29 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184629.6023467,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:10:29 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184629.7736459,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:10:29 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184629.4510524,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620169.548950445,"revoked":false}
Aug 22 16:10:29 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184629.4621649,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620169.53783708,"revoked":false}
Aug 22 16:10:29 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184629.5976346,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620169.402367575,"revoked":false}
Aug 22 16:10:29 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184629.7699845,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620169.230018991,"revoked":false}
Aug 22 16:10:30 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184630.900929,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:10:30 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184630.895827,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620168.104174945,"revoked":false}
Aug 22 16:10:49 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184649.620305,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:10:49 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184649.6160445,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620149.383958794,"revoked":false}
Aug 22 16:11:17 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184677.0166008,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:11:17 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184677.012193,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620121.987809112,"revoked":false}
Aug 22 16:11:21 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184681.7718718,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:11:21 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184681.7670832,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620117.232918907,"revoked":false}
Aug 22 16:11:47 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184707.2119017,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"WWW.EXAMPLE.com","error":"on-demand rate limit exceeded"}
Aug 22 16:11:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184707.2070749,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"WWW.EXAMPLE.com","subjects":["WWW.EXAMPLE.com"],"expiration":1663804799,"remaining":2620091.792927151,"revoked":false}
Aug 22 16:11:56 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184716.2190902,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:11:56 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184716.2144525,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620082.78554956,"revoked":false}
Aug 22 16:11:58 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184718.820161,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"EXAMPLE.COM","error":"on-demand rate limit exceeded"}
Aug 22 16:11:58 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184718.8159678,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620080.184033861,"revoked":false}
Aug 22 16:12:00 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184720.6723864,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"EXAMPLE.COM","subjects":["EXAMPLE.COM"],"expiration":1663804799,"remaining":2620078.327615668,"revoked":false}
Aug 22 16:12:00 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184720.7744038,"logger":"tls.renew","msg":"acquiring lock","identifier":"EXAMPLE.COM"}
Aug 22 16:12:00 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184720.8081515,"logger":"tls.renew","msg":"lock acquired","identifier":"EXAMPLE.COM"}
Aug 22 16:12:00 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184720.826112,"logger":"tls.renew","msg":"renewing certificate","identifier":"EXAMPLE.COM","remaining":2620078.173889999}
Aug 22 16:12:00 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184720.848554,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["EXAMPLE.COM"],"ca":"https://acme-v02.api.letsencrypt.org/directory","account":"ssl@company.com"}
Aug 22 16:12:00 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184720.8485818,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["EXAMPLE.COM"],"ca":"https://acme-v02.api.letsencrypt.org/directory","account":"ssl@company.com"}
Aug 22 16:12:01 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184721.0681064,"logger":"tls.renew","msg":"could not get certificate from issuer","identifier":"EXAMPLE.COM","issuer":"acme-v02.api.letsencrypt.org-directory","error":"HTTP 400 urn:ietf:params:acme:error:malformed - JWS verification error"}
Aug 22 16:12:01 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184721.0758636,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["EXAMPLE.COM"],"ca":"https://acme.zerossl.com/v2/DV90","account":"ssl@company.com"}
Aug 22 16:12:01 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184721.0758908,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["EXAMPLE.COM"],"ca":"https://acme.zerossl.com/v2/DV90","account":"ssl@company.com"}
Aug 22 16:12:24 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184744.676851,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"EXAMPLE.COM","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Aug 22 16:12:47 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184767.2065961,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"WWW.EXAMPLE.com","error":"on-demand rate limit exceeded"}
Aug 22 16:12:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184767.2024643,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"WWW.EXAMPLE.com","subjects":["WWW.EXAMPLE.com"],"expiration":1663804799,"remaining":2620031.797537377,"revoked":false}
Aug 22 16:13:03 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184783.5256505,"logger":"tls.renew","msg":"certificate renewed successfully","identifier":"EXAMPLE.COM"}
Aug 22 16:13:03 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184783.5256736,"logger":"tls.renew","msg":"releasing lock","identifier":"EXAMPLE.COM"}
Aug 22 16:13:03 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184783.6183686,"logger":"tls.cache","msg":"replaced certificate in cache","subjects":["EXAMPLE.COM"],"new_expiration":1668988799}
Aug 22 16:14:47 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184887.2222166,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"WWW.EXAMPLE.com","error":"on-demand rate limit exceeded"}
Aug 22 16:14:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184887.21649,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"WWW.EXAMPLE.com","subjects":["WWW.EXAMPLE.com"],"expiration":1663804799,"remaining":2619911.783514127,"revoked":false}
Aug 22 16:15:47 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661184947.2319193,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"WWW.EXAMPLE.com","error":"on-demand rate limit exceeded"}
Aug 22 16:15:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661184947.2275894,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"WWW.EXAMPLE.com","subjects":["WWW.EXAMPLE.com"],"expiration":1663804799,"remaining":2619851.772413969,"revoked":false}
Aug 22 16:20:55 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661185255.4097362,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"WWW.EXAMPLE.com","error":"on-demand rate limit exceeded"}
Aug 22 16:20:55 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185255.4051483,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"WWW.EXAMPLE.com","subjects":["WWW.EXAMPLE.com"],"expiration":1663804799,"remaining":2619543.594853932,"revoked":false}
Aug 22 16:22:47 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661185367.2693617,"logger":"tls.handshake","msg":"maintaining newly-loaded certificate","server_name":"WWW.EXAMPLE.com","error":"on-demand rate limit exceeded"}
Aug 22 16:22:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185367.2339175,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"WWW.EXAMPLE.com","subjects":["WWW.EXAMPLE.com"],"expiration":1663804799,"remaining":2619431.766084304,"revoked":false}


16:24 Monitor is UP: Example Domain ( https://WWW.EXAMPLE.com//common/uptime.txt ). It was down for 21 minutes and 6 seconds.


Aug 22 16:24:47 ip-172-31-24-11 caddy[920]: {"level":"error","ts":1661185487.5802984,"logger":"tls.renew","msg":"could not get certificate from issuer","identifier":"WWW.EXAMPLE.com","issuer":"acme-v02.api.letsencrypt.org-directory","error":"HTTP 400 urn:ietf:params:acme:error:malformed - JWS verification error"}
Aug 22 16:24:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185487.2351162,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"WWW.EXAMPLE.com","subjects":["WWW.EXAMPLE.com"],"expiration":1663804799,"remaining":2619311.764886193,"revoked":false}
Aug 22 16:24:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185487.3119922,"logger":"tls.renew","msg":"acquiring lock","identifier":"WWW.EXAMPLE.com"}
Aug 22 16:24:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185487.335562,"logger":"tls.renew","msg":"lock acquired","identifier":"WWW.EXAMPLE.com"}
Aug 22 16:24:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185487.3526208,"logger":"tls.renew","msg":"renewing certificate","identifier":"WWW.EXAMPLE.com","remaining":2619311.647380437}
Aug 22 16:24:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185487.3660529,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["WWW.EXAMPLE.com"],"ca":"https://acme-v02.api.letsencrypt.org/directory","account":"ssl@company.com"}
Aug 22 16:24:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185487.366088,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["WWW.EXAMPLE.com"],"ca":"https://acme-v02.api.letsencrypt.org/directory","account":"ssl@company.com"}
Aug 22 16:24:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185487.588028,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["WWW.EXAMPLE.com"],"ca":"https://acme.zerossl.com/v2/DV90","account":"ssl@company.com"}
Aug 22 16:24:47 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185487.588056,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["WWW.EXAMPLE.com"],"ca":"https://acme.zerossl.com/v2/DV90","account":"ssl@company.com"}
Aug 22 16:25:09 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185509.6953151,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"WWW.EXAMPLE.com","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Aug 22 16:25:20 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185520.721378,"logger":"tls.issuance.acme","msg":"served key authentication","identifier":"WWW.EXAMPLE.com","challenge":"http-01","remote":"91.199.212.132:47548","distributed":false}
Aug 22 16:26:12 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185572.0699565,"logger":"tls.renew","msg":"certificate renewed successfully","identifier":"WWW.EXAMPLE.com"}
Aug 22 16:26:12 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185572.069982,"logger":"tls.renew","msg":"releasing lock","identifier":"WWW.EXAMPLE.com"}
Aug 22 16:26:12 ip-172-31-24-11 caddy[920]: {"level":"info","ts":1661185572.1669269,"logger":"tls.cache","msg":"replaced certificate in cache","subjects":["WWW.EXAMPLE.com"],"new_expiration":1668988799}

5. What I already tried:

We haven’t made any changes yet, we’re trying to understand why it seems that the valid certificate was unavailable throught the process of renewing these domains

6. Links to relevant resources:

Please use Markdown code blocks for the files and logs in your post. It’s very difficult to read as-is without using code formatting. Put triple backticks ``` on their own line before and after the content for code blocks.

Sorry about that

1 Like

I’ve went ahead and edited your post, it wasn’t just your logs that were the issue, but also your Caddyfile and systemd config. See the edit history to see the difference.

Please upgrade to v2.5.2

If you turn off admin, then you cannot reload Caddy’s config and are forced to restart Caddy after any config change. I strongly recommend keeping this on.

Since you turned off admin and can’t reload the config, setting a grace_period doesn’t do anything because this affects how long to wait for servers to shut down before tearing down the old config.

It usually doesn’t make sense to change the gzip level. The default is tuned to work the best in the most situations. Often, a higher gzip number will perform worse than a more middling number.

While this works, these should be reversed in order. You should always define the matcher before the directive it’s paired with. A matcher is essentially like a variable containing a boolean expression result. It doesn’t make sense to use a variable before it’s defined.

Then just remove the interval and burst config. It’s optional, and all it does is slow down rate at which Caddy can do its work.

The internal rate limit is 10 per 10 seconds, which is effectively 60 per minute. This should be much more than enough to cover your usecase of 600 domains.

Why would it be down? Renewals should start happening after 2/3 of the lifetime of the cert, i.e. 60 days after issuance, 30 days before expiry. This means there’s a whole 30 days where it will continue to be valid, so Caddy should almost never serve an expired cert until the cert is renewed, unless renewal fails for an entire 30 days in a row.

Make a request with curl -v and show us what it looks like when it’s “failing”.

I can’t do a curl now as all the certs are valid and ok. That’s the part that makes this tricky to troubleshoot, we have to grab what we can in the moment.

These renewals were happening around the 60day mark for these certs and yes the strange part is why is it seen as down. It feels like caddy isn’t serving the valid cert while its doing the renewal. The site goes “down” as soon as the maintenance process start querying the ask endpoint and only comes back when the cert is renewed. Would this be plausible at all? Would it stop forwarding traffic to the reverse proxy maybe while this is all happening?

We will tackle the ask endpoint bottleneck, but even so the site should remain working while the renewal is happening, and trying to understand why its not is our objective

Will take a look at all your other suggestions too

The master branch of Caddy now uses CertMagic v0.17.0 which I tagged only a few minutes ago. It should now return the current, unexpired certificate if the “ask” endpoint fails or rejects the request. It also logs more errors from the ask endpoint.

I haven’t looked into whether this actually is related to your issues but I’d say the latest commit of Caddy is worth trying. Please build from source (or download the latest artifact from CI on GitHub) and you can try it out. Thanks!

Will schedule an upgrade of Caddy in the next few days

This topic was automatically closed after 30 days. New replies are no longer allowed.