Internal rate limit & context deadline exceeded

1. Caddy version (caddy version):

2.3.0

2. How I run Caddy:

a. System environment:

Ubuntu 18.04.5 LTS
systemd
baremetal
installed package from deb repo https://dl.cloudsmith.io/public/caddy/stable/deb/debian

b. Command:

systemctl start caddy.service

c. Service/unit/compose file:

cat /lib/systemd/system/caddy.service
[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target network-online.target
Requires=network-online.target

[Service]
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

d. My complete Caddyfile or JSON config:

{
  # Bind admin endpoint on all interfaces
  admin 0.0.0.0:2019

  on_demand_tls {
      interval 1s
      burst    100
  }
}


# Redirect any accesses using an IP instead of a domain name
# hackerone report: https://hackerone.com/reports/723126
51.91.128.49:80,
91.121.37.140:80,
147.135.229.201:80
{
    redir https://app.lemlist.com{uri}
}

(cloadflare_matchers) {
    remote_ip 103.21.244.0/22
    remote_ip 103.22.200.0/22
    remote_ip 103.31.4.0/22
    remote_ip 104.16.0.0/12
    remote_ip 108.162.192.0/18
    remote_ip 131.0.72.0/22
    remote_ip 141.101.64.0/18
    remote_ip 162.158.0.0/15
    remote_ip 172.64.0.0/13
    remote_ip 173.245.48.0/20
    remote_ip 188.114.96.0/20
    remote_ip 190.93.240.0/20
    remote_ip 197.234.240.0/22
    remote_ip 198.41.128.0/17
    remote_ip 2400:cb00::/32
    remote_ip 2606:4700::/32
    remote_ip 2803:f800::/32
    remote_ip 2405:b500::/32
    remote_ip 2405:8100::/32
    remote_ip 2c0f:f248::/32
    remote_ip 2a06:98c0::/29
}

(front_servers) {
    to localhost:42001
    to localhost:42002
    to localhost:42003
    to localhost:42004
    to localhost:42005
    to localhost:42006
    to localhost:42007
    to localhost:42008

    health_path /api/ping
    health_interval 1s
    health_timeout 5s
}

#
# Main application
#

app.lemlist.com
{
    log {
        level INFO
        output file /var/log/caddy/front.log {
            roll_size     100MiB
        }
        format console
    }

    # the route keyword is mandatory to force Caddy to keep our
    #   directive order instead of using Caddy's built-in order
    route {

        request_body {
            max_size 20MB
        }

        @from_cloudflare {
            import cloadflare_matchers
        }

        @not_from_cloudflare {
            not {
                import cloadflare_matchers
            }
        }

        ##############################################
        # pass all api lempire (e, i) requests to one
        #   dedicated Meteor app process
        ##############################################
        @api_e_i {
            #path /campaigns/*
            path /api/lempire/*
        }

        header @api_e_i {
            X-Cache-Debug "app.api_e_i"
            Server ""
        }

        reverse_proxy @api_e_i {
            to localhost:8000
        }

        ##############################################
        # pass all api requests to Meteor api process
        ##############################################
        @api {
            #path /campaigns/*
            path /api/*
        }

        header @api {
            X-Cache-Debug "app.api"
            Server ""
        }

        reverse_proxy @api {
            to localhost:42100
        }

        ##############################################
        # Caches
        ##############################################
        # must be after /api routes or https://app.lemlist.com/api/froala/files/*.jpg will not be served by meteor
        @cache_app {
            path *.svg *.woff2 *.eot *.ttf *.woff *.jpg *.jpeg *.png *.gif *.ico *.html
        }

        header @cache_app {
            # 1d expiration
            Cache-Control max-age=86400
            X-Cache-Debug "app.cache"
            Server ""
        }

        file_server @cache_app {
            root /opt/lemlist/app/programs/web.browser/app
        }

        @cache_app2 {
            path_regexp ^/lib/.*\.(map|js|css)$
        }

        header @cache_app2 {
            # 1d expiration
            Cache-Control max-age=86400
            X-Cache-Debug "app.cache2"
            Server ""
        }

        file_server @cache_app2 {
            root /opt/lemlist/app/programs/web.browser/app
        }


        @cache_version {
            path_regexp ^/[0-9a-z]+\.(js|css|map).*
        }

        header @cache_version {
            # 1d expiration
            Cache-Control max-age=86400
            X-Cache-Debug "app.cache_version"
            Server ""
        }

        file_server @cache_version {
            root /opt/lemlist/web.browser
        }


        ##############################################
        # Anything else goes to Meteor fronts
        ##############################################
        header * {
            X-Cache-Debug "app.main"
            Server ""
        }

        reverse_proxy @from_cloudflare {
            header_up X-Real-IP {http.request.header.CF-Connecting-IP}
            header_up X-Forwarded-For {http.request.header.CF-Connecting-IP}

            import front_servers

            lb_policy   header CF-Connecting-IP
        }

        reverse_proxy @not_from_cloudflare {

            import front_servers

            lb_policy   ip_hash
        }

        handle * {
            respond 404
        }
    }

    handle_errors {
        @errors {
            expression {http.error.status_code} == '502'
        }
        rewrite @errors /{http.error.status_code}.html
        file_server {
            root /var/www
        }
    }

    #dev only : to work on laptop localhost
    #tls internal
}

#
# Custom domaines
#
(custom_domaines) {
    log {
        level INFO
        output file /var/log/caddy/custom_{args.0}.log {
            roll_size     100MiB
        }
        format console
    }

    route {
        #######################################
        # render a special page when on root so the user can see if
        #   their custom domain is correctly set up
        #######################################
        header / {
            X-Cache-Debug "custom.cname_ok"
            Server ""
        }

        rewrite / /lemlist_cname_ok.html

        file_server /lemlist_cname_ok.html {
            root /var/www
        }

        #######################################
        # pass all requests "track open/click" to track
        #######################################
        @track {
            path_regexp ^/api/(t/c|track/open|reply)/
        }

        header @track {
            X-Cache-Debug "custom.track"
            Server ""
        }

        reverse_proxy @track {
            to localhost:42160
        }

        #######################################
        # pass all requests "images + pages" to images
        #######################################

        @images {
            path /*
        }

        header @images {
            X-Cache-Debug "custom.images"
            Server ""
        }

        reverse_proxy @images {
            to localhost:42140
        }

        handle * {
            respond 404
        }

    }
}

# WARNING : DO NOT add any other site address than :443 here
:443
{
    import custom_domaines on_demand

    tls {
        on_demand

        issuer acme {
            email cert@lempire.co
            timeout 180s
            dir https://acme.zerossl.com/v2/DV90
            eab 3....w V....Q
        }

        issuer acme {
            email cert@lempire.co
            timeout 180s
            dir https://acme-v02.api.letsencrypt.org/directory
        }
    }
}

pages.lemlist.com,
lemlst.org,
lemlst.eu,
lmlst.com
{
    import custom_domaines predefined
}

# WARNING : DO NOT add any other site address than :80 here
:80
{
    import custom_domaines port80
}

#
# public direct access to service
#
(public_passthrough) {
    {args.0}
    {
        log {
            level INFO
            output file /var/log/caddy/public_passthrough/{args.0}.log {
                roll_size     100MiB
            }
            format console
        }

        route {
            request_body {
                max_size 20MB
            }

            header * {
                X-Cache-Debug "{args.0}.main"
                Server ""
            }

            reverse_proxy * {
                to localhost:{args.1}
            }
        }
    }
}
import public_passthrough api.lemlist.com 42100
import public_passthrough img.lemlist.com 42150
import public_passthrough front.lempire.co 8000


#
# private direct access to services
#
(private_passthrough) {
    front.lempire.co:{args.0}
    {
        log {
            level INFO
            output file /var/log/caddy/private_passthrough/{args.0}.log {
                roll_size     100MiB
            }
            format console
        }

        route {
            @cache_version {
                path_regexp ^/[0-9a-z]+\.(js|css|map).*
            }

            header @cache_version {
                # 1d expiration
                Cache-Control max-age=86400
                X-Cache-Debug "direct.cache_version"
                Server ""
            }

            file_server @cache_version {
                root /opt/lemlist/web.browser
            }

            header * {
                X-Cache-Debug "direct.main"
                Server ""
            }

            reverse_proxy * {
                to localhost:{args.1}
            }
        }

        basicauth * {
            a...z J...D
        }
    }
}

# Fronts
import private_passthrough 43001 42001
import private_passthrough 43002 42002
import private_passthrough 43003 42003
import private_passthrough 43004 42004
import private_passthrough 43005 42005
import private_passthrough 43006 42006
import private_passthrough 43007 42007
import private_passthrough 43008 42008

# API
import private_passthrough 43100 42100
# Cron
import private_passthrough 43130 42130
# Images
import private_passthrough 43140 42140
# Images Front
import private_passthrough 43150 42150
# Track
import private_passthrough 43160 42160

3. The problem I’m having:

Hello,

We have been using Caddy in production for 1.5 months now. It runs very well, but we have a persistent “issue” with on-demand TLS certificate generation.

We provide vanity domains for our thousands of customers, e.g. foobar.bigcorp.com pointing to our server custom.lemlist.com, where Caddy is running.
Unfortunately, we get a lot of “waiting on internal rate limiter” and “context deadline exceeded” errors (see logs), so we are only able to generate 1 certificate every 5-10 minutes, which is not enough in our case.

Using the following snippet, I estimate that we receive about 50 on-demand TLS requests per minute:

timeout 60 journalctl -fu caddy | grep "\"obtaining new certificate"

My supposition is that we hit a Caddy internal rate limiter: when the wait on this rate limiter takes more than 90 seconds, we get a “context deadline exceeded” and the whole request stops.
If waiting on the rate limiter plus waiting for the CA's answer takes less than 90s, the process succeeds and we get a certificate.

So, the main question is: where do you think the limit comes from, Caddy or the CA?
How can we improve this? Or where should we investigate?

4. Error messages and/or full log output:

Filtered log of a successful attempt:

Mar 23 15:27:46 front caddy[28894]: {"level":"info","ts":1616509666.1737103,"logger":"tls.on_demand","msg":"obtaining new certificate","server_name":"baaaaa.lempi.re"}
Mar 23 15:27:46 front caddy[28894]: {"level":"info","ts":1616509666.1739392,"logger":"tls.obtain","msg":"acquiring lock","identifier":"baaaaa.lempi.re"}
Mar 23 15:27:46 front caddy[28894]: {"level":"info","ts":1616509666.1739995,"logger":"tls.obtain","msg":"lock acquired","identifier":"baaaaa.lempi.re"}
Mar 23 15:27:46 front caddy[28894]: {"level":"info","ts":1616509666.174333,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["baaaaa.lempi.re"]}
Mar 23 15:28:50 front caddy[28894]: {"level":"info","ts":1616509730.161438,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["baaaaa.lempi.re"]}
Mar 23 15:28:51 front caddy[28894]: {"level":"info","ts":1616509731.5251303,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"baaaaa.lempi.re","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Mar 23 15:28:55 front caddy[28894]: {"level":"info","ts":1616509735.2210684,"logger":"tls.issuance.acme","msg":"served key authentication","identifier":"baaaaa.lempi.re","challenge":"http-01","remote":"91.199.212.132:59328"}
Mar 23 15:29:12 front caddy[28894]: {"level":"info","ts":1616509752.039788,"logger":"tls.obtain","msg":"certificate obtained successfully","identifier":"baaaaa.lempi.re"}
Mar 23 15:29:12 front caddy[28894]: {"level":"info","ts":1616509752.039803,"logger":"tls.obtain","msg":"releasing lock","identifier":"baaaaa.lempi.re"}
Mar 23 15:29:12 front caddy[28894]: {"level":"warn","ts":1616509752.0859249,"logger":"tls","msg":"stapling OCSP","error":"no OCSP stapling for [baaaaa.lempi.re]: parsing OCSP response: ocsp: error from server: unauthorized"}

Filtered log of a failing attempt:

Mar 23 15:30:25 front caddy[28894]: {"level":"info","ts":1616509825.5424554,"logger":"tls.on_demand","msg":"obtaining new certificate","server_name":"oooo.lempi.re"}
Mar 23 15:30:25 front caddy[28894]: {"level":"info","ts":1616509825.5427058,"logger":"tls.obtain","msg":"acquiring lock","identifier":"oooo.lempi.re"}
Mar 23 15:30:25 front caddy[28894]: {"level":"info","ts":1616509825.542906,"logger":"tls.obtain","msg":"lock acquired","identifier":"oooo.lempi.re"}
Mar 23 15:30:25 front caddy[28894]: {"level":"info","ts":1616509825.5434377,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["oooo.lempi.re"]}
Mar 23 15:31:50 front caddy[28894]: {"level":"info","ts":1616509910.2317042,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["oooo.lempi.re"]}
Mar 23 15:31:51 front caddy[28894]: {"level":"info","ts":1616509911.573187,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"oooo.lempi.re","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Mar 23 15:31:55 front caddy[28894]: {"level":"error","ts":1616509915.543706,"logger":"tls.issuance.acme.acme_client","msg":"deactivating authorization","identifier":"oooo.lempi.re","authz":"https://acme.zerossl.com/v2/DV90/authz/kIjx4bBZilb8M1LZcqriQA","error":"request to https://acme.zerossl.com/v2/DV90/authz/kIjx4bBZilb8M1LZcqriQA failed after 1 attempts: context deadline exceeded"}
Mar 23 15:31:55 front caddy[28894]: {"level":"info","ts":1616509915.5439215,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["oooo.lempi.re"]}
Mar 23 15:31:55 front caddy[28894]: {"level":"info","ts":1616509915.5439365,"logger":"tls.obtain","msg":"releasing lock","identifier":"oooo.lempi.re"}

Raw logs:

Mar 23 15:56:47 front caddy[28894]: {"level":"info","ts":1616511407.6534648,"logger":"tls.on_demand","msg":"obtaining new certificate","server_name":"email.spear.agency"}
Mar 23 15:56:47 front caddy[28894]: {"level":"info","ts":1616511407.6536934,"logger":"tls.obtain","msg":"acquiring lock","identifier":"email.spear.agency"}
Mar 23 15:56:47 front caddy[28894]: {"level":"info","ts":1616511407.653772,"logger":"tls.obtain","msg":"lock acquired","identifier":"email.spear.agency"}
Mar 23 15:56:47 front caddy[28894]: {"level":"info","ts":1616511407.6543267,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["email.spear.agency"]}
Mar 23 15:56:47 front caddy[28894]: {"level":"info","ts":1616511407.6683092,"logger":"tls.on_demand","msg":"obtaining new certificate","server_name":"link.breaktheweb.agency"}
Mar 23 15:56:47 front caddy[28894]: {"level":"info","ts":1616511407.668565,"logger":"tls.obtain","msg":"acquiring lock","identifier":"link.breaktheweb.agency"}
Mar 23 15:56:47 front caddy[28894]: {"level":"info","ts":1616511407.6686497,"logger":"tls.obtain","msg":"lock acquired","identifier":"link.breaktheweb.agency"}
Mar 23 15:56:47 front caddy[28894]: {"level":"info","ts":1616511407.6692038,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["link.breaktheweb.agency"]}
Mar 23 15:56:48 front caddy[28894]: {"level":"info","ts":1616511408.0006843,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["trail.coachfoundations.com"]}
Mar 23 15:56:48 front caddy[28894]: {"level":"info","ts":1616511408.000721,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["trail.coachfoundations.com"]}
Mar 23 15:56:48 front caddy[28894]: {"level":"warn","ts":1616511408.000765,"logger":"tls.issuance.acme.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/acme/new-nonce","error":"performing request: Head \"https://acme-v02.api.letsencrypt.org/acme/new-nonce\": context deadline exceeded"}
Mar 23 15:56:48 front caddy[28894]: {"level":"error","ts":1616511408.0007906,"logger":"tls.obtain","msg":"will retry","error":"[trail.coachfoundations.com] Obtain: [trail.coachfoundations.com] creating new order: fetching new nonce from server: context deadline exceeded (ca=https://acme-v02.api.letsencrypt.org/directory)","attempt":1,"retrying_in":60,"elapsed":90.000031017,"max_duration":2592000}
Mar 23 15:56:48 front caddy[28894]: {"level":"info","ts":1616511408.0008008,"logger":"tls.obtain","msg":"releasing lock","identifier":"trail.coachfoundations.com"}
Mar 23 15:56:48 front caddy[28894]: {"level":"info","ts":1616511408.2327697,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["track.princekapoor.com"]}
Mar 23 15:56:48 front caddy[28894]: {"level":"info","ts":1616511408.2327998,"logger":"tls.obtain","msg":"releasing lock","identifier":"track.princekapoor.com"}
Mar 23 15:56:48 front caddy[28894]: {"level":"info","ts":1616511408.3311765,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["trail.prodej-ambulance.cz"]}
Mar 23 15:56:48 front caddy[28894]: {"level":"info","ts":1616511408.3312006,"logger":"tls.obtain","msg":"releasing lock","identifier":"trail.prodej-ambulance.cz"}
Mar 23 15:56:48 front caddy[28894]: {"level":"info","ts":1616511408.9315224,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["t.mythemeshop.com"]}
Mar 23 15:56:49 front caddy[28894]: {"level":"info","ts":1616511409.5016632,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["track.turbocyclist.com"]}
Mar 23 15:56:50 front caddy[28894]: {"level":"info","ts":1616511410.135758,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"t.mythemeshop.com","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Mar 23 15:56:50 front caddy[28894]: {"level":"info","ts":1616511410.1389298,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["track.merchantchimp.com"]}
Mar 23 15:56:50 front caddy[28894]: {"level":"info","ts":1616511410.1656828,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["track.happy-customers.com.au"]}
Mar 23 15:56:50 front caddy[28894]: {"level":"info","ts":1616511410.2341647,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["track.ado.media"]}
Mar 23 15:56:50 front caddy[28894]: {"level":"info","ts":1616511410.2341998,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["track.appexmedia.com"]}
Mar 23 15:56:50 front caddy[28894]: {"level":"info","ts":1616511410.6608245,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"track.turbocyclist.com","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.2468774,"logger":"tls.on_demand","msg":"obtaining new certificate","server_name":"track.rvsmedia.co.uk"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.2478786,"logger":"tls.obtain","msg":"acquiring lock","identifier":"track.rvsmedia.co.uk"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.2486346,"logger":"tls.obtain","msg":"lock acquired","identifier":"track.rvsmedia.co.uk"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.2506652,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["track.rvsmedia.co.uk"]}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.329539,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"track.happy-customers.com.au","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.3366666,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"track.merchantchimp.com","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.4144146,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"track.appexmedia.com","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.4346058,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"track.ado.media","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.5841262,"logger":"tls.on_demand","msg":"obtaining new certificate","server_name":"track.giftingmade.com"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.584895,"logger":"tls.obtain","msg":"acquiring lock","identifier":"track.giftingmade.com"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.5855772,"logger":"tls.obtain","msg":"lock acquired","identifier":"track.giftingmade.com"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.5871668,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["track.giftingmade.com"]}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.6053195,"logger":"tls.on_demand","msg":"obtaining new certificate","server_name":"tracking.growthstrategy.co.za"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.6061482,"logger":"tls.obtain","msg":"acquiring lock","identifier":"tracking.growthstrategy.co.za"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.60678,"logger":"tls.obtain","msg":"lock acquired","identifier":"tracking.growthstrategy.co.za"}
Mar 23 15:56:51 front caddy[28894]: {"level":"info","ts":1616511411.6084485,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["tracking.growthstrategy.co.za"]}
Mar 23 15:56:52 front caddy[28894]: {"level":"warn","ts":1616511412.7253697,"logger":"tls.issuance.acme.acme_client","msg":"HTTP request failed; retrying","url":"https://acme.zerossl.com/v2/DV90/authz/fg27oHe9Gc1wHjw5r-xBOw","error":"performing request: Post \"https://acme.zerossl.com/v2/DV90/authz/fg27oHe9Gc1wHjw5r-xBOw\": context deadline exceeded"}
Mar 23 15:56:52 front caddy[28894]: {"level":"error","ts":1616511412.7253914,"logger":"tls.issuance.acme.acme_client","msg":"deactivating authorization","identifier":"t.mythemeshop.com","authz":"https://acme.zerossl.com/v2/DV90/authz/fg27oHe9Gc1wHjw5r-xBOw","error":"request to https://acme.zerossl.com/v2/DV90/authz/fg27oHe9Gc1wHjw5r-xBOw failed after 1 attempts: context deadline exceeded"}
Mar 23 15:56:52 front caddy[28894]: {"level":"info","ts":1616511412.7255385,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["t.mythemeshop.com"]}
Mar 23 15:56:52 front caddy[28894]: {"level":"info","ts":1616511412.725548,"logger":"tls.obtain","msg":"releasing lock","identifier":"t.mythemeshop.com"}
Mar 23 15:56:52 front caddy[28894]: {"level":"info","ts":1616511412.8838143,"logger":"tls.on_demand","msg":"obtaining new certificate","server_name":"t.mythemeshop.com"}
Mar 23 15:56:52 front caddy[28894]: {"level":"info","ts":1616511412.8840482,"logger":"tls.obtain","msg":"acquiring lock","identifier":"t.mythemeshop.com"}
Mar 23 15:56:52 front caddy[28894]: {"level":"info","ts":1616511412.8841367,"logger":"tls.obtain","msg":"lock acquired","identifier":"t.mythemeshop.com"}
Mar 23 15:56:52 front caddy[28894]: {"level":"info","ts":1616511412.8845453,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["t.mythemeshop.com"]}
Mar 23 15:56:55 front caddy[28894]: {"level":"info","ts":1616511415.586907,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["trail.smart-brand.fr"]}

5. What I already tried:

  • increased the issuers' timeout to 180s
  • increased the burst rate to 100/s
  • swapped the ZeroSSL/Let's Encrypt issuer order
  • registered on ZeroSSL to manually specify EAB
  • started digging into the code of Caddy + CertMagic (since “waiting on internal rate limiter” comes from CertMagic), but the codebase is a bit too much for a Go newbie like me
  • googled a lot about this issue without finding any relevant info

6. Links to relevant resources:

:sweat_smile: typo, cloadflare → cloudflare

I highly recommend configuring ask for On-Demand TLS. You are setting very loose restrictions here, which means that if an attacker wanted to, they could point a wildcard DNS record, say *.example.com, at your server and then make requests like evil1.example.com, evil2.example.com, evil3.example.com, and so on, indefinitely. This would cause your server to constantly issue certificates for each one, either making you hit rate limits with Let's Encrypt and ZeroSSL or, worse, making your server run out of storage space. This is very dangerous. The solution is to use ask to make sure that the domains being pointed at your system are ones you know about. This means making your customers register their domain with you first, so you can store it in your database and look it up when your backend receives the ask request from Caddy.
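For example, something along these lines in your global options (the endpoint URL and port here are just placeholders; point it at whatever service can check your customer database):

{
    on_demand_tls {
        # Caddy sends GET <ask-url>?domain=<server_name> before each
        # issuance; a 200 response allows it, anything else denies it.
        ask http://localhost:5555/check

        interval 1s
        burst    100
    }
}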

Caddy sets global internal rate limits because it does not want to put pressure on ACME issuers.

ACME challenges take at least a few seconds, and internal rate limiting helps mitigate accidental abuse. Caddy uses internal rate limiting in addition to what you or the CA configure so that you can hand Caddy a platter with a million domain names and it will gradually – but as fast as it can – obtain certificates for all of them. Caddy’s internal rate limit is currently 10 attempts per ACME account per minute.

Thanks for the fast reply and the hints about the typo and ask, I will have a look for sure!

About the rate limit, my point is also:
Is it intended that most of the requests fail because of “context deadline exceeded”? Shouldn't the rate limiter only postpone the request, so it would be “slow”, but at least each request would get a certificate?

Is there any way, then, to increase the timeout so that there are no more “context deadline exceeded” errors?

On-demand issuance holds onto TLS handshakes, so there's a hard limit of 2 minutes, I think. Maybe less; I'd have to check. But having clients hang that long is unreasonable, so we cancel if it takes too long.

The internal rate limiter currently allows 10 ACME transactions every 1 minute, which has been sufficient for most users; 1 cert every 5-10 minutes is not a rate limit that Caddy or CertMagic uses.

What if I raise the limit to 30 every minute, i.e. 1 every 2 seconds? Is that enough?

Hello,

Thanks for your proposal about raising the rate. To be honest, that's what I did on my side: I increased the hardcoded rate limit and rebuilt Caddy. But all in all, I think there is no need to increase the rate limit, as it would not solve my issue. I dug deep into the CertMagic Go code, did a lot of debugging, and I now completely understand what is going on.

To make it short, @francislavoie was right from the beginning: the “ask” endpoint will solve my issue. :slight_smile:

Let me explain. We have a lot of customers with vanity domains (7000+).
But a dozen of them have their DNS badly configured; in particular, they kept Cloudflare in front of our HTTPS endpoint.
The issue is that when an HTTPS request triggers on-demand TLS, an item is pushed into the rate limit queue (a ring buffer) and stays there for 90 seconds. 90s, because the attempt times out: the CA's challenge request is never able to reach our Caddy server (this seems related to the Cloudflare setup).
So with only 6-7 such requests per minute, you can completely jam the rate limiter queue, and then there is no more room for the proper requests.
I will deploy the “ask” route tomorrow morning, and I will be able to confirm that it solves the issue.
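
For reference, it will be something along these lines (a simplified sketch; the real endpoint queries our customer database, and the /check path, port 5555, and the hardcoded allowlist are just examples):

package main

import (
	"log"
	"net/http"
)

// Hypothetical allowlist; the real endpoint queries the customer
// database instead.
var knownDomains = map[string]bool{
	"foobar.bigcorp.com": true,
}

func main() {
	http.HandleFunc("/check", func(w http.ResponseWriter, r *http.Request) {
		domain := r.URL.Query().Get("domain") // Caddy sends ?domain=<server_name>
		if knownDomains[domain] {
			w.WriteHeader(http.StatusOK) // 200: Caddy proceeds with issuance
			return
		}
		http.Error(w, "unknown domain", http.StatusForbidden) // non-200: issuance denied
	})
	log.Fatal(http.ListenAndServe("localhost:5555", nil))
}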

But it is good to know that failing challenges can quickly fill the rate limiter queue.
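
To illustrate the mechanism, here is a rough Go sketch of the failure mode (illustrative only, not CertMagic's actual implementation; the capacity of 10 and the 90s duration mirror the numbers above):

package main

import (
	"fmt"
	"time"
)

func main() {
	// Fixed-capacity limiter: each in-flight ACME attempt holds one slot
	// until it finishes (success or timeout).
	slots := make(chan struct{}, 10)

	attempt := func(name string, d time.Duration) {
		slots <- struct{}{} // blocks here: "waiting on internal rate limiter"
		fmt.Printf("%s: done waiting, starting challenge\n", name)
		go func() {
			defer func() { <-slots }() // slot freed only when the attempt ends
			time.Sleep(d)              // challenge duration (~90s when it times out)
			fmt.Printf("%s: finished after %v\n", name, d)
		}()
	}

	// A handful of domains stuck behind Cloudflare: every challenge times
	// out after 90s, so they occupy 8 of the 10 slots almost permanently...
	for i := 0; i < 8; i++ {
		attempt(fmt.Sprintf("bad-domain-%d.example", i), 90*time.Second)
	}
	// ...leaving well-configured domains queued behind them, often past
	// their own 90s deadline before the challenge even starts.
	attempt("good-domain.example", 5*time.Second)

	time.Sleep(3 * time.Minute)
}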

Ah yep, that’s not too surprising. On-demand TLS should always be gated, otherwise it’s easy to abuse.

Hello,

A final update, to confirm that adding the “ask” hook fixed my issue.

Thanks for the help!
