Caddy V2 fails to renew SSL certs - Cloudflare DNS challenge consistently fails (SERVFAIL)

1. Caddy version (caddy version):

v2.2.0-rc.1

2. How I run Caddy:

a. System environment:

LXC container (Proxmox)

b. Command:

/root/caddy run --config /root/Caddyfile2 --adapter caddyfile

d. My complete Caddyfile or JSON config:

# CADDY 2
#
(SecurityHeaders) {
        header {
                Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
                X-Xss-Protection "1; mode=block"
                X-Content-Type-Options "nosniff"
                X-Frame-Options "SAMEORIGIN"
                Content-Security-Policy "upgrade-insecure-requests"
                Referrer-Policy "strict-origin-when-cross-origin"
                Cache-Control "public, max-age=15, must-revalidate"
                Feature-Policy "accelerometer 'none'; ambient-light-sensor 'none'; autoplay 'self'; camera 'none'; encrypted-media 'none'; fullscreen 'self'; geolocation 'none'; gyroscope 'none'; magnetometer 'none'; microphone 'none'; midi 'none'; payment 'none'; picture-in-picture *; speaker 'none'; sync-xhr 'none'; usb 'none'; vr 'none'"
        }
}

chat.domain.name {
        import SecurityHeaders
        file_server
        reverse_proxy 192.168.1.145:3000
        tls {
                dns cloudflare token_here
        }
}

home.domain.name {
        import SecurityHeaders
        @excludeDirs {
                not path /local/community* 
                not path /local/private/* 
                not path /local/Thumbs.db
        }
        file_server
        reverse_proxy @excludeDirs 192.168.1.59:8123
        tls {
                dns cloudflare token_here
        }
}

nas.domain.name {
        import SecurityHeaders
        file_server
        reverse_proxy 192.168.1.18:5000
        tls {
                dns cloudflare token_here
        }
}

file.domain.name {
        import SecurityHeaders
        file_server
        reverse_proxy 192.168.1.52:1999
        tls {
                dns cloudflare token_here
        }
}


3. The problem I’m having:

My SSL certificates for four self-hosted subdomains recently expired, and Caddy never succeeds in renewing or re-obtaining them. This setup worked fine with Caddy v1 for the last 2 years, and I’m not sure why it’s failing now.
I’ve let it run for over an hour with the same result as in the log below: it just keeps retrying, seemingly indefinitely.

4. Error messages and/or full log output:

2020/09/11 21:15:02.469 INFO    using provided configuration    {"config_file": "/root/Caddyfile2", "config_adapter": "caddyfile"}
2020/09/11 21:15:02.473 INFO    admin   admin endpoint started  {"address": "tcp/localhost:2019", "enforce_origin": false, "origins": ["localhost:2019", "[::1]:2019", "127.0.0.1:2019"]}
2020/09/11 21:15:02.473 INFO    http    server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS {"server_name": "srv0", "https_port": 443}
2020/09/11 21:15:02.473 INFO    http    enabling automatic HTTP->HTTPS redirects        {"server_name": "srv0"}
2020/09/11 21:15:02.474 INFO    tls.cache.maintenance   started background certificate maintenance      {"cache": "0xc0006bdab0"}
2020/09/11 21:15:02.477 INFO    http    enabling automatic TLS certificate management   {"domains": ["nas.domain.name", "file.domain.name", "chat.domain.name", "home.domain.name"]}
2020/09/11 21:15:02.478 INFO    tls     cleaned up storage units
2020/09/11 21:15:02.485 INFO    tls.obtain      acquiring lock  {"identifier": "nas.domain.name"}
2020/09/11 21:15:02.485 INFO    tls.obtain      lock acquired   {"identifier": "nas.domain.name"}
2020/09/11 21:15:02.486 INFO    tls.obtain      acquiring lock  {"identifier": "file.domain.name"}
2020/09/11 21:15:02.486 INFO    tls.obtain      lock acquired   {"identifier": "file.domain.name"}
2020/09/11 21:15:02.492 INFO    tls.issuance.acme       waiting on internal rate limiter        {"identifiers": ["file.domain.name"]}
2020/09/11 21:15:02.492 INFO    tls.issuance.acme       done waiting on internal rate limiter   {"identifiers": ["file.domain.name"]}
2020/09/11 21:15:02.493 INFO    tls.issuance.acme       waiting on internal rate limiter        {"identifiers": ["nas.domain.name"]}
2020/09/11 21:15:02.493 INFO    tls.issuance.acme       done waiting on internal rate limiter   {"identifiers": ["nas.domain.name"]}
2020/09/11 21:15:02.579 WARN    tls     stapling OCSP   {"error": "no OCSP stapling for [chat.domain.name]: parsing OCSP response: ocsp: error from server: unauthorized"}
2020/09/11 21:15:02.579 INFO    autosaved config        {"file": "/root/.config/caddy/autosave.json"}
2020/09/11 21:15:02.579 INFO    serving initial configuration
2020/09/11 21:15:02.584 INFO    tls.renew       acquiring lock  {"identifier": "chat.domain.name"}
2020/09/11 21:15:02.585 INFO    tls.renew       lock acquired   {"identifier": "chat.domain.name"}
2020/09/11 21:15:02.585 INFO    tls.obtain      acquiring lock  {"identifier": "home.domain.name"}
2020/09/11 21:15:02.585 INFO    tls.obtain      lock acquired   {"identifier": "home.domain.name"}
2020/09/11 21:15:02.586 INFO    tls.renew       renewing certificate    {"identifier": "chat.domain.name", "remaining": -746823.586510328}
2020/09/11 21:15:02.590 INFO    tls.issuance.acme       waiting on internal rate limiter        {"identifiers": ["home.domain.name"]}
2020/09/11 21:15:02.590 INFO    tls.issuance.acme       done waiting on internal rate limiter   {"identifiers": ["home.domain.name"]}
2020/09/11 21:15:02.591 INFO    tls.issuance.acme       waiting on internal rate limiter        {"identifiers": ["chat.domain.name"]}
2020/09/11 21:15:02.591 INFO    tls.issuance.acme       done waiting on internal rate limiter   {"identifiers": ["chat.domain.name"]}
2020/09/11 21:15:05.600 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "home.domain.name", "challenge_type": "dns-01", "ca": "https://acme-v02.api.letsencrypt.org/directory"}
2020/09/11 21:15:06.008 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "nas.domain.name", "challenge_type": "dns-01", "ca": "https://acme-v02.api.letsencrypt.org/directory"}
2020/09/11 21:15:06.759 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "chat.domain.name", "challenge_type": "dns-01", "ca": "https://acme-v02.api.letsencrypt.org/directory"}
2020/09/11 21:15:07.280 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "file.domain.name", "challenge_type": "dns-01", "ca": "https://acme-v02.api.letsencrypt.org/directory"}
2020/09/11 21:15:08.865 ERROR   tls.obtain      will retry      {"error": "[home.domain.name] Obtain: [home.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.home.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.home.domain.name. (order=https://acme-v02.api.letsencrypt.org/acme/order/87405323/5138745626) (ca=https://acme-v02.api.letsencrypt.org/directory)", "attempt": 1, "retrying_in": 60, "elapsed": 6.280413376, "max_duration": 2592000}
2020/09/11 21:15:08.961 ERROR   tls.obtain      will retry      {"error": "[nas.domain.name] Obtain: [nas.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.nas.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.nas.domain.name. (order=https://acme-v02.api.letsencrypt.org/acme/order/87405323/5138745800) (ca=https://acme-v02.api.letsencrypt.org/directory)", "attempt": 1, "retrying_in": 60, "elapsed": 6.475995438, "max_duration": 2592000}
2020/09/11 21:15:09.342 ERROR   tls.renew       will retry      {"error": "[chat.domain.name] Renew: [chat.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.chat.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.chat.domain.name. (order=https://acme-v02.api.letsencrypt.org/acme/order/87405323/5138745941) (ca=https://acme-v02.api.letsencrypt.org/directory)", "attempt": 1, "retrying_in": 60, "elapsed": 6.757515605, "max_duration": 2592000}
2020/09/11 21:15:09.986 ERROR   tls.obtain      will retry      {"error": "[file.domain.name] Obtain: [file.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.file.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.file.domain.name. (order=https://acme-v02.api.letsencrypt.org/acme/order/87405323/5138746415) (ca=https://acme-v02.api.letsencrypt.org/directory)", "attempt": 1, "retrying_in": 60, "elapsed": 7.500347734, "max_duration": 2592000}
2020/09/11 16:15:52 http: TLS handshake error from 162.158.187.135:55876: no certificate available for 'home.domain.name'
2020/09/11 16:15:52 http: TLS handshake error from 162.158.74.201:43776: no certificate available for 'home.domain.name'
2020/09/11 16:15:52 http: TLS handshake error from 162.158.187.91:22886: no certificate available for 'home.domain.name'
2020/09/11 16:15:52 http: TLS handshake error from 172.68.74.24:22184: no certificate available for 'home.domain.name'
2020/09/11 16:15:52 http: TLS handshake error from 162.158.75.38:42660: no certificate available for 'home.domain.name'
2020/09/11 16:15:52 http: TLS handshake error from 172.68.74.24:22256: no certificate available for 'home.domain.name'
2020/09/11 16:15:57 http: TLS handshake error from 172.68.74.24:24844: no certificate available for 'home.domain.name'
2020/09/11 16:16:02 http: TLS handshake error from 172.68.74.24:27746: no certificate available for 'home.domain.name'
2020/09/11 16:16:02 http: TLS handshake error from 172.68.74.12:46916: no certificate available for 'home.domain.name'
2020/09/11 21:16:09.343 INFO    tls.renew       renewing certificate    {"identifier": "chat.domain.name", "remaining": -746890.343483099}
2020/09/11 21:16:09.405 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "home.domain.name", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2020/09/11 21:16:09.663 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "nas.domain.name", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2020/09/11 21:16:09.822 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "chat.domain.name", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2020/09/11 21:16:10.452 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "file.domain.name", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2020/09/11 21:16:12.017 ERROR   tls.obtain      will retry      {"error": "[home.domain.name] Obtain: [home.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.home.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.home.domain.name. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/13918507/147584237) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)", "attempt": 2, "retrying_in": 120, "elapsed": 69.43177083, "max_duration": 2592000}
2020/09/11 21:16:12.292 ERROR   tls.obtain      will retry      {"error": "[nas.domain.name] Obtain: [nas.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.nas.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.nas.domain.name. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/13918507/147584240) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)", "attempt": 2, "retrying_in": 120, "elapsed": 69.806706389, "max_duration": 2592000}
2020/09/11 21:16:12.319 ERROR   tls.renew       will retry      {"error": "[chat.domain.name] Renew: [chat.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.chat.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.chat.domain.name. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/13918507/147584242) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)", "attempt": 2, "retrying_in": 120, "elapsed": 69.733921079, "max_duration": 2592000}
2020/09/11 21:16:12.896 ERROR   tls.obtain      will retry      {"error": "[file.domain.name] Obtain: [file.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.file.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.file.domain.name. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/13918507/147584245) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)", "attempt": 2, "retrying_in": 120, "elapsed": 70.410639717, "max_duration": 2592000}
2020/09/11 16:16:13 http: TLS handshake error from 108.162.237.30:32444: no certificate available for 'home.domain.name'
2020/09/11 16:18:08 http: TLS handshake error from 172.68.74.108:34376: no certificate available for 'home.domain.name'
2020/09/11 21:18:12.320 INFO    tls.renew       renewing certificate    {"identifier": "chat.domain.name", "remaining": -747013.320100185}
2020/09/11 21:18:12.511 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "home.domain.name", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2020/09/11 21:18:12.798 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "nas.domain.name", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2020/09/11 21:18:12.842 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "chat.domain.name", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2020/09/11 21:18:13.369 INFO    tls.issuance.acme.acme_client   trying to solve challenge       {"identifier": "file.domain.name", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2020/09/11 16:18:14 http: TLS handshake error from 172.68.74.24:45908: no certificate available for 'home.domain.name'
2020/09/11 16:18:14 http: TLS handshake error from 172.68.74.12:63944: no certificate available for 'home.domain.name'
2020/09/11 16:18:14 http: TLS handshake error from 172.68.74.108:37552: no certificate available for 'home.domain.name'
2020/09/11 21:18:14.953 ERROR   tls.obtain      will retry      {"error": "[home.domain.name] Obtain: [home.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.home.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.home.domain.name. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/13918507/147585407) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)", "attempt": 3, "retrying_in": 120, "elapsed": 192.367423455, "max_duration": 2592000}
2020/09/11 21:18:15.251 ERROR   tls.renew       will retry      {"error": "[chat.domain.name] Renew: [chat.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.chat.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.chat.domain.name. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/13918507/147585410) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)", "attempt": 3, "retrying_in": 120, "elapsed": 192.665971449, "max_duration": 2592000}
2020/09/11 21:18:15.406 ERROR   tls.obtain      will retry      {"error": "[nas.domain.name] Obtain: [nas.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.nas.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.nas.domain.name. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/13918507/147585409) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)", "attempt": 3, "retrying_in": 120, "elapsed": 192.921200133, "max_duration": 2592000}
2020/09/11 21:18:15.834 ERROR   tls.obtain      will retry      {"error": "[file.domain.name] Obtain: [file.domain.name] solving challenges: waiting for solver *certmagic.DNS01Solver to be ready: checking DNS propagation of _acme-challenge.file.domain.name: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.file.domain.name. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/13918507/147585418) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)", "attempt": 3, "retrying_in": 120, "elapsed": 193.34826302, "max_duration": 2592000}
^C2020/09/11 21:18:33.409       INFO    shutting down   {"signal": "SIGINT"}
2020/09/11 21:18:33.409 INFO    tls.obtain      releasing lock  {"identifier": "nas.domain.name"}
2020/09/11 21:18:33.409 INFO    tls.renew       releasing lock  {"identifier": "chat.domain.name"}
2020/09/11 21:18:33.409 INFO    tls.obtain      releasing lock  {"identifier": "home.domain.name"}
2020/09/11 21:18:33.409 INFO    tls.cache.maintenance   stopped background certificate maintenance      {"cache": "0xc0006bdab0"}
2020/09/11 21:18:33.409 INFO    tls.obtain      releasing lock  {"identifier": "file.domain.name"}
2020/09/11 16:18:33 [ERROR] Unable to clean up lock: remove /root/.local/share/caddy/locks/cert_acme_home.domain.name_acme-v02.api.letsencrypt.org-directory.lock: no such file or directory (lock=cert_acme_home.domain.name_acme-v02.api.letsencrypt.org-directory storage=filetorage:/root/.local/share/caddy)
2020/09/11 21:18:33.411 INFO    admin   stopped previous server
2020/09/11 21:18:33.411 INFO    shutdown done   {"signal": "SIGINT"}
2020/09/11 21:18:33.411 ERROR   tls     job failed      {"error": "chat.domain.name: renewing certificate: context canceled"}
2020/09/11 21:18:33.411 ERROR   tls     job failed      {"error": "home.domain.name: obtaining certificate: context canceled"}

5. What I already tried:

  • Rebooted the LXC container
  • Rebuilt Caddy2 from scratch using the following xcaddy command: xcaddy build v2.2.0-rc.1 --with github.com/caddy-dns/cloudflare@latest
  • Let it run for over an hour

SERVFAIL means the DNS server that was queried reported that it could not process the query. How’s your network config? Especially TCP/UDP on port 53, and your DNS settings?

I have another container running Pihole, and this Pihole instance is acting as the DNS server. Pihole is NOT blocking ads (I have it bypassed in order to renew the certs), and it’s connecting to the standard Cloudflare upstream nameservers.

Are there any specific commands I can run from either container to test where it is going wrong? I have no other DNS issues so maybe it’s something to do with getting the certificates through Pihole (which again, is not blocking anything right now).
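
One way to narrow this down is to ask each DNS server for the challenge record yourself and compare the response codes. The script below is a minimal sketch, not a definitive diagnostic. It uses only the Python standard library to send a single TXT query over UDP port 53 and print the RCODE (NOERROR, NXDOMAIN, SERVFAIL, ...). The record name is the placeholder from the log above, `192.168.1.2` is a hypothetical Pihole address that you would need to replace, and the function names (`build_query`, `rcode_name`, `probe`) are made up for this example. It does not test TCP on port 53.

```python
import socket
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}
TYPE_TXT = 16   # DNS-01 challenges are published as TXT records
CLASS_IN = 1


def build_query(name, qtype=TYPE_TXT, query_id=0x1234, recurse=True):
    """Encode a single-question DNS query in RFC 1035 wire format."""
    flags = 0x0100 if recurse else 0x0000  # RD (recursion desired) bit
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, CLASS_IN)


def rcode_name(response):
    """Return the RCODE mnemonic from a raw DNS response."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F  # RCODE is the low 4 bits of the flags word
    return RCODES.get(code, f"RCODE{code}")


def probe(name, server, recurse, timeout=3.0):
    """Send one TXT query over UDP/53; return the RCODE or a failure note."""
    query = build_query(name, recurse=recurse)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            response, _ = sock.recvfrom(4096)
        except OSError as exc:  # timeouts and name-resolution failures
            return f"no answer ({exc})"
    return rcode_name(response)


if __name__ == "__main__":
    record = "_acme-challenge.home.domain.name"  # placeholder from the post
    targets = [
        ("dom.ns.cloudflare.com", False),  # nameserver named in the log
        ("192.168.1.2", True),             # hypothetical Pihole address
        ("1.1.1.1", True),                 # Cloudflare public resolver
    ]
    for server, recurse in targets:
        print(f"{server:25} {probe(record, server, recurse)}")
```

Run it from the Caddy container while Caddy is retrying, and again from the Pihole container. NXDOMAIN or NOERROR from the Cloudflare nameserver would suggest the nameserver is reachable and the problem lies elsewhere. SERVFAIL or no answer from one container but not the other would point at the network path between them, which is where the Pihole setup would be worth a closer look.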