Caddy DNS Challenge Not Working

1. The problem I’m having:

Caddy’s DNS challenge works on a fresh build (sometimes), but after docker compose down caddy && docker compose up -d caddy or docker compose restart caddy there’s no sign of the DNS challenge even being attempted with the Namecheap plugin. I have debug enabled, and Caddy just seems to fail to use the plugin.

It worked twice for me: once for the code.example.com domain when I first built the DNS module with xcaddy, and again for portainer.example.com when I rebuilt the Caddy image. It seems to work on a fresh build with no volumes, but after I adjust the Caddyfile, it does not trigger the per-site DNS challenge again…

2. Error messages and/or full log output:

INF ts=1732646995.3700247 msg=using config from file file=/etc/caddy/Caddyfile

INF ts=1732646995.3839753 msg=adapted config to JSON adapter=caddyfile

INF ts=1732646995.3855848 logger=admin msg=admin endpoint started address=localhost:2019 enforce_origin=false origins=["//127.0.0.1:2019","//localhost:2019","//[::1]:2019"]

INF ts=1732646995.3859825 logger=http.auto_https msg=server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS server_name=srv0 https_port=443

INF ts=1732646995.3860044 logger=http.auto_https msg=enabling automatic HTTP->HTTPS redirects server_name=srv0

WRN ts=1732646995.3860314 logger=http.auto_https msg=server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server server_name=srv1 http_port=80

DBG ts=1732646995.3862822 logger=http.auto_https msg=adjusted config tls={"automation":{"policies":[{"subjects":["portainer.example.com","ns1.example.com"]},{"subjects":["network.example.com","auth.example.com","odoo.example.com"]},{}]}} http={"servers":{"srv0":{"listen":[":443"],"routes":[{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"portainer:9000"}]}]}]}],"terminal":true},{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"headers","response":{"deferred":true,"delete":["Server"],"set":{"Referrer-Policy":["strict-origin-when-cross-origin"],"Strict-Transport-Security":["max-age=3600; includeSubDomains; preload"],"X-Content-Type-Options":["nosniff"],"X-Frame-Options":["SAMEORIGIN"],"X-Xss-Protection":["1; mode=block"]}}}]},{"handle":[{"handler":"reverse_proxy","transport":{"protocol":"http","versions":["h2c","2"]},"upstreams":[{"dial":"signal:10000"}]}],"match":[{"path":["/signalexchange.SignalExchange/*"]}]},{"handle":[{"handler":"reverse_proxy","transport":{"protocol":"http","versions":["h2c","2"]},"upstreams":[{"dial":"management:80"}]}],"match":[{"path":["/management.ManagementService/*"]}]},{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"relay:80"}]}],"match":[{"path":["/relay*"]}]},{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"management:80"}]}],"match":[{"path":["/api/*"]}]},{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"dashboard:80"}]}],"match":[{"path":["/*"]}]}]}],"terminal":true},{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"headers","response":{"deferred":true,"delete":["Server"],"set":{"Referrer-Policy":["strict-origin-when-cross-origin"],"Strict-Transport-Security":["max-age=3600; includeSubDomains; preload"],"X-Content-Type-Options":["nosniff"],"X-Frame-Options":["SAMEORIGIN"],"X-Xss-Protection":["1; 
mode=block"]}}},{"handler":"reverse_proxy","upstreams":[{"dial":"zitadel:8080"}]}]}]}],"terminal":true},{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"zigop.netbird.selfhosted:8069"}]}]}]}],"terminal":true},{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"technitium-dns:5380"}]}]}]}],"terminal":true}],"tls_connection_policies":[{}],"automatic_https":{}},"srv1":{"listen":[":80"],"routes":[{},{"handle":[{"handler":"headers","response":{"deferred":true,"delete":["Server"],"set":{"Referrer-Policy":["strict-origin-when-cross-origin"],"Strict-Transport-Security":["max-age=3600; includeSubDomains; preload"],"X-Content-Type-Options":["nosniff"],"X-Frame-Options":["SAMEORIGIN"],"X-Xss-Protection":["1; mode=block"]}}}]},{"handle":[{"handler":"reverse_proxy","transport":{"protocol":"http","versions":["h2c","2"]},"upstreams":[{"dial":"signal:10000"}]}]},{"handle":[{"handler":"reverse_proxy","transport":{"protocol":"http","versions":["h2c","2"]},"upstreams":[{"dial":"management:80"}]}]},{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"relay:80"}]}]},{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"management:80"}]}]},{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"dashboard:80"}]}]},{}],"automatic_https":{"disable":true}}}}

INF ts=1732646995.3870878 logger=http msg=enabling HTTP/3 listener addr=:443

INF ts=1732646995.3871508 msg=failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.

DBG ts=1732646995.387268 logger=http msg=starting server loop address=[::]:443 tls=true http3=true

INF ts=1732646995.3872778 logger=http.log msg=server running name=srv0 protocols=["h1","h2","h3"]

DBG ts=1732646995.3873096 logger=http msg=starting server loop address=[::]:80 tls=false http3=false

INF ts=1732646995.3886507 logger=http.log msg=server running name=srv1 protocols=["h1","h2","h3"]

INF ts=1732646995.3887045 logger=http msg=enabling automatic TLS certificate management domains=["auth.example.com","odoo.example.com","ns1.example.com","portainer.example.com","network.example.com"]

DBG ts=1732646995.3889763 logger=tls msg=loading managed certificate domain=auth.example.com expiration=1739978435 issuer_key=acme-v02.api.letsencrypt.org-directory storage=FileStorage:/data/caddy

DBG ts=1732646995.389368 logger=tls.cache msg=added certificate to cache subjects=["auth.example.com"] expiration=1739978435 managed=true issuer_key=acme-v02.api.letsencrypt.org-directory hash=redact cache_size=1 cache_capacity=10000

DBG ts=1732646995.390747 logger=events msg=event name=cached_managed_cert id=9cb2ce55-562e-4ee1-b886-7631755bd8c2 origin=tls data={"sans":["auth.example.com"]}

DBG ts=1732646995.3910155 logger=tls msg=loading managed certificate domain=odoo.example.com expiration=1739978444 issuer_key=acme-v02.api.letsencrypt.org-directory storage=FileStorage:/data/caddy

DBG ts=1732646995.3916628 logger=tls.cache msg=added certificate to cache subjects=["odoo.example.com"] expiration=1739978444 managed=true issuer_key=acme-v02.api.letsencrypt.org-directory hash=redact cache_size=2 cache_capacity=10000

DBG ts=1732646995.3918872 logger=events msg=event name=cached_managed_cert id=fecea9e6-8dce-4463-8df4-a69c8a1e44a7 origin=tls data={"sans":["odoo.example.com"]}

DBG ts=1732646995.392637 logger=tls msg=loading managed certificate domain=network.example.com expiration=1739978434 issuer_key=acme-v02.api.letsencrypt.org-directory storage=FileStorage:/data/caddy


{"level":"info","ts":1732646995.3929372,"logger":"tls","msg":"storage cleaning happened too recently; skipping for now","storage":"FileStorage:/data/caddy","instance":"4bb9d250-0f94-4e3f-89ea-cbc0b9b0382f","try_again":1732733395.3929355,"try_again_in":86399.999999679}

INF ts=1732646995.3941386 logger=tls msg=finished cleaning storage units

INF ts=1732646995.3905456 logger=tls.cache.maintenance msg=started background certificate maintenance cache=0xc000613800

DBG ts=1732646995.3943655 logger=tls.cache msg=added certificate to cache subjects=["network.example.com"] expiration=1739978434 managed=true issuer_key=acme-v02.api.letsencrypt.org-directory hash=redact cache_size=3 cache_capacity=10000

DBG ts=1732646995.3944361 logger=events msg=event name=cached_managed_cert id=b00f6798-a2ba-4a3b-8149-838a5d852f92 origin=tls data={"sans":["network.example.com"]}

INF ts=1732646995.3948941 logger=tls.obtain msg=acquiring lock identifier=ns1.example.com

INF ts=1732646995.3951735 msg=[INFO][FileStorage:/data/caddy] Lock for 'issue_cert_ns1.example.com' is stale (created: 2024-11-26 18:41:00.261445267 +0000 UTC, last update: 2024-11-26 18:49:10.431256221 +0000 UTC); removing then retrying: /data/caddy/locks/issue_cert_ns1.example.com.lock

DBG ts=1732646995.3955538 logger=tls.cache msg=added certificate to cache subjects=["portainer.example.com"] expiration=1739978502 managed=true issuer_key=acme-v02.api.letsencrypt.org-directory hash=redact cache_size=4 cache_capacity=10000

DBG ts=1732646995.3956673 logger=events msg=event name=cached_managed_cert id=296edffe-a16f-40d2-b167-9e801a17d563 origin=tls data={"sans":["portainer.example.com"]}

INF ts=1732646995.3959072 logger=tls.obtain msg=lock acquired identifier=ns1.example.com

INF ts=1732646995.3960688 logger=tls.obtain msg=obtaining certificate identifier=ns1.example.com

DBG ts=1732646995.3973002 logger=events msg=event name=cert_obtaining id=1ccf30e7-85a4-4edd-956f-9e248473ab4e origin=tls data={"identifier":"ns1.example.com"}

DBG ts=1732646995.397737 logger=tls.obtain msg=trying issuer 1/1 issuer=acme-v02.api.letsencrypt.org-directory

INF ts=1732646995.3979106 msg=autosaved config (load with --resume flag) file=/config/caddy/autosave.json

INF ts=1732646995.397962 msg=serving initial configuration

INF ts=1732646995.398579 logger=tls.issuance.acme msg=waiting on internal rate limiter identifiers=["ns1.example.com"] ca=https://acme-v02.api.letsencrypt.org/directory account=certs@example.com

INF ts=1732646995.39863 logger=tls.issuance.acme msg=done waiting on internal rate limiter identifiers=["ns1.example.com"] ca=https://acme-v02.api.letsencrypt.org/directory account=certs@example.com

INF ts=1732646995.398675 logger=tls.issuance.acme msg=using ACME account account_id=https://acme-v02.api.letsencrypt.org/acme/acct/2069888037 account_contact=["mailto:certs@example.com"]

DBG ts=1732646995.4833274 logger=tls.issuance.acme.acme_client msg=http request method=GET url=https://acme-v02.api.letsencrypt.org/directory headers={"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]} response_headers={"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["746"],"Content-Type":["application/json"],"Date":["Tue, 26 Nov 2024 18:49:55 GMT"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]} status_code=200

DBG ts=1732646995.4835422 logger=tls.issuance.acme.acme_client msg=creating order account=https://acme-v02.api.letsencrypt.org/acme/acct/2069888037 identifiers=["ns1.example.com"]

DBG ts=1732646995.5110762 logger=tls.issuance.acme.acme_client msg=http request method=HEAD url=https://acme-v02.api.letsencrypt.org/acme/new-nonce headers={"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]} response_headers={"Cache-Control":["public, max-age=0, no-cache"],"Date":["Tue, 26 Nov 2024 18:49:55 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Replay-Nonce":["gtp33icHkEcke3t9SK4Blh1eaa7r1Zdu5ZEUAHrRzH2RfCEg0oU"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]} status_code=200

DBG ts=1732646995.6957638 logger=tls.issuance.acme.acme_client msg=http request method=POST url=https://acme-v02.api.letsencrypt.org/acme/new-order headers={"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]} response_headers={"Boulder-Requester":["2069888037"],"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["352"],"Content-Type":["application/json"],"Date":["Tue, 26 Nov 2024 18:49:55 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Location":["https://acme-v02.api.letsencrypt.org/acme/order/2069888037/327082937507"],"Replay-Nonce":["gtp33icHGU3-8GL9USLj0HbIJaLtQ2Tr4JNsz1X2cr58-dfYhEY"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]} status_code=201

DBG ts=1732646995.7409625 logger=tls.issuance.acme.acme_client msg=http request method=POST url=https://acme-v02.api.letsencrypt.org/acme/authz/2069888037/435944080097 headers={"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]} response_headers={"Boulder-Requester":["2069888037"],"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["826"],"Content-Type":["application/json"],"Date":["Tue, 26 Nov 2024 18:49:55 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Replay-Nonce":["gtp33icHfBOJuWiu5LLyQEQ9HZssWccoLztUJ84pO0Y_dhAol1g"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]} status_code=200

INF ts=1732646995.7418923 logger=tls.issuance.acme.acme_client msg=trying to solve challenge identifier=ns1.example.com challenge_type=dns-01 ca=https://acme-v02.api.letsencrypt.org/directory

ERR ts=1732646995.7441037 logger=tls.issuance.acme.acme_client msg=cleaning up solver identifier=ns1.example.com challenge_type=dns-01 error=no memory of presenting a DNS record for "_acme-challenge.ns1.example.com" (usually OK if presenting also failed)

DBG ts=1732646995.8146636 logger=tls.issuance.acme.acme_client msg=http request method=POST url=https://acme-v02.api.letsencrypt.org/acme/authz/2069888037/435944080097 headers={"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]} response_headers={"Boulder-Requester":["2069888037"],"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["830"],"Content-Type":["application/json"],"Date":["Tue, 26 Nov 2024 18:49:55 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Replay-Nonce":["gtp33icHTr3KkvZah9q5rEmFfmll616T9uonvCFA-EN__Pxp5Zc"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]} status_code=200

ERR ts=1732646995.814781 logger=tls.obtain msg=could not get certificate from issuer identifier=ns1.example.com issuer=acme-v02.api.letsencrypt.org-directory error=[ns1.example.com] solving challenges: presenting for challenge: could not determine zone for domain "_acme-challenge.ns1.example.com": unexpected response code 'SERVFAIL' for _acme-challenge.ns1.example.com. (order=https://acme-v02.api.letsencrypt.org/acme/order/2069888037/327082937507) (ca=https://acme-v02.api.letsencrypt.org/directory)

DBG ts=1732646995.8148265 logger=events msg=event name=cert_failed id=1d2cc2b6-31ee-454c-93a6-61b75c712346 origin=tls data={"error":{},"identifier":"ns1.example.com","issuers":["acme-v02.api.letsencrypt.org-directory"],"renewal":false}

ERR ts=1732646995.8148348 logger=tls.obtain msg=will retry error=[ns1.example.com] Obtain: [ns1.example.com] solving challenges: presenting for challenge: could not determine zone for domain "_acme-challenge.ns1.example.com": unexpected response code 'SERVFAIL' for _acme-challenge.ns1.example.com. (order=https://acme-v02.api.letsencrypt.org/acme/order/2069888037/327082937507) (ca=https://acme-v02.api.letsencrypt.org/directory) attempt=1 retrying_in=60 elapsed=0.418814524 max_duration=2592000

DBG ts=1732646996.8947306 logger=http.stdlib msg=http: TLS handshake error from 172.18.0.1:41308: EOF

3. Caddy version:

v2.8.4

4. How I installed and ran Caddy:

a. System environment:

Caddy is running in docker compose on an Ubuntu VPS:

Docker Compose version v2.29.7

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.5 LTS
Release:	22.04
Codename:	jammy

b. Command:

First, I downloaded xcaddy into a Docker-Build directory using the following commands from the xcaddy resource (listed at the bottom):

sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/xcaddy/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-xcaddy-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/xcaddy/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-xcaddy.list
sudo apt update
sudo apt install xcaddy

Then, I created a Dockerfile:

# Use the official Caddy 2.8.4 image
FROM caddy:2.8.4 AS builder

# Copy the custom Caddy binary into the builder stage
COPY ./caddy /usr/bin/caddy

# Use the same Caddy version for the final image
FROM caddy:2.8.4

# Copy the custom Caddy binary into the final image
COPY --from=builder /usr/bin/caddy /usr/bin/caddy

Next, I built the image from the Dockerfile with:
docker build -t zignet-caddy .

Lastly, I referenced the build in my compose.yaml file and ran:

docker compose up -d

NOTE: The first run worked, and the DNS challenge triggered the Namecheap API. I confirmed in the DNS section of Namecheap that the TXT record was present. That is the only time it worked for my domains that use the Namecheap plugin. After I updated the Caddyfile with new entries, there is no sign that Caddy is attempting to run the Namecheap plugin. But it does attempt to look for the TXT record that the API should have created.

c. Service/unit/compose file:

  caddy:
    image: zignet-caddy
    container_name: caddy
    restart: unless-stopped
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
      - caddy_config:/config
    networks:
      - zitadel
    ports:
      - "80:80"
      - "443:443"
    depends_on:
      - zitadel
    env_file:
      - ./caddy.env

networks:
  zitadel:

volumes:
  db_data:
  caddy_data:
  caddy_config:

d. My complete Caddy config:

Please forgive me for redacting my website name, but since it’s private, I do not want it on web forums. However, I do not think it’s necessary for debugging the issue.

{
	debug
	email certs@example.com
	servers :80,:443 {
		protocols h1 h2c
	}
}

(security_headers) {
	header * {
		# enable HSTS
		# https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html#strict-transport-security-hsts
		# NOTE: Read carefully how this header works before using it.
		# If the HSTS header is misconfigured or if there is a problem with
		# the SSL/TLS certificate being used, legitimate users might be unable
		# to access the website. For example, if the HSTS header is set to a
		# very long duration and the SSL/TLS certificate expires or is revoked,
		# legitimate users might be unable to access the website until
		# the HSTS header duration has expired.
		# The recommended value for the max-age is 2 year (63072000 seconds).
		# But we are using 1 hour (3600 seconds) for testing purposes
		# and ensure that the website is working properly before setting
		# to two years.

		Strict-Transport-Security "max-age=3600; includeSubDomains; preload"

		# disable clients from sniffing the media type
		# https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html#x-content-type-options
		X-Content-Type-Options "nosniff"

		# clickjacking protection
		# https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html#x-frame-options
		X-Frame-Options "SAMEORIGIN"

		# xss protection
		# https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html#x-xss-protection
		X-XSS-Protection "1; mode=block"

		# Remove -Server header, which is an information leak
		# Remove Caddy from Headers
		-Server

		# keep referrer data off of HTTP connections
		# https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html#referrer-policy
		Referrer-Policy strict-origin-when-cross-origin
	}
}

# Route for Zitadel
auth.example.com {
	import security_headers
	reverse_proxy zitadel:8080
}

# Route for Netbird
:80, network.example.com:443 {
	import security_headers
	# relay
	reverse_proxy /relay* relay:80
	# Signal
	reverse_proxy /signalexchange.SignalExchange/* h2c://signal:10000
	# Management
	reverse_proxy /api/* management:80
	reverse_proxy /management.ManagementService/* h2c://management:80
	# Dashboard
	reverse_proxy /* dashboard:80
}

####################
# PRIVATE SERVICES #
####################


#https://monitor.example.com {
#	reverse_proxy zignet.netbird.selfhosted:10000

#	tls {
#           dns namecheap {
#                api_key redact
#                user myusername
#                api_endpoint https://api.namecheap.com/xml.response
#            }
#        }
#}


# Odoo Instance - ZIGOP
https://odoo.example.com:443 {
	reverse_proxy zigop.netbird.selfhosted:8069
}

# OneDev Repository Instance - ZIGOP
#https://code.example.com:443 {
#    reverse_proxy zigop.netbird.selfhosted:6610

#    tls {
#        dns namecheap {
#            api_key redact
#            user myusername
#            api_endpoint https://api.namecheap.com/xml.response
#        }
#    }
#}

# Technitium-DNS Nameserver 1 - ZIGNET 
https://ns1.example.com {
	reverse_proxy technitium-dns:5380
	tls {
		dns namecheap {
			api_key redact
			user myusername
			api_endpoint https://api.namecheap.com/xml.response
			client_ip <vps.public.ip>
		}
	}
}

https://portainer.example.com {
	reverse_proxy portainer:9000
	tls {
		dns namecheap {
			api_key redact
			user myusername
			api_endpoint https://api.namecheap.com/xml.response
			client_ip <vps.public.ip>
		}
	}
}

I’m including this for transparency, but Caddy does not actually pick up these variables currently.
caddy.env:

ACME_EMAIL=certs@example.com
ACME_CA=acme-staging-v02.api.letsencrypt.org
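
A possible reason these variables have no effect: Caddy does not read ACME_EMAIL or ACME_CA on its own, so the Caddyfile has to reference them with the {$VAR} placeholder syntax. The global options block below is a minimal sketch under that assumption, with the other global options omitted. It also assumes ACME_CA is changed to a full directory URL, because acme_ca expects a URL rather than a bare hostname.

```caddyfile
{
	debug
	# {$VAR} is replaced with the environment variable's value when the Caddyfile is parsed
	email {$ACME_EMAIL}
	# Assumes caddy.env sets ACME_CA to a full directory URL, for example
	# https://acme-staging-v02.api.letsencrypt.org/directory
	acme_ca {$ACME_CA}
}
```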

5. Links to relevant resources:

Where I got the Namecheap DNS plugin:

Where I got xcaddy:

Hi @papazig,

What do you see using the online tool Let’s Debug for the DNS-01 challenge?

Change the dropdown box from [image] to [image] before running.

Please note that example.com is the proper domain name to use when redacting the actual domain name, as https://example.com/ explains its intended usage:

Example Domain
This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

Also see:

Thank you for the input, I have made the changes.


Thank you for the response. All of my sites show OK in Let’s Debug. The issue, at least as I understand it, is not with Caddy’s ability to query Let’s Encrypt and interpret the results; it is that the Namecheap DNS provider module is not triggered after updates to the Caddyfile. From the logs:

INF ts=1732646995.7418923 logger=tls.issuance.acme.acme_client msg=trying to solve challenge identifier=ns1.example.com challenge_type=dns-01 ca=https://acme-v02.api.letsencrypt.org/directory

ERR ts=1732646995.7441037 logger=tls.issuance.acme.acme_client msg=cleaning up solver identifier=ns1.example.com challenge_type=dns-01 error=no memory of presenting a DNS record for "_acme-challenge.ns1.example.com" (usually OK if presenting also failed)

DBG ts=1732646995.8146636 logger=tls.issuance.acme.acme_client msg=http request method=POST url=https://acme-v02.api.letsencrypt.org/acme/authz/2069888037/435944080097 headers={"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]} response_headers={"Boulder-Requester":["2069888037"],"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["830"],"Content-Type":["application/json"],"Date":["Tue, 26 Nov 2024 18:49:55 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Replay-Nonce":["gtp33icHTr3KkvZah9q5rEmFfmll616T9uonvCFA-EN__Pxp5Zc"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]} status_code=200

ERR ts=1732646995.814781 logger=tls.obtain msg=could not get certificate from issuer identifier=ns1.example.com issuer=acme-v02.api.letsencrypt.org-directory error=[ns1.example.com] solving challenges: presenting for challenge: could not determine zone for domain "_acme-challenge.ns1.example.com": unexpected response code 'SERVFAIL' for _acme-challenge.ns1.example.com. (order=https://acme-v02.api.letsencrypt.org/acme/order/2069888037/327082937507) (ca=https://acme-v02.api.letsencrypt.org/directory)

DBG ts=1732646995.8148265 logger=events msg=event name=cert_failed id=1d2cc2b6-31ee-454c-93a6-61b75c712346 origin=tls data={"error":{},"identifier":"ns1.example.com","issuers":["acme-v02.api.letsencrypt.org-directory"],"renewal":false}

ERR ts=1732646995.8148348 logger=tls.obtain msg=will retry error=[ns1.example.com] Obtain: [ns1.example.com] solving challenges: presenting for challenge: could not determine zone for domain "_acme-challenge.ns1.example.com": unexpected response code 'SERVFAIL' for _acme-challenge.ns1.example.com. (order=https://acme-v02.api.letsencrypt.org/acme/order/2069888037/327082937507) (ca=https://acme-v02.api.letsencrypt.org/directory) attempt=1 retrying_in=60 elapsed=0.418814524 max_duration=2592000

It looks like Caddy is trying to process the certs with a DNS challenge, but again, whatever method/function/sub/process (I have very little experience with Go) is supposed to trigger the dns.providers.namecheap module is either failing to do so or skipping over it. I would think a failure in its implementation would produce a log entry, but since the logs never mention it even trying, I think there’s something else going on here.

I confirmed the module is included in the caddy instance with the following:
docker exec -it caddy sh

caddy list-modules
...
dns.providers.namecheap

  Non-standard modules: 1
...

Sounds like a buggy local DNS resolver wigging out. Caddy needs to find out what the base domain is and I think from memory it makes an SOA query to do that; we see the occasional broken local DNS impeding things for this reason.

As a quick workaround, get Caddy to skip local resolution and use a known good public resolver for this.

https://caddyserver.com/docs/caddyfile/directives/tls#resolvers
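
A minimal sketch of that workaround, applied to the ns1 site block from the Caddyfile above. The resolvers subdirective goes inside the tls block. The addresses 1.1.1.1 and 8.8.8.8 are assumptions standing in for any known-good public resolver, since the thread does not say which one was used.

```caddyfile
https://ns1.example.com {
	reverse_proxy technitium-dns:5380
	tls {
		dns namecheap {
			api_key redact
			user myusername
			api_endpoint https://api.namecheap.com/xml.response
			client_ip <vps.public.ip>
		}
		# Assumed public resolvers; used for the DNS challenge lookups
		# instead of the system-configured resolver
		resolvers 1.1.1.1 8.8.8.8
	}
}
```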


Whitestrake! My HERO! This did the trick! I’ll buy you a beer or some coffee if you have a link :smiley:

I’m going to attempt to understand why this may have happened:

I am running Netbird, which runs its own locally embedded DNS, so maybe that’s part of the reason? Or maybe it’s the fact that I am running Technitium-DNS, where the zone is set to example.com, and I made an entry there before updating the Caddyfile… So, is it possible Caddy was trying to send the DNS challenge to Technitium instead of Namecheap’s endpoint? Not sure if that makes sense either. It’s hard to think with how excited I am that this is resolved and I can finally move on.

I’m going to do some testing when I get a chance to see if an alternate procedure solves the issue without me needing to use the resolvers setting.

I don’t see a global option to set the resolvers, nor do I know whether I should set them globally rather than at the site level in the tls directive. Any thoughts on that?

Once again, THANK YOU!

Happy to help!

Caddy doesn’t really get too involved here; it just leans on the system resolver and makes a query. If the system-configured local resolver points to a server that returns SERVFAIL instead of recursing out to working public DNS, it’ll break things. That might be Netbird throwing the error if its resolver is taking precedence, or it might be Technitium, or maybe it’s just your router’s or ISP’s resolver. Either way, the public DNS works, and cases like this are ultimately why that option exists.

If you can figure out where that DNS request is going and which server is returning the SERVFAIL, and either update your Caddy host to prioritise a different resolver or get that DNS server to recurse instead of erroring out, you wouldn’t need the override.

I don’t think there is a global option for this, I’m afraid. You can use snippets, though: https://caddyserver.com/docs/caddyfile/concepts#snippets
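
As a sketch of the snippet approach: the tls block, including the resolvers override, is defined once and imported into each site that needs it. The snippet name namecheap_dns and the resolver address 1.1.1.1 are hypothetical; the rest is copied from the Caddyfile earlier in the thread.

```caddyfile
# Hypothetical snippet name; holds the shared DNS challenge settings
(namecheap_dns) {
	tls {
		dns namecheap {
			api_key redact
			user myusername
			api_endpoint https://api.namecheap.com/xml.response
			client_ip <vps.public.ip>
		}
		# Assumed public resolver, bypassing the local one that returned SERVFAIL
		resolvers 1.1.1.1
	}
}

https://ns1.example.com {
	import namecheap_dns
	reverse_proxy technitium-dns:5380
}

https://portainer.example.com {
	import namecheap_dns
	reverse_proxy portainer:9000
}
```

With this layout, changing the resolver or the Namecheap credentials means editing one place instead of every site block.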
