1. The problem I’m having:
Depending on which LAN I run the Caddyfile below from, caddy/acme times out at the DNS TXT record verification step.
On the LAN that misbehaves, the _acme-challenge record is successfully written to my Route 53 hosted zone (I can see it in the AWS console), but acme/caddy is then unable to confirm it with a lookup. I even tried setting a custom "resolvers" option to 8.8.8.8 and that did not help.
I don't see this as a Caddy issue, but maybe the collective experience here can help me. There is nothing wrong with my Caddy container or the Caddyfile (it runs fine on another machine/LAN); rather, something is amiss with how the DNS TXT record is verified after being written when running on this particular machine/LAN/gateway-router.
If someone could explain how the lookup step is done, I might be able to track down why this is happening. Is that lookup made by the ACME servers, or by my Caddy instance directly out of my LAN? If I dig that TXT record from the machine/LAN in question (output below), it comes back with the correct value, so why does it work from the command line but not from caddy/acme?
; <<>> DiG 9.16.1-Ubuntu <<>> TXT _acme-challenge.admin.sj111.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20483
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;_acme-challenge.admin.sj111.net. IN TXT
;; ANSWER SECTION:
_acme-challenge.admin.sj111.net. 0 IN TXT "NuCBJY1ncyDidUAWgNQEz4c21rDrY9e7H74Y8D4FnFs"
;; Query time: 59 msec
;; SERVER: 192.168.8.1#53(192.168.8.1)
;; WHEN: Sun Feb 11 09:45:46 PST 2024
;; MSG SIZE rcvd: 116
If I look it up from outside my LAN with an online TXT record lookup tool, no problem: it's "there" almost immediately.
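To narrow it down, I want to compare the same lookup from inside the container on both LANs (in case the "propagate" check is being made from inside the container rather than by the ACME servers), and also ask Route 53's authoritative nameservers directly instead of the LAN router. A rough sketch of what I plan to run; the container name caddy and the presence of nslookup/dig inside the image are assumptions:

# What resolver does the container actually use? (Docker normally injects 127.0.0.11)
docker exec caddy cat /etc/resolv.conf

# Can the container itself see the TXT record?
docker exec caddy nslookup -type=TXT _acme-challenge.admin.sj111.net

# From the host: ask Route 53's authoritative nameservers directly,
# instead of the LAN router at 192.168.8.1
dig +short NS sj111.net
dig TXT _acme-challenge.admin.sj111.net @<one-of-the-awsdns-servers-from-the-NS-lookup>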
2. Error messages and/or full log output:
{"level":"error","ts":1707412747.1206288,"logger":"tls.obtain","msg":"could not get certificate from issuer" ,"identifier":"*.admin.sj111.net","issuer":"acme-staging-v02.api.letsencrypt.org-directory","error":"[*.admin.sj111.net] solving challenges: waiting for solver certmagic.solverWrapper to be ready: timed out waiting for record to fully propagate; verify DNS provider configuration is correct - last error: <nil> (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/135555693/14340715083) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)"}
3. Caddy version:
v2.7.6 h1:w0NymbG2m9PcvKWsrXO6EEkY9Ru4FJK8uQbYcev1p3A=
4. How I installed and ran Caddy:
I run Caddy in a Docker container built from a custom image.
The image grabs the latest release rather than installing Caddy via Alpine packages.
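For context, this is roughly what the Dockerfile looks like; it's a simplified sketch rather than my exact build, and the xcaddy step, version, and paths are illustrative (the Route 53 DNS module has to be present in the binary one way or another, since the stock release binaries don't ship it):

# Sketch only: not my exact Dockerfile; version and paths are illustrative.
FROM golang:1.21-alpine AS build
RUN go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest
# Build Caddy 2.7.6 with the Route 53 DNS module compiled in
RUN xcaddy build v2.7.6 --with github.com/caddy-dns/route53 --output /caddy

FROM alpine:3.19
RUN apk add --no-cache ca-certificates
COPY --from=build /caddy /opt/caddy/bin/caddy
# Point Caddy's data/config dirs at the paths the compose file mounts
ENV XDG_DATA_HOME=/opt/caddy/data \
    XDG_CONFIG_HOME=/opt/caddy/settings \
    PATH="/opt/caddy/bin:${PATH}"
WORKDIR /opt/caddy/conf
CMD ["caddy", "run", "--config", "Caddyfile", "--adapter", "caddyfile"]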
a. System environment:
latest alpine 3.19
b. Command:
/opt/caddy/bin/caddy run --config test.conf --adapter caddyfile
btw, that container runs on an Ubuntu Focal host on a Raspberry Pi 4.
c. Service/unit/compose file:
services:
  caddy:
    container_name: ${NAME:-caddy}
    image: ${IMAGE:-caddy}
    # if no $CONF is given then Caddyfile in ${PWD}/conf:/opt/caddy/conf will be used
    command: caddy run ${CONF}
    hostname: ${NAME:-caddy}
    env_file:
      - $CREDENTIALS
    volumes:
      - data:/opt/caddy/data
      - settings:/opt/caddy/settings
      - conf:/opt/caddy/conf
      # - files:/opt/caddy/files
    restart: unless-stopped
    ports:
      - 80:80
      - 443:443
      - 2019:2019
# binding the data and settings volumes is not required,
# but if these volumes are deleted Caddy will need to re-obtain all the certs
volumes:
  data:
    # driver_opts:
    #   type: none
    #   device: ${PWD}/data
    #   o: bind
  settings:
    # driver_opts:
    #   type: none
    #   device: ${PWD}/config
    #   o: bind
  # files:
  #   driver_opts:
  #     type: none
  #     device: /data/Hacking/webfiles
  #     o: bind
  conf:
    driver_opts:
      type: none
      device: ${PWD}/conf
      o: bind
d. My complete Caddy config:
{
	acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}

*.sj111.net, *.admin.sj111.net, *.dashboard.sj111.net {
	tls sj111.net@gmail.com {
		resolvers 8.8.8.8
		dns route53 {
			max_retries 10
		}
	}

	@docker host docker.sj111.net
	handle @docker {
		reverse_proxy admin.111.net:9005
	}
}
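In case it's relevant to the answer: the variant I'm planning to try next on the misbehaving LAN keeps the same config but adds the DNS-challenge timing knobs that I believe recent 2.x releases expose in the tls block (propagation_delay / propagation_timeout). The values here are guesses rather than a known fix:

*.sj111.net, *.admin.sj111.net, *.dashboard.sj111.net {
	tls sj111.net@gmail.com {
		resolvers 8.8.8.8
		dns route53 {
			max_retries 10
		}
		# give Route 53 extra time before the first lookup
		propagation_delay 1m
		# or skip the local propagation check entirely and let the CA verify:
		# propagation_timeout -1
	}
	# ...rest of the site block unchanged
}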