One particular domain refuses to get a proper SSL cert using Cloudflare DNS, while others work fine

1. Caddy version (caddy version): V2, latest as of today

2. How I run Caddy:

LXC container

a. System environment:

Debian 10

b. Command:

caddy run --config /root/Caddyfile --adapter caddyfile

d. My complete Caddyfile or JSON config:

my.domain.name {
        header {
                Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
                X-Xss-Protection "1; mode=block"
                X-Content-Type-Options "nosniff"
                X-Frame-Options "SAMEORIGIN"
                Content-Security-Policy "upgrade-insecure-requests"
                Referrer-Policy "strict-origin-when-cross-origin"
                Cache-Control "public, max-age=15, must-revalidate"
                Feature-Policy "accelerometer 'none'; ambient-light-sensor 'none'; autoplay 'self'; camera 'none'; encrypted-media 'none'; fullscreen 'self'; geolocation 'none'; gyroscope 'none'; magnetometer 'none'; microphone 'none'; midi 'none'; payment 'none'; picture-in-picture *; speaker 'none'; sync-xhr 'none'; usb 'none'; vr 'none'"
        }
        reverse_proxy 192.168.1.2:3456
        tls {
                dns cloudflare token_here
        }
}

my2.domain.name {
        header {
                Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
                X-Xss-Protection "1; mode=block"
                X-Content-Type-Options "nosniff"
                X-Frame-Options "SAMEORIGIN"
                Content-Security-Policy "upgrade-insecure-requests"
                Referrer-Policy "strict-origin-when-cross-origin"
                Cache-Control "public, max-age=15, must-revalidate"
                Feature-Policy "accelerometer 'none'; ambient-light-sensor 'none'; autoplay 'self'; camera 'none'; encrypted-media 'none'; fullscreen 'self'; geolocation 'none'; gyroscope 'none'; magnetometer 'none'; microphone 'none'; midi 'none'; payment 'none'; picture-in-picture *; speaker 'none'; sync-xhr 'none'; usb 'none'; vr 'none'"
        }
        @excludeDirs {
                not path /local/dir1* /local/dir2* /local/dir3/* /local/file.db
        }
        reverse_proxy @excludeDirs 192.168.1.5:1892
        tls {
                dns cloudflare token_here
        }
}

my3.domain.name {
        header {
                Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
                X-Xss-Protection "1; mode=block"
                X-Content-Type-Options "nosniff"
                X-Frame-Options "SAMEORIGIN"
                Content-Security-Policy "upgrade-insecure-requests"
                Referrer-Policy "strict-origin-when-cross-origin"
                Cache-Control "public, max-age=15, must-revalidate"
                Feature-Policy "accelerometer 'none'; ambient-light-sensor 'none'; autoplay 'self'; camera 'none'; encrypted-media 'none'; fullscreen 'self'; geolocation 'none'; gyroscope 'none'; magnetometer 'none'; microphone 'none'; midi 'none'; payment 'none'; picture-in-picture *; speaker 'none'; sync-xhr 'none'; usb 'none'; vr 'none'"
        }
        reverse_proxy 192.168.1.81:2181
        tls {
                dns cloudflare token_here
        }
}

my4.domain.name {
        header {
                Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
                X-Xss-Protection "1; mode=block"
                X-Content-Type-Options "nosniff"
                X-Frame-Options "SAMEORIGIN"
                Content-Security-Policy "upgrade-insecure-requests"
                Referrer-Policy "strict-origin-when-cross-origin"
                Cache-Control "public, max-age=15, must-revalidate"
                Feature-Policy "accelerometer 'none'; ambient-light-sensor 'none'; autoplay 'self'; camera 'none'; encrypted-media 'none'; fullscreen 'self'; geolocation 'none'; gyroscope 'none'; magnetometer 'none'; microphone 'none'; midi 'none'; payment 'none'; picture-in-picture *; speaker 'none'; sync-xhr 'none'; usb 'none'; vr 'none'"
        }
        reverse_proxy 192.168.1.50:9283
        tls {
                dns cloudflare token_here
        }
}

3. The problem I'm having:

All of the domains in my Caddyfile get an SSL cert just fine, EXCEPT the first one. I just cannot get it to properly obtain an SSL certificate. I've tried for the last 3 days and every single time it fails. When I now visit the site I get error 525: SSL handshake failed. Here's more information from Cloudflare on this error: https://support.cloudflare.com/hc/en-us/articles/115003011431#525error

4. Error messages and/or full log output:

This is the full output of when I try to run the caddy command.

root@caddy:~# /root/caddy run --config /root/Caddyfile --adapter caddyfile
2020/05/30 19:02:40.385 INFO    using provided configuration    {"config_file": "/root/Caddyfile2", "config_adapter": "caddyfile"}
2020/05/30 19:02:40.390 INFO    admin   admin endpoint started  {"address": "tcp/localhost:2019", "enforce_origin": false, "origins": ["127.0.0.1:2019", "localhost:2019", "[::1]:2019"]}
2020/05/30 19:02:40.390 INFO    http    enabling automatic HTTP->HTTPS redirects        {"server_name": "srv0"}
2020/05/30 14:02:40 [INFO][cache:0xc0007e71d0] Started certificate maintenance routine
2020/05/30 19:02:40.392 INFO    tls     cleaned up storage units
2020/05/30 19:02:40.392 INFO    http    enabling automatic TLS certificate management   {"domains": ["my4.domain.name", "my.domain.name", "my3.domain.name", "my2.domain.name"]}
2020/05/30 19:02:40.400 INFO    autosaved config        {"file": "/root/.config/caddy/autosave.json"}
2020/05/30 19:02:40.400 INFO    serving initial configuration
2020/05/30 14:02:40 [INFO][my.domain.name] Obtain certificate; acquiring lock...
2020/05/30 14:02:40 [INFO][my.domain.name] Obtain: Lock acquired; proceeding...
2020/05/30 14:02:40 [INFO][my.domain.name] Waiting on rate limiter...
2020/05/30 14:02:40 [INFO][my.domain.name] Done waiting
2020/05/30 14:02:40 [INFO] [my.domain.name] acme: Obtaining bundled SAN certificate given a CSR
2020/05/30 14:02:41 [INFO] [my.domain.name] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/4914613081
2020/05/30 14:02:41 [INFO] [my.domain.name] acme: Could not find solver for: tls-alpn-01
2020/05/30 14:02:41 [INFO] [my.domain.name] acme: Could not find solver for: http-01
2020/05/30 14:02:41 [INFO] [my.domain.name] acme: use dns-01 solver
2020/05/30 14:02:41 [INFO] [my.domain.name] acme: Preparing to solve DNS-01
2020/05/30 14:02:41 [INFO] [my.domain.name] acme: Trying to solve DNS-01
2020/05/30 14:02:41 [INFO] [my.domain.name] acme: Checking DNS record propagation using [192.168.1.100:53 192.168.1.74:53]
2020/05/30 14:02:41 [INFO] Wait for propagation [timeout: 1m0s, interval: 2s]
2020/05/30 14:02:41 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:02:43 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:02:45 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:02:47 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:02:49 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:02:51 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:02:53 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:02:56 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:02:58 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:00 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:02 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:04 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:06 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:08 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:10 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:12 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:14 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:16 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:18 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:20 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:22 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:24 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:26 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:28 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:30 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:32 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:34 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:36 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:38 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:40 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:03:42 [INFO] [my.domain.name] acme: Cleaning DNS-01 challenge
2020/05/30 14:03:43 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/4914613081
2020/05/30 14:03:43 [ERROR] error: one or more domains had a problem:
[my.domain.name] time limit exceeded: last error: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.my.domain.name.
 (challenge=dns-01 remaining=[])
2020/05/30 14:03:45 [ERROR] attempt 1: [my.domain.name] Obtain: [my.domain.name] error: one or more domains had a problem:
[my.domain.name] time limit exceeded: last error: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.my.domain.name.
 - retrying in 1m0s (1m4.764369248s/720h0m0s elapsed)...
2020/05/30 14:04:45 [INFO] [my.domain.name] acme: Obtaining bundled SAN certificate given a CSR
2020/05/30 14:04:45 [INFO] [my.domain.name] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/60139893
2020/05/30 14:04:45 [INFO] [my.domain.name] acme: authorization already valid; skipping challenge
2020/05/30 14:04:45 [INFO] [my.domain.name] acme: Validations succeeded; requesting certificates
2020/05/30 14:04:46 [INFO] [my.domain.name] Server responded with a certificate.
2020/05/30 14:04:46 [INFO][my.domain.name] Waiting on rate limiter...
2020/05/30 14:04:46 [INFO][my.domain.name] Done waiting
2020/05/30 14:04:46 [INFO] [my.domain.name] acme: Obtaining bundled SAN certificate given a CSR
2020/05/30 14:04:49 [INFO] [my.domain.name] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/4914644597
2020/05/30 14:04:49 [INFO] [my.domain.name] acme: Could not find solver for: tls-alpn-01
2020/05/30 14:04:49 [INFO] [my.domain.name] acme: Could not find solver for: http-01
2020/05/30 14:04:49 [INFO] [my.domain.name] acme: use dns-01 solver
2020/05/30 14:04:49 [INFO] [my.domain.name] acme: Preparing to solve DNS-01
2020/05/30 14:04:49 [INFO] [my.domain.name] acme: Trying to solve DNS-01
2020/05/30 14:04:49 [INFO] [my.domain.name] acme: Checking DNS record propagation using [192.168.1.100:53 192.168.1.74:53]
2020/05/30 14:04:49 [INFO] Wait for propagation [timeout: 1m0s, interval: 2s]
2020/05/30 14:04:49 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:04:51 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:04:53 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:04:55 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:04:57 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:04:59 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:01 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:03 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:05 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:07 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:09 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:11 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:13 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:15 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:17 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:19 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:21 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:24 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:26 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:28 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:30 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:32 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:34 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:36 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:36 http: TLS handshake error from 172.69.62.70:40746: no certificate available for 'my.domain.name'
2020/05/30 14:05:36 http: TLS handshake error from 173.245.54.167:31342: no certificate available for 'my.domain.name'
2020/05/30 14:05:38 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:40 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:42 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:44 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:46 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:48 [INFO] [my.domain.name] acme: Waiting for DNS record propagation.
2020/05/30 14:05:50 [INFO] [my.domain.name] acme: Cleaning DNS-01 challenge
2020/05/30 14:05:50 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/4914644597
2020/05/30 14:05:50 [ERROR] error: one or more domains had a problem:
[my.domain.name] time limit exceeded: last error: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.my.domain.name.
 (challenge=dns-01 remaining=[])
2020/05/30 14:05:52 [INFO][my.domain.name] Obtain: Releasing lock
2020/05/30 14:05:52 [ERROR] my.domain.name: obtaining certificate: [my.domain.name] Obtain: [my.domain.name] error: one or more domains had a problem:
[my.domain.name] time limit exceeded: last error: NS dom.ns.cloudflare.com. returned SERVFAIL for _acme-challenge.my.domain.name.
2020/05/30 14:06:02 http: TLS handshake error from 162.158.78.56:53458: no certificate available for 'my.domain.name'
2020/05/30 14:13:05 http: TLS handshake error from 162.158.187.239:44524: no certificate available for 'my.domain.name'
2020/05/30 14:13:06 http: TLS handshake error from 162.158.187.239:44962: no certificate available for 'my.domain.name'
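The log above is long, but the useful part is the final error: which nameserver answered, with what status, and for which record. As a rough aid for reading logs like this, here is a small sketch that counts the propagation polls and pulls those three fields out. The function name `summarize` and the regular expression are made up for illustration and only match the message format shown in this log.

```python
import re
from typing import Dict, Optional

# Matches the "last error" line exactly as it appears in the log above.
LAST_ERROR = re.compile(
    r"last error: NS (?P<nameserver>\S+) returned (?P<rcode>[A-Z]+) "
    r"for (?P<record>\S+?)\.?$"
)


def summarize(log_text: str) -> Dict[str, Optional[object]]:
    """Count propagation polls and extract the last DNS error, if any."""
    summary: Dict[str, Optional[object]] = {
        "polls": 0,
        "nameserver": None,
        "rcode": None,
        "record": None,
    }
    for line in log_text.splitlines():
        if "Waiting for DNS record propagation" in line:
            summary["polls"] += 1
        match = LAST_ERROR.search(line.strip())
        if match:
            # Drop the trailing root dot so the names are easier to compare.
            summary["nameserver"] = match.group("nameserver").rstrip(".")
            summary["rcode"] = match.group("rcode")
            summary["record"] = match.group("record")
    return summary
```

Feeding it the output above should report `dom.ns.cloudflare.com`, `SERVFAIL`, and `_acme-challenge.my.domain.name`, which is the combination the rest of this thread is about.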

5. What I already tried:

I've tried this for the past 3 days and every time it fails for this one domain. Again, all the other 3 domains work just fine with a proper SSL cert. I have no idea why this one keeps failing.

How do I get it to work?

I think specific Cloudflare nameservers will throw SERVFAIL on zones for which they are not authoritative.

Are you sure your first domain's nameserver is dom.ns.cloudflare.com instead of some other name? Is it on the same Cloudflare account?

The credentials seem OK, we're not seeing any errors from the attempt to update the zone.
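To see for yourself what the nameserver returns for the challenge record, you can query it directly rather than going through your local resolvers. The sketch below is not part of Caddy. It assumes `dig` is installed on the host, uses the placeholder names from the log, and the helper names are invented for illustration.

```python
import re
import subprocess
from typing import List, Optional


def challenge_record(domain: str) -> str:
    """Name of the TXT record used for the ACME DNS-01 challenge."""
    return "_acme-challenge." + domain.rstrip(".")


def dig_command(domain: str, nameserver: str) -> List[str]:
    """Build a non-recursive TXT query aimed straight at one nameserver."""
    return ["dig", "+norecurse", "@" + nameserver, challenge_record(domain), "TXT"]


def response_status(dig_output: str) -> Optional[str]:
    """Pull the status (NOERROR, NXDOMAIN, SERVFAIL, ...) from dig's header."""
    match = re.search(r"status:\s*([A-Z]+)", dig_output)
    return match.group(1) if match else None


def query_status(domain: str, nameserver: str) -> Optional[str]:
    """Run dig (assumed to be installed) and return the response status."""
    result = subprocess.run(
        dig_command(domain, nameserver),
        capture_output=True,
        text=True,
        check=False,
    )
    return response_status(result.stdout)
```

For example, `query_status("my.domain.name", "dom.ns.cloudflare.com")` with the real names substituted. If that comes back as `SERVFAIL` while the same query for one of the working subdomains does not, it would match what Caddy logged and point at the DNS side rather than at the credentials.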

Yes, all the subdomains are running on the same Cloudflare account. I had no issues with Caddy V1, only when I upgraded to V2 did it start having this problem.

I checked my Cloudflare account and saw this in the DNS section:

That looks correct to me. The other 3 subdomains work fine though. Any other things I can check, maybe more verbose logs on why it's failing?

What is the actual domain name? It's going to be impossible to help more without knowing what the actual domain names are.

I'm not comfortable sharing my domain names in public. Can I DM you (and maybe Whitestrake) with this to further help troubleshoot this? Not sure how to DM on this forum but perhaps you have another way for me to DM you.

I can't offer private help for free, sorry. Your domain names are already in public transparency logs anyway.

Please read the following:

What else can I try to get more information about why it's failing? Would more verbose logging help?

Again, this worked fine in V1, only in V2 am I getting these errors.

If we knew your domain name, it would help because we could see what about the logic of the cert validation might be affected depending on the domain name and the DNS zone, etc.

V2's Cloudflare provider plugin is new, implemented here: GitHub - libdns/cloudflare: Cloudflare provider implementation for libdns

You can use the v1 implementation and see if it works for you, though: GitHub - caddy-dns/lego-deprecated: (DEPRECATED) DNS modules so Caddy can solve the ACME DNS challenge with over 75 providers - if that works, then knowing your domain name will be extremely helpful in knowing what between the two implementations broke, but your domain name is really the only input parameter so it's kind of essential. This is why our rules require that you don't redact anything (almost everyone breaks it though, making troubleshooting difficult).

I can confirm V1 works, as I have both Caddy versions on the same system, and I just swapped the versions out after a reboot. V1 loads the site without issues, although that already had an existing certificate from a few weeks / months ago, which could be why (but again, so did the other 3 subdomains, and those had no issue with V2).

What's the best way to make V1 force a renewal of the cert for this specific subdomain? I might have to try this to make absolutely sure V1 does work.

While I understand you want money, looking at your assessment of this problem, this seems to be more of a bug report than a support ticket. If there is a way to privately give you the domain so you can check what's going wrong, then I'm very willing to do so, in order for it to be potentially fixed in next versions. But I want to protect my privacy on a public forum. Keep in mind that me sending this domain name privately would be no different than me placing my domain out in this forum, just in a private way, in whatever form that might be. Sure, my domain is out there somewhere for you to find but currently you wouldn't be able to trace it to this username, nor email. I understand you need and want all information including domains and emails for most bug reports, but privacy often (and understandably so) gets in the way of that. I'm fairly certain this is a bug, but I would need to find a way to securely or privately share my domain name with the developers.

That's good to know, but I still need to know specifically if the lego-deprecated plugin with v2 works for you. It uses the same implementation as v1 but whether that works tells us something different than what you already said (which is that v1 itself works). I need that differential piece of information.

I believe you that v1 works - but that is not particularly relevant to this issue. As v1 is no longer being developed, I would recommend not using it anymore. (Still, you can force a "renewal" by deleting the certificate from disk. But make sure to switch to the staging endpoint first so that you don't get rate-limited if it fails...)

What I need to know next is whether the lego-deprecated plugin (for v2) works for you. If so, that alone will greatly narrow down where the problem may lie. That, plus the domain name(s), will put it almost entirely in the crosshairs. Right now your guess is as good as mine (actually yours is probably better since you have more information than I do about the issue you're experiencing).

It's not that I "want money" - it's that I have a social contract to uphold to both the open source community and to business clients who pay for the private support they receive. If I offer private help for free, it undermines the value of my time, which sponsors - both corporate and individual - are supporting. I shouldn't have to explain to a user of open source software that its development happens in the open, as this is a community effort. Additionally, if only I know some crucial piece of information - which is not secret - it prevents the rest of the community from collaborating on it or learning from it, which stifles the development culture of the project and suffocates the freedom of open source that much more.

After getting enough flak for several years trying to make the project sustainable through non-open distribution, I have decided that this project needs to remain open source, and as part of that, I simply can't afford to help in private for free. It's not a matter of greed (though I have been accused of that before - original article is taken down but I have a copy if you want to see it), it's a matter of economy: the economics of open source work simply do not work out if private development and help is offered for free, rather than as part of a collaborative effort. I cannot express how many times I have to rely on the community to pull together what I cannot.

Should we be able to identify a bug, I will be happy to fix it, in the same manner that we fix all the other bugs, in an open and documented way. Your domain name is public information already, and if it is going to be trusted by public browsers, to a public IP address, the domain name is not secret. If the domain name is not documented we cannot satisfactorily explain the fix. Even if you send it to me privately, it will probably end up in a commit message if not posted here.

If you're going to withhold useful information, you gotta work with me here, meet me halfway. Other contributors and I are putting lots of time and effort into this to get it right, but you can't keep fighting us on it like this, or nothing will get accomplished, and this will be a waste of both of our time.

I hope you will not think I am greedy, and instead contribute what information you know to help us fix this, while respecting our forum's rules.

As an alternative, you could provide repro instructions here that I or any other reader could follow concretely to reliably reproduce the problem. Yes, this means exact steps and domain names, but it doesn't have to be the same domains you're having trouble with. Yes, this takes effort, but also remember how much effort we are going to in order to write the software and implement the fix.

Another alternative is we could just skip all the hubbub and you could submit a pull request to fix the problem. Then I or another maintainer need only review it and voila, we're done! And you even get to use the fix before it gets merged since you authored it. AND you get credit in the commit history and basically become a tighter member of the Caddy community - wins all around.

So, there you have several options. One of them ought to be doable for you!

I'm in full agreement with @matt.

As strange as this might sound, I don't volunteer hours and hours and hours on these forums to help anyone individually; I spend that time helping the Caddy community collectively, ensuring everyone in future can benefit from what assistance I give out being public, and that means that - in keeping with the principles of open source and public support - I'm not going to volunteer to solve your problem privately. Sharing just the domain name in private isn't quite the same as doing a whole support job in private, yes - but the principle is important.

Now, to address the argument of "private support vs bug report". At the moment, the bar hasn't been cleared that this is a bug in Caddy. So far in this thread we've been told the issue has been confined consistently to one domain. I don't have any domains that exhibit this issue and don't think it would be feasible for me to start throwing more domains at the problem in the hopes one of my domains has whatever property of your domain that's specifically causing this issue.

The fact that it worked in v1 and not in v2 does not actually mean there is a bug in v2. Let's examine this a bit:

  • Caddy's error indicates that your domain's authoritative nameserver is returning SERVFAIL for the DNS challenge record.
  • Your authoritative nameserver is ostensibly correct, according to Cloudflare.
  • Caddy has not returned any error from Cloudflare's API when connecting, authenticating, or submitting the DNS record update request.

From this, we can infer a few things:

  • Cloudflare accepted the credentials provided, or silently rejected them.
  • Cloudflare accepted Caddy's submitted DNS update, or silently rejected it.
  • Cloudflare did not make the updated DNS record available to a public request from Caddy.

So if Caddy's behaviour with this DNS challenge module is to ensure that the record is publicly available (i.e. so LetsEncrypt would also be able to see it) before proceeding with the challenge, and Caddy never sees the updated record, it will never proceed to signal LetsEncrypt to check it, either. In normal operation, this is good, since we don't want to keep asking LetsEncrypt to make checks we're confident will fail.
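To make that behaviour concrete, here is a minimal sketch of a "wait for propagation" loop. It is not Caddy's actual code or that of its ACME library. The defaults of 60 seconds and 2 seconds mirror the `timeout: 1m0s, interval: 2s` line in the log, and `record_visible` is a hypothetical callback standing in for the DNS lookup.

```python
import time
from typing import Callable


def wait_for_propagation(
    record_visible: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the challenge record is visible or the timeout passes.

    True means the record showed up, so the client would go on to ask the
    CA to validate. False means the client gives up on this attempt without
    ever asking the CA to check the record.
    """
    deadline = clock() + timeout
    while True:
        if record_visible():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a lookup that keeps failing, as with the SERVFAIL responses in the log, the loop runs out the clock and returns False, which corresponds to the "time limit exceeded" error. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.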

There is one scenario I can see here that might make sense as a Caddy bug: if Cloudflare is not, in fact, returning SERVFAIL for the record and Caddy, for some reason, is mistakenly claiming that it is. Given that we have evidence from yourself that other Cloudflare domains on the same account, with the same nameserver, using the same DNS module do not encounter this issue, I am disinclined to lend much weight to this.

Quite frankly, my hunch is that this is a Cloudflare issue and they just have a problem with one of your zones. It's happened before.

Just one thing to note - I don't believe Discourse makes user emails public at all. Even if you posted it here in plain text, a visitor would not be able to link your domain to a specific email address unless you shared the address, too.

To add to this, even admins of a Discourse forum cannot view user email addresses without something going in the logs.

This is true. There are many, many moving parts at play here. Caddy - especially the many moving parts related to an ACME challenge and DNS validation - is made of several dependencies. We're gradually transitioning to have control over those dependencies (I'm doing lots of refactoring and rewriting work in the background), so there's a fair chance if the bug is in a dependency (and not an external piece of the system) then we can fix it. There's a chance that my reimplementation of Cloudflare in libdns is buggy. Or maybe the new wrapper over libdns for Caddy 2. But right now there are too many unknowns to have any clue. That's why we need a reproducible domain name (yours is handy, if you will reveal it), and for you to try the lego-deprecated plugin instead.

Also, @recklessnl, you can probably find some other maintainer/developer who is willing to help you privately, but again, that'd be a gracious volunteering of their time to benefit only one person - and before the fix is merged, the rest of us would have to know either the one problematic domain or the class of problematic domains in order to justify the fix.

So, as @whitestrake and I have suggested, if you can find another domain name that causes the same problem, then that can also alleviate your privacy concerns.

There are way too many items to reply to here, so I'm only going to reply to the on-topic parts that deal with the actual issue described in OP.

I'm certainly willing to do this, but unsure how to approach it in the correct way for you to troubleshoot. What's the recommended way to do this? As explained in OP, I installed V2 using xcaddy, with only the Cloudflare DNS 'plugin'. If you tell me how to re-do it with the lego-deprecated plugin for V2 I will certainly try this.

The complete Caddyfile is listed in OP. I can tell you that the domain is registered at Namecheap, has a .me extension and the subdomain which is problematic (which I swapped out with my, my2, my3 in the Caddyfile in OP) is chat.domain.me. The others are file.domain.me, san.domain.me, home.domain.me, and all of these except the chat subdomain work fine in V2. You have the complete log of what happens when I run caddy (my.domain.name = chat.domain.me, the problematic subdomain) in the OP.

Unfortunately I do not feel comfortable sharing the full domain name, but this is as close as I'm publicly willing to share.

This assumes 1) that I'm a coder and 2) that I understand where the issue lies and how to fix it. Neither of these is true, unfortunately.

Is there a specific action in my Cloudflare account that I could take in order to further test this?

Simply follow the instructions on the lego-deprecated readme. (I already linked to it above, but I see you haven't even clicked it). That has all the information you need without me repeating myself again.

Welcome to open source development! This is how most of us get started with it.

We don't know where the issue lies or how to fix it, either. Generally the best way to find out is to reproduce the issue and investigate it in place.

That means if you want help, you must help yourself in some way. Either by hiring a professional to investigate privately, or sharing something we can use to reproduce the issue ourselves and study it, or simply continuing to troubleshoot it on your own with what guidance we can provide (as you have been doing up to this point). It seems like the first two options aren't on the table for you, so with the latter option we forge ahead - no worries, we're more than happy to help in that way.

I'd say raise a support ticket with Cloudflare - their team should be able to help you troubleshoot their service for potential issues. Unlike Caddy, as a massive enterprise with a paid service they also offer pretty good private support for their free tier, which I've taken advantage of in the past; last time I had to contact them I was having issues with SSL negotiation to some of their proxy servers, and we got into the weeds with packet captures and some low level network troubleshooting, it was pretty nice.

I'd especially recommend it in light of this:

The further information that this is all happening on a single subdomain, and the rest of the subdomains in the same zone are working fine, is very confusing. It's probably not a zone-wide issue, then.

Tell them simply that you have an ACME client (Caddy v2) that uses their API to solve DNS challenges, and a single subdomain is producing SERVFAIL results while other subdomains are working fine. Then ask if they can provide any assistance in troubleshooting this rogue subdomain.

So here's an interesting update. I rebuilt Caddy 2 with the lego-deprecated plugin for V2, rebooted the container, and edited the Caddyfile to use the syntax for that deprecated plugin (adding environment variables etc) - and nothing changed, same errors as before, no dice.

Then I rebuilt Caddy 2 again with the new Cloudflare plugin (what I originally had) and ran it again for good measure, with the original Caddyfile (posted above in this thread) and lo and behold... it suddenly obtained the certificate! I have NO idea why. I have tried this every day for the last 7 days and only now does it succeed, after rebuilding Caddy twice.

My only guess is that re-building from scratch with xcaddy changed something? I changed nothing else. I had already created a support ticket on Cloudflare as well. Perhaps this is an issue that only pops up when upgrading from V1 to V2?

I guess this is fixed for now but I have no idea why. I'll take it though! Thanks for the pointers.

Interesting. It's also not impossible that it was some kind of transient DNS issue and the timing with the xcaddy rebuild was purely coincidental.

Glad to hear it's working now, though!
