Caddy DigitalOcean DNS can't renew


(Joel) #1

I have a few domains which are served by Caddy. Some are run using the abiosoft docker image and some are run directly on the host (in various VMs). Recently all of them stopped being able to renew their Let’s Encrypt certificates, with errors like this in the log (I’ve redacted the AuthURL parts because I’m not sure whether they are private):

Oct 07 10:53:02 plex systemd[1]: Started Caddy HTTP/2 web server.
Oct 07 10:53:02 plex caddy[14360]: Activating privacy features... 2018/10/07 10:53:02 [INFO] Certificate for [plex.home.joelnb.co.uk] expires in 461h25m35.929062756s; att
Oct 07 10:53:02 plex caddy[14360]: 2018/10/07 10:53:02 [INFO][plex.home.joelnb.co.uk] acme: Trying renewal with 461 hours remaining
Oct 07 10:53:02 plex caddy[14360]: 2018/10/07 10:53:02 [INFO][plex.home.joelnb.co.uk] acme: Obtaining bundled SAN certificate
Oct 07 10:53:03 plex caddy[14360]: 2018/10/07 10:53:03 [INFO][plex.home.joelnb.co.uk] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz/...
Oct 07 10:53:03 plex caddy[14360]: 2018/10/07 10:53:03 [INFO][plex.home.joelnb.co.uk] acme: Could not find solver for: http-01
Oct 07 10:53:03 plex caddy[14360]: 2018/10/07 10:53:03 [INFO][plex.home.joelnb.co.uk] acme: Could not find solver for: tls-alpn-01
Oct 07 10:53:03 plex caddy[14360]: 2018/10/07 10:53:03 [INFO][plex.home.joelnb.co.uk] acme: Trying to solve DNS-01
Oct 07 10:53:03 plex caddy[14360]: 2018/10/07 10:53:03 [ERROR] Renewing [plex.home.joelnb.co.uk]: acme: Error -> One or more domains had a problem:
Oct 07 10:53:03 plex caddy[14360]: [plex.home.joelnb.co.uk] Error presenting token: HTTP 404: not_found: The resource you were accessing could not be found.
Oct 07 10:53:03 plex caddy[14360]: ; trying again in 10s
Oct 07 10:53:13 plex caddy[14360]: 2018/10/07 10:53:13 [INFO][plex.home.joelnb.co.uk] acme: Trying renewal with 461 hours remaining
Oct 07 10:53:13 plex caddy[14360]: 2018/10/07 10:53:13 [INFO][plex.home.joelnb.co.uk] acme: Obtaining bundled SAN certificate
Oct 07 10:53:14 plex caddy[14360]: 2018/10/07 10:53:14 [INFO][plex.home.joelnb.co.uk] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz/...
Oct 07 10:53:14 plex caddy[14360]: 2018/10/07 10:53:14 [INFO][plex.home.joelnb.co.uk] acme: Could not find solver for: http-01
Oct 07 10:53:14 plex caddy[14360]: 2018/10/07 10:53:14 [INFO][plex.home.joelnb.co.uk] acme: Could not find solver for: tls-alpn-01
Oct 07 10:53:14 plex caddy[14360]: 2018/10/07 10:53:14 [INFO][plex.home.joelnb.co.uk] acme: Trying to solve DNS-01
Oct 07 10:53:14 plex caddy[14360]: 2018/10/07 10:53:14 [ERROR] Renewing [plex.home.joelnb.co.uk]: acme: Error -> One or more domains had a problem:
Oct 07 10:53:14 plex caddy[14360]: [plex.home.joelnb.co.uk] Error presenting token: HTTP 404: not_found: The resource you were accessing could not be found.
Oct 07 10:53:14 plex caddy[14360]: ; trying again in 10s
Oct 07 10:53:24 plex caddy[14360]: 2018/10/07 10:53:24 [ERROR] too many renewal attempts; last error: acme: Error -> One or more domains had a problem:
Oct 07 10:53:24 plex caddy[14360]: [plex.home.joelnb.co.uk] Error presenting token: HTTP 404: not_found: The resource you were accessing could not be found.
Oct 07 10:53:24 plex caddy[14360]: done.
Oct 07 10:53:24 plex caddy[14360]: https://plex.home.joelnb.co.uk
Oct 07 10:53:24 plex caddy[14360]: 2018/10/07 10:53:24 https://plex.home.joelnb.co.uk
Oct 07 10:53:24 plex caddy[14360]: http://plex.home.joelnb.co.uk
Oct 07 10:53:24 plex caddy[14360]: 2018/10/07 10:53:24 http://plex.home.joelnb.co.uk
Oct 07 10:53:41 plex caddy[14360]: 192.168.2.150 - - [07/Oct/2018:10:53:41 +0000] "GET / HTTP/2.0" 401 157

The simplest Caddyfile I am using which exhibits this issue is:

(logging) {
    log stdout
    errors stdout
}

plex.home.joelnb.co.uk {
    proxy / 127.0.0.1:32400 {
        transparent
    }

    import logging

    tls myemail@gmail.com {
        dns digitalocean
    }
}

The DO_AUTH_TOKEN is exported via a systemd drop-in and I don’t think the issue is with that: I assume there would be a more explicit error message if the token were missing or wrong, and the sites were obviously able to get a cert in the first place.

Searching for the error message led me to various DigitalOcean API issues (none of which seemed relevant), so I assume the issue is with an API call there, but I’m unsure how to find out in more detail. I’m happy to delve into the code if I have to, but I’m wondering if there is a simpler way to find out exactly what is going wrong before I do that.

Any pointers would be appreciated!


(Matt Holt) #2

It does seem like an issue with the API; has their API had any breaking changes or outages recently?


(Matthew Fay) #3

I thought DNS solver errors were meant to prefix the provider to the error message: https://github.com/xenolf/lego/blob/c09b12be08f08c58d9db082da7246e897995aa10/providers/dns/digitalocean/digitalocean.go#L108


(Joel) #4

Well I guess it would be worth me going through that file & trying to see exactly where the error comes from. I’ll try and do that later today and let you know what I find.

@matt I had a look at their API changelog and nothing really seems relevant to this situation.


(Matt Holt) #5

I believe that is from a newer commit than what is used by Caddy; almost certainly it’s an API error.


(Joel) #6

I think they have probably changed their API to make it stricter. Previously I had only defined the domain joelnb.co.uk through their web interface, and most of my caddy sites are under *.home.joelnb.co.uk. When I set them up this worked fine and the records could be added. That seems to have changed: I was able to verify that a request to https://api.digitalocean.com/v2/domains/home.joelnb.co.uk/records returns a 404, while https://api.digitalocean.com/v2/domains/joelnb.co.uk/records works (a record name under home.joelnb.co.uk is fine as long as home.joelnb.co.uk is not used as the domain in the URL).
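For anyone who wants to repeat this check, here is a minimal Python sketch of the two requests described above. The helper names (`records_url`, `relative_name`, `build_request`, `probe`) are made up for illustration and are not part of Caddy or lego. It assumes the token is sent as a Bearer token in the `Authorization` header, and that a record at the top of a domain is written as `@`.

```python
import urllib.error
import urllib.request

API_BASE = "https://api.digitalocean.com/v2"


def records_url(domain: str) -> str:
    """Build the records endpoint for a domain as it is registered in the account."""
    return f"{API_BASE}/domains/{domain}/records"


def relative_name(fqdn: str, zone: str) -> str:
    """Return the record name relative to the zone (hypothetical helper)."""
    fqdn = fqdn.rstrip(".")
    zone = zone.rstrip(".")
    if fqdn == zone:
        return "@"  # assumed convention for a record at the top of the zone
    suffix = "." + zone
    if not fqdn.endswith(suffix):
        raise ValueError(f"{fqdn} is not inside {zone}")
    return fqdn[: -len(suffix)]


def build_request(domain: str, token: str) -> urllib.request.Request:
    """Prepare an authenticated GET for the records endpoint (no network call yet)."""
    return urllib.request.Request(
        records_url(domain),
        headers={"Authorization": f"Bearer {token}"},
    )


def probe(domain: str, token: str) -> int:
    """Send the request and return the HTTP status code (this does hit the network)."""
    try:
        with urllib.request.urlopen(build_request(domain, token)) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what we are after here.
        return err.code
```

With a valid token, comparing `probe("home.joelnb.co.uk", token)` against `probe("joelnb.co.uk", token)` should show which of the two names the account actually knows as a domain. `relative_name("_acme-challenge.plex.home.joelnb.co.uk", "joelnb.co.uk")` gives the record name that would sit under the parent domain.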

It seems I have been able to solve it by also creating home.joelnb.co.uk as a separate domain through the GUI and then cleaning up some records created by caddy. The records now get created under the subdomain and caddy is able to renew certificates as required.
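In case it helps with the clean-up step, below is a rough Python sketch for picking out leftover challenge records from a list of records that has already been fetched. It assumes each record is a dict with `type` and `name` keys, as in the `domain_records` list returned by the records endpoint. The function name and the sample data are invented for illustration; they are not the actual records from my account.

```python
CHALLENGE_LABEL = "_acme-challenge"


def challenge_records(records):
    """Select TXT records whose name is, or starts with, the ACME challenge label."""
    selected = []
    for record in records:
        name = record.get("name", "")
        is_challenge = name == CHALLENGE_LABEL or name.startswith(CHALLENGE_LABEL + ".")
        if record.get("type") == "TXT" and is_challenge:
            selected.append(record)
    return selected


# Hypothetical sample data, shaped like entries of a "domain_records" list.
sample = [
    {"id": 1, "type": "A", "name": "plex.home", "data": "192.0.2.10"},
    {"id": 2, "type": "TXT", "name": "_acme-challenge.plex.home", "data": "token-a"},
    {"id": 3, "type": "TXT", "name": "@", "data": "v=spf1 -all"},
]

leftovers = challenge_records(sample)
```

This only selects candidates; it is worth reviewing the list by hand before deleting anything, since a renewal that is in progress would also have a record of this form.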

I guess it’s possible that it’s caddy (or one of the underlying libraries it uses) that has changed, but as I’m not 100% sure whether the version I am using changed at all since it was last working, that would be hard to track down.

Either way I hope this can be useful for anyone else experiencing the same behaviour.