ACME auto-ssl suddenly stopped working

Hi,

We are running a setup that uses automatic SSL provisioning for a few thousand domains. Last night, with no changes to the config, it seems to have stopped attempting to initiate challenges.

The log just shows “Obtaining bundled SAN certificate” over and over again for a lot of different domains, with no further details.

I’ve tried to enable higher log verbosity but couldn’t find such an option.

How could we proceed to debug this issue?

2019/08/29 06:31:53 [INFO] [a.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:31:53 [INFO] [a.example] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/[redacted]
2019/08/29 06:31:53 [INFO] [a.example] acme: Could not find solver for: tls-alpn-01
2019/08/29 06:31:53 [INFO] [a.example] acme: use http-01 solver
2019/08/29 06:31:53 [INFO] [a.example] acme: Trying to solve HTTP-01
2019/08/29 06:31:59 [INFO] [a.example] The server validated our request
2019/08/29 06:31:59 [INFO] [a.example] acme: Validations succeeded; requesting certificates
2019/08/29 06:43:00 [INFO] [a.example] Server responded with a certificate.
2019/08/29 06:43:00 [INFO] [b.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:00 [INFO] [b.example] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/[redacted]
2019/08/29 06:43:00 [INFO] [b.example] acme: use tls-alpn-01 solver
2019/08/29 06:43:00 [INFO] [b.example] acme: Trying to solve TLS-ALPN-01
2019/08/29 06:43:06 [INFO] [b.example] The server validated our request
2019/08/29 06:43:06 [INFO] [b.example] acme: Validations succeeded; requesting certificates
2019/08/29 06:43:07 [INFO] [b.example] Server responded with a certificate.
2019/08/29 06:43:07 [INFO] [c.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:07 [INFO] [c.example] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/[redacted]
2019/08/29 06:43:07 [INFO] [c.example] acme: use tls-alpn-01 solver
2019/08/29 06:43:07 [INFO] [c.example] acme: Trying to solve TLS-ALPN-01
2019/08/29 06:43:12 [INFO] [c.example] The server validated our request
2019/08/29 06:43:12 [INFO] [c.example] acme: Validations succeeded; requesting certificates
2019/08/29 06:43:13 [INFO] [c.example] Server responded with a certificate.
2019/08/29 06:43:13 [INFO] [www.d.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:14 [INFO] [www.d.example] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/[redacted]
2019/08/29 06:43:14 [INFO] [www.d.example] acme: use tls-alpn-01 solver
2019/08/29 06:43:14 [INFO] [www.d.example] acme: Trying to solve TLS-ALPN-01
2019/08/29 06:43:22 [INFO] [www.d.example] The server validated our request
2019/08/29 06:43:22 [INFO] [www.d.example] acme: Validations succeeded; requesting certificates
2019/08/29 06:43:23 [INFO] [www.d.example] Server responded with a certificate.
2019/08/29 06:43:23 [INFO] [e.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:23 [INFO] [f.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:24 [INFO] [g.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:24 [INFO] [h.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:24 [INFO] [i.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:24 [INFO] [www.j.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:25 [INFO] [j.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:25 [INFO] [k.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:25 [INFO] [l.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:25 [INFO] [m.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:26 [INFO] [n.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:26 [INFO] [o.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:26 [INFO] [p.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:27 [INFO] [q.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:27 [INFO] [r.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:27 [INFO] [s.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:27 [INFO] [www.t.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:28 [INFO] [www.u.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:28 [INFO] [www.v.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:28 [INFO] [www.w.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:28 [INFO] [shop.x.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:29 [INFO] [y.example] acme: Obtaining bundled SAN certificate
2019/08/29 06:43:29 [INFO] [www.z.example] acme: Obtaining bundled SAN certificate
[...]

After this point, there are no further ACME challenge attempts; the log shows nothing but “Obtaining bundled SAN certificate”, repeated.
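
To narrow down which domains are affected, the log can be scanned for domains whose most recent line is still the “Obtaining” message. This is only a rough sketch that assumes the log line format pasted above; the name `stuck_domains` is made up for illustration and is not part of Caddy.

```python
import re

# Matches lines such as:
# 2019/08/29 06:43:23 [INFO] [e.example] acme: Obtaining bundled SAN certificate
LINE_RE = re.compile(
    r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \[INFO\] \[(?P<domain>[^\]]+)\] (?P<msg>.*)$"
)

OBTAINING = "Obtaining bundled SAN certificate"


def stuck_domains(lines):
    """Return domains whose latest log message is still the 'Obtaining' one."""
    last_msg = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # Later lines overwrite earlier ones, so only the last message is kept.
            last_msg[m.group("domain")] = m.group("msg")
    return sorted(d for d, msg in last_msg.items() if msg.endswith(OBTAINING))
```

It accepts any iterable of lines, so an open log file can be passed in directly.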

After I deleted around 150 domains from /root/.caddy/acme/acme-v02.api.letsencrypt.org/sites, some certificates started being issued again, but for other domains the log still shows only “Obtaining bundled SAN certificate”.

I wonder if it’s hitting some maximum.
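
One maximum that might be relevant is Let’s Encrypt’s rate limit on new orders per account (300 per 3 hours, as far as I know). Whether that is what happened here is a guess, and I am also assuming that each “Obtaining” line corresponds to one new order. Under those assumptions, a sketch like this would count the busiest window in the log; `max_orders_in_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def max_orders_in_window(lines, window=timedelta(hours=3)):
    """Return the largest number of 'Obtaining' lines inside any sliding window."""
    times = []
    for line in lines:
        if "Obtaining bundled SAN certificate" in line:
            # The timestamp is the first 19 characters, e.g. "2019/08/29 06:43:23".
            times.append(datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S"))
    times.sort()

    best = 0
    start = 0
    for end, t in enumerate(times):
        # Move the left edge forward until the span is shorter than the window.
        while t - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best
```

If the result is near the documented limit, rate limiting becomes a more plausible explanation; if it is far below, it probably is not.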

I checked that the file descriptor limit is set to something high:

ulimit -n
131072
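
Note that `ulimit -n` reports the limit of the current shell, which is not necessarily the limit of the running Caddy process (for example, if it is started by an init system). On Linux the effective limit of a process is listed in `/proc/<pid>/limits`. A minimal sketch for reading it, assuming the usual layout of that file; `open_file_limits` is a made-up helper name:

```python
def open_file_limits(limits_text):
    """Return (soft, hard) from the 'Max open files' row, or None if absent.

    Values are returned as strings because they may be 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units.
            fields = line.split()
            return fields[3], fields[4]
    return None
```

It would be called with the contents of `/proc/<pid>/limits`, where `<pid>` is the process ID of Caddy.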

It resumed working by itself after around 12 hours, with no change to the config. Caddy was restarted a few times; I don’t know if that had any impact.

I think it was related to a high influx of new domains, so a workaround could be to migrate domains over gradually instead of adding many at once.
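
The gradual migration could be sketched roughly like this. The batch size and pause are placeholders rather than recommendations, and `add_domain` stands in for whatever step actually points a domain at the server in a given setup (it is hypothetical, not a Caddy API).

```python
import time


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def migrate_gradually(domains, add_domain, batch_size=50,
                      pause_seconds=3600, sleep=time.sleep):
    """Add domains batch by batch, pausing between batches (not after the last)."""
    groups = list(batches(domains, batch_size))
    for n, group in enumerate(groups):
        for domain in group:
            add_domain(domain)
        if n < len(groups) - 1:
            sleep(pause_seconds)
```

The `sleep` parameter is there only so the pause can be replaced in a dry run.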