I run a web hosting platform and recently replaced our Nginx + custom SSL setup with Caddy v2.2.1 for users who connect their own domains to our service.
Since then, we’ve regularly run into “too many open files” errors after about 12-18 hours of uptime. Restarting the Caddy server fixes this completely (until about 12 hours later). We’ve tried raising the file descriptor limits available to Caddy, but I’m not sure of the exact setup needed, since this old machine runs Ubuntu 12.04 and uses Upstart instead of systemd.
For a variety of reasons, we can’t upgrade this server at the moment. So for the short term, I’m trying to figure out how to make Caddy work reliably here. Are there any configuration changes I can make to ensure Caddy isn’t keeping so many file descriptors open?
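Before changing any limits, it can help to confirm what limit the running Caddy process actually has and how many descriptors it holds. Below is a minimal sketch, assuming a Linux `/proc` filesystem; the helper names `parse_open_files_limit` and `count_open_fds` are made up for illustration, and it avoids f-strings so it should also run on the older Python versions that ship with Ubuntu 12.04.

```python
import os
import sys


def _to_int(value):
    """Convert a limit column to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Returns None if the row is not present in the given text.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units
            fields = line[len(prefix):].split()
            return _to_int(fields[0]), _to_int(fields[1])
    return None


def count_open_fds(pid="self"):
    """Count entries in /proc/<pid>/fd (Linux only).

    Reading another process's fd directory requires running as the
    same user as that process, or as root.
    """
    return len(os.listdir("/proc/{}/fd".format(pid)))


if __name__ == "__main__":
    # Usage (hypothetical script name): python fdcheck.py <caddy-pid>
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open("/proc/{}/limits".format(pid)) as f:
        print("open files limit (soft, hard):", parse_open_files_limit(f.read()))
    print("open descriptors:", count_open_fds(pid))
```

Running this periodically against the Caddy PID should show whether the descriptor count climbs steadily toward the soft limit. If the soft limit turns out to be the common default of 1024, Upstart's `limit nofile <soft> <hard>` stanza in the job file is one way to raise it, though a higher limit only postpones the error if connections are never being closed.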
The majority of the custom domains we’re serving aren’t behind CloudFlare, but some are, yes. Do you think that’s the culprit?
I should mention that during the transition, I noticed that most sites using CloudFlare weren’t able to generate a certificate. So we started instructing CloudFlare users to switch off their proxy service, and added some entries to our Caddyfile as a stopgap, which seemed to fix connection issues. E.g.:
The TLS-ALPN challenge will always fail behind TLS termination, and the HTTP challenge will fail if the challenge request is not proxied through to Caddy. If one of the challenges can succeed, certificates won’t be a problem even behind a CDN, as long as the challenge requests reach the same Caddy cluster that initiated the challenge. There’s also the DNS challenge, but this requires credentials for the DNS provider’s API.
Thanks, that’s great to know! If I didn’t want to run the beta yet, is there any way to set timeouts in the Caddyfile? Or would I need to switch to JSON config?
Yeah, in my testing, it seems CloudFlare causes the HTTP challenge to fail with “context deadline exceeded”:
2020/12/09 19:11:15.898 tls.issuance.acme.acme_client deactivating authorization {"identifier": "behind.cloudflare.com", "authz": "https://acme-v02.api.letsencrypt.org/acme/authz-v3/1111111111", "error": "request to https://acme-v02.api.letsencrypt.org/acme/authz-v3/1111111111 failed after 1 attempts: context deadline exceeded"}