Caddy runs out of file descriptors

Yesterday, I set up a Caddy server (latest via go get) and got some traffic.

I used LimitNOFILE=8192 in my systemd service unit. After a while, I started seeing the following in my syslog:

Mar 21 16:33:57 cloud-1 caddy:  21/Mar/2017:15:33:57 +0000 [NOTICE 404 https://<an-url-on-my-website>] could not load error page: open <my-website>/404.html: too many open files

Caddy started serving 404s because it had run out of file descriptors.
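For reference, the limit was set in the service unit roughly like this (a sketch; the unit path and ExecStart line are illustrative, not my exact setup):

# /etc/systemd/system/caddy.service (excerpt)
[Service]
ExecStart=/usr/local/bin/caddy -conf /home/petter/wwwroot/Caddyfile
# Per-process limit on open file descriptors
LimitNOFILE=8192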

It stopped doing that after a restart, but looking at the open file descriptors:

> pidof caddy
26664
> sudo ls -l /proc/26664/fd
...
lrwx------ 1 root root 64 Mar 21 19:36 159 -> socket:[286147]
lrwx------ 1 root root 64 Mar 21 18:20 16 -> socket:[283793]
lrwx------ 1 root root 64 Mar 21 19:36 160 -> socket:[286149]
lrwx------ 1 root root 64 Mar 21 18:20 17 -> socket:[283794]
lrwx------ 1 root root 64 Mar 21 18:20 18 -> socket:[283796]
... (many lines removed)

There are a lot of socket descriptors more than an hour old. I am not a Linux expert, but that does not seem right?
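A quick way to compare the count against the limit (a sketch, reusing the PID from above):

> sudo ls /proc/26664/fd | wc -l
> sudo grep 'open files' /proc/26664/limits

The first number creeping toward the "Max open files" value is the warning sign.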

What’s your Caddyfile?

www.mydomain.net {
	root www.mydomain.net
	gzip

	errors {
		404 404.html
		500 500.html
	}

	log ./log/www.log {
		rotate {
			size 10  # Rotate after 10 MB
			keep 10  # Keep at most 10 log files
		}
	}
}

cloud.mydomain.net {
	root cloud.mydomain.net
	gzip

	log ./log/cloud.log {
		rotate {
			size 10  # Rotate after 10 MB
			keep 10  # Keep at most 10 log files
		}
	}
}

I forgot to mention that I run Cloudflare in front of my Caddy server.

Cloudflare says that

Cloudflare maintains keep-alive connections to improve performance and reduce cost of recurring TCP connects in the request transaction as Cloudflare proxies customer traffic from its edge network to the site’s origin.

But the default timeout for Caddy should be 2 minutes, I think, and I see file descriptors far older than that.

Anyone else having experience with Caddy + Cloudflare?

Aha: https://github.com/mholt/caddy/commit/f49e0c9b560ea7efc25c0b15d422b59f42a6edb1 (httpserver: Disable default timeouts)

https://caddyserver.com/docs/timeouts reflects the latest released version, not master (as it should).

Looks like I should set timeout values explicitly and everything should be good. I added this:

timeouts {
    read   10s
    header 10s
    write  20s
    idle   5m
}
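For anyone following along: timeouts is an ordinary directive, so it goes inside each site block, roughly like this (sketch):

www.mydomain.net {
	# ...existing directives...
	timeouts {
		# ...values as above...
	}
}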

Let’s see if it improves.

Still, unless you’re being attacked (or clients are buggy), you likely won’t see socket depletion. That’s highly unusual, I think.

Yeah, you are probably right.

Another thing: Caddy opens the following file and writes to it:

/home/petter/wwwroot/access.log

I don’t see access.log anywhere in my Caddyfile (which is located in /home/petter/wwwroot/). Why is it being written to? Its contents are identical to ./log/www.log.

That’s a bug. Sigh. We’ll have to get that fixed.

Filed https://github.com/mholt/caddy/issues/1529

Perfect. Thank you!

The server has been stable since I added the timeouts. Worth trying if anyone runs into trouble with Caddy + Cloudflare.


I had this issue. I was proxying Gitea through Caddy, and it was running over HTTPS as well.
No matter how high I raised the file descriptor limit (via ulimit) in the shell that spawned the Caddy process, it was only a matter of hours before Caddy went into a catatonic state due to too many open files.

It seems to be the proxy mechanism, as the Gitea process was unaffected and did not need to be restarted the way Caddy did.
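If anyone wants to watch it happen, something like this shows the descriptor count climbing (a sketch; adjust the interval to taste):

> sudo watch -n 60 'ls /proc/$(pidof caddy)/fd | wc -l'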

Just FYI: I went back to letting Gitea handle all the web traffic directly.

What version of Caddy are you using?

Caddy 0.10.14 (non-commercial use only)

And the Caddyfile was (me.here is a made-up domain; my real one was in there):

yoda.me.here:443 {
	proxy / 127.0.0.1:2368 {
		header_upstream Host {host}
		header_upstream X-Real-IP {remote}
		header_upstream X-Forwarded-Proto {scheme}
	}
	tls kris@me.here
}

git.me.here:443 {
	tls /etc/letsencrypt/live/me.here/cert.pem /etc/letsencrypt/live/me.here/privkey.pem
	proxy / 127.0.0.1:3000
}

dl.me.here:443 {
	root /gitea/caddy/files
	filemanager /admin /gitea/caddy/files
}

The git host was the one where all the open files ended up; the other two virtual hosts (is that the right term?) were not running.

Try updating to 0.11.0.

0.10.14 had a leak that was patched here:
https://github.com/mholt/caddy/commit/fe664c00ff2285a7711dbc0b2402263a0d4798c9
https://github.com/mholt/caddy/pull/2134


Awesome, you beat me to it. 🙂

Yes, could you please try 0.11? I think it will solve the problem.
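You can double-check which build you’re on afterwards:

> caddy -version

It should report 0.11.0 after the upgrade.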