Caddy shuts down after 100+ concurrent requests

(Rubyj) #1

Hello,

I have deployed Caddy in production as a reverse proxy in front of the ASGI server Daphne, which runs a Django application.

One of my workflows sends roughly 100+ concurrent POST requests to my web server. These requests do not seem to hog CPU or memory (I have plenty of both left while they are running), but my Django/Caddy server begins to hang: each request takes longer and longer, until eventually requests start to time out (over 5 minutes) with 499 and 502 errors. Finally, the Caddy server is deemed unhealthy by the AWS health checks (since every request into it takes forever to respond), and Caddy is shut down and rebooted.

Has anyone seen this before, or does anyone know how to deal with it? I have tried to optimize my endpoint code to no avail; the issue still occurs with 100+ concurrent POST requests.
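For reference, the load that triggers this is roughly equivalent to the sketch below. It is only an illustration: the URL, path, payload, and concurrency level are placeholders, not my real endpoint or workflow.

# Rough sketch of the load pattern, not the real workflow.
# URL, payload, and concurrency level are placeholders/assumptions.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://my-caddy-host:2015/my-endpoint"   # placeholder
PAYLOAD = json.dumps({"example": "data"}).encode()
CONCURRENCY = 120                               # roughly 100+ requests in flight at once

def post_once(i):
    req = urllib.request.Request(
        URL, data=PAYLOAD, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=330) as resp:
            return i, resp.status
    except Exception as exc:                    # timeouts, 499s, and 502s end up here
        return i, repr(exc)

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    for i, result in pool.map(post_once, range(CONCURRENCY)):
        print(i, result)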

This is my Caddyfile:

0.0.0.0:2015
on startup ddtrace-run daphne peptidedb.asgi:application &

header / {
	-Server

	# be sure to plan & test before enabling
	# Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"

	Referrer-Policy "same-origin"
	X-XSS-Protection "1; mode=block"
	X-Content-Type-Options "nosniff"

	# customize for your app
	#Content-Security-Policy "connect-src 'self'; default-src 'none'; font-src 'self'; form-action 'self'; frame-ancestors 'none'; img-src data: 'self'; object-src 'self'; style-src 'self'; script-src 'self';"
	X-Frame-Options "DENY"
}

proxy / localhost:8000 {
	transparent
	websocket
	except /static
}

limits 5242880

log / stdout "{combined}"

errors stdout
(Rubyj) #2

I managed to optimize my endpoint a little further and I am seeing better results. However, I noticed that the first request I send to the endpoint always returns a 502 error; subsequent requests then work…

(Matthew Fay) #3

Just to confirm, the Caddy host isn’t running out of any resources? And you’ve tested directly against your backend and the problem doesn’t occur?
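A quick way to test directly against the backend is to point requests at Daphne itself, bypassing Caddy; I'm assuming it is still listening on localhost:8000 as in your Caddyfile, and the path and payload here are placeholders. A minimal single-request smoke test from the host could look like this:

# Direct-to-backend check, bypassing Caddy. Path and payload are placeholders.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8000/my-endpoint",        # Daphne, per the proxy target in the Caddyfile
    data=json.dumps({"example": "data"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=60) as resp:
    print(resp.status, resp.read()[:200])

The concurrent sketch from the first post, with its URL changed to localhost:8000, would exercise the backend under the same load without Caddy in the path.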
