When you perform your next test, you can go to localhost:2019/debug/pprof and capture memory and CPU profiles; that will tell you what is using CPU and memory that could be slowing things down.
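If you'd rather script the capture than click through the page, here is a minimal sketch. It assumes the admin endpoint is at its default address (localhost:2019) and that it exposes the standard Go pprof paths `heap` and `profile`; the helper names `profile_url` and `save_profile` are made up for illustration and are not part of Caddy.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: Caddy's admin endpoint is on its default address.
ADMIN = "http://localhost:2019"


def profile_url(kind, seconds=None, base=ADMIN):
    """Build the URL for a pprof profile, e.g. 'heap' (memory) or 'profile' (CPU)."""
    url = f"{base}/debug/pprof/{kind}"
    if seconds is not None:
        # The CPU profile samples for this many seconds before responding.
        url += "?" + urlencode({"seconds": seconds})
    return url


def save_profile(kind, path, seconds=None, base=ADMIN):
    """Download one profile and write the raw bytes to `path`."""
    # Allow the sampling window plus some slack before timing out.
    timeout = (seconds or 0) + 30
    with urllib.request.urlopen(profile_url(kind, seconds, base), timeout=timeout) as resp:
        data = resp.read()
    with open(path, "wb") as out:
        out.write(data)
    return path
```

While the load test is running, calling `save_profile("heap", "heap.pprof")` and `save_profile("profile", "cpu.pprof", seconds=30)` should leave two files that you can open with `go tool pprof` or attach to a reply here.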
Well, the first thing I notice is that you are buffering both requests and responses. Why? That should only be done if the backend is incapable of streaming (only certain gunicorn-style backends require this AFAIK). It allocates a ton of memory and slows things down under load, as shown in your heap profile. So remove the buffering options.
About 25-30% of the time is simply spent in system calls. In other words, your kernel is very slow! I think it’s unusually slow. Are you on shared hardware? It’s quite likely that the hypervisor is simply deprioritizing your time on the CPU to service other users while you make system calls.
If you’re on dedicated hardware, then I’d be interested in how to reproduce these results, as it seems a bit more extreme than I’m used to seeing.
I mean, if I understand your question, I would disable the buffering in the Caddy config. Just remove those lines entirely (buffer_requests and buffer_responses).
Performance improvements generally require boots to be on the ground where the battle is happening. I just don’t have the time to set everything up right now, but maybe someone does, or I could prioritize this for a business-tier sponsor. But I think this would be a fun challenge for someone enthusiastic to take up.
I only added those lines because, when using HTTP/2 or HTTP/3, Caddy sends a 200 without sending the POST body to the upstream.
Why is Caddy doing that?
Nginx also does that, but we can fix the issue like this.
If you have some free time, please take a look at this. Or if anyone else is facing a similar issue, please post your solution. I did everything I could.
We just released v2.6.4, which deprecated the buffer_* options and replaced them with new request_buffers and response_buffers options that take a buffer size as input. You could play around with those, try different buffer sizes, and see if it changes the behaviour. See the reverse_proxy (Caddyfile directive) page in the Caddy documentation.