Caddy Reverse Proxy Getting Really Slow when using TLS | Encryption Overhead?

Is this a valid Caddy configuration?

192.168.1.9:443 {
        tls internal
        reverse_proxy http://localhost:3000 {
                buffer_requests
                buffer_responses
                flush_interval -1
                max_buffer_size 35MiB
        }
}

See also: Upload tests timeout with caddy server · Issue #67 · openspeedtest/Speed-Test · GitHub

I am running the OpenSpeedTest container:

docker run --restart=unless-stopped --name openspeedtest -d -p 3000:3000 -p 3001:3001 openspeedtest/latest

Caddy Performance

  • 9500 for Download and 9600 for Upload when using HTTP

  • 5400 for Download and 1200 for Upload when using HTTPS

Why am I getting very slow performance when I use HTTPS?

Test without using Caddy

  • 9500 for Download and 9600 for Upload when using HTTP (Docker port 3000)

  • 9500 for Download and 9600 for Upload when using HTTPS (Docker port 3001)

1. Caddy version:

v2.6.2 h1

2. How I installed and ran Caddy:

Using the command
sudo apt install caddy
as posted here: Install — Caddy Documentation

a. System environment:

x86_64 Debian Linux

I don’t know how to troubleshoot this issue.
I am a new Caddy user, still learning terminal commands and Linux.

Hi Hiro, welcome to the forum. What is your config when not using HTTPS?

192.168.1.9:80 {
        reverse_proxy http://localhost:3000 {
                buffer_requests
                buffer_responses
                flush_interval -1
                max_buffer_size 35MiB
        }
}

When you perform your next test, you can go to localhost:2019/debug/pprof and capture memory and CPU profiles; that will tell you what is using CPU and memory that could be slowing things down.
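
For example (assuming the admin API is listening on the default localhost:2019 — adjust if you've changed it), something along these lines should grab a 30-second CPU profile and a heap snapshot while a test is running:

# capture a 30-second CPU profile from Caddy's admin endpoint
curl -o profile "http://localhost:2019/debug/pprof/profile?seconds=30"

# capture a heap (memory) snapshot
curl -o heap "http://localhost:2019/debug/pprof/heap"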

How do I view and understand the profile?
After a few seconds I got a file called profile.

Post it here and we’ll take a look!
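
(If you want to poke at it yourself first: it's a binary pprof file, so it won't open in a text editor. If you have Go installed, you can load it roughly like this:)

# open the captured profile in the interactive pprof viewer
go tool pprof profile
# then at the (pprof) prompt, type: top10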

It is not a text file, so I can't paste it here.

Thanks!

Well the first thing I notice is that you are buffering both requests and responses :open_mouth: Why? That should only be done if the backend is incapable of streaming (only certain gunicorn-style backends require this AFAIK). It allocates a ton of memory and slows things down under load, as shown in your heap profile. So remove the buffering options.
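
In other words, something like this should be all you need — a minimal sketch keeping your site address and internal TLS, with the buffer_* lines (and the related tuning knobs) dropped:

192.168.1.9:443 {
        tls internal
        reverse_proxy http://localhost:3000
}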

Here’s the top10 for your ‘upload’ CPU profile:

(pprof) top10
Showing nodes accounting for 3410ms, 73.81% of 4620ms total
Dropped 97 nodes (cum <= 23.10ms)
Showing top 10 nodes out of 155
      flat  flat%   sum%        cum   cum%
    1430ms 30.95% 30.95%     1430ms 30.95%  runtime/internal/syscall.Syscall6
     670ms 14.50% 45.45%      670ms 14.50%  runtime.futex
     470ms 10.17% 55.63%      470ms 10.17%  runtime.memmove
     300ms  6.49% 62.12%      300ms  6.49%  crypto/aes.gcmAesDec
     130ms  2.81% 64.94%      130ms  2.81%  runtime.memclrNoHeapPointers
     120ms  2.60% 67.53%      120ms  2.60%  runtime.epollwait
      90ms  1.95% 69.48%      170ms  3.68%  runtime.stealWork
      70ms  1.52% 71.00%       70ms  1.52%  runtime.procyield
      70ms  1.52% 72.51%      360ms  7.79%  runtime.selectgo
      60ms  1.30% 73.81%       60ms  1.30%  runtime.nanotime (inline)

and for ‘download’:

(pprof) top10
Showing nodes accounting for 18.03s, 79.11% of 22.79s total
Dropped 234 nodes (cum <= 0.11s)
Showing top 10 nodes out of 130
      flat  flat%   sum%        cum   cum%
     5.79s 25.41% 25.41%      5.79s 25.41%  runtime/internal/syscall.Syscall6
     3.97s 17.42% 42.83%      3.97s 17.42%  runtime.memmove
     2.86s 12.55% 55.38%      2.86s 12.55%  crypto/aes.gcmAesEnc
     2.42s 10.62% 65.99%      2.42s 10.62%  runtime.memclrNoHeapPointers
     1.50s  6.58% 72.58%      1.50s  6.58%  runtime.futex
     0.50s  2.19% 74.77%      1.40s  6.14%  runtime.selectgo
     0.45s  1.97% 76.74%      0.45s  1.97%  runtime.procyield
     0.25s  1.10% 77.84%      0.75s  3.29%  runtime.stealWork
     0.15s  0.66% 78.50%      0.30s  1.32%  runtime.lock2
     0.14s  0.61% 79.11%      0.32s  1.40%  net/http.(*http2priorityNode).walkReadyInOrder

About 25-30% of the time is simply spent in system calls. In other words, your kernel is very slow! I think it’s unusually slow. Are you on shared hardware? It’s quite likely that the hypervisor is simply deprioritizing your time on the CPU to service other users while you make system calls.

If you’re on dedicated hardware, then I’d be interested in how to reproduce these results, as it seems a bit more extreme than I’m used to seeing.

Yes, that was a VM. I will run this again on the same hardware without a VM.

The upload test was failing because HTTP/2 and HTTP/3 will not wait for the POST body.

That is why they told me to add buffer_requests & buffer_responses to avoid that issue.

That fixed the upload problem, and Caddy performed very well for the HTTP test (screencast); I observed this issue only on HTTPS.

How can we avoid buffering and emulate an Nginx ‘proxy_pass’-like effect?

Edit:
Tested without a VM.
Results are the same.
Caddy with TLS on the left, and the OST Docker container with TLS on the right.

@matt @francislavoie Any ideas?

Sorry, been traveling for a few days – I will try to look at this when I have a chance, it just may not be soon.


Any ideas on how to solve this? I hope you can take a look after your travels. @matt @francislavoie

I mean, if I understand your question, I would disable the buffering in the Caddy config. Just remove those lines entirely (buffer_requests and buffer_responses).

Performance improvements generally require boots to be on the ground where the battle is happening. I just don’t have the time to set everything up right now, but maybe someone does – or I could prioritize this for a business-tier sponsor. But I think this would be a fun challenge for someone enthusiastic to take up :smiley:

I only put those lines there because when using HTTP/2 and HTTP/3, Caddy sends a 200 without sending the POST body to the upstream.

Why is Caddy doing that?

Nginx also does that, but we can fix the issue like this.

If you get some free time, please take a look at this. Or if someone is facing a similar issue, please post your solution. I have done everything I could.

Setup is very easy:

Install Docker and run

sudo docker run --restart=unless-stopped --name openspeedtest -d -p 3000:3000 -p 3001:3001 openspeedtest/latest

That is it.

We just released v2.6.4, which deprecates the buffer_* options and replaces them with new request_buffers and response_buffers options that take a buffer size as input. You could play around with those, trying different buffer sizes to see if it changes the behaviour. See reverse_proxy (Caddyfile directive) — Caddy Documentation.
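
For example, something along these lines (the 4MiB values are just placeholders to experiment with, not a recommendation):

192.168.1.9:443 {
        tls internal
        reverse_proxy http://localhost:3000 {
                request_buffers 4MiB
                response_buffers 4MiB
        }
}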

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.