Caddy Reverse Proxy Getting Really Slow when using TLS | Encryption Overhead?

Is this a valid Caddy configuration?

192.168.1.9:443 {
        tls internal
        reverse_proxy http://localhost:3000 {
                buffer_requests
                buffer_responses
                flush_interval -1
                max_buffer_size 35MiB
        }
}

See also: Upload tests timeout with caddy server · Issue #67 · openspeedtest/Speed-Test · GitHub

I am running the OpenSpeedTest container:

docker run --restart=unless-stopped --name openspeedtest -d -p 3000:3000 -p 3001:3001 openspeedtest/latest

Caddy Performance

  • 9500 for Download and 9600 for Upload when using HTTP

  • 5400 for Download and 1200 for Upload when using HTTPS

Why am I getting very slow performance when I use HTTPS?

Test without using Caddy

  • 9500 for Download and 9600 for Upload when using HTTP (Docker port 3000)

  • 9500 for Download and 9600 for Upload when using HTTPS (Docker port 3001)

1. Caddy version:

v2.6.2 h1

2. How I installed and ran Caddy:

Using the command
sudo apt install caddy
as posted here: Install — Caddy Documentation

a. System environment:

x86_64 Debian Linux

I don’t know how to troubleshoot this issue.
I am a new Caddy user, still learning terminal commands and Linux.

Hi Hiro, welcome to the forum. What is your config when not using HTTPS?

192.168.1.9:80 {
        reverse_proxy http://localhost:3000 {
                buffer_requests
                buffer_responses
                flush_interval -1
                max_buffer_size 35MiB
        }
}

When you perform your next test, you can go to localhost:2019/debug/pprof and capture memory and CPU profiles; that will tell you what is using CPU and memory that could be slowing things down.
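
For example (assuming the admin API is listening on the default localhost:2019 — adjust if you've changed it), something along these lines should grab a 30-second CPU profile and a heap snapshot while a test is running:

# capture a 30-second CPU profile from Caddy's admin endpoint
curl -o profile "http://localhost:2019/debug/pprof/profile?seconds=30"

# capture a heap (memory) snapshot
curl -o heap "http://localhost:2019/debug/pprof/heap"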

How do I view and understand the profile?
After a few seconds I got a file called profile.

Post it here and we’ll take a look!
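
(If you want to poke at it yourself first: it's a binary pprof file, so it won't open in a text editor. If you have Go installed, you can load it roughly like this:)

# open the captured profile in the interactive pprof viewer
go tool pprof profile
# then at the (pprof) prompt, type: top10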

It is not a text file, so I can't paste it here.

Thanks!

Well the first thing I notice is that you are buffering both requests and responses :open_mouth: Why? That should only be done if the backend is incapable of streaming (only certain gunicorn-style backends require this AFAIK). It allocates a ton of memory and slows things down under load, as shown in your heap profile. So remove the buffering options.
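
In other words, something like this should be all you need — a minimal sketch keeping your site address and internal TLS, with the buffer_* lines (and the related tuning knobs) dropped:

192.168.1.9:443 {
        tls internal
        reverse_proxy http://localhost:3000
}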

Here’s the top10 for your ‘upload’ CPU profile:

(pprof) top10
Showing nodes accounting for 3410ms, 73.81% of 4620ms total
Dropped 97 nodes (cum <= 23.10ms)
Showing top 10 nodes out of 155
      flat  flat%   sum%        cum   cum%
    1430ms 30.95% 30.95%     1430ms 30.95%  runtime/internal/syscall.Syscall6
     670ms 14.50% 45.45%      670ms 14.50%  runtime.futex
     470ms 10.17% 55.63%      470ms 10.17%  runtime.memmove
     300ms  6.49% 62.12%      300ms  6.49%  crypto/aes.gcmAesDec
     130ms  2.81% 64.94%      130ms  2.81%  runtime.memclrNoHeapPointers
     120ms  2.60% 67.53%      120ms  2.60%  runtime.epollwait
      90ms  1.95% 69.48%      170ms  3.68%  runtime.stealWork
      70ms  1.52% 71.00%       70ms  1.52%  runtime.procyield
      70ms  1.52% 72.51%      360ms  7.79%  runtime.selectgo
      60ms  1.30% 73.81%       60ms  1.30%  runtime.nanotime (inline)

and for ‘download’:

(pprof) top10
Showing nodes accounting for 18.03s, 79.11% of 22.79s total
Dropped 234 nodes (cum <= 0.11s)
Showing top 10 nodes out of 130
      flat  flat%   sum%        cum   cum%
     5.79s 25.41% 25.41%      5.79s 25.41%  runtime/internal/syscall.Syscall6
     3.97s 17.42% 42.83%      3.97s 17.42%  runtime.memmove
     2.86s 12.55% 55.38%      2.86s 12.55%  crypto/aes.gcmAesEnc
     2.42s 10.62% 65.99%      2.42s 10.62%  runtime.memclrNoHeapPointers
     1.50s  6.58% 72.58%      1.50s  6.58%  runtime.futex
     0.50s  2.19% 74.77%      1.40s  6.14%  runtime.selectgo
     0.45s  1.97% 76.74%      0.45s  1.97%  runtime.procyield
     0.25s  1.10% 77.84%      0.75s  3.29%  runtime.stealWork
     0.15s  0.66% 78.50%      0.30s  1.32%  runtime.lock2
     0.14s  0.61% 79.11%      0.32s  1.40%  net/http.(*http2priorityNode).walkReadyInOrder

About 25-30% of the time is simply spent in system calls. In other words, your kernel is very slow! I think it’s unusually slow. Are you on shared hardware? It’s quite likely that the hypervisor is simply deprioritizing your time on the CPU to service other users while you make system calls.

If you’re on dedicated hardware, then I’d be interested in how to reproduce these results, as it seems a bit more extreme than I’m used to seeing.

Yes, that was a VM. I will run this again on the same hardware without a VM.

The upload test was failing because HTTP/2 and HTTP/3 will not wait for the POST body.

That is why they told me to add buffer_requests & buffer_responses to avoid that issue.

That fixed the upload problem, and Caddy performed very well for the HTTP test (screencast); I observed this issue only on HTTPS.

How can we avoid buffering and emulate an Nginx ‘proxy_pass’-like effect?

Edit:
Tested without a VM.
Results are the same.
Caddy with TLS on the left, and the OST Docker container with TLS on the right.

@matt @francislavoie Any ideas?

Sorry, been traveling for a few days – I will try to look at this when I have a chance, it just may not be soon.


Any ideas on how to solve this? I hope you can take a look after your travels. @matt @francislavoie

I mean, if I understand your question, I would disable the buffering in the Caddy config. Just remove those lines entirely (buffer_requests and buffer_responses).

Performance improvements generally require boots to be on the ground where the battle is happening. I just don’t have the time to set everything up right now, but maybe someone does – or I could prioritize this for a business-tier sponsor. But I think this would be a fun challenge for someone enthusiastic to take up :smiley:

I only put those lines there because when using HTTP/2 and HTTP/3, Caddy sends a 200 without sending the POST body to the upstream.

Why is Caddy doing that?

Nginx also does that, but we can fix the issue like this.

If you get some free time, please take a look at this. Or if someone is facing a similar issue, please post your solution. I have done everything I could.

Setup is very easy:

Install Docker and run

sudo docker run --restart=unless-stopped --name openspeedtest -d -p 3000:3000 -p 3001:3001 openspeedtest/latest

That is it.

We just released v2.6.4, which deprecates the buffer_* options and replaces them with new request_buffers and response_buffers options that take a buffer size as input. You could play around with those, trying different buffer sizes to see if it changes the behaviour. See reverse_proxy (Caddyfile directive) — Caddy Documentation.
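
For example, something along these lines (the 4MiB values are just placeholders to experiment with, not a recommendation):

192.168.1.9:443 {
        tls internal
        reverse_proxy http://localhost:3000 {
                request_buffers 4MiB
                response_buffers 4MiB
        }
}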

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.