CPU usage is high

1. The problem I’m having:

I load-tested nginx and Caddy at the same 1200 QPS, but Caddy's CPU usage was much higher, about twice that of nginx (nginx ~25%, Caddy ~56%). I suspect there may be an issue with the parameters in my Caddy configuration.

2. Error messages and/or full log output:


3. Caddy version:

v2.7.6

4. How I installed and ran Caddy:

./caddy run --config Caddyfile

a. System environment:

OS: Linux
2 Caddy servers, each with 1 CPU and 1 GB RAM
Started via Docker

b. Command:


c. Service/unit/compose file:


d. My complete Caddy config:

{
    log {
        level info
        output file /var/log/caddy/access.log
        format json {
            time_format iso8601
            message_key msg
        }
    }
}

test.xxxxx.com {
    tls  /etc/caddy/ssl/xxx.pem /etc/caddy/ssl/xxx.key {
        protocols tls1.2 tls1.3
        ciphers TLS_AES_128_GCM_SHA256 TLS_CHACHA20_POLY1305_SHA256 TLS_AES_256_GCM_SHA384 TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
    }
    route /robots.txt {
        header Content-Type text/plain
        respond "User-agent: *\nDisallow: *" 200
    }
    request_body {
        max_size 10MB
    }
    reverse_proxy /debug/pprof/* localhost:2019 {
    	header_up Host {upstream_hostport}
    }
    reverse_proxy {
        header_up Connection ""
        to http://gateway:7000
        header_up Host {http.request.host}
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-For {http.request.header.Get("X-Forwarded-For")}
        transport http {
            keepalive_idle_conns 200
            keepalive_idle_conns_per_host 10
            keepalive 60s
            dial_timeout 5s # 
            write_timeout 5s # 
            read_timeout 10s # 
        }
    }
    encode zstd gzip
    log {
        level info
        output file /var/log/caddy/access_gateway.log
        format json {
            time_format iso8601
            message_key msg
        }
    }
}


5. Links to relevant resources:
[pprof profile](https://github.com/qiannianshaonian/Bifrost/blob/master/profile1)

2 Caddy servers, 1C1G each.
At 1200 QPS, CPU usage is 56%. Is that reasonable?

However, the memory usage is very low (below 5%). Is there any way to reduce CPU usage by sacrificing memory?

Does it get noticeably better if you disable gzip/zstd compression?
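
For example, a minimal sketch of that test against the site block posted above: comment out the `encode` line and rerun the same load. (If compression needs to stay on, the `encode` directive's block form also accepts a gzip compression level, which could be lowered instead; that variant is not shown here.)

```
test.xxxxx.com {
    # encode zstd gzip   # temporarily disabled to measure compression's CPU cost
    # ... rest of the site block unchanged
}
```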

I deleted `encode zstd gzip` from the Caddyfile.
There is no significant decrease in CPU usage.


According to the pprof analysis, CPU usage is mainly concentrated in the TLS module. Is there any other way to optimize the TLS configuration parameters?


Ah, that’s good. TLS is normally hardware-accelerated these days, but only for certain cipher suites. You customized your cipher suites; I’d have to look at what the defaults are when I’m not on mobile, but there’s a chance the suites you configured aren’t being hardware-accelerated. If you take out the config line that customizes the suites, does that help?
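
For illustration, a minimal sketch of that change against the site block posted above: drop the `ciphers` line and let Go pick its default suites (which prefer AES-GCM when the CPU has AES instructions).

```
test.xxxxx.com {
    tls /etc/caddy/ssl/xxx.pem /etc/caddy/ssl/xxx.key {
        # "ciphers" line removed; the default suite selection applies
        protocols tls1.2 tls1.3
    }
    # ... rest of the site block unchanged
}
```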

I deleted the `ciphers` line from the Caddyfile.
There is still no significant decrease in CPU usage.

I haven’t found any TLS-related parameters in the Caddyfile documentation that look like they can be tuned further.
Are there any other settings I can try? I would also like to know whether your own load-test results are similar to mine; if so, I won’t be able to resolve this issue in the short term.

High CPU usage in the TLS stack often isn’t normal. The profile screenshot you posted shows that it’s doing a lot of RSA decryption, but it’s truncated after that. Do you have the full profile image (or at least the other half)?

I’m guessing lack of hardware acceleration – does your CPU support acceleration for the algorithms being used?

I have uploaded the profile file to GitHub: [Bifrost/profile1 at master · qiannianshaonian/Bifrost](https://github.com/qiannianshaonian/Bifrost/blob/master/profile1)
The other half of the screenshot is as follows:


What kind of system architecture is this running on? (CPU/OS/etc)

I will confirm the hardware acceleration and get back to you later


OS: Linux
CPU: Intel(R) Xeon(R) Platinum 8269CY @ 2.50GHz

Our cloud service provider is Alibaba Cloud.

The Caddy servers are running in Kubernetes.


Thanks. I’m not sure if Go has an assembly implementation of RSA for Xeon, but then again I don’t know much about Xeon. Is it x86? I see this CL which may optimize the function but again, I’m out of my depth here. https://go-review.googlesource.com/c/go/+/481618

Yes, it’s x86.

I’m not very familiar with this either.
Could we try Go 1.22 to test it?

I don’t think it’s been merged yet. And I’m not even sure it’s the optimizations we’re looking for.

I’m not really sure what to suggest at this point other than to disable RSA cipher suites and see if that clears it up. (Hopefully clients are compatible with ECC ciphers.)
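
As a rough sketch of that idea (and only an assumption here, since it requires the site to present an ECDSA certificate; ECDHE_ECDSA suites cannot authenticate with an RSA certificate): restrict the TLS 1.2 cipher list to ECDSA suites. TLS 1.3 suites are negotiated automatically and are not affected by this list.

```
tls /etc/caddy/ssl/xxx.pem /etc/caddy/ssl/xxx.key {
    protocols tls1.2 tls1.3
    # TLS 1.2 only: ECDHE key exchange with ECDSA authentication, no RSA
    ciphers TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
}
```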

@matt relevant? [crypto/tls: slow server-side handshake performance for RSA certificates without client session cache (golang/go#20058)](https://github.com/golang/go/issues/20058)


Interesting, yeah that might be. @1437747313 Can you try an ECDSA certificate? (As opposed to changing cipher suites)
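
For illustration, a minimal sketch of what that might look like; the `xxx-ecdsa.*` paths are hypothetical placeholders for an ECDSA (e.g. P-256) certificate and key issued for the same hostname.

```
test.xxxxx.com {
    tls /etc/caddy/ssl/xxx-ecdsa.pem /etc/caddy/ssl/xxx-ecdsa.key {
        protocols tls1.2 tls1.3
    }
    # ... rest of the site block unchanged
}
```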


Ah, the TLS issue has been resolved: as long as the client we are testing with uses TLS 1.3, the cipher suite problem goes away. However, another issue has come up. From the test data, `log.zap` also takes up a considerable amount of CPU, yet our log level is set to error and the volume of logs is very small, so this consumption confuses me. As shown in the following figure.
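
One way to check whether the access logger is really the cost (a sketch against the site block posted above, intended only as a temporary diagnostic, not a production setting): point the site's access log at the `discard` writer and compare CPU under the same load.

```
test.xxxxx.com {
    log {
        output discard   # temporarily drop access log entries entirely
    }
    # ... rest of the site block unchanged
}
```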