How to debug high CPU (part 2)

This is a continuation of How to debug high CPU. Since the setup has not changed (see the link), I'll jump straight to the debugging.

Thanks for the replies in the other thread - somehow I did not get notified :thinking:
Back in August it still felt like just a reporting problem - but now it is having a real impact.

With a peak of about 50-60 req/s I don't feel we should see a CPU load like this.
It's higher than the database and higher than the backends themselves.

Caddy is acting pretty much only as a reverse proxy handling the TLS termination.

Is it just me or does this look excessive?

 go tool pprof cpu.prof
File: caddy
Build ID: 8a32851908a303960579c849e7121ac8eb2325d5
Type: cpu
Time: 2025-09-24 14:48:25 CEST
Duration: 30.17s, Total samples = 82.67s (274.05%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 74.88s, 90.58% of 82.67s total
Dropped 516 nodes (cum <= 0.41s)
Showing top 10 nodes out of 54
      flat  flat%   sum%        cum   cum%
    70.04s 84.72% 84.72%     70.04s 84.72%  internal/runtime/syscall.Syscall6
     2.46s  2.98% 87.70%      2.46s  2.98%  runtime.futex
     0.43s  0.52% 88.22%      1.08s  1.31%  runtime.exitsyscall
     0.36s  0.44% 88.65%      0.72s  0.87%  runtime.stealWork
     0.33s   0.4% 89.05%      3.32s  4.02%  runtime.netpoll
     0.29s  0.35% 89.40%     38.65s 46.75%  crypto/tls.(*Conn).Write
     0.29s  0.35% 89.75%      7.59s  9.18%  runtime.findRunnable
     0.27s  0.33% 90.08%     38.05s 46.03%  crypto/tls.(*Conn).writeRecordLocked
     0.26s  0.31% 90.40%     73.53s 88.94%  io.copyBuffer
     0.15s  0.18% 90.58%      0.80s  0.97%  crypto/internal/fips140/aes/gcm.seal
(pprof) top20 -cum
Showing nodes accounting for 71.80s, 86.85% of 82.67s total
Dropped 516 nodes (cum <= 0.41s)
Showing top 20 nodes out of 54
      flat  flat%   sum%        cum   cum%
         0     0%     0%     73.53s 88.94%  io.Copy (inline)
     0.26s  0.31%  0.31%     73.53s 88.94%  io.copyBuffer
         0     0%  0.31%     71.84s 86.90%  github.com/caddyserver/caddy/v2/modules/caddyhttp.(*hijackedConn).ReadFrom
         0     0%  0.31%     71.84s 86.90%  github.com/caddyserver/caddy/v2/modules/caddyhttp/reverseproxy.switchProtocolCopier.copyFromBackend
    70.04s 84.72% 85.04%     70.04s 84.72%  internal/runtime/syscall.Syscall6
     0.01s 0.012% 85.05%     68.97s 83.43%  internal/poll.ignoringEINTRIO (inline)
     0.09s  0.11% 85.16%     68.75s 83.16%  syscall.Syscall
     0.10s  0.12% 85.28%     67.25s 81.35%  syscall.RawSyscall6
     0.29s  0.35% 85.63%     38.65s 46.75%  crypto/tls.(*Conn).Write
     0.27s  0.33% 85.96%     38.05s 46.03%  crypto/tls.(*Conn).writeRecordLocked
     0.03s 0.036% 85.99%     36.83s 44.55%  net.(*conn).Write
     0.04s 0.048% 86.04%     36.80s 44.51%  net.(*netFD).Write
     0.10s  0.12% 86.16%     36.71s 44.41%  internal/poll.(*FD).Write
     0.04s 0.048% 86.21%     36.31s 43.92%  syscall.Write (inline)
     0.04s 0.048% 86.26%     36.27s 43.87%  syscall.write
     0.14s  0.17% 86.43%     35.92s 43.45%  crypto/tls.(*Conn).write
     0.04s 0.048% 86.48%     33.30s 40.28%  net.(*conn).Read
     0.06s 0.073% 86.55%     33.26s 40.23%  net.(*netFD).Read
     0.10s  0.12% 86.67%     33.19s 40.15%  internal/poll.(*FD).Read
     0.15s  0.18% 86.85%     33.04s 39.97%  net/http.(*readWriteCloserBody).Read
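
To drill into where the io.copyBuffer time actually goes, I would probably run something like this in the same pprof session (peek shows callers/callees straight from the profile; list would additionally need the binary and sources):

(pprof) peek copyFromBackend
(pprof) peek crypto/tls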

Is there a way to see what URLs contribute the most? (we don’t really have an access log atm)
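
If I end up adding one, I assume a basic access log per site block would be enough to see which URLs are hot - a minimal sketch (hostname, log path and backend address are placeholders):

example.com {
  log {
    output file /var/log/caddy/access.log
    format json
  }
  reverse_proxy 10.0.0.5:8080
}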

How would you dig deeper?

What else would be of interest? I have this available:

curl -s "http://${CADDY_IP}:2019/debug/pprof/profile" > $DIR/cpu.prof
curl -s "http://${CADDY_IP}:2019/debug/pprof/goroutine?debug=1" > $DIR/go1.txt
#curl -s "http://${CADDY_IP}:2019/debug/pprof/goroutine?debug=2" > $DIR/go2.txt
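
If useful I could also grab a heap profile the same way and browse the CPU profile locally in the pprof web UI (standard pprof endpoint and flag; the port is arbitrary, the graph view needs graphviz):

curl -s "http://${CADDY_IP}:2019/debug/pprof/heap" > $DIR/heap.prof
go tool pprof -http=:8081 $DIR/cpu.prof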

Can you post the profile? And yeah the goroutine dump is good too.

Also see:

This will help you troubleshoot on your own while waiting for others to help.


@matt I had to wait for a busier time … but attached is the profile.

For some context: Caddy has about 10-15x the CPU usage of our database.
To me it looks like an IO bottleneck - but where would that come from for a reverse proxy?

Any suggestion much appreciated.

https://files.vafer.org/profile.tgz

Yeah man, that’s weird. 70 seconds spent deep in a syscall.

Something configured weirdly in your kernel? There’s a LOT of CPU time just spent writing buffers, in other words, I/O.

How’s your I/O hardware? Anything failing?


It’s a NixOS server on a Hetzner virtual server (behind a hypervisor). Caddy is running in k3s.
TBH I am not even sure where to start digging with all these layers :face_with_peeking_eye:.

Sometimes I really miss bare metal.

Right now we have little traffic so it’s a bit pointless … but this is what I would look at on the host system:

nix-shell -p iotop sysstat

iostat --human
Linux 6.12.19-hardened1 (blue-c-1)      10/04/2025      _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.9%    0.0%    2.6%    0.1%    0.0%   94.3%

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sda             155.19        15.6M       220.4k       114.8k     159.3T       2.2T       1.1T

iotop -o
shows around 160 K/s

vmstat 1
shows close to idle

sudo smartctl -a /dev/sda
all virtual

I am wondering if the time in those syscalls could actually be waiting rather than busy CPU.
But I guess I will have to wait for some traffic to dig deeper.
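
One way I could check whether this is genuinely busy CPU or mostly waiting would be a short syscall summary on the host (strace adds noticeable overhead, so only for a few seconds; that pgrep finds the right Caddy PID from the host despite k3s is an assumption):

sudo strace -c -f -p "$(pgrep -o caddy)"   # let it run ~10s, then Ctrl-C for the per-syscall time summary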

To test in this direction I added

      servers {
        timeouts {
          read_body 10s
          read_header 5s
          write 30s
        }
      }

to tighten timeouts.
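
For reference, that block sits inside the Caddyfile global options; an idle timeout might be worth adding too, in case connections pile up while idle (the values are guesses, not recommendations):

{
  servers {
    timeouts {
      read_body 10s
      read_header 5s
      write 30s
      idle 5m
    }
  }
}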

I am wondering if websockets could be the main driver for the behaviour.
But without dissecting this during high traffic times that’s just speculation.
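
One quick signal from the host would be how many established connections are sitting on the TLS port (assuming Caddy is reachable on :443 in the host network namespace; with a pod-local netns this would have to run via kubectl exec or nsenter instead):

sudo ss -Htan state established '( sport = :443 )' | wc -l   # count of established TCP connections on 443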

Under load:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          28.8%    0.0%   31.1%    0.1%    0.0%   40.0%

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.0k     0.00   0.0%    0.00     0.0k   69.00    312.0k     2.00   2.8%    0.96     4.5k    0.00      0.0k     0.00   0.0%    0.00     0.0k   17.00    0.24    0.07   1.1%
$ sudo iotop -o

shows more writes than expected … around 600K/s

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 7  0      0 664404 3456648 21303808    0    0 16287   219 9405    6  3  3 94  0  0  0
 8  0      0 646264 3456648 21303808    0    0     0   544 54608 47362 22 30 48  0  0  0
13  0      0 641832 3456648 21304204    0    0     0   940 56520 50195 20 32 47  0  0  0
 7  0      0 627060 3456648 21304108    0    0    16  1012 50743 51557 23 31 46  0  0  0
12  0      0 636836 3456648 21304152    0    0    24   492 56306 51506 22 31 47  0  0  0
12  0      0 601968 3456648 21304392    0    0     0   536 60360 51603 31 31 38  0  0  0
12  0      0 549912 3456648 21304612    0    0    24   556 56148 48272 25 33 43  0  0  0
 8  0      0 578808 3456648 21304440    0    0     0   204 52973 48609 23 30 47  0  0  0
 7  0      0 579988 3456648 21304548    0    0     0   436 50653 54147 21 33 46  0  0  0
14  0      0 579132 3456648 21304716    0    0     0   536 64240 55177 35 31 34  0  0  0
 8  0      0 576756 3456648 21304860    0    0     0   368 55395 55853 27 30 43  0  0  0
 8  0      0 560448 3456648 21305036    0    0     0   516 51363 46865 32 31 37  0  0  0
 6  0      0 599976 3456648 21304584    0    0     0   308 53645 46642 35 29 36  0  0  0
 6  0      0 593848 3456648 21304640    0    0     8  1068 56547 44477 20 30 49  0  0  0
 6  0      0 642732 3456648 21304656    0    0     0   440 53389 49895 18 31 51  0  0  0
 8  0      0 635900 3456648 21304664    0    0     0   272 54679 48894 19 31 50  0  0  0
 8  0      0 633384 3456648 21304736    0    0    40  1068 56049 46807 19 31 49  0  0  0

but overall I would say…

CPU

20-35% user plus 29-33% system CPU usage feels quite busy.
Only 34-51% idle and not much breathing room.

The high system/kernel CPU matches the syscall-heavy pprof result.
The 6-14 processes in the run queue (vmstat's r column) indicate CPU contention.

Memory

No swap usage, no memory pressure
Looks OK.

IO

Minimal disk reads.
Moderate disk writes.
Not really IO bound.

System

50k-64k interrupts/sec seems high
44k-55k context switches/sec seems too high
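
To pin the context switches on a specific process, pidstat from sysstat (already pulled in via nix-shell above) should do; a sketch:

pidstat -w -p "$(pgrep -o caddy)" 5 3   # cswch/s = voluntary, nvcswch/s = involuntary context switches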

Any thoughts?

The odd thing: when I restart Caddy, CPU drops and does not shoot back up again the way I would expect if this were purely traffic-driven.

So restarting Caddy fixes the problem? The CPU doesn’t come back up again?

I don’t know anything about Nix or Hetzner’s hypervisor. Can you reproduce it without one of those abstractions?

CPU ramps up slowly after the restart.
If this were caused directly by traffic I would expect it to come back much more quickly.

I suspect all the k3s abstractions are not dealing too well with long-lived websocket connections - but I somehow need a way to confirm this.
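
One idea using data I already collect: each upgraded (websocket) connection gets its own pair of copy goroutines, so counting them in a full goroutine dump (the debug=2 curl I commented out earlier) should roughly track the number of live websocket connections:

curl -s "http://${CADDY_IP}:2019/debug/pprof/goroutine?debug=2" > go2.txt
grep -c 'switchProtocolCopier' go2.txt   # roughly 2 copy goroutines per upgraded connection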

Can I see connection stats in the Caddy metrics somehow? I didn't really find much in that direction.

curl -s http://$CADDY:2019/metrics | grep conn

This really makes me want to move Caddy closer to the metal. Still pondering how to share the ACME certs in a good way though. Everything I found seems to require xcaddy (consul, redis, other DBs).

There are Prometheus metrics you can enable: Monitoring Caddy with Prometheus metrics — Caddy Documentation
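
Per those docs, turning them on should just be the metrics server option in the global options block - a minimal sketch:

{
  servers {
    metrics
  }
}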

Just embrace xcaddy (or our download page) and use plugins when you need to :+1:
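
A custom build with a storage plugin is a one-liner anyway (the module path below is only an example - pick whichever storage plugin fits):

xcaddy build --with github.com/pberkel/caddy-storage-redis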
