tcurdt
(Torsten Curdt)
September 24, 2025, 1:15pm
1
This is a continuation of How to debug high CPU. Since the setup has not changed (see the link), I'll jump straight to the debugging.
Thanks for the replies in the other thread; somehow I did not get notified.
Back in August it felt like a reporting problem, but now it is having an impact.
With a peak of about 50-60 req/s, I don't feel we should have a CPU load like this.
It’s higher than the database and more than the backends themselves.
Caddy is acting pretty much only as a reverse proxy handling the TLS termination.
Is it just me or does this look excessive?
go tool pprof cpu.prof
File: caddy
Build ID: 8a32851908a303960579c849e7121ac8eb2325d5
Type: cpu
Time: 2025-09-24 14:48:25 CEST
Duration: 30.17s, Total samples = 82.67s (274.05%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 74.88s, 90.58% of 82.67s total
Dropped 516 nodes (cum <= 0.41s)
Showing top 10 nodes out of 54
flat flat% sum% cum cum%
70.04s 84.72% 84.72% 70.04s 84.72% internal/runtime/syscall.Syscall6
2.46s 2.98% 87.70% 2.46s 2.98% runtime.futex
0.43s 0.52% 88.22% 1.08s 1.31% runtime.exitsyscall
0.36s 0.44% 88.65% 0.72s 0.87% runtime.stealWork
0.33s 0.4% 89.05% 3.32s 4.02% runtime.netpoll
0.29s 0.35% 89.40% 38.65s 46.75% crypto/tls.(*Conn).Write
0.29s 0.35% 89.75% 7.59s 9.18% runtime.findRunnable
0.27s 0.33% 90.08% 38.05s 46.03% crypto/tls.(*Conn).writeRecordLocked
0.26s 0.31% 90.40% 73.53s 88.94% io.copyBuffer
0.15s 0.18% 90.58% 0.80s 0.97% crypto/internal/fips140/aes/gcm.seal
(pprof) top20 -cum
Showing nodes accounting for 71.80s, 86.85% of 82.67s total
Dropped 516 nodes (cum <= 0.41s)
Showing top 20 nodes out of 54
flat flat% sum% cum cum%
0 0% 0% 73.53s 88.94% io.Copy (inline)
0.26s 0.31% 0.31% 73.53s 88.94% io.copyBuffer
0 0% 0.31% 71.84s 86.90% github.com/caddyserver/caddy/v2/modules/caddyhttp.(*hijackedConn).ReadFrom
0 0% 0.31% 71.84s 86.90% github.com/caddyserver/caddy/v2/modules/caddyhttp/reverseproxy.switchProtocolCopier.copyFromBackend
70.04s 84.72% 85.04% 70.04s 84.72% internal/runtime/syscall.Syscall6
0.01s 0.012% 85.05% 68.97s 83.43% internal/poll.ignoringEINTRIO (inline)
0.09s 0.11% 85.16% 68.75s 83.16% syscall.Syscall
0.10s 0.12% 85.28% 67.25s 81.35% syscall.RawSyscall6
0.29s 0.35% 85.63% 38.65s 46.75% crypto/tls.(*Conn).Write
0.27s 0.33% 85.96% 38.05s 46.03% crypto/tls.(*Conn).writeRecordLocked
0.03s 0.036% 85.99% 36.83s 44.55% net.(*conn).Write
0.04s 0.048% 86.04% 36.80s 44.51% net.(*netFD).Write
0.10s 0.12% 86.16% 36.71s 44.41% internal/poll.(*FD).Write
0.04s 0.048% 86.21% 36.31s 43.92% syscall.Write (inline)
0.04s 0.048% 86.26% 36.27s 43.87% syscall.write
0.14s 0.17% 86.43% 35.92s 43.45% crypto/tls.(*Conn).write
0.04s 0.048% 86.48% 33.30s 40.28% net.(*conn).Read
0.06s 0.073% 86.55% 33.26s 40.23% net.(*netFD).Read
0.10s 0.12% 86.67% 33.19s 40.15% internal/poll.(*FD).Read
0.15s 0.18% 86.85% 33.04s 39.97% net/http.(*readWriteCloserBody).Read
Is there a way to see what URLs contribute the most? (we don’t really have an access log atm)
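I guess I could just turn on an access log per site; a minimal sketch of what that might look like (site address, backend and log path are placeholders):
example.com {
    # placeholder site; JSON logs are easy to grep/aggregate by URI later
    log {
        output file /var/log/caddy/access.log
        format json
    }
    reverse_proxy backend:8080
}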
How would you dig deeper?
What else would be of interest? I have this available:
curl -s "http://${CADDY_IP}:2019/debug/pprof/profile" > $DIR/cpu.prof
curl -s "http://${CADDY_IP}:2019/debug/pprof/goroutine?debug=1" > $DIR/go1.txt
#curl -s "http://${CADDY_IP}:2019/debug/pprof/goroutine?debug=2" > $DIR/go2.txt
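For digging deeper I would probably also grab a heap profile next to the CPU profile, and look at the CPU profile in the web UI; just a sketch using the same variables as above:
# heap profile, same admin endpoint as above
curl -s "http://${CADDY_IP}:2019/debug/pprof/heap" > $DIR/heap.prof

# browse cpu.prof in the browser (flame graph view); or, inside interactive
# pprof, `peek io.copyBuffer` shows who is feeding the copy loop
go tool pprof -http=:8081 cpu.prof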
matt
(Matt Holt)
September 26, 2025, 3:04pm
2
Can you post the profile? And yeah the goroutine dump is good too.
Also see:
This will help you troubleshoot on your own while waiting for others to help.
1 Like
tcurdt
(Torsten Curdt)
September 29, 2025, 1:42pm
3
@matt I had to wait for a busier time … but attached is the profile.
For some context: Caddy has about 10-15x the CPU usage of our database.
To me it looks like an IO bottleneck, but what would that be for a reverse proxy?
Any suggestion much appreciated.
https://files.vafer.org/profile.tgz
matt
(Matt Holt)
October 3, 2025, 9:13pm
4
Yeah man, that’s weird. 60 seconds spent deep in a syscall.
Something configured weirdly in your kernel? There’s a LOT of CPU time just spent writing buffers, in other words, I/O.
How’s your I/O hardware? Anything failing?
2 Likes
tcurdt
(Torsten Curdt)
October 4, 2025, 10:41am
5
It's a NixOS server running as a virtual machine on Hetzner (via a hypervisor). Caddy is running in k3s.
TBH I am not even sure where to start digging with all these layers.
Sometimes I really miss bare metal.
Right now we have little traffic so it’s a bit pointless … but this is what I would look at on the host system:
nix-shell -p iotop sysstat
iostat --human
Linux 6.12.19-hardened1 (blue-c-1) 10/04/2025 _x86_64_ (16 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2.9% 0.0% 2.6% 0.1% 0.0% 94.3%
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
sda 155.19 15.6M 220.4k 114.8k 159.3T 2.2T 1.1T
iotop -o
shows around 160 K/s
vmstat 1
shows close to idle
sudo smartctl -a /dev/sda
all virtual
I am wondering if the syscall time could actually be waiting rather than busy CPU?
But I guess I will have to wait for some traffic to dig deeper.
To test in this direction I added
servers {
    timeouts {
        read_body 10s
        read_header 5s
        write 30s
    }
}
to tighten timeouts.
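As far as I understand, the Go CPU profiler only samples on-CPU time, so a blocked syscall should not show up there, but to double-check I would compare user vs kernel CPU of the Caddy process itself with pidstat (from the sysstat package mentioned above; the process name in pgrep is an assumption and may differ inside k3s):
# user (%usr) vs kernel (%system) CPU of the caddy process, 10 samples at 1s
pidstat -u -p "$(pgrep -x caddy | head -1)" 1 10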
I am wondering if websockets could be the main driver for the behaviour.
But without dissecting this during high traffic times that’s just speculation.
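One way to get past the speculation even outside peak hours: re-enable the debug=2 goroutine dump from the script above and count the goroutines sitting in the websocket copy loops seen in the profile (there should be roughly two copier goroutines per proxied websocket connection):
curl -s "http://${CADDY_IP}:2019/debug/pprof/goroutine?debug=2" > $DIR/go2.txt
# each copier goroutine shows a switchProtocolCopier frame in its stack
grep -c 'switchProtocolCopier' $DIR/go2.txt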
tcurdt
(Torsten Curdt)
October 6, 2025, 9:14am
6
Under load:
avg-cpu: %user %nice %system %iowait %steal %idle
28.8% 0.0% 31.1% 0.1% 0.0% 40.0%
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sda 0.00 0.0k 0.00 0.0% 0.00 0.0k 69.00 312.0k 2.00 2.8% 0.96 4.5k 0.00 0.0k 0.00 0.0% 0.00 0.0k 17.00 0.24 0.07 1.1%
$ sudo iotop -o
shows more writes than expected … around 600K/s
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st gu
7 0 0 664404 3456648 21303808 0 0 16287 219 9405 6 3 3 94 0 0 0
8 0 0 646264 3456648 21303808 0 0 0 544 54608 47362 22 30 48 0 0 0
13 0 0 641832 3456648 21304204 0 0 0 940 56520 50195 20 32 47 0 0 0
7 0 0 627060 3456648 21304108 0 0 16 1012 50743 51557 23 31 46 0 0 0
12 0 0 636836 3456648 21304152 0 0 24 492 56306 51506 22 31 47 0 0 0
12 0 0 601968 3456648 21304392 0 0 0 536 60360 51603 31 31 38 0 0 0
12 0 0 549912 3456648 21304612 0 0 24 556 56148 48272 25 33 43 0 0 0
8 0 0 578808 3456648 21304440 0 0 0 204 52973 48609 23 30 47 0 0 0
7 0 0 579988 3456648 21304548 0 0 0 436 50653 54147 21 33 46 0 0 0
14 0 0 579132 3456648 21304716 0 0 0 536 64240 55177 35 31 34 0 0 0
8 0 0 576756 3456648 21304860 0 0 0 368 55395 55853 27 30 43 0 0 0
8 0 0 560448 3456648 21305036 0 0 0 516 51363 46865 32 31 37 0 0 0
6 0 0 599976 3456648 21304584 0 0 0 308 53645 46642 35 29 36 0 0 0
6 0 0 593848 3456648 21304640 0 0 8 1068 56547 44477 20 30 49 0 0 0
6 0 0 642732 3456648 21304656 0 0 0 440 53389 49895 18 31 51 0 0 0
8 0 0 635900 3456648 21304664 0 0 0 272 54679 48894 19 31 50 0 0 0
8 0 0 633384 3456648 21304736 0 0 40 1068 56049 46807 19 31 49 0 0 0
but overall I would say…
CPU
20-35% user plus 29-33% system CPU usage feels quite busy.
Only 34-51% idle and not much breathing room.
The high system/kernel CPU matches the profiling result with the system calls.
The 6-14 processes in the run queue indicate CPU contention.
Memory
No swap usage, no memory pressure
Looks OK.
IO
Minimal disk reads.
Moderate disk writes.
Not really IO bound.
System
50k-64k interrupts/sec seems high
44k-55k context switches/sec seems too high
Any thoughts?
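To check how much of that switching is Caddy itself, and whether the interrupts are mostly network softirqs, I would try something like this (pidstat from sysstat again; cswch/s is voluntary, nvcswch/s involuntary switches):
# per-process context switches for the caddy process
pidstat -w -p "$(pgrep -x caddy | head -1)" 1 10

# are the interrupts mostly network softirqs?
watch -n1 "grep -E 'NET_RX|NET_TX' /proc/softirqs"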
tcurdt
(Torsten Curdt)
October 6, 2025, 9:24am
7
The odd thing: when I restart Caddy, CPU drops and does not shoot right back up again, like I would expect if this were purely due to traffic.
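A slow ramp would also fit connections (and their goroutines) piling up rather than raw traffic; a crude way to watch for that, reusing the goroutine endpoint from the script above:
# log the total goroutine count once a minute; a steady climb after a restart
# would point at accumulating connections rather than request volume
while true; do
  printf '%s %s\n' "$(date -Is)" \
    "$(curl -s "http://${CADDY_IP}:2019/debug/pprof/goroutine?debug=1" | head -1)"
  sleep 60
done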
matt
(Matt Holt)
October 6, 2025, 9:55pm
8
So restarting Caddy fixes the problem? The CPU doesn’t come back up again?
I don’t know anything about Nix or Hetzner’s hypervisor. Can you reproduce it without one of those abstractions?
tcurdt
(Torsten Curdt)
October 6, 2025, 10:53pm
9
CPU ramps up slowly after the restart.
If this was caused by the traffic directly, I would expect it to come back much quicker.
I suspect the k3s abstractions are not dealing too well with long-lived websocket connections, but I somehow need a way to confirm this.
Can I see connection stats in the Caddy metrics somehow? I didn't really find much in that direction.
curl -s http://$CADDY:2019/metrics | grep conn
This really makes me want to move Caddy closer to the metal. Still pondering how to share the ACME certs in a good way, though. All I found seems to require xcaddy (consul, redis, other DBs).
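For what it's worth, the first thing I would try without any plugins is pointing both instances at the same storage path with the built-in file_system storage (my understanding is that instances sharing storage coordinate cert management; the path and the exact Caddyfile syntax here are assumptions I would still verify):
{
    storage file_system {
        # assumption: a mount both instances can read and write
        root /shared/caddy
    }
}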
matt
(Matt Holt)
October 7, 2025, 8:44pm
10
There’s Prometheus metrics you can enable: Monitoring Caddy with Prometheus metrics — Caddy Documentation
Just embrace xcaddy (or our download page) and use plugins when you need to
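For the Caddyfile, roughly this global option should enable the per-request metrics, which are then served on the admin endpoint you're already scraping (whether it's needed depends on the Caddy version):
{
    servers {
        # enable per-request HTTP metrics
        metrics
    }
}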
system
(system)
Closed
November 6, 2025, 8:45pm
11
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.