I’m only now noticing that your systemd config is non-standard. We ship a recommended one with our Debian package, with additional details here if you want to set it up manually:
It might be good to switch to using caddy run to reduce the variables at play here (most users use caddy run).
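To try that, you could stop the service and run Caddy in the foreground with your existing config. The path below assumes a Caddyfile at the Debian package's default location, so adjust it to your setup:

sudo systemctl stop caddy        # stop the unit so the ports are free
caddy run --config /etc/caddy/Caddyfile   # path assumes the Debian package default; adjust for your setup

That takes systemd out of the equation while we narrow this down.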
File: caddy
Type: cpu
Time: Apr 8, 2021 at 10:16pm (MDT)
Duration: 30mins, Total samples = 24.43s ( 1.36%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 14.58s, 59.68% of 24.43s total
Dropped 831 nodes (cum <= 0.12s)
Showing top 10 nodes out of 273
flat flat% sum% cum cum%
3.93s 16.09% 16.09% 4.49s 18.38% syscall.Syscall
3.14s 12.85% 28.94% 3.14s 12.85% crypto/aes.gcmAesEnc
2.01s 8.23% 37.17% 2.01s 8.23% golang.org/x/crypto/blowfish.encryptBlock
2.01s 8.23% 45.40% 2.01s 8.23% runtime.epollwait
1.57s 6.43% 51.82% 1.57s 6.43% runtime.futex
0.58s 2.37% 54.20% 0.58s 2.37% vendor/golang.org/x/crypto/curve25519.ladderstep
0.42s 1.72% 55.91% 0.42s 1.72% runtime.memmove
0.40s 1.64% 57.55% 0.40s 1.64% crypto/sha256.block
0.30s 1.23% 58.78% 4.27s 17.48% runtime.findrunnable
0.22s 0.9% 59.68% 1.30s 5.32% runtime.mallocgc
So over the 30-minute profile, it gathered 24s of CPU time, most of which was spent in system calls and AES/blowfish encryption. I also want to know the answer to Francis’ questions - what kind of hardware are you running this on (specifically)?
Sorry for the late answer; a pressing personal matter came up.
Anyway, the hardware Caddy is running on is a bare-metal, on-premises server: an HP ProLiant DL320e Gen8 with an Intel Xeon E3-1240 V2 CPU at 3.40GHz (4 cores / 8 threads with Hyper-Threading) and 8GB of RAM.
I hope this is not related, but if it is, then… maybe I should find another way to deal with it; I was okay with nginx, so perhaps I have to switch back…?
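For reference, a CPU profile like the one above can be collected straight from Caddy’s admin endpoint, assuming it is listening at the default localhost:2019 and exposes the debug/pprof handlers there:

go tool pprof "http://localhost:2019/debug/pprof/profile?seconds=1800"   # 1800 matches the 30-minute window above; adjust as needed

The seconds parameter controls how long the capture runs before pprof drops you into the interactive prompt shown above.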
Maybe we could jump on a video chat? Not today – probably tomorrow or something – but I can troubleshoot it with you, with you driving. (I won’t take over for liability reasons.)
I wonder if there’s something weird with the VPN connection or something else about the network. Can you reproduce the behavior in any other environment?
As for the network, I do not use a VPN myself; it is a public IP address range with a firewall enabled (because of the special environment I am in, every device on this network is given a public IP address).
I cannot reproduce the same behavior in any other environment (I tested in an AWS VM, but it is fast there).
After discussing on a call, we determined that this is probably because Caddy (actually, the Go standard library) is doing a graceful shutdown of the servers: http package - net/http - Go Packages
Shutdown gracefully shuts down the server without interrupting any active connections. Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down. If the provided context expires before the shutdown is complete, Shutdown returns the context’s error, otherwise it returns any error returned from closing the Server’s underlying Listener(s).
By default, no grace period is set, so there is no timeout. If a client holds onto a connection for 20+ minutes (for example, downloading a big file very slowly), the old server keeps that connection open and the reload waits that long to finish. This is an ideal user experience but maybe not so ideal for servers.
The grace period is configurable in JSON: JSON Config Structure - Caddy Documentation - and I bet setting it (say, 5 to 10 seconds, maybe 30 if you really want to) should cause reloads to finish closer to that amount of time.
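To sketch what that looks like in the JSON config (only the relevant part shown; the rest of your http app config stays as it is, and 10s is just an example value):

{
  "apps": {
    "http": {
      "grace_period": "10s"
    }
  }
}

With that set, a reload forcibly closes whatever connections are still active 10 seconds after the shutdown begins, instead of waiting indefinitely.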
It is not currently configurable in the Caddyfile, which would be a nice global option to have.