1. The problem I’m having:
Caddy was OOM-killed under heavy traffic. I’m looking to tune Caddy to prevent this in the future.
I’m on a small 1GB Linode server, and I have other things running on there. The kernel’s OOM killer killed Caddy when it grew to ~450MB of resident memory (I’ve included the syslog dump of it getting killed below). I don’t track exact stats, but when it died, I’m guessing I had around 2000-4000 people actively viewing the webpage.
On that page, people keep it open, and some JS updates the contents of their browser once a minute (all around the same time, though I try to add some smearing). The JS makes 2 calls, both just reverse_proxy’d to other apps (to domain2.example.com and tracker.domain2.example.com).
I’ve since added:
Restart=on-failure
RestartSec=10s
to my caddy.service file, so oom-kill won’t take my server down totally.
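For completeness, these lines can live in a systemd drop-in rather than in the packaged unit itself. A sketch of such a drop-in, with two hedged additions I’m considering: a `MemoryMax=` cap so the cgroup OOM killer limits the blast radius to Caddy alone, and a `GOMEMLIMIT` soft limit for the Go runtime’s garbage collector (the drop-in path and both values are illustrative guesses, not tested recommendations):

```ini
# Hypothetical drop-in: /etc/systemd/system/caddy.service.d/override.conf
[Service]
Restart=on-failure
RestartSec=10s
# Hard cap: at this limit the cgroup OOM killer kills only Caddy,
# instead of the global OOM killer picking a victim for the whole box.
# Value is a guess; size it to what the box can spare.
MemoryMax=300M
# Soft limit for the Go garbage collector (Go 1.19+); makes the runtime
# GC more aggressively as it approaches the cap. Value is a guess.
Environment=GOMEMLIMIT=250MiB
```

Applied with `systemctl daemon-reload` followed by `systemctl restart caddy`.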
What I’m looking for: when I’m getting a lot of traffic to reverse_proxy sites that cannot be cached, what is the best way to reduce memory usage on Caddy?
I was previously using nginx, and never had memory issues with similar traffic patterns on this server. Is Caddy’s memory profile just different enough that I’ll hit these issues? Can I do anything to prevent it?
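For example, is tightening server timeouts in the global options the right direction, so idle browser connections release their per-connection state sooner? A sketch extending my existing global block (the values are guesses, and I haven’t verified this actually reduces memory here):

```
{
	email ...@gmail.com
	servers {
		timeouts {
			read_header 10s
			idle 2m
		}
	}
}
```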
2. Error messages and/or full log output:
containerd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=-999
CPU: 0 PID: 216019 Comm: containerd Not tainted 6.3.0-1-amd64 #1 Debian 6.3.7-1
Hardware name: Linode Compute Instance/Standard PC (Q35 + ICH9, 2009), BIOS Not Specified
Mem-Info:
active_anon:80218 inactive_anon:87798 isolated_anon:0
active_file:13 inactive_file:50 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:11108 slab_unreclaimable:32726
mapped:1562 shmem:2033 pagetables:2564
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:12618 free_pcp:62 free_cma:0
Node 0 active_anon:320872kB inactive_anon:351192kB active_file:52kB inactive_file:200kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:6248kB dirty:0kB writeback:0kB shmem:8132kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 43008kB writeback_tmp:0kB kernel_stack:7456kB pagetables:10256kB sec_pagetables:0kB all_unreclaimable? yes
Node 0 DMA free:4372kB boost:0kB min:728kB low:908kB high:1088kB reserved_highatomic:0KB active_anon:5384kB inactive_anon:2460kB active_file:8kB inactive_file:156kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 908 908 908 908
Node 0 DMA32 free:46100kB boost:0kB min:44324kB low:55404kB high:66484kB reserved_highatomic:2048KB active_anon:315488kB inactive_anon:348732kB active_file:48kB inactive_file:36kB unevictable:0kB writepending:0kB present:1032040kB managed:968432kB mlocked:0kB bounce:0kB free_pcp:248kB local_pcp:248kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 17*4kB (ME) 22*8kB (ME) 14*16kB (ME) 12*32kB (UE) 3*64kB (E) 10*128kB (UME) 2*256kB (UM) 1*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 4372kB
Node 0 DMA32: 3117*4kB (ME) 968*8kB (UME) 614*16kB (UME) 258*32kB (UME) 122*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 46100kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
2236 total pagecache pages
143 pages in swap cache
Free swap = 0kB
Total swap = 524284kB
262008 pages RAM
0 pages HighMem/MovableOnly
16060 pages reserved
0 pages hwpoisoned
Tasks state (memory values in pages):
[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 412] 104 412 2393 64 61440 224 -900 dbus-daemon
[ 439] 0 439 197052 13440 270336 5887 0 myapp
[ 445] 0 445 4421 64 73728 448 0 systemd-logind
[ 533] 0 533 27743 55 114688 2880 0 unattended-upgr
[ 535] 0 535 1468 64 45056 32 0 agetty
[ 537] 0 537 1374 64 57344 0 0 agetty
[ 122749] 997 122749 324476 112442 1363968 30080 0 caddy
[ 179366] 0 179366 20570 76 172032 231 -250 systemd-journal
[ 179442] 101 179442 22532 64 77824 224 0 systemd-timesyn
[ 179480] 0 179480 6366 53 69632 320 -1000 systemd-udevd
[ 200786] 0 200786 10727 58 69632 160 0 master
[ 200788] 108 200788 10877 32 77824 160 0 qmgr
[ 200815] 108 200815 12262 32 86016 352 0 tlsmgr
[ 211233] 0 211233 370750 4315 438272 3360 -500 dockerd
[ 211276] 0 211276 289092 1745 167936 1114 -500 docker-proxy
[ 211292] 0 211292 307669 2233 192512 1424 -500 docker-proxy
[ 211325] 0 211325 180190 544 110592 192 -998 containerd-shim
[ 211345] 0 211345 180190 510 110592 192 -998 containerd-shim
[ 211358] 0 211358 180126 543 110592 1214 -998 containerd-shim
[ 211374] 0 211374 180126 544 110592 1181 -998 containerd-shim
[ 211414] 101 211414 14041 64 147456 160 0 exim
[ 211421] 1000 211421 363823 17477 905216 35596 0 beam.smp
[ 211437] 70 211437 42561 96 122880 384 0 postgres
[ 211445] 101 211445 793595 9467 2654208 16457 0 clickhouse-serv
[ 211721] 70 211721 42593 96 135168 416 0 postgres
[ 211722] 70 211722 42578 96 118784 384 0 postgres
[ 211723] 70 211723 42570 96 110592 384 0 postgres
[ 211724] 70 211724 42744 160 122880 448 0 postgres
[ 211725] 70 211725 6274 96 94208 384 0 postgres
[ 211727] 70 211727 42700 96 110592 448 0 postgres
[ 215915] 1000 215915 221 32 36864 0 0 epmd
[ 215919] 1000 215919 200 32 45056 0 0 erl_child_setup
[ 215959] 1000 215959 208 32 36864 0 0 inet_gethost
[ 215960] 1000 215960 208 32 36864 0 0 inet_gethost
[ 215961] 1000 215961 208 32 36864 0 0 inet_gethost
[ 215976] 70 215976 43495 293 143360 950 0 postgres
[ 215977] 70 215977 43478 155 143360 1078 0 postgres
[ 215978] 70 215978 43740 240 147456 1174 0 postgres
[ 215979] 70 215979 43467 245 143360 1014 0 postgres
[ 215980] 70 215980 43483 346 143360 950 0 postgres
[ 215981] 70 215981 43490 216 143360 1046 0 postgres
[ 215982] 70 215982 43495 264 143360 982 0 postgres
[ 215983] 70 215983 43489 252 143360 982 0 postgres
[ 215984] 70 215984 43474 275 143360 1014 0 postgres
[ 215985] 70 215985 43475 238 143360 982 0 postgres
[ 215986] 70 215986 42914 97 131072 598 0 postgres
[ 216005] 0 216005 1654 64 53248 32 0 cron
[ 216008] 0 216008 2103 64 53248 768 0 haveged
[ 216010] 0 216010 55453 64 73728 352 0 rsyslogd
[ 216016] 0 216016 283896 2023 266240 2272 -999 containerd
[ 216017] 0 216017 3861 64 69632 288 -1000 sshd
[ 216025] 109 216025 5825 58 77824 320 0 opendkim
[ 216027] 109 216027 83688 61 122880 1184 0 opendkim
[ 269335] 108 269335 10829 32 77824 160 0 anvil
[ 269662] 108 269662 10829 32 69632 160 0 pickup
[ 269762] 70 269762 42897 640 126976 321 0 postgres
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=containerd.service,mems_allowed=0,global_oom,task_memcg=/system.slice/caddy.service,task=caddy,pid=122749,uid=997
Out of memory: Killed process 122749 (caddy) total-vm:1297904kB, anon-rss:449640kB, file-rss:128kB, shmem-rss:0kB, UID:997 pgtables:1332kB oom_score_adj:0
audit: type=1131 audit(1689673878.657:42408): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=caddy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
caddy.service: A process of this unit has been killed by the OOM killer.
caddy.service: Main process exited, code=killed, status=9/KILL
caddy.service: Failed with result 'oom-kill'.
caddy.service: Consumed 24min 12.047s CPU time.
3. Caddy version:
v2.6.4 h1:2hwYqiRwk1tf3VruhMpLcYTg+11fCdr8S3jhNAdnPy8=
4. How I installed and ran Caddy:
Via https://caddyserver.com/docs/install#debian-ubuntu-raspbian, using systemd to start it up.
a. System environment:
- Linode Nanode 1 GB (shared 1-cpu 1GB memory, configured with 512MB swap)
- debian-testing
- systemd
b. Command:
/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
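In case it helps, I can grab a heap profile the next time memory climbs — Caddy’s admin API serves Go pprof profiles (assuming the default admin address localhost:2019; commands need the live server, so this is just the procedure I’d follow):

```shell
# Grab a heap profile from the admin endpoint and summarize it.
curl -s localhost:2019/debug/pprof/heap > heap.pprof
go tool pprof -top heap.pprof
```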
c. Service/unit/compose file:
d. My complete Caddy config:
I host ~5 domains here, so this is the full config. Below that are the main configs for this issue.
{
# auto_https off
email ...@gmail.com
}
(static) {
@static {
file
path *.ico *.css *.js *.gif *.jpg *.jpeg *.png *.svg *.webp *.woff *.woff2 *.json
}
header @static Cache-Control max-age=5184000
}
(security) {
header {
# enable HSTS
Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"
# disable clients from sniffing the media type
X-Content-Type-Options nosniff
# keep referrer data off of HTTP connections
Referrer-Policy no-referrer-when-downgrade
}
}
(errorfiles) {
handle_errors {
@custom_err file /{err.status_code}.html /err.html
handle @custom_err {
rewrite * {file_match.relative}
file_server
}
respond "{err.status_code} {err.status_text}"
}
}
(logs) {
log {
output file /var/log/caddy/{args.0}.log
}
}
www.domain1.example.com {
# TODO
# import security
redir https://domain1.example.com{uri}
}
http:// {
respond "Hi!"
}
domain1.example.com {
root * /var/www/domain1.example.com/
encode zstd gzip
file_server
# import logs domain1.example.com
import static
handle_errors {
@custom_err file /{err.status_code}.html /err.html
handle @custom_err {
rewrite * {file_match.relative}
file_server
}
respond "{err.status_code} {err.status_text}"
}
}
tracker.domain2.example.com,
tracker.domain1.example.com {
rewrite /pl.js /js/plausible.js
reverse_proxy localhost:8000
header /pl.js {
Cache-Control "max-age=604800, public, must-revalidate" # 7d
}
}
www.domain2.example.com {
redir https://domain2.example.com{uri}
}
domain2.example.com {
reverse_proxy localhost:8001
}
www.domain3.com {
redir https://domain3.com{uri}
}
domain3.com {
root * /var/www/domain3.com/
encode zstd gzip
file_server
import static
import errorfiles
}
www.domain4.com {
redir https://domain4.com{uri}
}
domain4.com {
root * /var/www/domain4.com/
encode zstd gzip
file_server
import static
import errorfiles
}
www.domain5.com {
redir https://domain5.com{uri}
}
domain5.com {
root * /var/www/domain5.com/
encode zstd gzip
file_server
import static
import errorfiles
}
Copy/paste of the relevant configs:
tracker.domain2.example.com {
rewrite /pl.js /js/plausible.js
reverse_proxy localhost:8000
header /pl.js {
Cache-Control "max-age=604800, public, must-revalidate" # 7d
}
}
www.domain2.example.com {
redir https://domain2.example.com{uri}
}
domain2.example.com {
reverse_proxy localhost:8001
}