Are there ways to tune Caddy to reduce memory usage?

1. The problem I’m having:

Caddy was oom-kill’ed due to a lot of traffic. I’m looking to tune Caddy to prevent this in the future.

I’m on a small 1GB Linode server, and I have other things running on there. The kernel OOM killer killed Caddy when it got up to ~450MB in size (I’ve included the syslog dump of it getting killed below). I don’t track exact stats, but when it died, I’m guessing I had around 2000-4000 people actively viewing the webpage.

On that page, people keep it open, and some JS updates the contents of their browser once a minute (all around the same time, though I try to add some smearing). The JS makes 2 calls, both to sites that are just reverse_proxy’d to other apps (domain2.example.com and tracker.domain2.example.com).

I’ve since added:

Restart=on-failure
RestartSec=10s

to my caddy.service file, so an OOM kill won’t take my server down for good.

What I’m looking for: when I’m getting a lot of traffic to reverse_proxy sites that cannot be cached, what is the best way to reduce memory usage on Caddy?

I was previously using nginx, and never had memory issues with similar traffic patterns on this server. Is Caddy’s memory profile just different enough that I’ll hit these issues? Can I do anything to prevent it?

2. Error messages and/or full log output:

containerd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=-999
CPU: 0 PID: 216019 Comm: containerd Not tainted 6.3.0-1-amd64 #1  Debian 6.3.7-1
Hardware name: Linode Compute Instance/Standard PC (Q35 + ICH9, 2009), BIOS Not Specified
Mem-Info:
active_anon:80218 inactive_anon:87798 isolated_anon:0
 active_file:13 inactive_file:50 isolated_file:0
 unevictable:0 dirty:0 writeback:0
 slab_reclaimable:11108 slab_unreclaimable:32726
 mapped:1562 shmem:2033 pagetables:2564
 sec_pagetables:0 bounce:0
 kernel_misc_reclaimable:0
 free:12618 free_pcp:62 free_cma:0
Node 0 active_anon:320872kB inactive_anon:351192kB active_file:52kB inactive_file:200kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:6248kB dirty:0kB writeback:0kB shmem:8132kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 43008kB writeback_tmp:0kB kernel_stack:7456kB pagetables:10256kB sec_pagetables:0kB all_unreclaimable? yes
Node 0 DMA free:4372kB boost:0kB min:728kB low:908kB high:1088kB reserved_highatomic:0KB active_anon:5384kB inactive_anon:2460kB active_file:8kB inactive_file:156kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 908 908 908 908
Node 0 DMA32 free:46100kB boost:0kB min:44324kB low:55404kB high:66484kB reserved_highatomic:2048KB active_anon:315488kB inactive_anon:348732kB active_file:48kB inactive_file:36kB unevictable:0kB writepending:0kB present:1032040kB managed:968432kB mlocked:0kB bounce:0kB free_pcp:248kB local_pcp:248kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 17*4kB (ME) 22*8kB (ME) 14*16kB (ME) 12*32kB (UE) 3*64kB (E) 10*128kB (UME) 2*256kB (UM) 1*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 4372kB
Node 0 DMA32: 3117*4kB (ME) 968*8kB (UME) 614*16kB (UME) 258*32kB (UME) 122*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 46100kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
2236 total pagecache pages
143 pages in swap cache
Free swap  = 0kB
Total swap = 524284kB
262008 pages RAM
0 pages HighMem/MovableOnly
16060 pages reserved
0 pages hwpoisoned
Tasks state (memory values in pages):
[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[    412]   104   412     2393       64    61440      224          -900 dbus-daemon
[    439]     0   439   197052    13440   270336     5887             0 myapp
[    445]     0   445     4421       64    73728      448             0 systemd-logind
[    533]     0   533    27743       55   114688     2880             0 unattended-upgr
[    535]     0   535     1468       64    45056       32             0 agetty
[    537]     0   537     1374       64    57344        0             0 agetty
[ 122749]   997 122749   324476   112442  1363968    30080             0 caddy
[ 179366]     0 179366    20570       76   172032      231          -250 systemd-journal
[ 179442]   101 179442    22532       64    77824      224             0 systemd-timesyn
[ 179480]     0 179480     6366       53    69632      320         -1000 systemd-udevd
[ 200786]     0 200786    10727       58    69632      160             0 master
[ 200788]   108 200788    10877       32    77824      160             0 qmgr
[ 200815]   108 200815    12262       32    86016      352             0 tlsmgr
[ 211233]     0 211233   370750     4315   438272     3360          -500 dockerd
[ 211276]     0 211276   289092     1745   167936     1114          -500 docker-proxy
[ 211292]     0 211292   307669     2233   192512     1424          -500 docker-proxy
[ 211325]     0 211325   180190      544   110592      192          -998 containerd-shim
[ 211345]     0 211345   180190      510   110592      192          -998 containerd-shim
[ 211358]     0 211358   180126      543   110592     1214          -998 containerd-shim
[ 211374]     0 211374   180126      544   110592     1181          -998 containerd-shim
[ 211414]   101 211414    14041       64   147456      160             0 exim
[ 211421]  1000 211421   363823    17477   905216    35596             0 beam.smp
[ 211437]    70 211437    42561       96   122880      384             0 postgres
[ 211445]   101 211445   793595     9467  2654208    16457             0 clickhouse-serv
[ 211721]    70 211721    42593       96   135168      416             0 postgres
[ 211722]    70 211722    42578       96   118784      384             0 postgres
[ 211723]    70 211723    42570       96   110592      384             0 postgres
[ 211724]    70 211724    42744      160   122880      448             0 postgres
[ 211725]    70 211725     6274       96    94208      384             0 postgres
[ 211727]    70 211727    42700       96   110592      448             0 postgres
[ 215915]  1000 215915      221       32    36864        0             0 epmd
[ 215919]  1000 215919      200       32    45056        0             0 erl_child_setup
[ 215959]  1000 215959      208       32    36864        0             0 inet_gethost
[ 215960]  1000 215960      208       32    36864        0             0 inet_gethost
[ 215961]  1000 215961      208       32    36864        0             0 inet_gethost
[ 215976]    70 215976    43495      293   143360      950             0 postgres
[ 215977]    70 215977    43478      155   143360     1078             0 postgres
[ 215978]    70 215978    43740      240   147456     1174             0 postgres
[ 215979]    70 215979    43467      245   143360     1014             0 postgres
[ 215980]    70 215980    43483      346   143360      950             0 postgres
[ 215981]    70 215981    43490      216   143360     1046             0 postgres
[ 215982]    70 215982    43495      264   143360      982             0 postgres
[ 215983]    70 215983    43489      252   143360      982             0 postgres
[ 215984]    70 215984    43474      275   143360     1014             0 postgres
[ 215985]    70 215985    43475      238   143360      982             0 postgres
[ 215986]    70 215986    42914       97   131072      598             0 postgres
[ 216005]     0 216005     1654       64    53248       32             0 cron
[ 216008]     0 216008     2103       64    53248      768             0 haveged
[ 216010]     0 216010    55453       64    73728      352             0 rsyslogd
[ 216016]     0 216016   283896     2023   266240     2272          -999 containerd
[ 216017]     0 216017     3861       64    69632      288         -1000 sshd
[ 216025]   109 216025     5825       58    77824      320             0 opendkim
[ 216027]   109 216027    83688       61   122880     1184             0 opendkim
[ 269335]   108 269335    10829       32    77824      160             0 anvil
[ 269662]   108 269662    10829       32    69632      160             0 pickup
[ 269762]    70 269762    42897      640   126976      321             0 postgres
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=containerd.service,mems_allowed=0,global_oom,task_memcg=/system.slice/caddy.service,task=caddy,pid=122749,uid=997
Out of memory: Killed process 122749 (caddy) total-vm:1297904kB, anon-rss:449640kB, file-rss:128kB, shmem-rss:0kB, UID:997 pgtables:1332kB oom_score_adj:0
audit: type=1131 audit(1689673878.657:42408): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=caddy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
caddy.service: A process of this unit has been killed by the OOM killer.
caddy.service: Main process exited, code=killed, status=9/KILL
caddy.service: Failed with result 'oom-kill'.
caddy.service: Consumed 24min 12.047s CPU time.

3. Caddy version:

v2.6.4 h1:2hwYqiRwk1tf3VruhMpLcYTg+11fCdr8S3jhNAdnPy8=

4. How I installed and ran Caddy:

https://caddyserver.com/docs/install#debian-ubuntu-raspbian, using systemd to start it up.

a. System environment:

  • Linode Nanode 1 GB (shared 1-cpu 1GB memory, configured with 512MB swap)
  • debian-testing
  • systemd

b. Command:

/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile

c. Service/unit/compose file:

d. My complete Caddy config:

I host ~5 domains here, so this is the full config. Below it I’ve copied just the configs relevant to this issue.

{
        #       auto_https off
        email ...@gmail.com
}

(static) {
        @static {
                file
                path *.ico *.css *.js *.gif *.jpg *.jpeg *.png *.svg *.webp *.woff *.woff2 *.json
        }
        header @static Cache-Control max-age=5184000
}

(security) {
        header {
                # enable HSTS
                Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"
                # disable clients from sniffing the media type
                X-Content-Type-Options nosniff
                # keep referrer data off of HTTP connections
                Referrer-Policy no-referrer-when-downgrade
        }
}

(errorfiles) {
        handle_errors {
                @custom_err file /{err.status_code}.html /err.html
                handle @custom_err {
                        rewrite * {file_match.relative}
                        file_server
                }
                respond "{err.status_code} {err.status_text}"
        }
}

(logs) {
        log {
                output file /var/log/caddy/{args.0}.log
        }
}

www.domain1.example.com {
        # TODO
        # import security
        redir https://domain1.example.com{uri}
}

http:// {
        respond "Hi!"
}

domain1.example.com {
        root * /var/www/domain1.example.com/
        encode zstd gzip
        file_server
        # import logs domain1.example.com
        import static
        handle_errors {
                @custom_err file /{err.status_code}.html /err.html
                handle @custom_err {
                        rewrite * {file_match.relative}
                        file_server
                }
                respond "{err.status_code} {err.status_text}"
        }
}

tracker.domain2.example.com,
tracker.domain1.example.com {
        rewrite /pl.js /js/plausible.js
        reverse_proxy localhost:8000
        header /pl.js {
                Cache-Control: "Cache-Control max-age=604800, public, must-revalidate" #7d
        }
}

www.domain2.example.com {
        redir https://domain2.example.com{uri}
}

domain2.example.com {
        reverse_proxy localhost:8001
}

www.domain3.com {
        redir https://domain3.com{uri}
}

domain3.com {
        root * /var/www/domain3.com/
        encode zstd gzip
        file_server
        import static
        import errorfiles
}

www.domain4.com {
        redir https://domain4.com{uri}
}

domain4.com {
        root * /var/www/domain4.com/
        encode zstd gzip
        file_server
        import static
        import errorfiles
}

www.domain5.com {
        redir https://domain5.com{uri}
}

domain5.com {
        root * /var/www/domain5.com/
        encode zstd gzip
        file_server
        import static
        import errorfiles
}

Copy/paste of the relevant configs:

tracker.domain2.example.com {
        rewrite /pl.js /js/plausible.js
        reverse_proxy localhost:8000
        header /pl.js {
                Cache-Control: "Cache-Control max-age=604800, public, must-revalidate" #7d
        }
}

www.domain2.example.com {
        redir https://domain2.example.com{uri}
}

domain2.example.com {
        reverse_proxy localhost:8001
}

5. Links to relevant resources:

Can you inspect the memory use climbing while it’s happening?

Go to localhost:2019/debug/pprof on the instance and follow the link to the heap profile. The goroutine dump could be useful too. That’ll tell us where the code is at and what is making lots of allocations.
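
For example, something like this (assuming the default admin address on localhost:2019) will save them to files you can attach here:

# heap profile (what's holding memory)
curl -s -o heap.pprof "http://localhost:2019/debug/pprof/heap"
# goroutine stack dump, in readable text form
curl -s -o goroutines.txt "http://localhost:2019/debug/pprof/goroutine?debug=1"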

But since the provided config has been modified from its original, I can’t trust that that’s the real config. We’ll need the full, unmodified config to be able to help you and make suggestions. Otherwise all we can do is guess, which won’t be helpful. That memory and goroutine profile will also be necessary if we want to reduce memory usage.

If it works without memory issues on nginx, providing the full unredacted nginx config would also be useful to see if we’re comparing apples to apples.

Thanks

(This is a repost of “Are there ways to tune Caddy to reduce memory usage?”, now with proper configs.)

1. The problem I’m having:

My Caddy was oom-kill’ed due to a lot of traffic. I’m looking to tune Caddy to prevent this in the future.

I’m on a 2GB Linode server, and I have other things running on there. The kernel OOM killer killed Caddy when it got up to ~1.2GB in size (I’ve included the syslog dump of it getting killed below). I don’t track exact stats, but when it died, I’m guessing I had around 2000-4000 people actively viewing the webpage.

On that page (https://is.xivup.com), people keep it open, and some JS updates the contents of their browser once a minute (all around the same time for all users, though I try to add some smearing). The JS makes 2 calls to 2 sites that are just reverse_proxy’d: is.xivup.com and pls.xivup.com. The first is my Go app (the ffxiv process in the logs below), and the second is Plausible (which has a number of processes).

My normal traffic is only ~30 people on it at any given time. During certain events (which only happen every few months), it spikes to 100x or more, meaning recreating these conditions is difficult.

After the last post, I added the below to my systemd unit, which at least mitigated the issue (Caddy auto-restarted within a few seconds and didn’t OOM-die again):

Restart=on-failure
RestartSec=10s

What I’m looking for: when I’m getting a lot of traffic to reverse_proxy sites that cannot be cached, what is the best way to reduce memory usage on Caddy?

I was previously using nginx, and never had memory issues with similar traffic patterns on this server (I don’t have a like-for-like config, as I put some zero-traffic sites on this server when I migrated to Caddy, but I may create one to see how nginx behaves in comparison). Is Caddy’s memory profile just different enough that I’ll hit these issues? Can I do anything to prevent it?

2. Error messages and/or full log output:

2023-10-07T15:08:22.401160-06:00   kernel: [200963.027484] containerd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=-999
2023-10-07T15:08:22.402156-06:00   kernel: [200963.029263] CPU: 0 PID: 594 Comm: containerd Not tainted 6.5.0-1-amd64 #1  Debian 6.5.3-1
2023-10-07T15:08:22.402160-06:00   kernel: [200963.030124] Hardware name: Linode Compute Instance/Standard PC (Q35 + ICH9, 2009), BIOS Not Specified
2023-10-07T15:08:22.402193-06:00   kernel: [200963.031150] Call Trace:
2023-10-07T15:08:22.402198-06:00   kernel: [200963.031488]  <TASK>
2023-10-07T15:08:22.402199-06:00   kernel: [200963.031731]  dump_stack_lvl+0x47/0x60
2023-10-07T15:08:22.402200-06:00   kernel: [200963.032154]  dump_header+0x4a/0x240
2023-10-07T15:08:22.402215-06:00   kernel: [200963.032611]  oom_kill_process+0xf9/0x190
2023-10-07T15:08:22.402216-06:00   kernel: [200963.033053]  out_of_memory+0x256/0x540
2023-10-07T15:08:22.402296-06:00   kernel: [200963.033604]  __alloc_pages_slowpath.constprop.0+0xa11/0xd30
2023-10-07T15:08:22.402298-06:00   kernel: [200963.034393]  __alloc_pages+0x30b/0x330
2023-10-07T15:08:22.402299-06:00   kernel: [200963.034830]  folio_alloc+0x1b/0x50
2023-10-07T15:08:22.402299-06:00   kernel: [200963.035248]  __filemap_get_folio+0xca/0x240
2023-10-07T15:08:22.402300-06:00   kernel: [200963.035753]  filemap_fault+0x14b/0x9f0
2023-10-07T15:08:22.402307-06:00   kernel: [200963.036176]  ? filemap_map_pages+0x2d7/0x550
2023-10-07T15:08:22.402308-06:00   kernel: [200963.036696]  __do_fault+0x33/0x130
2023-10-07T15:08:22.402308-06:00   kernel: [200963.037122]  do_fault+0x248/0x3d0
2023-10-07T15:08:22.402309-06:00   kernel: [200963.037537]  __handle_mm_fault+0x65b/0xbb0
2023-10-07T15:08:22.402309-06:00   kernel: [200963.038004]  handle_mm_fault+0x155/0x350
2023-10-07T15:08:22.402310-06:00   kernel: [200963.038458]  do_user_addr_fault+0x216/0x640
2023-10-07T15:08:22.402311-06:00   kernel: [200963.038944]  ? kvm_read_and_reset_apf_flags+0x43/0x60
2023-10-07T15:08:22.402311-06:00   kernel: [200963.039501]  exc_page_fault+0x7f/0x180
2023-10-07T15:08:22.402319-06:00   kernel: [200963.039910]  asm_exc_page_fault+0x26/0x30
2023-10-07T15:08:22.402320-06:00   kernel: [200963.040347] RIP: 0033:0x55fe035236e0
2023-10-07T15:08:22.402320-06:00   kernel: [200963.040743] Code: Unable to access opcode bytes at 0x55fe035236b6.
2023-10-07T15:08:22.402333-06:00   kernel: [200963.041589] RSP: 002b:000000c0006f6e40 EFLAGS: 00010212
2023-10-07T15:08:22.402334-06:00   kernel: [200963.042211] RAX: 000000c00022c2d0 RBX: 000055fe035236e0 RCX: 000000c00022c2d0
2023-10-07T15:08:22.402335-06:00   kernel: [200963.043078] RDX: 000000c000577620 RSI: 000000c000239458 RDI: 0000000000000000
2023-10-07T15:08:22.402335-06:00   kernel: [200963.043897] RBP: 000000c0006f6fa8 R08: 0000000000000000 R09: 0000000000000000
2023-10-07T15:08:22.402336-06:00   kernel: [200963.044737] R10: 000000c0006d65e8 R11: 0000000000685ce1 R12: 0000000000000002
2023-10-07T15:08:22.402336-06:00   kernel: [200963.045554] R13: 0000000009fce388 R14: 000000c00012cea0 R15: 000000c00004d200
2023-10-07T15:08:22.402337-06:00   kernel: [200963.046414]  </TASK>
2023-10-07T15:08:22.402337-06:00   kernel: [200963.047042] Mem-Info:
2023-10-07T15:08:22.402345-06:00   kernel: [200963.047352] active_anon:262473 inactive_anon:90634 isolated_anon:0
2023-10-07T15:08:22.402345-06:00   kernel: [200963.047352]  active_file:1 inactive_file:43 isolated_file:0
2023-10-07T15:08:22.402346-06:00   kernel: [200963.047352]  unevictable:0 dirty:44 writeback:0
2023-10-07T15:08:22.402346-06:00   kernel: [200963.047352]  slab_reclaimable:10347 slab_unreclaimable:15602
2023-10-07T15:08:22.402347-06:00   kernel: [200963.047352]  mapped:1193 shmem:1377 pagetables:2918
2023-10-07T15:08:22.402347-06:00   kernel: [200963.047352]  sec_pagetables:0 bounce:0
2023-10-07T15:08:22.402348-06:00   kernel: [200963.047352]  kernel_misc_reclaimable:0
2023-10-07T15:08:22.402348-06:00   kernel: [200963.047352]  free:16886 free_pcp:280 free_cma:0
2023-10-07T15:08:22.402349-06:00   kernel: [200963.052632] Node 0 active_anon:1049892kB inactive_anon:362536kB active_file:4kB inactive_file:172kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4772kB dirty:176kB writeback:0kB shmem:5508kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 860160kB writeback_tmp:0kB kernel_stack:8960kB pagetables:11672kB sec_pagetables:0kB all_unreclaimable? no
2023-10-07T15:08:22.402350-06:00   kernel: [200963.056806] Node 0 DMA free:7984kB boost:0kB min:348kB low:432kB high:516kB reserved_highatomic:0KB active_anon:5304kB inactive_anon:120kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
2023-10-07T15:08:22.402352-06:00   kernel: [200963.060057] lowmem_reserve[]: 0 1910 1910 1910 1910
2023-10-07T15:08:22.402353-06:00   kernel: [200963.060628] Node 0 DMA32 free:59112kB boost:18432kB min:63136kB low:74312kB high:85488kB reserved_highatomic:18432KB active_anon:1044588kB inactive_anon:362416kB active_file:4kB inactive_file:172kB unevictable:0kB writepending:176kB present:2080616kB managed:1996876kB mlocked:0kB bounce:0kB free_pcp:1568kB local_pcp:1568kB free_cma:0kB
2023-10-07T15:08:22.402354-06:00   kernel: [200963.064431] lowmem_reserve[]: 0 0 0 0 0
2023-10-07T15:08:22.402354-06:00   kernel: [200963.065122] Node 0 DMA: 6*4kB (UM) 4*8kB (UM) 6*16kB (UM) 13*32kB (UM) 6*64kB (UM) 3*128kB (UM) 0*256kB 1*512kB (U) 0*1024kB 3*2048kB (M) 0*4096kB = 7992kB
2023-10-07T15:08:22.402355-06:00   kernel: [200963.066708] Node 0 DMA32: 5390*4kB (UMEH) 1452*8kB (UMEH) 1487*16kB (UMEH) 59*32kB (MH) 4*64kB (MH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 59112kB
2023-10-07T15:08:22.402356-06:00   kernel: [200963.068325] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
2023-10-07T15:08:22.402356-06:00   kernel: [200963.069722] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
2023-10-07T15:08:22.402357-06:00   kernel: [200963.070680] 3456 total pagecache pages
2023-10-07T15:08:22.402357-06:00   kernel: [200963.071117] 2025 pages in swap cache
2023-10-07T15:08:22.402358-06:00   kernel: [200963.071526] Free swap  = 220kB
2023-10-07T15:08:22.402358-06:00   kernel: [200963.071881] Total swap = 524284kB
2023-10-07T15:08:22.402359-06:00   kernel: [200963.072306] 524152 pages RAM
2023-10-07T15:08:22.402359-06:00   kernel: [200963.072682] 0 pages HighMem/MovableOnly
2023-10-07T15:08:22.402360-06:00   kernel: [200963.073384] 21093 pages reserved
2023-10-07T15:08:22.402361-06:00   kernel: [200963.073814] 0 pages hwpoisoned
2023-10-07T15:08:22.402361-06:00   kernel: [200963.074171] Tasks state (memory values in pages):
2023-10-07T15:08:22.402365-06:00   kernel: [200963.074786] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
2023-10-07T15:08:22.402365-06:00   kernel: [200963.075801] [    539]     0   539     2103       47    61440     1248             0 haveged
2023-10-07T15:08:22.402366-06:00   kernel: [200963.076763] [    545]   997   545   622782   300975  2691072     8448             0 caddy
2023-10-07T15:08:22.402366-06:00   kernel: [200963.077961] [    548]     0   548     1657       64    53248        0             0 cron
2023-10-07T15:08:22.402367-06:00   kernel: [200963.078987] [    549]   104   549     2400      160    53248       96          -900 dbus-daemon
2023-10-07T15:08:22.402367-06:00   kernel: [200963.079954] [    551]     0   551   309277     9760   192512     1504             0 ffxiv
2023-10-07T15:08:22.402371-06:00   kernel: [200963.080834] [    554]     0   554    55468      416    77824        0             0 rsyslogd
2023-10-07T15:08:22.402372-06:00   kernel: [200963.081986] [    558]     0   558     4356      128    69632      192             0 systemd-logind
2023-10-07T15:08:22.402372-06:00   kernel: [200963.083047] [    570]     0   570     1474       64    49152       32             0 agetty
2023-10-07T15:08:22.402373-06:00   kernel: [200963.083971] [    571]     0   571     1379       64    49152       32             0 agetty
2023-10-07T15:08:22.402374-06:00   kernel: [200963.085194] [    577]     0   577   339043     3768   286720     2432          -999 containerd
2023-10-07T15:08:22.402374-06:00   kernel: [200963.086169] [    580]   109   580     5946       88    81920      288             0 opendkim
2023-10-07T15:08:22.402375-06:00   kernel: [200963.087266] [    582]   109   582    83809      771   126976      768             0 opendkim
2023-10-07T15:08:22.402376-06:00   kernel: [200963.088231] [    592]     0   592     3889      224    73728      160         -1000 sshd
2023-10-07T15:08:22.402376-06:00   kernel: [200963.089455] [    601]     0   601    27802     1197   126976     1696             0 unattended-upgr
2023-10-07T15:08:22.402377-06:00   kernel: [200963.090520] [    620]     0   620   371835     4706   438272     5760          -500 dockerd
2023-10-07T15:08:22.402378-06:00   kernel: [200963.091515] [    766]     0   766   252018       48   131072     2176          -500 docker-proxy
2023-10-07T15:08:22.402378-06:00   kernel: [200963.092575] [    792]     0   792   288884       36   143360     2720          -500 docker-proxy
2023-10-07T15:08:22.402379-06:00   kernel: [200963.093992] [    915]     0   915   179899      522   114688     1984          -998 containerd-shim
2023-10-07T15:08:22.402379-06:00   kernel: [200963.095055] [    987]     0   987   179899      497   110592     1824          -998 containerd-shim
2023-10-07T15:08:22.402383-06:00   kernel: [200963.096097] [   1017]     0  1017   179963      489   106496     1760          -998 containerd-shim
2023-10-07T15:08:22.402384-06:00   kernel: [200963.097480] [   1023]     0  1023   179899      508   106496     1696          -998 containerd-shim
2023-10-07T15:08:22.402385-06:00   kernel: [200963.098615] [   1077]   101  1077    14040      128   143360       96             0 exim
2023-10-07T15:08:22.402385-06:00   kernel: [200963.099628] [   1091]     0  1091    10732      133    73728       64             0 master
2023-10-07T15:08:22.402386-06:00   kernel: [200963.100588] [   1098]   108  1098    10885      128    77824       96             0 qmgr
2023-10-07T15:08:22.402386-06:00   kernel: [200963.101789] [   1117]   101  1117   953854     7223  2863104    25567             0 clickhouse-serv
2023-10-07T15:08:22.402387-06:00   kernel: [200963.102936] [   1127]  1000  1127   352962    16997   868352    43264             0 beam.smp
2023-10-07T15:08:22.402387-06:00   kernel: [200963.104035] [   1135]    70  1135    42561      320   122880      352             0 postgres
2023-10-07T15:08:22.402388-06:00   kernel: [200963.105328] [   1539]    70  1539    42593      576   131072      384             0 postgres
2023-10-07T15:08:22.402388-06:00   kernel: [200963.106334] [   1540]    70  1540    42578      288   118784      352             0 postgres
2023-10-07T15:08:22.402389-06:00   kernel: [200963.107369] [   1541]    70  1541    42570      288   110592      352             0 postgres
2023-10-07T15:08:22.402389-06:00   kernel: [200963.108344] [   1542]    70  1542    42744      160   118784      448             0 postgres
2023-10-07T15:08:22.402390-06:00   kernel: [200963.109666] [   1543]    70  1543     6274       96    94208      384             0 postgres
2023-10-07T15:08:22.402410-06:00   kernel: [200963.110687] [   1544]    70  1544    42700      128   114688      448             0 postgres
2023-10-07T15:08:22.402411-06:00   kernel: [200963.111644] [   1988]  1000  1988      224       32    36864        0             0 epmd
2023-10-07T15:08:22.402412-06:00   kernel: [200963.112692] [   1990]  1000  1990      203       32    40960        0             0 erl_child_setup
2023-10-07T15:08:22.402474-06:00   kernel: [200963.114012] [   2012]   108  2012    12279      128    90112      256             0 tlsmgr
2023-10-07T15:08:22.402475-06:00   kernel: [200963.115112] [   2013]  1000  2013      211       32    45056        0             0 inet_gethost
2023-10-07T15:08:22.402524-06:00   kernel: [200963.116069] [   2014]  1000  2014      211       32    45056        0             0 inet_gethost
2023-10-07T15:08:22.402524-06:00   kernel: [200963.117328] [   2015]  1000  2015      211       32    45056        0             0 inet_gethost
2023-10-07T15:08:22.402525-06:00   kernel: [200963.118391] [   2032]    70  2032    43441      685   143360     1056             0 postgres
2023-10-07T15:08:22.402525-06:00   kernel: [200963.119395] [   2033]    70  2033    43453      619   143360     1088             0 postgres
2023-10-07T15:08:22.402526-06:00   kernel: [200963.120393] [   2034]    70  2034    43706      793   147456     1056             0 postgres
2023-10-07T15:08:22.402527-06:00   kernel: [200963.121640] [   2036]    70  2036    43412      672   143360     1056             0 postgres
2023-10-07T15:08:22.402527-06:00   kernel: [200963.122650] [   2038]    70  2038    43442     1141   143360      640             0 postgres
2023-10-07T15:08:22.402528-06:00   kernel: [200963.123669] [   2039]    70  2039    43449      638   143360     1056             0 postgres
2023-10-07T15:08:22.402528-06:00   kernel: [200963.124741] [   2040]    70  2040    43450      731   143360     1024             0 postgres
2023-10-07T15:08:22.402529-06:00   kernel: [200963.126130] [   2041]    70  2041    43451      722   143360     1024             0 postgres
2023-10-07T15:08:22.402530-06:00   kernel: [200963.127246] [   2042]    70  2042    43441      821   143360      896             0 postgres
2023-10-07T15:08:22.402530-06:00   kernel: [200963.128240] [   2043]    70  2043    43453      744   143360      992             0 postgres
2023-10-07T15:08:22.402531-06:00   kernel: [200963.129434] [   2067]    70  2067    42914      192   131072      576             0 postgres
2023-10-07T15:08:22.402531-06:00   kernel: [200963.130413] [  15249]     0 15249    22558       64   204800      224          -250 systemd-journal
2023-10-07T15:08:22.402532-06:00   kernel: [200963.131490] [  15308]   101 15308    22622      114    77824      672             0 systemd-timesyn
2023-10-07T15:08:22.402533-06:00   kernel: [200963.132661] [  15363]     0 15363     6366       64    81920      224         -1000 systemd-udevd
2023-10-07T15:08:22.402533-06:00   kernel: [200963.133978] [  42088]   108 42088    10838      192    73728        0             0 pickup
2023-10-07T15:08:22.402534-06:00   kernel: [200963.135048] [  42484]   108 42484    10838      192    77824        0             0 anvil
2023-10-07T15:08:22.402534-06:00   kernel: [200963.136037] [  42519]   108 42519    12423      448    94208        0             0 smtpd
2023-10-07T15:08:22.402535-06:00   kernel: [200963.137309] [  42520]   108 42520    10839      192    73728        0             0 proxymap
2023-10-07T15:08:22.402535-06:00   kernel: [200963.138373] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=containerd.service,mems_allowed=0,global_oom,task_memcg=/system.slice/caddy.service,task=caddy,pid=545,uid=997
2023-10-07T15:08:22.402536-06:00   kernel: [200963.140355] Out of memory: Killed process 545 (caddy) total-vm:2491128kB, anon-rss:1203772kB, file-rss:128kB, shmem-rss:0kB, UID:997 pgtables:2628kB oom_score_adj:0

3. Caddy version:

v2.7.4 h1:J8nisjdOxnYHXlorUKXY75Gr6iBfudfoGhrJ8t7/flI=

4. How I installed and ran Caddy:

Install — Caddy Documentation (https://caddyserver.com/docs/install), using systemd to start it up.

a. System environment:

  • Linode Linode 2GB (shared 1-cpu 2GB memory, configured with 512MB swap)
  • debian-testing
  • systemd

b. Command:

/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile

c. Service/unit/compose file:

As I use the default systemd config plus an override, here is the output of systemctl edit caddy:

### Editing /etc/systemd/system/caddy.service.d/override.conf
### Anything between here and the comment below will become the contents of the drop-in file

[Unit]
StartLimitIntervalSec=10s

[Service]
Restart=on-failure
RestartSec=5s

### Edits below this comment will be discarded


### /lib/systemd/system/caddy.service
# # caddy.service
# #
# # For using Caddy with a config file.
# #
# # Make sure the ExecStart and ExecReload commands are correct
# # for your installation.
# #
# # See https://caddyserver.com/docs/install for instructions.
# #
# # WARNING: This service does not use the --resume flag, so if you
# # use the API to make changes, they will be overwritten by the
# # Caddyfile next time the service is restarted. If you intend to
# # use Caddy's API to configure it, add the --resume flag to the
# # `caddy run` command or use the caddy-api.service file instead.
#
# [Unit]
# Description=Caddy
# Documentation=https://caddyserver.com/docs/
# After=network.target network-online.target
# Requires=network-online.target
#
#
# [Service]
# Type=notify
# User=caddy
# Group=caddy
# ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
# ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile --force
# TimeoutStopSec=5s
# LimitNOFILE=1048576
# LimitNPROC=512
# PrivateTmp=true
# ProtectSystem=full
# AmbientCapabilities=CAP_NET_ADMIN CAP_NET_BIND_SERVICE
#
#
# [Install]
# WantedBy=multi-user.target

d. My complete Caddy config:

{
        #       auto_https off
        email redacted@gmail.com
}

(static) {
        @static {
                file
                path *.ico *.css *.js *.gif *.jpg *.jpeg *.png *.svg *.webp *.woff *.woff2 *.json
        }
        header @static Cache-Control max-age=5184000
}

(security) {
        header {
                # enable HSTS
                Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"
                # disable clients from sniffing the media type
                X-Content-Type-Options nosniff
                # keep referrer data off of HTTP connections
                Referrer-Policy no-referrer-when-downgrade
        }
}

(errorfiles) {
        handle_errors {
                @custom_err file /{err.status_code}.html /err.html
                handle @custom_err {
                        rewrite * {file_match.relative}
                        file_server
                }
                respond "{err.status_code} {err.status_text}"
        }
}

(logs) {
        log {
                output file /var/log/caddy/{args.0}.log
        }
}

www.jwendel.net {
        # TODO
        # import security
        redir https://jwendel.net{uri}
}

http:// {
        respond "Hi!"
}

jwendel.net {
        root * /var/www/jwendel.net/
        encode zstd gzip
        file_server
        # import logs jwendel.net
        import static
        handle_errors {
                @custom_err file /{err.status_code}.html /err.html
                handle @custom_err {
                        rewrite * {file_match.relative}
                        file_server
                }
                respond "{err.status_code} {err.status_text}"
        }
}

pls.xivup.com,
pls.jwendel.net {
        rewrite /pl.js /js/plausible.js
        reverse_proxy localhost:8000 {
                flush_interval -1
        }
        header /pl.js {
                Cache-Control: "Cache-Control max-age=604800, public, must-revalidate" #7d
        }
}

xivup.com {
        redir https://is.xivup.com{uri}
}

is.xivup.com,
test.xivup.com {
        reverse_proxy localhost:8001
}

www.kyrra.net {
        redir https://kyrra.net{uri}
}

kyrra.net {
        root * /var/www/kyrra.net/
        encode zstd gzip
        file_server
        import static
        import errorfiles
}

www.jameswendel.com {
        redir https://jameswendel.com{uri}
}

jameswendel.com {
        root * /var/www/jameswendel.com/
        encode zstd gzip
        file_server
        import static
        import errorfiles
}

www.nlworks.com {
        redir https://nlworks.com{uri}
}

nlworks.com {
        root * /var/www/nlworks.com/
        encode zstd gzip
        file_server
        import static
        import errorfiles
}

# Refer to the Caddy docs for more information:
# https://caddyserver.com/docs/caddyfile

5. Links to relevant resources:

As Matt wrote:

Please get a profile while memory seems somewhat high. Hard to tell what’s going on without that information.

I’ll try, but as I said, it may take a month or 2 to recreate. If there are any knobs or anything I can attempt to tweak or look at before then, it would be helpful.

I’ve set up a script to grab the heap and goroutine pprof data for when I see a traffic spike in the future.
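
Roughly, it’s just a couple of curl calls against the admin endpoint, something like this (I’ll call it grab-pprof.sh below; the output directory is just a placeholder):

#!/bin/sh
# grab-pprof.sh: dump Caddy heap and goroutine profiles to timestamped files
ts=$(date +%Y%m%d-%H%M%S)
dir=/var/log/caddy/pprof
mkdir -p "$dir"
curl -s -o "$dir/heap-$ts.pprof" "http://localhost:2019/debug/pprof/heap"
curl -s -o "$dir/goroutine-$ts.txt" "http://localhost:2019/debug/pprof/goroutine?debug=1"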

There’s nothing I can recommend to tweak unless we actually know what the problem is. Right now we don’t have anything to go on unless we see a profile which would show what’s actually eating up memory.

You could use a benchmarking tool to simulate the load instead of waiting for it, like ab (apache bench) or wrk.
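
For example, something like this (URL and numbers purely illustrative) opens a few thousand concurrent connections for a minute:

# 2 threads, 2000 connections, 60 seconds against the polled endpoint
wrk -t2 -c2000 -d60s https://is.xivup.com/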

Cool, just wanted to make sure I wasn’t missing any magic settings that may make reverse proxy (or caddy) behave better in these cases.

I’ve tried using bombardier, but without success recreating this issue yet. I’ll mess around a bit more to see if I can find anything.

I have a crontab set up to dump heap and goroutine pprofs every hour to timestamped files, so I can catch it if I’m not watching it. Hopefully that’ll lead to something.
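
The crontab entry itself is just an hourly call to the script sketched above (path assumed):

# dump heap/goroutine pprof data at the top of every hour
0 * * * * /usr/local/bin/grab-pprof.sh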

Thanks for posting the full config. That, combined with those profiles, should help us get the info we need.

Also, maybe logs from before the OOM crash, so we have a clue as to traffic patterns.

PS. It looks like several other processes are using quite a bit of memory too, including clickhouse-serv, which used about the same as Caddy. Is the OS not killing it, too?

Just checking, what kind of logs are you thinking? I don’t want to log all HTTP traffic, as that previously made this little machine sad. I did just add an atomic counter to my app that prints total request counts (per endpoint) in the last minute, so I should have a much better idea of the amount of traffic during a spike.

From my reading of the syslog there, clickhouse-serv was only using 28MB or so (7223 rss pages * 4k page size). The largest 4 processes I see are:

[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[    545]   997   545   622782   300975        2691072     8448             0 caddy
[    551]     0   551   309277     9760         192512     1504             0 ffxiv
[   1117]   101  1117   953854     7223        2863104    25567             0 clickhouse-serv
[   1127]  1000  1127   352962    16997         868352    43264             0 beam.smp

While clickhouse has a large total_vm, its RSS right then was really small, which to me says it previously had a large memory allocation but has since freed a bunch of it.
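
As a quick sanity check on that math (the rss column is in 4KB pages):

echo $((7223 * 4 / 1024))     # clickhouse-serv: ~28 MB
echo $((300975 * 4 / 1024))   # caddy: ~1175 MB, matching the ~1.2GB in the OOM line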

Ok, huh. Journald should take care of that for you (if not, sounds like deficiencies in journald; Caddy’s logs are extremely efficient – zero-alloc), sorry to hear that’s not the case. But yeah, HTTP access logs would be ideal so we can see what kinds of requests are correlated with higher memory use.

Anyway, the profile will be the most help.

Ideally we’d like the heap profile and the goroutine dump (stack traces); a CPU profile could also be good!

I guess I haven’t tried turning on request logging with Caddy yet, only with Nginx (which I believe was under journald). When I did a load test against it, journald on its own (when logging each request) would consume ~15% of the CPU.

I’ll give it a try again and see how the system holds up under load, as Caddy may behave differently.
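
If I do turn it on, I’d probably just import the (logs) snippet that’s already in my Caddyfile into the busy site block, something like:

is.xivup.com,
test.xivup.com {
        import logs is.xivup.com
        reverse_proxy localhost:8001
}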

That doesn’t surprise me, I find times when journald uses a significant amount of CPU and I’m not sure why. (I’ve also seen times when journald has allowed disk space to fill up with logs, and other weird things.)
