Are there ways to tune Caddy to reduce memory usage?

1. The problem I’m having:

Caddy was OOM-killed under heavy traffic. I’m looking to tune Caddy to prevent this in the future.

I’m on a small 1GB Linode server, and I have other things running on it. The kernel’s OOM killer took Caddy out when it grew to ~450MB resident (I’ve included the syslog dump of the kill below). I don’t track exact stats, but when it died I’m guessing I had around 2000–4000 people actively viewing the page.

On that page, people keep the tab open, and some JS updates the contents of their browser once a minute (all around the same time, though I try to add some smearing). The JS makes 2 calls, both just reverse_proxy’d to other apps (domain2.example.com and tracker.domain2.example.com).

I’ve since added:

Restart=on-failure
RestartSec=10s

to my caddy.service file, so an oom-kill won’t take my sites down for good.
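(For anyone copying this: rather than editing the packaged unit file in place, it’s probably cleaner to put those two lines in a drop-in, which survives package upgrades. A sketch — the override path below is just where systemctl edit puts it by default:

# created via: sudo systemctl edit caddy
# lands in /etc/systemd/system/caddy.service.d/override.conf
[Service]
Restart=on-failure
RestartSec=10s

then sudo systemctl daemon-reload if systemctl edit didn’t already reload for you.)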

What I’m looking for: when I’m getting a lot of traffic to reverse_proxy sites that can’t be cached, what is the best way to reduce Caddy’s memory usage?

I was previously using nginx, and never had memory issues with similar traffic patterns on this server. Is Caddy’s memory profile just different enough that I’ll hit these issues? Can I do anything to prevent it?

2. Error messages and/or full log output:

containerd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=-999
CPU: 0 PID: 216019 Comm: containerd Not tainted 6.3.0-1-amd64 #1  Debian 6.3.7-1
Hardware name: Linode Compute Instance/Standard PC (Q35 + ICH9, 2009), BIOS Not Specified
Mem-Info:
active_anon:80218 inactive_anon:87798 isolated_anon:0
 active_file:13 inactive_file:50 isolated_file:0
 unevictable:0 dirty:0 writeback:0
 slab_reclaimable:11108 slab_unreclaimable:32726
 mapped:1562 shmem:2033 pagetables:2564
 sec_pagetables:0 bounce:0
 kernel_misc_reclaimable:0
 free:12618 free_pcp:62 free_cma:0
Node 0 active_anon:320872kB inactive_anon:351192kB active_file:52kB inactive_file:200kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:6248kB dirty:0kB writeback:0kB shmem:8132kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 43008kB writeback_tmp:0kB kernel_stack:7456kB pagetables:10256kB sec_pagetables:0kB all_unreclaimable? yes
Node 0 DMA free:4372kB boost:0kB min:728kB low:908kB high:1088kB reserved_highatomic:0KB active_anon:5384kB inactive_anon:2460kB active_file:8kB inactive_file:156kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 908 908 908 908
Node 0 DMA32 free:46100kB boost:0kB min:44324kB low:55404kB high:66484kB reserved_highatomic:2048KB active_anon:315488kB inactive_anon:348732kB active_file:48kB inactive_file:36kB unevictable:0kB writepending:0kB present:1032040kB managed:968432kB mlocked:0kB bounce:0kB free_pcp:248kB local_pcp:248kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 17*4kB (ME) 22*8kB (ME) 14*16kB (ME) 12*32kB (UE) 3*64kB (E) 10*128kB (UME) 2*256kB (UM) 1*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 4372kB
Node 0 DMA32: 3117*4kB (ME) 968*8kB (UME) 614*16kB (UME) 258*32kB (UME) 122*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 46100kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
2236 total pagecache pages
143 pages in swap cache
Free swap  = 0kB
Total swap = 524284kB
262008 pages RAM
0 pages HighMem/MovableOnly
16060 pages reserved
0 pages hwpoisoned
Tasks state (memory values in pages):
[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[    412]   104   412     2393       64    61440      224          -900 dbus-daemon
[    439]     0   439   197052    13440   270336     5887             0 myapp
[    445]     0   445     4421       64    73728      448             0 systemd-logind
[    533]     0   533    27743       55   114688     2880             0 unattended-upgr
[    535]     0   535     1468       64    45056       32             0 agetty
[    537]     0   537     1374       64    57344        0             0 agetty
[ 122749]   997 122749   324476   112442  1363968    30080             0 caddy
[ 179366]     0 179366    20570       76   172032      231          -250 systemd-journal
[ 179442]   101 179442    22532       64    77824      224             0 systemd-timesyn
[ 179480]     0 179480     6366       53    69632      320         -1000 systemd-udevd
[ 200786]     0 200786    10727       58    69632      160             0 master
[ 200788]   108 200788    10877       32    77824      160             0 qmgr
[ 200815]   108 200815    12262       32    86016      352             0 tlsmgr
[ 211233]     0 211233   370750     4315   438272     3360          -500 dockerd
[ 211276]     0 211276   289092     1745   167936     1114          -500 docker-proxy
[ 211292]     0 211292   307669     2233   192512     1424          -500 docker-proxy
[ 211325]     0 211325   180190      544   110592      192          -998 containerd-shim
[ 211345]     0 211345   180190      510   110592      192          -998 containerd-shim
[ 211358]     0 211358   180126      543   110592     1214          -998 containerd-shim
[ 211374]     0 211374   180126      544   110592     1181          -998 containerd-shim
[ 211414]   101 211414    14041       64   147456      160             0 exim
[ 211421]  1000 211421   363823    17477   905216    35596             0 beam.smp
[ 211437]    70 211437    42561       96   122880      384             0 postgres
[ 211445]   101 211445   793595     9467  2654208    16457             0 clickhouse-serv
[ 211721]    70 211721    42593       96   135168      416             0 postgres
[ 211722]    70 211722    42578       96   118784      384             0 postgres
[ 211723]    70 211723    42570       96   110592      384             0 postgres
[ 211724]    70 211724    42744      160   122880      448             0 postgres
[ 211725]    70 211725     6274       96    94208      384             0 postgres
[ 211727]    70 211727    42700       96   110592      448             0 postgres
[ 215915]  1000 215915      221       32    36864        0             0 epmd
[ 215919]  1000 215919      200       32    45056        0             0 erl_child_setup
[ 215959]  1000 215959      208       32    36864        0             0 inet_gethost
[ 215960]  1000 215960      208       32    36864        0             0 inet_gethost
[ 215961]  1000 215961      208       32    36864        0             0 inet_gethost
[ 215976]    70 215976    43495      293   143360      950             0 postgres
[ 215977]    70 215977    43478      155   143360     1078             0 postgres
[ 215978]    70 215978    43740      240   147456     1174             0 postgres
[ 215979]    70 215979    43467      245   143360     1014             0 postgres
[ 215980]    70 215980    43483      346   143360      950             0 postgres
[ 215981]    70 215981    43490      216   143360     1046             0 postgres
[ 215982]    70 215982    43495      264   143360      982             0 postgres
[ 215983]    70 215983    43489      252   143360      982             0 postgres
[ 215984]    70 215984    43474      275   143360     1014             0 postgres
[ 215985]    70 215985    43475      238   143360      982             0 postgres
[ 215986]    70 215986    42914       97   131072      598             0 postgres
[ 216005]     0 216005     1654       64    53248       32             0 cron
[ 216008]     0 216008     2103       64    53248      768             0 haveged
[ 216010]     0 216010    55453       64    73728      352             0 rsyslogd
[ 216016]     0 216016   283896     2023   266240     2272          -999 containerd
[ 216017]     0 216017     3861       64    69632      288         -1000 sshd
[ 216025]   109 216025     5825       58    77824      320             0 opendkim
[ 216027]   109 216027    83688       61   122880     1184             0 opendkim
[ 269335]   108 269335    10829       32    77824      160             0 anvil
[ 269662]   108 269662    10829       32    69632      160             0 pickup
[ 269762]    70 269762    42897      640   126976      321             0 postgres
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=containerd.service,mems_allowed=0,global_oom,task_memcg=/system.slice/caddy.service,task=caddy,pid=122749,uid=997
Out of memory: Killed process 122749 (caddy) total-vm:1297904kB, anon-rss:449640kB, file-rss:128kB, shmem-rss:0kB, UID:997 pgtables:1332kB oom_score_adj:0
audit: type=1131 audit(1689673878.657:42408): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=caddy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
caddy.service: A process of this unit has been killed by the OOM killer.
caddy.service: Main process exited, code=killed, status=9/KILL
caddy.service: Failed with result 'oom-kill'.
caddy.service: Consumed 24min 12.047s CPU time.

3. Caddy version:

v2.6.4 h1:2hwYqiRwk1tf3VruhMpLcYTg+11fCdr8S3jhNAdnPy8=

4. How I installed and ran Caddy:

I followed https://caddyserver.com/docs/install#debian-ubuntu-raspbian, using systemd to start it up.

a. System environment:

  • Linode Nanode 1 GB (shared 1-cpu 1GB memory, configured with 512MB swap)
  • debian-testing
  • systemd

b. Command:

/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile

c. Service/unit/compose file:

d. My complete Caddy config:

I host ~5 domains here, so this is the full config. Below it, I’ve copied out the site blocks most relevant to this issue.

{
        #       auto_https off
        email ...@gmail.com
}

(static) {
        @static {
                file
                path *.ico *.css *.js *.gif *.jpg *.jpeg *.png *.svg *.webp *.woff *.woff2 *.json
        }
        header @static Cache-Control max-age=5184000
}

(security) {
        header {
                # enable HSTS
                Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"
                # disable clients from sniffing the media type
                X-Content-Type-Options nosniff
                # keep referrer data off of HTTP connections
                Referrer-Policy no-referrer-when-downgrade
        }
}

(errorfiles) {
        handle_errors {
                @custom_err file /{err.status_code}.html /err.html
                handle @custom_err {
                        rewrite * {file_match.relative}
                        file_server
                }
                respond "{err.status_code} {err.status_text}"
        }
}

(logs) {
        log {
                output file /var/log/caddy/{args.0}.log
        }
}

www.domain1.example.com {
        # TODO
        # import security
        redir https://domain1.example.com{uri}
}

http:// {
        respond "Hi!"
}

domain1.example.com {
        root * /var/www/domain1.example.com/
        encode zstd gzip
        file_server
        # import logs domain1.example.com
        import static
        handle_errors {
                @custom_err file /{err.status_code}.html /err.html
                handle @custom_err {
                        rewrite * {file_match.relative}
                        file_server
                }
                respond "{err.status_code} {err.status_text}"
        }
}

tracker.domain2.example.com,
tracker.domain1.example.com {
        rewrite /pl.js /js/plausible.js
        reverse_proxy localhost:8000
        header /pl.js {
                Cache-Control "max-age=604800, public, must-revalidate" # 7d
        }
}

www.domain2.example.com {
        redir https://domain2.example.com{uri}
}

domain2.example.com {
        reverse_proxy localhost:8001
}

www.domain3.com {
        redir https://domain3.com{uri}
}

domain3.com {
        root * /var/www/domain3.com/
        encode zstd gzip
        file_server
        import static
        import errorfiles
}

www.domain4.com {
        redir https://domain4.com{uri}
}

domain4.com {
        root * /var/www/domain4.com/
        encode zstd gzip
        file_server
        import static
        import errorfiles
}

www.domain5.com {
        redir https://domain5.com{uri}
}

domain5.com {
        root * /var/www/domain5.com/
        encode zstd gzip
        file_server
        import static
        import errorfiles
}

Copy/paste of the relevant configs:

tracker.domain2.example.com {
        rewrite /pl.js /js/plausible.js
        reverse_proxy localhost:8000
        header /pl.js {
                Cache-Control "max-age=604800, public, must-revalidate" # 7d
        }
}

www.domain2.example.com {
        redir https://domain2.example.com{uri}
}

domain2.example.com {
        reverse_proxy localhost:8001
}

5. Links to relevant resources:

Can you inspect the memory use while it’s climbing?

Go to localhost:2019/debug/pprof on the instance and follow the link to the heap profile. The goroutine dump could be useful too. That’ll tell us where the code is spending its time and what is making lots of allocations.
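If it’s easier to grab those from a shell on the box, something along these lines should work (assuming the admin endpoint is at its default of localhost:2019):

# heap profile (live allocations)
curl -sS -o heap.pprof "http://localhost:2019/debug/pprof/heap"

# human-readable goroutine dump
curl -sS -o goroutines.txt "http://localhost:2019/debug/pprof/goroutine?debug=1"

# summarize the heap profile, if you have Go installed
go tool pprof -top heap.pprof

Ideally capture them while memory is actually climbing, not right after a restart.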

But since the provided config has been modified from the original, I can’t trust that it’s the real config. We’ll need the full, unmodified config to be able to help and make suggestions; otherwise all we can do is guess, which won’t be helpful. The memory and goroutine profiles will also be necessary if we want to reduce memory usage.

If it works without memory issues on nginx, providing the full unredacted nginx config would also be useful, so we can check that we’re comparing apples to apples.

Thanks
