Caddy memory spikes and crashes

1. The problem I’m having:

I’ve been getting an issue recently where the Caddy service is running, but after a few hours its memory spikes to near 100% and then it crashes. I’m not sure what’s causing this; it started yesterday. I’m not sure where to get the logs, but I do have this screenshot from my server:
(screenshot of server resource usage)

2. Error messages and/or full log output:

I don’t have logs from when it crashed.
If you need any other logs, I can provide them.

3. Caddy version:

v2.6.2

4. How I installed and ran Caddy:

Installation:

sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy

I ran Caddy with systemctl start caddy && caddy reload

a. System environment:

root@doge-network
OS: Debian GNU/Linux 12 (bookworm) x86_64
Host: KVM/QEMU (Standard PC (i440FX + PIIX, 1996) pc-i440fx-7.2)
Kernel: 6.1.0-23-amd64
Uptime: 12 days, 2 hours, 36 mins
Packages: 487 (dpkg)
Shell: bash 5.2.15
Resolution: 1280x800
Terminal: node
CPU: AMD EPYC-Milan (4) @ 3.799GHz
GPU: 00:02.0 Vendor 1234 Device 1111
Memory: 2835MiB / 7906MiB

b. Command:

systemctl start caddy && caddy reload

c. Service/unit/compose file:

d. My complete Caddy config:

{
        on_demand_tls {
                ask http://localhost:3000/check/
        }
}

https:// {
        reverse_proxy /wisp/* 104.156.150.3
        reverse_proxy /bear/* 104.156.150.3
        reverse_proxy /epoxy/* 104.156.150.3
        reverse_proxy /baremux/* 104.156.150.3
        reverse_proxy /uv/* 104.156.150.3

        root * /var/lib/caddy/site/public
        file_server

        rewrite /app /index.html
        rewrite /portal /loader.html
        rewrite /apps /apps.html
        rewrite /gms /gms.html
        rewrite /info /info.html
        rewrite /edu loading.html
        tls {
                on_demand
        }

        encode gzip
}

# cp -r ~/v4/static/* /var/lib/caddy/site/public/

beta.derpman.lol {
        reverse_proxy localhost:8001
}

5. Links to relevant resources:

A notorious architectural defect is the Linux out-of-memory (OOM) killer; Linux is the only significant operating system with this defect designed in.

So for serious work I suggest FreeBSD or OpenBSD.

If you must use Linux, provision significantly more RAM than your first estimate. What good is an application written in a memory-safe language when the operating system below it has significant memory-handling issues?

First of all, v2.6.2 is too old. Upgrade to the latest release, v2.8.4.
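
Since you installed from the Cloudsmith apt repository, upgrading should just be a package update. A rough sketch, assuming the repo from your install steps is still configured:

sudo apt update
sudo apt install --only-upgrade caddy
caddy version   # confirm it now reports v2.8.4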

The second important thing: where is the answer to 4c, your service/unit file? It’s the most important thing to know, since the service is being killed by systemd. For instance, what memory limit do you have there, if any?
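
For example, something like this will show the unit file (including any drop-in overrides) and the effective memory limits; the exact property names depend on your systemd and cgroup setup:

systemctl cat caddy
systemctl show caddy -p MemoryMax -p MemoryHigh -p MemoryLimit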

The screenshot isn’t readable. Share the logs as text. You’re running Caddy as a service, so its logs can be extracted with journalctl.
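
Something along these lines should pull the relevant text out of the journal; adjust the time window so it covers one of the crashes:

journalctl -u caddy --no-pager | tail -n 200
journalctl -u caddy --since "2 hours ago" --no-pager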

We run a few servers with heavy traffic (for some definition of heavy) and aren’t seeing OOMs. There must be other factors causing the OS to kill Caddy for OOM. From what I’ve found, it may not necessarily be Caddy’s fault; it may simply be the process that frees the most memory when killed. Check what other memory-hungry processes are running on the server.
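
As a starting point, the kernel log records which process the OOM killer picked and why, and a quick process listing shows what else is eating memory. A sketch, assuming you have root access:

journalctl -k | grep -i -E "out of memory|oom"
ps aux --sort=-%mem | head -n 10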

Agreed. I’ve never seen the process killed by the OOM killer be the program that actually needs debugging (though there’s no reason it couldn’t be). The OOM killer likes to kill big things that are low priority and that may not have run recently (from the scheduler’s perspective of time). I spent 1.5 years understanding the OOM killer on one issue; a VP of the company even mandated that it be fixed, but the department in China that took over the kernel refused to fix it.

Debugging can be very difficult; we had a customer start a web browser (Firefox) and “apparently random” services were killed. There was no visible determinism from run to run.
