Caddy consumes a lot of memory after a high traffic load but never shrinks

1. The problem I’m having:

I have a machine with 32GB of RAM running Caddy, and I have noticed that it consumed a lot of memory (about 20GB, compared with roughly 2GB previously) after a high-traffic event.
[graph: memory usage rising to about 20GB during the high-traffic event]
During the period of increased memory usage, it handled up to 2000 QPS with about 35k requests in flight due to a slow upstream.

I have collected some profiles, but I can’t find the cause of the problem. :thinking:

Maybe the function runtime.malg has a memory leak? :thinking:

2. Error messages and/or full log output:

There seem to be no error logs.

3. Caddy version:

v2.8.4 h1:q3pe0wpBj1OcHFZ3n/1nl4V4bxBrYoSoab7rL9BMYNk=

4. How I installed and ran Caddy:

Installed following the Install — Caddy Documentation page, using systemd to start it.

a. System environment:

Debian 12 with 32GB RAM and 8 cores.

b. Command:

/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile

c. Service/unit/compose file:

[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target network-online.target
Requires=network-online.target
StartLimitIntervalSec=0

[Service]
Type=notify
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile --force
TimeoutStopSec=5s
LimitNOFILE=1048576
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_BIND_SERVICE
Restart=always
RestartSec=5
MemoryMax=30G

[Install]
WantedBy=multi-user.target

d. My complete Caddy config:

{
	order replace after encode
	cert_issuer acme {
		dir https://acme.zerossl.com/v2/DV90
		eab <redacted>
		dns duckdns <redacted>
		dns_challenge_override_domain buct.duckdns.org
		resolvers 223.5.5.5
	}
	servers {
		metrics
	}
}

(waf) {
	route {
		waf_chaitin {
			waf_engine_addr 10.2.137.30:8000
			initial_cap 8
			max_idle 32
			max_cap 64
			idle_timeout 30
		}
	}
}

(to_web_cluster) {
	reverse_proxy * {
		to http://192.168.1.10 http://192.168.1.36
		health_uri /main.htm
		health_status 2xx
		health_headers {
			Host www.buct.edu.cn
		}
	}
}

(log) {
	log {
		output file /var/log/caddy/{args[0]}.log {
			roll_keep_for 48h
		}
		format filter {
			wrap json
			fields {
				common_log delete
			}
		}
	}
}

https://*.buct.edu.cn {
	import log access
	encode zstd gzip
	header {
		Strict-Transport-Security max-age=31536000;
	}
	request_body {
		max_size 4GB
	}
	import waf
	import to_web_cluster

	@jwglxt-proxy4 host jwglxt-proxy4.buct.edu.cn
	handle @jwglxt-proxy4 {
		encode zstd gzip
		import waf
		rewrite /jwglxt/ /sso/jziotlogin
		@Permit {
			not remote_ip 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16
		}
		reverse_proxy * {
			to 121.195.154.186:80 121.195.154.187:80 121.195.154.188:80 121.195.154.189:80 121.195.154.190:80 121.195.154.191:80 121.195.154.192:80 121.195.154.193:80 121.195.154.194:80
			header_up X-Real-IP 202.4.130.100
			header_up X-Forwarded-For 202.4.130.100
			header_up X-Forwarded-Port 80
			header_up X-Forwarded-Proto http
			header_down X-Backend-Server {http.reverse_proxy.upstream.host}
			header_down X-Backend-Request-counts {http.reverse_proxy.upstream.requests}
			lb_policy cookie lb buct_lb {
				fallback ip_hash
			}
			transport http {
				keepalive_idle_conns_per_host 32
				max_conns_per_host 64
			}
		}
		reverse_proxy /WebReport* 121.195.154.186:8099 {
			header_up X-Real-IP 202.4.130.100
			header_up X-Forwarded-For 202.4.130.100
			header_up X-Forwarded-Port 80
			header_up X-Forwarded-Proto http
		}
		reverse_proxy @Permit {
			to 121.195.154.186:80 121.195.154.187:80 121.195.154.188:80 121.195.154.189:80 121.195.154.190:80 121.195.154.191:80 121.195.154.192:80 121.195.154.193:80 121.195.154.194:80
			header_up X-Real-IP 202.4.130.100
			header_up X-Forwarded-For 202.4.130.100
			header_up X-Forwarded-Port 80
			header_up X-Forwarded-Proto http
			header_down Location portal.buct.edu.cn experimental-auth-endpoint.buct.edu.cn
			header_down X-Backend-Server {http.reverse_proxy.upstream.host}
			header_down X-Backend-Request-counts {http.reverse_proxy.upstream.requests}
			lb_policy cookie lb buct_lb {
				fallback ip_hash
			}
			transport http {
				keepalive_idle_conns_per_host 32
				max_conns_per_host 64
			}
		}
		replace {
			stream
			match {
				header Content-Type application/json*
				header Content-Type application/x-javascript*
				header Content-Type text/*
			}
			http://121.195.154.186:8099/WebReport https://jwglxt-proxy4.buct.edu.cn/WebReport
		}
	}

	################### the end ###############################
}

5. Links to relevant resources:

The Caddy binary was compiled with

xcaddy build \
    --with github.com/caddy-dns/duckdns \
    --with github.com/W0n9/caddy_waf_t1k@v0.0.5 \
    --with github.com/caddyserver/replace-response \
    --with github.com/chaitin/t1k-go=github.com/w0n9/t1k-go@v1.5.6

Profiles:

  1. Before high load: caddy_issues/caddy_memory_inuse_space_bytes_space_bytes_2024-12-26_0759-to-2024-12-26_0829.pb.gz at master · W0n9/caddy_issues · GitHub
  2. During high load: caddy_issues/caddy_memory_inuse_space_bytes_space_bytes_2024-12-26_1225-to-2024-12-26_1300.pb.gz at master · W0n9/caddy_issues · GitHub
  3. After high load: caddy_issues/caddy_memory_inuse_space_bytes_space_bytes_2024-12-27_1446-to-2024-12-27_1516.pb.gz at master · W0n9/caddy_issues · GitHub

What does that line represent? Oftentimes, an OS will not reclaim freed memory until/unless it is needed, for efficiency.

Capturing a heap and goroutine profile will definitively reveal whether Caddy is leaking memory or not.
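If it helps, here is a minimal sketch of pulling both profiles from Caddy’s admin API (assuming it is listening on the default localhost:2019, where the standard pprof endpoints are exposed):

# heap (in-use memory) profile and goroutine dump from the admin endpoint
curl -sS -o heap.pb.gz http://localhost:2019/debug/pprof/heap
curl -sS -o goroutine.pb.gz http://localhost:2019/debug/pprof/goroutine

# inspect locally with the standard Go pprof tooling
go tool pprof -top heap.pb.gz
go tool pprof -top goroutine.pb.gz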


I think you’re running into this:

The Go maintainer says:

Is there a situation where this is actually causing a problem? Most servers, at least in steady state, would reuse these gs effectively.

That means that although that much memory is allocated and not freed, it is reused by Go.


Maybe the metrics I chose earlier were inappropriate, so I have now picked some new ones.
[two graphs: goroutine count and go_memstats memory metrics before, during, and after the traffic spike]
In the two graphs above, we can see that when Caddy encountered the surge in traffic, it allocated about 80k goroutines.
However, after the peak, Caddy’s memory usage did not decrease, as shown by go_memstats_sys_bytes and go_memstats_heap_sys_bytes.

I think you’re right, but I can’t seem to get this extra memory released back to the operating system, even though I have a memory balloon device.
As a result, other programs running on the same VM run into OOM.
Do you have any solutions to this problem? Is restarting Caddy the only way to free up the memory? :thinking:

The issue on the Go repository was closed after more information was requested and its author didn’t share further details. I think you have the data and the information to have that conversation with the Go team. However, if your server frequently receives high traffic, the memory will be reused rather than wasted. Per the Go maintainer:

Most servers, at least in steady state, would reuse these gs effectively.

You can try setting GOGC and GOMEMLIMIT to control Go’s memory usage and GC (docs), though I’m not 100% sure whether those controls will help here.
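For example, a rough sketch of setting both via a systemd drop-in for your existing unit (the file name and the values are illustrative, not a recommendation):

# /etc/systemd/system/caddy.service.d/memlimit.conf
# apply with: systemctl daemon-reload && systemctl restart caddy
[Service]
Environment=GOMEMLIMIT=8GiB
Environment=GOGC=50

GOMEMLIMIT is a soft limit: as the heap approaches it, the runtime collects more aggressively and returns free memory to the OS sooner, which might ease the OOM pressure on the other programs in that VM.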

You may also find better memory metrics based on this article:
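As a rough pointer in the meantime (assuming the default go_memstats_* metrics from the Go Prometheus collector that Caddy already exposes): go_memstats_heap_inuse_bytes is heap that is actually live, while heap the runtime is merely holding on to but has not yet returned to the OS is approximately:

go_memstats_heap_idle_bytes - go_memstats_heap_released_bytes

A large value there after the peak would point to memory retained by the runtime for reuse rather than a leak.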


Thank you for answering my questions.


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.