I have Caddy configured as a reverse proxy for Grafana and InfluxDB. Grafana talks to InfluxDB locally for queries, and a Python InfluxDB client writes data to InfluxDB through the Caddy proxy every few seconds. I noticed that at regular intervals the server (a 1GB Vultr instance) stopped responding; it turns out it runs out of memory, and once the OOM killer kills a process things are OK for a while, then the cycle repeats.
After further investigation, it appears that Caddy is creating new connections to InfluxDB and never closing them. After a little over an hour, Caddy has over 4000 local connections, and the count never goes down (yes, they are ESTABLISHED).
root@grafana:/etc/caddy/conf.d# ss -p -o -nt '( dport = :8086 )' | wc -l
1536
root@grafana:/etc/caddy/conf.d# ss -p -o -nt '( dport = :8086 )' | wc -l
1947
root@grafana:/etc/caddy/conf.d# ss -p -o -nt '( dport = :8086 )' | wc -l
3982
root@grafana:/etc/caddy/conf.d# ss -p -o -nt '( dport = :8086 )' | wc -l
4732
root@grafana:/etc/caddy/conf.d# ss -p -o -nt state established '( dport = :8086 )' | head
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 0 ::1:53508 ::1:8086 users:(("caddy",pid=8020,fd=3667)) timer:(keepalive,23sec,0)
0 0 ::1:54428 ::1:8086 users:(("caddy",pid=8020,fd=4128)) timer:(keepalive,29sec,0)
0 0 ::1:49504 ::1:8086 users:(("caddy",pid=8020,fd=1665)) timer:(keepalive,13sec,0)
0 0 ::1:56404 ::1:8086 users:(("caddy",pid=8020,fd=5115)) timer:(keepalive,5.016ms,0)
0 0 ::1:49572 ::1:8086 users:(("caddy",pid=8020,fd=1699)) timer:(keepalive,13sec,0)
0 0 ::1:53422 ::1:8086 users:(("caddy",pid=8020,fd=3624)) timer:(keepalive,19sec,0)
0 0 ::1:57902 ::1:8086 users:(("caddy",pid=8020,fd=5864)) timer:(keepalive,920ms,0)
0 0 ::1:48444 ::1:8086 users:(("caddy",pid=8020,fd=1135)) timer:(keepalive,13sec,0)
0 0 ::1:53818 ::1:8086 users:(("caddy",pid=8020,fd=3823)) timer:(keepalive,2.968ms,0)
Port 8086 is where InfluxDB is listening. If I watch the timer, it counts down from about 30 seconds, which would match the keepalive setting the reverse proxy code is using (if I understand correctly).
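A quicker way to watch the count climb than re-running ss by hand (interval is arbitrary):

watch -n 5 "ss -nt '( dport = :8086 )' | wc -l"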
There are also persistent connections open to Grafana, even though I closed my browser's Grafana tab over an hour ago, so I wouldn't expect anything to be happening between Caddy and Grafana.
root@grafana:/etc/caddy/conf.d# ss -p -o -nt '( dport = :3000 )' | wc -l
57
The InfluxDB client looks like it's correctly using a single HTTP keepalive session to Caddy; there's usually just one connection, sometimes two (a sketch of the writer follows the socket listing below).
root@grafana:/etc/caddy/conf.d# ss -p -o -ant '( sport = :443 )'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 :::443 :::* users:(("caddy",pid=8020,fd=904))
ESTAB 0 0 ::ffff:NNN.NNN.NNN.NNN:443 ::ffff:MM.MM.MM.MM:38998 users:(("caddy",pid=8020,fd=3))
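For context, the writer is essentially the sketch below (host, database, measurement, and interval are placeholders, not my real values). The influxdb Python library keeps a single requests.Session under the hood, which is why only one keepalive connection shows up on 443:

import time
from influxdb import InfluxDBClient

# Placeholder endpoint/database; the real writer goes through Caddy on 443.
client = InfluxDBClient(host='influxdb.foo.bar', port=443,
                        ssl=True, verify_ssl=True, database='metrics')

while True:
    # write_points() reuses the client's internal requests.Session,
    # so every write shares the same HTTP keepalive connection to Caddy.
    client.write_points([{
        'measurement': 'env',
        'fields': {'value': 42.0},
    }])
    time.sleep(5)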
Here is some memory usage over time. I'm assuming the massive number of open sockets is the culprit, as both InfluxDB and Caddy show memory usage increasing over time.
root@grafana:/tmp# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 722396 20176 116712 0 0 21653 20 35 109 0 5 93 2 0
root@grafana:/tmp# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 680008 22840 122060 0 0 21594 20 35 109 0 5 93 2 0
root@grafana:/tmp# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 423664 35892 160592 0 0 21129 20 36 110 0 5 93 2 0
root@grafana:/etc/caddy/conf.d# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 273876 38828 147328 0 0 20800 20 36 113 0 5 93 2 0
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
7472 influxdb 20 0 647M 282M 18412 S 0.0 28.4 0:32.12 influxd -config /etc/influxdb/influxdb.conf
8020 www-data 20 0 187M 182M 10504 S 0.0 18.3 0:14.62 caddy -log stdout -agree=true -conf=/etc/caddy/Caddyfile -root=/var/tmp
I tried setting the proxy max_conns setting to small numbers (10 and 15), to no avail, although I have noticed occasional 502 errors on the client. Prior to using Caddy I ran this same setup in hosted containers with haproxy as the frontend; the container memory limits were no more than 128MB, and I never had an issue there. Are there other config settings that can limit the proxy connections? I also found this post, where the solution was to set up an idle timeout of 5 minutes. However, that post was from 2016, and the Caddy docs state this is now the default, so I'm not sure why all these connections are being kept open.
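The only other knob I can find in the proxy docs is the keepalive subdirective, which (if I'm reading the docs for my version correctly) limits the number of idle upstream connections, with 0 disabling keepalives entirely. I haven't tried it yet; for the InfluxDB proxy it would look something like:

proxy / localhost:8086 {
    transparent
    # Untested: 0 should disable idle keepalive connections to the upstream
    keepalive 0
}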
Here’s my configuration:
Running with:
/usr/local/bin/caddy -log stdout -agree=true -conf=/etc/caddy/Caddyfile -root=/var/tmp
Caddy version 0.10.14, Debian 9 x86_64
Caddyfile:
import conf.d/*
Grafana config, conf.d/grafana (adding max_conns made no difference):
https://grafana.foo.bar {
    proxy / localhost:3000 {
        transparent
        websocket
        max_conns 15
    }
    gzip
    log / stdout "[{when}] - {remote} -> {status} for {method} {host}{path}"
    errors stderr
    import ../ssl.conf
}
InfluxDB config, conf.d/influxdb (adding max_conns made no difference):
https://influxdb.foo.bar {
    proxy / localhost:8086 {
        transparent
        max_conns 10
    }
    basicauth user pass
    log / stdout "[{when}] - {remote} -> {status} for {method} {host}{path}"
    errors stderr
    import ../ssl.conf
}