Panic: failed to create new OS thread

1. Output of caddy version:

v2.6.2 h1:wKoFIxpmOJLGl3QXoo6PNbYvGW4xLEgo32GPBEjWL8o=

2. How I run Caddy:

Via systemd, with default unit files + restart drop-in (see below).

a. System environment:

Ubuntu 20.04, Kernel 5.4.0.131-generic.

# ulimit -u
79830
# ps -eLf | wc -l
4391
# sudo -u caddy -g caddy -H /usr/bin/caddy environ
caddy.HomeDir=/var/lib/caddy
caddy.AppDataDir=/var/lib/caddy/.local/share/caddy
caddy.AppConfigDir=/var/lib/caddy/.config/caddy
caddy.ConfigAutosavePath=/var/lib/caddy/.config/caddy/autosave.json
caddy.Version=v2.6.2 h1:wKoFIxpmOJLGl3QXoo6PNbYvGW4xLEgo32GPBEjWL8o=
runtime.GOOS=linux
runtime.GOARCH=amd64
runtime.Compiler=gc
runtime.NumCPU=12
runtime.GOMAXPROCS=12
runtime.Version=go1.19.2
os.Getwd=/var/lib/caddy

LANG=en_US.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
MAIL=/var/mail/caddy
LOGNAME=caddy
USER=caddy
HOME=/var/lib/caddy
SHELL=/usr/sbin/nologin
SUDO_COMMAND=/usr/bin/caddy environ
SUDO_USER=root
SUDO_UID=0
SUDO_GID=0

The host VM has 12 vCores and is expected to reverse-proxy a few thousand requests per minute to a handful of Unifi Controller instances.

b. Command:

systemctl start caddy

c. Service/unit/compose file:

/lib/systemd/system/caddy.service:

# caddy.service
#
# For using Caddy with a config file.
#
# Make sure the ExecStart and ExecReload commands are correct
# for your installation.
#
# See https://caddyserver.com/docs/install for instructions.
#
# WARNING: This service does not use the --resume flag, so if you
# use the API to make changes, they will be overwritten by the
# Caddyfile next time the service is restarted. If you intend to
# use Caddy's API to configure it, add the --resume flag to the
# `caddy run` command or use the caddy-api.service file instead.

[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target network-online.target
Requires=network-online.target

[Service]
Type=notify
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile --force
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

/etc/systemd/system/caddy.service.d/auto-restart.conf:

[Service]
Restart=always
RestartSec=5

d. My complete Caddy config:

Just a few reverse_proxy vhosts. I don’t think these are relevant here.

3. The problem I’m having:

Nov 07 10:28:29 unifi caddy[554]: runtime: failed to create new OS thread (have 16 already; errno=11)
Nov 07 10:28:29 unifi caddy[554]: runtime: may need to increase max user processes (ulimit -u)
Nov 07 10:28:29 unifi caddy[554]: fatal error: newosproc
Nov 07 10:28:29 unifi caddy[554]: runtime stack:

omitted, see https://gist.github.com/dmke/434df08ec9cf453cffe4459dcb664b6b

Nov 07 10:28:29 unifi systemd[1]: caddy.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 07 10:28:29 unifi systemd[1]: caddy.service: Failed with result 'exit-code'.
Nov 07 10:28:34 unifi systemd[1]: caddy.service: Scheduled restart job, restart counter is at 1.
Nov 07 10:28:34 unifi systemd[1]: Stopped Caddy.

4. Error messages and/or full log output:

Repeat ad infinitum:

Nov 07 10:28:34 unifi systemd[1]: Starting Caddy...
Nov 07 10:28:34 unifi systemd[7744]: caddy.service: Failed to execute command: Resource temporarily unavailable
Nov 07 10:28:34 unifi systemd[7744]: caddy.service: Failed at step EXEC spawning /usr/bin/caddy: Resource temporarily unavailable
Nov 07 10:28:34 unifi systemd[1]: caddy.service: Main process exited, code=exited, status=203/EXEC
Nov 07 10:28:34 unifi systemd[1]: caddy.service: Failed with result 'exit-code'.
Nov 07 10:28:34 unifi systemd[1]: Failed to start Caddy.
Nov 07 10:28:39 unifi systemd[1]: caddy.service: Scheduled restart job, restart counter is at 2.
Nov 07 10:28:39 unifi systemd[1]: Stopped Caddy.

5. What I already tried:

systemctl restart caddy immediately yields the aforementioned error. Running systemctl stop caddy, waiting a few moments, and then running systemctl start caddy (which, to me, should amount to a hard reset) yields the same error.

Rebooting the VM helps for a moment, then Caddy crashes again.

Rebooting into an older Kernel (5.4.0.126-generic) does not help.

Modifying the ulimit does not seem to help either. We’re currently using about 5% of the quota (4391 of 79830), so Caddy should have no problem allocating threads.

Downgrading Caddy to 2.6.1 did not help (I didn’t expect it to; this clearly looks like a system-level issue).

Adding another systemd drop-in (increasing LimitNPROC from 512 to 2048) does not help either.
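
For reference, a drop-in of this kind can be sketched as follows. This is only an illustration in the same style as the auto-restart.conf shown earlier: the helper name write_nproc_dropin and the file name limit-nproc.conf are made up for this example, and the directory is passed in as an argument rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that overrides LimitNPROC.
#   $1: drop-in directory, e.g. /etc/systemd/system/caddy.service.d
#   $2: new limit, e.g. 2048
# The file name limit-nproc.conf is an arbitrary choice for this sketch.
write_nproc_dropin() {
    mkdir -p "$1" &&
    printf '[Service]\nLimitNPROC=%s\n' "$2" > "$1/limit-nproc.conf"
}

# On the host, the change only takes effect after reloading systemd:
#   write_nproc_dropin /etc/systemd/system/caddy.service.d 2048
#   systemctl daemon-reload && systemctl restart caddy
```

A later [Service] setting overrides the value from the main unit file, so the drop-in only needs the one line.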

6. Links to relevant resources:

Huh, that’s wild.

That error is coming from the Go runtime, not from Caddy directly. That’s not something we control. Go does have some env vars that can be tuned to affect the runtime’s behaviour though. But I’m not an expert on that, so I don’t know what to suggest.

It looks like errno=11 means EAGAIN, i.e. “try again”. I’m not sure why the system would throw that, or why Go doesn’t just “try again” :thinking:

I think I’ve found the culprit: the mongod process of the MongoDB container runs under UID 999. Guess what UID the caddy user has?

# id caddy
uid=999(caddy) gid=998(caddy) groups=998(caddy),33(www-data)

When Caddy is not running, I can find 3000+ processes/threads belonging to the caddy user (these actually belong to MongoDB and are caused by just 3 Unifi Controller instances/700+ Java threads):

# ps -eLf | awk 'BEGIN{ caddy=0; unifi=0 } /^caddy/{ caddy+=1 } /^unifi/{ unifi+=1 } END{ print "caddy", caddy; print "unifi", unifi }'
caddy 3120
unifi 733

Such wasteful software… :rage:

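This far exceeds the LimitNPROC that systemd applies to caddy.service (512 in the stock unit file; I’ve only tested up to 2048). The limit counts every process and thread owned by the UID, not just those started by the service.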
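
Because ps resolves UIDs to host user names, a container process running as UID 999 shows up as "caddy" and the collision is easy to miss. A small sketch that counts threads per numeric UID instead; the function name count_threads_by_uid is invented here, and the usage line assumes the procps version of ps:

```shell
#!/bin/sh
# Count threads per numeric UID.
# Reads one UID per line on stdin (blank lines are skipped) and prints
# "<uid> <threads>", sorted by thread count, highest first.
count_threads_by_uid() {
    awk 'NF { n[$1]++ } END { for (u in n) print u, n[u] }' |
        sort -k2,2nr -k1,1n
}

# On the host (procps ps; "ruid=" prints the real UID without a header,
# "-L" lists one line per thread):
#   ps -eLo ruid= | count_threads_by_uid | head
```

Any UID whose count is already near the service's LimitNPROC before the service starts is a candidate for this kind of collision.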

For the short term, I’ve increased LimitNPROC to 4k. Tomorrow, I’ll change the UID to be something unique…
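
To check which limit a running process actually has after such a change, the soft "Max processes" value can be read from /proc/<pid>/limits. A minimal sketch; the function name soft_nproc_limit is made up for this example, and the usage line assumes a systemd recent enough to support --value:

```shell
#!/bin/sh
# Print the soft "Max processes" limit from a /proc/<pid>/limits style
# listing read on stdin. Columns are: name, soft limit, hard limit, units;
# the name itself is two words, so the soft limit is the third field.
soft_nproc_limit() {
    awk '/^Max processes/ { print $3 }'
}

# On the host, while the service is running:
#   pid=$(systemctl show -p MainPID --value caddy.service)
#   soft_nproc_limit < "/proc/$pid/limits"
# The configured (not necessarily applied) value is shown by:
#   systemctl show -p LimitNPROC caddy.service
```

If the service is not running, MainPID is 0 and there is no limits file to read, so only the configured value can be checked in that case.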

Wowza.

Glad you found out the issue!

That’s crazy. Thanks for the follow-up to help others in the future!