Caddy won't start, strace shows it's waiting for a mutex of some sort

Hey, I changed the configuration of my Caddy server and restarted it; after that it never came online again.

It’s stuck at “Activating privacy features…”. I haven’t added any (sub)domains to the config, only removed some.

strace log:

close(6)                                = 0
write(5, "\27\3\3\0\332\0\0\0\0\0\0\0\2\375E\"\374\223\t\276\230\226\2\357\255\375~Et\324\2231"..., 223) = 223
futex(0x128c128, FUTEX_WAIT, 0, NULL)   = 0
futex(0x128c128, FUTEX_WAIT, 0, NULL)   = 0... (keeps on going)

If I’m interpreting the strace output correctly, Caddy is stuck waiting on some file mutex. I’m running Caddy as root on an Arch Linux box.

Caddy does obtain a lock when it obtains certificates. What is the log output? Run with -log stderr to get log printed to stderr (or you can specify a filename of course).

Running it with -log stderr now.
It seems to be stalling at 2016/09/14 16:43:23 [INFO][domain redacted] acme: Trying to solve HTTP-01

The full log is:

Activating privacy features...2016/09/14 16:43:22 [WARNING] No OCSP stapling for [redacted2]: ocsp: error from server: unauthorized
2016/09/14 16:43:22 [INFO] Certificate for [redacted] expires in 633h4m37.166038842s; attempting renewal
2016/09/14 16:43:23 [INFO][redacted] acme: Trying renewal with 633 hours remaining
2016/09/14 16:43:23 [INFO][redacted] acme: Obtaining bundled SAN certificate
2016/09/14 16:43:23 [INFO][redacted] acme: Could not find solver for: dns-01
2016/09/14 16:43:23 [INFO][redacted] acme: Could not find solver for: tls-sni-01
2016/09/14 16:43:23 [INFO][redacted] acme: Trying to solve HTTP-01
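
For anyone reading the numbers in that log: the "expires in 633h4m37.166038842s" figure is easier to judge in days. Below is a minimal Python sketch that converts it. The helper name go_duration_seconds is made up for illustration, and it only handles the h, m and s units that appear in this log line. The result, a bit over 26 days, would be consistent with a renewal window of roughly 30 days, though the thread itself does not state what threshold this Caddy version uses.

```python
import re


def go_duration_seconds(text: str) -> float:
    """Convert a Go-style duration such as '633h4m37.166038842s' to seconds.

    Only the h, m and s units are handled, which is all this log line uses.
    """
    units = {"h": 3600.0, "m": 60.0, "s": 1.0}
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", text)
    # Reject anything the pattern did not fully consume (e.g. "5ms").
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(n) * units[u] for n, u in parts)


remaining = go_duration_seconds("633h4m37.166038842s")
days = remaining / 86400
print(f"{days:.1f} days until expiry")  # prints "26.4 days until expiry"
```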

Stalling there usually means that the ACME server is not able to connect to your site for the domain you’re trying to obtain a certificate for. Check your DNS settings, make sure it’s reachable, etc.
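
A rough way to check the "make sure it's reachable" part is to test whether the domain resolves and accepts TCP connections on port 80, since the HTTP-01 challenge is served over plain HTTP. The sketch below is only that: the function name can_connect is an assumption for illustration, and a successful connection from your own machine does not prove the ACME server can reach the site (firewalls, split DNS and so on can differ), so ideally run it from a host outside your network.

```python
import socket


def can_connect(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if the name resolves and a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Calling can_connect("your.domain.example") with the real domain in place of that placeholder returns False if either DNS resolution or the connection to port 80 fails.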

It is (or was, 20 minutes ago) reachable, and the DNS settings are correct. The server runs fine if I remove that one domain from the Caddyfile, but if the problem isn’t limited to just that host, renewing the certificates for the other hosts will probably fail too when they’re due.

I have no idea what could be causing this.

What version of Caddy are you using?

I just noticed I was using 0.8.x. I updated to 0.9.1 and now nothing seems to work anymore:

caddy.service: Failed at step EXEC spawning /usr/local/bin/caddy: Operation not permitted

(I followed the update guide.)

(Not sure what “update guide” you’re referring to…) I’m not sure what’s causing that; make sure your system and binary permissions are set correctly.

You need to use the latest version of Caddy because Let’s Encrypt made some changes that broke certain renewal logic in older versions of Caddy.

Then that was probably why the renewal broke.

The guide I was referring to is from the thread “Starting with systemd: Failed at step NAMESPACE spawning /usr/local/bin/caddy: No such file or directory” (post #7 by Beluga)
(caddy-AUR/ at master, klingtnet/caddy-AUR on GitHub)

Ownership and permissions are root:root and -rwxr-xr-x, which look normal to me.

I have fixed the issue.
Commenting out CapabilityBoundingSet, AmbientCapabilities, and NoNewPrivileges in the systemd unit file fixed it for me.
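
For anyone wanting to reproduce that workaround, here is a small Python sketch of the edit. The three directive names come from the post above; the sample unit excerpt and its values (such as CAP_NET_BIND_SERVICE) are assumptions for illustration, not the poster's actual file, and comment_out is a made-up helper name. It relies on systemd treating lines that start with ";" as comments.

```python
DIRECTIVES = ("CapabilityBoundingSet", "AmbientCapabilities", "NoNewPrivileges")


def comment_out(unit_text: str, names=DIRECTIVES) -> str:
    """Prefix matching 'Name=value' lines with ';' so systemd ignores them."""
    out = []
    for line in unit_text.splitlines():
        key = line.split("=", 1)[0].strip()
        # Lines that are already commented have a key like ';Name' and are skipped.
        if "=" in line and key in names:
            line = ";" + line
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical excerpt; the values are assumptions, not taken from the thread.
sample = """[Service]
ExecStart=/usr/local/bin/caddy
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
NoNewPrivileges=true
"""
print(comment_out(sample), end="")
```

After changing the real unit file, systemd needs a "systemctl daemon-reload" followed by a restart of the service to pick up the edit. Keep in mind that these directives are hardening options, so commenting them out trades some isolation for getting the service to start.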

What is your systemd version? It needs to be 220 or newer (possibly 219 and up); otherwise some of the capability options will fail.

There is clearly some reason other than the systemd version for the failure: I have always had 220+, and it has always failed with the capability options. I’m currently running the latest version in Arch Linux.
