V2 Upgrade to Caddy2 failing with errors

I have a public server that is in alpha production. Caddy1 has been working great there for a while. So I’m upgrading to Caddy2 RC1 but it is failing with errors.

With Caddy1 it was running and storing certificates in /etc/ssl/caddy. Those certs are still there. But as I understand it now stores them in $HOME (/var/lib/caddy).

I installed Caddy2 almost verbatim per this: Install — Caddy Documentation

Can anyone offer any suggestions?

1. My Caddy version (caddy version):

$ which caddy
/usr/bin/caddy
$ caddy version
v2.0.0-rc.1 h1:DxUlg4kMisXwXVnWND7KEPl1f+vjFpIOzYpKpfmwyj8=

2. How I run Caddy:

a. System environment:

A vps at Upcloud.com running Ubuntu 18.04.

$ uname -a
Linux 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ systemd --version
systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

b. Command:

sudo systemctl daemon-reload
sudo systemctl enable caddy
sudo systemctl start caddy

c. Service/unit/compose file:

Uses caddy.service file:

[Unit]
Description=Caddy Web Server
Documentation=https://caddyserver.com/docs/
After=network.target

[Service]
User=caddy
Group=caddy
Restart=on-failure
ExecStart=/usr/bin/caddy run --config /etc/caddy/Caddyfile --resume --environ
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
#ProtectHome=true // TODO try enabling this
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

Almost verbatim from : dist/caddy.service at master · caddyserver/dist · GitHub

d. My complete Caddyfile or JSON config:

/etc/caddy/Caddyfile

https://redacted.com {
    log {
    	output file /var/log/caddy/access.log
    	format single_field common_log
    }

    encode gzip

	handle /api/* {
	    # TODO add health checks
		reverse_proxy localhost:8223
	}

	handle {
        root * /var/www/redacted.com
		try_files {path} /index.html
		file_server
	}
}

3. The problem I’m having:

Fails with errors, see below.

4. Error messages and/or full log output:

Apr 03 14:52:29 systemd[1]: Started Caddy Web Server.
Apr 03 14:52:29 caddy[4975]: caddy.HomeDir=/var/lib/caddy
Apr 03 14:52:29 caddy[4975]: caddy.AppDataDir=/var/lib/caddy/.local/share/caddy
Apr 03 14:52:29 caddy[4975]: caddy.AppConfigDir=/var/lib/caddy/.config/caddy
Apr 03 14:52:29 caddy[4975]: caddy.ConfigAutosavePath=/var/lib/caddy/.config/caddy/autosave.json
Apr 03 14:52:29 caddy[4975]: runtime.GOOS=linux
Apr 03 14:52:29 caddy[4975]: runtime.GOARCH=amd64
Apr 03 14:52:29 caddy[4975]: runtime.Compiler=gc
Apr 03 14:52:29 caddy[4975]: runtime.NumCPU=1
Apr 03 14:52:29 caddy[4975]: runtime.GOMAXPROCS=1
Apr 03 14:52:29 caddy[4975]: runtime.Version=go1.14.1
Apr 03 14:52:29 caddy[4975]: os.Getwd=/
Apr 03 14:52:29 caddy[4975]: LANG=C
Apr 03 14:52:29 caddy[4975]: PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
Apr 03 14:52:29 caddy[4975]: HOME=/var/lib/caddy
Apr 03 14:52:29 caddy[4975]: LOGNAME=caddy
Apr 03 14:52:29 caddy[4975]: USER=caddy
Apr 03 14:52:29 caddy[4975]: INVOCATION_ID=1a7d71df05f74ed5a22db3e4604537d9
Apr 03 14:52:29 caddy[4975]: JOURNAL_STREAM=9:543069
Apr 03 14:52:29 caddy[4975]: {"level":"info","ts":1585925549.1357584,"msg":"resuming from last configuration","autosave_file":"/var/lib/caddy/.config/caddy/autosave.json"}
Apr 03 14:52:29 caddy[4975]: {"level":"info","ts":1585925549.136814,"logger":"admin","msg":"admin endpoint started","address":"localhost:2019","enforce_origin":false,"origins":["localhost:2019"]}
Apr 03 14:52:29 caddy[4975]: {"level":"info","ts":1585925549.1373339,"logger":"http","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv1","http_port":80}
Apr 03 14:52:29 caddy[4975]: {"level":"info","ts":1585925549.1374652,"logger":"http","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"srv0","https_port":443}
Apr 03 14:52:29 caddy[4975]: {"level":"info","ts":1585925549.1375701,"logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
Apr 03 14:52:29 caddy[4975]: 2020/04/03 14:52:29 [INFO][cache:0xc000445450] Started certificate maintenance routine
Apr 03 14:52:29 caddy[4975]: {"level":"info","ts":1585925549.1388898,"logger":"http","msg":"enabling automatic TLS certificate management","domains":["redacted.com"]}
Apr 03 14:52:29 caddy[4975]: {"level":"info","ts":1585925549.1391227,"logger":"tls","msg":"cleaned up storage units"}
Apr 03 14:52:29 caddy[4975]: {"level":"info","ts":1585925549.1393428,"msg":"autosaved config","file":"/var/lib/caddy/.config/caddy/autosave.json"}
Apr 03 14:52:29 caddy[4975]: {"level":"info","ts":1585925549.1394496,"msg":"serving initial configuration"}
Apr 03 14:52:29 caddy[4975]: 2020/04/03 14:52:29 [INFO][redacted.com] Obtain certificate; acquiring lock...
Apr 03 14:52:29 caddy[4975]: 2020/04/03 14:52:29 [INFO][redacted.com] Obtain: Lock acquired; proceeding...
Apr 03 14:52:29 caddy[4975]: 2020/04/03 14:52:29 [INFO][redacted.com] Waiting on rate limiter...
Apr 03 14:52:29 caddy[4975]: 2020/04/03 14:52:29 [INFO][redacted.com] Done waiting
Apr 03 14:52:29 caddy[4975]: 2020/04/03 14:52:29 [INFO] [redacted.com] acme: Obtaining bundled SAN certificate given a CSR
Apr 03 14:52:29 caddy[4975]: 2020/04/03 14:52:29 [INFO] [redacted.com] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/3738029760
Apr 03 14:52:29 caddy[4975]: 2020/04/03 14:52:29 [INFO] [redacted.com] acme: use tls-alpn-01 solver
Apr 03 14:52:29 caddy[4975]: 2020/04/03 14:52:29 [INFO] [redacted.com] acme: Trying to solve TLS-ALPN-01
Apr 03 14:52:30 caddy[4975]: 2020/04/03 14:52:30 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/3738029760
Apr 03 14:52:30 caddy[4975]: 2020/04/03 14:52:30 [ERROR] acme: Error -> One or more domains had a problem:
Apr 03 14:52:30 caddy[4975]: [redacted.com] [redacted.com] acme: error presenting token: presenting with embedded provider: could not start listener at :443 for challenge server: listen tcp :443: bind: address already in use
Apr 03 14:52:30 caddy[4975]:  (challenge=tls-alpn-01 remaining=[http-01])
Apr 03 14:52:32 caddy[4975]: 2020/04/03 14:52:32 [INFO] [redacted.com] acme: Obtaining bundled SAN certificate given a CSR
Apr 03 14:52:32 caddy[4975]: 2020/04/03 14:52:32 [INFO] [redacted.com] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/3738030334
Apr 03 14:52:32 caddy[4975]: 2020/04/03 14:52:32 [INFO] [redacted.com] acme: Could not find solver for: tls-alpn-01
Apr 03 14:52:32 caddy[4975]: 2020/04/03 14:52:32 [INFO] [redacted.com] acme: use http-01 solver
Apr 03 14:52:32 caddy[4975]: 2020/04/03 14:52:32 [INFO] [redacted.com] acme: Trying to solve HTTP-01
Apr 03 14:52:32 caddy[4975]: 2020/04/03 14:52:32 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/3738030334
Apr 03 14:52:32 caddy[4975]: 2020/04/03 14:52:32 [ERROR] acme: Error -> One or more domains had a problem:
Apr 03 14:52:32 caddy[4975]: [redacted.com] [redacted.com] acme: error presenting token: presenting with embedded provider: could not start listener at :80 for challenge server: listen tcp :80: bind: address already in use
Apr 03 14:52:32 caddy[4975]:  (challenge=http-01 remaining=[])
Apr 03 14:52:34 caddy[4975]: 2020/04/03 14:52:34 [ERROR] attempt 1: [redacted.com] Obtain: [redacted.com] acme: Error -> One or more domains had a problem:
Apr 03 14:52:34 caddy[4975]: [redacted.com] [redacted.com] acme: error presenting token: presenting with embedded provider: could not start listener at :80 for challenge server: listen tcp :80: bind: address already in use
Apr 03 14:52:34 caddy[4975]:  - retrying in 1m0s (5.516414869s/720h0m0s elapsed)...
Apr 03 14:52:43 systemd[1]: Stopping Caddy Web Server...
Apr 03 14:52:43 caddy[4975]: {"level":"info","ts":1585925563.7457967,"msg":"shutting down apps then terminating","signal":"SIGTERM"}
Apr 03 14:52:43 caddy[4975]: 2020/04/03 14:52:43 [INFO][cache:0xc000445450] Stopped certificate maintenance routine
Apr 03 14:52:43 caddy[4975]: {"level":"info","ts":1585925563.7464712,"msg":"shutdown done","signal":"SIGTERM"}
Apr 03 14:52:43 systemd[1]: Stopped Caddy Web Server.

5. What I already tried:

Problem seems to start here:

Apr 03 14:52:30 caddy[4975]: [redacted.com] [redacted.com] acme: error presenting token: presenting with embedded provider: could not start listener at :443 for challenge server: listen tcp :443: bind: address already in use
Apr 03 14:52:30 caddy[4975]:  (challenge=tls-alpn-01 remaining=[http-01])
...
Apr 03 14:52:32 caddy[4975]: 2020/04/03 14:52:32 [ERROR] acme: Error -> One or more domains had a problem:
Apr 03 14:52:32 caddy[4975]: [redacted.com] [redacted.com] acme: error presenting token: presenting with embedded provider: could not start listener at :80 for challenge server: listen tcp :80: bind: address already in use
Apr 03 14:52:32 caddy[4975]:  (challenge=http-01 remaining=[])
Apr 03 14:52:34 caddy[4975]: 2020/04/03 14:52:34 [ERROR] attempt 1: [redacted.com] Obtain: [redacted.com] acme: Error -> One or more domains had a problem:
Apr 03 14:52:34 caddy[4975]: [redacted.com] [redacted.com] acme: error presenting token: presenting with embedded provider: could not start listener at :80 for challenge server: listen tcp :80: bind: address already in use
Apr 03 14:52:34 caddy[4975]:  - retrying in 1m0s (5.516414869s/720h0m0s elapsed)...

I’ve verified I have nothing using :80 or :443:

$ sudo ss -tulw | grep :80
$ sudo ss -tulw | grep :443

Firewall ports are open:

$ sudo ufw status verbose
Status: active
To                         Action      From
--                         ------      ----
80/tcp                     ALLOW IN    Anywhere                  
443/tcp                    ALLOW IN    Anywhere                  
80/tcp (v6)                ALLOW IN    Anywhere (v6)             
443/tcp (v6)               ALLOW IN    Anywhere (v6)             

6. Links to relevant resources:

Flip. Good find!

I’m surprised more people haven’t reported this :thinking: … and I’m surprised I haven’t encountered this yet.

Fixed in Fix error handling logic with embedded solvers · caddyserver/certmagic@f8be3e9 · GitHub

You’ll notice that it is some unusual error handling logic (i.e. not simple if err != nil { return err }) and that tripped me up when writing it I guess.

The latest commit in Caddy has a fix for this. Can you please try it out? The build will have artifacts published here soon: go.mod: Use latest Certmagic (v0.10.9) · caddyserver/caddy@05164c8 · GitHub

1 Like

Actually, my fix is incorrect.

But it exposed a bug in the standard library, which is now panicking…

This will take more time. I’m not sure why you are seeing that error.

1 Like

Thanks so much for your attention to this. I’ll sit tight for a while. Not under a hard deadline at the moment.

2 Likes

@hippmr Alrighty. This oughtta do the trick: Handle the heck out of those socket errors · caddyserver/certmagic@03d0645 · GitHub

It should be overkill but I can’t reproduce the problem people are reporting, nor can I explain why the OS appears to give contradictory error messages. Whatever the case may be, I’ve resorted to comparing substrings in the error message to try to know more surely if the listener is already in use, but that’s really bad form and pretty hacky.

This commit should have build artifacts momentarily, please try it out – all I can do is test that nothing breaks for me, since you’ve seen the problem, please try it ASAP: go.mod: Update CertMagic (again) v0.10.10 · caddyserver/caddy@c7ac7de · GitHub

It appears to have come up & working, with the possible exception of this err msg:

Apr 04 00:10:05 caddy[8585]: 2020/04/04 00:10:05 [WARNING] OS reports a contradiction: listen tcp :443: bind: address already in use - but we cannot connect to it, with this error: dial tcp :443: connect: connection refused; continuing anyway 🤞

But I’m guessing that may be a symptom of the hacky fix you alluded to.

Anyways, I seem to be working. Thanks-a-million.

1 Like

Very interesting! Thank you so much for the quick follow-up.

Yes, it appears to be working for you because I am now comparing the actual string of the error message and we are continuing anyway… :confounded:

“Connection refused…” that’s odd. Something is listening but not accepting connections?

Huh. Well… it’s convenient that you are able to see it fairly consistently, I guess. I still don’t know what’s going on.

Hmmm. Just for grins I rebooted the server. When it started this time, no error message. Same thing after issuing a ‘systemctl restart caddy’ command. I’ve no explanation.

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.