TLS handshake time out without any errors?

1. The problem I’m having:

I just moved my home server to a different location, turned it on and suddenly it’s not reachable anymore. I configured a domain, all with automatic dns and such. The server is reachable, all ports are open, I can communicate with netcat on ports 80 and 443 flawlessly. When I curl my domain, caddy redirects me to https, as expected. However, a request to https times out. No logs show any kind of error. I am at the end of my knowledge.

2. Error messages and/or full log output:

curl -vL domain.net

* processing: https://domain.net
*   Trying 123.123.123.123:443...
* Connected to domain.net (123.123.123.123) port 443
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none

# hangs

Caddy logs after the curl:

2023/08/26 20:33:13.297	DEBUG	events	event	{"name": "tls_get_certificate", "id": "30c1ddfb-c933-48fc-9d3e-94efbf005194", "origin": "tls", "data": {"client_hello":{"CipherSuites":[4866,4867,4865,49196,49200,159,52393,52392,52394,49195,49199,158,49188,49192,107,49187,49191,103,49162,49172,57,49161,49171,51,157,156,61,60,53,47,255],"ServerName":"domain.net","SupportedCurves":[29,23,30,25,24,256,257,258,259,260],"SupportedPoints":"AAEC","SignatureSchemes":[1027,1283,1539,2055,2056,2057,2058,2059,2052,2053,2054,1025,1281,1537,771,769,770,1026,1282,1538],"SupportedProtos":["h2","http/1.1"],"SupportedVersions":[772,771],"Conn":{}}}}
2023/08/26 20:33:13.298	DEBUG	tls.handshake	choosing certificate	{"identifier": "domain.net", "num_choices": 1}
2023/08/26 20:33:13.298	DEBUG	tls.handshake	default certificate selection results	{"identifier": "domain.net", "subjects": ["domain.net"], "managed": true, "issuer_key": "acme-v02.api.letsencrypt.org-directory", "hash": "4a21719325f2670b7956b6aec3a7c046a4ba45c78ba1b2abc8120bc723013527"}
2023/08/26 20:33:13.298	DEBUG	tls.handshake	matched certificate in cache	{"remote_ip": "10.22.0.1", "remote_port": "64152", "subjects": ["domain.net"], "managed": true, "expiration": "2023/11/24 18:53:47.000", "hash": "4a21719325f2670b7956b6aec3a7c046a4ba45c78ba1b2abc8120bc723013527"}
# here it hangs

# This happens after I ctrl-c the curl command:
2023/08/26 20:33:15.127	DEBUG	http.stdlib	http: TLS handshake error from 10.22.0.1:64152: EOF

The remote IP being 10.22.0.1 is intended, I’m using a wireguard tunnel to access the server.

3. Caddy version:

v2.6.4 h1:2hwYqiRwk1tf3VruhMpLcYTg+11fCdr8S3jhNAdnPy8=

4. How I installed and ran Caddy:

c. docker compose file:

caddy:
    container_name: caddy
    image: caddy
    ports:
      - "80:80"
      - "443:443"
    networks:
      - caddy-net
    volumes:
      - "/home/media/src/Caddyfile:/etc/caddy/Caddyfile"
    restart: unless-stopped

d. My complete Caddy config:

{
        debug
        log {
                format console
                level DEBUG
        }
}


(common) {
        log {
                level DEBUG
        }
        encode zstd gzip
        reverse_proxy /* dashy:80
}

domain.net {
        import common
}

Please don’t tell me to include the domain, I redacted it because it’s a private domain and I don’t want a public link to it on the internet.

5. More info

The certificates are working. The acme challenge is completed and caddy says it was successful. The problem here is that the handshake somehow stops in the middle of it. I could not find any even remotely related issue anywhere. I sincerely hope this is a very dumb misconfiguration on my side, but I don’t think it’s the case, after all the literal exact same setup (at a different location) worked without a problem. Just the router and ip changed, but because the traffic is routed through a wireguard tunnel to a vps, that should not make a difference.

And one more thing, when I add a localhost config to access it locally via http, that works without problems.

Ok, I solved the issue.

Turns out, for some reason the Server Hello answer packets were being dropped. Caddy did nothing wrong, the problem was that the quite large packet that it sent should have been fragmented.

A wrong MTU setting in my wireguard setup was the culprit here. I still don’t understand why this was not an issue before, but I could fix it quite easily by setting the MTU setting on my home server wireguard configuration to 1384. I followed this guide.

How did I find the issue? I pasted my question into chatgpt, it suggested a lot of things. I tried getting a tcpdump of the failing request, and posted that into chatgpt again. It then told me that the Server Hello packet was not being acknowledged, and that it might have to do with it’s large packet size of 2700+ bytes, and that a MTU setting could help.

TLDR, check if your MTU is configured properly, as your TLS handshake packets might be dropped.

3 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.