TLS error after configuration reload

Hi, we are about to buy commercial licenses for our instances, but first we are evaluating the product.

For a simple configuration it’s working fine.

cat Caddyfile

www.mydomain.com {
  proxyprotocol 0.0.0.0/0 ::/0
  proxy /  10.0.0.125:8088
}

Running as a container

docker run -d --name caddy \
  -p 80:80 \
  -p 443:443 \
  -v $(pwd)/Caddyfile:/etc/Caddyfile \
  -v $(pwd)/.caddy:/root/.caddy \
  mycaddy.with.plugin/caddy
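
As a quick sanity check that the fresh container is proxying correctly (assuming www.mydomain.com resolves to this host):

curl -IL https://www.mydomain.com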

the version

docker exec -it 6f4b17e7c164 caddy -version
Caddy 0.11.0 (unofficial)

the logs

docker logs 6f4b17e7c164
Activating privacy features... done.
https://www.mydomain.com
2018/08/20 17:15:24 https://www.mydomain.com
http://www.mydomain.com
2018/08/20 17:15:24 http://www.mydomain.com
2018/08/20 17:15:25 [NOTICE] Sending telemetry: we were too early; waiting 32m22.604769578s before trying again

A configuration reload

docker exec -it 6f4b17e7c164 pkill -SIGUSR1 caddy

Now when we access the site after the reload, we get this error:

docker logs 6f4b17e7c164
2018/08/20 17:25:10 [INFO] SIGUSR1: Reloading
2018/08/20 17:25:10 [INFO] Reloading
2018/08/20 17:25:10 [INFO] Reloading complete
2018/08/20 17:25:20 http: TLS handshake error from 10.0.0.175:60722: tls: oversized record received with length 22617
2018/08/20 17:25:21 http: TLS handshake error from 10.0.0.175:60724: tls: oversized record received with length 22617
2018/08/20 17:25:21 http: TLS handshake error from 10.0.0.175:60726: tls: oversized record received with length 22617

If we stop and redeploy the container everything works fine.
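
For reference, the handshake can also be tested directly against the container, to rule out anything in front of it (<caddy-host> below is just a placeholder for the host the container runs on):

openssl s_client -connect <caddy-host>:443 -servername www.mydomain.com </dev/null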

Hi there,

This error is nothing to worry about. It just means you restarted the instance (stopped then started, I believe, though I may be wrong) and telemetry couldn’t be sent at startup because it was too soon. It’ll resolve itself in time.

What was the configuration change? It looks like your client is sending malformed TLS records when performing the handshake, or something about your container/network configuration is causing packets to be modified/malformed…
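
If it helps narrow it down, a packet capture on the Docker host would show exactly what bytes are arriving on port 443 after the reload (a rough sketch, assuming tcpdump is installed on the host):

tcpdump -i any -c 20 -X 'port 443'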

Hi Matt, thanks for the response. Yes, the telemetry [NOTICE] is ok for us.

But the ‘http: TLS handshake error’ stops our Caddy container from proxying requests. And I tested the reload with USR1 without changing the Caddyfile configuration.

So to reproduce this error, I simply run the container.

docker run -d --name caddy \
  -p 80:80 \
  -p 443:443 \
  -v $(pwd)/Caddyfile:/etc/Caddyfile \
  -v $(pwd)/.caddy:/root/.caddy \
  mycaddy.with.plugin/caddy

I verify that it is running OK and serving as a valid proxy, and then I signal the caddy process to reload with the same original Caddyfile.

docker exec -it 6f4b17e7c164 pkill -SIGUSR1 caddy

And then the “TLS handshake error” occurs.

It is pretty simple to reproduce with these steps.
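
To sum it up, the whole reproduction fits in a few commands (same Caddyfile and image as above; the curl checks assume the domain resolves to this host):

docker run -d --name caddy \
  -p 80:80 -p 443:443 \
  -v $(pwd)/Caddyfile:/etc/Caddyfile \
  -v $(pwd)/.caddy:/root/.caddy \
  mycaddy.with.plugin/caddy
curl -IL https://www.mydomain.com         # works
docker exec -it caddy pkill -SIGUSR1 caddy
curl -IL https://www.mydomain.com         # now fails with a TLS error
docker logs caddy 2>&1 | tail             # shows the handshake errors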

If you could help us with that, I would really appreciate it.

Thought I’d take a shot at replicating this. I did the following:

Caddyfile
https://whitestrake.net:8080 {
  tls {
    dns cloudflare
  }
  proxyprotocol 0.0.0.0/0 ::/0
  proxy / https://www.google.com
}

  1. Build Caddy container with plugins:
docker build github.com/abiosoft/caddy-docker.git \
  --build-arg "plugins=proxyprotocol,cloudflare" \
  -t whitestrake/test-caddy

  2. Run with DNS credentials to get valid HTTPS:
docker run -d --name caddy_test \
  -p 8080:8080 \
  -v /home/whitestrake/Caddyfile:/etc/Caddyfile \
  -e "CLOUDFLARE_EMAIL=[snip]" \
  -e "CLOUDFLARE_API_KEY=[snip]" \
  whitestrake/test-caddy

  3. Verify working proxy:
    curl -IL https://whitestrake.net:8080 --resolve 'whitestrake.net:8080:127.0.0.1'

  4. Reload Caddy (no Caddyfile changes):
    docker exec -it caddy_test pkill -SIGUSR1 caddy

  5. Verify working proxy again

Proxy still worked after this - I wasn’t able to reproduce the issue. Caddy version is 0.11.0, Docker version is 18.06.0-ce.

Apart from port and volume mappings and the certificate challenge method, can you think of any significant differences between our steps?

I’ll investigate this tomorrow.

Here the Caddy container runs on our Docker Swarm production cluster (Server Version: 17.12.0-ce), behind an AWS ELB.
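
For the record, since the site uses the proxyprotocol directive, one thing we will double-check is whether the ELB is actually configured to send the PROXY protocol header to the backend port (a sketch assuming a Classic ELB named my-elb and the AWS CLI installed):

aws elb describe-load-balancer-policies --load-balancer-name my-elb
aws elb describe-load-balancers --load-balancer-names my-elb \
  --query 'LoadBalancerDescriptions[].BackendServerDescriptions'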

I’ll try to isolate the stack and give feedback about the results.

But once again, if I remove my Caddy container and start it over, everything runs fine.

Thank you.