"Random" TLS alert, internal error

1. Output of caddy version:

v2.5.1 h1:bAWwslD1jNeCzDa+jDCNwb8M3UJ2tPa8UZFFzPVmGKs=

2. How I run Caddy:

With systemd

a. System environment:

  • CloudLinux release 8.6 (Leonid Kadenyuk)
  • Linux web23 4.18.0-372.9.1.1.lve.el8.x86_64 #1 SMP Tue May 24 07:49:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Systemd

b. Command:

caddy reload -config /etc/caddy/Caddyfile -> for reloads
systemctl restart caddy -> for full restarts

c. Service/unit/compose file:

cat /usr/lib/systemd/system/caddy.service

# caddy.service
#
# For using Caddy with a config file.
#
# Make sure the ExecStart and ExecReload commands are correct
# for your installation.
#
# See https://caddyserver.com/docs/install for instructions.
#
# WARNING: This service does not use the --resume flag, so if you
# use the API to make changes, they will be overwritten by the
# Caddyfile next time the service is restarted. If you intend to
# use Caddy's API to configure it, add the --resume flag to the
# `caddy run` command or use the caddy-api.service file instead.

[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target network-online.target
Requires=network-online.target

[Service]
Type=notify
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

cat /etc/systemd/system/caddy.service.d/override.conf

[Service]
ReadWritePaths=/etc/pki

d. My complete Caddy config:

{
        admin 127.0.0.1:8888
        grace_period 3s
        log {
                output file /var/log/caddy/caddy.log {
                        roll_size 250MiB
                        roll_keep_for 15d
                }
                level DEBUG
        }
        email letsencrypt@swisscenter.com
        on_demand_tls {
                ask https://api.swisscenter.com/webservices.php/caddy/dnslookup
                interval 2m
                burst 5
        }
        order realip first
}

(common) {
        bind 127.0.0.1 [::1] 94.103.96.188 [2a00:a500:0:96::188]
        realip {
                header "X-Forwarded-For"
                from cloudflare
                maxhops 5
        }
        @sc_server_fqdn {
                path /_sc_get_server_fqdn
        }
        respond @sc_server_fqdn "web23.swisscenter.com" 200 {
                close
        }
        header X-SwissCenter-Server "web23.swisscenter.com"
        reverse_proxy http://127.0.0.80:80
}

(manager) {
        bind 94.103.96.188 [2a00:a500:0:96::188]
        reverse_proxy http://127.0.0.1:9000
}

import /etc/caddy/host.conf
import /etc/caddy/customers/*.conf

host.conf

web23.swisscenter.com {
  @only_obs {
    path /imav*
    not remote_ip 94.103.96.9 94.103.96.12 94.103.96.129 94.103.97.110 192.168.50.0/24 2a00:a500:0:10::/64
  }
  respond @only_obs "We're sorry, but this resource is not available for you. If you feed this is an error, please contact your amazing server administrator." 403 {
    close
  }
  import common
}

manager.web23.swisscenter.com {
  @only_obs {
    not remote_ip 94.103.96.9 94.103.96.12 94.103.96.129 94.103.97.110 192.168.50.0/24 2a00:a500:0:10::/64
  }
  route @only_obs {
    respond "We're sorry, but this resource is not available for you. If you feed this is an error, please contact your amazing server administrator." 403 {
      close
    }
  }
  import manager
}

127.0.0.1, [::1], 94.103.96.188, [2a00:a500:0:96::188] {
  import common
  tls internal
}

customers/*.conf

http://castella-immobilier.ch, http://www.castella-immobilier.ch, http://101818.web23.swisscenter.com {
        import common
}
https://castella-immobilier.ch, https://www.castella-immobilier.ch, https://101818.web23.swisscenter.com {
        import common
        tls {
                on_demand
        }
}

3. The problem I’m having:

Hello hello,

We use caddy mainly for SSL terminator in front of apache (for the automatic certificate management) on shared hosting servers.

Quite often, when we add to caddy the config to serve a new virtualhost, we then get random tlsv1 internal error on the related domains.

1d madjik@prism:~ $ curl -v --http2 https://101818.web23.swisscenter.com/

*   Trying 2a00:a500:0:96::188:443...
* Connected to 101818.web23.swisscenter.com (2a00:a500:0:96::188) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS alert, internal error (592):
* error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error
* Closing connection 0
curl: (35) error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error

Then, after a few curl retries (~5-10), it works correctly. But it dosn’t last long. A few minutes later, it goes in error again.

I enabled DEBUG loggin to see if I can understand what is going on.

The weird thing is that while caddy returns the TLS errors, there is zero log produced for this particular domain.
Then when it suddently start to answer correctly, it throws this log:

{"level":"debug","ts":1659644872.0620792,"logger":"tls.handshake","msg":"no matching certificates and no custom selection logic","identifier":"101818.web23.swisscenter.com"}
{"level":"debug","ts":1659644872.0635443,"logger":"tls","msg":"loading managed certificate","domain":"101818.web23.swisscenter.com","expiration":1667339924,"issuer_key":"acme-v02.api.letsencrypt.org-directory","storage":"FileStorage:/var/lib/caddy/.local/share/caddy"}
{"level":"debug","ts":1659644872.064106,"logger":"tls.cache","msg":"added certificate to cache","subjects":["101818.web23.swisscenter.com"],"expiration":1667339924,"managed":true,"issuer_key":"acme-v02.api.letsencrypt.org-directory","hash":"c60741b0ef6a11d760ed68acc320ad96a218247dd43f8899e5b96d0fc0164d99","cache_size":162,"cache_capacity":10000}
{"level":"debug","ts":1659644872.064133,"logger":"tls.handshake","msg":"loaded certificate from storage","subjects":["101818.web23.swisscenter.com"],"managed":true,"expiration":1667339924,"hash":"c60741b0ef6a11d760ed68acc320ad96a218247dd43f8899e5b96d0fc0164d99"}
{"level":"debug","ts":1659644873.2423995,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"127.0.0.80:80","duration":1.174636204,"request":{"remote_ip":"2a00:a500:0:97::dead:b00b","remote_port":"59516","proto":"HTTP/2.0","method":"GET","host":"101818.web23.swisscenter.com","uri":"/","headers":{"User-Agent":["curl/7.74.0"],"Accept":["*/*"],"X-Forwarded-For":["2a00:a500:0:97::dead:b00b"],"X-Forwarded-Proto":["https"],"X-Forwarded-Host":["101818.web23.swisscenter.com"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"101818.web23.swisscenter.com"}},"headers":{"Content-Type":["text/html; charset=UTF-8"],"Server":["Apache/2.4.37 () Phusion_Passenger/6.0.14"],"Link":["<https://101818.web23.swisscenter.com/wp-json/>; rel=\"https://api.w.org/\", <https://101818.web23.swisscenter.com/wp-json/wp/v2/pages/52>; rel=\"alternate\"; type=\"application/json\", <https://101818.web23.swisscenter.com/>; rel=shortlink"],"Date":["Thu, 04 Aug 2022 20:27:52 GMT"],"X-Powered-By":["PHP/7.4.30"]},"status":200}
{"level":"debug","ts":1659644873.7286992,"logger":"tls.handshake","msg":"choosing certificate","identifier":"101818.web23.swisscenter.com","num_choices":1}
{"level":"debug","ts":1659644873.7287493,"logger":"tls.handshake","msg":"default certificate selection results","identifier":"101818.web23.swisscenter.com","subjects":["101818.web23.swisscenter.com"],"managed":true,"issuer_key":"acme-v02.api.letsencrypt.org-directory","hash":"c60741b0ef6a11d760ed68acc320ad96a218247dd43f8899e5b96d0fc0164d99"}
{"level":"debug","ts":1659644873.728756,"logger":"tls.handshake","msg":"matched certificate in cache","subjects":["101818.web23.swisscenter.com"],"managed":true,"expiration":1667339924,"hash":"c60741b0ef6a11d760ed68acc320ad96a218247dd43f8899e5b96d0fc0164d99"}
{"level":"debug","ts":1659644874.0367506,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"127.0.0.80:80","duration":0.304633867,"request":{"remote_ip":"2a00:a500:0:97::dead:b00b","remote_port":"59518","proto":"HTTP/2.0","method":"GET","host":"101818.web23.swisscenter.com","uri":"/","headers":{"Accept":["*/*"],"X-Forwarded-For":["2a00:a500:0:97::dead:b00b"],"X-Forwarded-Proto":["https"],"X-Forwarded-Host":["101818.web23.swisscenter.com"],"User-Agent":["curl/7.74.0"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"101818.web23.swisscenter.com"}},"headers":{"X-Powered-By":["PHP/7.4.30"],"Content-Type":["text/html; charset=UTF-8"],"Link":["<https://101818.web23.swisscenter.com/wp-json/>; rel=\"https://api.w.org/\", <https://101818.web23.swisscenter.com/wp-json/wp/v2/pages/52>; rel=\"alternate\"; type=\"application/json\", <https://101818.web23.swisscenter.com/>; rel=shortlink"],"Server":["Apache/2.4.37 () Phusion_Passenger/6.0.14"],"Date":["Thu, 04 Aug 2022 20:27:53 GMT"]},"status":200}

It looks like it sometimes “forgets” about the existing certificate and answer with tls error, then suddently it load it from the storage.
Then it forgets it again a few minutes later and goes in circle like this.

I have to explain how we add new configuration:

We have a folder called “customers” where we add *.conf file (one per “virtualhost”).
So we create the config there, and reload caddy (reload not restart)

It seems that doing a full restart of caddy resolve the issue until it randomly happens for a new configuration we add.
By randomly I mean the issue is not with every new config we add but I am not able to see a pattern that would explain why the issue is happening for some and not for others.

I think at the moment it handles around 2500 certs.

Any ideas what could be wrong ?

4. Error messages and/or full log output:

See log in problem description

5. What I already tried:

When it happens, I tried to restart caddy instead of reloading and it seems to dismiss the problem.
But still we would prefer to do a reaload when we add configs, instead of a complete restart.

6. Links to relevant resources:

I would say it looks a bit like this issue:

https://caddy.community/t/error-tls-alert-internal-error-592-again/

Don’t use on_demand when the domains you’re serving are known. It should only be used when you don’t know ahead of time which domains you want to have certificates for, i.e. a customer might point their own domains to your server and you need Caddy to issue a cert on the fly. In your case, you explicitly configured Caddy with your domains as site addresses, so on_demand doesn’t make sense.

If there’s no debug level log from a failed TLS handshake, then I think the request isn’t reaching Caddy at all. Are you sure your networking setup always routes those requests to Caddy, or is there somekind of round-robin load balancing going on somewhere? Do you have multiple A or AAAA records on the same domain, which would cause a random IP to be used?

1 Like

Hello,

Thanks for your reply.

We use on_demand because when we create the hosting account, the domain might not be pointing to the server yet.

For example if the customer manage his domain DNS on an external service and has not yet applied the changes. Or in case of a migration from another service provider and the DNS still points there.

Correct me if I’m wrong but, when not using on_demand caddy will try to grab a certificate as soon as the hostname is declared in the config. In that case, if the domain DNS entries are not correctly configured, it will fail the cert requests and can trigger the rate-limit at letsencrypt.

When enabling the customer virtualhost, we additionnaly provide the customer with a “temporary” hostname for his site so he can work on his site before migrating DNS.
That’s what the entry https://101818.web23.swisscenter.com is about in the configuration I submitted in the original post.

In this particular example, that’s exactly the case. The customer is using his temporary hostname to access the site and work on it because his domain is still pointing to another service provider at the moment.

The entry has an unique A and AAAA record:

root@paradise:~# host 101818.web23.swisscenter.com

101818.web23.swisscenter.com has address 94.103.96.188
101818.web23.swisscenter.com has IPv6 address 2a00:a500:0:96::188

I’m kinda sure (though I should have checked with tcpdump) that it reaches caddy on port 443.
What also makes me think this is that fully restarting caddy resolves the issue, until it comes back with the next new added configuration.

EDIT: If you look at the original post, there is a verbose curl result that shows it connects to the right IP/port.

Could it be that caddy is having an unexpected error in the certificate handling and have no idea what log to produce and drop the request without logging anything ?

Kind regards