Internal self-signed SSL and server IP Addresses

1. Output of caddy version:

v2.6.2 h1:wKoFIxpmOJLGl3QXoo6PNbYvGW4xLEgo32GPBEjWL8o=

2. How I run Caddy:

systemctl start caddy

a. System environment:

CloudLinux 8.x (AlmaLinux based)
Systemd

b. Command:

systemctl start caddy

c. Service/unit/compose file:

# caddy.service
#
# For using Caddy with a config file.
#
# Make sure the ExecStart and ExecReload commands are correct
# for your installation.
#
# See https://caddyserver.com/docs/install for instructions.
#
# WARNING: This service does not use the --resume flag, so if you
# use the API to make changes, they will be overwritten by the
# Caddyfile next time the service is restarted. If you intend to
# use Caddy's API to configure it, add the --resume flag to the
# `caddy run` command or use the caddy-api.service file instead.

[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target network-online.target
Requires=network-online.target

[Service]
Type=notify
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile --force
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

d. My complete Caddy config:

{
        admin 127.0.0.1:8888
        default_bind 127.0.0.1 [::1] 94.103.96.188 [2a00:a500:0:96::188]
        grace_period 3s
        log {
                output file /var/log/caddy/caddy.log {
                        roll_size 250MiB
                        roll_keep_for 15d
                }
                level ERROR
        }
        email letsencrypt@superservers.com
        on_demand_tls {
                ask https://api.swisscenter.com/webservices/caddy/dnslookup
                interval 2m
                burst 5
        }
        order realip first
}

# Common options we want to apply to every "virtualhosts"
(common) {
        @sc_server_fqdn {
                path /_sc_get_server_fqdn
        }
        respond @sc_server_fqdn "web23.swisscenter.com" 200 {
                close
        }
        realip {
                header "X-Forwarded-For"
                from cloudflare
                maxhops 5
        }
        reverse_proxy http://127.0.0.80:80
}

# Default catchall endpoints
http:// {
        import common
}
https:// {
        import common
        tls {
                on_demand
        }
}

# Hostname endpoint
http://web23.swisscenter.com {
        redir https://{host}{uri}
}
https://web23.swisscenter.com {
        # Imunify AV+ access restriction
        @imav_access {
                path /imav*
                not remote_ip 192.168.50.0/24
        }
        route @imav_access {
                respond "We're sorry, but this resource is not available to you. If you feel this is an error, please contact your amazing server administrator." 403 {
                        close
                }
        }
        import common
}

# LVE Manager endpoint
http://manager.web23.swisscenter.com {
        redir https://{host}{uri}
}
https://manager.web23.swisscenter.com {
        @manager_access {
                not remote_ip 192.168.50.0/24
        }
        route @manager_access {
                respond "We're sorry, but this resource is not available to you. If you feel this is an error, please contact your amazing server administrator." 403 {
                        close
                }
        }
        reverse_proxy http://127.0.0.1:9000
}

# IP endpoints
http://127.0.0.1, http://[::1], http://94.103.96.188, http://[2a00:a500:0:96::188] {
        import common
}
https://127.0.0.1, https://[::1], https://94.103.96.188, https://[2a00:a500:0:96::188] {
        import common
        tls internal
}

# Per virtualhost specific configs
import /etc/caddy/customers/*.conf

3. The problem I’m having:

I’m trying to have the IP address routes use internal SSL while everything else uses the “catchall” routes. It works correctly for a few hours, but then connecting to the IP address over SSL fails.

When it works

[root@web23 ~]# curl -v https://94.103.96.188
* Rebuilt URL to: https://94.103.96.188/
*   Trying 94.103.96.188...
* TCP_NODELAY set
* Connected to 94.103.96.188 (94.103.96.188) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: [NONE]
*  start date: Nov 25 13:56:28 2022 GMT
*  expire date: Nov 26 01:56:28 2022 GMT
*  subjectAltName: host "94.103.96.188" matched cert's IP address!
*  issuer: CN=Caddy Local Authority - ECC Intermediate
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* Using Stream ID: 1 (easy handle 0x56359f3d2690)
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/2
> Host: 94.103.96.188
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/2 302 
< alt-svc: h3=":443"; ma=2592000,h3=":443"; ma=2592000,h3=":443"; ma=2592000,h3=":443"; ma=2592000
< content-type: text/html; charset=UTF-8
< date: Fri, 25 Nov 2022 14:05:35 GMT
< location: https://www.swisscenter.com/
< server: Caddy
< server: Apache/2.4.37 () Phusion_Passenger/6.0.14
< x-powered-by: PHP/8.1.12
< content-length: 0
< 
* Connection #0 to host 94.103.96.188 left intact

When it fails

[root@web23 caddy]# curl -v https://94.103.96.188
* Rebuilt URL to: https://94.103.96.188/
*   Trying 94.103.96.188...
* TCP_NODELAY set
* Connected to 94.103.96.188 (94.103.96.188) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS alert, internal error (592):
* error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error
* Closing connection 0
curl: (35) error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error

Also I’m noticing this in the log:

{"level":"error","ts":1669374283.009526,"logger":"tls.on_demand","msg":"certificate should not be obtained","server_name":"94.103.96.188","subjects":["94.103.96.188"],"expiration":1669388679,"remaining":14396.207596023,"revoked":false,"error":"94.103.96.188: certificate not allowed by ask endpoint https://api.swisscenter.com/webservices/caddy/dnslookup - non-2xx status code 403"}
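The numbers in that log entry are internally consistent: `ts` and `expiration` are epoch seconds, and their difference matches the reported `remaining` value, so the cert still had about four hours of life left when the ask endpoint denied it:

```python
# Epoch timestamps copied from the log entry above.
log_ts = 1669374283       # "ts": when the error was logged
expiration = 1669388679   # "expiration": cert expiry reported by Caddy

remaining_s = expiration - log_ts
print(remaining_s)              # 14396 seconds, matching "remaining"
print(remaining_s / 3600)       # roughly 4 hours left on the cert
```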

While this happens, https calls to 127.0.0.1 still work fine.

Might this be related to Caddy querying the ask endpoint, as I can see in the log?
Is this supposed to happen when we’re not using on-demand, but “tls internal”, for the IP addresses?

Restarting Caddy makes it answer correctly over SSL on the IP address again, but after a few hours it breaks again.

4. Error messages and/or full log output:

See the curl output and log excerpt in section 3 above.

5. What I already tried:

Restarting caddy temporarily helps

6. Links to relevant resources:

In the meantime I’ve modified our ask endpoint so it accepts and returns a 200 when the request is for domain=IP_ADDRESS of the host, to see if that’s why SSL goes crazy after a few hours when accessing the server by its IP over SSL (that’s mainly for Zabbix checks…)
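To illustrate, here’s a rough sketch of the allow/deny logic the modified ask endpoint implements. This is a hypothetical example, not our real code: the IP set and the `is_configured_domain` lookup stand in for our actual infrastructure, and Caddy only cares about the HTTP status code returned for its `?domain=...` query.

```python
# Hypothetical sketch of an 'ask' endpoint's decision logic.
# SERVER_IPS and is_configured_domain are example stand-ins.
SERVER_IPS = {"127.0.0.1", "::1", "94.103.96.188", "2a00:a500:0:96::188"}

def ask_status(domain: str, is_configured_domain) -> int:
    """Return the HTTP status to answer Caddy's ?domain=... query with."""
    if domain in SERVER_IPS:
        return 200  # always allow the server's own addresses (tls internal routes)
    if is_configured_domain(domain):
        return 200  # domain is really configured for the service
    return 403      # any non-2xx: Caddy refuses to obtain/renew a certificate

# The server IP is now allowed even when the domain database says no:
print(ask_status("94.103.96.188", lambda d: False))  # 200
```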

But I was a bit surprised that Caddy uses the ask endpoint when “tls internal” is used.

On the other hand it could be logical, as it’s also kind of an on-demand thing, just self-signed…

Is this by design ?

Yep, by design. If you configure an ‘ask’ endpoint, Caddy will ask it for permission to get a cert – it’s up to the endpoint to return the right answer for the cert. :+1:

Hello Matt,

Ok thank you for confirming this. So I think hopefully fixing the endpoint should resolve our issue.

Kind regards


@r00tsh3ll Just want to make sure this isn’t a bug…

Is the connection to 94.103.96.188 coming in over the https://94.103.96.188 site? Or the https:// site? Because if it’s coming in over :443 and not 94.103.96.188 specifically it could be using the on-demand config.

Hello Matt,

First I would like to say that after modifying the ask endpoint so it returns a 200 for requests matching the declared IP address routes, the SSL error we were seeing after a few hours is gone.

As for your question, if I understand it correctly: I’ve modified the IP address routes so they answer with a custom response:

# IP endpoints
http://127.0.0.1, http://[::1], http://94.103.96.188, http://[2a00:a500:0:96::188] {
        respond "Greetings, professor Falken." 200 {
                close
        }
        #import common
}
https://127.0.0.1, https://[::1], https://94.103.96.188, https://[2a00:a500:0:96::188] {
        respond "Greetings, professor Falken." 200 {
                close
        }
        #import common
        tls internal
}

And it seems it’s using the expected route, not the catchall:

[root@web23 caddy]# curl https://94.103.96.188
Greetings, professor Falken.

Also, a curl -v shows that the certificate is the internal one:

* Server certificate:
*  subject: [NONE]
*  start date: Nov 27 23:08:51 2022 GMT
*  expire date: Nov 28 11:08:51 2022 GMT
*  subjectAltName: host "94.103.96.188" matched cert's IP address!
*  issuer: CN=Caddy Local Authority - ECC Intermediate
*  SSL certificate verify ok.

So I guess there is no bug. I had simply not realized that tls internal acts like on-demand here: it uses the local cert provider, but still consults the ask endpoint.

It seems that for 127.0.0.1, though, it doesn’t use the ask endpoint, which kinda makes sense as it’s a loopback address…

While I’m at it, I have a little question: what happens if the ask endpoint is down for some reason (unexpected, or expected like maintenance) while certs are renewing?

Will it zap the cert and return an SSL internal error?

How would Caddy interpret a timeout connecting to the ask endpoint, if for example the remote host is temporarily down?

Kind regards.


Certs are renewed 2/3 through their lifetime, so if it can’t be renewed right away, Caddy will retry next time and serve the expiring cert in the meantime.

It will treat that as not allowed to get a certificate; and log an error.

I’d recommend making the ask endpoint local if you can, btw! Just for latency reasons if nothing else.

Certs are renewed 2/3 through their lifetime, so if it can’t be renewed right away, Caddy will retry next time and serve the expiring cert in the meantime.

That would explain why the internal certificates for the IP address routes die fairly quickly if the ask endpoint doesn’t allow them; as I understand it, they have a 12h lifetime by default.
That doesn’t explain why the cert was valid again for a few hours after a restart, though.
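Assuming the 12h default lifetime and the 2/3-of-lifetime renewal rule mentioned above both apply to the internal certs (I haven’t verified the exact numbers in Caddy’s source), the timing lines up with what I observed:

```python
# Rough back-of-the-envelope check, assuming a 12h internal cert lifetime
# and renewal attempts starting at 2/3 of the lifetime.
lifetime_h = 12
renew_attempt_h = lifetime_h * 2 / 3

print(renew_attempt_h)  # 8.0 -> renewals (and ask-endpoint denials) start ~8h in
```

That would match SSL on the IP address working for “a few hours” after a restart before failing.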

I’d recommend making the ask endpoint local if you can, btw! Just for latency reasons if nothing else.

Yes, true. I had thought about it, but as we need to do real-time lookups in a centralized database in our infrastructure, to validate that the domain really is configured for the service, we have to externalize it.
We’ll put some monitoring on it though, so if there is an unexpected outage of the endpoint, we’ll quickly be aware of it…

Again, thanks a lot for your help and the info.
