Wildcard SNIs not being matched

1. My Caddy version (caddy version):

v2.0.0-beta.17 h1:x+Ur3uX83j+STerOWsrLDlknXe7z71VnO5xD+H2OwAw=
( downloaded off of github releases )

2. How I run Caddy:

plain binary execution since this is a test server and I can SSH into the machine

a. System environment:

lsb_release --all
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.4 LTS
Release:	18.04
Codename:	bionic

b. Command:

sudo caddy run --config config.caddy.json ( I run with sudo so that caddy can access /etc/letsencrypt/live )

d. My complete Caddyfile or JSON config:

{
  "apps" : {
    "http" : {
      "servers" : {
        "caddy.test.shinenelson.xyz" : {
          "listen" : [
            ":80",
            ":443"
          ],
          "automatic_https" : {
            "disable" : true,
            "disable_redirects" : true
          },
          "routes" : [
            {
              "match" : [
                {
                  "host" : [
                    "*.shine.caddy.test.shinenelson.xyz"
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "static_response",
                  "body": "Hi there, love from shine and Caddy!"
                }
              ],
              "terminal" : true
            }
          ],
          "tls_connection_policies" : [
            {
              "match" : {
                "sni" : [
                  "shine.caddy.test.shinenelson.xyz",
                  "shine.shine.caddy.test.shinenelson.xyz",
                  "something.shine.caddy.test.shinenelson.xyz",
                  "*.shine.caddy.test.shinenelson.xyz"
                ]
              },
              "certificate_selection" : {
                "policy" : "custom",
                "tag" : "shine"
              }
            }
          ]
        }
      }
    },
    "tls" : {
      "certificates" : {
        "load_files" : [
          {
            "certificate" : "/etc/letsencrypt/live/caddy.test.shinenelson.xyz/fullchain.pem",
            "key" : "/etc/letsencrypt/live/caddy.test.shinenelson.xyz/privkey.pem",
            "tags" : [ "caddy", "test", "shine" ]
          }
        ]
      }
    }
  }
}

3. The problem I’m having:

I have one wildcard TLS certificate covering the SNIs - caddy.test.shinenelson.xyz and *.shine.caddy.test.shinenelson.xyz from the staging endpoint of Let’s Encrypt ACME ( this is only a test server ( notice the test subdomain ) that I’ll pull down every now and then ).

The certificate works if I use subdomains that are provided directly in the match.sni array viz shine.shine.caddy.test.shinenelson.xyz and something.shine.caddy.test.shinenelson.xyz ( Of course, shine.caddy.test.shinenelson.xyz wouldn’t work because it would not match in the host matcher nor is it an SNI on the TLS certificate; that was me just confirming that it wouldn’t work. )

Anything random that is supposed to match the wildcard SNI *.shine.caddy.test.shinenelson.xyz would throw me :

TLS alert, internal error (592):
error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error

4. Error messages and/or full log output:

caddy ( server ) :

 http: TLS handshake error from 68.183.87.23:60752: no server TLS configuration available for ClientHello: &{CipherSuites:[4866 4867 4865 49196 49200 159 52393 52392 52394 49195 49199 158 49188 49192 107 49187 49191 103 49162 49172 57 49161 49171 51 157 156 61 60 53 47 255] ServerName:anything.shine.caddy.test.shinenelson.xyz SupportedCurves:[29 23 30 25 24] SupportedPoints:[0 1 2] SignatureSchemes:[1027 1283 1539 2055 2056 2057 2058 2059 2052 2053 2054 1025 1281 1537 771 515 769 513 770 514 1026 1282 1538] SupportedProtos:[h2 http/1.1] SupportedVersions:[772 771 770 769] Conn:0xc00022ad38 config:0xc0004bd680}

curl ( client ) :

curl -kv https://anything.shine.caddy.test.shinenelson.xyz
*   Trying 68.183.87.23:443...
* Connected to anything.shine.caddy.test.shinenelson.xyz (68.183.87.23) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS alert, internal error (592):
* error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error
* Closing connection 0
curl: (35) error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error

5. What I already tried:

Different combinations of match.sni :

  1. adding more subdomains to the host SNI
  2. putting the wildcard SNI at the top, then bottom of the array
  3. exclusive wildcard SNI *.shine.caddy.test.shinenelson.xyz

the version pasted above is the final version after many iterations of combinations above

I also tried adapting a Caddyfile ( my default fallback whenever my JSON fails )

*.shine.caddy.test.shinenelson.xyz {
tls /etc/letsencrypt/live/caddy.test.shinenelson.xyz/fullchain.pem /etc/letsencrypt/live/caddy.test.shinenelson.xyz/privkey.pem
}

and then compared the results. They were pretty much the same.
I didn’t know how to disable automatic-https via the Caddyfile, so, that much was hand-rolled in. Also, the adapted JSON had a nested handler with a subroute ( I didn’t think that was relevant, but tried anyway, though not extensively ).

6. Links to relevant resources:

I found this old issue ( #3004 ) on github, which is probably remotely related to the problem but was already fixed in commit 4a07a5d in PR #3005. It isn’t relevant here because both issues described in #3004 were fixed.

Thanks for the detailed, unredacted report!

Ah, you’re hitting this TODO in my code: matchers.go - caddyserver/caddy - Sourcegraph :sweat:

Guess it’s time to go implement that. Can you help test patches?

sure, no problem. Just push and ping me on github.

@matt, quick question : do you want me to convert this into a github issue?

I wasn’t sure whether this was a bug or a goof up in my configuration which is why I came here first to confirm.

Alas, I wish I could help by fixing this and opening a pull request, but unfortunately, I can’t write Go ( though I can kind of read it ). So, I thought I could at least spare you the pain of creating the issue. If this is something that a beginner could do ( and you have the time to mentor ), please let me know, I’d be happy to contribute via a pull request.

Sure, it’s a known bug since it’s a TODO in the code, but an issue will help me track its progress! Thanks! Your writeup here was really good.

@shine I went ahead and implemented it real quick (with tests!) – can you try it out? caddytls: Support wildcard matching in ServerName conn policy matcher · caddyserver/caddy@3c1def2 · GitHub

The CI should have pre-compiled build artifacts for you to download as soon as it finishes.
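
To illustrate what wildcard matching in an SNI matcher means for the names in this thread, here is a minimal sketch in Python. It is not the Go code from the commit above: the function name `sni_matches` is hypothetical, and the rule that a `*` label stands for exactly one DNS label is an assumption made for illustration.

```python
def sni_matches(pattern: str, server_name: str) -> bool:
    """Return True if server_name matches pattern.

    Assumption for this sketch: a "*" label in the pattern stands for
    exactly one non-empty DNS label; every other label is compared
    case-insensitively.
    """
    pattern_labels = pattern.lower().split(".")
    name_labels = server_name.lower().split(".")
    # A single-label wildcard never changes the number of labels.
    if len(pattern_labels) != len(name_labels):
        return False
    return all(
        p == n or (p == "*" and n != "")
        for p, n in zip(pattern_labels, name_labels)
    )
```

Under these assumptions, anything.shine.caddy.test.shinenelson.xyz matches *.shine.caddy.test.shinenelson.xyz, while shine.caddy.test.shinenelson.xyz ( one label too few ) does not.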

man, you’re super fast. I just created the github issue ( and you even beat my copy-paste skills )

I could git pull and go build off of the source tree, that’s how I was testing locally. I guess that would reduce the turnaround time ( TAT ) for tests, rather than having to wait for CI to complete?

I used the beta release binary only because I didn’t want to have the go compiler on the server.

I guess it’s time for git pull, go build and rsync magic now. Let’s do this!

Sure – git pull and then just checkout that commit. Or, download a binary from the artifacts here (top-right corner, sorta: caddytls: Support wildcard matching in ServerName conn policy matcher · caddyserver/caddy@3c1def2 · GitHub)

I’m sorry I took too long to test. First, my local build bailed on me; I might have played around with something that broke the newer build. Then the artifact from github was not a proper URL that I could simply pass to curl and download on the server. So, I had to download it to my local machine and then rsync it up. My stupid internet connection decided to change its public IP right in the middle of the rsync, and the connection was hung for a bit before I realized what had happened. Then, after some juggling with the binaries on the server, I was finally able to test and verify that the fix worked, at least for my use-case.

You were too fast in committing the fix. I wanted to reciprocate the speed ( I was already falling behind with all the internet stuff ), so, I didn’t do too many tests; but my use-case worked fine. I created a new matcher for another subdomain host ( but still within the same SNI ) to send a different response and that worked too.

So, thank you so much for your lightning-fast speed @matt!

No worries, you don’t have to be faster than any developer. I actually prefer it when the bug reporter takes their time and fills out details, does some testing and investigating on their own, confirms patches are working, etc. You weren’t a blocker on the fix, and the whole thing took much less than an hour. I wish more users would report bugs like you do!

I’m probably goofing up somewhere in the configuration, but I hit the exact same bug again when I used on_demand automatic HTTPS.

I got the automatic on-demand TLS certificate for shine.caddy.test.shinenelson.xyz and any more subjects that are manually added to tls.automation.policies[0].subjects, but wildcard sub-domains are not being provisioned.

Everything in my original post is the same except for 3 differences :

  1. I’m using the latest build - v2.0.0-beta.20 h1:oUNG1uh0UV8LWLlAVDZolFzk112++V/pxY+fF0HLmlY=
  2. The obvious change in the configuration file
...
"automatic_https" : {
  <del> "disable" : true </del>
  "disable_redirects" : true
}
...
<del>
"certificate_selection" : {
  "policy" : "custom",
  "tag" : "shine"
}
</del>
...
"tls" : {
  "automation" : {
    "policies" : [
      {
        "subjects" : [
          "shine.caddy.test.shinenelson.xyz",
          "*.shine.caddy.test.shinenelson.xyz"
        ],
        "issuer" : {
          "module" : "acme",
          "ca" : "https://acme-staging-v02.api.letsencrypt.org/directory"
        },
        "on_demand" : true
      }
   ],
   "on_demand" : {
       "rate_limit" : {
          "interval" : "2160h",
          "burst" : 3
       }
     }
   }
}
  3. Error from the server :
http: TLS handshake error from 183.82.170.125:47540: no certificate available for 'a.shine.caddy.test.shinenelson.xyz'

What I already tried

  1. Increasing the on_demand.rate_limit values
  2. Removing the on_demand.rate_limit block entirely ( I didn’t see it when I adapted a dummy Caddyfile )
  3. Adding more subjects to tls.automation.policies[0].subjects provisions the TLS certificates as expected.

Did we miss another wildcard SNI matcher somewhere else @matt?

Er, what is up with <del> in your config? Can you please post your full, unmodified config file? I need to be able to reproduce it…

Please post your full logs and steps as well so I can reproduce them.

I’m sorry for the shoddy paste. I was kind of tired and worn out. I knew you’d be out as well. As I went to bed, I felt that the paste would’ve been confusing. I wanted to edit the post after I woke up; but again, clearly, you beat me to it.
What I posted was the diff from my old config. The <del> was supposed to mean ‘deleted’ from old config.

Configuration

Here’s the full configuration :

{
  "apps" : {
    "http" : {
      "servers" : {
        "caddy.test.shinenelson.xyz" : {
          "listen" : [
            ":80",
            ":443"
          ],
          "automatic_https" : {
            "disable_redirects" : true
          },
          "routes" : [
            {
              "group" : "shine",
              "match" : [
                {
                  "host" : [
                    "shine.caddy.test.shinenelson.xyz"
                  ]
                }
              ],
              "handle" : [
                {
                  "handler" : "static_response",
                  "body": "Hi there, love from shine and Caddy!"
                }
              ],
              "terminal" : true
            },
            {
              "group" : "shine",
              "match" : [
                {
                  "host" : [
                    "*.shine.caddy.test.shinenelson.xyz"
                  ]
                }
              ],
              "handle": [
                {
                  "@id" : "proxy_shine",
                  "handler" : "reverse_proxy",
                  "upstreams" : [
                    {
                      "dial" : "localhost:8080"
                    },
                    {
                      "dial" : "localhost:8081"
                    }
                  ]
                },
                {
                  "handler": "static_response",
                  "body" : "hey, you were not supposed to get here. - shine"
                }
              ],
              "terminal" : true
            }
          ],
          "tls_connection_policies" : [
            {
              "match" : {
                "sni" : [
                  "shine.caddy.test.shinenelson.xyz",
                  "shine.shine.caddy.test.shinenelson.xyz",
                  "something.shine.caddy.test.shinenelson.xyz",
                  "*.shine.caddy.test.shinenelson.xyz"
                ]
              }
            }
          ]
        }
      }
    },
    "tls" : {
      "automation" : {
        "policies" : [
          {
            "subjects" : [
              "shine.caddy.test.shinenelson.xyz",
              "*.shine.caddy.test.shinenelson.xyz"
            ],
            "issuer" : {
              "module" : "acme",
              "ca" : "https://acme-staging-v02.api.letsencrypt.org/directory"
            },
            "on_demand" : true
          }
        ],
        "on_demand" : {
          "rate_limit" : {
            "interval" : "2160h",
            "burst" : 3
          }
        }
      }
    }
  }
}

I got the automatic on-demand ( and it happens on-demand, alright ) TLS certificate provisioned for shine.caddy.test.shinenelson.xyz and any more subjects that are manually added to tls.automation.policies[0].subjects, but wildcard sub-domains are not being provisioned.

Error

It’s a single line error :

 http: TLS handshake error from 183.82.170.125:47540: no certificate available for 'a.shine.caddy.test.shinenelson.xyz'

What I already tried

  1. Increasing the on_demand.rate_limit values
  2. Removing the on_demand.rate_limit block entirely ( I didn’t see it when I adapted a dummy Caddyfile )
  3. Adding more subjects to tls.automation.policies[0].subjects provisions the TLS certificates as expected.

=== Edit ===

Caddy version

v2.0.0-beta.20 h1:oUNG1uh0UV8LWLlAVDZolFzk112++V/pxY+fF0HLmlY=

Thanks for the info. You don’t have to apologize for that, please make sure to take care. :slight_smile:

I’ll see if I can reproduce it, but I already see a problem: To get wildcard certificates from Let’s Encrypt, you need to use the DNS challenge: Automatic HTTPS — Caddy Documentation

oh, so, it can’t be like I put a wildcard matcher and then get a TLS certificate on-demand per sub-domain that I use?

That’s how I understood the on-demand thing to work. Otherwise, it’s just a static list right?

So, if I got it right, on-demand ( which is useful when you don’t know all the hostnames up front, or if domain names you know of may not be properly configured right away, e.g. DNS records not yet set correctly ) would work only if I also set up a wildcard DNS challenge and get a wildcard SNI TLS certificate. Is that how it is?

No, no, on-demand has nothing to do with it. Let’s Encrypt requires using the DNS challenge for any wildcard certificates.
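
The rule stated above can be written down as a small sketch in Python. The function name `usable_challenges` is hypothetical, and the challenge identifiers are the ACME challenge type names ( `http-01`, `tls-alpn-01`, `dns-01` ); this only models the Let’s Encrypt restriction and is not Caddy code.

```python
from typing import List


def usable_challenges(subject: str) -> List[str]:
    """Return the ACME challenge types that can validate a subject.

    Models the Let's Encrypt rule from the post above: a wildcard
    subject can only be validated with the DNS challenge.
    """
    if subject.startswith("*."):
        return ["dns-01"]
    # Non-wildcard names may use any of the three challenge types.
    return ["http-01", "tls-alpn-01", "dns-01"]
```

For the two subjects in the configuration above, shine.caddy.test.shinenelson.xyz could be validated by any of the three challenge types, while *.shine.caddy.test.shinenelson.xyz leaves only the DNS challenge, whether or not on-demand is enabled.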

I understand the Let’s Encrypt part of it but probably my assumption of the on-demand feature is not clear.

Let me clarify my intention of use here : I have a sub-domain that I want to give away. People get their own sub-domains on the first-level sub-domain ( caddy.test.shinenelson.xyz ). I have no clue as to which sub-domains people are going to pick. So, I want to generate a TLS certificate when someone tries one of the sub-domains on my first-level sub-domain.

I know the obvious idea of getting a wildcard certificate from Let’s Encrypt for the first-level sub-domain, but I was hoping that the on-demand provisioning would give me only the used sub-domains and not the whole sub-domain ( via the wildcard ). Is that too much of an ask from the on-demand feature?

I recommend against that, because Let’s Encrypt rate limits the number of certificates you can get for subdomains per week. Use a wildcard certificate when you have an undefined number of subdomains.

Again, this doesn’t have anything to do with on-demand mode; all that does is change when certificates are managed (the kind of certificate is orthogonal).

When using a wildcard certificate, you don’t need to (and probably shouldn’t) use on-demand mode. For 99% of sites, on-demand is not the right solution, it’s basically just for SaaS that need to manage lots of arbitrary domain names they are not in control of.
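
The point that on-demand only changes when certificates are managed can be pictured with a toy model in Python. Everything here is hypothetical ( the class `ToyCertManager`, its methods, and the exact-name lookup are simplifications for illustration ) and none of it reflects how Caddy is actually implemented.

```python
class ToyCertManager:
    """Toy model: the same subjects, obtained at different times."""

    def __init__(self, subjects, on_demand):
        self.subjects = list(subjects)
        self.on_demand = on_demand
        self.obtained = []  # names, in the order certificates were obtained

    def start(self):
        # Normal mode: every configured subject is handled at startup.
        if not self.on_demand:
            for name in self.subjects:
                self._obtain(name)

    def handshake(self, server_name):
        # On-demand mode: obtain lazily, at the first handshake for a name.
        # Simplification: only exact names from the subject list are allowed.
        if self.on_demand and server_name in self.subjects:
            self._obtain(server_name)
        return server_name in self.obtained

    def _obtain(self, name):
        # Stand-in for the real issuance step; records the name once.
        if name not in self.obtained:
            self.obtained.append(name)
```

In this model, a manager created with on_demand=False has its certificates as soon as start() returns, while one created with on_demand=True holds nothing until the first handshake() for a listed name. The kind of certificate is the same in both cases.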

Thank you for the clarification. I think I have a better understanding now.

Oh, I think I get it now. It’s just about when ( emphasis on the when, right? ) the certificate management lifecycle starts. In the normal mode, the management lifecycle starts during server initialization, which could block / delay the startup if there were too many domains, whereas on-demand defers it until a hostname is actually requested. I hope I got it right this time.

I don’t intend on being a pain, but I think this is what confused me in the first place. If you don’t mind ( and have the time, no pressure to answer at all ), can you please explain this concept to me?
On one hand, you need to provide a static set of subjects to match in the configuration. And I think I get the argument about when the lifecycle is triggered as well. Then where’s this ‘arbitrary domains’ concept coming from? If they’re arbitrary and they have no control over the domain, how are they going to put them in the subjects list? How does that work? An example would be great. ( Again, no pressure to answer if you have other pressing matters to look into ). Anyone from the community who understands the concept could help as well.