403 errors coming from caddy without log entries and I don't know why

1. Caddy version (caddy version):

2.3.0. It was also happening with 2.2.1.

2. How I run Caddy:

One caddy server behind a AWS network load balancer with two zones and cross region enabled.

a. System environment:

Amazon linux 2

b. Command:

caddy run --config /home/ec2-user/caddy.json

c. Service/unit/compose file:

n/a

d. My complete Caddyfile or JSON config:

{
  "admin": {
    "disabled": true
  },
  "logging": {
    "sink": {
      "writer": {
        "output": "stdout"
      }
    },
    "logs": {
      "log2stdout": {
        "writer": {
          "output": "stdout"
        },
        "level": "DEBUG"
      }
    }
  },
  "apps": {
    "http": {
      "servers": {
        "websites": {
          "listen": [
            ":443"
          ],
          "routes": [
            {
              "match": [
                {
                  "host": [
                    "*.stampr.io"
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "reverse_proxy",
                  "upstreams": [
                    {
                      "dial": "localhost:8080"
                    }
                  ],
                  "flush_interval": -1,
                  "buffer_requests": false
                }
              ],
              "terminal": true
            },
            {
              "match": [
                {
                  "not": [
                    {
                      "header_regexp": {
                        "host": {
                          "name": "disallow_branded_host",
                          "pattern": ".+\\.stampr\\.io$"
                        }
                      }
                    }
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "reverse_proxy",
                  "upstreams": [
                    {
                      "dial": "localhost:8080"
                    }
                  ],
                  "flush_interval": -1,
                  "buffer_requests": false
                }
              ],
              "terminal": true
            }
          ],
          "strict_sni_host": true
        }
      }
    },
    "tls": {
      "automation": {
        "policies": [
          {
            "subjects": [
              "*.stampr.io"
            ],
            "issuer": {
              "module": "acme",
              "ca": "https://acme-v02.api.letsencrypt.org/directory",
              "email": "cory@stam.pr",
              "challenges": {
                "dns": {
                  "provider": {
                    "name": "route53"
                  }
                }
              },
              "trusted_roots_pem_files": []
            },
            "storage": {
              "module": "dynamodb",
              "table": "websites-edge-certificate-storage",
              "aws_region": "us-east-1"
            },
            "on_demand": false
          },
          {
            "issuer": {
              "module": "acme",
              "ca": "https://acme-v02.api.letsencrypt.org/directory",
              "email": "cory@stam.pr",
              "challenges": {
                "http": {
                  "disabled": false
                },
                "tls-alpn": {
                  "disabled": false
                }
              },
              "trusted_roots_pem_files": []
            },
            "storage": {
              "module": "dynamodb",
              "table": "websites-edge-certificate-storage",
              "aws_region": "us-east-1"
            },
            "on_demand": true
          }
        ],
        "on_demand": {
          "ask": "http://localhost:8080/.stampr/.system/verify-tenant-ssl-provisioning-status"
        }
      }
    }
  }
}

3. The problem I’m having:

Caddy is responding with 403 when requesting a site and logs nothing with debug enabled. It seems somehow related to the load balancer IP I hit but both IPs hit the same server and caddy is responding to both requests. Just differently. (See videos at end for details)

4. Error messages and/or full log output:

Nothing is logged.

5. What I already tried:

I tried adding another server to avoid cross region routing from the load balancer but it didn’t work.

6. Links to relevant resources:

Here’s a couple videos of the issue:

I’m not sure why caddy would be treating the requests differently. They’re both hitting the same server.

I reduced the ips on the load balancer to one just to be sure and now I’m completely out of ideas. Two profiles/windows, same remote IP, different responses.

I don’t remember if this exactly right, but I think you need to make a logger named “default” and set that to DEBUG level to properly get all the logs.

I think you also need to add "logs": {} to your server to get access logs to be produced as well.

Hopefully that helps reveal what’s going on.

Yay, getting somewhere. I made the changes you suggested and there is in fact a log entry and it shows the problem. I’m making a request for Host A but tls using Host B for the server name.

I’m not sure why my testing server doesn’t have the same issue :confused:

Any ideas? Thanks!

{
    "level": "error",
    "ts": 1610267912.5543113,
    "logger": "http.log.access",
    "msg": "handled request",
    "request": {
        "remote_addr": "72.192.129.249:52760",
        "proto": "HTTP/2.0",
        "method": "GET",
        "host": "wildcardjewelry.stampr.io",
        "uri": "/",
        "headers": {
            "Cache-Control": ["max-age=0"],
            "User-Agent": ["Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36"],
            "Accept": ["text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"],
            "Sec-Fetch-Dest": ["document"],
            "Accept-Encoding": ["gzip, deflate, br"],
            "Accept-Language": ["en-US,en;q=0.9"],
            "Upgrade-Insecure-Requests": ["1"],
            "Sec-Fetch-Site": ["cross-site"],
            "Sec-Fetch-Mode": ["navigate"],
            "Sec-Fetch-User": ["?1"]
        },
        "tls": {
            "resumed": false,
            "version": 772,
            "cipher_suite": 4865,
            "proto": "h2",
            "proto_mutual": true,
            "server_name": "tforjesus.stampr.io"
        }
    },
    "common_log": "72.192.129.249 - - [10/Jan/2021:08:38:32 +0000] \"GET / HTTP/2.0\" 403 0",
    "duration": 0.00001598,
    "size": 0,
    "status": 403,
    "resp_headers": {
        "Server": ["Caddy"]
    }
}

I’m a little lost… can you post the simplest possible curl command that reproduces the error? Use curl -v and post the full output too.

(Btw we don’t have access to view your videos linked above.)

The 403 error likely comes from your backend app, probably not from within Caddy.

can you post the simplest possible curl command

That’s the weird thing. curl seems to always work. safari works most of the time. chrome is almost guaranteed to only work on the first site and then fail with subsequent sites.

Safari will sometimes work with a site and fail while navigating around.

access to view your videos

Fixed – sorry about that.

403 error likely comes from your backend app

I stopped my backend app and the 403 persisted.

In that log entry above the request host was "host": "wildcardjewelry.stampr.io", but tls is doing "server_name": "tforjesus.stampr.io". Where does the server_name come from?

Edit: Also – the browser doesn’t have a cert for the request. If my app was returning 403 it would still have a valid cert, correct?

So then, this is an issue with the client.

The only places Caddy emits a 403 is in the static file server (upon permission errors accessing a file, probably not the case here) and in the HTTP server when enforcing strict_sni_host, here:

With strict_sni_host enabled, the client must use the same ServerName in the TLS handshake as it does Host header in the subsequent HTTP requests.

(I just realized we don’t log the actual error message/value that is generated in the HTTP server when the HTTP status code is in the 4xx range, because they are client errors, not server errors. So that is why you aren’t seeing the error message in your logs. We could probably improve on this.)

It’s funny. I was looking through the caddy source and just landed on the same conclusion. Flipped strict sni off and everything started working correctly.

So the question now is… why are the browsers behaving this way? And why is it working in testing but not production.

Edit: The only thing different between the two is my testing wildcard is a .com and my production is a .io. That and I’m using the staging LE endpoint, ofc.

Famous first words :wink: Whenever I say this during troubleshooting I end up being wrong: either the statement itself is wrong, or I’m looking for the difference/bug in the wrong place and I need to step back and reassess.

Debugging browsers is a bit out of my pay grade – especially free help on the forums :sweat_smile: – but I would start by using new incognito sessions at every trial, and inspect every request carefully. I wish Chrome showed more info about making the TLS handshake. Fortunately, you can get Caddy to emit almost everything the client sends in the TLS handshake though using placeholders.

That’s fair. I guess I should’ve said that I’m using terraform and the caddy config uses the same template with those two differences. It’s possible the problem is in terraform or even a bug in AWS, I suppose.

One thing I’m pretty confident ruling out though are the browsers. I imagine they’re behaving correctly and something else is causing them to think they should send the wrong server name with tls. So that leaves the network in the middle, the cert itself, or caddy – either through misconfiguration or bug. No one else is having an issue with strict sni and wildcards?

When this first popped up I thought about reaching out to you to fix it but a) I don’t have any money, and b) I needed to fix it quickly which meant doing it myself. I’m happy to become a sponsor just as soon as the dollars start rolling in for my free service.

You can find out for sure by having Caddy respond with information from the TLS handshake (see available placeholders).

Not that I know of. I’m pretty sure the wildcard cert has nothing to do with the 403 response, unless presenting that certifiate is affecting the behavior of the client. But that’s not a Caddy bug. So far it’s pretty clear that Caddy is doing what it’s told correctly. (You probably don’t need strict_sni_host unless you’re doing TLS client auth.)

I definitely understand that pinch. :upside_down_face: Would be happy to have your sponsorship, but yeah… giving out a free service makes that difficult! :wink:

I’ll take a look at placeholders, thanks. Yeah, I’m not terribly concerned having it disabled for now. Oh well, all is once again right with the world. Thanks!

1 Like