Need some help with enabling Let's Encrypt certificate on layer 4 module

1. Caddy version:

v2.6.2 h1:wKoFIxpmOJLGl3QXoo6PNbYvGW4xLEgo32GPBEjWL8o=

2. How I installed, and run Caddy:

My copy of caddy is being compiled with:

sudo xcaddy build --with github.com/gamalan/caddy-tlsredis \
--with github.com/caddy-dns/cloudflare \
--with github.com/mholt/caddy-ratelimit \
--with github.com/mholt/caddy-l4

I am currently running Caddy under the supervision of pm2. Everything related to Caddy runs under a user named caddy.

a. System environment:

Server machine is Raspberry Pi 4, Debian 11 (bullseye), aarch64 architecture.

b. Command:

/usr/bin/caddy run --environ --config /etc/caddy/Caddy.json

d. My complete Caddy config:

{
	"logging": {
		"logs": {
			"default": {
				"level": "DEBUG",
				"writer": {
					"filename": "/var/log/caddy/process.log",
					"output": "file",
					"roll_size_mb": 4769
				},
				"encoder": {
					"format": "console",
					"time_local": true
				}
			}
		}
	},
	"storage": {
		"Client": null,
		"ClientLocker": null,
		"Logger": null,
		"address": "127.0.0.1:6379",
		"aes_key": "",
		"db": 1,
		"host": "",
		"key_prefix": "",
		"module": "redis",
		"password": "REDACTED",
		"port": "",
		"timeout": 0,
		"tls_enabled": false,
		"tls_insecure": false,
		"username": "",
		"value_prefix": ""
	},
	"apps": {
		"http": {
			"http_port": 80,
			"https_port": 443,
			"servers": {
				"srv0": {
					"listen": [":443"],
					"routes": [{
						"match": [{
							"host": ["testdomain.org", "*.testdomain.org"]
						}],
						"handle": [{
							"handler": "subroute",
							"routes": [{
								"handle": [{
									"handler": "subroute",
									"routes": [{
										"handle": [{
											"handler": "reverse_proxy",
											"upstreams": [{
												"dial": "172.25.69.101:6060"
											}]
										}]
									}]
								}],
								"match": [{
									"host": ["chord.testdomain.org"]
								}]
							}]
						}],
						"terminal": true
					}],
					"tls_connection_policies": [{
						"match": {
							"sni": ["testdomain.org"]
						},
						"client_authentication": {
							"trusted_ca_certs": ["REDACTED"],
							"mode": "require_and_verify"
						}
					}, {
						"match": {
							"sni": ["*.testdomain.org"]
						},
						"client_authentication": {
							"trusted_ca_certs": ["REDACTED"],
							"mode": "require_and_verify"
						}
					}, {}],
					"strict_sni_host": true
				}
			}
		},
		"layer4": {
			"servers": {
				"mssql_DB": {
					"listen": [":1433"],
					"routes": [
						{
							"handle": [
								{
									"handler": "proxy",
									"upstreams": [
										{
											"dial": ["172.25.69.101:15930"]
										}
									]
								}
							]
						}
					]
				}
			}
		},
		"tls": {
			"automation": {
				"policies": [{
					"subjects": ["testdomain.org"]
				},
				{
					"subjects": ["*.testdomain.org"],
					"issuers": [{
						"challenges": {
							"dns": {
								"provider": {
									"api_token": "REDACTED",
									"name": "cloudflare"
								},
								"resolvers": ["1.1.1.1"]
							}
						},
						"module": "acme"
					}, {
						"challenges": {
							"dns": {
								"provider": {
									"api_token": "REDACTED",
									"name": "cloudflare"
								},
								"resolvers": ["1.1.1.1"]
							}
						},
						"module": "zerossl"
					}]
				}]
			}
		}
	}
}

3. The problem I’m having:

Hi there, first things first. The DB server (msSQL) mentioned later is hosted on a secondary Raspi with TLS encryption via a self-signed certificate.

The Caddy server is hosted on the primary Raspi, and my own self-signed CA originates from there. My goal is to use Caddy to reverse proxy TCP connections to the DB server. With the help of the layer 4 module, it finally worked fine!!! Well, only if the user disables certificate verification in the client software.

Hence the problem: I am struggling to find a way to make the mssql_DB Caddy layer 4 server, which listens on port 1433, use the previously obtained wildcard Let’s Encrypt certificate.

So, the request path looks like this:

Note: The user will connect to the server not through IP but via a domain name, which is home.testdomain.org.

Client SQL management software → Caddy TCP home.testdomain.org:1433 → (reverse TCP) → local network msSQL server, port 15930.

I googled for quite some time and it seems (correct me if I’m wrong) that the general way of doing this is to capture all TLS traffic on port 443 and then re-route the request based on the SNI or something else.

Sorry if my explanation is a bit off. Is there any way I can accomplish the goal of enabling the Let’s Encrypt certificate on home.testdomain.org:1433 mentioned above?

4. Error messages and/or full log output:

The Caddy log file only produces these two lines when a client tries to connect to the msSQL server.

2023/01/25 02:28:04.132 DEBUG   layer4.handlers.proxy   dial upstream   {"remote": "172.25.69.1:57076", "upstream": "172.25.69.101:15930"}
2023/01/25 02:28:04.184 DEBUG   layer4  connection stats        {"remote": "172.25.69.1:57076", "read": 285, "written": 1741, "duration": 0.052489367}

Screenshot below is the error while connecting to the msSQL server through caddy.
Note: Disabling SSL/TLS certificate verification on msSQL client will fix the issue.

5. What I already tried:

I have searched around this issue for quite some time already, but I cannot find an exact match for the problem I’m facing.

You need to use the tls handler in your layer4 server. See the second example in the docs: GitHub - mholt/caddy-l4: Layer 4 (TCP/UDP) app for Caddy.

The tls handler is what handles completing the TLS handshake and decrypting the traffic before passing it onto the next handler (i.e. proxy).

There’s also a tls matcher which can match by SNI so you can route the TLS traffic. But that doesn’t seem to be necessary for you here since you’re using a different port for this traffic anyways.

Hi @francislavoie, thank you so much for helping with this case!

I have followed the example and modified my JSON file.
Note: I have allowed Caddy to generate a valid Let’s Encrypt certificate for the domain mssqltest.ddns.net and verified it through https://testtls.com/ before continuing.

Anyway, my current JSON config file is shown below:

{
	"logging": {
		"logs": {
			"default": {
				"level": "DEBUG",
				"writer": {
					"filename": "/var/log/caddy/process.log",
					"output": "file",
					"roll_size_mb": 4769
				},
				"encoder": {
					"format": "console",
					"time_local": true
				}
			}
		}
	},
	"storage": {
		"Client": null,
		"ClientLocker": null,
		"Logger": null,
		"address": "127.0.0.1:6379",
		"aes_key": "",
		"db": 1,
		"host": "",
		"key_prefix": "",
		"module": "redis",
		"password": "REDACTED",
		"port": "",
		"timeout": 0,
		"tls_enabled": false,
		"tls_insecure": false,
		"username": "",
		"value_prefix": ""
	},
	"apps": {
		"layer4": {
			"servers": {
				"msSQldb": {
					"listen": ["0.0.0.0:1433"],
					"routes": [
						{
							"handle": [
								{
									"handler": "tls"
								},
								{
									"handler": "proxy",
									"upstreams": [
										{
											"dial": ["172.25.69.101:15930"],
											"tls": {
												"renegotiation": "freely"
											}
										}
									]
								}
							]
						}
					]
				}
			}
		},
		"tls": {
			"certificates": {
				"automate": ["mssqltest.ddns.net"]
			},
			"automation": {
				"policies": [{
					"issuers": [{"module": "acme"}]
				}]
			}
		}
	}
}

I apologize for the stripped-down config file; I’m hoping to fix this issue part by part.

Now the problem is that when I try to connect to mssqltest.ddns.net:1433 through Azure Data Studio, an error like the one below shows up:

Caddy logging is shown as:

2023/01/25 14:09:31.013 DEBUG   tls.cache       added certificate to cache      {"subjects": ["mssqltest.ddns.net"], "expiration": "2023/04/25 12:04:09.000", "managed": true, "issuer_key": "acme-v02.api.letsencrypt.org-directory", "hash": "4af2312bcb8560d76a30b2185067b0cd4deadb16cbb8aeccfad8cc9da545a8df", "cache_size": 1, "cache_capacity": 10000}
2023/01/25 14:09:31.013 DEBUG   events  event   {"name": "cached_managed_cert", "id": "86c11118-5d6e-4623-809b-2bb1455cca1c", "origin": "tls", "data": {"sans":["mssqltest.ddns.net"]}}
2023/01/25 14:09:31.014 DEBUG   layer4  listening       {"address": "tcp/[::]:1433"}
2023/01/25 14:09:31.014 INFO    autosaved config (load with --resume flag)      {"file": "/var/lib/caddy/.config/caddy/autosave.json"}
2023/01/25 14:09:31.014 INFO    serving initial configuration
2023/01/25 14:09:31.015 INFO    tls     cleaning storage unit   {"description": "{\"Client\":{},\"ClientLocker\":{},\"Logger\":{},\"address\":\"127.0.0.1:6379\",\"host\":\"127.0.0.1\",\"port\":\"6379\",\"db\":1,\"username\":\"\",\"password\":\"REDACTED\",\"timeout\":5,\"key_prefix\":\"caddytls\",\"value_prefix\":\"caddy-storage-redis\",\"aes_key\":\"\",\"tls_enabled\":false,\"tls_insecure\":true}"}
2023/01/25 14:09:31.024 INFO    tls     finished cleaning storage units
2023/01/25 14:09:34.394 ERROR   layer4  handling connection     {"remote": "172.25.69.1:55634", "error": "tls: first record does not look like a TLS handshake"}
2023/01/25 14:09:34.394 DEBUG   layer4  connection stats        {"remote": "172.25.69.1:55634", "read": 94, "written": 0, "duration": 0.000267572}

Any ideas what could have gone wrong here? Been trying to get this working for days already…

Well, it might just not be a TLS connection. Are you sure that it is? If it is, Caddy doesn’t recognize it as such, apparently.
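
For context on that error: a TLS connection normally opens with a record whose first byte is 0x16 (handshake) followed by a 0x03 major version byte. The sketch below is a simplified check of that kind, written only for illustration. It is not the actual logic of Go's crypto/tls or of caddy-l4, and the sample byte strings are made up.

```python
def looks_like_tls_handshake(first_bytes: bytes) -> bool:
    """Return True if the bytes start with a plausible TLS handshake record header."""
    if len(first_bytes) < 5:
        # A TLS record header is 5 bytes: type, version (2 bytes), length (2 bytes).
        return False
    content_type = first_bytes[0]
    major_version = first_bytes[1]
    # 0x16 is the TLS "handshake" content type; 0x03 is the major version
    # byte used in record headers from SSL 3.0 through TLS 1.3.
    return content_type == 0x16 and major_version == 0x03


# Illustrative ClientHello record header: handshake, version 3.1, length 0x0200.
tls_client_hello_start = bytes([0x16, 0x03, 0x01, 0x02, 0x00])
# Illustrative non-TLS opener (some protocol-specific packet header).
non_tls_start = bytes([0x12, 0x01, 0x00, 0x5E, 0x00])

print(looks_like_tls_handshake(tls_client_hello_start))  # True
print(looks_like_tls_handshake(non_tls_start))           # False
```

If the client's first bytes fail a check like this, the tls handler has nothing it can complete a handshake with, whatever certificate is configured.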

Uhhh, to be honest, sorry, I’m not so sure…

I’m sorry, I’m a newbie at this. I can set the connection protocol in the database IDE software; the server only accepts TCP/IP. But the encryption should be using TLSv1.2…

If I remove the "handler": "tls", then Caddy no longer outputs any errors.
The logs look like this:

2023/01/25 14:51:18.906 DEBUG   layer4.handlers.proxy   dial upstream   {"remote": "172.25.69.1:53989", "upstream": "172.25.69.101:15930"}
2023/01/25 14:51:19.193 DEBUG   layer4  connection stats        {"remote": "172.25.69.1:53989", "read": 17962, "written": 4755, "duration": 0.288307206}
2023/01/25 14:56:47.853 DEBUG   layer4.handlers.proxy   dial upstream   {"remote": "172.25.69.1:54057", "upstream": "172.25.69.101:15930"}
2023/01/25 14:56:47.965 DEBUG   layer4  connection stats        {"remote": "172.25.69.1:54057", "read": 282, "written": 1741, "duration": 0.112879238}

I think it’s worth mentioning that the upstream server is an Azure SQL Server (Docker image -> mcr.microsoft.com/azure-sql-edge:latest) running inside a Docker container. The version is 15.0.2000.1565.

I use JetBrains DataGrip to connect to mssqltest.ddns.net:1433. The IDE complains that:

The driver couldn't establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. 

Error: "Failed to validate the server name "mssqltest.ddns.net" in a certificate during Secure Sockets Layer (SSL) initialization. Name in certificate : ''"

Seems like the certificate that Caddy offered is empty or something.

Update (I have no idea what I am doing):

If I modify my JSON config to this:

"apps": {
    "layer4": {
        "servers": {
            "msSQldb": {
                "listen": [
                    "0.0.0.0:1433"
                ],
                "routes": [
                    {
                        "match": [
                            {
                                "tls": {
                                    "sni": [
                                        "mssqltest.ddns.net"
                                    ]
                                }
                            }
                        ],
                        "handle": [
                            {
                                "handler": "tls"
                            },
                            {
                                "handler": "proxy",
                                "upstreams": [
                                    {
                                        "dial": [
                                            "172.25.69.101:15930"
                                        ]
                                    }
                                ]
                            }
                        ]
                    }
                ]
            }
        }
    },
    "tls": {
        "certificates": {
            "automate": [
                "mssqltest.ddns.net"
            ]
        },
        "automation": {
            "policies": [
                {
                    "issuers": [
                        {
                            "module": "acme"
                        }
                    ]
                }
            ]
        }
    }
}

Caddy’s log is shown as below:

2023/01/25 15:34:15.866 DEBUG   layer4  matching        {"remote": "172.25.69.1:60668", "matcher": "layer4.matchers.tls", "matched": false}
2023/01/25 15:34:15.866 DEBUG   layer4  connection stats        {"remote": "172.25.69.1:60668", "read": 5, "written": 0, "duration": 0.000327515}

This log line seems to indicate the SNI is not matching.

That is also why the tls handler isn’t handling the connection: the SNI is not matching. Can you investigate what DataGrip is doing? Why is the SNI different (is it empty?)?

I believe the default_sni parameter in the tls handler may be useful for your case. Can you try it out?

Hi there, I have followed your instructions and added default_sni to the tls handler.

My config looks like this now:

"apps": {
    "layer4": {
        "servers": {
            "msSQldb": {
                "listen": [
                    "0.0.0.0:1433"
                ],
                "routes": [
                    {
                        "handle": [
                            {
                                "handler": "tls",
                                "connection_policies": [
                                    {
                                        "match": {
                                            "sni": [
                                                "mssqltest.ddns.net"
                                            ]
                                        },
                                        "default_sni": "mssqltest.ddns.net"
                                    }
                                ]
                            },
                            {
                                "handler": "proxy",
                                "upstreams": [
                                    {
                                        "dial": [
                                            "172.25.69.101:15930"
                                        ]
                                    }
                                ]
                            }
                        ]
                    }
                ]
            }
        }
    },
    "tls": {
        "certificates": {
            "automate": [
                "mssqltest.ddns.net"
            ]
        },
        "automation": {
            "policies": [
                {
                    "issuers": [
                        {
                            "module": "acme"
                        }
                    ]
                }
            ]
        }
    }
}

I tried the config above with msSQL and MariaDB; the two give different results. IDE: DataGrip

msSQL
IDE Connection Result:

A connection was successfully established with the server, but then an error occurred during the pre-login handshake.
(Provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)

Caddy log:

2023/01/26 08:46:07.319 ERROR   layer4  handling connection     {"remote": "172.25.69.1:50780", "error": "tls: first record does not look like a TLS handshake"}
2023/01/26 08:46:07.320 DEBUG   layer4  connection stats        {"remote": "172.25.69.1:50780", "read": 67, "written": 0, "duration": 0.001298933}

MariaDB (Confirmed server is alive, SSL/TLS ON etc)
IDE Connection Result:

Hangs forever, no timeout

Caddy log:

There is no log output.

Sorry, I can’t seem to find a way to enable verbose mode or its equivalent, but I think you’re right that Caddy received an empty SNI or something…

While MariaDB was up and using exactly the same config mentioned above, I went over to this website, https://testtls.com/, and ran a test with the following parameters:

  1. Server → mssqltest.ddns.net
  2. Port → 1433
  3. Protocol Options → Connect to mySQL, then STARTTLS(e.g. TCP 3306)

Before the test errored out, the progress was:

[image]

Then finally it was as shown below:

[image]

Caddy logs just keep printing the same error like:

2023/01/26 09:08:12.496 ERROR   layer4  handling connection     {"remote": "116.203.76.237:51456", "error": "tls: first record does not look like a TLS handshake"}
2023/01/26 09:08:12.496 DEBUG   layer4  connection stats        {"remote": "116.203.76.237:51456", "read": 36, "written": 0, "duration": 0.005480265}
2023/01/26 09:08:18.686 ERROR   layer4  handling connection     {"remote": "116.203.76.237:51458", "error": "EOF"}
2023/01/26 09:08:18.686 DEBUG   layer4  connection stats        {"remote": "116.203.76.237:51458", "read": 0, "written": 0, "duration": 4.813129492}

mholt/caddy-l4 (GitHub - mholt/caddy-l4: Layer 4 (TCP/UDP) app for Caddy) has no support for STARTTLS.

I have to preface this by saying that hooking up layer4 to terminate TLS for a database isn’t something I’ve dealt with personally before. I’ve just now experimented with Postgres for the same purpose and researched the subject a bit. Here are the findings:

  • MySQL/MariaDB: @IndeedNotJames has already stated layer4 does not support STARTTLS

  • PostgreSQL: It has its own handshake before the exchange of TLS bytes. See:

    • ssl - NGINX TLS termination for PostgreSQL - Stack Overflow

      Sadly that confirms what I feared. Adding SNI to the PostgreSQL protocol wont help with solving your use case because the PostgreSQL protocol has its own handshake which happens before the SSL handshake so the session will not look like SSL to HA Proxy.

    • PostgreSQL: Documentation: 15: 55.2. Message Flow

      To initiate an SSL-encrypted connection, the frontend initially sends an SSLRequest message rather than a StartupMessage. The server then responds with a single byte containing S or N, indicating that it is willing or unwilling to perform SSL, respectively. The frontend might close the connection at this point if it is dissatisfied with the response. To continue after S, perform an SSL startup handshake (not described here, part of the SSL specification) with the server.

  • MSSQL: I couldn’t find hard evidence like the others, but I found this comment by F5 (link: Re: SSL offloading issue with MSSQL - DevCentral):

    SQL traffic uses the TDS protocol. With TDS/TDS7, there is a PRELOGIN message that is sent by the client, prior to the beginning of the SSL/TLS handshake. The client-ssl profile is not expecting this, and resets the connections as non-SSL/TLS traffic.

It seems like any TLS-terminating proxy must be database-aware to handle the pre-handshake dance. For now, and I have not verified this personally, you can grab the certificates from Caddy storage and configure the database server to use them. You’ll need a way to update the certs when renewed.
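
To make the pre-handshake dance concrete, here is a rough sketch that classifies the first bytes a database client sends. The TDS PRELOGIN packet type (0x12) and the PostgreSQL SSLRequest code (80877103) come from the respective protocol specifications; the function name, the return labels, and the sample inputs are made up for illustration, and this is not how caddy-l4 itself inspects connections.

```python
import struct

# PostgreSQL SSLRequest: int32 length (8) followed by int32 request code 80877103.
POSTGRES_SSL_REQUEST = struct.pack("!II", 8, 80877103)
TDS_PRELOGIN_TYPE = 0x12   # first byte of a TDS PRELOGIN packet header
TLS_HANDSHAKE_TYPE = 0x16  # first byte of a TLS handshake record


def classify_first_bytes(data: bytes) -> str:
    """Guess which protocol a client is speaking from its first bytes."""
    if not data:
        # MySQL/MariaDB clients wait for the server's greeting before sending
        # anything, so a proxy waiting for a ClientHello sees nothing at all.
        return "empty"
    if data.startswith(POSTGRES_SSL_REQUEST):
        return "postgres-sslrequest"
    if len(data) >= 2 and data[0] == TLS_HANDSHAKE_TYPE and data[1] == 0x03:
        return "tls"
    if data[0] == TDS_PRELOGIN_TYPE:
        return "tds-prelogin"
    return "unknown"


print(classify_first_bytes(bytes([0x16, 0x03, 0x01, 0x02, 0x00])))  # tls
print(classify_first_bytes(bytes([0x12, 0x01, 0x00, 0x2F, 0x00])))  # tds-prelogin
print(classify_first_bytes(POSTGRES_SSL_REQUEST))                   # postgres-sslrequest
print(classify_first_bytes(b""))                                    # empty
```

Only the "tls" case is something a generic TLS handler can work with directly. The "empty" case may be why the MariaDB attempt above hung with no log output: both sides were waiting for the other to speak first.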

If you ever find a way or another workaround, please share! I’d love to know.


Caddy emits events when certs are renewed now. See the v2.6.0 release notes for more info. There’s a plugin to run arbitrary commands, it can be used to trigger a script to copy certs to wherever.
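
As a rough idea of what such a copy script could look like, here is a minimal sketch. It assumes Caddy's default file-system storage, where a managed certificate lives under certificates/<issuer directory>/<domain>/ as <domain>.crt and <domain>.key. The config in this thread uses Redis storage instead, so the lookup would have to be adapted. The function name and the example paths are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Tuple


def copy_cert_pair(storage_root: Path, issuer_dir: str, domain: str,
                   dest_dir: Path) -> Tuple[Path, Path]:
    """Copy <domain>.crt and <domain>.key from Caddy file storage to dest_dir."""
    # Assumed layout: <storage_root>/certificates/<issuer_dir>/<domain>/<domain>.{crt,key}
    src_dir = storage_root / "certificates" / issuer_dir / domain
    dest_dir.mkdir(parents=True, exist_ok=True)

    dest_cert = dest_dir / f"{domain}.crt"
    dest_key = dest_dir / f"{domain}.key"
    shutil.copy2(src_dir / f"{domain}.crt", dest_cert)
    shutil.copy2(src_dir / f"{domain}.key", dest_key)

    # Keep the private key readable by its owner only.
    dest_key.chmod(0o600)
    return dest_cert, dest_key


# Example call (hypothetical paths; adjust to your own setup):
# copy_cert_pair(
#     Path("/var/lib/caddy/.local/share/caddy"),
#     "acme-v02.api.letsencrypt.org-directory",
#     "mssqltest.ddns.net",
#     Path("/etc/ssl/db"),
# )
```

The database server usually has to be reloaded or restarted after the copy before it serves the renewed certificate.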

@Mohammed90, thank you for researching this topic!

I believe all of your findings are correct because I’m encountering this TLS termination issue with NGINX on both MariaDB & msSQL too!

Sigh, I would have gone for a VPN solution, or at least mutual TLS, if this weren’t for my university group project.

PS: I came across proxysql too, but sadly, it doesn’t seem to support msSQL connections.

For now, I have settled on adding another subdomain that points to my server machine. Where I was previously using a self-signed certificate, I am now using certbot to handle the certs for that particular subdomain.


Funny story though: after I got the certs from certbot and added them to both MariaDB & msSQL, I tried to connect to them in DataGrip with SSL/TLS full verification mode.

MariaDB had no issue at all; I am able to connect without any problems. The SNI verification is working correctly (i.e. if I put in the wrong hostname, the verification fails).

However, this isn’t the case with msSQL. I don’t know whether the issue is related to the client driver or the msSQL server itself, but it seems DataGrip didn’t get the full certificate chain correctly during the handshake. I had to manually add the Let’s Encrypt R3 intermediate cert to DataGrip for it to connect under full verification mode.

I don’t know but it seems like msSQL had some poor TLS implementation or something.
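
One thing worth ruling out in a situation like this (not confirmed as the cause here) is that the server was handed a leaf-only file. Certbot writes both cert.pem (leaf only) and fullchain.pem (leaf plus intermediates). The sketch below simply counts the certificates in a PEM bundle; the function name and the sample text are made up for illustration.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def count_pem_certificates(pem_text: str) -> int:
    """Return how many certificate blocks a PEM bundle contains."""
    return len(PEM_CERT.findall(pem_text))


# Placeholder bundle with two blocks (the bodies are not real certificates).
sample_bundle = (
    "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
    "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"
)
print(count_pem_certificates(sample_bundle))  # 2
```

A count of 1 would mean the file carries only the leaf certificate, so the client has to find the intermediate some other way.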


Anyway, thank you, all of you, for lending a helping hand on this issue.

Thank you so much! Have a great day ahead!


By the way, you can use Caddy to manage and automatically renew the cert, even if you don’t use its proxying capability. See this article by @matt.


Cool! Caddy is such an excellent toolbox, packed with tons of features.

Thanks for sharing!
