Automatically forward subpath to port on localhost

jsjoeio · July 22, 2022, 11:28pm

1. Caddy version (`caddy version`):

v2.4.6 h1:HGkGICFGvyrodcqOOclHKfvJC0qTU7vny/7FhYp9hNw=

2. How I run Caddy:

For now, I'm just running it locally with caddy start or caddy run, depending on how I'm feeling.

a. System environment:

macOS 11.6.7
Homebrew 3.5.5-21-g0b2a4c4
Homebrew/homebrew-core (git revision aa463d5651f; last commit 2022-07-20)
Homebrew/homebrew-cask (git revision 2df013e9fb; last commit 2022-07-21)

b. Command:

caddy run

c. Service/unit/compose file:

n/a

d. My complete Caddyfile or JSON config:

{
	admin localhost:4444
	debug
	log
}
:8000/* {
	@portLocalhost path_regexp port ^([0-9]+)/
	handle @portLocalhost {
		reverse_proxy localhost:{re.port.1}
	}

	handle {
		respond "Bad hostname" 400
	}
}

3. The problem I’m having:

I’m basically trying to do this but with a subpath instead of a subdomain.

I’m running an application on localhost: and I want to reverse proxy to that so that I can access it via localhost:8000/8080/ide/*

4. Error messages and/or full log output:

n/a

5. What I already tried:

I’ve tried using the path_regexp but I can’t get it to match for some reason. Maybe bad regex?

:8000/*/ide/* {
        uri strip_prefix @portLocalhost {re.port.1}/ide
        @portLocalhost path_regexp port ^([0-9]+)\/ide
        handle @portLocalhost {
                reverse_proxy localhost:{re.port.1}
        }

        handle {
                respond "Bad hostname" 400
        }
}

6. Links to relevant resources:

jsjoeio · July 22, 2022, 11:51pm

{
	admin    localhost:4444
}
:8000/*/ide/* {
        handle {
                respond "hello world" 200
        }

}

This doesn’t work with http://localhost:8000/8080/ide/hi, so I’m doing something wrong.

emilylange · July 23, 2022, 6:19am

Hi

Please update to v2.5.2, your version is outdated.

Your regex is missing the leading / in the path.
Instead of ^([0-9]+)/ you must write ^/([0-9]+)/.
That should fix your problem and allow you to actually use your matcher the way you intended
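
To see why the leading slash matters, here is a minimal Go sketch (Caddy's path_regexp is built on Go's regexp package). The helper name capturePort is made up for illustration and is not part of Caddy. It only assumes that the request path always starts with a slash, so a pattern anchored with ^ followed by a digit can never match.

```go
package main

import (
	"fmt"
	"regexp"
)

// capturePort applies pattern to a request path and returns the first
// capture group, or ok=false when the pattern does not match.
// Hypothetical helper for illustration; not Caddy code.
func capturePort(pattern, path string) (port string, ok bool) {
	m := regexp.MustCompile(pattern).FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// The path the server sees for a request to localhost:8000/8080/ide/hi.
	path := "/8080/ide/hi"

	// First the original pattern, then the corrected one.
	for _, pattern := range []string{`^([0-9]+)/`, `^/([0-9]+)/`} {
		port, ok := capturePort(pattern, path)
		fmt.Printf("%-14s matched=%v port=%q\n", pattern, ok, port)
	}
}
```

The first pattern fails because the character right after the start of the path is / rather than a digit; the second one matches and captures 8080.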

Note, though, that the upstream will receive the vhost’s Host: header rather than its own.
A quick curl localhost:8000/3000/ with a dummy netcat TCP listener running in the background yields:

❯ nc -vl 3000
Listening on 0.0.0.0 3000
Connection received on localhost 52880
GET /3000/ HTTP/1.1
Host: localhost:8000
User-Agent: curl/7.84.0
Accept: */*
X-Forwarded-For: 127.0.0.1
X-Forwarded-Host: localhost:8000
X-Forwarded-Proto: http
Accept-Encoding: gzip

If your upstream depends on the Host: header containing localhost:3000 when the request comes in via localhost:8000/3000/, consider extending your reverse_proxy directive as described in the docs (fairly far down, in the section about reverse_proxy with HTTPS upstreams):

reverse_proxy localhost:{re.port.1} {
	header_up Host {upstream_hostport}
}
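
The same effect can be reproduced outside Caddy. The following is a rough Go sketch using the standard library's httputil.ReverseProxy, which by default also forwards the incoming Host header unchanged; setting req.Host to the upstream address plays the role of the header_up Host {upstream_hostport} line above. The function name hostSeenByUpstream is hypothetical, and this illustrates the idea rather than Caddy's own implementation.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// hostSeenByUpstream sends one request through a reverse proxy and returns
// the Host header the upstream received, together with the proxy's and the
// upstream's own host:port for comparison. Hypothetical helper.
func hostSeenByUpstream(rewriteHost bool) (seen, proxyAddr, upstreamAddr string) {
	// The upstream simply echoes back the Host header it received.
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Host)
	}))
	defer upstream.Close()

	target, err := url.Parse(upstream.URL)
	if err != nil {
		panic(err)
	}

	proxy := httputil.NewSingleHostReverseProxy(target)
	if rewriteHost {
		// Rough equivalent of: header_up Host {upstream_hostport}
		director := proxy.Director
		proxy.Director = func(req *http.Request) {
			director(req)
			req.Host = target.Host
		}
	}
	front := httptest.NewServer(proxy)
	defer front.Close()

	resp, err := http.Get(front.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	frontURL, err := url.Parse(front.URL)
	if err != nil {
		panic(err)
	}
	return string(body), frontURL.Host, target.Host
}

func main() {
	seen, proxyAddr, upstreamAddr := hostSeenByUpstream(false)
	fmt.Printf("default:   upstream saw %s (proxy %s, upstream %s)\n", seen, proxyAddr, upstreamAddr)

	seen, proxyAddr, upstreamAddr = hostSeenByUpstream(true)
	fmt.Printf("rewritten: upstream saw %s (proxy %s, upstream %s)\n", seen, proxyAddr, upstreamAddr)
}
```

Without the rewrite the upstream sees the proxy's address as Host, which matches the netcat output above; with it, the upstream sees its own address.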

Also, drop the /* in :8000/* {. It is not necessary in your case, and paths in site addresses were deprecated in v2.5.0.
Excerpt from the v2.5.0 release notes:

Caddyfile: Deprecated paths in site addresses. Prefer using path matchers within your site block instead.

The linked PR explains why and shows an example of how to rewrite multiple path-based vhosts into a single vhost:

github.com/caddyserver/caddy

httpcaddyfile: Deprecate paths in site addresses; use zap everywhere

caddyserver:master ← caddyserver:deprecate-site-paths

opened 09:24PM - 24 Apr 22 UTC

francislavoie

+23 -11

In the Caddyfile, [site addresses](https://caddyserver.com/docs/caddyfile/concep…ts#addresses) have historically supported paths. This was essentially a carry-over from Caddy v1. The idea is that you could write your sites with a path which would get turned into a subroute with that path as a matcher, and sites with the same domain would get grouped up together:

```
example.com/first* {
	respond "first"
}

example.com/second* {
	respond "second"
}

example.com {
	respond "anything else"
}
```

This is kinda fun, it can read nicely if your config is relatively simple. But, keeping this feature has many problems, both from a config UX standpoint and from a code complexity standpoint:

- Since the site address _looks_ like a proper URL, users may be tempted to write their site address like `https://example.com/` which is how the URL might look in the browser. The problem though, is that path matchers in Caddy v2 are exact, so this would only match requests to exactly `/` and nothing else; it would need to be configured as `https://example.com/*` to match all paths, but that's not _really_ a URL. In Caddy v1, this wasn't a problem because a matcher of `/` would match _all paths_ since it was always a prefix match (prefix matching has ambiguity though, so we changed to exact matching in v2). This has been a common mistake that has caused confusion for users, and we've gotten plenty of questions in the forums relating to this problem.

- Some directives such as `tls` and `log` are not HTTP handlers, but actually special directives that configure some settings for the server according to the _domain_ in the site. These directives don't make sense to be scoped by a URL path, because they can only operate on the domain; `tls` configures connection policies and cert automation policies for the domain(s) in the site address; `log` enables access logging based on domain (this technically could be extended to match on the path as well, but the effort isn't worth it, there's better ideas to extend this, such as #4689). If two site blocks are defined with the same domain and different paths, then which of the `tls` and `log` directives should we use? This causes ambiguity, and it isn't obvious to users configuring it how it'll behave. Right now, we just sort the sites based on the length of their path matchers, and _I think_ the least specific matcher's config will win in that case (I haven't verified this, but either way, it's bad behaviour).

- The Caddyfile adapter has to include a bunch of extra code to deal with sorting site blocks based on their path matchers, creating new subroutes matching by the path, merging all of these together in one single set of routes, etc. Some of this code is shared with the logic for merging sites with different domains (reasonable), but it should be possible to significantly simplify certain loops to avoid this extra work, once we remove path support.

The preferred way to write your config if you need to have separate routes based on the request path, is to use `handle` blocks inside of a single site block. The above Caddyfile example would be better written like this:

```
example.com {
	handle /first* {
		respond "first"
	}

	handle /second* {
		respond "second"
	}

	handle {
		respond "anything else"
	}
}
```

This is slightly longer, and has an extra level of indentation, but the behaviour is _much_ clearer, and the domain doesn't need to be repeated for each site block.

---

Also in this branch, I adjusted a bunch of places where we used `fmt.Printf` or `log.Printf` to instead use our zap logger; looks nicer in the terminal, and improves logging consistency in the codebase.

jsjoeio · August 1, 2022, 9:04pm

Wow! Why are you so helpful? This was the best response I could have asked for. Thank you so much

Full Caddyfile with changes:

{
	admin localhost:4444
}
:8000 {
	@portLocalhost path_regexp port ^/([0-9]+)\/ide
	handle @portLocalhost {
		uri strip_prefix {re.port.1}/ide
		reverse_proxy localhost:{re.port.1}
	}

	handle {
		respond "Bad hostname" 400
	}
}
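
For anyone reading later, here is a small Go sketch of what the Caddyfile above is intended to do to an incoming path: match the port, strip the /<port>/ide prefix, and pick the upstream. The route helper is hypothetical and only models the intent; it is not how Caddy implements uri strip_prefix or reverse_proxy internally.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// portPath mirrors the @portLocalhost matcher from the Caddyfile above.
var portPath = regexp.MustCompile(`^/([0-9]+)/ide`)

// route returns the upstream address and the path that would be forwarded
// to it, or ok=false when the path does not match and would fall through
// to the "Bad hostname" handler. Hypothetical helper, not Caddy code.
func route(path string) (upstream, forwarded string, ok bool) {
	m := portPath.FindStringSubmatch(path)
	if m == nil {
		return "", "", false
	}
	port := m[1]
	// Model of: uri strip_prefix {re.port.1}/ide
	forwarded = strings.TrimPrefix(path, "/"+port+"/ide")
	// Model of: reverse_proxy localhost:{re.port.1}
	return "localhost:" + port, forwarded, true
}

func main() {
	for _, p := range []string{"/8080/ide/hi", "/3000/ide/static/app.js", "/ide/hi"} {
		upstream, forwarded, ok := route(p)
		fmt.Printf("%q -> ok=%v upstream=%q path=%q\n", p, ok, upstream, forwarded)
	}
}
```

So a request to localhost:8000/8080/ide/hi is meant to reach localhost:8080 with the path /hi, while a path without a leading port number gets the 400 response.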

matt · August 1, 2022, 9:34pm

@emilylange is indeed awesome if not James, right??

Thank you for filling out the help template. It makes things much faster.

system · August 21, 2022, 11:28pm

This topic was automatically closed after 30 days. New replies are no longer allowed.