Collapsing multiple forward slashes in path only

1. Caddy version (caddy version):

latest

2. How I run Caddy:

a. System environment:

b. Command:

paste command here

c. Service/unit/compose file:

paste full file contents here

d. My complete Caddyfile or JSON config:

paste config here, replacing this text
use `caddy fmt` to make it readable
DO NOT REDACT anything except credentials
or helpers will be sad

3. The problem I’m having:

I am replacing Nginx with caddy in our dev environments. One of the things Nginx is “clean” the URL. And it turns out our test suite took advantage of that and sends a lot of bad urls like:

https://foo.com//token

No worries - I added the following directive to clean that up on caddy as well:

uri replace // /

Working great until I came across this URL in our test suite:

https://foo.com/login?redirect=https://bar.com/hi

It appears the uri replace applies to the query string as well as the path. Is there a way to get uri replace to only apply to the path? Is there a different way of dealing this I should be trying instead?

4. Error messages and/or full log output:

5. What I already tried:

6. Links to relevant resources:

Hmm, not sure this is possible right now. Haven’t seen a request for this yet. What would your ideal solution look like instead? Maybe uri replace_path ? Or maybe a more proper fix would be the ability to do a regex replace? (Hmm, I wonder if that can be done using existing path_regexp matcher with rewrite handler…)

Since the <target> parameter of the uri directive accepts regex, could it be used to narrow the target search down to a single instance of a double slash at the start of the string?

e.g. uri replace ^// / or something along those lines.

It would 100% be doable if regex captures and substitutions were available because you could capture the whole line after the double and reinstate it as-is; this I’m less sure about but could probably work, surely?

I don’t know if I have an opinion on the “ideal” solution. The regex option Mathew mentions seems more powerful - but it is also more complicated. (Sorry regex’s are ALWAYs more complicated for normal people even if geeks like myself love them.)

What I can say is what I intended was to fix issues with bad paths and the fact that it affected query parameter values was a “surprise”. My intent was only to fix paths. So yea - something like uri replace_path is very straightforward and heck if that option existed in the first place I would have naturally used it because that was my intent all along.

You could also go with a much simpler option:

@double path //*
uri @double strip_prefix /

In essence, if the path starts with two slashes, strip one slash from the prefix only.

1 Like

That would only work if the double slashes are only at the front of the path and not somewhere in the middle, though.

I was under the impression that was the problem. Have I misunderstood?

Regardless, for the case of arbitrary numbers of double slashes only in the path, could we perhaps rewrite to {path}, perform the uri replace, then rewrite to append the {query} again?

1 Like

Yea - when people concat urls you can get those double slashes anywhere in the path…

Matt - with respect to your replace_path idea. Would the JSON be something like this:

                      - handler: rewrite
                        uri_substring:
                        - find: //
                          replace: /
                          path_only: true

Looking at the code it looks like if I add this to the replacer struct then I could skip the substring matching over the RawQuery. I think that would do the trick.

I may take a stab at this when I can find sometime as I am blocked deploying caddy until I can resolve this issue.

Yeah, that would probably work. The code that does it is here, it would just need to skip the replacement on RawQuery:

1 Like

@Ray_Johnson How about this:

:2015

@slashes path_regexp slashes (/{2,})(.*\\?.*)
uri @slashes replace {re.slashes.1}{re.slashes.2} /{re.slashes.2}

respond "PATH='{path}' QUERY='{query}'"

Result:

$ curl "http://localhost:2015/foo//bar?a=b//c"
PATH='/foo/bar' QUERY='a=b//c'

No code changes needed.

It does technically fulfill your request, but it doesn’t handle repeated instances of multiple slashes.

Edit: Ok, this is a crummy solution. Will rework this.

1 Like

Alrighty… I’ve got a proper solution working, it looks like this:

uri path_regexp /{2,} /

Or, in JSON:

{
	"handler": "rewrite",
	"path_regexp": [
		{
			"find": "/{2,}",
			"replace": "/"
		}
	]
}

Look good? Should I commit this?

1 Like

That looks pretty neat!

1 Like

I think that will work. Indeed, regexp can probably handle a bunch more complicated cases there…

1 Like

Cool; committed. You can use it from the latest on master branch. Thanks!

Already using it. Seems to be working great! Still running through our massive test suite but this passed the specific tests I knew were failing without this feature. Thanks!!!

1 Like

This topic was automatically closed after 30 days. New replies are no longer allowed.