Normalizing URLs with multiple forward slashes

Hi! I’ve got a weird problem: Google indexed /url// for my site, and now (because of how I use relative paths in the generated HTML) it is trying to index /url/url/... and everything under it, which is annoying to see since those requests obviously return 404. So I thought I’d just redirect the double slashes to a single slash, like this:

	@slashes path_regexp slashes (.*)//$
	redir @slashes {http.regexp.slashes.1}/$ permanent

But it’s not working: it doesn’t catch the request. So I set up the simplest server I could:

:1234 {
	root * /nothing

	@slashes path_regexp slashes (.*)//$
	redir @slashes {http.regexp.slashes.1}/$ permanent
}

And curl -I http://localhost:1234/url// returns 404 rather than a redirect. Any ideas, please? I’ve seen the thread “Collapsing multiple forward slashes in path only”, but reading it did not enlighten me. My config looks sane to me; am I missing something obvious?

Ah, the problem is that we clean the path (which collapses the doubled slashes) before passing the request to the path_regexp matcher, as a protection against maliciously crafted requests. So your regexp only ever sees /url/ and never matches. The cleaning protects against attacks that bypass matchers (e.g. when a matcher is used to protect a certain path with basicauth, we don’t want someone to make a request like //foo/bar when the matcher is ^/foo, which would skip past authentication).
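
To make the bypass concrete, here is a minimal sketch of the kind of config that path cleaning protects. It uses respond as a stand-in for the basicauth setup mentioned above (a substitution for brevity); the @protected name, the port, and the 403 body are made up for illustration:

```
:1234 {
	# Guard everything under /foo (stand-in for basicauth).
	@protected path_regexp ^/foo
	respond @protected "Forbidden" 403
}
```

If the regexp ran against the raw path, a request for //foo/bar would not match ^/foo (the second character is another slash, not an f), so the guard would be skipped. Matching against the cleaned path /foo/bar closes that hole.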

You could do something like this maybe:

	route {
		uri path_regexp /{2,} /
		@changed `{http.request.orig_uri.path} != {path}`
		redir @changed {path} permanent
	}

This is kinda janky, but it uses an example from the uri docs to collapse the slashes with the uri directive (instead of a matcher), then uses an expression matcher to check whether the path has changed from the original, and if so redirects to the rewritten path. The route is necessary because uri is ordered after redir by default, and route runs its directives in the order written.
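
Put together with the test server from the question, the whole thing might look like the sketch below. The port and root are copied from the original post; this exact file is untested, so treat it as a starting point rather than a verified config:

```
:1234 {
	root * /nothing

	route {
		# Collapse any run of two or more slashes in the path into one.
		uri path_regexp /{2,} /

		# Matches only when the rewrite above actually changed the path.
		@changed `{http.request.orig_uri.path} != {path}`
		redir @changed {path} permanent
	}
}
```

With this in place, curl -I http://localhost:1234/url// should come back as a permanent redirect to /url/ instead of a 404, while requests whose path has no doubled slashes are left alone.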

Oh wow that’s pretty cool, actually!

I realized that there is only a single URL with this problem, so I literally did redir /url// /url/ permanent and that works for now. I may need to revisit this if Google invents another one. :rofl: