Routes with / or %2F

finanalyst · April 21, 2023, 5:24pm

1. The problem I’m having:

I want to operate on a route called routine/s//. This is not possible, so it's encoded as routine/s%2F%2F.
There is a map directive in my Caddyfile that does something like this mapping

map {path} {npath} {
  "/routine/s%2F%2F" "/hashed/1f2f3f4f5"
}

Actually the line inside the map block is imported from a file called prettyurls.

However, it seems that Caddy is translating %2F to / before getting to the map directive.

So I changed the map (actually added a line to prettyurls) to

map {path} {npath} {
  "/routine/s%2F%2F" "/hashed/1f2f3f4f5"
  "/routine/s//" "/hashed/1f2f3f4f5"
}

This does not work either. / seems to be the only character with a problem. ? coded as %3F is dealt with as expected.

I was wondering whether I should be looking at another placeholder (not {path}) to get the request without the %2F → / mapping.

2. Error messages and/or full log output:

3. Caddy version:

v2.6.4

4. How I installed and ran Caddy:

a. System environment:

docker

b. Command:

c. Service/unit/compose file:

d. My complete Caddy config:

  root * /usr/share/caddy

  map {path} {npath} {
    import /usr/share/caddy/assets/prettyurls
  }

  route {
    error /hashed* 403
    try_files {npath}.html {path}.html {path}
  }

  file_server

  handle_errors {
    @404 {
      expression {http.error.status_code} == 404
    }
    @403 {
      expression {http.error.status_code} == 403
    }
    rewrite @404 /404.html
    rewrite @403 /403.html
    file_server
  }

5. Links to relevant resources:

matt · April 21, 2023, 6:25pm

I spent some time on this for v2.6:

github.com/caddyserver/caddy

caddyhttp: Smarter path matching and rewriting
(caddyserver:master ← caddyserver:path-escaping, opened by mholt on 10 Aug 2022 at 05:56 UTC, +614 -112)

This branch resolves several inconsistencies across Caddy's HTTP facilities regarding URI encodings in paths. I am not entirely sure, but I suppose breaking changes might be possible if users relied on buggy behavior that has only just been determined and is being remedied here.

This PR mainly affects the `path` matcher and the `rewrite` middleware (including both the `rewrite` and `uri` Caddyfile directives). These are extremely commonly-used Caddy features.

## Background

URIs (essentially the part of the URL after the scheme and authority/host, e.g. `/foo/bar?a=b#frag` -- though servers don't really deal with `#fragment` components) are famous for being inconsistently encoded and parsed. Differences in parsing/handling between servers, proxies, and applications often lead to bugs and security vulnerabilities.

For example, a path of `//foo/bar` might be considered equivalent to `/foo/bar` by one piece of infrastructure, and different to another. Similarly, `/foo%2Fbar` might or might not be the same as `/foo/bar`. To a router, they could be different. To an application, they could be the same.

A web server like Caddy is between a rock and a hard place, because it finds itself between untrusted clients who send all manner of inconsistent requests, and other servers or applications who expect the request URI to be _just right_. Caddy is often expected to route requests of all varieties and rewrite/transform them into something the backend application (even if that's just the built-in static file server) can use without confusion.

The problem is the requirements and expectations vary widely! Caddy has had several issues over the years where some users expect a URI like `/foo%2Fbar` to be transformed into `/foo/bar` before being proxied. Some want `/foo/bar` to match `/foo%2Fbar`, while others don't. Some want a matcher like `/secret/*` to match URIs like `//secret/*` or `/secret//*` because they put it behind authentication, and if it doesn't match, auth could be bypassed! Windows treats `/file.php . ..` the same as `/file.php` -- even though they technically have different suffixes and file extensions, [causing routing blunders](https://github.com/caddyserver/caddy/pull/2917).

Then imagine a path prefix like `/bands/*/*/` that should match `/bands/Pink/Try/` as well as `/bands/AC%2FDC/T.N.T` -- but if the path matcher normalizes (decodes) URIs before matching, the first URI would work but the second would become `/bands/AC/DC/T.N.T` which doesn't match the pattern anymore.

To make matters worse, any given URI has multiple valid encoded forms. `%2F%66%6F%6F%2F%62%61%72` can be decoded to `/foo/bar` just as well as `/foo%2Fbar` can, and everything in between can, too.

If routers matched on non-normalized URIs, there would be plenty more security bugs to deal with: a pattern of `/foo/*`, which is expected to be authenticated, would no longer match `/foo%2Fbar` even though they are, according to ratified RFCs, _equivalent_.

In other words, encodings are significant to applications, but normalizing URIs to a consistent form is critical for maintaining security.

Let me restate here [what I wrote for the Laravel community](https://github.com/laravel/framework/issues/22125#issuecomment-1208810581) when I started working on this (with minor changes to make sense out of context):

---

RFC 9110, "HTTP Semantics," has a section on HTTP URI normalization, [which says](https://www.rfc-editor.org/rfc/rfc9110.html#name-https-normalization-and-com):

> Two HTTP URIs that are equivalent after normalization (using any method) can be assumed to identify the same resource, and any HTTP component MAY perform normalization. As a result, distinct resources SHOULD NOT be identified by HTTP URIs that are equivalent after normalization.

In other words, `/foo%2Fbar` and `/foo/bar` are equivalent after normalizing, and thus they _SHOULD NOT_ be used for distinct resources. So if you are encoding application data into the path, and that data could possibly have reserved characters / delimiters (like `/`), consider redesigning your API: it is not robust in the harsh HTTP environment.

Note that several RFCs, notably RFCs 3986 and 9110, continually repeat that URI parsing is dependent upon scheme. That's one other problem: we all use the `http://` or `https://` scheme and yet expect applications to handle URIs differently. So of course there's going to be head-butting: we're fighting the design.

To clarify, it is definitely possible for a URI path such as `/band/AC%2fDC/T.N.T` to be "properly" handled by a server application. For this case, simply write a server that decodes everything after `/band/` except `%2f`. :man_shrugging: The problem is that this is difficult _in general_. Depending on what situations you do this, you may be opening yourself to bugs and security holes.

This is why Caddy currently handles URIs solely in the unescaped space: it's the "one true" representation of a URI, and normalized HTTP URIs are more or less clearly defined nowadays.

Others might propose a solution to double-encode application data in the path; in other words, have the client send a URI with a path of `/bands/AC%252fDC/T.N.T`. This will probably work, but it's a hack and it [violates spec](https://www.rfc-editor.org/rfc/rfc3986#section-2.4):

> Implementations **must not** percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, or vice versa in the case of percent-encoding an already percent-encoded string.

Beware of the non-conforming behavior and highlight it very prominently in documentation so you can avoid bugs.
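To make the double-encoding hazard concrete, here is a minimal Go sketch using the standard library's `url.PathUnescape`. `decodeTimes` is a hypothetical helper introduced only for this illustration; it is not part of Caddy or of this PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeTimes (hypothetical helper) percent-decodes s n times in a row,
// which is what happens when more than one layer decodes the same path.
func decodeTimes(s string, n int) (string, error) {
	var err error
	for i := 0; i < n; i++ {
		s, err = url.PathUnescape(s)
		if err != nil {
			return "", err
		}
	}
	return s, nil
}

func main() {
	const sent = "/bands/AC%252fDC/T.N.T"

	once, _ := decodeTimes(sent, 1)
	twice, _ := decodeTimes(sent, 2)

	fmt.Println(once)  // /bands/AC%2fDC/T.N.T
	fmt.Println(twice) // /bands/AC/DC/T.N.T
}
```

Decoding once keeps the band name in one segment; a second decode anywhere in the chain turns the data into a path delimiter.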
Laravel user @alcaitiff made a comment that some of you may be thinking:

> The router should resolve the route and after that decode parameters, but it does decode the url parameters before resolving the route.

I can't speak for Laravel or what it's doing, but the Go standard library (what Caddy uses), for example, **does** do URL parsing correctly and still has this problem. Go does exactly what you and the spec recommend: it splits the URI into its components and _then_ decodes reserved characters after parsing. It preserves the original, "raw" path in the `RawPath` field and offers the decoded path in the `Path` field. Its [`EscapedPath()`](https://pkg.go.dev/net/url#URL.EscapedPath) method uses `RawPath` _if it is a valid encoding of `Path`_, which is interesting because any given path has multiple valid encodings as I noted above.

So if I want to truly "normalize" the URI in Go, I have to call `url.PathEscape(req.URL.Path)` myself and ignore `RawPath` entirely (AFAIK). And guess what, this converts `/foo/bar` to... `/foo/bar`. In other words, decoding `/foo%2Fbar` is not reversible without loss of precision. (Unless the HTTP server knows your business logic, more on that in a moment.)

We can write our own logic, though, that uses `RawPath` as a "hint" (as the Go docs say) to maybe replace `/` with `%2f`, but if we've manipulated/rewritten the URI at all, this becomes infeasible because we don't know where or if that instance still exists in the string.

[RFC 3986 section 2](https://www.rfc-editor.org/rfc/rfc3986#section-2.2) states:

> URI producing applications should percent-encode data octets that
> correspond to characters in the reserved set unless these characters
> are specifically allowed by the URI scheme to represent data in that
> component. If a reserved character is found in a URI component and
> no delimiting role is known for that character, then it must be
> interpreted as representing the data octet corresponding to that
> character's encoding in US-ASCII.

The `/` is in the reserved set. Thus it is up to the implementation to determine whether it is data or a delimiter. I guess Laravel doesn't know, and it's frankly safer to assume it's a delimiter and treat it in its normalized form.

So yes, this issue is frustrating. As a web server author, I feel like I need to write software that can read people's minds: is this slash data or is it a delimiter? The router needs more information, because both are very valid ways of interpreting a URI!

---

## The solution

I think the key to this problem is trying to read the developer's mind: is this character supposed to be a delimiter (part of the path) or data? Should we collapse repeated slashes or no?

The answers depend on the context. For routing / path matching, the answer may be one way, for rewriting it may be another, and for proxying it may be yet another depending on the applications being proxied to.

Nginx, Apache, and Caddy all merge slashes by default when matching. However, Nginx and Apache have options to disable that behavior and preserve the slashes, which can lead to security vulnerabilities. All three do path matching (or routing) in the normalized space to mitigate bugs but, like we saw with Laravel, makes it difficult or impossible to route requests with application data that decode as path-significant characters like `%2F` (`/`), leaving many developers frustrated.

This PR introduces a somewhat novel solution that allows the developer to convey their intent to the server when doing matching and rewriting.

**Simply put, our solution is to interpret encoded characters and multiple slashes in the configuration as a literal conveyance of the developer's intent.** In other words, we don't blanket-unescape the whole URI every time. We do it byte-for-byte in lock-step with the configured pattern to match, and only unescape if the match pattern is not escaped at that position. Similarly, if a configured path has double slashes `//` in it, we do not merge slashes when comparing paths, because we infer the user's intent is to match repeated slashes.

### Path matching

Path matching (aka routing) is still done in the normalized space. That means if you configure a path matcher of `/foo/bar`, it will match `/foo/bar`, `/foo%2Fbar` and even `%2F%66%6F%6F%2F%62%61%72` because we normalize the URI. This is unchanged from before.

But now if you have a path matcher of `/foo%2Fbar`, it will match `/foo%2Fbar` exactly (the escape sequence is case-insensitive), whereas previously it would have only matched `/foo%252Fbar` (i.e. `%` as data). Now, `/foo%2Fbar` will NOT match `/foo/bar` or `%2Ffoo/bar` because we infer intent from seeing escape sequences in the match pattern as application data, not path delimiters.

**This logic handily extends to wildcards, too.** Referring to the previous example from our Laravel discussion, if you want to use `/bands/*/*` it is impossible to match a URI of `/bands/AC%2fDC/T.N.T` (in Laravel, too). But with this change, you can use special "escape-wildcard" characters: `/bands/%*/%*` to indicate that the span matched by the wildcard should _not_ be URI-decoded and should be kept in the escaped/raw space. So now, if you want to allow band names to have a `/` in them, you can simply write `/bands/%*/%*`.

### Double slashes

Similar to escape sequences, we now disable slash merging automatically if the configured pattern has repeated slashes. Previously, it was impossible to match `//foo` because all URIs were normalized. Now, a path matcher of `//foo` will preserve multiple slashes. (A matcher of `/foo` will still match `//foo`.)

### Rewriting

A common task of rewriting is to strip path prefix and path suffix. The logic explained above has also been implemented for these operations, allowing you to use escaped characters and multiple slashes in your prefix and suffix patterns, and now Caddy will rewrite more intuitively and correctly.

For example, if you want to strip a prefix of `//prefix` from `//prefix/foo`, it will work, whereas before it wouldn't find the prefix because it would look at a fully-normalized URI.

Similarly, you can strip prefixes or suffixes with encoded characters. For example, a prefix of `/foo%2Fbar` will rewrite a URI of `/foo%2Fbar/asdf` into `/asdf`, whereas before it wouldn't find the prefix.

## Is it perfect?

Probably not. Are there bugs? Probably. Have I overlooked things? Almost certainly yes. I'm pretty sure there might be nooks and crannies within Caddy that I missed implementing this. Please file a bug report if you need it to work but doesn't work like you expect.

I'm pretty happy with this approach though. I think it's very useful and I don't know of other mainstream servers or frameworks that implement this behavior. In true Caddy fashion, this should _just work_.

- fixes #4801
- fixes #4923
- fixes #4743
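As a rough illustration of the "lock-step" comparison described in the PR text, here is a simplified Go sketch. It is not Caddy's actual implementation: `lockStepMatch` and its helpers are hypothetical names, and the sketch assumes exact (non-wildcard) patterns and leaves out slash merging and the `%*` wildcard.

```go
package main

import (
	"fmt"
	"strings"
)

// isHex reports whether c is a hexadecimal digit.
func isHex(c byte) bool {
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')
}

// unhex converts one hexadecimal digit to its numeric value.
func unhex(c byte) byte {
	switch {
	case c >= '0' && c <= '9':
		return c - '0'
	case c >= 'a' && c <= 'f':
		return c - 'a' + 10
	default:
		return c - 'A' + 10
	}
}

// isEscape reports whether s has a %XX escape sequence starting at index i.
func isEscape(s string, i int) bool {
	return i+2 < len(s) && s[i] == '%' && isHex(s[i+1]) && isHex(s[i+2])
}

// lockStepMatch (hypothetical) walks the pattern and the escaped request
// path together. Where the pattern is escaped, the request must carry the
// same escape (hex digits compared case-insensitively). Where the pattern
// is a literal byte, the request is decoded at that position before comparing.
func lockStepMatch(pattern, escapedPath string) bool {
	p, r := 0, 0
	for p < len(pattern) && r < len(escapedPath) {
		if isEscape(pattern, p) {
			if !isEscape(escapedPath, r) ||
				!strings.EqualFold(pattern[p:p+3], escapedPath[r:r+3]) {
				return false
			}
			p += 3
			r += 3
			continue
		}
		c := escapedPath[r]
		if isEscape(escapedPath, r) {
			c = unhex(escapedPath[r+1])<<4 | unhex(escapedPath[r+2])
			r += 3
		} else {
			r++
		}
		if c != pattern[p] {
			return false
		}
		p++
	}
	return p == len(pattern) && r == len(escapedPath)
}

func main() {
	fmt.Println(lockStepMatch("/foo/bar", "/foo%2Fbar"))   // true
	fmt.Println(lockStepMatch("/foo%2Fbar", "/foo%2fbar")) // true
	fmt.Println(lockStepMatch("/foo%2Fbar", "/foo/bar"))   // false
}
```

The point of the sketch is only that the pattern, not the request, decides whether a given position is compared in the escaped or the decoded space.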

The release notes also have details.

These changes affect the path matcher and the rewrite directive, so I don’t know if it got into map.

Can you temporarily try changing your config to use the path matcher and rewrite directives (try_files uses a file matcher and a rewrite directive under the hood) to see if you get the desired behavior, after reading about it in the linked notes?

finanalyst · April 22, 2023, 9:55am

After more testing trying to reconfigure as requested, I have discovered that the Caddy map is working as expected.
The directive

map {path} {npath} {
  import prettyurls
}

where prettyurls contains lines like

"/routine/%2F" "/hashed/1a2a3a"

does work as described deep in the Caddy Documentation.

The explanation in the notes you linked to was excellent and the solution you have adopted is - I think - a good one.

Because of your extensive notes, I also discovered another URL-rewrite unrelated to Caddy in my test setup. So in fact the actual problem I was encountering was not related to Caddy.

Thank you for the help.

matt · April 22, 2023, 4:26pm

Wow, that’s awesome!

I’m currently working on a new website and new docs, and I will be sure to explain this better in the new documentation (as it’s a relatively new feature).

Thanks for following up
