Serving static files with %encoding

1. The problem I’m having:

I have a requirement to serve static files (using file_server) for identifiers that contain special characters like / and :, for example https://example.com/pie. The implementation follows the SAML Metadata Query semantics (see: Metadata Query Protocol, InCommon Metadata Service Wiki, Internet2 Wiki).

This is currently working under Apache httpd by having files on the filesystem named using URL-encoded versions (e.g. https:%2F%2Fexample.com%2Fpie) and AllowEncodedSlashes NoDecode defined in the Apache config.

I’m trying to port this to Caddy v2 (using caddy:alpine docker container running v2.7.6) but I only get 404s when requesting the same URI as was being served successfully by Apache.

Both of these files exist on the filesystem:

# ls -il ./pub/mdq/entities/*example.com*
306692 -rw-r--r-- 4 root root 5 Jan 18 12:09 ./pub/mdq/entities/https%3A%2F%2Fexample.com%2Fpie
306692 -rw-r--r-- 4 root root 5 Jan 18 12:09 ./pub/mdq/entities/https%3a%2f%2fexample.com%2fpie
306692 -rw-r--r-- 4 root root 5 Jan 18 12:09 ./pub/mdq/entities/https:%2F%2Fexample.com%2Fpie
306692 -rw-r--r-- 4 root root 5 Jan 18 12:09 ./pub/mdq/entities/https:%2f%2fexample.com%2fpie

The debug output suggests that Caddy is URL-decoding %2f into / and then expecting that to be in the filesystem path. What I need is to be able to pass the %2f through into the actual filename on disk.
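To make the decoding step concrete, here is a small Go sketch (not Caddy code) using only the standard library's net/url, which Go's HTTP server uses to parse the request target. It shows that the decoded Path already contains real slashes, while EscapedPath still carries the original %2f sequences. The helper name decodedAndEscaped is made up for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodedAndEscaped parses a request-target the way Go's HTTP server does
// and returns both the decoded Path and the original escaped form.
// (Hypothetical helper, for illustration only.)
func decodedAndEscaped(requestURI string) (decoded, escaped string, err error) {
	u, err := url.ParseRequestURI(requestURI)
	if err != nil {
		return "", "", err
	}
	return u.Path, u.EscapedPath(), nil
}

func main() {
	decoded, escaped, err := decodedAndEscaped("/entities/https:%2f%2fexample.com%2fpie")
	if err != nil {
		panic(err)
	}
	fmt.Println("Path:       ", decoded) // /entities/https://example.com/pie
	fmt.Println("EscapedPath:", escaped) // /entities/https:%2f%2fexample.com%2fpie
}
```

The file server works from the decoded Path, which is why the %2f never reaches the filesystem lookup.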

2. Error messages and/or full log output:

$ curl -vs -H 'Host: example' http://hal.lan:8082/entities/https:%2f%2fexample.com%2fpie
*   Trying 192.168.37.22:8082...
* Connected to hal.lan (192.168.37.22) port 8082
> GET /entities/https:%2f%2fexample.com%2fpie HTTP/1.1
> Host: example
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: application/samlmetadata+xml
< Server: Caddy
< Date: Thu, 18 Jan 2024 12:18:31 GMT
< Content-Length: 0
<
* Connection #0 to host hal.lan left intact
{"level":"info","ts":1705580627.7879965,"msg":"using provided configuration","config_file":"/etc/caddy/Caddyfile","config_adapter":"caddyfile"}
{"level":"warn","ts":1705580627.8091376,"msg":"Caddyfile input is not formatted; run 'caddy fmt --overwrite' to fix inconsistencies","adapter":"caddyfile","file":"/etc/caddy/Caddyfile","line":2}
{"level":"info","ts":1705580627.8116891,"msg":"redirected default logger","from":"stderr","to":"stdout"}
{"level":"info","ts":1705580627.8151548,"logger":"admin","msg":"admin endpoint started","address":"localhost:2019","enforce_origin":false,"origins":["//localhost:2019","//[::1]:2019","//127.0.0.1:2019"]}
{"level":"warn","ts":1705580627.8173888,"logger":"http.auto_https","msg":"automatic HTTPS is completely disabled for server","server_name":"srv0"}
{"level":"info","ts":1705580627.8174314,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc0005a3200"}
{"level":"debug","ts":1705580627.8174934,"logger":"http.auto_https","msg":"adjusted config","tls":{"automation":{"policies":[{}]}},"http":{"servers":{"srv0":{"listen":[":80"],"routes":[{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"vars","root":"/pub/mdq"}]},{"handle":[{"handler":"headers","response":{"set":{"Content-Type":["application/samlmetadata+xml"]}}}],"match":[{"path":["/entities/*"]}]},{"handle":[{"canonical_uris":false,"handler":"file_server","hide":["/etc/caddy/Caddyfile"],"precompressed":{"gzip":{}},"precompressed_order":["gzip"]}]}]}],"terminal":true}],"automatic_https":{"disable":true},"logs":{"logger_names":{"example":"log0"}}}}}}
{"level":"debug","ts":1705580627.8233988,"logger":"http","msg":"starting server loop","address":"[::]:80","tls":false,"http3":false}
{"level":"info","ts":1705580627.8235157,"logger":"http.log","msg":"server running","name":"srv0","protocols":["h1","h2","h3"]}
{"level":"info","ts":1705580627.824527,"msg":"autosaved config (load with --resume flag)","file":"/config/caddy/autosave.json"}
{"level":"info","ts":1705580627.8276577,"msg":"serving initial configuration"}
{"level":"warn","ts":1705580627.8347867,"logger":"tls","msg":"storage cleaning happened too recently; skipping for now","storage":"FileStorage:/data/caddy","instance":"4979a702-4f27-4e4b-8742-69a09721d8b0","try_again":1705667027.8347764,"try_again_in":86399.999997861}
{"level":"info","ts":1705580627.8350356,"logger":"tls","msg":"finished cleaning storage units"}

{"level":"debug","ts":1705580634.837426,"logger":"http.handlers.file_server","msg":"sanitized path join","site_root":"/pub/mdq","request_path":"/entities/https://example.com/pie","result":"/pub/mdq/entities/https:/example.com/pie"}
{"level":"debug","ts":1705580634.8377106,"logger":"http.log.error.log0","msg":"{id=z8ax53q2w} fileserver.(*FileServer).notFound (staticfiles.go:629): HTTP 404","request":{"remote_ip":"192.168.37.221","remote_port":"53028","client_ip":"192.168.37.221","proto":"HTTP/1.1","method":"GET","host":"example","uri":"/entities/https:%2f%2fexample.com%2fpie","headers":{"Accept":["*/*"],"User-Agent":["curl/8.4.0"]}},"duration":0.000516803,"status":404,"err_id":"z8ax53q2w","err_trace":"fileserver.(*FileServer).notFound (staticfiles.go:629)"}
{"level":"error","ts":1705580634.8378222,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_ip":"192.168.37.221","remote_port":"53028","client_ip":"192.168.37.221","proto":"HTTP/1.1","method":"GET","host":"example","uri":"/entities/https:%2f%2fexample.com%2fpie","headers":{"User-Agent":["curl/8.4.0"],"Accept":["*/*"]}},"bytes_read":0,"user_id":"","duration":0.000516803,"size":0,"status":404,"resp_headers":{"Server":["Caddy"],"Content-Type":["application/samlmetadata+xml"]}}

3. Caddy version:

# docker compose exec mps caddy version
v2.7.6 h1:w0NymbG2m9PcvKWsrXO6EEkY9Ru4FJK8uQbYcev1p3A=

4. How I installed and ran Caddy:

a. System environment:

  • Docker 24.0.7 running under Alpine Linux v3.19
  • caddy:alpine image reporting as v2.7.6
# uname -a
Linux hal 6.6.7-0-virt #1-Alpine SMP PREEMPT_DYNAMIC Thu, 14 Dec 2023 08:49:17 +0000 x86_64 GNU/Linux

b. Command:

docker compose up -d

c. Service/unit/compose file:

Docker Compose file

version: '3.6'
services:
  mps:
    image: caddy:alpine
    ports:
      - 8082:80
    volumes:
      - ./pub:/pub:ro
      - ./Caddyfile:/etc/caddy/Caddyfile:ro

d. My complete Caddy config:

{
        auto_https off
        log default {
                output stdout
        }
        debug
}

(common) {
        log {
                output stdout
        }

        file_server {
                precompressed gzip
                disable_canonical_uris
        }
}

example:80 {
        root * /pub/mdq
        header /entities/* Content-Type application/samlmetadata+xml

        import common
}

5. Links to relevant resources:

https://httpd.apache.org/docs/2.4/mod/core.html#allowencodedslashes

https://spaces.at.internet2.edu/display/MDQ/Metadata+Query+Protocol

Thanks!

From what I can see, file_server routes requests through the ServeHTTP function in staticfiles.go. This does a bit of tidying, then passes the request path through:

filename := strings.TrimSuffix(
    caddyhttp.SanitizedPathJoin(root, r.URL.Path),
    "/",
)

The function SanitizedPathJoin(), which will always be called (I can’t see any configuration item that would prevent it), then passes r.URL.Path through path.Clean(), which (again, with no tunables) does various things, including:

Replace multiple Separator elements with a single one
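As a rough check of that reading, the sketch below uses a hypothetical, simplified stand-in for caddyhttp.SanitizedPathJoin (called sanitizedJoinSketch here; it is not Caddy's source and ignores OS-specific separators). Feeding it the decoded path reproduces the "result" field from the "sanitized path join" debug line above.

```go
package main

import (
	"fmt"
	"path"
)

// sanitizedJoinSketch is a hypothetical, simplified stand-in for
// caddyhttp.SanitizedPathJoin: it cleans the request path as if rooted at
// "/" (so ".." cannot climb above root) and then joins it onto root.
// It uses forward slashes only; the real function also handles OS separators.
func sanitizedJoinSketch(root, reqPath string) string {
	return path.Join(root, path.Clean("/"+reqPath))
}

func main() {
	// r.URL.Path after %2f has been decoded, as in the debug log.
	fmt.Println(sanitizedJoinSketch("/pub/mdq", "/entities/https://example.com/pie"))
	// /pub/mdq/entities/https:/example.com/pie
}
```

The "//" after "https:" is collapsed to a single "/", so the lookup can never match a filename containing %2F.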

From this I think that there is no way to persuade Caddy, as it currently stands, to serve up a file in the way I am after.

Would anyone be able to confirm my working?

I think you need to make the URLs use %25 in place of % to “escape” the encoding. I don’t think there’s really any way around that.
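A quick way to see why %25 helps: the request path is decoded exactly once, so %252F becomes the literal text %2F, which path cleaning leaves alone. This Go sketch (standard library only; the helper filenameFromRequest is hypothetical and skips the join onto the site root) compares the two encodings.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// filenameFromRequest decodes a request-target once (as the HTTP server does)
// and cleans it, returning the root-relative path a file server would look for.
// (Hypothetical helper, for illustration only.)
func filenameFromRequest(requestURI string) (string, error) {
	u, err := url.ParseRequestURI(requestURI)
	if err != nil {
		return "", err
	}
	return path.Clean("/" + u.Path), nil
}

func main() {
	for _, uri := range []string{
		"/entities/https:%2F%2Fexample.com%2Fpie",       // single-encoded: real slashes appear
		"/entities/https:%252F%252Fexample.com%252Fpie", // %25-escaped: literal %2F survives
	} {
		p, err := filenameFromRequest(uri)
		if err != nil {
			panic(err)
		}
		fmt.Println(uri, "->", p)
	}
}
```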

Disabling path cleaning would open up a security vulnerability via path traversal. I don’t think it’s a good idea to allow that to be configurable.
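To illustrate the traversal risk, this Go sketch contrasts a naive concatenation of root and request path with a join that first cleans the request path as if it were rooted at "/". Both helpers (naiveJoin, rootedJoin) are hypothetical simplifications, not Caddy functions.

```go
package main

import (
	"fmt"
	"path"
)

// naiveJoin concatenates root and the request path without cleaning the
// request path first; path.Clean is applied only to show where the
// filesystem would actually resolve the result.
func naiveJoin(root, reqPath string) string {
	return path.Clean(root + reqPath)
}

// rootedJoin cleans the request path as if it were rooted at "/" before
// joining, so leading ".." elements are discarded instead of escaping root.
func rootedJoin(root, reqPath string) string {
	return path.Join(root, path.Clean("/"+reqPath))
}

func main() {
	req := "/../../etc/passwd"
	fmt.Println(naiveJoin("/pub/mdq", req))  // /etc/passwd (outside the site root)
	fmt.Println(rootedJoin("/pub/mdq", req)) // /pub/mdq/etc/passwd
}
```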

Thanks for replying… unfortunately I don’t think it’s something I can influence, as the request format is part of the (draft) MDQ specification (draft-young-md-query-20, Metadata Query Protocol), which, in §3.2.1, says:

3.2.1.  Request by Identifier

   A metadata query request for all entities tagged with a particular
   identifier is performed by issuing an HTTP GET request to a URL
   constructed as the concatenation of the following components:

   *  The responder's base URL.

   *  The string "entities/".

   *  A single identifier, percent-encoded appropriately for use as a
      URL path segment (see sections 2.1 and 3.3 of [STD66]).

   For example, with a base URL of http://example.org/mdq/, a query for
   the identifier foo would be performed by an HTTP GET request to the
   following URL:

   http://example.org/mdq/entities/foo

   Correct encoding of the identifier as a URL path segment is critical
   for interoperability.  In particular:

      The character '/' MUST be percent-encoded.

      The space character MUST be encoded as '%20' and MUST NOT be
      encoded as '+' as would be required in a query parameter.

   For example, with a base URL of http://example.org/mdq/, a query for
   the identifier "blue/green+light blue" would be performed by an HTTP
   GET request to the following URL:

   http://example.org/mdq/entities/blue%2Fgreen+light%20blue
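For reference, Go's url.PathEscape (segment-level escaping) produces exactly the encoding the spec asks for, which is presumably close to what a compliant client sends. The helper mdqEntityURL is hypothetical; it simply concatenates the three components listed in §3.2.1.

```go
package main

import (
	"fmt"
	"net/url"
)

// mdqEntityURL builds an MDQ "request by identifier" URL: base URL +
// "entities/" + the identifier percent-encoded as a single path segment.
// url.PathEscape encodes '/' as %2F and ' ' as %20 and leaves '+' and ':' alone.
// (Hypothetical helper, for illustration only.)
func mdqEntityURL(base, identifier string) string {
	return base + "entities/" + url.PathEscape(identifier)
}

func main() {
	fmt.Println(mdqEntityURL("http://example.org/mdq/", "blue/green+light blue"))
	// http://example.org/mdq/entities/blue%2Fgreen+light%20blue
	fmt.Println(mdqEntityURL("http://example.org/mdq/", "https://example.com/pie"))
	// http://example.org/mdq/entities/https:%2F%2Fexample.com%2Fpie
}
```

Note that the second result matches one of the filenames already on disk (https:%2F%2Fexample.com%2Fpie), but a server will decode that %2F back to / before any filesystem lookup.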

I understand the need to keep the path sanitisation strict, but wonder whether it’s practical to have an option to disable the decoding for a specific subpath (request matcher?) or path element.

In the case of the example, it would be something like:

handle /entities/* {
    uri no_urldecode
}

In reading that I’m not sure what verbiage suggests that % can’t be encoded as %25 though.

I’m sorry I’m not sure what you mean.

I’m not saying that %2F can’t be encoded as %252F, however existing MDQ clients (those that I’m trying to support) currently make requests that don’t do this.

That being said, if there was some sort of preprocessor within caddy that would do that rewrite (%2F := %252F) before the normal processing then I could try that…

As of Caddy v2.6.0, you can match against % as-is. See the path matcher docs here:

Specifically, note this section:

Because there are multiple escaped forms of any given URI, the request path is normalized (URL-decoded, unescaped) except for those escape sequences at positions where escape sequences are also present in the match pattern. For example, /foo/bar matches both /foo/bar and /foo%2Fbar, but /foo%2Fbar will match only /foo%2Fbar, because the escape sequence is explicitly given in the configuration.

The special wildcard escape %* can also be used instead of * to leave its matching span escaped. For example, /bands/*/* will not match /bands/AC%2FDC/T.N.T because the path will be compared in normalized space where it looks like /bands/AC/DC/T.N.T, which does not match the pattern; however, /bands/%*/* will match /bands/AC%2FDC/T.N.T because the span represented by %* will be compared without decoding escape sequences.
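A rough analogy of why comparing in normalized space matters, using Go's path.Match as a stand-in (this is not Caddy's path matcher, and the helper names are hypothetical): once %2F is decoded, the band name splits into two segments and no longer fits /bands/*/*, while the escaped form still does.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// matchesDecoded compares a pattern against the fully decoded path, roughly
// what happens when the pattern contains no escape sequences.
func matchesDecoded(pattern, escapedPath string) bool {
	decoded, err := url.PathUnescape(escapedPath)
	if err != nil {
		return false
	}
	ok, _ := path.Match(pattern, decoded)
	return ok
}

// matchesEscaped compares the same pattern against the still-escaped path,
// loosely analogous to the span that %* leaves undecoded.
func matchesEscaped(pattern, escapedPath string) bool {
	ok, _ := path.Match(pattern, escapedPath)
	return ok
}

func main() {
	p := "/bands/AC%2FDC/T.N.T"
	fmt.Println(matchesDecoded("/bands/*/*", p)) // false: decodes to /bands/AC/DC/T.N.T
	fmt.Println(matchesEscaped("/bands/*/*", p)) // true: "AC%2FDC" stays one segment
}
```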

I believe you can use the path matcher per the documented criteria, then rewrite to the escaped pattern so that the file server works as normal.

You will probably find the description of the PR that added the smart matching feature, and the release notes, helpful for wrapping your head around it.

(search for Smarter path matching and rewriting)

Yeah maybe you can do this:

uri replace %2F %252F

The problem is this is case sensitive I think, so if %2f appears in the request then it wouldn’t get replaced.
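The case-sensitivity concern is easy to demonstrate with Go's standard library: a plain substring replacement skips lowercase %2f, while a case-insensitive regular expression catches both forms. This sketch only illustrates the string handling; it makes no claim about how Caddy implements its directives, and both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// encodedSlash matches %2F in either hex case.
var encodedSlash = regexp.MustCompile(`(?i)%2f`)

// replaceCaseSensitive only rewrites the uppercase form.
func replaceCaseSensitive(escaped string) string {
	return strings.ReplaceAll(escaped, "%2F", "%252F")
}

// replaceCaseInsensitive rewrites %2F and %2f alike.
func replaceCaseInsensitive(escaped string) string {
	return encodedSlash.ReplaceAllString(escaped, "%252F")
}

func main() {
	escaped := "/entities/https:%2f%2fexample.com%2fpie"
	fmt.Println(replaceCaseSensitive(escaped))   // unchanged: lowercase %2f is not matched
	fmt.Println(replaceCaseInsensitive(escaped)) // /entities/https:%252F%252Fexample.com%252Fpie
}
```

Doing two literal replacements (one per case), as suggested, achieves the same effect without a regular expression.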

You could do both probably, I think it should be safe? Maybe? :grimacing:

Edit: For my own sanity, I added a couple of test cases to make sure path joining behaves the way we think it does for %252F: caddyhttp: Test cases for `%2F` and `%252F` (caddyserver/caddy PR #6084).

Just noting that this seems like a bug in the clients. It’s an improper encoding of the URL if it’s not using %252F for this purpose.

I think the workaround described by Francis works, but it is likely not secure/correct generally speaking.

@matt do you have an RFC reference for that non-compliance assertion?

This bit of config appears to do the right thing:

example:80 {
    ...
    uri /entities/* replace %2F %252F
    uri /entities/* replace %2f %252F
    uri /entities/* replace %3a %253A
    root * /pub/mdq
}

It’s not a matter of spec, IMO it’s just logical:

%252F decodes to %2F which is what you want to access on disk. Using %2F decodes to a literal / which is NOT true to the filenames on disk.