Enabling a series of rewrites

1. The problem I’m having:

When a request comes in like “/path”, I’d like to look in “/a/path” and “/b/path”, as well as apply possible file extensions using a map like

    map {header.Accept} {fextension} {
        "text/turtle"       "ttl"
        "application/xml"   "xml"
        "application/json"  "json"
        default             "html"
    }

    try_files {path} {path}.{fextension}

I also want the standard “index.html” to be tried when a directory is requested, which happens by default if I just do something like:

@nona not path_regexp ^/a
rewrite @nona /a{path}

or if I do

@nonb not path_regexp ^/b
rewrite @nonb /b{path}

… but I can’t figure out how to try both of these rewrites. I could replace them with a single try_files statement, but this forces me to recapitulate both the index.html logic and the file-extension logic, and in my real use case I have more than two directories that I want to try, so it all gets pretty hairy.

Is there a cleaner solution?

I have tried many things but I’m not sure they’re on the right path; is there any high-level advice for how to apply two rewrites in series, speculatively trying the second one only if the first matches no files?

If not, what other approaches should I consider?

Thanks!

Could you do something like this?

try_files {path} {path}.{fextension} /a{path}.{fextension} /b{path}.{fextension} /index.html

I don’t really understand the rules you’re trying to build. Why can a file be in more than one directory anyway? Why can’t the client request the correct directory in the first place? This seems overly complicated.

I wound up with a variant of this (auto-ig-builder/Caddyfile at 1b783f12ccdb2c757ed341a1f3fd28377239c464 · FHIR/auto-ig-builder · GitHub). The use case is a directory full of git branch checkouts; the client can specify a branch, but if they don’t, I want to find the file in master or main, if it exists.

More generally, though, I’m having trouble understanding the flow of logic after a “rewrite”. My initial mental model (“start the routing logic again from the top of the file, with the new rewritten path”) was wrong, but I haven’t quite grasped what actually happens.

(tryall) {
	try_files {args.0}
	try_files {args.0}{fextension}
	try_files {args.0}index.html
}

You can just do this in a single line, simpler and slightly more performant:

try_files {args.0} {args.0}{fextension} {args.0}index.html
{
	http_port 80
	https_port 443
}

This is useless; those are already the defaults. You can remove that.

	log {
		level debug
	}

There’s no debug level for access logs, so this does nothing different from a plain log with no options. To turn on debug logging for all the logs, set debug in the global options instead.
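For example, the debug option goes in the global options block at the very top of the Caddyfile:

    {
    	debug
    }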

Access-Control-Allow-Methods GET, POST, OPTIONS

This won’t work as you expect. Spaces are significant in the Caddyfile, so make sure to wrap the header value in double quotes; otherwise it has a different effect (usually a header value replacement operation, i.e. search for GET, and replace it with POST,, which doesn’t make sense here).
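With quoting, the whole string is treated as a single header value:

    header Access-Control-Allow-Methods "GET, POST, OPTIONS"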

	@core {
		not path_regexp ^\/ig
		not path_regexp ^\/branches\/
	}

You can replace this with @core not path /ig* /branches/*, which will be slightly more performant and shorter to read.
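In block form, equivalent to your snippet (same @core name, with the two regexps folded into one path matcher):

    @core {
    	not path /ig* /branches/*
    }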

Yeah, a rewrite just happens part-way through the middleware stack; there’s no restarting. Directives in the Caddyfile are sorted according to a predetermined directive order. You can run caddy adapt --pretty to see the effective order they run in.

When the request comes into Caddy, we make a copy of the original URL and store it in the request context so it can be accessed with placeholders like {http.request.orig_uri}. Then any rewrite modifies the current path in-place.

The try_files directive does not rewrite unless the tried path exists on disk, so it’s often a no-op. Using {path} as the first term usually lets try_files short-circuit if the path already exists on disk. A non-parameterized try like /index.html can act as a fallback, usually placed last; only configure it if you know that file actually exists.
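Putting that together, a typical ordering might be (a sketch reusing the {fextension} map from earlier in the thread; the /index.html fallback assumes that file exists):

    try_files {path} {path}.{fextension} /index.html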

handle blocks are mutually exclusive from each other, so if one matches, the others won’t also run. In the adapted JSON config you see that as a group; only the first matched subroute in a group will run.

Also, the middleware chain is two-way: some handlers operate on the way in and some on the way out. For example, encode operates both on the way in and out; in to look at the request headers and decide which encodings it can use, then out to actually transform/compress the response body. Some handlers are terminal, like file_server: it terminates the middleware chain by writing a response, which then travels back out (up the chain). So when reading the JSON config, consider it first top-down, then again bottom-up. Most handlers, like rewrites, have no effect on the way back up.

Thanks so much for the review – it’s incredibly helpful.

After running “caddy adapt --pretty” and seeing what looked like the right logic (__default paths tried before master paths), I realized that the resulting behavior was still not right. I want to use files from “__default” if they exist, and fall back to “master” or “main” if no “__default” exists. The following seems to accomplish it:

	handle @ig {
		import tryall /ig/{re.ig.org}/{re.ig.repo}/branches/main/{re.ig.rest}
		import tryall /ig/{re.ig.org}/{re.ig.repo}/branches/master/{re.ig.rest}
		import tryall /ig/{re.ig.org}/{re.ig.repo}/branches/__default/{re.ig.rest}
	}

… but this is backwards from what I would have expected – i.e. putting __default last causes it to “try to match” first, from my testing. Is that right?

And technically I don’t want this logic at the file level – if __default is present and a file exists in master but not __default, I’d like to return a 404.

No, they’re tried in the order they appear in the config.

Uh, that sounds very complicated… You might be better served by writing your own Caddy plugin to do this instead, since you’d have direct access to the request and you can write the conditions you want as actual code.

I can’t really conceptualize what you’re trying to do here; there are more variables to this than I can really spend time thinking about right now.

Thanks again! The complexity is short term while we make some infrastructure transitions.

I figured out why I’m seeing “backwards order” matching: each tryall block executes in turn, comparing the same regexp results against the filesystem. So the last one to match performs a rewrite that clobbers any previous rewrites. I could make these blocks mutually exclusive to avoid this behavior, but now that I understand it, it’s manageable.

Yeah, that’s another reason why I suggest doing a single try_files directive instead of multiple in series, because it would avoid that problem.
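Folded into one directive, your three tryall imports might look like this (a sketch; try_files tests candidates left to right, so the highest-priority branch goes first, and the {fextension} and index.html variants are dropped for brevity):

    try_files /ig/{re.ig.org}/{re.ig.repo}/branches/__default/{re.ig.rest} /ig/{re.ig.org}/{re.ig.repo}/branches/master/{re.ig.rest} /ig/{re.ig.org}/{re.ig.repo}/branches/main/{re.ig.rest}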
