Composing in the Caddyfile

Web server config files are mostly about expressing HTTP handling logic. Often, various handlers or middlewares need to be “composed” to form a cohesive HTTP handler that returns the desired responses.

Structurally, config files are basically limited to two dimensions: rows and columns (or lines and tokens). Syntactically, a config file’s second dimension can expand into more dimensions, or scopes, when blocks are opened (usually represented by curly braces { } and/or indentation, as in most programming languages). The added dimensionality is useful for expressing complex, nested ideas, but it makes it difficult to access higher scopes that “go up and over” if you were to draw them like a tree. There are profound theoretical underpinnings for why most programming languages and config parsers implement scopes like stacks instead of trees. Keep that in mind as we proceed to discuss web server config files.

In a config file, there are two primary ways to describe how to handle HTTP requests:

  • Behavior-first. This involves defining the behavior first, then describing matching/routing within each behavior. Handler logic on the “outside” / routing logic on the “inside”.
  • Routing-first. This involves defining the matching/routing first, then describing the behavior within each route/block. Routing logic on the “outside” / handler logic on the “inside”.

When we say “first” or “outside” here, we basically mean “from left-to-right” in a config file.

We’ll explore both in detail after introducing some crucial background.

Two problems

Consider this behavior-first Caddyfile excerpt:

rewrite /docs/foo/* /docs/foo.html
rewrite /docs/*     /docs/index.html

header                Cache-Control max-age=86400
header /docs/foo.html Cache-Control no-cache

This is very nice to read and write, but it won’t work the way you expect: all requests will have a Cache-Control header of max-age=86400, even though you would think that requests for /docs/foo.html get a value of no-cache.

Why? First you need to understand two problems.

Problem 1: Handler order

One problem is ordering. The Caddyfile is helpful because, by default, it sorts handlers for you into an order that makes sense for most sites.

First, it sorts by directive name. This ensures, for example, that even if root is the last line of your Caddyfile, the file_server directive that appears at the top will still know where your site’s files are. It means that most of the time, you don’t have to worry about the implementation details of the server when you write your Caddyfile.
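As a small sketch of that first rule (the site address and the root path here are placeholders, not taken from a real config), the following site behaves the same whichever of its two directives is written first, because root is sorted ahead of file_server:

```caddyfile
example.com

# Written first, but sorted to run after root.
file_server

# Written last, but sorted to run before file_server.
root * /var/www/html
```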

Then, among multiple instances of the same directive, the Caddyfile sorts those that have only a path matcher by the descending length of their paths. This means that directive instances with the longest (most specific) paths are ordered first – otherwise, less specific instances of the directive would always take precedence! To illustrate, imagine this config:

reverse_proxy        localhost:5000  # len(path) = 0
reverse_proxy /api/* localhost:8000  # len(path) = 5

If the Caddyfile did not reorder these handlers for you, the proxy to localhost:5000 would always be used and the proxy to localhost:8000 would never be used, because the first proxy appears first and matches a superset of the requests that the second one does. Since reverse_proxy is a terminal handler (i.e. it writes a response and does not call the next handler in the chain), no handlers after it will be called.

To solve this problem, the Caddyfile automatically reorders these internally so that the second proxy in your config is tried first and handles requests to /api/*; all other requests fall through to the first one.
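To picture the result, here is a sketch of the effective order written as a route block, which keeps its contents in the order given (route is covered later in this article). It illustrates the sorted order only; it is not the adapter's literal output, which you can inspect with caddy adapt.

```caddyfile
route {
	# Longest path first: handles /api/* and stops, since reverse_proxy is terminal.
	reverse_proxy /api/* localhost:8000
	# No path matcher: handles every request that was not proxied above.
	reverse_proxy        localhost:5000
}
```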

Problem 2: Mutual exclusion

Another problem is mutual exclusion, or using one handler and not any others that match the same request.

Consider this config (an actual excerpt from the Caddy website’s Caddyfile):

rewrite /docs/json/*    /docs/json/index.html
rewrite /docs/modules/* /docs/modules/index.html
rewrite /docs/*         /docs/index.html

What would happen if rewrite was not mutually exclusive? A request to /docs/modules/http would be rewritten to /docs/modules/index.html (which is what we want), but then it would finally be rewritten to /docs/index.html, which we intended only for all other requests in /docs/ that weren’t to /docs/modules/ – oops!

Because cascading rewrites are often undesirable and similar logic applies to some other directives, the Caddyfile currently considers three standard directives to be mutually exclusive with other instances of themselves:

  • rewrite
  • root
  • handle

In other words, after the directives are sorted, only the first matching instance of each of these directives will be invoked. (The Caddyfile adapter uses route groups to enforce mutual exclusion.)

This automatic behavior is what makes the two rewrite lines in our first config excerpt above work as expected. 👍 But what about the header directives?

You’ll notice that header is not on that list… hmm… 🤔 ah, right, that’s because we’ll often want header directives to cascade:

@options method   OPTIONS
header   @options Access-Control-Allow-Methods "POST, GET, OPTIONS"
header            Access-Control-Allow-Origin  example.com

Notice that here, we want all requests to get the Access-Control-Allow-Origin header, but only OPTIONS requests to get the Access-Control-Allow-Methods header. Because of these kinds of extremely handy cases, the Caddyfile can’t assume that all header directives should be mutually exclusive of each other. (We also can’t program the Caddyfile with special logic to peer inside each directive and only make it mutually-exclusive if the first argument is the same – the Caddyfile is extensible, so this would not only break several abstractions but it would also be impossible to maintain.)

Answering the question

Back to the original question: Why? Why does that config way above not yield the desired result?

Well, you know that directives are sorted so that their handlers are ordered most effectively for most use cases, and then some directives are made mutually exclusive within themselves.

The answer should be much more obvious now: multiple header directives cascade into each other, so order might matter. For example, as we wrote:

header                Cache-Control max-age=86400
header /docs/foo.html Cache-Control no-cache

Because of how the Caddyfile does its internal sorting, Caddy first sets the Cache-Control: no-cache header for requests to /docs/foo.html – which is what we want! – but then immediately overwrites it by setting Cache-Control: max-age=86400 on all requests (because that directive has no matcher). ☹️
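Written out as a route block, which preserves the order of its contents, the sorted result looks roughly like this sketch (an illustration of the order, not the adapter's literal output):

```caddyfile
route {
	# Sorted first because it has the longer path matcher.
	header /docs/foo.html Cache-Control no-cache
	# Runs for every request afterwards and overwrites the value set above.
	header                Cache-Control max-age=86400
}
```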

Ahhh… well shoot.

We also know that simply switching the order of these two lines won’t have any effect, because the Caddyfile (helpfully) orders them so that no directives become inert.

So how do we solve this?

In this specific case, we have two possible solutions:

  • Make those two header directives mutually exclusive,
    or
  • Manually switch the order so that the overwriting ends favorably.

Solution A

To make any directives – or group of directives – mutually exclusive, we can simply wrap them in a handle directive block:

handle /docs/foo.html {
	header Cache-Control no-cache
}
handle {
	header Cache-Control max-age=86400
}

Observant readers will find this pattern similar to nginx’s location blocks (but not limited to just path matching - any matchers can be used!). By definition, the handle directive is mutually exclusive to all other handle blocks at the same level of nesting! So this is very useful sometimes.
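Since handle accepts any matcher, the same pattern works for routing on something other than the path. A minimal sketch, assuming a hypothetical @post named matcher and a placeholder upstream on localhost:9000:

```caddyfile
@post method POST

# POST requests are routed here and nowhere else.
handle @post {
	reverse_proxy localhost:9000
}

# Fallback for everything the handle above did not match.
handle {
	file_server
}
```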

And as it turns out, this part of the config is now routing-first. As you scan from left to right (“outer to inner”), you’ll notice it first defines routing logic, and the behavior is inside it. This approach is useful and necessary at times when you need to achieve specific results.

Unlike nginx, however, you are not forced to use this approach. The Caddyfile lets you use both/either! But you certainly may use this approach exclusively if you prefer.

Solution B

The other approach is to force the order so that the final header modification is the one we want. We can do this using the route directive:

route {
	header                Cache-Control max-age=86400
	header /docs/foo.html Cache-Control no-cache
}

The route directive defines a literal route that the Caddyfile does not dare re-order; and it is not mutually exclusive with other route blocks. With this config, Caddy will literally invoke those handlers in that order: first applying the max-age= value to all requests, then immediately after will overwrite that with the no-cache value only for requests to /docs/foo.html, which is precisely what we want.

But maybe Solution A is slightly “cleaner” in that it doesn’t require overwriting a previous handler’s result.


Now you know some of the intricacies of handler order and mutual exclusion. Now you can appreciate two equally-valid and equally-useful paradigms for writing a web server configuration!

Behavior-first

This paradigm is most effective when a site’s behavior can be easily described by a sequence of handlers that do not depend on the routing logic of other handlers.

For example, the actual Caddyfile for the Caddy website is quite elegant this way:

caddyserver.com

root ./src

file_server
templates
encode gzip

try_files {path}.html {path}

redir   /docs/json      /docs/json/
redir   /docs/modules   /docs/modules/
rewrite /docs/json/*    /docs/json/index.html
rewrite /docs/modules/* /docs/modules/index.html
rewrite /docs/*         /docs/index.html

reverse_proxy /api/* localhost:4444

Notice how the left-most elements of the configuration primarily describe behaviors, and then most behaviors (directives) are scoped by matchers. As you scan from left to right, you first see behaviors and then matchers (and then the behaviors are configured in more detail).

This works well in our case because the Caddyfile takes care of the general ordering and mutual exclusion for us. It is easy to read and write. However, it can be difficult to reason about the routing (or “matching”) logic because this visual structure does not achieve parity with the final route that a request would take in the handler chain. Instead, it is optimized for grouping behaviors together and letting the server sort out the routes.

Routing-first

This paradigm is most effective when a site’s behavior can be easily described by the sequence of handlers a request will invoke depending on its properties.

To demonstrate, I’ll use a real config from a real production use case on our forums a while ago. This config is behavior-first, and is incorrect for the desired result:

:80

encode gzip
root * /Users/shared/server/components/web
file_server browse
reverse_proxy /soc/* 127.0.0.1:9710
reverse_proxy /api/* 127.0.0.1:9720
reverse_proxy /fio/* 127.0.0.1:9730
try_files {path} /index.html

Notice that none of the reverse_proxy directives will get invoked because the try_files directive rewrites everything that isn’t a static file on disk to /index.html, which is handled only by file_server. Hence, the logic of this behavior-first config is incorrect.
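Assuming Caddy's default directive order, the sorted handler chain for this config looks roughly like the sketch below, written as a route block to make the order explicit (an illustration, not the adapter's literal output). try_files sorts ahead of reverse_proxy, so the path has already been rewritten by the time the proxies are tried:

```caddyfile
route {
	root * /Users/shared/server/components/web
	# Rewrites any path that is not a file on disk to /index.html ...
	try_files {path} /index.html
	encode gzip
	# ... so these path matchers no longer see /soc/*, /api/* or /fio/*.
	reverse_proxy /soc/* 127.0.0.1:9710
	reverse_proxy /api/* 127.0.0.1:9720
	reverse_proxy /fio/* 127.0.0.1:9730
	file_server browse
}
```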

As you saw above, we could solve this by using handle or route (there are often multiple ways to solve a config problem). If we use handle, the corrected config is:

:80

encode gzip
root * /Users/shared/server/components/web
handle /soc/* {
	reverse_proxy 127.0.0.1:9710
}
handle /api/* {
	reverse_proxy 127.0.0.1:9720
}
handle /fio/* {
	reverse_proxy 127.0.0.1:9730
}
handle {
	try_files {path} /index.html
	file_server browse
}

This will first route the request into the matching handle block, and then run the directives within that block in their usual sorted order. This matters here because try_files and file_server sit on either side of reverse_proxy in the directive order: if they weren’t divided into separate groups, the Caddyfile would put try_files before every reverse_proxy, reproducing the wrong behavior of the behavior-first attempt.

Hence, the routing-first approach is better for this config. (Actually, we kind of use both! The Caddyfile doesn’t require you to put everything within a handle block. You can compose your handlers the best way for any given part of your config, without being restricted to only a single paradigm.)

If we use route instead:

:80

encode gzip
root * /Users/shared/server/components/web
route {
	reverse_proxy /soc/* 127.0.0.1:9710
	reverse_proxy /api/* 127.0.0.1:9720
	reverse_proxy /fio/* 127.0.0.1:9730
	try_files {path} /index.html
	file_server browse
}

This config is more concise, but we have to mind the exact order of each line, since the Caddyfile won’t rescue us if we get something out of order. It can also be a matter of preference: is it obvious to you that requests in /soc/ will be handled by the first reverse proxy, and that none of the other directives after it will execute? We can use that guarantee to our advantage here, but it’s also not as visually clear.

This also works because we guarantee that try_files isn’t run until after the reverse proxies have all been skipped.

So, how you do it is up to you, as long as it is correct.

Summary

There are two basic ways to express web server configuration: the behavior-first and routing-first paradigms.

Either directives are ordered and then grouped (behavior-first), or grouped and then ordered (routing-first).

Between the Caddyfile’s default behavior-first composition, and its support for routing-first composition, you can express many kinds of HTTP handling logic in a way that you find most suitable. This hybrid model results in leaner, more flexible configurations than other web servers allow, while also giving you manual control when you need it.
