How to serve gzipped files automatically in Caddy v2?

I have a directory /www/data with the following:

file.html
file.html.gz

The file.html is zero bytes and file.html.gz is non-zero bytes, as it is actually a gzipped html file.

I have a Caddy v1 block like this:

server.com {
   root /www/data
}

This block succeeds in serving server.com/file.html, by appropriately finding and serving the gzipped file (file.html.gz) and NOT the file.html (which is 0 bytes).

I’d like to do this in Caddy v2. My Caddy v2 site block is this:

server.com {
  file_server * {
    root /www/data
  }
}

However, this serves file.html (the empty file) and not file.html.gz.

The reason I have a bunch of gzipped files is that I want to reduce my disk costs, so I keep a ton of files gzipped. This worked well in Caddy v1, and I’d like to move to Caddy v2, hopefully without having to keep an un-gzipped version of each file on disk.

Hi @schollz,

This behaviour has changed between v1 and v2 as most people don’t pre-compress their files, and for those people v1 was hitting the disk extra times unnecessarily for each request, checking for those pre-compressed files.

In v2, you’ll need to configure this behaviour manually. You can use a matcher to check if a .gz extension exists for a requested file, then rewrite to it and set the Content-Type appropriately.

example.com {
  root * /www/data
  file_server

  # Match requests with a .gz file
  # Check also for index file specifically
  @gz {
    file {
      try_files {uri}.gz {uri}/index.html.gz
    }
  }

  # Rewrite to .gz and handle Content-Type
  route @gz {
    rewrite {http.matchers.file.relative}
    header Content-Type application/x-gzip
  }
}

With this method you could feasibly remove the original, uncompressed empty file entirely.

Thanks so much. I think what you wrote makes sense in my Caddyfile v2. I’m not getting it to work, though. I deleted file.html, so now I just have file.html.gz in /www/data. However, when I request server.com/file.html it doesn’t seem to rewrite it, and I just get a 404 back.

Is http.matchers.file.relative the right thing to re-write?

Yes, per the documentation: https://caddyserver.com/docs/modules/http.matchers.file

It gives us the path to the matched file, relative to the web root - exactly what we need to use for the URI.
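The following trace is a walkthrough of how the placeholder is meant to flow for the original question. It assumes the /www/data layout with only file.html.gz present and a request for /file.html. The comments are explanatory, not Caddy output. root is written as `root * /www/data`, because a bare path as the first argument is parsed as a matcher.

```caddyfile
example.com {
  root * /www/data
  file_server

  # Request: GET /file.html
  # try_files candidate 1: {uri}.gz -> /file.html.gz
  #   /www/data/file.html.gz exists, so @gz matches, and
  #   {http.matchers.file.relative} should hold /file.html.gz
  @gz {
    file {
      try_files {uri}.gz {uri}/index.html.gz
    }
  }

  # The rewrite changes the request URI to /file.html.gz,
  # which file_server then resolves against the site root
  route @gz {
    rewrite {http.matchers.file.relative}
    header Content-Type application/x-gzip
  }
}
```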

With the above Caddyfile I was able to have the request rewritten properly and was served a gzipped file. This is with the latest commit as of this post:

~/Projects/test
➜ caddy version
v2.0.0-beta9.0.20200325051336-0fa1a3b630ec h1:blqg8kpNY8QSyMvhr9Lj2mtYkdm+jaKddp44bw34A04=

~/Projects/test
➜ cat Caddyfile
http://:8080 {
  root www
  file_server

  @gz {
    file {
      try_files {uri}.gz {uri}/index.html.gz
    }
  }

  route @gz {
    rewrite {http.matchers.file.relative}
    header Content-Type application/x-gzip
  }
}

~/Projects/test
➜ mkdir www

~/Projects/test
➜ echo "I'm a gzipped HTML file!" | gzip > www/index.html.gz

~/Projects/test
➜ echo "I'm an uncompressed HTML file!" > www/index.html

~/Projects/test
➜ curl -I localhost:8080
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 45
Content-Type: application/x-gzip
Etag: "q7s93l19"
Last-Modified: Thu, 26 Mar 2020 04:14:09 GMT
Server: Caddy
Date: Thu, 26 Mar 2020 04:26:44 GMT

~/Projects/test
➜ curl -s localhost:8080 | gunzip
I'm a gzipped HTML file!

Thanks @Whitestrake.

Thanks so much! I got your example working. I still have two more questions, if you can spare the time. I really appreciate your help so much!

I’m using the current master 5c55e5d53f for Caddy. Here’s my process:

$ mkdir -p /tmp/caddy/cmd/caddy/www
$ echo "test html!" | gzip > /tmp/caddy/cmd/caddy/www/index.html.gz
$ cd /tmp/caddy/cmd/caddy && go build
$ vim Caddyfile #make changes
$ ./caddy run -config ./Caddyfile

Here is the Caddyfile I use which works (following yours):

http://:8080 {
  root www
  file_server 

  @gz {
    file {
      try_files {uri}.gz {uri}/index.html.gz
    }
  }

  route @gz {
    rewrite {http.matchers.file.relative}
    header Content-Type text/html
    header Content-Encoding gzip
  }
}

This Caddyfile successfully serves index.html.gz from localhost:8080 in the browser.

How can I make the Content-Type match each file’s actual type?

I found that if I don’t have that header Content-Type text/html line, the browser won’t show the file; it downloads it instead. I’d like to be able to serve gzipped CSS and JS as well, which means the content type will change. Please let me know if you have any ideas about how to serve any gzipped content (HTML / CSS / JS).

Can I use an absolute path as a root with file_server?

Instead of using root www I’d like to use the absolute path, root /tmp/caddy/cmd/caddy/www. However, this Caddyfile won’t run:

http://:8080 {
  root /tmp/caddy/cmd/caddy/www
  file_server 

  @gz {
    file {
      try_files {uri}.gz {uri}/index.html.gz
    }
  }

  route @gz {
    rewrite {http.matchers.file.relative}
    header Content-Type text/html
    header Content-Encoding gzip
  }
}

It gives an error:

$ ./caddy run -config ./Caddyfile
2020/03/26 14:34:37.975 INFO  using provided configuration  {"config_file": "./Caddyfile", "config_adapter": ""}
run: adapting config using caddyfile: parsing caddyfile tokens for 'root': ./Caddyfile:3 - Error during parsing: Wrong argument count or unexpected line ending after 'root'

So instead I tried setting root inside the file_server directive:

http://:8080 {
  file_server * {
    root /tmp/caddy/cmd/caddy/www
  }

  @gz {
    file {
      try_files {uri}.gz {uri}/index.html.gz
    }
  }

  route @gz {
    rewrite {http.matchers.file.relative}
    header Content-Type text/html
    header Content-Encoding gzip
  }
}

and that runs in Caddy, but it only serves 404s now.

Actually, this directive doesn’t work with the relative path either:

http://:8080 {
  file_server * {
    root www
  }

  @gz {
    file {
      try_files {uri}.gz {uri}/index.html.gz
    }
  }

  route @gz {
    rewrite {http.matchers.file.relative}
    header Content-Type text/html
    header Content-Encoding gzip
  }
}

as it also gives 404s. Based on the docs, I did not expect this behavior. Please let me know if I’m doing something wrong!

In Caddy v2, if your first argument looks like a path, it gets used as a matcher. So instead, you need to do root * /tmp/caddy/cmd/caddy/www, to tell Caddy to use * as a path matcher (i.e. all paths).

I think the reason you get 404s when using root inside file_server is that the other parts of the Caddyfile (the rewrite and the try_files matcher) don’t know what the root is, because it’s scoped to file_server. So they look relative to the directory you ran Caddy from instead.
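Combining both points, here is a minimal sketch of the earlier working config with the absolute root set through the root directive, so that the file matcher, the rewrite and file_server all see the same root. It keeps the text/html and gzip headers from that config, so it assumes every .gz file is HTML. It also assumes every client accepts gzip, because nothing here checks Accept-Encoding.

```caddyfile
http://:8080 {
  # "*" is the matcher (all requests); the path is the second argument
  root * /tmp/caddy/cmd/caddy/www
  file_server

  @gz {
    file {
      try_files {uri}.gz {uri}/index.html.gz
    }
  }

  route @gz {
    rewrite {http.matchers.file.relative}
    header Content-Type text/html
    header Content-Encoding gzip
  }
}
```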

Are you sure? Because this is documented very clearly: https://caddyserver.com/docs/caddyfile/directives/file_server

Note: When specified as a subdirective like this, only this directive will know this root; for other directives (like try_files or templates) to know the same site root, use the root directive, not subdirective.

@francislavoie Thanks. That works. Any ideas for serving multiple content types in the browser? Could there be some sort of matcher for each type of file, with multiple try_files?

@matt The docs are not wrong; when I read them again it makes sense now. I think I’m still getting used to the matchers and didn’t realize they can have different roots.

I do think some sort of content negotiation module is in order (have been thinking on it for almost a year): https://github.com/caddyserver/caddy/issues/2665 – as you can tell, this is not an easy thing to get right, so I haven’t implemented it quite yet because I don’t deem it essential for a v2 release. Maybe I’ll work it into v2.1 or a little later, we’ll see.

Feel free to comment on that issue and link to this thread, it’s a good case study to keep in mind.

Yeah, might need to do it with multiple matcher/route blocks for now… unfortunately.

  @gzHtml {
    path *.html
    file {
      try_files {uri}.gz {uri}/index.html.gz
    }
  }

  route @gzHtml {
    rewrite {http.matchers.file.relative}
    header Content-Type text/html
    header Content-Encoding gzip
  }


  @gzJs {
    path *.js
    file {
      try_files {uri}.gz
    }
  }

  route @gzJs {
    rewrite {http.matchers.file.relative}
    header Content-Type application/javascript
    header Content-Encoding gzip
  }

  # ... etc.: one matcher/route pair per content type

Note: I think this would break the index.html fallback, because the path wouldn’t end in .html until after being rewritten. Not sure the best way to deal with it. @Whitestrake will probably have a clever idea, as usual :stuck_out_tongue:
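One possible way around the index.html fallback is to give directory requests their own matcher with no path condition, so it matches only when {uri}/index.html.gz exists. This is an untested sketch, and the name @gzIndex is hypothetical. Plain *.html requests then only need to look for the sibling .gz file.

```caddyfile
  # Requests ending in .html: look only for the sibling .gz file
  @gzHtml {
    path *.html
    file {
      try_files {uri}.gz
    }
  }

  route @gzHtml {
    rewrite {http.matchers.file.relative}
    header Content-Type text/html
    header Content-Encoding gzip
  }

  # Directory requests (e.g. /) have no suffix to match on,
  # so rely on the file check for a compressed index alone
  @gzIndex {
    file {
      try_files {uri}/index.html.gz
    }
  }

  route @gzIndex {
    rewrite {http.matchers.file.relative}
    header Content-Type text/html
    header Content-Encoding gzip
  }
```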

Hmm… maybe…

I think try_files as a directive first might do the trick by rewriting first:

try_files {uri}.gz {uri} {uri}/index.html.gz {uri}/index.html

route *.html.gz {
    header Content-Type text/html
    header Content-Encoding gzip
}

route *.js.gz {
    header Content-Type application/javascript
    header Content-Encoding gzip
}

I wonder if this would work. The idea is that it should first rewrite to whichever file exists, in that order, and THEN match by suffix, because the path should then end in .gz. Not certain; I didn’t try it locally.

Thanks @francislavoie, @matt, @Whitestrake.

I think this will get me to where I need to be. Y’all have been so very kind and so very helpful! I appreciate everything you do x1,000,000.

Does that last config work? I’m curious to hear if it does :wink:

Just glancing, but inline path matchers have to start with /. So… you’ll have to use named matchers for suffix matching.
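Following that, the try_files idea above could be written with named matchers instead of inline suffixes. This is an untested sketch: the matcher names @htmlGz and @jsGz are made up here. Like the original idea, it relies on try_files rewriting the request before the routes check their path matchers, so that the path already ends in .gz at that point.

```caddyfile
  # Rewrite to the first existing candidate; the compressed
  # variants come first so they win when present
  try_files {uri}.gz {uri} {uri}/index.html.gz {uri}/index.html

  # Suffix matching needs a named matcher, since inline
  # path matchers must start with /
  @htmlGz {
    path *.html.gz
  }

  route @htmlGz {
    header Content-Type text/html
    header Content-Encoding gzip
  }

  @jsGz {
    path *.js.gz
  }

  route @jsGz {
    header Content-Type application/javascript
    header Content-Encoding gzip
  }
```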

Haven’t tried. I forgot that my data doesn’t have encoded CSS/JS, only HTML. But it would be useful in the future. It’s working for me now with my encoded data (and with brotli too!).
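The brotli config isn’t shown in the thread, so the following is only an assumed sketch of how the same pattern would carry over. It supposes precompressed HTML stored with a .br suffix. Like the gzip version, it also assumes every client accepts the encoding, since nothing here checks Accept-Encoding.

```caddyfile
  # Assumes precompressed HTML stored as *.html.br next to the originals
  @br {
    file {
      try_files {uri}.br {uri}/index.html.br
    }
  }

  route @br {
    rewrite {http.matchers.file.relative}
    header Content-Type text/html
    header Content-Encoding br
  }
```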

Ah, bummer. Would’ve hoped it would work with a * prefix as well. Is that something that would make sense to have? Because * on its own already works.

Just depends if we anticipate any directives to have * as the first character of their first argument…