How to stop Caddy sniffing MIME types and return a default MIME type

1. My Caddy version (caddy -version):

Caddy v1.0.4 (h1:wwuGSkUHo6RZ3oMpeTt7J09WBB87X5o+IZN4dKehcQE=)

2. How I run Caddy:

I run Caddy like this to serve the beta.rclone.org site. This is a static site where people can download beta versions of rclone.

The one unusual feature here is that /mnt/beta.rclone.org is provided by an rclone mount.

beta.rclone.org {
    root /mnt/beta.rclone.org
    log /var/www/logs/beta.rclone.org.log {
        rotate_size 100 # Rotate after 100 MB
    }
    errors stderr
    browse /
    tls me@example.com
}

a. System environment:

# lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 9.9 (stretch)
Release:	9.9
Codename:	stretch

b. Command:

I replicated this in a test environment

./caddy

c. Service/unit/compose file:

None

d. My complete Caddyfile:

127.0.0.1:8080 {
    root /mnt/tmp
    errors stderr
    browse /
}

3. The problem I’m having:

The problem I’m having is due to Caddy sniffing MIME types: for files with unknown extensions it reads the initial contents of the file to work out the type.

The beta.rclone.org site is mirrored by various people. To update their mirrors they run directory listings and HEAD requests on all the files. This is all standard stuff.

However a HEAD request on a file with an unknown extension is causing a read on the file to work out the mime type.

A HEAD request on a file with a known extension does not cause a read on the file.

This causes a problem because a read on the file causes rclone to fetch the whole file from object storage.

This means that a quick scan by the mirrors causes every file with an unknown extension to be fetched from object storage, which is quite inefficient and is using 500GB of bandwidth a day!

Note that a lot of these files have NO extension.

I made a test setup to reproduce the problem (see above for the Caddyfile).

Run rclone with

rclone mount -vv --vfs-cache-mode full /tmp/filez/ /mnt/tmp/

And make sure there are some files in /tmp/filez

$ ls -l /tmp/filez
total 8
-rw-rw-r-- 1 ncw ncw 6 Jan 17 10:26 test-no-extension
-rw-rw-r-- 1 ncw ncw 6 Jan 17 10:26 test.txt

Doing this

$ HEAD http://127.0.0.1:8080/test.txt
200 OK
Connection: close
Date: Fri, 17 Jan 2020 10:27:19 GMT
Accept-Ranges: bytes
ETag: "q48ybl6"
Server: Caddy
Content-Length: 6
Content-Type: text/plain; charset=utf-8
Last-Modified: Fri, 17 Jan 2020 10:26:09 GMT
Client-Date: Fri, 17 Jan 2020 10:27:19 GMT
Client-Peer: 127.0.0.1:8080
Client-Response-Num: 1

This produces the following in rclone’s log.

Note that the file is not read. Opening it isn’t enough to bring it in from the backing store.

2020/01/17 10:27:19 DEBUG : /: Lookup: name="test.txt"
2020/01/17 10:27:19 DEBUG : /: >Lookup: node=test.txt, err=<nil>
2020/01/17 10:27:19 DEBUG : test.txt: Attr: 
2020/01/17 10:27:19 DEBUG : test.txt: >Attr: a=valid=1s ino=0 size=6 mode=-rw-rw-r--, err=<nil>
2020/01/17 10:27:19 DEBUG : /: Lookup: name="test.txt"
2020/01/17 10:27:19 DEBUG : /: >Lookup: node=test.txt, err=<nil>
2020/01/17 10:27:19 DEBUG : test.txt: Attr: 
2020/01/17 10:27:19 DEBUG : test.txt: >Attr: a=valid=1s ino=0 size=6 mode=-rw-rw-r--, err=<nil>
2020/01/17 10:27:19 DEBUG : test.txt: Open: flags=OpenReadOnly
2020/01/17 10:27:19 DEBUG : test.txt: Open: flags=O_RDONLY
2020/01/17 10:27:19 DEBUG : test.txt: >Open: fd=test.txt (rw), err=<nil>
2020/01/17 10:27:19 DEBUG : test.txt: >Open: fh=&{test.txt (rw)}, err=<nil>
2020/01/17 10:27:19 DEBUG : test.txt: Open: flags=OpenReadOnly
2020/01/17 10:27:19 DEBUG : test.txt: Open: flags=O_RDONLY
2020/01/17 10:27:19 DEBUG : test.txt: >Open: fd=test.txt (rw), err=<nil>
2020/01/17 10:27:19 DEBUG : test.txt: >Open: fh=&{test.txt (rw)}, err=<nil>
2020/01/17 10:27:19 DEBUG : &{test.txt (rw)}: Flush: 
2020/01/17 10:27:19 DEBUG : &{test.txt (rw)}: >Flush: err=<nil>
2020/01/17 10:27:19 DEBUG : &{test.txt (rw)}: Release: 
2020/01/17 10:27:19 DEBUG : &{test.txt (rw)}: Flush: 
2020/01/17 10:27:19 DEBUG : &{test.txt (rw)}: >Flush: err=<nil>
2020/01/17 10:27:19 DEBUG : test.txt(0xc00018a580): RWFileHandle.Release closing
2020/01/17 10:27:19 DEBUG : test.txt(0xc00018a580): close: 
2020/01/17 10:27:19 DEBUG : test.txt(0xc00018a580): >close: err=<nil>
2020/01/17 10:27:19 DEBUG : &{test.txt (rw)}: >Release: err=<nil>
2020/01/17 10:27:19 DEBUG : &{test.txt (rw)}: Release: 
2020/01/17 10:27:19 DEBUG : test.txt(0xc00018a440): RWFileHandle.Release closing
2020/01/17 10:27:19 DEBUG : test.txt(0xc00018a440): close: 
2020/01/17 10:27:19 DEBUG : test.txt(0xc00018a440): >close: err=<nil>
2020/01/17 10:27:19 DEBUG : &{test.txt (rw)}: >Release: err=<nil>

However doing this on a file with no extension

$ HEAD http://127.0.0.1:8080/test-no-extension
200 OK
Connection: close
Date: Fri, 17 Jan 2020 10:27:39 GMT
Accept-Ranges: bytes
ETag: "q48ybs6"
Server: Caddy
Content-Length: 6
Content-Type: text/plain; charset=utf-8
Last-Modified: Fri, 17 Jan 2020 10:26:16 GMT
Client-Date: Fri, 17 Jan 2020 10:27:39 GMT
Client-Peer: 127.0.0.1:8080
Client-Response-Num: 1

This does the following instead. Note the 4096-byte read and the INFO log message about the file being copied from the backing store.

2020/01/17 10:27:39 DEBUG : /: Lookup: name="test-no-extension"
2020/01/17 10:27:39 DEBUG : /: >Lookup: node=test-no-extension, err=<nil>
2020/01/17 10:27:39 DEBUG : test-no-extension: Attr: 
2020/01/17 10:27:39 DEBUG : test-no-extension: >Attr: a=valid=1s ino=0 size=6 mode=-rw-rw-r--, err=<nil>
2020/01/17 10:27:39 DEBUG : /: Lookup: name="test-no-extension"
2020/01/17 10:27:39 DEBUG : /: >Lookup: node=test-no-extension, err=<nil>
2020/01/17 10:27:39 DEBUG : test-no-extension: Attr: 
2020/01/17 10:27:39 DEBUG : test-no-extension: >Attr: a=valid=1s ino=0 size=6 mode=-rw-rw-r--, err=<nil>
2020/01/17 10:27:39 DEBUG : test-no-extension: Open: flags=OpenReadOnly
2020/01/17 10:27:39 DEBUG : test-no-extension: Open: flags=O_RDONLY
2020/01/17 10:27:39 DEBUG : test-no-extension: >Open: fd=test-no-extension (rw), err=<nil>
2020/01/17 10:27:39 DEBUG : test-no-extension: >Open: fh=&{test-no-extension (rw)}, err=<nil>
2020/01/17 10:27:39 DEBUG : test-no-extension: Open: flags=OpenReadOnly
2020/01/17 10:27:39 DEBUG : test-no-extension: Open: flags=O_RDONLY
2020/01/17 10:27:39 DEBUG : test-no-extension: >Open: fd=test-no-extension (rw), err=<nil>
2020/01/17 10:27:39 DEBUG : test-no-extension: >Open: fh=&{test-no-extension (rw)}, err=<nil>
2020/01/17 10:27:39 DEBUG : &{test-no-extension (rw)}: Read: len=4096, offset=0
2020/01/17 10:27:39 DEBUG : test-no-extension: Need to transfer - File not found at Destination
2020/01/17 10:27:39 DEBUG : test-no-extension: MD5 = b1946ac92492d2347c6235b4d2611184 OK
2020/01/17 10:27:39 INFO  : test-no-extension: Copied (new)
2020/01/17 10:27:39 DEBUG : test-no-extension(0xc00008a7c0): Opening cached copy with flags=O_RDONLY
2020/01/17 10:27:39 DEBUG : &{test-no-extension (rw)}: >Read: read=6, err=<nil>
2020/01/17 10:27:39 DEBUG : &{test-no-extension (rw)}: Flush: 
2020/01/17 10:27:39 DEBUG : test-no-extension(0xc00008a7c0): RWFileHandle.Flush ignoring flush on unwritten handle
2020/01/17 10:27:39 DEBUG : &{test-no-extension (rw)}: >Flush: err=<nil>
2020/01/17 10:27:39 DEBUG : &{test-no-extension (rw)}: Release: 
2020/01/17 10:27:39 DEBUG : test-no-extension(0xc00008a7c0): RWFileHandle.Release closing
2020/01/17 10:27:39 DEBUG : test-no-extension(0xc00008a7c0): close: 
2020/01/17 10:27:39 DEBUG : &{test-no-extension (rw)}: Flush: 
2020/01/17 10:27:39 DEBUG : &{test-no-extension (rw)}: >Flush: err=<nil>
2020/01/17 10:27:39 DEBUG : test-no-extension(0xc00008a7c0): >close: err=<nil>
2020/01/17 10:27:39 DEBUG : &{test-no-extension (rw)}: >Release: err=<nil>
2020/01/17 10:27:39 DEBUG : &{test-no-extension (rw)}: Release: 
2020/01/17 10:27:39 DEBUG : test-no-extension(0xc00008a740): RWFileHandle.Release closing
2020/01/17 10:27:39 DEBUG : test-no-extension(0xc00008a740): close: 
2020/01/17 10:27:39 DEBUG : test-no-extension(0xc00008a740): >close: err=<nil>
2020/01/17 10:27:39 DEBUG : &{test-no-extension (rw)}: >Release: err=<nil>

What I’d like to do is disable MIME type detection for files with unknown extensions and just return application/octet-stream for them. This would fix the problem.
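To make the desired behaviour concrete, here is a minimal Python sketch of "look at the name only, fall back to a default". It is not Caddy code; content_type_for and DEFAULT_TYPE are hypothetical names, and the extension table is whatever Python's mimetypes module knows about.

```python
import mimetypes

# Fallback type for names with no known extension (the default asked for above).
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(name: str) -> str:
    """Pick a Content-Type from the file name alone.

    The served file itself is never opened, so a HEAD request handled
    this way could not trigger a read on the backing store.
    """
    guessed, _encoding = mimetypes.guess_type(name)
    return guessed or DEFAULT_TYPE


if __name__ == "__main__":
    for name in ("test.txt", "test-no-extension"):
        print(name, "->", content_type_for(name))
```

The point of the design is that the decision depends only on the request path, so files with no extension cost nothing extra to describe.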

As far as I can see, the mime directive only lets you set MIME types for extensions containing a ".", and I couldn’t get it to apply to files without extensions, so I couldn’t disable sniffing that way.

Is this possible somehow?

Hey @ncw! Long time no see :slight_smile:, glad to have you here.

I admit it’s been a while since I looked at the code for v1’s file serving, but can you try setting the header yourself? The sniffing is done by the Go standard library, but it might be skipped if the header is already set:

header / Content-Type application/octet-stream

(By the way, Caddy 2 does not do MIME sniffing.)
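For readers wondering why presetting the header might help: as far as I understand Go's http.ServeContent, it only reads from the file when no Content-Type is set and the extension is unknown. Below is a rough Python model of that decision order, not Go's actual code, and whether Caddy v1's file server follows exactly this path is an assumption. CountingReader, sniff and choose_content_type are made-up names; sniff is a crude stand-in for real content detection; 512 is the sniff length used by Go's net/http.

```python
import io
import mimetypes

SNIFF_LEN = 512  # Go's net/http sniffs at most this many bytes


class CountingReader(io.BytesIO):
    """In-memory file that records how many bytes were read from it."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def sniff(prefix: bytes) -> str:
    """Crude stand-in for content sniffing: valid UTF-8 text or binary."""
    try:
        prefix.decode("utf-8")
        return "text/plain; charset=utf-8"
    except UnicodeDecodeError:
        return "application/octet-stream"


def choose_content_type(name: str, headers: dict, body) -> str:
    # 1. A header that is already set wins; the body is not touched.
    if "Content-Type" in headers:
        return headers["Content-Type"]
    # 2. Otherwise try the file extension; still no read.
    guessed, _ = mimetypes.guess_type(name)
    if guessed:
        return guessed
    # 3. Only now read the start of the body, then rewind it.
    prefix = body.read(SNIFF_LEN)
    body.seek(0)
    return sniff(prefix)


if __name__ == "__main__":
    body = CountingReader(b"hello\n")
    print(choose_content_type("test-no-extension", {}, body), body.bytes_read)
```

In this model, step 3 is the read that shows up in the rclone log for test-no-extension, and steps 1 and 2 are the two ways to avoid it.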

:smiley:

Hmm, I only want to set the header on the files with no extensions, which might be tricky.

Upgrading to Caddy 2 is probably the best solution! I haven’t experimented with Caddy 2 yet but I will have a go.

Sadly, this is tricky in Caddy 1. Even if Caddy 2 did MIME sniffing (it doesn’t), this would be suuuper easy in Caddy 2.

Beta 13 of Caddy 2 is coming out today or Monday (depends on how many docs for it I want to get done first) – just so you know. It has some significant improvements to the Caddyfile. But, if Caddy 2 works for you as-is, then please by all means try it! It’s still in beta, of course, so keep your v1 configs handy just in case. Let me know how it goes.

@ncw To help you get started, I think the equivalent v2 Caddyfile for your downloads site would be:

beta.rclone.org {
    root * /mnt/beta.rclone.org
    file_server browse
    tls me@example.com
}

Error logs in v2 automatically go to stderr by default.

Note that log (access logging) is not yet implemented in the v2 Caddyfile, but is configurable via JSON. I’ll get around to that soon-ish.

If your ACME email is the same for all sites, you can instead put:

{
    email me@example.com
}

at the very top of your v2 Caddyfile and remove the tls directive.

Thanks @matt that is super helpful! I spent a while reading the Caddy 2 docs. It looks very powerful, but I found it hard to get going, being a copy-the-example-and-mutate-it sort of experimenter. A concrete example like that to start from is very useful :slight_smile:

That’s probably because the v2 Caddyfile isn’t documented yet :wink: I’m working on it today.

I had a go at this with Caddy 2:

$ HEAD http://127.0.0.1:8080/test-no-extension
200 OK
Connection: close
Date: Sat, 18 Jan 2020 09:34:47 GMT
Accept-Ranges: bytes
ETag: "q48ybs6"
Server: Caddy
Content-Length: 6
Last-Modified: Fri, 17 Jan 2020 10:26:16 GMT
Client-Date: Sat, 18 Jan 2020 09:34:47 GMT
Client-Peer: 127.0.0.1:8080
Client-Response-Num: 1

It doesn’t read the file, which is excellent :slight_smile: It doesn’t provide the Content-Type in the HEAD response, which seems like a good compromise to me.

I’m going to mark this thread as fixed - upgrading to Caddy 2 being the answer!

Great!

You can also set a Content-Type manually if you want to… let’s see…

@no_ext {
    path_regexp .*/[^.]+$
}
header @no_ext Content-Type application/octet-stream

I believe that should match all files that do not have a dot in their names.
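A quick way to sanity-check that pattern is to run it against a few sample paths. The sketch below uses Python's re module rather than Caddy's Go regexp engine, on the assumption that the two agree for a pattern this simple; matches_no_ext and the sample paths are made up for illustration.

```python
import re

# Same pattern as in the @no_ext matcher above.
NO_EXT = re.compile(r".*/[^.]+$")


def matches_no_ext(path: str) -> bool:
    """True when the last path segment contains no dot."""
    return NO_EXT.search(path) is not None


if __name__ == "__main__":
    samples = [
        "/test-no-extension",
        "/test.txt",
        "/v1.50/rclone",
        "/v1.50/rclone.zip",
        "/dir/",
    ]
    for path in samples:
        print(path, matches_no_ext(path))
```

One thing this makes visible: a directory path such as /dir/ also matches, because "/" is not a dot. If that matters for the browse listings, the pattern would need tightening.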
