Gzipped sidecar file: same ETag as the uncompressed file?

1. Output of caddy version:

docker run -it --rm caddy:2.5.2-alpine caddy version
# v2.5.2 h1:eCJdLyEyAGzuQTa5Mh3gETnYWDClo1LjtQm2q9RNZrs=

2. How I run Caddy:

echo $SHELL
# /opt/homebrew/bin/fish
docker run -i --rm --name caddy -p 8080:80 -v $PWD/Caddyfile:/etc/caddy/Caddyfile:ro -v $PWD/data:/usr/share/caddy:ro caddy:2.5.2-alpine

I just serve a big text file, with a pre-gzipped sidecar file:

yes | head -n 50000000 >data/y.txt
gzip -k data/y.txt
ls -lh data
# -rw-r--r--   1 j  staff    95M Aug  8 16:54 y.txt
# -rw-r--r--   1 j  staff    95K Aug  8 16:54 y.txt.gz

a. System environment:

uname -a
# Darwin jannis-mba.fritz.box 21.6.0 Darwin Kernel Version 21.6.0: Sat Jun 18 17:05:47 PDT 2022; root:xnu-8020.140.41~1/RELEASE_ARM64_T8101 arm64
sw_vers
# ProductName:	macOS
# ProductVersion:	12.5
# BuildVersion:	21G72

docker version
# Client:
#  Cloud integration: v1.0.24
#  Version:           20.10.17
#  API version:       1.41
#  Go version:        go1.17.11
#  Git commit:        100c701
#  Built:             Mon Jun  6 23:04:45 2022
#  OS/Arch:           darwin/arm64
#  Context:           default
#  Experimental:      true
# 
# Server: Docker Desktop 4.10.1 (82475)
#  Engine:
#   Version:          20.10.17
#   API version:      1.41 (minimum version 1.12)
#   Go version:       go1.17.11
#   Git commit:       a89b842
#   Built:            Mon Jun  6 23:01:01 2022
#   OS/Arch:          linux/arm64
#   Experimental:     false
#  containerd:
#   Version:          1.6.6
#   GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
#  runc:
#   Version:          1.1.2
#   GitCommit:        v1.1.2-0-ga916309
#  docker-init:
#   Version:          0.19.0
#   GitCommit:        de40ad0

b. Command:

docker run -i --rm --name caddy -p 8080:80 -v $PWD/Caddyfile:/etc/caddy/Caddyfile:ro -v $PWD/data:/usr/share/caddy:ro caddy:2.5.2-alpine

c. Service/unit/compose file:

–

d. My complete Caddy config:

localhost:80 {
	root * /usr/share/caddy
	file_server browse {
		precompressed gzip
		disable_canonical_uris
	}
}

3. The problem I’m having:

The gzipped entity (HTTP lingo for a variant of the resource), requested via Accept-Encoding: gzip and served with Content-Encoding: gzip, has the same ETag as the un-gzipped one: "rgaxfh1njchs" in both responses below.

4. Error messages and/or full log output:

curl 'http://localhost:8080/y.txt' -v --no-progress-meter | wc -c
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /y.txt HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Content-Length: 100000000
< Content-Type: text/plain; charset=utf-8
< Etag: "rgaxfh1njchs"
< Last-Modified: Mon, 08 Aug 2022 14:54:53 GMT
< Server: Caddy
< Date: Mon, 08 Aug 2022 15:01:28 GMT
< 
{ [11107 bytes data]
* Connection #0 to host localhost left intact
 100000000

curl 'http://localhost:8080/y.txt' -H 'Accept-Encoding: gzip' -v --no-progress-meter | wc -c
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /y.txt HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.79.1
> Accept: */*
> Accept-Encoding: gzip
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Content-Encoding: gzip
< Content-Type: text/plain; charset=utf-8
< Etag: "rgaxfh1njchs"
< Last-Modified: Mon, 08 Aug 2022 14:54:53 GMT
< Server: Caddy
< Vary: Accept-Encoding
< Date: Mon, 08 Aug 2022 15:01:22 GMT
< Transfer-Encoding: chunked
< 
{ [18299 bytes data]
* Connection #0 to host localhost left intact
   97244

5. What I already tried:

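nginx handles this according to the HTTP spec(s): with the config below, it serves distinct ETags for the two variants ("62f123bd-5f5e100" uncompressed, "62f123bd-17bdc" gzipped).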

server {
	listen 80 default_server;
	listen [::]:80 default_server;
	server_name _;

	root /var/www/html;
	autoindex on;
	gzip on;
	gzip_types text/plain;
	gzip_min_length 100;
	gzip_static on;
	gzip_vary on;

	location / {
		try_files $uri $uri/ =404;
	}
}

docker run -it --rm -p 8080:80 -v $PWD/nginx.conf:/etc/nginx/conf.d/default.conf -v $PWD/data:/var/www/html:ro nginx

curl 'http://localhost:8080/y.txt' -v --no-progress-meter | wc -c
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /y.txt HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.23.1
< Date: Mon, 08 Aug 2022 15:21:41 GMT
< Content-Type: text/plain
< Content-Length: 100000000
< Last-Modified: Mon, 08 Aug 2022 14:54:53 GMT
< Connection: keep-alive
< Vary: Accept-Encoding
< ETag: "62f123bd-5f5e100"
< Accept-Ranges: bytes
< 
{ [32768 bytes data]
* Connection #0 to host localhost left intact
 100000000

curl 'http://localhost:8080/y.txt' -H 'Accept-Encoding: gzip' -v --no-progress-meter | wc -c
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /y.txt HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.79.1
> Accept: */*
> Accept-Encoding: gzip
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.23.1
< Date: Mon, 08 Aug 2022 15:22:15 GMT
< Content-Type: text/plain
< Content-Length: 97244
< Last-Modified: Mon, 08 Aug 2022 14:54:53 GMT
< Connection: keep-alive
< Vary: Accept-Encoding
< ETag: "62f123bd-17bdc"
< Content-Encoding: gzip
< 
{ [14480 bytes data]
* Connection #0 to host localhost left intact
   97244
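
The two nginx ETags above appear to follow the pattern hex(mtime)-hex(size), so they differ only in the size field. A minimal Python sketch that decodes them under that assumption (the function name parse_nginx_etag is made up for illustration):

```python
from typing import Tuple


def parse_nginx_etag(etag: str) -> Tuple[int, int]:
    """Split an ETag of the assumed form "<hex mtime>-<hex size>" into integers."""
    mtime_hex, size_hex = etag.strip('"').split("-")
    return int(mtime_hex, 16), int(size_hex, 16)


# ETag values copied from the two nginx responses above.
plain = parse_nginx_etag('"62f123bd-5f5e100"')
gzipped = parse_nginx_etag('"62f123bd-17bdc"')

print(plain)    # (1659970493, 100000000)
print(gzipped)  # (1659970493, 97244)
```

The decoded sizes match the Content-Length of each response, which suggests nginx derives the tag from the file it actually sends.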

6. Links to relevant resources:

From RFC 7231 (HTTP/1.1 Semantics and Content):

3 Representations

An origin server might be provided with, or be capable of generating, multiple representations that are each intended to reflect the current state of a target resource. In such cases, some algorithm is used by the origin server to select one of those representations as most applicable to a given request, usually based on content negotiation. This “selected representation” is used to provide the data and metadata for evaluating conditional requests […] and constructing the payload […].

3.1 Representation Metadata

Representation header fields provide metadata about the representation.
The following header fields convey representation metadata:

  • Content-Type […]
  • Content-Encoding […]
    […]

3.1.2.1 Content Codings

Content coding values indicate an encoding transformation that has been or can be applied to a representation.

From RFC 7232 (Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests):

2.3 ETag

The ETag header field in a response provides the current entity-tag for the selected representation, as determined at the conclusion of handling the request. An entity-tag is an opaque validator for differentiating between multiple representations of the same resource, regardless of whether those multiple representations are due to resource state changes over time, content negotiation resulting in multiple representations being valid at the same time, or both.

2.3.3 Example: Entity-Tags Varying on Content-Negotiated Resources

Consider a resource that is subject to content negotiation […], and where the representations sent in response to a GET request vary based on the Accept-Encoding request header field […]:

GET /index HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip

In this case, the response might or might not use the gzip content coding. If it does not, the response might look like:

HTTP/1.1 200 OK
Date: Fri, 26 Mar 2010 00:05:00 GMT
ETag: "123-a"
Content-Length: 70
Vary: Accept-Encoding
Content-Type: text/plain

Hello World!
Hello World!
Hello World!
Hello World!
Hello World!

An alternative representation that does use gzip content coding would be:

HTTP/1.1 200 OK
Date: Fri, 26 Mar 2010 00:05:00 GMT
ETag: "123-b"
Content-Length: 43
Vary: Accept-Encoding
Content-Type: text/plain
Content-Encoding: gzip

...binary data...

Note: Content codings are a property of the representation data, so a strong entity-tag for a content-encoded representation has to be distinct from the entity tag of an unencoded representation to prevent potential conflicts during cache updates and range requests. In contrast, transfer codings […] apply only during message transfer and do not result in distinct entity-tags.
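
To make the note concrete, here is a small Python sketch that flags the situation it describes: two responses with different content codings that share a strong entity-tag. The helper name strong_etags_conflict and the dict-based header representation are assumptions for illustration; the header values are taken from the curl output earlier in this post.

```python
def strong_etags_conflict(headers_a: dict, headers_b: dict) -> bool:
    """Return True if two differently-encoded responses share a strong ETag."""
    # Header names are case-insensitive (Caddy sends "Etag", nginx "ETag").
    a = {k.lower(): v for k, v in headers_a.items()}
    b = {k.lower(): v for k, v in headers_b.items()}
    etag_a, etag_b = a.get("etag"), b.get("etag")
    if not etag_a or not etag_b:
        return False
    # Weak validators (W/"...") are not covered by this check.
    if etag_a.startswith("W/") or etag_b.startswith("W/"):
        return False
    coding_a = a.get("content-encoding", "identity")
    coding_b = b.get("content-encoding", "identity")
    return coding_a != coding_b and etag_a == etag_b


caddy = strong_etags_conflict(
    {"Etag": '"rgaxfh1njchs"'},
    {"Etag": '"rgaxfh1njchs"', "Content-Encoding": "gzip"},
)
nginx = strong_etags_conflict(
    {"ETag": '"62f123bd-5f5e100"'},
    {"ETag": '"62f123bd-17bdc"', "Content-Encoding": "gzip"},
)
print(caddy, nginx)  # True False
```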

Probably an oversight. The code does seem to always use the “primary” (uncompressed) file to compute the ETag, even when it serves the precompressed sidecar.
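
The observed tag supports that reading. "rgaxfh1njchs" is what you get by writing the modification time and the size of y.txt in base 36 and concatenating them. The Python sketch below reproduces it; the scheme is inferred from the response headers in this thread, not copied from the Caddy source, and the helper names are hypothetical. It also assumes the sidecar has the same modification time as the original, which the ls output suggests.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Render a non-negative integer in lowercase base 36."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def etag_from(mtime_unix: int, size: int) -> str:
    """Assumed scheme: quoted base36(mtime) followed by base36(size)."""
    return '"' + to_base36(mtime_unix) + to_base36(size) + '"'


MTIME = 1659970493  # Mon, 08 Aug 2022 14:54:53 GMT (the Last-Modified value)

from_primary = etag_from(MTIME, 100_000_000)  # size of y.txt
from_sidecar = etag_from(MTIME, 97_244)       # size of y.txt.gz

print(from_primary)  # "rgaxfh1njchs"
print(from_sidecar)  # "rgaxfh2318"
```

Under this scheme, computing the tag from the file that is actually served would already be enough to give the gzipped variant a distinct ETag.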

Oh, easy fix. Thanks! Will push that soon. (Edit: Fix pushed.)
