Requests not completing / stuck - possibly cache plugin issue

I’ve been struggling with tracking the details of this one down enough to report it…
… so really I’m here asking “What can I provide to help track down this issue and help get it fixed?”

The Problem

Some requests made via Caddy to a backend server stall and don’t complete. I’ve only seen it for static files so far.
Requests made directly to the backend server complete as expected.

Workaround

Restarting Caddy when the problem occurs resolves the issue - until it occurs again.
It currently recurs every week or so for one specific file (the PDF mentioned below). It should be noted, though, that after I restarted the server for a config change, the PDF could not be downloaded the very next day.

What I’m seeing

Requests for some files stall and never complete. The curl output looks like this (the IP address and location have been masked):

λ curl --verbose https://www.example.com/wp-content/uploads/sites/99/2018/09/Website-Catalogue-18.7MB.pdf
*   Trying 192.168.0.103...
* TCP_NODELAY set
* Connected to www.example.com (192.168.0.103) port 443 (#0)
* schannel: SSL/TLS connection with www.example.com port 443 (step 1/3)
* schannel: checking server certificate revocation
* schannel: sending initial handshake data: sending 189 bytes...
* schannel: sent initial handshake data: sent 189 bytes
* schannel: SSL/TLS connection with www.example.com port 443 (step 2/3)
* schannel: failed to receive handshake, need more data
* schannel: SSL/TLS connection with www.example.com port 443 (step 2/3)
* schannel: encrypted data got 3676
* schannel: encrypted data buffer: offset 3676 length 4096
* schannel: sending next handshake data: sending 93 bytes...
* schannel: SSL/TLS connection with www.example.com port 443 (step 2/3)
* schannel: encrypted data got 186
* schannel: encrypted data buffer: offset 186 length 4096
* schannel: SSL/TLS handshake complete
* schannel: SSL/TLS connection with www.example.com port 443 (step 3/3)
* schannel: stored credential handle in session cache
> GET /wp-content/uploads/sites/99/2018/09/Website-Catalogue-18.7MB.pdf HTTP/1.1
> Host: www.example.com
> User-Agent: curl/7.55.1
> Accept: */*
>

At this point the request never completes and I have to Ctrl+C to break out.
In a web browser this shows up as a loading spinner that never stops.
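
To catch the stall without waiting for someone to notice it, a periodic check along these lines might help. This is only a minimal sketch, assuming a POSIX shell with curl available (the output above came from a Windows build of curl). The function names classify_curl_exit and check_url and the 120-second default limit are illustrative assumptions, not part of any existing setup.

```shell
#!/bin/sh
# Map a curl exit status to a short label.
# 0 means the transfer completed; 28 is curl's own timeout status
# (raised here by --max-time); anything else is some other failure.
classify_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        28) echo "stalled" ;;
        *)  echo "error" ;;
    esac
}

# Fetch a URL, discard the body, and give up after a time limit.
# $1 = URL, $2 = limit in seconds (assumed default: 120).
check_url() {
    url=$1
    limit=${2:-120}
    curl --silent --output /dev/null --max-time "$limit" "$url"
    status=$?
    # One log line per check: UTC timestamp, label, URL.
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(classify_curl_exit "$status") $url"
}
```

Run from cron or a loop, for example check_url https://www.example.com/wp-content/uploads/sites/99/2018/09/Website-Catalogue-18.7MB.pdf 120, a "stalled" line would give an approximate time of failure. The limit needs to be comfortably longer than a normal download of the file, otherwise a slow but healthy transfer would be reported as a stall.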

In this particular case the problem has recurred once every couple of weeks for this specific 18.7MB PDF file.
Previously, though, I’ve had cases where a WordPress page (backend or frontend) appeared not to load, only for the Network tab of the Chrome Developer console to show, after a refresh, that a single JavaScript file was failing to finish loading and blocking the rest of the page for the user.

It has occurred more consistently with this larger 18.7MB PDF file, whereas when spotted previously it might be a different .js file each time.

Obviously it may or may not be occurring for other files without my noticing.

Configuration - Source Server

The backend server responds to static file requests with future expiry times.

In the case of the PDF file, the headers of the response are as follows:

HTTP/1.1 200 OK
Date: Wed, 21 Nov 2018 03:10:24 GMT
Server: Apache
Last-Modified: Fri, 14 Sep 2018 15:46:05 GMT
Accept-Ranges: bytes
Content-Length: 19623339
Cache-Control: max-age=2592000
Expires: Fri, 21 Dec 2018 03:10:24 GMT
X-Content-Type-Options: nosniff
Connection: close
Content-Type: application/pdf

The backend server consistently replies with “Connection: close” so there is no “keep-alive” to the source server.

And when Caddy works the response headers are as follows:

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=2592000
Content-Length: 19623339
Content-Type: application/pdf
Date: Wed, 21 Nov 2018 03:02:43 GMT
Expires: Fri, 21 Dec 2018 03:02:43 GMT
Last-Modified: Fri, 14 Sep 2018 15:46:05 GMT
Server: Caddy
Server: Apache
X-Cache-Status: hit
X-Content-Type-Options: nosniff
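
Since the working response carries X-Cache-Status, recording that header over time might show whether the file was being served as a cache hit or a miss in the period before a stall. A small sketch, assuming a POSIX shell with tr and awk; header_value is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Print the value(s) of one response header from raw HTTP headers on stdin.
# The header name in $1 is matched case-insensitively; a header that
# appears more than once (such as Server above) prints one line per value.
header_value() {
    name=$1
    # Strip carriage returns first, since HTTP header lines end in CRLF.
    tr -d '\r' | awk -F':' -v n="$name" '
        tolower($1) == tolower(n) {
            sub(/^[^:]*:[ \t]*/, "")   # drop "Name: " and keep the value
            print
        }'
}
```

It could be fed from a normal GET, for example: curl --silent --max-time 120 --output /dev/null --dump-header - URL | header_value X-Cache-Status. A GET is used rather than a HEAD request because there is no guarantee the cache treats the two the same way.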

Configuration - Caddy

Config file is similar to the following:

(wordpress_default) {
	tls support@example.com
	cache {
		status_header X-Cache-Status
	}
}
example.com,
www.example.com {
	import wordpress_default
	proxy / 192.168.152.198 {
		transparent
	}
}

I don’t store the cache in a specific location, so I assume it goes to temp and gets cleared on Caddy restart.

More notes

  • I am currently seeing the following issue also (may be unrelated / related):
  • I did note that “range” requests are not supported. This is happening most notably with a large file, so perhaps range requests (or locking around them) are at fault, as they are more likely to happen with large files?
  • I’ve set up pprof on one of the domains now, so maybe I can capture some detail when I next spot it.
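
With pprof enabled, a goroutine dump taken while a request is stuck is probably the most useful thing to capture, since it shows where each goroutine is blocked. A rough sketch, assuming the pprof directive exposes the standard Go endpoints under /debug/pprof on that domain and that a POSIX shell with curl is available; the function names and the output file naming are hypothetical.

```shell
#!/bin/sh
# Build the goroutine dump URL for a host.
# debug=2 asks the Go pprof handler for full stack traces as plain text.
pprof_goroutine_url() {
    echo "https://$1/debug/pprof/goroutine?debug=2"
}

# Save a timestamped goroutine dump for the given host and print the
# file name on success. Take it BEFORE restarting Caddy, because the
# restart discards the stuck state.
dump_goroutines() {
    host=$1
    out="goroutines-$(date -u +%Y%m%dT%H%M%SZ).txt"
    curl --silent --show-error --max-time 30 --output "$out" \
        "$(pprof_goroutine_url "$host")" && echo "$out"
}
```

Calling dump_goroutines www.example.com two or three times a few seconds apart would make it easier to tell goroutines that are genuinely stuck from ones that are merely busy.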

What next?

  • What, if anything, can I do or provide when it next occurs?
  • Is a stack dump from pprof helpful?

This here is a very nice post! Thanks for putting so much detail into your request.

The first step needs to be to accurately identify the culprit. If it is the cache plugin, I don’t know that there’s much I (or Caddy contributors) can do beyond pointing you in the right direction and getting this information to the plugin’s author.

Is it going to be feasible to run Caddy without cache enabled on this particular site for a while to see if the symptoms desist?

Because this occurs so rarely and inconsistently, I would love to be able to catch enough detail when it happens for the issue to be debugged and resolved.

Sure, I have now disabled caching for the single domain…
… but considering how rarely and unpredictably this occurs, is this really the best course to take?

I could run for months with no “stuck” responses and still be no better off.

You’ve already collected quite a bit of info. There might be more specific information to collect that will be useful, but I’m not sure what. If Caddy makes it past your mean time to failure thus far with cache disabled, I’d take it to the caddy-cache repository on GitHub (nicolasazrak/caddy-cache) and open an issue there, citing this thread.
