1. Caddy version (caddy version):
v2.4.5
2. How I run Caddy:
Through docker-compose
a. System environment:
VPS:
Ubuntu 18.04 Server
Docker (looks like this could do with an update…):
Client: Docker Engine - Community
Version: 19.03.5
API version: 1.40
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:29:52 2019
OS/Arch: linux/amd64
Experimental: false
b. Command:
sudo docker-compose -f docker-compose-master.yml up -d
c. Service/unit/compose file:
docker-compose-master.yml:
version: "3.1"
services:
  caddy:
    image: caddy
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "24m"
    container_name: gb-caddy
    working_dir: /application
    environment:
      - HOST=getblogged.net
    volumes:
      - .:/application
      - ./phpdocker/caddy/Caddyfile:/etc/caddy/Caddyfile
      - ./phpdocker/caddy/data:/data
      - ./phpdocker/caddy/config:/config
      - /var/log/caddy/:/var/log/caddy/
    depends_on:
      - php-fpm
      - php-fpm-ws
    ports:
      - 80:80
      - 443:443
    restart: unless-stopped

  mysql:
    image: mysql:5.6
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "24m"
    container_name: gb-mysql
    working_dir: /application
    volumes:
      - ./phpdocker/db-import:/docker-entrypoint-initdb.d
      - ./phpdocker/db:/var/lib/mysql
      - .:/application
    environment:
      - MYSQL_ROOT_PASSWORD=<removed>
      - MYSQL_DATABASE=<removed>
      - MYSQL_USER=<removed>
      - MYSQL_PASSWORD=<removed>
    ports:
      - "8002:3306"

  php-fpm:
    build: phpdocker/php-fpm
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "24m"
    container_name: gb-php-fpm
    working_dir: /application
    volumes:
      - .:/application
      - ./phpdocker/php-fpm/php-ini-overrides.ini:/etc/php/7.3/fpm/conf.d/99-overrides.ini
    environment:
      - DOCKER_MYSQL_HOST=gb-mysql

  php-fpm-ws:
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "24m"
    build: phpdocker/php-fpm
    container_name: gb-php-fpm-ws
    working_dir: /application
    restart: always
    volumes:
      - .:/application
      - ./phpdocker/php-fpm/php-ini-overrides.ini:/etc/php/7.3/fpm/conf.d/99-overrides.ini
    environment:
      - DOCKER_MYSQL_HOST=gb-mysql
    command: php public_html/getblogged/app/chat/php-socket.php

  php-fpm-setup:
    build: phpdocker/php-fpm-setup
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "24m"
    container_name: gb-php-fpm-setup
    working_dir: /application
    volumes:
      - .:/application
      - ./phpdocker/php-fpm/php-ini-overrides.ini:/etc/php/7.3/fpm/conf.d/99-overrides.ini
    environment:
      - DOCKER_MYSQL_HOST=gb-mysql
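One change we're considering for the caddy service (not yet deployed, so it's absent from the file above; the probe target and timings are guesses) is a Docker healthcheck probing Caddy's admin endpoint, so Docker at least flags when the process stops responding:

```yaml
# Hypothetical addition under the caddy service -- not currently in use.
# Caddy's admin API listens on localhost:2019 by default, and the caddy
# image ships busybox wget, so this should work, but it is untested.
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:2019/config/"]
      interval: 30s
      timeout: 5s
      retries: 3
```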
d. My complete Caddyfile or JSON config:
Caddyfile:
{
	debug
}

(app) {
	root * /application/public_html/getblogged/app
	encode gzip zstd
	#CORS
	header {
		Access-Control-Allow-Origin https://www.getblogged.net
	}
	@ws {
		header Connection *Upgrade*
		header Upgrade websocket
	}
	reverse_proxy @ws php-fpm-ws:8090
	php_fastcgi php-fpm:9000
	file_server
	log {
		output file /var/log/caddy/app.access.log {
			roll_size 100mb
			roll_keep 30
			roll_keep_for 720h
		}
		format json
		level DEBUG
	}
}

(portal) {
	root * /application/public_html/getblogged/portal
	encode gzip zstd
	@ws {
		header Connection *Upgrade*
		header Upgrade websocket
	}
	reverse_proxy @ws php-fpm-ws:8090
	php_fastcgi php-fpm:9000
	file_server
	log {
		output file /var/log/caddy/portal.access.log {
			roll_size 100mb
			roll_keep 30
			roll_keep_for 720h
		}
		format json
		level DEBUG
	}
}

app.{$HOST} {
	#tls internal
	import app
}

portal.{$HOST} {
	#tls internal
	import portal
}

:80 {
	redir https://{host}{uri}
}

:443 {
	tls {
		on_demand
	}
	import app
}
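Side note: we're aware that on_demand on the bare :443 block means Caddy will attempt a certificate for any SNI that reaches it. A restriction we've been meaning to add (the check URL below is a placeholder for an endpoint we'd still have to build, not something that exists) would go in the global options:

```caddyfile
# Hypothetical global-options snippet, not in the config above.
# "ask" makes Caddy call this URL with ?domain=... and only issue a
# certificate if the endpoint returns HTTP 200.
{
	on_demand_tls {
		ask http://app-backend/allowed-domain
	}
}
```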
3. The problem I’m having:
Sporadically, seemingly when the site is busy, the site will stop responding. The browser just waits for the page to load before timing out.
We've been having this issue for a few months now; the site goes down between 2 and 10 times a week. We were initially using a Let's Encrypt container with NGINX and switched to Caddy so we could take advantage of on-demand TLS, and we've been having this issue ever since.
I've tried to capture as much debug output as possible, but as this only happens on our production server I'm always in a rush to get it back up.
Any help would be extremely appreciated. I've been pulling my hair out for a few months now on this one!
curl -v output (clientside):
* Trying 51.68.212.198:443...
* TCP_NODELAY set
* Connected to app.getblogged.net (51.68.212.198) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=app.getblogged.net
* start date: Dec 17 12:31:18 2021 GMT
* expire date: Mar 17 12:31:17 2022 GMT
* subjectAltName: host "app.getblogged.net" matched cert's "app.getblogged.net"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x56501e0a2e30)
> GET / HTTP/2
> Host: app.getblogged.net
> user-agent: curl/7.68.0
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
curl -v output on VPS when down:
root@vps784967:~# curl -v https://app.getblogged.net/
* Trying 51.68.212.198...
* TCP_NODELAY set
* connect to 51.68.212.198 port 443 failed: Connection refused
* Failed to connect to app.getblogged.net port 443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to app.getblogged.net port 443: Connection refused
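For what it's worth, this is the checklist of commands I intend to run on the VPS the next time port 443 starts refusing connections (all standard Docker/Linux tooling, nothing site-specific):

```shell
docker ps -a --filter name=gb-caddy        # is the container up, exited, or restart-looping?
docker inspect -f '{{.State.Status}} OOMKilled={{.State.OOMKilled}}' gb-caddy
docker logs --tail 100 gb-caddy            # last thing Caddy logged before the outage
ss -tlnp | grep -E ':443|:80'              # is anything on the host still listening?
dmesg -T | grep -iE 'oom|killed process'   # any kernel OOM-killer activity?
```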
curl -v output on VPS when working fine:
root@vps784967:~# curl -v https://app.getblogged.net/
* Trying 51.68.212.198...
* TCP_NODELAY set
* Connected to app.getblogged.net (51.68.212.198) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=app.getblogged.net
* start date: Dec 17 12:31:18 2021 GMT
* expire date: Mar 17 12:31:17 2022 GMT
* subjectAltName: host "app.getblogged.net" matched cert's "app.getblogged.net"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* Using Stream ID: 1 (easy handle 0x555a6af0e600)
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
> GET / HTTP/2
> Host: app.getblogged.net
> User-Agent: curl/7.58.0
> Accept: */*
>
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
< HTTP/2 302
< access-control-allow-origin: https://www.getblogged.net
< cache-control: no-store, no-cache, must-revalidate
< content-type: text/html; charset=UTF-8
< expires: Thu, 19 Nov 1981 08:52:00 GMT
< location: /login
< pragma: no-cache
< server: Caddy
< set-cookie: PHPSESSID=dvql9h5gpmo59493pegk9evrul; path=/; domain=.app.getblogged.net
< status: 302 Found
< content-length: 0
< date: Thu, 10 Feb 2022 17:02:14 GMT
<
* Connection #0 to host app.getblogged.net left intact
4. Error messages and/or full log output:
We don’t actually get any errors. Here are the last log entries before it stopped responding, though:
caddy container log:
{"log":"{\"level\":\"debug\",\"ts\":1644496405.8380249,\"logger\":\"http.handlers.reverse_proxy\",\"msg\":\"upstream roundtrip\",\"upstream\":\"php-fpm-ws:8090\",\"request\":{\"remote_addr\":\"97.70.160.221:50566\",\"proto\":\"HTTP/1.1\",\"method\":\"GET\",\"host\":\"ws.getblogged.net\",\"uri\":\"/chat/php-socket.php\",\"headers\":{\"Sec-Websocket-Version\":[\"13\"],\"Sec-Websocket-Extensions\":[\"permessage-deflate\"],\"Accept-Encoding\":[\"gzip, deflate\"],\"Sec-Websocket-Key\":[\"91piOtRuFiK1mQCwmCfopw==\"],\"Pragma\":[\"no-cache\"],\"Accept-Language\":[\"en-US,en;q=0.9\"],\"Origin\":[\"https://app.getblogged.net\"],\"Cookie\":[\"_fbp=fb.1.1644492997118.1007065267; _ga=GA1.2.1201346913.1644492992; _gid=GA1.2.657030684.1644492992; __hstc=32639585.ac4910a52502a3374020a12569f1a7ab.1643285198506.1643551792117.1643925912344.3; hubspotutk=ac4910a52502a3374020a12569f1a7ab; _hjSessionUser_764671=eyJpZCI6ImUyODgyNzc5LWI2NTctNWFjZS04Y2ZlLTFhNzEyZGVkMWE3ZSIsImNyZWF0ZWQiOjE2NDMyODUxOTczMTUsImV4aXN0aW5nIjp0cnVlfQ==; tk_lr=%22%22; tk_or=%22%22\"],\"X-Forwarded-Proto\":[\"https\"],\"User-Agent\":[\"Mozilla/5.0 (iPhone; CPU iPhone OS 15_2_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.2 Mobile/15E148 Safari/604.1\"],\"Cache-Control\":[\"no-cache\"],\"Connection\":[\"Upgrade\"],\"Upgrade\":[\"websocket\"],\"X-Forwarded-For\":[\"97.70.160.221\"],\"Accept\":[\"*/*\"]},\"tls\":{\"resumed\":false,\"version\":772,\"cipher_suite\":4865,\"proto\":\"http/1.1\",\"proto_mutual\":true,\"server_name\":\"ws.getblogged.net\"}},\"headers\":{\"Upgrade\":[\"websocket\"],\"Connection\":[\"Upgrade\"],\"Websocket-Origin\":[\"ws.localhost\"],\"Websocket-Location\":[\"wss://ws.localhost/chat/php-socket.php\"],\"Sec-Websocket-Accept\":[\"TicBEHxZ9T7X3tSpTh4F+KpDV84=\"]},\"status\":101}\n","stream":"stderr","time":"2022-02-10T12:33:25.838684474Z"}
{"log":"{\"level\":\"debug\",\"ts\":1644496405.8381214,\"logger\":\"http.handlers.reverse_proxy\",\"msg\":\"upgrading connection\",\"upstream\":\"php-fpm-ws:8090\",\"request\":{\"remote_addr\":\"97.70.160.221:50566\",\"proto\":\"HTTP/1.1\",\"method\":\"GET\",\"host\":\"ws.getblogged.net\",\"uri\":\"/chat/php-socket.php\",\"headers\":{\"Sec-Websocket-Version\":[\"13\"],\"Sec-Websocket-Extensions\":[\"permessage-deflate\"],\"Accept-Encoding\":[\"gzip, deflate\"],\"Sec-Websocket-Key\":[\"91piOtRuFiK1mQCwmCfopw==\"],\"Pragma\":[\"no-cache\"],\"Accept-Language\":[\"en-US,en;q=0.9\"],\"Origin\":[\"https://app.getblogged.net\"],\"Cookie\":[\"_fbp=fb.1.1644492997118.1007065267; _ga=GA1.2.1201346913.1644492992; _gid=GA1.2.657030684.1644492992; __hstc=32639585.ac4910a52502a3374020a12569f1a7ab.1643285198506.1643551792117.1643925912344.3; hubspotutk=ac4910a52502a3374020a12569f1a7ab; _hjSessionUser_764671=eyJpZCI6ImUyODgyNzc5LWI2NTctNWFjZS04Y2ZlLTFhNzEyZGVkMWE3ZSIsImNyZWF0ZWQiOjE2NDMyODUxOTczMTUsImV4aXN0aW5nIjp0cnVlfQ==; tk_lr=%22%22; tk_or=%22%22\"],\"X-Forwarded-Proto\":[\"https\"],\"User-Agent\":[\"Mozilla/5.0 (iPhone; CPU iPhone OS 15_2_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.2 Mobile/15E148 Safari/604.1\"],\"Cache-Control\":[\"no-cache\"],\"Connection\":[\"Upgrade\"],\"Upgrade\":[\"websocket\"],\"X-Forwarded-For\":[\"97.70.160.221\"],\"Accept\":[\"*/*\"]},\"tls\":{\"resumed\":false,\"version\":772,\"cipher_suite\":4865,\"proto\":\"http/1.1\",\"proto_mutual\":true,\"server_name\":\"ws.getblogged.net\"}}}\n","stream":"stderr","time":"2022-02-10T12:33:25.873362351Z"}
app.access.log
{"level":"info","ts":1644496457.6839786,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"176.24.66.159:56493","proto":"HTTP/2.0","method":"GET","host":"app.getblogged.net","uri":"/res/js/affiliate.js?v=1","headers":{"Accept-Encoding":["gzip, deflate, br"],"User-Agent":["Mozilla/5.0 (iPhone; CPU iPhone OS 15_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Mobile/15E148 Safari/604.1"],"Accept-Language":["en-GB,en;q=0.9"],"Referer":["https://www.getblogged.net/"],"Cookie":["_ga=GA1.2.451233662.1644227918; _gid=GA1.2.2116029319.1644227918; _fbp=fb.1.1644227918287.1124069725; password_cookie_token=87ea99723cb20510881be6957a597d2b; remember_me=1; __hstc=32639585.6fa8383c6b059284785202527ca247c0.1644227919109.1644339483749.1644404403135.5; hubspotutk=6fa8383c6b059284785202527ca247c0; _hjSessionUser_764671=eyJpZCI6IjAyZjcyNTEyLTFjMzItNWYxMS04OGRlLWNmZmU2OGY3NTQxMiIsImNyZWF0ZWQiOjE2NDQyMjc5MTgwMTAsImV4aXN0aW5nIjp0cnVlfQ==; tk_lr=%22%22; tk_or=%22%22; tk_r3d=%22%22; banner-merchandising-oneTime-37=true"],"Accept":["*/*"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"app.getblogged.net"}},"common_log":"176.24.66.159 - - [10/Feb/2022:12:34:17 +0000] \"GET /res/js/affiliate.js?v=1 HTTP/2.0\" 200 537","user_id":"","duration":0.023586972,"size":537,"status":200,"resp_headers":{"Content-Type":["application/javascript"],"Last-Modified":["Wed, 09 Jun 2021 21:01:34 GMT"],"Content-Encoding":["gzip"],"Vary":["Accept-Encoding"],"Server":["Caddy"],"Access-Control-Allow-Origin":["https://www.getblogged.net"],"Etag":["\"qugd2m11f\""]}}
{"level":"info","ts":1644496457.6882086,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"176.24.66.159:56493","proto":"HTTP/2.0","method":"GET","host":"app.getblogged.net","uri":"/res/js/external/register.js?date=2022-02-27b","headers":{"Referer":["https://www.getblogged.net/"],"Cookie":["_ga=GA1.2.451233662.1644227918; _gid=GA1.2.2116029319.1644227918; _fbp=fb.1.1644227918287.1124069725; password_cookie_token=87ea99723cb20510881be6957a597d2b; remember_me=1; __hstc=32639585.6fa8383c6b059284785202527ca247c0.1644227919109.1644339483749.1644404403135.5; hubspotutk=6fa8383c6b059284785202527ca247c0; _hjSessionUser_764671=eyJpZCI6IjAyZjcyNTEyLTFjMzItNWYxMS04OGRlLWNmZmU2OGY3NTQxMiIsImNyZWF0ZWQiOjE2NDQyMjc5MTgwMTAsImV4aXN0aW5nIjp0cnVlfQ==; tk_lr=%22%22; tk_or=%22%22; tk_r3d=%22%22; banner-merchandising-oneTime-37=true"],"Accept":["*/*"],"Accept-Encoding":["gzip, deflate, br"],"User-Agent":["Mozilla/5.0 (iPhone; CPU iPhone OS 15_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Mobile/15E148 Safari/604.1"],"Accept-Language":["en-GB,en;q=0.9"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"app.getblogged.net"}},"common_log":"176.24.66.159 - - [10/Feb/2022:12:34:17 +0000] \"GET /res/js/external/register.js?date=2022-02-27b HTTP/2.0\" 200 3602","user_id":"","duration":0.032281172,"size":3602,"status":200,"resp_headers":{"Server":["Caddy"],"Access-Control-Allow-Origin":["https://www.getblogged.net"],"Etag":["\"r6d8qn9zc\""],"Content-Type":["application/javascript"],"Last-Modified":["Thu, 27 Jan 2022 11:18:23 GMT"],"Content-Encoding":["gzip"],"Vary":["Accept-Encoding"]}}
5. What I already tried:
As this is an ongoing development, we expected the issue to stem from our codebase. We removed large portions of slow code and did find some instances where the database was being queried for far too much data at once (we suspected we were having memory issues). We’ve addressed all of these issues and added logging for each query and result, which does not appear to have had an effect (though the site does run better when it’s up!).
I saw in a thread yesterday ( Server not reachable after a few days - #26 by mdathersajjad ) where someone was having a similar issue with the same symptoms, suggesting php-fpm may be to blame. I restarted php-fpm on its own to no effect.
I restarted the Caddy container on its own too, to no effect.
We doubled the server’s CPU and RAM wondering if this would help, but it appears to have had no effect.
I believe I’ve enabled debug logging globally for all containers (let me know if anything else is needed).
The site only seems to come back up when I do a full docker-compose restart.
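One thing I haven't checked yet but plan to: whether the Caddy process runs out of file descriptors when the site is busy, since lots of long-lived websocket upgrades could plausibly do that. A rough way to check from the host (assuming the default Docker setup, where the container's main process is visible in the host's /proc):

```shell
# Find Caddy's host-side PID, then compare open fds against its limit.
CADDY_PID=$(docker inspect -f '{{.State.Pid}}' gb-caddy)
grep 'open files' /proc/$CADDY_PID/limits
ls /proc/$CADDY_PID/fd | wc -l
```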
6. Links to relevant resources:
n/a