Websites sometimes not responding in v2.4.0-beta2; Caddy service status shows failed after caddy upgrade

1. Caddy version (caddy version):

v2.4.0-beta.2 h1:DUaK4qtL3T0/gAm0fVVkHgcMN04r4zGpfPUZWHRR8QU=

2. How I run Caddy:

a. System environment:

Ubuntu 20.04.1 LTS

b. Command:

sudo systemctl restart caddy

c. Service/unit/compose file:

none

d. My complete Caddyfile or JSON config:

{
    on_demand_tls {
        ask https://my.hypershapes.com/validate
        interval 2m
        burst 10
    }
}
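For context on how this global block behaves: Caddy sends a GET request to the `ask` URL with the candidate hostname in the `domain` query parameter, and only obtains a certificate if the endpoint answers 200; `interval 2m` with `burst 10` additionally rate-limits issuance to at most 10 certificates per 2 minutes. A minimal sketch of the decision the `/validate` endpoint makes, in Python (the `ALLOWED` set and function name are illustrative; the real endpoint here is served by the PHP application):

```python
# Hypothetical allow-list behind the `ask` endpoint. Caddy requests
# GET /validate?domain=<hostname> and issues a certificate only when
# the endpoint responds with HTTP 200; any other status denies it.
ALLOWED = {"shop.customer-one.com", "shop.customer-two.com"}

def ask_status(domain: str) -> int:
    """HTTP status the validate endpoint should return for `domain`."""
    return 200 if domain in ALLOWED else 404

print(ask_status("shop.customer-one.com"))  # 200 -> issue certificate
print(ask_status("random.attacker.net"))    # 404 -> refuse
```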

(root) {
    root * /var/www/{args.0}
}

(baseSetup) {
    file_server
    php_fastcgi unix//run/php/php7.4-fpm.sock
    encode gzip zstd

    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Frame-Options "SAMEORIGIN"
        X-XSS-Protection "1; mode=block"
        X-Content-Type-Options "nosniff"
    }

    @static {
        file
        path *.ico *.css *.js *.gif *.jpg *.jpeg *.png *.svg *.woff *.woff2 *.json
    }
    header @static Cache-Control max-age=5184000

    log {
        output file /var/log/caddy/access.log {
            roll_size 2MiB
            roll_keep 100
            roll_keep_for 1440h
        }
    }
}
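As a side note, the retention values in this snippet line up: `Cache-Control max-age=5184000` is 60 days expressed in seconds, and `roll_keep_for 1440h` is the same 60 days expressed in hours. A quick check:

```python
# max-age is expressed in seconds, roll_keep_for in hours;
# both work out to the same 60-day retention window.
max_age_days = 5184000 / (24 * 3600)
roll_keep_days = 1440 / 24
print(max_age_days, roll_keep_days)  # 60.0 60.0
```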

my.hypershapes.com {
    import root hypershapes/public
    import baseSetup 
}

affiliate.hypershapes.com {
    import root hypershapes/public
    import baseSetup
}

admin.hypershapes.com {
    import root hypershapes-master-admin/dist
    import baseSetup
}

*.hypershapes.com {
    import root hypershapes/public
    import baseSetup

    tls {
        dns cloudflare {REDACTED}    
    }
}

https:// {
    import root hypershapes/public
    import baseSetup

    tls {
        on_demand
    }
}

3. The problem I’m having:

All the websites hosted with my Caddy web server stop responding, even though the status of the caddy service shows active (running).

Status of the caddy service when the websites are not responding:

The result of curl -v -I against one of my websites:


No response is returned even after a long wait.

This issue happens only on my production server and never on my staging server, even though their Caddyfile configurations are identical. It usually happens anywhere from once per week to a few times a day, and so far it can only be resolved by running

sudo systemctl restart caddy

4. Error messages and/or full log output:

Journalctl log:

I found that this error message keeps appearing in my log; is it related to the problem I'm facing?

May 12 07:15:56 hyper-prod-ubuntu-sgp1-01 caddy[25352]: {"level":"error","ts":1620803756.6630805,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","error":"http: wrote more than the declared Content-Length"}
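For what it's worth, this error means the upstream (here, PHP-FPM behind `php_fastcgi`) sent a `Content-Length` header and then wrote more body bytes than it declared, so Go's HTTP layer aborts the response mid-stream. The consistency rule being violated, sketched in Python (the values are hypothetical):

```python
# A response is only valid if the declared Content-Length matches the
# number of body bytes actually written; writing more than declared is
# what triggers "wrote more than the declared Content-Length".
declared_length = 5                  # hypothetical wrong header value
body = b"Hello, world!"              # 13 bytes actually written

def length_consistent(declared: int, payload: bytes) -> bool:
    return declared == len(payload)

print(length_consistent(declared_length, body))  # False -> aborted
```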

5. What I already tried:

sudo systemctl restart caddy

6. Links to relevant resources:

  1. full journalctl of that day:
    https://www.notion.so/journalctl-u-caddy-since-2021-05-13-13734ed935df4419896244a11f9c8c29

  2. htop output
    https://drive.google.com/file/d/1En239Tg0t3Ox0q2p-WIS0QZ8MBs7qPzt/view?usp=sharing

Please upgrade to v2.4.0 stable! It was just released earlier this week, with plenty of fixes.

That’s pretty weird. That suggests a bug in your upstream server possibly :thinking:

Please try again with v2.4.0 to see if it works any better, and if not, @matt might need to chime in with ideas for debugging this.

Hi.

When I tried to upgrade Caddy to v2.4.0 stable with caddy upgrade, errors occurred, and the caddy service's status is now failed. (I ran it as the root user.)

Status of caddy service

Journalctl logs

Any idea what has happened?

Yeah, there’s a known issue with caddy upgrade that’s already been fixed and will ship with v2.4.1: it doesn’t preserve the permissions on the binary correctly when it swaps it out.

You’ll need to change the permissions for /usr/bin/caddy back to 755 by hand for now.
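For reference, mode 755 means read/write/execute for the owner and read/execute for group and others, which can be double-checked with Python's stdlib:

```python
import stat

# 0o755 = rwx for the owner, r-x for group and others; S_IFREG marks
# it as a regular file so filemode() renders the full mode string.
print(stat.filemode(0o755 | stat.S_IFREG))  # -rwxr-xr-x
```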

After manually changing the permissions of /usr/bin/caddy to 755, the caddy service still fails, eh.

Here are all the steps I ran:

  1. caddy upgrade
  2. sudo chmod 755 /usr/bin/caddy
  3. sudo systemctl daemon-reload

The status of the caddy service now:

Did I execute anything wrongly here? Any help is appreciated ^____^

Update:
My websites are working even though the caddy service status is failed.

That means you must have a 2nd instance of Caddy running also using port 2019. Make sure to kill off any of them, then restart the systemd service.
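One quick way to confirm whether something is already holding Caddy's default admin port (2019 on localhost) is a plain TCP connect check; a stdlib-only sketch in Python (tools like `ss -ltnp` or `lsof -i :2019` will additionally show the owning PID):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

# Caddy's admin API binds localhost:2019 by default; a stray instance
# holding this port prevents the systemd-managed one from starting.
print(port_in_use(2019))
```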

As far as I can tell, only one instance of Caddy is using port 2019.

What can I do to get the status of the Caddy service back to active (running)? Do I need to kill this caddy instance and restart the service?

Yes, exactly. Kill that one, because it’s not managed by systemd, then restart the systemd service.

Hi.

After killing the process, the status is now active (running).

Thank you for your help.

