Zero downtime deployment with a single service + health check

1. Caddy version (caddy version):

v2.4.3 h1:Y1FaV2N4WO3rBqxSYA8UZsZTQdN+PwcoOcAiZTM8C0I=

2. How I run Caddy:

a. System environment:

macOS 11.2.3 (20D91)

b. Command:

caddy run

d. My complete Caddyfile or JSON config:

:80 {
    reverse_proxy http://localhost:4568 {
        lb_try_duration 10s
    }
}

3. The problem I’m having:

I’m trying to achieve a dead simple zero downtime deployment, with just a single backend server. I noticed that with the lb_try_duration setting, Caddy will hold and retry my requests when my backend goes down, which is really helpful - I don’t mind if users get delayed responses, as long as they don’t see errors.

However, I have an application that uses Spark, which eagerly starts the server: if I register 10 routes, the web server starts serving content as soon as the first route is loaded, and if I query the tenth route while the first ones are still loading, I’ll get a 404.

What I imagined I could do was to set a health endpoint that just returns 200 and set it as my last loaded route in Spark. Then, when my service went down to restart, I’d have Caddy poll /health to verify it’s back up, then forward the requests on. But I can’t figure out how to do this - it’s an odd combination of passive checks (to know it’s down) and active (to know it’s back up).
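For anyone wanting to do the same on the application side, here’s a rough sketch of what that “health route registered last” idea might look like in Spark (route paths and port are illustrative, not from the thread above). Since Spark starts serving as soon as the first route is mapped, a 200 from /health implies every route registered before it is already live:

```java
import static spark.Spark.*;

public class App {
    public static void main(String[] args) {
        port(4567);

        // Application routes are registered first; Spark is already
        // serving traffic while these are still being mapped.
        get("/first", (req, res) -> "...");
        // ...the rest of the routes...

        // Registered last on purpose: once this returns 200, all of
        // the routes above are guaranteed to be mapped.
        get("/health", (req, res) -> "OK");
    }
}
```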

Is this possible?

Yeah, that can work fine. The upstream will only be marked healthy if both active and passive health checks say it’s good to go.

You’ll probably want to adjust the transport’s dial_timeout btw, because otherwise the dial timeout could be greater than the try duration, causing requests to fail before lb_try_duration has had a chance to elapse.
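For illustration, a transport block with a shorter dial_timeout might look like this (the 2s value is just an example, pick something well below your try duration):

```
:80 {
    reverse_proxy http://localhost:4567 {
        lb_try_duration 60s
        transport http {
            dial_timeout 2s
        }
    }
}
```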

Fantastic! It all works great. For reference, here’s what I went with:

:80 {
    reverse_proxy http://localhost:4567 {
        lb_try_duration 60s
        fail_duration 1s
        health_uri /health
        health_interval 1s
    }
}

This topic was automatically closed after 30 days. New replies are no longer allowed.