What is the best practice for doing a health-check for Caddy containers?

rugk · July 10, 2021, 12:32am

I’m sorry for deleting your template, but this is really a best-practice question and not a support/issue case I am having, so the template makes no sense…

As such just the important parts…

The aim is simply to have a basic Docker/podman/container healthcheck for my Caddy service.
As a first attempt, I tried using the /metrics endpoint…

Note that you cannot use curl, because it is not included in Caddy’s Docker image.

1. Caddy version (`caddy version`):

v2.4.3 h1:Y1FaV2N4WO3rBqxSYA8UZsZTQdN+PwcoOcAiZTM8C0I=

2. How I run Caddy:

b. Command:

podman-compose -t identity -p caddy up

c. Service/unit/compose file:

version: "3.7"

services:

  caddy:
    image: caddy
    restart: unless-stopped
    network_mode: "slirp4netns:port_handler=slirp4netns,enable_ipv6=true,allow_host_loopback=true"
    ports:
      - "80:80"
      - "443:443"
      # […]
      - "2019:2019"
    volumes:
      - caddy_data:/data
      - caddy_config:/config
      # […]
    environment:
      - HOST_DOMAIN=host.containers.internal
      # […]
    healthcheck:
      # https://stackoverflow.com/a/47722899/5008962
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:2019/metrics", "||", "exit", "1"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s
    labels:
      - io.containers.autoupdate=registry

volumes:
  caddy_data:
    # always persist volume by forcing external creation
    # https://docs.docker.com/compose/compose-file/compose-file-v3/#external
    external: true
  caddy_config:

d. My complete Caddyfile or JSON config:

{
	admin off
	# debug
}

# […]


# manually expose metrics as we disabled the admin API
:2019 {
	metrics /metrics
}

# […]

Now the question

So, the question: What would you suggest as a generic/general healthcheck for Caddy containers?
Of course, I could also check a single service (e.g. that the server serves a file) or a reverse proxy as a healthcheck, but IMHO that is too much, especially as the reverse-proxy variant adds dependencies: it checks not only the health of Caddy but that of other services as well…
As such, what is the best practice here?
And if “admin API” is the solution, what would you suggest if the admin API is disabled, as some users do?

Is using the metrics endpoint suitable for that?

I’m running this caddy mainly just as a reverse proxy and static file server…

Related error?

However, with what I’m currently doing, I’m getting these strange errors from the metrics endpoint, and as far as I can see only the healthcheck can cause them (see the two different error messages at the right):

{"level":"error","ts":1625875789.598645,"logger":"http.handlers.metrics","msg":"error encoding and sending metric family:write tcp [::1]:2019->[::1]:60652: write: broken pipe"}
{"level":"error","ts":1625875880.612273,"logger":"http.handlers.metrics","msg":"error encoding and sending metric family:write tcp [::1]:2019->[::1]:60654: write: connection reset by peer"}
{"level":"error","ts":1625875971.5551207,"logger":"http.handlers.metrics","msg":"error encoding and sending metric family:write tcp [::1]:2019->[::1]:60656: write: connection reset by peer"}
{"level":"error","ts":1625876062.5940106,"logger":"http.handlers.metrics","msg":"error encoding and sending metric family:write tcp [::1]:2019->[::1]:60658: write: broken pipe"}

Maybe it’s because wget is doing a --spider request, i.e. just checking whether the “file” is there and not actually downloading it?

Also

I saw this thread and the related feature request for a graceful shutdown, but that is way beyond my use case; I do not have a load balancer.
I just want a simple solution/health check that works in 90% of the use cases and that makes sense from your perspective.

basil · July 10, 2021, 1:45am

Does this meet your requirements? /reverse_proxy/upstreams

francislavoie · July 10, 2021, 3:04am

I would recommend just not disabling the admin API. It’s pretty core to how Caddy works. If you turn it off, then you can’t reload Caddy gracefully.

Yeah I think --spider closes the connection before Caddy is done sending the response. Probably not ideal to use.
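
A minimal sketch of a check without --spider, assuming the BusyBox wget and /bin/sh that ship in the Alpine-based caddy image (not tested with podman-compose): it issues a normal GET and discards the body, so the whole response is read before the connection is closed. Note also that in the exec form ("CMD") the "||", "exit" and "1" entries are passed to wget as literal arguments rather than interpreted by a shell; "CMD-SHELL" is needed for shell operators. The URL is the one from the compose file in the question, and the 5-second -T timeout is an arbitrary choice.

```yaml
    healthcheck:
      # CMD-SHELL runs the string through the container's shell,
      # so "|| exit 1" is actually evaluated
      test: ["CMD-SHELL", "wget -q -T 5 -O /dev/null http://localhost:2019/metrics || exit 1"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s
```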

You could just do respond "OK" 200 in a random site block and watch for the response body to be exactly OK.
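
As a sketch of how that could look: the /health path is a hypothetical choice, and the :2019 site block is the one from the Caddyfile in the question.

```
:2019 {
	# static response that does not depend on any backend
	respond /health "OK" 200
	metrics /metrics
}
```

The healthcheck could then compare the body, for example like this (assuming BusyBox wget and grep are available in the image):

```yaml
      # grep -qx succeeds only if the body is exactly "OK"
      test: ["CMD-SHELL", "wget -q -T 5 -O - http://localhost:2019/health | grep -qx OK"]
```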

The definition of “health” is very blurry; it entirely depends on what you want. So Caddy doesn’t make any assumptions about that.

rugk · July 13, 2021, 3:12pm

No, not really, because I explicitly do not want to check the backends, but thanks anyway.

Yeah thanks, maybe that is indeed a good idea…

system · August 9, 2021, 12:33am

This topic was automatically closed after 30 days. New replies are no longer allowed.