Caddy reload hangs / does not reload configuration

1. The problem I’m having:

My caddy reload stopped working a few days ago.

Please note that my caddy.json configuration file is quite big, more than 10MB, and contains a lot of different routes.

The issue only happens when I reload. When I initially run caddy with this configuration file, caddy works as expected.

A little more about my setup:
I have the initial caddy.json that loads the real configuration file that’s hosted remotely.

For the sake of this post, let’s call these two caddy config files initial and final config file.

Whenever I need to handle a new domain, I generate the new final config, and then reload caddy with caddy reload --config caddy.json --force. This used to work like a charm, and caddy would start handling the new domain immediately upon reload.

I use grace_period set to 10 seconds.
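
One way to confirm that a reload really took effect is to check whether the new domain shows up among the host matchers of the active config. Here is a minimal Python sketch of that check. It assumes the config has the same shape as the final config shown later in this post (hosts in top-level route matchers under apps.http.servers). served_hosts is a made-up helper name, not part of Caddy, and routes nested inside subroute handlers are not inspected.

```python
import json


def served_hosts(config: dict) -> set:
    """Collect hosts from top-level route matchers of every HTTP server.

    Hypothetical helper, not part of Caddy. Routes nested inside
    subroute handlers are not inspected.
    """
    hosts = set()
    servers = (config or {}).get("apps", {}).get("http", {}).get("servers", {})
    for server in servers.values():
        for route in server.get("routes", []):
            for matcher in route.get("match", []):
                hosts.update(matcher.get("host", []))
    return hosts


# Trimmed copy of the final config from this post.
FINAL_CONFIG = json.loads("""
{
  "apps": {
    "http": {
      "grace_period": "10s",
      "servers": {
        "tls_terminator": {
          "listen": [":443"],
          "routes": [
            {
              "match": [{"host": ["localhost"]}],
              "handle": [
                {"handler": "static_response", "body": "Hello, world!", "status_code": 200}
              ]
            }
          ]
        }
      }
    }
  }
}
""")

print("localhost" in served_hosts(FINAL_CONFIG))  # True
```

In practice the dict would come from GET localhost:2019/config right after the reload. If the new domain is missing there, the reload did not apply, whatever exit code the CLI returned.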

2. Error messages and/or full log output:

Not exactly an error message, but this is what I found while debugging so maybe it will help you help me :slight_smile:

When I fetch the active config via GET localhost:2019/config, I used to get the final configuration file. But a few days ago, all GET /config requests started returning the initial config file (the one that should load the real configuration file).

I suspect that although caddy reload --config caddy.json --force returns successfully, the reload itself is asynchronous, hangs, and never completes. My only workaround is to keep restarting the server so that caddy picks up the new configuration file, which is not ideal as you can imagine :joy:
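
The symptom above (GET /config still returning the initial file after a reload) can be turned into an automated check. Below is a rough Python sketch, not an official Caddy tool. The names classify, fetch_active_config and wait_for_final_config are invented for illustration, and the rule "a config with apps is the final one" is an assumption based on the two files in this post.

```python
import json
import time
import urllib.request


def classify(config) -> str:
    """Label a config as 'final', 'initial' or 'unknown' (assumed rule)."""
    config = config or {}
    if config.get("apps"):
        return "final"
    admin_config = (config.get("admin") or {}).get("config") or {}
    if "load" in admin_config:
        return "initial"
    return "unknown"


def fetch_active_config(url="http://localhost:2019/config"):
    """Fetch the active config from the admin endpoint used in this post."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_for_final_config(fetch, timeout=30.0, interval=1.0, sleep=time.sleep):
    """Poll until the fetched config is 'final'; False if the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        if classify(fetch()) == "final":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Typical use right after running caddy reload:
#   ok = wait_for_final_config(fetch_active_config, timeout=60)
```

A False result after a generous timeout would at least turn "it seems to hang" into something a deploy script can detect and alert on, without restarting the server blindly.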

3. Caddy version:

2.10.0

4. How I installed and ran Caddy:

a. System environment:

Docker (Linux)

b. Command:

caddy run --config /etc/caddy/caddy.json

d. My complete Caddy config:

NOTE: I can share my production caddy config file via DM with @matt, @francislavoie or someone else from the Caddy team.

caddy.json (initial config file)

{
  "admin": {
    "config": {
      "load": {
        "method": "GET",
        "module": "http",
        "adapter": "json",
        "url": "https://gist.githubusercontent.com/popcorn/b569e72d2223a0ae96fe97f6f88efe93/raw/53c092a5d698e927df68946b2e849f2bbba11c9c/caddy.json"
      }
    }
  }
}

caddy.json (final config file that gets loaded)
This gist is public so you can fully test it.

{
  "apps": {
    "http": {
      "grace_period": "10s",
      "servers": {
        "tls_terminator": {
          "listen": [
            ":443"
          ],
          "routes": [
            {
              "match": [
                {
                  "host": [
                    "localhost"
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "static_response",
                  "body": "Hello, world!",
                  "status_code": 200
                }
              ]
            }
          ],
          "logs": {}
        }
      }
    }
  },
  "admin": {
    "identity": {
      "issuers": [
        {
          "module": "acme",
          "email": "drago@gmail.com+latest"
        }
      ]
    }
  },
  "storage": {
    "module": "file_system",
    "root": "~/caddy-dev/data"
  }
}

Here’s more information on how I run this.

This is my Dockerfile:

FROM caddy:2.10.0-builder AS builder

RUN xcaddy build \
    --with github.com/ss098/certmagic-s3 \
    --with github.com/caddyserver/replace-response \
    --with github.com/caddy-dns/cloudflare \
    --with github.com/zhangjiayin/caddy-geoip2 \
    --with github.com/mholt/caddy-ratelimit \
    --with github.com/pberkel/caddy-storage-redis \
    --with github.com/caddyserver/cache-handler \
    --with github.com/darkweak/storages/go-redis/caddy

FROM caddy:2.10.0

# Install bash and curl
RUN apk update && apk add bash curl

COPY --from=builder /usr/bin/caddy /usr/bin/caddy

# Copy the Caddy configuration file to the container
COPY config/caddy-dev.json /etc/caddy/caddy-dev.json

# Copy the start script and make it executable
COPY /bin/start.sh /usr/local/bin/start.sh
RUN chmod +x /usr/local/bin/start.sh

ENTRYPOINT ["/usr/local/bin/start.sh"]

My start.sh script basically only contains:

#!/bin/bash

# Start Caddy in the background
caddy run --config /etc/caddy/caddy.json &

# Wait for all background processes to finish
wait

I build this Docker image with:

docker build -t tls_reverse_proxy_dev -f Dockerfile .

Then I run it with:

docker run -d --name tls_reverse_proxy_dev \
  -p 80:80 -p 443:443 \
  -v ~/code/tls-reverse-proxy/tls_reverse_proxy_dev/data:/root/caddy-dev/data \
  -v ~/code/tls-reverse-proxy/tls_reverse_proxy_dev/log:/var/log/caddy-dev \
  -v ~/code/tls-reverse-proxy/tls_reverse_proxy_dev/config/caddy-dev.json:/etc/caddy/caddy-dev.json \
  tls_reverse_proxy_dev

Ok, so with this setup, GET /config returns the final configuration file as expected:

wget -qO- 127.0.0.1:2019/config | jq .
{
  "admin": {
    "identity": {
      "issuers": [
        {
          "email": "drago@gmail.com+latest",
          "module": "acme"
        }
      ]
    }
  },
  "apps": {
    "http": {
      "grace_period": "10s",
      "servers": {
        "tls_terminator": {
          "listen": [
            ":443"
          ],
          "logs": {},
          "routes": [
            {
              "handle": [
                {
                  "body": "Hello, world!",
                  "handler": "static_response",
                  "status_code": 200
                }
              ],
              "match": [
                {
                  "host": [
                    "localhost"
                  ]
                }
              ]
            }
          ]
        }
      }
    }
  },
  "storage": {
    "module": "file_system",
    "root": "~/caddy-dev/data"
  }
}

But in my production environment, this same command now returns the initial caddy config file:

wget -qO- 127.0.0.1:2019/config | jq .
{
  "admin": {
    "config": {
      "load": {
        "method": "GET",
        "module": "http",
        "adapter": "json",
        "url": "https://gist.githubusercontent.com/popcorn/b569e72d2223a0ae96fe97f6f88efe93/raw/53c092a5d698e927df68946b2e849f2bbba11c9c/caddy.json"
      }
    }
  }
}

Does anyone have any idea why caddy reload would hang and never actually execute?

Is it possible to collect a profile of the “hanging” process? I’m not 100% sure whether it should be a goroutine or a CPU profile.

Let me do it. I’ll get back to you.

But, it seems to have fixed itself when I disabled cache-handler in my production caddy.json.

I dumped goroutines with GET /debug/pprof/goroutine?debug=1 for a healthy server that reloads, and for the unhealthy one where reload seems to hang forever.

NOTE: The server that hangs in reload has caching directives in caddy.json, the healthy server where reloads work doesn’t have caching directives in its configuration file.

This is how I set up the cache in the config where Caddy reload hangs. Please note that only the caddy reload command hangs. If I start Caddy with this configuration from scratch, it works as expected.

{
  // other directives like http etc.
  "apps": {
    "cache": {
      "api": {
        "basepath": "/souin-api",
        "souin": {
          "basepath": "/souin",
          "enable": true,
          "security": false
        }
      },
      "ttl": "5m",
      "cache_name": "saas_custom_domains_cache",
      "distributed": true,
      "otter": {
        "found": true
      },
      "redis": {
        "found": true,
        "url": "[REDACTED]"
      },
      "timeout": {
        "backend": "10s",
        "cache": "5s"
      },
      "storers": [
        "otter",
        "redis"
      ],
      "log_level": "DEBUG"
    }
  }
}

You can see both here:

Also, here are full-stack dumps (I had to upload them to dropbox because the files were too big for GitHub Gists):

I found a few differences between the good and bad server looking at the simple goroutine dumps:

Huge call stack depth in the bad server for one of the goroutines
Specifically, there’s a goroutine containing repeated calls to the same route handling functions:

github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1
github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP

Could it be this is why the reload is hanging? Did it enter a recursion or something?
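
To put a number on "huge call stack depth", one option is to count how often each function repeats within a single goroutine stack. This is a small Python sketch under the assumption that the stack has already been reduced to a list of function names, one per frame. most_repeated_frame, looks_recursive and the threshold of 10 are arbitrary choices of mine, not anything taken from Caddy or pprof.

```python
from collections import Counter


def most_repeated_frame(frames):
    """Return (function, occurrences) for the most frequent frame, or None."""
    if not frames:
        return None
    return Counter(frames).most_common(1)[0]


def looks_recursive(frames, threshold=10):
    """Flag a stack in which one function appears at least `threshold` times."""
    top = most_repeated_frame(frames)
    return top is not None and top[1] >= threshold


# Synthetic stack shaped like the repeated pair quoted above.
WRAP = "github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1"
SERVE = "github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP"
stack = [WRAP, SERVE] * 50

function, occurrences = most_repeated_frame(stack)
print(occurrences, looks_recursive(stack))
```

Repetition alone does not prove runaway recursion. With a config of more than 10MB, a long chain of wrapped route handlers seems plausible too, but that is a guess and not something the dumps in this thread confirm.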

Souin using sleep
The problematic server shows multiple goroutines blocked in time.Sleep. Could this be causing the hang?

160 @ 0x88b38 0x8ce18 0x13cedf8 0x90f34
#	0x8ce17		time.Sleep+0x157
#	0x13cedf7	github.com/darkweak/souin/pkg/middleware.registerMappingKeysEviction.func1+0x87
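
For comparing the healthy and unhealthy dumps side by side, it may help to aggregate the debug=1 output by function instead of reading it by eye. Here is a Python sketch that parses records in the format of the excerpt above (a "count @ addresses" header followed by "#" frame lines). The function names are my own, and the parser only handles what is visible in that excerpt.

```python
import re
from collections import Counter

# A record header looks like "160 @ 0x88b38 0x8ce18 ...": count, "@", addresses.
HEADER = re.compile(r"^(\d+) @ ")


def parse_goroutine_profile(text):
    """Parse debug=1 goroutine profile text into (count, [function, ...]) pairs."""
    stacks = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            stacks.append((int(match.group(1)), []))
        elif line.startswith("#") and stacks:
            fields = line[1:].split()
            if len(fields) >= 2:
                # Drop the "+0x157" offset suffix from the function name.
                stacks[-1][1].append(fields[1].rsplit("+0x", 1)[0])
    return stacks


def goroutines_by_top_frame(stacks):
    """Sum goroutine counts per innermost (first listed) function."""
    totals = Counter()
    for count, frames in stacks:
        if frames:
            totals[frames[0]] += count
    return totals


def goroutines_touching(stacks, needle):
    """Count goroutines whose stack has any frame containing `needle`."""
    return sum(count for count, frames in stacks
               if any(needle in frame for frame in frames))


# The excerpt quoted above, with tabs written as \t.
SAMPLE = (
    "160 @ 0x88b38 0x8ce18 0x13cedf8 0x90f34\n"
    "#\t0x8ce17\t\ttime.Sleep+0x157\n"
    "#\t0x13cedf7\tgithub.com/darkweak/souin/pkg/middleware."
    "registerMappingKeysEviction.func1+0x87\n"
)

stacks = parse_goroutine_profile(SAMPLE)
print(goroutines_by_top_frame(stacks).most_common(5))  # [('time.Sleep', 160)]
print(goroutines_touching(stacks, "darkweak/souin"))   # 160
```

Running this on a dump taken before and after each caddy reload would show whether the sleeping Souin goroutines accumulate across reloads. That is a hypothesis to test, not a finding: a goroutine parked in time.Sleep does not by itself block a reload.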

What would cause caddy reload to hang when building routes, while caddy run works as expected?

Thank you all for your help!

cc @darkweak just in case you may know if there’s something in Souin that could be causing this.

My setup is to run Otter as in-memory cache on each server, and then have a shared Redis cache as the second layer. Hope that helps! And thank you!