[solved] Caddy won't start after using k8s volumeMount on Caddyfile

1. Caddy version (caddy version):

It's the official Caddy Docker image: caddy:2.4.6-alpine

2. How I run Caddy:

I am running it on Kubernetes. The full deployment.yaml is available here on Pastebin, but the part I believe to be relevant is as follows:

spec:
  template:
    spec:
      containers:
      - image: caddy:2.4.6-alpine
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: caddy-shard
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/caddy/Caddyfile2
          name: config
          subPath: Caddyfile
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: caddy-caddy-shard
      serviceAccountName: caddy-caddy-shard
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: caddy
        name: config 
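
The ConfigMap it references is not shown above; it is essentially just the Caddyfile from section d stored under a Caddyfile key. A minimal sketch of what it looks like:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: caddy
    data:
      Caddyfile: |
        :80 {
          reverse_proxy rpgserver:80
        }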

I am using k3s with the following version:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5+k3s2", GitCommit:"724ef700bab896ff252a75e2be996d5f4ff1b842", GitTreeState:"clean", BuildDate:"2021-10-05T19:59:14Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5+k3s2", GitCommit:"724ef700bab896ff252a75e2be996d5f4ff1b842", GitTreeState:"clean", BuildDate:"2021-10-05T19:59:14Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

a. System environment:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster

b. Command:

kubectl apply -f deployment.yaml

c. Service/unit/compose file:

I don’t think this is relevant to this issue, but since it was requested, here is the service.yaml file:

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: caddy
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2021-11-28T13:48:42Z"
  labels:
    app.kubernetes.io/instance: caddy
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: caddy-shard
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: caddy-shard-0.1.3
  name: caddy
  namespace: default
  resourceVersion: "1247749"
  uid: 0d983121-d59d-4a6a-9d6d-04bf8c4c1510
spec:
  clusterIP: 10.43.157.240
  clusterIPs:
  - 10.43.157.240
  externalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    nodePort: 31177
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/instance: caddy
    app.kubernetes.io/name: caddy-shard
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

d. My complete Caddyfile or JSON config:

    :80 {
      reverse_proxy rpgserver:80
    }

This Caddyfile works if I mount the config file under a different name like Caddyfile2 and then copy it into position manually using kubectl exec ... + cp Caddyfile2 /etc/caddy/Caddyfile.
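
Concretely, that manual step is something like the following (the pod name is the one from the kubectl logs output below; yours will differ):

    # copy the mounted file into the place Caddy expects, then reload
    kubectl exec caddy-8bbd4794f-7pdbk -- cp /etc/caddy/Caddyfile2 /etc/caddy/Caddyfile
    kubectl exec caddy-8bbd4794f-7pdbk -- caddy reload --config /etc/caddy/Caddyfile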

3. The problem I’m having:

So I am deploying Caddy on k3s using the aforementioned deployment file, and it works fine. I entered the pod to test the Caddyfile config I should use, and after some experimentation I found that the Caddyfile shown above does what I need.
Then I added this same file to a ConfigMap, applied it to k8s, and mounted it over the original /etc/caddy/Caddyfile: at first the mount replaced the entire /etc/caddy directory, and later, while trying to fix the problem, I changed the deployment to overwrite only the Caddyfile itself, leaving the rest of the directory unchanged. Either way the same thing happens: the pod starts, shows no errors, then reports that it is going to shut down because it received a SIGTERM, and shuts down.
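
For clarity, the two failing mount variants looked roughly like this (reconstructed from memory, since the deployment pasted above already shows the Caddyfile2 workaround):

    # Variant 1: mount the ConfigMap over the whole directory,
    # which replaces everything in /etc/caddy
    volumeMounts:
    - mountPath: /etc/caddy
      name: config

    # Variant 2: mount only the one key via subPath,
    # overwriting just /etc/caddy/Caddyfile
    volumeMounts:
    - mountPath: /etc/caddy/Caddyfile
      name: config
      subPath: Caddyfile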

4. Error messages and/or full log output:

$ kubectl logs -f caddy-8bbd4794f-7pdbk
{"level":"info","ts":1638133614.7578523,"msg":"using provided configuration","config_file":"/etc/caddy/Caddyfile","config_adapter":"caddyfile"}
{"level":"info","ts":1638133614.7594004,"logger":"admin","msg":"admin endpoint started","address":"tcp/localhost:2019","enforce_origin":false,"origins":["localhost:2019","[::1]:2019","127.0.0.1:2019"]}
{"level":"info","ts":1638133614.7594845,"logger":"http","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv0","http_port":80}
{"level":"info","ts":1638133614.759582,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc0003570a0"}
{"level":"info","ts":1638133614.7600214,"msg":"autosaved config (load with --resume flag)","file":"/config/caddy/autosave.json"}
{"level":"info","ts":1638133614.7600455,"msg":"serving initial configuration"}
{"level":"info","ts":1638133614.7597928,"logger":"tls","msg":"cleaning storage unit","description":"FileStorage:/data/caddy"}
{"level":"info","ts":1638133614.7600858,"logger":"tls","msg":"finished cleaning storage units"}
{"level":"info","ts":1638133642.628051,"msg":"shutting down apps, then terminating","signal":"SIGTERM"}
{"level":"warn","ts":1638133642.6280835,"msg":"exiting; byeee!! đź‘‹","signal":"SIGTERM"}
{"level":"info","ts":1638133642.6292756,"logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0xc0003570a0"}
{"level":"info","ts":1638133642.6305075,"logger":"admin","msg":"stopped previous server","address":"tcp/localhost:2019"}
{"level":"info","ts":1638133642.630528,"msg":"shutdown complete","signal":"SIGTERM","exit_code":0}

5. What I already tried:

At first I thought it was a permissions problem, so I made sure the permissions on the mounted volume matched the original file's. Then I thought Caddy might be trying to make changes in the /etc/caddy directory, so I changed the deployment.yaml to mount the file directly without touching the rest of the directory.

I also thought that maybe there was some invisible character, like a \t, causing the error, so I tested by copy-pasting the file into position and reloading it with caddy reload --config /etc/caddy/Caddyfile, and it worked (meaning I was able to make the request to the endpoint I wanted).

I tried searching Google for similar problems and found this topic on this forum:

https://caddy.community/t/caddy-stopped-for-some-reason/13215/7

But that user's problem turned out to be something else.

So now I am not sure what to do. Maybe I can change the deployment command so it copies Caddyfile2 into position before calling caddy run, or I could create a separate Dockerfile for each config, but that would be cumbersome.
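
A sketch of what that command override could look like (untested; as far as I know the official image's default CMD is caddy run with this config path and adapter):

    containers:
    - image: caddy:2.4.6-alpine
      # copy the mounted file into place, then exec Caddy the same way
      # the image's default command would
      command: ["sh", "-c"]
      args:
      - cp /etc/caddy/Caddyfile2 /etc/caddy/Caddyfile &&
        exec caddy run --config /etc/caddy/Caddyfile --adapter caddyfile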

PS: Unrelated issue, but worth mentioning: I had problems with this forum's editor that made it hard to write this topic. If there is a place where I can report this, I will write a report there. I have already edited the entire message to fix the problems I could not prevent on the first write.

So, I need to add one piece of information:

If I change the configuration myself and reload it using caddy reload --config /etc/caddy/Caddyfile, it works in the sense that I can make a request to the rpgserver /ping endpoint and it returns “pong”. But this time I had kubectl logs -f running on the pod, and I noticed that it still receives a SIGTERM anyway; it just takes some time to shut down, so in the meantime I am able to make the request.

Here are the logs of this attempt reloading it manually with the new config:

{"level":"info","ts":1638136979.5251448,"msg":"using provided configuration","config_file":"/etc/caddy/Caddyfile","config_adapter":"caddyfile"}
{"level":"info","ts":1638136979.5269353,"logger":"admin","msg":"admin endpoint started","address":"tcp/localhost:2019","enforce_origin":false,"origins":["localhost:2019","[::1]:2019","127.0.0.1:2019"]}
{"level":"info","ts":1638136979.5274315,"logger":"http","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv0","http_port":80}
{"level":"info","ts":1638136979.5275297,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc000424070"}
{"level":"info","ts":1638136979.5278752,"logger":"tls","msg":"cleaning storage unit","description":"FileStorage:/data/caddy"}
{"level":"info","ts":1638136979.5279005,"logger":"tls","msg":"finished cleaning storage units"}
{"level":"info","ts":1638136979.5282402,"msg":"autosaved config (load with --resume flag)","file":"/config/caddy/autosave.json"}
{"level":"info","ts":1638136979.5282488,"msg":"serving initial configuration"}

--- Here was when I called `caddy reload --config /etc/caddy/Caddyfile` ---

{"level":"info","ts":1638137087.5318642,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_addr":"127.0.0.1:55342","headers":{"Accept-Encoding":["gzip"],"Content-Length":["147"],"Content-Type":["application/json"],"Origin":["localhost:2019"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1638137087.5346153,"logger":"admin","msg":"admin endpoint started","address":"tcp/localhost:2019","enforce_origin":false,"origins":["localhost:2019","[::1]:2019","127.0.0.1:2019"]}
{"level":"info","ts":1638137087.5347197,"logger":"http","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv0","http_port":80}
{"level":"info","ts":1638137087.5351644,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc000424e00"}
{"level":"info","ts":1638137087.5412524,"logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0xc000424070"}
{"level":"info","ts":1638137087.5414474,"msg":"autosaved config (load with --resume flag)","file":"/config/caddy/autosave.json"}
{"level":"info","ts":1638137087.5414586,"logger":"admin.api","msg":"load complete"}
{"level":"info","ts":1638137087.5435588,"logger":"admin","msg":"stopped previous server","address":"tcp/localhost:2019"}
{"level":"info","ts":1638137117.4870186,"msg":"shutting down apps, then terminating","signal":"SIGTERM"}
{"level":"warn","ts":1638137117.487061,"msg":"exiting; byeee!! đź‘‹","signal":"SIGTERM"}
{"level":"info","ts":1638137117.4882174,"logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0xc000424e00"}
{"level":"info","ts":1638137117.489421,"logger":"admin","msg":"stopped previous server","address":"tcp/localhost:2019"}
{"level":"info","ts":1638137117.4894426,"msg":"shutdown complete","signal":"SIGTERM","exit_code":0}

Sorry to say, this seems like a k8s-specific issue. I don’t use k8s so I can’t really suggest anything. The image works fine with Docker as described in the docs on Docker Hub.

Do volume mounts work differently in k8s? Does it mount it after running instead of before running?

Is it a problem with your health checks causing k8s to try to tear down the container?

I am not performing any health checks yet, so that’s probably not it.

I know that volume configuration is more verbose/complicated on k8s than it is for Docker, which might indicate it works differently somehow, but I will have to read up on it to see what might be causing this SIGTERM.

I don’t think it mounts the volume after running, since that would probably have caused issues in other projects, but I will try to read more about it.

Thanks for your answer.

[edit:] Actually, I just noticed there is a liveness probe configured there. That is worth investigating; I will test it later when I have time. Thanks for the tip.

Actually, I was able to test the liveness probe hypothesis, and that was indeed the problem. Thank you very much, Francis. I will mark this as [solved] in the title.
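
For anyone who finds this later: my reading (an educated guess, I did not dig into the kubelet internals) is that the probe on / gets reverse-proxied to the upstream, so whenever that request does not return a success status the probe fails, and after failureThreshold: 3 failures at periodSeconds: 10 the kubelet sends the SIGTERM. That lines up with the roughly 28 seconds between startup and shutdown in the first log above. One way to decouple the probes from the upstream is to have Caddy answer a health path itself; a sketch (the /healthz path is just an illustrative name, not what I actually deployed):

    :80 {
      # answer the health path directly instead of proxying it upstream
      respond /healthz 200
      reverse_proxy rpgserver:80
    }

and then point both probes at that path in the deployment:

    livenessProbe:
      httpGet:
        path: /healthz
        port: http
    readinessProbe:
      httpGet:
        path: /healthz
        port: http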
