Looking for advice on keeping the cpu consumption low (or how to avoid "shutting down admin server: context deadline exceeded" with a 173k config)

1. Output of caddy version:

v2.5.2 h1:eCJdLyEyAGzuQTa5Mh3gETnYWDClo1LjtQm2q9RNZrs=

2. How I run Caddy:

Dockerfile:

ARG TARGETPLATFORM
ARG TARGETOS
ARG TARGETARCH
ARG CADDY_VERSION=2.5.2

FROM caddy:${CADDY_VERSION}-builder AS builder

RUN CGO_ENABLED=0 GOARCH=${TARGETARCH} GOOS=${TARGETOS} \
    xcaddy build \
    --with github.com/greenpau/caddy-security \
    --with github.com/lucaslorentz/caddy-docker-proxy/v2

FROM caddy:${CADDY_VERSION}-alpine

COPY --from=builder /usr/bin/caddy /usr/bin/caddy

RUN apk add nss-tools curl && \
    rm -rf /var/cache/apk/*

CMD ["caddy", "docker-proxy", "-caddyfile-path", "/etc/caddy/Caddyfile"]

a. System environment:

centos and docker (latest version)

b. Command:

caddy docker-proxy -caddyfile-path /etc/caddy/Caddyfile

c. Service/unit/compose file:

networks:
  proxy:
    external: true
    name: nginx-proxy
services:
  caddy:
    container_name: cadme
    cpu_quota: 200000
    environment:
      MY_HOSTNAME: bob.example.com
    image: myhub/caddy:2.5.2
    mem_limit: 4g
    networks:
      proxy: {}
    ports:
    - 10.10.0.101:80:80/tcp
    - 10.10.0.101:443:443/tcp
    restart: always
    volumes:
    - /srv/_tools/cadme/var/config:/config:rw
    - /srv/_tools/cadme/var/data:/data:rw
    - /srv/_tools/cadme/cadme-stack/env/common/caddy:/etc/caddy:rw
    - /srv/_tools/cadme/var/htpasswd:/etc/caddy/htpasswd:rw
    - /srv/_tools/cadme/cadme-stack/env/dev-001/vhost.d:/etc/caddy/vhost.d:rw
    - /srv/_tools/cadme/var/logs:/logs:rw
    - /var/run/docker.sock:/var/run/docker.sock:rw
    working_dir: /etc/caddy

d. My complete Caddy config:

Caddyfile:

{
        # debug
        log {
                output file /logs/main.log
                level INFO
                format json {
                        time_format iso8601
                }
        }
}

import conf.d/*

:80 {
        header Server "{$MY_HOSTNAME}"
        import logs access.log
        redir https://{$MY_HOSTNAME}/ 301
}

{$MY_HOSTNAME} {
        header Content-Type "text/html; charset=utf-8"
        respond "<html><head><title>Oops</title></head><body><p>This is not the website you are looking for.</body></html>"
        import headers headers_overwrite
        import logs access.log
}

import vhost.d/*

conf.d/snippets:

(auth) {
        @check_auth expression `"{args.0}" == "auth_enabled"`
        basicauth @check_auth {
                # if a file is missing, the whole vhost wont work.
                # base64 encoded bcrypt: doable from ansible yes, same resulting hash, no :(
                #bob JDJhJDEwJEVCNmdaNEg2Ti5iejRMYkF3MFZhZ3VtV3E1SzBWZEZ5Q3VWc0tzOEJwZE9TaFlZdEVkZDhX
                # prevent the whole vhost block to fail by adding a wildcard to the filename import
                # this will only generate a warning and because the basicauth block is empty,
                # if the auth is enabled no user will get in but all will work otherwise :)
                import /etc/caddy/htpasswd/{args.1}*
        }
}

(headers) {
        @check_add expression `"{args.0}" == "headers_add" || "{args.0}" == "headers_overwrite"`
        header Server "{http.request.host}"
        # not so fast
        header @check_add {
                # X-Frame-Options "DENY"
                Strict-Transport-Security `max-age=31536000; includeSubDomains; preload; always;`
                X-Content-Type-Options "nosniff"
                Referrer-Policy "strict-origin-when-cross-origin"
                X-Robots-Tag "noindex, nofollow, nosnippet, noarchive"
                X-XSS-Protection `1; mode=block`
                # X-Headers-Check "Overwritten or Added"
        }
}

(logs) {
        log {
                output file /logs/{args.0}
                format filter {
                        wrap json {
                                time_format iso8601
                        }
                        fields {
                                common_log delete
                                request>headers>Authorization delete
                        }
                }
        }
}

(rev_proxy) {
        @check_remove expression `"{args.1}" == "headers_overwrite"`
        @check_keep   expression `"{args.1}" != "headers_overwrite"`

        reverse_proxy @check_remove {
                to {args.0}
                header_down -Server
                header_down -X-Hudson
                header_down -X-Jenkins
                # header_down -X-Frame-Options
                header_down -Strict-Transport-Security
                header_down -X-Content-Type-Options
                header_down -Referrer-Policy
                header_down -X-Robots-Tag
                header_down -X-XSS-Protection
                # header_down X-Headers-Check "Upstream Removed"
        }
        reverse_proxy @check_keep {
                to {args.0}
                header_down -Server
                header_down -X-Hudson
                header_down -X-Jenkins
                # header_down X-Headers-Check "Upstream Kept"
        }
}

conf.d/templates:

(tmpl) {
        import rev_proxy {args.0} {args.2}
        import headers {args.2}
        import auth {args.3} {args.1}
        import logs access.log
}
(tmpl_no_auth) {
        import rev_proxy {args.0} {args.2}
        import headers {args.2}
        import logs access.log
}
(tmpl_drupal) {
        import rev_proxy {args.0} {args.2}
        import headers {args.2}
        import auth auth_enabled generic.drupal
        import logs access.log
}

vhost.d/default:

default.example.com {
        import logs access.log
        header Server "{http.request.host}"
        redir https://{$MY_HOSTNAME}/ 301
}

3. The problem I’m having:

For smaller amount of reverse proxied containers all looks great.
However, on a dev server I have 79 containers with a total of 87 domain names and here is where I have an issue. Everytime a container state is changed or the Caddyfile is modified, the cpu goes to 200% (which is the mac cpu quota allocated for it) for 1 to 3 minutes. During this time it is really slow to serve anything. Because this is a dev server, deployments happen pretty much all the time, making close to impossible to make anything out of it.

The full autoconfig.json is 173KB.

4. Error messages and/or full log output:

typical log when put to info level and a container is created / stopped / whatever:

{"level":"info","ts":"2022-09-15T15:52:16.170Z","logger":"docker-proxy","msg":"New Caddyfile","caddyfile":"..huge file.."}
{"level":"info","ts":"2022-09-15T15:52:16.219Z","logger":"docker-proxy","msg":"New Config JSON","json":".huge file.."}
{"level":"info","ts":"2022-09-15T15:52:16.221Z","logger":"docker-proxy","msg":"Sending configuration to","server":"localhost"}
{"level":"info","ts":"2022-09-15T15:52:16.224Z","logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_ip":"127.0.0.1","remote_port":"42282","headers":{"Accept-
Encoding":["gzip"],"Content-Length":["165599"],"Content-Type":["application/json"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":"2022-09-15T15:52:16.259Z","logger":"admin","msg":"admin endpoint started","address":"tcp/localhost:2019","enforce_origin":false,"origins":["//localhost:2019","//[::1]:2019","//127.0.0.1:2019"]}
{"level":"info","ts":"2022-09-15T15:52:16.261Z","logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc003f14850"}
{"level":"info","ts":"2022-09-15T15:52:16.261Z","logger":"http","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"srv0","https_port":443}
{"level":"info","ts":"2022-09-15T15:52:16.261Z","logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
{"level":"warn","ts":"2022-09-15T15:52:16.261Z","logger":"http","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv1","http_port":80}
{"level":"error","ts":"2022-09-15T15:52:26.260Z","logger":"admin","msg":"stopping current admin endpoint","error":"shutting down admin server: context deadline exceeded"}
{"level":"info","ts":"2022-09-15T15:53:31.537Z","logger":"http","msg":"enabling automatic TLS certificate management","domains":["...", "87 domains here"]}
{"level":"info","ts":"2022-09-15T15:53:31.700Z","logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0xc0003bebd0"}
{"level":"info","ts":"2022-09-15T15:53:31.701Z","msg":"autosaved config (load with --resume flag)","file":"/config/caddy/autosave.json"}
{"level":"info","ts":"2022-09-15T15:53:31.701Z","logger":"admin.api","msg":"load complete"}
{"level":"info","ts":"2022-09-15T15:53:31.701Z","logger":"docker-proxy","msg":"Successfully configured","server":"localhost"}

smaller autoconfig.json

{
    "admin": {
        "listen": "tcp/localhost:2019"
    },
    "apps": {
        "http": {
            "servers": {
                "srv0": {
                    "listen": [
                        ":443"
                    ],
                    "logs": {
                        "logger_names": {
                            "alertmanager.example.com": "log2",
                            "default.utility.example.com": "log1",
                            "prometheus.example.com": "log2"
                        }
                    },
                    "routes": [
                        {
                            "handle": [
                                {
                                    "handler": "subroute",
                                    "routes": [
                                        {
                                            "handle": [
                                                {
                                                    "handler": "headers",
                                                    "response": {
                                                        "set": {
                                                            "Referrer-Policy": [
                                                                "strict-origin-when-cross-origin"
                                                            ],
                                                            "Strict-Transport-Security": [
                                                                "max-age=31536000; includeSubDomains; preload; always;"
                                                            ],
                                                            "X-Content-Type-Options": [
                                                                "nosniff"
                                                            ],
                                                            "X-Robots-Tag": [
                                                                "noindex, nofollow, nosnippet, noarchive"
                                                            ],
                                                            "X-Xss-Protection": [
                                                                "1; mode=block"
                                                            ]
                                                        }
                                                    }
                                                }
                                            ],
                                            "match": [
                                                {
                                                    "expression": "\"headers_overwrite\" == \"headers_add\" || \"headers_overwrite\" == \"headers_overwrite\""
                                                }
                                            ]
                                        },
                                        {
                                            "handle": [
                                                {
                                                    "handler": "headers",
                                                    "response": {
                                                        "set": {
                                                            "Content-Type": [
                                                                "text/html; charset=utf-8"
                                                            ]
                                                        }
                                                    }
                                                },
                                                {
                                                    "handler": "headers",
                                                    "response": {
                                                        "set": {
                                                            "Server": [
                                                                "{http.request.host}"
                                                            ]
                                                        }
                                                    }
                                                },
                                                {
                                                    "body": "<html><head><title>Oops</title></head><body><p>This is not the website you are looking for.</body></html>",
                                                    "handler": "static_response"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ],
                            "match": [
                                {
                                    "host": [
                                        "default.utility.example.com"
                                    ]
                                }
                            ],
                            "terminal": true
                        },
                        {
                            "handle": [
                                {
                                    "handler": "subroute",
                                    "routes": [
                                        {
                                            "handle": [
                                                {
                                                    "handler": "headers",
                                                    "response": {
                                                        "set": {
                                                            "Referrer-Policy": [
                                                                "strict-origin-when-cross-origin"
                                                            ],
                                                            "Strict-Transport-Security": [
                                                                "max-age=31536000; includeSubDomains; preload; always;"
                                                            ],
                                                            "X-Content-Type-Options": [
                                                                "nosniff"
                                                            ],
                                                            "X-Robots-Tag": [
                                                                "noindex, nofollow, nosnippet, noarchive"
                                                            ],
                                                            "X-Xss-Protection": [
                                                                "1; mode=block"
                                                            ]
                                                        }
                                                    }
                                                }
                                            ],
                                            "match": [
                                                {
                                                    "expression": "\"headers_noop\" == \"headers_add\" || \"headers_noop\" == \"headers_overwrite\""
                                                }
                                            ]
                                        },
                                        {
                                            "handle": [
                                                {
                                                    "handler": "headers",
                                                    "response": {
                                                        "set": {
                                                            "Server": [
                                                                "{http.request.host}"
                                                            ]
                                                        }
                                                    }
                                                }
                                            ]
                                        },
                                        {
                                            "handle": [
                                                {
                                                    "handler": "reverse_proxy",
                                                    "headers": {
                                                        "response": {
                                                            "delete": [
                                                                "Server",
                                                                "X-Hudson",
                                                                "X-Jenkins",
                                                                "Strict-Transport-Security",
                                                                "X-Content-Type-Options",
                                                                "Referrer-Policy",
                                                                "X-Robots-Tag",
                                                                "X-XSS-Protection"
                                                            ]
                                                        }
                                                    },
                                                    "upstreams": [
                                                        {
                                                            "dial": "10.99.0.99:80"
                                                        }
                                                    ]
                                                }
                                            ],
                                            "match": [
                                                {
                                                    "expression": "\"headers_noop\" == \"headers_overwrite\""
                                                }
                                            ]
                                        },
                                        {
                                            "handle": [
                                                {
                                                    "handler": "reverse_proxy",
                                                    "headers": {
                                                        "response": {
                                                            "delete": [
                                                                "Server",
                                                                "X-Hudson",
                                                                "X-Jenkins"
                                                            ]
                                                        }
                                                    },
                                                    "upstreams": [
                                                        {
                                                            "dial": "10.99.0.99:80"
                                                        }
                                                    ]
                                                }
                                            ],
                                            "match": [
                                                {
                                                    "expression": "\"headers_noop\" != \"headers_overwrite\""
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ],
                            "match": [
                                {
                                    "host": [
                                        "prometheus.example.com",
                                        "alertmanager.example.com"
                                    ]
                                }
                            ],
                            "terminal": true
                        }
                    ]
                },
                "srv1": {
                    "listen": [
                        ":80"
                    ],
                    "logs": {
                        "default_logger_name": "log0"
                    },
                    "routes": [
                        {
                            "handle": [
                                {
                                    "handler": "headers",
                                    "response": {
                                        "set": {
                                            "Server": [
                                                "default.utility.example.com"
                                            ]
                                        }
                                    }
                                },
                                {
                                    "handler": "static_response",
                                    "headers": {
                                        "Location": [
                                            "https://default.utility.example.com/"
                                        ]
                                    },
                                    "status_code": 301
                                }
                            ]
                        }
                    ]
                }
            }
        }
    },
    "logging": {
        "logs": {
            "default": {
                "encoder": {
                    "format": "json",
                    "time_format": "iso8601"
                },
                "exclude": [
                    "http.log.access.log1",
                    "http.log.access.log2",
                    "http.log.access.log0"
                ],
                "level": "WARN",
                "writer": {
                    "filename": "/logs/main.log",
                    "output": "file"
                }
            },
            "log0": {
                "encoder": {
                    "fields": {
                        "common_log": {
                            "filter": "delete"
                        },
                        "request>headers>Authorization": {
                            "filter": "delete"
                        }
                    },
                    "format": "filter",
                    "wrap": {
                        "format": "json",
                        "time_format": "iso8601"
                    }
                },
                "include": [
                    "http.log.access.log0"
                ],
                "writer": {
                    "filename": "/logs/access.log",
                    "output": "file"
                }
            },
            "log1": {
                "encoder": {
                    "fields": {
                        "common_log": {
                            "filter": "delete"
                        },
                        "request>headers>Authorization": {
                            "filter": "delete"
                        }
                    },
                    "format": "filter",
                    "wrap": {
                        "format": "json",
                        "time_format": "iso8601"
                    }
                },
                "include": [
                    "http.log.access.log1"
                ],
                "writer": {
                    "filename": "/logs/access.log",
                    "output": "file"
                }
            },
            "log2": {
                "encoder": {
                    "fields": {
                        "common_log": {
                            "filter": "delete"
                        },
                        "request>headers>Authorization": {
                            "filter": "delete"
                        }
                    },
                    "format": "filter",
                    "wrap": {
                        "format": "json",
                        "time_format": "iso8601"
                    }
                },
                "include": [
                    "http.log.access.log2"
                ],
                "writer": {
                    "filename": "/logs/access.log",
                    "output": "file"
                }
            }
        }
    }
}

5. What I already tried:

Enabled debug and not seeing anything of interest other that it is seen from info level and up.

6. Links to relevant resources:

Certificates and keys are, unfortunately, expensive to parse. Gotta love all the encodings involved there.

Currently, the best way to overcome this is probably to enable on-demand TLS. It defers certificate loading until the first handshake that needs it.

I’ve thought about other ways to avoid loading all the certs at every config load, but each time there’s a logical or technical barrier I can’t overcome. For example, I’ve thought about just keeping certs in memory between reloads, but the only way to know if a cert file has changed on disk is to load and decode it, which of course, is what we’re already doing. I’ve thought about loading them asynchronously but this doesn’t fix the CPU issue (and can in fact exacerbate it). I’ve thought about storing certs and keys as Go gobs instead, but this doesn’t really work and even if it did it would be unhelpful to any other program or user trying to use or inspect the certs/keys.

Thank you, will try and report back.

@matt, if I include the block below into each domains defined in the config it will not be insecure and therefore I wouldnt need any third party check service right?

        tls {
                on_demand
        }

like:

(tls_on_demand) {
        tls {
                on_demand
        }
}
...
foo.example.com {
	import tls_on_demand
	import rev_proxy foo:80 headers_overwrite
	import headers headers_overwrite
	import logs access.log
}

Yes, you need the ‘ask’ endpoint (hence the warning when you run Caddy without one), but fortunately you can bake that into your Caddy config too!

:1234 {
    bind 127.0.0.1
    @allowed host example.com ...
    respond @allowed 200
    respond 401
}

Or something like that. Might be a long line…

(This is much easier in JSON config.)

Ok, reporting back. No improvement. About the same time interval where the CPU goes nuts, the I get "logger":"admin","msg":"stopping current admin endpoint","error":"shutting down admin server: context deadline exceeded".

It does look like the tls is choking the cpu, but it’s about the “cache maintenance” not necessarily the certificate generation.

I don’t understand why the admin is starting then is failing to stop…

{"level":"info","ts":"2022-09-15T22:04:42.464Z","logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_ip":"127.0.0.1","remote_port":"56758","headers":{"Accept-Encoding":["gzip"],"Content-Length":["167351"],"Content-Type":["application/json"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":"2022-09-15T22:04:42.500Z","logger":"admin","msg":"admin endpoint started","address":"tcp/localhost:2019","enforce_origin":false,"origins":["//localhost:2019","//[::1]:2019","//127.0.0.1:2019"]}
{"level":"info","ts":"2022-09-15T22:04:42.502Z","logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc00031b420"}
{"level":"info","ts":"2022-09-15T22:04:42.502Z","logger":"http","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"srv0","https_port":443
}
{"level":"info","ts":"2022-09-15T22:04:42.502Z","logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
{"level":"warn","ts":"2022-09-15T22:04:42.502Z","logger":"http","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv1","http_port":80}
{"level":"error","ts":"2022-09-15T22:04:52.500Z","logger":"admin","msg":"stopping current admin endpoint","error":"shutting down admin server: context deadline exceeded"}

and why the tls cache maintenance starts right then

{"level":"info","ts":"2022-09-15T22:00:08.836Z","logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc00032ac40"}

I got in the autoconfig.json a new section:

        "tls": {
            "automation": {
                "policies": [
                    {
                        "on_demand": true,
                        "subjects": [
                            "84-domains-in-this-list"
                        ]
                    },
                    {
                        "subjects": [
                            "few-domains-with-automatic-tls"
                        ]
                    }
                ]
            }
        }

Any other things I might try? :\

In the mean time, I’ll configure the checker, nice idea! Will group up a big chunk of those domains with a header_regexp host and report back if anything changes.

That error just means that the config didn’t finish reloading in time; for example, perhaps a new config change came in before the old one finished? (I haven’t looked too closely… I need to give my eyes a break right now.)

Can you reproduce the CPU usage and take a profile while it’s happening? http://localhost:2019/debug/pprof should open a simple UI for capturing a CPU profile, or a goroutine dump. Both could be useful.

I managed to create the profiles eventually (pushed to a random repo on github):

Hopefully this helps with the investigation, @matt

Thank you!

Looking at the profile, I think @francislavoie is onto something here: cpu issues on larger setups · Issue #400 · lucaslorentz/caddy-docker-proxy · GitHub

I made another cpu profile with a 180 second duration, which should cover the whole process:

cpu180.svg

I replaced the 2.5.2 with 2.6.0-beta.3 and those 2 minutes got down to about 6 seconds!
It looks like @francislavoie was correct, the issue was in the basicauth processing (the hash computation).

I also reverted the on_demand setup. It’s cool but it looks I don’t need it now.

3 Likes

Oh wow, I totally forgot about basicauth.

Yes, thanks to @francislavoie for improving that!

1 Like