Looking for advice on keeping the cpu consumption low (or how to avoid "shutting down admin server: context deadline exceeded" with a 173k config)

1. Output of caddy version:

v2.5.2 h1:eCJdLyEyAGzuQTa5Mh3gETnYWDClo1LjtQm2q9RNZrs=

2. How I run Caddy:

Dockerfile:

ARG TARGETPLATFORM
ARG TARGETOS
ARG TARGETARCH
ARG CADDY_VERSION=2.5.2

FROM caddy:${CADDY_VERSION}-builder AS builder

RUN CGO_ENABLED=0 GOARCH=${TARGETARCH} GOOS=${TARGETOS} \
    xcaddy build \
    --with github.com/greenpau/caddy-security \
    --with github.com/lucaslorentz/caddy-docker-proxy/v2

FROM caddy:${CADDY_VERSION}-alpine

COPY --from=builder /usr/bin/caddy /usr/bin/caddy

RUN apk add nss-tools curl && \
    rm -rf /var/cache/apk/*

CMD ["caddy", "docker-proxy", "-caddyfile-path", "/etc/caddy/Caddyfile"]

a. System environment:

centos and docker (latest version)

b. Command:

caddy docker-proxy -caddyfile-path /etc/caddy/Caddyfile

c. Service/unit/compose file:

networks:
  proxy:
    external: true
    name: nginx-proxy
services:
  caddy:
    container_name: cadme
    cpu_quota: 200000
    environment:
      MY_HOSTNAME: bob.example.com
    image: myhub/caddy:2.5.2
    mem_limit: 4g
    networks:
      proxy: {}
    ports:
    - 10.10.0.101:80:80/tcp
    - 10.10.0.101:443:443/tcp
    restart: always
    volumes:
    - /srv/_tools/cadme/var/config:/config:rw
    - /srv/_tools/cadme/var/data:/data:rw
    - /srv/_tools/cadme/cadme-stack/env/common/caddy:/etc/caddy:rw
    - /srv/_tools/cadme/var/htpasswd:/etc/caddy/htpasswd:rw
    - /srv/_tools/cadme/cadme-stack/env/dev-001/vhost.d:/etc/caddy/vhost.d:rw
    - /srv/_tools/cadme/var/logs:/logs:rw
    - /var/run/docker.sock:/var/run/docker.sock:rw
    working_dir: /etc/caddy

d. My complete Caddy config:

Caddyfile:

{
        # debug
        log {
                output file /logs/main.log
                level INFO
                format json {
                        time_format iso8601
                }
        }
}

import conf.d/*

:80 {
        header Server "{$MY_HOSTNAME}"
        import logs access.log
        redir https://{$MY_HOSTNAME}/ 301
}

{$MY_HOSTNAME} {
        header Content-Type "text/html; charset=utf-8"
        respond "<html><head><title>Oops</title></head><body><p>This is not the website you are looking for.</body></html>"
        import headers headers_overwrite
        import logs access.log
}

import vhost.d/*

conf.d/snippets:

(auth) {
        @check_auth expression `"{args.0}" == "auth_enabled"`
        basicauth @check_auth {
                # if a file is missing, the whole vhost wont work.
                # base64 encoded bcrypt: doable from ansible yes, same resulting hash, no :(
                #bob JDJhJDEwJEVCNmdaNEg2Ti5iejRMYkF3MFZhZ3VtV3E1SzBWZEZ5Q3VWc0tzOEJwZE9TaFlZdEVkZDhX
                # prevent the whole vhost block to fail by adding a wildcard to the filename import
                # this will only generate a warning and because the basicauth block is empty,
                # if the auth is enabled no user will get in but all will work otherwise :)
                import /etc/caddy/htpasswd/{args.1}*
        }
}

(headers) {
        @check_add expression `"{args.0}" == "headers_add" || "{args.0}" == "headers_overwrite"`
        header Server "{http.request.host}"
        # not so fast
        header @check_add {
                # X-Frame-Options "DENY"
                Strict-Transport-Security `max-age=31536000; includeSubDomains; preload; always;`
                X-Content-Type-Options "nosniff"
                Referrer-Policy "strict-origin-when-cross-origin"
                X-Robots-Tag "noindex, nofollow, nosnippet, noarchive"
                X-XSS-Protection `1; mode=block`
                # X-Headers-Check "Overwritten or Added"
        }
}

(logs) {
        log {
                output file /logs/{args.0}
                format filter {
                        wrap json {
                                time_format iso8601
                        }
                        fields {
                                common_log delete
                                request>headers>Authorization delete
                        }
                }
        }
}

(rev_proxy) {
        @check_remove expression `"{args.1}" == "headers_overwrite"`
        @check_keep   expression `"{args.1}" != "headers_overwrite"`

        reverse_proxy @check_remove {
                to {args.0}
                header_down -Server
                header_down -X-Hudson
                header_down -X-Jenkins
                # header_down -X-Frame-Options
                header_down -Strict-Transport-Security
                header_down -X-Content-Type-Options
                header_down -Referrer-Policy
                header_down -X-Robots-Tag
                header_down -X-XSS-Protection
                # header_down X-Headers-Check "Upstream Removed"
        }
        reverse_proxy @check_keep {
                to {args.0}
                header_down -Server
                header_down -X-Hudson
                header_down -X-Jenkins
                # header_down X-Headers-Check "Upstream Kept"
        }
}

conf.d/templates:

(tmpl) {
        import rev_proxy {args.0} {args.2}
        import headers {args.2}
        import auth {args.3} {args.1}
        import logs access.log
}
(tmpl_no_auth) {
        import rev_proxy {args.0} {args.2}
        import headers {args.2}
        import logs access.log
}
(tmpl_drupal) {
        import rev_proxy {args.0} {args.2}
        import headers {args.2}
        import auth auth_enabled generic.drupal
        import logs access.log
}

vhost.d/default:

default.example.com {
        import logs access.log
        header Server "{http.request.host}"
        redir https://{$MY_HOSTNAME}/ 301
}

3. The problem Iā€™m having:

For smaller amount of reverse proxied containers all looks great.
However, on a dev server I have 79 containers with a total of 87 domain names and here is where I have an issue. Everytime a container state is changed or the Caddyfile is modified, the cpu goes to 200% (which is the mac cpu quota allocated for it) for 1 to 3 minutes. During this time it is really slow to serve anything. Because this is a dev server, deployments happen pretty much all the time, making close to impossible to make anything out of it.

The full autoconfig.json is 173KB.

4. Error messages and/or full log output:

typical log when put to info level and a container is created / stopped / whatever:

{"level":"info","ts":"2022-09-15T15:52:16.170Z","logger":"docker-proxy","msg":"New Caddyfile","caddyfile":"..huge file.."}
{"level":"info","ts":"2022-09-15T15:52:16.219Z","logger":"docker-proxy","msg":"New Config JSON","json":".huge file.."}
{"level":"info","ts":"2022-09-15T15:52:16.221Z","logger":"docker-proxy","msg":"Sending configuration to","server":"localhost"}
{"level":"info","ts":"2022-09-15T15:52:16.224Z","logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_ip":"127.0.0.1","remote_port":"42282","headers":{"Accept-
Encoding":["gzip"],"Content-Length":["165599"],"Content-Type":["application/json"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":"2022-09-15T15:52:16.259Z","logger":"admin","msg":"admin endpoint started","address":"tcp/localhost:2019","enforce_origin":false,"origins":["//localhost:2019","//[::1]:2019","//127.0.0.1:2019"]}
{"level":"info","ts":"2022-09-15T15:52:16.261Z","logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc003f14850"}
{"level":"info","ts":"2022-09-15T15:52:16.261Z","logger":"http","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"srv0","https_port":443}
{"level":"info","ts":"2022-09-15T15:52:16.261Z","logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
{"level":"warn","ts":"2022-09-15T15:52:16.261Z","logger":"http","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv1","http_port":80}
{"level":"error","ts":"2022-09-15T15:52:26.260Z","logger":"admin","msg":"stopping current admin endpoint","error":"shutting down admin server: context deadline exceeded"}
{"level":"info","ts":"2022-09-15T15:53:31.537Z","logger":"http","msg":"enabling automatic TLS certificate management","domains":["...", "87 domains here"]}
{"level":"info","ts":"2022-09-15T15:53:31.700Z","logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0xc0003bebd0"}
{"level":"info","ts":"2022-09-15T15:53:31.701Z","msg":"autosaved config (load with --resume flag)","file":"/config/caddy/autosave.json"}
{"level":"info","ts":"2022-09-15T15:53:31.701Z","logger":"admin.api","msg":"load complete"}
{"level":"info","ts":"2022-09-15T15:53:31.701Z","logger":"docker-proxy","msg":"Successfully configured","server":"localhost"}

smaller autoconfig.json

{
    "admin": {
        "listen": "tcp/localhost:2019"
    },
    "apps": {
        "http": {
            "servers": {
                "srv0": {
                    "listen": [
                        ":443"
                    ],
                    "logs": {
                        "logger_names": {
                            "alertmanager.example.com": "log2",
                            "default.utility.example.com": "log1",
                            "prometheus.example.com": "log2"
                        }
                    },
                    "routes": [
                        {
                            "handle": [
                                {
                                    "handler": "subroute",
                                    "routes": [
                                        {
                                            "handle": [
                                                {
                                                    "handler": "headers",
                                                    "response": {
                                                        "set": {
                                                            "Referrer-Policy": [
                                                                "strict-origin-when-cross-origin"
                                                            ],
                                                            "Strict-Transport-Security": [
                                                                "max-age=31536000; includeSubDomains; preload; always;"
                                                            ],
                                                            "X-Content-Type-Options": [
                                                                "nosniff"
                                                            ],
                                                            "X-Robots-Tag": [
                                                                "noindex, nofollow, nosnippet, noarchive"
                                                            ],
                                                            "X-Xss-Protection": [
                                                                "1; mode=block"
                                                            ]
                                                        }
                                                    }
                                                }
                                            ],
                                            "match": [
                                                {
                                                    "expression": "\"headers_overwrite\" == \"headers_add\" || \"headers_overwrite\" == \"headers_overwrite\""
                                                }
                                            ]
                                        },
                                        {
                                            "handle": [
                                                {
                                                    "handler": "headers",
                                                    "response": {
                                                        "set": {
                                                            "Content-Type": [
                                                                "text/html; charset=utf-8"
                                                            ]
                                                        }
                                                    }
                                                },
                                                {
                                                    "handler": "headers",
                                                    "response": {
                                                        "set": {
                                                            "Server": [
                                                                "{http.request.host}"
                                                            ]
                                                        }
                                                    }
                                                },
                                                {
                                                    "body": "<html><head><title>Oops</title></head><body><p>This is not the website you are looking for.</body></html>",
                                                    "handler": "static_response"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ],
                            "match": [
                                {
                                    "host": [
                                        "default.utility.example.com"
                                    ]
                                }
                            ],
                            "terminal": true
                        },
                        {
                            "handle": [
                                {
                                    "handler": "subroute",
                                    "routes": [
                                        {
                                            "handle": [
                                                {
                                                    "handler": "headers",
                                                    "response": {
                                                        "set": {
                                                            "Referrer-Policy": [
                                                                "strict-origin-when-cross-origin"
                                                            ],
                                                            "Strict-Transport-Security": [
                                                                "max-age=31536000; includeSubDomains; preload; always;"
                                                            ],
                                                            "X-Content-Type-Options": [
                                                                "nosniff"
                                                            ],
                                                            "X-Robots-Tag": [
                                                                "noindex, nofollow, nosnippet, noarchive"
                                                            ],
                                                            "X-Xss-Protection": [
                                                                "1; mode=block"
                                                            ]
                                                        }
                                                    }
                                                }
                                            ],
                                            "match": [
                                                {
                                                    "expression": "\"headers_noop\" == \"headers_add\" || \"headers_noop\" == \"headers_overwrite\""
                                                }
                                            ]
                                        },
                                        {
                                            "handle": [
                                                {
                                                    "handler": "headers",
                                                    "response": {
                                                        "set": {
                                                            "Server": [
                                                                "{http.request.host}"
                                                            ]
                                                        }
                                                    }
                                                }
                                            ]
                                        },
                                        {
                                            "handle": [
                                                {
                                                    "handler": "reverse_proxy",
                                                    "headers": {
                                                        "response": {
                                                            "delete": [
                                                                "Server",
                                                                "X-Hudson",
                                                                "X-Jenkins",
                                                                "Strict-Transport-Security",
                                                                "X-Content-Type-Options",
                                                                "Referrer-Policy",
                                                                "X-Robots-Tag",
                                                                "X-XSS-Protection"
                                                            ]
                                                        }
                                                    },
                                                    "upstreams": [
                                                        {
                                                            "dial": "10.99.0.99:80"
                                                        }
                                                    ]
                                                }
                                            ],
                                            "match": [
                                                {
                                                    "expression": "\"headers_noop\" == \"headers_overwrite\""
                                                }
                                            ]
                                        },
                                        {
                                            "handle": [
                                                {
                                                    "handler": "reverse_proxy",
                                                    "headers": {
                                                        "response": {
                                                            "delete": [
                                                                "Server",
                                                                "X-Hudson",
                                                                "X-Jenkins"
                                                            ]
                                                        }
                                                    },
                                                    "upstreams": [
                                                        {
                                                            "dial": "10.99.0.99:80"
                                                        }
                                                    ]
                                                }
                                            ],
                                            "match": [
                                                {
                                                    "expression": "\"headers_noop\" != \"headers_overwrite\""
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ],
                            "match": [
                                {
                                    "host": [
                                        "prometheus.example.com",
                                        "alertmanager.example.com"
                                    ]
                                }
                            ],
                            "terminal": true
                        }
                    ]
                },
                "srv1": {
                    "listen": [
                        ":80"
                    ],
                    "logs": {
                        "default_logger_name": "log0"
                    },
                    "routes": [
                        {
                            "handle": [
                                {
                                    "handler": "headers",
                                    "response": {
                                        "set": {
                                            "Server": [
                                                "default.utility.example.com"
                                            ]
                                        }
                                    }
                                },
                                {
                                    "handler": "static_response",
                                    "headers": {
                                        "Location": [
                                            "https://default.utility.example.com/"
                                        ]
                                    },
                                    "status_code": 301
                                }
                            ]
                        }
                    ]
                }
            }
        }
    },
    "logging": {
        "logs": {
            "default": {
                "encoder": {
                    "format": "json",
                    "time_format": "iso8601"
                },
                "exclude": [
                    "http.log.access.log1",
                    "http.log.access.log2",
                    "http.log.access.log0"
                ],
                "level": "WARN",
                "writer": {
                    "filename": "/logs/main.log",
                    "output": "file"
                }
            },
            "log0": {
                "encoder": {
                    "fields": {
                        "common_log": {
                            "filter": "delete"
                        },
                        "request>headers>Authorization": {
                            "filter": "delete"
                        }
                    },
                    "format": "filter",
                    "wrap": {
                        "format": "json",
                        "time_format": "iso8601"
                    }
                },
                "include": [
                    "http.log.access.log0"
                ],
                "writer": {
                    "filename": "/logs/access.log",
                    "output": "file"
                }
            },
            "log1": {
                "encoder": {
                    "fields": {
                        "common_log": {
                            "filter": "delete"
                        },
                        "request>headers>Authorization": {
                            "filter": "delete"
                        }
                    },
                    "format": "filter",
                    "wrap": {
                        "format": "json",
                        "time_format": "iso8601"
                    }
                },
                "include": [
                    "http.log.access.log1"
                ],
                "writer": {
                    "filename": "/logs/access.log",
                    "output": "file"
                }
            },
            "log2": {
                "encoder": {
                    "fields": {
                        "common_log": {
                            "filter": "delete"
                        },
                        "request>headers>Authorization": {
                            "filter": "delete"
                        }
                    },
                    "format": "filter",
                    "wrap": {
                        "format": "json",
                        "time_format": "iso8601"
                    }
                },
                "include": [
                    "http.log.access.log2"
                ],
                "writer": {
                    "filename": "/logs/access.log",
                    "output": "file"
                }
            }
        }
    }
}

5. What I already tried:

Enabled debug and not seeing anything of interest other that it is seen from info level and up.

6. Links to relevant resources:

1 Like

Certificates and keys are, unfortunately, expensive to parse. Gotta love all the encodings involved there.

Currently, the best way to overcome this is probably to enable on-demand TLS. It defers certificate loading until the first handshake that needs it.

Iā€™ve thought about other ways to avoid loading all the certs at every config load, but each time thereā€™s a logical or technical barrier I canā€™t overcome. For example, Iā€™ve thought about just keeping certs in memory between reloads, but the only way to know if a cert file has changed on disk is to load and decode it, which of course, is what weā€™re already doing. Iā€™ve thought about loading them asynchronously but this doesnā€™t fix the CPU issue (and can in fact exacerbate it). Iā€™ve thought about storing certs and keys as Go gobs instead, but this doesnā€™t really work and even if it did it would be unhelpful to any other program or user trying to use or inspect the certs/keys.

Thank you, will try and report back.

@matt, if I include the block below into each domains defined in the config it will not be insecure and therefore I wouldnt need any third party check service right?

        tls {
                on_demand
        }

like:

(tls_on_demand) {
        tls {
                on_demand
        }
}
...
foo.example.com {
	import tls_on_demand
	import rev_proxy foo:80 headers_overwrite
	import headers headers_overwrite
	import logs access.log
}

Yes, you need the ā€˜askā€™ endpoint (hence the warning when you run Caddy without one), but fortunately you can bake that into your Caddy config too!

:1234 {
    bind 127.0.0.1
    @allowed host example.com ...
    respond @allowed 200
    respond 401
}

Or something like that. Might be a long lineā€¦

(This is much easier in JSON config.)

Ok, reporting back. No improvement. About the same time interval where the CPU goes nuts, the I get "logger":"admin","msg":"stopping current admin endpoint","error":"shutting down admin server: context deadline exceeded".

It does look like the tls is choking the cpu, but itā€™s about the ā€œcache maintenanceā€ not necessarily the certificate generation.

I donā€™t understand why the admin is starting then is failing to stopā€¦

{"level":"info","ts":"2022-09-15T22:04:42.464Z","logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_ip":"127.0.0.1","remote_port":"56758","headers":{"Accept-Encoding":["gzip"],"Content-Length":["167351"],"Content-Type":["application/json"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":"2022-09-15T22:04:42.500Z","logger":"admin","msg":"admin endpoint started","address":"tcp/localhost:2019","enforce_origin":false,"origins":["//localhost:2019","//[::1]:2019","//127.0.0.1:2019"]}
{"level":"info","ts":"2022-09-15T22:04:42.502Z","logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc00031b420"}
{"level":"info","ts":"2022-09-15T22:04:42.502Z","logger":"http","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"srv0","https_port":443
}
{"level":"info","ts":"2022-09-15T22:04:42.502Z","logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
{"level":"warn","ts":"2022-09-15T22:04:42.502Z","logger":"http","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv1","http_port":80}
{"level":"error","ts":"2022-09-15T22:04:52.500Z","logger":"admin","msg":"stopping current admin endpoint","error":"shutting down admin server: context deadline exceeded"}

and why the tls cache maintenance starts right then

{"level":"info","ts":"2022-09-15T22:00:08.836Z","logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc00032ac40"}

I got in the autoconfig.json a new section:

        "tls": {
            "automation": {
                "policies": [
                    {
                        "on_demand": true,
                        "subjects": [
                            "84-domains-in-this-list"
                        ]
                    },
                    {
                        "subjects": [
                            "few-domains-with-automatic-tls"
                        ]
                    }
                ]
            }
        }

Any other things I might try? :\

In the mean time, Iā€™ll configure the checker, nice idea! Will group up a big chunk of those domains with a header_regexp host and report back if anything changes.

That error just means that the config didnā€™t finish reloading in time; for example, perhaps a new config change came in before the old one finished? (I havenā€™t looked too closelyā€¦ I need to give my eyes a break right now.)

Can you reproduce the CPU usage and take a profile while itā€™s happening? http://localhost:2019/debug/pprof should open a simple UI for capturing a CPU profile, or a goroutine dump. Both could be useful.

I managed to create the profiles eventually (pushed to a random repo on github):

Hopefully this helps with the investigation, @matt

Thank you!

Looking at the profile, I think @francislavoie is onto something here: cpu issues on larger setups Ā· Issue #400 Ā· lucaslorentz/caddy-docker-proxy Ā· GitHub

I made another cpu profile with a 180 second duration, which should cover the whole process:

cpu180.svg

I replaced the 2.5.2 with 2.6.0-beta.3 and those 2 minutes got down to about 6 seconds!
It looks like @francislavoie was correct, the issue was in the basicauth processing (the hash computation).

I also reverted the on_demand setup. Itā€™s cool but it looks I donā€™t need it now.

3 Likes

Oh wow, I totally forgot about basicauth.

Yes, thanks to @francislavoie for improving that!

1 Like

This topic was automatically closed after 30 days. New replies are no longer allowed.