Hangs on reload

1. Caddy version (caddy version):

  • v2.3.0
  • v2.4.0-beta.2

(tested both)

2. How I run Caddy:

a. System environment:

  • OS: Ubuntu 18.04 bionic
  • with Systemd;

b. Command:

/usr/bin/caddy reload -config /etc/caddy/Caddyfile -adapter caddyfile -address localhost:2019

c. Service/unit/compose file:

[Unit]
Description=Caddy Web Server
After=network.target


[Service]
Type=forking
PIDFile=/run/caddy.pid
ExecStartPre=/usr/bin/caddy validate -config /etc/caddy/Caddyfile
ExecStart=/usr/bin/caddy start -config /etc/caddy/Caddyfile -pidfile /run/caddy.pid
ExecReload=/usr/bin/caddy reload -config /etc/caddy/Caddyfile -adapter caddyfile -address localhost:2019
ExecStop=/usr/bin/caddy stop

d. My complete Caddyfile or JSON config:

LOTS OF SITES (about 100 server blocks) using a Caddyfile

3. The problem I'm having:

The reload times out; after that I have to restart the server (kill the process and start it again).
For a few moments after a restart, reloads work instantly. After the server has been running for a while, reloads hang.

4. Error messages and/or full log output:

Apr 2 18:52:55 user caddy[50473]: {"level":"info","ts":1617357175.992867,"msg":"using provided configuration","config_file":"/etc/caddy/Caddyfile","config_adapter":"caddyfile"}
Apr 2 18:52:56 user caddy[50473]: {"level":"warn","ts":1617357176.0004132,"msg":"input is not formatted with 'caddy fmt'","adapter":"caddyfile","file":"/etc/caddy/Caddyfile","line":2}
Apr 2 18:52:56 user caddy[39502]: {"level":"info","ts":1617357176.0013154,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_addr":"127.0.0.1:57400","headers":{"Accept-Encoding":["gzip"],"Content-Length":["18144"],"Content-Type":["application/json"],"Origin":["localhost:2019"],"User-Agent":["Go-http-client/1.1"]}}
Apr 2 18:53:16 cemware caddy[39502]: {"level":"error","ts":1617357196.1463163,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","error":"write tcp 1.2.3.4:443->1.2.3.4:36997: write: broken pipe"}

5. What I already tried:

I used a JSON config instead of the Caddyfile, but the problem remains.
I upgraded to the latest beta, but the problem remains.

6. Links to relevant resources:

It's impossible to say without more details about your config or how you're using Caddy.

There is really nothing special (many of the lines are almost identical, just site after site repeated). I'm just using Caddy as a reverse proxy, a job Nginx previously did perfectly for me. I switched to Caddy because I've been moving the whole tech stack to Golang-based tools, and I like Caddy's automatic certificates feature.

{
  email     email@email.email
  http_port 80
  https_port 443
  admin localhost:2019
  storage file_system {
    root /root/.local/share/caddy
  }
}

(strip-www) {
	@www.{args.0} host www.{args.0}
	redir @www.{args.0} https://{args.0}{uri}
}

(add-www) {
	@{args.0} host {args.0}
	redir @{args.0} https://www.{args.0}{uri}
}

(cors) {
  @origin{args.0} header Origin {args.0}
  header @origin{args.0} Access-Control-Allow-Origin "{args.0}"
}

(default) {
  header -Server
}


(acl) {
  @acl {
    remote_ip WHITE_IP1/24
    remote_ip WHITE_IP2/24
    remote_ip WHITE_IP3/24
  }
}


DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}

DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
}

DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
}

DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
}

DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
}

DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}


DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
}
DOMAIN {
  import default
  redir https://DOMAIN{uri}
}

DOMAIN {
  import default
  redir https://DOMAIN{uri}
}

DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
}
DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}

DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}

DOMAIN,
DOMAIN {
  import default
  redir https://DOMAIN{uri}
}

DOMAIN {
  import default
  file_server {
    root /var/www/DOMAIN/
    index index.html
  }
  handle_errors {
    @404 {
      expression {http.error.status_code} == 404
    }

    rewrite @404 /index.html
    file_server {
      root /var/www/DOMAIN/
      index index.html
    }
  }

  route /PATH* {
    uri strip_prefix /PATH
    reverse_proxy http://IP:PORT
  }
}


DOMAIN {
  import default
  redir https://DOMAIN{uri}
}

DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}

DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}

DOMAIN,
DOMAIN,
DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}

DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
}

DOMAIN {
  import default
  import @acl

  import cors DOMAIN
  reverse_proxy https://IP:PORT {
    transport http {
      tls_insecure_skip_verify
    }
  }
}

DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
  basicauth {
    USERNAME JDJhJDE0JGQxSTUwZzA2RlZVY0NOY3F3Zk1ucy4waG9jWmMwbUZZdFZOMGJ6UnpJdFpmTmxEbVRRdmcy
  }
}

DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
}
DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
}


DOMAIN {
  import default
  redir https://DOMAIN{uri}
}

DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}

DOMAIN {
  import default
  redir https://DOMAIN{uri}
}
DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}

DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}
DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}

DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}

DOMAIN {
  import default
  reverse_proxy http://IP:PORT
}


DOMAIN {
  import default
  tls /etc/caddy/certs/DOMAIN/_wildcard_.DOMAIN.com_202012176495.crt.pem /etc/caddy/certs/DOMAIN/_wildcard_.DOMAIN.com_202012176495.key.pem
  redir https://www.DOMAIN.com{uri}
}
DOMAIN {
  import default
  tls /etc/caddy/certs/DOMAIN/_wildcard_.DOMAIN.com_202012176495.crt.pem /etc/caddy/certs/DOMAIN/_wildcard_.DOMAIN.com_202012176495.key.pem
  reverse_proxy http://IP:PORT {
    header_up X-Real-IP {remote_host}
    header_up X-Forwarded-For {remote_host}
  }
}
DOMAIN {
  import default
  tls /etc/caddy/certs/mathcollabo/_wildcard_.DOMAIN.com_202012176495.crt.pem /etc/caddy/certs/DOMAIN/_wildcard_.DOMAIN.com_202012176495.key.pem
  reverse_proxy http://IP:PORT
}


DOMAIN {
  import default
  import acl

  reverse_proxy @acl http://IP:PORT
}


## WEB
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN,
DOMAIN {
  import default
  import acl
  reverse_proxy @acl http://IP:PORT
}

Thanks. I think what Francis meant is: how can someone else reproduce the behavior?

FYI, localhost:2019 is the default address, so you can omit that from your config and your command. You can also omit -adapter caddyfile because the file is called Caddyfile, so the Caddyfile adapter will be used by default.
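So the reload should behave the same with just this shorter command (assuming the defaults mentioned above):

/usr/bin/caddy reload -config /etc/caddy/Caddyfile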

Can you explain more what this means? Why must you restart the server? What is timing out? What change are you making in the config when the problem occurs?

I just like to be explicit, so that someone who works on this later can see it and know what it does :sweat_smile:

Anyway, I really want to know where all these problems occur. From what I can tell from the log, maybe this comes from automatic certificate management issues or something. (Sorry for not posting all of this the first time; I was exhausted, posting this at 3 AM.)

If I change nothing in the config and reload, there seems to be no hang.

Apr  3 14:02:18 USER caddy[51915]: {"level":"info","ts":1617426138.3133948,"msg":"using provided configuration","config_file":"/etc/caddy/Caddyfile","config_adapter":"caddyfile"}
Apr  3 14:02:18 cemware caddy[51915]: {"level":"warn","ts":1617426138.326343,"msg":"input is not formatted with 'caddy fmt'","adapter":"caddyfile","file":"/etc/caddy/Caddyfile","line":2}
Apr  3 14:02:18 USER caddy[31459]: {"level":"info","ts":1617426138.3274755,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_addr":"127.0.0.1:39836","headers":{"Accept-Encoding":["gzip"],"Content-Length":["18144"],"Content-Type":["application/json"],"Origin":["localhost:2019"],"User-Agent":["Go-http-client/1.1"]}}
Apr  3 14:02:18 USER caddy[31459]: {"level":"info","ts":1617426138.349725,"logger":"admin.api","msg":"config is unchanged"}
Apr  3 14:02:18 cemware caddy[31459]: {"level":"info","ts":1617426138.3499577,"logger":"admin.api","msg":"load complete"}

But if I change the config and reload, the log looks like this:

Apr  3 14:04:00 USER caddy[52837]: {"level":"info","ts":1617426240.3973637,"msg":"using provided configuration","config_file":"/etc/caddy/Caddyfile","config_adapter":"caddyfile"}
Apr  3 14:04:00 USER caddy[52837]: {"level":"warn","ts":1617426240.4071853,"msg":"input is not formatted with 'caddy fmt'","adapter":"caddyfile","file":"/etc/caddy/Caddyfile","line":2}                                              
Apr  3 14:04:00 USER caddy[31459]: {"level":"info","ts":1617426240.4079936,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_addr":"127.0.0.1:40906","headers":{"Accept-Encoding":["gzip"],"Content-Length":["18144"],"Content-Type":["application/json"],"Origin":["localhost:2019"],"User-Agent":["Go-http-client/1.1"]}}
Apr  3 14:04:00 USER caddy[31459]: {"level":"info","ts":1617426240.411902,"logger":"admin","msg":"admin endpoint started","address":"tcp/localhost:2019","enforce_origin":false,"origins":["localhost:2019","[::1]:2019","127.0.0.1:2019"]}                                                                                                                                                                                                                                    
Apr  3 14:04:00 USER caddy[31459]: {"level":"info","ts":1617426240.4121284,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc002b944d0"}
Apr  3 14:04:00 USER caddy[31459]: {"level":"info","ts":1617426240.4825945,"logger":"http","msg":"skipping automatic certificate management because one or more matching certificates are already loaded","domain":"<DOMAIN_WITH_MANUALLY_BOUGHT_CERTIFICATES>","server_name":"srv0"}
Apr  3 14:04:00 USER caddy[31459]: {"level":"info","ts":1617426240.4826496,"logger":"http","msg":"skipping automatic certificate management because one or more matching certificates are already loaded","domain":"<DOMAIN_WITH_MANUALLY_BOUGHT_CERTIFICATES>","server_name":"srv0"}
Apr  3 14:04:00 USER caddy[31459]: {"level":"info","ts":1617426240.4826708,"logger":"http","msg":"skipping automatic certificate management because one or more matching certificates are already loaded","domain":"<DOMAIN_WITH_MANUALLY_BOUGHT_CERTIFICATES>","server_name":"srv0"}
Apr  3 14:04:00 USER caddy[31459]: {"level":"info","ts":1617426240.482713,"logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}                                                                      
Apr  3 14:04:02 USER caddy[31459]: {"level":"info","ts":1617426242.644434,"logger":"http","msg":"enabling automatic TLS certificate management","domains":["DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN", "DOMAIN"]}
Apr  3 14:04:03 USER caddy[31459]: {"level":"info","ts":1617426243.663937,"logger":"tls","msg":"cleaned up storage units"}                                                                                                            
Apr  3 14:04:10 USER caddy[31459]: {"level":"error","ts":1617426250.4122725,"logger":"admin","msg":"stopping current admin endpoint","error":"shutting down admin server: context deadline exceeded"}
Apr  3 14:07:00 USER systemd[1]: caddy.service: Reload operation timed out. Killing reload process.

So IMHO, maybe when automatic TLS certificate management is enabled, because there are so many domains, Caddy tries to check them all (even though they do not need to be renewed), takes a long time, and times out...? Just thinking out loud.

A bit of an update.

I reloaded manually and waited, and found it took about 10 minutes or more to finish, which is why systemd considered it timed out.

I tested on-demand TLS, but the situation does not seem to change either.

Maybe I should change the title from 'hangs on reload' to 'takes too much time to reload after config changes'.

But really, for me it takes 10 minutes at minimum, and just now it took 30 minutes :sob: Have you seen any cases like this? I just want to know whether this is my fault and can hopefully be fixed with some config. I really love Caddy's features, but this is very embarrassing :sob:

Regardless of on-demand TLS (which works perfectly, really awesome), it still takes a very long time to reload after config changes.

What can I do to reproduce the behavior you're seeing, @kesuskim? I'll need specific configs and commands to run (preferably without systemd, docker, etc – just use the CLI and/or API, curl can be helpful here).
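For example, something along these lines should exercise the same reload path without systemd (a sketch, assuming the default admin address and that the admin API accepts the Caddyfile adapter via the Content-Type header):

# push the Caddyfile to the running instance through the admin API and time it
time curl -X POST "http://localhost:2019/load" -H "Content-Type: text/caddyfile" --data-binary @/etc/caddy/Caddyfile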

Hello @matt :smiley:

I use systemd, but I confirmed this happens regardless of whether systemd is involved.

1. config (Caddyfile)

As for the configuration, as I mentioned in Hangs on reload - #3 by kesuskim, there are lots of sites. Here is the updated version with on-demand TLS.

{
#  debug
  email     email@email.com
  http_port 80
  https_port 443
  admin localhost:2019
  storage file_system {
    root /root/.local/share/caddy
  }

  on_demand_tls {
    interval 10s
    burst 10
  }
}

(strip-www) {
	@www.{args.0} host www.{args.0}
	redir @www.{args.0} https://{args.0}{uri}
}

(add-www) {
	@{args.0} host {args.0}
	redir @{args.0} https://www.{args.0}{uri}
}

(cors) {
  @origin{args.0} header Origin {args.0}
  header @origin{args.0} Access-Control-Allow-Origin "{args.0}"
}

(default) {
  header -Server
}

(log) {
  log {
    output /var/log/caddy.log
  }
}

(acl) {
  @acl {
    remote_ip INTERNAL_IP_1/24
    remote_ip INTERNAL_IP_2/24
    remote_ip INTERNAL_IP_3/24
  }
}

(autossl) {
  tls {
    on_demand
  }
}


DOMAIN1 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}

DOMAIN2 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}

DOMAIN3 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}

DOMAIN4 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}

DOMAIN5 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}

DOMAIN6 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}


DOMAIN7 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}
DOMAIN8 {
  import default
  import autossl
  redir https://DOMAIN11{uri}
}

DOMAIN9 {
  import default
  import autossl
  redir https://DOMAIN12{uri}
}

DOMAIN10 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}
DOMAIN11 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}

DOMAIN12 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}

DOMAIN13,
DOMAIN14 {
  import default
  import autossl
  redir https://DOMAIN15{uri}
}

DOMAIN15 {
  import default
  import autossl
  #  import acl
  #  reverse_proxy @acl http://TARGET_HOST
  file_server {
    root /var/www/DOMAIN15/
    index index.html
  }
  handle_errors {
    @404 {
      expression {http.error.status_code} == 404
    }

    rewrite @404 /index.html
    file_server {
      root /var/www/DOMAIN15/
      index index.html
    }
  }

  route /api* {
    uri strip_prefix /api
    reverse_proxy http://TARGET_HOST
  }
}

DOMAIN16 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}

DOMAIN17 {
  import default
  import autossl
  redir https://DOMAIN18{uri}
}

DOMAIN18 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}

DOMAIN19 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}

DOMAIN20,
DOMAIN21,
DOMAIN22 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}

DOMAIN23 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}

DOMAIN24 {
  import default
  import autossl

  import cors DOMAIN23
  reverse_proxy https://TARGET_HOST {
    transport http {
      tls_insecure_skip_verify
    }
  }
}

DOMAIN25 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
  basicauth {
    username JDJhJDE0JGQxSTUwZzA2RlZVY0NOY3F3Zk1ucy4waG9jWmMwbUZZdFZOMGJ6UnpJdFpmTmxEbVRRdmcy
  }
}

DOMAIN26 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}
DOMAIN27 {
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}


DOMAIN28 {
  import default
  import autossl
  redir https://DOMAIN29{uri}
}

DOMAIN29 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}

DOMAIN30 {
  import default
  import autossl
  redir https://DOMAIN31{uri}
}
DOMAIN31 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}

DOMAIN32 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}
DOMAIN33 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}

DOMAIN34 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}

DOMAIN35 {
  import default
  import autossl
  reverse_proxy http://TARGET_HOST
}


DOMAIN36 {
  import default
  tls /etc/caddy/certs/DOMAIN36/_wildcard_.DOMAIN36.com_202012176495.crt.pem /etc/caddy/certs/DOMAIN36/_wildcard_.DOMAIN36.com_202012176495.key.pem
  redir https://DOMAIN37{uri}
}
DOMAIN37 {
  import default
  tls /etc/caddy/certs/DOMAIN36/_wildcard_.DOMAIN36.com_202012176495.crt.pem /etc/caddy/certs/DOMAIN36/_wildcard_.DOMAIN36.com_202012176495.key.pem
  reverse_proxy TARGET_HOST {
    header_up X-Real-IP {remote_host}
    header_up X-Forwarded-For {remote_host}
  }
}
DOMAIN38 {
  import default
  tls /etc/caddy/certs/mathcollabo/_wildcard_.mathcollabo.com_202012176495.crt.pem /etc/caddy/certs/mathcollabo/_wildcard_.mathcollabo.com_202012176495.key.pem
  reverse_proxy http://TARGET_HOST
}


DOMAIN39 {
  import default
  import autossl
  import acl

  reverse_proxy @acl http://TARGET_HOST
}

DOMAIN40 {
  import default
  import autossl
  import acl

  reverse_proxy @acl http://TARGET_HOST
}

DOMAIN41 {
  import default
  import autossl
  import acl

  reverse_proxy @acl http://TARGET_HOST
}

DOMAIN42 {
  import default
  import autossl
  import acl

  reverse_proxy @acl http://TARGET_HOST
}
DOMAIN43 {
  import default
  import autossl
  import acl

  reverse_proxy @acl http://TARGET_HOST
}

DOMAIN44 {
  import default
  import autossl

  reverse_proxy http://TARGET_HOST
}
DOMAIN45 {
  import default
  import autossl

  reverse_proxy http://TARGET_HOST
}
DOMAIN46 {
  import default
  import autossl

  reverse_proxy http://TARGET_HOST
}
DOMAIN47 {
  import default
  import autossl

  reverse_proxy http://TARGET_HOST
}


## WEB
DOMAIN48,
DOMAIN49,
DOMAIN50,
DOMAIN51,
DOMAIN52,
DOMAIN53,
DOMAIN54,
DOMAIN55,
DOMAIN56,
DOMAIN57,
DOMAIN58,
DOMAIN59,
DOMAIN60,
DOMAIN61,
DOMAIN62,
DOMAIN63
{
  import default
  import autossl
  import acl
  reverse_proxy @acl http://TARGET_HOST
}

So there are 63 domains in total, almost all reverse-proxied; some are redirected to another domain (non-www to www, and the .co domain to .com).

Some are publicly open, and some are privately whitelisted with the remote_ip matcher via the custom acl matcher snippet. We use Caddy mainly for company-internal test web servers, some staging-level services, and some small production-level services.

2. runtime OS

Ubuntu 18.04.4 LTS bionic

3. How I run Caddy

I start Caddy with the following command. I was previously using systemd, but I confirmed the same thing happens without systemd.

/usr/bin/caddy start -config /etc/caddy/Caddyfile

4. Where the problem occurs

After I run Caddy, it works perfectly. (I frequently see errors like "logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","error":"write tcp 1.2.3.4:443->5.6.7.8:41510: write: connection timed out", but they seem to have no effect on the system itself, so I ignored them.)

But after I change some config (let's say I add another domain to the Caddyfile, or change DOMAIN26 from public to private with the custom acl matcher) and reload with the following command,

/usr/bin/caddy reload -config /etc/caddy/Caddyfile -adapter caddyfile -address localhost:2019
# or simply just with...
/usr/bin/caddy reload -config /etc/caddy/Caddyfile
Apr  6 10:19:37 USER caddy[31459]: {"level":"info","ts":1617671977.8664992,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_addr":"127.0.0.1:59992","headers":{"Accept-Encoding":["gzip"],"Content-Length":["18831"],"Content-Type":["application/json"],"Origin":["localhost:2019"],"User-Agent":["Go-http-client/1.1"]}}
Apr  6 10:19:37 USER caddy[31459]: {"level":"info","ts":1617671977.8923666,"logger":"admin","msg":"admin endpoint started","address":"tcp/localhost:2019","enforce_origin":false,"origins":["localhost:2019","[::1]:2019","127.0.0.1:2019"]}
Apr  6 10:19:37 USER caddy[31459]: {"level":"info","ts":1617671977.892636,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc000351ce0"}
Apr  6 10:19:37 USER caddy[31459]: {"level":"info","ts":1617671977.913508,"logger":"http","msg":"skipping automatic certificate management because one or more matching certificates are already
loaded","domain":"DOMAIN36","server_name":"srv0"}
Apr  6 10:19:37 USER caddy[31459]: {"level":"info","ts":1617671977.9136827,"logger":"http","msg":"skipping automatic certificate management because one or more matching certificates are already loaded","domain":"DOMAIN36","server_name":"srv0"}
Apr  6 10:19:37 USER caddy[31459]: {"level":"info","ts":1617671977.9137046,"logger":"http","msg":"skipping automatic certificate management because one or more matching certificates are already loaded","domain":"DOMAIN37","server_name":"srv0"}
Apr  6 10:19:37 USER caddy[31459]: {"level":"info","ts":1617671977.9137492,"logger":"http","msg":"skipping automatic certificate management because one or more matching certificates are already loaded","domain":"DOMAIN38","server_name":"srv0"}
Apr  6 10:19:37 USER caddy[31459]: {"level":"info","ts":1617671977.9137578,"logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
Apr  6 10:19:40 USER caddy[31459]: {"level":"info","ts":1617671980.0855713,"logger":"http","msg":"enabling automatic TLS certificate management","domains":["DOMAIN1","DOMAIN2","DOMAIN3","DOMAIN4","DOMAIN5","DOMAIN6","DOMAIN7","DOMAIN8","DOMAIN9","DOMAIN10","DOMAIN11","DOMAIN12","DOMAIN13","DOMAIN14","DOMAIN15","DOMAIN16","DOMAIN17","DOMAIN18","DOMAIN19","DOMAIN20","DOMAIN21","DOMAIN22","DOMAIN23","DOMAIN24","DOMAIN25","DOMAIN26","DOMAIN27","DOMAIN28","DOMAIN29","DOMAIN30","DOMAIN31","DOMAIN32","DOMAIN33","DOMAIN34","DOMAIN35","DOMAIN39","DOMAIN40","DOMAIN41","DOMAIN42","DOMAIN43","DOMAIN44","DOMAIN45","DOMAIN46","DOMAIN47","DOMAIN48","DOMAIN49","DOMAIN50","DOMAIN51","DOMAIN52","DOMAIN53","DOMAIN54","DOMAIN55","DOMAIN56","DOMAIN57","DOMAIN58","DOMAIN59","DOMAIN60","DOMAIN61","DOMAIN62","DOMAIN63",]}
Apr  6 10:19:47 cemware caddy[31459]: {"level":"error","ts":1617671987.8927,"logger":"admin","msg":"stopping current admin endpoint","error":"shutting down admin server: context deadline exceeded"}

It takes more than 10 minutes, often 20 or 30 minutes, even for simple config changes.

The timeout occurs at the systemd level because the reload takes so long that systemd considers it timed out, but if I trigger the reload manually, it does work eventually, after a very long time (20~30 minutes).

I just tested this: if I reload, the reload seems to start right away, but the admin API waits a very long time (quoted above), and the reload does not end until it returns (after 20~30 minutes).

If I manually kill the reload process (caddy reload -config /etc/caddy/Caddyfile) with SIGINT (Ctrl+C) and try again, that does not work either.

Sorry to leave you hanging like your web server, just been trying to catch up on things...

Thank you matt :slight_smile:
I've tried to minimize reloads by buffering up changes and reloading in batches, so I can live with it for now...
It would be fantastic if this could be solved :smiley:

To be clear, the logs you posted above this comment only span 10 seconds, not ten minutes.

Anyway, I've taken your config and run it locally, and reloaded it a ton of times. It consistently takes just a second (most of that time is spent setting up HTTP handler routes, specifically this section of code right here). But it doesn't grow. (Edit: @francislavoie pointed out that the 1s lag happens because of the basicauth directive, it spends some time in the bcrypt algorithm. But it doesn't grow as you add more.)

You'll need to be more specific. What should I do, step-by-step, to reproduce the behavior? Be very specific about config changes, commands to run, when to observe slowness, etc, from beginning to end.

Matt was digging deeper and spotted that the most time was actually spent in the blowfish package (golang.org/x/crypto/blowfish on pkg.go.dev), which is the underlying cipher used for bcrypt, i.e. the password hashing algorithm.

Removing the basicauth block seemed to have sped it up. So it seems that the problem lies with basicauth, somewhere in provisioning.

How many user-password pairs do you actually have in your config? I could guess that you have more than one and that you omitted the rest, replacing it with just a stub?

Anyways, try removing basicauth and see if it reloads fast.

(I am not really convinced that the slowdown from basicauth grows, though. I suspect there's something else.)

A profile and/or trace would help us know.
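(Side note: the value basicauth takes is a base64-encoded bcrypt hash, so you can peek at the cost factor of the one in your config with something like this, assuming a standard Linux shell:

echo 'JDJhJDE0JGQxSTUwZzA2RlZVY0NOY3F3Zk1ucy4waG9jWmMwbUZZdFZOMGJ6UnpJdFpmTmxEbVRRdmcy' | base64 -d | head -c 7

That prints a $2a$NN$ prefix; NN is the bcrypt cost, and each extra point roughly doubles how long a single hash takes to compute.)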

@kesuskim Hereā€™s what to do:

Run your web server. Then open http://localhost:2019/debug/pprof in your browser. You'll see a few debugging options. We're interested in a profile. Profiles run over time and then the browser will download the results. You can make one by navigating the browser to http://localhost:2019/debug/pprof/profile?debug=1&seconds=N (where N is a number of seconds). While the browser is hanging on that request, perform a reload of your config that is slow. The goal is to capture the entire reload during the profile. So, adjust N to be just a little longer than the time it takes to reload the config. When N seconds are up, the browser will download your profile.

Then post the profile here.

Does that make sense?

(You can also do the same thing for the trace endpoint. But definitely do it for the profile.)
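Roughly, from a shell it could look like this (a sketch; adjust the seconds value so the profile covers the whole reload):

# terminal 1: capture a CPU profile (here ~25 minutes)
curl -o caddy.prof "http://localhost:2019/debug/pprof/profile?seconds=1500"
# terminal 2: while that request is running, trigger the slow reload
/usr/bin/caddy reload -config /etc/caddy/Caddyfile
# then inspect the downloaded profile
go tool pprof caddy.prof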


Hello francislavoie :slight_smile:
Sadly, there is only one basicauth block; I removed it and tested, but the situation is the same.

Hello matt :slight_smile:
I did what you said and got a profile, but I could not upload it here (only images are allowed, it seems?). I extracted a GIF with go tool pprof profile_file, and it is below.


If there is any way I can upload the profile file, I will upload it ASAP.

To add a bit of information about the profile:

I took a 30-minute profile (as shown in the GIF), but the reload finished within 20 minutes (maybe about 18 minutes). This time varies by situation :sob: I have seen it take more than 30 minutes when many users are accessing the Caddy web server.

You could upload the profile to pastebin or a github gist maybe. Anywhere is fine.

I'm not gleaning much from that image, but it looks like the proxy is slow to write, seems like due to TLS encryption? Are you writing very big responses frequently? Does your machine have hardware acceleration for AES?

Either way I don't see how that relates to the reloads being slow.
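If you want to check for hardware AES support, on a typical x86_64 Linux machine something like this should print aes when the CPU has AES-NI (and nothing otherwise):

grep -m1 -o aes /proc/cpuinfo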