Getting CONNECT_CR_SRVR_HELLO errors

1. Caddy version (caddy version):

v2.2.1 h1:Q62GWHMtztnvyRU+KPOpw6fNfeCD3SkwH7SfT1Tgt2c=

2. How I run Caddy:

a. System environment:

Ubuntu 18.04
systemd

b. Command:

paste command here

c. Service/unit/compose file:

[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target network-online.target
Requires=network-online.target

[Service]
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

d. My complete Caddyfile or JSON config:

{
    email letsencrypt@communigator.co.uk
    storage file_system {
        root /media/caddyshare/certsV2
    }
}

(defaults) {
    reverse_proxy 10.117.1.71:80 10.117.1.72:80 {
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-For {remote_host}
    } 

    header T-Caddyhead "05"
    log {
        output file /var/log/caddy/process.log
    }
}

:443 {
    import defaults
    tls {
        on_demand
    }
}
########################
:80 {
    import defaults
}
########################
*.aeml1.com,
*.aeml1.co.uk,
*.aeml2.co.uk,
*.aeml3.co.uk,
*.ceml3.co.uk,
*.ceml4.co.uk,
*.cgml1.com,
*.cgml2.com,
*.communigatormail1.co.uk,
*.ctml1.com,
*.ctml2.com,
*.geml1.co.uk,
*.geml2.co.uk,
*.gtml1.com,
*.gtml2.com,
*.gtml3.com,
*.gtml4.com,
*.sgml1.com,
*.sgml1.co.uk,
*.sgml2.com,
*.sgml2.co.uk,
*.sgml3.com,
*.sgml3.co.uk,
*.tgml1.co.uk,
*.tgml2.co.uk,
*.tgml3.co.uk {
    import defaults
    tls {
        dns cloudflare [REDACTED ID]
    }
}
########################
http://*.aeml1.com,
http://*.aeml1.co.uk,
http://*.aeml2.co.uk,
http://*.aeml3.co.uk,
http://*.ceml3.co.uk,
http://*.ceml4.co.uk,
http://*.cgml1.com,
http://*.cgml2.com,
http://*.communigatormail1.co.uk,
http://*.ctml1.com,
http://*.ctml2.com,
http://*.geml1.co.uk,
http://*.geml2.co.uk,
http://*.gtml1.com,
http://*.gtml2.com,
http://*.gtml3.com,
http://*.gtml4.com,
http://*.sgml1.com,
http://*.sgml1.co.uk,
http://*.sgml2.com,
http://*.sgml2.co.uk,
http://*.sgml3.com,
http://*.sgml3.co.uk,
http://*.tgml1.co.uk,
http://*.tgml2.co.uk,
http://*.tgml3.co.uk {
    import defaults
}
########################
*.communigator.co.uk {
    import defaults
    tls /media/caddyshare/certs/static/communigator.co.uk.pem /media/caddyshare/certs/static/communigator.co.uk.key
}
########################
*.wowanalytics.co.uk {
    import defaults
    tls /media/caddyshare/certs/static/wowanalytics.co.uk.pem /media/caddyshare/certs/static/wowanalytics.co.uk.key
}
#######################
*.gatorleads.co.uk {
    import defaults
    tls /media/caddyshare/certs/static/gatorleads.co.uk.pem /media/caddyshare/certs/static/gatorleads.co.uk.key
}
#######################
nagios.communigator.co.uk:443 {
  reverse_proxy 10.117.4.20:80
  tls /media/caddyshare/certs/static/communigator.co.uk.pem /media/caddyshare/certs/static/communigator.co.uk.key
}
###################




3. The problem I’m having:

I have 2 environments - Live and Test.
Both run 2 CaddyServers using the same Certs folder for redundancy.
Everything works for static certs and wildcards (although I have another issue here, where sub-subdomains are requesting certificates via on_demand and causing rate limits, but this is separate and I am trying to block these another way, as almost all of them are illegitimate requests).

The issue I seem to be having is that when 2 servers are active, some on_demand domains only work on one server, i.e. any traffic going to one of the servers works whilst any traffic going to the other fails.

I am working around this problem on Live by only running one active server, but I would like to get both in play and understand why this is happening.

4. Error messages and/or full log output:

Syslog:
Oct 16 17:57:47 ca-proxy05 caddy[30642]: {"level":"info","ts":1602871067.8974483,"logger":"tls.on_demand","msg":"obtaining new certificate","server_name":"news-afigroup.co.uk"}

Process.log shows nothing

5. What I already tried:

I have attempted to set default_sni in the global options (after first trying to set it in the defaults snippet and realising that was the wrong place), though I wasn’t sure what value to use: we have A LOT of domains to serve and they change often, so I cannot be specific for each domain.

I also spent some time on the sub-subdomain issue, as I thought it might be the cause, including changing the service to use the staging environment, but I think that was a red herring.

6. Links to relevant resources:

The ask global option is the solution for this.
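
For reference, here is a minimal sketch of what that might look like, added to the global options block of the Caddyfile above. The endpoint URL is a placeholder and an assumption on my part (nothing in this thread defines it); it would be a small internal service of yours that knows which domains are legitimate.

```
{
    email letsencrypt@communigator.co.uk
    storage file_system {
        root /media/caddyshare/certsV2
    }
    on_demand_tls {
        # Hypothetical endpoint; replace with your own service.
        ask http://localhost:9123/allowed
    }
}
```

As I understand it, Caddy sends a GET request to that URL with a ?domain= query parameter holding the requested name, and only goes ahead with issuance if the endpoint answers with a success status. If the endpoint rejects unknown names, that should also stop the illegitimate sub-subdomain requests from burning through rate limits.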

What are you using to sync the filesystem between servers?

Thanks for the swift response.

Re: Ask global solution - I thought this might be the way to go. I will try setting it up.

I am not syncing the filesystem; they are both mounting the same folder. Is this not the way to do it?

That should be fine - are you saying both of them are on the same machine? Or is it a network share?

They are mounting a Windows share using fstab.

Hi,
Actually, I now think this has nothing to do with the shared file system. I have removed the second server and changed the firewall config from load balancing between 2 IPs to a straightforward 1:1 NAT.
With that done, the system is simply not renewing certificates as they come up for expiry.
All the log says is this:
Oct 20 18:15:32 ca-proxy04 caddy[5210]: {"level":"info","ts":1603217732.6481144,"logger":"tls.on_demand","msg":"obtaining new certificate","server_name":"globalknowledgetraining.com"}

And, I THINK unrelated, I also see these error messages:
Oct 20 18:14:45 ca-proxy04 caddy[5210]: {"level":"error","ts":1603217685.1283507,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","error":"http2: stream closed"}

Since I have been having these issues, I have added the ask option to the global settings in my Caddyfile. But as I say, that was after these issues started, so I do not believe it is related.

Any help gratefully received.

Many thanks
Chris

Are you running on Amazon?

A similar issue is reported here:

Can you verify if yours matches that description completely?

So far I am not convinced it is a bug in Caddy, but I am open to further evidence that narrows it down independently of platform and makes it easily reproducible.