How to configure Cloudflare?

There’s a lot of “context canceled” noise in there, but that’s from the config being reloaded: when that happens, anything currently in progress gets cancelled so that the new config can take effect. We don’t want to keep doing things the old config asked for, because they could now be wrong given that there’s a new config.

{"level":"info","ts":1660609070.7409632,"logger":"tls.issuance.acme.acme_client","msg":"successfully downloaded available certificate chains","count":2,"first_url":"https://acme-v02.api.letsencrypt.org/acme/cert/04fa2cb56e03a77cb69b61017867d47068e8"}
{"level":"info","ts":1660609070.7412431,"logger":"tls.obtain","msg":"certificate obtained successfully","identifier":"whoami.mysmarthome.network"}

It looks like it worked, eventually. I can hit your domain now and I get a response.


Oh, no. It works. I suppose I must have been a tad impatient. :innocent: Before switching to this caddy-docker-proxy plugin, it was much quicker, and I assumed it should be similar. Is it configurable? It sounds like it is itself getting impatient and retrying before the previous attempt has had a chance to take effect?

BTW, extremely grateful for your assistance and patience. I’ve dabbled with some other proxies before Caddy. E.g. Traefik was terribly complicated in this scenario, and I posted on 7-8 forums, Discords, and subreddits without getting any help. Caddy is much more sensible and simple, and it has been a pleasure reading and posting here. Thank you for that.


I think you just got unlucky, frankly. But also some “frequent” docker container up/downs may have killed off some of the cert issuance attempts that would have otherwise succeeded. Typically issuance takes just a few seconds (like 3-10s) and anything longer is very unusual.

From your logs, no, I don’t think that was the case at all. Retries were all after failing the previous try.

:blush:


I hate to reopen this, but I am still having troubles. Something is failing with the caddy-docker-proxy plugin that does not fail when running with a static config. It’s not taking seconds, it’s taking many minutes, and certs are not registered or downloaded at all.

The day I posted here last, I left Caddy and whoami running continually overnight. The next morning I tried launching wikijs: I just added the two lines of config to the docker-compose file I had previously used with Caddy (without the plugin) and launched it, but after at least 20 minutes or so it still hadn’t come up. While it was still running, I looked on Cloudflare and saw the DNS entries, deleted them, and then it came up, but that was probably luck because I can’t repeat that scenario. I launched a cat demo container, and like whoami, it took maybe 5-10 minutes or so before I could access it on my network. It also left the DNS entries on Cloudflare. Sometimes domains come up and sometimes they never do. Certs that have already been registered will sometimes re-download and sometimes not.

I’ve tried recreating my Cloudflare access token and that didn’t change anything.

I created an account on ZeroSSL. I haven’t really looked to see what other config is needed, but I noticed that a couple of domains did register there; when run again, though, it won’t download those certs or create new ones.

I ran these same containers manually with Caddy before using the plugin and they came right up. And I just ran them again manually without the plugin: they pull the cert and are up as quickly as I can type in the URL.

Maybe I should post something on the GitHub repo for caddy-docker-proxy? Just following up here in case you might see something else I’m missing. Otherwise, just close this and I’ll move my query over to the plugin’s GitHub.

Full logs:
https://zerobin.net/?502c321b3dc65965#hzMBiSM7ziWinwD6kxwfENvV918/1NoY734HNq/xARI=

What’s your Caddy config? Do you only have the one domain whoami.mysmarthome.network?

Also, please don’t redact your domains. It makes it exceptionally difficult for us to help you, and in many cases people make errors when redacting, making the logs and other reports unreliable.

My domain is mysmarthome.network and I’m trying to create the subdomains whoami.mysmarthome.network, wiki.mysmarthome.network, etc. Sorry for redacting that in the earlier posts: on other forums, subreddits, etc., I had read many posts saying it’s bad to post your domain for everyone to see.

Here are my configs:

Dockerfile

ARG CADDY_VERSION=2.5.2
FROM caddy:${CADDY_VERSION}-builder AS builder

RUN xcaddy build \
    --with github.com/lucaslorentz/caddy-docker-proxy/plugin \
    --with github.com/caddy-dns/cloudflare

FROM caddy:${CADDY_VERSION}-alpine

COPY --from=builder /usr/bin/caddy /usr/bin/caddy

CMD ["caddy", "docker-proxy"]

Built with: docker build -t caddy-docker-proxy--mysmarthome_network .

docker-compose.yml for caddy

version: "3.7"

services:
  caddy:
    image: caddy-docker-proxy--mysmarthome_network:latest
    ports:
      - "80:80"
      - "443:443"
    networks:
      - proxy
    env_file: .env
    environment:
      - CADDY_INGRESS_NETWORKS=proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /docker-services/caddy/data:/data
      - /docker-services/caddy/config:/config
      - /docker-services/caddy/logs:/var/log/caddy
    deploy:
      labels:
        caddy.debug:
        caddy.log.output: file /var/log/caddy/caddy.log
        caddy.acme_dns: "cloudflare {env.CF_API_TOKEN}"
        caddy.email: "{env.EMAIL}"
      placement:
        constraints:
          - node.role == manager
      replicas: 1
      resources:
        reservations:
          cpus: "0.1"
          memory: 200M
      restart_policy:
        condition: any

networks:
  proxy:
    external: true

Deployed with: docker stack deploy -c docker-compose.yml caddy

A couple of test containers:

docker-compose.yml for whoami

version: "3.7"

services:

  whoami:
    image: jwilder/whoami
    networks:
      - proxy
    deploy:
      labels:
        caddy: whoami.mysmarthome.network
        caddy.reverse_proxy: "{{upstreams 8000}}"

networks:
  proxy:
    external: true

docker-compose.yml for wikijs

version: "3.7"

services:
  wikijs:
    image: lscr.io/linuxserver/wikijs:latest
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/New_York
    volumes:
      - /docker-services/wikijs/config:/config
      - /docker-services/wikijs/data:/data
    ports:
      - 3000:3000
    networks:
      - proxy
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.role == manager]
      labels:
        caddy: wiki.mysmarthome.network
        caddy.reverse_proxy: "{{upstreams 3000}}"

networks:
  proxy:
    external: true

Thanks to @francislavoie for also pointing out to me that caddy-docker-proxy spits out the config to the logs :man_facepalming:

Appreciate all the useful info.

Do you have any wildcard domains you’re using / getting certs for? It doesn’t seem like it from this, but I want to check because I recently fixed a bug where getting certs for a wildcard domain and a subdomain at the same time could cause a timeout: go.mod: Upgrade CertMagic and acmez · caddyserver/caddy@63c7720 · GitHub

I’d be curious whether the error still occurs if you upgrade to the latest commits on master.

But I don’t think it would actually affect your use case. (Btw, for any more subdomains I’d just recommend using a wildcard cert, since Let’s Encrypt rate limits subdomains quite heavily.)
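If you do go the wildcard route later, here’s a minimal sketch of what a shared wildcard site can look like with caddy-docker-proxy labels. The caddy, caddy.tls.dns, and reverse_proxy labels mirror your existing ones; the @whoami matcher and handle keys are my reading of CDP’s label-to-Caddyfile mapping, so double-check them against the CDP README:

services:
  whoami:
    image: jwilder/whoami
    networks:
      - proxy
    deploy:
      labels:
        # One wildcard site; CDP merges labels from every container that
        # declares the same site address, so each app adds its own handle.
        caddy: "*.mysmarthome.network"
        caddy.tls.dns: "cloudflare {env.CF_API_TOKEN}"
        # Named matcher + handle to route this hostname to this container
        # (label keys are an assumption based on CDP's mapping rules).
        caddy.@whoami.host: whoami.mysmarthome.network
        caddy.handle: "@whoami"
        caddy.handle.reverse_proxy: "{{upstreams 8000}}"

With a single *.mysmarthome.network cert, spinning up a new subdomain doesn’t trigger a new issuance at all, which both avoids the wait and stays well clear of Let’s Encrypt’s per-domain rate limits.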

Oh, I see the problem now.

deleting temporary record for zone mysmarthome.network.: Delete "https://api.cloudflare.com/client/v4/zones/4c38d8a7f0598c0cc42ab69d65eeb822/dns_records/8a21f345e0b0bd0d4fd7136d8cf185e3\": context canceled

Basically, the config is unloaded and it tries to clean up the challenge but we use the same context that was cancelled by the config reload, so the underlying HTTP client aborts the request to clean up the DNS record.

I mean, this should be the right thing to do to avoid leaking resources, in theory. But maybe we need to use a context that isn’t tied to the config and use one with a timeout instead. I’ll look into this. Francis and I are chatting about this in Slack.

Thanks for looking into this. If it matters, I’ve got a lot of certs, including wildcards.

https://search.censys.io/certificates?q=mysmarthome.network
https://crt.sh/?q=mysmarthome.network

Many of those were created while trying different proxies: npm, traefik, etc. Especially the wildcards. I haven’t learned about wildcards in Caddy yet. And I don’t know enough about certs to understand why multiple copies are issued instead of fetching the existing ones. I’ve still got a ton to learn about all the under-the-hood stuff; right now I’m just trying to get something up and running.


I’m not sure whether caddy-docker-proxy affects wildcards, but if I were you I’d build Caddy from the latest master, since it has a likely-relevant patch and you’d benefit from it if there are indeed wildcard certs in play.

To do that, you can just add another parameter after xcaddy build to specify the git tag/branch/ref, e.g. xcaddy build master --with <etc>. You can just put master there, or better, pin the commit hash of the current master.


No luck yet. A bit more info:

Here’s the Dockerfile:

ARG CADDY_VERSION=2.5.2
FROM caddy:${CADDY_VERSION}-builder AS builder

RUN xcaddy build c7772588bd44ceffcc0ba4817e4d43c826675379 \
    --with github.com/lucaslorentz/caddy-docker-proxy/plugin \
    --with github.com/caddy-dns/cloudflare

FROM caddy:${CADDY_VERSION}-alpine

COPY --from=builder /usr/bin/caddy /usr/bin/caddy

CMD ["caddy", "docker-proxy"]

Launched the new image. Launched whoami. Same issue: takes a while, but finally resolves. Ditto on wikijs. I copied wikijs to a new config and launched it with a subdomain I had not used before, and it seems to behave the same.

For each of the three containers/subdomains launched, I observed that it created two DNS entries (_acme-challenge). It eventually deleted the first one and left the second. It creates two certs for each subdomain according to:

https://crt.sh/?q=mysmarthome.network

Here are the new logs:
https://zerobin.net/?4db958d996d77a47#hTJgq0CPD2FhDjkb5dFoeigvrapp6g8+CkU8IL3SKPA=


Darn. So, this only happens when the config changes, right? If you just set up the one (whoami) and leave it running, it works successfully? (Starting from a clean slate, i.e. no lingering DNS records, etc.)

If so, I might see about patching Caddy/CertMagic today so that it will always clean up even if the context is cancelled.

Filed a bug here: Use different context for challenge cleanup · Issue #200 · caddyserver/certmagic · GitHub

@simsrw73 Maybe fixed here: Use different context for DNS challenge cleanup · caddyserver/certmagic@76f61c2 · GitHub

Now that you’re a pro at deploying a custom version of Caddy :smile: how about if you try this commit: go.mod: Upgrade CertMagic to v0.16.3 · caddyserver/caddy@fe5f5df · GitHub (or use master again)

Thank you for helping troubleshoot!

Good news: it is cleaning up the DNS entries perfectly. I watched very closely and there were never multiple challenge entries, and they were cleaned up after the cert was issued… when a cert was issued…

Launched whoami first. It took ~6-7 mins, about 3 iterations of challenges, counting by each new DNS challenge entry created and removed. It was issued by ZeroSSL; after many runs, this is only the 3rd time one was issued from ZeroSSL instead of LE.

Launched wiki1 (the WikiJS container renamed to a subdomain I haven’t used before). Same as above: about 6-7 mins, about 3 iterations. Issued by ZeroSSL(?). I don’t have a paid account there (yet) and I thought it was limited to 4? But the dashboard shows 4 certs (wiki, catz, whoami, wiki1).

Launched kitties (mikesir87/cats) container. Haven’t used that subdomain before. After 30 mins, 9+ iterations, it still has not issued a cert, still trying.

Current Logs: (it’s refusing to accept my zerobin link)
https://tinyurl.com/4refew26

Why did none of these succeed on LE where most of the others had before? I see now on the certificate search that, of the two entries shown for each subdomain, one is a pre-certificate and one a leaf certificate, which answers my question about why two certs appear per subdomain. But why are certs not reacquired when one has previously been issued for a subdomain? Why me? lol. Only slightly kidding there. I am wondering why I’m facing this problem. Am I doing something odd or non-standard? Surely others are running a similar configuration. What am I doing differently that caused this breakage?

… I was interrupted while composing this. It’s been well over an hour waiting for a cert on that last subdomain now and it still hasn’t gotten one, though I haven’t updated the linked logs above since nothing else has changed.

Yeah, their website is very misleading, but no, you have unlimited free certs with ZeroSSL, when issued via ACME.

Wow, I really don’t understand why the config is being reloaded so often. I’d honestly call that a bug in the swarm provider in CDP, probably.

I feel like it shouldn’t add the site to the config unless there’s a container that’s actually up, because it causes the config to get loaded, then immediately reloaded, which cancels the initial config’s in-flight work, etc.

But I can see the argument both ways, because maybe there is some config that should be provisioned even if there is no container running yet, and you might want to have, say, a handle_errors block which does whatever else when you don’t have a container available to handle it. I dunno. Bah.

Maybe CDP should like… debounce config updates to potentially group them up so it doesn’t try to reload a ton of times? Hmmmm

{"level":"error","ts":1660846756.82023,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"kitties.mysmarthome.network","issuer":"acme-v02.api.letsencrypt.org-directory","error":"[kitties.mysmarthome.network] solving challenges: waiting for solver certmagic.solverWrapper to be ready: timed out waiting for record to fully propagate; verify DNS provider configuration is correct - last error: <nil> (order=https://acme-v02.api.letsencrypt.org/acme/order/684992797/117414919027) (ca=https://acme-v02.api.letsencrypt.org/directory)"}

Anyways, seems like Caddy timed out on the propagation checks. This can happen if Caddy itself isn’t able to resolve the DNS properly to find out if the TXT record was successfully added by the plugin, before telling the ACME issuer “okay, it should be good now, please continue”. We do have an option to turn this off, but unfortunately it can’t be configured globally, it must be configured with the tls directive (in each site), and it must be per-issuer, so it ends up looking like this:

tls {
	issuer acme {
		dns cloudflare {env.CLOUDFLARE_TOKEN}
		propagation_timeout -1
	}
	issuer zerossl {
		dns cloudflare {env.CLOUDFLARE_TOKEN}
		propagation_timeout -1
	}
}
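Since you’re generating the config from labels, that block has to be expressed in caddy-docker-proxy’s label syntax. Here’s a hedged sketch for the whoami service; the _0/_1 suffixes for repeating the issuer directive are my reading of CDP’s label conventions, so verify against its README:

services:
  whoami:
    deploy:
      labels:
        caddy: whoami.mysmarthome.network
        caddy.reverse_proxy: "{{upstreams 8000}}"
        # Numbered suffixes (assumption: stripped by CDP) let the same
        # directive repeat, producing one issuer block per ACME CA.
        caddy.tls.issuer_0: acme
        caddy.tls.issuer_0.dns: "cloudflare {env.CF_API_TOKEN}"
        caddy.tls.issuer_0.propagation_timeout: "-1"
        caddy.tls.issuer_1: zerossl
        caddy.tls.issuer_1.dns: "cloudflare {env.CF_API_TOKEN}"
        caddy.tls.issuer_1.propagation_timeout: "-1"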

This might be it. I added a TLS resolver to the config:

whoami:
  deploy:
    labels:
      caddy: whoami.mysmarthome.network
      caddy.reverse_proxy: "{{upstreams 8000}}"
      caddy.tls.resolvers: 1.1.1.1

I’ve launched a few containers with this config and so far they have all acquired certs and come up immediately. More testing is needed to confirm, but it seems to be a DNS issue. I wonder if it’s due to my network config: DNS points to my MikroTik router, which is configured to use Cloudflare’s family-safe 1.1.1.3 over DoH. I’ll have to experiment with that to see if it changes anything, after more testing to confirm that all is working now.

EDIT: One thing I don’t understand, though, is why it seemed to work in my previous testing that didn’t use caddy-docker-proxy. Maybe luck. Or is it related to the plugin?

Docker has its own DNS resolver layer which is what resolves container names to container IPs. That might not be playing nice in this case, preventing Caddy from seeing changes to the TXT records.

I don’t have a great grasp of what exactly Docker is causing to happen with DNS queries. It’s been a mystery to me.
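One thing that might be worth testing (an assumption on my part; I haven’t checked how well swarm stack deploys honor this option) is pinning the Caddy service’s own DNS servers in the compose file so its external lookups bypass the host/router entirely:

services:
  caddy:
    # Docker's embedded resolver still handles container-name lookups,
    # but forwards external queries (like the TXT propagation checks)
    # to these servers instead of the host's configured DNS.
    dns:
      - 1.1.1.1
      - 1.0.0.1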

If I had my way, we’d remove propagation checks altogether… I don’t think they’re useful at all, it doesn’t actually do anything, it just delays the cert issuance process until Caddy “knows” that the DNS is correct so that the ACME issuer doesn’t waste time polling for the DNS challenge. But in general, I don’t think that’s necessary. Some other DNS providers are known to take way too long to propagate changes done via their API (iirc GoDaddy was problematic) but most are quite rapid. So IMO it should be opt-in for the delay, instead of opt-out. But we’ll see. Hopefully I can convince @matt sooner rather than later :crazy_face:

I ran a few more tests today. They’re not extensive enough to be 100% sure, but with each of the following changes in place, certs were acquired nearly immediately, and when I backed the change off it went back to spinning its wheels. Here are the things that did work:

  • Setting Caddy’s resolver option, e.g. caddy.tls.resolvers: 1.1.1.1
  • Setting Docker’s DNS option in /etc/docker/daemon.json: "dns": ["1.1.1.1"] (full file shown below)
  • Setting the DHCP server on my router so that my Docker host’s DNS is set to 1.1.1.1 (I presume manually setting /etc/resolv.conf would also work, but mine is set via DHCP and I didn’t see a need to test that)

Any of those work. What doesn’t work is having my router’s DHCP point my Docker host back to the router, with the router then going straight to 1.1.1.1 (without the DoH, etc.).
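For anyone following along, the daemon-level option above goes in /etc/docker/daemon.json; restart the Docker daemon after editing it. The whole file can be as small as:

{
  "dns": ["1.1.1.1"]
}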

I guess this is good enough for me to consider this closed on my end and move on to the next challenge (probably a Cloudflared tunnel). If there are any tests that I didn’t think to run, let me know. I appreciate the help more than I can say. I’ve toyed around with a few reverse proxies, and I’m feeling extremely confident this is the one I’m sticking with.


This topic was automatically closed after 30 days. New replies are no longer allowed.