Why is caddy forcing an on-demand-tls "ask" on startup for certs where it has a valid cert?

franklouwers · March 5, 2024, 4:10pm

1. The problem I’m having:

I use the on_demand_tls feature, and it’s working great so far. I noticed that on the first request for www.allowedsite.tld, a request is made to the “ask” server, a cert is requested from Let’s Encrypt/ZeroSSL and the cert is stored in ~caddy/.local/share/caddy/certificates/acme-v02.api.letsencrypt.org-directory/www.allowedsite.tld/*

Subsequent requests for that same url are served up without doing the ask / letsencrypt / store dance, which is great.

When restarting Caddy however, the ask server IS queried, even if the certificate is already on the disk and is still valid.

This obviously changes the required SLA for the ask URL drastically! I was under the impression that the “ask” server only needs to be available when a cert is about to expire or when a new hostname is used for the very first time.

Is what I am seeing expected behaviour? If so, is this configurable?


francislavoie · March 5, 2024, 4:12pm

Yes, Caddy makes a request whenever it’s about to load the certificate into its cache, to check whether it should still use the certificate or skip loading it.

Yes, your ask endpoint must have very good uptime. You should run it as close as possible to Caddy and make sure it responds with low latency.
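For illustration, an ask endpoint can be very small. The following is only a minimal sketch in Python, not anything shipped with Caddy. It assumes Caddy calls the endpoint with a GET request carrying the hostname in a `domain` query parameter and treats a 200 response as approval. The `ALLOWED` set, the `decide` helper, `make_server`, the port, and the choice of 403 for denials are all hypothetical names and values picked for this example.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical allowlist; a real service would look this up in a database.
ALLOWED = {"www.allowedsite.tld"}


def decide(path: str) -> int:
    """Return the HTTP status for an ask request path like '/?domain=foo.tld'."""
    query = parse_qs(urlparse(path).query)
    domain = query.get("domain", [""])[0].lower()
    # 200 approves the hostname; 403 (an arbitrary non-200 choice) denies it.
    return 200 if domain in ALLOWED else 403


class AskHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The decision is carried entirely by the status code; no body needed.
        self.send_response(decide(self.path))
        self.end_headers()


def make_server(port: int = 5555) -> ThreadingHTTPServer:
    """Bind to localhost only, so the endpoint sits right next to Caddy."""
    return ThreadingHTTPServer(("127.0.0.1", port), AskHandler)
```

Calling `make_server().serve_forever()` would start it, and the `ask` directive would then point at `http://localhost:5555/`. Keeping the decision in a plain function like `decide` makes it easy to test without any network involved.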

franklouwers · March 5, 2024, 4:18pm

Why does it do that on load, instead of checking if the certs it has on disk are still valid? Is there a workaround?

Other than this, which some have suggested:

{
	on_demand_tls {
		ask http://localhost:5555/
	}
}

http://localhost:5555 {
	respond 200
}

matt · March 5, 2024, 4:23pm

Because checking storage can be expensive. Some storage backends charge per read/query, or are remote, etc; and so after some large-scale sponsors mentioned this cost, we decided to protect storage access behind the ask endpoint as well.

If the cert isn’t allowed, doesn’t make sense to load it from storage.

franklouwers · March 5, 2024, 4:26pm

We’re talking about .local here… I understand the concern when reading certs from remote backends, but this is local disk. Also: does this mean the cache is purely in memory? Because if it is, then following the same logic, why are you even storing them? They are never used again: written once, then forgotten forever.

matt · March 5, 2024, 4:31pm

Maybe we could add a special case for the FileStorage backend (the local disk – although, technically anything mounted as a local disk would count, including remote storages like NFS or cloud drives through rclone, etc). But then it’s a little weird to explain how the ask logic is different depending on the storage you use. Hmm.

Maybe what the ask endpoint protects should be configurable instead?

Certificates and keys, etc, are written and persisted in storage, but when they are first used, they are loaded into memory (“the cache”) for the lifetime of the server to reduce latency.

franklouwers · March 5, 2024, 4:43pm

> Certificates and keys, etc, are written and persisted in storage, but when they are first used, they are loaded into memory (“the cache”) for the lifetime of the server to reduce latency.

Well no, they are not really “persisted”: when you restart Caddy, those certs aren’t trusted and require an explicit OK from the “ask” service.

francislavoie · March 5, 2024, 4:44pm

See “Storage check -- can it be removed?” (Issue #201 in caddyserver/certmagic on GitHub). We used to have it the other way around, but it caused issues for some users.

They are persisted in the sense that Caddy doesn’t need to have them re-issued, it only needs to load them from storage.

franklouwers · March 5, 2024, 4:44pm

In an ideal world, there would be a config option dont_ask_for_valid_certs in the config?

matt · March 5, 2024, 4:50pm

What are you referring to when you say “they” in “they are not really persisted”? The certificates are indeed persisted across server process restarts – in storage.

The in-memory cache, which doesn’t need the “ask” endpoint, obviously does not persist because it is in memory, which the OS frees when the process exits.

matt · March 5, 2024, 4:51pm

What is “valid certs” though?

It’d probably be more like a toggle to choose whether the ask endpoint guards loading certs from storage (or “checking storage” at all), or only certificate issuance.
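To make the two modes concrete, here is a toy model in Python. It is a sketch of the behaviour described in this thread, not CertMagic's actual code: the class `CertSource`, the flag `ask_guards_storage`, and the string "certs" are all hypothetical, and certificate expiry and renewal are left out entirely.

```python
from typing import Callable, Dict, Optional


class CertSource:
    """Toy model: certs are plain strings; storage and cache are dicts."""

    def __init__(self, ask: Callable[[str], bool],
                 storage: Optional[Dict[str, str]] = None,
                 ask_guards_storage: bool = True):
        self.ask = ask                          # stands in for the ask endpoint
        self.storage = storage if storage is not None else {}
        self.cache: Dict[str, str] = {}         # lost on every "restart"
        self.ask_guards_storage = ask_guards_storage  # hypothetical toggle
        self.ask_calls = 0

    def _ask(self, domain: str) -> bool:
        self.ask_calls += 1
        return self.ask(domain)

    def get_cert(self, domain: str) -> Optional[str]:
        # 1. In-memory cache: ask is never consulted.
        if domain in self.cache:
            return self.cache[domain]
        # 2. Storage: guarded by ask only when the toggle is on.
        if self.ask_guards_storage and not self._ask(domain):
            return None
        if domain in self.storage:
            self.cache[domain] = self.storage[domain]
            return self.cache[domain]
        # 3. Issuance: guarded by ask in both modes.
        if not self.ask_guards_storage and not self._ask(domain):
            return None
        cert = f"cert-for-{domain}"             # pretend the CA issued this
        self.storage[domain] = cert
        self.cache[domain] = cert
        return cert
```

With `ask_guards_storage=True` and an ask endpoint that is down, a cert that already sits in storage is not loaded after a restart. With the toggle off, the same cert loads without any ask call, at the price of a storage read for every unknown hostname.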

franklouwers · March 5, 2024, 4:54pm

A “valid cert” would be a cert for which we have the cert, key and metadata on disk (local or remote), and has not expired yet.

I understand that a cert which is “OK” per the Ask server, might not be OK today, but if Caddy wasn’t restarted between yesterday and today, you wouldn’t know either?

I assume the Ask server is checked at every attempt to renew the cert, so you’ll get the Deny when it expires…

To my understanding, the Ask should only be consulted when issuing or renewing certs.

franklouwers · March 5, 2024, 5:03pm

Alternatively, (but less of a fan) a solution could be to have a toggle to “TempAllow” if the Ask server is down?

matt · March 5, 2024, 6:01pm

But unless it’s already in memory, we don’t know that unless we go to storage and check first. (And we don’t use ‘ask’ if it’s already in memory of course.)

It doesn’t really matter: once it’s loaded into memory and hasn’t expired, it can be used. Even if your application no longer accepts that hostname (maybe a customer closed their account), Caddy can accept the TLS handshakes for the near future and show an error page at least.

You get a deny when your ‘ask’ endpoint stops returning a 200 for that domain, regardless of cert expiry. If Caddy gets a deny answer, it will let the cert expire (if it hasn’t already) and reject the handshake.

I’ve got my hands full today but could you open an issue on CertMagic to request a config option for what the ask endpoint protects? (It’s actually called a “DecisionFunc” in CertMagic, I guess)

franklouwers · March 5, 2024, 7:43pm

I see two options here:

1. lazy: when a request comes in for a cert we don’t have in the cache, instead of going to the Ask service first, check whether we have it on disk and whether it is non-expired; if so, load it (bypassing Ask).
2. non-lazy: on bootup, load all certs from disk into the cache.

Hmm, but not if the cert is in the cache, right?

So imagine that today I request foo.tld. It isn’t in the cache, so goes to the Ask. Ask returns a 200.

Tomorrow morning, the Ask service would return a 403 Denied for the same domain (admin disabled it, for instance).

If between now and tomorrow morning Caddy isn’t restarted, how would it know we no longer want to serve foo.tld? As it won’t hit the Ask service (again assuming Caddy didn’t restart and the cert isn’t near expiry).

Done!

francislavoie · March 5, 2024, 8:32pm

1 is “on-demand”, 2 is “managed certs”. That’s how Caddy behaves now, and on_demand is the toggle for that.

Realize that many users have tens to hundreds of thousands of domains that Caddy maintains, so if Caddy were to load all those certs at startup, it could cause a lot of pressure (lock-ups while loading data). Doing it lazily spreads it out over a longer period of time and “as-needed” when requests come in.

It wouldn’t, unless the cert gets evicted from the cache: either you hit the cache size limit (which iirc is 50,000 certs by default), or you restart Caddy to force a clean cache. But that’s fine, your backend should reject the request anyway. This is just about the TLS cert, not about whether your app accepts the request.
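As a rough picture of the eviction case, here is a small Python sketch of a capacity-limited cert cache. It is purely illustrative and not how Caddy implements its cache: the class name `BoundedCertCache` is made up, and picking a random victim when the cache is full is an assumption chosen for simplicity.

```python
import random
from typing import Dict, Optional


class BoundedCertCache:
    """Toy in-memory cert cache with a hard capacity limit."""

    def __init__(self, capacity: int, rng: Optional[random.Random] = None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._certs: Dict[str, str] = {}
        self._rng = rng or random.Random()

    def get(self, domain: str) -> Optional[str]:
        # A miss (None) is the only situation in which ask would be consulted.
        return self._certs.get(domain)

    def put(self, domain: str, cert: str) -> None:
        # When full, drop one existing entry (random victim: an assumption).
        if domain not in self._certs and len(self._certs) >= self.capacity:
            victim = self._rng.choice(list(self._certs))
            del self._certs[victim]
        self._certs[domain] = cert

    def __len__(self) -> int:
        return len(self._certs)
```

As long as `get` keeps returning the cert for a hostname, nothing asks the endpoint about it again. Only after that entry has been evicted, or after a restart has replaced the whole cache with an empty one, does the next handshake for that hostname become a miss.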

franklouwers · March 5, 2024, 9:00pm

Well, no. There is a difference between “we know nothing about this domain, we’ve never seen it” (so we request a cert if on_demand is turned on) and “it isn’t in our cache, so we haven’t seen this since Caddy was restarted, but we may have seen this in the past and have certs”.

Agree. Lazy is the best option, but we (optionally) check storage first before going to Ask.

Exactly, which is why I was confused by @mholt’s comment which I might have mis-interpreted.

matt · March 5, 2024, 9:47pm

That’s what we used to do, but it was expensive for some large-scale deployments.

We can maybe make it configurable, as I’ve been suggesting (and the issue you opened is requesting).

That’s not feasible – too many (hundreds of thousands), and defeats the purpose, as it would bring the server grinding to a halt.

The ask endpoint does not determine whether a domain gets to be served – that is up to your Caddyfile / HTTP config – the ask endpoint only grants permission to obtain/load certificates for a domain.

system · April 4, 2024, 9:48pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.