Error "TLS alert, internal error (592)" (again)

No problem, thanks for clarifying.

Alrighty… here’s what I think is happening. Maybe you can give me your thoughts on it too as I work on a fix.

When a handshake is in progress, Caddy calls this function to get a certificate to complete it:

This should obtain and load a certificate from storage, if necessary (the two true args).

First, we see if the certificate exists in the in-memory cache:

If so, we simply return it and use it. If not, this logic is invoked:

Notice that we only do any of it if OnDemand is enabled (and this depends on the ServerName, or hostname, of the handshake). If it’s not, the function simply gives up, hence an error like “no certificate available.”
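To make that flow concrete, here is a minimal sketch of the lookup as described above. It is not the actual Caddy/CertMagic code: `Config`, `getCert`, the `storage` map, and the `onDemand` callback are hypothetical stand-ins, and the real logic can also obtain a brand-new certificate, which this sketch leaves out.

```go
package main

import (
	"errors"
	"fmt"
)

// Certificate is a hypothetical stand-in for a loaded TLS certificate.
type Certificate struct{ Name string }

var errNoCert = errors.New("no certificate available")

// Config is a simplified, hypothetical model of the handshake-time state.
type Config struct {
	cache    map[string]Certificate // in-memory cache, keyed by hostname
	storage  map[string]Certificate // stand-in for persistent storage
	onDemand func(name string) bool // reports whether name is managed on demand
}

// getCert returns the cached certificate if there is one. On a cache miss it
// falls back to storage only when on-demand TLS applies to the server name.
func (c *Config) getCert(name string) (Certificate, error) {
	if cert, ok := c.cache[name]; ok {
		return cert, nil
	}
	if c.onDemand != nil && c.onDemand(name) {
		if cert, ok := c.storage[name]; ok {
			c.cache[name] = cert // load into the cache at handshake-time
			return cert, nil
		}
	}
	// Not on demand (or not in storage): give up.
	return Certificate{}, fmt.Errorf("%s: %w", name, errNoCert)
}

func main() {
	cfg := &Config{
		cache: map[string]Certificate{},
		storage: map[string]Certificate{
			"static.example.com":  {Name: "static.example.com"},
			"dynamic.example.com": {Name: "dynamic.example.com"},
		},
		// Assumption for this demo: only one name is managed on demand.
		onDemand: func(name string) bool { return name == "dynamic.example.com" },
	}
	for _, name := range []string{"dynamic.example.com", "static.example.com"} {
		cert, err := cfg.getCert(name)
		fmt.Printf("name=%q cert=%q err=%v\n", name, cert.Name, err)
	}
}
```

In this sketch both certificates sit in storage, but only the on-demand name can be pulled into an empty cache during the handshake.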

Of course, getting that error is odd when the domain is explicitly configured: the cert should be obtained and loaded when Caddy starts. So why do those sometimes have “no certificate available” errors, while dynamic certs for domains that aren’t in the config work just fine all the time?

My hunch is that it’s because the in-memory cache has a maximum capacity:

And when many thousands of certificates are loaded, we have to evict one (which we do at random) to make room for another:
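As a rough illustration of a capacity-bound cache with random eviction, here is a small sketch. The `Cache` type, `NewCache`, and `Add` are names I made up for this example, and the capacity of 2 is only there to keep the demo short; the real cache is more involved.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Cache is a hypothetical fixed-capacity certificate cache.
type Cache struct {
	capacity int
	certs    map[string]string // hostname -> certificate (placeholder value)
}

func NewCache(capacity int) *Cache {
	return &Cache{capacity: capacity, certs: make(map[string]string)}
}

// Add stores cert under name. If the cache is full and name is new, one
// existing entry is chosen at random and evicted first. It returns the
// evicted hostname, or "" if nothing was evicted.
func (c *Cache) Add(name, cert string) (evicted string) {
	_, exists := c.certs[name]
	if !exists && len(c.certs) > 0 && len(c.certs) >= c.capacity {
		n := rand.Intn(len(c.certs)) // index of the victim
		i := 0
		for victim := range c.certs {
			if i == n {
				delete(c.certs, victim)
				evicted = victim
				break
			}
			i++
		}
	}
	c.certs[name] = cert
	return evicted
}

func main() {
	cache := NewCache(2)
	cache.Add("static.example.com", "cert-1")
	cache.Add("a.dynamic.example.com", "cert-2")
	// The cache is full now, so this insert pushes out a random entry,
	// which may well be the one loaded at startup.
	evicted := cache.Add("b.dynamic.example.com", "cert-3")
	fmt.Println("evicted:", evicted, "size:", len(cache.certs))
}
```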

So it follows that certificates for domains which appear explicitly in your config (and are not managed OnDemand) may work at first, then stop working once they get evicted, because the logic during the handshake forbids loading them into the cache at handshake-time (since they should have been loaded at startup). The dynamic certs for OnDemand domain names, on the other hand, always work, since they can be loaded at handshake-time, replacing a cert that was loaded at startup.

Oops? Anyway, @francislavoie is right, you’re hitting an edge case. Most users with lots of dynamic certs, like your config has, don’t also specify hard-coded domains, and I guess I didn’t think about this case.

So we probably need to enable loading a certificate from storage even for non-OnDemand hostnames if the cache is at capacity. Would you agree?
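Roughly, the idea could look like the sketch below. This is only one possible reading of the proposal, not a patch: `Config`, `getCert`, `put`, and the map-backed `storage` are hypothetical, and eviction here just removes an arbitrary entry to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical, simplified model of the handshake-time state.
type Config struct {
	capacity int
	cache    map[string]string // hostname -> certificate (placeholder value)
	storage  map[string]string // stand-in for persistent storage
	onDemand func(name string) bool
}

// put adds a certificate to the cache, evicting an arbitrary entry if full.
func (c *Config) put(name, cert string) {
	if len(c.cache) >= c.capacity {
		for victim := range c.cache {
			delete(c.cache, victim)
			break
		}
	}
	c.cache[name] = cert
}

// getCert allows a load from storage on a cache miss when the name is
// on demand OR when the cache is at capacity (the proposed change), since
// a full cache means a startup-loaded cert may have been evicted.
func (c *Config) getCert(name string) (string, error) {
	if cert, ok := c.cache[name]; ok {
		return cert, nil
	}
	cacheFull := len(c.cache) >= c.capacity
	if (c.onDemand != nil && c.onDemand(name)) || cacheFull {
		if cert, ok := c.storage[name]; ok {
			c.put(name, cert)
			return cert, nil
		}
	}
	return "", errors.New("no certificate available")
}

func main() {
	cfg := &Config{
		capacity: 1,
		cache:    map[string]string{"dynamic.example.com": "cert-dyn"},
		storage:  map[string]string{"static.example.com": "cert-static"},
		onDemand: func(name string) bool { return name == "dynamic.example.com" },
	}
	// static.example.com was evicted earlier; the cache is full, so the
	// handshake is allowed to reload it from storage.
	cert, err := cfg.getCert("static.example.com")
	fmt.Println(cert, err)
}
```

One trade-off to consider: with this condition, a handshake for any unknown name also hits storage once the cache is full, so the real fix might want to check that the name is actually one we manage.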