On-Demand TLS cert check issue

1. Caddy version (caddy version):

v2.4.6

2. How I run Caddy:

a. System environment:

Docker alpine, built with xcaddy and this plugin:

RUN xcaddy build \
    --with github.com/techknowlogick/certmagic-s3

b. Command:

caddy run

d. My complete Caddyfile or JSON config:

{
  email my@email.com
  storage s3 {
    secret_key SECRETKEY
    access_key ACCESSKEY
    host mys3host.com
    bucket mybucket
    prefix myprefix
  }
  on_demand_tls {
    ask http://myapi.com/query
  }
}

:80 {
  redir https://{host}{uri} permanent
}

:443 {
  tls {
    on_demand
  }

  reverse_proxy http://myserver.com
}

3. The problem I'm having:

An HTTPS request for a domain that hasn't been loaded yet hits the server. Caddy then tries to load the certificate from the storage backend. If there isn't one, it asks the on-demand TLS endpoint (ask http://myapi.com/query) whether it may obtain one. If the endpoint declines, the request stops there.

So because my storage backend is S3, and Caddy checks it on every request for new or declined domains, it's possible to effectively make Caddy DDoS my S3 backend.

One solution would be to have Caddy check on_demand_tls.ask first, and only if that returns a 200 OK try to load the certificate, for domains that haven't been loaded yet.

I'd rather have it check my ask endpoint before my storage backend.
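
For reference, the ask contract is simple: Caddy sends a GET request to the configured URL with the domain in the domain query parameter, and a 200 response approves issuance. A minimal sketch of such an endpoint in Go (the /query path matches my config above; the allowlist map is just a stand-in for whatever lookup the real service behind myapi.com does):

package main

import (
	"log"
	"net/http"
)

// Stand-in for the real domain lookup behind http://myapi.com/query.
var allowed = map[string]bool{
	"example.com":     true,
	"www.example.com": true,
}

func main() {
	http.HandleFunc("/query", func(w http.ResponseWriter, r *http.Request) {
		if allowed[r.URL.Query().Get("domain")] {
			w.WriteHeader(http.StatusOK) // 200: Caddy may obtain a certificate
			return
		}
		w.WriteHeader(http.StatusForbidden) // non-2xx: Caddy refuses the handshake
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}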

4. Error messages and/or full log output:

This happens every time I hit Caddy with a domain that isn't allowed by my ask endpoint:

{"level":"info","ts":1645373368.4733694,"logger":"caddy.storage.s3","msg":"Load: myprefix/certificates/acme-v02.api.letsencrypt.org-directory/adomain/adomain.crt"}

5. What I already tried:

I don't know what to try.

S3 is not a reliable storage backend. Caddy relies on locks, but S3 doesn't make that possible on its own.

I strongly suggest using a different storage backend, like Redis or Consul.

That's a separate issue, but several S3 services now have strong read-after-write consistency, so I believe it should work just fine, as long as your S3 service supports that.

Hey look, Francis actually hit the nail on the head here. S3 is not a compatible storage mechanism, period; at least, not for any deployments at scale, or with any significant volume.

"Read after write consistency" is not the issue. It's atomicity; for example: Locking implementation does not appear to be atomic · Issue #3 · securityclippy/magicstorage · GitHub

S3 does not provide atomic operations (that I know of). Unfortunately, because of that, it is not a suitable storage backend for multiple clients, since you cannot guarantee exclusivity / concurrent safety like with databases or regular file systems.

So to clarify, our software (Caddy / CertMagic) supports what you want to do just fine. The problem is either a faulty implementation of the interface methods (Load/Store/List/Delete/etc.) or an unsuitable storage backend (one that lacks atomic operations, is expensive to access, is slow, etc.).
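
For reference, this is roughly the storage interface a plugin has to implement (paraphrased from CertMagic around the version Caddy v2.4.6 uses; later releases thread a context through more of the methods, so treat this as a sketch, not the exact current API):

// Approximate shape of certmagic.Storage, circa v0.15.
// KeyInfo is CertMagic's key metadata type.
type Storage interface {
	Locker
	Store(key string, value []byte) error
	Load(key string) ([]byte, error)
	Delete(key string) error
	Exists(key string) bool
	List(prefix string, recursive bool) ([]string, error)
	Stat(key string) (KeyInfo, error)
}

// Locker is the hard part for S3: Lock must provide mutual
// exclusion across all Caddy instances sharing the storage.
type Locker interface {
	Lock(ctx context.Context, key string) error
	Unlock(key string) error
}

A backend that cannot make Lock genuinely exclusive will appear to work fine until two instances race on the same key.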

For that specifically, I would suggest filing an issue with the authors of the S3 plugin(s). I just checked the CertMagic wiki and there appear to be quite a lot listed now.

But the Caddy project does not endorse any of these.

There is a DynamoDB implementation that does support locking, but be aware that DynamoDB is very expensive and slow, and it looks like even this implementation has reports of a locking bug.

All I can say is that it's up to plugin authors to implement reliable storage modules (even if that means not implementing modules for certain backends that are unsuitable).

This is certainly a possibility. Could you open an issue proposing this in the CertMagic repo?

I see. Apparently updates to a single key in S3 are atomic, but:

Amazon S3 does not support object locking for concurrent writers. If two PUT requests are simultaneously made to the same key, the request with the latest timestamp wins. If this is an issue, you must build an object-locking mechanism into your application.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html#ConsistencyModel

So there's no "put object if not exists" operation. Honestly, I thought the newer strong consistency model included that.
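
To make that concrete: without a conditional put, an S3-backed lock is forced into a check-then-write sequence, and two Caddy instances can interleave between the check and the write, each believing it won. A backend with an atomic "set if not exists" primitive avoids this. A sketch using Redis via the go-redis client (the key name, owner value, and TTL here are illustrative, not what any particular plugin uses):

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

// tryLock does the "check" and the "write" in one atomic server-side
// operation (SET key value NX EX), which plain S3 PUTs cannot express.
func tryLock(ctx context.Context, rdb *redis.Client, key, owner string) (bool, error) {
	return rdb.SetNX(ctx, key, owner, 2*time.Minute).Result()
}

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	got, err := tryLock(ctx, rdb, "locks/example.com", "instance-1")
	if err != nil {
		panic(err)
	}
	fmt.Println("acquired:", got) // false means another instance holds the lock
}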

So all of these S3 modules are basically quite misleading. They'll work fine with a single Caddy instance, but that's a big asterisk.

But apart from all of that, my issue would still apply even with the backends that do lock properly, like the Consul and Redis ones.

Could you open an issue to propose this?

Yes, I'll do that.

Your point was that cost was an issue (if I understand correctly, because of lots of storage reads), but if you run your own Consul or Redis, that would be negligible.

Not intentionally, I'm sure; just not something the authors considered or designed for.

Thanks for opening an issue; that'll keep us from losing track of this.

If I used a managed solution, cost could be an issue. But with a self-hosted or even a "small" managed one, performance could definitely be an issue as well, at least compared to my ask endpoint, which I'd have full control over.
