On-Demand TLS cert check issue

1. Caddy version (caddy version):

v2.4.6

2. How I run Caddy:

a. System environment:

Docker alpine, built with xcaddy and this plugin:

RUN xcaddy build \
    --with github.com/techknowlogick/certmagic-s3

b. Command:

caddy run

d. My complete Caddyfile or JSON config:

{
  email my@email.com
  storage s3 {
    secret_key SECRETKEY
    access_key ACCESSKEY
    host mys3host.com
    bucket mybucket
    prefix myprefix
  }
  on_demand_tls {
    ask http://myapi.com/query
  }
}

:80 {
  redir https://{host}{uri} permanent
}

:443 {
  tls {
    on_demand
  }

  reverse_proxy http://myserver.com
}

3. The problem I'm having:

An HTTPS request for a domain that hasn't been loaded yet hits the server. Caddy then tries to load the certificate from the storage backend. If there isn't one, it asks the on-demand TLS endpoint (ask http://myapi.com/query) whether it may obtain one. If the endpoint declines, the request stops there.

So because my storage backend is S3, and Caddy checks it on every request for new or declined domains, it's possible to effectively make Caddy DDoS my S3 backend.

One solution would be to have Caddy check on_demand_tls.ask first, and only if that returns a 200 OK try to load the certificate, for domains that haven't been loaded yet.

I'd rather have it check my ask endpoint before my storage backend.
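
For reference, the ask contract is simple: Caddy sends a GET request to the configured URL with the domain in the domain query parameter, and a 200 response approves issuance. A minimal sketch of such an endpoint in Go (the /query path matches my config above; the allowlist map is just a stand-in for whatever lookup the real service behind myapi.com does):

package main

import (
	"log"
	"net/http"
)

// Stand-in for the real domain lookup behind http://myapi.com/query.
var allowed = map[string]bool{
	"example.com":     true,
	"www.example.com": true,
}

func main() {
	http.HandleFunc("/query", func(w http.ResponseWriter, r *http.Request) {
		if allowed[r.URL.Query().Get("domain")] {
			w.WriteHeader(http.StatusOK) // 200: Caddy may obtain a certificate
			return
		}
		w.WriteHeader(http.StatusForbidden) // non-2xx: Caddy refuses the handshake
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}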

4. Error messages and/or full log output:

This happens every time I hit Caddy with a domain that isn't allowed by my ask endpoint:

{"level":"info","ts":1645373368.4733694,"logger":"caddy.storage.s3","msg":"Load: myprefix/certificates/acme-v02.api.letsencrypt.org-directory/adomain/adomain.crt"}

5. What I already tried:

I don't know what to try.

S3 is not a reliable storage backend. Caddy relies on locks, but S3 doesn't make that possible on its own.

I strongly suggest using a different storage backend, like Redis or Consul.

That's a separate issue, but several S3 services now have strong read-after-write consistency, so I believe it should work just fine, as long as your S3 service supports that.

Hey look, Francis actually hit the nail on the head here. S3 is not a compatible storage mechanism, period; at least, not for any deployments at scale, or with any significant volume.

"Read after write consistency" is not the issue. It's atomicity; for example: Locking implementation does not appear to be atomic · Issue #3 · securityclippy/magicstorage · GitHub

S3 does not provide atomic operations (that I know of). Unfortunately, because of that, it is not a suitable storage backend for multiple clients, since you cannot guarantee exclusivity / concurrent safety like with databases or regular file systems.

So to clarify, our software (Caddy / CertMagic) supports what you want to do just fine. The problem is either a faulty implementation of the interface methods (Load/Store/List/Delete/etc.) or an unsuitable storage backend (one that lacks atomic operations, is expensive to access, is slow, etc.).
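
For reference, this is roughly the storage interface a plugin has to implement (paraphrased from CertMagic around the version Caddy v2.4.6 uses; later releases thread a context through more of the methods, so treat this as a sketch, not the exact current API):

// Approximate shape of certmagic.Storage, circa v0.15.
// KeyInfo is CertMagic's key metadata type.
type Storage interface {
	Locker
	Store(key string, value []byte) error
	Load(key string) ([]byte, error)
	Delete(key string) error
	Exists(key string) bool
	List(prefix string, recursive bool) ([]string, error)
	Stat(key string) (KeyInfo, error)
}

// Locker is the hard part for S3: Lock must provide mutual
// exclusion across all Caddy instances sharing the storage.
type Locker interface {
	Lock(ctx context.Context, key string) error
	Unlock(key string) error
}

A backend that cannot make Lock genuinely exclusive will appear to work fine until two instances race on the same key.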

For that specifically, I would suggest filing an issue with the authors of the S3 plugin(s). I just checked the CertMagic wiki and there appear to be quite a lot listed now.

But the Caddy project does not endorse any of these.

There is a DynamoDB implementation that does support locking, but be aware that DynamoDB is very expensive and slow, and it looks like even this implementation has reports of a locking bug.

All I can say is that it's up to plugin authors to implement reliable storage modules (even if that means not implementing modules for certain backends that are unsuitable).

This is certainly a possibility. Could you open an issue proposing this in the CertMagic repo?

I see. Apparently updates to a single key in S3 are atomic, but:

Amazon S3 does not support object locking for concurrent writers. If two PUT requests are simultaneously made to the same key, the request with the latest timestamp wins. If this is an issue, you must build an object-locking mechanism into your application.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html#ConsistencyModel

So there's no "put object if not exists" operation. Honestly, I thought the newer strong consistency model included that.
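
To make that concrete: without a conditional put, an S3-backed lock is forced into a check-then-write sequence, and two Caddy instances can interleave between the check and the write, each believing it won. A backend with an atomic "set if not exists" primitive avoids this. A sketch using Redis via the go-redis client (the key name, owner value, and TTL here are illustrative, not what any particular plugin uses):

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

// tryLock does the "check" and the "write" in one atomic server-side
// operation (SET key value NX EX), which plain S3 PUTs cannot express.
func tryLock(ctx context.Context, rdb *redis.Client, key, owner string) (bool, error) {
	return rdb.SetNX(ctx, key, owner, 2*time.Minute).Result()
}

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	got, err := tryLock(ctx, rdb, "locks/example.com", "instance-1")
	if err != nil {
		panic(err)
	}
	fmt.Println("acquired:", got) // false means another instance holds the lock
}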

So all of these S3 modules are basically quite misleading. They'll work fine with a single Caddy instance, but that's a big asterisk.

But apart from all of that, my issue would still apply even with the backends that do lock properly, like the Consul and Redis ones.

Could you open an issue to propose this?

Yes, I'll do that.

Your point was that cost was an issue (if I understand correctly, because of lots of storage reads), but if you run your own Consul or Redis, that would be negligible.

Not intentionally, I'm sure; just not something the authors considered or designed for.

Thanks for opening an issue; that'll keep us from losing track of this.

If I used a managed solution, cost could be an issue. But with a self-hosted or even a "small" managed one, performance could definitely be an issue as well, at least compared to my ask endpoint, which I'd have full control over.
