Caddy Cluster storage with AWS EBS or EFS

codegeek1001 · May 30, 2023, 8:19pm

1. The problem I’m having:

I am setting up a Caddy cluster on AWS using multiple EC2 instances, and I would like to store the TLS certificates centrally so that all instances can access them. I am aware of storage plugins such as Redis and DynamoDB, but since Caddy's default storage is the filesystem, I am wondering whether I could use AWS EBS or EFS as a shared filesystem for the certificates. That way, I would not need a storage plugin and could still run a Caddy cluster across multiple EC2 machines.

Has anyone done this already, or does anyone have suggestions on whether this can work?

2. Error messages and/or full log output:

None.

3. Caddy version:

v2.6.4

4. How I installed and ran Caddy:

Official install on Debian/Ubuntu

a. System environment:

Ubuntu 22.04

b. Command:

d. My complete Caddy config:

{
    on_demand_tls {
        ask http://localhost:3000/check
        burst 5
        interval 2m
    }
    log {
        output file /var/log/caddy/mycaddylogs.com-access.log {
            roll_size 10mb
            roll_keep 20
            roll_keep_for 720h
        }
    }
}

https:// {
    tls {
        on_demand
    }
    reverse_proxy http://localhost:9000
}

5. Links to relevant resources:

None.

matt · May 30, 2023, 8:50pm

Beware that we have received many reports of inconsistencies with EFS:

github.com/caddyserver/certmagic: What is going on with AWS EFS?
Opened by mholt on 11 Feb 2022 (03:59 PM UTC), closed 15 Feb 2022 (11:01 PM UTC). Labels: bug, help wanted.

We keep getting reports that Caddy/CertMagic sometimes return errors with regards to lock files. These are files that are created during certificate management so that only one instance in a cluster manages a certificate at any given time. A file is atomically created on disk (it must not already exist), then its contents are updated every few seconds with a new JSON payload containing the current timestamp. This is how we detect and prevent stale locks, which happen if a process is forcefully terminated, and which would deadlock other instances indefinitely.

These errors seem to suggest that the lock files are empty, corrupted or incomplete:

- https://caddy.community/t/keeping-lock-file-fresh-error-im-getting-from-the-logs/15053?u=matt `[ERROR] Keeping lock file fresh: unexpected end of JSON input - terminating lock maintenance (lockfile: /mnt/efs/caddy_data/locks/issue_cert_www. userDomain .pl.lock`
- https://caddy.community/t/how-to-eliminate-downtime-in-a-caddy-cluster/15063?u=matt `obtaining certificate: unable to acquire lock 'issue_cert_eight.greenbongo.com': decoding lockfile contents: EOF`
- https://github.com/caddyserver/caddy/issues/3954#issuecomment-753801739 `obtaining certificate: EOF` (user determined it was a permissions problem)

Several other forum posts or issues in the past have also reported EOF or JSON errors.

So... what is going on with EFS? Does it have problems with write consistency? Even after adding Sync() calls to the lock file code, these errors persist.

When we update a lock file, we open it using O_RDWR, read the file, truncate the file to 0, seek to position 0, then write the updated contents, sync and close the file: https://github.com/caddyserver/certmagic/blob/2f78e527561dfdc9f14b7618a265571f206f6be4/filestorage.go#L301-L338

I wonder if EFS has trouble with truncating/syncing? I have no idea.

Some file system nuances may be at play:

github.com/caddyserver/certmagic: Empty lockfile, likely previous process crashed or storage medium failure; treating as stale
Opened by cattyhouse on 17 May 2023 (04:47 AM UTC).

Since https://github.com/caddyserver/certmagic/commit/79babffe28c5e593b19b5268bf…26447f1f5f0b26 , Caddy produces lots of logs (32285 lines so far and still increasing) like: https://github.com/imgk/caddy-trojan/issues/55#issue-1698507292

In the above case, the plugin `caddy-trojan` **does not create** a `.lock` file at all.

**Suggestion**: maybe [certmagic](https://github.com/caddyserver/certmagic) could check whether the `.lock` file exists before checking `io.EOF`?

If that is the cause, maybe we can work around it at a minor performance cost.

codegeek1001 · May 30, 2023, 9:52pm

Oooh, very interesting. Now that I have read a bit more about EFS, it seems wonky and risky. I guess I will look into Redis.

system · June 29, 2023, 9:52pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.