1. The problem I’m having:
I am setting up a Caddy cluster on AWS using multiple EC2 instances. Of course, I would like to store the TLS certs centrally for all instances to access. I am aware of storage plugins such as Redis and DynamoDB, but since Caddy's default storage is the filesystem, I am wondering whether I could use something like AWS EBS or EFS as a central filesystem for the TLS certs. That way, I wouldn't have to use a storage plugin but could still run a Caddy cluster across multiple EC2 machines.
Has anyone done this already, or does anyone have suggestions on whether this can work?
2. Error messages and/or full log output:
None.
3. Caddy version:
v2.6.4
4. How I installed and ran Caddy:
Official install on Debian/Ubuntu
a. System environment:
Ubuntu 22.04
b. Command:
d. My complete Caddy config:
{
	on_demand_tls {
		ask http://localhost:3000/check
		burst 5
		interval 2m
	}
	log {
		output file /var/log/caddy/mycaddylogs.com-access.log {
			roll_size 10mb
			roll_keep 20
			roll_keep_for 720h
		}
	}
}
https:// {
	tls {
		on_demand
	}
	reverse_proxy http://localhost:9000
}
5. Links to relevant resources:
None.
matt
(Matt Holt)
May 30, 2023, 8:50pm
2
Beware that we have received many reports of inconsistencies with EFS:
opened 03:59PM - 11 Feb 22 UTC
closed 11:01PM - 15 Feb 22 UTC
bug
help wanted
We keep getting reports that Caddy/CertMagic sometimes return errors with regards to lock files. These are files that are created during certificate management so that only 1 instance in a cluster manages a certificate at any given time. A file is atomically created on disk (it must not already exist), then its contents are updated every few seconds with a new JSON payload containing the current timestamp. This is how we detect and prevent stale locks, which happen if a process is forcefully terminated, and which would deadlock other instances indefinitely.
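For readers unfamiliar with this locking scheme, here is a minimal Go sketch of the idea described above: create the lock file atomically, and treat it as stale if its timestamp has not been refreshed recently. This is not CertMagic's actual code. The `lockMeta` payload, the function names, and the 5-second staleness threshold are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// lockMeta is a hypothetical payload; CertMagic's real lock file format may differ.
type lockMeta struct {
	Created time.Time `json:"created"`
	Updated time.Time `json:"updated"`
}

// tryLock atomically creates the lock file. O_EXCL makes the call fail if the
// file already exists, so only one caller can win.
func tryLock(path string, now time.Time) (bool, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
	if errors.Is(err, fs.ErrExist) {
		return false, nil // someone else holds the lock
	}
	if err != nil {
		return false, err
	}
	defer f.Close()
	if err := json.NewEncoder(f).Encode(lockMeta{Created: now, Updated: now}); err != nil {
		return false, err
	}
	return true, f.Sync()
}

// isStale reports whether the lock was last refreshed more than maxAge ago.
func isStale(path string, now time.Time, maxAge time.Duration) (bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	var m lockMeta
	if err := json.Unmarshal(data, &m); err != nil {
		return false, fmt.Errorf("decoding lockfile contents: %w", err)
	}
	return now.Sub(m.Updated) > maxAge, nil
}

// lockDemo acquires a lock twice and then checks staleness 10 seconds later.
func lockDemo() (first, second, stale bool, err error) {
	dir, err := os.MkdirTemp("", "lockdemo")
	if err != nil {
		return false, false, false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "example.lock")
	now := time.Now()
	if first, err = tryLock(path, now); err != nil {
		return false, false, false, err
	}
	if second, err = tryLock(path, now); err != nil {
		return false, false, false, err
	}
	stale, err = isStale(path, now.Add(10*time.Second), 5*time.Second)
	return first, second, stale, err
}

func main() {
	first, second, stale, err := lockDemo()
	fmt.Println(first, second, stale, err)
}
```

On a local disk the second `tryLock` call fails cleanly because the file already exists. Whether a network filesystem gives the same guarantee for exclusive creation depends on the filesystem, which is exactly what the reports above call into question.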
These errors seem to suggest that the lock files are empty or corrupted or incomplete:
- https://caddy.community/t/keeping-lock-file-fresh-error-im-getting-from-the-logs/15053?u=matt `[ERROR] Keeping lock file fresh: unexpected end of JSON input - terminating lock maintenance (lockfile: /mnt/efs/caddy_data/locks/issue_cert_www. userDomain .pl.lock`
- https://caddy.community/t/how-to-eliminate-downtime-in-a-caddy-cluster/15063?u=matt `obtaining certificate: unable to acquire lock 'issue_cert_eight.greenbongo.com': decoding lockfile contents: EOF`
- https://github.com/caddyserver/caddy/issues/3954#issuecomment-753801739 `obtaining certificate: EOF` (user determined it was a permissions problem)
And several other forum posts or issues in the past have also reported EOF or JSON errors.
So... what is going on with EFS? Does it have problems with write consistency? Even after adding Sync() calls to the lock file code, these errors persist.
When we update a lock file, we open it using O_RDWR, then read the file, then truncate the file to 0, then seek to position 0, then write the updated contents and sync and close the file:
https://github.com/caddyserver/certmagic/blob/2f78e527561dfdc9f14b7618a265571f206f6be4/filestorage.go#L301-L338
I wonder if EFS has trouble with truncating/syncing? I have no idea.
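The update sequence described above (open with O_RDWR, read, truncate, seek, write, sync, close) can be sketched in Go roughly as follows. This is a simplified illustration rather than the linked CertMagic code; `lockMeta`, `refreshLock`, and `refreshDemo` are hypothetical names. Note that between the truncate and the write the file is briefly empty, so a concurrent reader could see zero bytes. Whether that window is what users hit on EFS is only a guess.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"time"
)

// lockMeta is a hypothetical payload; the real lock file format may differ.
type lockMeta struct {
	Created time.Time `json:"created"`
	Updated time.Time `json:"updated"`
}

// refreshLock rewrites the lock file in place with a new Updated timestamp.
func refreshLock(path string, now time.Time) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	// Read the current contents.
	var m lockMeta
	if err := json.NewDecoder(f).Decode(&m); err != nil {
		return fmt.Errorf("decoding lockfile contents: %w", err)
	}
	m.Updated = now

	// Truncate to 0 and rewind; the file is empty until the write below lands.
	if err := f.Truncate(0); err != nil {
		return err
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return err
	}

	// Write the updated payload and flush it to stable storage.
	if err := json.NewEncoder(f).Encode(m); err != nil {
		return err
	}
	return f.Sync()
}

// refreshDemo writes a lock file, refreshes it, and reports whether only the
// Updated timestamp changed.
func refreshDemo() (bool, error) {
	dir, err := os.MkdirTemp("", "refreshdemo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "example.lock")

	created := time.Date(2023, 5, 30, 20, 50, 0, 0, time.UTC)
	initial, err := json.Marshal(lockMeta{Created: created, Updated: created})
	if err != nil {
		return false, err
	}
	if err := os.WriteFile(path, initial, 0o644); err != nil {
		return false, err
	}

	later := created.Add(5 * time.Second)
	if err := refreshLock(path, later); err != nil {
		return false, err
	}

	data, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	var got lockMeta
	if err := json.Unmarshal(data, &got); err != nil {
		return false, err
	}
	return got.Created.Equal(created) && got.Updated.Equal(later), nil
}

func main() {
	ok, err := refreshDemo()
	fmt.Println(ok, err)
}
```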
Some file system nuances may be at play:
opened 04:47AM - 17 May 23 UTC
since https://github.com/caddyserver/certmagic/commit/79babffe28c5e593b19b5268bf… 26447f1f5f0b26 , caddy produces lots of logs (32285 lines so far and keep increasing) like: https://github.com/imgk/caddy-trojan/issues/55#issue-1698507292
in above case the plugin `caddy-trojan` **does not create** `.lock` file at all.
**Suggestion**:
maybe [certmagic](https://github.com/caddyserver/certmagic) checks if `.lock` file exists before checking `io.EOF`?
If it is that then maybe we can work around it, at a minor performance cost.
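To make that kind of workaround concrete, here is a rough Go sketch that distinguishes a missing lock file, an empty one, and a corrupt one before deciding what to do. It is not what CertMagic does today; `lockState`, `inspectLock`, and `inspectDemo` are hypothetical names. It relies on the fact that Go's `json.Decoder` returns `io.EOF` for empty input, which matches the EOF errors quoted above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// lockState is a hypothetical classification of what was found on disk.
type lockState int

const (
	lockMissing lockState = iota // no lock file at all
	lockEmpty                    // file exists but has no content (yet)
	lockValid                    // file decodes as JSON
	lockCorrupt                  // file has content that does not decode
)

// inspectLock classifies the lock file instead of surfacing a bare EOF.
func inspectLock(path string) (lockState, error) {
	f, err := os.Open(path)
	if errors.Is(err, fs.ErrNotExist) {
		return lockMissing, nil
	}
	if err != nil {
		return lockCorrupt, err
	}
	defer f.Close()

	var m map[string]interface{}
	err = json.NewDecoder(f).Decode(&m)
	switch {
	case err == nil:
		return lockValid, nil
	case errors.Is(err, io.EOF):
		// Empty file: possibly caught mid-update, so a caller might retry.
		return lockEmpty, nil
	default:
		return lockCorrupt, nil
	}
}

// inspectDemo runs inspectLock against a missing, empty, valid, and partial file.
func inspectDemo() ([]lockState, error) {
	dir, err := os.MkdirTemp("", "inspectdemo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "example.lock")

	var states []lockState
	contents := []string{"", `{"updated":"2023-05-30T20:50:00Z"}`, `{"updated":`}

	s, err := inspectLock(path) // file not created yet
	if err != nil {
		return nil, err
	}
	states = append(states, s)

	for _, c := range contents {
		if err := os.WriteFile(path, []byte(c), 0o644); err != nil {
			return nil, err
		}
		s, err := inspectLock(path)
		if err != nil {
			return nil, err
		}
		states = append(states, s)
	}
	return states, nil
}

func main() {
	states, err := inspectDemo()
	fmt.Println(states, err)
}
```

The extra open and classification step is the "minor performance cost" mentioned above. How a caller should react to `lockEmpty` (retry, wait, or treat the lock as stale) is a design decision this sketch leaves open.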
Oooo, very interesting. Now that I've read a bit more about EFS, it seems wonky and risky. I will look into Redis, I guess.