Preloading certificates in memory

1. Output of caddy version: v2.6.2 h1:wKoFIxpmOJLGl3QXoo6PNbYvGW4xLEgo32GPBEjWL8o=

2. How I run Caddy:

a. System environment:

Ubuntu 20.04 x86-64 VM, Systemd. Caddy is installed using the official repo.

b. Command:

systemctl reload caddy

c. Service/unit/compose file:

Default systemd config

d. My complete Caddy config:

{
        storage file_system /caddy
}

littlecraftstore.com {
        reverse_proxy https://lc.cdn-pi.com {
                header_up Host {upstream_hostport}
        }
}

3. The problem I’m having:

I run a Caddy cluster across the US, EU, and Asia using s3fs as the shared storage, with around 500 hostnames. s3fs uses an S3 bucket in the us-east region to store certificates, OCSP, and ACME data. Each server uses a Caddyfile configuration with default TLS parameters.
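
For reference, the storage mount on each server looks roughly like this (bucket name, credentials file, and cache path are placeholders); Caddy then treats /caddy as ordinary file_system storage, per the config above:

# illustrative mount only; bucket, credentials file, and cache path are placeholders
s3fs caddy-cert-bucket /caddy -o passwd_file=/etc/passwd-s3fs -o endpoint=us-east-1 -o use_cache=/tmp/s3fs-cache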

This setup works well, but TLS negotiation from the EU and Asia frequently takes 0.5 to 1 second for hostnames that already have valid certificates. I ran some tests on the s3fs directory, and the latency seems to come from it having to contact us-east for the files. Once a hostname has had a few visits, it's fast until the next server reload.

4. Error messages and/or full log output:

Testing with SpeedVitals' "Measure TTFB from 35 Locations" tool shows very high TLS latency for the initial visit. Subsequent visits from different clients and different browsers in the same region are fast.

5. What I already tried:

I tried manually accessing each hostname using curl to preload certificates in memory, but the results are mixed.

For example:

curl -vik --resolve littlecraftstore.com:443:127.0.0.1 https://littlecraftstore.com

Running this command on each Caddy server "warms up" a few hostnames, but if I try it for all 500 of them, I start noticing TLS negotiation latency again.
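
For the full set, the warm-up is scripted along these lines (the hostname list file is a placeholder):

# hostnames.txt is a placeholder file containing the ~500 hostnames, one per line
while read -r host; do
        curl -sk -o /dev/null --resolve "$host:443:127.0.0.1" "https://$host"
done < hostnames.txt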

Is there a way to instruct Caddy to preload certificates for all hostnames in memory?

Hi Rahul, welcome –

So, a few things. First, S3 doesn't provide atomic operations, so it is not suitable for clustered certificate storage. You may get lucky most of the time, but sometimes cert operations will be duplicated or overlap. It's usually not the end of the world, but note that your TLS/ACME cluster will always be less efficient than it could be.

I noticed your Caddyfile in (d) only has 1 hostname in it and no on-demand TLS. Is that accurate? If so, well, there’d only be one cert, but also, by default, Caddy does pre-load all the certificates into memory when the config is loaded.

So, I’m not sure what to tell you; it should be pre-loading them by default.

Thanks, Matt. I might switch to DynamoDB for the larger deployment; s3fs is just for the initial POC.

I forgot the last line of the Caddyfile.

{
        storage file_system /caddy
}

littlecraftstore.com {
        reverse_proxy https://lc.cdn-pi.com {
                header_up Host {upstream_hostport}
        }
}

import /configs/*

The /configs directory stores each hostname in a separate file. Each hostname has an identical config to littlecraftstore.com.
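
For illustration, each file in /configs follows this pattern (hostname and upstream are placeholders):

example-host.com {
        reverse_proxy https://eh.cdn-pi.com {
                header_up Host {upstream_hostport}
        }
}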

This initial POC doesn't use on-demand TLS yet. Would on-demand certs also be kept in memory after the certificate is provisioned?

Ran some more tests today in a more controlled environment and logged curl timings (https://blog.cloudflare.com/a-question-of-timing/).
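
The fields below come from curl's --write-out variables; the format string is approximately:

curl -so /dev/null -w 'dnslookup: %{time_namelookup} | connect: %{time_connect} | appconnect: %{time_appconnect} | pretransfer: %{time_pretransfer} | starttransfer: %{time_starttransfer} | total: %{time_total} | size: %{size_download}\n' https://littlecraftstore.com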

Tests run:

  1. Visit littlecraftstore.com and log stats
  2. Visit littlecraftstore.com again after 10 seconds with TLS cached and TCP connections intact
  3. Restart Caddy, flush the filesystem cache, wait 20 minutes to ensure the TLS cache and keepalives have expired, then revisit littlecraftstore.com and log stats

Ran this with s3fs as Caddy storage, then again with a local directory as Caddy storage. Here are the results (appconnect includes TLS timing):

(All figures are seconds)
S3fs Cold request:
dnslookup: 0.000024 | connect: 0.024146 | appconnect: 0.919974 | pretransfer: 0.920095 | starttransfer: 1.081243 | total: 1.129471 | size: 62859

S3fs Subsequent request (Reusing connections, TLS cache):
dnslookup: 0.000024 | connect: 0.024311 | appconnect: 0.058072 | pretransfer: 0.058188 | starttransfer: 0.151286 | total: 0.202603 | size: 62859

S3fs request after restarting caddy and waiting 20 minutes:
dnslookup: 0.000028 | connect: 0.023603 | appconnect: 0.650617 | pretransfer: 0.650726 | starttransfer: 0.804902 | total: 0.850289 | size: 62859

Local storage cold request:
dnslookup: 0.000023 | connect: 0.024428 | appconnect: 0.060374 | pretransfer: 0.060519 | starttransfer: 0.127107 | total: 0.179072 | size: 62859

Local storage subsequent request (Reusing connections, TLS cache):
dnslookup: 0.000027 | connect: 0.024234 | appconnect: 0.057864 | pretransfer: 0.058019 | starttransfer: 0.097507 | total: 0.147492 | size: 62859

Local storage request after restarting caddy and waiting 20 minutes:
dnslookup: 0.000024 | connect: 0.023589 | appconnect: 0.059305 | pretransfer: 0.059420 | starttransfer: 0.220994 | total: 0.269670 | size: 62859

The logs suggest that cold requests do wait for S3 to respond with the certificate instead of having certs loaded into memory at startup. Does this mean that certificate preloading is not working as it should?


We had one large user try this, but DynamoDB is very expensive. Caddy has to scan storage to keep it clean (e.g. removing expired certs), which matters especially with frequently-rotated domains. We had to add the storage_clean_interval config property to dramatically lower their costs. Increasing that interval means unused/expired certs remain in storage longer, but the scans run less frequently (they set it to once a month). So if DB queries are expensive, set it to a high value; if storage is expensive, keep the default or use a lower one.
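
For example, a once-a-month cleaning scan looks like this in the Caddyfile global options block (the exact duration is up to you):

{
        storage_clean_interval 720h
}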

Hm, so yeah, without on_demand enabled, certificates for all domains in the config that qualify for a cert should be loaded when the config is loaded.

Can you enable debug mode on your server and see what the logs say during a test like this? I really appreciate the thoroughness here, thank you for that!

The debug logs should mention when certs are loaded into memory.
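
For reference, debug logging is enabled with the debug global option, e.g.:

{
        debug
        storage file_system /caddy
}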

Here are the debug logs:

Cold request:
Dec 31 21:54:29 atl1 caddy[372691]: {"level":"debug","ts":1672523669.01345,"logger":"tls","msg":"loading managed certificate","domain":"littlecraftstore.com","expiration":1678657676,"issuer_key":"acme-v02.api.letsencrypt.org-directory","storage":"FileStorage:/caddy"}
Dec 31 21:54:29 atl1 caddy[372691]: {"level":"debug","ts":1672523669.0547035,"logger":"tls.cache","msg":"added certificate to cache","subjects":["littlecraftstore.com"],"expiration":1678657676,"managed":true,"issuer_key":"acme-v02.api.letsencrypt.org-directory","hash":"7e446f65b512d43f0acf1d37591d6566738d445548a45458912fe4733bd6a2b0","cache_size":4,"cache_capacity":10000}

Subsequent request after a few minutes:
Dec 31 21:55:16 atl1 caddy[372691]: {"level":"debug","ts":1672523716.647431,"logger":"tls.handshake","msg":"choosing certificate","identifier":"littlecraftstore.com","num_choices":1}
Dec 31 21:55:16 atl1 caddy[372691]: {"level":"debug","ts":1672523716.64761,"logger":"tls.handshake","msg":"default certificate selection results","identifier":"littlecraftstore.com","subjects":["littlecraftstore.com"],"managed":true,"issuer_key":"acme-v02.api.letsencrypt.org-directory","hash":"7e446f65b512d43f0acf1d37591d6566738d445548a45458912fe4733bd6a2b0"}
Dec 31 21:55:16 atl1 caddy[372691]: {"level":"debug","ts":1672523716.6479797,"logger":"tls.handshake","msg":"matched certificate in cache","remote_ip":"159.203.118.102","remote_port":"50228","subjects":["littlecraftstore.com"],"managed":true,"expiration":1678657676,"hash":"7e446f65b512d43f0acf1d37591d6566738d445548a45458912fe4733bd6a2b0"}
Dec 31 21:56:40 atl1 caddy[372691]: {"level":"debug","ts":1672523800.114086,"logger":"tls.handshake","msg":"choosing certificate","identifier":"littlecraftstore.com","num_choices":1}
Dec 31 21:56:40 atl1 caddy[372691]: {"level":"debug","ts":1672523800.1141686,"logger":"tls.handshake","msg":"default certificate selection results","identifier":"littlecraftstore.com","subjects":["littlecraftstore.com"],"managed":true,"issuer_key":"acme-v02.api.letsencrypt.org-directory","hash":"7e446f65b512d43f0acf1d37591d6566738d445548a45458912fe4733bd6a2b0"}
Dec 31 21:56:40 atl1 caddy[372691]: {"level":"debug","ts":1672523800.114208,"logger":"tls.handshake","msg":"matched certificate in cache","remote_ip":"88.99.91.100","remote_port":"52612","subjects":["littlecraftstore.com"],"managed":true,"expiration":1678657676,"hash":"7e446f65b512d43f0acf1d37591d6566738d445548a45458912fe4733bd6a2b0"}


After reload:
Dec 31 21:57:21 atl1 caddy[372691]: {"level":"debug","ts":1672523841.8182104,"logger":"tls","msg":"loading managed certificate","domain":"littlecraftstore.com","expiration":1678657676,"issuer_key":"acme-v02.api.letsencrypt.org-directory","storage":"FileStorage:/caddy"}
Dec 31 21:57:21 atl1 caddy[372691]: {"level":"debug","ts":1672523841.8593132,"logger":"tls.cache","msg":"added certificate to cache","subjects":["littlecraftstore.com"],"expiration":1678657676,"managed":true,"issuer_key":"acme-v02.api.letsencrypt.org-directory","hash":"7e446f65b512d43f0acf1d37591d6566738d445548a45458912fe4733bd6a2b0","cache_size":4,"cache_capacity":10000}
Dec 31 21:57:21 atl1 caddy[372691]: {"level":"debug","ts":1672523841.859887,"logger":"events","msg":"event","name":"cached_managed_cert","id":"e25375c1-6cb5-4d96-bfa1-6b09114e21e0","origin":"tls","data":{"sans":["littlecraftstore.com"]}}
Dec 31 21:57:21 atl1 caddy[372691]: {"level":"debug","ts":1672523841.8602052,"logger":"tls.handshake","msg":"loaded certificate from storage","remote_ip":"88.99.91.100","remote_port":"52736","subjects":["littlecraftstore.com"],"managed":true,"expiration":1678657676,"hash":"7e446f65b512d43f0acf1d37591d6566738d445548a45458912fe4733bd6a2b0"}

The initial request loaded the certificate from storage. I tried subsequent requests from two different IPs, and both of them used the cached certificate. Reloading Caddy cleared the cache, and the next request again loaded the certificate from storage.

Installed Caddy on a clean VM today and copied the config over. It preloads certs immediately after each reload. If I change a hostname's config to use on-demand TLS, it doesn't preload that cert until the hostname has had a visit (which I guess is the expected behavior).
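
For reference, that on-demand variant of a site block looks roughly like this (in production you'd also want the on_demand_tls ask option in the global block):

littlecraftstore.com {
        tls {
                on_demand
        }
        reverse_proxy https://lc.cdn-pi.com {
                header_up Host {upstream_hostport}
        }
}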

I’ll mark this resolved and look at alternative cluster storage options.

Was the new VM using S3 for storage? (With the config copied over, I assume it was using the same S3 setup.)

I’m not sure how to explain the difference then. Could you post the full logs instead of just selected lines?

I configured Caddy manually on the new VM and used Goofys instead of s3fs. The previous server no longer exists, but I suspect it was loading an incorrect config due to s3fs caching.


Ohhh, I gotcha now. s3fs is a separate software utility that virtually mounts an S3 bucket, right? THAT makes way more sense. Sorry, I was a bit confused earlier.

Thanks for the follow-up and the thorough investigation!!

@matt Is it recommended to have the same cert on all the servers in a cluster? Does it affect performance or cause SSL issues in the browser? I am using Syncthing to sync the Caddy config file across all the servers. Since it's not a shared storage mount, Caddy is not aware that it's in a cluster, so I now have a unique cert for each domain on each server. Is that a drawback? All the servers have the same domains and files, and users are sent to their nearest server through the load balancer.

Yes, that's not ideal. If the storage isn't exactly the same, Caddy can't coordinate certificate operations like OCSP stapling and renewing/obtaining certs. That means each instance in your cluster duplicates certificate issuance, so a cluster of size N uses N times more of your rate limits than it needs to. For example, Let's Encrypt limits duplicate certificates to 5 per week, so if your cluster is larger than 5, you will start getting errors from Let's Encrypt and some instances won't be able to renew their certs.

(In the future, please open a new topic for new questions. Thanks!)


@matt Yes, s3fs mounts an S3 bucket as a local directory. I found an s3fs alternative, Goofys, which has somewhat better response times.

Since nothing beats local storage in terms of speed, I’m trying the following:

  • Mount the S3 bucket as a directory and use that for Caddy storage
  • Mount a local directory at /caddy/storage/certificates and use inotifywait to sync locally generated certificates to S3, so the other Caddy instances will see them (sketched after this list)
  • The ACME and OCSP data will stay on the S3 mount, hence shared with all nodes
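
A rough sketch of the sync step, assuming the aws CLI is available and with paths and bucket name as placeholders:

# watch the local cert directory and push changes to the shared bucket (illustrative only)
inotifywait -m -r -e close_write,create,delete /caddy/storage/certificates |
while read -r dir events file; do
        aws s3 sync /caddy/storage/certificates s3://caddy-cert-bucket/certificates --delete
done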

I understand the point about rate limits for new certs, renewals, etc., but if the OCSP directory is not shared across Caddy instances, would that cause any issues?

@rn-nestify Sorry to barge in. I wonder why you don't use NFS? I understand Caddy caches the config and SSL certs in memory, so even if the NFS server goes down you have enough time to bring it back, and the NFS clients will stay unaffected in the meantime. Not sure if I'm missing something here.

(Just going to chime in and say we’ve had numerous reported issues with NFS mounts.)

Even though Caddy does cache certs in memory, they first have to be loaded from storage. And if storage is down, the cert can't be loaded and there will be downtime.


@Dante In my testing, NFS was quite a bit slower than S3. That, along with it being a single point of failure, was a dealbreaker for us.


Thank you guys! I have tested Syncthing with Caddy and it seems to be working fine. Whichever node is hit first generates/renews the SSL certs, and Syncthing syncs the folder in real time to all nodes, so it doesn't matter which node generates or renews a cert. It's not as efficient as a shared storage mount, but it's okay for our use case, and there is no single point of failure.

This topic was automatically closed after 30 days. New replies are no longer allowed.