1. Caddy version (caddy version
):
v2.2.1 h1:Q62GWHMtztnvyRU+KPOpw6fNfeCD3SkwH7SfT1Tgt2c=
With two plugins: caddy-dns/route53 and a custom/modified version of certmagic-storage-dynamodb.
2. How I run Caddy:
a. System environment:
Amazon Linux 2
b. Command:
caddy run --config /home/ec2-user/caddy.json
c. Service/unit/compose file:
n/a
d. My complete Caddyfile or JSON config:
{
"admin": {
"disabled": true
},
"logging": {
"sink": {
"writer": {
"output": "stdout"
}
},
"logs": {
"log2stdout": {
"writer": {
"output": "stdout"
},
"level": "DEBUG"
}
}
},
"apps": {
"http": {
"servers": {
"websites": {
"listen": [
":443"
],
"routes": [
{
"match": [
{
"host": [
"*.plumpeskimos.com"
]
}
],
"handle": [
{
"handler": "reverse_proxy",
"upstreams": [
{
"dial": "localhost:8080"
}
],
"flush_interval": -1,
"buffer_requests": false
}
],
"terminal": true
},
{
"match": [
{
"not": [
{
"header_regexp": {
"host": {
"name": "disallow_branded_host",
"pattern": ".+\\.plumpeskimos\\.com$"
}
}
}
]
}
],
"handle": [
{
"handler": "reverse_proxy",
"upstreams": [
{
"dial": "localhost:8080"
}
],
"flush_interval": -1,
"buffer_requests": false
}
],
"terminal": true
}
],
"strict_sni_host": true
}
}
},
"tls": {
"automation": {
"policies": [
{
"subjects": [
"*.plumpeskimos.com"
],
"issuer": {
"module": "acme",
"ca": "https://acme-staging-v02.api.letsencrypt.org/directory",
"email": "cory@stam.pr",
"challenges": {
"dns": {
"provider": {
"name": "route53"
}
}
},
"trusted_roots_pem_files": []
},
"storage": {
"module": "dynamodb",
"table": "websites-edge-certificate-storage",
"aws_region": "us-west-2"
},
"on_demand": false
},
{
"issuer": {
"module": "acme",
"ca": "https://acme-staging-v02.api.letsencrypt.org/directory",
"email": "cory@stam.pr",
"challenges": {
"http": {
"disabled": false
},
"tls-alpn": {
"disabled": false
}
},
"trusted_roots_pem_files": []
},
"storage": {
"module": "dynamodb",
"table": "websites-edge-certificate-storage",
"aws_region": "us-west-2"
},
"on_demand": true
}
],
"on_demand": {
"ask": "http://localhost:8080/.stampr/.system/verify-tenant-ssl-provisioning-status"
}
}
}
}
}
3. The problem I’m having:
I have a wildcard being automatically generated and stored in dynamodb storage. Sometimes when I visit the site caddy logs an error “no certificate available” and thus doesn’t serve the site.
The server is started using an AWS AMI with caddy already running so the time between when the AMI was built to when it starts is sometimes a long time.
Deleting the ocsp file/record in storage and restarting caddy always fixes the error. Not sure if it’s a problem with caddy or the storage plugin.
4. Error messages and/or full log output:
/home/ec2-user/.pm2/logs/proxy-out.log last 15 lines:
{"level":"debug","ts":1606633449.010966,"logger":"http","msg":"starting server loop","address":"[::]:80","http3":false,"tls":false}
{"level":"info","ts":1606633449.0113878,"logger":"http","msg":"enabling automatic TLS certificate management","domains":["*.plumpeskimos.com"]}
{"level":"info","ts":1606633449.4136097,"logger":"tls","msg":"cleaned up storage units"}
{"level":"info","ts":1606633455.3049645,"logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0xc0005101c0"}
2020/11/30 22:53:16 [INFO] Redirected sink to here (stdout)
{"level":"info","ts":1606776796.9810793,"logger":"http","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"websites","https_port":443}
{"level":"info","ts":1606776796.981112,"logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"websites"}
{"level":"info","ts":1606776796.9841342,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc0002c0150"}
{"level":"debug","ts":1606776796.9851458,"logger":"http","msg":"starting server loop","address":"[::]:443","http3":false,"tls":true}
{"level":"debug","ts":1606776796.98519,"logger":"http","msg":"starting server loop","address":"[::]:80","http3":false,"tls":false}
{"level":"info","ts":1606776796.9851973,"logger":"http","msg":"enabling automatic TLS certificate management","domains":["*.plumpeskimos.com"]}
{"level":"error","ts":1606776976.9870756,"logger":"tls","msg":"job failed","error":"*.plumpeskimos.com: obtaining certificate: failed storage check: RequestCanceled: request context canceled\ncaused by: context deadline exceeded - storage is probably misconfigured"}
{"level":"info","ts":1606777036.9947038,"logger":"tls","msg":"cleaned up storage units"}
{"level":"debug","ts":1606777934.6477273,"logger":"http.stdlib","msg":"http: TLS handshake error from 167.248.133.38:42308: read tcp 10.2.1.238:443->167.248.133.38:42308: read: connection reset by peer"}
{"level":"debug","ts":1606778004.7133768,"logger":"http.stdlib","msg":"http: TLS handshake error from 167.248.133.38:41938: error checking http://localhost:8080/.stampr/.system/verify-tenant-ssl-provisioning-status to determine if certificate for hostname '10.2.1.238' should be allowed: Get \"http://localhost:8080/.stampr/.system/verify-tenant-ssl-provisioning-status?domain=10.2.1.238\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
/home/ec2-user/.pm2/logs/proxy-error.log last 15 lines:
{"level":"info","ts":1606633454.8046432,"msg":"shutting down apps then terminating","signal":"SIGTERM"}
{"level":"info","ts":1606633454.944,"msg":"shutting down","signal":"SIGINT"}
{"level":"info","ts":1606633455.3049645,"logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0xc0005101c0"}
{"level":"info","ts":1606633455.3050168,"msg":"shutdown done","signal":"SIGTERM"}
{"level":"info","ts":1606776796.976701,"msg":"using provided configuration","config_file":"/home/ec2-user/caddy.json","config_adapter":""}
{"level":"warn","ts":1606776796.9792683,"logger":"admin","msg":"admin endpoint disabled"}
2020/11/30 22:53:16 [INFO] Redirecting sink to: stdout
{"level":"info","ts":1606776796.9810793,"logger":"http","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"websites","https_port":443}
{"level":"info","ts":1606776796.981112,"logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"websites"}
{"level":"info","ts":1606776796.9841342,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc0002c0150"}
{"level":"info","ts":1606776796.9851973,"logger":"http","msg":"enabling automatic TLS certificate management","domains":["*.plumpeskimos.com"]}
{"level":"info","ts":1606776857.035941,"msg":"autosaved config","file":"/home/ec2-user/.config/caddy/autosave.json"}
{"level":"info","ts":1606776857.0359654,"msg":"serving initial configuration"}
{"level":"error","ts":1606776976.9870756,"logger":"tls","msg":"job failed","error":"*.plumpeskimos.com: obtaining certificate: failed storage check: RequestCanceled: request context canceled\ncaused by: context deadline exceeded - storage is probably misconfigured"}
{"level":"info","ts":1606777036.9947038,"logger":"tls","msg":"cleaned up storage units"}
Then I try to make a request:
{"level":"debug","ts":1606778935.7608654,"logger":"http.stdlib","msg":"http: TLS handshake error from 98.173.45.78:52018: no certificate available for 'whatever2.plumpeskimos.com'"}
{"level":"debug","ts":1606778935.8069701,"logger":"http.stdlib","msg":"http: TLS handshake error from 98.173.45.78:52019: no certificate available for 'whatever2.plumpeskimos.com'"}
5. What I already tried:
My knowledge of ocsp is superficial, but it might be related to ocsp somehow.
I don’t have concrete proof, but it seems like deleting the “ocsp/wildcard_.plumpeskimos.com-8d4312c0” file/record and restarting caddy always resolves the problem for that server. (No idea about an environment with multiple servers.)
Maybe a timeout? Evidence: I fixed the problem on one instance => terminated it => started a new instance and it was still working. It might’ve been because the AMI was just built.
The error about “context deadline exceeded” seems to also add more weight behind a timeout.
My thoery: It seems like this has something to do with the fact the AMI was created with caddy already running and the ocsp value it created on start while the AMI was being built is then no longer valid when a new instance is launched with that AMI after a certain amount of time.
Any ideas?
6. Links to relevant resources:
None