Clean up Caddy Certificates

We run Caddy 2.3.0 inside Docker on AWS Fargate, with EFS (an AWS network filesystem) for certificate storage and an “ask” backend that checks whether we serve a given domain before a certificate is obtained from Let's Encrypt. This is similar to this question: How Caddy manages certificates (when not in use; catch-all host - few K domains; delete/renew)?

Matt answered there that he is planning to implement cleanup of old certificates:

There’s a TODO to remove them automatically, like Caddy does for old OCSP staples. I should spend a few hours and finish that up, but I’d welcome a contribution as well.

The question is: is this already done? If not, can I just write a script that deletes certificate files from the EFS if they are outdated and we no longer serve the given domain?

Here is the Dockerfile we use to build Caddy (in case it is relevant which modules we use):

ARG GO_VERSION="1.14.9"
FROM golang:${GO_VERSION}-alpine AS builder
ARG CADDY_VERSION="2.3.0"
ARG XCADDY_VERSION="0.1.5"
RUN apk add --no-cache git ca-certificates

RUN wget -O xcaddy.tar.gz "https://github.com/caddyserver/xcaddy/releases/download/v${XCADDY_VERSION}/xcaddy_${XCADDY_VERSION}_linux_amd64.tar.gz"; \
    tar x -z -f xcaddy.tar.gz -C /usr/bin xcaddy; \
    chmod +x /usr/bin/xcaddy;

COPY tls-insecure/ /usr/local/go/src/tls-insecure/

RUN /usr/bin/xcaddy build v${CADDY_VERSION} \
    --output /usr/bin/caddy \
    --with tls-insecure

FROM alpine:3.12

COPY --from=builder /usr/bin/caddy /usr/bin/caddy

Yes:

Hmm, this function is called in CleanStorage, but the only place where CleanStorage is in the codebase is the function definition itself.

Do I need to enable it somehow? Might it have problems in combination with the ask-backend?

"tls": {
			"automation": {
				"on_demand": {
					"ask": "http://127.0.0.1:8080/ask"
				},
				"policies": [
					{
						"issuers": [
							{
								"email": "{env.EMAIL}",
								"module": "acme"
							}
						],
						"on_demand": true
					}
				]
			},
			"certificates": {
				"load_folders": [
					"/certs"
				]
			}
		}

Nope! Caddy calls this function for you; it’s not currently called in CertMagic itself.

Hmm, it somehow does not work.

For example, one domain (academy.****.de) has been disabled since December, so the ask backend returns 404:

ubuntu@ip-10-0-188-167:/mnt/ent/certificates/acme-v02.api.letsencrypt.org-directory/academy.****.de$ curl -v http://10.0.5.202:8080/ask?domain=academy.****.de
*   Trying 10.0.5.202:8080...
* TCP_NODELAY set
* Connected to 10.0.5.202 (10.0.5.202) port 8080 (#0)
> GET /ask?domain=academy.****.de HTTP/1.1
> Host: 10.0.5.202:8080
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Content-Length: 0
< Content-Type: application/octet-stream
< Date: Tue, 09 Feb 2021 18:07:01 GMT
< Server: Caddy
< Server: awselb/2.0
<
* Connection #0 to host 10.0.5.202 left intact

But the certificate got renewed yesterday and Caddy still serves this domain. How long are the results of the ask backend cached?

openssl x509 -startdate -noout -in academy.****.de.crt
notBefore=Feb  8 09:58:52 2021 GMT

Certificates that have been expired for 14 days will get deleted. If Caddy recently renewed it, it won’t get deleted. To stop renewing it, remove that domain from Caddy’s config – with on_demand, clients will have to stop accessing your server using that domain because on-demand TLS maintains certificates as they are used/needed. Does that make sense?

Yes, it makes sense. However, since we use the ask backend, we assumed that it tells Caddy whether to serve a given domain, and that once a domain is no longer allowed, Caddy would not renew its certificate, because it should not serve the domain at all anymore.

The only other solution I see would be deleting the certificates from the storage via a script when a domain gets disabled.
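
A minimal sketch of what such a script could look like, in Python. It assumes the storage layout visible in the shell prompt above (one folder per domain beneath the issuer directory, with the folder named after the domain) and an ask backend that answers 2xx for enabled domains and 404 otherwise. The function names `is_served`, `disabled_cert_dirs` and `cleanup` are made up for this illustration; nothing here is part of Caddy or CertMagic.

```python
import shutil
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable, List


def is_served(ask_url: str, domain: str) -> bool:
    """Query the ask backend like the curl call above: GET <ask_url>?domain=<domain>.

    Connection failures (URLError) are deliberately not caught, so an
    unreachable backend never looks like "every domain is disabled".
    """
    url = ask_url + "?" + urllib.parse.urlencode({"domain": domain})
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        # Non-2xx answer, such as the 404 shown above: domain is disabled.
        return False


def disabled_cert_dirs(issuer_dir: Path, served: Callable[[str], bool]) -> List[Path]:
    """Return the per-domain folders whose domain is no longer served."""
    return [d for d in sorted(issuer_dir.iterdir())
            if d.is_dir() and not served(d.name)]


def cleanup(issuer_dir: Path, served: Callable[[str], bool], delete: bool = False) -> List[Path]:
    """List the folders of disabled domains; remove them only if delete=True."""
    stale = disabled_cert_dirs(issuer_dir, served)
    for d in stale:
        print(("deleting " if delete else "would delete ") + str(d))
        if delete:
            shutil.rmtree(d)
    return stale
```

It could be called as `cleanup(Path("/mnt/ent/certificates/acme-v02.api.letsencrypt.org-directory"), lambda d: is_served("http://10.0.5.202:8080/ask", d))`, which only prints candidates until `delete=True` is passed. This touches only the files on disk; a running Caddy instance may still hold an already loaded certificate in memory.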

Here’s the Caddy config (JSON):

{
	"apps": {
		"http": {
			"servers": {
				"mainserver.de": {
					"listen": [":443"],
					"routes": [{
						"handle": [...],
						"match": [...],
						"terminal": true
					},{
						"handle": [{
							"handler": "subroute",
							"routes": [{
								"handle": [{
									"handler": "headers",
									...
								},{
									"handler": "reverse_proxy",
									"transport": {
										"protocol": "http",
										"tls": {
											"insecure_skip_verify": true
										}
									},
									"upstreams": [{
										"dial": "{env.ENDPOINT}"
									}],
									"handle_response": [...]
								}
								]}
							]
						}],
						"terminal": true
					}],
					"tls_connection_policies": [{
						"cipher_suites": [...]
					}]
				},
				"status-check": {
					"listen": [":8080"],
					"routes": [{
						"handle": [{
							"body": "OK!",
							"handler": "static_response",
							"status_code": 200
						}],
						"match": [{
							"path": ["{env.STATUS_ROUTE}"]
						}]
					},{
						"handle": [{
							"handler": "reverse_proxy",
							"headers": {
								"request": {
									"add": {"X-Cloud": ["{env.CLOUD}"]}
								}
							},
							"upstreams": [{"dial": "{env.TLS_ASK}"}]
						}],
						"match": [{
							"path": ["/ask"]
						}]
					}]
				}
			}
		},
		"tls": {
			"automation": {
				"on_demand": {
					"ask": "http://127.0.0.1:8080/ask"
				},
				"policies": [{
					"issuers": [{
						"email": "{env.EMAIL}",
						"module": "acme"
					}],
					"on_demand": true
				}]
			},
			"certificates": {
				"load_folders": [
					"/certs"
				]
			}
		}
	},
	"logging": {...}
}

edit: the storage is added via a script:

echo $(jq '.storage = {}' $CONFIG_FILE) > $CONFIG_FILE
echo $(jq '.storage.module = "file_system"' $CONFIG_FILE) > $CONFIG_FILE
echo $(jq ".storage.root = \"$STORAGE\"" $CONFIG_FILE) > $CONFIG_FILE
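
The three jq calls amount to setting one top-level `storage` object. As an illustration only, the same edit could be done in a single read-modify-write; this is a Python sketch with the hypothetical helper names `add_file_storage` and `rewrite_config`, assuming the config file is plain JSON as shown above.

```python
import json
from pathlib import Path


def add_file_storage(config: dict, root: str) -> dict:
    """Return a copy of the config with a top-level file_system storage module."""
    updated = dict(config)
    updated["storage"] = {"module": "file_system", "root": root}
    return updated


def rewrite_config(path: str, root: str) -> None:
    """Read the JSON config, add the storage block, and write it back."""
    p = Path(path)
    config = json.loads(p.read_text())
    p.write_text(json.dumps(add_file_storage(config, root), indent="\t") + "\n")
```

Here `rewrite_config(config_file, storage_root)` would take the values of `$CONFIG_FILE` and `$STORAGE` from the script above.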

The ask endpoint is only queried during TLS handshakes, and not periodically.

What happens is that if no certificate exists yet for a domain, and you have on demand enabled, Caddy will query your ask endpoint to find out if it should get one (along with a bunch of other checks). If that passes, it’ll attempt to issue a cert.

As long as that cert is not expired Caddy will continue to use that certificate for any request for that domain.

Once in the renewal window (which is the last 1/3 of the certificate’s life, which for 90-day certs is the last 30 days, so after 60 days of issuance), then the next time Caddy receives a request for that domain, it will again query the ask endpoint to find out if it should still manage it; if so, it’ll attempt to renew it in the background while returning the still-valid certificate for the handshake.

If the ask endpoint then says no when it comes time for renewal, then Caddy will wipe the certificate from its cache and return an error to the client (i.e. fail the handshake). So this means if the certificate is within the last 30 days of its life, it could still remain on disk until the end of those 30 days when the cleanup routine will cull them due to being expired.
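
To make the timing concrete, here is a small Python paraphrase of the flow described above. It is not CertMagic's code, only a sketch of the description: the names `renewal_window_start` and `handshake_action` and the returned labels are invented for illustration, and the handling of an already expired certificate is an assumption, since the description does not cover that case.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple


def renewal_window_start(not_before: datetime, not_after: datetime) -> datetime:
    """The renewal window is the last third of the certificate's lifetime."""
    return not_after - (not_after - not_before) / 3


def handshake_action(now: datetime,
                     cert: Optional[Tuple[datetime, datetime]],
                     ask: Callable[[], bool]) -> str:
    """Decide what happens on a handshake for an on-demand domain.

    cert is (not_before, not_after), or None if no certificate exists yet.
    ask() stands in for querying the ask endpoint.
    """
    if cert is None or now >= cert[1]:
        # No certificate yet. Assumption: an expired one is treated the same.
        return "obtain" if ask() else "reject"
    not_before, not_after = cert
    if now < renewal_window_start(not_before, not_after):
        return "serve"  # the ask endpoint is not consulted at all
    if not ask():
        return "evict-and-reject"  # drop from cache, fail the handshake
    return "serve-and-renew"  # renew in background, serve the valid cert


# Example: a certificate issued on 8 Feb 2021, assuming a 90-day lifetime.
issued = datetime(2021, 2, 8, 9, 58, 52, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
print(renewal_window_start(issued, expires))  # 2021-04-09 09:58:52+00:00
```

With these dates, a handshake on 9 Feb 2021 falls before the window, so the certificate is served without the ask endpoint being consulted; from 9 Apr 2021 on, a "no" from the ask endpoint would fail the handshake.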

The code that does this has comments that make it pretty easy to follow how it works (that’s what I just did to write this).

Is there a specific reason you need it to be cleaned up early? I don’t really understand the concern here.

If it checks the ask endpoint before re-obtaining, everything is fine. However, I was not sure that it does.

Are you able to find all the logs from Caddy for the past idk, month or so I suppose, which reference the academy.****.de domain with grep? Might help us better understand the history.

Yeah, ultimately, it’s up to your config to prevent a cert from being renewed, although, I think, given recent improvements in on-demand TLS, we can do better in a few ways:

  • Currently on-demand certs are renewed in the background with other ones, if they are already in memory. We can skip maintaining on-demand certs in the background renewal routine and let handshakes trigger maintenance instead.
  • By doing maintenance in the handshakes, the ask endpoint will be invoked before all renewals of on-demand certs.

This commit should improve on this situation: Don't maintain on-demand certs in background · caddyserver/certmagic@d2311e1 · GitHub

This used to be CertMagic’s original behavior, but it was changed temporarily as a performance improvement and inadvertently not changed back after the performance problem was fixed.

Ah, so do I understand correctly that it indeed is/was a bug that old certificates get renewed without re-asking the ask backend?

When will it be part of a release?

At the moment we build like this inside a Docker Container:


RUN wget -O xcaddy.tar.gz "https://github.com/caddyserver/xcaddy/releases/download/v${XCADDY_VERSION}/xcaddy_${XCADDY_VERSION}_linux_amd64.tar.gz"; \
    tar x -z -f xcaddy.tar.gz -C /usr/bin xcaddy; \
    chmod +x /usr/bin/xcaddy;

COPY tls-insecure/ /usr/local/go/src/tls-insecure/

RUN /usr/bin/xcaddy build v${CADDY_VERSION} \
    --output /usr/bin/caddy \
    --with tls-insecure

You could use the official Caddy builder image instead, see https://hub.docker.com/_/caddy:

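# CADDY_VERSION must be declared with ARG before the first FROM to be usable there,
# e.g. pass it with --build-arg CADDY_VERSION=<version>.
ARG CADDY_VERSION
FROM caddy:${CADDY_VERSION}-builder AS builder

RUN xcaddy build \
    --with path/to/your/plugin

FROM caddy:${CADDY_VERSION}

COPY --from=builder /usr/bin/caddy /usr/bin/caddy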
