Proposal: embrace Prometheus for metrics

I’ve been using the Prometheus plugin for a while, and it is pretty useful. It has a few limitations, though, that I think are unfortunate:

  1. It is limited in what it can collect, since it is implemented as a middleware. Caddy core potentially has a lot of useful data that could be exposed, but a plugin does not have easy access to it.

  2. It is not clear if / how other plugins can add their own metrics to be collected via /metrics.

The more I have thought about these problems, the more I feel the solution is simple: make Prometheus the “official” metrics collection point for Caddy and its plugins. Specifically, any package that wishes to record metrics should use the github.com/prometheus/client_golang/prometheus package.

Prometheus uses a central metrics registry, so no matter where things are added from, they will be available via the /metrics endpoint if you use the Prometheus plugin.

This would make it easy for:

  • Caddy core to add useful metrics. Perhaps useful info on certificate issuance / renewal / errors. Perhaps lower level networking things than are available to a middleware. I think there are a lot of good possibilities here.

  • Other plugins to add data. I can think of quite a few things I’d like to see from various plugins. Auth things can record success / failure counts. I think plugin authors would come up with quite a few useful things if they were using any monitoring.

Anyone have any thoughts or opinions on this?

What is the cost?

I think the only negative impact is the cost of including the Prometheus client package by default. It doesn’t take much memory, as it only stores a few counters and similar values. Whether or not anything is scraping /metrics, we would always be collecting and storing metric data, which has a small cost, but I think it is negligible.

Does this prohibit additional future metrics plugins (graphite/influx/opentsdb/whatever)? I don’t think so. It would be fairly straightforward to write a plugin that traverses the prometheus registry and exports to some other format. Or use an external app to do the same thing.

What does caddy need to do?

Immediately, maybe nothing. It might make sense to make the Prometheus plugin a built-in one. Other plugins can already add metrics today, but likely don’t because the Prometheus plugin is still a separate, optional component. Over time we can add useful metrics into Caddy core as appropriate.

I can see the value in this. We’ve started discussing implementing telemetry for research purposes. Does Prometheus do data push, so that each instance doesn’t have to register with some central aggregation server?

Instances don’t have to register with anything, but it is pull rather than push. Caddy’s side is entirely passive: it just exposes the endpoint, and it is up to some other app somewhere to request it periodically.

As an example: here is a branch I have that adds a certificate expiration metric as caddy checks for renewals:

https://github.com/mholt/caddy/compare/master...captncraig:certexpire?expand=1

That’s really all that needs to be done to record new metrics.