Caddy 0.11 Will Have Telemetry - discuss

(Matt Holt) #43

No – of course not. That’s contrary to everything we’ve been saying from the beginning. Telemetry can be disabled! It is entirely optional (literally, it is an option).

You will have the ability to make that decision, of course – you’re the sysadmin, you have the ability to control what programs you run with what configuration you want. This isn’t a hosted platform or a social network where we’re forcing you one way or the other behind a walled garden!

(rugk) #44

I just wanted to highlight that a compile-time opt-out is no opt-out for the server admin. Not all server admins use your pre-compiled binaries or compile it themselves. :smile:

(Matthew Fay) #45

I’m not sure I understand what this will achieve. The UUID is not derived from, nor can it reveal, any identifying information about the server it came from - except for the fact that it’s tied to the data it came with.

Hashing it doesn’t stop someone from requesting all data for a given UUID, it doesn’t protect any sensitive secrets or credentials from the database owner (as the UUID is neither), and it doesn’t stop the database owner from separating out any discrete instance of Caddy.

I suppose you could argue that the database owner can’t take the hash and put it straight into the front end of the metrics site, but why would they do that when they can just look at the data in the database?

UUIDs are just as good as hashed UUIDs, or any other sufficiently random token, for authentication purposes.
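To make the point about hashing concrete, here is a minimal Go sketch. It assumes the instance ID is a random version-4 UUID and that the hash would be SHA-256; `newInstanceID` and `hashID` are hypothetical names, not Caddy's actual telemetry code. Because the hash is deterministic, each UUID maps to exactly one digest, so records keyed by the digest can be grouped per instance just as easily as records keyed by the raw UUID.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newInstanceID returns a random version-4 UUID string. It stands in for
// whatever per-instance ID a telemetry client generates (hypothetical).
func newInstanceID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// hashID returns the hex-encoded SHA-256 digest of an instance ID.
func hashID(id string) string {
	sum := sha256.Sum256([]byte(id))
	return hex.EncodeToString(sum[:])
}

func main() {
	id, err := newInstanceID()
	if err != nil {
		fmt.Println("could not generate ID:", err)
		return
	}
	// The same ID always yields the same digest, so the digest is just
	// another stable per-instance key.
	fmt.Println("raw ID:   ", id)
	fmt.Println("hashed ID:", hashID(id))
	fmt.Println("stable:   ", hashID(id) == hashID(id))
}
```

Both values are random-looking tokens with no link to the host, and both identify one instance consistently, which is why hashing changes little here.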

I’d argue that the data isn’t actually related to the site at all, but to Caddy itself. These aren’t web logs - they’re aggregated Caddy response latencies, MITM counts, TLS implementation, etc. across the entire web server. Once they’re aggregated locally, they’re indistinguishable.
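As a rough illustration of server-wide aggregation, the Go sketch below folds per-request observations into a single set of totals. The `Aggregate` type and its fields are hypothetical and do not reflect Caddy's real telemetry schema; the point is only that once the site name is discarded at aggregation time, the totals cannot be split back out by site.

```go
package main

import "fmt"

// Aggregate holds server-wide totals. All field names are hypothetical.
type Aggregate struct {
	Requests       int
	MITMDetected   int
	TotalLatencyMs float64
}

// Add folds one request into the totals. The site name is available
// locally but is deliberately not stored anywhere.
func (a *Aggregate) Add(site string, latencyMs float64, mitm bool) {
	_ = site // discarded: the aggregate is per server, not per site
	a.Requests++
	a.TotalLatencyMs += latencyMs
	if mitm {
		a.MITMDetected++
	}
}

// MeanLatencyMs returns the average latency across all recorded requests.
func (a Aggregate) MeanLatencyMs() float64 {
	if a.Requests == 0 {
		return 0
	}
	return a.TotalLatencyMs / float64(a.Requests)
}

func main() {
	var agg Aggregate
	agg.Add("blog.example.com", 12, false)
	agg.Add("intranet.example.com", 30, true)
	agg.Add("blog.example.com", 18, false)
	fmt.Printf("%+v mean=%.1fms\n", agg, agg.MeanLatencyMs())
}
```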

So if someone higher up dictates that these stats can’t be included in the aggregate, I’d say that the entire server should be telemetry-disabled. There’s no real difference between the stats from a safe site and the stats from a sensitive site, so it would be best for it all to go off.

I don’t want to speak for Matt, but when I refer to the server admin, I mean the person ultimately responsible for the server, with the power to make decisions about what software runs on it. In this case, the person who provisions the server and puts the Caddy binary on it is the server admin, and they have the ability to choose whether they want telemetry. If I were just logging on to administer the web server, I’d be the web admin.

In the end, everyone either uses pre-compiled binaries or compiles it themselves.

(rugk) #46

As for hashing: Yes, maybe it does indeed not help much.

Are you kidding? There is also the option of installing it from the distro’s packages. And that is always better than both options you mention, as neither of them provides automatic updates.
So for security reasons (which I thought was a priority for Caddy), you should really support apt/dnf installation (from distros).

And as such, it must still be configurable by the server admin. They can choose the software, yes, but they are likely to deliberately choose the distro’s version for installing. And if they do so, they still want to be able to opt out.
Otherwise it is no real opt-out, and the opt-out is worth nothing.

(boxofrox) #47

@matt, being privacy-oriented myself, I’d prefer opt-in, but understand the benefit opt-out offers and appreciate the discussion you’ve initiated in the pursuit of transparency.

  1. When the list of collected metrics is published, I may find that acceptable and not opt out of telemetry on my Caddy service, but as all things are subject to change, do you have or [intend to have] a policy in place for clearly documenting/announcing changes to the dataset collected by telemetry?

    When reviewing the list of collected metrics at some future time, I would particularly like to see which version of Caddy introduced the collection of each metric. Of course, they’d all mention 0.11 in the first release, but over time this information might be useful, particularly for researchers who want to limit the scope of data mining to those versions of Caddy that actually collected a particular metric of interest.

    I would find it unsatisfactory to just toss the new metrics in the documentation and leave it as an exercise for the user to determine which were added (or removed if that’s a possibility).

    I imagine these changes would already be included in the release notes, which is grand.

    One other aspect is that fastidious adherence to such a policy will demonstrate a commitment to remaining transparent with regard to telemetry. It might also be worth mentioning such a policy in the documentation for telemetry. Food for thought.

  2. Will it be possible for server admins to offload the telemetry to an additional destination of their choosing (e.g. file, syslog, elasticsearch, etc)?

    In the spirit of transparency, this provides another avenue to audit the data collection and confirm that nothing more is collected than documented. In case, you know, your evil twin supplants you and intentionally conceals the collection of new metrics, both in the documentation and in the published telemetry data. I’m sure someone would eventually find those new metrics in the source code, but I think reviewing the data sent to the Caddy Telemetry Collection Service would foster more participation than code review.

    Of particular benefit, admins can automate that portion of telemetry collection where they’d otherwise have to download the telemetry digests from your service.

Thanks for the invitation to discuss this new feature.

(Matthew Fay) #48

No, I’m not kidding. Unless your package manager includes Golang with Caddy and compiles it for you on your computer as part of the install process, your distro is providing a pre-compiled binary.

(Matt Holt) #49

First, thanks for your thoughtful reply.

Before 1.0, I wasn’t planning on anything too formal, but I do feel committed to detailing the changes in telemetry metrics with each release. And as for when a metric was introduced, this can be easily inferred by correlating Caddy version with the metric’s existence in the telemetry data; or even simpler, GitHub’s nice git blame view is handy for that kind of stuff.
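Correlating version with a metric's existence might be sketched in Go as follows. It assumes each report carries the sending Caddy version and the names of the metrics it contained; the `Report` type, the metric names, and the simple dotted-number comparison are all assumptions for illustration, not the real telemetry format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Report is a hypothetical telemetry record: the Caddy version that sent
// it and the names of the metrics it contained.
type Report struct {
	Version string
	Metrics []string
}

// less reports whether version a sorts before b, comparing dot-separated
// numeric parts (for example "0.9" sorts before "0.11").
func less(a, b string) bool {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		x, _ := strconv.Atoi(as[i])
		y, _ := strconv.Atoi(bs[i])
		if x != y {
			return x < y
		}
	}
	return len(as) < len(bs)
}

// firstSeen maps each metric name to the earliest version reporting it.
func firstSeen(reports []Report) map[string]string {
	out := make(map[string]string)
	for _, r := range reports {
		for _, m := range r.Metrics {
			if v, ok := out[m]; !ok || less(r.Version, v) {
				out[m] = r.Version
			}
		}
	}
	return out
}

func main() {
	reports := []Report{
		{Version: "0.11.1", Metrics: []string{"tls_handshakes", "mitm_count"}},
		{Version: "0.11.0", Metrics: []string{"tls_handshakes"}},
	}
	for metric, version := range firstSeen(reports) {
		fmt.Printf("%s first seen in %s\n", metric, version)
	}
}
```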

We’re planning on advanced data export features in the future. Not even so much with transparency in mind, but just for making it easier to process the data you care about. (The less work I have to do, the better!)

(Abiola Ibrahim) #50

I do not see anything wrong with opt-out, as long as there is a way for users to disable it. In fact, most software that I am aware of (that does this) is opt-out, because that really is what they want.

My only concern is users not being aware they are sending data but I have reasons to believe Caddy can handle this well. The fact that this discussion is taking place is one of them.

Even Firefox, the privacy-first browser, is opt-out.

(rugk) #51

Of course. And as far as I understand you, they (i.e. the distro maintainers/package manager) now have to (or can) decide whether to enable telemetry or not.
That’s still not what I’d call an opt-out. Taking your prime example, Firefox, you’ll see that they also provide a runtime opt-out and do not (only) offer an opt-out for compiled binaries.

Actually it is not, as you can see above. Users in this context are server admins, I say. But the plan is to make it possible to disable telemetry not for users, but for whoever compiles the binary (e.g. a Linux distro). That is a fundamental difference.