Caddy 0.11 Will Have Telemetry - discuss

@omz13 I was expressing a political opinion – nothing more. :slight_smile:

@pwhodges Thanks for your comments, I think they’re reasonable – and I appreciate you taking the time to write them!

I will take these into consideration as we wrap up the initial release here soon.

Took some time for my reply, sorry.

Okay, good point. But I have an easy solution:
Just hash the UUID on the telemetry server (with a strong password hash function)! Then you can still correlate the data, but unless someone submits their UUID to you, you cannot link the data to that specific server. :smiley:
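A minimal sketch of the server-side pseudonymization idea, in Go. It swaps in HMAC-SHA256 with a secret key held only by the telemetry server instead of a slow password hash, because Go’s standard library has no bcrypt or Argon2. The key, the `pseudonymize` name and the sample UUID are illustrative assumptions, not part of Caddy’s telemetry server.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// serverKey is a hypothetical secret known only to the telemetry server.
// In practice it would be loaded from secure storage, not hard-coded.
var serverKey = []byte("example-secret-key-do-not-use")

// pseudonymize maps an instance UUID to a stable, keyed digest.
// The same UUID always yields the same digest, so check-ins can still be
// correlated, but the digest cannot be mapped back without the key.
func pseudonymize(instanceUUID string) string {
	mac := hmac.New(sha256.New, serverKey)
	mac.Write([]byte(instanceUUID))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	id := "3f1c2b9e-8d4a-4c6f-9b2e-7a5d1e0c4f88" // made-up example UUID
	fmt.Println(pseudonymize(id))
}
```

Because the digest is deterministic, it gives exactly the “still correlate, but not identify” property described above; whether that adds much over a random UUID is debated further down the thread.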

That’s your view, but not necessarily that of the server admins. Say they have banking-secure.example and banking-website.example. They may enable telemetry for the second, but they may be prohibited by law from enabling stat submissions for the first. Maybe they are not even allowed to keep logs for the first, etc. (This is just an example.)

It’s still opt-out, so there’s no real skew. Only a few users may opt out. This is a slight change, but if you wanted 100% unbiased data, you would have to remove the possibility to opt out entirely. And you know that is not a good thing. So keep the opt-out! And make it possible for the server admin to disable it:

Depends:

  • Must-have: Opt-out at the server admin level. (Whoever compiles the software, a distro maintainer or similar, should not control whether I, the server admin, submit telemetry. At the very least I should be able to override their decision in the negative direction, so that I can always disable it.)
  • Nice-to-have: Opt-out per domain.

I just think these two can quite easily be combined, by adding a config option. :smiley:
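A minimal sketch of how the two levels could combine into one config option, assuming a hypothetical `TelemetryConfig` type; Caddy’s actual Caddyfile directives and flags may look different. The server-wide switch always wins, and a per-site list excludes individual hosts.

```go
package main

import "fmt"

// TelemetryConfig is a hypothetical config shape, not Caddy's real one.
type TelemetryConfig struct {
	// DisabledGlobally is the server admin's master switch. When true,
	// nothing is sent, regardless of any per-site setting.
	DisabledGlobally bool
	// DisabledSites lists hostnames excluded from telemetry.
	DisabledSites map[string]bool
}

// Enabled reports whether telemetry may be collected for a site.
// The global switch can only turn telemetry off, never back on.
func (c TelemetryConfig) Enabled(site string) bool {
	if c.DisabledGlobally {
		return false
	}
	return !c.DisabledSites[site]
}

func main() {
	cfg := TelemetryConfig{
		DisabledSites: map[string]bool{"banking-secure.example": true},
	}
	fmt.Println(cfg.Enabled("banking-website.example")) // true
	fmt.Println(cfg.Enabled("banking-secure.example"))  // false

	cfg.DisabledGlobally = true
	fmt.Println(cfg.Enabled("banking-website.example")) // false
}
```

One caveat: if metrics are aggregated across the whole server before sending, a per-site exclusion has to be applied before aggregation, not after.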


So basically you want to force users to have telemetry enabled? That’s stupid and will likely result in a fork or similar. You don’t want that…
There is no alternative to a runtime opt-out. I know opt-in skews the statistics (and that’s the only thing your blog post addresses), but opt-out does not skew them as much as it seems. Remember that even Firefox and similar software provide a runtime opt-out! It would be silly for Mozilla to say: “Got Firefox from your distro? Then telemetry is always disabled. Got it from us? Then telemetry is always enabled.”

The emoji does not make that polemic and incorrect statement any better. To explain again: the GDPR applies to European citizens and their data. So when you process their data or offer a product to Europeans, it applies. Of course, it does not apply to US customers and the like.

They would get the money from you, that’s not hard. The only question would be whether they’d care to actually enforce that law for a project as small as Caddy.
But even without any legal requirement… let’s ignore the GDPR (and any potential legal threat) for a moment: it makes no difference. You should still do exactly the same thing: anonymize data, aggregate data.

Actually, if you use the ID as I just suggested and hash it, you could answer such requests. Users can enter their own UUID, and the server hashes it and returns all the matching data.
That does not even need authentication, as the UUID (if properly combined with rate limiting/password hashing) acts as the authentication in this case.
So I think that may be a good idea. Especially as the GDPR, again, also includes the right of users to access their own data. So if this is done, users could do that. :smiley:
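A minimal sketch of such a self-service lookup, assuming a hypothetical in-memory `store` standing in for the telemetry database. It uses plain SHA-256 rather than a slow password hash, since a random version-4 UUID already carries 122 random bits (a point also made further down the thread), and a simple per-caller attempt counter that, unlike a real limiter, never resets.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for the telemetry database,
// keyed by a digest of the instance UUID rather than the UUID itself.
type store struct {
	mu       sync.Mutex
	records  map[string][]string // digest -> stored telemetry records
	attempts map[string]int      // caller address -> lookups so far
	limit    int                 // maximum lookups per caller (no time window)
}

var errRateLimited = errors.New("too many lookups")

func digest(uuid string) string {
	sum := sha256.Sum256([]byte(uuid))
	return hex.EncodeToString(sum[:])
}

// Lookup returns the records for a UUID the caller presents. Knowing the
// UUID acts as the credential; the per-caller limit slows down guessing.
func (s *store) Lookup(caller, uuid string) ([]string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.attempts[caller] >= s.limit {
		return nil, errRateLimited
	}
	s.attempts[caller]++
	return s.records[digest(uuid)], nil
}

func main() {
	s := &store{
		records:  map[string][]string{},
		attempts: map[string]int{},
		limit:    3,
	}
	id := "3f1c2b9e-8d4a-4c6f-9b2e-7a5d1e0c4f88" // made-up example UUID
	s.records[digest(id)] = []string{"check-in 1", "check-in 2"}

	recs, err := s.Lookup("198.51.100.7", id)
	fmt.Println(recs, err) // [check-in 1 check-in 2] <nil>
}
```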

As explained, that is not the case. Of course a compile-time option also skews the data, so I’d say this is the “skewness” scale:

  • no option to opt out: no data skewing, but since users are lost to a fork, the data is also wrong in a way
  • opt-out at compile time: say 10%-15% data skewing (the percentages don’t mean anything, they just show the relations); many distros will likely disable it by default
  • opt-out at compile time & runtime: 15%-20% data skewing
  • opt-out at runtime: 5%-10% data skewing
  • opt-in: likely >60% data skewing

That’s how I would estimate it, based on how many users/maintainers/installations would likely disable that feature.
The thing is, we have no facts here and never can, because to find out how many installations have that feature disabled, we’d need tracking again. So that does not work. One can only guess.

I mean, if you have 3 million users and 5,000 of them do not send telemetry, it does not matter. Statistics always have some discrepancy, but you can neglect it with opt-out, IMHO. Few users will change such a setting (especially if they feel that you care about privacy and that the telemetry is actually useful).
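A worked bound makes the “it does not matter” claim concrete, assuming the non-reporting installations are a fixed fraction m of all installations. Write p for the true share of installations with some property, p_obs for the share among reporting installations, and p_miss for the share among the silent ones:

```latex
\begin{aligned}
p &= (1-m)\,p_{\text{obs}} + m\,p_{\text{miss}} \\
\lvert p - p_{\text{obs}} \rvert &= m\,\lvert p_{\text{miss}} - p_{\text{obs}} \rvert \le m \\
m &= \frac{5000}{3\,000\,000} = \frac{1}{600} \approx 0.17\%
\end{aligned}
```

So 5,000 silent installations out of 3 million can shift any reported share by at most about 0.17 percentage points, whatever those installations look like. The same bound shows the limit of the argument: at an opt-out rate of 15%, as guessed in the scale above, the worst case is 15 percentage points, and the actual error depends on how different the silent installations are, which cannot be measured.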


Also, you seem to be asking about opt-in vs. opt-out at compile time. I say opt-out at runtime. Isn’t that a reasonable compromise?
You see, many users would argue for opt-in. I see your point that this skews data, but an opt-out at compile time does not even deserve the name “opt-out”, because the wrong people opt out. I am the server admin; this is my data. So I need an opt-out, not someone else deciding that for me.

No – of course not. That’s contrary to everything we’ve been saying from the beginning. Telemetry can be disabled! It is entirely optional (literally, it is an option).

You will have the ability to make that decision, of course – you’re the sysadmin, you have the ability to control what programs you run with what configuration you want. This isn’t a hosted platform or a social network where we’re forcing you one way or the other behind a walled garden!

I just wanted to highlight that a compile-time opt-out is no opt-out for the server admin. Not all server admins use your pre-compiled binaries or compile it themselves. :smile:

I’m not sure I understand what this will achieve. The UUID is not derived from, nor can it reveal, any identifying information about the server it came from - except for the fact that it’s tied to the data it came with.

Hashing it doesn’t stop someone from requesting all data for a given UUID, it doesn’t protect any sensitive secrets or credentials from the database owner (as the UUID is neither), and it doesn’t stop the database owner from separating out any discrete instance of Caddy.

I suppose you could argue that the database owner can’t take the hash and put it straight into the front-end of the metrics site, but why would they do that when they can just look at the data in the database?

UUIDs are equally meritorious as hashed UUIDs or any other sufficiently random token for authentication purposes.

I’d argue that the data isn’t actually related to the site at all, but to Caddy itself. These aren’t web logs - they’re aggregated Caddy response latencies, MITM counts, TLS implementation, etc. across the entire web server. Once they’re aggregated locally, they’re indistinguishable.

So if someone higher up dictates that these stats can’t be included in the aggregate, I’d say that the entire server should be telemetry-disabled. There’s no real difference between the stats from a safe site and the stats from a sensitive site, so it would be best for it all to go off.

I don’t want to speak for Matt, but when I refer to the server admin, I mean the person ultimately responsible for the server, with the power to make decisions about what software runs on it. In this case, the person who provisions the server and puts the Caddy binary on it is the server admin, and they have the ability to choose whether they want telemetry. If I were just logging on to administer the web server, I’d be the web admin.

In the end, everyone either uses pre-compiled binaries or compiles it themselves.


As for hashing: yes, maybe it indeed does not help much.

Are you kidding? There is also the option to get it from distro maintainers. And that is always better than both options you mention, as they do not provide automatic updates.
So for security reasons (actually I thought security was a focus for Caddy), you should really support installation via apt/dnf (from distros).

And as such, it must still be configurable by the server admin. They choose the software, yes, but they are likely to deliberately choose the distro’s version. And if they do so, they still want to be able to opt out.
Otherwise it is no real opt-out, and the opt-out is worthless.

@matt, being privacy-oriented myself, I’d prefer opt-in, but understand the benefit opt-out offers and appreciate the discussion you’ve initiated in the pursuit of transparency.

  1. When the list of collected metrics is published, I may find that acceptable and not opt-out of telemetry on my Caddy service, but as all things are subject to change, do you have or [intend to have] a policy in place for clearly documenting/announcing changes to the dataset collected by telemetry?

    When reviewing the list of collected metrics at some future time, I would particularly like to see which version of Caddy introduced the collection of each metric. Of course, they’d all mention 0.11 in the first release, but over time this information might be useful, particularly for researchers who want to limit the scope of data mining to the versions of Caddy that actually collected a particular metric of interest.

    I would find it unsatisfactory to just toss the new metrics in the documentation and leave it as an exercise for the user to determine which were added (or removed if that’s a possibility).

    I imagine these changes would already be included in the release notes, which is grand.

    One other aspect is that fastidious adherence to such a policy will demonstrate a commitment to remaining transparent with regard to telemetry. It might also be worth mentioning such a policy in the documentation for telemetry. Food for thought.

  2. Will it be possible for server admins to offload the telemetry to an additional destination of their choosing (e.g. file, syslog, elasticsearch, etc)?

    In the spirit of transparency, this provides another avenue to audit the data collection and confirm that nothing more is collected than documented. In case, you know, your evil twin supplants you and intentionally conceals the collection of new metrics, both in the documentation and in the published telemetry data. I’m sure someone would eventually find those new metrics in the source code, but I think reviewing the data sent to the Caddy Telemetry Collection Service would foster more participation than code review.

    Of particular benefit, admins can automate that portion of telemetry collection where they’d otherwise have to download the telemetry digests from your service.
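A minimal sketch of the versioned metric list suggested in point 1, assuming a hypothetical `registry` in which each metric records the release that started collecting it. The metric names echo examples from this thread, and `example_new_metric` with its 0.12 version is invented; a removed metric could get an analogous “until” field.

```go
package main

import "fmt"

// version is a simplified major.minor pair; real release versions would
// need a proper semantic-version parser.
type version struct{ Major, Minor int }

func (v version) atLeast(o version) bool {
	if v.Major != o.Major {
		return v.Major > o.Major
	}
	return v.Minor >= o.Minor
}

// metricInfo documents one telemetry metric together with the release that
// started collecting it. Names and versions here are hypothetical.
type metricInfo struct {
	Name  string
	Since version
}

var registry = []metricInfo{
	{Name: "tls_version", Since: version{0, 11}},
	{Name: "mitm_count", Since: version{0, 11}},
	{Name: "example_new_metric", Since: version{0, 12}}, // hypothetical later addition
}

// collectedBy lists the metrics a given Caddy version would report.
func collectedBy(v version) []string {
	var names []string
	for _, m := range registry {
		if v.atLeast(m.Since) {
			names = append(names, m.Name)
		}
	}
	return names
}

func main() {
	fmt.Println(collectedBy(version{0, 11})) // [tls_version mitm_count]
	fmt.Println(collectedBy(version{0, 12})) // [tls_version mitm_count example_new_metric]
}
```

A researcher could then restrict an analysis of `example_new_metric` to check-ins from 0.12 and later without reading the release notes.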

Thanks for the invitation to discuss this new feature.


No, I’m not kidding. Unless your package manager includes Golang with Caddy and compiles it for you on your computer as part of the install process, your distro is providing a pre-compiled binary.

First, thanks for your thoughtful reply.

Before 1.0, I wasn’t planning on anything too formal, but I do feel committed to detailing the changes in telemetry metrics with each release. And as for when a metric was introduced, this can easily be inferred by correlating the Caddy version with the metric’s existence in the telemetry data; or, even simpler, GitHub’s git blame view is handy for that kind of thing.

We’re planning on advanced data export features in the future. Not even so much with transparency in mind, but just for making it easier to process the data you care about. (The less work I have to do, the better!)


I do not see anything wrong with opt-out, as long as there is a way for users to disable it. In fact, most software that I am aware of (that does this) is opt-out, because that really is what they want.

My only concern is users not being aware they are sending data but I have reasons to believe Caddy can handle this well. The fact that this discussion is taking place is one of them.

Even Firefox, the privacy first browser, is opt-out.


Of course. And as far as I understand you, the distro maintainers/package managers now have to (or can) decide whether to enable telemetry or not.
That’s still not what I’d call an opt-out. Take your prime example, Firefox: you’ll see that they also provide a runtime opt-out and do not (only) offer an opt-out for compiled binaries.

Actually it is not, as you can see above. Users in this context are server admins, I’d say. But the plan is to make it possible to disable telemetry not for users, but for whoever compiles it (i.e. a Linux distro or similar). That is a fundamental difference.

My 2c:

In business, telemetry is considered a potential threat vector. If you choose a default opt-in posture, it must be easily disabled or businesses will look for other compliant server software.

I work with individuals using technology in oppressive regimes. What you’re proposing, if not handled carefully, could literally have people imprisoned or murdered by their governments.

We are just now embarking on a global debate over privacy. Respectfully, to implement default telemetry now … is a slap in the face to many of us.

The server admin is ultimately responsible for installing the software on his server.

Since Caddy isn’t officially packaged for distros (and there’s a number of obstacles to overcome in that regard), the installation process involves retrieving a binary and placing it in the path, or compiling it and doing the same, or trusting that the unofficial distro package satisfies your requirements.

The server admin can retrieve a binary from anywhere they like, including the Caddy build server, and there will be options for that admin to select a binary that is telemetry-disabled.

Regardless of whether you treat your servers like cattle or like pets, I think that the above is a very reasonable situation. It’s not difficult under any circumstance for the user to choose.

While I’m likely not representative of the average server admin with regards to Go usage, I personally consider compiling Caddy to be so simple as to be a negligible step to get exactly what you want. Scriptable, probably, in 10 lines or less; I consider it less effort than even writing most of my Caddyfiles.

So the last major benefit I see to run-time opt-out is that you don’t have to cross your fingers that your distro’s unofficial package opts to disable telemetry. I’m weighing that against the downside stated above, which is a loss of reliability in how representative the metrics are, and I don’t think it’s worth it; I just don’t see the friction here, it just seems too easy to get what you want even without a run-time toggle.

(The above is, of course, only my own opinion.)


@caddyhello: Good points, always important to keep in mind. I’m sad to hear that it feels like a slap in the face; certainly I’d love to see a simple list enumerating all the aggregated statistics the telemetry is planned to collect, so that it can be plainly seen and discussed which, if any, of those metrics could be dangerous. I dare say we all agree that we’d prefer to have usable data which is not capable of creating any danger at all.

  • Matt mentioned it here, but is there a link to details on how one can compile custom Caddy binaries with telemetry disabled? I always compile my own Caddy binaries for testing.
  • Have there been tests comparing Caddy’s performance with vs. without telemetry?
  • Will these tests be run every time a new Caddy version is released, to ensure there are no performance regressions?

I understand that since development is still very much underway, there’s no official documentation yet, but you’ll want to toggle the enableTelemetry variable in caddy/caddymain/run.go. (It affects the code block that initialises the telemetry.)
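Editing enableTelemetry in the source, as described above, is the direct route. For a scripted build, one generic Go pattern is a string variable set at link time; this is a minimal sketch with a hypothetical `telemetrySetting` variable, not Caddy’s code. The linker’s `-X` flag can only set string variables, which is why it is not a bool.

```go
package main

import "fmt"

// telemetrySetting is a hypothetical build-time switch. Override it with:
//
//	go build -ldflags "-X main.telemetrySetting=off"
var telemetrySetting = "on"

// telemetryEnabled reports whether this binary was built with telemetry.
func telemetryEnabled() bool {
	return telemetrySetting != "off"
}

func main() {
	if telemetryEnabled() {
		fmt.Println("telemetry: enabled in this build")
		// ...this is where telemetry would be initialised...
	} else {
		fmt.Println("telemetry: disabled at build time")
	}
}
```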

Not sure about tests. I imagine it would have little to no effect on request rate; the periodic check-in and hand-off is really the only place where Caddy does anything it doesn’t already do. But there should definitely be tests.


It depends on what counts as official, but actually it is: Caddy is packaged on Fedora and CentOS.

So this assumption is wrong. Maybe that’s why you resist making it a runtime configuration.

All the text that follows is based on this assumption, so no: the server admin can use it from a distro. That will never change, and when Caddy gets more popular, maybe it will get into even more distros.

Edit: Actually, I used a different site with more information, so it is packaged in these big “distros”:

  • Alpine Linux
  • Chocolatey (Windows)
  • EPEL 7
  • Fedora
  • FreeBSD

So actually quite a lot…

  • Gentoo
  • Homebrew (MacOS)

It’s not an assumption, rather a fact, and as you note, it’s a matter of the definition of “official”; to clarify, the developers have not officially made the Caddy software available through any package managers at all. Any Caddy package you find, as of this writing, is unofficial, provided directly by distro maintainers or volunteers based on the Apache 2.0 license the source code comes under.

It is something they want to get around to eventually, though (this thread is relevant: Packaging Caddy - #127 by carlwgeorge).

As for the reason why I resist, that’s not a mystery either - the closing statement of my last reply to you succinctly outlines why I don’t see the benefit of run-time configuration worth the downside, but to re-summarize: the benefit is small, and the downside is large.

I would contend with this statement as well; the fact that binaries exist for distribution (even if they were officially provided and maintained) via popular package managers does not preclude, in the slightest, the ease with which one can retrieve a binary from anywhere else (including any other unofficial package).

That’s the concept of distribution packages. Upstream can (and should) of course help, but basically it is downstream (the distro packager) who packages the software.
That’s why all distro packages are unofficial, but you actually should not care, or at least not imply that this is bad. This is just how it works!

You have never explained that. You never gave any stats for that.

And again, you never responded to my argument that no other software (to some extent, even Windows) prevents its users from configuring a telemetry setting at runtime.
So you would set quite a bad precedent, at least in the FLOSS world. I doubt you want that.

Hell yes, but who cares? No, your way of installing Caddy is not “the best”. It just is not! When users want to install it via the proven standard way of installing software on Linux (distro system packages), they can do so and should not suffer ridiculous disadvantages, such as having no way to opt out or opt in (yes, if a distro disables telemetry by default, a runtime config can actually help you get more stats!).

If you want to punish users who do not follow your shiny “I download random binaries from the internet/shell and put them onto my system” install method, then sure, do so, but don’t complain when users get enraged.

And there are plenty of advantages to using distro packages, as has already been discussed in your linked thread.
So don’t disregard people who deliberately choose to install it via distro packages. They want to do so, and they want to configure telemetry. Even if it is a very hidden setting, they want to have a way…

I hope you can forgive the implication that package managers are bad. It was not my intention.

No, I guess I didn’t. I’ll address it by quoting from earlier in the thread, though:

Without measuring it, we simply don’t know how representative the data we have is (I’m getting dangerously close to tautology here).

My understanding is that it’s difficult to quantify the statistical significance of what’s lost, because by its very nature it becomes an unknown. But you’re right in that nobody’s given any stats in Caddy’s case. I don’t think there are any yet. I’m happy to be contradicted here if anyone’s got more information.

Sorry, I didn’t see where you made that point earlier. To take your example and run with it, Windows lets you configure (to a very limited extent) the level of data-gathering. In case you missed part of the announcement post, on top of the capability to totally opt-out from the outset, Caddy also plans to do this. Here’s the relevant section (emphasis mine):

You will be in control of your telemetry: you may always choose to not participate in it. In fact, the telemetry server has the ability to remotely disable (but NOT enable!) telemetry in Caddy instances at any time if deemed necessary. It can also disable certain metrics if that is needed.

https://caddyserver.com/blog/caddy-0_11-telemetry#your-controls-and-privacy
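The quoted controls describe a one-way switch: the server may turn telemetry or individual metrics off, never on. A minimal sketch of that rule on the client side, assuming a hypothetical `serverReply` shape (the real response format is not documented in this thread):

```go
package main

import "fmt"

// serverReply is a hypothetical shape for the collection server's
// response to a check-in.
type serverReply struct {
	StopTelemetry   bool
	DisabledMetrics []string
}

// clientState holds the local telemetry switches. Both can only move
// toward "off": nothing in applyReply can re-enable anything.
type clientState struct {
	Enabled         bool
	DisabledMetrics map[string]bool
}

func (c *clientState) applyReply(r serverReply) {
	if r.StopTelemetry {
		c.Enabled = false
	}
	for _, name := range r.DisabledMetrics {
		c.DisabledMetrics[name] = true
	}
}

// shouldCollect reports whether a metric may still be gathered.
func (c *clientState) shouldCollect(metric string) bool {
	return c.Enabled && !c.DisabledMetrics[metric]
}

func main() {
	c := &clientState{Enabled: true, DisabledMetrics: map[string]bool{}}
	c.applyReply(serverReply{DisabledMetrics: []string{"mitm_count"}})
	fmt.Println(c.shouldCollect("mitm_count"), c.shouldCollect("tls_version")) // false true

	c.applyReply(serverReply{StopTelemetry: true})
	fmt.Println(c.shouldCollect("tls_version")) // false
}
```

Because no field in the reply can set `Enabled` to true, an admin who opted out stays opted out whatever the server sends.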

I didn’t make this assertion, and nor did you, so it would be asinine of me to remind you in turn that your own way of installing Caddy is likewise not the best.

There are many ways to install software. I’m a huge fan of Docker, myself - my home lab runs off a single docker-compose.yml file. Package managers are great too.

But I’m opinionated: I find inflexibility to be a poor quality in a system administrator. And I don’t think the problems you might have with your one source out of many possible sources for Caddy outweigh the downside of accommodating them. It’s a discussion, though, and the devs are listening to many opinions; mine is just one of them here, and yours is equally valid.