Caddy 0.11 Will Have Telemetry - discuss

matt · April 11, 2018, 4:25pm

No – of course not. That’s contrary to everything we’ve been saying from the beginning. Telemetry can be disabled! It is entirely optional (literally, it is an option).

You will have the ability to make that decision, of course – you’re the sysadmin, you have the ability to control what programs you run with what configuration you want. This isn’t a hosted platform or a social network where we’re forcing you one way or the other behind a walled garden!

rugk · April 11, 2018, 5:22pm

I just wanted to highlight that a compile-time opt-out is no opt-out for the server admin. Not all server admins use your pre-compiled binaries or compile it themselves.

Whitestrake · April 11, 2018, 11:54pm

I’m not sure I understand what this will achieve. The UUID is not derived from, nor can it reveal, any identifying information about the server it came from - except for the fact that it’s tied to the data it came with.

Hashing it doesn’t stop someone from requesting all data for a given UUID, it doesn’t protect any sensitive secrets or credentials from the database owner (as the UUID is neither), and it doesn’t stop the database owner from separating out any discrete instance of Caddy.

I suppose you could argue that the database owner can’t take the hash and put it straight into the front-end of the metrics site, but why would they do that when they can just look at the data in the database?

UUIDs are equally meritorious as hashed UUIDs or any other sufficiently random token for authentication purposes.
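To illustrate the point, here is a minimal sketch, assuming the instance ID is just a random 16-byte token and using SHA-256 as an example hash (neither is confirmed as what Caddy actually does; `newInstanceID` and `hashID` are hypothetical names). Because the hash is deterministic, the hashed ID groups an instance's records exactly as the raw ID would.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newInstanceID returns 16 random bytes, hex-encoded. It stands in for the
// random UUID an instance would generate; it is not Caddy's actual code.
func newInstanceID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// hashID returns the hex-encoded SHA-256 digest of an instance ID.
func hashID(id string) string {
	sum := sha256.Sum256([]byte(id))
	return hex.EncodeToString(sum[:])
}

func main() {
	a, b := newInstanceID(), newInstanceID()
	// Same input, same digest: the hash is as stable a key as the raw ID.
	fmt.Println("same instance matches:", hashID(a) == hashID(a))
	// Different instances still get different keys, so they remain separable.
	fmt.Println("instances stay distinct:", hashID(a) != hashID(b))
}
```

Either token carries no information about the server; both simply act as a stable key for the data sent with them.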

I’d argue that the data isn’t actually related to the site at all, but to Caddy itself. These aren’t web logs - they’re aggregated Caddy response latencies, MITM counts, TLS implementation, etc. across the entire web server. Once they’re aggregated locally, they’re indistinguishable.

So if someone higher up dictates that these stats can’t be included in the aggregate, I’d say that the entire server should be telemetry-disabled. There’s no real difference between the stats from a safe site and the stats from a sensitive site, so it would be best for it all to go off.

I don’t want to speak for Matt, but when I refer to the server admin, I mean the person ultimately responsible for the server, with the power to make decisions about what software runs on it. In this case, the person who provisions the server and puts the Caddy binary on it is the server admin, and they have the ability to choose whether they want telemetry. If I were just logging on to administer the web server, I’d be the web admin.

In the end, everyone either uses pre-compiled binaries or compiles it themselves.

rugk · April 18, 2018, 2:06pm

As for hashing: Yes, maybe it does indeed not help much.

Are you kidding? There is also the choice to install it from the distro’s packages. And that is always better than both options you mention, as they do not provide automatic updates.
So for security reasons (actually I thought that’s a topic for Caddy) you should really support apt/dnf installation (from distros).

And as such, it must still be configurable by the server admin. They can choose the software, yes, but they are likely to deliberately choose the distro’s version for installing. And if they do so, they still want to be able to opt out.
Otherwise that is no real opt-out and the opt-out is worth nothing.

boxofrox · April 18, 2018, 8:02pm

@matt, being privacy-oriented myself, I’d prefer opt-in, but understand the benefit opt-out offers and appreciate the discussion you’ve initiated in the pursuit of transparency.

When the list of collected metrics is published, I may find that acceptable and not opt-out of telemetry on my Caddy service, but as all things are subject to change, do you have or [intend to have] a policy in place for clearly documenting/announcing changes to the dataset collected by telemetry?

When reviewing the list of collected metrics, at some future time, I would particularly like to see which version of Caddy introduced the collection of that metric. Of course, they’d all mention 0.11 in the first release, but over time, this information might be useful, particularly for researchers in order to limit the scope of data mining to those versions of caddy that actually collected a particular metric of interest.

I would find it unsatisfactory to just toss the new metrics in the documentation and leave it as an exercise for the user to determine which were added (or removed if that’s a possibility).

I imagine these changes would already be included in the release notes, which is grand.

One other aspect is that fastidious adherence to such a policy will demonstrate a commitment to remaining transparent with regard to telemetry. It might also be worth mentioning such a policy in the documentation for telemetry. Food for thought.

Will it be possible for server admins to offload the telemetry to an additional destination of their choosing (e.g. file, syslog, elasticsearch, etc)?

In the spirit of transparency, this provides another avenue to audit the data collection and confirm that nothing more is collected than documented. In case, you know, your evil twin supplants you and intentionally conceals the collection of new metrics, both in the documentation and in the published telemetry data. I’m sure someone would eventually find those new metrics in the source code, but I think reviewing the data sent to the Caddy Telemetry Collection Service would foster more participation than code review.

Of particular benefit, admins can automate that portion of telemetry collection where they’d otherwise have to download the telemetry digests from your service.

Thanks for the invitation to discuss this new feature.

Whitestrake · April 19, 2018, 1:09am

No, I’m not kidding. Unless your package manager includes Golang with Caddy and compiles it for you on your computer as part of the install process, your distro is providing a pre-compiled binary.

matt · April 19, 2018, 3:31am

First, thanks for your thoughtful reply.

Before 1.0, I wasn’t planning on anything too formal, but I do feel committed to detailing the changes in telemetry metrics with each release. And as for when a metric was introduced, this can be easily inferred by correlating Caddy version with the metric’s existence in the telemetry data; or even simpler, GitHub’s nice git blame view is handy for that kind of stuff.
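The correlation could look something like the sketch below. It assumes each telemetry record carries the reporting Caddy version and the names of the metrics it contains; the `report` type and the metric names are made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// report is a hypothetical telemetry record: the Caddy version that sent it
// and the names of the metrics it contained.
type report struct {
	Version string
	Metrics []string
}

// less reports whether dotted numeric version a sorts before b.
// Non-numeric components are treated as 0.
func less(a, b string) bool {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		if x != y {
			return x < y
		}
	}
	return false
}

// firstSeen maps each metric name to the lowest version that reported it.
func firstSeen(reports []report) map[string]string {
	out := make(map[string]string)
	for _, r := range reports {
		for _, m := range r.Metrics {
			if v, ok := out[m]; !ok || less(r.Version, v) {
				out[m] = r.Version
			}
		}
	}
	return out
}

func main() {
	seen := firstSeen([]report{
		{"0.11.1", []string{"tls_handshakes", "mitm_count"}},
		{"0.11.0", []string{"tls_handshakes"}},
	})
	fmt.Println(seen["tls_handshakes"], seen["mitm_count"]) // 0.11.0 0.11.1
}
```

This only finds the earliest version observed in the data, so it is reliable only if instances of the introducing version actually reported in.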

We’re planning on advanced data export features in the future. Not even so much with transparency in mind, but just for making it easier to process the data you care about. (The less work I have to do, the better!)

abiosoft · April 20, 2018, 6:13am

I do not see anything wrong with opt-out, as long as there is a way for users to disable it. In fact, most software that I am aware of (that does this) is opt-out, because that really is what they want.

My only concern is users not being aware they are sending data but I have reasons to believe Caddy can handle this well. The fact that this discussion is taking place is one of them.

Even Firefox, the privacy first browser, is opt-out.

rugk · April 23, 2018, 6:43pm

Of course. And as I understand you, they (i.e. the distro maintainers/packagers) now have to (can) decide whether to enable telemetry or not.
That’s still not what I’d call an opt-out. Taking your prime example Firefox, you’ll see that they also provide a runtime opt-out and do not (only) offer an opt-out for compiled binaries.

Actually it is not, as you can see above. Users in this context are server admins, I say. But it is planned to make it possible to disable not for users, but for the compilers (i.e. Linux distro or so). That is a fundamental difference.

caddyhello · April 25, 2018, 3:28am

My 2c:

In business, telemetry is considered a potential threat vector. If you choose a default opt-in posture, it must be easily disabled or businesses will look for other compliant server software.

I work with individuals using technology in oppressive regimes. What you’re proposing, if not handled carefully, could literally have people imprisoned or murdered by their governments.

We are just now embarking on a global debate over privacy. Respectfully, to implement default telemetry now … is a slap in the face to many of us.

Whitestrake · April 26, 2018, 2:13am

The server admin is ultimately responsible for installing the software on his server.

Since Caddy isn’t officially packaged for distros (and there’s a number of obstacles to overcome in that regard), the installation process involves retrieving a binary and placing it in the path, or compiling it and doing the same, or trusting that the unofficial distro package satisfies your requirements.

The server admin can retrieve a binary from anywhere they like, including the Caddy build server, and there will be options for that admin to select a binary that is telemetry-disabled.

Regardless of whether you treat your servers like cattle or like pets, I think that the above is a very reasonable situation. It’s not difficult under any circumstance for the user to choose.

While I’m likely not representative of the average server admin with regards to Go usage, I personally consider compiling Caddy to be so simple as to be a negligible step to get exactly what you want. Scriptable, probably, in 10 lines or less; I consider it less effort than even writing most of my Caddyfiles.

So the last major benefit I see to run-time opt-out is that you don’t have to cross your fingers that your distro’s unofficial package opts to disable telemetry. I’m weighing that against the downside stated above, which is a loss of reliability in how representative the metrics are, and I don’t think it’s worth it; I just don’t see the friction here, it just seems too easy to get what you want even without a run-time toggle.

(The above is, of course, only my own opinion.)

@caddyhello: Good points, always important to keep in mind. I’m sad to hear that it’s so much as a slap in the face; certainly I’d love to see a simple list enumerating all the aggregated statistics the telemetry is planned to collect, so that it can be plainly seen and discussed which, if any, of those metrics could be dangerous. I dare say we all agree that we’d prefer to have usable data which is not capable of creating any danger at all.

eva2000 · April 26, 2018, 2:33am

Matt mentioned it here, but is there a link to details on how one can compile custom Caddy binaries with telemetry disabled? I always compile my own Caddy binaries for testing.
Have there been tests to compare performance of Caddy with vs without telemetry?
Will these tests be done every time a new Caddy version is released to ensure there are no performance regressions?

Whitestrake · April 26, 2018, 2:39am

I understand that since progress is still very much underway, there’s no official documentation or such yet, but you’ll want to toggle this enableTelemetry variable in caddy/caddymain/run.go. (It affects this code block which initialises the telemetry.)

github.com/caddyserver/caddy/blob/518edd3cd45fa147a7c5bb3ca5cb717e17a624b2/caddy/caddymain/run.go#L362

    	appVersion = "(untracked dev build)" // inferred at startup
    	devBuild   = true                    // inferred at startup

    	buildDate        string // date -u
    	gitTag           string // git describe --exact-match HEAD 2> /dev/null
    	gitNearestTag    string // git describe --abbrev=0 --tags HEAD
    	gitCommit        string // git rev-parse HEAD
    	gitShortStat     string // git diff-index --shortstat
    	gitFilesModified string // git diff-index --name-only HEAD

    	enableTelemetry = true
    )
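For anyone unfamiliar with the pattern, here is a minimal sketch of how a package-level variable like this gates a feature at build time. Only the variable name comes from the excerpt; `startTelemetry` is a hypothetical stand-in, not the actual initialisation code in run.go.

```go
package main

import "fmt"

// enableTelemetry mirrors the compile-time switch in the excerpt:
// changing the literal to false and rebuilding is what turns telemetry off.
var enableTelemetry = true

// startTelemetry is a hypothetical stand-in for the initialisation code.
// It reports whether anything was started.
func startTelemetry(enabled bool) bool {
	if !enabled {
		return false // nothing is initialised, so nothing is ever sent
	}
	// ... set up the instance ID and the periodic check-in here ...
	return true
}

func main() {
	fmt.Println("telemetry started:", startTelemetry(enableTelemetry))
}
```

Since the variable is a bool rather than a string, the Go linker's `-X` flag (which only sets string variables) can't flip it; as far as I can tell, you edit the source and rebuild.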

Not sure on tests. I imagine it would have (little to) no effect on request rate; it’s really just the check-in and hand-off on an interval where Caddy’s doing much that it doesn’t already do. But there should definitely be tests.

rugk · April 29, 2018, 7:12pm

Depends on what is official, but actually it is: Caddy is packaged on Fedora and CentOS.

So this assumption is wrong. Maybe that’s why you resist making it a runtime configuration.

All the text that follows is based on this assumption, so no: the server admin can use it from a distro. That will never change, and when Caddy gets more popular maybe it will get into even more distros.

Edit: Actually, I used a different site with more information, so it is packaged in these big “distros”:

Alpine Linux
Chocolatey (Windows)
EPEL 7
Fedora
FreeBSD
Gentoo
Homebrew (macOS)

So actually quite a lot…

Whitestrake · April 29, 2018, 11:45pm

It’s not an assumption, rather a fact, and as you note, it’s a matter of the definition of “official”; to clarify, the developers have not officially made the Caddy software available through any package managers at all. Any Caddy package you find, as of this writing, is unofficial, provided directly by distro maintainers or volunteers based on the Apache 2.0 license the source code comes under.

It is something they want to get around to eventually, though (this thread is relevant: Packaging Caddy - #127 by carlwgeorge).

As for the reason why I resist, that’s not a mystery either - the closing statement of my last reply to you succinctly outlines why I don’t see the benefit of run-time configuration worth the downside, but to re-summarize: the benefit is small, and the downside is large.

I would contend with this statement as well; the fact that binaries exist for distribution (even if they were officially provided and maintained) via popular package managers does not preclude, in the slightest, the ease with which one can retrieve a binary from anywhere else (including any other unofficial package).

rugk · April 30, 2018, 7:24pm

That’s the concept of distribution packages. Upstream can (and should) of course help, but basically it is the downstream (the packager for the distro), who packages the software.
That’s why all distro packages are unofficial, but actually you should not care. At least not imply that this is bad. This is just how it works!

You have never explained that. You never gave any stats for that.

And again, you never responded to my argument that no other software (to some extent – even Windows) disallows the user of a software to configure a telemetry setting at runtime.
So you would set quite a bad precedent, in the FLOSS world at least. I doubt you want that.

Hell, yes, but who cares? No, your way of installing Caddy is not “the best”. It just is not! When users want to install it via proven standard ways of installing software on Linux (distro system packages), they can do so and should not have ridiculous disadvantages, such as not having a way to opt-out or opt-in (yes, in case a distro disables telemetry by default a runtime config can actually help you to get more stats!).

If you want to punish users, who do not follow your shiny “I download random binaries from the internet/shell and put them onto my system” install method, then sure, do so, but don’t complain if you get users in rage then.

And there are quite enough advantages of using system distros as it has been discussed in your linked thread already.
So don’t disregard people who deliberately choose to install it via system distros. They want to do so and they want to configure telemetry. And if it is a very hidden setting, they want to have a way…

Whitestrake · May 1, 2018, 12:30am

I hope you can forgive the implication that package managers are bad. It was not my intention.

No, I guess I didn’t. I’ll address it by quoting from earlier in the thread, though:

Without measuring it, we simply don’t know how representative the data we have is (I’m getting dangerously close to tautology here).

My understanding is that it’s difficult to quantify the statistical significance of what’s lost, because by its very nature it becomes an unknown. But you’re right in that nobody’s given any stats in Caddy’s case. I don’t think there are any yet. I’m happy to be contradicted here if anyone’s got more information.

Sorry, I didn’t see where you made that point earlier. To take your example and run with it, Windows lets you configure (to a very limited extent) the level of data-gathering. In case you missed part of the announcement post, on top of the capability to totally opt-out from the outset, Caddy also plans to do this. Here’s the relevant section (emphasis mine):

You will be in control of your telemetry: you may always choose to not participate in it. In fact, the telemetry server has the ability to remotely disable (but NOT enable!) telemetry in Caddy instances at any time if deemed necessary. It can also disable certain metrics if that is needed.

– https://caddyserver.com/blog/caddy-0_11-telemetry#your-controls-and-privacy
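One way to read "disable but NOT enable" is as a one-way latch on the client. The sketch below is only my illustration of that idea; the `directive` and `controls` types, their fields, and the metric names are hypothetical and not taken from Caddy's code.

```go
package main

import (
	"fmt"
	"sync"
)

// directive is a hypothetical server response; the field names are illustrative.
type directive struct {
	Stop           bool     // turn telemetry off entirely
	DisableMetrics []string // turn off individual metrics
}

// controls tracks what an instance is still allowed to send. Its methods
// only ever remove permissions; nothing here can turn telemetry back on.
type controls struct {
	mu       sync.Mutex
	stopped  bool
	disabled map[string]bool
}

// apply merges a directive. A zero-value directive changes nothing, so the
// server cannot re-enable telemetry or a metric by omitting a field.
func (c *controls) apply(d directive) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.stopped = c.stopped || d.Stop
	if c.disabled == nil {
		c.disabled = make(map[string]bool)
	}
	for _, m := range d.DisableMetrics {
		c.disabled[m] = true
	}
}

// allowed reports whether metric m may still be collected.
func (c *controls) allowed(m string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return !c.stopped && !c.disabled[m]
}

func main() {
	var c controls
	c.apply(directive{DisableMetrics: []string{"mitm_count"}})
	fmt.Println(c.allowed("mitm_count"), c.allowed("tls_handshakes")) // false true

	c.apply(directive{Stop: true})
	c.apply(directive{}) // an empty directive does not undo the stop
	fmt.Println(c.allowed("tls_handshakes")) // false
}
```

The design choice is that the state only moves in one direction, so even a compromised or misbehaving telemetry server could at worst switch collection off.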

I didn’t make this assertion, and nor did you, so it would be asinine of me to remind you in turn that your own way of installing Caddy is likewise not the best.

There are many ways to install software. I’m a huge fan of Docker, myself - my home lab runs off a single docker-compose.yml file. Package managers are great too.

But I’m opinionated: I find inflexibility to be a poor quality in a system administrator. And I don’t think the problems you might have with your one source of many possible sources for Caddy compare favourably with the downside of accommodating them. It’s a discussion, though, and the devs are listening to many opinions; mine is just one of them here, yours is equally valid.