I have the opportunity to work on Caddy for academic credit until April. The project involves implementing telemetry in Caddy so that we can gain a better understanding not only of how Caddy is being used, but also of the Internet at large. The plan is to enable telemetry by default on all Caddy instances (with an option to turn it off, of course) so that the public can inspect the behavior of Web clients and observe the health of the Internet in near-real-time.
Work has already begun, and I am inviting any and all interested researchers or developers to participate! We need feedback regarding which measurements to collect and which technologies to use, as well as actual implementation assistance. If you would like to contribute to this effort in any way, please introduce yourself and your interests, and I will invite you to our Slack channel.
A few of us have already drafted a list of almost 100 questions to answer with a data set constructed from web servers operating all over the world, on different networks, and serving a wide variety of clients.
Some sample questions related to Caddy itself:
- How many instances are running?
- What countries are the instances in?
- What platform are they running on? OS/arch/CPUs, etc.
- Which Caddy versions are being used?
- Is Caddy being used locally or in production?
A few sample questions about TLS and Internet security:
- What is being advertised in TLS ClientHellos?
- What is being advertised by Caddy’s ServerHellos?
- What ends up being negotiated for a TLS connection?
- Why do TLS connections fail? What alerts are raised?
- How are certificates being managed by Caddy?
- Which clients fail to adhere to HSTS?
- How long is the average certificate chain?
- Which HTTPS connections are being MITM’ed? (Which client is being advertised vs. which characteristics does the client actually exhibit?)
- How often is SNI being used?
- Which clients are behaving abnormally? Which botnets might be rising? Who is being DDoS’ed, and who is performing a DDoS?
A sample of other questions:
- What HTTP errors are most common?
- What errors is Caddy encountering internally?
- How many connections use IPv6 vs. IPv4?
- What is the latency in sending responses to clients?
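The IPv6 vs. IPv4 question, for instance, could be answered with a simple tally per address family. Below is a minimal sketch, assuming the server sees remote addresses in the usual `host:port` form (as in `http.Request.RemoteAddr`). The `ipFamily` helper is a hypothetical name for this example, and counting IPv4-mapped IPv6 addresses as IPv4 is a design choice that is open to debate.

```go
package main

import (
	"fmt"
	"net/netip"
)

// ipFamily classifies a "host:port" remote address as "ipv4", "ipv6",
// or "unknown" if it cannot be parsed. IPv4-mapped IPv6 addresses
// (::ffff:a.b.c.d) are counted as IPv4.
func ipFamily(remoteAddr string) string {
	ap, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return "unknown"
	}
	if ap.Addr().Unmap().Is4() {
		return "ipv4"
	}
	return "ipv6"
}

func main() {
	// Documentation-range sample addresses, not real clients.
	samples := []string{
		"192.0.2.10:443",
		"[2001:db8::1]:443",
		"[::ffff:192.0.2.10]:443",
	}
	counts := map[string]int{}
	for _, addr := range samples {
		counts[ipFamily(addr)]++
	}
	fmt.Println(counts)
}
```

Only the aggregate counts would need to leave the server; the addresses themselves can be discarded right after classification.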
These are just a few of the many, many questions we would like to be able to answer. The full list lives in Google Docs documents shared in our Slack channel, which participants can edit.
And yes, of course privacy, transparency, and applicable laws are all going to be a huge part of our discussions – at the proper time. The first stages are to determine at a high level which measurements to collect and to choose the technologies that will produce, store, and publish the data set.
If you have any questions or would like to participate, reply here and introduce yourself! We’d love to have you on board!
(Keep in mind that extended discussion about telemetry should stay in our Slack channel for now. We don’t want to lose track of anything.)