I've started work on a fork of go-acme/lego which will eventually be used by CertMagic.
# History
Originally -- back in early 2015, _well before_ Let's Encrypt's public beta -- lego was developed by @xenolf, conceived for use in Caddy as a Let's Encrypt client (hence the name lego, "Let's Encrypt for Go"). It was the first ACME client written in Go, and Caddy was the first public/generalized consumer of Let's Encrypt certificates using Go. Eventually, xenolf stepped down to focus on other things, and lead maintainership fell to Ludovic Fernandez, a Traefik engineer at Containous, who has done excellent work keeping lego active and useful, including a complete rewrite to better accommodate ACMEv2, which is actually the first formally standardized version of the ACME spec. Thank you, Ludovic, for all your hard work, especially on many long nights and weekends!
Eventually, Caddy's certificate management logic was surgically extracted into a new library, [CertMagic](https://github.com/caddyserver/certmagic), which is one of the primary consumers of lego (and Caddy is the primary consumer of CertMagic). CertMagic has seen adoption beyond just Caddy, too, used in who knows how many Go projects around the globe.
You could say that Caddy and CertMagic have been through a lot together. Through building, deploying, and working with users on these projects, I've seen all sorts of successes and failures, bugs and edge cases, enterprise requirements and various developer preferences and habits. CertMagic's design has evolved to be more flexible and accommodate as many scenarios as possible, incorporating the things we've learned along the way. And together with multiple organizations, we've written [a document of ACME best practices](https://github.com/https-dev/docs/blob/master/acme-ops.md).
Through these experiences, I would be so bold as to suggest that CertMagic is perhaps one of the most well-vetted, production-tested, self-contained certificate management libraries out there (even among those not written in Go). It is designed to scale to tens of thousands of certificates (or orders of magnitude more, if your machine has enough memory, and if you have enough time to operate within CA rate limits at that scale). It can even obtain certificates dynamically at handshake-time, or coordinate certificate management across a cluster if so configured! I am not aware of other libraries or tools which are designed to scale to these levels or accommodate as many use cases... it is not easy, for sure.
# Changes
Now that the ACME spec has mostly settled down, I'd like to say that CertMagic is almost done. But there are still quite a few things that need improvement, which are difficult without some changes in lego... and while lego is excellent in many ways, it crumbles at scale. We've worked around a few of the problems in Caddy/CertMagic, but it's far from ideal. The changes I want in lego may be considered significant... lego is on major version 3, and making a major version 4 is not the problem. It's simply that these changes require a fundamentally different philosophy and vision.
Our opinions or priorities diverge on matters of:
- [Revamping logging](https://github.com/go-acme/lego/issues/969)
- [Less opaque error handling from ACME transactions](https://github.com/go-acme/lego/issues/793)
- [Moving DNS providers into a separate repo to be more lightweight and portable](https://github.com/go-acme/lego/issues/992) (although, on second thought, maybe not as separate Go modules)
- [Tweaking DNS provider interface/types](https://github.com/go-acme/lego/pull/1058)
- [Internal, configurable throttling](https://github.com/go-acme/lego/issues/976)
- Client reuse or caching (can't find a link right now)
- [Context support (for cancellation)](https://github.com/go-acme/lego/issues/970)
- [Proper, randomized challenge selection and retries](https://github.com/caddyserver/certmagic/issues/34#issuecomment-477828872)
- [Env variables and config loading](https://github.com/go-acme/lego/issues/1054)
Understandably, most of these differences of opinion are goal-oriented. These are pain points for Caddy/CertMagic, but maybe not necessarily for Traefik or the other consumers of the lego lib: our projects all have different goals, priorities, and customers. Even if we all agree that logging or error handling could be improved, our priorities and visions may differ. I want to be sure we accommodate, at the very least, the enterprise-scale use cases and failure scenarios that our own customers have experienced. To that end, lego's current design does not accommodate all of Caddy's/CertMagic's requirements, necessitating major changes that would be a burden to its current maintainers, who volunteer their spare time to work on it. I just can't ask more of them after all their hard work.
To avoid added friction to the lego project and to reduce burden on lego maintainers, I will make a derivation of the lego project in a separate repository and have CertMagic use my derivation of lego.
# Vision
With the changes I hope to implement more freely in a fork, I expect that:
- CertMagic will better handle large-scale deployments, especially in error situations.
- CertMagic's code will slim down in some areas. Several work-arounds exist in CertMagic (such as using all enabled challenge types) that would ideally be fixed in the underlying library instead. These will be fixed in the refactored design of the lego fork.
- Overall code bloat will decrease as well, since we don't need a command (CertMagic's CLI is Caddy).
- Build dependencies will be dramatically reduced, with a go.sum file about 6% of the current size. This makes things much easier and more lightweight for development, and significantly lowers error possibilities at build-time.
- We can reduce load on ACME CAs with internal throttling of ACME transactions, which is not currently exposed for us to control.
- Performance (speed) of ACME operations will increase by properly managing and reusing state.
- Configuration will be more flexible all-around without needing to rely on global state.
- Caddy will be more efficient with frequent, large config reloads on busy servers thanks to proper cancellation of long-running validations.
- It will be easier and more flexible to configure more DNS providers for the DNS challenge.
- Errors will be less opaque, so they can be more intelligently handled, and occasions (and durations) of being rate-limited by Let's Encrypt will drop dramatically.
And these are just the foreseeable benefits.
Caddy's certificate management -- via CertMagic -- is already the gold standard. I'm hoping to raise the bar and keep it the best in a world that relies more and more on TLS. Today in 2020, Caddy is still the only web server to use HTTPS automatically and by default. Back in 2015, I thought that would be a short-lived novelty, maybe a cute feature, but 5 years later, it's still unique and magical. In a way, that's kind of disappointing. But we should strive to make privacy the default on the majority of sites, and if that means making Caddy (or CertMagic) the majority web server, so be it.
# FAQ
## What does this mean for Caddy users?
Except for the potential benefits described above, almost nothing. Caddy doesn't really expose much of the interop with go-acme/lego directly.
## What does this mean for CertMagic users?
Except for the potential benefits described above, probably some API changes if you are using CertMagic programmatically. CertMagic hasn't tagged 1.0 yet, so pay attention to breaking changes in the minor releases (v0.X) until then.
## What does this mean for lego users?
If you use lego's CLI, you can keep using lego. (Although, we recommend using a long-running server in the long term instead of cron jobs; services like Caddy or CertMagic have more robust error handling!) If you use lego as a library, you can keep using lego. In time, I believe the two code bases will diverge enough that you won't recognize that one is a fork of the other.
If desired, I can contribute back any requested changes if they're compatible with lego, subject to time and funding constraints.
## Which lib should I use?
Between lego and my fork? Probably neither, to be honest. Unless you're writing certificate management libraries yourself, most people should use CertMagic instead because one-off certificate operations aren't very useful. Certificates need to be renewed, OCSP needs careful management, etc. For most Go programs in general, CertMagic is a better fit.
If you happen to be writing certificate management software and can't use or contribute to CertMagic for some reason, then that choice is up to you. Lego is probably going to be more stable in the near future, but I suspect that our derived library will have significant novel improvements as well.
## Where will the forked library be hosted?
Its temporary -- possibly permanent -- home is [mholt/acmez](https://github.com/mholt/acmez), which I will be working on as time and funding permit.
## What can the community do to help?
You can [sponsor](https://github.com/sponsors/mholt) my work, and once I get the library to a point where it can be used, test the heck out of it. :) Also feel free to participate in discussions that arise in the process! Thank you for any help!