Developers supported by the Services Operations team within the freshly reorganized Services Efficiency arm of the Web Platform org are going to be hearing a lot from us about CI and CD this year, so I'd like to take a minute to share some lessons from an initiative that recently improved CI and CD for Firefox and other projects.

CI (continuous integration) means testing your code automatically every time you make a meaningful change. CD (continuous deployment) means automatically deploying code at appropriate times instead of waiting for someone from ops to do it.

CI and CD are a force multiplier. Every test and deployment that we automate is one less chore that someone has to do by hand.

Years ago, several teams within Mozilla realized that the web didn't yet have a task execution platform that was open and extensible enough to meet their build and testing needs. Taskcluster is the current iteration of their work to fix that.

At this time last year, there was only one Taskcluster deployment at Mozilla, and it hosted a mix of experimental projects along with critical Firefox tasks. The developers were responsible for all its operations, and that production instance was also a place to test changes to Taskcluster itself to see if they'd work.

But making devs run their services by themselves is not great. The critical Firefox services had very different needs from some of the cutting-edge research projects with which they were trying to share a cluster. Also, the developers wanted to make sure anyone in the world who wants Taskcluster's features can run the software themselves.

Fortunately, we have a team whose purpose is keeping essential Mozilla services up and running efficiently! That's us, in Services Operations.
I had worked with Taskcluster before, so with help from my colleague Brian Pitts, I took on the task of migrating Taskcluster onto our Services Ops standard infrastructure and into ongoing Services Ops ownership.

Along with plenty of technical takeaways from the migration, we learned some lessons about working together across the org that anyone can apply.

At first it seemed simple: hand things off from the Taskcluster team to Services Ops. But the process also ended up needing extensive input from relops and releng, and it affected the owners of basically every piece of code that runs to build and test Firefox.

Every Wednesday afternoon we had a call to chat about how things were going. It was great to be able to simply invite newfound stakeholders to join us at a known time, instead of adding new meetings for every new component that got involved.

The move helped us practice good technical communication, too. The developers helped the move succeed by updating Taskcluster to work with our Dockerflow standards, and in turn ops guided them toward our best practices by being clear about exactly what architecture, logging, and security standards we needed them to follow.

The new staging deployment helped us find and fix a variety of issues before they ever got to production. And for several issues that did make it to production, we realized in retrospect that we could have found them in staging if we'd used it more rigorously.

Since November 2019, we have had three Taskcluster deployments within Mozilla.

In Stage, which updates automatically on each Taskcluster release, we test Taskcluster changes in a setup that looks just like production.

"Production" is the FirefoxCI cluster, where we build and test mission-critical Firefox components.
It gets a commensurate degree of security scrutiny for the cluster's services and workers. If you have questions about this cluster, visit #firefox-ci on Slack.

And there's also the Community cluster, for the menagerie of projects that aren't on the critical path for Firefox but still deserve CI support. Community has the flexibility to be less strict about its processes and to try out the cutting-edge features that its projects often prefer.

So, that's how Services Efficiency stepped up to the challenge of hosting CI tooling that meets Firefox's needs. If you want fully featured and extensible CI for Mozilla projects, talk to the Services Ops team about Taskcluster.