A software company like Google maintains a lot of code. Like, seriously, a lot. Like over 2 billion lines of code and 100 terabytes of data to go along with it.
It’s a scale that’s hard even to comprehend, like all the stars in the universe or all the universes in the multiverse. And you have thousands of engineers around the world working on it simultaneously. But get this: they store all their code in a single monolithic repository, and they’ve been doing it since the very beginning. Today, Google’s monorepo is likely the largest code base in the world, but it takes an extraordinary effort to scale. They have their own homegrown version control system and a highly advanced build tool called Bazel, which goes by the much cooler name of Blaze internally.
In this article, we’ll learn everything we ever wanted to know about monorepos and how a humble JavaScript developer can build a high-performance monorepo in his garage. So let’s dive in!
You may have heard the exciting news that Vercel acquired a company called Turborepo. It’s a build tool written in Go that makes it really easy to manage multiple apps and packages in a single Git repository.
But first, let’s answer the question of why you would want to use a monorepo. There are many reasons, but at the highest level, it gives you visibility of your company’s entire code base without the need to track down and clone a bunch of different repos.
In addition, it provides consistency because you can share things like your ESLint config, a library of web components for your design system, utility libraries, documentation, and so on.
The real power, though, comes in the form of dependency management. Imagine somebody makes a breaking change to a shared library. All affected applications will know instantly, and monorepo tools can actually help you visualize the entire dependency graph of your software. When it comes to third-party dependencies, a monorepo can dedupe packages that are used in multiple apps.
A monorepo is also ideal for continuous integration and automation because your code is already unified by default, making it much easier to build and test everything together.
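To make the dependency graph idea concrete, here is a minimal sketch of how a tool could work out which packages are affected when a shared library changes. The package names and the `affectedBy` helper are hypothetical and exist only for illustration; real monorepo tools derive the graph from each package's manifest.

```typescript
// Hypothetical internal packages; each entry lists the packages it depends on.
type DependencyGraph = Record<string, string[]>;

const graph: DependencyGraph = {
  "ui-components": [],
  "utils": [],
  "web-app": ["ui-components", "utils"],
  "admin-app": ["ui-components"],
  "docs": [],
};

// Invert the graph so we can ask "who depends on this package?"
function invert(deps: DependencyGraph): Map<string, string[]> {
  const dependents = new Map<string, string[]>();
  for (const name of Object.keys(deps)) {
    dependents.set(name, []);
  }
  for (const name of Object.keys(deps)) {
    for (const dep of deps[name]) {
      const list = dependents.get(dep);
      if (list) list.push(name);
    }
  }
  return dependents;
}

// Breadth-first walk over dependents: the changed package plus
// everything that depends on it, directly or transitively.
function affectedBy(changed: string, deps: DependencyGraph): string[] {
  const dependents = invert(deps);
  const seen = new Set<string>([changed]);
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of dependents.get(current) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return Array.from(seen).sort();
}

console.log(affectedBy("ui-components", graph));
// → [ 'admin-app', 'ui-components', 'web-app' ]
```

A change to `docs` in this sketch affects only `docs` itself, which is exactly the information a build tool needs to skip unrelated work.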
But, there is one big problem with monorepos, and that’s the fact that they’re big. As the monorepo becomes larger, there are more things to test and more things to build and a lot more artifacts to store.
As a result, VS Code will lag trying to process the massive Git history, and you’ll need a 20-minute smoke break waiting for everything to run on the CI server after every commit. To operate a monorepo at scale, it’s absolutely essential to have the right tooling. That’s why Facebook created Buck, Microsoft created Rush, and Google created Bazel. You just need a PhD in order to use it.
Luckily, there are other options out there. The most basic approach is to use your package manager, like Yarn or npm, to define workspaces. These tools basically configure your project with a root-level package.json, which has nested workspaces like apps and packages that are linked back to the root-level project.
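As a rough sketch of what that root-level manifest might contain, the object below mirrors a workspace-enabled package.json. It is written as a TypeScript object only so the shape can be type-checked; on disk it would be plain JSON. The project name, the `build:all` script name, and the `isWorkspaceDir` helper are assumptions made for illustration, and the helper handles only simple `dir/*` patterns rather than the full glob syntax that package managers accept.

```typescript
// Only the fields used in this sketch; real manifests have many more.
interface RootManifest {
  name: string;
  private: boolean;
  workspaces: string[];
  scripts: Record<string, string>;
}

// Hypothetical root manifest for a repo with apps/ and packages/ folders.
const rootManifest: RootManifest = {
  name: "my-monorepo",
  private: true, // the root itself is not meant to be published
  workspaces: ["apps/*", "packages/*"],
  scripts: {
    // npm 7+ can run a script in every workspace that defines it
    "build:all": "npm run build --workspaces --if-present",
  },
};

// Simplified matcher: true when `dir` is a direct child of a "parent/*" pattern.
function isWorkspaceDir(dir: string, patterns: string[]): boolean {
  return patterns.some((pattern) => {
    if (!pattern.endsWith("/*")) {
      return pattern === dir;
    }
    const parent = pattern.slice(0, -2);
    if (!dir.startsWith(parent + "/")) {
      return false;
    }
    const rest = dir.slice(parent.length + 1);
    return rest.length > 0 && rest.indexOf("/") === -1;
  });
}

console.log(JSON.stringify(rootManifest, null, 2));
console.log(isWorkspaceDir("apps/web", rootManifest.workspaces)); // true
console.log(isWorkspaceDir("tools/scripts", rootManifest.workspaces)); // false
```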
A cool thing about this is that it will dedupe your node modules, which means if you have the same package installed in multiple apps, it will only be installed once. It also allows you to orchestrate scripts.
For example, you can build or test all of your apps at the same time. That’s a good, easy place to start. But if you’re building an open source project that publishes a bunch of different packages, then you’ll likely want to look into a tool called Lerna. Lerna is a tool that can optimize the workflow of a multi-package repo. Here’s an example.
Turf.js is a geospatial library that has a ton of different packages that are essentially helper functions for working with geolocation data. Each one of them can be installed as its own package and lives in its own subdirectory in the repo.
Lerna is a tool that helps manage this workflow efficiently. Most importantly, it allows you to publish all of your packages to npm with a single command.
These tools are great at configuring monorepos, but they still suffer from the same problem mentioned earlier in the article. The tools become really slow and difficult to work with as monorepos grow larger.
One problem is the installation of dependencies. If you’re looking to improve your install speed, an easy optimization to make is to replace npm with pnpm. It’s a drop-in replacement that will install your dependencies globally and symlink them. That can make your install speeds up to three times faster.
That’s a nice upgrade, but what really makes a monorepo slow is the constant need to recompile, rebuild, and retest everything.
Here it becomes important to discuss an entirely different class of tools that can make your monorepo operate at the speed of Google.
The tools being compared here are Nx and Turborepo. They both operate as a smart build system. This means that they create a dependency tree between all your apps and packages, which allows the tooling to understand what needs to be tested and what needs to be rebuilt.
When there’s a change to the code base, they reuse cached files or artifacts that have already been built, so only what actually changed gets rebuilt. They can also run jobs in parallel to execute everything much faster.
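The caching idea can be sketched in a few lines: hash a task's inputs, and if that hash has been seen before, return the stored output instead of running the task again. This is an illustrative model, not how Nx or Turborepo are implemented. The `Task` and `BuildCache` names are hypothetical, the FNV-1a hash is a stand-in chosen to keep the example dependency-free (a real tool would use a cryptographic hash), and parallel execution is left out.

```typescript
// FNV-1a: a small non-cryptographic hash, used here only as a stand-in.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// A task is identified by its name plus the contents of its input files.
interface Task {
  name: string;
  inputs: Record<string, string>; // file path -> file contents
  run: () => string; // produces the build artifact
}

class BuildCache {
  private store = new Map<string, string>();
  hits = 0;
  misses = 0;

  // Build a cache key from the task name and its inputs in a stable order.
  private keyFor(task: Task): string {
    const serialized = Object.keys(task.inputs)
      .sort()
      .map((path) => path + "\u0000" + task.inputs[path])
      .join("\n");
    return task.name + ":" + fnv1a(serialized);
  }

  // Run the task only if no artifact exists for these exact inputs.
  execute(task: Task): string {
    const key = this.keyFor(task);
    const cached = this.store.get(key);
    if (cached !== undefined) {
      this.hits++;
      return cached;
    }
    this.misses++;
    const output = task.run();
    this.store.set(key, output);
    return output;
  }
}

let buildCount = 0;
const makeTask = (source: string): Task => ({
  name: "ui-components#build",
  inputs: { "src/button.ts": source },
  run: () => {
    buildCount++;
    return `bundle(${source.length} chars)`;
  },
});

const cache = new BuildCache();
cache.execute(makeTask("export const a = 1;")); // first run: builds
cache.execute(makeTask("export const a = 1;")); // same inputs: served from cache
cache.execute(makeTask("export const a = 2;")); // inputs changed: builds again
console.log({ buildCount, hits: cache.hits, misses: cache.misses });
```

Keying on the content of the inputs rather than on timestamps is what lets a cached artifact be reused safely across branches and machines.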
Let’s take a look at the differences between Nx and Turborepo. Nx has been around for about five years and was created by two ex-Googlers. Turborepo, on the other hand, was only just open sourced. It was created by Jared Palmer, who you might know from the React ecosystem with packages like Formik.
At this point, Turborepo is a lot more minimal than Nx. Nx can do everything that Turborepo does and has features beyond it, like a CLI that can automatically generate boilerplate code for you.