Pros and Cons of Keeping Your Code in a Monorepo
A monorepo is an approach to organizing the code of many projects inside one repository. It can go as far as keeping all the code maintained by a company inside a centralized repository. Many big companies use the monorepo strategy: Google, Meta, Microsoft, etc.
I will do my best to show you the pros and cons of monorepos, but I have a confession to make: I like monorepos. A few years ago I was creating a separate repository for each application, but eventually I got tired of the overhead this generated. Currently, I’m in the process of migrating all those projects into one big repository.
What is a monorepo
A monorepo is a single repository that hosts multiple projects. It’s different from multi-repos (or poly-repos), where each project has its own repository. It’s also different from a monolithic application, because a monorepo contains several separate applications.
So, a simple example would be a monorepo that contains:
- the frontend code of an application,
- the backend code of the same application,
- the source code that generates the static website of the company, and
- some other projects maintained by the company.
All that is contained in a monorepo, no matter which programming languages are used where: so you could have JS on the frontend, Python on the backend, and PHP for the website.
Advantages of having separate projects
No matter how we organize our repositories, well-defined projects help with development and maintenance. For one thing, backend and frontend development are often done by separate teams in different technologies. The client/server split makes for a clear boundary between those parts, and with a bit of documentation, you can have a clearly defined relationship between them.
The same approach can be used for development that is done on the same end. By having multiple projects, you can use one technology stack for one application and use another one for another application. Separate projects provide more flexibility for your team. This gets more important as the team gets bigger and the solution you build becomes more complex.
Issues with multiple repositories
So, we see reasons to leave monolithic applications behind, but what’s the problem with having different repositories for different projects? After all, we can include our own projects as dependencies wherever we need them and use the same workflow we use for third-party libraries.
For me, the problems start when we pretend that the backend is a third-party application to the frontend. It’s not: those two are usually developed in parallel. When I use a third-party library, I
- specify a version I’m happy with,
- update sporadically—usually when I’m forced by some critical issues or incompatibilities—and
- expect no collaboration from the library maintainer—I don’t hope Webpack will delay their release, so I can resolve some issues in my build.
In the case of the frontend–backend relationship, all three points are different:
- I use the backend version that is deployed, usually the most recent one.
- I have no direct control over what is deployed and when.
- I expect my backend colleagues to wait with their deployment if it requires frontend changes.
So if the needs are so different, the workflow should reflect that as well.
Atomic ⚛️ commits ✅
Let’s start with the main advantage of monorepo—you can make changes across all parts of the applications in one commit. Imagine you rename a field on a data model. This simple change will require many changes in the codebase:
- Backend code: every place where the name is used in the backend has to be updated, including the documentation, the data model in code, and the code for fields that are dynamically generated. You may even need to add some data migration logic as well.
- Frontend code: the field is most likely read or written by some frontend code, so you will need updates to the frontend logic.
- Test data for e2e, data seeding for the development environment, etc.
In poly-repos, you’ll see many repositories affected by this change, with commits that should be developed, merged and deployed in parallel. It’s a lot of manual work and mental overhead, even when everything goes smoothly. If you need to revert the changes, things get even uglier. Monorepos allow you to create one atomic commit that contains all changes, and to merge (or revert) it when needed.
With atomic commits spanning across many projects, it’s easy to have integration tests that truly check everything together. In an ideal setup, you would have
- unit tests checking each application in isolation—the same as in a poly-repo,
- an end-to-end test that checks whether the backend and frontend versions in the branch work together as expected.
I had been trying to achieve something like this in a poly-repo, and it was never easy. This and the atomic commits are the main reasons why I decided that monorepos are the way to go with code development.
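The kind of cross-project check described above can be sketched in a few lines. This is only a small stand-in for a real end-to-end test, and everything in it is an assumption made for illustration: the field names, the seed data, and the helper `find_field_mismatches` are hypothetical. It shows why having all parts in one commit helps: the backend model, the frontend usage, and the seed data can be compared at the same revision.

```python
def find_field_mismatches(backend_fields, frontend_fields, seed_records):
    """Return human-readable problems; an empty list means the parts agree."""
    backend_fields = set(backend_fields)
    problems = []
    # Fields the frontend reads or writes must exist on the backend model.
    for name in sorted(set(frontend_fields) - backend_fields):
        problems.append(f"frontend uses unknown field: {name}")
    # Seed and test records must only contain fields the backend knows.
    for index, record in enumerate(seed_records):
        for name in sorted(set(record) - backend_fields):
            problems.append(f"seed record {index} has unknown field: {name}")
    return problems


# Hypothetical state of one atomic commit: every part already uses the new name.
backend = {"id", "full_name", "email"}
frontend = {"id", "full_name"}
seeds = [{"id": 1, "full_name": "Ada", "email": "ada@example.com"}]

print(find_field_mismatches(backend, frontend, seeds))  # prints []
```

If only the backend were renamed (say, to `name`) while the frontend and seed data stayed on `full_name`, the same call would report both leftovers, which is the situation a poly-repo setup makes hard to catch in a single branch.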
So, after going through the main reason in favor of monorepos, let’s take a look at the downsides. The biggest one is complexity. A poly-repo allows you to pretend that each of your projects is an independent, standalone thing, so you can tackle things in more bite-size chunks. Let’s see what gets complicated as you move many projects into one repo.
The biggest thing that gets complicated is your continuous integration (CI). By moving projects, you introduce a trade-off:
- you either build (and test, etc.) everything in each CI run—which can get slow and resource intensive as you put more applications in the monorepo and as the applications grow—or
- you try to optimize the CI to guess what CI jobs should be run based on what code was changed.
With the second option, you save time and computational resources, but you introduce a risk that some changes will not be tested even though they should be. To address this issue, my solution is to run
- limited CI for the merge requests, checking only the things that are likely affected, and
- a complete CI job for the main branch.
This way, even if my optimization lets a regression slip through a merge request unnoticed, the main branch will start failing, and I’ll be able to resolve the issue.
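The two-level strategy above can be sketched as a small selection function. This is a minimal sketch under stated assumptions, not a real CI configuration: the top-level folder names in `PROJECTS`, the function `select_ci_jobs`, and the rule that a change outside any project folder triggers everything are all hypothetical choices made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical layout: each top-level folder holds one project.
PROJECTS = {"frontend", "backend", "website"}


def select_ci_jobs(changed_files, branch, main_branch="main"):
    """Return the set of projects whose CI jobs should run."""
    # Main branch: always run everything, so missed regressions surface there.
    if branch == main_branch:
        return set(PROJECTS)

    selected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if not parts:
            continue  # ignore empty paths
        if parts[0] in PROJECTS:
            selected.add(parts[0])
        else:
            # Assumption: a file outside any project (shared config, root
            # tooling) may affect all projects, so fall back to a full run.
            return set(PROJECTS)
    return selected


print(sorted(select_ci_jobs(["frontend/src/app.js"], "feature/rename")))
# prints ['frontend']
```

The fallback for files outside the known folders is deliberately conservative: it costs extra CI time, but it narrows the window in which the path-based guess can skip a job that should have run.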
Optimizing the CI for such a scenario is not an easy task. For example, here you have a simple setup for CI for a monorepo in GitLab. As you can see, it’s much more involved than the CI configuration of a single project.
A quiet advantage of using one repository for a project is that you can use this repository as an artifact. So, for example, in your Node.js package, you could import your library directly from a remote Git repository by installing the dependency with something similar to:
$ npm i git+https://github.com/amcharts/amcharts3.git#3.18.3

added 1 package, and audited 2 packages in 9s

found 0 vulnerabilities
Anywhere else you need your code in a specific version, you could do something similar, effectively using your Git repository as both a code repository and an artifact repository.
As you move your projects inside the monorepo, you will need to replace this way of sharing code. Otherwise, you would be downloading the whole monorepo to use only a small library that is inside. You will need some package or artifact repository. For Node.js libraries, you could use npm, which can host public or private packages. Or, if you use GitLab, they provide a package registry that you can use to publish packages. This registry can be used with npm or one of the other dependency managers.
For reusing code inside the monorepo, you could use the same approach, making sure that during the CI build the packages are published before you try to use them. Alternatively, you can use direct imports between applications: the relative paths between projects are tracked by Git, so it should work smoothly on different machines.
By housing more projects in one repository, more things will be happening there. No matter your Git workflow, each developer will have more remote changes to deal with—with rebases or with merges. Even though you shouldn’t be afraid of rebases, this can add a bit of overhead to your development—especially with bigger or more productive teams.
Single source code ✅
Another advantage to the monorepo approach is creating an obvious repository where almost any code belongs. Instead of defining many projects/repositories, each in its own location, you have one repository where you put all code in different folders. The question of which repository should host the code is replaced by what folder—you can use the same rules to decide, but the stakes are much lower due to a few factors.
Integrated git grep
git grep is a useful command to search through your repository. By default, it searches inside your current folder, but you can easily run it at the topmost folder of your repository to search across all projects. You could simulate something similar by cloning all related projects from different repos next to each other, but the advantage of the monorepo is that you don’t need to keep track of which projects are being added or removed. Everything is in the repository, whether you pay attention to a given project or not.
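As a rough sketch of the repository-wide search described above, the helper below always runs git grep from the top of the working tree, no matter which project folder you are in. The function names `build_git_grep_command` and `search_monorepo` and the example folder names are assumptions made for illustration; the Git commands themselves (`git rev-parse --show-toplevel` and `git grep -n -e`) are standard.

```python
import subprocess


def build_git_grep_command(pattern, project_dirs=()):
    """Build a `git grep` command, optionally limited to some folders."""
    command = ["git", "grep", "-n", "-e", pattern]
    if project_dirs:
        # Everything after `--` is treated as a path, not as an option.
        command += ["--", *project_dirs]
    return command


def search_monorepo(pattern, project_dirs=()):
    """Search the whole repository, whatever the current folder is."""
    # `git rev-parse --show-toplevel` prints the root of the working tree.
    repo_root = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    result = subprocess.run(
        build_git_grep_command(pattern, project_dirs),
        cwd=repo_root, capture_output=True, text=True,
    )
    # git grep exits with 1 when nothing matches; that is not an error here.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

Calling `search_monorepo("full_name")` inside a Git working tree would list matching lines from every project, while `search_monorepo("full_name", ["frontend", "backend"])` would narrow the search to two hypothetical project folders.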
Obvious places to put documentation
With multi-repo projects, it’s never clear where the relationship between different repositories should be documented. Should it be a frontend README that describes the relationship with the backend, the backend’s one, or some third place? We can even consider a company-wide wiki that everybody will forget about in two months. With the monorepo, there is a folder that contains every part of your solution, and it’s an obvious place to put documentation that spans multiple projects.