In my post about things that might help you write better software, a couple points are controversial. Most of them I think are controversial for uninteresting reasons, but monorepos (putting all your code in one repository) are controversial for interesting reasons.
With monorepos the advice is controversial because monorepos are good at everything you might reasonably think they’re bad at, and multiple repos per project are bad at everything you might reasonably think they’re good at.
First a note: When I am talking about a monorepo I do not mean that you should have one undifferentiated ball o’ stuff where all your projects merge into one. The point is not that you should have a single project, but that one repository can contain multiple distinct projects.
In particular a monorepo should be organised with a number of subdirectories that each look more or less like the root directory of what would have otherwise been its own repo (possibly with additional directories for grouping projects together, though for small to medium sized companies I wouldn’t bother).
The root of your monorepo should have very little in it – A README, some useful scripts for managing workflows maybe, etc. It certainly shouldn’t have a “src” directory or whatever the local equivalent where you put your code is.
But given this distinction, I think almost everyone will benefit from organising their code this way.
For me the single biggest advantages of this sort of organisation style are:
- It is impossible for your code to get out of sync with itself.
- Any change can be considered and reviewed as a single atomic unit.
- Refactoring to modularity becomes cheap.
There are other advantages (and some disadvantages that I’ll get to later), but those are both necessary and sufficient for me: On their own they’re enough to justify the move to a monorepo, and without them I probably wouldn’t be convinced it was worth it.
Lets break them down further:
It is impossible for your code to get out of sync with itself
This is relatively straightforward, but is the precursor to the other benefits.
When you have multiple projects across multiple repos, “Am I using the right version of this code?” is a question you always have to be asking yourself – if you’ve made a change in one repository, is it picked up in a dependent one? If someone else has made changes in two repositories have you updated both of them? etc.
You can partly fix this with tooling, but people mostly don’t and when they do the tooling is rarely perfect, so it remains a constant low grade annoyance and source of wasted time.
With a monorepo this just doesn’t happen. You are always using the right version of the code because it’s right there in your local repo. Everything is kept entirely self-consistent because there is only a single coherent notion of version.
Any change can be considered and reviewed as a single atomic unit
This is essentially a consequence of the “single consistent version” feature, but it’s an important one: If you have a single notion of version, you have a single notion of change.
This is a big deal for reviewing and deploying code. The benefit for deploying is straightforward – you now just have a single notion of version to be put on a server, and you know that version has been tested against itself.
The benefit for reviews is more interesting.
How many times have you seen a pull request that says “This pull request depends on these pull requests in another repo”?
I’ve seen it happen a lot. When you’ve got well-factored libraries and/or services then it’s basically routine to want to add a feature to them at the same time as adding a consumer of that feature.
As well as adding significant overhead to the process by splitting it into multiple steps where it’s logically a single step, I find this makes code review significantly worse: You often end up either having to review something without the context needed to understand it or constantly switching between them.
This can in principle be fixed with better tooling to support multi-repo review, but review tools don’t seem to support that well at the moment, and at that point it really feels like trying to emulate a monorepo on top of a collection of repos just so you can say you don’t have one.
Refactoring to modularity becomes cheap
This is by far the biggest advantage of a monorepo for me, and is the one that I think is the most counter-intuitive to people.
People mostly seem to want multiple repositories for modularity, but multiple repositories actually significantly hurt modularity compared to a monorepo.
There are two major reasons for this:
The first builds on the previous two features of a monorepo: Because multiple repositories add friction, if repositories are the basis of your modularity then every time you want to make things more modular by extracting a library, a sub-project, etc. you are adding significant overhead: Now you have two things to keep in sync. Additionally it’s compounded by the previous problems directly: If I have some code I want to extract from one project into a common library, how on earth do I juggle the versions and reviews for that in a way that isn’t going to mess everyone up? It’s certainly manageable, but it’s enough of a pain that you will be reluctant to do what should be an easy thing.
The second is that by enforcing boundaries between projects across which it is difficult to refactor across you end up building up cruft along the boundary: If project A depends on project B, what will tend to happen if they are in separate repos is that A will build up an implicit “B-compatibility layer” of things that should have happened in B but it was just slightly too much work. This both means that different users of B end up duplicating work, and also often makes it much harder to make changes to B because you’d then need to change all the different weird compatibility layers at once.
In a monorepo this doesn’t happen: If you want to make an improvement to B as part of a change to A, you just do so as part of the same change – there’s no problem (If B has a lot of other dependants there might be a problem if you’re changing rather than adding things, but the build will tell you). Everyone then benefits, and the cruft is minimised.
I’ve seen this born out in practice multiple times, at both large and small scales: Well organised single repositories actually create much more modular code than trying to split it out into multiple repositories would allow for.
Reasons you might still want to have multiple repositories
It has almost always been my experience that multiple repositories are better consolidated, but there are a few exceptions.
The first is when your company’s code is partly but not entirely open source. In this case it is probably useful to have one or more repositories for your open source projects. This doesn’t make the problems of multiple repositories go away mind you, it just means that you can’t currently use the better system!
Similarly, if you’re doing client work where you are assigning copyright to someone else then you should probably keep per client repos separate.
Ideally I’d like to solve this problem with tooling that made it better to mirror directories of a monorepo as individual projects, but I’m not aware of any good systems that do that right now.
The other reason you might want to have multiple repositories is if your codebase is really large. The Linux kernel is about 15 million lines of code and works more or less fine as a single git repository, so that that’s a rough idea of the scale of what “large” means (if you have non text assets checked in you may hit that limit faster, but I’ve not seen that be much of a problem in practice).
This is another thing that should be fixable with tooling. To some extent it’s in the process of being fixed: Various large companies are investing in the ability of git and mercurial to handle larger scales.
If you do not currently have this problem you should not worry about having this problem. It will take you a very long time to get there, and if the tools haven’t improved sufficiently by the time you do then you can almost certainly afford to pay someone to customise them to your use case like the current generation of large companies are doing.
The final thing that might cause you to stick with multiple repos is the tooling you’ve built around your current multi repo set up. In the long run I think you’d still benefit from a monorepo, but if all your deploys, CI, etc. have multi repo assumptions baked into them then it can significantly increase the cost of migrating.
What to do now
Embrace the monorepo. The monorepo is your friend and wants you to be happy.
More seriously, the nice thing about this observation is that it doesn’t require you to go all in to get the benefits. All of the above benefits of having one repository also extend to having fewer repositories. So start reducing your repository count.
The first and easiest thing to do is just stop creating new repositories. Either create a new monorepo or designate some existing large project as the new monorepo by moving its current contents into a subdirectory. All new things that you’d previously have created a repository for now go as new directories in there.
Now move existing projects into a subdirectory as and when is convenient. e.g. before starting a large chunk of work that touches multiple repositories. Supposedly if you’re using git you can do this in a way that preserves their history, though when I’ve done it in the past I have typically just thrown away the history (or rather, kept the old repository around as read only for when I wanted to consult the history).
This may require some modifications to your deployment and build scripts to tie everything together, but it should be minor and most of the difficulty will come the first time you do it.
And you should feel the benefit almost immediately. Whenever I’ve done this it’s felt like an absolute breath of fresh air, and has immediately made me happier.