Review of a book that reviews (code) reviewing

In an earlier version of my recent software development advice post, I said the following:

I also think this will not significantly reduce the number of bugs in your software. Code review is just not a cost effective bug finding tool compared to almost anything else.

I appear to be wrong in this claim, if the literature is to be believed. I don’t have a strong reason to doubt the literature, and on examination I’m not actually certain why I believed the contrary (it appears to be cached knowledge), so I’m going to change my opinion on this: code review is probably a very cost-effective way of finding bugs if you do it well, and may be reasonable at it even if you do it badly.

The thing that’s updated my opinion on the subject is reading “Best Kept Secrets of Peer Code Review”, after Richard Barrell sent me a link to a chapter from the middle of it that reviews the literature.

With a couple of caveats (the terrible title definitely among them), the book is pretty good. There are two chapters worth skipping, but that’s more a function of the book being from 2006 than anything else (GitHub’s pull requests, while far from a stellar code review system, are in all ways better than what was widely available back in 2006, and for the common path probably not much worse than the state of the art).

The authors clearly have an agenda and the book is there to promote it, but that’s OK. I have an agenda too – it’s very hard to have expertise without also having an agenda in support of that expertise. And even if I believe they’re overstating the case, the case they present is reasonably convincing. It’s also refreshingly empirical – most software engineering advice is full of anecdotes and opinions and, while this book absolutely contains some of those, it also has a really pleasant amount of actual research backed by actual data. The data is often not of amazing quality, in that many of the studies are quite small-scale and the statistics likely underpowered, but that’s still a better standard of evidence than I’m used to.

It’s also pleasantly short. The whole book is only around 160 pages, and I read it over the course of an afternoon.

Anyway, that’s the review section of this post done. Now it’s time for the reviewing section.

I have a couple of take-homes from this book, which are currently in the state of “plausible-sounding statements that I will think about further and probably upgrade most of to actual beliefs”.

  • Code review is a very cost-effective way of finding defects compared to manual QA of the application – it finds them at a higher rate per unit of time, and it finds bugs that QA was probably never going to find.
  • As your notion of “defect” broadens to include things like your coworkers having no idea what you wrote down, the above statement becomes stronger.
  • Code review should be done by individuals (possibly multiple individuals working in parallel) rather than teams working together. As well as being more expensive per unit time, team code review seems to be less effective in absolute terms at finding defects.
  • The generally believed opinion that a 10 line change gets more useful review than a 500 line change seems to be true, but too generous – the actual cut-off seems to be more like 200 lines. It’s unclear to me whether this means 200 lines added or a 200 line diff, but I believe it means the former.
  • Code review effectiveness drops off dramatically as the time taken approaches about an hour. It’s unclear to me whether you can fix this by walking away from the review and coming back to it later. This may also be the origin of the 200 line limit – it may be impossible for most people to effectively review code at a rate much higher than 200 lines/hour.
  • Code review benefits greatly from a careful first reading of the code before you do any reviewing (and doing this stage more slowly is better, though presumably it hits diminishing returns at some point).
  • Code review benefits greatly from being structured with checklists.

The checklist one is interesting, and I am actually pretty convinced by it: the overall literature on checklists helping construct reliable processes seems reasonably convincing (though my opinion here is very trend-driven and I have not actually read any primary research on the subject. It conforms to my biases about the world though, and thus must be true), and the specific data presented in favour of checklists for reviewing is moderately convincing. This is the advice most likely to make an actual practical difference to how I review code in future.

The advice on checklists is interesting in particular in how it’s constructed: they strongly recommend not having just a “standard” checklist, but actively curating one over time. When something is found in review that seems like it would make a good checklist item, add it to the list! When an item hasn’t found anything useful in review for a while (say six months), think about removing it.
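To make the curation rule concrete, here’s a toy sketch in Python (the names and the 180-day cut-off are my own illustration, not anything from the book): track, for each item, the last review in which it caught something, and flag stale items as candidates for removal.

    from dataclasses import dataclass
    from datetime import date, timedelta

    STALE_AFTER = timedelta(days=180)  # roughly the "say six months" rule above

    @dataclass
    class ChecklistItem:
        text: str
        last_hit: date  # last review in which this item caught something

    def prune_candidates(items, today):
        """Items that haven't caught anything in a while - consider removing."""
        return [item for item in items if today - item.last_hit > STALE_AFTER]

    today = date.today()
    checklist = [
        ChecklistItem("Unit tests cover errors and invalid parameters.",
                      today - timedelta(days=20)),   # recently useful: keep
        ChecklistItem("All subroutines are commented in clear language.",
                      today - timedelta(days=400)),  # stale: candidate to drop
    ]
    for item in prune_candidates(checklist, today):
        print("Consider removing:", item.text)

The point is only that the list is a living artifact with a feedback loop, not a document you write once and frame on the wall.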

Here are a few example checklist items from the book that look generally useful to me (the list they come from is much longer):

  • Documentation: All subroutines are commented in clear language.
  • Documentation: Describe what happens with corner-case input.
  • Testing: Unit tests are added for new code paths or behaviours.
  • Testing: Unit tests cover errors and invalid parameters.
  • Testing: Don’t write new code that is already implemented in an existing, tested API.
  • Error handling: Invalid parameter values are handled properly early in the subroutine.

There are plenty more. The book suggests that checklists should really only have about the top 10-20 items that you’ve found most useful over time (the sample checklist has 25 items, so apparently they didn’t test the invalid input case here).

One interesting thing worth highlighting is that at least some checklist items may be worth automating out of the review entirely. e.g. “Testing: Unit tests are added for new code paths or behaviours” would, I think, mostly be better handled by enforcing a code coverage metric.
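For example, here’s a minimal sketch of such a gate, assuming a Python project tested with pytest and measured with coverage.py (the 80% threshold is an arbitrary number of mine, not a recommendation from the book):

    import subprocess
    import sys

    THRESHOLD = 80  # minimum acceptable total coverage, in percent (arbitrary)

    # Run the test suite under coverage measurement; a failing test suite
    # aborts the gate with a non-zero exit.
    subprocess.run(["coverage", "run", "-m", "pytest"], check=True)

    # 'coverage report --fail-under=N' exits non-zero when total coverage
    # is below N percent, so the gate's exit code is just the report's.
    result = subprocess.run(["coverage", "report", f"--fail-under={THRESHOLD}"])
    sys.exit(result.returncode)

Wired into CI, something like this takes the “are there tests?” question off the human reviewer’s plate entirely.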

As well as improving the quality of the review itself, the book highlights another interesting application of checklists: it doesn’t have to be just the reviewer who goes over them! Doing a sort of self-review by following the checklist yourself before submitting seems to be nearly as effective at removing bugs as the review itself would be (it is unclear to me whether that remains true if you skip the review altogether – my feeling is that people are likely to get sloppy with the checklist if they imagine nobody is going to check it later). This presumably also speeds up the review process a lot by reducing the number of back-and-forth exchanges required to pass it.

One suggestion they make off the back of this self-review (which seems to come from the Personal Software Process world) is that, as well as maintaining a global checklist for reviews, it might be worth maintaining individual ones: each person keeps their own list of things worth pre-checking because they often make mistakes in that area which then come up in review.

Anyway, I’m really glad I read the book. I can’t swear to its correctness, but it’s an interesting and plausible perspective that I’ll be thinking about further.

PS. If you’d like to fuel my book habit, I have a public Amazon wishlist. I’ll happily review any books you buy me…
