Peter Seibel wrote a piece about code reading recently. It’s a good piece which meshes well with my experience of code reading, and it got me thinking about how I do it.
I think there are three basic tenets of my code reading approach:
- The goal of code reading is to learn how to modify the code. Sure, your ultimate goal might be to understand the code in some abstract sense (e.g. because you want to use the ideas elsewhere), but code you don’t know how to modify is code you probably don’t understand as well as you think you do, and code you do know how to modify is code that you probably understand better than if you’d merely set out to understand it.
- The meaning of code is inextricably tied to its execution. In order to understand code you need to be able to follow its execution. You can do a certain amount of this in your head by manually tracing through things (and you will need to be able to), but you have a machine sitting in front of you designed to execute this code and you should be using it for that. For languages with a decent debugger, you even have a machine sitting in front of you which will execute the code and show you its workings. For languages without a decent debugger (or setups where it’s hard to use one), you can still get a hell of a lot of mileage out of the humble print statement.
- Ask many small questions. Ignore everything you do not need to answer the current question.
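To make the second point concrete, here is a minimal sketch in Python of what the humble print statement looks like while reading code. The function `parse_version` is a made-up stand-in for whatever you are actually reading, not something from a real codebase. The commented-out `breakpoint()` call is Python's built-in hook for dropping into the debugger at that line, if your setup allows it.

```python
def parse_version(text):
    """Hypothetical function under study: turn '1.2.3' into (1, 2, 3)."""
    parts = text.strip().split(".")
    # Temporary print added while reading: what does `parts` hold here?
    print(f"parse_version: text={text!r} parts={parts!r}")
    # With a debugger available, uncommenting the next line pauses execution
    # here so the local variables can be inspected interactively.
    # breakpoint()
    return tuple(int(p) for p in parts)


# A small question: does leading/trailing whitespace survive parsing?
result = parse_version(" 1.2.3\n")
print(result)
```

The print is throwaway. It exists to answer one question and gets deleted once you have the answer.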
Many people completely rewrite code in order to understand it. This is an extreme form of learning to modify it – modification through rewriting. Sometimes this is fine – especially for small bits of code – but it’s pretty inefficient and isn’t going to be much help to you once you get above a few hundred lines. Most code bases you’ll need to read are more than a few hundred lines.
What you really want to be doing is learning through changing a couple of lines at a time, because then what you are learning is which lines to change to achieve specific effects.
An extremely good tool for learning this is fixing bugs. They’re almost the optimal small question to ask: Something is wrong. What? How do I fix it? You need to read enough of the code to eliminate possibilities and find out where things are actually wrong, and you’ve got a sufficiently specific goal that you shouldn’t get too distracted by bits you don’t need.
If you don’t have that, here are some other small questions you might find useful to ask:
- How do I run this code?
- How do I write a test for this code? This doesn’t necessarily have to be in some fancy testing framework (though it’s often nice if it is!). It can just be a script or other small program you can run which will tell you if something goes wrong.
- Pick a random line in the codebase. It doesn’t have to be uniformly at random – a good algorithm might be to pick one half of a branch in a largish function in a module you’re interested in. How do I get that line to execute? Stick an assert false in there to make sure the answer is right. If there’s a test suite with low coverage, try finding an uncovered line and writing a test which executes it.
- Pick a branch. What will happen if I invert this branch?
- Pick a constant somewhere. If I wanted to make this configurable at runtime, what would I need to do?
- Specific variations on “How can I break this code?” For example, in C, “Can I get this code to read from/write to an invalid address?” is often a useful question. In web applications, “Can I cause a SQL/XSS/other injection attack?” is one. This forces you to figure out how data flows into the system through various endpoints, and if you succeed in finding such a bug, you then get to figure out how to fix it.
- How can I write a test to verify this belief I have about the code?
- What would I need to change to break this test?
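A few of these questions can be sketched in one small Python script. Everything here is hypothetical: `clamp` stands in for the code being read, `PROBE` and `reaches_upper_branch` are names I made up for planting the "assert false" probe, and `check_beliefs` is the sort of plain script-level test that needs no framework.

```python
PROBE = False  # hypothetical switch: True means the probe is planted


def clamp(value, low, high):
    """Hypothetical function under study: limit value to [low, high]."""
    if value < low:
        return low
    if value > high:
        # The probe: same idea as sticking "assert False" on this line, but
        # raised explicitly so it still fires when Python runs with -O.
        if PROBE:
            raise AssertionError("probe: reached the upper-bound branch")
        return high
    return value


def reaches_upper_branch(value):
    """Return True if clamp(value, 0, 10) executes the probed line."""
    global PROBE
    PROBE = True
    try:
        clamp(value, 0, 10)
        return False
    except AssertionError:
        return True
    finally:
        PROBE = False  # always remove the probe again


def check_beliefs():
    """Each assertion encodes one belief about how clamp behaves."""
    assert clamp(5, 0, 10) == 5    # values inside the range are unchanged
    assert clamp(11, 0, 10) == 10  # values above the range become `high`
    assert clamp(-1, 0, 10) == 0   # values below the range become `low`
    assert clamp(10, 0, 10) == 10  # the upper bound itself is allowed
    return True


print(reaches_upper_branch(11))
print(reaches_upper_branch(5))
print(check_beliefs())
```

For the last question in the list, one change that should break this test is inverting the branch: replacing `value > high` with `value <= high` makes `clamp(5, 0, 10)` return 10, so the first assertion fails.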