David R. MacIver's Blog: Reading code without running it

Reading code without running it

25 September 2014

A while back I wrote about reading code by asking questions. I still think it’s pretty good advice.

What’s unfortunate is that I am currently finding it very hard to apply at my new job, so I think I have to revise that: It’s good advice if you can apply it, but you might not be able to.

There are two main issues that are making it difficult for me:

We’re writing in C++
We’re dealing with a lot more data than I historically have, and correspondingly bigger models

Why do these matter?

Well, it’s not that C++ matters because it’s an intrinsically complicated language. I mean, it is, but the subset of it we’re using is generally pretty clean and usable. It’s not a problem. I picked up our version of C++ about as fast as I’ve picked up any other language in recent memory - it honestly didn’t take me longer to pick up than, say, Python did (having written a fair bit of C and some Clay and C# helped a lot here - there was very little in the way of new semantics for me to learn, just new syntax. So much new syntax).

No, C++ has an effect for an entirely different reason.

What reason?

Well, I never really believed this comic before.

C++ builds are slow, especially when you have a large code base. There are things you can do to improve this (and we have an entire team working on it. I’m very much looking forward to their work), but at the moment building a single file can take up to a minute (this parallelises of course when you have multiple files, but for a single file it’s hard to improve without semantic changes to the language, and the parallel speed-up will never be much better than the number of cores you have), and then once you have a build you have to link it and that can take even longer.

And then you have to start the application. The resulting binaries are large enough that even that is non-trivial, but then you have to do things like loading data models, launching services, reticulating splines, herding game theory, etc.

Between these two, the cycle of making a change and seeing what happens can be anywhere from a couple of minutes to half an hour.

When I blog about my internal thoughts and process I’m always a little concerned that I’m telling just-so stories. I’m telling you less about how my brain actually works and more about how I think my brain works. These are often surprisingly different.

In this case though I’m getting pretty strong evidence that I was bang on about how I read code. The major difference between what I’m doing now and what I’ve done previously is this cycle time - it’s a new language, but I’ve done that. It’s a large codebase, but I’m focused on a small enough section of it that I’ve dealt with larger. It’s a new build process, but I don’t think I’ve ever started a job that didn’t have an unfamiliar build process. The thing that is different is the cycle time.

And it completely destroys my previous code reading technique, which is probably a pretty good indicator that I was right. It’s just a shame that I had to discover that by testing to destruction.

The problem is that the compile and run step is basically the inner loop of the process. This means that how long everything takes basically scales linearly with how long it takes to compile and run, so if you’ve gone from 10 seconds to 10 minutes then it will take you 60 times as long to read things (longer, really, because you’ve almost certainly lost your focus during those 10 minutes).
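To make that arithmetic concrete, here is a minimal sketch in C++, under the simplifying assumption that each question about the code costs exactly one compile-and-run cycle and nothing else. The function names and the figure of twenty questions are made up for illustration; the 10 second and 10 minute cycle times are the ones quoted above.

```cpp
// Hypothetical cost model: each question about the code costs one full
// compile-and-run cycle, and nothing else is counted.
constexpr double total_seconds(int questions, double cycle_seconds) {
    return questions * cycle_seconds;
}

// How many times longer the same reading session takes once the cycle slows down.
constexpr double slowdown(double old_cycle_seconds, double new_cycle_seconds) {
    return new_cycle_seconds / old_cycle_seconds;
}

// Going from a 10 second cycle to a 10 minute (600 second) cycle.
static_assert(slowdown(10.0, 600.0) == 60.0, "60 times as long");

// Twenty questions: 200 seconds before, 12000 seconds (200 minutes) after.
static_assert(total_seconds(20, 10.0) == 200.0, "a little over three minutes");
static_assert(total_seconds(20, 600.0) == 12000.0, "three hours and twenty minutes");
```

Because the checks are written as static_assert, the compiler evaluates them without the program ever being run. The model ignores the lost focus mentioned above, so if anything it understates the real cost.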

And of course this is probably happening because there’s a lot more to read.

Obviously this is unsustainable.

Part of the solution is to try to speed this up, but it’s hard. You can get a factor of two or three out with some careful management of things, but when what you really want is a factor of 10 that is merely a band-aid on a much more serious injury.

The real solution is to figure out how to read code without running it. I’m having some success relying more heavily on compiler integration with my editor (YouCompleteMe is lovely) - so I’m still adopting the “making changes and seeing what happens” approach, but instead of running them and seeing what they’re doing I’m relying more on the type system to catch a lot of the things I would previously have caught at runtime.

This is very much the poorer cousin of the previous approach. It’s faster than a full compile because you don’t have to link or generate object files, and also because it’s happening in parallel with my editing it’s much easier to tolerate the still not inconsiderable slowness.

More importantly it still doesn’t tell me if it works, but it at least significantly reduces the number of iterations I spend figuring out that it doesn’t (yes, I know you can do a lot more to prove correctness in a static type system. When the language you’re writing in is C++ that’s not necessarily an improvement, and anyway the codebase I’m working on doesn’t do that).
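As an illustration of the kind of mistake a type checker can surface before anything runs, here is a minimal sketch. The Seconds and Milliseconds wrappers and the within_budget function are hypothetical names, not taken from any real codebase, and the example assumes C++11 or later.

```cpp
#include <type_traits>

// Hypothetical unit wrappers: distinct types, so they cannot be mixed up silently.
struct Seconds { double value; };
struct Milliseconds { double value; };

// The only sanctioned way to turn Seconds into Milliseconds.
constexpr Milliseconds to_milliseconds(Seconds s) {
    return Milliseconds{s.value * 1000.0};
}

// Accepts only Milliseconds, so passing Seconds or a plain double is a compile error.
constexpr bool within_budget(Milliseconds elapsed, Milliseconds budget) {
    return elapsed.value <= budget.value;
}

// Checked by the compiler alone: no linking, no launching the application.
static_assert(!std::is_convertible<Seconds, Milliseconds>::value,
              "units must not convert implicitly");
static_assert(within_budget(to_milliseconds(Seconds{0.5}), Milliseconds{600.0}),
              "0.5 s fits in a 600 ms budget");

// within_budget(Seconds{0.5}, Milliseconds{600.0});  // would not compile: wrong unit
```

An editor with compiler integration would flag the commented-out call as soon as it is typed, whereas plain double parameters would let the mix-up through to runtime. It still says nothing about whether 600 ms is the right budget, which is the limitation described above.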

So this feels... inadequate. It’s a bit like what I describe as the fake meat problem. There’s a thing you miss and can’t have, so you substitute something strictly inferior that pretends to be it, when instead you could be exploring new possibilities that your new limitation forces you to consider.

(The fake meat analogy breaks down because I think this will continue to be a useful feature of any solution, whereas I can’t stand fake meat and couldn’t even when I was a committed vegetarian, but the principle is still applicable - it’s a bad idea to just substitute and carry on pretending nothing has changed).

What are those possibilities? Wish I knew. Hopefully I’ll figure out some great new technique. For now the best anyone seems to be able to tell me is “you just sorta figure it out”, which as a solution is somewhat short of actionable. Any suggestions?

Comments

John on 2014-10-02 02:47:31:

One thing I’ve found myself doing in this sort of situation is making an effort to parallelize the computer’s compilation work and my own thinking processes as much as possible. Launch any given test as soon as possible, and prepare the next one while it’s working. With sufficient resources, one might even want to be running multiple independent compile/test runs, but that runs into limitations both in tooling and headspace.

While I am waiting for the test to complete, I make an effort to predict its outcome. Often this more precise direction for my thoughts makes me realise I’m not really interested in that answer anyway, and I cancel the build before it’s done. Or perhaps I have narrowed it down to just a couple of possible outcomes, and decided on my next course of action (possibly even prepared it) for each possibility.

Of course, the most valuable case is when the outcome is not any of the things I have predicted. Low probability events provide the most information.

Now that I actually write this down, I realise that it suggests another goal: try to make your experiments have as many plausible outcomes as possible. If you’re just asking a single yes/no question, you can expect to learn at most one bit from the experiment (usually less, because the answers are not equally likely). But if there are many possibilities, the information you garner from a single test cycle should be increased. Not sure how practical that suggestion is, though...
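The information argument above can be made concrete with a small sketch. It assumes that "what you expect to learn" means the Shannon entropy of the outcome probabilities you predicted before the run, which is one reading of the comment rather than something it states. The name expected_bits and the example probabilities are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Shannon entropy, in bits, of a predicted distribution over experiment outcomes.
// Assumes the probabilities are non-negative and sum to 1 (the sum is not checked).
double expected_bits(const std::vector<double>& probabilities) {
    double bits = 0.0;
    for (double p : probabilities) {
        assert(p >= 0.0 && p <= 1.0);
        if (p > 0.0) {
            bits -= p * std::log2(p);  // zero-probability outcomes contribute nothing
        }
    }
    return bits;
}
```

Under this reading, `expected_bits({0.5, 0.5})` is 1 bit, a lopsided yes/no test such as `expected_bits({0.9, 0.1})` comes to about 0.47 bits, and a test with four equally likely outcomes, `expected_bits({0.25, 0.25, 0.25, 0.25})`, yields 2 bits from a single build cycle.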