I’ve spent the majority of my career working on systems that can loosely be described as “Take any instance of this poorly specified and extremely messy type of data found in the wild and transform it into something structured enough for us to use”.
If you’ve never worked on such a system: yes, they’re about as painful as you might imagine. Probably a bit more. If you have worked on such a system, you’re probably wincing in sympathy right about now.
One of the defining characteristics of such systems is that they’re full of kludges. You end up with lots of code with comments like “If the audio track of this video is broken in this particular way, strip it out, pass it to this external program to fix it, and then replace it in the video with the fixed version” or “our NLP code doesn’t correctly handle wikipedia titles of this particular form, so first apply this regex which will normalize it down to something we can cope with” (Both of these are “inspired by” real examples rather than being direct instances of this sort of thing).
This isn’t surprising. Data found in the wild is messy, and your code tends to become correspondingly messy to deal with all its edge cases. However, kludges tend to accumulate over time, making the code base harder and harder to work with, even if you’re familiar with it.
Historically, though, this has made me very unhappy. I used to think that was because I hate messy code.
Fast-forward to Hypothesis, however. The internals are full of kludges. They’re generally hidden behind relatively clean APIs and abstraction layers, but there’s a whole bunch of weird heuristics with arbitrary magic numbers in them and horrendous workarounds for obscure bugs in other people’s software (Edit: This one is now deleted! Thanks to Marius Gedminas for telling me about a better way of doing it).
I’m totally fine with this.
Some of this is doubtless because I wrote all these kludges, but it’s not like I didn’t write a lot of the kludges in the previous system! I have many failings and many virtues as a developer, but an inability to write terrible code is emphatically not either of them.
The real reason why I’m totally fine with these kludges is that I know how to delete them: every single one of these kludges was introduced to make a test pass. Obviously the weird workarounds for bugs all have tests (what do you take me for?), but all the kludges for simplification or generation have tests too. There are tests for the quality of minimized examples and tests for the probability of various events occurring. Tuning those two things is where most of the kludges come from.
And I’m pretty sure that this is what makes the difference: the problem with the previous kludges is that they could never go away. A lot of those systems were fairly severely under-tested – sometimes for good reasons (we didn’t have any files of less than 5TB that could reproduce a problem), sometimes for code quality reasons (our pipeline was impossible to detangle), and sometimes just as a general reflection of the company’s culture around testing (are you saying we write bugs??).
This meant that the only arbiter for whether you could remove a lot of those kludges was “does it make things break/worse on the production system?”, and this meant that it was always vastly easier to leave the kludges in than it was to remove them.
With Hypothesis, and with other well tested systems, the answer to “Can I replace this terrible code with this better code?” is always “Sure, if the tests pass”, and that’s immensely liberating. A kludge is no longer a thing you’re stuck with, it’s code that you can make go away if it causes you problems and you come up with a better solution.
I’m sure there will always be kludges in Hypothesis, and I’m sure that many of the existing kludges will stay in it for years (I basically don’t see myself dropping support for versions of Python with that importlib bug any time in the near future), but the knowledge that every individual kludge can be removed if I need to is very liberating, and it takes away a lot of what previously made me unhappy about them.
Whee, the for loop with various time.sleep()’s to work around http://bugs.python.org/issue23412!
I’m curious: why didn’t you use importlib.invalidate_caches()? It worked for me.
Because I had no idea this was a thing or that it would fix it! Thank you!
Looks like this *is* a kludge I get to delete after all.
(For clarity: I ran into this issue about two hours after it was first reported, so my fix predates that answer to the bug)
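For anyone curious about the fix being discussed, here is a minimal sketch of the pattern (the module name and contents are made up for illustration): when you write a module to disk at runtime and then import it, importlib’s finder caches can miss a file created too soon after a previous directory scan, which is what the sleep-and-retry loop was papering over. Calling `importlib.invalidate_caches()` before the import forces the finders to rescan:

```python
import importlib
import os
import sys
import tempfile

# Create a module file at runtime, somewhere on sys.path.
tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)

with open(os.path.join(tmpdir, "generated_module.py"), "w") as f:
    f.write("ANSWER = 42\n")

# Without this, importlib's cached directory listing may not yet
# include the new file; this one call replaces the sleep/retry loop.
importlib.invalidate_caches()

import generated_module

print(generated_module.ANSWER)
```

This is the kind of one-line replacement that makes deleting the original kludge cheap, provided the tests around it still pass.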
This is so true.
Maybe this is why writing “meta” systems (compilers, dev tools) is often much more fun than “real” systems — you can test them properly.
It has not been my experience that “tested properly” is an accurate label for, uh, just about any compiler really, but certainly the only one I’ve actively worked on.