There’s a line I often attribute to one of the people who develops SQLite, though I now can’t find a reference for it:
100% coverage tells you nothing, but less than 100% coverage tells you something.
(The implication being that the something it tells you is “Your code isn’t tested well enough”. Thanks, Joke Explainer).
I’m somewhat on the fence as to whether 100% coverage is necessary. I think there are a lot of cases where getting there is going to be more pain than it’s really worth. A self-contained, stand-alone library like Hypothesis is very much not in that category – 100% coverage is both achievable and helpful there – but testing e.g. recommendation algorithms that do complex data processing against external services (err, hypothetically) is different: there will be edge cases which are a real nuisance to test properly, and you’d basically have to mock out everything interesting in order to reach the code at all. For cases like that I suspect 100% coverage is still achievable, but probably not really worth achieving.
There are two areas that inevitably turn out to be the last bits left untested when I think a code base is well enough tested, and that 100% branch coverage forces me to deal with. Even once I’ve got to 100% branch coverage, when I write a bunch of new code that I think is properly tested, these are usually the bits that get me an angry email from Travis telling me the build has failed because I’ve forgotten to test them. Further, I think you should be testing them even if 100% branch coverage is not a goal.
- Negative testing: Testing that if you do the wrong thing, it throws an exception or fails in some other way.
- Testing your object pretty printing.
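Both blind spots boil down to tests that take minutes to write. Here’s a minimal Python sketch of each (the `Project` class and its validation are hypothetical, just for illustration):

```python
class Project:
    def __init__(self, name):
        # The negative path: reject the wrong type loudly
        # instead of silently accepting it.
        if not isinstance(name, str):
            raise TypeError("name must be a str, got %r" % (name,))
        self.name = name

    def __repr__(self):
        return "Project(name=%r)" % (self.name,)


# 1. Negative testing: doing the wrong thing should fail, not "work".
try:
    Project(["not", "a", "string"])
except TypeError:
    pass
else:
    raise AssertionError("expected a TypeError for a non-string name")

# 2. Testing the pretty printing.
assert repr(Project("web")) == "Project(name='web')"
```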
Somehow these live in blind spots where I have to consciously remind myself that I should be testing them and will otherwise forget entirely.
Are they important to test though?
Well, let me ask you two more questions before I answer that one:
- How do you feel about silent data corruption?
- Do you like being able to debug errors in your program when it goes wrong?
I’m going to assume the answers are “not positively” and “Yes”. If instead you think that silent data corruption errors where you have no useful debugging information sound like a super fun game, then you can stop reading now.
The result of not properly validating your arguments and getting passed the wrong value will often be silent data corruption. Some of the most annoying things I’ve had to debug have been when something was passed a value it couldn’t actually handle, blithely assumed that it could, and did the wrong thing rather than raising an error. This is a lot easier to run into in dynamic languages, of course (examples include someone passing a single-element list of strings instead of a string, or passing a dict where there was supposed to be an ORM model), but it’s not hard to imagine cases that can happen in a statically typed language either (e.g. nullness checks in Java, or expecting a non-empty list in real statically typed languages).
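To make the dynamic-language case concrete, here’s a sketch (the function is hypothetical) of the kind of silent corruption a wrong type produces when nothing validates it – strings are iterables of characters, so string operations often “accept” them without complaint:

```python
def tag_line(names):
    # Expects a list of strings. A bare string is "accepted" too,
    # because a string is an iterable of characters -- so the result
    # is silently wrong rather than an error.
    return " & ".join(names)


print(tag_line(["Alice", "Bob"]))  # Alice & Bob
print(tag_line("Alice"))           # A & l & i & c & e  -- garbage, no exception


def tag_line_checked(names):
    # The validation branch that coverage will nag you to test.
    if isinstance(names, str):
        raise TypeError("expected a list of strings, got a bare string")
    return " & ".join(names)
```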
Now, in general, when coverage complains that your data validation isn’t covered, that’s a sign you actually are checking your arguments. But it’s still worth testing: it makes sure that code doesn’t go away, and it makes sure that the errors you get when it fires make sense (exceptions thrown during error handling are the most embarrassing exceptions). And even when coverage isn’t complaining, it’s worth remembering that you should probably be testing the negative path as well as the positive one.
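The “exceptions thrown during error handling” failure mode deserves a test of its own: drive the negative path all the way through and check the message actually renders. A sketch, with hypothetical names:

```python
def set_age(age):
    if not isinstance(age, int):
        # This line is itself code and can itself be buggy --
        # e.g. "%d" % age would blow up right here, turning a clear
        # TypeError into a confusing formatting error. Hence the test.
        raise TypeError("age must be an int, got %r" % (age,))
    return age


def test_bad_age_gives_a_useful_error():
    try:
        set_age("forty")
    except TypeError as e:
        # The message rendered, and it names the offending value.
        assert "forty" in str(e)
    else:
        raise AssertionError("expected a TypeError")


test_bad_age_gives_a_useful_error()
```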
The pretty printing one I’m more ambivalent about. Whenever I write tests for my object representations I always feel embarrassingly pedantic.
And then my tests for my object representations find bugs in them and I get to bask in the smug glow of having done the right thing.
(I have a test that uses hypothesis to generate random descriptors, get the strategy for them and test that the repr for that strategy does not match a certain regexp. You would not believe how smug I was when that test found a bug that I would not have otherwise caught in testing).
Fun fact: Just after I wrote the above I thought “Oh, there’s an aspect of my reprs I haven’t tested. I should test that” and added another check to the recursive self test for the reprs (basically that it evaled to something equal to the object). I found bugs.
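That check amounts to a round-trip property: the repr, when evaluated, should produce something equal to the original object. A minimal stand-alone version of the idea (my `Point` class is a hypothetical stand-in, not Hypothesis’s actual self test):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return "Point(%r, %r)" % (self.x, self.y)


def assert_repr_round_trips(obj):
    # eval(repr(obj)) should give back an equal object.
    assert eval(repr(obj)) == obj


assert_repr_round_trips(Point(1, 2))
assert_repr_round_trips(Point("a", [3, 4]))
```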
Uh, but anyway, am I still being pedantic? Does getting your repr right actually matter that much? Who cares if there are bugs in unimportant code?
Well, it matters in most of the programs I write. I write essentially three types of programs:
- Libraries and similar tooling
- Long running server programs
- Hacky scripts that are mostly for my personal consumption
For the third, they’re not a big deal, which is just as well because I’m not really writing tests for those anyway.
For the first two, they’re actually pretty important.
For libraries, your object representation is part of your user interface. It’s hopefully uncontroversial (though probably isn’t) that your user interface should be tested. Moreover it’s easy to test your user interface here, so you really should.
For long running server processes, this is where my “Do you like being able to debug your programs?” question comes in. It’s really annoying when your error message says something like “Error calling method foo on #<Project 1234>”. It’s important to have good string representations of your objects because those are what’s going to show up in your logs.
So it’s important to have good string representations. But your string representation is code, and because it’s code it’s probably going to have bugs in it. You can find those bugs in one of two ways: You can find them when you test your code, or you can find them when you’re looking through your server logs and the thing you’re getting an error on is showing up as “Project(name=%s)” and that’s why you’re here shaving this yak.
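That log entry is exactly the kind of bug a repr test catches. A sketch of the buggy version alongside the check that would have flagged it (the classes are hypothetical):

```python
class BuggyProject:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Bug: the `% self.name` got dropped, so the format string is
        # returned verbatim and logs show the raw placeholder.
        return "Project(name=%s)"


class FixedProject(BuggyProject):
    def __repr__(self):
        return "Project(name=%s)" % self.name


def repr_is_sane(obj):
    # The check the test would run: no unfilled format placeholders.
    return "%s" not in repr(obj) and "%r" not in repr(obj)


assert not repr_is_sane(BuggyProject("billing"))  # the test catches the bug
assert repr_is_sane(FixedProject("billing"))
```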
So I think both of these things are things worth testing, and what’s nice is that they’re things that are easy to test. They mostly don’t suffer the problems that make some things hard to test – they don’t generally involve complicated set up or have a lot of external dependencies. You just create some objects or call some functions and inspect the output.
More generally, I still stand by the idea that 100% branch coverage isn’t always going to be worth it, but I think what is worth it is working on at least one non-trivial project with 100% branch coverage. It forces you to take a hard look at testing and think about what you can and can’t test, and what you can test but rarely do. This is a really valuable experience that I think will improve your development skills a lot.