I feel a little bad continuing to write about testmachine. At this point I’m basically just going “I have a magic bug finding machine and you don’t! Nyah nyah”. I’m trying to come up with a way to library-ise it, and I have some ideas, but they’re still gelling.
So, for now, nyah nyah. Here’s what my magic bug finding machine can do.
Yesterday I added another 50+ test cases to intmap’s deterministic test suite. Most of them were entirely automatically generated.
How’d I generate them? Well, I decided that I should aim for a much higher degree of code coverage than I had previously, so I started running dettest (the deterministic test suite) under gcov to find what my code coverage was like. Then when I found lines that weren’t executed by dettest I started adding assertions to them (actually I added an UNREACHABLE macro which is only an assertion if you compile with a particular preprocessor directive). Then I ran testmachine to find the assertions and added its output to the test case.
This was especially helped by the fact that last night I wrote a compiler of sorts which took the stack programs that testmachine runs and compiles them to SSA C code which doesn’t reference the test machine library. It’s the world’s dodgiest compiler, and I found and fixed a bunch of issues in it today, but it’s really useful. For example here’s a test case it generated. The numbers and strings it uses are a bit weird, but it otherwise looks like a very nice hand written test case.
These tests aren’t all amazingly useful. A lot of them just create an intmap through a bunch of operations and then free it without asserting any properties about it. So even if I get to 100% coverage this way it’s very likely that there are bugs that it won’t catch. But even this is still useful – intmap has some internal integrity checks it can run, and valgrind can do invalid access and leak checks on those tests.
Also, knowing that the code can be executed turns out to be useful. I’ve found at least one interesting class of bug in the course of this which meant that the query optimiser was basically never firing because I was forcing everything with an intmap_is_empty check before doing algebraic operations. This meant that a whole bunch of code could never be triggered, which was showing up as lines the tests were never hitting.
There are still a few (7 and counting) UNREACHED annotations in intmap.c that I haven’t been able to figure out how to trigger. A few of them I’m pretty sure are legitimately impossible to trigger and I’m OK with that for now (e.g. I’ve got find_single_in which is a completely internal function that gets used mostly for intersection. It has a test for empty but is currently never passed an empty value). One or two I think it’s just that testmachine generates them with too low probability. Some of them look suspicious and I’ve not yet got to the bottom of them.
It’s hard to tell what sort of code coverage I’ve got now because I’ve not been able to figure out how to get unified metrics across multiple different programs, and I’ve got several different ways of invoking the test suite (I have some compiler flags designed to make bugs easier to find, a slightly more vanilla build, and an optimised build with debug turned off). Also, I’ve been able to demonstrate in a few cases that gcov is lying about a line being unreached (it claims it’s never executed but if I add an UNREACHED annotation there the tests catch it). But on any given run I’ve got gcov reporting about 95% line coverage just on the deterministic test suite. I started at around 70%. In pure numbers, this isn’t a huge improvement, but I find that getting the first 80% of anything like coverage done is comparatively easy, and it tends to be the last steps that are like pulling teeth, so for about a day’s work that’s a pretty satisfying amount.