(cue evil laughter)
At this point I’m mostly just using intmap as a breeding ground for new testing strategies. And it’s working.
Today I’ve just created a testing strategy that I suspect to be entirely novel, if possibly because it was too ludicrous for anyone to try. Specifically, I wrote a programming language to test my code, and I’m generating random programs in that language.
OK, that’s a total exaggeration. What I really wrote was a stack machine with a simple text format to serialize programs against it.
Here’s an example program:
empty singleton 7703584101918615599 '9ab29f4316' check_equal assert_not_check
I introduced a bug where empty intmaps would always compare equal and this is the program that demonstrates it (produced entirely automatically).
The stack machine is simply a stack of intmaps plus a check bool. It’s defined in C, along with a whole bunch of operations on it. This definition is then exported to python via a shared object, and the python code drives it with random programs.
Why a stack machine you ask?
Well, for one very simple reason: Minimizing the C programs output by testgen.py is a nightmare. I’m going to be looking into better tools (e.g. C-reduce) to minimize those test cases, but I wanted to try something with with a higher level representation that was just easier to minimize in the first place.
And it works. Minimizing a testmachine program is really easy. So easy I was able to write a program to do it in less time than it normally takes me to hand minimize a single testgen test case.
The reduction process is as follows:
- Delete some instructions
- Simulate the program that remains.
- Any instructions that are no longer valid for the state the stack is in when they’re reached are dropped
- Any checks for a particular state are corrected for the new program
Currently in step 1 we try deleting single instructions, and if that doesn’t work adjacent pairs of instructions. If we succeed at reducing a program to another failing program we then recursively try to minimize that one. It’s a very simple greedy process that typically reduces the length of the failing programs by a factor of 10 or more (most failing programs fail in the first 100 steps. The minimizer typically produces programs of fewer than ten steps).
Does it work? Well it’s found two non-trivial bugs in intmap already, one a memory leak the other a correctness bug, and it seems to catch every bug I’ve tried deliberately introducing, and produces extremely short, clear, examples to demonstrate them.
It’s also fast compared to testgen, which is a big benefit because it means it can run more tests. It turns out that not having a C compiler and valgrind in the inner loop means your tests run much faster. Who knew?
All told, I think this idea is a serious win. I may be trying to generalise it and introduce something similar into hypothesis in the not too distant future.