
Improving code coverage with testmachine

I feel a little bad continuing to write about testmachine. At this point I’m basically just going “I have a magic bug finding machine and you don’t! Nyah nyah”. I’m trying to come up with a way to library-ise it, and I have some ideas, but they’re still gelling.

So, for now, nyah nyah. Here’s what my magic bug finding machine can do.

Yesterday I added another 50+ test cases to intmap’s deterministic test suite. Most of them were entirely automatically generated.

How’d I generate them? Well, I decided that I should aim for a much higher degree of code coverage than I had previously, so I started running dettest (the deterministic test suite) under gcov to find out what my code coverage was like. When I found lines that weren’t executed by dettest I started adding assertions to them (actually I added an UNREACHABLE macro, which is only an assertion if you compile with a particular preprocessor directive). Then I ran testmachine to find inputs that triggered those assertions and added its output to the test suite.

This was especially helped by the fact that last night I wrote a compiler of sorts, which takes the stack programs that testmachine runs and compiles them to SSA C code that doesn’t reference the testmachine library. It’s the world’s dodgiest compiler, and I found and fixed a bunch of issues in it today, but it’s really useful. For example, here’s a test case it generated. The numbers and strings it uses are a bit weird, but it otherwise looks like a very nice hand-written test case.
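To give a flavour of how simple the compilation is, here’s a sketch of the idea in python. The opcode handling matches the stack machine’s instructions, but the intmap_* names it emits (and the assumption that check_equal peeks at the top two values rather than popping them) are illustrative guesses, not the real compiler’s output:

def compile_to_c(program):
    lines = []   # generated C statements
    stack = []   # C variable names standing in for the machine's stack
    counter = [0]

    def fresh():
        counter[0] += 1
        return "t%d" % counter[0]

    for instruction in program.strip().splitlines():
        parts = instruction.split()
        if not parts:
            continue
        op, args = parts[0], parts[1:]
        if op == "empty":
            stack.append(fresh())
            lines.append("intmap %s = intmap_empty();" % stack[-1])
        elif op == "singleton":
            key, value = args
            stack.append(fresh())
            lines.append('intmap %s = intmap_singleton(%sULL, "%s");'
                         % (stack[-1], key, value.strip("'")))
        elif op == "check_equal":
            lines.append("check = intmap_equal(%s, %s);"
                         % (stack[-2], stack[-1]))
        elif op == "assert_not_check":
            lines.append("assert(!check);")
        else:
            raise ValueError("unknown instruction: " + instruction)
    return "\n".join(lines)

Because every value pushed onto the stack gets exactly one C variable, assigned exactly once, the output is more or less in SSA form, which is why it reads like straight-line hand-written test code.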

These tests aren’t all amazingly useful. A lot of them just create an intmap through a bunch of operations and then free it without asserting any properties about it. So even if I get to 100% coverage this way it’s very likely that there are bugs that it won’t catch. But even this is still useful – intmap has some internal integrity checks it can run, and valgrind can do invalid access and leak checks on those tests.

Also, knowing that the code can be executed turns out to be useful in itself. In the course of this I’ve found at least one interesting class of bug: the query optimiser was basically never firing, because I was guarding everything with an intmap_is_empty check before doing algebraic operations. This meant that a whole bunch of code could never be triggered, which was showing up as lines the tests never hit.

There are still a few (7 and counting) UNREACHABLE annotations in intmap.c that I haven’t been able to figure out how to trigger. A few of them I’m pretty sure are legitimately impossible to trigger, and I’m OK with that for now (e.g. I’ve got find_single_in, a completely internal function that gets used mostly for intersection. It has a test for empty but is currently never passed an empty value). For one or two I think it’s just that testmachine generates the right programs with too low a probability. Some of them look suspicious and I’ve not yet got to the bottom of them.

It’s hard to tell exactly what code coverage I’ve got now, because I’ve not been able to figure out how to get unified metrics across multiple different programs, and I’ve got several different ways of invoking the test suite (a build with compiler flags designed to make bugs easier to find, a slightly more vanilla build, and an optimised build with debug turned off). Also, I’ve been able to demonstrate in a few cases that gcov is lying about a line being unreached (it claims the line is never executed, but if I add an UNREACHABLE annotation there the tests catch it). But on any given run gcov reports about 95% line coverage just from the deterministic test suite, up from around 70% when I started. In pure numbers that isn’t a huge improvement, but getting the first 80% of anything like coverage done is comparatively easy – it’s the last steps that are like pulling teeth – so for about a day’s work that’s a pretty satisfying amount.

This entry was posted in Code.

Compressed ranges for intmap

A feature I’ve been wondering about, as an extension to the original datastructure that intmap is based on, is whether one can reasonably support a more efficient representation for contiguous ranges – rather than having multiple very large objects for every single word in the range, maybe we could just store the start and end of the range? Essentially, a hybrid between this very trivial integer set idea and intmap’s patricia trie, adding an additional node type for a contiguous range.

A complicating factor for this is that intmap has not just keys but also values, and you don’t want to store a contiguous range with distinct values as an array, because then you lose a lot of the cheap update properties that are nice about intmap in the first place – e.g. insertion becomes O(n) instead of O(log(n)).

But there turns out to be a relatively easy and nice compromise: Storing compressed ranges mapping to a single value. For the use case where intmap is basically an intset this works out pretty well, and for use cases where you have a lot of distinct values there is essentially no performance degradation (you get a little bit due to the extra checks, but not much).

The way ranges are incorporated into intmap is relatively simple: most operations get some special treatment for ranges. For example, if you intersect two ranges with the same value then the result is immediately a new range with that value. Similarly for unioning two ranges, if they overlap. Subtracting ranges from things is usually much more efficient, because it’s very easy to tell if all keys in a map lie in a range (the prefix of the patricia trie defines an upper and lower bound). A whole bunch of other operations get similar special cases.

And when there’s nothing sensible to do with a range – e.g. we’re removing a hole from the middle of a range, or unioning together two ranges that won’t coalesce, etc – we just turn it into a split node with two ranges as children. Often we’ll then want to recurse and will end up expanding more range nodes, but for many operations if we start with a range we’ll still only end up with a tree with O(log(n)) nodes for O(n) keys – e.g. if we insert or delete a key from a range.
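To make those rules concrete, here’s a toy model in python, treating a range node as just (lo, hi, value) with inclusive bounds. The real code embeds these nodes in the patricia trie, which this sketch ignores entirely:

def intersect_ranges(a, b):
    # Two ranges with the same value intersect to a single range.
    (alo, ahi, av), (blo, bhi, bv) = a, b
    if av != bv:
        return None  # fall back to the general node-by-node path
    lo, hi = max(alo, blo), min(ahi, bhi)
    return (lo, hi, av) if lo <= hi else "empty"

def union_ranges(a, b):
    # Overlapping (or touching) ranges with the same value coalesce.
    (alo, ahi, av), (blo, bhi, bv) = a, b
    if av == bv and alo <= bhi + 1 and blo <= ahi + 1:
        return (min(alo, blo), max(ahi, bhi), av)
    return None  # no coalescing: build a split node with two range children

def delete_key(r, key):
    # Removing a hole from the middle of a range yields the surviving
    # pieces, i.e. the children of a new split node.
    lo, hi, v = r
    if key < lo or key > hi:
        return [r]
    return [piece for piece in ((lo, key - 1, v), (key + 1, hi, v))
            if piece[0] <= piece[1]]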

The code for this has ended up a bit hairy in places (honestly, intmap could use a bit of a cleanup already I think), but it appears to work rather well, so I’m quite pleased with the result.

In writing this, testmachine has been an absolute godsend. I found so many bugs in the implementation with it.

The process for writing things against testmachine is that I provide a parallel implementation of each operation in python, except that implementation can be as basic as I like. In the parallel python representation, intmaps are just dicts mapping ints to strings, and the operations are correspondingly simple, with little concern for efficiency.
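Concretely, the model is something along these lines (the details here, like the right-hand side winning a union on conflicting keys, are illustrative choices rather than intmap’s actual semantics):

def union(x, y):
    result = dict(x)
    result.update(y)  # assume the right-hand map wins on conflicts
    return result

def intersect(x, y):
    # keep keys present in both, assuming the values must agree too
    return {k: v for k, v in x.items() if k in y and y[k] == v}

def subtract(x, y):
    return {k: v for k, v in x.items() if k not in y}

def is_empty(x):
    return not x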

Then it finds bugs. Almost inevitably. This is a combination of two factors:

  1. intmap code is quite tricky to get right. I’m emulating in C a lot of things C isn’t designed to do, and C is tricky to get right in the first place, and frankly datastructure implementations are always a bit of a pain.
  2. testmachine is almost magically good at finding bugs.

I’m not kidding about point 2. I’ve more than doubled the size of the deterministic test suite as part of this chunk of work, and the new tests are all from things testmachine has found. As I opined on Twitter earlier: when you first encounter Quickcheck style testing it feels like you’ve been given a magical bug finding tool. This is like that experience all over again, with the baseline moved up to Quickcheck.

This entry was posted in Code.

Introducing THE TEST MACHINE

(cue evil laughter)

At this point I’m mostly just using intmap as a breeding ground for new testing strategies. And it’s working.

Today I’ve just created a testing strategy that I suspect to be entirely novel, if only because it was too ludicrous for anyone to try before. Specifically, I wrote a programming language to test my code, and I’m generating random programs in that language.

OK, that’s a total exaggeration. What I really wrote was a stack machine with a simple text format for serializing programs to run against it.

Here’s an example program:

empty
singleton	7703584101918615599	'9ab29f4316'
check_equal
assert_not_check

I introduced a bug where empty intmaps would always compare equal and this is the program that demonstrates it (produced entirely automatically).

The stack machine is simply a stack of intmaps plus a check bool. It’s defined in C, along with a whole bunch of operations on it. This definition is then exported to python via a shared object, and the python code drives it with random programs.
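As a toy model of the semantics, here’s the machine in python, with dicts standing in for real intmaps; the behaviour of each opcode is inferred from programs like the one above, so treat it as an approximation:

class TestMachine:
    def __init__(self):
        self.stack = []     # the stack of intmaps (dicts in this model)
        self.check = False  # the single check bool

    def empty(self):
        self.stack.append({})

    def singleton(self, key, value):
        self.stack.append({key: value})

    def check_equal(self):
        # assumed to compare the top two maps without popping them
        self.check = self.stack[-1] == self.stack[-2]

    def assert_not_check(self):
        assert not self.check

On the example program, the empty map and the singleton compare unequal, so check ends up False and the final assertion passes; under the bug described above, check_equal reports them equal and assert_not_check fails, flagging the bug.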

Why a stack machine, you ask?

Well, for one very simple reason: minimizing the C programs output by testgen.py is a nightmare. I’m going to be looking into better tools (e.g. C-reduce) to minimize those test cases, but I wanted to try something with a higher level representation that was just easier to minimize in the first place.

And it works. Minimizing a testmachine program is really easy. So easy I was able to write a program to do it in less time than it normally takes me to hand minimize a single testgen test case.

The reduction process is as follows:

  1. Delete some instructions.
  2. Simulate the program that remains:
    1. Any instructions that are no longer valid for the state the stack is in when they’re reached are dropped.
    2. Any checks for a particular state are corrected for the new program.

Currently in step 1 we try deleting single instructions, and if that doesn’t work, adjacent pairs of instructions. If we succeed at reducing a program to another failing program, we then recursively try to minimize that one. It’s a very simple greedy process that typically reduces the length of failing programs by a factor of 10 or more (most failing programs fail in the first 100 steps; the minimizer typically produces programs of fewer than ten steps).
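The core loop is only a few lines of python. A sketch, where fails runs a candidate program and reports whether it still fails, and fixup is the simulation pass from step 2 (both are stand-ins for the real thing):

def minimize(program, fails, fixup):
    # Try dropping single instructions, then adjacent pairs, and
    # recurse greedily whenever the smaller program still fails.
    for width in (1, 2):
        for i in range(len(program) - width + 1):
            candidate = fixup(program[:i] + program[i + width:])
            if len(candidate) < len(program) and fails(candidate):
                return minimize(candidate, fails, fixup)
    return program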

Does it work? Well, it’s found two non-trivial bugs in intmap already, one a memory leak, the other a correctness bug, and it seems to catch every bug I’ve tried deliberately introducing, producing extremely short, clear examples to demonstrate them.

It’s also fast compared to testgen, which is a big benefit because it means it can run more tests. It turns out that not having a C compiler and valgrind in the inner loop means your tests run much faster. Who knew?

All told, I think this idea is a serious win. I may be trying to generalise it and introduce something similar into hypothesis in the not too distant future.

This entry was posted in Code.

A potentially fun game mechanic

Revisiting my old post about Scrabble variations reminded me of the existence of Richman games. This got me thinking more generally about the class of games where how often you move is something you have control over, which caused me to think of a potentially fun game mechanic.

I don’t currently have a game to attach it to. It essentially works on the following class of games:

  1. Each player has a pool of tokens unique to them, and the number they have in reserve matters.
  2. A play consists of expending one or more tokens.
  3. Having a low reserve puts you at a significant strategic disadvantage.

As an example such game, consider the following territory capturing game:

  1. Play is on a hexagonal board with up to 6 players. Each player owns a single side of the board, assigned by lot at the beginning of the game. Each side has a coloured token associated with it.
  2. A play consists of placing a token of your colour adjacent to either your side of the board or an existing token of your colour.
  3. If someone places a token which is adjacent to an empty square that is also adjacent to one of your tokens, you may immediately respond by placing one of your tokens in that empty square. You may not respond in more than one square simultaneously, but other players can respond to responses. This can cascade indefinitely. If more than one player can respond to a given move, people may decide whether to respond in clockwise order from the player who moved.
  4. The game ends when no-one can move, either because they’re boxed in or because they’ve run out of pieces. At this point you count up the territory owned by each player: a tile is owned by you if it contains one of your pieces or lies in an empty region surrounded entirely by your pieces.
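Scoring under rule 4 is a small flood fill. Here’s a sketch in python, with the board abstracted to an adjacency map so that the hexagonal geometry (and the question of what happens to empty regions touching the board edge, which the rules above don’t pin down) stays out of the way:

def score(board, adjacency):
    # board: cell -> player, or None for empty; adjacency: cell -> neighbours
    owned = {}
    seen = set()
    for cell, player in board.items():
        if player is not None:
            owned[player] = owned.get(player, 0) + 1
        elif cell not in seen:
            # Flood fill the empty region, noting who borders it.
            region, border, frontier = set(), set(), [cell]
            while frontier:
                c = frontier.pop()
                if c in region:
                    continue
                if board[c] is None:
                    region.add(c)
                    frontier.extend(adjacency[c])
                else:
                    border.add(board[c])
            seen |= region
            if len(border) == 1:
                p = border.pop()
                owned[p] = owned.get(p, 0) + len(region)
    return owned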

(This might actually be a fun game in its own right. I tailored it to work well with this mechanic, but it could also work well without it)

Given such a game, the mechanic I have in mind works like this: instead of playing alternating turns, you have a bag of tokens mixed from all the players. A turn proceeds as follows:

  1. Each player may put any number of pieces from their reserve into the bag. If the bag is empty, all players must put at least one piece into it if they have any in reserve. If no players have any pieces in reserve and the bag is empty, the game ends.
  2. A piece is drawn from the bag.
  3. This piece is added to its owner’s reserve. All other pieces remain in the bag.
  4. The player whose piece was drawn may now choose to make a play or pass.

You control your likelihood of getting to play by adding pieces to the bag, but in doing so you deplete your reserve. In the early game, when you have lots of reserve, this may not matter, but in the late game you are limited by the fact that a play consumes as many pieces as you draw, so you may want to keep more pieces in your reserve (e.g. in the example game, not being able to participate in response cascades is a major disadvantage).
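To make the bookkeeping explicit, here’s a quick simulation sketch of a single turn in python (the data representation is mine, and it doesn’t model the forced stake when the bag is empty):

import random

def turn(reserve, bag, stakes):
    # reserve and bag map each player to a piece count; stakes is how
    # many pieces each player chooses to add this turn (step 1).
    for player, n in stakes.items():
        n = min(n, reserve[player])
        reserve[player] -= n
        bag[player] += n
    if sum(bag.values()) == 0:
        return None  # nobody has pieces left: the game ends
    # Step 2: draw one piece, weighted by the bag's contents.
    drawn = random.choices(list(bag), weights=list(bag.values()))[0]
    # Step 3: the drawn piece goes back to its owner's reserve.
    bag[drawn] -= 1
    reserve[drawn] += 1
    return drawn  # step 4: this player may now play or pass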

This entry was posted in Games.

Report on a scrabble variation

A while ago I suggested a bunch of variations on Scrabble. I’ve still yet to play any of them, but I’ve spent a bunch of time this holiday playing a different variation (I actually thought it was one of the ones I’d previously proposed, but looking at them again it isn’t).

The variation works as follows:

  1. As well as the usual set of face down tiles, you maintain a set of 7 face up tiles.
  2. When drawing new tiles, you may draw from either the face up or the face down tiles. You can mix between the two, and you can look at tiles you’ve drawn before deciding where you want your next one from.
  3. After you have drawn tiles, if there are any face down tiles left, draw from them to replenish the face up tiles back up to 7.
  4. If you draw a blank to place in the face up tiles, shuffle it back into the face down tiles and draw again. This doesn’t apply if you’ve run out of face down tiles.
  5. If the same letter appears 3 times in the face up tiles, shuffle them all back into the face down tiles and draw again. Again, this doesn’t apply if you’ve run out of face down tiles. (We added this rule later, when it became clear that the face up tiles would otherwise often fill up with letters nobody wanted.)

The basic intent is that it reduces the element of chance in the game by allowing you to pick your letters more strategically, while still maintaining the basic play of Scrabble.

How does it play? Mostly pretty well, with one caveat which I’ll get to later.

I’ve played 4 games of this variant this holiday: 3 with my dad (which I won) and 1 with my brother (which I lost, and which was also the first game). One of the games with my dad was Super Scrabble.

High scoring letters tend not to stay face up for very long – they’re usually drawn within a turn, two at most. The result is that it doesn’t seem to be that much easier to get high scoring letters than it would otherwise have been – there’s still a strong element of chance in whether you get them or not.

Where it makes a real difference is in letting you control the number of vowels in your hand, and in general which low scoring letters you’re carrying. It’s not impossible to still end up with a surplus or shortage of vowels, but it’s much harder, and it tends to happen more through bad planning than bad luck.

So in general I’d say all of this is a win – it significantly reduces the ways in which random chance can screw you over and makes for a much more tactical game.

Which, unfortunately, leads us to the caveat.

It turns out that by reducing the element of chance in the game, what you end up with is a much harder game. This feels almost like a reductio ad absurdum of Scrabble, in that it makes it impossible to ignore that at its heart Scrabble is a fundamentally flawed game.

The issue is that good Scrabble play results in Scrabble boards which are really painful to play on. You build small words across small words, you link things up, and you get a board which is dense and hard to expand.

Like this one:

[photo: the finished Scrabble board]

The above is the board Dad and I finished with last night. It’s a little more extreme than most of our games, but not by much.

We both had a few tiles left in our hands and gave up at that point (we realised afterwards that we could probably have continued a few words further, but he couldn’t have closed the 20 point gap, so we decided to call it a game there). This was pretty much my consistent experience with this rules variation: we couldn’t actually finish the board, because we’d already exploited most of the places where you could put finishing tiles.

Another thing that seemed to happen is that the game took a bit longer than normal – I think this is partly because we were unfamiliar with it so were spending more time thinking through the implications, but it was also because we spent more time in the state where we had good letters and a frustrating board so we were sitting there going “But I must be able to do something good with this”.

Super Scrabble helped offset this a bit. I’ve generally found that Super Scrabble boards are less congested, because you have more room to expand and you can move around blockages, and that definitely seemed to be the case here. Also, in Super Scrabble, unlike normal Scrabble, bingos are commonplace, so in many cases victory is simply decided by how many bingos you get, and being able to choose your low scoring letters helps a lot in reducing the element of chance in that.

All told, despite these flaws, I think I enjoyed this variant more than classic Scrabble, but I feel like this might be a slight case of Stockholm syndrome: The variation very successfully pointed out the flaws endemic in normal Scrabble play, so if I enjoy normal Scrabble then I must enjoy this variation dammit. On the other hand, a lot of it could simply be that normal Scrabble victory often feels quite arbitrary and this variation made it feel much less so.

This entry was posted in Games.