David R. MacIver's Blog: Honey I shrunk the clones: List simplification in Hypothesis

Honey I shrunk the clones: List simplification in Hypothesis

1 April 2015

Simplification in Hypothesis is something of a vanity feature for me. I spend significantly more development effort on it than it strictly warrants, because I like nice examples. In reality it doesn’t really matter if the simplest value of a list of integers with at least three duplicates is [0, 0, 0] or [-37, -37, -37] but it matters to me because the latter makes me look bad. (To me. Nobody else cares I suspect)

So I was really pleased that I got simultaneous simplification in as a feature for Hypothesis 1.0. If a list contains a bunch of duplicate values (and because templating this is easy to check for - all templates are hashable and comparable for equality) before trying to simplify them individually, Hypothesis tries to simplify them in a batch all at once.

As well as solving the above ugly example, this turns out to be really good for performance when it fires. Even if your test case has nothing to do with duplication, a lot of the time there will be elements in the list whose value fundamentally doesn’t matter. e.g. imagine that all that is needed to trigger a failure is a list of more than 100 elements. The individual elements don’t matter at all. If we happen to have produced an example where we have a list of 100 elements of the same value, we can simplify this 100x as fast as if we had to simplify every individual element.

I’m doing a bunch of work on simplification at the moment, and as a result I have lots of tests for example quality of the form “In this circumstance, Hypothesis should produce an example that looks like this”. Some of them had a tendency to get into really pathological performance problems because they’re in exactly this scenario: They have to do a lot of individual simplifications of values in order to find an optimal solution, and this takes a long time. For example, I have a test that says that the list must contain at least 70 elements of size at least ten. This example was deliberately constructed to make some of the existing most powerful simplification passes in Hypothesis cry, but it ended up wreaking havoc on basically all the passes. The goal is that you should get a list of 70 copies of 10 out at the end, and the simplification run should not take more than 5 seconds.

This obviously is going to work well with simultaneous simplification: If you have a set of duplicate indices greater than 10, simultaneous simplification will move them all to 10 at once.

Unfortunately the chances of getting an interesting amount of duplication for integers larger than 10 is pretty low, so this rarely fires and we usually have to fall back to individual simplification which ends up taking ages (I don’t actually know how long - I’ve got a hard 5 second timeout on the test that it usually hits, but eyeballing it it looked like it got about halfway in that time).

So the question is this: Can we design another simplification pass that is designed to deliberately put the list into a state where simultaneous simplification can fire?

On the face of it the answer is obviously yes: You can just change a bunch of elements in the list into duplicates. Then if any of those are also falsifying examples we end up in a state where we can apply simultaneous simplification and race to the finish line.

Naturally there’s a wrinkle. The problem is that simplification in Hypothesis should be designed to make progress towards a goal. Loops aren’t a problem, but things which cause unbounded paths in the simplify graph will cause it to spend a vast amount of time not doing anything very useful until it gets bored of this simplification, declares enough is enough, and gives you whatever it’s got at the time (a lesson I should probably learn from my own creation).

Or, to put it more directly: How can we tell if a list of cloned elements is actually simpler. If we have [1, 2, 3, 4, 5, 6, 7, 8, 9999999999999999] we’d really much rather clone 1 all over the place than 9999999999999999.

The solution is to extend SearchStrategy with yet another bloody method (it has a default, fortunately), which allows it to test whether one of two templates should be consider strictly simpler than the other. This is a strict partial order, so x is not strictly simpler than x and it needn’t be the case that for any x and y one of the two is simpler. In general this is intended as a heuristic, so fast is better than high accuracy.

For the particular case of integers the rule is that every positive number is simpler than every negative number, and otherwise a number is simpler if it’s closer to zero.

We can now use this to implement a cloning strategy which always makes progress towards a simpler list (or produces nothing):

Pick a random element of the list. Call this the original.
Find the indices of every element in the list for which the original is strictly simpler.
If that set is empty, nothing to do here.
Otherwise pick a random subset of those indices (the current choice is that we pick an integer uniformly at random between 1 and the size of the list and then pick that many elements. This is not a terribly scientifically chosen approach but seems to work well).
Replace the element in each index we selected with a clone of the original

Empirically this seems to do a very good job of solving the particular example it was designed to solve and does not harm the remaining cases too much (I also have an example which deliberately throws away duplicates. Believe it or not this sped that example up).

It’s unclear how useful this is in real practical tests, but it at the very least shouldn’t hurt and appears it would often help. The list generation code is fairly fundamental in that most other collection generation code (and later on, stateful testing) is built on it, so it’s worth putting in this sort of effort.