One of the decisions I’m quite happy I made is that the SearchStrategy API in Hypothesis is not public for 1.0. This gives me some freedom to break it when I figure out ways to make it better.
Which is good, because I’ve already figured out a way to break it to make it better.
In my fuzzer for yapf, I realised an interesting thing: The minimization happens in two phases, first line based, then character based. Once character based simplification has started we stop bothering with line based simplification altogether: Although in principle character based simplification could get us back into a state where line based simplification would be useful again, in practice this happens so rarely that trying line based simplification is basically never worth going back to.
Meanwhile, back in Hypothesis land, I was staring at the following quality test this morning:
```python
def dedupe(xs):
    result = []
    for x in xs:
        if x not in result:
            result.append(x)
    return result

@given(strategy([bytes]).map(dedupe))
def is_sorted(xs):
    assume(len(xs) >= 10)
    assert sorted(xs) == xs
```
This is an example that is designed to fail and mostly exists to test the quality of the minimization: What’s the simplest list of byte strings it can find such that:
- The list only has distinct elements
- The list has at least 10 elements
- The list is not sorted
It’s not hard to find such an example – generally speaking Hypothesis will find an example within the first couple of throws of the dice – but producing a good example is harder, which is what this test looks for (In the actual test we make a bunch of assertions about the quality of the result we got at the end).
The problem is that sometimes this can get into a bad state where it takes an extraordinarily long time to minimize.
The problem is basically this: Every time it successfully minimizes an example, the simplification process starts again from scratch on the new example. This means that it tries all the usual simplifications for deleting elements from the list. These are a complete waste of time because they always shrink the list, violating the size assumption. But every time we shrink an element (and there are 10 elements to shrink) we do it all again.
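To see the waste concretely, here's a toy single-phase minimizer (all names hypothetical, not Hypothesis internals) that interleaves deletion and element shrinking in one shrinker and counts how many attempts die on the size condition:

```python
def shrink_list(xs):
    # A single shrinker that interleaves deletion and element
    # shrinking, mimicking the current single-phase approach.
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]                # try deleting element i
    for i, x in enumerate(xs):
        if x > 0:
            yield xs[:i] + [x - 1] + xs[i + 1:]  # try shrinking element i

wasted = 0

def condition(xs):
    # Require the list to keep its length, like the assume(...)
    # in the test above; count attempts that violate it.
    global wasted
    if len(xs) < 3:
        wasted += 1
        return False
    return True

def find_minimal(x, f):
    # On every successful shrink we restart from scratch, so the
    # doomed deletion attempts are retried at every step.
    for s in shrink_list(x):
        if f(s):
            return find_minimal(s, f)
    return x

result = find_minimal([2, 2, 2], condition)
```

Minimizing `[2, 2, 2]` here visits seven lists on the way to `[0, 0, 0]`, and at every single one of them all three deletion attempts fail: 21 wasted calls, purely from restarting.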
I was originally thinking of trying to fix this in terms of making simplify skip simplifications which fail to produce examples too often, but that’s not really very satisfactory and requires a lot of delicate state tracking. It might be worth it in some cases, but I think it probably wouldn’t be.
Then I realised that this problem fits entirely into the approach I mentioned using for yapf, and we can use it to fix this problem.
Right now Hypothesis essentially follows the classic QuickCheck API, where there is a function shrink which takes a value and provides a generator over simpler versions of that value. Simplified, the Hypothesis example finding algorithm looks like:
```python
def find_minimal(x, f):
    for s in shrink(x):
        if f(s):
            return find_minimal(s, f)
    return x
```
(That’s not actual Hypothesis code. The real implementation is more complex in several ways and doesn’t use recursion, because Python, but that’s the essence of it.)
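To make the sketch runnable, here's a toy version with a hypothetical shrink for non-negative integers (again, not the real Hypothesis shrinker):

```python
def shrink(n):
    # Toy integer shrinker: propose simpler candidates in order of
    # how aggressively they simplify: zero, half, predecessor.
    if n > 0:
        yield 0
        yield n // 2
        yield n - 1

def find_minimal(x, f):
    # Greedily accept the first simpler value still satisfying f,
    # then restart shrinking from that new value.
    for s in shrink(x):
        if f(s):
            return find_minimal(s, f)
    return x
```

For example, `find_minimal(1000, lambda n: n >= 37)` halves its way down and then walks by ones to land exactly on 37.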
The idea is to instead change this so that there are multiple shrinkers, each behaving like our shrink function above, and then we apply each shrinker in turn, never returning to an earlier one. This would be something akin to:
```python
def minimize_with_shrinker(x, f, shrinker):
    for s in shrinker(x):
        if f(s):
            return minimize_with_shrinker(s, f, shrinker)
    return x

def find_minimal(x, f):
    for shrinker in shrinkers:
        x = minimize_with_shrinker(x, f, shrinker)
    return x
```
This lets us fix the list example by splitting shrinking into two phases: first it tries to shrink by removing elements from the list, then it tries to shrink individual elements. Once it starts shrinking elements, the size of the list is fixed – it has already found the smallest list it’s going to find.
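Here's a sketch of that two-phase split for lists of non-negative integers (shrinker names are mine, not the real Hypothesis internals, and the element shrinker just decrements for simplicity):

```python
def remove_elements(xs):
    # Phase one: try deleting each element in turn.
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]

def shrink_elements(xs):
    # Phase two: try shrinking each element individually.
    for i, x in enumerate(xs):
        if x > 0:
            yield xs[:i] + [x - 1] + xs[i + 1:]

def minimize_with_shrinker(x, f, shrinker):
    for s in shrinker(x):
        if f(s):
            return minimize_with_shrinker(s, f, shrinker)
    return x

def find_minimal(x, f, shrinkers=(remove_elements, shrink_elements)):
    # Run each shrinker to exhaustion before moving on; once we
    # start shrinking elements we never revisit deletion.
    for shrinker in shrinkers:
        x = minimize_with_shrinker(x, f, shrinker)
    return x
```

For example, `find_minimal([9, 4, 7, 2], lambda xs: len(xs) >= 2)` first deletes down to a two-element list and only then zeroes the elements, returning `[0, 0]`; the doomed deletions are never retried during phase two.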
This does impact the quality of the examples a little. If necessary that could be fixed by iterating to a fixed point: we run the above algorithm again and again until it no longer changes the result. A simpler version would just be to run it twice. I don’t plan to do this unless the results from a single pass turn out to be much worse than I expect. The goal here isn’t always to find the single simplest example, only to find a simple enough example in a short amount of time. I think this will be of entirely acceptable quality, and it should be very acceptable in time compared to the existing implementation.
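If fixed-point iteration ever does prove necessary, it's a small wrapper around any one-pass minimizer. A sketch (not Hypothesis code; `minimize` stands in for whatever single pass you have):

```python
def iterate_to_fixed_point(minimize, x, f):
    # Re-run a full minimization pass until the result stops
    # changing; each pass may unlock shrinks for earlier phases.
    while True:
        shrunk = minimize(x, f)
        if shrunk == x:
            return x
        x = shrunk
```

Running it twice instead would just be unrolling the first two trips around this loop.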