Advance note: Read the Conjecture design document first or this post won’t make sense.
I’ve been doing some prototyping of Conjecture in a private fork of Hypothesis, mostly to take advantage of Hypothesis’s fairly brutal test suite and comprehensive range of examples. I figure if I can make the Conjecture approach pass the Hypothesis test suite then that’s a pretty good indication that the idea works.
Spoiler alert: I can’t currently make Conjecture pass the Hypothesis test suite.
Fortunately, the way I can’t make Conjecture pass the Hypothesis test suite isn’t the end of the world. The problem is basically one of performance. For small examples in a lot of cases Conjecture’s approach performs as well as or better than Hypothesis. For large examples it’s just too slow: It gets the right answer eventually, but it takes an order of magnitude more time than Hypothesis does.
The reason for this is pretty simple: The underlying data buffers that conjecture ends up generating for an example of any reasonable size are quite large (10s of kilobytes), and this is going to remain the case even after simplification. So any structure unaware binary simplification process is going to be a bit stuck about what to do. It’ll find the right answer eventually, but there are a lot of things to try.
I think the answer is going to have to be to make it structure aware. I haven’t yet hit on exactly the right approach for this yet, but the minimal abstraction that I think will work is to give it an idea of nesting: Examples can call conjecture_example_start() when they begin and conjecture_example_end() when they end, and this gives us nested ranges of where examples live in our data stream, which should be useful information for providing us with hints about where to start. e.g. one of the best simplifications is to try deleting whole ranges of data, and another good one is to try zeroing whole ranges. Unfortunately there are a lot of ranges in a buffer (O(n^2) in fact) and it’s very hard to pick the good ones. Focusing on ranges that snap to example boundaries would help a lot here.
It also potentially gives us a way to deal with the accidental dependency problem, as we can look for pieces that have moved from one example to another in the course of simplification as likely candidates for deletion. This might be too hard to keep track of the be worth it though as this hasn’t proven to be a major problem in practice.
In the course of thinking about these ideas I also came up with another idea that I initially thought was ridiculous but actually now I’m not so sure and think might be a good idea: Prioritizing examples that print less data as a mitigation for cases where simplicity of the underlying buffer isn’t always reflected by simplicity of example.
The way this would work is that we would still maintain our example search exactly as normal, but we would also maintain a “favoured” example, which is the example with the smallest number of bytes printed, with ties being broken by simplicity of underlying buffer (i.e. if we shrink to an example which prints the same number of bytes we always replace).
The reason I think this is a good idea is that most of what we’re focusing on in examples is readability, and size of example is a pretty good proxy for that – large examples are hard to read, small examples are easy to read. It’s not a perfect proxy for it, which is why we still want the underlying simplification process, but I think it’s a good tool for mitigating that process’s potential flaws.
Another thing that would be useful to investigate is whether a smarter search algorithm is useful here. e.g. simulated annealing. The local search approach was a great idea when you had a fairly abstract data representation and needed a simple API that was easy to extend to new things, but when what you’ve got is a concrete binary representation it’s probably not the best search algorithm.
Basically, all of this is harder than I thought. It’s not intractably hard, and for Conjecture itself which is C, as long as you’re running it on pretty fast tests you can basically brute force through the problem (although I need to update its simplification). I still think the idea is a great one, but great ideas also need hard work and that’s still yet to come.