I’m continuing to work on some of the issues I described in my last post about simplification for Conjecture. I’ve got something that needs further work but is pretty good.
The idea is to let the data generation provide structure hints. To support this, it adds two additional functions to the Conjecture API: one starts an example, the other ends it. Examples can be, and indeed routinely are, nested.
We use this API to produce a series of intervals as [start, end) pairs. These intervals then become the main focus of where we try to shrink.
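To make that concrete, here’s a minimal sketch of roughly what I mean. The names here (TestData, start_example, end_example, draw_bytes) are illustrative stand-ins, not the actual API:

```python
# A hypothetical sketch of the structure-hint API. Each start/end pair
# records a [start, end) interval over the byte buffer that was read.
class TestData:
    def __init__(self, buffer):
        self.buffer = buffer
        self.index = 0
        self._stack = []
        self.intervals = []  # completed [start, end) pairs

    def draw_bytes(self, n):
        result = self.buffer[self.index:self.index + n]
        self.index += n
        return result

    def start_example(self):
        self._stack.append(self.index)

    def end_example(self):
        self.intervals.append((self._stack.pop(), self.index))
```

A strategy for, say, lists would then wrap each element’s draws in a start_example / end_example pair, so nested data naturally produces nested intervals.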
So here’s how the shrinking works, given that structure:
Every time we run a test and it fails, we replace the current best buffer with the buffer that triggered the failure, trimmed to only the initial segment that was actually read (we only ever try buffers that are simpler than the current best, so we don’t need to check that the new one is).
When we do this we also record all the intervals in the buffer that correspond to examples. We sort these from most complicated to simplest, using the same order that we use for buffers: an interval is simpler than another if it is shorter, or if it is the same length but sorts lexicographically before it.
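In code, that ordering is just a sort key on (length, contents); a hypothetical helper might look like:

```python
# Sort intervals from most complicated to simplest, using the same
# order as for buffers: shorter is simpler, ties are broken
# lexicographically. reverse=True puts the most complicated first.
def sorted_intervals(buffer, intervals):
    return sorted(
        intervals,
        key=lambda iv: (iv[1] - iv[0], buffer[iv[0]:iv[1]]),
        reverse=True,
    )
```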
The first step is to prune the buffer down as much as possible by throwing away parts of it that don’t matter. We greedily delete intervals, from most to least complicated, and repeat the whole pass until no more deletions succeed (there’s a sketch of this after the details below).
Some details:
- A trick: if we started from the most complicated interval each time, we’d end up doing a lot of work that never succeeds. What we actually do is maintain an index variable and always take the interval at that index in the list. When we hit the end, we either stop the process or, if we’ve made any changes, start again from the beginning. This works despite the fact that the list of intervals may be changing under us.
- In my test cases there end up being a lot of small examples and it’s not worth trying to delete them all in this crude trimming pass. The current magic number here is that we ignore any intervals with fewer than 8 bytes.
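Putting those details together, the pruning pass might look roughly like this. run_test here is a hypothetical helper: it runs the test on a candidate buffer and, when the test still fails, returns the interval list recorded during that run (sorted most complicated first), or None when the test passes:

```python
MIN_INTERVAL_SIZE = 8  # the magic number from the list above

def prune(buffer, intervals, run_test):
    changed = True
    while changed:
        changed = False
        i = 0
        # The index trick: a single index walks the list, which may be
        # changing under us; we only restart from the beginning once
        # we fall off the end having made at least one change.
        while i < len(intervals):
            start, end = intervals[i]
            if end - start >= MIN_INTERVAL_SIZE:
                candidate = buffer[:start] + buffer[end:]
                new_intervals = run_test(candidate)
                if new_intervals is not None:
                    buffer = candidate
                    intervals = new_intervals
                    changed = True
            i += 1
    return buffer, intervals
```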
In my test cases (which, it must be said, are not very representative) this typically cuts the buffer size by a factor of ten. It’s also a lot faster than structure-ignorant shrinking of this sort was – if the structure hints gave us nothing but this, they would still be a win.
Starting from that buffer, for each interval (from least complicated to most) we then perform a simplification I’m calling “improve and clone”.
We first try to improve the interval as much as possible. This involves repeatedly applying a few simplification operations (basically the following; there’s a sketch in code below):
- For each index, try replacing the byte with a smaller one and see if that helps. (A linear probe is actually OK here, though I use something slightly smarter to cut down on constant factors.)
- Try a replacement chosen uniformly at random that sorts lexicographically before the current value.
- Interpreting the interval as an unsigned n-byte integer, try subtracting one.
You should normally avoid operations that look like the third one (actually the second one isn’t great either), but note that we only try it once and then cycle back to bytewise lowering, which will then tend to feed on its results to produce a lot more of a shrink.
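For concreteness, here’s a rough sketch of that improve cycle, using the same hypothetical run_test as in the pruning sketch (non-None means the test still failed). I’m assuming a big-endian integer interpretation, since that’s what makes the integer order agree with the lexicographic one on equal-length blocks:

```python
import random

def improve(buffer, start, end, run_test):
    def attempt(block):
        # Splice the candidate block into the buffer and test it.
        # Every block has the same length, so [start, end) stays valid.
        candidate = buffer[:start] + block + buffer[end:]
        return (run_test(candidate) is not None), candidate

    changed = True
    while changed:
        changed = False
        # 1. Bytewise lowering: for each index, try smaller byte
        # values. A linear probe is shown for simplicity.
        for i in range(start, end):
            for smaller in range(buffer[i]):
                ok, candidate = attempt(
                    buffer[start:i] + bytes([smaller]) + buffer[i + 1:end])
                if ok:
                    buffer, changed = candidate, True
                    break
        # 2. A uniformly random block that sorts lexicographically
        # before the current one.
        n = int.from_bytes(buffer[start:end], "big")
        if n > 0:
            ok, candidate = attempt(
                random.randrange(n).to_bytes(end - start, "big"))
            if ok:
                buffer, changed = candidate, True
        # 3. Subtract one from the integer interpretation.
        n = int.from_bytes(buffer[start:end], "big")
        if n > 0:
            ok, candidate = attempt((n - 1).to_bytes(end - start, "big"))
            if ok:
                buffer, changed = candidate, True
    return buffer
```

Every accepted change strictly decreases the interval’s integer value, which is why the cycle terminates.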
Once that’s done we try cloning the interval: we take the segment currently in that interval and try replacing other intervals with it. We only choose intervals that are at least the same length (so we don’t make the buffer longer) and are “not too much larger” (and, if the intervals are the same size, sort lexicographically after the current one). The current upper bound is twice the segment’s length, or 16 if twice the length would be less than 16. We do this regardless of whether any changes were made in the earlier improvement phase.
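And a sketch of the cloning step under the same assumptions, where block is the segment from the interval we just improved:

```python
def clone(buffer, intervals, block, run_test):
    bound = max(2 * len(block), 16)  # "not too much larger"
    i = 0
    while i < len(intervals):
        s, e = intervals[i]
        length = e - s
        # Targets must be at least as long as the block (so we never
        # grow the buffer), no longer than the bound, and, at equal
        # length, must sort lexicographically after the block (so the
        # replacement is a shrink). The source interval itself fails
        # that last test and is skipped automatically.
        if (len(block) <= length <= bound
                and not (length == len(block) and buffer[s:e] <= block)):
            candidate = buffer[:s] + block + buffer[e:]
            new_intervals = run_test(candidate)
            if new_intervals is not None:
                buffer, intervals = candidate, new_intervals
        i += 1
    return buffer, intervals
```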
The way the cloning phase is mixed in is important: as well as being a generally good strategy in its own right, doing the example cloning on the element we just shrank means we avoid duplicating a lot of our hard work when it comes to shrinking other things.
We use the same index trick to do this for every interval until there are no more improvements or we run out of time / hit other configuration limits.
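Putting the pieces together, the top level might look something like this, where budget_exhausted is a made-up stand-in for those time and configuration limits:

```python
def shrink(buffer, intervals, run_test, budget_exhausted):
    buffer, intervals = prune(buffer, intervals, run_test)
    changed = True
    while changed and not budget_exhausted():
        changed = False
        i = 0
        while i < len(intervals) and not budget_exhausted():
            # intervals is sorted most complicated first, so index
            # from the other end to go least complicated first.
            start, end = intervals[-1 - i]
            before = buffer
            buffer = improve(buffer, start, end, run_test)
            buffer, intervals = clone(
                buffer, intervals, buffer[start:end], run_test)
            if buffer != before:
                changed = True
            i += 1
    return buffer
```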
This still needs more work, but on the examples I’ve tried it on (from Hypothesis’s example test suite) it now appears to be nearly as good as Hypothesis’s simplification (i.e. probably better than most other Quickchecks). These are only some of the structurally simpler examples (they’re large but not very complicated), and I suspect that as I flesh this out further I’ll find limitations to the approach, but they’re still examples that the initial attempts struggled massively on, so it’s pretty encouraging.