David R. MacIver's Blog: Simplify starts from the wrong end

Simplify starts from the wrong end

24 March 2015

This is a thing I just noticed while working out some performance bugs in Hypothesis in prep for the 1.0 release: in a lot of cases, I’ve been doing simplification (shrinking, in QuickCheck terms) entirely backwards.

It turns out that if I’d paid attention, this information was in QuickCheck all along. I haven’t found it in the paper (OK, I haven’t looked in the paper), but this is what the API does.

Basically, if you write a function simplify which takes a value and gives you something to iterate over the simpler versions of that value, you will be tempted to start from things that are most similar to that value and work your way down. This is totally wrong. The correct order is to start from the simplest version and work your way up.

The reason is that this results in far fewer calls to simplify in most cases. Most of the time your values will be a lot more complicated than they need to be, so if you start from the top, each successful step only makes the value slightly simpler and you end up with a lot of recursive calls to simplify. If you start from the bottom, you will very rapidly converge on the simplest thing that can possibly work.
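
To make the difference concrete, here is a minimal sketch in Python. It is not Hypothesis's actual code: minimize, simplify_from_top and simplify_from_bottom are hypothetical names invented for this illustration, and the "value" is just a non-negative integer where smaller means simpler. The loop greedily moves to the first simpler candidate that still fails, then calls simplify again on that candidate.

```python
from typing import Callable, Iterator, Tuple


def simplify_from_top(n: int) -> Iterator[int]:
    """Hypothetical simplifier: candidates most similar to n first (n-1 down to 0)."""
    return iter(range(n - 1, -1, -1))


def simplify_from_bottom(n: int) -> Iterator[int]:
    """Hypothetical simplifier: simplest candidates first (0 up to n-1)."""
    return iter(range(n))


def minimize(
    value: int,
    simplify: Callable[[int], Iterator[int]],
    is_failing: Callable[[int], bool],
) -> Tuple[int, int, int]:
    """Greedily shrink a failing value.

    Returns (final value, number of simplify calls, number of is_failing checks).
    """
    simplify_calls = 0
    checks = 0
    while True:
        simplify_calls += 1
        for candidate in simplify(value):
            checks += 1
            if is_failing(candidate):
                # Found a simpler failing value: restart from it.
                value = candidate
                break
        else:
            # No simpler candidate still fails, so we are done.
            return value, simplify_calls, checks


def fails(n: int) -> bool:
    """Toy 'bug': anything of at least 10 triggers the failure."""
    return n >= 10


print(minimize(1000, simplify_from_top, fails))     # (10, 991, 1000)
print(minimize(1000, simplify_from_bottom, fails))  # (10, 2, 21)
```

Both orders end up at the same answer, but starting from the top takes 991 calls to simplify, while starting from the bottom takes 2. Integers are about the friendliest case imaginable, so treat the exact numbers as illustrative only.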

I can’t guarantee I’ll catch all the instances of my doing this before the 1.0 release, but I should catch all the ones that are causing significant performance problems. In particular, simplifying strings is now much faster.
