Speeding up Hypothesis simplification by warming up

This post might be a bit too much Hypothesis inside baseball, but it’s something I did this morning that I found pretty interesting, so I thought I’d share it. Hypothesis has a pretty sophisticated system for example simplification. A simplification pass looks roughly like this:

def simplify_such_that(search_strategy, random, value, condition):
    changed = True
    while changed:
        changed = False
        for simplify in search_strategy.simplifiers(random, value):
            # Apply this pass repeatedly until it stops producing
            # simpler values that still satisfy the condition.
            while True:
                for s in simplify(random, value):
                    if condition(s):
                        changed = True
                        value = s
                        break
                else:
                    # No candidate worked: this pass is exhausted,
                    # move on to the next one.
                    break
    return value
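To make that concrete, here’s an illustrative sketch of what the passes for a list-of-integers strategy might yield. The names and passes here are invented for this post, not Hypothesis’s actual implementation:

def delete_elements(random, value):
    # Invented pass: yield copies of the list with one element removed.
    for i in range(len(value)):
        yield value[:i] + value[i + 1:]

def simplify_elements(random, value):
    # Invented pass: yield copies of the list with one element halved
    # (assuming a list of non-negative integers).
    for i, x in enumerate(value):
        if x:
            yield value[:i] + [x // 2] + value[i + 1:]

def simplifiers(random, value):
    # Deletion first: a shorter list usually unblocks everything else.
    yield delete_elements
    yield simplify_elements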

Instead of having a single simplify function, we split simplification into multiple passes. Each pass is applied repeatedly until it stops working, then we move on to the next one. If any pass succeeded in simplifying the value we start again from the beginning; otherwise we’re done and return the current best value. This works well for a number of reasons, but the principal goal is to avoid spending time on steps that we’re pretty sure will be useless: if one pass deletes elements of a list and another simplifies them, we don’t want to try deleting again every time we successfully shrink a value, because usually we’ve already found the smallest list possible. (The reason we start again from the beginning if any pass worked is that later passes can sometimes unblock earlier ones, but generally speaking this doesn’t happen much.)

This generally works pretty well, but there’s an edge case where it has pathologically bad performance: when we have a pass which is useless but has a lot of possible values. This happens e.g. when you have large lists of complicated values. One of my example quality tests involves finding lists of strings with high unicode codepoints and long strictly increasing subsequences. It normally works pretty well, but I was finding it sometimes hit this pathological case, and the test would fail because it used up all its time on simplification passes that couldn’t work because they were blocked by the previous step.

This morning I figured out a trick which seems to improve this behaviour a lot. The idea is to spend a few passes (5 is my highly scientifically derived number here) where we cut each simplifier off early: we give it a few attempts to improve matters, and if it doesn’t we bail immediately rather than running to the end. The new code looks roughly like this:

from itertools import islice

max_warmups = 5

def simplify_such_that(search_strategy, random, value, condition):
    changed = True
    warmup = 0
    while changed or warmup < max_warmups:
        warmup += 1
        changed = False
        for simplify in search_strategy.simplifiers(random, value):
            while True:
                simpler = simplify(random, value)
                if warmup < max_warmups:
                    # During warmup, only look at the first few
                    # candidates from each pass rather than running
                    # it to exhaustion.
                    simpler = islice(simpler, warmup)
                for s in simpler:
                    if condition(s):
                        changed = True
                        value = s
                        break
                else:
                    # No candidate worked: this pass is done (or out
                    # of warmup budget), move on to the next one.
                    break
    return value
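As a sanity check, here’s a toy run of the warmed-up version using the sketch passes from earlier. ToyListStrategy is a stand-in invented for this post, and the condition plays the role of “the test still fails”:

import random

class ToyListStrategy:
    # Minimal stand-in for a real search strategy: the driver only
    # needs a simplifiers() method.
    def simplifiers(self, rnd, value):
        yield delete_elements
        yield simplify_elements

# “Fails” whenever the list has at least two elements, so the shrinker
# should delete down to length two and then zero out what’s left.
result = simplify_such_that(
    ToyListStrategy(), random.Random(0), [7, 3, 9, 4],
    lambda xs: len(xs) >= 2,
)
print(result)  # [0, 0]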

This seems to avoid the pathological case: rather than getting stuck on a useless simplifier we skip over it fairly quickly and give other simplifiers that are more likely to work a chance to shine. Once the warmup phase is over we run the full simplification algorithm as before, but because we’ve already chewed the value down to something much less complicated than we started with there isn’t as much of a problem: we tend not to have individually long simplifier passes because most of the really complex structure has already been thrown away.

Empirically, for the test cases this was designed to improve, it’s more than an order of magnitude speed improvement (even when they’re not hitting the pathological case where they fail altogether), going from giving up at the 30 second timeout to completing the full pass in 3 seconds. For everything else it seems about the same or a little better.

So, yeah, I’m pretty pleased with this. This is definitely in the category of “Hypothesis’s example simplification is an overly sophisticated solution to problems that are mostly self-inflicted”, but who doesn’t like nice examples?
