Adaptive delta debugging

Note: This blog post is mostly intended to be expository about a nice technique, so I’m omitting a lot of the evidence for my claims. Sorry. I hope to write those up better at some later date.

As I explained previously, I’m quite interested in the problem of faster test case reduction.

In my previous post I explained how to improve on the basic greedy algorithm for test case reduction by better choices of where to try to delete. Now I want to show how to improve on delta debugging itself by better choices of deletion size.

Delta debugging can be thought of as the following code:

def ddmin(test_case, predicate):
    block_size = len(test_case) // 2
    while block_size > 0:
        prev = test_case
        i = 0
        while i < len(test_case):
            attempt = list(test_case)
            del attempt[i:i+block_size]
            if predicate(attempt):
                test_case = attempt
                i += block_size
        if test_case == prev:
            block_size //= 2
    return test_case

(this is not how delta debugging is described in the original paper, which is in terms of partition size, but it is compatible with their description and is more straightforward to adapt to the linear time version of greedy search)

This is the same as the greedy search I outlined previously, but instead of always deleting ranges of size one we try to delete long ranges. When this doesn’t work, we decrease the size of the range and try again.

For many cases this produces a dramatic speedup. e.g. it can reduce a long list to a single element list in log(n) steps.

Unfortunately for many other cases this ends up being substantially slower than the naive greedy algorithm. In simulation I’ve found that if the final test case is about 10-15% of the size of the original test case then delta debugging starts to be more expensive than just running the greedy search. For cases where the test case is already minimal, delta debugging pays an additional cost between 50% and 100% on top of the greedy algorithm.

The problem is basically that delta debugging is not making good choices of block size for these cases, and doesn’t have a better way to choose a block size than just try it and see what happens. In order to improve on this behaviour we need to get a block size that is more appropriate for the particular target we’re shrinking.

There turns out to be an easy way to do this. Instead of trying to choose the block size globally, we choose it locally, finding a block size for the current index that produces a large delete. The key tool for doing this will be the binary probe and search algorithm I outlined before.

At each index in our greedy search instead of just trying to delete one element, we search for a long block that we can delete starting from there and delete that:

def adaptive_ddmin(test_case, criterion):
    prev = None
    while prev != test_case:
        prev = test_case
        i = 0
        while i < len(test_case):
            def is_block_deletable(k):
                return criterion(test_case[:i] + test_case[i+k:])
            block_size = find_change_point(is_block_deletable, 0, len(test_case) - i)
            if block_size > 0:
                test_case = test_case[:i] + test_case[i + block_size:]
            i += 1
    return test_case

This has all the advantages of ddmin, but is able to automatically figure out the appropriate block size. In almost all circumstances it will perform far fewer shrinks than ddmin. The counterexamples are basically when there is a great deal that can be deleted, where in some cases it will perform an additional log(n) deletes to discover that fact.

It also performs much more closely to greedy search in the cases where greedy search beats delta debugging. It doesn’t always achieve that close to greedy search – each block deletion deletes block_size elements and checks that you can’t delete an additional element. This takes O(log(block_size)) calls and would take block_size + 1 calls in the greedy algorithm, so as block_size gets large this algorithm starts to win big. For small block sizes though it’s not a win, and you can see it making up to 50% more calls (as bad as classic delta debugging! Although in very different scenarios) in some cass.

This can be improved by special casing the algorithm to do a linear probe for very small values. We replace it with:

def find_change_point(f, lo, hi):
    base = f(lo)
    linear_cap = min(hi + 1, lo + 6)
    for i in range(lo + 1, linear_cap):
        if f(i) != base:
            return i - 1
    return binary_search(f, *exponential_probe(f, linear_cap - 1, hi))

This special case removes some of the worst cases, and leaves us with the the following:

block_size ratio
1 1.000000
2 1.000000
3 1.000000
4 1.000000
5 1.000000
6 1.142857
7 1.000000
8 1.111111
9 1.000000
10 0.909091
11 0.833333

Above this the ratio consistently remains better than 1. So the worst case scenario is that this algorithm ends up making < 15% more calls than the greedy algorithm.

It also consistently beats ddmin in simulation. I tested them both on a model of reducing a sequence to a unique subset (the predicate was that the example was interesting if and only if it contained a particular set of indices), with subsets chosen at various fixed expected fractions of the sequence. It was essentially always better, over a range from 1% to 99% of the set. It fairly consistently makes about 75% as many calls to get the same result, but I don’t have a theoretical argument that that should be the case.

This may not seem like a big deal, but this sort of technique is often applied in cases where individual calls can take multiple seconds (e.g. because you need to compile and run some code), so anything that gives you a speedup like this is greatly appreciated.

It also appears to be a major improvement in practice as well as in simulation. I’m not currently using this technique in hypothesis (though I do plan to), but I have a branch of structureshrink using this and it’s proving to be a major win.

This entry was posted in Code on by .