Category Archives: Hypothesis

Another new hypothesis feature: flags

This has the virtue of being the first hypothesis feature that isn’t blatantly stolen, excuse me, “ported”, from Scalacheck.

Instead it’s blatantly stolen from a post by John Regehr about Csmith.

Update: John Regehr pointed out in the comments that I was misattributing this idea to him, and it was in fact due to Alex Groce.

What’s the idea?

The idea is basically to change the distribution of the search inputs. Rather than exploring the whole space more or less uniformly at each size, each probe into the space is more focused. This produces examples which are in some sense more “interesting”.

How does this work?

The idea is that in any given run of the producers we have a set of flags which may be turned on and off. Let me show an example:

from math import log
from random import random

@produces(int)
def produce_int(self, size):
    # Geometric-style distribution whose typical magnitude grows with size.
    p = 1.0 / (size + 1)
    n = int(log(random()) / log(1 - p))
    # Only some runs are allowed to produce negative numbers at all.
    if self.is_flag_enabled("allow_negative_numbers", size):
        if random() <= 0.5:
            n = -n
    return n

So on some runs we will be allowed to produce negative numbers, on some we won’t.

This means that we can, for example, produce very long lists of only positive integers because negative integers were turned off for this run. Previously a list would be all-positive with probability \(2^{-n}\) (where \(n\) is the length of the list), which for \(n = 50\) is already below \(10^{-15}\), so for example the following would have been unfalsifiable:

from hypothesis import assume, falsify
 
def long_lists_contain_negative_elements(xs):
    assume(len(xs) >= 50)
    assert any((a < 0 for a in xs))
    return True
 
falsify(long_lists_contain_negative_elements, [int])

But with the new flag-based code it’s easy to falsify.
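
For intuition, here is a minimal, self-contained sketch of the per-run flag idea. The class name, the 50/50 enablement rule and the simplified producer are illustrative assumptions of mine, not hypothesis’s actual internals:

import random

class FlaggedRun:
    """One run of the producers, with a random set of flags fixed up front."""

    FLAG_NAMES = ["allow_negative_numbers"]

    def __init__(self):
        # Decide once per run which flags are on, so an entire run can be
        # consistently "negatives forbidden".
        self.enabled = {name for name in self.FLAG_NAMES if random.random() < 0.5}

    def is_flag_enabled(self, name):
        return name in self.enabled

    def produce_int(self, size):
        # Simplified producer: pick a magnitude, then let the flag decide
        # whether negative values are ever allowed on this run.
        n = random.randint(0, size)
        if self.is_flag_enabled("allow_negative_numbers") and random.random() <= 0.5:
            n = -n
        return n

run = FlaggedRun()
xs = [run.produce_int(100) for _ in range(50)]
# On roughly half of all runs the flag is off and xs contains no negative
# numbers at all, which is how long all-positive lists become easy to generate.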

The example that actually motivated this was the stateful testing one in the previous post. If we add a “clear” operation to it and allow that as a test step then we become unable to find falsifying examples because the clear step stomps on things too regularly to ever generate long enough examples.

This is also illustrative of why this is useful in general: Bugs will typically be exhibited by the interaction of one or two features. When we’re looking at all the features we may not exercise the interaction of those two enough to actually test them properly, but when we selectively disable features we can study pairwise interactions much more thoroughly.

Currently the way this is implemented internally is quite gross. I wouldn’t look at it if I were you. I’m mostly focused on hypothesis externals right now, and will think about how to better redesign its insides at a later date.

Update: This feature has been temporarily removed due to me being really unhappy with the implementation and it getting in the way of some internals I’m refactoring at the moment. It will be back before the next released version of hypothesis.


Stateful testing with hypothesis

The idea of stateful testing is that it is “quickcheck for mutable data structures”.

The way it works is that rather than trying to produce arguments which falsify a hypothesis, we instead try to produce a sequence of operations which break a data structure.

Let me show you. We’re going to start with the following broken implementation of a set:

class BadSet:
    def __init__(self):
        self.data = []

    def add(self, arg):
        # No membership check, so duplicates pile up in the underlying list.
        self.data.append(arg)

    def remove(self, arg):
        # Removes only the first matching element.
        for i in xrange(0, len(self.data)):
            if self.data[i] == arg:
                del self.data[i]
                break

    def contains(self, arg):
        return arg in self.data

Because it uses a list internally to store its items and doesn’t check whether an item is already present when adding it, if you add an item twice and then remove it, the item will still be there.

(Obviously this is a really stupid example, but it should demonstrate how the approach works for more complicated cases.)
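
To see the bug by hand, here is the exact sequence that the stateful tester will end up discovering for us below; it uses nothing but the BadSet class above:

s = BadSet()
s.add(0)
s.add(0)       # the duplicate is stored, because add() never checks membership
s.remove(0)    # removes only the first copy of 0
print(s.contains(0))  # True: the set still "contains" 0 after removing it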

Now let’s write some tests that break this!

The operations we want to test are adding and removing elements. So we do the following:

from hypothesis.statefultesting import StatefulTest, step, requires

class BadSetTester(StatefulTest):
    def __init__(self):
        self.target = BadSet()

    @step
    @requires(int)
    def add(self, i):
        self.target.add(i)
        assert self.target.contains(i)   # adding must make the element present

    @step
    @requires(int)
    def remove(self, i):
        self.target.remove(i)
        assert not self.target.contains(i)   # removing must make it absent

We can now ask this to produce us a breaking example:

>>> print BadSetTester.breaking_example()
[('add', 0), ('add', 0), ('remove', 0)]

Note the nicely minimized example sequence.

At the moment this code is very much a work in progress – what I have works, but should probably be considered a sketch of how it might work rather than a finished product. As such, feedback would very definitely be appreciated.


Quickcheck style testing in python with hypothesis

Edit: Hypothesis has moved on a lot since this post. At this point if you want a QuickCheck for Python it’s really the only game in town. This post remains for historic purposes, but you should check out its new site at hypothesis.works.

So I’ve been tinkering a bit more with hypothesis. I would now cautiously label it “probably not too broken and I’m unlikely to completely change the API”. The version number is currently 0.0.4 though, so you should certainly regard it with some degree of suspicion. However, I’m now reasonably confident that it’s good enough for simple usage, in the sense that I expect bug reports from most people who use it in anger, though probably not from everyone, and probably not many bug reports from any one person. :-)

Since the initial version I’ve mostly been thinking about the API and fixing anything that was really obviously stupid. I’ve also cleaned up a bunch of internals. The implementation is still pretty far from perfect, but it’s no longer awful.

Most importantly, I’ve added a feature to it that makes it actually useful. Test integration! You can now use it for some fairly pretty quickcheck style testing:

@given(int,int)
def test_int_addition_is_commutative(x,y):
    assert x + y == y + x
 
@given(str,str)
def test_str_addition_is_commutative(x,y):
    assert x + y == y + x

The first test will pass, because int addition really is commutative, but of course string addition is not so what you’ll get is an error from the test suite:

    x = '0', y = '1'
 
    @given(str,str)
    def test_str_addition_is_commutative(x,y):
>       assert x + y == y + x
E       assert '01' == '10'
E         - 01
E         + 10

This doesn’t currently support passing keyword arguments through to the tests – all arguments have to be positional. I think I know how to fix this, I just haven’t sorted out the details yet.

But yeah, quickcheck style testing with test case minimization (look at how nice the counter-example it produced was!). If that turns out to be exactly what you’ve always wanted, go wild.


Falsifying hypotheses in python

I haven’t written any Scala in ages. Mostly I’m OK with this – there are still a few language features I miss from time to time, but nothing too serious.

One thing I do often miss though is Scalacheck. It’s really the nicest instance of Quickcheck style random testing I’ve seen (this is admittedly not based on a great deal of experience with such libraries other than Quickcheck itself and some exceedingly half-assed ports).

One of its nice features is that as well as randomly generating test data, it also minimizes the examples it generates. So rather than generating really complicated examples as soon as it finds one that fails, it takes those examples and attempts to generate something simpler out of them.
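
To make that idea concrete, here is a tiny, self-contained sketch of the generate-then-shrink loop. The property, the toy integer shrinker and the greedy search are my own illustrative assumptions, not Scalacheck’s or hypothesis’s actual algorithm:

import random

def falsifies(prop, x):
    """True if x is a counter-example to the property prop."""
    try:
        return not prop(x)
    except Exception:
        return True

def shrink_int(x):
    """A toy shrinker: propose integer values strictly closer to zero."""
    if x == 0:
        return []
    return [0, x // 2 if x > 0 else -(-x // 2), x - 1 if x > 0 else x + 1]

def minimize(prop, x):
    """Greedily replace x with simpler values that still falsify prop."""
    improved = True
    while improved:
        improved = False
        for candidate in shrink_int(x):
            if falsifies(prop, candidate):
                x = candidate
                improved = True
                break
    return x

def prop(n):
    return n < 100

# Generation phase: random probing until some counter-example turns up.
found = None
while found is None:
    n = random.randint(0, 10 ** 6)
    if falsifies(prop, n):
        found = n

print(minimize(prop, found))  # prints 100, the simplest failing value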

I was thinking about this today and decided to have a play with putting something like this together in python. Rather than build a full testing setup I thought I’d instead just build the basic core algorithm and then figure out how to integrate this with py.test and similar later.

Enter hypothesis, a library for generating and minimizing data and using it to falsify hypotheses.

In [1]: from hypothesis import falsify
 
In [2]: falsify(lambda x,y,z: (x + y) + z == x + (y +z), float,float,float)
Out[2]: (1.0, 2.0507190744664223, -10.940188909437985)
 
In [3]: falsify(lambda x: sum(x) < 100, [int])
Out[3]: ([6, 29, 65],)
 
In [4]: falsify(lambda x: sum(x) < 100, [int,float])
Out[4]: ([18.0, 82],)
 
In [12]: falsify(lambda x: "a" not in x, str)
Out[12]: ('a',)

The current state of this is “working prototype”. It’s reasonably well tested, and the external interface to falsify is probably not going to change all that much, but the internals are currently terrible and liable to change almost entirely. Still, I’m reasonably happy with it as a proof of concept.
