Category Archives: Hypothesis

Conjecture, parametrization and data distribution

Up front warning: This is a very inside baseball post, and I’m the only person who plays this particular variant of the game. This blog post is mostly a mix of notes to self and sharing my working.

I’m in the process of trying to rewrite the Hypothesis backend to use the Conjecture approach.

At this point the thing I was originally worried was intractable – shrinking of data – is basically solved. Conjecture shrinks as well as or better than Hypothesis. There are a few quirks to still pay attention to – the shrinking can always be improved, and I’m still on the fence as to whether some of the work I have with explicit costing and output based shrink control is useful (I think it’s probably not), but basically I could ship what I have today for shrinking and it would be fine.

However I’m discovering another problem: the other major innovative area of Hypothesis is its parametrized approach to data generation, and Conjecture doesn’t yet have an answer to it. More generally, I’m finding that getting great quality initial data out of Conjecture is hard.

This manifests in two major ways:

  1. It can be difficult to get good data when you also have good shrinking, because you want to try nasty distributions. e.g. just generating 8 bytes and converting them to an IEEE 754 binary64 float representation produces great shrinking, but a fairly sub-par distribution – for example, the probability of generating NaN is 1 in 2048 (actually very slightly lower).
  2. The big important feature of Hypothesis’s parametrization is correlated output. e.g. you can’t feasibly generate a list of 100 positive integers by chance if you’re generating each element independently. Correlated output is good for finding bugs.

1 is relatively easily solved by letting data generators participate in the initial distribution: Instead of having the signature draw_bytes(self, n) you have the signature draw_bytes(self, n, distribution=uniform). So you can let the floating point generator specify an alternative distribution that is good at hitting special case floating point numbers, without worrying about how that affects shrinking. Then you run the tests in two modes: the first where you’re building the data as you go and using the provided distributions, the second where you’re drawing from a pre-allocated block of data and ignoring the distribution entirely.
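To make this concrete, here’s a minimal sketch of what such a distribution hook could look like. The names (nasty_floats, the shape of a distribution function, the data object) are my assumptions for illustration, not a settled API:

import struct

def uniform(rnd, n):
    # The default distribution: n independent uniformly random bytes.
    return bytes(rnd.randrange(256) for _ in range(n))

def nasty_floats(rnd, n):
    # Alternative distribution for an 8 byte float draw: usually fall back to
    # uniform bytes, but sometimes emit the encoding of a special value so
    # that NaN, infinities, etc. turn up far more often than 1 in 2048 runs.
    assert n == 8
    if rnd.random() <= 0.25:
        special = rnd.choice([float('nan'), float('inf'), float('-inf'), -0.0, 0.0])
        return struct.pack('!d', special)
    return uniform(rnd, n)

# A float strategy would then draw its bytes roughly as:
#     buf = data.draw_bytes(8, distribution=nasty_floats)
#     value = struct.unpack('!d', buf)[0]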

This is a bit low-level unfortunately, but I think it’s mostly a very low level problem. I’m still hoping for a better solution. Watch this space.

For the second part… I think I can just steal Hypothesis’s solution to some degree. Instead of the current case where strategies expose a single function draw_value(self, data) they can now expose functions draw_parameter(self, data) and draw_value(self, data, parameter). A normal draw call then just does strategy.draw_value(data, strategy.draw_parameter(data)), but you can use alternate calls to induce correlation.
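As a sketch of what that might look like (the class and the encoding of the parameter here are illustrative assumptions, not the real implementation), here is a list strategy that draws a single element parameter and reuses it for every element, which is exactly the kind of correlation described above:

def draw(strategy, data):
    # The normal, uncorrelated draw: a fresh parameter, then a value from it.
    return strategy.draw_value(data, strategy.draw_parameter(data))

class ListOf(object):
    def __init__(self, elements):
        self.elements = elements

    def draw_parameter(self, data):
        # The parameter is a (continuation threshold, element parameter) pair.
        return (data.draw_bytes(1)[0], self.elements.draw_parameter(data))

    def draw_value(self, data, parameter):
        threshold, element_parameter = parameter
        result = []
        # Keep going while a continuation byte stays below the threshold, so
        # the threshold controls the average length.
        while data.draw_bytes(1)[0] < threshold:
            # Reusing element_parameter for every element is what produces
            # correlated output, e.g. a list of 100 positive integers.
            result.append(self.elements.draw_value(data, element_parameter))
        return result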

There are a couple problems with this:

  1. It significantly complicates the usage pattern: I think the parametrization is one of the bits of Hypothesis that people who look at the internals understand least, and one of the selling points of Conjecture was “You just write functions”. On the other hand I’m increasingly not sold on “You just write functions” as a good thing: a lot of the value of Hypothesis is the strategies library, and having a slightly more structured data type there is quite useful. It’s still easy to go from a function from test data to a value to a strategy, so this isn’t a major loss.
  2. It’s much less language agnostic. In statically typed languages you need some way to encode different strategies having different parameter types, ideally without this being exposed in the strategy (because then strategies don’t form a monad, or even an applicative). You can solve this problem a bit by making parameters an opaque identifier and keeping track of them in some sort of state dictionary on the strategy, but that’s a bit gross.
  3. Much more care with parameter design is needed than in Hypothesis, because the parameter affects the shrinking. As long as shrinking of the parameter works sensibly this should be OK, but this can become much more complicated. An example of where this gets complicated appears below.
  4. I currently have no good ideas how parameters should work for flatmap, and only some bad ones. This isn’t a major problem because you can fall back to a slightly worse distribution but it’s annoying because Conjecture previously had the property that the monadic and applicative interfaces were equivalently good.

Here’s an example of where parametrization can be a bit tricky:

Suppose you have the strategy one_of(s1, …, sn) – that is, you have n strategies and you want to pick a random one and then draw from that.

One natural way to parametrize this is as follows: Pick a random non-empty subset of {1, …, n}. Those are the enabled alternatives. Now pick a parameter for each of these options. Drawing a value is then picking a random one of the enabled alternatives and feeding it its parameter.

There are a couple major problems with this, but the main one is that it shrinks terribly.

First off: The general approach to shrinking directions Hypothesis takes for alternation is that earlier branches are preserved. e.g. if I do integers() | text() we’ll prefer integers. If I do text() | integers() we’ll prefer text. This generally works quite well. Conjecture’s preference for things that consume less data slightly ruins this (e.g. The integer 1 will always be preferred to the string “antidisestablishmentarianism” regardless of the order), but not to an intolerable degree, and it would be nice to preserve this property.

More generally, we don’t want a bad initial parameter draw to screw things up for us. So for example if we have just(None) | something_really_complicated() and we happen to draw a parameter which only allows the second, but it turns out this value doesn’t matter at all, we really want to be able to simplify to None.

So what we need is a parameter that shrinks in a way that makes it more permissive. The way to do this is to:

  1. Draw n bits.
  2. Invert those n bits.
  3. If the result is zero, try again.
  4. Else, return a parameter that allows all set bits.

The reason for this is that the initially drawn n bits will shrink towards zero, so as you shrink, the parameter will have more set bits.
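Here’s a rough sketch of that draw, reusing the draw_bytes primitive from earlier (the helper name is made up):

def draw_enabled_set(data, n):
    # Parameter for one_of(s1, ..., sn): a non-empty bitmask of enabled
    # alternatives that becomes *more* permissive as the underlying bytes
    # shrink towards zero.
    n_bytes = (n + 7) // 8
    full = (1 << n) - 1
    while True:
        bits = int.from_bytes(data.draw_bytes(n_bytes), 'big') & full
        mask = ~bits & full  # invert: all-zero bytes give the all-ones mask
        if mask:
            return mask  # bit i set means alternative i is enabled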

This then presents two further problems that need solving.

The next problem is that if we pick options through choice(enabled_parameters), then the option picked will change as we enable more things. This may sometimes work, but in general it will require difficult-to-manage simultaneous shrinks to work well. We want to be able to shrink the parameter and the elements independently if at all possible.

So what we do is rejection sampling: We generate a random number from one to n, then if that bit is set we accept it, if not we start again. If the number of set bits is very low this can be horrendously inefficient, but we can short-circuit that problem by using the control over the distribution of bytes suggested above!
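A sketch of that step, again with made-up names (in the real implementation the rejected draws would also be marked as deletable, as described next):

def choose_enabled(data, n, mask):
    # Rejection sampling: pick an alternative uniformly at random, retrying
    # until it lands on an enabled bit. The draw below could be biased
    # towards enabled bits via the distribution hook described earlier, so
    # in practice few retries are needed.
    while True:
        i = data.draw_bytes(1)[0] % n  # 0-based index of an alternative
        if mask & (1 << i):
            return i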

The nice thing about doing it this way is that we can mark the intermediate draws as deletable, so they get discarded and if you pay no attention to the instrumentation behind the curtain it looks like our rejection sampling magically always draws the right thing on its first draw. We can then try bytewise shrinking of the parameter, which leads to a more permissive set of options (that could then later allow us to shrink this), and the previously chosen option remains stable.

This then leads to the final problem: If we draw all the parameters up front, adding in more bits will cause us to read more data, because we’ll have to draw parameters for them. This is forbidden: Conjecture requires shrinks to read no more data than the example you started from (for good reason – this both helps guarantee the termination of the shrink process and keeps you in areas where shrinking is fast).

The solution here is to generate parameters lazily. When you pick alternative i, you first check if you’ve already generated a parameter for it. If you have you use that, if not you generate a new one there and then. This keeps the number and location of generated parameters relatively stable.
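Putting those pieces together, a lazily parametrized one_of might look something like this (still illustrative only, leaning on the hypothetical helpers sketched above):

class OneOf(object):
    def __init__(self, alternatives):
        self.alternatives = alternatives

    def draw_parameter(self, data):
        # The enabled set is drawn up front; per-alternative parameters are
        # filled in lazily, so enabling more alternatives while shrinking
        # never forces extra reads up front.
        return (draw_enabled_set(data, len(self.alternatives)), {})

    def draw_value(self, data, parameter):
        mask, lazy_parameters = parameter
        i = choose_enabled(data, len(self.alternatives), mask)
        if i not in lazy_parameters:
            # First time this alternative has been picked in this test run:
            # draw its parameter now rather than when the mask was drawn.
            lazy_parameters[i] = self.alternatives[i].draw_parameter(data)
        return self.alternatives[i].draw_value(data, lazy_parameters[i])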

In writing this, a natural generalization occurred to me. It’s a little weird, but it nicely solves this problem in a way that also generalizes to monadic bind:

  1. Parameters are generated from data.new_parameter(). All a parameter is, under the hood, is an integer counter value.
  2. There is a function data.parameter_value(parameter, strategy) which does the same lazy calculation keyed off the parameter ID: If we already have a parameter value for this ID and strategy, use that. If we don’t, draw a new one and store that.
  3. Before being drawn from, all strategies are interned. That is, each is replaced with an equivalent strategy we’ve previously seen in this test run. This means that if you have something like booleans().flatmap(lambda b: lists(just(b))), both lists(just(False)) and lists(just(True)) will be replaced with stable strategies from a pool when drawing, so parameters get reused. (A sketch of all three pieces follows.)
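A minimal sketch of that design (all names here are mine, and it assumes strategies compare equal when they are equivalent, which is what makes the interning step work):

class TestData(object):
    def __init__(self):
        self.parameter_counter = 0
        self.parameter_values = {}
        self.interned_strategies = {}

    def new_parameter(self):
        # A parameter is nothing but a fresh integer ID.
        self.parameter_counter += 1
        return self.parameter_counter

    def parameter_value(self, parameter, strategy):
        # Intern the strategy so that e.g. two lists(just(True)) strategies
        # built on separate flatmap calls share an identity, and hence a
        # parameter value, within this test run.
        strategy = self.interned_strategies.setdefault(strategy, strategy)
        key = (parameter, strategy)
        if key not in self.parameter_values:
            self.parameter_values[key] = strategy.draw_parameter(self)
        return self.parameter_values[key]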

I think this might be a good idea. It’s actually a better API, because it becomes much harder to use the wrong parameter value, and there’s no worry about leaking values or state on strategy objects, because the life cycle is sharply confined to that of the test. It doesn’t solve the problem of typing this well, but it makes incorrect use hard enough that an unsafe cast is probably fine if you can’t express the parameter types properly.

Anyway, brain dump over. I’m not sure this made sense to anyone but me, but it helped me think through the problems quite a lot.


Let Hypothesis make your choices for you

I had a moment of weakness this morning and did some feature development on Hypothesis despite promising not to. The result is Hypothesis 1.14.0.

This adds a bunch of interesting new strategies to the list. One I’d like to talk about in particular is the new choices strategy.

What does it do? Well, it gives you something that behaves like random.choice, only under Hypothesis’s control and subject to minimization. This more or less solves the problem, which I wrote a long and complicated post about a while ago, of picking elements from a list. You can now do something like:

from hypothesis import given, strategies as st
 
 
@given(st.lists(st.integers(), min_size=1), st.choices())
def test_deletion(values, choice):
    v = choice(values)
    values.remove(v)
    assert v not in values

Then running this will print something like:

_____________________________________________ test_deletion ______________________________________________
test_del.py:4: in test_deletion
    def test_deletion(values, choice):
src/hypothesis/core.py:583: in wrapped_test
    print_example=True, is_final=True
src/hypothesis/executors/executors.py:25: in default_executor
    return function()
src/hypothesis/core.py:365: in run
    return test(*args, **kwargs)
test_del.py:7: in test_deletion
    assert v not in values
E   assert 0 not in [0]
----------------------------------------------- Hypothesis -----------------------------------------------
Falsifying example: test_deletion(values=[0, 0], choice=choice)
Choice #1: 0
===================

Note that the choices are printed as they are made. This was one of the major obstacles to implementing something like this in the past: The lack of the ability to display the results from within the test. The new note API offers a solution to this.
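For what it’s worth, note is also exposed as part of the public API, so you can use the same mechanism directly. A deliberately failing sketch:

from hypothesis import given, note, strategies as st

@given(st.lists(st.integers()))
def test_sum_is_small(xs):
    # Anything passed to note() is replayed alongside the falsifying example.
    note("xs has length %d" % (len(xs),))
    assert sum(xs) < 100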


New improved development experience for Hypothesis

As part of my drive to make Hypothesis more of a community project, one of the things I need to do is to ensure it’s easy for new people to pick up, and easy for people who have a different environment to use.

There are a couple major consistent sources of issues people have with Hypothesis development:

  1. It requires the availability of a lot of different versions of Python. I use pyenv heavily, so this hasn’t been a major problem for me, but other people don’t, so they are less likely to have, say, both Python 3.5 and Python 3.4 installed (for various reasons some build tasks require one and some the other).
  2. A full test run of Hypothesis takes a very long time. If you don’t parallelise it it’s in the region of 2 hours.
  3. Some of the build steps are very counterintuitive in their behaviour – e.g. “tox -e lint” runs a mix of linting and formatting operations and then errors if you have a git diff. This is perfectly reasonable behaviour for running on a CI, but there’s no separate way of getting the formatter to fix your code for you.

Part of the problem in 3 is that tox is a test runner, not a general task runner, and there was a lack of good unified interface to the different tasks that you might reasonably want to run.

So I’ve introduced a new unified system which provides a much better developer experience, gives a single interface to all of the normal Hypothesis development tasks, and automates a lot of the issues around managing different versions of Python. Better yet, it’s based on a program which is widely deployed on most developers’ machines, so there’s no bootstrapping issue.

I am, of course, talking about a Makefile.

No, this isn’t some sort of sick joke.

Make is actually pretty great for development automation: It runs shell commands, checks if things are up to date, and expresses dependencies well. It does have some weird syntactic quirks, and writing portable shell isn’t exactly straightforward, but as an end user it’s pretty great.

In particular, because the makefile can handle installing all of the relevant pythons for you (I shell out to pyenv‘s build plugin for this, but don’t otherwise use pyenv), the problem of juggling many pythons goes away.

Other highlights:

  • ‘make check-fast’ for running a fast subset of the tests
  • ‘make format’ for reformatting your code to the Hypothesis style
  • ‘make check-django’ and ‘make check-pytest’ for testing their respective integrations (there’s also ‘make check-nose’ for checking Hypothesis works under nose and I never giggle when typing that at all).

You can see the full Makefile here, and the CONTRIBUTING.rst documents some of the other common operations.

Here’s an asciinema of it in action:


Future directions for Hypothesis

There’s something going on in the Hypothesis project right now: There are currently three high quality pull requests open from people who aren’t me adding new functionality.

Additionally, Alexander Shorin (author of the characters strategy one) has a CouchDB backed implementation of the Hypothesis example database which I am encouraging him to try to merge into core.

When I did my big mic drop post it was very unclear whether this was going to happen. One possible outcome of feature freeze was simply that Hypothesis was going to stabilize at its current level of functionality except for when I occasionally couldn’t resist the urge to add a feature.

I’m really glad it has though. There’s a vast quantity of things I could do with Hypothesis, and particularly around data generation and integrations it’s more or less infinitely parallelisable and doesn’t require any deep knowledge of Hypothesis itself, so getting other people involved is great and I’m very grateful to everyone who has submitted work so far.

And I’d like to take this forward, so I’ve updated the documentation and generally made the situation more explicit:

Firstly, it now says in the documentation that I do not do unpaid Hypothesis feature development. I will happily take sponsorship for new features; for everything else I will absolutely help you every step of the way in designing and writing the feature, but it’s up to the community to actually drive the work.

Secondly, I’ve now labelled all enhancements that I think are accessible for someone else to work on. Some of these are large-ish and people will need me (or, eventually, someone else!) to lend a hand with them, but I think they all have the benefit of being relatively self-contained and approachable without requiring too deep an understanding of Hypothesis.

Will this work? Only time (and effort) will tell, but I think the current set of pull requests demonstrates that it can work, and the general level of interest I see from most people I introduce Hypothesis to seems to indicate that it’s got a pretty good fighting chance.


Finding more bugs with less work

I was at PyCon UK this weekend, which was a great conference and I will definitely be attending next year.

Among the things that occurred at this conference is that I gave my talk, “Finding more bugs with less work”. The video is up, and you can see the slides here.

I may do a transcript at some point (like I did for my django talk), but I haven’t yet.
