A vague roadmap for Hypothesis 2.0

As mentioned, there are a whole bunch of things I’d like to work on in Hypothesis still, but it’s a bit of a full time job and I’ve had enough of doing that on Hypothesis for now.

But I’d like to get the details down while they’re still fresh in my head. Unfortunately that makes this a bit of a “neener neener, these are the features you’re not going to get” post. It’s likely that in about six months’ time, when Hypothesis has even more traction than it currently does, I will do some sort of Kickstarter to fund working on this, or I might just do it for free the next time I feel like taking a big pile of money and setting fire to it.

This is sorted by the order in which I would likely do them rather than by any sort of interest order.

Flaws in the current API

The Hypothesis API is pretty good in my not very humble opinion, but it definitely has some weird eccentricities and things I’d like to do better. Some of these are holdouts from the early early days of Hypothesis back in 2013, some of these are more modern and are just because of things I couldn’t figure out a nice way to express when I wrote them.

Weird parameters to @given

Arguments to @given are confusing. They can be positional or they can be keyword based, and there are some special arguments there. In particular you can explicitly pass in a Settings object or a Random object to use. These currently live in the same namespace as the arguments to be passed to your underlying function. As a result, fun fact: You can’t have arguments to Hypothesis called ‘random’ or ‘settings’ unless you pass them positionally.

New API:

  1. Like @example, @given may take its arguments either positionally or as keywords, but not both.
  2. All arguments to @given will be passed to the underlying test as values. If you want to configure it with a custom random, settings, or any of various other things the API would be free to grow to, the syntax would be something like @given(…).with_configuration(random=my_random) def my_test_function(…)

This opens up a lot of possibilities, simplifies the slightly horrendous edge cases of given, and generally smooths out a lot of difficulties people sometimes run into with using it. It also means there are far fewer confusing edge cases when using it with internal decorators, because the interactions between positional arguments and keyword arguments become much simpler.

Status: I know exactly how to do this. It’s a bit fiddly to do well, but I don’t think there are any surprises here.

Life cycle API

One of the things you need for testing is a bunch of hooks into the life-cycle – setup and teardown for example. Hypothesis currently has the executors API which nobody including me likes. It’s a bit weird, very limited, and ties you to class based testing.

I’d like to expose a detailed lifecycle API that lets you hook into a number of events. In particular I need to be able to insert code around just the test body (executors currently treat value creation and test execution identically). Combining this with the better @given configuration above makes it easy to ditch the dependence on class based testing.
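To make the distinction between value creation and test execution concrete, here is a minimal sketch of what such hooks could look like. The Lifecycle class, the event names setup_example and teardown_example, and run_test are all invented for illustration and are not part of Hypothesis.

```python
import random


class Lifecycle:
    """Hypothetical registry of life-cycle hooks; the event names are invented."""

    def __init__(self):
        self.hooks = {"setup_example": [], "teardown_example": []}

    def on(self, event):
        def register(fn):
            self.hooks[event].append(fn)
            return fn
        return register

    def fire(self, event):
        for fn in self.hooks[event]:
            fn()


def run_test(test, draw, lifecycle, examples=3, seed=0):
    rnd = random.Random(seed)
    for _ in range(examples):
        value = draw(rnd)  # value creation happens outside the hooks
        lifecycle.fire("setup_example")
        try:
            test(value)  # only the test body is wrapped
        finally:
            lifecycle.fire("teardown_example")


log = []
lifecycle = Lifecycle()


@lifecycle.on("setup_example")
def open_resource():
    log.append("setup")


@lifecycle.on("teardown_example")
def close_resource():
    log.append("teardown")


def draw_int(rnd):
    log.append("draw")
    return rnd.randint(0, 10)


def check(value):
    log.append("test")
    assert 0 <= value <= 10


run_test(check, draw_int, lifecycle)
```

Nothing here depends on the test being a method on a class, which is the point of moving away from executors.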

I’d still like to be able to support some sort of auto derivation of life cycle events for class based tests (unittest setup and tear down for example).

I’d also like to integrate this with py.test function level fixtures, but that’s currently impossible.

Status: The basic life cycle API I mostly know how to do. Beyond that things start to get a bit shaky.

Better stateful testing

Hypothesis’s stateful testing is extremely powerful and if you’re not using it you should be.

But… I’ll be the first to admit it’s a little hard to use. The generic API is fine. It’s quite low level but it’s easy to use. The rule based stuff should be easy to use, but there’s a bit too much boiler plate and bundles of variables are a weird second class citizen.

What I’d like to be able to do is make them just behave like strategies, where the strategy’s evaluation is deferred until execution time, so you can use them as if they were any other strategy and everything should Just Work.
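A small sketch of the deferred-evaluation idea, using toy strategy classes rather than Hypothesis’s own. Integers, Tuples and Bundle below are hypothetical stand-ins whose only interface is a draw method taking a random.Random.

```python
import random


class Integers:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def draw(self, rnd):
        return rnd.randint(self.lo, self.hi)


class Tuples:
    def __init__(self, *parts):
        self.parts = parts

    def draw(self, rnd):
        return tuple(part.draw(rnd) for part in self.parts)


class Bundle:
    """Hypothetical bundle: a pool of values produced by earlier rules."""

    def __init__(self):
        self.values = []

    def add(self, value):
        self.values.append(value)

    def draw(self, rnd):
        # Deferred: the pool is consulted when a value is drawn, not when the
        # strategy is built.
        return rnd.choice(self.values)


rnd = random.Random(0)
names = Bundle()
# Composed while the bundle is still empty; nothing is evaluated yet.
pair = Tuples(names, Integers(0, 9))

names.add("alice")
names.add("bob")
drawn = [pair.draw(rnd) for _ in range(5)]
```

Since the bundle exposes the same draw interface as everything else, it composes with other strategies without any special casing.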

I would also like the syntax for using it to be more unified with the normal @given syntax. Ideally it would be nice if every @given invocation had an implicit state machine associated with it so as to unify the two approaches.

Status: I know how to do every part of this except the last bit. I suspect the last bit will get punted on.

Better process isolation

Currently if you want to have process isolation for your tests you can use the forking executor.

But you probably shouldn’t. It has a number of weird limitations: it may (and usually will) interact poorly with how test runners capture output, and, for reasons that will be completely opaque to you but do make sense, I promise, some examples will not minimize correctly.

I’d like to make this better and integrate it more thoroughly into Hypothesis, so it’s just a config item to get process isolation and it should transparently work with things. Ideally this would give built in parallel test execution too.
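For readers unfamiliar with the basic mechanism, this is roughly what fork-based isolation looks like on POSIX. It is a bare sketch under the assumption that os.fork is available, not the forking executor’s actual code, and run_isolated is a hypothetical helper name.

```python
import os
import signal


def run_isolated(test, *args):
    """Run test(*args) in a forked child; return True if it passed (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        # Child process: report the outcome through the exit status and never
        # return into the parent's code.
        try:
            test(*args)
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    # A child killed by a signal did not exit normally, so it counts as a failure.
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


counter = {"calls": 0}


def well_behaved():
    counter["calls"] += 1  # mutates only the child's copy of the dict


def crashes_hard():
    os.kill(os.getpid(), signal.SIGKILL)  # simulates a hard crash


results = (run_isolated(well_behaved), run_isolated(crashes_hard))
```

The parent survives the hard crash and sees none of the child’s state changes, but it also learns nothing beyond pass or fail, which hints at why reporting and minimization get awkward.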

Status: Given the lifecycle API I’m pretty sure I know how to do this on posix platforms. I’m unclear on the details for windows but think I can probably manage something. This may require breaking my zero dependencies rule to do well, but I’m not against doing that. I would strongly consider just adding execnet as a dependency for implementing this.

Size bounding

A common problem in Hypothesis is that you ask it to generate examples that are too large. e.g. a lists(lists(lists(booleans))) will typically contain more than a thousand elements. This is unfortunate. A lot of this problem comes from the fact that Hypothesis does not use the traditional sizing mechanism from Quickcheck.

The way to fix this is basically to draw from the conditional distribution of values which are <= some maximum size. The mechanisms for doing most of this are already in place from the work on recursive strategies, but it would be nice to be able to generalize it everywhere as it would solve a common cause of slowness.
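The simplest way to sample from such a conditional distribution is rejection sampling: draw as usual and throw away anything over the limit. The sketch below does exactly that for nested lists of booleans. It is a deliberately naive stand-in, not the mechanism used for recursive strategies, and the generator, the size measure (number of boolean leaves) and the parameters are all assumptions made for illustration.

```python
import random


def nested_lists(rnd, depth):
    """Draw a list nested `depth` levels deep with booleans at the leaves."""
    if depth == 0:
        return rnd.random() < 0.5
    length = 0
    while rnd.random() < 0.8:  # geometric length with mean 4
        length += 1
    return [nested_lists(rnd, depth - 1) for _ in range(length)]


def size(value):
    """Count the boolean leaves in a nested list."""
    if isinstance(value, list):
        return sum(size(v) for v in value)
    return 1


def draw_bounded(rnd, depth, max_size, max_tries=10000):
    # Rejection sampling: keeping only draws with size <= max_size samples
    # from the original distribution conditioned on that bound.
    for _ in range(max_tries):
        value = nested_lists(rnd, depth)
        if size(value) <= max_size:
            return value
    raise RuntimeError("no example within the size bound was found")


example = draw_bounded(random.Random(0), depth=3, max_size=20)
```

Rejection becomes wasteful when small values are rare, which is one reason a real design would want to build the bound into generation instead.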

Status: I’ve got about 80% of a design sketched out. There are a few things I’m worried might not work as well as I’d hope, but I don’t think this is too hard.

New functionality


Profiles

This is a very simple feature, but it would be nice to have easily configurable “profiles” for Hypothesis: You want to run Hypothesis very differently on your CI server than in local development for example.
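One way this could look, sketched with plain dictionaries. The profile names, the setting keys and the HYPOTHESIS_PROFILE environment variable are all made up for the example.

```python
import os

# Hypothetical profiles; the keys stand in for whatever settings exist.
PROFILES = {
    "dev": {"max_examples": 10, "timeout_seconds": 1},
    "ci": {"max_examples": 1000, "timeout_seconds": 60},
}


def load_profile(environ=None, default="dev"):
    """Pick a profile by name from the environment, falling back to `default`."""
    environ = os.environ if environ is None else environ
    name = environ.get("HYPOTHESIS_PROFILE", default)
    try:
        return dict(PROFILES[name])
    except KeyError:
        raise ValueError(f"unknown profile {name!r}") from None


ci_settings = load_profile({"HYPOTHESIS_PROFILE": "ci"})
dev_settings = load_profile({})
```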

Status: Trivial. I’m almost embarrassed I haven’t done this already.

Missing Quickcheck features

There are two Quickcheck features which Hypothesis doesn’t implement. One of these is an “eh I could and probably will but I doubt Python programmers care much”. The other one is one that is genuinely important and I would like to support but haven’t yet got around to.

Function Strategies

One of the nice things Quickcheck can do is generate random functions. I’m pretty sure I know how to do this in Hypothesis, there’s just not really been much demand for it so I’ve never got around to doing it.
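One common way to build a random function is to draw each output lazily the first time an argument is seen and remember it afterwards, so the result still behaves like a function. The sketch below assumes hashable positional arguments only, and random_function is a hypothetical helper rather than anything Hypothesis provides.

```python
import random


def random_function(rnd, draw_result):
    """Return a function whose outputs are drawn at random, then memoized."""
    table = {}

    def f(*args):
        # Same arguments always give the same answer, as a pure function would.
        if args not in table:
            table[args] = draw_result(rnd)
        return table[args]

    return f


rnd = random.Random(0)
f = random_function(rnd, lambda r: r.randint(0, 1000))
first = f(1, 2)
second = f(1, 2)
other = f("a")
```

Keyword arguments, unhashable arguments and side effects are exactly the sort of Python-specific detail this ignores.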

Status: Not that hard, but I need to work out some details and figure out the specific requirements for this. Python functions are vastly nastier beasts than Haskell functions.

Example classification

Quickcheck lets you label examples and then at the end reports a breakdown of the statistics for different labels.

I’ve never got around to implementing this in Hypothesis despite the fact that I’ve been intending to do so since even before 1.0. Part of why is that it matters less for Hypothesis – one of the classic reasons you want this is because Quickcheck can easily generate a lot of duplicate examples if you’re not careful. Hypothesis generally doesn’t do that because of its built in deduplication.

Still, it would be a useful feature which I would like to implement at some point.

Status: I know how this should work. I’ve had chunks of a working prototype before and everything was straightforward, but it got deprioritised.

Coverage based example discovery

The big thing that means that I can’t currently say “Hypothesis is great and you should use it for everything” is that actually for a lot of things you should be using python-AFL instead.

This shouldn’t be the case. Really, everything python-AFL can do, Hypothesis should be able to do better. It just currently can’t because it’s entirely black box.

Hypothesis should be able to use essentially the AFL algorithm to do this: Instead of just generating examples it generates examples, sees the coverage profile of those, then minimizes down to a seed set for that profile and starts mutating from there.
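The core loop can be sketched in a few dozen lines. This is a heavily simplified illustration of coverage-guided mutation, not AFL’s algorithm and not the prototype mentioned below: it uses sys.settrace for line coverage, keeps any input that reaches a new line, and skips the minimization step entirely. All function names are hypothetical.

```python
import random
import sys


def coverage_of(fn, data):
    """Run fn(data); return (executed (function, line) pairs, exception or None)."""
    lines = set()

    def tracer(frame, event, arg):
        if event == "line":
            lines.add((frame.f_code.co_name, frame.f_lineno))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    error = None
    try:
        fn(data)
    except Exception as e:
        error = e
    finally:
        sys.settrace(previous)
    return lines, error


def mutate(rnd, data):
    """Insert, replace or delete one random byte."""
    data = bytearray(data)
    choice = rnd.randrange(3)
    if choice == 0 or not data:
        data.insert(rnd.randrange(len(data) + 1), rnd.randrange(256))
    elif choice == 1:
        data[rnd.randrange(len(data))] = rnd.randrange(256)
    else:
        del data[rnd.randrange(len(data))]
    return bytes(data)


def fuzz(fn, rounds=200_000, seed=0):
    """Return (input, exception) for the first failing input found, else None."""
    rnd = random.Random(seed)
    corpus = [b""]
    seen, _ = coverage_of(fn, b"")
    for _ in range(rounds):
        candidate = mutate(rnd, rnd.choice(corpus))
        covered, error = coverage_of(fn, candidate)
        if error is not None:
            return candidate, error
        if not covered <= seen:  # reached a new line: keep it as a seed
            seen |= covered
            corpus.append(candidate)
    return None


def target(data):
    # Each nested branch is only reachable once the previous byte is right.
    if len(data) > 0 and data[0] == ord("b"):
        if len(data) > 1 and data[1] == ord("u"):
            if len(data) > 2 and data[2] == ord("g"):
                raise ValueError("found it")


result = fuzz(target)
```

A black-box generator has to guess all three bytes at once, whereas the coverage signal lets the search lock them in one at a time.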

This would be particularly great in combination with the stateful testing, which has exactly the sort of enormous state space that this sort of concept is great for exploring.

Status: I have prototyped this and it works great, but my previous attempts were unacceptably slow. I’m about 80% sure I now know how to fix that.

Grammar based strategy definition

I’d like to be able to generate strings matching some grammar. I’ve previously implemented this in a very early version of Hypothesis for regular grammars but never finished it off.
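As a sketch of what generating from a grammar involves, here is a toy random expander for a small context-free grammar. The grammar format, the generate function and the convention that each rule’s first production leads towards termination are assumptions for this example, not a description of the earlier Hypothesis implementation.

```python
import random

# Each nonterminal maps to a list of productions. By convention in this
# sketch, the first production of every rule leads towards termination.
GRAMMAR = {
    "expr": [["term"], ["term", "+", "expr"]],
    "term": [["digit"], ["(", "expr", ")"]],
    "digit": [[d] for d in "0123456789"],
}


def generate(rnd, grammar, symbol, depth=0, max_depth=6):
    """Expand `symbol` into a random string matching the grammar."""
    if symbol not in grammar:
        return symbol  # terminal
    productions = grammar[symbol]
    if depth >= max_depth:
        productions = productions[:1]  # force termination once deep enough
    production = rnd.choice(productions)
    return "".join(
        generate(rnd, grammar, part, depth + 1, max_depth) for part in production
    )


rnd = random.Random(0)
samples = [generate(rnd, GRAMMAR, "expr") for _ in range(10)]
```

Every sample is a syntactically valid arithmetic expression over single digits, so it can be fed straight into whatever parser is under test.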

Status: I’ve done enough of this before that I know how to do it again, but it would require starting from scratch because too much has changed since then.

Strategies for common string formats

I’d like to have built in strategies for URIs, emails, etc. that do not depend on fake factory. Fake factory is fine, but it’s basically insufficiently devious for the level of making your code suffer that Hypothesis users have come to expect.

Status: A bunch of work to get good, but not that hard to do. Especially given grammar based strategy definitions.

Discovery mode

Currently Hypothesis is intended for being run as part of a small test suite of reasonably fast tests. This is great and all, but particularly given the coverage based discovery features what you really want is to be able to park Hypothesis on a server somewhere and just have it sitting there 24/7 looking for bugs in your code.

Status: Requires some fairly hefty designing. A lot of known unknowns, probably some unknown unknowns. I don’t think there’s anything fundamentally difficult about this, but there are a lot of “How should it handle X?” questions that would need answering.


One thought on “A vague roadmap for Hypothesis 2.0”

  1. Mathieu

    A lot of this is very exciting, but I’m very eager to see the “coverage based example discovery” feature come to life!
