Category Archives: Hypothesis

A pathological example for test-case reduction

This is an example I cut from a paper I am currently writing about test-case reduction because I needed to save space and nobody except me actually cares about the algorithmic complexity of test-case reduction.

Suppose you have the following test case:

from hypothesis import given, strategies as st
K = 64
INTS = st.integers(0, 2 ** K - 1)
@given(INTS, INTS)
def test_are_not_too_far_apart(m, n):
    assert abs(m - n) > 1

Then if this test ever fails, with initial values \(m\) and \(n\), if reduction can replace an int with its predecessor but can’t change two ints at once, the reduction will take at least \(\frac{m + n}{2}\) test executions: The fastest possible path you can take is to at each step reduce the larger of \(m\) and \(n\) by two, so each step only reduces \(m + n\) by \(2\), and the whole iteration takes \(\frac{m + n}{2}\) steps.

(Side note: You can get lucky with Hypothesis and trigger some special cases that make this test faster. if you happen to have \(m = n\) then it can reduce the two together, but if you start with \(m = n \pm 1\) then it will currently never trigger because it will not ever have duplicates at the entry to that step. Hypothesis will also actually find this bug immediately because it will try it with both examples set to zero. Trivial modifications to the test can be made to avoid these problems, but I’m going to ignore them here).

The interesting thing about this from a Hypothesis point of view is that \(m + n\) is potentially exponential in \(k\), and the data size is linear in \(k\), so Hypothesis’s test case reduction is of exponential complexity (which doesn’t really cause a performance problem in practice because the number of successful reductions gets capped, but does cause an example quality problem because you then don’t run the full reduction). But this isn’t specifically a Hypothesis problem – I’m morally certain every current property-based testing library’s test case reduction is exponential in this case (except for ones that haven’t implemented reduction at all), possibly with one or two patches to avoid trivial special cases like always trying zero first.

Another way to get around this is to almost never trigger this test case with large values! Typically property-based testing libraries will usually only generate an example like this with very small values. But it’s easy for that not to be the case – mutational property-based testing libraries like Hypothesis or crowbar can in theory fairly easily find this example for large \(m\) and \(n\) (Hypothesis currently doesn’t. I haven’t tried with Crowbar). Another way you could easily trigger it is with distributions that special case large values.

One thing I want to emphasise is that regardless of the specific nature of the example and our workarounds for it, this sort of problem is inherent. It’s easy to make patches that avoid this particular example (especially in Hypothesis which has no problem making simultaneous changes to \(m\) and \(n\)).

But if you fix this one, I can just construct another that is tailored to break whatever heuristic it was that you came up with. Test-case reduction is a local search method in an exponentially large  space, and this sort of problem is just what happens when you block off all of the large changes your local search method tries to make but still allow some of the small ones.

You can basically keep dangling the carrot in front of the test-case reducer going “Maybe after this reduction you’ll be done”, and you can keep doing that indefinitely because of the size of the space. Pathological examples like this are not weird special cases, if anything the weird thing is that most examples are not pathological like this.

My suspicion is that we don’t see this problem cropping up much in practice for a couple of reasons:

  1. Existing property-based testing libraries are very biased towards finding small examples in the first place, so even when we hit cases with pathological complexity, \(n\) is so small that it doesn’t actually matter.
  2. This sort of boundary case relies on what are essentially “weird coincidences” in the test case. They happen when small local changes unlock other small local changes that were previously locked. This requires subtle dependency between different parts of the test case, and where that subtle dependency exists we are almost never finding it. Thus I suspect the fact that we are not hitting exponential slow downs in our test case reduction on a regular basis may actually be a sign that there are entire classes of bug that we are just never finding because the probability of hitting the required dependency combination is too low.
  3. It may also be that bugs just typically do not tend to have that kind of sensitive dependencies. My suspicion is that this is not true given the prevalence of off-by-one errors.
  4. It may also be that people are hitting this sort of problem in practice and aren’t telling us because they don’t care that much about the performance of test case reduction or example quality.
This entry was posted in Hypothesis, Python on by .

Looking into doing a PhD

As regular readers of this blog have probably figured out, I’m a researchy sort of person.

A lot of my hobbies – maths, voting theory, weird corners of programming, etc – are research oriented, and most of my work has had some sort of research slant to it.

The last two years I’ve basically been engaged in a research project working on Hypothesis. It’s come quite far in that time, and I feel reasonably comfortable saying that it’s the best open source property based testing library on most metrics you’d care to choose. It has a number of novel features and implementation details that advance the state of the art.

It’s been pretty great working on Hypothesis like this, but it’s also been incredibly frustrating.

The big problem is that I do not have an academic background. I have a masters in mathematics (more technically I have a BA, an MA, and a CASM. Cambridge is weird. It’s entirely equivalent to a masters in mathematics though), but that’s where I stopped. Although it says “DR” in my online handle and the domain of this blog, those are just my initials and not my qualification.

As a result, I have little to no formal training or experience in doing academic research, and a similarly low understanding of who’s who and what’s what within the relevant fields. So I’ve been reading papers and trying to figure out the right people to talk to all on my own, and while it’s gone OK it’s still felt like fumbling around in the dark.

Which leads to the obvious solution that I spoilered in the title: If the problem is that I’m trying to do research outside of an academic context, the solution is to do research in an academic context.

So I’d like to do a PhD that is either about Hypothesis, or about something close enough to Hypothesis that each can benefit from the other.

There’s probably enough novel work in Hypothesis already that I could “just” clean it up, factor it out, and turn it into a PhD thesis as it is, but I’m not really expecting to do that (though I’d like that to be part of it). There are a number of additional directions that I think it would be worth exploring, and I expect most PhD funding will come with a focus subject attached which I would be happy to adapt to (a lot of the most interesting innovations in Hypothesis came because some external factor forced me to think about things in ways I wouldn’t otherwise have!). If you’d like to know more, I’ve written up a fairly long article about Hypothesis and why I think it’s interesting research on the main Hypothesis site.

Which, finally, brings me to the main point of the post: What I want from you.

I’m already looking into and approaching potential universities and interesting researchers there who might be good supervisors or able to recommend people who are. I’ve been in touch with a couple (some of whom might be reading this post. Hi), but I would also massively appreciate suggestions and introductions.

So, if you work in relevant areas or know of people who do and think it would be useful for me to talk to, please drop me an email at [email protected]. Or just leave a comment on this blog post, tweet at me, etc.

This entry was posted in Hypothesis, life, programming, Python on by .

Randomized exhaustive exploration of a many branching tree

Attention conservation notice: This post is largely an obvious in retrospect solution to an obscure problem.

It’s a problem I spent a lot of time considering wrong or complicated solutions too though (this is what the persistent data structure I was considering was for, but it’s not needed in the end), so I thought it was worth writing up the correct solution.

The problem is this: I have a tree of unknown structure. Each node in the tree is either a leaf or has an ordered sequence of N (variable per node) children (technically the leaves are just a special case of this with N=0, but leaves have special properties as we’ll see in a moment).

I want to do an exhaustive search of the leaves of this tree, but there’s a limitation: when walking this tree I may only walk complete branches starting from the root and finishing at a leaf. I cannot back up and choose another branch.

The tree is potentially very large, to the point where an exhaustive search might be impossible. So this should actually be treated as an anytime algorithm, where we’re iteratively yielding new paths to a leaf which some external algorithm is then doing something with. When yielding those we want to guarantee:

  • We never yield the same path twice
  • We know when we have explored the entire tree and stop

It’s easy to do a lexicographic exploration of the branches in sorted order – just build a trail of the choices you took. At each step, increment the last choice. If that causes it to exceed the number of choices available, pop it from the end and increment the previous next choice. When making an unknown choice always choose zero.

But there’s a reason you might not want to do this: Under certain reasonable assumptions, you may end up spending a lot of time exploring very deep leaves.

Those reasonable assumptions are this:

  • All else being equal, we should prefer leaves which are closer to the root (e.g. because each walk to a child is expensive, so it takes less time to explore them and we can explore more leaves / unit time)
  • The expected number depth of a tree below any node given uniformly at random choices is usaully much lower than the maximum depth of the tree elow that point.

Under these assumptions the optimal solution for exploring the tree is instead to just randomly walk it: You will avoid the long paths most of the time because of the expectation assumption, and will do a better job of exploring close to the root.

Unfortunately, pure random exploration does not satisfy our requirements: We might generate duplicate paths, and we don’t know when we’ve explored all the nodes.

So what we want is a hybrid solution that exhaustively searches the tree without duplicates, but chooses its paths as randomly as possible.

Any way, there turns out to be a very simple way of doing this. It’s as follows:

We maintain a list of paths to unexplored nodes, bucketed by their length. This is initially the empty path leading to the root.

We then repeatedly iterate the following steps until there are no unexplored nodes in the list:

  1. Pick an unexplored node of minimal length uniformly at random and remove it from the list
  2. Walk to that node
  3. Walk randomly below that node. Every time we make a random decision on a child, add the path we took to get to this point plus each other decision we could have taken to the list of unexplored nodes.
  4. When we reach a leaf node, yield that path to the controlling algorithm.

Every path we walk goes through a node we have not previously explored, so must be unique. When the list is empty we’ve explored all the nodes and are done.

The correct storage format for the paths is just an immutable singly linked list stored with the last node in the path first – appending a node to the path is just consing it to the front of the list. You’re going to have to do O(n) work to reverse it before walking upwards, but that’s OK: You’re going to have to do O(n) work with n the same and a higher constant factor to walk back up the tree along that path anyway.

The reason for my original need for a data structure with better forward iteration than that was because the algorithm I was originally considering always picked the unexplored node with the shortlex minimal (lexicographically first amongst those with the shortest length) path to it, but there’s no real reason you need to do that and it makes things much more expensive.

If you want to instead do weighted random walks of the tree and are prepared to relax the requirements around exploring shorter paths first a bit, there’s an alternative algorithm you can use (and I’m actually more likely to use this one in practice because I’ll need to do weighted draws) which works as follows:

We build a shadow tree in parallel every time we walk the real tree. The shadow tree is mutable, and each node in it is in one of three states: Unexplored, Active or Finished. It starts with a single root node which is Unexplored.

We maintain the following invariant: Any node in the shadow tree which is not Finished has at least one child node that is not Finished. In particular, leaf nodes are always Finished.

We then decide how to walk the tree as follows: We always walk the shadow tree in parallel, matching our walk of the real tree. Whenever we walk to a node that is Unexplored, we do one of two things:

  1. If the corresponding real node is a leaf, we immediately mark it as Finished (and our walk is now done)
  2. If the corresponding real node has N children, we mark it as Active and create N Unexplored child nodes for it.

Once our walk has complete (i.e. we’ve reached a leaf), we look at the list of nodes in the shadow tree that we encountered  and walk it backwards. If all of the children of the current node are Finished, we mark the node as Finished as well. If not, we stop the iteration.

We can now use the shadow tree to guide our exploration of the real tree as follows: Given some random process for picking which child of the real tree we want to navigate to, we simply modify the process to never pick a child whose corresponding shadow node is Finished.

We can also use the shadow tree to determine when to stop: Once the root node is marked as Finished, all nodes have been explored and we’re done.

Note that these two approaches don’t produce at all the same distribution: If one child has many more branches below it than another, the initial algorithm will spend much more time below that child, while the shadow tree algorithm will explore both equally until it has fully exhausted one of them (even if this results in much deeper paths). It probably depends on the use case which of these are better.

You can use the shadow tree to guide the exploration too if you like. e.g. if there are any unexplored child nodes you could always choose one of them. You could also store additional information on it (e.g. you could track the average depth below an explored node and use that to guide the search).

Anyway, both of these algorithms are pretty obvious, but it took me a long time to see that they were pretty obvious so hopefully sharing this is useful.

As an aside: You might be wondering why I care about this problem.

The answer is that it’s for Hypothesis.

As described in How Hypothesis Works, Hypothesis is essentially an interactive byte stream fuzzer plus a library for doing structured testing on top of this. The current approach for generating that bytestream isn’t great, and has a number of limitations. In particular:

  • At the moment it fairly often generates duplicates
  • It cannot currently detect when it can just exhaustively enumerate all examples and then stop (e.g. if you do @given(booleans()) at the moment the result isn’t going to be great).
  • The API for specifying the byte distribution is fairly opaque to Hypothesis and as a result it limits a number of opportunities for improved performance.

So I’m exploring an alternative implementation of the core where you specify an explicit weighted distribution for what the next byte should be and that’s it.

This then opens it up to using something like the above for exploration. The nodes in the tree are prefixes of the bytestream, the leaves are complete test executions. Each non-leaf node has up to 256 (it might be fewer because not every byte can be emitted at every point) child nodes which correspond to which byte to emit next, and then we can treat it as a tree exploration problem to avoid all of the above issues.

I’ve currently got a branch exploring just changing the API and keeping the existing implementation to see how feasible rebuilding Hypothesis this way is. So far the answer is that it’ll be a fair bit of work but looks promising. If that promise pans out, one of the above (probably the shadow tree implementation) is likely coming to a Hypothesis near you at some point in 2017.

(Why yes, it is a programming post! I do still do those. I know these are a bit thin on the ground right now. If you want more, consider donating to my Patreon and encouraging me to write more).

This entry was posted in Hypothesis, programming, Python on by .

Declaring code bankruptcy for the rest of 2016

This is a small PSA.

It probably hasn’t been too visible from the outside, but I’ve not been doing very well recently.

In particular I’ve been finding my productivity has pretty much gone through the floor over the last couple months. 2016 is stressing me out for a whole pile of reasons (about 80% the same ones it’s stressing everyone else out), and I’m not dealing with it very well. This is making it very difficult to stay on track and motivated.

It’s not a problem when I’ve got something external to focus me (e.g. a client), but when I have to self-motivate on a programming project I find that I can’t right now, and the result is that I’m unproductive, which makes me depressed, which makes me even less productive.

So I’ve decided to stop. If I’m not getting any programming done and feeling bad about it, it’s clearly better to not get any programming done and not feel bad about it. Being depressed isn’t doing anyone any favours, let alone me. So, for the rest of the year I will not be writing any code unless someone is explicitly paying me to write it.

I currently have one client and a few potential clients, and my obligations to them will absolutely still be met (and probably be met a lot better than they would have in my previous mood). I am happy to accept new clients, but I probably won’t be actively seeking them out until the new year.

I’m also going to keep reviewing pull requests on Hypothesis and doing Hypothesis releases with other people’s stuff (there will probably be some Hypothesis releases with paid work from me too).

I’m probably also going to keep coding when it’s required to solve an immediate problem I have, and maybe when I need it to answer a question for a blog post or something.

So it’s not a complete cessation, but what it is is a freedom from a sense of obligation: If it doesn’t fall into one of these categories then I shouldn’t be doing it, and as a result I should be looking for something else to do if I don’t have any programming to do in one of those categories rather than procrastinating to avoid some vague sense of obligation to be coding.

This is going to give me a lot of time that I’m currently filling with failing to program in, so I’ll probably end up with some non-programming projects. I don’t yet know what these are going to be, but I’ve got a couple candidates:

  • When I first started working on Hypothesis I was taking a work break in which I’d intended to brush off some mathematics textbooks. This didn’t happen. It might happen this time. I started working through some of the exercises in Bollobas’s Combinatorics earlier and I’m finding it surprisingly enjoyable. I may keep this up.
  • I’ve been working on getting in shape the last couple of months. I don’t want to throw myself into this too vigorously because I’m more likely to just do myself an injury, but I’m likely to step this up at least moderately.
  • The War On Sleep must still be waged.
  • I’ve been thinking of doing NaNoWriMo, but I probably won’t.

Other than that, I don’t know. Watch this space while I found out.

This entry was posted in Hypothesis, life, Python on by .

Brand split

Short version

Expect less directly Hypothesis related content here. If you want lots of Hypothesis related content, go over to and check out the articles.

There will still be brain dumps of research, reflections on the social aspects of Hypothesis, etc. but most of the more directly expository content will be over there.

Note: If you read this from Planet Python, all of the Python specific articles will already be coming in automatically. There are a bunch of non Python related ones you may also wish to check out.

What’s going on?

Basically, I need to make money. I’m making some, but I’m not making enough. I am currently paying myself a salary (yay!) but it’s a salary that if any friend of mine was getting paid it I would tell them that their employer was screwing them over and they should fire them into the sun (note: This is not qualified by “any friend in the software industry”). Some of that is to build up liquidity for my business bank account (e.g. so I can pay contractors), some of it is just my company’s current revenue stream is not very large. Books are nice, and I’m very grateful to the small number of contributors on bountysource, but really all of the money comes from training, consulting and development contracts and right now I am not taking on even close to enough of them.

If I continue making money at the current rate then at some point later this year (Pencilled in at end of August if my current revenue doesn’t increase by then and end of 2016 if my savings don’t start going up rather than down by then) I really will have to throw in the towel and go get a day job. Ideally a part time one, and either way Hypothesis R&D will continue and I will probably still be available for contracting, training etc (and of course any existing commitments will be honoured),  but it is going to have to get downgraded to a part time thing.

What am I going to do about it?

The contracts I’ve had so far have gone very well, and both I and my customers have come away from them pretty happy. Some of them will be turning into repeat custom, though unfortunately not on a sufficiently small cycle that they’re enough to be a path to sustainability on their own yet.

The problem is not that the contracts don’t pay enough (they do) or that I don’t know what sort of services bring real value to clients, but that I am simply not getting enough of them.

Why am I not getting enough of them? It turns out that the answer to this is quite straightforward: Very few people are aware they’re an option, and most of those who are aware that it is an option aren’t really sure it’s an option they need. I think they’re wrong, but clearly my arguments to date have been insufficiently persuasive.

Well, it turns out that there are two closely related disciplines that help you with the problem “I have a great product that not enough people know about and the people who know about aren’t buying”. They are of course, marketing and sales respectively.

These are things that historically I have been completely crap at. A lot of why nobody knows this is a thing I do is because I haven’t told them.

Oh, there’s been stuff on this site, but lets be honest: This site is very obviously just some guy’s blog. If you’re someone with budget and come here you don’t see a vendor, you see some guy ranting about social dynamics or game design. The best thing for really selling people on Hypothesis has been the documentation, and as well as being very developer focused it’s also doing a rubbish job of marketing.

So as well as being a generally great place to go for Hypothesis content, the goal of is to have a single place which is just about Hypothesis and how great it is, and to make it very obvious that it is a thing you can pay for.

What can you do to help me?

The Hypothesis community is full of lovely people and I do get lots of offers of help. Unfortunately I haven’t made it easy to be helped. Some of this is because it’s intrinsically difficult – a small community of lovely people can’t and shouldn’t donate enough to support me – and some of it is just the aforementioned lack of skill at marketing that I’m working on.

If you don’t want to help, that’s OK. I won’t hate you. If you can’t help that’s definitely OK. But if you want to help, you can do some of the following…

Easy things:

  1. Donate on bounty source.
  2. Buy me books. This won’t actually help in the financial sense, but it’ll sure make me feel better :-)
  3. Buy my book. OK. You can’t actually do that yet. But watch this space.
  4. Share links to articles, both on social media and internally to your company.
  5. Slightly harder variation: Write posts about how great Hypothesis is and link to If you’d like to guest post on about how great Hypothesis is, let me know.

Hard things:

The reality is that the best way to help me is by giving me contracts. I don’t want charity, I want to exchange goods and services for currency. As an individual you’re not well placed to help with that.

As an employee (or contractor/consultant) you might be. If you think your company, or a company you know, would find either the services or the training I offer useful, talk to them about it. Also, if I can do anything to improve the site to make that job easier, get in touch and tell me about it.

This entry was posted in Admin, Hypothesis on by .