Category Archives: Hypothesis

Language reconstruction based fuzzing without reconstructing languages

Note: This idea has not been implemented yet, so it may turn out not to work. It all clicked together while I was walking around Paris this afternoon and I’m writing it down while it’s fresh in my mind. I think it will work though.

Secondary note: This post may be too inside baseball for you.

An idea that I’ve been banging my head against recently is that of reconstructing languages corresponding to tests based on some bounded data. I’ve posted a bit about its use for reducing, but I’ve also been interested in its use for fuzzing.

Here’s a problem Hypothesis experiences: The | operator is currently not associative. x | (y | z) will produce a different data distribution than (x | y) | z. This is suboptimal. I could special case it (and for a long time I thought I had, but apparently I dreamed that code), but it’s easy to come up with trivial ways to defeat that. e.g. (x | y).map(lambda x: x) | z should really have the same distribution as x | y | z but doesn’t if you special case |.
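
To make the skew concrete, here’s a toy simulation (not Hypothesis itself – it just assumes each | picks uniformly between its two operands):

import random
from collections import Counter

# Model (x | y) | z, assuming each `|` chooses uniformly between its two
# sides: z comes up half the time and x, y a quarter each, where a flat
# x | y | z would give each a third.
samples = Counter(
    random.choice([random.choice("xy"), "z"]) for _ in range(10000)
)
print(samples)  # roughly Counter({'z': 5000, 'x': 2500, 'y': 2500})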

But under the hood Hypothesis just works with streams of bytes, so in theory you can represent its data distribution as a finite state machine. You could then randomly walk the state machine, reweighting your choices by the number of reachable states, and thus get much better coverage of the set of available paths by reaching rare paths much more often.

Or, somewhat suggestively, here’s another algorithm that could work well and is a lot simpler than reweighting (there’s a sketch in code after the list):

  1. Pick a state uniformly at random
  2. Take the shortest path to that state (or, indeed, any random path to that state).
  3. Continue the random walk from there
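
As a minimal sketch, assuming we somehow already had an explicit automaton – states as integers, a transitions table, and a path_to helper, all of which are exactly what we don’t have in practice:

import random

def walk_from_random_state(states, transitions, path_to, steps=100):
    # states: list of state ids; transitions: dict state -> {byte: state};
    # path_to(state): some byte string that drives the machine to `state`.
    target = random.choice(states)       # 1. pick a state uniformly at random
    out = bytearray(path_to(target))     # 2. take a path to that state
    state = target
    for _ in range(steps):               # 3. continue the random walk from there
        options = transitions.get(state)
        if not options:
            break
        byte = random.choice(list(options))
        out.append(byte)
        state = options[byte]
    return bytes(out)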

Unfortunately we run into a major problem with either of these approaches: Reconstructing regular languages for unknown predicates is really hard.

The insight I had this afternoon is this: We don’t need to! We can use the same trick as I used to produce self-optimizing boolean expressions to just store exemplars of distinct states we’ve found. We then just pick a random state we’ve already discovered and run the generation from there (Hypothesis can extend any string to a complete but possibly invalid example).

How does this work?

We maintain three sets of data:

  1. A corpus of examples such that we know that every element of this list definitely corresponds to a different state in the state machine for our predicate.
  2. A set of previously seen strings (this doesn’t have to be precise and can e.g. be a bloom filter)
  3. A binary search tree with branches labelled by strings and leaves labelled by indices into our list of examples.

We populate this as follows:

  1. The corpus starts containing just the empty string
  2. The set of previously seen strings contains just the empty string
  3. The binary tree is a single leaf pointing to 0 (the index of the empty string).

We then fuzz as follows (sketched in code after the list):

  1. Pick a random element of our corpus.
  2. Generate an example such that the chosen element is a strict prefix of the example (it’s easy to do this in Hypothesis).
  3. For each proper prefix of that example, perform the incorporate operation.
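
In code, one fuzzing iteration might look like this (generate_extending(prefix) stands in for Hypothesis’s ability to extend any string to a complete example, and incorporate is defined next – all names here are mine, not Hypothesis’s):

import random

def fuzz_step(corpus, seen, tree, valid, generate_extending):
    base = random.choice(corpus)         # 1. pick a random corpus element
    example = generate_extending(base)   # 2. extend it to a complete example
    for i in range(len(example)):        # 3. incorporate every proper prefix
        tree = incorporate(example[:i], corpus, seen, tree,
                           valid, generate_extending)
    return tree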

The incorporate operation, when run on a string s, is (again, sketched below):

  1. If the string is in our set of seen strings, stop. Else, add it to the set.
  2. Walk the binary tree until we hit a leaf. At each branch look at the label e. If the concatenation of s followed by e is valid, walk right. Else, walk left.
  3. Once we are at a leaf, find the string in our corpus indexed by it. Call that t. By repeatedly fuzzing extensions of s and t, attempt to find e such that s + e and t + e produce different results. If we do find such an e, add s to the corpus as a new element and split the current tree node, creating a branch labelled by e whose left and right children are leaves containing s and t (which one is which depending on which way around the difference occurred). Otherwise, if s is a better example than t (shorter, or the same length and lexicographically smaller), replace t in the corpus with s.
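
Putting the pieces together, a rough sketch under the assumptions above (valid is the predicate on strings, generate_extending is as in the previous sketch, and the initial state is corpus = [b''], seen = {b''}, tree = Leaf(0)):

class Leaf:
    def __init__(self, index):
        self.index = index       # index into the corpus

class Branch:
    def __init__(self, label, left, right):
        self.label = label       # the experiment string e
        self.left = left         # subtree where prefix + e is invalid
        self.right = right       # subtree where prefix + e is valid

def incorporate(s, corpus, seen, tree, valid, generate_extending, tries=100):
    if s in seen:                # 1. skip anything we've already processed
        return tree
    seen.add(s)
    parent, node = None, tree    # 2. walk to a leaf, branching on validity
    while isinstance(node, Branch):
        parent, node = node, (node.right if valid(s + node.label)
                              else node.left)
    t = corpus[node.index]
    for _ in range(tries):       # 3. hunt for an experiment separating s and t
        e = generate_extending(s)[len(s):]
        vs, vt = valid(s + e), valid(t + e)
        if vs != vt:             # found one: s is provably a new state
            corpus.append(s)
            s_leaf, t_leaf = Leaf(len(corpus) - 1), Leaf(node.index)
            left, right = (t_leaf, s_leaf) if vs else (s_leaf, t_leaf)
            split = Branch(e, left, right)
            if parent is None:
                return split
            if parent.left is node:
                parent.left = split
            else:
                parent.right = split
            return tree
    # No distinguishing experiment found: treat s and t as the same state,
    # keeping whichever is shorter (then lexicographically smaller).
    if (len(s), s) < (len(t), t):
        corpus[node.index] = s
    return tree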

The experiments we split on witness that two nodes must be in different states (cf. the Myhill-Nerode theorem), so we guarantee that every element of our corpus corresponds to a distinct state. The fuzzing step is probabilistic, so we can sometimes discard a state inappropriately, but this shouldn’t be a major problem in practice – states that we can’t distinguish with high probability might as well be the same from a fuzzing point of view.

Anyway, like I said, it’s currently unclear how useful this is because I haven’t actually implemented it! In particular I suspect it’s more useful for long running fuzzing processes than for the sort of fast fuzzing Hypothesis tries to do. Fortunately I’m planning how to incorporate such workflows into Hypothesis. However I definitely still think it’s a good idea and I’m looking forward to exploring it further.


Delta debugging with inferred separators

Delta debugging, or something akin to it, tends to be phase one of most file based shrinkers. Shrinking is much easier the faster your test function is, and test functions tend to be faster on smaller data. Shrinking is also easier on smaller data because you’ve got fewer things to try. So anything you can do to prune down the file before you get onto the hard work is good, and delta-debugging-like algorithms do a pretty good job of that.

But delta debugging in its original form is very line oriented: You’re trying to make the file one-minimal in the sense that you can’t delete any single line. What do you do if you want to delete within lines? Or if your file is binary and not line oriented at all?

Delta debugging really works on arbitrary sequences, so you can always just do bytewise delta debugging and it will work perfectly well, but there are usually a lot more bytes than there are lines, and deleting an arbitrary block of bytes is much less likely to work than deleting a sequence of lines is. So it works, but it takes what is often a rather slow process and makes it even slower.

I spotted a trick this morning that works rather well: define the set of separators for a string to be all of its one- and two-byte substrings.

Now, for each separator, we perform delta debugging on the file split by that separator. That is, we treat the separator as if it were the newline in classic delta debugging: we split by it, then try to minimize the resulting sequence subject to the constraint that joining by that separator remains interesting.
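
As a sketch, with ddmin standing in for your favourite delta debugging implementation and interesting for the test predicate (both assumed):

def shrink_with_separator(data, sep, interesting, ddmin):
    # Treat `sep` the way classic delta debugging treats the newline:
    # split on it, minimize the sequence of components, and re-join.
    parts = data.split(sep)
    reduced = ddmin(parts, lambda ps: interesting(sep.join(ps)))
    return sep.join(reduced)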

On its own this works, but not all that well, because most separators don’t work, or work no better than shrinking bytewise. But there’s a refinement we can perform: we can give each separator a score. The score of a separator is the smallest length in bytes of any component in the split. We then process separators in decreasing order of score.

The reason this works is that it guarantees that every successful shrink deletes at least as many bytes as the separator’s score (because we always delete at least one component). This lets us focus on separators which produce large shrinks first, so by the time we get to less useful separators we should have much less to do and already be running pretty fast.
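
Computing the separators and their scores is cheap. A sketch:

def separators(data):
    # Every one- and two-byte substring of the input.
    seps = set()
    for i in range(len(data)):
        seps.add(data[i:i + 1])
        seps.add(data[i:i + 2])
    return seps

def score(data, sep):
    # The smallest component in the split: a successful shrink with this
    # separator deletes at least this many bytes.
    return min(len(part) for part in data.split(sep))

def separators_by_score(data):
    return sorted(separators(data), key=lambda s: score(data, s), reverse=True)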

So far I’ve only run this on one example (John Regehr’s example from Reducers are Fuzzers), but it works pretty well there – much better than straightforward bytewise delta debugging, certainly, and the file has a lot of reduction potential other than newlines.

In this example at least it’s particularly interesting to notice which separators seem to work well – the obvious ones like ‘ ’ and ‘\n’ do of course, but ‘{’ worked particularly well for this example, as do ‘.c’ and ‘e;’. I think this is because they are, by luck or reality, capturing some nicely exploitable facet of the structure of the example, which suggests this idea has broader application.

I’m not sure these techniques will make it into Hypothesis, as Hypothesis has some special hinting to figure out what’s best to delete that is probably already good enough, and this seems like it would be a technique that worked better for the problem of reducing slow operations on large files, but I’m still pretty pleased with it and will probably find an opportunity to use it somewhere.


Gerrymandering take two

So yesterday I wrote about how you can use Z3 to solve Gerrymandering. I was unimpressed by Z3’s performance, so I decided to rewrite it with a MILP solver instead. It took me a while to figure out a good encoding (which is why I didn’t do this in the first place), but eventually I did, and unsurprisingly the results were just infinitely better. Where Z3 struggled with 20 districts, the LP solver based approach only starts to take a couple of seconds when you ask it to solve 1000 districts. Based on past experience, I’d place bets that more than 50% of that time was PulP having a hard time rather than the underlying solver.

This very much reaffirms my prejudice that, despite most open source MILP solvers being abandonware with flaky tooling, and despite SMT solvers being the new hotness which solve vastly more problems than MILP solvers do, SMT solvers are bullshit and should be avoided at all costs, whereas MILP solvers are amazing and should be used at every opportunity.

Of course, then I was staring at the shape of the solutions and I realised that I was being ridiculous and that there is an obvious greedy algorithm that gets the correct answer 100% of the time: Starting from the smallest district and working upwards, put just enough people in that district to win it. If you don’t have enough people to win it, put the rest of your people wherever because there are no more districts you can win.
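
A sketch of that greedy algorithm, assuming “winning” a district means a strict majority and the district sizes are known up front:

def districts_winnable(district_sizes, voters):
    # Fill the smallest districts first, spending just enough voters to win
    # each; once we can't afford a majority, no larger district is winnable.
    won = 0
    for size in sorted(district_sizes):
        needed = size // 2 + 1
        if voters < needed:
            break
        voters -= needed
        won += 1
    return won

print(districts_winnable([3, 5, 9], 7))  # 2: majorities of 2 and 3 cost 5 voters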

I’ve posted some updated code with all three implementations, along with some tests that demonstrate they always produce the same answer (using Hypothesis, naturally).


Buy me books!

A thing people sometimes ask me is how they can donate to Hypothesis. Historically the answer has been that they can’t – I’d rather people pay for services, support contracts, etc. If companies want to donate I can definitely figure something out for you, but individual donations aren’t enough to make the difference between sustainability and not, and honestly I feel a little weird about them.

I probably will arrange some better funding options like this, but in general I’d much rather companies rather than individuals pay me.

But it occurred to me yesterday that a) There are a lot of books that it would be useful for me to have and b) For no obviously sensible reason, I do not feel at all weird about people buying me books. People buying me books is great.

So, to the point! I now have a public amazon wish list for people who want to say thanks for Hypothesis. You can buy me things off it if you’d like to do so. You shouldn’t feel obligated to – it’s totally fine if you can’t afford to or don’t want to – but obviously it would be super appreciated if you do.

Things to note:

  • For now at least it will be entirely books. I may add non-book things later but I’ve no real plans to.
  • I’m always up for more book recommendations to add to the list.
  • It’ll give me your name and a note (which you can add or not, as you prefer), but it won’t share either of our addresses with the other.
  • The general theme of the list is mostly going to be my research interests and other things I can reasonably claim are work related. These are quite Hypothesis centric, so buying me books from it will tend to have useful consequences for Hypothesis development, but no promises.
  • Everyone who buys me something will of course get a public thanks (unless you ask me not to).
  • Naturally, you’re welcome to buy me things off this list as thanks for things other than Hypothesis.

Somewhat to my surprise, so far it’s gone really well! Rob Smallshire has bought me Debug It. I’ve also been bought Combinatorial Optimization and Growing Object Oriented Software, but I won’t know by whom until they arrive (I only know this by process of elimination. Amazon doesn’t tell me anything until I get the book).


The easy way to get started with property based testing

There seems to be a common impression that property based testing is really for when you have code that manipulates perfect platonic objects with elegant algebraic relationships between them.

And, well, don’t get me wrong. When you do have those, property-based testing really shines. It’s an incredibly powerful way to test them, and it works really well.

But… that’s mostly because this is a realm that is incredibly easy to test compared to the complexity of getting the code right, so almost anything you can do to improve your testing will help. It’s less that that’s what property based testing is good for, and more that it’s easy to try new things when you’re playing testing on easy mode.

Here’s a recipe for getting started when you’re not playing testing on easy mode:

  1. Pick a function in your code base that you want to be better tested
  2. Call it with random data

This will possibly require you to figure out how to generate your domain objects. Hypothesis has a pretty extensive library of tools for generating custom types but, if you can, try to start somewhere where the types you need aren’t too complicated to generate.

Chances are actually pretty good that you’ll find something wrong this way if you pick a sufficiently interesting entry point. For example, there’s a long track record of people trying to test interesting properties with their text handling and getting unicode errors when text() gives them something that their code didn’t know how to handle (The astral plane: It’s not just for D&D).

You’ll probably get exceptions here you don’t care about. e.g. some arguments to functions may not be valid. Set up your test to ignore those.

So at this point you’ll have something like this:

from hypothesis import given, reject
 
@given(some(), strategies())  # placeholder strategies – swap in ones for your types
def test_some_stuff(x, y):
    try:
        my_function(x, y)
    except (Ignorable, Exceptions):  # the exceptions you've decided to ignore
        reject()

(reject simply filters out the example – you’re trying to find a large number of examples that don’t raise any of those exceptions).

This is already a pretty good starting point and does have a decent tendency to flush out bugs. You’ll often find cases where you forgot some boundary condition and your code misbehaves as a result. But there’s still plenty of room to improve.

There are now two directions you can go in from here:

  1. Try to assert some things about the function’s result. Anything at all. What type is it? Can it be None? Does it have any relation at all to the input values that you can check? It doesn’t have to be clever – even very trivial properties are useful here (there’s a sketch after this list).
  2. Start making your code more defensive
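
For the first of these, even something this trivial pays off. A sketch, where normalize is a stand-in for whatever function you’re testing:

from hypothesis import given, strategies as st

def normalize(s):
    # Stand-in for your real function; yours will be more interesting.
    return " ".join(s.split())

@given(st.text())
def test_normalize_has_sane_shape(s):
    result = normalize(s)
    assert result is not None
    assert isinstance(result, str)
    assert normalize(result) == result  # idempotence, if you expect it to hold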

The second part is probably the most productive one.

The goal is to turn faulty assumptions in your code into crashes instead of silent corruption of your application state. You can do this in a couple of ways:

  1. Add argument checking to your functions (Hypothesis uses a dedicated InvalidArgumentException for this case, but you can raise whatever errors you find appropriate)
  2. Add a whole bunch of assertions into your code itself

Even when it’s hard to reason about formal properties of your code, it’s usually relatively easy to add local properties, and assertions are a great way to encode them. I like this post by John Regehr on the subject.
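
For instance, a sketch of both steps on a made-up function (the function and its invariants are illustrative, not from any real codebase):

def reserve(buffer_size, item_size):
    # 1. Check arguments: turn bad input into an immediate, informative error.
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive, got %r" % (buffer_size,))
    if item_size <= 0:
        raise ValueError("item_size must be positive, got %r" % (item_size,))
    count = buffer_size // item_size
    # 2. Assert local properties: cheap to state, and they crash loudly at the
    # point of the faulty assumption instead of corrupting state later.
    assert count * item_size <= buffer_size
    assert buffer_size - count * item_size < item_size
    return count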

This approach will make your code more robust even if you don’t find any bugs in it during testing (and you’ll probably find bugs in it during testing), and it gives you a nice easy route into property based testing by letting you focus on only one half of the problem at a time.

And, of course, if you’re still having trouble getting started with property-based testing, the other easy way is to persuade your company to hire me for a training course. Drop me an email if you’re interested.
