Author Archives: david

Shrinking failing input using a SAT solver

Advance warning: This is mostly me writing up a negative to neutral result. The technique described in this post works OK, but it doesn’t seem to work better than well optimized delta debugging, and as such doesn’t currently justify its complexity. It also doesn’t work very well when you start from large examples, but fortunately that’s not a huge problem given the ideas I outlined in my last post: you can work with smaller sequences until the example is small enough to be feasible to approach bytewise.

My hope was that it would give a system that was much better at finding longer intervals to delete (delta debugging only guarantees that you cannot delete any single byte; guaranteeing that you cannot delete any interval provably requires a quadratic number of tests to check). I was hoping that this would be able to exploit structure found in the string to find intervals that pure delta debugging would have missed.

I’ve got some ideas I want to pursue to extend it that might make it worth it, but this post just describes the current state of work in progress. Also I might end up incorporating it into my shrinker anyway, in the hope that I can improve how it works with more cleverness later (I already have a prototype that does this).

Anyway, disclaimers over, on to the idea.

We have a string \(S\) and a predicate \(f\) with \(f(S)\) true, and we are trying to construct a string \(T\) of minimal length such that \(f(T)\) is true.

This builds on my previous idea of using regular language induction to aid shrinking. The version I described in that post appears to be too slow to make work.

The core idea is still that we are looking for a DFA with fewer than \(|S|\) nodes such that every string it accepts satisfies \(f\). If we can find such a DFA then we can shrink \(S\), because the shortest path to an accepting node never visits a node twice, so the string it spells out has length at most the number of nodes, which is fewer than \(|S|\) by assumption.

The idea is to try to manufacture such a DFA from \(S\) using a SAT solver (disclaimer: I actually used the naive encoding that the published work on this aimed to replace, because I didn’t quite understand their approach and wanted to prototype something quickly) to colour the indices of \(S\) in a manner consistent with the transitions observed in \(S\): if we colour the indices \(i\) and \(j\) the same and \(S[i] = S[j]\), then we must colour \(i + 1\) and \(j + 1\) the same.

We maintain a set of pairs of indices which we know do not correspond to the same state in the DFA. We initialize it to \(\{(0, |S|)\}\) at the start and will add to it as we go.

We now produce a colouring consistent with that set and the transition requirement.
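Concretely, the colouring step can be phrased as a SAT problem. Here is a sketch of the naive encoding using the python-sat package (I drove minisat directly, and the function and variable names here are mine): one boolean variable per (index, colour) pair, exactly-one-colour clauses per index, difference clauses for the known-distinct pairs, and the transition consistency clauses. In practice you would try increasing values of \(k\) until this becomes satisfiable.

from pysat.solvers import Minisat22

def colour_indices(s, distinct, k):
    # Try to colour indices 0..len(s) with colours 0..k-1, consistent
    # with the known-distinct pairs and the transitions observed in s.
    n = len(s) + 1

    def var(i, c):
        # SAT variable (1-based) asserting "index i has colour c".
        return i * k + c + 1

    clauses = []
    for i in range(n):
        clauses.append([var(i, c) for c in range(k)])  # at least one colour
        for c in range(k):
            for d in range(c + 1, k):
                clauses.append([-var(i, c), -var(i, d)])  # at most one colour
    for i, j in distinct:
        for c in range(k):
            clauses.append([-var(i, c), -var(j, c)])  # known-distinct pairs
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            if s[i] == s[j]:
                # Same colour and same byte forces the same successor colour.
                for c in range(k):
                    for d in range(k):
                        clauses.append([-var(i, c), -var(j, c),
                                        -var(i + 1, d), var(j + 1, d)])
    with Minisat22(bootstrap_with=clauses) as solver:
        if not solver.solve():
            return None
        model = set(solver.get_model())
        return [next(c for c in range(k) if var(i, c) in model)
                for i in range(n)]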

If that colouring gives a unique colour to every index, stop: there is no DFA with fewer than \(|S| + 1\) nodes consistent with \(S\) and the inconsistencies we have recorded, so we cannot shrink this way.

Otherwise, produce the quotient DFA and look at the shortest path to an accepting node. If the string it corresponds to satisfies \(f\), we have shrunk the string: start again from the beginning with the new string (this usually doesn’t work, but it’s worth checking).
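Extracting that candidate string from the colouring is a matter of building the quotient DFA and breadth-first searching it. A sketch (again, names are mine):

from collections import deque

def shortest_accepted(s, colouring):
    # States are colours; index i steps to index i + 1 on byte s[i].
    transitions = {}
    for i in range(len(s)):
        transitions.setdefault(colouring[i], {})[s[i]] = colouring[i + 1]
    start, accepting = colouring[0], colouring[len(s)]
    # Breadth-first search finds the shortest string reaching acceptance.
    strings = {start: b""}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == accepting:
            return strings[state]
        for byte, target in sorted(transitions.get(state, {}).items()):
            if target not in strings:
                strings[target] = strings[state] + bytes([byte])
                queue.append(target)
    return None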

If that didn’t work, we now either produce a strictly smaller string satisfying \(f\) or increase the size of our set of known inconsistencies, as follows:

For each colour, we look at the smallest and largest indices with that colour, \(i\) and \(j\). If deleting the interval \([i, j)\) from the string produces a string satisfying \(f\), then we have shrunk the string: start again from the beginning with the new string. Otherwise, we have proved that the prefixes of \(S\) up to \(i\) and \(j\) respectively must live in distinct states (because following the suffix of \(S\) starting at \(j\) from each of them produces different results), so we add \((i, j)\) to the set of inconsistencies. We do this for each colour, starting from the ones which produce the largest gap so that we maximize how much we shrink by, and if none of them produced a shrink we try colouring again.
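In code, a single round of this looks something like the following (a sketch, with interesting playing the role of \(f\)):

def try_colour_deletions(s, colouring, distinct, interesting):
    # Group indices by colour and try deleting between the extremes of
    # each group, widest gap first; record failures as inconsistencies.
    groups = {}
    for index, colour in enumerate(colouring):
        groups.setdefault(colour, []).append(index)
    pairs = [(min(g), max(g)) for g in groups.values() if len(g) > 1]
    for i, j in sorted(pairs, key=lambda p: p[1] - p[0], reverse=True):
        candidate = s[:i] + s[j:]  # delete the interval [i, j)
        if interesting(candidate):
            return candidate  # shrunk; the caller starts over
        distinct.add((i, j))
    return None  # no shrink, but distinct has grown, so recolour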

Eventually we’ll either hit the point where we need all \(|S| + 1\) colours to colour the indices, or we’ll have shrunk the string.

Note: I think this doesn’t actually guarantee even 1-minimality like delta debugging does. The problem is that you might have an index that is not equivalent to its successor index even though you can delete the character there. I’m not currently sure if this algorithm can witness that possibility or not.

Anyway, I’ve been running the example I used to prototype structureshrink (the C++ hello world) with this as the list minimization algorithm. It seems to do alright, though it’s nothing to write home about. Still, it’s been an interesting experiment, and I’m glad to have had the excuse to play with minisat.

This entry was posted in programming, Python.

Delta debugging with inferred separators

Delta debugging, or something akin to it, tends to be phase one of most file based shrinkers: shrinking is much easier the faster your test function is, test functions tend to be faster on smaller data, and shrinking is also easier on smaller data because you’ve got fewer things to try. So anything you can do to prune down the file before you get on to the hard work is a win, and delta debugging like algorithms do a pretty good job of that.

But delta debugging in its original form is very line oriented: You’re trying to make the file one-minimal in the sense that you can’t delete any single line. What do you do if you want to delete within lines? Or if your file is binary and not line oriented at all?

Delta debugging really works on arbitrary sequences, so you can always just do bytewise delta debugging and it will work perfectly well, but there are usually a lot more bytes than there are lines, and deleting an arbitrary block of bytes is much less likely to work than deleting a sequence of lines is. So it works, but it takes what is often a rather slow process and makes it even slower.

I spotted a trick this morning that works rather well. Define the set of separators for a string as the set of all of its one or two byte substrings.
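Computing the separator set is trivial. A sketch for byte strings (the name separators is mine):

def separators(data):
    # All distinct substrings of length one or two.
    result = set()
    for i in range(len(data)):
        result.add(data[i:i + 1])
        if i + 1 < len(data):
            result.add(data[i:i + 2])
    return result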

Now for each separator, we perform delta debugging on the file split by that separator, i.e. we treat it as if it were the newline in classic delta debugging: we split by the separator, then try to minimize the resulting sequence of components subject to the constraint that joining them back together with that separator remains interesting.

On its own this works, but not all that well, because most separators either don’t work at all or work no better than shrinking bytewise. But there’s a refinement we can perform: we give each separator a score, defined as the smallest length in bytes of any component in the split, and then process separators in decreasing order of score.

The reason this works is that it guarantees that on a successful shrink we always delete at least that many bytes (because we always delete at least one component). This lets us focus on the separators which produce large shrinks first, so by the time we get to the less useful separators we should have much less left to do and already be running pretty fast.
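Putting it together, the whole pass looks roughly like this (a sketch: ddmin stands in for whatever sequence minimizer you’re using, interesting is the test predicate, and separators is defined above):

def shrink_with_separators(data, interesting, ddmin):
    # Score a separator by the shortest component it splits off: any
    # successful shrink with it must delete at least that many bytes.
    def score(sep):
        return min(len(part) for part in data.split(sep))
    for sep in sorted(separators(data), key=score, reverse=True):
        parts = data.split(sep)
        if len(parts) > 1:
            # Minimize the sequence of components, subject to the
            # constraint that rejoining them stays interesting.
            parts = ddmin(parts, lambda ps: interesting(sep.join(ps)))
            data = sep.join(parts)
    return data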

So far I’ve only run this on one example (John Regehr’s from Reducers are Fuzzers), but it works pretty well there: certainly much better than straightforward bytewise delta debugging, and the file has a lot of reduction potential other than newlines.

In this example at least it’s particularly interesting to notice which separators seem to work well: the obvious ones like ‘ ’ and ‘\n’ do of course, but ‘{’ worked particularly well for this example, as did ‘.c’ and ‘e;’. I think this is because, whether by luck or by genuinely reflecting the structure of the example, they capture some nicely exploitable facet of it, which suggests this idea has broader application.

I’m not sure these techniques will make it into Hypothesis, as Hypothesis has some special hinting to figure out what’s best to delete that is probably already good enough, and this seems like a technique better suited to the problem of reducing slow operations on large files. But I’m still pretty pleased with it and will probably find an opportunity to use it somewhere.

This entry was posted in Hypothesis, programming, Python.

Gerrymandering take two

So yesterday I wrote about how you can use Z3 to solve gerrymandering. I was unimpressed by Z3’s performance, so I decided to rewrite it with a MILP solver instead. It took me a while to figure out a good encoding (which is why I didn’t do this in the first place), but eventually I did, and unsurprisingly the results were just infinitely better: instead of struggling with 20 districts, the MILP based approach only starts to take a couple of seconds when you ask it to solve 1000 districts. Based on past experience, I’d place bets that more than 50% of that time was PuLP having a hard time rather than the underlying solver.
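One encoding that works is along these lines (a sketch in PuLP; max_seats and the variable names are mine, and the details may differ from what I actually wrote): an integer variable for how many supporters you place in each district, a binary variable for whether you win it, and a constraint tying the two together.

from pulp import LpMaximize, LpProblem, LpVariable, lpSum, value

def max_seats(populations, supporters):
    # Maximize the number of districts won by placing a fixed number
    # of supporters across districts of the given sizes.
    prob = LpProblem("gerrymander", LpMaximize)
    placed = [LpVariable("placed_%d" % i, 0, pop, cat="Integer")
              for i, pop in enumerate(populations)]
    won = [LpVariable("won_%d" % i, cat="Binary")
           for i in range(len(populations))]
    prob += lpSum(won)  # objective: number of districts won
    prob += lpSum(placed) == supporters
    for p, w, pop in zip(placed, won, populations):
        # Winning a district requires a strict majority there.
        prob += p >= (pop // 2 + 1) * w
    prob.solve()
    return int(value(prob.objective))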

This very much reaffirms my prejudice that, despite most open source MILP solvers being abandonware with flaky tooling, and despite SMT solvers being the new hotness which solve vastly more kinds of problems than MILP solvers do, SMT solvers are bullshit and should be avoided at all costs, whereas MILP solvers are amazing and should be used at every opportunity.

Of course, then I was staring at the shape of the solutions and I realised that I was being ridiculous and that there is an obvious greedy algorithm that gets the correct answer 100% of the time: Starting from the smallest district and working upwards, put just enough people in that district to win it. If you don’t have enough people to win it, put the rest of your people wherever because there are no more districts you can win.
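That greedy algorithm is short enough to state in full (a sketch; the name is mine):

def max_seats_greedy(populations, supporters):
    # Buy the cheapest districts first, paying a bare majority each time.
    seats = 0
    for pop in sorted(populations):
        needed = pop // 2 + 1  # strict majority of the district
        if supporters < needed:
            break
        supporters -= needed
        seats += 1
    return seats

# e.g. max_seats_greedy([3, 3, 3], 4) == 2: your opponents' four
# supporters, placed greedily, take two of the three seats.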

I’ve posted some updated code with all three implementations, along with some tests that demonstrate they always produce the same answer (using Hypothesis, naturally).

This entry was posted in Hypothesis, Python, voting.

Optimal gerrymandering with Z3

Edit to add: See next post for why this was a silly thing to do.

As some of you might recall from before I spent a year writing about Hypothesis, one of my other interests is voting theory. I may be doing a talk about it at some point soon, and have been reading up on it.

I was thinking about Gerrymandering, and I was wondering what the best (worst) possible degrees of gerrymandering were.

For example, suppose you’ve got a population of 9 split up into 3 districts of 3. If you have 6 people out of 9 who want to vote blue, it’s easy to get all three seats as blue (you put 2 in each district), and it’s easy to get only two (put all 6 of your supporters into two districts), but can you be gerrymandered down to only one? It seems implausible, but how would we prove that?

Well, as you probably guessed from the title, it turns out this is one of those things where rather than proving it by hand we can just ask a computer to do it for us! (It’s actually quite easy to prove this specific case, but I was interested in more general ones).

Here’s a Z3 program to work out the answer:
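In sketch form, the heart of it is a satisfiability check like the following (the full program wraps this in a binary search, and can_win and the variable names here are illustrative, not the original code):

from z3 import If, Int, Solver, Sum, sat

def can_win(populations, supporters, k):
    # Can `supporters` voters be arranged across districts of the
    # given sizes so that their party wins at least k seats?
    solver = Solver()
    votes = [Int("votes_%d" % i) for i in range(len(populations))]
    for v, pop in zip(votes, populations):
        solver.add(v >= 0, v <= pop)
    solver.add(Sum(votes) == supporters)
    # A district is won by a strict majority of its population.
    seats = Sum([If(2 * v > pop, 1, 0)
                 for v, pop in zip(votes, populations)])
    solver.add(seats >= k)
    return solver.check() == sat

Binary searching on \(k\) against this check then finds the best achievable value.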

It takes a list of districts with their populations and a total number of people who support a party, and tries to arrange those people in a way that maximizes the number of seats that party gets.

Running this, the answer is that if you have 6 people then you’re guaranteed at least two seats, but if you have 5 then you can be gerrymandered down to only one seat, with the opposition claiming two (your opponents split their four across two districts, so the three districts end up with 3, 1 and 1 of your supporters respectively).

It only supports two-party systems, though it wouldn’t be too difficult to expand it to a larger number of parties. It also doesn’t work very well for large (or even medium, really) numbers of districts: it handles 20 districts in about 10 seconds, but the time seems to go up extremely nonlinearly (it handles 10 in about 20 milliseconds) and it doesn’t seem inclined to ever complete with 30 districts. This might be fixable if I understood more about performance tuning Z3.

It may be possible to improve this using Z3’s Optimize functionality, but when I tried to use Optimize I segfaulted my Python interpreter, so I decided to steer clear (stuff like this is why I’ve yet to seriously consider Z3 as a backend for anything interesting, unfortunately). Instead this just binary searches on the number of seats, checking satisfiability of each constraint, to find the best value.

This entry was posted in Python, voting.

Buy me books!

A thing people sometimes ask me is how they can donate to Hypothesis. Historically the answer has been that they can’t – I’d rather people pay for services, support contracts, etc. If companies want to donate I can definitely figure something out for you, but individual donations are not enough to make the difference between sustainability and not, and honestly I also feel a little weird about them.

I probably will arrange some better funding options like this, but in general I’d much rather have companies pay me than individuals.

But it occurred to me yesterday that a) there are a lot of books that it would be useful for me to have, and b) for no obviously sensible reason, I do not feel at all weird about people buying me books. People buying me books is great.

So, to the point! I now have a public amazon wish list for people who want to say thanks for Hypothesis. You can buy me things off it if you’d like to do so. You shouldn’t feel obligated to – it’s totally fine if you can’t afford to or don’t want to – but obviously it would be super appreciated if you do.

Things to note:

  • For now at least it will be entirely books. I may add non-book things later but I’ve no real plans to.
  • I’m always up for more book recommendations to add to the list.
  • It’ll give me your name and a note (which you can add or not, as you prefer), but it won’t share either of our addresses with the other.
  • The general theme of the list is mostly going to be my research interests and other things I can reasonably claim are work related. These are quite Hypothesis centric, so buying me books from it will tend to have useful consequences for Hypothesis development, but no promises.
  • Everyone who buys me something will of course get a public thanks (unless you ask me not to).
  • Naturally, you’re welcome to buy me things off this list as thanks for things other than Hypothesis.

Somewhat to my surprise, so far it’s gone really well! Rob Smallshire has bought me Debug It. I’ve also been bought Combinatorial Optimization and Growing Object Oriented Software, but I won’t know by whom until they arrive (I only know this by process of elimination. Amazon doesn’t tell me anything until I get the book).

This entry was posted in Hypothesis, programming, Python, Uncategorized.