
Using z3 to try to solve Feedback Arc Set for Tournaments

I’ve been meaning to check out Z3 for a while – it’s an open source theorem prover from Microsoft with great-looking Python bindings. I’m partly interested just out of pure curiosity, partly because there are some interesting potential applications for Hypothesis (it’s currently unclear to me how possible those are – the architecture of Hypothesis is not well set up for concolic testing, nor are the semantics of Python, but still).

Anyway, I thought I’d turn it loose on my favourite evil problem, feedback arc set for tournaments. It’s at that weird boundary where what you want is sort of a SAT solver and sort of an integer linear programming solver. There’s some suggestion that using hybrid approaches would be better than either, but I haven’t really found any easily usable hybrid solvers (there’s a fork of minisat that might work. It’s on my TODO list).

Anyway, basically what we do with z3 is that we create the initial problem, ask it for a feasible solution, then successively ask it to improve on that feasible solution until it no longer can. Here’s the code:
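
Here’s a minimal sketch of that loop with the z3 Python bindings; the tournament representation (an adjacency matrix of booleans) and the function name are my assumptions, not the code from the original gist:

```python
from z3 import And, Bool, If, Implies, Int, Solver, Sum, is_true, sat


def minimum_feedback_arc_set(tournament):
    """tournament[i][j] is True iff the edge between i and j points from i to j."""
    n = len(tournament)
    solver = Solver()

    # before[i][j] means vertex i comes before vertex j in the ordering.
    before = [[Bool("before_%d_%d" % (i, j)) for j in range(n)] for i in range(n)]

    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Exactly one of "i before j" and "j before i" holds.
            solver.add(before[i][j] != before[j][i])
            # Transitivity, which is where the O(n^3) constraints come from.
            for k in range(n):
                if k not in (i, j):
                    solver.add(Implies(And(before[i][j], before[j][k]), before[i][k]))

    # The score is the number of edges that point backwards in the ordering.
    score = Int("score")
    solver.add(score == Sum([
        If(before[j][i], 1, 0)
        for i in range(n) for j in range(n)
        if i != j and tournament[i][j]
    ]))

    # Ask for any feasible solution, then keep demanding a strictly better
    # one until z3 reports that there isn't one.
    best = None
    while solver.check() == sat:
        best = solver.model()
        solver.add(score < best[score])

    # Recover the ordering: a vertex that precedes more vertices comes earlier.
    order = sorted(range(n), key=lambda i: -sum(
        is_true(best[before[i][j]]) for j in range(n) if j != i))
    return order, best[score]
```

(More recent versions of z3 also ship an Optimize class that can minimise the score directly, which would replace the manual improvement loop, but the loop above matches the approach described here.)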

The good: z3 was extremely easy to use. The API was pleasant, reasonably intuitive, and basically did what I expected.

The bad: This solution is strictly worse than a solution which just generates a massive LP file and hands it off to lp_solve. So although it was very easy to solve this problem with z3, z3 appears not to be well suited to it. Z3 struggled at n = 15 and seemed to get stuck at n = 20.

This shouldn’t necessarily be considered indicative that Z3 is “bad”. This is a problem with a lot of constraints (O(n^3)), so it’s probably not representative of the sort of things Z3 is good at, and it is representative of the sort of things that lp_solve is good at. So I’m not really deriving any lessons from the fact that Z3 wasn’t very good at this problem. I’m more interested in the fact that it was super easy to use, so I really should look at using it more in future.


Properties for testing optimisation

I’ve been thinking a bunch recently about how to use Hypothesis to test constrained optimisation problems. I put together this talk for PyData London, and yesterday I sketched out these tests for how you might test a solution to the knapsack problem (both of these are specifically combinatorial optimisation problems, but the principles I’m outlining here work in general).

Essentially there are only two properties to constrained optimisation problems:

  1. For all input problems, the optimised solution satisfies the constraints
  2. For all input problems and solutions satisfying the constraints of those problems, the solution does not score better than the optimised solution.

The first property is usable as is – you write a generator for the problems you’re trying to solve, write a test that runs the optimiser on the problem and spits out a solution, and check that the solution satisfies the constraints. This is a super useful test for the correctness of your optimiser and you should definitely write it.
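
As a sketch of what that looks like for the knapsack problem (solve_knapsack is a hypothetical optimiser under test, and representing items as (weight, value) pairs is an assumption; only the Hypothesis calls are real):

```python
from hypothesis import given
from hypothesis import strategies as st

# Items are (weight, value) pairs.
items = st.lists(
    st.tuples(st.integers(min_value=1, max_value=100),   # weight
              st.integers(min_value=0, max_value=100)),  # value
    min_size=1,
)


@given(items=items, capacity=st.integers(min_value=1, max_value=200))
def test_solution_satisfies_constraints(items, capacity):
    solution = solve_knapsack(items, capacity)  # hypothetical optimiser
    # Every chosen item must come from the problem...
    assert all(item in items for item in solution)
    # ...and the chosen items must fit in the knapsack.
    assert sum(weight for weight, _ in solution) <= capacity
```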

But really the one we’re interested in is the second property. We want to assert that the solution our optimiser provides us with is actually optimal.

The problem is that fuzz testers are not optimisers. If your optimal solution is worse than simply picking a point at random, basically any sort of test will demonstrate this; but you can’t reliably expect Hypothesis or similar to find an improvement just by asking for arbitrary data.

If we have a reference solution (e.g. by restricting to a sufficiently small corner of the space that we can solve the problem through brute force) then that’s one way of doing it: just run the problem through both your optimiser and the reference solution and assert that they produce solutions with the same score (note: not necessarily the same solution – there may be multiple equally good optima). This also works if you have a reference solution that isn’t very good – if you implement, say, a cheap local optimiser and assert that running the local optimiser on a randomly picked point doesn’t improve on your optimal solution, this is still a pretty good test.
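
For the knapsack problem that might look like this, with a brute-force reference that tries every subset, so the generated problems are kept small (solve_knapsack is the same hypothetical optimiser as above):

```python
from itertools import combinations

from hypothesis import given
from hypothesis import strategies as st


def total_value(solution):
    return sum(value for _, value in solution)


def brute_force_knapsack(items, capacity):
    # Try every subset of items; only viable because the inputs are tiny.
    best = []
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            if sum(weight for weight, _ in subset) <= capacity:
                if total_value(subset) > total_value(best):
                    best = list(subset)
    return best


small_items = st.lists(
    st.tuples(st.integers(min_value=1, max_value=20),    # weight
              st.integers(min_value=0, max_value=20)),   # value
    min_size=1, max_size=8,
)


@given(items=small_items, capacity=st.integers(min_value=1, max_value=40))
def test_matches_brute_force_score(items, capacity):
    optimised = solve_knapsack(items, capacity)  # hypothetical optimiser
    reference = brute_force_knapsack(items, capacity)
    # Same score, not necessarily the same set of items.
    assert total_value(optimised) == total_value(reference)
```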

But I’m interested in testing the case where we don’t have a reference solution. How would we do that?

Without further ado, here are some properties I’ve figured out that seem to work well:

  1. If we take the optimal solution and perturb it slightly, this should never improve on the optimal solution. Bonus: If you have a local optimiser, perturb the solution slightly and then run the local optimiser on that perturbed solution.
  2. If we make the problem strictly easier (e.g. in the knapsack problem by raising the value of one of the items we’ve already chosen) then this should strictly improve the score if we rerun the optimiser.
  3. If we make the problem strictly harder (e.g. by removing one of the chosen items in the knapsack) and rerun the optimiser, this should never improve the score.

The last seems the most promising to me, because it helps create situations where you break out of local optima. You potentially kill off the local optima you’ve found yourself in and see if you can find a better one. Notably, both of the tests that demonstrated a problem in the knapsack algorithm (increasing the weight of a chosen item, removing a chosen item) are of this form.
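
A sketch of that third property for the knapsack problem, reusing the hypothetical solve_knapsack, total_value, and items strategy from the sketches above:

```python
from hypothesis import assume, given
from hypothesis import strategies as st


@given(items=items, capacity=st.integers(min_value=1, max_value=200))
def test_removing_a_chosen_item_never_improves_the_score(items, capacity):
    solution = solve_knapsack(items, capacity)
    assume(solution)  # only interesting when at least one item was chosen
    harder = list(items)
    harder.remove(solution[0])  # drop one of the chosen items from the problem
    # Shrinking the feasible set can never raise the optimal score.
    assert total_value(solve_knapsack(harder, capacity)) <= total_value(solution)
```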

The first is also useful, particularly for testing a local optimiser, because it helps you find new things you can add to the optimisation process! For example, if you find that reversing the solution and rerunning the optimiser on the reversed solution sometimes improves matters, why not iterate that to a fixed point in your optimisation process?

The middle property is, I suspect, not that useful, because in general the things that make the problem easier do so by promoting stuff that was already chosen, so the optimiser will tend to make the same choice. I might be missing cases where this is actually a useful thing to be doing. Edit: I was wrong, because I’d got the form of the property slightly wrong. What you want to be doing is not “make the current solution better”, it’s “keep the current solution possible but add something in that you could use instead”. In the knapsack problem, you can do this by taking an item from the current solution and cloning it but adding one to the value. This means that the current solution is still usable, but there is definitely a solution with a higher score.
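
Here’s a sketch of that corrected form of the property, again with the same hypothetical stand-ins:

```python
from hypothesis import assume, given
from hypothesis import strategies as st


@given(items=items, capacity=st.integers(min_value=1, max_value=200))
def test_cloning_a_chosen_item_with_higher_value_improves_the_score(items, capacity):
    solution = solve_knapsack(items, capacity)
    assume(solution)
    weight, value = solution[0]
    # The old solution is still feasible, but swapping in the clone beats it.
    easier = list(items) + [(weight, value + 1)]
    assert total_value(solve_knapsack(easier, capacity)) > total_value(solution)
```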

I’m going to keep an eye on this, as it seems to be an under-explored area (not completely unexplored – there’s some research on doing more classic fuzz testing on SMT solvers that I need to read up on) and I think it might inform useful heuristics for other places where you can do property based testing (machine learning for example is something that I’m interested in figuring out how to do).


Want to take a weekend to write tests?

I’d like to get more people using Hypothesis, and I’d like to raise the general quality of open source software. As a bonus, let’s see if we can help out people who wouldn’t normally contribute to open source. So, here’s the plan!

  1. Some number of us gather together in London one weekend.
  2. The rest of you pair up and work on writing tests for open source software using Hypothesis, and fixing any bugs you find.
  3. I act in a supervisory and teaching role, running around helping people out when they get stuck or need some advice.

(Obviously there will be a code of conduct for the event. I will probably continue my habit of using Recurse Center social rules despite never having attended Recurse Center).

If this sounds like your cup of tea, coffee, or other beverage of your choice, please put your name down and I’ll keep you apprised of any further developments.


Hypothesis 1.7.1 is out

(Note: I’ve realised that this blog has a much higher number of interested readers than the mailing list does, so I’m going to start mirroring announcements here)

As of this past Monday, Hypothesis 1.7.1 (Codename: There is no Hypothesis 1.7.0) is out.

The main feature this release adds is Python 2.6 support. Thanks hugely to Jeff Meadows for doing most of the work for getting this in.

Other features:

  • Strategies now has a permutations() function which returns a strategy yielding permutations of values from a given collection (see the sketch after this list).
  • If you have a flaky test, it will print the exception that it last saw before failing with Flaky, even if you do not have verbose reporting on.
  • A slightly experimental git merge script is available as “python -m hypothesis.tools.mergedbs”. Instructions on how to use it are in the docstring of that file.
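
A minimal usage sketch of the new permutations() strategy (the test body is just an illustration):

```python
from hypothesis import given
from hypothesis import strategies as st


@given(st.permutations([1, 2, 3, 4]))
def test_sorting_a_permutation_recovers_the_original(perm):
    assert sorted(perm) == [1, 2, 3, 4]
```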

This also contains two important counting-related bug fixes:

  • floats() with a negative min_value would not have worked correctly (worryingly, it would have just silently failed to run any examples). This is now fixed.
  • tests using sampled_from would error if the number of sampled elements was smaller than min_satisfying_examples.

It also contains some changes to filtering that should improve performance and reliability in cases where you’re filtering by hard-to-satisfy conditions (although they could also hurt performance simply by virtue of enabling Hypothesis to find more examples and thus running your test more times!).

This should be a pretty safe upgrade, and given the counting bugs I would strongly encourage you to do so.


Back-pedalling on Python 2.6

I wrote a while back that I wouldn’t support Python 2.6 unless someone paid me to. I still think this is an eminently reasonable position which I encourage most people to take.

Unfortunately I am not going to be taking it.

The problem is that I am in the course of trying to put together a business around Hypothesis, offering consulting and training around using it. This is a much easier sell to large companies, and it’s a much easier sell to companies who are already using it at least somewhat.

Unfortunately, not supporting Python 2.6 out of the box means that many large companies who might otherwise become customers cannot start using Hypothesis until they’ve paid me money, which is a much harder sell. This way, in order to get a company to start using Hypothesis, I only need to convince developers that it’s great, which is easy because it is.

If it had been hard to support Python 2.6 I probably still wouldn’t have done so until there was some serious demand for it, but fortunately Jeff Meadows was extremely kind and did most of the hard work. It turned out to be less hard work than I thought – I think the last time I tried there was still a bunch of strange behaviour that was hard to port which I have since removed – but it was nevertheless a highly appreciated contribution.

So, at some point in the next few days we should be seeing a new minor version of Hypothesis with support for 2.6, as well as a few other minor bug fixes and new features.

If you’re just maintaining a small open source library in your spare time, I would still encourage you to not support 2.6 without a really strong financial incentive to do so, but I think for now it sadly makes sense for me to add the support for free.
