
Properties for testing optimisation

I’ve been thinking a bunch recently about how to use Hypothesis to test constrained optimisation problems. I put together this talk for PyData London, and yesterday I sketched out these tests for how you might test a solution to the knapsack problem (both of these are specifically combinatorial optimisation problems, but the principles I’m outlining here work in general).

Essentially there are only two properties to constrained optimisation problems:

  1. For all input problems, the optimised solution satisfies the constraints
  2. For all input problems and solutions satisfying the constraints of those problems, the solution does not score better than the optimised solution.

The first property is usable as is – you write a generator for the problems you’re trying to solve, write a test that runs the optimiser on the problem and spits out a solution, and check that the solution satisfies the constraints. This is a super useful test for the correctness of your optimiser and you should definitely write it.
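To make that concrete, here’s a minimal sketch of what such a test might look like. solve_knapsack(items, capacity) is a hypothetical stand-in for whatever your optimiser exposes, and representing items as (value, weight) pairs is an assumption for illustration, not the interface from the linked tests:

    from hypothesis import given
    from hypothesis import strategies as st

    # Items are (value, weight) pairs; both the representation and the
    # solve_knapsack(items, capacity) interface are assumptions for this sketch.
    knapsack_items = st.lists(
        st.tuples(st.integers(min_value=1, max_value=100),   # value
                  st.integers(min_value=1, max_value=100)),  # weight
        min_size=1)

    @given(items=knapsack_items, capacity=st.integers(min_value=1, max_value=500))
    def test_solution_satisfies_constraints(items, capacity):
        solution = solve_knapsack(items, capacity)
        # Property 1: the chosen items must fit within the capacity.
        assert sum(weight for value, weight in solution) <= capacity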

But really the one we’re interested in is the second property. We want to assert that the solution our optimiser provides us with is actually optimal.

The problem is that fuzz testers are not optimisers: unless your “optimal” solution is actually worse than simply picking a point at random, you can’t reliably expect Hypothesis or similar to find an improvement just by asking for arbitrary data.

If we have a reference solution (e.g. by restricting to a sufficiently small corner of the space that we can solve the problem by brute force) then that’s one way of doing it: just run the problem through both your optimiser and the reference solution and assert that they produce solutions with the same score (note: not necessarily the same solution – there may be multiple equally good optima). This also works if you have a reference solution that isn’t very good – if you implement, say, a cheap local optimiser and assert that running the local optimiser on a randomly picked point doesn’t improve on your optimal solution, this is still a pretty good test.
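For the knapsack problem the brute force reference is easy enough to sketch. Again, solve_knapsack and the (value, weight) representation are hypothetical stand-ins for your actual code:

    from itertools import combinations

    from hypothesis import given
    from hypothesis import strategies as st

    def brute_force_best_score(items, capacity):
        # Reference: enumerate every subset and keep the best feasible score.
        # Only viable for tiny problems, which is exactly the point.
        best = 0
        for r in range(len(items) + 1):
            for subset in combinations(items, r):
                if sum(w for v, w in subset) <= capacity:
                    best = max(best, sum(v for v, w in subset))
        return best

    @given(items=st.lists(st.tuples(st.integers(min_value=1, max_value=20),
                                    st.integers(min_value=1, max_value=20)),
                          min_size=1, max_size=8),
           capacity=st.integers(min_value=1, max_value=50))
    def test_matches_brute_force(items, capacity):
        solution = solve_knapsack(items, capacity)
        # Same score, not necessarily the same items.
        assert sum(v for v, w in solution) == brute_force_best_score(items, capacity)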

But I’m interested in testing the case where we don’t have a reference solution. How would we do that?

Without further ado, here are some properties I’ve figured out that seem to work well:

  1. If we take the optimal solution and perturb it slightly, this should never improve on the optimal solution. Bonus: If you have a local optimiser, perturb the solution slightly and then run the local optimiser on that perturbed solution.
  2. If we make the problem strictly easier (e.g. in the knapsack problem by raising the value of one of the items we’ve already chosen) then this should strictly improve the score if we rerun the optimiser.
  3. If we make the problem strictly harder (e.g. by removing one of the chosen items in the knapsack) and rerun the optimiser, this should never improve the score.

The last seems the most promising to me, because it helps create situations where you break out of local optima: you potentially kill off the local optimum you’ve found yourself in and see whether you can find a better one. Notably, both of the tests that demonstrated a problem in the knapsack algorithm (increasing the weight of a chosen item, removing a chosen item) are of this form.
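Here’s a minimal sketch of the “remove a chosen item” variant, with the same hypothetical solve_knapsack(items, capacity) as before (this is not the code from the linked tests):

    from hypothesis import given
    from hypothesis import strategies as st

    @given(items=st.lists(st.tuples(st.integers(min_value=1, max_value=100),
                                    st.integers(min_value=1, max_value=100)),
                          min_size=1),
           capacity=st.integers(min_value=1, max_value=500))
    def test_removing_a_chosen_item_never_improves_score(items, capacity):
        solution = solve_knapsack(items, capacity)
        if not solution:
            return  # nothing was chosen, so there is nothing to remove
        original_score = sum(v for v, w in solution)
        # Make the problem strictly harder by deleting one chosen item from the input.
        harder_items = list(items)
        harder_items.remove(solution[0])
        new_solution = solve_knapsack(harder_items, capacity)
        # Any solution to the harder problem is also a solution to the original,
        # so it can never beat the original optimum.
        assert sum(v for v, w in new_solution) <= original_score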

The first is also useful, particularly for testing a local optimiser, because it helps you find new moves you can add to the optimisation process! For example, if you find that reversing the solution and rerunning the optimiser on the reversed solution sometimes improves matters, why not iterate that to a fixed point in your optimisation process?
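A crude way to sketch the perturbation property from point 1 is to try every single-item swap against the optimiser’s answer; again solve_knapsack and the item representation are hypothetical:

    from hypothesis import given
    from hypothesis import strategies as st

    @given(items=st.lists(st.tuples(st.integers(min_value=1, max_value=100),
                                    st.integers(min_value=1, max_value=100)),
                          min_size=2),
           capacity=st.integers(min_value=1, max_value=500))
    def test_single_swaps_never_improve_the_solution(items, capacity):
        solution = solve_knapsack(items, capacity)
        if not solution:
            return
        original_score = sum(v for v, w in solution)
        remaining = list(items)
        for chosen in solution:
            remaining.remove(chosen)
        # Perturbation: drop one chosen item and add one unchosen item, keeping
        # only the swaps that still fit in the knapsack.
        for out in solution:
            for into in remaining:
                perturbed = list(solution)
                perturbed.remove(out)
                perturbed.append(into)
                if sum(w for v, w in perturbed) <= capacity:
                    assert sum(v for v, w in perturbed) <= original_score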

The middle property is, I suspect, not that useful, because in general the changes that make the problem easier do so by promoting items that were already chosen, so the optimiser will tend to make the same choice. I might be missing cases where this is actually a useful thing to be doing. Edit: I was wrong, because I’d got the form of the property slightly wrong. What you want to do is not “make the current solution better”, it’s “keep the current solution possible but add something in that you could use instead”. In the knapsack problem, you can do this by taking an item from the current solution and cloning it with one added to its value (sketched below). This means that the current solution is still usable, but there is definitely a solution with a higher score.
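Here’s what that corrected property might look like, with the same hypothetical solve_knapsack(items, capacity):

    from hypothesis import given
    from hypothesis import strategies as st

    @given(items=st.lists(st.tuples(st.integers(min_value=1, max_value=100),
                                    st.integers(min_value=1, max_value=100)),
                          min_size=1),
           capacity=st.integers(min_value=1, max_value=500))
    def test_cloning_a_chosen_item_with_more_value_improves_score(items, capacity):
        solution = solve_knapsack(items, capacity)
        if not solution:
            return
        original_score = sum(v for v, w in solution)
        value, weight = solution[0]
        # The old solution is still feasible, and swapping the clone in for the
        # original item gives a strictly better one, so the optimum must improve.
        easier_items = list(items) + [(value + 1, weight)]
        new_solution = solve_knapsack(easier_items, capacity)
        assert sum(v for v, w in new_solution) > original_score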

I’m going to keep an eye on this, as it seems to be an under-explored area (not completely unexplored – there’s some research on doing more classic fuzz testing on SMT solvers that I need to read up on) and I think it might inform useful heuristics for other places where you can do property based testing (machine learning for example is something that I’m interested in figuring out how to do).


Want to take a weekend to write tests?

I’d like to get more people using Hypothesis, and I’d like to raise the general quality of open source software. As a bonus, let’s see if we can help out people who wouldn’t normally contribute to open source. So, here’s the plan!

  1. Some number of us gather together in London one weekend.
  2. The rest of you pair up and work on writing tests for open source software using Hypothesis, and fixing any bugs you find.
  3. I act in a supervisory and teaching role, running around helping people out when they get stuck or need some advice.

(Obviously there will be a code of conduct for the event. I will probably continue my habit of using Recurse Center social rules despite never having attended Recurse Center).

If this sounds like your cup of tea, coffee, or other beverage of your choice, please put your name down and I’ll keep you apprised of any further developments.


Hypothesis 1.7.1 is out

(Note: I’ve realised that this blog has a much higher number of interested readers than the mailing list does, so I’m going to start mirroring announcements here)

As of this past Monday, Hypothesis 1.7.1 (Codename: There is no Hypothesis 1.7.0) is out.

The main feature this release adds is Python 2.6 support. Thanks hugely to Jeff Meadows for doing most of the work for getting this in.

Other features:

  • The strategies module now has a permutations() function, which returns a strategy yielding permutations of values from a given collection (a short example follows this list).
  • If you have a flaky test, Hypothesis will now print the exception it last saw before failing with Flaky, even if you do not have verbose reporting on.
  • A slightly experimental git merge script is available as “python -m hypothesis.tools.mergedbs”. Instructions on how to use it are in the docstring of that file.
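As a quick illustration (my sketch, not from the release notes), every value drawn from permutations() is a reordering of the collection you pass in:

    from hypothesis import given
    from hypothesis import strategies as st

    # Each generated value is some permutation of the original collection.
    @given(st.permutations([1, 2, 3, 4, 5]))
    def test_permutations_preserve_elements(perm):
        assert sorted(perm) == [1, 2, 3, 4, 5]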

This release also contains two important counting-related bug fixes:

  • floats() with a negative min_value would not have worked correctly (worryingly, it would have just silently failed to run any examples). This is now fixed.
  • tests using sampled_from would error if the number of sampled elements was smaller than min_satisfying_examples.

It also contains some changes to filtering that should improve performance and reliability in cases where you’re filtering by hard-to-satisfy conditions (although they could also hurt performance simply by virtue of enabling Hypothesis to find more examples and thus running your test more times!).

This should be a pretty safe upgrade, and given the counting bugs I would strongly encourage you to upgrade.


Back-pedalling on Python 2.6

I wrote a while back that I wouldn’t support Python 2.6 unless someone paid me to. I still think this is an eminently reasonable position which I encourage most people to take.

Unfortunately I am not going to be taking it.

The problem is that I am in the course of trying to put together a business around Hypothesis, offering consulting and training around using it. This is a much easier sell to large companies, and it’s a much easier sell to companies who are already using it at least somewhat.

Unfortunately, not supporting Python 2.6 out of the box means that many large companies who might otherwise become customers cannot start using Hypothesis until they’ve paid me money, which is a much harder sell. This way, in order to get a company to start using Hypothesis I only need to convince its developers that it’s great, which is easy, because it is.

If it were hard to support Python 2.6 I probably still wouldn’t have done so until there was some serious demand for it, but fortunately Jeff Meadows was extremely kind and did most of the hard work. It turned out to be less work than I thought – I think the last time I tried there was still a bunch of strange behaviour that was hard to port, which I have since removed – but it was nevertheless a highly appreciated contribution.

So, at some point in the next few days we should be seeing a new minor version of Hypothesis with support for 2.6, as well as a few other minor bug fixes and new features.

If you’re just maintaining a small open source library in your spare time, I would still encourage you to not support 2.6 without a really strong financial incentive to do so, but I think for now it sadly makes sense for me to add the support for free.


Some stuff on example simplification in Hypothesis

One of the slightly thankless tasks I engage in in Hypothesis (subtype: People only notice if I don’t do my job well) is example simplification. I spend a lot of time on example simplification – far more than I think anyone else who writes a Quickcheck bothers with.

What is example simplification? Well, when Hypothesis tells you that the example that breaks your test is [1, 2, 3], it almost certainly did not start out with that. It probably started out with a list about 20 elements long with numbers like 349781419708474 in it, and then pruned it down to [1, 2, 3].

Example simplification is important because random data is very messy but there is usually a simple example that demonstrates the bug, and the simple example is helpful where the messy one rarely is.
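To see it in action, here’s a small example of my own (not from the post): the property below is deliberately false, and rather than reporting the big random list it first found, Hypothesis will typically report a shrunk counterexample such as a single small element.

    from hypothesis import given
    from hypothesis import strategies as st

    # Deliberately false property: the first counterexample found is usually a
    # long list of large integers, but the reported falsifying example is shrunk
    # to something much smaller (e.g. a one-element list).
    @given(st.lists(st.integers()))
    def test_all_lists_sum_to_less_than_ten(xs):
        assert sum(xs) < 10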

Simplification is of course not new to Hypothesis – it’s been in Quickcheck for 15 years – but Hypothesis’s example simplification is atypically sophisticated.

So, why so sophisticated? Why is the normal Quickcheck approach not good enough?

Well the official reasons are as follows:

  1. Hypothesis generates much more complex examples than Quickcheck. For example, Quickcheck will generate a long list of small integers or a short list of large integers, but will never generate a long list of large integers.
  2. Tests in Hypothesis tend to be slower, due to a mix of Python performance and Python idioms.

The unofficial reasons are probably actually closer to being the true reasons:

  1. Including bad examples in documentation is embarrassing
  2. My test suites do a lot of example simplification so faster example simplification gives faster test suites gives a less stressed David
  3. I find combinatorial optimisation fun.

At any rate, the result is that I’ve spent a lot of time on this, so it seemed worth sharing some of the knowledge. I’ve done this in two places. The first is that I gave a talk at the London Big O user group. The slides are also available separately.

I was going to transcribe it, but actually I didn’t think it was a very good talk (I prepared it in a bit of a rush), so instead I’ve built an IPython notebook working through the concepts. This actually goes slightly further than the talk, as I figured out some things at the end that I need to merge back into Hypothesis proper.
