Category Archives: Python

Finding more bugs with less work

I was at PyCon UK this weekend, which was a great conference and I will definitely be attending next year.

While I was there I gave my talk, “Finding more bugs with less work”. The video is up, and you can see the slides here.

I may do a transcript at some point (like I did for my django talk), but I haven’t yet.

This entry was posted in Hypothesis, programming, Python.

The repr thing

In case you haven’t noticed, some parts of Hypothesis are designed with a lot of attention to detail. Some parts (particularly internals or anything that’s been around since the beginning) are a bit sloppy, some are quite well polished, and some of them are pedantic beyond the ken of mortal man, and you would be forgiven for wondering what on earth I was on when I was writing them.

The repr you get from standard strategies is one of those sections of which I am really quite proud, in an also slightly embarrassed sort of way.

>>> import hypothesis.strategies as st
>>> st.integers()
integers()
>>> st.integers(min_value=1)
integers(min_value=1)
>>> st.integers(min_value=1).map(lambda x: x * 2)
integers(min_value=1).map(lambda x: <unknown>)
>>> st.integers(min_value=1) | st.booleans()
integers(min_value=1) | booleans()
>>> st.lists(st.integers(min_value=1) | st.booleans(), min_size=3)
lists(elements=integers(min_value=1) | booleans(), min_size=3)

Aren’t those reprs nice?

The lambda one bugs me a bit. If this had been in a file you’d have actually got the body of the lambda, but I can’t currently make that work in the standard Python console. It works in IPython, and fixing it to work in the normal console would require me to write or vendor a decompiler in order to get good reprs and… well, I’d be lying if I said I hadn’t considered it, but so far a combination of laziness and judgement has prevailed.

This becomes more interesting when you realise that, depending on the arguments you pass in, a strategy function may return radically different implementations. For example, if you do floats(min_value=-0.0, max_value=5e-324) then there are only three floating point numbers in that range, and you get back something that is more or less equivalent to sampled_from((-0.0, 0.0, 5e-324)).
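
The repr, notice, still reflects the call you made rather than the specialised implementation you actually got back (an illustrative session, following from the decorator described below):

>>> st.floats(min_value=-0.0, max_value=5e-324)
floats(min_value=-0.0, max_value=5e-324)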

How does all this work?

Well, most of this is done with a single decorator and a bunch of pain:

def defines_strategy(strategy_definition):
    # getargspec and hrange (a py2/3-compatible range) are presumably
    # imported at module level in the real source.
    from hypothesis.internal.reflection import proxies, arg_string, \
        convert_positional_arguments
    argspec = getargspec(strategy_definition)
    # Map each defaulted argument name to its default value.
    defaults = {}
    if argspec.defaults is not None:
        for k in hrange(1, len(argspec.defaults) + 1):
            defaults[argspec.args[-k]] = argspec.defaults[-k]

    @proxies(strategy_definition)
    def accept(*args, **kwargs):
        result = strategy_definition(*args, **kwargs)
        # Normalise the call: turn positional arguments into keyword
        # arguments wherever possible.
        args, kwargs = convert_positional_arguments(
            strategy_definition, args, kwargs)
        # Drop any keyword argument whose value is exactly the default.
        kwargs_for_repr = dict(kwargs)
        for k, v in defaults.items():
            if k in kwargs_for_repr and kwargs_for_repr[k] is defaults[k]:
                del kwargs_for_repr[k]
        # Rebuild the repr as "<name>(<args>)" and wrap the strategy so
        # that it reports that string.
        representation = u'%s(%s)' % (
            strategy_definition.__name__,
            arg_string(strategy_definition, args, kwargs_for_repr)
        )
        return ReprWrapperStrategy(result, representation)
    return accept
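
To give a sense of how it’s applied, strategy definitions get wrapped like this (a sketch only – the real definitions live in hypothesis.strategies and the body here is elided):

@defines_strategy
def integers(min_value=None, max_value=None):
    # ... choose and return an appropriate strategy implementation ...
    pass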

What’s this doing?

Well, ReprWrapperStrategy is more or less what it sounds like: it wraps a strategy and provides it with a custom repr string. proxies is basically functools.wraps but with a bit more attention given to getting the argspec exactly right.

So what we’re doing in this is:

  1. Converting all positional arguments to their keyword equivalents where possible
  2. Removing any keyword arguments whose values are exactly the defaults (see the example after this list)
  3. Producing an argument string that, when invoked with the remaining positional args (from varargs) and the keyword args, would be equivalent to the call that was actually made. (Special note: the keyword arguments are ordered following the order of the argument list, with anything destined for kwargs sorted alphabetically after the named arguments. This ensures that we have a stable repr that doesn’t depend on hash iteration order. Why are kwargs not an OrderedDict?)
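
As an example of step 2, explicitly passing a value identical to the default leaves no trace in the repr (illustrative):

>>> st.integers(min_value=None)
integers()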

Most of the heavy lifting in here is done in the reflection module, which is named such mostly because myhateforthepythonobjectmodelburnswiththefireoftenthousandsuns was too long a module name.

Then we have the bit with map().

Here is the definition of repr for map:

    def __repr__(self):
        if not hasattr(self, u'_cached_repr'):
            self._cached_repr = u'%r.map(%s)' % (
                self.mapped_strategy, get_pretty_function_description(
                    self.pack)
            )
        return self._cached_repr

We cache the repr on first evaluation because get_pretty_function_description is quite slow (not outrageously slow, but quite slow), so we neither want to call it lots of times nor want to calculate it if you don’t need it.

For non-lambda functions, get_pretty_function_description returns their __name__. For lambdas, it tries to figure out their source code through a mix of inspect.getsource (which doesn’t actually work for lambdas, and the fact that it doesn’t work is considered notabugwontfix) and some terrible, terrible hacks. If something goes wrong here it returns the “lambda x: <unknown>” form we saw above (the argument names, followed by <unknown>). If you pass something that isn’t a function (e.g. a functools.partial) it just returns the repr, so you see things like:

>>> from hypothesis.strategies import integers
>>> from functools import partial
>>> def add(x, y):
...     return x + y
... 
>>> integers().map(partial(add, 1))
integers().map(functools.partial(<function add at 0x...>, 1))

I may at some point add a special case for functools.partial because I am that pedantic.

This union repr is much more straightforward in implementation but still worth having:

    def __repr__(self):
        return u' | '.join(map(repr, self.element_strategies))

Is all this worth it? I don’t know. Almost nobody has commented on it, but it makes me feel better. Examples in documentation look a bit prettier, it renders some error messages and reporting better, and generally makes it a lot more transparent what’s actually going on when you’re looking at a repr.

It probably isn’t worth the amount of effort I’ve put into the functionality it’s built on top of, but most of the functionality was already there – I don’t think I added any new functions to reflection to write this, it’s all code I’ve repurposed from other things.

Should you copy me? No, probably not. Nobody actually cares about repr quality as much as I do, but it’s a nice little touch that makes interactive usage of the library a little bit easier, so it’s at least worth thinking about.

This entry was posted in Hypothesis, Python.

Hypothesis: Staying on brand

You know that person who is always insisting that you use the right capitalisation for your company, you have to use the official font, it’s really important whether you use a dash or a space and you absolutely must use the right terminology in all public communication, etc, etc? Aren’t they annoying? Don’t they seem to be ridiculously focused on silly details that absolutely don’t matter?

I used to feel that way too. I mean, I generally humoured them, because it always feels like other people’s important details don’t matter, and that’s usually a sign that you don’t understand their job, but I didn’t really believe that it mattered.

Then I made Hypothesis and now I’m that person.

There’s a long list of guidelines on how I communicate Hypothesis that I have literally never communicated to anyone so I really have no right to get even slightly annoyed when people don’t follow it. To be honest, I have no real right to get annoyed when people don’t follow it even if they have read this post. So consider the intent of this post more “here is what I do, I would appreciate if you do the same when talking about Hypothesis and it will annoy me slightly if you don’t but you are entirely welcome to do your own thing and I appreciate you talking about Hypothesis however you do it”.

Without further ado, here are the Hypothesis brand guidelines.

The little things

  1. You’re probably going to pronounce my name wrong unless you’ve read the pronunciation guide.
  2. Hypothesis is a testing library, not a testing framework. This is actually important, because a thing that people never seem to realise (possibly because I don’t communicate it clearly enough, but it does say so in the documentation) is that Hypothesis does not have its own test runner; it just uses your normal test runners.
  3. I try not to use the phrase “Hypothesis tests” (I slip up on this all the time when speaking) because that’s a statistical concept. I generally use “Tests using Hypothesis”. It’s more awkward but less ambiguous.
  4. Hypothesis isn’t really a Quickcheck. I describe it as “Inspired by Quickcheck” rather than “Based on Quickcheck” or “A Quickcheck port”. It started out life as a Quickcheck port, but modern Hypothesis is a very different beast, both internally and stylistically.

The big thing

All of the classic Quickcheck examples are terrible. Please don’t use them. Pretty please?

In particular I hate the reversing-a-list example. It’s a toy, there’s no easy way to get it wrong, and it’s doing this style of testing a great injustice by failing to showcase all the genuinely nasty edge cases it can find.

In general I try never to show an example using Hypothesis that does not expose a real bug that I did not deliberately introduce. Usually it is enough to write a medium-complexity example which you know can be well tested with Hypothesis, then add Hypothesis-based tests. You can write it TDD-style if you like, as long as you don’t use Hypothesis to do so.

The Tone

The tone I generally try to go for is “Computers are terrible and you can never devote enough time to testing, so Hypothesis is a tool you can add to your arsenal to make the time you have more effective”.

Disclaimers and safety warnings

  1. Hypothesis will not find all your bugs.
  2. Hypothesis will not replace all your example based tests.
  3. Hypothesis isn’t magic.
  4. Hypothesis will not grant you the ability to write correct code, only help you understand ways in which your code might be incorrect.

The Rest

There are probably a bunch of things I’ve forgotten on this, and I will update the list as I think of them.

This entry was posted in Hypothesis, Python.

Soliciting advice: Bindings, Conjecture, and error handling

Edit: I think I have been talked into a significantly simpler system than the one described here, which simply uses error codes plus some custom hooks to make this work. I’m leaving this up for posterity and am still interested in advice, but don’t worry: I’ve already been talked out of using setjmp or implementing my own exception handling system.

I’m working on some rudimentary Python bindings to Conjecture and running into a bit of a problem: I’d like it to be possible to run Conjecture without forking, but I’m really struggling to come up with an error handling interface that works for this.

In Conjecture’s current design, any of the data generation functions can abort the process, and are run in a subprocess to guard against that. For testing C this makes absolute sense: It’s clean, easy to use, and there are so many things that can go wrong in a C program that will crash the process that your C testing really has to be resilient against the process crashing anyway so you might as well take advantage of that.

For Python, this is a bit sub-optimal. It would be really nice to be able to run Conjecture tests purely in process, just looking for exceptions. os.fork() has to do a bunch of things which make it much slower than calling C’s fork directly (and the program behaves really weirdly when you send it a signal if you try to use the native fork function), and it’s also just a bit unnecessary for 90% of what you do with Python testing.

It would also be good to support a fork free mode so that Conjecture can eventually work on Windows (right now it’s very unixy).

Note: I don’t need forkless mode to handle crashes that are not caused by an explicit call into the conjecture API. conjecture_reject and conjecture_fail (which doesn’t exist right now but could) will explicitly abort the test, but other things that cause a crash are allowed to just crash the process in forkless mode.

So the problem is basically how to combine these interfaces, and everything I come up with seems to amount to “now we design an exception system…”

Here is the least objectionable plan I have so far. It requires a lot of drudge work on my part, but this should mostly be invisible to the end user (“doing the drudge work so you don’t have to” is practically my motto of good library design).

Step 1: For each draw_* function in the API, add a second draw_*_checked function which has exactly the same signature. This does a setjmp, followed by a call to the underlying draw_* function. If that function aborts, it does a longjmp back to the setjmp, sets an is_aborted flag, and returns some default value. Bindings must always call the _checked version of the function, then check conjecture_is_aborted() and convert it into a language-appropriate error condition.

Note: It is a usage error to call one checked function from another, and doing so will crash the process. Don’t do that. These are intended to be entry points to the API, not something that you should use in defining data generators.
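
From the binding side, every entry point then follows the same pattern. Here is a sketch of what the Python half might look like (hypothetical names throughout: draw_int64, the lib handle, and Rejected are stand-ins for whatever a real ctypes/cffi binding would expose, not actual Conjecture API):

class Rejected(Exception):
    # Hypothetical binding-level error corresponding to an aborted draw.
    pass

def draw_int64(context):
    # Always call the _checked variant, never the raw draw_* function.
    value = lib.conjecture_draw_int64_checked(context)
    # Convert the abort flag into a language-appropriate error condition.
    if lib.conjecture_is_aborted(context):
        raise Rejected()
    return value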

Step 2: Define a “test runner” interface. This takes a test and some associated data, runs it, and returns one of three states: passing test, failing test, or rejected test. The forking-based interface then becomes a single test runner; another one using techniques similar to the checked interface is possible. Binding libraries should write their own – e.g. a Python one would catch all exceptions and convert them into an appropriate response.
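
A Python test runner implementing that interface might look something like this (again just a sketch, reusing the hypothetical Rejected exception from the sketch above):

PASSED, FAILED, REJECTED = range(3)

def python_test_runner(test, data):
    # Catch all exceptions and convert them into one of the three states.
    try:
        test(data)
    except Rejected:
        return REJECTED
    except Exception:
        return FAILED
    return PASSED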

Step 3: Define a cleanup API. This lets you register a void (*cleanup)(void *data) function and some data to pass to it, which may get called right before aborting. In the “crash the process” model it is not required to be called, and it will not get called if your process otherwise exits abnormally. Note: This changes the memory ownership model of all data generation. Data returned to you from generators is no longer owned by you and you may not free it.

I think this satisfies the requirements of being easy to use from both C and other languages, but I’m a little worried that I’m not so much reinventing the wheel as trying to get from point A to point B without even having heard of these wheel things and so I invented the pogo stick instead. Can anyone who has more familiarity with writing C libraries designed to be usable from both C and other languages offer me some advice and/or (heh) pointers?

This entry was posted in Hypothesis, programming, Python.

Mighty morphing power strategies

The Hypothesis API is a bit of a bizarre sleight of hand. It pretends to be very clean and simple, but that’s mostly a distraction to stop you thinking too hard about it. Everything looks easy, so you aren’t surprised when it just works, and you don’t think too hard and realise that what it just did should actually have been impossible.

Take for example this piece of code using the Django integration (this comes straight from the docs):

from hypothesis.strategies import lists, just
# models comes from the Django integration; Company and Shop are the example
# models used in the Hypothesis documentation.
from hypothesis.extra.django.models import models

def generate_with_shops(company):
    return lists(models(Shop, company=just(company))).map(lambda _: company)

company_with_shops_strategy = models(Company).flatmap(generate_with_shops)

We take a strategy that generates a model inserted into the database, then use flatmap to create a strategy for children of that and bind it in. Everything just works neatly and everything will simplify just fine – the list of children can be simplified, both by throwing away children and by simplifying individual children, the original element can be simplified, everything lives in the database, and all is good with the world. Examples will be persisted into the Hypothesis example database as normal. Everything works great; no reason to think twice about it.

Except it is completely ridiculous that this works, and it’s certainly unique to Hypothesis. No other Quickcheck has anything like it.

Some of this is just the magic of Hypothesis templating at work. There’s a lot more information available than is present in the end result, and this also explains how you can mutate the value by adding children to it and have simplification still work, etc.

But there’s something that should make you very suspicious going on here: We cannot create the strategy we need to generate the children until we have already performed some side effects (i.e. put some things in the database). What could the template for this possibly be?

The answer to this is quite bad. But it’s quite bad and hidden behind another abstraction layer!

The answer is that we have a type called Morpher. As far as we are concerned for now, a Morpher has one method called “become”. You call my_morpher.become(my_strategy) and you get a value that could have been drawn from my_strategy.

You can think of Morphers as starting out as a reproducible way of getting examples from strategies, but there’s more to it than that, for one very important reason: A Morpher can be simplified and serialized.

This gives us a very easy implementation of flatmap:

def flatmap(self, f):
    # Draw a value s from this strategy and a fresh Morpher m, then have
    # the morpher become the strategy f(s) built from that value.
    return builds(lambda s, m: m.become(f(s)), self, morphers())

i.e. we generate an element of the strategy, apply f to it to get a new strategy, and then tell the morpher to become an instance of that. Easy!

Easy, that is, except for the fact that it still looks completely impossible, because we’re no closer to understanding how morphers work.

I’m not going to explain how they work in too much detail, because the details are still in flux, but I’ll sketch out how the magic works and if you want the gory details you can check the code.

As it starts out, a Morpher is simple: It contains a random seed for a parameter value and a template value, and become() just draws parameters and templates with those standard seeds and then reifies the result.

This would achieve all of the desired results except for simplification: You can save the seeds and you can now generate.

So how do we simplify, given that each time we may have to become a different strategy, and that templates are not compatible between strategies?

There are several parts to the trick:

  1. The template for a Morpher (which is actually the Morpher itself) is secretly mutable (actually quite a lot of Hypothesis template strategies are mutable). When we call become() on a Morpher, the strategy used is stored on the template for later so we have access to it when we want to simplify, as is the template that was produced in the course of the become() call.
  2. As well as storing a random seed we also store a list of serialized representations of possible templates. These are the representations that would be used when saving in the database. The reason for this is that the Hypothesis database has the following really helpful invariant: Any serialized representation can either be turned into a valid template for the strategy or rejected as invalid. Moreover the representations are quite straightforward, so usually similar strategies will have compatible representations.
  3. When we wish to become a strategy, we first try our serialized representations in order to see if one of them produces a valid template. If it does, we use that template, otherwise if we reach the end we generate a fresh one using the random seed method mentioned above.
  4. When we simplify, we try to simplify the last template generated with the last strategy used, and then replace the representation that generated that strategy with the simplified form of it, thus generating a new morpher with the same seed and parameter but a simplified serialized representation.
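
To make the shape of this slightly more concrete, here is a drastically simplified toy version of just the reproducible-generation part (everything here is hypothetical: a “strategy” is reduced to a function from a Random to a value, and there is no templating, serialization or simplification):

import random

class Morpher(object):
    def __init__(self, seed):
        self.seed = seed
        self.last_strategy = None  # remembered so simplification could reuse it
        self.last_value = None

    def become(self, strategy):
        # Draw deterministically from the stored seed, and record what we
        # did for later.
        value = strategy(random.Random(self.seed))
        self.last_strategy = strategy
        self.last_value = value
        return value

# The same morpher gives reproducible draws from any "strategy":
m = Morpher(42)
print(m.become(lambda r: r.randint(0, 100)))
print(m.become(lambda r: [r.random() for _ in range(3)]))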

If you’re reading the scheme above and thinking that it’s horrifying, you’re not wrong. It’s also quite fragile – there are definitely some corner cases of it that I haven’t quite shaken out yet, and it’s why flatmap is somewhat slower and buggier than things using more normal methods of generation.

But it’s also really powerful, because you can use this technology for things other than flatmap. I originally intended it as a shared implementation between flatmap and the stateful testing, although for various reasons I haven’t got around to rebuilding the stateful testing on top of it yet. It’s also what powers a really cool new feature I released today (I know I said I wasn’t doing new features, but I couldn’t resist, and it only took me an hour), which is essentially a form of do notation for Hypothesis strategies (only more Pythonic).

from hypothesis.strategies import composite, lists, sampled_from

@composite
def list_and_sample(draw, elements):
    values = draw(lists(elements, min_size=1))
    redraw = draw(lists(sampled_from(values)))
    return (values, redraw)

list_and_sample(integers()) now gives you a strategy which draws a list of at least one integer, then draws another sample from that list (with replacement).
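
For example (output purely illustrative):

>>> from hypothesis.strategies import integers
>>> list_and_sample(integers()).example()
([5, 1, -3], [1, 1])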

What this does is give you a magic “draw” function that produces examples from a strategy, and composes these all together into a single strategy built on repeatedly calling that draw function as many times as you like.

This is also black magic, but it’s not novel black magic: it’s just morphers again. We generate an infinite stream of morphers, then map a function over it. draw maintains a counter, and the nth time you call it, it gets the nth element from the stream and tells it to become the strategy that you’ve just passed in. There’s a bit more fiddling and detail to work out in terms of making everything line up right, but that’s more or less it. This vastly simplifies the definition of strategies that you would previously have built up with an ugly chain of lambdas and flatmaps.
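
A toy sketch of that bookkeeping (hypothetical code – the real implementation also has to handle the fiddly details just mentioned):

def make_draw(morphers):
    # morphers behaves like an infinite sequence of Morpher instances.
    counter = [0]  # a mutable cell, so the closure below can update it
    def draw(strategy):
        morpher = morphers[counter[0]]
        counter[0] += 1
        return morpher.become(strategy)
    return draw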

If you want to read more about this feature, I commend you to the documentation. It’s available in the latest release (1.11.0), so have fun playing with it.

This entry was posted in Hypothesis, Python.