There is no single acceptable defect rate

This is a response to a good piece called “Infrastructure as code would be easy if we cared”. It’s not required reading for this post, but I recommend it anyway, and I mostly agree with its message.

But there’s one point in it that I feel the need to quibble with, partly because I spent so long making it myself before I realised it was bogus:

But any real business:

  • Knows what their acceptable defect rate is
  • Is already operating at it

This isn’t true.

I’m not going to quibble about “any real business” or “know”. I could, but it would be needless pedantry.

What I want to quibble with is the idea that businesses operate at their acceptable defect rate.

The defect rate a business operates at is not a single number that is magically acceptable when all others aren’t. It’s a rate they can’t afford to improve: when making improvements to your defect rate costs more than the reduced rate of defects would save you, you stop trying to improve the rate.

You might call that the acceptable defect rate if you like, but if you do, the point is that the acceptable defect rate is inherently unstable, and the idea that it would remain the same under changes to your workflow is just untrue.

This is not a trivial point: It means that you can reduce the defect rate and get more money if you change the economics. A reduction in defects that is currently non-viable because it would cost you 100 person hours could suddenly become viable if it cost you 50 person hours instead (it could also become viable if e.g. new regulations come in that make some of those defects really expensive).

And some things do change the costs of reducing defects. Obviously I think Hypothesis is one of those things, because it increases your ability to find defects much more quickly, thus decreasing the cost of fixing them. This is a comparatively rare example of something that changes the cost of finding and fixing defects while keeping most other things fixed, and I genuinely believe that this reduces the defect rate in software.

It’s always tempting to think of the world as immutable and hard to change, but the reality is that it’s mostly just the result of large systems responding to costs and incentives, and small changes in those costs and incentives can produce remarkably large effects if you give them time to work.

This entry was posted in Uncategorized.

Finding more bugs with less work

I was at PyCon UK this weekend, which was a great conference and I will definitely be attending next year.

Among the things that occurred at this conference is that I gave my talk, “Finding more bugs with less work”. The video is up, and you can see the slides here.

I may do a transcript at some point (like I did for my django talk), but I haven’t yet.

This entry was posted in Hypothesis, programming, Python.

The repr thing

In case you haven’t noticed, some parts of Hypothesis are designed with a lot of attention to detail. Some parts (particularly internals or anything that’s been around since the beginning) are a bit sloppy, some are quite well polished, and some of them are pedantic beyond the ken of mortal man and you would be forgiven for wondering what on earth I was on when I was writing them.

The repr you get from standard strategies is one of those sections of which I am really quite proud, in an also slightly embarrassed sort of way.

>>> import hypothesis.strategies as st
>>> st.integers()
integers()
>>> st.integers(min_value=1)
integers(min_value=1)
>>> st.integers(min_value=1).map(lambda x: x * 2)
integers(min_value=1).map(lambda x: <unknown>)
>>> st.integers(min_value=1) | st.booleans()
integers(min_value=1) | booleans()
>>> st.lists(st.integers(min_value=1) | st.booleans(), min_size=3)
lists(elements=integers(min_value=1) | booleans(), min_size=3)

Aren’t those reprs nice?

The lambda one bugs me a bit. If this had been in a file you’d have actually got the body of the lambda, but I can’t currently make that work in the Python console. It works in IPython, and fixing it to work in the normal console would require me to write or vendor a decompiler in order to get good reprs, and… well, I’d be lying if I said I hadn’t considered it, but so far a combination of laziness and judgement has prevailed.

This becomes more interesting when you realise that, depending on the arguments you pass in, a strategies function may return radically different implementations. E.g. if you do floats(min_value=-0.0, max_value=5e-324) then there are only three floating point numbers in that range, and you get back something that is more or less equivalent to sampled_from((-0.0, 0.0, 5e-324)).

How does all this work?

Well, most of this is done with a single decorator and a bunch of pain:

# getargspec and hrange are Hypothesis's internal Python 2/3 compat helpers.
from hypothesis.internal.compat import getargspec, hrange

def defines_strategy(strategy_definition):
    from hypothesis.internal.reflection import proxies, arg_string, \
        convert_positional_arguments
    argspec = getargspec(strategy_definition)
    defaults = {}
    if argspec.defaults is not None:
        for k in hrange(1, len(argspec.defaults) + 1):
            defaults[argspec.args[-k]] = argspec.defaults[-k]
 
    @proxies(strategy_definition)
    def accept(*args, **kwargs):
        result = strategy_definition(*args, **kwargs)
        args, kwargs = convert_positional_arguments(
            strategy_definition, args, kwargs)
        kwargs_for_repr = dict(kwargs)
        for k, v in defaults.items():
            if k in kwargs_for_repr and kwargs_for_repr[k] is defaults[k]:
                del kwargs_for_repr[k]
        representation = u'%s(%s)' % (
            strategy_definition.__name__,
            arg_string(strategy_definition, args, kwargs_for_repr)
        )
        return ReprWrapperStrategy(result, representation)
    return accept

What’s this doing?

Well, ReprWrapperStrategy is more or less what it sounds like: it wraps a strategy and provides it with a custom repr string. proxies is basically functools.wraps but with a bit more attention given to getting the argspec exactly right.

So what we’re doing here is:

  1. Converting all positional arguments to their kwargs equivalent where possible
  2. Removing any keyword arguments that are exactly the default
  3. Producing an argument string that, when invoked with the remaining args (from varargs) and any keyword args, would be equivalent to the ones that were actually passed in. (Special note: the keyword arguments are ordered in the order of the argument list, with anything destined for kwargs sorted alphabetically after the real keyword arguments. This ensures that we have a stable repr that doesn’t depend on hash iteration order. Why are kwargs not an OrderedDict?)
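A toy version of those three steps, with made-up names (ReprWrapper, defines_repr) rather than Hypothesis’s actual code, looks something like this:

```python
import inspect

class ReprWrapper:
    """Wraps an object and gives it a fixed repr (toy stand-in for
    Hypothesis's ReprWrapperStrategy)."""
    def __init__(self, wrapped, representation):
        self._wrapped = wrapped
        self._representation = representation

    def __getattr__(self, name):
        return getattr(self._wrapped, name)

    def __repr__(self):
        return self._representation

def defines_repr(fn):
    sig = inspect.signature(fn)

    def accept(*args, **kwargs):
        # Step 1: convert positional arguments to keyword form.
        bound = sig.bind(*args, **kwargs)
        # Step 2: drop arguments that are exactly (is) their default.
        parts = [
            "%s=%r" % (name, value)
            for name, value in bound.arguments.items()
            if sig.parameters[name].default is not value
        ]
        # Step 3: bound.arguments preserves declaration order, so the
        # resulting argument string is stable.
        representation = "%s(%s)" % (fn.__name__, ", ".join(parts))
        return ReprWrapper(fn(*args, **kwargs), representation)

    return accept

@defines_repr
def integers(min_value=None, max_value=None):
    return object()  # stand-in for building a real strategy

print(repr(integers(min_value=1)))  # integers(min_value=1)
```

Note the identity (`is`) comparison mirrors the original: an argument only drops out of the repr if it is literally the default object.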

Most of the heavy lifting in here is done in the reflection module, which is named such mostly because myhateforthepythonobjectmodelburnswiththefireoftenthousandsuns was too long a module name.

Then we have the bit with map().

Here is the definition of repr for map:

    def __repr__(self):
        if not hasattr(self, u'_cached_repr'):
            self._cached_repr = u'%r.map(%s)' % (
                self.mapped_strategy, get_pretty_function_description(
                    self.pack)
            )
        return self._cached_repr

We cache the repr on first evaluation because get_pretty_function_description is quite slow (not outrageously slow, but quite slow), so we neither want to call it lots of times nor want to calculate it if you don’t need it.

For non-lambda functions, get_pretty_function_description returns their __name__. For lambdas, it tries to figure out their source code through a mix of inspect.getsource (which doesn’t actually work, and the fact that it doesn’t work is considered notabugwontfix) and some terrible terrible hacks. In the event of something going wrong here it returns the “lambda x: <unknown>” we saw above. If you pass something that isn’t a function (e.g. a functools.partial) it just returns the repr, so you see things like:

>>> from hypothesis.strategies import integers
>>> from functools import partial
>>> def add(x, y):
...     return x + y
... 
>>> integers().map(partial(add, 1))
integers().map(functools.partial(<function add at 0x...>, 1))

I may at some point add a special case for functools.partial because I am that pedantic.
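Pulling that dispatch together, here’s a hypothetical, much-simplified sketch of what get_pretty_function_description does (the real thing has far more hacks than this):

```python
import functools
import inspect

def pretty_function(f):
    """Hypothetical simplification of get_pretty_function_description."""
    if not inspect.isfunction(f):
        return repr(f)  # e.g. functools.partial objects
    if f.__name__ != "<lambda>":
        return f.__name__
    try:
        # inspect.getsource needs the source file, so this fails in a
        # plain REPL -- the "doesn't actually work" case above.
        source = inspect.getsource(f).strip()
        return source[source.index("lambda"):]
    except (OSError, TypeError, ValueError):
        args = ", ".join(inspect.signature(f).parameters)
        return "lambda %s: <unknown>" % args

def add(x, y):
    return x + y

print(pretty_function(add))                        # add
print(pretty_function(functools.partial(add, 1)))  # functools.partial(...)
```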

This union repr is much more straightforward in implementation but still worth having:

    def __repr__(self):
        return u' | '.join(map(repr, self.element_strategies))

Is all this worth it? I don’t know. Almost nobody has commented on it, but it makes me feel better. Examples in documentation look a bit prettier, it renders some error messages and reporting better, and generally makes it a lot more transparent what’s actually going on when you’re looking at a repr.

It probably isn’t worth the amount of effort I’ve put into the functionality it’s built on top of, but most of the functionality was already there – I don’t think I added any new functions to reflection to write this, it’s all code I’ve repurposed from other things.

Should you copy me? No, probably not. Nobody actually cares about repr quality as much as I do, but it’s a nice little touch that makes interactive usage of the library a little bit easier, so it’s at least worth thinking about.

This entry was posted in Hypothesis, Python.

Hypothesis: Staying on brand

You know that person who is always insisting that you use the right capitalisation for your company, that you have to use the official font, that it really matters whether you use a dash or a space, and that you absolutely must use the right terminology in all public communication? Aren’t they annoying? Don’t they seem to be ridiculously focused on silly details that absolutely don’t matter?

I used to feel that way too. I mean, I generally humoured them, because it always feels like other people’s important details don’t matter and that’s usually a sign that you don’t understand their job, but I didn’t really believe that it mattered.

Then I made Hypothesis and now I’m that person.

There’s a long list of guidelines on how I communicate Hypothesis that I have literally never communicated to anyone so I really have no right to get even slightly annoyed when people don’t follow it. To be honest, I have no real right to get annoyed when people don’t follow it even if they have read this post. So consider the intent of this post more “here is what I do, I would appreciate if you do the same when talking about Hypothesis and it will annoy me slightly if you don’t but you are entirely welcome to do your own thing and I appreciate you talking about Hypothesis however you do it”.

Without further ado, here are the Hypothesis brand guidelines.

The little things

  1. You’re probably going to pronounce my name wrong unless you’ve read the pronunciation guide.
  2. Hypothesis is a testing library, not a testing framework. This is actually important, because a thing that people never seem to realise (possibly because I don’t communicate it clearly enough, but it does say so in the documentation) is that Hypothesis does not have its own test runner, it just uses your normal test runners.
  3. I try not to use the phrase “Hypothesis tests” (I slip up on this all the time when speaking) because that’s a statistical concept. I generally use “Tests using Hypothesis”. It’s more awkward but less ambiguous.
  4. Hypothesis isn’t really a Quickcheck. I describe it as “Inspired by Quickcheck” rather than “Based on Quickcheck” or “A Quickcheck port”. It started out life as a Quickcheck port, but modern Hypothesis is a very different beast, both internally and stylistically.

The big thing

All of the classic Quickcheck examples are terrible. Please don’t use them. Pretty please?

In particular I hate the reversing-a-list example. It’s a toy, there’s no easy way to get it wrong, and it does this style of testing a great injustice by failing to showcase all the genuinely nasty edge cases it can find.

In general I try never to show an example using Hypothesis that does not expose a real bug that I did not deliberately introduce. Usually producing one is as simple as writing a medium-complexity example which you know can be well tested with Hypothesis, then adding Hypothesis-based tests. You can write it TDD if you like, as long as you don’t use Hypothesis to do so.

The Tone

The tone I generally try to go for is “Computers are terrible and you can never devote enough time to testing, so Hypothesis is a tool you can add to your arsenal to make the time you have more effective”.

Disclaimers and safety warnings

  1. Hypothesis will not find all your bugs.
  2. Hypothesis will not replace all your example based tests.
  3. Hypothesis isn’t magic.
  4. Hypothesis will not grant you the ability to write correct code, only help you understand ways in which your code might be incorrect.

The Rest

There are probably a bunch of things I’ve forgotten on this, and I will update the list as I think of them.

This entry was posted in Hypothesis, Python.

Designing a feature auction

I’m thinking of doing a crowd funding campaign for Conjecture.

One of the things that makes this a nice proposition is that there’s an almost unbounded amount of work I can do on it, but there’s also quite a nice finite core that it would still be very useful to make really good, and that would take much less time.

But how would I decide which of the unbounded work to do? The classic model seems to be stretch goals. I hate stretch goals, particularly ones where the stretch goal required to make it useful for you is 2 or 3 items down the list (I’m looking at you, people who put Android support as a stretch goal).

The obvious answer is to let people pay for work. I was originally thinking in terms of reward tiers there, but I’m not a huge fan of that. If I declare that, say, Ruby support costs £3000 and 300 Ruby developers are willing to chip in £10 each, I should do Ruby support. That seems like exactly the point of crowd funding.

So I started sketching out how I’d like this to work and came up with a system I’m pretty pleased with. I’m not sure if it’s a good idea, but it’s a nice design, so I thought I’d share.

It’s based heavily on the single transferable vote method of proportional representation, but with some tweaks to fit the problem.

Usage: When creating a campaign, you set an “initial cost” figure (this should be the same as your campaign goal). You also specify a list of additional features, in your personal priority order, with costs attached to them.

Every pound (dollar, euro, whatever) people contribute to your crowd funding campaign then gives them voting power to choose which of these features is implemented. They vote by simply listing their feature preferences in order of most preferred to least. They can list as many or as few of the features as they like.

First, everyone pays for the initial cost you set. Everyone pays the same fraction of their contribution, chosen to exactly cover that cost.

Voting now proceeds in rounds. In each round there are a number of active features – inactive features have either been chosen to be implemented or disqualified. Additionally each voter has a set of remaining funds, which starts at the amount they contributed and is reduced as they pay for features.

Each contributor votes for their current most preferred active item. If they’ve run out of active items they care about, they vote for your most preferred active item (this ensures that you don’t get “free money”: as much money as can be spent on features will be spent on features, it’s just that if people don’t express a preference you get to choose).

Now, some features may have been funded: any feature for which the total remaining funds of the people voting for it exceed its cost is funded. The feature is chosen to be implemented and marked inactive. Anyone who voted for it now has their remaining funds reduced by the same fraction, so that just enough is spent to cover the cost. E.g. if Ruby costs £3000 and there had been £6000 of available funds, each person voting for it would spend half their remaining funds.

If no features were funded, one of the features drops out. Take the feature that is furthest from being funded (i.e. cost – funds allocated is highest), breaking ties by picking the one that is lowest in your preference order. This is disqualified and marked inactive.

This process is repeated until the total funds remaining is smaller than the cost of any active feature.

If it’s also less than the cost of any feature that was not chosen, stop. Otherwise, start again from the beginning with the current remaining funds and only the set of features that were not chosen previously.

Repeat this until you make it through an entire vote without choosing anything. At that point just fill in the remaining features in order of your preferences.
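The core voting loop is easier to check in code. This is a hypothetical sketch of a single vote under some simplifying assumptions (one feature funded per round, creator_prefs listing every feature), not anything from an actual implementation:

```python
def run_auction(features, creator_prefs, voters):
    """features: {name: cost}. creator_prefs: every feature name in the
    creator's priority order. voters: list of (funds, prefs) pairs."""
    chosen = []
    active = set(features)
    funds = [f for f, _ in voters]
    prefs = [list(p) for _, p in voters]
    while active and sum(funds) >= min(features[a] for a in active):
        # Each contributor backs their most preferred active feature,
        # falling back to the creator's preference order.
        votes = {name: [] for name in active}
        for i in range(len(funds)):
            for name in prefs[i] + creator_prefs:
                if name in active:
                    votes[name].append(i)
                    break
        backing = {n: sum(funds[i] for i in votes[n]) for n in active}
        funded = [n for n in creator_prefs
                  if n in active and backing[n] >= features[n]]
        if funded:
            # Fund one feature per round for simplicity; its backers all
            # pay the same fraction of their remaining funds.
            name = funded[0]
            for i in votes[name]:
                funds[i] *= 1 - features[name] / backing[name]
            chosen.append(name)
            active.discard(name)
        else:
            # Nothing affordable: drop the feature furthest from being
            # funded, breaking ties towards the creator's least preferred.
            order = sorted(active, key=creator_prefs.index, reverse=True)
            active.discard(max(order, key=lambda n: features[n] - backing[n]))
    return chosen, funds

chosen, remaining = run_auction(
    {'ruby': 3000, 'lua': 3000},
    ['lua', 'ruby'],
    [(6000, ['ruby'])],
)
# 'ruby' is funded first (its one backer pays half), then the leftover
# £3000 falls through to the creator's preference and funds 'lua'.
print(chosen)  # ['ruby', 'lua']
```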

Design notes

  1. It is a little complicated, as there are a bunch of edge cases I noticed when writing this up, but I think it’s simpler to use than to describe, and most of those edge cases contribute to making it better for both you and the contributors.
  2. I’m not sure how essential the use of the preference list is. Certainly the “uncast votes go to your preference list” part is quite useful, because it lets you shape the results in your direction while still complying with people’s wishes – e.g. if Ruby and Lua both cost £3000 and currently have £2500 voting for each, but I prefer to do Lua and now have £1000 in my spare change pool, I get to choose Lua.
  3. The multiple repeats thing is annoying, but it feels unfair to just go straight to the preference list.

This entry was posted in Uncategorized.