The horror lurking at the heart of the new hypothesis

I’ve just merged a branch with a redesign of the hypothesis core into master. All the tests are passing, and I’m much happier with the new design than I was with the old one. It takes the idea of the search strategy I mentioned in my previous post and makes it the fundamental building block of the library. This seems to work quite well.

But in doing this rewrite I’ve gone and taken something which was always at the core of hypothesis and made it explicit, and it’s pretty… well actually I don’t want to say it’s bad per se. Using it has actually been very pleasant, and it’s made some very nice things possible that would otherwise have been quite awkward to do.

So while it’s not exactly bad, it’s sure as hell unpythonic.

What is it?

Well, it’s essentially a highly dynamic multiple dispatch object model with prototype based inheritance.

Err, yeah.

Why, David? Why do you do this?

Well, I arrived at it through a sequence of individually logical steps.

Here’s what drove me down this path:

The first requirement:

We need to be able to go from a type to a strategy. e.g. we want to say “give me a strategy for ints”. Or “give me a strategy for strings”. Or “give me a strategy for tuples of (int, str)”.

However, we absolutely don’t want to put the logic for this on the types themselves. We’re not going to monkey patch things to add in a strategy method or anything uncouth like that.

There are two reasons we’re not going to do that:

  1. We want to be able to configure specific strategies per run, e.g. biasing for short lists or only producing alphanumeric strings
  2. It would be gross

Note also that we need to be able to pass instances, not just types here. For example we can ask for a strategy for the tuple (int, str) which will give us a strategy returning pairs where the first is an int and the second a string.

So that’s where the multiple dispatch came from. It’s not implemented in any particularly clever way, and it’s currently closer to plain type-based overriding than true multiple dispatch because it doesn’t support subtyping: if you define a handler for instances of Foo then it will not trigger for instances of subclasses of Foo. I may fix this at some point, but I currently have no need to do so.
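To make the lookup rule concrete, here’s a minimal sketch of that kind of table. It is a toy and not the code in hypothesis: ToyStrategyTable, its method names and the idea of a strategy being a zero-argument callable are all assumptions made up for illustration.

```python
import random


class ToyStrategyTable:
    """Hypothetical lookup table from descriptors to strategies."""

    def __init__(self):
        self.type_handlers = {}      # descriptor is itself a type, e.g. int
        self.instance_handlers = {}  # descriptor is an instance, e.g. (int, str)

    def define_for_type(self, typ, handler):
        self.type_handlers[typ] = handler

    def define_for_instances_of(self, typ, handler):
        self.instance_handlers[typ] = handler

    def strategy(self, descriptor):
        if isinstance(descriptor, type):
            handler = self.type_handlers[descriptor]
        else:
            # Exact match on type(descriptor): subclasses are not considered.
            handler = self.instance_handlers[type(descriptor)]
        return handler(self, descriptor)


def int_handler(table, descriptor):
    return lambda: random.randint(-100, 100)


def str_handler(table, descriptor):
    return lambda: random.choice(["", "a", "abc"])


def tuple_handler(table, descriptor):
    # Build one strategy per element of the descriptor tuple.
    parts = [table.strategy(d) for d in descriptor]
    return lambda: tuple(part() for part in parts)


table = ToyStrategyTable()
table.define_for_type(int, int_handler)
table.define_for_type(str, str_handler)
table.define_for_instances_of(tuple, tuple_handler)

pair = table.strategy((int, str))()  # an (int, str) pair


class Pair(tuple):
    pass


try:
    table.strategy(Pair((int, str)))
except KeyError:
    print("no handler for instances of the tuple subclass")
```

Nothing lives on int, str or tuple themselves, and the subclass falls through precisely because the lookup uses type(descriptor) rather than isinstance.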

Where does the prototype based inheritance come in?

Well, it started at just one level: there was a single default object which all other lookups would delegate to if needed. This let you define new strategy handlers wherever you wanted, because you could define them on the default object, and then override them on a fresh local copy.

Then I realised that having full prototype based inheritance was no more work and was actually quite useful. So I provided that.
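For anyone who hasn’t run into prototype based inheritance before, the lookup rule is roughly the following. This is a sketch under my own assumptions (PrototypeTable and its methods are hypothetical names, and the handlers are just strings), not the actual implementation.

```python
class PrototypeTable:
    """Hypothetical table that delegates missing keys to its prototype."""

    def __init__(self, prototype=None):
        self.prototype = prototype
        self.own = {}

    def define(self, key, value):
        self.own[key] = value

    def new_child(self):
        return PrototypeTable(prototype=self)

    def lookup(self, key):
        # Walk up the chain until some table defines the key itself.
        table = self
        while table is not None:
            if key in table.own:
                return table.own[key]
            table = table.prototype
        raise KeyError(key)


default = PrototypeTable()
default.define("int", "default int handler")
default.define("str", "default str handler")

local = default.new_child()
local.define("str", "alphanumeric str handler")
nested = local.new_child()  # more than one level deep works the same way

print(nested.lookup("str"))   # alphanumeric str handler
print(nested.lookup("int"))   # default int handler
print(default.lookup("str"))  # default str handler

# Delegation is live: something defined on the default later is still seen.
default.define("float", "default float handler")
print(nested.lookup("float"))  # default float handler
```

Going from one level to arbitrarily many is just the while loop, which is why the full version was no more work.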

Why? Let me show you an example of its usage in practice. Here’s how we define the strategy for strings:

class StringStrategy(MappedSearchStrategy):
    def __init__(self,
                 strategies,
                 descriptor,
                 **kwargs):
        SearchStrategy.__init__(self, strategies, descriptor, **kwargs)
        self.length_strategy = strategies.strategy(int)
        char_strategy = kwargs.get("char_strategy",
                                   OneCharStringStrategy)
 
        cs = strategies.new_child_mapper()
        cs.define_specification_for(str, char_strategy)
        self.mapped_strategy = cs.strategy([str])
 
    def pack(self, ls):
        return ''.join(ls)
 
    def unpack(self, s):
        return list(s)

What are we doing here?

Well, MappedSearchStrategy lets you define a search strategy in terms of another one. You then just need to define methods for converting to and from instances of that other type. In this case we are defining the strategy for strings in terms of the strategy for lists of strings.
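Roughly speaking, the base class can do everything by round-tripping through pack and unpack. The sketch below is a guess at the shape of that idea, not the real MappedSearchStrategy: the Toy* classes and the produce and simplify method names are assumptions for illustration.

```python
import random


class ToyListOfChars:
    """Hypothetical underlying strategy: lists of one-character strings."""

    def produce(self):
        return [random.choice("abc") for _ in range(random.randint(0, 5))]

    def simplify(self, value):
        # Candidate simpler values: drop one element at a time.
        return [value[:i] + value[i + 1:] for i in range(len(value))]


class ToyMappedStrategy:
    """Hypothetical base class: delegates to another strategy via pack/unpack."""

    def __init__(self, mapped_strategy):
        self.mapped_strategy = mapped_strategy

    def pack(self, underlying):
        raise NotImplementedError

    def unpack(self, value):
        raise NotImplementedError

    def produce(self):
        # Generate in the underlying type, then convert to ours.
        return self.pack(self.mapped_strategy.produce())

    def simplify(self, value):
        # Convert to the underlying type, simplify there, convert back.
        underlying = self.unpack(value)
        return [self.pack(s) for s in self.mapped_strategy.simplify(underlying)]


class ToyStringStrategy(ToyMappedStrategy):
    def pack(self, ls):
        return "".join(ls)

    def unpack(self, s):
        return list(s)


strings = ToyStringStrategy(ToyListOfChars())
print(strings.simplify("abc"))  # ['bc', 'ac', 'ab']
```

The subclass only has to say how to convert in each direction; everything else is borrowed from the strategy it is mapped over.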

Defining strings in terms of lists of strings makes it sound like it’s recursive, but it’s not. Let me draw your attention to the following bit:

        char_strategy = kwargs.get("char_strategy",
                                   OneCharStringStrategy)        
        cs = strategies.new_child_mapper()
        cs.define_specification_for(str, char_strategy)
        self.mapped_strategy = cs.strategy([str])

What do we have here?

We have another strategy for strings which only generates one-character strings (Python doesn’t have a character data type). This is not normally installed anywhere.

We now create a new child mapper. This is a fresh SearchStrategies object whose prototype is the SearchStrategies object we started with. We can now define a strategy for strings on it, which overrides strings for that mapper but leaves the original untouched. When we ask it for a strategy for lists of strings, it uses the list strategy it’s inherited and the string strategy that’s just been defined for it to give a strategy for lists of strings that uses the new one-character string strategy.
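The detail that makes this work is that an inherited handler resolves its nested descriptors against the mapper you asked, not the one it was defined on. Here’s a toy version to show that. The method names are borrowed from the snippet above, but ToyStrategies, the handler signature and strategies-as-callables are my assumptions here, and the real SearchStrategies object may well differ.

```python
import random


class ToyStrategies:
    """Hypothetical miniature of a SearchStrategies-like object."""

    def __init__(self, prototype=None):
        self.prototype = prototype
        self.handlers = {}

    def new_child_mapper(self):
        return ToyStrategies(prototype=self)

    def define_specification_for(self, key, handler):
        self.handlers[key] = handler

    def find_handler(self, key):
        table = self
        while table is not None:
            if key in table.handlers:
                return table.handlers[key]
            table = table.prototype
        raise KeyError(key)

    def strategy(self, descriptor):
        # Simplification: a type and an instance of it share one key.
        key = descriptor if isinstance(descriptor, type) else type(descriptor)
        # Pass self, not the table where the handler was found, so inherited
        # handlers look up element strategies on the child.
        return self.find_handler(key)(self, descriptor)


def full_strings(strategies, descriptor):
    return lambda: "".join(
        random.choice("abc") for _ in range(random.randint(0, 5)))


def one_char_strings(strategies, descriptor):
    return lambda: random.choice("abc")


def lists(strategies, descriptor):
    element = strategies.strategy(descriptor[0])
    return lambda: [element() for _ in range(random.randint(0, 5))]


root = ToyStrategies()
root.define_specification_for(str, full_strings)
root.define_specification_for(list, lists)

cs = root.new_child_mapper()
cs.define_specification_for(str, one_char_strings)

chars = cs.strategy([str])()  # list handler from root, str handler from cs
print(all(len(c) == 1 for c in chars))  # True
```

Asking root for a strategy for [str] still gives lists of arbitrary-length strings, because the override only exists on cs.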

This isn’t the only place it’s used. We also use it in stateful testing in a broadly similar way for generating the individual test steps without interfering with the larger strategies namespace.

Is this all a terrible idea? I dunno. It works pretty well, and it’s allowed me to write some quite interestingly powerful code, but I do have the feeling that it’s probably a bit too clever and that there might be a better solution.
