Monadic data generation strategies and why you should care

I posted this gist earlier. It’s a toy port to Haskell of the new templatized data generation in Hypothesis.

It doesn’t do most of the important things. Its purpose is to demonstrate one simple point: Hypothesis strategies are now monads.

I don’t want to get into the deep philosophical implications of this or how it means Hypothesis is now like a burrito in a space suit. What I want to do here is point out that this enables some super useful things.

Consider the following example from the Hypothesis README (at the time of this writing; this is going to change soon for various reasons, one of them being the stuff I’m about to get into):

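from decimal import Decimal

from hypothesis.searchstrategy import MappedSearchStrategy


class DecimalStrategy(MappedSearchStrategy):
    # Build a Decimal from the underlying integer (a count of hundredths).
    def pack(self, x):
        return Decimal(x) / 100

    # Convert a Decimal back to the underlying integer so it can be simplified.
    def unpack(self, x):
        return int(x * 100)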

This is for defining a Decimal strategy in terms of an integer strategy – it has an operation to convert a decimal to an int and an int to a decimal.

It needs to convert a decimal back to an int because of simplification: values are simplified as ints, so if it can’t convert back then it can’t simplify. This is also what stops strategies from being a functor (remember that all monads are functors). For something to be a functor, we need to be able to define a new version by just mapping values. We can’t require the mapping to go both ways.

Which is why it’s pretty great that the new template API lets us throw away half of this! Now MappedSearchStrategy no longer requires you to implement unpack. As well as being half the work, this means you can use it in cases where you might sometimes need to throw away data – the mapping no longer has to be a bijection. The reason it can do this is that it just uses the templates for the original type, so there’s no need for you to convert back.
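To make that concrete, here is a minimal sketch of the idea in plain Python. It is not the Hypothesis implementation: ToyStrategy, simplify_int and the three callables it stores are names I made up for illustration. The point is only that if simplification happens on templates, map has to change nothing but the final conversion step, so the mapped function never needs an inverse.

```python
import random
from decimal import Decimal


class ToyStrategy:
    """Hypothetical toy strategy: draws templates and reifies them into values."""

    def __init__(self, draw_template, simplify_template, reify):
        self.draw_template = draw_template          # Random -> template
        self.simplify_template = simplify_template  # template -> simpler templates
        self.reify = reify                          # template -> value

    def map(self, f):
        # Templates and their simplification are reused untouched.
        # Only the last step changes, so f never has to be inverted.
        return ToyStrategy(
            self.draw_template,
            self.simplify_template,
            lambda template: f(self.reify(template)),
        )


def simplify_int(n):
    # Candidate simpler templates: zero, and halfway towards zero.
    candidates = []
    for c in (0, int(n / 2)):
        if c != n and c not in candidates:
            candidates.append(c)
    return candidates


# For integers the template is just the integer itself.
integers = ToyStrategy(
    lambda rnd: rnd.randint(-10000, 10000),
    simplify_int,
    lambda template: template,
)

# One-way mapping: no unpack required.
decimals = integers.map(lambda x: Decimal(x) / 100)

# A mapping that throws data away, so it could never be unpacked.
is_even = integers.map(lambda x: x % 2 == 0)

template = decimals.draw_template(random.Random(0))
value = decimals.reify(template)
simpler = [decimals.reify(t) for t in decimals.simplify_template(template)]
```

Note that is_even still simplifies perfectly well, because the thing being simplified is the underlying integer template rather than the boolean.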

But that’s just an example where being a functor is useful. Why is being a monad useful?

Well, the operation that monads add over functors is bind. map lets us take a function a -> b and turn a Strategy a into a Strategy b. bind lets us take a function a -> Strategy b and turn a Strategy a into a Strategy b. When would we want to do this?

Well, one example is when we want some sort of sharing constraint. Suppose for example we wanted to generate a list of dates, but we wanted them to all be in the same time zone. The bind operation would let us do this: We could do something like strategy(timezones).bind(lambda s: [dates_from(s)]) (this is a made-up API, the details for this are not yet in place in actual Hypothesis). This would generate a time zone, then build a strategy for generating dates in that time zone, and from that a strategy for producing lists of them.
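Here is a rough, runnable sketch of that shape in plain Python. Again, none of this is Hypothesis API: Gen, lists_of, timezones and dates_from are all invented for the example, and it only covers generation, ignoring templates and simplification entirely.

```python
import random
from datetime import datetime, timedelta, timezone


class Gen:
    """Hypothetical toy generator: wraps a function from a Random to a value."""

    def __init__(self, draw):
        self.draw = draw

    def map(self, f):
        # f: a -> b. Turns a Gen a into a Gen b.
        return Gen(lambda rnd: f(self.draw(rnd)))

    def bind(self, f):
        # f: a -> Gen b. Draw a value, build a generator from it,
        # then draw from that generator.
        return Gen(lambda rnd: f(self.draw(rnd)).draw(rnd))


def lists_of(gen, max_size=5):
    # Lists of between 0 and max_size elements drawn from gen.
    return Gen(lambda rnd: [gen.draw(rnd) for _ in range(rnd.randint(0, max_size))])


# Fixed-offset time zones from UTC-12 to UTC+14.
timezones = Gen(lambda rnd: timezone(timedelta(hours=rnd.randint(-12, 14))))


def dates_from(tz):
    # Datetimes in the given zone, up to roughly 30 years after 2000-01-01.
    start = datetime(2000, 1, 1, tzinfo=tz)
    return Gen(lambda rnd: start + timedelta(seconds=rnd.randint(0, 10**9)))


# One time zone is drawn first, then every date in the list shares it.
same_zone_dates = timezones.bind(lambda tz: lists_of(dates_from(tz)))

dates = same_zone_dates.draw(random.Random(0))
```

You couldn’t get this with map alone: mapping dates_from over timezones would give you a generator of generators, and bind is what flattens that back down into a generator of values.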

Given that this is Python, you don’t have the advantage you get in Haskell that being a monad gives you nice syntax and a rich set of support libraries, but that’s OK. The reason monads are a thing in the first place is that the monad operations are generally super useful in their own right, and that remains true in Python too.
