Category Archives: Hypothesis

Hypothesis for Django

On Tuesday I gave a talk to the London Django Meetup Group, who are a lovely crowd. The theme was (clue is in the title, but really what else would I be speaking about?) “Hypothesis for Django”. Aside from a few lightning talks and one or two really half-arsed barcamp sessions, this was my first real public speaking. Given that, if I do say so myself, it went unreasonably well.

Anyway, thanks to SkillsMatter who kindly recorded the event, the video is now up. For those who would prefer text (me too), the slides are at https://bit.ly/hypothesis-for-django, and I have written up a transcription for you:

Starting slide

Ok, right. So.

Who am I?

Hi. As per what I said I’m here to talk to you about Hypothesis for Django.
I am David R. MacIver. The R is a namespacing thing. There are a lot of David MacIvers. I’m not most of them.
I wrote Hypothesis. And I have no idea what I’m doing.
I don’t actually know Django very well. I write tools for people who know Django much better than me, but they’re the ones writing the Django applications, it’s usually not me. So if I get anything wrong on the Django front, I apologise in advance for that. If I get anything wrong on the Hypothesis front, I really should know better, but I’ve not actually done a presentation about it before now, so please bear with me.

What is Hypothesis?

So what is this Hypothesis thing I’m here to talk to you about?
It’s a testing framework [Ed: I hate that I said this. It’s a library, not a framework]. It’s based on a Haskell library called QuickCheck… and you don’t need to run away.
There’s apparently a major problem where people come to the Hypothesis documentation, and they see the word Haskell, and they just go “Oh god, this is going to be really complicated, I’m not going to do this right now”, and they leave. I’ve spent a lot of time making sure Hypothesis is actually very Pythonic. If you know Haskell, a few bits will look familiar. If you don’t know Haskell, that’s really fine, you don’t need to at all. I will never mention the word Monad again after this point.
And the basic idea of this style of testing is that you write your tests almost like normal, but instead of you writing the examples the testing library does that for you. You tell it “I want examples that look roughly like this”, and it gives you a bunch of examples that look roughly like that. It then runs your tests against these examples and if any of them fail it turns them into a smaller example that basically says “Hey, you’ve got a bug in your code”.
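[Ed: as a concrete illustration of the shape of such a test (this example is mine, not from the talk): Hypothesis generates the lists, and the test just states a property that should hold for all of them.]

from hypothesis import given
from hypothesis.strategies import integers, lists

@given(lists(integers()))
def test_reversing_twice_gives_back_the_original_list(xs):
    # Hypothesis calls this test many times with different lists of integers.
    assert list(reversed(list(reversed(xs)))) == xs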
And it integrates well with your existing testing libraries. You don’t need to use my own custom test runners with this. It works in unittest, it works in pytest, it works in nose, it should work in anything but those are the ones I tested it on. And of course, it works well with Django. It both works with the existing Django unit test runner and there’s also some specific custom support for it, which is what we will be using today.

The Setup

So here is what we’re going to be doing. We have some Django project that we’re testing the backend of, and we have two models that we care about. One of them is User, one of them is Project. User in this case isn’t actually a standard Django auth user. It could be. It would work perfectly well if it was, I just sort of forgot they existed while I was writing the example. See “I don’t know Django”. And, basically, Projects have Users collaborating on them, and every Project has a max number of users it is allowed to have. That would presumably in a real application be set by billing, but we’re not doing that. We just have a number. And if you try to add more users to the project than are allowed then you will get an error.
And what we’re going to do is start from a fairly normal, boring test using standard Django stuff, the kind of thing you’ve probably seen a thousand times before. First of all we’re going to refactor it to use Hypothesis, and in the process hopefully the test should become clearer and more correctly express our intent. Once we’ve done that, we’re going to let Hypothesis have some fun, basically refactoring the test to do a lot more and finding a bug in the process.

Code slide 1

Here is our starting point. Obviously in any well tested application this would be only one test amongst many, but it’s the only test we’re going to look at today. We want to test that you actually can add users to a project up to the limit, and this test would currently pass even if we never implemented the limit in the first place. We’re just saying we can create a project, it has a limit of 3, we add 3 users, alex, kim and pat, to it, and we assert after that that they’re all on the project.
Like I say, you’ve seen tests like this a thousand times before, which makes it easy to sort of fail to notice that it’s actually quite bad. And the major problem with it is that it has lots of distracting details that absolutely don’t matter for the test. Basically, the distracting details: a project has a name, the users have email addresses, there are exactly 3 users and a collaboration limit of 3. And which of these details actually matter? It’s completely not obvious from the test. It would be really surprising if the project name mattered. It probably isn’t the case that the user emails matter. It might be the case. They’re all from the same domain, for example. Is there some custom domain support? Who knows? The test doesn’t say. You’d have to look at the code to find out. And the real stinker is the 3. What’s special about 3? Again, probably nothing, but often 0, 1 and 2 are special cases, so is 3 there because it’s the first non-special number? Who knows? The test doesn’t say.
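[Ed: the slides aren’t reproduced in this transcript, so here is a rough reconstruction of the starting test. The model fields and the add_user / team_contains helpers are my guesses at the API, not the exact code from the slide.]

from django.test import TestCase

from myproject.models import Project, User

class TestProjectCollaboration(TestCase):
    def test_can_add_users_up_to_collaborator_limit(self):
        # Lots of incidental detail: a name, email addresses, and the magic number 3.
        project = Project.objects.create(
            name="Worlds best project", collaborator_limit=3)
        alex = User.objects.create(email="alex@example.com")
        kim = User.objects.create(email="kim@example.com")
        pat = User.objects.create(email="pat@example.com")
        for user in (alex, kim, pat):
            project.add_user(user)
        for user in (alex, kim, pat):
            self.assertTrue(project.team_contains(user))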

Code slide 2

So let us throw all of that away. And what we’ve done here is we’ve taken exactly the same test, we’ve not thrown away the 3 for now, we’ve thrown everything else away, and we have said “Hypothesis, please give me some examples”. And what happens here is we accept all of these as function arguments and the decorator tells Hypothesis how it can provide these to us. And we’ve told it that our final 3 arguments are Users; the models function is a thing from Hypothesis that just says “Generate me an instance of this Django model”. It does automatic introspection on your models to figure out how to build them, but as you can see from the Project example you can also override any individual field. Here we’ve got the collaborator limit set to just(3); just is a function that returns a trivial strategy that always returns the same value. One final thing to note here is that we had to use our own test runner. That’s due to technical reasons with transaction management. It works exactly the same as a normal Django test runner, it just does a little bit more that we need for these tests to work.
And what will happen when you try to run this test is pretty much the same thing that happened when we ran the previous version of the test, except that it will run it multiple times with different instances matching this. And unlike, say, a fixture, which we could have used for this, the details that aren’t present genuinely don’t matter, because if they’re not specified they won’t always hold: Hypothesis will try something else as well.
So, once you’re familiar with the Hypothesis syntax, this should hopefully be a slightly clearer version of the original test, one which doesn’t have any of those distracting details.
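[Ed: reconstructed, with the same caveat that the model names and helpers are my guesses rather than the exact slide:]

from hypothesis import given
from hypothesis.strategies import just
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

from myproject.models import Project, User

class TestProjectCollaboration(TestCase):
    @given(
        models(Project, collaborator_limit=just(3)),
        models(User), models(User), models(User))
    def test_can_add_users_up_to_collaborator_limit(self, project, alex, kim, pat):
        # Hypothesis builds and saves the model instances for us.
        for user in (alex, kim, pat):
            project.add_user(user)
        for user in (alex, kim, pat):
            self.assertTrue(project.team_contains(user))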

Code slide 3

We will just clean up slightly further, en route to making it better yet and getting rid of that three, and say that rather than giving each of these a name, given that we don’t actually care about their names, we’re now going to ask for lists. And the way this works is that we take our models(User) function and say that we want lists of that. We can specify the min and max size; there isn’t a precise size function but that’s fine, so in this case the collaborators function argument is now being passed a list of precisely 3 users. And otherwise this test works the same way as before. We add each collaborator to the project and then we assert that they are on the team. And otherwise this is the same as the previous one, and in particular the 3 is still there. Let’s kill the 3.
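[Ed: again a reconstruction with assumed names:]

from hypothesis import given
from hypothesis.strategies import just, lists
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

from myproject.models import Project, User

class TestProjectCollaboration(TestCase):
    @given(
        models(Project, collaborator_limit=just(3)),
        lists(models(User), min_size=3, max_size=3))
    def test_can_add_users_up_to_collaborator_limit(self, project, collaborators):
        # collaborators is now a list of exactly three saved User instances.
        for user in collaborators:
            project.add_user(user)
        for user in collaborators:
            self.assertTrue(project.team_contains(user))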

Code slide 4

What we are doing now is that we have opened up the range of values that the collaborator limit can take. We’ve told it that its minimum value is zero, you can’t have fewer than zero collaborators, and its maximum value is 20. The 20 is still a bit distracting, but it’s needed there for performance, basically. Because otherwise Hypothesis would be trying to generate really massive lists, and this can work fine. It can generate really massive lists, but then it will take forever on any individual test run, and then it’s running the test, depending on configuration, possibly 200 times, you’ll probably want to configure it lower than that, and that will just take ages and won’t do much useful, so 20 is a good number. Similarly we’ve capped our lists of users at length 20 because we don’t want more users than collaborators right now.
And the only other interesting detail over the previous one is that we’ve got this assume function call. And what this is saying is that we need this condition to be satisfied in order for this to be a good example. What this test is currently testing is what happens when there are fewer collaborators than the project limit; anything else isn’t interesting for this test. And it’s more or less the same thing as if we just said if this is not true, return early, but the difference is that Hypothesis will try to give you fewer examples that don’t satisfy this, and so if you accidentally write your test so that it’s not doing anything useful, Hypothesis will complain at you. It will say “All of the examples I gave you were bad. What did you want me to do?”. Again, otherwise this is pretty much the same as before. We have a project, we have a list of users, we are adding users to the project and asserting that they’re on it afterwards. And the users must be fewer than the collaborator limit.
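[Ed: roughly this, with assumed names as before:]

from hypothesis import assume, given
from hypothesis.strategies import integers, lists
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

from myproject.models import Project, User

class TestProjectCollaboration(TestCase):
    @given(
        models(Project, collaborator_limit=integers(min_value=0, max_value=20)),
        lists(models(User), max_size=20))
    def test_can_add_users_up_to_collaborator_limit(self, project, collaborators):
        # Only examples that stay within the limit are interesting here.
        assume(len(collaborators) <= project.collaborator_limit)
        for user in collaborators:
            project.add_user(user)
        for user in collaborators:
            self.assertTrue(project.team_contains(user))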
And this is pretty much, as far as I’m concerned, a better version of the test we started with. It more carefully specifies the behaviour that we wanted, and doesn’t have any of that distracting detail, and as a nice side benefit, when we change the shape of our models it will just continue working. The test doesn’t really know anything about how to create a model or anything like that. From that point of view, we’re done. This runs fine, it tests…
[Audience question is inaudible. From what I recall it was about how assume worked: Checking that what happens is that the two arguments are drawn independently and then the assume filters out ones that don’t match]
Yes. Yes, exactly. It filters them out and it also does a little bit of work to make sure you get fewer examples like that in future.
And yeah. So, this test runs fine, and everything seems to be working. I guess we’ve written bug free code. Woo.
Turns out we didn’t write bug free code. So let’s see if we can get Hypothesis to prove that to us. What we’re going to do now is just a sort of data driven testing where we give Hypothesis free rein and just see what breaks. We’re going to remove this assume call, and this code should break when we do, because we have this collaborator limit and we’re going to exceed the collaborator limit and that should give us an exception.

Code slide 5

So this is the change, all we’ve done is remove the assume.

Code slide 6

And we get an exception! And Hypothesis tells us the example, it says “I created a project with a collaborator limit of 0, I tried to add a user to it, I got an exception”. That’s what’s supposed to happen, excellent!

Code slide 7

So let’s change the test. Now what we do when we are adding the user is check whether the project is at the collaborator limit, in which case something different should happen: we should fail to add the user, and then the user should not be on the project. Otherwise we should add the user and the user should be on the project. We’ve also inlined the assertTrue next to the adding, because this way we can do each branch separately, but that shouldn’t change the logic.
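[Ed: one plausible reading of the new test; LimitReached and the at_collaborator_limit helper are assumptions on my part:]

from hypothesis import given
from hypothesis.strategies import integers, lists
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

from myproject.models import LimitReached, Project, User

class TestProjectCollaboration(TestCase):
    @given(
        models(Project, collaborator_limit=integers(min_value=0, max_value=20)),
        lists(models(User), max_size=20))
    def test_adding_users_respects_the_collaborator_limit(self, project, collaborators):
        for user in collaborators:
            if project.at_collaborator_limit():
                # At the limit: adding should fail and leave the user off the project.
                with self.assertRaises(LimitReached):
                    project.add_user(user)
                self.assertFalse(project.team_contains(user))
            else:
                # Below the limit: adding should work as before.
                project.add_user(user)
                self.assertTrue(project.team_contains(user))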

Code slide 8

Now we run this again and Hypothesis tells us that our test is still causing an error. And what’s happened here is that Hypothesis has tried to add the same user twice, and afterwards, even though we’re at the collaborator limit, it’s saying the user is still on the project. Well, OK, so the user should still be on the project, because the user started on the project.

Code slide 9

So let’s just exclude that option from that branch and see what happens now.
In the first branch, all we’re doing is adding an extra condition saying that we don’t care about that example in this branch; pass it through to the next bit.
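[Ed: i.e. the loop body becomes something like this, with users who are already on the project falling through to the second branch:]

for user in collaborators:
    if project.at_collaborator_limit() and not project.team_contains(user):
        with self.assertRaises(LimitReached):
            project.add_user(user)
        self.assertFalse(project.team_contains(user))
    else:
        project.add_user(user)
        self.assertTrue(project.team_contains(user))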

Code slide 10

Still failing. Same example in fact. Hypothesis will have remembered this example and just tried it again immediately. [Ed: This isn’t actually the case. I didn’t notice at the time but the email addresses are different. I think the way I was running examples for the talk made it so that they weren’t shared because they were saved under different keys]. And what’s happening here is that we’re adding a user to a project of limit 1, and then we’re adding them again. And it’s still raising that limit reached exception, and we’re not really sure what’s going on here. And the problem is that at this point Hypothesis is basically forcing us to be consistent and saying “What do you actually want to happen when I add the same user twice?”.

Code slide 11

So let’s look at the code now.
The code is very simple. If the project is at the collaboration limit, raise a LimitReached error, otherwise just add the user to the project. And looking at this, this is inconsistent. Because what will happen is that if you are not at the collaboration limit, this will work fine. Adding the user to the project again will be a no-op, because that’s how many-to-many relationships work in Django. But if you are at the collaboration limit, even though the operation would have done nothing, you still get the limit reached error. And basically we need to take a stance here and say either this should always be an error or this should never be an error, because anything else is just silly.
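[Ed: the implementation being described looks roughly like this; the field and helper names are reconstructed, with team as a many-to-many relationship to User:]

from django.db import models

class LimitReached(Exception):
    pass

class User(models.Model):
    email = models.EmailField()

class Project(models.Model):
    name = models.CharField(max_length=100)
    collaborator_limit = models.IntegerField()
    team = models.ManyToManyField(User)

    def at_collaborator_limit(self):
        return self.team.count() >= self.collaborator_limit

    def team_contains(self, user):
        return self.team.filter(pk=user.pk).exists()

    def add_user(self, user):
        # Inconsistent: this raises even when the user is already on the team,
        # in which case adding them again would otherwise be a no-op.
        if self.at_collaborator_limit():
            raise LimitReached()
        self.team.add(user)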

Code slide 12

We arbitrarily pick that this should never be an error. It should behave like a no-op in all circumstances.
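[Ed: so Project.add_user gains an early return for users who are already on the team (same caveat about the reconstructed names):]

def add_user(self, user):
    # Adding a user who is already on the team is always a no-op,
    # even when the project is at its collaborator limit.
    if self.team_contains(user):
        return
    if self.at_collaborator_limit():
        raise LimitReached()
    self.team.add(user)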

Code slide 13

And we re-run the test and this time it passes.
It passes, after a slightly long period of time, because it’s running quite a lot of examples. Often what you do is turn the number of examples down in development mode and then run it more seriously in the long term.
And that is pretty much it for Hypothesis.

Obligatory plug

I have an obligatory plug, which is that I do offer training and consulting around this library. You don’t need it to get started, you should try and get started before you pay me, but then you should pay me if you really want to. I am also potentially available for other contracting work if Hypothesis doesn’t sound that exciting to you.

Details

And here are my details at the top. There is the Hypothesis documentation. And there are the links to these slides, available permanently on the internet for you to reread at your leisure.
Thank you very much. Any questions?

Using Hypothesis with Factory Boy

I gave a talk on the Hypothesis Django Integration last night (video and transcript here). I got some questions asking about integration with Factory Boy.

My answer at the time was that I’ve thought about adding explicit support but there’s nothing to stop you from doing it yourself. I’d like to amend that: There’s nothing to stop you from doing it yourself and it’s so easy to do that I can’t actually imagine how I would improve it with explicit support.

Both Factory Boy and Hypothesis are designed along a “we’re a library, not a framework” approach (the Hypothesis Django integration goes a little further in the direction of a framework than I’d like by requiring a custom test runner, but fortunately Factory Boy does not), so they don’t interfere with each other. Further, Factory Boy is set up to take arbitrary values and Hypothesis is set up to provide them, so you can easily feed the latter into the former.

For example, the following defines a strategy that uses a factory boy UserFactory object to parametrize over unsaved user objects with an arbitrary first name:

from hypothesis import given
from hypothesis.strategies import builds, text
from hypothesis.extra.django import TestCase
from myfactories import UserFactory

class TestUser(TestCase):
    @given(builds(UserFactory.build, first_name=text(max_size=50)))
    def test_can_save_a_user(self, user):
        user.save()

Both factory boy and Hypothesis are designed to play well with others, so unless I’m missing something, nothing specific seems necessary to make them play well with each other. This is how it should work.

The only thing that I can imagine people conceivably wanting custom support for is auto-deriving strategies for Factory Boy factories whose fields are filled randomly by fake-factory. It wouldn’t be too hard to do, but I’m not sure it’s worth it. Honestly, if you’re doing randomized testing like that, you should be using Hypothesis and its existing fake-factory integration to feed your factories instead. It will be a much better experience.
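As a rough, untested sketch of what that might look like, assuming the fake_factory strategy from the fakefactory extra (the exact import path and the factory’s fields here are illustrative rather than definitive):

from hypothesis import given
from hypothesis.extra.django import TestCase
from hypothesis.extra.fakefactory import fake_factory
from hypothesis.strategies import builds

from myfactories import UserFactory

class TestUser(TestCase):
    @given(builds(UserFactory.build, first_name=fake_factory('first_name')))
    def test_can_save_a_user(self, user):
        # fake_factory('first_name') draws values from the corresponding
        # Faker provider and feeds them into the factory.
        user.save()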

The era of rapid Hypothesis development is coming to an end

Don’t panic. This is not an announcement of my abandoning the project. Hypothesis still has a long way to go, and I 100% intend to be working on getting it there.

What this is an announcement of is of my continued existence in a market economy and my tragic need to acquire currency in order to convert it into food and accommodation.

I haven’t been making a big deal of it, so some of you might be surprised to learn that the reason Hypothesis development has been so rapid for the last 6 months is that I’ve been working on it full time unpaid. It’s not so much that I took time off to write Hypothesis as that I had the time off anyway and I thought I’d do something useful with it. Hypothesis is that something useful.

I would love to continue working on Hypothesis full time. But the whole “unpaid” thing is starting to become not viable, and will become critically non-viable as soon as I move back to London.

So I’m going to need money.

I will do something more organised in the next month, but for now if you are a company or individual interested in paying me to do any of the following, I would very much like to hear from you:

  • Sponsored Hypothesis development (this can include paying for implementing specific features if you want)
  • Integration work getting Hypothesis to work well with your testing environment
  • Training courses on how to use Hypothesis
  • Anything else Hypothesis related

If the above sounds interesting, please email me at [email protected].

If no money to continue working on Hypothesis is forthcoming, Hypothesis development will absolutely continue, but at a greatly reduced rate. The current development cycle is approximately a minor version a week. This will likely go down to at most a minor version every month, more likely a minor version every two. This would be a shame, as I have a bunch of exciting features I still want to work on, and then I need to tie everything together into a coherent 2.0 release. With full time work I would project that to happen by the end of this year; without it, I can’t really make any predictions at the moment.

Constraint based fixtures with Hypothesis

A common experience when writing database backed web applications (which I hear is a thing some people like to do) is that rather than laboriously setting up each example in each test you use a set of fixtures – standard project definitions, standard users, etc.

Typically these start off small but grow increasingly unwieldy over time, as new tests occasionally require some additional detail and it’s easier to add little things to an existing fixture than it is to create one afresh.

And then what happens is that it becomes increasingly unclear which bits of those fixtures actually matter and which of them are just there because some other test happened to need them.

You can use tools like factory_boy to make this easier, but ultimately it’s still just making the above process easier – you still have the same problems, but it’s less work to get there.

What if instead of having these complicated fixtures your tests could just ask for what they want and be given it?

As well as its use in testing, Hypothesis has the ability to find values satisfying some predicate. And this can be used to create fixtures that are constraint based instead of example based. That is: You don’t ask for the foo_project fixture, you instead say “I want a project whose owner is on the free plan”, and Hypothesis gives it to you:

from hypothesis import find
from hypothesis.extra.django.models import models
from mymodels import Project, User

def test_add_users_to_free_project():
    project = find(
        models(Project, owner=models(User)),
        lambda x: x.owner.plan == "free")
    do_some_stuff_with(project)

And that’s basically it. You write fixture based tests as you normally would, only you can be as explicit as you like as to what features you want from the fixtures rather than just using what happens to be around.

It’s unclear to me whether this is ever an improvement on using Hypothesis as it is intended to be used – I feel like it might work better in cases where the assumptions are relatively hard to satisfy, and it’s probably better for things where the test is really slow. But what is the case is that it’s a lot less alien to people coming from a classical unit testing background than Hypothesis’s style of property testing is, which makes it a convenient gateway usage mode for people who want to get their feet wet in this sort of testing world without having to fundamentally change the way that they think in order to test trivial features.

There are a bunch of things I can do to make this sort of thing better if it proves popular, but all of the above works today. If it sounds appealing, give it a try. Let me know how it works out for you.

Edit to add: I’ve realised there are some problems with using current Hypothesis with Django like this unfortunately. Specifically if you have unique constraints on models you’re constructing this will not work right now. This concept works fine for normal data, and if there’s interest I’m pretty sure I can make it work in Django, but it needs some further thought.

If you want Python 2.6 support, pay me

This post is both about the actual plan for Hypothesis and also what I think you should do as a maintainer of a Python library.

Hypothesis supports Python 2.7 and will almost certainly continue doing so until it hits end of life in 2020.

Hypothesis does not support Python 2.6.

Could Hypothesis support Python 2.6? Almost certainly. It would be a bunch of work, but probably no more than a few weeks, maybe a month if things are worse than I expect. It would also slow down future development because I’d have to maintain compatibility with it, but not to an unbearable degree given that I’ve already got the machinery in place for multiple version compatibility.

I’m not going to do this though. I’m doing enough free labour as it is without supporting a version of Python that is only still viable because a company is charging for commercial support for it.

I’m sorry, I misspoke. What I meant to say is that I’m not going to do this for free.

If you were to pay me, say, £15,000 for development costs, I would be happy to commit to providing 2.6 support in a released version of Hypothesis, followed by one year in which there is a version that supports Python 2.6 and is getting active bug fixes (this would probably always be the latest version, but if I hit a blocker I might end up dropping 2.6 from the latest and providing patch releases for a previous minor version).

People who are still using Python 2.6 are generally large companies who are already paying for commercial support, so I think it’s perfectly reasonable to demand this of them.

And I think everyone developing open source Python who is considering supporting Python 2.6 should do this too. Multi version library development is hard enough as it is without supporting 2.6. Why should you work for free on something that you are really quite justified in asking for payment for?
