On Tuesday I gave a talk to the London Django Meetup Group, who are a lovely crowd. The theme was (clue is in the title, but really what else would I be speaking about?) “Hypothesis for Django”. Aside from a few lightning talks and one or two really half-arsed barcamp sessions this was my first real public speaking. Given that, if I do say so myself it went unreasonably well.
Anyway, thanks to SkillsMatter who kindly recorded the event, the video is now up. For those who would prefer text (me too), the slides are at https://bit.ly/hypothesis-for-django, and I have written up a transcription for you:
Ok, right. So.
Who am I?
Hi. As per what I said I’m here to talk to you about Hypothesis for Django.
I am David R. MacIver. The R is a namespacing thing. There are a lot of David MacIvers. I’m not most of them.
I wrote Hypothesis. And I have no idea what I’m doing.
I don’t actually know Django very well. I write tools for people who know Django much better than me, but they’re the ones writing the Django applications, it’s usually not me. So if I get anything wrong on the Django front, I apologise in advance for that. If I get anything wrong on the Hypothesis front, I really should know better, but I’ve not actually done a presentation about it before now, so please bear with me.
What is Hypothesis?
So what is this Hypothesis thing I’m here to talk to you about?
It’s a testing framework [Ed: I hate that I said this. It’s a library, not a framework]. It’s based on a Haskell library called QuickCheck… and you don’t need to run away.
There’s apparently a major problem where people come to the Hypothesis documentation, and they see the word Haskell, and they just go “Oh god, this is going to be really complicated, I’m not going to do this right now”, and they leave. I’ve spent a lot of time making sure Hypothesis is actually very Pythonic. If you know Haskell, a few bits will look familiar. If you don’t know Haskell, that’s really fine, you don’t need to at all. I will never mention the word Monad again after this point.
And the basic idea of this style of testing is that you write your tests almost like normal, but instead of you writing the examples the testing library does that for you. You tell it “I want examples that look roughly like this”, and it gives you a bunch of examples that look roughly like that. It then runs your tests against these examples and if any of them fail it turns them into a smaller example that basically says “Hey, you’ve got a bug in your code”.
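To make that concrete, here is a minimal, Django-free illustration of the idea (my own example, not from the talk): instead of hand-picking inputs, you describe their shape and let Hypothesis generate, run, and shrink them.

```python
from hypothesis import given, strategies as st

# "Give me examples that look roughly like this": any list of integers.
@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    # A property that should hold for *any* list of integers.
    assert sorted(sorted(xs)) == sorted(xs)

# Calling the decorated function runs it against many generated lists;
# if any failed, Hypothesis would shrink it to a minimal counterexample.
test_sorting_is_idempotent()
```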
And it integrates well with your existing testing libraries. You don’t need to use my own custom test runners with this. It works in unittest, it works in pytest, it works in nose, it should work in anything but those are the ones I tested it on. And of course, it works well with Django. It both works with the existing Django unit test runner and there’s also some specific custom support for it, which is what we will be using today.
So here is what we’re going to be doing. We have some Django project that we’re testing the backend of and we have two models that we care about. One of them is User, one of them is Project. User in this case isn’t actually a standard Django auth user. It could be. It would work perfectly well if it were, I just sort of forgot they existed while I was writing the example. See “I don’t know Django”. And, basically, Projects have Users collaborating on them and every Project has a maximum number of users it is allowed to have. In a real application that would presumably be set by billing, but we’re not doing that. We just have a number. And if you try to add more users to the project than are allowed then you will get an error.
And what we’re going to do is that we’re going to start from a fairly normal, boring test using standard Django stuff that you’ve probably seen a thousand things like it before. And first of all we’re going to refactor it to use Hypothesis and in the process hopefully the test should become clearer and more correctly express our intent and once we’ve done that we’re going to let Hypothesis have some fun and basically refactor the test to do a lot more and find a bug in the process.
Code slide 1
Here is our starting point. Obviously, in any well tested application this would be only one test amongst many, but it’s the only test we’re going to look at today. We want to test that you actually can add users to a project up to the limit, and this test would currently pass even if we never implemented the limit in the first place. We’re just saying we can create a project, it has a limit of 3, we add 3 users, alex, kim and pat, to it, and we assert after that that they’re all on the project.
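The slides aren’t reproduced in this transcript, so here is a rough sketch of what that starting test might have looked like. A plain class stands in for the Django model so the snippet runs standalone, and all the details (the Project/add_user/team_members names, the project name, the emails) are guesses.

```python
class Project:
    """Hypothetical stand-in for the real Django Project model."""

    def __init__(self, name, collaborator_limit):
        self.name = name
        self.collaborator_limit = collaborator_limit
        self.team_members = set()

    def add_user(self, user):
        # The limit is not enforced here, which is exactly why the test
        # below would pass even if the limit were never implemented.
        self.team_members.add(user)


def test_can_add_users_up_to_collaborator_limit():
    project = Project(name="my project", collaborator_limit=3)
    users = ["alex@example.com", "kim@example.com", "pat@example.com"]
    for user in users:
        project.add_user(user)
    for user in users:
        assert user in project.team_members


test_can_add_users_up_to_collaborator_limit()
```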
Like I say, you’ve seen tests like this a thousand times before, which makes it easy to sort of fail to notice that it’s actually quite bad. And the major problem with it is that it has lots of distracting details that absolutely don’t matter for the test. Basic distracting details: A project has a name, the users have email addresses, there are exactly 3 users and a collaboration limit of 3, and which of these details actually matter? It’s completely not obvious from the test. It would be really surprising if the project name mattered. It probably isn’t the case that the user emails matter. It might be the case. They’re all from the same domain for example. Is there some custom domain support? Who knows? Test doesn’t say. You’d have to look at the code to say. And the real stinker is the 3. What’s special about 3? Again, probably nothing, but often like 0, 1 and 2 are special cases so is 3 there because it’s the first non special number? Who knows? Test doesn’t say.
Code slide 2
So let us throw all of that away. And what we’ve done here is we’ve taken exactly the same test, we’ve not thrown away the 3 for now, we’ve thrown everything else away, and we have said “Hypothesis, please give me some examples”. And what happens here is we accept all of these as function arguments and the decorator tells Hypothesis how it can provide these to us. And we’ve told them that our final 3 arguments are Users, the models function is a thing from Hypothesis that just says “Generate me an instance of this Django model”. It does automatic introspection on your models to figure out how to build them, but as you can see from the Project example you can also override any individual one. Here we’ve got a collaborator limit set to 3, just is a function that returns a trivial strategy that always returns the same value. One final thing to note here is that we had to use our own test runner. That’s due to technical reasons with transaction management. It works exactly the same as a normal Django test runner, it just does a little bit more that we need for these to work.
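As a standalone sketch of that shape: the real slide used models(User) and models(Project, collaborator_limit=just(3)) from hypothesis.extra.django, which need a configured Django project to run, so here st.builds and st.text() play their roles, and the Project stand-in and its names are hypothetical.

```python
from hypothesis import given, strategies as st


class Project:
    """Hypothetical stand-in for the real Django Project model."""

    def __init__(self, collaborator_limit):
        self.collaborator_limit = collaborator_limit
        self.team_members = set()

    def add_user(self, user):
        self.team_members.add(user)


# The decorator tells Hypothesis how to provide each function argument.
@given(
    project=st.builds(Project, collaborator_limit=st.just(3)),
    alex=st.text(), kim=st.text(), pat=st.text(),
)
def test_can_add_three_users(project, alex, kim, pat):
    for user in (alex, kim, pat):
        project.add_user(user)
    for user in (alex, kim, pat):
        assert user in project.team_members


test_can_add_three_users()
```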
And what will happen when you try to run this test is pretty much the same thing that happened when we ran the previous version of the test, except that it will run multiple times with different instances matching this description. And unlike, say, a fixture, which we could have used for this, the details that aren’t present genuinely don’t matter, because if they did matter the test would fail, since Hypothesis will try other values as well.
So this should hopefully be a slightly, once you’re familiar with the Hypothesis syntax, a slightly clearer version of the original test which doesn’t have any of those distracting details.
Code slide 3
We will just clean up slightly further en route to making it better yet and getting rid of that 3, and say that, given that we don’t actually care about their names now, rather than giving each of these users a name we’re going to ask for lists. And the way this works is that we take our models(User) function and say that we want lists of that. We can specify the min and max size; there isn’t a precise size argument but that’s fine, so in this case the collaborators function argument is now being passed a list of precisely 3 users. And otherwise this test works the same way as before. We add each collaborator to the project and then we assert that they are on the team. In particular the 3 is still there. Let’s kill the 3.
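In the same standalone style as before (a plain class with guessed names standing in for the Django model, st.lists of st.text() standing in for lists of models(User)), that step might look like:

```python
from hypothesis import given, strategies as st


class Project:
    """Hypothetical stand-in for the real Django Project model."""

    def __init__(self, collaborator_limit):
        self.collaborator_limit = collaborator_limit
        self.team_members = set()

    def add_user(self, user):
        self.team_members.add(user)


@given(
    project=st.builds(Project, collaborator_limit=st.just(3)),
    # min_size == max_size gives lists of precisely 3 users.
    collaborators=st.lists(st.text(), min_size=3, max_size=3),
)
def test_can_add_collaborators(project, collaborators):
    for user in collaborators:
        project.add_user(user)
    for user in collaborators:
        assert user in project.team_members


test_can_add_collaborators()
```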
Code slide 4
What we are doing now is that we have opened up the range of values that the collaborator limit can take. We’ve told it that its minimum value is zero, you can’t have fewer than zero collaborators, and its maximum value is 20. The 20 is still a bit distracting, but it’s needed there for performance, basically. Because otherwise Hypothesis would be trying to generate really massive lists. It can do that fine, but then any individual test run will take forever, and it’s running the test, depending on configuration, possibly 200 times (you’ll probably want to configure it lower than that), and that will just take ages and won’t do much useful, so 20 is a good number. Similarly we’ve capped our lists of users at length 20 because we don’t want more users than the collaborator limit right now.
And the only other interesting detail over the previous one is that we’ve got this assume function call. What this is saying is that we need this condition to be satisfied for this to be a good example. What this test is currently testing is what happens when there are fewer collaborators than the project limit; anything else isn’t interesting for this test. It’s more or less the same thing as if we just said “if this is not true, return early”, but the difference is that Hypothesis will try to give you fewer examples that don’t satisfy this, and if you accidentally write your test so that it’s not doing anything useful, Hypothesis will complain at you. It will say “All of the examples I gave you were bad. What did you want me to do?”. Otherwise this is pretty much the same as before: we have a project, we have a list of users, we are adding the users to the project and asserting that they’re on it afterwards. And the users must be fewer than the collaborator limit.
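Continuing the standalone sketch (plain stand-in class, guessed names, with the limit now actually enforced in the model as the talk implies it is by this point), the opened-up test with assume might look like:

```python
from hypothesis import assume, given, strategies as st


class LimitReached(Exception):
    pass


class Project:
    """Hypothetical stand-in for the real Django Project model."""

    def __init__(self, collaborator_limit):
        self.collaborator_limit = collaborator_limit
        self.team_members = set()

    def add_user(self, user):
        if len(self.team_members) >= self.collaborator_limit:
            raise LimitReached()
        self.team_members.add(user)


@given(
    project=st.builds(
        Project,
        collaborator_limit=st.integers(min_value=0, max_value=20),
    ),
    collaborators=st.lists(st.text(), max_size=20),
)
def test_can_add_users_up_to_limit(project, collaborators):
    # Discard examples with more users than the project allows; Hypothesis
    # also steers generation away from discarded examples in future.
    assume(len(collaborators) <= project.collaborator_limit)
    for user in collaborators:
        project.add_user(user)
    for user in collaborators:
        assert user in project.team_members


test_can_add_users_up_to_limit()
```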
And this is, as far as I’m concerned, pretty much a better version of the test we started with. It more carefully specifies the behaviour that you want, and doesn’t have any of that distracting detail, and as a nice side benefit, when we change the shape of our models it will just continue working. The test doesn’t really know anything about how to create a model or anything like that. From that part, we’re done. This runs fine, it tests…
[Audience question is inaudible. From what I recall it was about how assume worked: Checking that what happens is that the two arguments are drawn independently and then the assume filters out ones that don’t match]
Yes. Yes, exactly. It filters them out and it also does a little bit of work to make sure you get fewer examples like that in future.
And yeah. So, this test runs fine, and everything seems to be working. I guess we’ve written bug free code. Woo.
Turns out we didn’t write bug free code. So let’s see if we can get Hypothesis to prove that to us. What we’re going to do now is a sort of data driven testing where we give Hypothesis free rein and just see what breaks. We’re going to remove this assume call, and the test should break when we do, because we have this collaborator limit, we’re going to exceed it, and that should give us an exception.
Code slide 5
So this is the change, all we’ve done is remove the assume.
Code slide 6
And we get an exception! And Hypothesis tells us the example, it says “I created a project with a collaborator limit of 0, I tried to add a user to it, I got an exception”. That’s what’s supposed to happen, excellent!
Code slide 7
So lets change the test. Now what we do when we are adding the user is we check that if the project is at the collaborator limit something different should happen. We should fail to add the user and then the user should not be on the project and otherwise we should add the user and the user should be on the project. We’ve also inlined the assert true next to the adding because this way we can do each branch separately, but that shouldn’t change the logic.
Code slide 8
Now we run this again and Hypothesis tells us that our test is still causing an error. And what’s happened here is that Hypothesis has tried to add the same user twice, and even though we’re at the collaborator limit, afterwards it’s saying the user is still on the project. Well, OK, so the user should still be on the project, because the user started on the project.
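That counterexample can be replayed deterministically with the plain stand-in class used in the earlier sketches (names hypothetical), without involving Hypothesis at all:

```python
class LimitReached(Exception):
    pass


class Project:
    """Hypothetical stand-in for the real Django Project model."""

    def __init__(self, collaborator_limit):
        self.collaborator_limit = collaborator_limit
        self.team_members = set()

    def add_user(self, user):
        if len(self.team_members) >= self.collaborator_limit:
            raise LimitReached()
        self.team_members.add(user)


project = Project(collaborator_limit=1)
project.add_user("alex")          # fills the project to its limit
try:
    project.add_user("alex")      # adding the same user again...
except LimitReached:
    pass                          # ...raises, even though it changes nothing
assert "alex" in project.team_members  # and of course they are still on it
```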
Code slide 9
So let’s just exclude that option from that branch and see what happens now.
In the first branch all we’re doing is adding an extra condition saying that we don’t care about that example; pass it through to the next bit.
Code slide 10
Still failing. Same example in fact. Hypothesis will have remembered this example and just tried it again immediately. [Ed: This isn’t actually the case. I didn’t notice at the time but the email addresses are different. I think the way I was running examples for the talk made it so that they weren’t shared because they were saved under different keys]. And what’s happening here is that we’re adding a user to a project of limit 1, and then we’re adding them again. And it’s still raising that limit reached exception, and we’re not really sure what’s going on here. And the problem is that at this point Hypothesis is basically forcing us to be consistent and saying “What do you actually want to happen when I add the same user twice?”.
Code slide 11
So let’s look at the code now.
The code is very simple. If the project is at the collaboration limit, raise a limit reached, otherwise just add the user to the project. And looking at this, this is inconsistent. Because what will happen is that if you are not at the collaboration limit this will work fine. Adding the user to the project will be a no op because that’s how many to many relationships work in Django. But if you are at the collaboration limit, even though the operation would have done nothing you still get the limit reached error. And basically we need to take a stance here and say either this should always be an error or this should never be an error because anything else is just silly.
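In plain-Python form (the real thing would be a Django model method on a many-to-many field; names here are hypothetical), the buggy version and its inconsistency look like this:

```python
class LimitReached(Exception):
    pass


class Project:
    """Hypothetical stand-in for the real Django Project model."""

    def __init__(self, collaborator_limit):
        self.collaborator_limit = collaborator_limit
        self.team_members = set()

    def add_user(self, user):
        # The buggy logic: check the limit, then add.
        if len(self.team_members) >= self.collaborator_limit:
            raise LimitReached()
        self.team_members.add(user)


p = Project(collaborator_limit=2)
p.add_user("alex")
p.add_user("alex")   # below the limit: re-adding is a silent no-op
p.add_user("kim")    # now at the limit
try:
    p.add_user("alex")  # at the limit: the very same no-op raises
except LimitReached:
    print("inconsistent: the same operation sometimes errors")
```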
Code slide 12
We arbitrarily pick that this should never be an error. It should behave like a no-op in all circumstances.
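In the same plain-Python sketch (hypothetical names, a set standing in for the many-to-many relationship), the no-op fix amounts to checking membership before checking the limit:

```python
class LimitReached(Exception):
    pass


class Project:
    """Hypothetical stand-in for the real Django Project model."""

    def __init__(self, collaborator_limit):
        self.collaborator_limit = collaborator_limit
        self.team_members = set()

    def add_user(self, user):
        # Re-adding an existing collaborator is now always a no-op, so the
        # behaviour no longer depends on whether we happen to be at the limit.
        if user in self.team_members:
            return
        if len(self.team_members) >= self.collaborator_limit:
            raise LimitReached()
        self.team_members.add(user)


project = Project(collaborator_limit=1)
project.add_user("alex")
project.add_user("alex")  # at the limit, but a no-op: no exception
assert project.team_members == {"alex"}
```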
Code slide 13
And we re-run the test and this time it passes.
It takes a slightly long time to pass because it’s running quite a lot of examples. Often what you do is turn the number of examples down in development mode and then run this more seriously in the long term.
And that is pretty much it for Hypothesis.
I have an obligatory plug, which is that I do offer training and consulting around this library. You don’t need it to get started, you should try and get started before you pay me, but then you should pay me if you really want to. I am also potentially available for other contracting work if Hypothesis doesn’t sound that exciting to you.
And here are my details at the top. There is the Hypothesis documentation. And there are the links to these slides, available permanently on the internet for you to reread at your leisure.
Thank you very much. Any questions?
Nice tool, nice talk. One small point, if you are in front of a group and answer a question, you should repeat the question (summarized) before answering so others in the audience and those watching the video will know what you’re answering.
Thanks! Yeah, I realised afterwards that I’d forgotten to do the repetition thing and kicked myself for forgetting.