
Thinking with the machine

Content note: Rambling and slightly incoherent.

When was the last time you got lost?

It used to be very easy for me to get lost. I have a terrible sense of direction. Now I have an excellent sense of direction. It’s a little black and glass oblong, fits in my pocket. I only really get lost if I’m out of power or data [edit: Or, as I discovered half an hour after writing this, driving and unable to access my phone].

I also tend not to forget commonly available information, because I can just ask Google for it.

It used to be the case that calculating pi was a life’s work. People at the conference this weekend were complaining about how, using Python, it took seconds – sometimes even minutes – to get this sort of approximation.

If I want to think through a line of thought I can write it down and save it for later. Even just considering raw typing speed, this is about three times faster than doing so by hand, but it also offers unprecedented editing capabilities and an essentially unlimited amount of writing space. This allows me to coherently put together more complicated thoughts than I would ever be able to do unaided.

In a very real sense I am vastly more intelligent than someone of even a hundred years ago, let alone a thousand. It’s not that I’m more intelligent in some biological sense (though due to advances in nutrition and healthcare this may be true too), but I am augmented by the world around me and the tools available to me in such a way as to greatly boost my natural capabilities.

There’s a joke in AI circles that artificial intelligence is whatever hasn’t been done yet [irony: a counter-example to my above claim about forgetting things. Googling for this phrase just turns up irrelevant stuff about AI risk; I had to resort to my other source of transactive memory]. If we have figured out how to make a computer do it, then it’s just calculating. The same seems to hold true for natural intelligence – once upon a time, a good memory was considered the hallmark of intelligence. Now it’s just a thing you use your computer for.

So we’ve got the useful things the computers do and the actual intelligence that we leave to the humans.

But there is a middle way, and I think that that way is where the really exciting stuff lies.

If you looked at my examples above, you might have noticed that as the bird says, one of these things is not like the others.

When I navigate the computer is doing the work. When I Google, there is art in asking the right question, but answering the question is all the computer’s work.

With writing to think through a problem, though, the computer isn’t really doing the work. I am. The computer is lending me its capabilities – the ones that aren’t “really” intelligence, but that somehow, when you add them to “real” intelligence, you get something greater.

Computers are good at many things that we are not. In this case I am more or less using the computer as a working memory, because my working memory is pretty good but still bounded, while the computer’s is effectively infinite (in that it’s finite, but it’s so much larger than mine that I hit my limits in terms of how I can offload to it long before we hit its limits). The result is that the ecosystem of me plus the computer is something greater than the sum of our parts – I can use the computer’s strengths to remove my weaknesses, and the result is something that I could not have produced unaided, and the computer certainly couldn’t have produced.

This is also how I think of Hypothesis.

People talk about Quickcheck, or Hypothesis, finding a bug in their software. This is not correct. Hypothesis does not find bugs, people do. Hypothesis sure helps though.

There is software that finds bugs without you having to do anything other than run it – static analysis tools, for example, fall into this category. This is not what property based testing does. In property based testing you are still the one writing the tests, and you are still the one finding the bugs; the computer is just there to help you out with the bits you’re bad at, by doing the thing that computers do best: repeating the same task over and over again really quickly.
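To make that concrete, here’s a minimal sketch of that division of labour in Hypothesis (a generic example of mine, not something from the original post):

```python
# A minimal property-based test: I supply the insight (sorting twice
# should be the same as sorting once), Hypothesis supplies the hundreds
# of generated examples and shrinks any failure down to a small one.
from hypothesis import given
from hypothesis import strategies as st

@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)
```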

When I started this post I thought I was going to be introducing the concept of “transhumanist software tools”. Software tools that work by augmenting human intelligence in order to help us write better software. There are some tools that I think are unambiguously of this style: Property based testing, interactive theorem provers, IDEs (in particular autocomplete).

But I think this is a wrong label. In much the same way that there is no such thing as a functional programming language, I don’t think there’s any such thing as a transhumanist software tool. It’s too fuzzy a category. Is a REPL transhumanist? Is a type system? The answer is obvious: “Kinda?”.

There is such a thing as transhumanist software development though: Software development where we lean heavily on the computer, and think in terms of how we can not just make the computer work for us but also with us.

And I think there’s a lot of potential to explore here. Right now we assume any task is either intrinsically human or is “automation”, where we just want to replace the people doing it with a small shell script, and the middle ground is really underexplored.

Computers cannot write software (yet). But sometimes it feels like neither can humans. Perhaps together we can?


Hypothesis continues to teach me

I’ve learned a lot technically in my work so far on Hypothesis. It’s taught me interesting computer science things, and I think it has also caused me to level up a lot as a developer. It’s been a great, if occasionally frustrating, experience, and I expect it will continue to be one for some time yet.

But that’s not what I’m learning about right now. As you’ve probably noticed, and as I mentioned previously, I’ve not been doing a huge amount of development recently. There have been a couple of patch releases for bug fixes and example quality, but nothing very serious. I have some interesting work going on behind the scenes on finding multiple bugs with one test, but it’s probably a while off yet.

Because right now, what I’m learning about thanks to Hypothesis is

  • Public speaking
  • Marketing
  • Pricing and sales

You know, “fluffy stuff”.

I’m also learning how to basically suck it up and admit I want things. A combination of geek and English social failings makes it very hard for me to do that. So when I put out a new project or write a blog post there’s always this weird dance of “yeah I totally just did this for me. I guess you can retweet it if you like, maybe star it on github, but whatever I don’t really care” followed by staring obsessively at every notification about it.

With Hypothesis it’s different, because there’s no pretence. I want Hypothesis to be popular. It will make the world a better place, and potentially it will make me some money (or at least help me recoup the money I effectively burned by taking a sabbatical to make it).

And this is weird to me, because it’s basically forcing me out of my shell and making me develop the skills I’ve always shunned. Public speaking is something I assumed I would never be good at (turns out that I’m actually pretty OK at it; maybe with some practice I’ll even be good). Sales and marketing have always been things where… I knew abstractly that they weren’t intrinsically evil, but they always felt dirty and I didn’t really want to have anything to do with them. This wasn’t a reasoned and held position so much as my subconscious biases at work, but those are, if anything, harder to go against.

With Hypothesis, I need to figure out how to promote it if I want people to use it, and I do want people to use it, so I’m forced into a sales and marketing position. Moreover, talking about it to new groups is one of the best things I can do to promote it, so this in turn forces me into public speaking.

Moreover, it’s fairly unambiguously a good thing for me to ask for money for it. I know I’ve done great work in Hypothesis, and I want to continue doing great work in Hypothesis, but in order to do that I also need to eat, have a place to live, etc.

Moreover, it’s clearly a bad thing for me to undercharge! Leaving aside the value of labour and so on, it’s a bad thing simply because I’m mostly not charging for the open source development itself, so if I’m undercharging I have to do more work that isn’t that in order to make decent money, which will in turn mean that less work that benefits everybody gets done.

Not undercharging turns out to be hard. I’ve had multiple conversations with friends to the tune of “I was thinking of charging £X?” “Um. No. It would be cheap at £2X.” “I guess I could charge £Y?” “MORE MONEY” “OK OK how about £Z?” “Yeah I guess you could start there and raise your prices later”.  I understand where these numbers come from, and my friends are right and I am wrong, but that’s sure not how it feels.

Ultimately this is proving to be an… interesting experience. It’s super uncomfortable, as I’m having to go against all my social instincts and unlearn a lot of bad habits, but I think it will be a good thing for me, and hopefully it will be a good thing for Hypothesis too.


Pydata London 2015

I’ve just spent the weekend at the Pydata London conference. It was great. I’m really more data science adjacent than a data scientist, so I’m not precisely the target audience; if you are a data scientist I suspect it would have been amazing, and you should go next year.

Obviously the true highlight of the conference was the 5 minute lightning talk I gave at the end, in which I totally stole the show (note: this is not true, but people seemed to like it). I demoed using Hypothesis to test an optimization function. If you’re interested, the slides are here, and there’s also a very rough script which I didn’t really follow but which gives you an idea of what I was doing.

The actual highlights for me were:

  • Romain Guillebert talking about PyPy, C extensions, and in particular about pymetabiosis, which lets you use C extensions seamlessly from PyPy by embedding CPython as a library inside PyPy using CFFI (!!). This is a pretty great idea.
  • Juan Luis Cano’s lightning talk about poliastro, a python astrodynamics library.
  • James Powell’s talk “Integrating with the vernacular”, which was basically a talk about how weird numpy is and made me go “oh god someone understands my pain” as he enumerated everything that has ever given me problems with supporting and using it. Don’t get me wrong, numpy is great, but it is so weird, and it doesn’t obey the contracts of any of the standard methods it implements.

Obviously this exposes my “not really a data scientist” biases and other people will have a different set of highlights.

I also collected a bunch of interesting projects to look into further:

  • Apache Tika and textract both look vastly better than the tools I had available last time I wanted to turn arbitrary document formats into text.
  • I already knew about this one, but I was reminded that pymc is a thing I really need to have a proper play with.
  • theano looks pretty great for doing computation heavy stuff from Python.
  • The Github organisation page for the Harvard Intelligent Probabilistic Systems Group looks fascinating. I’m going to have to do a trawl through their projects at some point.

Best quote of the conference for me:

This was suggested specifically for numpy and similar because it helps numfocus, the non-profit foundation supporting them, to get funding, but I think it is also both intended and true in general.

Thanks once again to Pydata and its organising team for putting on a great conference.


We made this

You know that thing when after you’ve had an argument and walked away from it, you suddenly realise what you should have said in the argument?

The longest that interval has ever been for me was about two years.

During my more experimental student days I went to a debate group about the existence of god. I was young and foolish and wanted the free food, and it wasn’t run by the society whose name I dare not speak lest I summon their attention (I think it was run by the Islamic society).

It was not a very deep level of discussion, but one argument stuck in my memory, mostly because of how bad I thought it was.

Among the free food we had were strawberries. A woman at the event cited the perfect strawberry she was holding as evidence of the divine. Look at how great it is: It’s large, attractive, juicy, and delicious. How could it not have been made for us? Isn’t its perfection evidence of a designer?

At the time I dismissed this as an obviously silly argument and didn’t take much notice of it, which is why it took me so long to realise that she was entirely and completely correct.

The beautiful strawberry she was holding was designed. A higher power crafted it and made the strawberry, shaping it to be pleasing to the eye and the mouth. Mere chance, or even an unguided evolution, could not and would not have produced such a thing.

That higher power? That was us. We worked really hard at it. Thanks for noticing.

I don’t know if you’ve ever encountered a wild strawberry, but it’s a tiny dry berry smaller than my little fingernail. It’s pretty delicious, but it’s neither attractive nor large, and it’s certainly not juicy.

We spent about 700 years taking those wild strawberries and shaping them into the one she was holding. Mere chance didn’t create it, we did.

It doesn’t stop at strawberries. Almost everything we eat is of our creation, vastly different from any wild ancestor.

The land, too, is mostly of our creation. The rolling hills and green pastures we think of as untouched wilderness are mostly the result of a history of cultivation and deforestation.

England, the land which I mostly call home, is generally an extremely benign environment. That’s because we killed everything that threatened us and destroyed the habitat of much of the rest.

We’ve spent most of the last ten to a hundred thousand years – depending on how you count and what you consider to be “us” – shaping the world according to our desires.

Our desires aren’t always very sensible, and we’re often really bad at accommodating them in a way we won’t regret later, but the fact remains that the world we live in is in fact mere thousands of years old, and was intelligently designed.

We worked really hard at it. Thanks for noticing.


Hypothesis for Django

On Tuesday I gave a talk to the London Django Meetup Group, who are a lovely crowd. The theme was (the clue is in the title, but really, what else would I be speaking about?) “Hypothesis for Django”. Aside from a few lightning talks and one or two really half-arsed barcamp sessions, this was my first real public speaking. Given that, if I do say so myself, it went unreasonably well.

Anyway, thanks to SkillsMatter who kindly recorded the event, the video is now up. For those who would prefer text (me too), the slides are at https://bit.ly/hypothesis-for-django, and I have written up a transcription for you:

Starting slide

Ok, right. So.

Who am I?

Hi. As per what I said I’m here to talk to you about Hypothesis for Django.
I am David R. MacIver. The R is a namespacing thing. There are a lot of David MacIvers. I’m not most of them.
I wrote Hypothesis. And I have no idea what I’m doing.
I don’t actually know Django very well. I write tools for people who know Django much better than I do, but they’re the ones writing the Django applications; it’s usually not me. So if I get anything wrong on the Django front, I apologise in advance. If I get anything wrong on the Hypothesis front, I really should know better, but I’ve not actually done a presentation about it before now, so please bear with me.

What is Hypothesis?

So what is this Hypothesis thing I’m here to talk to you about?
It’s a testing framework [Ed: I hate that I said this. It’s a library, not a framework]. It’s based on a Haskell library called Quickcheck… and you don’t need to run away.
There’s apparently a major problem where people come to the Hypothesis documentation, and they see the word Haskell, and they just go “Oh god, this is going to be really complicated, I’m not going to do this right now”, and they leave. I’ve spent a lot of time making sure Hypothesis is actually very Pythonic. If you know Haskell, a few bits will look familiar. If you don’t know Haskell, that’s really fine, you don’t need to at all. I will never mention the word Monad again after this point.
And the basic idea of this style of testing is that you write your tests almost like normal, but instead of you writing the examples the testing library does that for you. You tell it “I want examples that look roughly like this”, and it gives you a bunch of examples that look roughly like that. It then runs your tests against these examples and if any of them fail it turns them into a smaller example that basically says “Hey, you’ve got a bug in your code”.
And it integrates well with your existing testing libraries. You don’t need to use my own custom test runner for this: it works in unittest, it works in pytest, it works in nose – it should work in anything, but those are the ones I’ve tested it on. And of course, it works well with Django. It works with the existing Django unit test runner, and there’s also some specific custom support for it, which is what we will be using today.

The Setup

So here is what we’re going to be doing. We have some Django project whose backend we’re testing, and we have two models that we care about: one of them is User, one of them is Project. User in this case isn’t actually a standard Django auth user. It could be – it would work perfectly well if it was, I just sort of forgot they existed while I was writing the example. See “I don’t know Django”. Basically, Projects have Users collaborating on them, and every Project has a maximum number of users it is allowed to have. In a real application that would presumably be set by billing, but we’re not doing that; we just have a number. And if you try to add more users to the project than are allowed, then you will get an error.
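The transcript doesn’t reproduce the models themselves, but a sketch along these lines would match the description. The names LimitReached, add_user, at_collaboration_limit and team_contains are my assumptions, not the talk’s actual code:

```python
# Hypothetical reconstruction of the two models described above.
from django.db import models

class LimitReached(Exception):
    pass

class User(models.Model):
    email = models.EmailField()

class Project(models.Model):
    name = models.CharField(max_length=100)
    collaborator_limit = models.IntegerField()
    collaborators = models.ManyToManyField(User)

    def at_collaboration_limit(self):
        return self.collaborators.count() >= self.collaborator_limit

    def add_user(self, user):
        # The behaviour as first described: error once the limit is hit.
        if self.at_collaboration_limit():
            raise LimitReached()
        self.collaborators.add(user)

    def team_contains(self, user):
        return self.collaborators.filter(pk=user.pk).exists()
```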
And what we’re going to do is start from a fairly normal, boring test using standard Django stuff – you’ve probably seen a thousand things like it before. First of all we’re going to refactor it to use Hypothesis, and in the process the test should hopefully become clearer and express our intent more correctly. Once we’ve done that, we’re going to let Hypothesis have some fun, and basically refactor the test to do a lot more, finding a bug in the process.

Code slide 1

Here is our starting point. Obviously, in any well tested application, this would be only one test amongst many, but it’s the only test we’re going to look at today. We want to test that you actually can add users to a project up to the limit. This test would currently pass even if we never implemented the limit in the first place: we’re just saying we can create a project with a limit of 3, we add 3 users – alex, kim and pat – to it, and we assert afterwards that they’re all on the project.
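The slide’s code isn’t in the transcript, but reconstructed from the description it would have looked something like this (exact names are assumptions):

```python
# Reconstruction of the starting test; not the slide's exact code.
# Project and User are as in the models sketch above.
from django.test import TestCase

class TestProjectCollaboration(TestCase):
    def test_can_add_users_up_to_the_limit(self):
        project = Project.objects.create(
            name="example project", collaborator_limit=3)
        alex = User.objects.create(email="alex@example.com")
        kim = User.objects.create(email="kim@example.com")
        pat = User.objects.create(email="pat@example.com")
        for user in (alex, kim, pat):
            project.add_user(user)
        for user in (alex, kim, pat):
            self.assertTrue(project.team_contains(user))
```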
Like I say, you’ve seen tests like this a thousand times before, which makes it easy to fail to notice that it’s actually quite bad. The major problem with it is that it has lots of distracting details that absolutely don’t matter for the test: a project has a name, the users have email addresses, there are exactly 3 users and a collaboration limit of 3. Which of these details actually matter? It’s completely not obvious from the test. It would be really surprising if the project name mattered. It probably isn’t the case that the user emails matter, but it might be – they’re all from the same domain, for example. Is there some custom domain support? Who knows? The test doesn’t say. You’d have to look at the code to find out. And the real stinker is the 3. What’s special about 3? Again, probably nothing, but 0, 1 and 2 are often special cases, so is 3 there because it’s the first non-special number? Who knows? The test doesn’t say.

Code slide 2

So let us throw all of that away. What we’ve done here is take exactly the same test – we’ve kept the 3 for now, but thrown everything else away – and said “Hypothesis, please give me some examples”. What happens is that we accept all of these as function arguments, and the decorator tells Hypothesis how it can provide them to us. We’ve told it that our final 3 arguments are Users; the models function is a thing from Hypothesis that just says “generate me an instance of this Django model”. It does automatic introspection on your models to figure out how to build them, but as you can see from the Project example you can also override any individual field. Here we’ve got the collaborator limit set to 3; just is a function that returns a trivial strategy that always returns the same value. One final thing to note here is that we had to use our own test runner. That’s due to technical reasons with transaction management. It works exactly the same as a normal Django test runner, it just does a little bit more that we need for these tests to work.
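Reconstructed using the hypothesis.extra.django API as it stood around the time of the talk (so the details here are my best guess, not the slide itself):

```python
# Reconstruction: the same test, but Hypothesis generates the instances.
from hypothesis import given
from hypothesis.strategies import just
from hypothesis.extra.django import TestCase  # the special test case mentioned above
from hypothesis.extra.django.models import models

class TestProjectCollaboration(TestCase):
    @given(models(Project, collaborator_limit=just(3)),
           models(User), models(User), models(User))
    def test_can_add_users_up_to_the_limit(self, project, alex, kim, pat):
        for user in (alex, kim, pat):
            project.add_user(user)
        for user in (alex, kim, pat):
            self.assertTrue(project.team_contains(user))
```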
And what will happen when you run this test is pretty much the same thing that happened when we ran the previous version, except that it will run multiple times with different instances matching this specification. And unlike, say, a fixture, which we could have used for this, the details that aren’t present genuinely don’t matter, because Hypothesis will try lots of different values for them, so if they did matter the test would fail.
So this should hopefully be, once you’re familiar with the Hypothesis syntax, a slightly clearer version of the original test, without any of those distracting details.

Code slide 3

We will just clean up slightly further en route to making it better and getting rid of that 3: rather than giving each user a name, given that we don’t actually care about their names, we’re going to ask for lists. The way this works is that we take our models(User) strategy and say that we want lists of it. We can specify the min and max size – there isn’t an exact-size argument, but that’s fine – so in this case the collaborators argument is now being passed a list of precisely 3 users. Otherwise this test works the same way as before: we add each collaborator to the project and then we assert that they are on the team. In particular, the 3 is still there. Let’s kill the 3.
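A reconstruction of that step:

```python
# Reconstruction: exactly three users, no longer individually named.
from hypothesis import given
from hypothesis.strategies import just, lists
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

class TestProjectCollaboration(TestCase):
    @given(models(Project, collaborator_limit=just(3)),
           lists(models(User), min_size=3, max_size=3))
    def test_can_add_users_up_to_the_limit(self, project, collaborators):
        for user in collaborators:
            project.add_user(user)
        for user in collaborators:
            self.assertTrue(project.team_contains(user))
```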

Code slide 4

What we are doing now is opening up the range of values that the collaborator limit can take. We’ve told it that its minimum value is zero – you can’t have fewer than zero collaborators – and its maximum value is 20. The 20 is still a bit distracting, but it’s needed there basically for performance. Otherwise Hypothesis would be trying to generate really massive lists, and that can work fine, but then any individual test run takes forever, and it’s running the test, depending on configuration, possibly 200 times (you’ll probably want to configure it lower than that), so it would just take ages and not do much useful. 20 is a good number. Similarly, we’ve capped our lists of users at length 20, because we don’t want more users than the collaborator limit right now.
And the only other interesting detail over the previous version is this assume function call. What this is saying is that we need this condition to be satisfied in order for this to be a good example. What the test is currently testing is the case where there are fewer collaborators than the project limit; anything else isn’t interesting for this test. It’s more or less the same thing as saying “if this is not true, return early”, but the difference is that Hypothesis will try to give you fewer examples that don’t satisfy the condition, and if you accidentally write your test so that it’s not doing anything useful, Hypothesis will complain at you. It will say “All of the examples I gave you were bad. What did you want me to do?”. Again, otherwise this is pretty much the same as before. We have a project, we have a list of users, we are adding users to the project and asserting that they’re on it afterwards. And the users must be fewer than the collaborator limit.
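Putting those pieces together, a plausible reconstruction of this slide:

```python
# Reconstruction: the limit now varies from 0 to 20, and assume()
# discards any run where the generated users would exceed it.
from hypothesis import assume, given
from hypothesis.strategies import integers, lists
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

class TestProjectCollaboration(TestCase):
    @given(models(Project,
                  collaborator_limit=integers(min_value=0, max_value=20)),
           lists(models(User), max_size=20))
    def test_can_add_users_up_to_the_limit(self, project, collaborators):
        assume(len(collaborators) <= project.collaborator_limit)
        for user in collaborators:
            project.add_user(user)
        for user in collaborators:
            self.assertTrue(project.team_contains(user))
```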
And this is, as far as I’m concerned, pretty much a better version of the test we started with. It specifies the behaviour we want more carefully, it doesn’t have any of that distracting detail, and as a nice side benefit, when we change the shape of our models it will just continue working. The test doesn’t really know anything about how to create a model or anything like that. For that part, we’re done. This runs fine, it tests…
[Audience question is inaudible. From what I recall it was about how assume worked: Checking that what happens is that the two arguments are drawn independently and then the assume filters out ones that don’t match]
Yes. Yes, exactly. It filters them out and it also does a little bit of work to make sure you get fewer examples like that in future.
And yeah. So, this test runs fine, and everything seems to be working. I guess we’ve written bug free code. Woo.
Turns out we didn’t write bug free code. So let’s see if we can get Hypothesis to prove that to us. What we’re going to do now is a sort of data driven testing where we give Hypothesis free rein and just see what breaks. We’re going to remove this assume call, and the code should break when we do, because we have this collaborator limit, we’re going to exceed it, and that should give us an exception.

Code slide 5

So this is the change, all we’ve done is remove the assume.

Code slide 6

And we get an exception! And Hypothesis tells us the example, it says “I created a project with a collaborator limit of 0, I tried to add a user to it, I got an exception”. That’s what’s supposed to happen, excellent!

Code slide 7

So let’s change the test. Now, when we are adding the user, we check whether the project is at the collaborator limit, in which case something different should happen: we should fail to add the user, and the user should then not be on the project. Otherwise we should add the user, and the user should be on the project. We’ve also inlined the assertTrue next to the adding, because this way we can handle each branch separately, but that shouldn’t change the logic.
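A reconstruction of the reworked test (at_collaboration_limit is my assumed helper name from the models sketch earlier):

```python
# Reconstruction: no assume() any more; instead we branch on whether
# the project is already at its limit and assert the behaviour we want.
from hypothesis import given
from hypothesis.strategies import integers, lists
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

class TestProjectCollaboration(TestCase):
    @given(models(Project,
                  collaborator_limit=integers(min_value=0, max_value=20)),
           lists(models(User), max_size=20))
    def test_can_add_users_up_to_the_limit(self, project, collaborators):
        for user in collaborators:
            if project.at_collaboration_limit():
                with self.assertRaises(LimitReached):
                    project.add_user(user)
                self.assertFalse(project.team_contains(user))
            else:
                project.add_user(user)
                self.assertTrue(project.team_contains(user))
```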

Code slide 8

Now we run this again and Hypothesis tells us that our test is still causing an error. What’s happened here is that Hypothesis has tried to add the same user twice, and even though we’re at the collaborator limit, afterwards it’s saying the user is still on the project. Well, OK, the user should still be on the project, because the user started on the project.

Code slide 9

So let’s just exclude that option from that branch and see what happens now.
In the first branch, all we’re doing is adding an extra condition saying we don’t care about that example; it passes through to the other branch.
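Something like this, with one extra clause on the first branch (again my reconstruction, replacing the loop in the previous sketch):

```python
# Reconstruction: the failure branch now only applies to users who
# aren't already on the project; existing members fall through.
for user in collaborators:
    if project.at_collaboration_limit() and not project.team_contains(user):
        with self.assertRaises(LimitReached):
            project.add_user(user)
        self.assertFalse(project.team_contains(user))
    else:
        project.add_user(user)
        self.assertTrue(project.team_contains(user))
```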

Code slide 10

Still failing. The same example, in fact. Hypothesis will have remembered this example and just tried it again immediately. [Ed: This isn’t actually the case. I didn’t notice at the time but the email addresses are different. I think the way I was running examples for the talk made it so that they weren’t shared, because they were saved under different keys]. And what’s happening here is that we’re adding a user to a project with a limit of 1, and then we’re adding them again. It’s still raising that limit reached exception, and we’re not really sure what’s going on. The problem is that at this point Hypothesis is basically forcing us to be consistent, asking “What do you actually want to happen when I add the same user twice?”.

Code slide 11

So lets look at the code now.
The code is very simple: if the project is at the collaboration limit, raise LimitReached; otherwise just add the user to the project. And looking at this, it is inconsistent. If you are not at the collaboration limit, this will work fine – adding a user who is already on the project is a no-op, because that’s how many-to-many relationships work in Django. But if you are at the collaboration limit, you get the limit reached error even though the operation would have done nothing. We basically need to take a stance here and say either that this should always be an error or that it should never be an error, because anything else is just silly.
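Reconstructed, the method he’s describing:

```python
# Reconstruction of the inconsistent implementation (a method on
# Project): re-adding an existing collaborator is a no-op below the
# limit, but raises LimitReached once the project is full.
def add_user(self, user):
    if self.at_collaboration_limit():
        raise LimitReached()
    self.collaborators.add(user)
```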

Code slide 12

We arbitrarily pick that this should never be an error. It should behave like a no-op in all circumstances.
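Reconstructed, the fix amounts to checking membership first:

```python
# Reconstruction of the fix (a method on Project): adding an existing
# collaborator is a no-op in all circumstances; the limit check only
# applies to genuinely new users.
def add_user(self, user):
    if self.team_contains(user):
        return
    if self.at_collaboration_limit():
        raise LimitReached()
    self.collaborators.add(user)
```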

Code slide 13

And we re-run the test and this time it passes.
It passes in a slightly long period of time because it’s running quite a lot of examples. Often what you do is turn the number of examples down in development mode and then run this more seriously in the long term.
And that is pretty much it for Hypothesis.

Obligatory plug

I have an obligatory plug, which is that I do offer training and consulting around this library. You don’t need it to get started, you should try and get started before you pay me, but then you should pay me if you really want to. I am also potentially available for other contracting work if Hypothesis doesn’t sound that exciting to you.

Details

And here are my details at the top. There is the Hypothesis documentation. And there are the links to these slides, available permanently on the internet for you to reread at your leisure.
Thank you very much. Any questions?
