Category Archives: Uncategorized

Pydata London 2015

I’ve just spent the weekend at the Pydata London conference. It was great. I’m really more data-science adjacent than a data scientist, so I’m not precisely the target audience; if you actually are a data scientist I suspect it would have been amazing, and you should go next year.

Obviously the true highlight of the conference was the 5 minute lightning talk I gave at the end, in which I totally stole the show (note: this is not true, but people seemed to like it). I demoed using Hypothesis to test an optimization function. If you’re interested, slides are here, and there’s also a very rough script which I didn’t really follow but which gives you an idea of what I was doing.

The actual highlights for me were:

  • Romain Guillebert talking about pypy, C extensions, and in particular about pymetabiosis, which lets you use C extensions seamlessly from pypy by embedding CPython as a library inside pypy using CFFI (!!). This is a pretty great idea.
  • Juan Luis Cano’s lightning talk about poliastro, a python astrodynamics library.
  • James Powell’s talk “Integrating with the vernacular”, which was basically a talk about how weird numpy is and made me go “oh god someone understands my pain” as he enumerated everything that has ever given me problems with supporting and using it. Don’t get me wrong, numpy is great, but it is so weird, and it doesn’t obey the contracts of any of the standard methods it implements.

Obviously this exposes my “not really a data scientist” biases and other people will have a different set of highlights.

I also collected a bunch of interesting projects to look into further:

  • Apache Tika and textract both look vastly better than the tools I had available last time I wanted to turn arbitrary document formats into text.
  • I already knew about this one, but I was reminded that pymc is a thing I really need to have a proper play with.
  • theano looks pretty great for doing computation heavy stuff from Python.
  • The Github organisation page for the Harvard Intelligent Probabilistic Systems Group looks fascinating. I’m going to have to do a trawl through their projects at some point.

Best quote of the conference for me:

This was suggested specifically for numpy and similar because it helps numfocus, the non-profit foundation supporting them, to get funding, but I think it is also both intended and true in general.

Thanks once again to Pydata and its organising team for putting on a great conference.


We made this

You know that thing when after you’ve had an argument and walked away from it, you suddenly realise what you should have said in the argument?

The longest interval for that with me was about two years.

During my more experimental student days I went to a debate group about the existence of god. I was young and foolish and wanted the free food, and it wasn’t run by the society whose name I dare not speak lest I summon their attention (I think it was run by the Islam society).

It was not a very deep level of discussion, but one argument stuck in my memory, mostly because of how bad I thought it was.

Among the free food we had were strawberries. A woman at the event cited the perfect strawberry she was holding as evidence of the divine. Look at how great it is: It’s large, attractive, juicy, and delicious. How could it not have been made for us? Isn’t its perfection evidence of a designer?

At the time I dismissed this as an obviously silly argument and didn’t take much notice of it, which is why it took me so long to realise that she was entirely and completely correct.

The beautiful strawberry she was holding was designed. A higher power crafted it and made the strawberry, shaping it to be pleasing to the eye and the mouth. Mere chance, or even an unguided evolution, could not and would not have produced such a thing.

That higher power? That was us. We worked really hard at it. Thanks for noticing.

I don’t know if you’ve ever encountered a wild strawberry, but it’s a tiny dry berry smaller than my little fingernail. It’s pretty delicious, but it’s neither attractive nor large, and it’s certainly not juicy.

We spent about 700 years taking those wild strawberries and shaping them into the one she was holding. Mere chance didn’t create it, we did.

It doesn’t stop at strawberries. Almost everything we eat is of our creation, vastly different from any wild ancestor.

The land, too, is mostly of our creation. The rolling hills and green pastures we think of as untouched wilderness are mostly the result of a history of cultivation and deforestation.

England, the land which I mostly call home, is generally an extremely benign environment. That’s because we killed everything that threatened us and destroyed the habitat of much of the rest.

Depending on how you count and what you consider to be “us”, we’ve spent most of the last 10 to 100 thousand years shaping the world according to our desires.

Our desires aren’t always very sensible, and we’re often really bad at accommodating them in a way we won’t regret later, but the fact remains that the world we live in is in fact mere thousands of years old, and was intelligently designed.

We worked really hard at it. Thanks for noticing.


Hypothesis for Django

On Tuesday I gave a talk to the London Django Meetup Group, who are a lovely crowd. The theme was (clue is in the title, but really what else would I be speaking about?) “Hypothesis for Django”. Aside from a few lightning talks and one or two really half-arsed barcamp sessions this was my first real public speaking. Given that, if I do say so myself it went unreasonably well.

Anyway, thanks to SkillsMatter who kindly recorded the event, the video is now up. For those who would prefer text (me too), the slides are at https://bit.ly/hypothesis-for-django, and I have written up a transcription for you:

Starting slide

Ok, right. So.

Who am I?

Hi. As per what I said I’m here to talk to you about Hypothesis for Django.
I am David R. MacIver. The R is a namespacing thing. There are a lot of David MacIvers. I’m not most of them.
I wrote Hypothesis. And I have no idea what I’m doing.
I don’t actually know Django very well. I write tools for people who know Django much better than me, but they’re the ones writing the Django applications, it’s usually not me. So if I get anything wrong on the Django front, I apologise in advance for that. If I get anything wrong on the Hypothesis front, I really should know better, but I’ve not actually done a presentation about it before now, so please bear with me.

What is Hypothesis?

So what is this Hypothesis thing I’m here to talk to you about?
It’s a testing framework [Ed: I hate that I said this. It’s a library, not a framework]. It’s based on a Haskell library called Quickcheck… and you don’t need to run away.
There’s apparently a major problem where people come to the Hypothesis documentation, and they see the word Haskell, and they just go “Oh god, this is going to be really complicated, I’m not going to do this right now”, and they leave. I’ve spent a lot of time making sure Hypothesis is actually very Pythonic. If you know Haskell, a few bits will look familiar. If you don’t know Haskell, that’s really fine, you don’t need to at all. I will never mention the word Monad again after this point.
And the basic idea of this style of testing is that you write your tests almost like normal, but instead of you writing the examples the testing library does that for you. You tell it “I want examples that look roughly like this”, and it gives you a bunch of examples that look roughly like that. It then runs your tests against these examples and if any of them fail it turns them into a smaller example that basically says “Hey, you’ve got a bug in your code”.
And it integrates well with your existing testing libraries. You don’t need to use my own custom test runners with this. It works in unittest, it works in pytest, it works in nose, it should work in anything but those are the ones I tested it on. And of course, it works well with Django. It both works with the existing Django unit test runner and there’s also some specific custom support for it, which is what we will be using today.

The Setup

So here is what we’re going to be doing. We have some Django project that we’re testing the backend of and we have two models that we care about. One of them is User, one of them is Project. User in this case isn’t actually a standard Django auth user. It could be. It would work perfectly well if it was, I just sort of forgot they existed while I was writing the example. See “I don’t know Django”. And, basically, Projects have Users collaborating on them and every Project has a max number of users it is allowed to have. That would presumably in a real application be set by billing, but we’re not doing that. We just have a number. And if you try to add more users to the project than are allowed then you will get an error.
And what we’re going to do is that we’re going to start from a fairly normal, boring test using standard Django stuff that you’ve probably seen a thousand things like it before. And first of all we’re going to refactor it to use Hypothesis and in the process hopefully the test should become clearer and more correctly express our intent and once we’ve done that we’re going to let Hypothesis have some fun and basically refactor the test to do a lot more and find a bug in the process.
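[Ed: the slide code isn’t reproduced in this transcript, so here is a rough sketch of the kind of models being described. All of the names (collaborator_limit, collaborators, team_contains, add_user, LimitReached) are my reconstructions from the talk, not the actual slide code.]

# A rough reconstruction of the example models, not the actual slide code.
from django.db import models


class LimitReached(Exception):
    """Raised when a project is already at its collaborator limit."""


class User(models.Model):
    # Deliberately not django.contrib.auth's User, just a plain model.
    email = models.EmailField()


class Project(models.Model):
    name = models.CharField(max_length=100)
    collaborator_limit = models.IntegerField()
    collaborators = models.ManyToManyField(User)

    def team_contains(self, user):
        return self.collaborators.filter(pk=user.pk).exists()

    def add_user(self, user):
        # First version; the talk revisits this logic at "Code slide 11".
        if self.collaborators.count() >= self.collaborator_limit:
            raise LimitReached()
        self.collaborators.add(user)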

Code slide 1

Here is our starting point. Obviously in any well tested application this would be only one test amongst many, but it’s the only test we’re going to look at today. We want to test that you actually can add users to a project up to the limit, and this test would currently pass even if we never implemented the limit in the first place. We’re just saying we can create a project, it has a limit of 3, we add 3 users, alex, kim and pat, to it, and afterwards we assert that they’re all on the project.
Like I say, you’ve seen tests like this a thousand times before, which makes it easy to fail to notice that it’s actually quite bad. And the major problem with it is that it has lots of distracting details that absolutely don’t matter for the test. The distracting details: a project has a name, the users have email addresses, there are exactly 3 users and a collaboration limit of 3. And which of these details actually matter? It’s completely not obvious from the test. It would be really surprising if the project name mattered. It probably isn’t the case that the user emails matter. It might be the case. They’re all from the same domain, for example. Is there some custom domain support? Who knows? The test doesn’t say. You’d have to look at the code to find out. And the real stinker is the 3. What’s special about 3? Again, probably nothing, but often 0, 1 and 2 are special cases, so is 3 there because it’s the first non-special number? Who knows? The test doesn’t say.
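[Ed: reconstructed, the starting test might look something like this; the names are guesses and only the shape matters.]

# A sketch of the original, example-based test (reconstruction).
# Project and User are as in the models sketch above.
from django.test import TestCase


class TestProjectCollaboration(TestCase):
    def test_can_add_users_up_to_limit(self):
        project = Project.objects.create(name="example", collaborator_limit=3)
        alex = User.objects.create(email="alex@example.com")
        kim = User.objects.create(email="kim@example.com")
        pat = User.objects.create(email="pat@example.com")
        for user in (alex, kim, pat):
            project.add_user(user)
        for user in (alex, kim, pat):
            self.assertTrue(project.team_contains(user))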

Code slide 2

So let us throw all of that away. What we’ve done here is we’ve taken exactly the same test, we’ve not thrown away the 3 for now, we’ve thrown everything else away, and we have said “Hypothesis, please give me some examples”. And what happens here is we accept all of these as function arguments and the decorator tells Hypothesis how it can provide these to us. We’ve told it that our final 3 arguments are Users; the models function is a thing from Hypothesis that just says “generate me an instance of this Django model”. It does automatic introspection on your models to figure out how to build them, but as you can see from the Project example you can also override any individual field. Here we’ve got a collaborator limit set to 3; just is a function that returns a trivial strategy that always returns the same value. One final thing to note here is that we had to use our own test runner. That’s due to technical reasons around transaction management. It works exactly the same as a normal Django test runner, it just does a little bit more that we need for these tests to work.
And what will happen when you try to run this test is pretty much the same thing that happened when we ran the previous version of the test, except that it will run multiple times with different instances matching this description. And unlike, say, a fixture, which we could have used for this, the details that aren’t present genuinely don’t matter: if the test accidentally depended on one of them, Hypothesis would try something else as well and the test would fail.
So, once you’re familiar with the Hypothesis syntax, this should hopefully be a slightly clearer version of the original test, one which doesn’t have any of those distracting details.
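[Ed: as a sketch, the Hypothesis version being described probably looked roughly like this. It uses the Hypothesis Django API of the time (models() from hypothesis.extra.django.models); in more recent Hypothesis releases that function became from_model(). Everything else is the same reconstruction as before.]

# Sketch of the Hypothesis version (Hypothesis 1.x-era API; reconstruction).
from hypothesis import given
from hypothesis.extra.django import TestCase  # the custom test runner/case mentioned above
from hypothesis.extra.django.models import models
from hypothesis.strategies import just


class TestProjectCollaboration(TestCase):
    @given(
        models(Project, collaborator_limit=just(3)),
        models(User), models(User), models(User),
    )
    def test_can_add_users_up_to_limit(self, project, alex, kim, pat):
        for user in (alex, kim, pat):
            project.add_user(user)
        for user in (alex, kim, pat):
            self.assertTrue(project.team_contains(user))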

Code slide 3

We will just clean up slightly further en route to making it better yet and getting rid of that three, and say that, given that we don’t actually care about their names now, rather than giving each of the users a name we’re going to ask for lists. And the way this works is that we take our models(User) function and say that we want lists of that. We can specify the min and max size. There isn’t a precise size argument, but that’s fine, so in this case the collaborators function argument is now being passed a list of precisely 3 users. And otherwise this test works the same way as before. We add each collaborator to the project and then we assert that they are on the team. Otherwise this is the same as the previous one, and in particular the 3 is still there. Let’s kill the 3.
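[Ed: in sketch form, with the same caveats about reconstructed names:]

# Sketch of the lists() version (reconstruction).
from hypothesis import given
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models
from hypothesis.strategies import just, lists


class TestProjectCollaboration(TestCase):
    @given(
        models(Project, collaborator_limit=just(3)),
        lists(models(User), min_size=3, max_size=3),
    )
    def test_can_add_users_up_to_limit(self, project, collaborators):
        for user in collaborators:
            project.add_user(user)
        for user in collaborators:
            self.assertTrue(project.team_contains(user))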

Code slide 4

What we are doing now is that we have opened up the range of values that the collaborator limit can take. We’ve told it that its minimum value is zero, you can’t have fewer than zero collaborators, and its maximum value is 20. The 20 is still a bit distracting, but it’s needed there basically for performance. Otherwise Hypothesis would be trying to generate really massive lists, and that can work fine, it can generate really massive lists, but then it will take forever on any individual test run, and it’s running the test, depending on configuration, possibly 200 times (you’ll probably want to configure it lower than that), so that would just take ages and wouldn’t do much useful. 20 is a good number. Similarly, we’ve capped our lists of users at length 20 because we don’t want more users than the collaborator limit right now.
And the only other interesting detail over the previous one is that we’ve got this assume function call. What this is saying is that we need this condition to be satisfied in order for this to be a good example. What this test is currently testing is what happens when there are fewer collaborators than the project limit; anything else isn’t interesting for this test. It’s more or less the same thing as if we just said “if this is not true, return early”, but the difference is that Hypothesis will try to give you fewer examples that don’t satisfy this, and if you accidentally write your test so that it’s not doing anything useful, Hypothesis will complain at you. It will say “All of the examples I gave you were bad. What did you want me to do?”. Again, otherwise this is pretty much the same as before. We have a project, we have a list of users, we are adding users to the project and asserting that they’re on it afterwards. And the users must be fewer than the collaborator limit.
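[Ed: put together, something like the following. Whether the comparison in the assume was strict or not isn’t clear from the transcript; the rest is the same reconstruction as before.]

# Sketch of the version with the 3 removed (reconstruction).
from hypothesis import assume, given
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models
from hypothesis.strategies import integers, lists


class TestProjectCollaboration(TestCase):
    @given(
        models(Project,
               collaborator_limit=integers(min_value=0, max_value=20)),
        lists(models(User), max_size=20),
    )
    def test_can_add_users_up_to_limit(self, project, collaborators):
        # Only examples that stay within the limit are interesting here.
        assume(len(collaborators) <= project.collaborator_limit)
        for user in collaborators:
            project.add_user(user)
        for user in collaborators:
            self.assertTrue(project.team_contains(user))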
And this is, as far as I’m concerned, pretty much a better version of the test we started with. It more carefully specifies the behaviour that we want, and doesn’t have any of that distracting detail, and as a nice side benefit, when we change the shape of our models it will just continue working. The test doesn’t really know anything about how to create a model or anything like that. From that part, we’re done. This runs fine, it tests…
[Audience question is inaudible. From what I recall it was about how assume worked: Checking that what happens is that the two arguments are drawn independently and then the assume filters out ones that don’t match]
Yes. Yes, exactly. It filters them out and it also does a little bit of work to make sure you get fewer examples like that in future.
And yeah. So, this test runs fine, and everything seems to be working. I guess we’ve written bug free code. Woo.
Turns out we didn’t write bug free code. So let’s see if we can get Hypothesis to prove that to us. What we’re going to do now is just a sort of data driven testing where we give Hypothesis free rein and just see what breaks. We’re going to remove this assume call, and the test should break when we do, because we have this collaborator limit, we’re going to exceed the collaborator limit, and that should give us an exception.

Code slide 5

So this is the change, all we’ve done is remove the assume.

Code slide 6

And we get an exception! And Hypothesis tells us the example, it says “I created a project with a collaborator limit of 0, I tried to add a user to it, I got an exception”. That’s what’s supposed to happen, excellent!

Code slide 7

So let’s change the test. Now, when we are adding a user, we check whether the project is at the collaborator limit, in which case something different should happen: we should fail to add the user, and afterwards the user should not be on the project. Otherwise we should add the user and the user should be on the project. We’ve also inlined the assertTrue next to the adding, because this way we can handle each branch separately, but that shouldn’t change the logic.
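[Ed: a sketch of that version; the explicit at-limit check is my reconstruction of whatever the slide actually did.]

# Sketch of the branching test (reconstruction; assume() removed).
from hypothesis import given
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models
from hypothesis.strategies import integers, lists


class TestProjectCollaboration(TestCase):
    @given(
        models(Project,
               collaborator_limit=integers(min_value=0, max_value=20)),
        lists(models(User), max_size=20),
    )
    def test_adding_users_respects_limit(self, project, collaborators):
        for user in collaborators:
            at_limit = (project.collaborators.count()
                        >= project.collaborator_limit)
            if at_limit:
                # At the limit: adding should fail and the user should
                # not end up on the project.
                with self.assertRaises(LimitReached):
                    project.add_user(user)
                self.assertFalse(project.team_contains(user))
            else:
                project.add_user(user)
                self.assertTrue(project.team_contains(user))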

Code slide 8

Now we run this again and Hypothesis tells us that our test is still causing an error. And what’s happened here is that Hypothesis has tried to add the same user twice, and even though we’re at the collaborator limit, it’s saying that afterwards the user is still on the project. Well, OK, so the user should still be on the project, because the user started on the project.

Code slide 9

So let’s just exclude that option from that branch and see what happens now.
In the first branch, all we’re doing is adding an extra condition saying that we don’t care about that example; it falls through to the other branch instead.
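[Ed: in sketch form, only the condition on the first branch changes:]

# Sketch: the at-limit branch now also requires that the user is not
# already on the project; otherwise we fall through to the second branch.
from hypothesis import given
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models
from hypothesis.strategies import integers, lists


class TestProjectCollaboration(TestCase):
    @given(
        models(Project,
               collaborator_limit=integers(min_value=0, max_value=20)),
        lists(models(User), max_size=20),
    )
    def test_adding_users_respects_limit(self, project, collaborators):
        for user in collaborators:
            at_limit = (project.collaborators.count()
                        >= project.collaborator_limit)
            if at_limit and not project.team_contains(user):
                with self.assertRaises(LimitReached):
                    project.add_user(user)
                self.assertFalse(project.team_contains(user))
            else:
                project.add_user(user)
                self.assertTrue(project.team_contains(user))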

Code slide 10

Still failing. Same example in fact. Hypothesis will have remembered this example and just tried it again immediately. [Ed: This isn’t actually the case. I didn’t notice at the time but the email addresses are different. I think the way I was running examples for the talk made it so that they weren’t shared because they were saved under different keys]. And what’s happening here is that we’re adding a user to a project of limit 1, and then we’re adding them again. And it’s still raising that limit reached exception, and we’re not really sure what’s going on here. And the problem is that at this point Hypothesis is basically forcing us to be consistent and saying “What do you actually want to happen when I add the same user twice?”.

Code slide 11

So let’s look at the code now.
The code is very simple. If the project is at the collaboration limit, raise a limit reached, otherwise just add the user to the project. And looking at this, this is inconsistent. Because what will happen is that if you are not at the collaboration limit this will work fine. Adding the user to the project will be a no op because that’s how many to many relationships work in Django. But if you are at the collaboration limit, even though the operation would have done nothing you still get the limit reached error. And basically we need to take a stance here and say either this should always be an error or this should never be an error because anything else is just silly.
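[Ed: that logic, in sketch form, matching the reconstruction from earlier; not the actual slide code.]

# The inconsistent version of add_user described above (reconstruction).
from django.db import models


class Project(models.Model):
    name = models.CharField(max_length=100)
    collaborator_limit = models.IntegerField()
    collaborators = models.ManyToManyField(User)

    def add_user(self, user):
        # Raises even when the user is already on the project, where the
        # add itself would have been a harmless no-op.
        if self.collaborators.count() >= self.collaborator_limit:
            raise LimitReached()
        self.collaborators.add(user)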

Code slide 12

We arbitrarily pick that this should never be an error. It should behave like a no-op in all circumstances.
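[Ed: so the fix, in sketch form, is just to return early when the user is already on the project.]

# Fixed version of add_user (reconstruction).
from django.db import models


class Project(models.Model):
    name = models.CharField(max_length=100)
    collaborator_limit = models.IntegerField()
    collaborators = models.ManyToManyField(User)

    def team_contains(self, user):
        return self.collaborators.filter(pk=user.pk).exists()

    def add_user(self, user):
        # Adding a user who is already on the project is now always a
        # no-op, whether or not we are at the collaborator limit.
        if self.team_contains(user):
            return
        if self.collaborators.count() >= self.collaborator_limit:
            raise LimitReached()
        self.collaborators.add(user)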

Code slide 13

And we re-run the test and this time it passes.
It passes in a slightly long period of time because it’s running quite a lot of examples. Often what you do is turn the number of examples down in development mode and then run this more seriously in the long term.
And that is pretty much it for Hypothesis.

Obligatory plug

I have an obligatory plug, which is that I do offer training and consulting around this library. You don’t need it to get started, you should try and get started before you pay me, but then you should pay me if you really want to. I am also potentially available for other contracting work if Hypothesis doesn’t sound that exciting to you.

Details

And here are my details at the top. There is the Hypothesis documentation. And there are the links to these slides, available permanently on the internet for you to reread at your leisure.
Thank you very much. Any questions?


Thoughts on Strangeloop and Moldbug

I was doing very well at not engaging with this, and then I got into a Twitter conversation about it last night. This was about as frustrating as you would expect given the limitations of the medium, so now I feel compelled to write out my thoughts in long form.

For those just joining us: Curtis Yarvin, aka Mencius Moldbug, was going to be talking about his software, Urbit, at Strange Loop. Someone made the connection “Hey isn’t this guy that massive racist online?”, this blew up, and now he has been uninvited from the conference. Naturally this has a lot of people very angry about things on both sides.

I have mixed feelings on the subject, mostly due to an inability to hold any stance other than “it’s complicated”. I’m perfectly comfortable with banning him, and I think it was the right call, but I also probably wouldn’t have condemned a decision to not ban him.

Essentially the following are what I consider the two reasonable approaches:

  1. “No part time assholes”. We don’t care if he would obey the code of conduct, we still don’t want him. We are building a community here and we do not want known racists to be a part of that even if they agree to play nice because it will bias strongly in favour of people who can tolerate racists and against people who will never be comfortable in their presence even if they are playing nice.
  2. “The ideas are what are important”. There are plenty of great ideas that came from terrible people. As long as those people agree to obey the code of conduct and we have a reasonable expectation that they will (e.g. they don’t have a history of abusive behaviour at conferences, they’ve not previously claimed they would obey a CoC and then failed to do so, etc), if they have something interesting to say we are prepared to hear it.

(Note that I do not consider the version without an enforced code of conduct a reasonable position. If you can’t guarantee that you will protect the safety of people attending your conference you have no business running a conference).

I have a strong personal preference for the former, as it creates the sort of communities I think we need more of and that I personally want to be a part of, but I think “the ideas are what are important” style conferences are also useful. There are terrible people who have otherwise great ideas that are worth spreading, and a world in which they only get to speak at McRacismConf isn’t actually a better one, because the people who still want to hear those ideas will end up going to McRacismConf to hear them and being exposed to more racism, and the people who don’t want to go to McRacismConf will miss out on some useful ideas.

Edit to add: McRacismConf is indeed a bit of a straw man. The real failure mode here isn’t conferences about racism, it’s unchecked conferences without a code of conduct with a plethora of assholes. The problem is that by insisting that conferences hold to the no part time asshole rule you create an incentive for people to go to conferences which are welcoming to full time assholes.

The problem is that if you have known racists or other bigots speaking, people from marginalized groups will make the entirely reasonable threat assessment that it’s probably not going to be a great environment for them and steer clear. This is bad because excluding marginalized people from all your industry’s conferences is bad, but it’s also bad even if you only care about the ideas.  There are also a lot of people from marginalized groups who have great ideas and you’re going to be missing out on those in the “the ideas are what are important” conferences.

So I think there is need for both approaches and a lack of a one size fits all solution. However, I also think you need a lot more communities which exclude part time assholes (especially given we have so many full time assholes in tech, and such a problem with already excluding marginalized people), and I am glad that The Strange Loop have decided to be one of them.


Large scale utilitarianism and dust motes

Content note: Some dispassionate discussion of torture due to source material. No graphic descriptions. Some discussion of murder, mediated by various classic ethical dilemmas around trolleys.

Epistemic status: I think this is right, but I’m not sure it results in useful conclusions. At any rate, this was interesting for me to think about.

I’d like to talk about a thought experiment which comes from Less Wrong (of which I am not a member, but am an occasionally interested reader). Torture vs Dust Specks. There is also Sublimity vs Youtube, which is intended to be a less polarizing framing. In this post I’m going to abstract away slightly and refer to suffering vs inconvenience.

The experiment is this: Let N be some unimaginably huge number. It’s chosen to be 3^^^3 in the original post, but for our purposes it’s sufficient that N be significantly greater than the number of atoms in the universe. You may choose between two options. In the negative version, one person suffers horribly for an extended period of time, or each of N people experience a tiny momentary inconvenience. In the positive version, one person gets to experience a life of supreme bliss and fulfilment, or each of N people experience about a second of moderate amusement and contentment. Which of these options do you choose?

What this experiment is supposed to do is point out a consequence of additive utilitarianism with real valued scores. Irritation/contentment has a non-zero but small utility (negative in one case, positive in the other), whereas suffering/sublimity has a large utility, but not N times as large. Therefore by “shutting up and multiplying” it’s clearly better to have the large number of small utilities, because they add up to a vastly bigger number. So you should respectively choose individual suffering as the lesser evil and mild contentment as the greater good.
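(To spell out the “shut up and multiply” step in symbols that the original framing doesn’t bother with: write \(\epsilon > 0\) for the disutility of one dust mote and \(S\) for the disutility of the torture. The additive calculation compares \(S\) with \(N\epsilon\), and

\[ N\epsilon > S \iff N > \frac{S}{\epsilon}. \]

Whatever values you assign, \(S/\epsilon\) is at most some merely astronomical number, while \(N\) is vastly larger than that, so the dust motes lose and the torture is the lesser evil. The positive version is the same calculation with the signs flipped.)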

I don’t generally agree with this sort of additive utilitarianism and I’ve previously considered this result… not necessarily wrong, but suspicious. Sufficiently far from the realm of possible experience that you can’t really draw any useful conclusions from it. Still, my moral intuitions are for preferring irritation over suffering, and I don’t really have a strong moral intuition for contentment vs sublimity but lean vaguely in the direction of contentment.

I recently had a mental reframing of the concept that has actually caused me to agree with the utilitarian answer: You should clearly choose contentment and suffering respectively.

The reframing is probably obvious if you’re a decision theorist and believe in things like Von Neumann-Morgenstern utility functions, and if you’re such a person you’ll think I’m just doing a proof from the axioms. I’m not such a person, but in this case I think the formulation is revealing.

The reframing is this: the natural interpretation of this question is in terms of “Would you cause this specific person to suffer to prevent the dustmotepocalypse?”. This is essentially the fat man version of the trolley problem. It personalizes it. The correct formulation, which from a utilitarian point of view is ethically equivalent, is that a randomly chosen individual amongst these N will be caused to suffer.

For me this becomes much simpler to reason about.

First, let’s consider another reformulation: instead of having a guaranteed individual amongst the N who suffers, your choice is that either each individual gets a dust mote or each individual has a probability \(\frac{1}{N}\) of suffering.

These are not exactly equivalent: In this case the number of people suffering follows a Poisson distribution (technically it’s not exactly a Poisson distribution, but it’s close enough that no physically possible experiment can discern them). However I find I am basically indifferent between them. The expected amount of suffering is the same, and the variance isn’t large enough that I think it matters. I’m prepared to say these are as good as morally equivalent (certainly they are in the utilitarian formulation).
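(To make that precise, in notation I’m introducing only for illustration: if each of the \(N\) people independently suffers with probability \(\frac{1}{N}\), the number of sufferers \(K\) is binomial, and for \(N\) this large that is indistinguishable from a Poisson distribution with mean one:

\[ K \sim \mathrm{Binomial}\left(N, \tfrac{1}{N}\right), \qquad \Pr[K = k] \to \frac{e^{-1}}{k!} \text{ as } N \to \infty. \]

So both formulations have exactly one sufferer in expectation, and the chance of more than a handful of people suffering is negligible.)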

And this now has decoupled each of the N people and we can reduce it to a decision about one person.

So, on the individual level, which do you choose? A \(\frac{1}{N}\) chance of suffering or a tiny inconvenience?

I argue that choosing the chance of suffering is basically the only reasonable conclusion and that if you would choose otherwise then you don’t understand how large N is.

N is so large that if I were to give you the option to replace every dust mote equivalent piece of annoyance with a \(\frac{1}{N}\) chance of suffering then your chances of dying of a heart attack just before being struck by an asteroid landing directly on your head are still greater than this chance of suffering ever coming to pass. On any practical level your choice is “Would you rather have this mild inconvenience or not?” If you have ever made a choice for convenience over safety then you cannot legitimately claim that this is not the decision you should make.

So if you gave me the opportunity to intervene in someone’s life and replace any amount of minor inconveniences with this negligible chance of suffering, the moral thing to do is obviously to take it.

And similarly, if I can do this for each of N people, the moral thing to do is still to take it. Even given the statistical knowledge that this will result in a couple of people suffering out of the N, the fact that it is obviously the correct choice for any individual, and that there is no significant interaction between the effects (the chance that anyone you know gets the bad option is still statistically indistinguishable from zero), means that it remains the correct choice in aggregate.

One of the problems with deriving general lessons here is that I don’t think this tracks the sort of decisions of this shape that one actually makes in practice: it’s not usually the case that, when you’re choosing whether k people should suffer to prevent inconvenience to N – k, N is indescribably huge or the k are chosen uniformly at random. It tends to be more that the k people are some specific subgroup, often one who will be picked as convenient to persecute over and over again. Also it turns out that there aren’t more people than atoms in the universe, so in practice the chances are not nearly so minuscule and it’s less likely that every reasonable person should decide the same way. So as usual I think that the elided details of our idealized thought experiment turn out to be the important ones.

Still, it’s interesting that when I worked through the details of the VNM + utilitarian argument I found I agreed with the conclusion. I still don’t regard them as a general source of ethical truth, but you can broadly apply similar reasoning here for a lot of large scale systems design, so it has made me at least more inclined to pay attention to what it has to say on the subject.
