Using Hypothesis with Factory Boy

I gave a talk on the Hypothesis Django Integration last night (video and transcript here). I got some questions asking about integration with Factory Boy.

My answer at the time was that I’ve thought about adding explicit support but there’s nothing to stop you from doing it yourself. I’d like to amend that: There’s nothing to stop you from doing it yourself and it’s so easy to do that I can’t actually imagine how I would improve it with explicit support.

Both Factory Boy and Hypothesis are designed with a “we’re a library, not a framework” approach (the Hypothesis Django integration goes a little further in the direction of a framework than I’d like by requiring a custom test runner, but fortunately Factory Boy does not), so they don’t interfere with each other. Further, Factory Boy is set up to take arbitrary values and Hypothesis is set up to provide them, so you can easily feed the latter into the former.

For example, the following defines a strategy that uses a factory boy UserFactory object to parametrize over unsaved user objects with an arbitrary first name:

from hypothesis import given
from hypothesis.strategies import builds, text
from hypothesis.extra.django import TestCase
from myfactories import UserFactory
 
class TestUser(TestCase):
    @given(builds(UserFactory.build, first_name=text(max_size=50)))
    def test_can_save_a_user(self, user):
        user.save()

Both factory boy and Hypothesis are designed to play well with others, so unless I’m missing something, nothing specific seems necessary to make them play well with each other. This is how it should work.

The only thing I can imagine people conceivably wanting custom support for is automatically deriving strategies for Factory Boy factories whose fields are filled with random values from fake-factory. It wouldn’t be too hard to do, but I’m not sure it’s worth it. Honestly, if you’re doing randomized testing like that, you should be using Hypothesis and its existing fake-factory integration to feed your factories instead. It will be a much better experience.

This entry was posted in Hypothesis, Python.

Thoughts on Strangeloop and Moldbug

I was doing very well at not engaging with this, and then I got into a Twitter conversation about it last night. This was about as frustrating as you would expect given the limitations of the medium, so now I feel compelled to write out my thoughts in long form.

For those just joining us: Curtis Yarvin, aka Mencius Moldbug, was going to be talking about his software, Urbit, at Strange Loop. Someone made the connection “Hey isn’t this guy that massive racist online?”, this blew up, and now he has been uninvited from the conference. Naturally this has a lot of people very angry about things on both sides.

I have mixed feelings on the subject, mostly due to an inability to hold any stance other than “it’s complicated”. I’m perfectly comfortable with banning him, and I think it was the right call, but I also probably wouldn’t have condemned a decision to not ban him.

Essentially the following are what I consider the two reasonable approaches:

  1. “No part time assholes”. We don’t care if he would obey the code of conduct, we still don’t want him. We are building a community here and we do not want known racists to be a part of it, even if they agree to play nice, because having them there biases the community strongly in favour of people who can tolerate racists and against people who will never be comfortable in their presence, however nicely they behave.
  2. “The ideas are what are important”. There are plenty of great ideas that came from terrible people. As long as those people agree to obey the code of conduct and we have a reasonable expectation that they will (e.g. they don’t have a history of abusive behaviour at conferences, they’ve not previously claimed they would obey a CoC and then failed to do so, etc), if they have something interesting to say we are prepared to hear it.

(Note that I do not consider the version without an enforced code of conduct a reasonable position. If you can’t guarantee that you will protect the safety of people attending your conference, you have no business running a conference.)

I have a strong personal preference for the former, as it creates the sort of communities I think we need more of and that I personally want to be a part of, but I think “the ideas are what are important” style conferences are also useful. There are terrible people who have otherwise great ideas that are worth spreading, and a world in which they only get to speak at McRacismConf isn’t actually a better one, because the people who still want to hear those ideas will end up going to McRacismConf to hear them and being exposed to more racism, and the people who don’t want to go to McRacismConf will miss out on some useful ideas.

Edit to add: McRacismConf is indeed a bit of a straw man. The real failure mode here isn’t conferences about racism, it’s unchecked conferences without a code of conduct with a plethora of assholes. The problem is that by insisting that conferences hold to the no part time asshole rule you create an incentive for people to go to conferences which are welcoming to full time assholes.

The problem is that if you have known racists or other bigots speaking, people from marginalized groups will make the entirely reasonable threat assessment that it’s probably not going to be a great environment for them and steer clear. This is bad because excluding marginalized people from all your industry’s conferences is bad, but it’s also bad even if you only care about the ideas. There are also a lot of people from marginalized groups who have great ideas, and you’re going to be missing out on those in the “the ideas are what are important” conferences.

So I think there is a need for both approaches, and there is no one-size-fits-all solution. However, I also think you need a lot more communities which exclude part time assholes (especially given we have so many full time assholes in tech, and such a problem with already excluding marginalized people), and I am glad that The Strange Loop have decided to be one of them.

This entry was posted in Uncategorized.

Large scale utilitarianism and dust motes

Content note: Some dispassionate discussion of torture due to source material. No graphic descriptions. Some discussion of murder, mediated by various classic ethical dilemmas around trolleys.

Epistemic status: I think this is right, but I’m not sure it results in useful conclusions. At any rate, this was interesting for me to think about.

I’d like to talk about a thought experiment which comes from Less Wrong (of which I am not a member, but am an occasionally interested reader). Torture vs Dust Specks. There is also Sublimity vs Youtube, which is intended to be a less polarizing framing. In this post I’m going to abstract away slightly and refer to suffering vs inconvenience.

The experiment is this: Let N be some unimaginably huge number. It’s chosen to be 3^^^3 in the original post, but for our purposes it’s sufficient that N be significantly greater than the number of atoms in the universe. You may choose between two options. In the negative version, one person suffers horribly for an extended period of time, or each of N people experience a tiny momentary inconvenience. In the positive version, one person gets to experience a life of supreme bliss and fulfilment, or each of N people experience about a second of moderate amusement and contentment. Which of these options do you choose?

What this experiment is supposed to do is point out a consequence of additive utilitarianism with real-valued scores. Irritation/contentment has a small non-zero utility (negative in one case, positive in the other), whereas suffering/sublimity has a large non-zero utility, but not N times as large. Therefore, by “shutting up and multiplying”, the N small utilities add up to something vastly bigger in magnitude than the single large one, so the N-person side dominates in both versions: you should choose individual suffering as the lesser evil and the N moments of mild contentment as the greater good.
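Spelled out for the negative version, as a sketch: here ε (the disutility of one momentary inconvenience) and S (the disutility of one person’s extended suffering) are labels introduced for illustration, not notation from the original thought experiment.

```latex
\[
\underbrace{N\varepsilon}_{\text{everyone's inconvenience}}
\;>\;
\underbrace{S}_{\text{one person's suffering}}
\quad\Longleftrightarrow\quad
N > \frac{S}{\varepsilon},
\qquad \text{where } 0 < \varepsilon \ll S < \infty .
\]
```

The positive version has the same shape with a small δ per moment of contentment and a large but finite B for the life of bliss: N δ > B exactly when N > B/δ. Since N is chosen to be unimaginably large, both inequalities hold, which is why the multiplication favours the N-person side in both cases.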

I don’t generally agree with this sort of additive utilitarianism and I’ve previously considered this result… not necessarily wrong, but suspicious. Sufficiently far from the realm of possible experience that you can’t really draw any useful conclusions from it. Still, my moral intuitions are for preferring irritation over suffering, and I don’t really have a strong moral intuition for contentment vs sublimity but lean vaguely in the direction of contentment.

I recently had a mental reframing of the concept that has actually caused me to agree with the utilitarian answer: You should clearly choose contentment and suffering respectively.

The reframing is probably obvious if you’re a decision theorist and believe in things like Von Neumann-Morgenstern utility functions, and if you’re such a person you’ll think I’m just doing a proof from the axioms. I’m not such a person, but in this case I think the formulation is revealing.

The reframing is this: The natural interpretation of this question is in terms of “Would you cause this specific person to suffer to prevent the dustmotepocalypse?”. This is essentially the fat man version of the trolley problem. It personalizes it. The correct formulation, which from a utilitarian point of view is ethically equivalent, is that a randomly chosen individual amongst these N will be caused to suffer.

For me this becomes much simpler to reason about.

First, let’s consider another reformulation: instead of having a guaranteed individual amongst the N who suffers, your choice is that either each individual gets a dust mote or each individual independently has a probability \(\frac{1}{N}\) of suffering.

These are not exactly equivalent: in this case the number of people suffering follows a binomial distribution with N trials and probability \(\frac{1}{N}\), which is so close to a Poisson distribution with mean 1 that no physically possible experiment could tell them apart. However, I find I am basically indifferent between them. The expected amount of suffering is the same (one person’s worth, since \(N \cdot \frac{1}{N} = 1\)), and the variance isn’t large enough that I think it matters. I’m prepared to say these are as good as morally equivalent (certainly they are in the utilitarian formulation).
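To make the Poisson claim concrete, here is a minimal Python sketch (an illustration, not part of the original argument) comparing the exact binomial probabilities with a Poisson distribution of mean 1. It assumes N = 10**80 as a stand-in for “more people than atoms in the universe”; the function names are made up for this example.

```python
import math


def binomial_pmf(k, n):
    """P(exactly k of n people suffer) when each independently has chance 1/n.

    Written to stay accurate when n is astronomically large: as a float,
    1 - 1/n would round to exactly 1.0, so we avoid computing it directly.
    """
    # C(n, k) * (1/n)**k, using exact integer arithmetic before dividing.
    falling = math.comb(n, k) / n**k
    # (1 - 1/n) ** (n - k), computed via log1p to keep the tiny 1/n term.
    survive = math.exp((n - k) * math.log1p(-1 / n))
    return falling * survive


def poisson_pmf(k, mean=1.0):
    """P(exactly k events) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


N = 10**80  # hypothetical stand-in for "more people than atoms in the universe"
for k in range(5):
    print(k, binomial_pmf(k, N), poisson_pmf(k))
```

The two columns should agree to within floating-point rounding. The exact integer division and log1p are the design choice that matters here: with naive float arithmetic, (1 - 1/N) evaluates to 1.0 and the binomial side collapses to nonsense.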

This decouples each of the N people, so we can reduce it to a decision about one person.

So, on the individual level, which do you choose? A \(\frac{1}{N}\) chance of suffering or a tiny inconvenience?

I argue that choosing the chance of suffering is basically the only reasonable conclusion and that if you would choose otherwise then you don’t understand how large N is.

N is so large that if I were to give you the option to replace every dust mote equivalent piece of annoyance with a \(\frac{1}{N}\) chance of suffering then your chances of dying of a heart attack just before being struck by an asteroid landing directly on your head are still greater than this chance of suffering ever coming to pass. On any practical level your choice is “Would you rather have this mild inconvenience or not?” If you have ever made a choice for convenience over safety then you cannot legitimately claim that this is not the decision you should make.
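The same kind of arithmetic shows how small the risk stays even after many such replacements. For M independent gambles, the chance of at least one bad outcome is 1 − (1 − 1/N)^M, which is at most M/N. A minimal sketch, where the lifetime count of a trillion inconveniences and N = 10**80 are hypothetical figures picked only to show scale, and the function name is made up for illustration:

```python
import math


def chance_of_any_suffering(replacements, n):
    """Chance that at least one of `replacements` independent 1/n gambles
    goes badly, i.e. 1 - (1 - 1/n) ** replacements (never more than
    replacements / n).

    expm1 and log1p keep this accurate when 1/n is far below float precision.
    """
    return -math.expm1(replacements * math.log1p(-1 / n))


# Hypothetical figures, chosen only to illustrate scale.
lifetime_inconveniences = 10**12
N = 10**80
print(chance_of_any_suffering(lifetime_inconveniences, N))  # roughly 1e-68
```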

So if you gave me the opportunity to intervene in someone’s life and replace any amount of minor inconveniences with this negligible chance of suffering, the moral thing to do is obviously to take it.

And similarly, if I can do this for each of N people, the moral thing to do is still to take it. That holds even given the statistical knowledge that this will result in a couple of people suffering out of the N, because it is obviously the correct choice for any individual and there is no significant interaction between the effects (the chance that anyone you know gets the bad option is still statistically indistinguishable from zero).

One of the problems with deriving general lessons here is that I don’t think this tracks the decisions of this shape that one actually makes in practice: when you’re choosing whether k people should suffer to prevent inconvenience to the other N − k, it’s not usually the case that N is indescribably huge or that the k are chosen uniformly at random. It tends to be more that the k people are some specific subgroup, often one that will be picked as convenient to persecute over and over again. Also, it turns out that there aren’t more people than atoms in the universe, so in practice the chances are not nearly so minuscule and it’s less likely that every reasonable person should decide the same way. So as usual I think that the elided details of our idealized thought experiment turn out to be the important ones.

Still, it’s interesting that when I worked through the details of the VNM + utilitarian argument I found I agreed with the conclusion. I still don’t regard them as a general source of ethical truth, but you can broadly apply similar reasoning here for a lot of large scale systems design, so it has made me at least more inclined to pay attention to what it has to say on the subject.

This entry was posted in Uncategorized.

Using tmux to test your console applications

Edit to add: The ideas in this post are now implemented in a library. Feel free to read this post for understanding the concepts, but you’re probably better off just using that.

If you’ve noticed that there haven’t been 5000 words of blogging and seven Hypothesis releases in the last week and wondered whether I was dead, fear not. I was just on a fort.

The theme of this fort was that we were going to work on a C project. People made the mistake of listening to me, one thing led to another, and the result is Conch, a security-free curses interface to our terrible Twitter clone, Bugle. Here’s a sneak peek:

[Screenshot: Conch]

This post is mostly not about Conch, except that working on it was what led to this concept.

We used Check, a unit testing framework for C, for testing a lot of the implementation, which was fine if a little laborious. However, testing the front end (oh god, why do we have front end code written in C?) proved challenging. ncurses was not our friend here.

I tried using pexpect for this, which is like expect but written in Python instead of Tcl, but ran into a bunch of problems. It has an ANSI virtual terminal, but for whatever reason (this might have been my fault) the terminal got very confused, drawing things only partially and leaving the screen in the wrong state.

So I put my thinking cap on, read some man pages, and applied some twisted imagination to the problem and came up with a solution that works great, albeit at some small cost to my dignity and sense of taste.

The solution is this: What I need is a robust virtual terminal I can control and inspect the state of.

Fortunately I have one. It’s called tmux.

Tmux is pretty great. It has a whole bunch of functionality, is a rock solid virtual terminal that is widely used with a wide variety of programs, and you can control it all externally via the command line. Putting these together gave me a set of primitive operations out of which I could basically build Selenium-style testing for the console.

I’m still figuring out the details. When I do I’ll probably turn this into a library for testing console applications rather than the current ad-hoc thing I have, but basically there’s a relatively small set of operations you can build this testing out of:

  1. First we allocate a unique ID for our test. This should be long and random to avoid conflicting with existing sessions. Henceforth it will be called $ID.
  2. “tmux -L $ID new-session -d <your shell command>” will start your program running in a fresh tmux session under your ID. The -d is necessary because you will not be starting this from a controlling terminal in your program, so you want it to start detached. If you want, you can specify width and height with -x and -y respectively.
  3. At the end, “tmux -L $ID kill-server” will shut down all sessions in your test tmux server, including the child processes.
  4. In order to capture the current contents of your tmux session you can run: “tmux -L $ID capture-pane; tmux -L $ID show-buffer; tmux -L $ID delete-buffer”. This will save a “screenshot” of the currently active pane (of which there is only one if you’ve just used these commands) to a new paste buffer, print the paste buffer to stdout, then delete the buffer.
  5. In order to send key presses to the running program you can use “tmux -L $ID send-keys <some-key>”. These can either be ASCII characters or a variety of named keys, e.g. PageUp, PageDown and Enter do what you expect. Adding C- as a modifier will hold down control, so e.g. C-c and C-d would be Control-c and Control-d with their usual interpretations (send an interrupt to the running program, send EOF).
  6. In order to send non-ASCII or larger text you can do “tmux -L $ID set-buffer <my text>; tmux -L $ID paste-buffer; tmux -L $ID delete-buffer”, which will set a paste buffer, paste it to the active pane, and then delete the buffer.

(Some of the above is not actually what I did, because I figured out some better ways using commands I’d previously missed while writing this post).
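The six operations can be wrapped in a small Python helper. This is a minimal sketch, not the library mentioned at the top of this post: the class name TmuxSession, its method names and the 80x24 default size are illustrative assumptions, and it shells out to the tmux binary via subprocess, so tmux must be installed to actually run a program with it.

```python
import secrets
import subprocess


class TmuxSession:
    """Drive a console program inside a private tmux server (a sketch).

    Every command is sent with -L <socket name>, so each test gets its own
    tmux server and cannot interfere with the user's own sessions.
    """

    def __init__(self, command, width=80, height=24, socket_name=None):
        # Step 1: a long random ID makes clashes with existing servers unlikely.
        self.socket_name = socket_name or "test-" + secrets.token_hex(16)
        self.command = command
        self.width = width
        self.height = height

    def _argv(self, *args):
        # Prefix every tmux subcommand with our private server's socket name.
        return ["tmux", "-L", self.socket_name, *args]

    def _run(self, *args):
        # Run one tmux subcommand, raising if it fails, and return its stdout.
        result = subprocess.run(
            self._argv(*args), check=True, capture_output=True, text=True
        )
        return result.stdout

    def start(self):
        # Step 2: start detached (-d) with an explicit size (-x/-y).
        self._run(
            "new-session", "-d",
            "-x", str(self.width), "-y", str(self.height),
            self.command,
        )

    def stop(self):
        # Step 3: kill the test server, taking its child processes with it.
        self._run("kill-server")

    def screenshot(self):
        # Step 4: copy the active pane into a paste buffer, print it, delete it.
        self._run("capture-pane")
        contents = self._run("show-buffer")
        self._run("delete-buffer")
        return contents

    def send_key(self, key):
        # Step 5: a literal character or a key name such as "Enter" or "C-c".
        self._run("send-keys", key)

    def paste(self, text):
        # Step 6: load text into a buffer, paste it into the pane, delete it.
        self._run("set-buffer", text)
        self._run("paste-buffer")
        self._run("delete-buffer")

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc_info):
        self.stop()
```

In a test you would probably use it as a context manager, e.g. with TmuxSession("./conch") as session: (where ./conch is a hypothetical path), then call send_key or paste and assert on the text returned by screenshot. Since the program redraws asynchronously, a real harness would likely need to poll screenshot until the expected text appears rather than checking once. tmux’s capture-pane also accepts -p to print the pane straight to stdout, which would replace the three-command buffer dance in step 4.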

The main things that are hard to do with this, and the reason this is for now a blog post rather than a piece of open source software, are getting the PID and exit code of the program you’ve started and resizing the window. I know how to do both of those (running a manager process and starting inside a pane respectively), but it’s fiddly and I haven’t got the details right. When I do, expect all of the above to be baked into a library.
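For the PID half, one alternative to the manager-process approach (a different technique, named plainly here rather than the one described above) is to ask tmux directly through its pane_pid format variable. A hedged sketch: the function names are made up, the PID reported may belong to the shell tmux used to launch your command rather than the program itself, and this does nothing for the exit code.

```python
import subprocess


def pane_pid_argv(socket_name):
    # display-message -p prints the expanded format string to stdout;
    # #{pane_pid} is tmux's format variable for the pane process's PID.
    return ["tmux", "-L", socket_name, "display-message", "-p", "#{pane_pid}"]


def pane_pid(socket_name):
    """Return the PID tmux reports for the active pane of our test server."""
    result = subprocess.run(
        pane_pid_argv(socket_name), check=True, capture_output=True, text=True
    )
    return int(result.stdout.strip())
```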

This entry was posted in Code, Python.

The era of rapid Hypothesis development is coming to an end

Don’t panic. This is not an announcement of my abandoning the project. Hypothesis still has a long way to go, and I 100% intend to be working on getting it there.

What this is an announcement of is my continued existence in a market economy and my tragic need to acquire currency in order to convert it into food and accommodation.

I haven’t been making a big deal of it, so some of you might be surprised to learn that the reason Hypothesis development has been so rapid for the last 6 months is that I’ve been working on it full time unpaid. It’s not so much that I took time off to write Hypothesis as that I had the time off anyway and I thought I’d do something useful with it. Hypothesis is that something useful.

I would love to continue working on Hypothesis full time. But the whole “unpaid” thing is starting to become not viable, and will become critically non-viable as soon as I move back to London.

So I’m going to need money.

I will do something more organised in the next month, but for now if you are a company or individual interested in paying me to do any of the following, I would very much like to hear from you:

  • Sponsored Hypothesis development (this can include paying for implementing specific features if you want)
  • Integration work getting Hypothesis to work well with your testing environment
  • Training courses on how to use Hypothesis
  • Anything else Hypothesis related

If the above sounds interesting, please email me at [email protected].

If no money to continue working on Hypothesis is forthcoming, Hypothesis development will absolutely continue, but at a greatly reduced rate. The current development cycle is approximately a minor version a week. This will likely go down to at most a minor version every month, more likely a minor version every two months. This would be a shame, as I have a bunch of exciting features I still want to work on, and then I need to tie everything together into a coherent 2.0 release. With full-time work I would project that happening by the end of this year; without it, I can’t really make any predictions at the moment.

This entry was posted in Hypothesis, Python.