Hypothesis progress is alarming

I had a brilliant idea just now:

08:54 <DRMacIver> shapr: So I’ve been thinking about your point that you have to save pre-shrinking state because you might have found multiple bugs, and I think it’s wrong.
08:54 <DRMacIver> Because I think you’re much more likely to have found those other bugs in the course of shrinking than you are with the original value
08:54 <DRMacIver> So what will happen if you save the pre-shrinking state is that it’ll rerun and go “Yup, that’s fine”
08:54 <DRMacIver> My solution of saving the post-shrinking state is *also* wrong in this case mind you.
08:55 <DRMacIver> I think I have a solution, but it relies pretty heavily on glassbox testing in order to be any good
08:57 <DRMacIver> I have a really frustrating level of future designs for Hypothesis in my head.
08:58 <DRMacIver> (frustrating because I can tell how far I am from implementing it, and every time I get closer I come up with new ones so the goal recedes into the distance)
09:22 <DRMacIver> Ooh. I’ve just had a *great* idea
09:23 <DRMacIver> It not only solves this problem it solves a bunch of historic problems too
09:24 <DRMacIver> Basically 1) save every interesting example in the database. 2) examples loaded from the database which are valid but uninteresting “radioactively decay”. I.e. they get deleted with some probability once they’ve been run

The context: Both Haskell’s QuickCheck and Hypothesis save the last failing example. QuickCheck saves it prior to minimization, Hypothesis saves it post minimization. Because of the issues pointed out in Reducers are Fuzzers, both are the wrong thing to do.

• What do you do when the database fills up with more examples than you know what to do with in a given run and none of them are interesting?
• What do you do if the shrinker gets stuck or the process crashes before you’ve saved the example?

This solves both problems: We save every intermediate shrink as we find it and let the garbage collection process deal with the debris that subsequently proves uninteresting.

The above is a clever idea but is not the point of this post: The point of this post is that I have a growing sense that I am missing something.

Basically there is no reasonable way that I should be making as much progress on Hypothesis, Conjecture, etc as I am. This is not the first brilliant idea – I’ve had dozens. I’ve had so many brilliant ideas that I haven’t had time to implement all of them yet.

There are three ways to make an astonishing amount of progress:

1. Be wrong about the amount of progress you are making
2. Be an unparalleled genius
3. Be where the low hanging fruit is

I do not think I am wrong about the amount of progress I’m making. I will grant that some of my ideas turn out to be bad ones, or modifiable to make them good ones, but I’ve had enough practical validation that I’m reasonably sure that a lot of these concepts work and are useful. I’m probably off by a factor of two or three, but I doubt I’m out by an order of magnitude.

I am not an unparalleled genius. I am smart, but I’m one in a thousand, not one in a million. There are plenty of other people smarter than me, many of them working in similar fields.

So the only real option is that I’m hanging out in an orchard full of low hanging fruit, and I don’t really understand how that could be. Right now, life feels a bit like James Mickens describes in The Slow Winter:

I think that it used to be fun to be a hardware architect. Anything that you invented would be amazing, and the laws of physics were actively trying to help you succeed. Your friend would say, “I wish that we could predict branches more accurately,” and you’d think, “maybe we can leverage three bits of state per branch to implement a simple saturating counter,” and you’d laugh and declare that such a stupid scheme would never work, but then you’d test it and it would be 94% accurate, and the branches would wake up the next morning and read their newspapers and the headlines would say OUR WORLD HAS BEEN SET ON FIRE

I keep coming up with genuinely useful ideas with practical significance that work really well, and it all feels too easy.

There are a number of reasonable hypotheses about how this could be the case:

1. I had one clever idea that unlocked everything else.
3. I am ignoring giant swathes of prior art that I just don’t know exist because I’m an outsider to this field and couldn’t find on a google search because competence or academic firewall of doom.
4. I am ignoring huge swathes of prior art because nobody has ever bothered to write it down or it’s locked up in proprietary tools.

Obviously the first is the one that I hope it is. To some extent I even have a plausible candidate for it: The idea that having data generators work on an intermediate representation of the final result rather than the final result directly would be useful. But this isn’t actually a very clever idea. It’s proven very fruitful, but it’s sufficiently obvious that it should have occurred to someone else before.

Two seems plausible. e.g. the idea I started out with is only really interesting if you want to integrate this sort of testing with normal testing workflows, which I do and some other people do but not many people who get to do full time research on this do. The conjecture concept only really matters if you’re trying to make these ideas work in imperative languages. etc. I’m in enough intersections that it’s at least plausible that nobody has cared about this intersection before. Additionally, my impression is that random testing isn’t very interesting to academics and most of the people who have put a lot of work into it are security researches, who have rather different focuses than I do.

So those are the optimistic scenarios. Both are a little depressing because it suggests there just hasn’t been enough interest in something that massively improves software quality to do even the level of research I’ve put into it, but at least they suggest I’m not wasting my time.

But I really can’t rule out 3 and 4, both of which are worrying.

The prior art I am mostly aware of and have based my work on is:

And I’m semi aware of but have consciously decided not to use the work on concolic testing, etc.

But I can’t help but feel there’s a lot more out there.

So, um, halp. Any prior art you can send me on the general subject of random testing, etc. would be very appreciated. Papers or working software or random blog posts that someone thought of a thing. It’s all good.

This entry was posted in programming, Python on by .

Superposition values for testing

Cory was talking about using AFL to fuzz test hyper-h2 the other day.

We talked about the difficulty of building a good starter corpus, and I thought I’d pull out some old code I had for using glassbox to do this.

It proceeded to fail utterly.

The problem is that H2 is a complex protocol which is very hard to basically probe through due to the number of different interactions. Simply throwing bytes at it just to see what happens is unlikely to do anything useful. This is similar to why historically that approach worked will with binaryornot and chardet, but was pretty rubbish on pyasn1.

The idea I’d come up with was this: What if we use a custom type to hold off on deciding the values until the last possible minute. That way we can get values that do interesting things to the internals of complex protocols by looking at the questions that the parser asks about the values and deciding what the answer is then rather than immediately.

The way this works is that you have a mock object that internally is saying “I am one of these values but I don’t know which”. Every time you perform a comparison it picks a possible answer at random and uses that to narrow down the list of possible values. At the end, your program should have behaved as if there had just been a really well chosen initial set of integers.

It turns out to work pretty well based on brief experimentation. My initial prototype didn’t support anything more than comparison operators, but after some further pondering I got it to support arbitrary arithmetic expressions. And thus, I present schroedinteger.

>>> x = schroedinteger({1, 3, 7})
>>> y = schroedinteger({-1, 1})
>>> z = x * y
>>> z
indeterminate: {-7, -3, -1, 1, 3, 7}
>>> x == 1
False
>>> x == 3
False
>>> z
indeterminate: {-7, 7}
>>> z == -7
True
>>> z
-7
>>> y
-1

The way this works is to separate out concepts: You have observables which are basically just a list of possible integers that get whittled down over time, and you have schroedintegers, which consist of:

1. A set of observables they are interested in
2. A function which maps an assignment of those observables to integers to a concrete integer

So when you perform arithmetic operations on schroedintegers it just creates a new one that shares the sets of observables of the two sides and evaluates both.

Every time you observe something about the system it looks at the set of possible answers, picks an answer uniformly at random from the results, and then collapses the state of possibilities to only those that would have produced that answer, and then returns it.

Performance is… OK. Where by OK I mean “not very good”. The set of heuristics used keep it manageable, but no better than that.

It could be improved if I really cared, but right now this project is a bit of a toy. In particular most operations are currently O(m * n). Many of these could be fixed to not be quite readily with a little more work – currently the implementation is very generic and many of the operations admit a nicely specific implementation that I haven’t used. e.g. a thing that would be pretty easy to do is to track upper and lower bounds for every schroedinteger and use those to exclude many possibilities.

I also investigated using Z3 to do this. It would be an interesting backend that would remove the most of the limitations. Unfortunately the results were really slow and kinda crashy (I had at least one hard to minimize segfault), so I’ve given up on that for now.

All told, an interesting result for about a day of hacking. Right now I plan to leave it where it is unless I come up with a particularly interesting use case. Let me know if you have one.

This entry was posted in programming, Python on by .

Automated patch minimization for bug origin finding

Ok, so you know git bisect? You take a commit history and try to find the commit that introduced a problem.

Imagine you had the problem that git bisect solves but uh you hadn’t been very good about commit discipline and you had a whole bunch of large messy commits, several consecutive ones not in a working state, and it wasn’t really clear what happened where.

Not that I know anyone in that situation of course. This is a purely hypothetical scenario.

Well it turns out there’s a solution to this too, and it’s possibly better than git bisect.

First you write a script to detect the case you want. You can do this with git bisect too, but people rarely bother because you only need O(log(n)) tests. In this case you’ll need a lot more than that so you should do this. Running a test and grepping its output is usually sufficient here. This script exits 0 if the code exhibits the problem or non-zero otherwise.

These usually end up terrible. It’s OK – it’s genuinely a one off and you’re not going to have to maintain it. Here’s the relevant part of mine:

 export PYTHONPATH=src if python -u -m pytest tests/cover/test_flatmap.py -kmixed_list 2&>1 > test.log ; then echo "Test passed" exit 1 fi   grep -q "Extra items in the left set" test.log grep -q "'0'" test.log

i.e. you run the test, it should fail. Further, it’s output should contain some particular lines. Sometimes you’ll find that your script wasn’t specific enough and you’ll need to add more conditions.

Now you do what you do to start git bisect off: You find a pair of commits, one after the other, where the latter one exhibits the property you want to isolate (e.g. “This particular test fails in this particular way”)

Now, you take the patch between these two commits:

 git diff --no-prefix master andsoitbegins > working.patch

(yes, my branch is called andsoitbegins. I’m not good at branch names. Or possibly I’m excellent at branch names, you decide)

This produces a patch file. We’re now going to perform file based minimization on that patch to try to find the smallest patch that causes the problem.

I still don’t have a good go to minimizer – I tried using delta for this and continued my streak of never getting a good result from doing so, so once again I wrote my own terribly crappy file based minimizer. Here you go:

One of these days I’m going to bite the bullet and just package this all up into a good piece of standalone software. Today might be that day, but it probably isn’t.

What this script does is it repeatedly replaces the contents of a file and reruns a test script to see if that matters. It does this by deleting large chunks of lines then small chunks of lines. If those produced an invalid patch that we can’t apply, it just exits early and doesn’t bother running the test.

Here’s the full test script:

This cleans up the repo, applies the current patch in working.patch and then runs the tests as described above.

In the end our starting 6kloc patch file is turned into the following:

--- src/hypothesis/searchstrategy/strings.py
@@ -204,10 +204,10 @@ class StringStrategy(MappedSearchStrategy):
)

def __repr__(self):
-        return u'StringStrategy()'
+        return 'StringStrategy()'

def pack(self, ls):
-        return u''.join(ls)
+        return ''.join(ls)

I had accidentally made the text() strategy return raw strings instead of unicode. This caused surprisingly few problems, but did break at least one unrelated test.

The first part of the patch is actually not required to reproduce the problem. What it was needed for is to locate the relevant lines – there are multiple pack() methods in this file, so apparently patch needed help to disambiguate.

In conclusion: This worked really well. It took what would have been quite a complicated piece of detective work and automated it. A++ experience, and I’ll definitely add this to my general box of tools.

This entry was posted in programming on by .

On criticizing programming languages (without criticizing their users)

You may have read Aurynn’s piece on contempt culture and programming languages. It’s been doing the rounds. I recommend reading it, and this piece is not an argument with hers. It’s merely a write up of some adjacent thoughts I’ve had on the matter (some of which I’ve discussed with Aurynn on IRC).

To quote from it:

it was du jour in the communities I participated in to be highly critical of other languages. Other languages sucked, the people using them were losers or stupid, if they would just use a real language, such as the one we used, everything would just be better.

Notably there are three (and a half) statements that Aurynn mentions as popular to make:

• Other languages sucked
• the people using them were losers or stupid
• if they would just use a real language, such as the one we used, everything would just be better.

What’s interesting is that these statements absolutely look like one statement in most arguments, but their truth value is different.

Other languages sucked

Yes, this is absolutely true. All programming languages fall into one of two categories: Terrible to use or not yet ready to use. Sure, there are degrees of both, but if you think a language is great, you’re wrong. The best you can hope for is less bad.

Of course, every language is someone else’s other language.

PHP and Java, two examples cited in Aurynn’s post, are shitty languages. PHP more than Java (some of the recent Java stuff actually looks quite nice, and I have more sympathy for what  Java is trying to do than I used to), but both are in many ways really bad.

Which brings us to the next point.

The people using them were losers or stupid

For me, I think this is the heart of the problem with the narrative that Aurynn is criticizing: Languages don’t care if you make fun of them, or call them bad languages. Languages don’t have feelings. People do.

It’s OK to make fun of Java’s baggage of erased types embedded within the language that it carries over from pre 1.4. It’s better to constructively acknowledge that it’s a problem and talk about other ways that problem could have been solved, but ultimately posting ridiculous programs and saying “lol, Java, you so silly” isn’t really hurting anyone.

But saying “Oh god, why are you using Java? It’s the worst and you should feel bad for that decision” is hurting people. And in case this is unclear, hurting people is bad.

If they would just use a real language, such as the one we used, everything would just be better

This point is not even wrong. First, it embeds the half statement I mentioned “such as the one we used”, because aside from the fact that all languages are equally “real”, the one you are using is probably terrible too. But even setting that aside: The chance that someone has the option to switch languages is very low.

I am a reasonably senior developer with a reasonably large amount of negotiating clout. I have only once in my career got to choose what programming language I used during my day job (Scala, for a green field project. It didn’t go very well).

The best you can realistically hope for is right of refusal: You can choose not to take a job because of the programming language involved, or to preferentially seek out jobs which involve a particular programming language, but if the jobs aren’t there, or you can’t get them, you’ve no magic ability to just decide to create one where you get to write in your favourite language.

What this means when you’re saying someone should change to a new language, in the overwhelming majority of cases their options are one of:

1. Quit their job and hope to luck into finding one in a new language that you prefer.
2. Persuade their company that they should switch to a new language.
3. Do a vast amount of free labour and pretend that their day job isn’t their real language.

Hopefully when spelled out that way it’s obvious what a ridiculous set of choices this is.

Making the programming language you write in the sole determiner of your job is an awful idea. I’m not saying the programming language doesn’t matter, but when you compare it to working on good projects with good people for good compensation in a good work environment, it should be obvious that if you’re making it your primary consideration you’re making some really bad life choices.

Persuading a company that they should switch to a new language is basically a non-starter, and rightfully so. You can start new projects in a new language, although this itself may not be a great idea, but when you’ve got an existing project and infrastructure written in one language, switching that over to a new language is going to be a lot of work. Even if it that work pays off in the long term (it probably won’t), in the short term it’s an almost impossible sell.

So really what we’re saying when we want people to use new languages is mostly another variation on the theme of “your worth as a developer is measured by the amount of free labour you do”. This continues to be bullshit.

Why are we doing this anyway?

Ultimately, I think a lot of this problem comes down to identity. As software developers, we have too much of our identity bound up in the tools we use. We’re not software developers – we’re Java developers, or Python developers, or Ruby developers. And of course we use a mac/linux/Windows. Who the hell would choose one of those other options? I’ll tell you who: Emacs users.

But seriously. We get very personal about our choices. This makes it hard to criticize the tools people use, because it so easily turns into criticizing the people who use them. Equally, it makes it feel like when people who choose things differently that us, because that feels like a criticism of our choices and thus criticism of us.

We need to move past this. Our jobs is not to develop software. Our jobs are to develop software to solve problems. The software is a mechanism, not an end goal, and every constraint we put on this limits our ability to solve those problems.

It also limits our ability to improve the tools that we consider so important: Part of why I think it’s important to be able to criticize languages is because I want those languages to improve. We’ve come a long way in the last 50 years of language development, but we still have a long way to go. It would be as much a shame to stop here as it would be to release a language that ignored most of those 50 years of development.

But equally, if it’s important to criticize it’s more important to criticize well. If you can’t criticize well, don’t criticize at all. So I’d like to finish with some suggested guidelines for constructive criticism:

1. Criticize, don’t mock.
2. Criticize things, not people. Ideally don’t even criticize the people responsible for making those things (I’m not good at this one). Even when people are at fault (and as discussed, they usually aren’t), criticizing them just makes it personal and makes it less likely that a productive conversation is going to happen.
3. Be specific. Nothing “sucks” generically, but something about it might be bad. Even when something does suck, this isn’t useful criticism but mere venting. If you’re using that thing regularly then venting is fine, but otherwise it’s an extremely unclassy move.
4. Focus on effects, not aesthetic preferences. “I hate this syntax” is not a useful criticism. “I keep confusing this syntax with this other syntax” is. Although in general people over focus on syntax anyway and “This feature keeps causing this concrete problem” is way better.
5. Things arise through vast amounts of historical context, and come with a vast amount of inertia. Change is hard, and comes with a lot of problems. Most of the time, criticism will never result in fixing the problem. All it can do is try to establish how you can do better next time.

This entry was posted in programming on by .

So, you’re an engineer at FooCorp, or the hot new startup Barrrrr. You use a lot of open source, and you feel like your company isn’t really giving back enough to the community for the benefit it’s receiving. You feel generically guilty about this but aren’t really sure what to do about it, and aren’t really sure what the developers of those open source projects need or want.

As a developer of a reasonably popular open source library, I can tell you what I want from my corporate users:

It’s money. I’d like money.

This isn’t because I’m greedy and want to be swimming in a giant vault full of coins. I mean I’d not say no to the vault of coins, but if that was what I wanted there are a lot better ways to get there. The reason I want money to work on open source is that I want to work on open source, and I require money in order to survive. I think making it your day job is the dream of most of us who work on open source, but very few of us get to do that.

The thing is, I don’t want your money. People occasionally ask me how they can donate, and the answer is that they can’t. It’s just not a viable model. Individuals can’t and shouldn’t really donate enough money to make funding open source work viable. Things like Patreon work great if you’re doing something with sufficient mass appeal that you’ve got thousands of users willing to donate. Very few open source projects are in this category, and even if they were, I wouldn’t really feel any better about companies making profits off their employees’ individual donations than I would about them making profits off my free labour.

Companies could donate. Ned Batchelder wrote about this recently. He’s more optimistic about this than I am. Most companies can’t even be persuaded that they should upstream patches they make to open source, let alone that they should give their hard-earned money away.

So, what I would like instead is for companies to pay open source developers for services: Support contracts, custom development, training, etc. Long-term, this isn’t optimal, but it works now, which as someone who exists now rather than in the future, let me tell you is a big plus.

So what can you,  the aforementioned engineer at FooCorp/Barrrrr do in order to make this possible?

You might think that the answer is “advocate for it internally”. After all, you don’t have control of budget, so the thing to do here is to persuade people who do.

This can work, but it’s quite hit or miss. It can expend a lot of political capital, and you’re often not the best placed to argue for it.

Instead I would like to propose a simpler solution: Make introductions.

As an open source developer, the single most useful thing you can do for me if you do not yourself have control of budget is introduce me to someone who does. Ideally someone reasonably technical – if you’re a small enough company that you have access to the CTO, the CTO is a perfect person to do introductions to. If not, development managers or people who organise training are also great choices.

How do you do this?

Well, it’s quite straightforward:

To the CTO/etc: We’re using some projects that we think we could be using it more effectively, which would save us a lot of time and effort. Would you be up for an introduction to the developers of some of these projects to talk about how they can help us do so?

To the developers: We’re using your project. We’d like to give you money in exchange for your helping us use it more effectively. Is this a thing you would be interested in talking to someone over here about that?

The developers might say no, and that’s OK. Especially people who have full time jobs may not be able to find the time for this. But it can never hurt to ask, and I think most of us will say yes.

The CTO/etc might say no, but they shouldn’t. The maths is very simple: You have a lot of developers. Learning is slow and developers are expensive. If you can save each of those developers 1-2 days of time, that’s N-2N developer days of money saved. Paying an expert for a couple of days of even very well paid work is likely to cost you significantly less than that. Pointing this out to them may be useful.

Pointing out the sort of numbers involved to the open source developers in question may also be useful. It certainly would never have occurred to me prior to getting some experience in this area that offering training days for £2000 would actually be a huge discount (by the way, I’m offering said discount until January 22nd 2016, so now is an especially good time to make those intros).

Note: You don’t have to convince anyone to do work or pay money. All you have to convince them of is the idea that it might be worth doing this, and that it’s worth talking about the possibility.

Sometimes this will work out, sometimes it won’t, but in the cases where it doesn’t there was probably very little you could have done that would have worked, and in the cases where it does work you’ve saved your company a large chunk of money and given back to the project that is helping you, all for really very little effort.

You don’t have to restrict this to companies you work at either! Mentioning this to your friends at other companies can be very helpful too: “Hey, I remember you saying you had $PROBLEM. Well we’re using$PROJECT for something related. Why don’t you check it out? The authors are generally available to help you out if you need some assistance getting started, and for a very reasonable price”.

Ultimately, on our side, this is the “turn your open source software into a consultancy” model, and it doesn’t work for everyone, but I think a lot more of us would be doing it if we thought we could, and for those of us who already are, making those introductions is incredibly useful, easy to do, hugely appreciated.