Superposition values for testing

Cory was talking about using AFL to fuzz test hyper-h2 the other day.

We talked about the difficulty of building a good starter corpus, and I thought I’d pull out some old code I had for using glassbox to do this.

It proceeded to fail utterly.

The problem is that H2 is a complex protocol which is very hard to probe blindly due to the number of different interactions. Simply throwing bytes at it just to see what happens is unlikely to do anything useful. This is similar to why historically that approach worked well with binaryornot and chardet, but was pretty rubbish on pyasn1.

Yesterday evening I thought about this problem a bit more and then started giggling. This is usually a bad sign.

The idea I’d come up with was this: What if we use a custom type to hold off on deciding the values until the last possible minute? That way we can get values that do interesting things to the internals of complex protocols by looking at the questions that the parser asks about the values and deciding what the answer is then rather than immediately.

The way this works is that you have a mock object that internally is saying “I am one of these values but I don’t know which”. Every time you perform a comparison it picks a possible answer at random and uses that to narrow down the list of possible values. At the end, your program should have behaved as if there had just been a really well chosen initial set of integers.
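A minimal sketch of that comparison-only idea, for illustration. The class name `LazyInt`, its `candidates` attribute and the `rng` parameter are all hypothetical and are not the schroedinteger API. Each comparison works out which answers are still possible, picks one at random, and throws away the candidates that disagree with it.

```python
import random


class LazyInt:
    """Hypothetical comparison-only lazy integer (not the real schroedinteger)."""

    def __init__(self, candidates, rng=None):
        self.candidates = set(candidates)
        self.rng = rng if rng is not None else random.Random()

    def _decide(self, predicate):
        # Which answers (True/False) can still happen for the remaining candidates?
        answers = sorted({predicate(v) for v in self.candidates})
        # Pick one of them at random...
        answer = self.rng.choice(answers)
        # ...and keep only the candidates that agree with the chosen answer.
        self.candidates = {v for v in self.candidates if predicate(v) == answer}
        return answer

    def __eq__(self, other):
        return self._decide(lambda v: v == other)

    def __lt__(self, other):
        return self._decide(lambda v: v < other)

    def __gt__(self, other):
        return self._decide(lambda v: v > other)


x = LazyInt({1, 3, 7})
if x < 5:
    # From here on x behaves as if it had always been 1 or 3.
    assert x.candidates <= {1, 3}
else:
    # The only candidate that is not below 5.
    assert x.candidates == {7}
```

Because every answer removes the candidates that contradict it, later comparisons can never disagree with earlier ones.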

It turns out to work pretty well based on brief experimentation. My initial prototype didn’t support anything more than comparison operators, but after some further pondering I got it to support arbitrary arithmetic expressions. And thus, I present schroedinteger.

>>> x = schroedinteger({1, 3, 7})
>>> y = schroedinteger({-1, 1})
>>> z = x * y
>>> z
indeterminate: {-7, -3, -1, 1, 3, 7}
>>> x == 1
False
>>> x == 3
False
>>> z
indeterminate: {-7, 7}
>>> z == -7
True
>>> z
-7
>>> y
-1

The way this works is to separate out concepts: You have observables which are basically just a list of possible integers that get whittled down over time, and you have schroedintegers, which consist of:

A set of observables they are interested in
A function which maps an assignment of integers to those observables to a concrete integer

So when you perform arithmetic operations on schroedintegers it just creates a new one that shares the observables of both sides, and whose function evaluates both sides and combines the results.

Every time you observe something about the system it looks at the set of possible answers, picks one uniformly at random, collapses the state of possibilities to only those that would have produced that answer, and then returns it.
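The observable/function split and the collapse step can be sketched roughly as follows. Everything here is an assumption made for illustration: the names `World`, `SketchInteger`, `integer`, `possible_values` and `equals` are invented, and the sketch stores every surviving joint assignment explicitly (which is exponential in the number of observables) purely so that correlated values stay consistent. It says nothing about how schroedinteger itself stores its state.

```python
import operator
import random


class World:
    """Every still-possible joint assignment of integers to observables."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.assignments = [()]  # each tuple holds one value per observable

    def integer(self, values):
        # Register a new observable (values must be non-empty) and wrap it.
        index = len(self.assignments[0])
        options = sorted(set(values))
        self.assignments = [a + (v,) for a in self.assignments for v in options]
        return SketchInteger(self, {index}, lambda a: a[index])


class SketchInteger:
    """Hypothetical stand-in: a set of observables plus an evaluation function."""

    def __init__(self, world, observables, evaluate):
        self.world = world
        self.observables = observables  # indices of the observables it depends on
        self.evaluate = evaluate        # joint assignment -> concrete integer

    def _combine(self, other, op):
        # The result depends on both sides' observables and evaluates both sides.
        return SketchInteger(
            self.world,
            self.observables | other.observables,
            lambda a: op(self.evaluate(a), other.evaluate(a)),
        )

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __mul__(self, other):
        return self._combine(other, operator.mul)

    def possible_values(self):
        return {self.evaluate(a) for a in self.world.assignments}

    def equals(self, n):
        """Observe whether this value equals n and collapse the world to match."""
        world = self.world
        answers = sorted({self.evaluate(a) == n for a in world.assignments})
        answer = world.rng.choice(answers)  # uniform over the possible answers
        world.assignments = [
            a for a in world.assignments if (self.evaluate(a) == n) == answer
        ]
        return answer


world = World()
x = world.integer({1, 3, 7})
y = world.integer({-1, 1})
z = x * y
print(sorted(z.possible_values()))  # [-7, -3, -1, 1, 3, 7]
if z.equals(-7):
    # Observing z also pins down x and y.
    assert x.possible_values() == {7} and y.possible_values() == {-1}
```

Here "uniformly at random" is read as uniform over the distinct answers that are still possible; weighting by how many assignments produce each answer would be another reasonable reading.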

Performance is… OK. Where by OK I mean “not very good”. The set of heuristics used keeps it manageable, but no better than that.

It could be improved if I really cared, but right now this project is a bit of a toy. In particular most operations are currently O(m * n). Many of these could quite readily be fixed with a little more work – currently the implementation is very generic and many of the operations admit a nicely specific implementation that I haven’t used. For example, one thing that would be pretty easy to do is to track upper and lower bounds for every schroedinteger and use those to exclude many possibilities.
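As a rough illustration of the bounds idea, here is a small interval-arithmetic sketch. The `Bounds` class and its method names are hypothetical and are not part of the project; the point is only that a cheap range check can answer some questions without enumerating any combinations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Hypothetical inclusive lower/upper bounds for an undecided integer."""

    lo: int
    hi: int

    @classmethod
    def of(cls, values):
        values = list(values)
        return cls(min(values), max(values))

    def __add__(self, other):
        return Bounds(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The extremes of a product of two intervals lie at the corner products.
        corners = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Bounds(min(corners), max(corners))

    def can_equal(self, n):
        # False means "definitely not equal"; True only means "maybe".
        return self.lo <= n <= self.hi

    def always_less_than(self, n):
        return self.hi < n


x = Bounds.of({1, 3, 7})
y = Bounds.of({-1, 1})
z = x * y
print(z)                        # Bounds(lo=-7, hi=7)
print(z.can_equal(100))         # False
print(z.always_less_than(8))    # True
```

Bounds are conservative: `z.can_equal(0)` is true here even though no combination of the values above multiplies to 0, so a "maybe" still needs the full check.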

I also investigated using Z3 to do this. It would be an interesting backend that would remove most of the limitations. Unfortunately the results were really slow and kinda crashy (I had at least one hard to minimize segfault), so I’ve given up on that for now.

All told, an interesting result for about a day of hacking. Right now I plan to leave it where it is unless I come up with a particularly interesting use case. Let me know if you have one.