Hypothesis progress is alarming

I had a brilliant idea just now:

08:54 <DRMacIver> shapr: So I’ve been thinking about your point that you have to save pre-shrinking state because you might have found multiple bugs, and I think it’s wrong.
08:54 <DRMacIver> Because I think you’re much more likely to have found those other bugs in the course of shrinking than you are with the original value
08:54 <DRMacIver> So what will happen if you save the pre-shrinking state is that it’ll rerun and go “Yup, that’s fine”
08:54 <DRMacIver> My solution of saving the post-shrinking state is *also* wrong in this case mind you.
08:55 <DRMacIver> I think I have a solution, but it relies pretty heavily on glassbox testing in order to be any good
08:57 <DRMacIver> I have a really frustrating level of future designs for Hypothesis in my head.
08:58 <DRMacIver> (frustrating because I can tell how far I am from implementing it, and every time I get closer I come up with new ones so the goal recedes into the distance)
09:22 <DRMacIver> Ooh. I’ve just had a *great* idea
09:23 <DRMacIver> It not only solves this problem it solves a bunch of historic problems too
09:24 <DRMacIver> Basically 1) save every interesting example in the database. 2) examples loaded from the database which are valid but uninteresting “radioactively decay”. I.e. they get deleted with some probability once they’ve been run

The context: Both Haskell’s QuickCheck and Hypothesis save the last failing example. QuickCheck saves it prior to minimization, Hypothesis saves it post minimization. Because of the issues pointed out in Reducers are Fuzzers, both are the wrong thing to do.

Additionally, Hypothesis has historically had the following problems:

  • What do you do when the database fills up with more examples than you know what to do with in a given run and none of them are interesting?
  • What do you do if the shrinker gets stuck or the process crashes before you’ve saved the example?

This solves both problems: We save every intermediate shrink as we find it and let the garbage collection process deal with the debris that subsequently proves uninteresting.

The above is a clever idea but is not the point of this post: The point of this post is that I have a growing sense that I am missing something.

Basically there is no reasonable way that I should be making as much progress on Hypothesis, Conjecture, etc as I am. This is not the first brilliant idea – I’ve had dozens. I’ve had so many brilliant ideas that I haven’t had time to implement all of them yet.

This is  bad sign.

There are three ways to make an astonishing amount of progress:

  1. Be wrong about the amount of progress you are making
  2. Be an unparalleled genius
  3. Be where the low hanging fruit is

I do not think I am wrong about the amount of progress I’m making. I will grant that some of my ideas turn out to be bad ones, or modifiable to make them good ones, but I’ve had enough practical validation that I’m reasonably sure that a lot of these concepts work and are useful. I’m probably off by a factor of two or three, but I doubt I’m out by an order of magnitude.

I am not an unparalleled genius. I am smart, but I’m one in a thousand, not one in a million. There are plenty of other people smarter than me, many of them working in similar fields.

So the only real option is that I’m hanging out in an orchard full of low hanging fruit, and I don’t really understand how that could be. Right now, life feels a bit like James Mickens describes in The Slow Winter:

I think that it used to be fun to be a hardware architect. Anything that you invented would be amazing, and the laws of physics were actively trying to help you succeed. Your friend would say, “I wish that we could predict branches more accurately,” and you’d think, “maybe we can leverage three bits of state per branch to implement a simple saturating counter,” and you’d laugh and declare that such a stupid scheme would never work, but then you’d test it and it would be 94% accurate, and the branches would wake up the next morning and read their newspapers and the headlines would say OUR WORLD HAS BEEN SET ON FIRE

I keep coming up with genuinely useful ideas with practical significance that work really well, and it all feels too easy.

There are a number of reasonable hypotheses about how this could be the case:

  1. I had one clever idea that unlocked everything else.
  2. Nobody cares about this problem, so they haven’t bothered to pick the fruit.
  3. I am ignoring giant swathes of prior art that I just don’t know exist because I’m an outsider to this field and couldn’t find on a google search because competence or academic firewall of doom.
  4. I am ignoring huge swathes of prior art because nobody has ever bothered to write it down or it’s locked up in proprietary tools.

Obviously the first is the one that I hope it is. To some extent I even have a plausible candidate for it: The idea that having data generators work on an intermediate representation of the final result rather than the final result directly would be useful. But this isn’t actually a very clever idea. It’s proven very fruitful, but it’s sufficiently obvious that it should have occurred to someone else before.

Two seems plausible. e.g. the idea I started out with is only really interesting if you want to integrate this sort of testing with normal testing workflows, which I do and some other people do but not many people who get to do full time research on this do. The conjecture concept only really matters if you’re trying to make these ideas work in imperative languages. etc. I’m in enough intersections that it’s at least plausible that nobody has cared about this intersection before. Additionally, my impression is that random testing isn’t very interesting to academics and most of the people who have put a lot of work into it are security researches, who have rather different focuses than I do.

So those are the optimistic scenarios. Both are a little depressing because it suggests there just hasn’t been enough interest in something that massively improves software quality to do even the level of research I’ve put into it, but at least they suggest I’m not wasting my time.

But I really can’t rule out 3 and 4, both of which are worrying.

The prior art I am mostly aware of and have based my work on is:

And I’m semi aware of but have consciously decided not to use the work on concolic testing, etc.

But I can’t help but feel there’s a lot more out there.

So, um, halp. Any prior art you can send me on the general subject of random testing, etc. would be very appreciated. Papers or working software or random blog posts that someone thought of a thing. It’s all good.

This entry was posted in programming, Python on by .

One thought on “Hypothesis progress is alarming

  1. astro73

    I don’t have a citation for it, but the sdk for the original palm pilot had a monkey testing tool. It would randomly mash on the screen in an attempt to crash your application. I never used it because I didn’t get that far in the project, but I remember it being mentioned in documentation.

Comments are closed.