Category Archives: programming

A survey of Quickchecks

It’s commonly claimed that every language has a Quickcheck port. This is one of those statements that is simultaneously true and really unhelpful. Every language has a Quickcheck port that you really don’t want to use. Some languages have Quickcheck ports that you do.

I thought I’d do a quick survey of the current state of play for different languages.

First, a disclaimer: I do not write all of these languages, and many of them I barely read. As such, some of this list is based on an outsider making conjectures, and if you do write the language and have better information than I do, I would very much appreciate correction.

First, let’s start with the “just use this one” table. These are languages for which there is more or less unambiguously a single Quickcheck you should be using.

Minimum criteria for being on this list:

  1. Must support random generation of data for a test function. Note that I consider Smallcheck its own thing, so Smallcheck ports don’t count for inclusion unless they also support random generation.
  2. It must be fairly straightforward to generate your own custom types.
  3. It must support shrinking.
  4. It must either be under active development or have a reasonable claim to being finished.
  5. Must be under an OSI approved license (note: I haven’t checked this one too carefully, but these all claim to be open source and I’ve checked the license in detail on about half of them).
  6. There must not be another implementation for the language that also satisfies the above five conditions and gets a comparable amount of active use.

Note that the presence of an item on this list does not mean that I have used it, only that I have inspected it to see what it supports. In particular I can’t comment on the data quality or how buggy or stable each of these is, though most of them appear to be pretty good.

Language   | Library       | Notes
C          | theft         | Provides nothing in the way of built-in generators; you’ve always got to write your own from scratch. However, the API for doing so is pretty easy.
C++        | CppQuickCheck |
Clojure    | test.check    |
Coq        | QuickChick    |
F#         | FsCheck       |
Haskell    | Quickcheck    | The original. Probably no longer the best.
JavaScript | jsverify      |
PHP        | Eris          |
Python     | Hypothesis    | You knew I was going to say this, so don’t even pretend to be surprised.
Ruby       | Rantly        | I really wanted to disqualify this one because the way you support shrinking is by monkey patching a ‘shrink’ method onto the type. I couldn’t quite justify doing so, because monkey patching common and easily clashing words onto types is the standard Ruby idiom. I am unclear on how active development on Rantly is.
Rust       | Quickcheck    |
Scala      | ScalaCheck    |
Swift      | Swiftcheck    |

Other Languages

Erlang

You may have been surprised that Erlang is not in the table despite being one of the most well known languages for Quickcheck. The reason is that there isn’t an unambiguous choice there. There is eqc, which is unambiguously the best but is not open source, and there’s Triq and PropEr, both of which are under active development. Apparently people use PropEr unless they have to avoid the GPL, in which case they use Triq.

Java

As far as I can tell, people who want a good Quickcheck for Java write their tests in Clojure or Scala. There is junit-quickcheck, which is disqualified on the grounds of not implementing shrinking. functionaljava does have a Quickcheck implementation that supports shrinking, but it is unclear whether people actually know it exists and are using it. It is possible it should be in the above table, but I’m tentatively considering it to fail condition 6 because test.check and ScalaCheck just seem to completely dominate it. I am very open to arguments that this is not the case.

Similarly, other .NET languages seem to adopt the policy that you should just use FsCheck to test your code rather than trying to port it to the language in question.

Go

Go has a deeply inadequate version of Quickcheck baked into its standard library: it doesn’t support shrinking. This is quite surprising, as Quickcheck is from the 90s rather than the 70s, but they make up for it by not acknowledging where the concept is actually from. There do not appear to be any alternatives.

OCaml

OCaml seems to be suffering from a problem of being close enough to Haskell that people try to do a straight port of Quickcheck but far enough from Haskell that this doesn’t work. The result is that there is a “mechanical port” of Quickcheck which is completely abandoned and a fork of it that uses more idiomatic OCaml. I am insufficiently familiar with OCaml or its community to know if either is used or whether there is another one that is.

Others

There are of course many other languages than this, and many of them have something that looks a bit like Quickcheck if you squint hard enough. The Quickcheck page on Wikipedia lists a bunch of others, but a not entirely thorough sampling of the remainder suggests most of them fail the test of “does it implement shrinking” and the test of “is it actively developed or finished”, and are not widely used enough to be worth an honourable mention.

If you know of a reasonably good Quickcheck that I have not included here, please do let me know.

This entry was posted in Hypothesis, programming.

What if we had more finished libraries?

Mark Jason Dominus has a piece from a while ago in which he says the following:

I released the Text::Template module several years ago, and it was immediately very successful. It’s small, simple, fast, and it does a lot of things well. At the time, there were not yet 29 templating systems available on CPAN.

Anyway, the module quickly stabilized. I would get bug reports, and they would turn out to be bugs in the module’s users, not in the module; I would get feature requests, and usually it turned out that what the requester wanted was possible, or even easy, without any changes to the module. Since the module was perfect, there was no need to upload new versions of it to CPAN.

But then I started to get disturbing emails. “Hi, I notice you have not updated Text::Template for nine months. Are you still maintaining it?” “Hi, I notice you seem to have stopped work on Text::Template. Have you decided to abandon this approach?” “Hi, I was thinking of using Text::Template, but I saw it wasn’t being maintained any more, so I decided to use Junk::CrappyTemplate, because I needed to be sure of getting support for it.”

I started wondering if maybe the thing to do was to release a new version of Text::Template every month, with no changes, but with an incremented version number. Of course, that’s ridiculous. But it seems that people assume that if you don’t update the module every month, it must be dead. People seem to think that all software requires new features or frequent bug fixes. Apparently, the idea of software that doesn’t get updated because it’s finished is inconceivable.

I blame Microsoft.

In my post about really boring programming language fantasies, I expressed a desire for more things in the standard library. A lot of people objected that the standard library is where software goes to die.

And… this is I guess somewhat true, but really the standard library is where software goes when it’s stable, and programmers seem to have lost their taste for stable software. After all, you’re not really living unless your dependencies are constantly shifting under you in backwards incompatible ways, right?

But… I share the fear. Because basically I don’t believe you can finish software. I mean sure it’s possible that MJD wrote bug free software. For such a constrained problem I’m definitely not ruling it out (but, on the other hand, text is terrible, and I’ve certainly written incredibly buggy solutions to highly constrained problems). But also he was writing in Perl.

And I don’t mean that pejoratively. As Avdi put it in another piece, we’re still catching up to Perl. Perl is amazing for backwards compatibility of language releases. In comparison, Python is terrible. For example, Hypothesis doesn’t currently run on Python 3.6 because they decided to remove a function that I depended on from the standard library because it’s been deprecated in favour of a function that doesn’t exist on Python 2.7. Thanks. (I haven’t fixed this yet not because I’m not going to, but because Python 3.5 isn’t even out yet so this is not high on my priority list).

And from the outside it’s often very hard to tell the difference between a finished and an abandoned library, but the balance of probability tends to be towards the latter. There are a lot of abandoned libraries and very few that haven’t had any bugs in them that needed fixing in the last four years.

Part of what motivates this is I’ve run into a bunch of libraries that from an API point of view “should” be finished: They have a very small surface area, are widely used, and haven’t been updated in a while. Unfortunately they’re instead either abandoned or struggling to find enough time to actually put out a new released version. For example bitarray is a very nice library that sadly you should never use because it’s abandoned and has some known and serious bugs that have not been fixed and probably never will be. Naturally it’s also widely deployed.

This is frustrating. The end of life for software should be stability, not bitrot, and I’d like a way to fix that. Here’s the way I’m currently speculating about:

It sure would be nice if there were an organisation which basically existed to take on maintainership of finished libraries. A library maintained by that organisation basically gives you the guarantee that bugs will be fixed, it will be ported to new versions of the language, the documentation will be improved, etc. but no new features will be added and the existing API will remain backwards compatible for basically all time. Any incompatible changes to the library are purely functional: They create a new library, not mutate the old one.

Sketch rules for how it could work (I’m imagining this being for Python, but it’s pretty portable to other languages):

  1. There is a corresponding Github organisation containing all libraries the organisation maintains.
  2. There is a shared email address which has ownership of the libraries on pypi (or whatever your package repository is).
  3. Anyone can and should feel free to work on any given project. Members of the organisation are actively encouraged to do so. A pull request requires sign-off from at least one, ideally two, members of the organisation.
  4. Any library maintainer can ask for the organisation to take ownership of their library. The organisation puts it to an internal vote, and if the vote passes the project is transferred to it. This should usually come with an invite to add the maintainer to the organisation too.
  5. People who want to join the organisation can ask to do so and it is again put to a vote.

Minimum criteria for libraries to be included:

  1. Must be under an OSI approved license.
  2. Must be at least version 1.0.0.
  3. Must have some threshold level of usage (a library nobody is using is probably not finished because you’re almost certainly wrong about the requirements).
  4. Must run on every version in some required set of language versions (I’d pick CPython 2.7 and 3.4 and pypy for Python).
  5. Must not have any dependencies on libraries that are not maintained by the organisation (testing dependencies are totally fine).
  6. The latest release must not have introduced any new features or breaking changes.

This is not really a fantasy idea. I think it’s quite practical, and I would be prepared to be a founding member of this organisation if other people would be interested in joining it. However, I don’t currently have any finished libraries to contribute (Hypothesis might be finished at some point, but it probably won’t be – e.g. I’ll likely be adding new and novel strategy combinators forever). Do you?

This entry was posted in programming, Python.

How to generate a list plus an element of it

A common problem when using Hypothesis is that you want to generate two arguments: One is a list, the other is some element of that list (or some index into it, or some subset of it, etc).

This is pretty easy to do, but there are different ways of doing it that work variably well, so I thought I would unpack some of the options. You should absolutely feel free to ignore this and do whatever is easiest for you: almost everything that works will work reasonably well; this is really just for if you’re interested in The Best way to do it.

That being said, first a caveat. Don’t use this one:

def list_and_element(elements):
    return tuples(lists(elements), elements).filter(
        lambda p: p[1] in p[0])

The main reason not to use this one is that it will appear to work fine while actually working really badly: it almost only ever generates pairs where the second element is a value with a very high probability of being drawn from the example distribution. e.g. if you try using this with string elements you’ll find that the second element of the pair is almost always the empty string and basically never a string with more than one character.
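
If you want to see this skew for yourself, a quick and entirely unofficial sketch is to draw a handful of examples by hand with .example() and look at the second element of each pair:

from hypothesis.strategies import lists, text, tuples

pairs = tuples(lists(text()), text()).filter(lambda p: p[1] in p[0])

# On most runs the second element is nearly always '', because the empty
# string is by far the likeliest text() value to also appear in the list.
for _ in range(10):
    print(pairs.example())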

Basically, as long as you don’t do something like that, which I don’t think I’ve ever seen someone do in the wild anyway, you will probably be fine. That caveat aside, let’s look at some of the alternatives.

The easiest thing to do, and the one people most commonly reach for, is to use flatmap:

def list_and_element(elements):
    return lists(elements, min_size=1).flatmap(
        lambda xs: tuples(just(xs), sampled_from(xs)))

This is pretty self-explanatory if you understand what flatmap does: we’re doing literally exactly what the problem says, but we have to break it into two stages. First we draw a list (with at least one element, so that there is something to draw from), then we draw an element from that list and return both the list and the drawn element.

There’s nothing wrong with this solution and you should feel free to use it. However there are two ways to improve upon it:

The first is that whenever you use flatmap you should have a think about whether there’s a reasonable way to avoid it. It’s not that flatmap is especially dangerous per se; it’s just that because it has extremely general purpose semantics, Hypothesis has to work a lot harder every time you use it, and performance and example quality may suffer a bit (if you don’t have a timeout it should never hurt the ability to find an example, only the quality of the end result).

map and filter do not have this problem (if you want a technical explanation why: Hypothesis operates on an intermediate representation which is where it does most of its work. map and filter have the same intermediate representation as the underlying strategy, but flatmap cannot and instead has a really complicated one which is quite hard to work with), and it’s often possible to replace flatmap usage with them, so sometimes it’s worth thinking about whether you can. In this case we can, quite easily. The following does this:

def list_and_element(elements):
    return tuples(integers(min_value=0), lists(elements, min_size=1)).filter(
        lambda p: p[0] < len(p[1])).map(
        lambda p: (p[1], p[1][p[0]]))

What we’re doing here is we’re simultaneously generating the list and an index into it, then we’re filtering to the range where that index is valid, then we’re taking the element at that index.

There’s a complication though: Did you notice how we’re swapping the order of the pair halfway through?

The reason for this is that it makes Hypothesis better able to simplify the example in some interesting cases. What will happen is that Hypothesis will first simplify the first element of the tuple as far as it can, then the second element, then back to the first, etc. until it hits a point where it can no longer simplify anything.

And what this means is that by putting the index first, we automatically seek to the first element of the list that makes the example pass. This both gives us more freedom to trim down the array by deleting later elements and also stops us getting stuck in the case where we have an easy to satisfy property that happens not to be satisfied by the trivial element but can’t shrink the array. For example, when doing it the other way I found that Hypothesis gave me (['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '0'], '0') as a minimal example of a pair where the element chosen was non-empty: It had first shrunk all the elements of the array leading up to the one we’d chosen, then it couldn’t shrink the index because all of the earlier elements were empty.
This is also potentially a problem with the sampled_from + flatmap approach, but as it happens in this particular case you get lucky: What will end up happening is that if you try to shrink the length of the array past the index you’d chosen, you’ll end up redrawing a new index. This will usually get you past being stuck in this case, although no guarantees.

Then there’s the final option, which is my favourite: Don’t do this at all.

A thing that I have often found to be a very useful reframing is instead of asserting “this example is not a counter-example to this property”, assert “a counter-example to this property does not exist within this example”. As long as the set of possible counter-examples isn’t too large a function of the size of the example (e.g. if it’s O(n) you should be fine) this often leads to much better tests: you both vastly increase the chances of finding a counter-example and improve the end result, because you gain the ability to shrink past a bunch of accidental blockers. So instead of writing something like:

@given(list_and_element(text()))
def test_some_stuff(p):
    xs, x = p
    ...

You write:

@given(lists(text(), min_size=1))
def test_some_stuff(xs):
    for x in xs:
        ...
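
As a concrete (and entirely made-up) instance of this reframing, here is what it might look like for a toy deduplication function; the function and test names are hypothetical, not something from the original post:

from hypothesis import given
from hypothesis.strategies import lists, text

def deduplicate(xs):
    # Toy function under test (hypothetical): drop repeated values.
    return list(dict.fromkeys(xs))

# Rather than generating a list plus one chosen element and checking that
# the chosen element survives, we check the property for every element.
@given(lists(text(), min_size=1))
def test_every_element_survives_deduplication(xs):
    result = deduplicate(xs)
    for x in xs:
        assert x in result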

This is a technique I first noticed in my properties for testing optimization work. It’s not without its problems: It can be awkward if your tests need to mutate your arguments, and it of course makes your tests slower. For a lot of cases it seems to be worth trying though, as it can improve matters significantly.

This entry was posted in Hypothesis, programming, Python.

Throwing in the towel

Effective immediately, there will be no further feature development from me on Hypothesis for the foreseeable future.

This is not the end of Hypothesis. I will still be:

  1. Managing the Hypothesis community and answering people’s questions about how to use it
  2. Fixing bugs, although probably not even close to as rapidly as I have historically done
  3. Improving the documentation
  4. Giving talks about Hypothesis
  5. Reviewing and merging pull requests. If anyone else wants to do feature development on Hypothesis you are extremely welcome to and are more than welcome to ask me questions in the course of doing so.

Which is actually still quite a lot of work.

Hypothesis isn’t going anywhere. But, for now at least, 1.10 is as good as it’s going to get. This is now a side project, and you can expect a commensurate level of support. If you want more than that, or there’s some feature you desperately need, get in touch. We can talk rates.

At some point in the future I will doubtless resume feature development, but it won’t be soon.

Why?

Partly because Hypothesis 1.10 is good enough. It’s basically as good as or better than any open source equivalent in any language, and it’s certainly light years beyond what, this time last year, anyone in the Python community could reasonably have expected would ever exist. There are exciting things I could work on, but they’d be another huge time sink and basically not all that helpful for the problem of making development on Hypothesis sustainable.

But mostly because I’m tired and I’m angry, and I’ve done a reality check and found reality wanting.

I’ve been trying to figure out a way of making Hypothesis development sustainable, and the answer is basically that I can’t, despite the fact that it’s clearly going to save people at the bare minimum millions of dollars over the course of its lifetime.

Yeah, I could probably eke out a living, particularly if I was prepared to burn a lot of bridges and sacrifice most of what actually makes me want to work on it. But basically we’ve built an industry on free labour, and we’ve concluded that we’d much rather make people work for free in their spare time to produce adequate software, and shame them into supporting it when it surprisingly doesn’t do exactly what we want, than fairly compensate them for their labour and get good software out of it.

This makes any attempt to get money for tooling such an uphill struggle that it’s really not worth the effort. Plans which are predicated on changing the world before anyone will pay you any money are decidedly bad plans.

I think Hypothesis will make the world a better place, and I have a lot emotionally invested in it, so as stated above I’m not abandoning it entirely, but I’ve really lost all desire to continue giving away so much of my labour for free, so I won’t.

There will be more on the subject of what happens next when I have collected my thoughts better and am slightly less angry.

In the short term though, if you’ve got any contracting work that you think would be up my alley and is either remote or reachable from London, that would be great.

This entry was posted in Hypothesis, programming, Python.

Hypothesis 1.8.0 is out

This release is mostly focused on internal refactoring, but has some nice polishing and a few bug fixes.

New features:

  • Much more sensible reprs for strategies, especially ones that come from hypothesis.strategies. Their reprs should now be Python code that would produce the same strategy.
  • lists() accepts a unique_by argument which forces the generated lists to contain only elements that are unique according to some key function (which must return a hashable value). See the sketch after this list.
  • Better error messages from flaky tests to help you debug things.
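
As a rough illustration of unique_by (my own sketch, not from the release notes; the test name is made up):

from hypothesis import given
from hypothesis.strategies import integers, lists

# Each generated list contains at most one element per absolute value,
# so e.g. [-2, 2] can never be produced.
@given(lists(integers(), unique_by=abs))
def test_absolute_values_are_distinct(xs):
    assert len(set(map(abs, xs))) == len(xs)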

Mostly invisible implementation details that may result in finding new bugs in your code:

  • Set and dictionary generation should now produce a better range of results.
  • floats with bounds now focus more on ‘critical values’, trying to produce values at edge cases.
  • flatmap should now have better simplification for complicated cases, as well as generally being (I hope) more reliable.

Bug fixes:

  • You could not previously use assume() if you were using the forking executor (because the relevant exception wasn’t pickleable).

This entry was posted in Hypothesis, programming, Python.