Category Archives: programming

In praise of incremental approaches to software quality

Do you know what the best design decision I made in Hypothesis was?

It wasn’t well thought out. It wasn’t anything clever or theoretically interesting. It was done purely because I was lazy, not for any principled reason. It’s still the best design decision.

It’s this: Hypothesis does not have its own test runner. It just uses whichever one you’re already using.

This turns out to be a great idea, because it means that there’s basically zero cost to using Hypothesis: Install an extra library, write some code that uses @given to feed some data to a function. Done. Now you’re using Hypothesis.
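That near-zero cost looks something like this sketch: decorate a test with @given and your existing runner (pytest, unittest, whatever) picks it up unchanged. The run-length codec here is a hypothetical example for illustration, not part of Hypothesis.

```python
# Install Hypothesis, decorate a function with @given, done: no new
# test runner involved. encode/decode are a toy run-length codec made
# up for this example.
from hypothesis import given, strategies as st

def encode(s):
    """Run-length encode a string into [char, count] pairs."""
    out = []
    for c in s:
        if out and out[-1][0] == c:
            out[-1][1] += 1
        else:
            out.append([c, 1])
    return out

def decode(pairs):
    """Invert encode."""
    return "".join(c * n for c, n in pairs)

@given(st.text())
def test_decode_inverts_encode(s):
    # Hypothesis feeds many generated strings through this property.
    assert decode(encode(s)) == s
```

Running pytest discovers test_decode_inverts_encode like any other test; Hypothesis just supplies the data.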

As far as difficult sells go, this barely even registers.

Compare this to, say, adding static analysis to your process (I’ve never got around to adding pylint to Hypothesis and I care about this stuff, because it gives too many false positives so it’s never been quite worth the up front investment), or rewriting everything in Haskell.

It’s certainly possible that both of those will produce enough of a return on investment from their correctness benefits that they’re worth the up front cost, but unless you’re really confident that they will and can afford to take the time to do it right now they’re things that you can put off basically indefinitely.

Hypothesis on the other hand is so cheap to get started with that you can start benefiting almost immediately, and then incremental amounts more work will produce incremental amounts more improvement.

This turns out to be incredibly valuable if what you actually want is to produce better software.

One of the responses to the economics of software development post is that doing everything right up front can be cheaper than the long term cost of the unprincipled hacks that people often do instead.

This is certainly true. If you are lucky enough to find yourself in a position where you have the option of doing everything right up front, I recommend you make a reasonable effort towards doing so. You’ll probably fail (getting everything right is really rather hard), but in failing you will probably still end up in a better position than in not trying.

But, well, this is rarely the situation in which we find ourselves. Even if we got everything right at the beginning, at some point in the project we were rushed, or made a decision that later turned out to be a bad call, or in some other way failed to be perfect, because nobody can be perfect all the time.

If this is not the situation you find yourself in and you are working on a mature, high quality project that has consistently got most things right throughout its lifespan and has fixed the things it did not, good for you. Say hi to the unicorns for me while you’re there.

Most of us are not only not in that situation but that situation is almost entirely unreachable. Saying that if you did everything right at the beginning you wouldn’t have this problem is just another way of saying it sucks to be you.

So what we need are not tools that work great if you use them from the beginning, or tools that work great if you embrace them whole-heartedly, but tools that make it easy to get started and easy to move in the right direction.

This means that “rewrite it in Haskell” is out, but “we can write this microservice in Haskell” might be in. It means “Always have 100% branch coverage” is out, but “merges must never decrease branch coverage” is in. Failing the build if you have any static analysis warnings is out, but failing a merge if the touched lines have any static analysis failures is in.
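The "merges must never decrease branch coverage" gate can be sketched in a few lines: compare the target branch's coverage figure with the merge candidate's and fail only on regression. Where the numbers come from (e.g. coverage.py reports produced in CI) is left open; this is an illustration of the policy, not a particular CI integration.

```python
# The incremental gate: never demand 100%, only forbid moving backwards.
def merge_allowed(baseline_percent, candidate_percent):
    """Allow a merge as long as branch coverage does not go down."""
    return candidate_percent >= baseline_percent

# A project at 63% coverage can stay at 63% or improve, but not regress:
assert merge_allowed(63.0, 63.0)
assert merge_allowed(63.0, 64.2)
assert not merge_allowed(63.0, 62.9)
```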

Incremental quality improvements allow you to move your project to a higher quality state without requiring the huge up front investment that getting everything right from the beginning requires, and they don’t punish you for the fact that you didn’t get everything right before you had this tool to help you get things right.

This is of course often a harder problem, but I think it’s one that’s important to solve. If we want higher quality software we don’t get to live in a fantasy world where everything magically works, we have instead to think about how to move from the world we live in to a reachable world in which everything works better, and we need to think about what the steps along the way look like.

This entry was posted in programming, Python on by .

The economics of software correctness

This post is loosely based on the first half of my “Finding more bugs with less work” talk for PyCon UK.

You have probably never written a significant piece of correct software.

That’s not a value judgement. It’s certainly not a criticism of your competence. I can say with almost complete confidence that every non-trivial piece of software I have written contains at least one bug. You might have written small libraries that are essentially bug free, but the chances of you having written whole programs which are bug free are tantamount to zero.

I don’t even mean this in some pedantic academic sense. I’m talking about behaviour where if someone spotted it and pointed it out to you, you would probably admit that it’s a bug. It might even be a bug that you cared about.

Why is this?

Well, let’s start with why it’s not: It’s not because we don’t know how to write correct software. We’ve known how to write software that is more or less correct (or at least vastly closer to correct than the norm) for a while now. If you look at the NASA development process they’re pretty much doing it.

Also, if you look at the NASA development process you will pretty much conclude that we can’t do that. It’s orders of magnitude more work than we ever put into software development. It’s process heavy, laborious, and does not adapt well to changing requirements or tight deadlines.

The problem is not that we don’t know how to write correct software. The problem is that correct software is too expensive.

And “too expensive” doesn’t mean “It will knock 10% off our profit margins, we couldn’t possibly do that”. It means “if our software cost this much to make, nobody would be willing to pay a price we could afford to sell it at”. It may also mean “If our software took this long to make then someone else will release a competing product two years earlier than us, everyone will use that, and when ours comes along nobody will be interested in using it”.

(“sell” and “release” here can mean a variety of things. It can mean that terribly unfashionable behaviour where people give you money and you give them a license to your software. It can mean subscriptions. It can mean ad space. It can even mean paid work. I’m just going to keep saying sell and release).

NASA can do it because when they introduce a software bug they potentially lose some combination of billions of dollars, years of work and many lives. When that’s the cost of a bug, spending that much time and money on correctness seems like a great deal. Safety critical industries like medical technology and aviation can do it for similar reasons (buggy medical technology kills people, and you don’t want your engines power cycling themselves midflight).

The rest of us aren’t writing safety critical software, and as a result people aren’t willing to pay for that level of correctness.

So the result is that we write software with bugs in it, and we adopt a much cheaper software testing methodology: We ship it and see what happens. Inevitably some user will find a bug in our software. Probably many users will find many bugs in our software.

And this means that we’re turning our users into our QA department.

Which, to be clear, is fine. Users have stated the price that they’re willing to pay, and that price does not include correctness, so they’re getting software that is not correct. I think we all feel bad about shipping buggy software, so let me emphasise this here: Buggy software is not a moral failing. The option to ship correct software is simply not on the table, so why on earth should we feel bad about not taking it?

But in another sense, turning our users into a QA department is a terrible idea.

Why? Because users are not actually good at QA. QA is a complicated professional skill which very few people can do well. Even skilled developers often don’t know how to write a good bug report. How can we possibly expect our users to?

The result is long and frustrating conversations with users in which you try to determine whether what they’re seeing is actually a bug or a misunderstanding (although treating misunderstandings as bugs is a good idea too), trying to figure out what the actual bug is, etc. It’s a time consuming process which ends up annoying the user and taking up a lot of expensive time from developers and customer support.

And that’s of course if the users tell you at all. Some users will just try your software, decide it doesn’t work, and go away without ever saying anything to you. This is particularly bad for software where you can’t easily tell who is using it.

Also, some of our users are actually adversaries. They’re not only not going to tell you about bugs they find, they’re going to actively try to keep you from finding out because they’re using it to steal money and/or data from you.

So this is the problem with shipping buggy software: Bugs found by users are more expensive than bugs found before a user sees them. Bugs found by users may result in lost users, lost time and theft. These all hurt the bottom line.

At the same time, your users are a lot more effective at finding bugs than you are due to sheer numbers if nothing else, and as we’ve established it’s basically impossible to ship fully correct software, so we end up choosing some level of acceptable defect rate in the middle. This is generally determined by the point at which it is more expensive to find the next bug yourself than it is to let your users find it. Any higher or lower defect rate and you could just adjust your development process and make more money, and companies like making money, so if they’re competently run they will generally do the things that cause them to do so.
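That equilibrium can be made concrete with a toy model: keep finding bugs in-house while the marginal cost of finding the next one yourself stays below the cost of a user finding it. The numbers here are purely illustrative, not real data.

```python
# Toy model of the defect-rate equilibrium: fix bugs in-house only while
# it is cheaper than letting a user hit them. Each successive bug is
# harder (more expensive) to find than the last.
def bugs_to_fix_in_house(marginal_costs, user_cost):
    """marginal_costs: increasing cost of finding each successive bug.
    Returns how many bugs it is rational to find before shipping."""
    n = 0
    for cost in marginal_costs:
        if cost >= user_cost:
            break  # past this point, letting users find bugs is cheaper
        n += 1
    return n

# With finding costs growing 1, 3, 9, 27 and a user-found bug costing 10,
# it is only rational to find the first three bugs yourself:
assert bugs_to_fix_in_house([1, 3, 9, 27], user_cost=10) == 3
```

Note that this is exactly the lever the rest of the post pulls on: lowering the marginal cost curve (cheaper bug-finding tools) moves the break-even point, so more bugs get caught before users see them.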

This means that there are only two viable ways to improve software quality:

  1. Make users angrier about bugs
  2. Make it cheaper to find bugs

I think making users angrier about bugs is a good idea and I wish people cared more about software quality, but as a business plan it’s a bit of a rubbish one. It creates higher quality software by making it more expensive to write software.

Making it cheaper to find bugs though… that’s a good one, because it increases the quality of the software by increasing your profit margins. Literally everyone wins: The developers win, the users win, the business’s owners win.

And so this is the lever we get to pull to change the world: If you want better software, make or find tools that reduce the effort of finding bugs.

Obviously I think Hypothesis is an example of this, but it’s neither the only one nor the only one you need. Better monitoring is another. Code review processes. Static analysis. Improved communication. There are many more.

But one thing that won’t improve your ability to find bugs is feeling bad about yourself and trying really hard to write correct software then feeling guilty when you fail. This seems to be the current standard, and it’s deeply counter-productive. You can’t fix systemic issues with individual action, and the only way to ship better software is to change the economics to make it viable to do so.

Edit to add: In this piece, Itamar points out that another way of making it cheaper to find bugs is to reduce the cost of when your users do find them. I think this is an excellent point which I didn’t adequately cover here, though I don’t think it changes my basic point.

This entry was posted in programming, Python on by .

Mergeable compressed lists of integers

Alexander Shorin’s work on more configurable unicode generation in Hypothesis has to do some interesting slicing of ranges of unicode categories. In particular, supporting both generation and shrinking required either two distinct representations of the data or something clever. Fortunately I’d figured out the details of the sort of data structure that would let you do the clever thing a while ago, so it was just a matter of putting the pieces together.

The result is an interesting purely functional data structure based on Okasaki and Gill’s “Fast Mergeable Integer Maps”. I’m not totally sure we’ll end up using it, but the data structure is still interesting in its own right.

The original data structure, which is the basis of Data.IntMap in Haskell, is essentially a patricia trie treating fixed size machine words as strings of 0s and 1s (effectively a crit-bit trie). It’s used for implementing immutable mappings of integers with fast operations on them (O(log(n)) insert, good expected complexity on union).

With some small twists on the data structure you can do some interesting things with it.

  1. Ditch the values (i.e. we’re just representing sets)
  2. Instead of tips being a single key, tips are a range of keys start <= x < end.
  3. Split nodes are annotated with their size and the smallest interval [start, end) containing them.
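The three twists above can be sketched as node shapes. This is an illustration of the idea, not the representation used in Hypothesis:

```python
# Two node kinds: a tip holds a whole interval [start, end), and a split
# node is annotated so that size and covering interval are O(1) lookups.
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Tip node: all integers x with start <= x < end."""
    start: int
    end: int

    @property
    def size(self):
        return self.end - self.start

@dataclass(frozen=True)
class Split:
    """Internal node of the patricia trie, with cached annotations."""
    prefix: int                # common high bits of everything below
    mask: int                  # the bit this node splits on
    left: "Interval | Split"   # subtree where that bit is 0
    right: "Interval | Split"  # subtree where that bit is 1

    @property
    def start(self):           # smallest element is in the left subtree
        return self.left.start

    @property
    def end(self):             # largest element is in the right subtree
        return self.right.end

    @property
    def size(self):            # size annotation, computed bottom-up
        return self.left.size + self.right.size
```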

When using this to represent sets of unicode letters this is extremely helpful – most of the time we’re just removing one or two categories, or restricting the range, which results in a relatively small number of intervals covering a very large number of codepoints.

Let T be the number of intervals and W the word size. The data structure has the following nice properties:

  1. Getting the size of a set is O(1) (because everything is size annotated or can have its size calculated with a single arithmetic operation)
  2. Indexing to an element in sorted order is O(log(T)) because you can use the size annotation of nodes to index directly into it – when indexing a split node, check the size of the left and right subtrees and choose which one to recurse to.
  3. The tree can automatically be collapsed to intervals in many cases, because a split node is equivalent to an interval if end = start + size, which is a cheap O(1) check
  4. Boolean operations are generally O(min(W, T)), like with the standard IntSet (except with intervals instead of values)
  5. Range restriction is O(log(T)).
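Property 2 is worth spelling out, since it is what makes the structure useful for shrinking and generation. Here is a self-contained sketch of size-annotated indexing, using a simplified tuple representation ('leaf', start, end) for intervals and ('split', size, left, right) for internal nodes (again an illustration, not the real representation):

```python
# O(log T) indexing into sorted order: at each split, compare the index
# against the left subtree's size annotation and recurse accordingly.
def size(node):
    if node[0] == 'leaf':
        return node[2] - node[1]   # interval [start, end) has end - start elements
    return node[1]                 # splits carry their size annotation

def index(node, i):
    """Return the i-th smallest element of the set."""
    assert 0 <= i < size(node)
    if node[0] == 'leaf':
        return node[1] + i         # direct arithmetic within an interval
    _, _, left, right = node
    if i < size(left):
        return index(left, i)      # element lives in the left subtree
    return index(right, i - size(left))

# The set {0, 1, 10, 11, 12} as two intervals under one split node:
tree = ('split', 5, ('leaf', 0, 2), ('leaf', 10, 13))
assert [index(tree, i) for i in range(size(tree))] == [0, 1, 10, 11, 12]
```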

Note that it isn’t necessarily the case that a tree with intervals [x, y) and [y, z) in it will compress this into the interval [x, z) because their common parent might be further up the tree.

An extension I have considered but not implemented is that you could potentially store very small subtrees as arrays in order to flatten it out and reduce indirection.

In particular the efficient indexing is very useful for both simplification and generation, and the fact that merging efficiently is possible means that we can keep two representations around: One for each permitted category (which helps give a better distribution when generating) and one for the full range (which makes it much easier to simplify appropriately).

Here is an implementation in Python. It’s not as fast as I’d like, but it’s not unreasonably slow. A C implementation would probably be a nice thing to have and is not too difficult to do (no, really. I’ve actually got a C implementation of something similar lying around), but wouldn’t actually be useful for the use case of inclusion in Hypothesis because I don’t want to add a C dependency to Hypothesis just for this.


This entry was posted in Code, programming, Python, Uncategorized on by .

Finding more bugs with less work

I was at PyCon UK this weekend, which was a great conference and I will definitely be attending next year.

Among the things that occurred at this conference is that I gave my talk, “Finding more bugs with less work”. The video is up, and you can see the slides here.

I may do a transcript at some point (like I did for my django talk), but I haven’t yet.

This entry was posted in Hypothesis, programming, Python on by .

Soliciting advice: Bindings, Conjecture, and error handling

Edit: I think I have been talked into a significantly simpler system than the one described here that simply uses error codes plus some custom hooks to make this work. I’m leaving this up for posterity and am still interested in advice, but don’t worry I’ve already been talked out of using setjmp or implementing my own exception handling system.

I’m working on some rudimentary Python bindings to Conjecture and running into a bit of a problem: I’d like it to be possible to run Conjecture without forking, but I’m really struggling to come up with an error handling interface that works for this.

In Conjecture’s current design, any of the data generation functions can abort the process, and are run in a subprocess to guard against that. For testing C this makes absolute sense: It’s clean, easy to use, and there are so many things that can go wrong in a C program that will crash the process that your C testing really has to be resilient against the process crashing anyway so you might as well take advantage of that.

For Python, this is a bit sub-optimal. It would be really nice to be able to run Conjecture tests purely in process, just looking for exceptions. os.fork() has to do a bunch of things which make it much slower than calling C’s fork directly (and the program behaves really weirdly when you send it a signal if you try to use the native fork function), and it’s also just a bit unnecessary for 90% of what you do with Python testing.

It would also be good to support a fork free mode so that Conjecture can eventually work on Windows (right now it’s very unixy).

Note: I don’t need forkless mode to handle crashes that are not caused by an explicit call into the conjecture API. conjecture_reject and conjecture_fail (which doesn’t exist right now but could) will explicitly abort the test, but other things that cause a crash are allowed to just crash the process in forkless mode.

So the problem is basically how to combine these interfaces, and everything I come up with seems to be “Now we design an exception system…”

Here is the least objectionable plan I have so far. It requires a lot of drudge work on my part, but this should mostly be invisible to the end user (“Doing the drudge work so you don’t have to” is practically my motto of good library design)

Step 1: For each draw_* function in the API, add a second draw_*_checked function which has exactly the same signature. This does a setjmp, followed by a call to the underlying draw_* function. If that function aborts, it does a longjmp back to the setjmp, sets an is_aborted flag and returns some default value. Bindings must always call the _checked version of the function, then check conjecture_is_aborted() and convert it into a language appropriate error condition.

Note: It is a usage error to call one checked function from another and this will result in your crashing the process. Don’t do that. These are intended to be entry points to the API, not something that you should use in defining data generators.

Step 2: Define a “test runner” interface. This takes a test, some associated data, and runs it and returns one of three states: Passing test, failing test, rejected test. The forking based interface then becomes a single test runner. Another one using techniques similar to the checked interface is possible. Bindings libraries should write their own – e.g. a Python one would catch all exceptions and convert them into an appropriate response.
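On the bindings side, the Python version of such a runner might look like the following sketch. All of the names here (Rejected, python_test_runner, the outcome strings) are hypothetical, not the real Conjecture API:

```python
# Sketch of Step 2 from the Python side: run the test, catch exceptions,
# and map everything onto the three outcomes a test runner must report.
PASSED, FAILED, REJECTED = "passed", "failed", "rejected"

class Rejected(Exception):
    """Raised to reject the current example (cf. conjecture_reject)."""

def python_test_runner(test, data):
    """Run test(data) and report one of the three runner outcomes."""
    try:
        test(data)
    except Rejected:
        return REJECTED        # the example was rejected, try another
    except Exception:
        return FAILED          # any other exception is a failing test
    return PASSED
```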

Step 3: Define a cleanup API. This lets you register a void (*cleanup)(void *data) function and some data to pass to it which may get called right before aborting. In “crash the process” model it is not required to be called, and it will not get called if your process otherwise exits abnormally. Note: This changes the memory ownership model of all data generation. Data returned to you from generators is no longer owned by you and you may not free it.

I think this satisfies the requirements of being easy to use from both C and other languages, but I’m a little worried that I’m not so much reinventing the wheel as trying to get from point A to point B without even having heard of these wheel things and so I invented the pogo stick instead. Can anyone who has more familiarity with writing C libraries designed to be usable from both C and other languages offer me some advice and/or (heh) pointers?

This entry was posted in Hypothesis, programming, Python on by .