The economics of software correctness

This post is loosely based on the first half of my “Finding more bugs with less work” talk for PyCon UK.

You have probably never written a significant piece of correct software.

That’s not a value judgement. It’s certainly not a criticism of your competence. I can say with almost complete confidence that every non-trivial piece of software I have written contains at least one bug. You might have written small libraries that are essentially bug free, but the chances of you having written a whole bug-free program are tantamount to zero.

I don’t even mean this in some pedantic academic sense. I’m talking about behaviour where if someone spotted it and pointed it out to you, you would probably admit that it’s a bug. It might even be a bug that you cared about.

Why is this?

Well, let’s start with why it’s not: It’s not because we don’t know how to write correct software. We’ve known how to write software that is more or less correct (or at least vastly closer to correct than the norm) for a while now. If you look at the NASA development process they’re pretty much doing it.

Also, if you look at the NASA development process you will pretty much conclude that we can’t do that. It’s orders of magnitude more work than we ever put into software development. It’s process heavy, laborious, and does not adapt well to changing requirements or tight deadlines.

The problem is not that we don’t know how to write correct software. The problem is that correct software is too expensive.

And “too expensive” doesn’t mean “it will knock 10% off our profit margins; we couldn’t possibly do that”. It means “if our software cost this much to make, nobody would be willing to pay a price we could afford to sell it at”. It may also mean “if our software took this long to make then someone else would release a competing product two years earlier than us, everyone would use that, and when ours came along nobody would be interested in using it”.

(“sell” and “release” here can mean a variety of things. It can mean that terribly unfashionable behaviour where people give you money and you give them a license to your software. It can mean subscriptions. It can mean ad space. It can even mean paid work. I’m just going to keep saying sell and release).

NASA can do it because when they introduce a software bug they potentially lose some combination of billions of dollars, years of work and many lives. When that’s the cost of a bug, spending that much time and money on correctness seems like a great deal. Safety critical industries like medical technology and aviation can do it for similar reasons (buggy medical technology kills people, and you don’t want your engines power cycling themselves midflight).

The rest of us aren’t writing safety critical software, and as a result people aren’t willing to pay for that level of correctness.

So the result is that we write software with bugs in it, and we adopt a much cheaper software testing methodology: We ship it and see what happens. Inevitably some user will find a bug in our software. Probably many users will find many bugs in our software.

And this means that we’re turning our users into our QA department.

Which, to be clear, is fine. Users have stated the price that they’re willing to pay, and that price does not include correctness, so they’re getting software that is not correct. I think we all feel bad about shipping buggy software, so let me emphasise this here: Buggy software is not a moral failing. The option to ship correct software is simply not on the table, so why on earth should we feel bad about not taking it?

But in another sense, turning our users into a QA department is a terrible idea.

Why? Because users are not actually good at QA. QA is a complicated professional skill which very few people can do well. Even skilled developers often don’t know how to write a good bug report. How can we possibly expect our users to?

The result is long and frustrating conversations with users in which you try to determine whether what they’re seeing is actually a bug or a misunderstanding (although treating misunderstandings as bugs is a good idea too), try to figure out what the actual bug is, etc. It’s a time-consuming process which ends up annoying the user and taking up a lot of expensive time from developers and customer support.

And that’s of course if the users tell you at all. Some users will just try your software, decide it doesn’t work, and go away without ever saying anything to you. This is particularly bad for software where you can’t easily tell who is using it.

Also, some of our users are actually adversaries. Not only are they not going to tell you about bugs they find, they’re going to actively try to keep you from finding out, because they’re using those bugs to steal money and/or data from you.

So this is the problem with shipping buggy software: Bugs found by users are more expensive than bugs found before a user sees them. Bugs found by users may result in lost users, lost time and theft. These all hurt the bottom line.

At the same time, your users are a lot more effective at finding bugs than you are, due to sheer numbers if nothing else, and as we’ve established it’s basically impossible to ship fully correct software, so we end up choosing some acceptable defect rate in the middle. This is generally determined by the point at which it is more expensive to find the next bug yourself than it is to let your users find it. At any higher or lower defect rate you could adjust your development process and make more money, and companies like making money, so if they’re competently run they will generally do the things that cause them to make it.
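
To make that trade-off concrete, here is a toy back-of-the-envelope sketch in Python. Every number in it is invented purely for illustration (none of them come from the post or from real data); the only point is the comparison between the marginal cost of finding the next bug yourself and the expected cost of letting a user find it instead.

```python
# Toy illustration of the "acceptable defect rate" equilibrium described
# above. Every figure here is hypothetical and made up for the example.

cost_to_find_next_bug_in_house = 2_000   # developer/QA time, in dollars (hypothetical)
probability_a_user_ever_hits_it = 0.4    # chance the bug is triggered in the field (hypothetical)
cost_of_a_user_found_bug = 3_000         # support time, lost users, cleanup (hypothetical)

expected_cost_of_shipping_it = probability_a_user_ever_hits_it * cost_of_a_user_found_bug

if cost_to_find_next_bug_in_house < expected_cost_of_shipping_it:
    print("Worth finding this one before release")
else:
    print("Cheaper to let users find it - roughly where the defect rate settles")
```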

This means that there are only two viable ways to improve software quality:

  1. Make users angrier about bugs
  2. Make it cheaper to find bugs

I think making users angrier about bugs is a good idea and I wish people cared more about software quality, but as a business plan it’s a bit of a rubbish one. It creates higher quality software by making it more expensive to write software.

Making it cheaper to find bugs though… that’s a good one, because it increases the quality of the software by increasing your profit margins. Literally everyone wins: The developers win, the users win, the business’s owners win.

And so this is the lever we get to pull to change the world: If you want better software, make or find tools that reduce the effort of finding bugs.

Obviously I think Hypothesis is an example of this, but it’s neither the only one nor the only one you need. Better monitoring is another. Code review processes. Static analysis. Improved communication. There are many more.
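
To make that a bit more concrete, here is a heavily simplified sketch of the kind of test Hypothesis makes cheap. The encode/decode functions and the bug in them are invented for this example; the point is that stating one general property (“decoding an encoding gives back the original string”) lets the tool generate inputs, find a counterexample, and shrink it to a minimal failing case, which is far cheaper than waiting for a user to stumble into the bug in production.

```python
# A minimal, hypothetical example: a run-length encoder with a deliberate
# bug, and a Hypothesis property test that finds it automatically.
from hypothesis import given, strategies as st


def encode(s):
    """Run-length encode a string, e.g. "aaab" -> [("a", 3), ("b", 1)]."""
    runs = []
    count = 0
    prev = None
    for ch in s:
        if ch == prev:
            count += 1
        else:
            if prev is not None:
                runs.append((prev, count))
            prev = ch
            count = 1
    # Bug: the final run is never appended, so encode("aaab") drops the "b".
    return runs


def decode(runs):
    return "".join(ch * count for ch, count in runs)


@given(st.text())
def test_decode_inverts_encode(s):
    # The property: decoding an encoding should give back the original string.
    assert decode(encode(s)) == s
```

Run under pytest or any other normal test runner, this fails straight away and reports a shrunk counterexample (a single-character string), at which point the fix is obvious.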

But one thing that won’t improve your ability to find bugs is feeling bad about yourself and trying really hard to write correct software then feeling guilty when you fail. This seems to be the current standard, and it’s deeply counter-productive. You can’t fix systemic issues with individual action, and the only way to ship better software is to change the economics to make it viable to do so.

Edit to add: In this piece, Itamar points out that another way of making it cheaper to find bugs is to reduce the cost of when your users do find them. I think this is an excellent point which I didn’t adequately cover here, though I don’t think it changes my basic point.


9 thoughts on “The economics of software correctness”

  1. Pingback: In praise of incremental approaches to software quality | David R. MacIver

  2. Pingback: Links 8/10/2015: KDE Plasma 5.4.2 Released, Linux Drama Queens | Techrights

  3. Kay Schluehr

    Also, if you look at the NASA development process you will pretty much conclude that we can’t do that.

    Those processes are actually quite common in the tech industry, but they are mostly related to tester tests: black-box, system level, specification driven, bureaucratic, implemented using proprietary or in-house tools and often even domain specific languages which strive to be simple but are rather simplistic. In my previous company (mid-size, smartcards) we purchased test suites along with their tools. The test cases were the assets; the tools were just an add-on. Integration of those tools into existing workflows required additional work. The test suites were often buggy, even more so than the products they were created for. Testing the product + debugging the test suites was the most time-consuming development step.

    It’s something almost no one is talking about. The discourse about testing in the last decade and a half has been dominated by “agile” developer tests: unit tests, innovative techniques your own work derives from like QuickCheck, the advantages or disadvantages of TDD, etc. From the point of view of tester-tests all of this is illegible, unmanaged creative expression, possibly very useful – see e.g. the trophy list on the AFL page – but there is a serious communication issue. You cannot hand over a product to some client and say: we have implemented all your requirements, here is the test specification containing all the references to your requirement docs and here are the log files. That’s why Agile requires continuous delivery and permanent feedback, “the customer on board”, but not only is that a lot of effort on the side of the customer, the customer might just hand over a product specification based on a stack of others which are available from 3rd parties and expect a product in return. They don’t want anything which resembles incremental development.

    Technically, tester-tests are often substandard, and herein lies a real challenge. Companies with R&D departments might launch a university cooperation for some flavor of model-driven testing to increase productivity, but this hardly ever goes into production. Schlumberger once verified a component of their Javacard using Coq. Another very isolated trophy. What can we do about it?

    A possible attack vector is test management and consistent reporting across a variety of approaches, languages and frameworks, which does not prevent innovation in testing from taking place. I thought about some derivative of Project Jupyter, mostly the infrastructure with ZMQ, the communication protocol and the execution kernels, rather than the actual notebook frontend, which would need to be replaced by something more suitable for the domain. Letting Hypothesis plug into existing test frameworks is one step of integration; another is to plug frameworks into test management and reporting tools.

    1. david Post author

      I agree with most of these points, but I don’t think that saying the NASA testing processes are quite common in the tech industry because testers are quite common is any more accurate than saying the NASA development processes are common because developers are quite common. The degree to which testers are typically integrated into the workflow and the amount of work invested in it are rarely there – most testers are treated as second-class citizens: you throw the product over the wall to them and then get angry when they tell you it doesn’t work.

      I do actually have a couple of people from a testing background using Hypothesis. It’s not really a unit testing tool, except in the sense that it works best for fast, reproducible tests, but it also works well for almost any test you can automate.

      RE plugging Hypothesis into existing test frameworks: It does, that’s how it works. The design of Hypothesis is that you run it as part of your normal tests as part of your normal Python test suite. I have an explicit goal of taking over the world (of software testing), so making it easy to add to your normal testing workflow is a must. :-) I haven’t looked into integration with test management stuff because a) I think that’s better handled at the testing framework level and Hypothesis should just hook into the framework’s normal reporting and b) I, uh, have literally never met anyone who uses them, so I’ve no idea what the interest and requirements are. This seems like a thing that’s more likely to happen as a contracted project than on my own initiative.

      1. Kay Schluehr

        Most testers are treated as second class citizens where you throw the product over the wall to them and then get angry when they tell you it doesn’t work

        Unfortunately the developers are often right, and their product is correct in places where the tests go wrong, and vice versa. The number of type I and type II errors in 3rd party or tester test suites can be quite unsettling. I blame the test process and the test partitioning / test management, which make sense on the surface, but a test suite with lots of test cases encoded as individual test scripts is yet another program, with just so many bugs per LOC and lots of code & data duplication under the hood. This is relevant to the economics of testing because the effort of building formal models might not be overstated, but the cost of buggy test suites is swept under the rug.

        Testers might be harassed by arrogant developers, but I have also experienced the opposite, where test teams were held in high esteem when they actually found errors without keeping developers busy for weeks and months arguing about false positives. One learns about the differences in painful ways.

        I agree that everyone should learn Hypothesis in 5 minutes – which is the average attention span of a programmer in 2015, or the maximum distance between two tweets – but chances are that testers are still threaded through ISTQB, which is low on technique and high on process. ‘Process’, in this case, can be mapped onto the economics of the profession. If, for instance, formal software verification is omitted from the process, then it is not because it is useless, but because it is time consuming, there is market clearance with respect to people who are knowledgeable about it, and tools are perceived as ‘immature’, which means that the market hasn’t yet found a direction and singled out a mainstream candidate.

  4. Pingback: Professional Development 10/05/15 – 10/11/15 | The Software Mentor

  5. Pingback: Ecology of Software Quality - 250bpm

  6. Pingback: A whirlwind tour of the Hypothesis build | David R. MacIver

  7. Pingback: Professional Development – 2015 – Week 42
