David R. MacIver's Blog: Static typing will not save us from broken software

Static typing will not save us from broken software

23 October 2016

Epistemic status: This piece is like virtually all writing about software and is largely anecdata and opinions. I think it’s more right than not, but then I would.

I learned to program in ML. I know Haskell to a reasonable degree of fluency. I’ve written quite a lot of Scala, including small parts of the compiler and standard library (though I think most or all of that is gone or rewritten by now. It’s been 8 years). I like static typing, and miss using statically typed languages more heavily than I currently do (which is hardly at all).

But I’m getting pretty frustrated with certain (possibly most) static type system advocates.

This frustration stems from the idea that static typing will solve all our problems, or even one specific problem: The ubiquity of broken software. There’s a lot of broken software out there, and the amount keeps going up.

People keep claiming that this is because of bad choices of language, but mostly it’s not, and static typing will not even slightly help fix it.

(Note: I’m getting a lot of people saying this is a strawman and that’s not what static typing advocates say. This post is in fact a response to several specific comments from specific people, but I didn’t want to name and shame. It’s not a strawman if the people I’m arguing against actually exist).

Broken software is a social and economic problem: Software is broken because it’s not worth people’s while to write non-broken software. There are only two solutions to this problem:

1. Make it more expensive to write broken software
2. Make it cheaper to write correct software

Technical solutions don’t help with the first, and at the level of expense most people are willing to spend on software correctness your technical solution has to approach “wave a magic wand and make your software correct” levels of power to make much of an impact: The current level of completely broken software can only arise if there’s almost zero incentive for people to sink time into correctness of their IoT devices and they’re not engaged in even minimal levels of testing for quality.

When you’ve got that level of investment in quality anything that points out errors is more likely to be ignored or not used than it is to improve things.

I think this carries over to moderate levels of investment in correctness too, but for different reasons (and ones I’m less confident of).

“All” static typing tells you is that your program is well-typed. This is good and catches a lot of bugs by enforcing consistency on you. But at entry-level static typing most of those bugs are the sort that ends up with a Python program throwing a TypeError. Debugging those when they happen in production is a complete pain and very embarrassing, but it’s still the least important type of bug: A crash is noticeable if you’ve got even basic investment in monitoring (e.g. a Sentry account and 5 lines of code to hook it into your app). This is more true in some dynamic languages than others - JavaScript is terrible for this because so many errors result in a value of undefined rather than an exception - but generally speaking in most languages these are quite straightforward errors both in manifestation and debugging.
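To make the distinction concrete, here is a minimal Python sketch. The function names and data are made up for illustration, and `run_and_report` is only a stand-in for whatever error monitoring you use. The first bug is a type confusion that crashes loudly; the second is a logic error where every value has the right type, so nothing is raised and nothing gets reported.

```python
def total_price(prices):
    """Sum a list of prices given as numbers."""
    return sum(prices)


def average_latency_ms(samples):
    """Intended to return the mean of the samples."""
    # Hypothetical logic bug: divides by a constant instead of len(samples).
    # It is perfectly well-typed and never crashes; it is just wrong.
    return sum(samples) / 2


def run_and_report(func, *args):
    """Stand-in for an error monitor: record any exception that escapes."""
    try:
        return ("ok", func(*args))
    except Exception as exc:  # a real monitor would send this somewhere
        return ("crash", type(exc).__name__)


# A type confusion: one price arrived as a string. This fails loudly.
loud = run_and_report(total_price, [10, "20", 30])

# A logic error: no exception, so the monitor sees a healthy call.
quiet = run_and_report(average_latency_ms, [100, 200, 300])

print(loud)
print(quiet)
```

The crash shows up in monitoring the first time it happens. The wrong average is the kind of bug that only a test (or a user) is likely to notice.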

Don’t get me wrong: Not having those bugs reach production in the first place is great. I’m all in favour. But because these bugs are relatively minor the cost of finding them needs to be lower than the cost of letting them hit production, else they start to eat into your quality budget and come at the cost of other more important bugs.

For more advanced usage, I’ve yet to be convinced that types are more effective than tests on modestly sized projects.

For large classes of problems, tests are just easier to write than types. For example, an end-to-end test of a complicated user workflow is fairly easy to write, but literally nobody is going to encode it in the type system. Tests are also easier to add after the fact - if you find a bug it’s easy and unintrusive to add a test for it, but it may require a substantial amount of work to refactor your code to add types that make the bug impossible. It can and often will be worth doing the latter if the bug is an expensive one, but it often won’t be.
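As a rough illustration of the "add a test after the fact" case, here is a sketch in Python. The function `parse_port` and the bug report are hypothetical: suppose input read from a config file with a trailing newline used to be rejected, and the fix was a one-line `strip()`.

```python
def parse_port(text):
    """Parse a TCP port number from user-supplied text."""
    cleaned = text.strip()  # the fix: input used to be validated unstripped
    if not cleaned.isdigit():
        raise ValueError(f"not a port number: {text!r}")
    value = int(cleaned)
    if not 0 < value < 65536:
        raise ValueError(f"port out of range: {value}")
    return value


def test_port_with_trailing_newline():
    # Regression test added alongside the fix; no other code had to change.
    assert parse_port("8080\n") == 8080


def test_out_of_range_port_is_rejected():
    try:
        parse_port("70000")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an out-of-range port")


test_port_with_trailing_newline()
test_out_of_range_port_is_rejected()
```

The type-based alternative would be something like a dedicated port type that can only be built through validation, threaded through every caller that currently passes a plain string or integer around. That is a much bigger change for the same bug.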

In general, trying to encode a particular correctness property in the type system is rarely going to be easier than writing a good test for it, especially if you have access to a good property-based testing library. The benefits of encoding it in the type system might make it worth doing anyway, for some bugs and some projects, but given the finite quality budget it’s going to come at the expense of other testing, so it really has to pull its weight.
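Here is a sketch of what testing a correctness property can look like. It is deliberately hand-rolled with the standard library's `random` module rather than any particular library's API, and the run-length encoder is just a made-up example. A real property-based testing library would add better input generation and shrinking of failing cases on top of this basic idea.

```python
import random


def rle_encode(text):
    """Run-length encode a string into a list of (character, count) pairs."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def rle_decode(pairs):
    """Invert rle_encode."""
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(trials=500, seed=0):
    """Property: decoding an encoded string gives back the original."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(trials):
        length = rng.randint(0, 20)
        # A two-letter alphabet makes repeated runs likely.
        text = "".join(rng.choice("ab") for _ in range(length))
        assert rle_decode(rle_encode(text)) == text, text
    return trials


checked = check_round_trip()
```

The property is one line. Expressing "decode is the inverse of encode" in a type system is possible in some languages, but it is a very different amount of work.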

Meanwhile, for a lot of current statically typed languages static typing ends up coming at the cost of testing in another entirely different way: Build times.

There are absolutely statically typed languages where build times are reasonable but this tends to be well correlated with them having bad type systems. e.g. Go is obsessed with good build times, but Go is also obsessed with having a type system straight out of the 70s which fights against you at every step of the way. Java’s compile times are sorta reasonable but the Java type system is also not particularly powerful. Haskell, Scala or Rust all have interesting and powerful type systems and horrible build times. There are counter-examples - OCaml build times are reportedly pretty good - but by and large the more advanced the type system the longer the build times.

And when this happens it comes with an additional cost: It makes testing much more expensive. I’m no TDD advocate, but even so writing good tests is much easier when the build/test loop is short. At milliseconds it’s bliss, at seconds it’s fine, at tens of seconds it starts to get a bit painful, and if the loop is minutes honestly you’re probably not going to be writing many tests, and if you are they’re probably not going to be very good.
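A toy back-of-the-envelope model shows why the loop time matters so much. Everything here is an assumption for illustration, not a measurement: each cycle is taken to cost a fixed 30 seconds of editing plus the build/test loop, and the loop times are just picked to match the rough bands above.

```python
def iterations_per_hour(loop_seconds, edit_seconds=30.0):
    """Edit/build/test cycles that fit in an hour under a toy model.

    Assumes every cycle costs a fixed edit time plus the build/test loop,
    and ignores the context switching that long waits tend to cause.
    """
    return int(3600 // (edit_seconds + loop_seconds))


# Illustrative loop times only; substitute your own project's numbers.
bands = [
    ("milliseconds", 0.05),
    ("seconds", 3.0),
    ("tens of seconds", 30.0),
    ("minutes", 180.0),
]

for label, loop_seconds in bands:
    print(f"{label:>16}: {iterations_per_hour(loop_seconds)} cycles/hour")
```

Even this model is generous to slow builds, because in practice a multi-minute wait means you go and do something else, and the real cost per cycle is much higher than the build time alone.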

So in order to justify its place in the quality budget, if your static types are substantially increasing build times they need to not just be better than writing tests (which, as discussed, they will often not be), they need to be better than all the tests you’re not going to write because of those increased build times.

To recap:

1. The most common bugs caught by static typing are also the least critical sort of bug.
2. In most contexts, catching a bug with a test is going to be cheaper than catching it with types. This is particularly true for bugs found after the fact.
3. Most existing static type systems also come with a build time cost that makes testing in general more expensive.

This means that by and large when the quality budget is constrained I would expect complicated typing to often hurt quality.

This obviously won’t always be true. For many scenarios the opposite will be true. e.g. I’d expect static typing to win out for correctness if:

1. Bugs (especially crashing bugs) are very expensive, so you have a large correctness budget to play with and have already picked the low hanging fruit from testing.
2. The project is very large. In these scenarios you may benefit a lot more from the sort of universal guarantees that static typing provides vs writing the same sort of tests over and over again, and the build times are probably already high enough that it’s painful to test well anyway.

The point is not that static typing is going to hurt quality in general, but that it’s a set of complicated trade-offs.

I don’t know how to calculate those trade-offs in general. It’s far from straightforward. But the point is that those trade-offs exist and that people who are pretending that static typing will solve the software quality crisis are ignoring them and, as a result, giving advice that will make the world a worse place.

And anecdotally the trade-off does seem to be a fairly tight one: My general experience of the correctness of software written in fancy statically typed languages is not overwhelmingly positive compared to that of software written in dynamic languages. If anything it trends slightly negative. This suggests that for the scale of many projects the costs and benefits are close enough that this actually matters.

But even if that weren’t true, my original point remains: When there’s no budget for quality, tools which catch bugs won’t and can’t help. If static typing genuinely helped improve software quality for most of these projects, the result wouldn’t be that people used static typing and wrote better software as a result, it would be that they’d continue to write broken software and not use static typing as a result.

For the middle ground where we care about software correctness but have a finite budget, there’s the additional problem that the trade-offs change over time: early in the project, when we don’t know if it will succeed, people are less prepared to invest in quality; later in the project we’ve already picked our language, and migrating over to static types is hard. (In theory gradual typing systems can help with this. In practice I’ve yet to be convinced by them, but I’m trying to maintain an open mind. Meanwhile there’s always linters, I guess.)
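For readers who haven't seen gradual typing, here is a minimal sketch of the idea using Python's optional type annotations. The function names are made up, and it assumes a separate checker such as mypy is run over the code, since Python itself does not enforce annotations at runtime.

```python
def total_cents(prices_in_cents):
    # Existing, unannotated code: keeps working exactly as before.
    return sum(prices_in_cents)


def format_total(cents: int) -> str:
    # Newly annotated function. The annotations change nothing at runtime;
    # they only give an external type checker something to verify.
    return f"{cents // 100}.{cents % 100:02d}"


receipt = format_total(total_cents([199, 250, 1]))
print(receipt)
```

A checker would be expected to reject a call like `format_total("450")`, while the unannotated `total_cents` is left alone until someone gets round to annotating it. Whether that incremental path pays for itself in practice is exactly the open question.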

This is also a lot of why I’ve chosen to work on Hypothesis, and why I think property-based testing and similar approaches are probably a better way forward for a lot of us: Rather than having to get things right up front, you can add them to your toolchain and get real benefits from using them without having to first make fundamental changes to how you work.

Because despite the slightly bleak thesis of this post I do think we can write better software. It’s just that, as usual, there is no silver bullet which makes things magically better. Instead we have to make a decision to actually invest in quality, and we have to invest in tools and approaches that will allow us to take incremental steps to get there.

Comments

cian on 2016-10-23 22:10:30:

I think there’s a lot to be said for optional type systems like dialyzer. So you get the benefit of reasonable edit/compile/test loop times, and the option of strong typing if you need it.

Mark Wotton on 2016-10-24 19:21:56:

I agree with the motivations to make correctness feedback fast, but I don’t think the tradeoff you discuss is a real thing. http://www.shimweasel.com/2016/10/24/fast-tests-and-static-languages

david on 2016-10-25 09:12:16:

Thanks for the interesting article! It really is nice to hear about the practical aspects of Haskell development.

But a couple things:

1. You’re far from the worst offender about this, but I’m going to pick on you because I’m actually replying to you unlike most of the others.…

“David is setting up a tradeoff, but the tradeoff is not a real one.”

I’m getting pretty tired of people treating it as if I’d written a long article saying “Static typing sucks because it leads to slow builds”. This is not the trade off I’m setting up.

My fundamental point is that when you are time and resource constrained, different quality promoting activities trade off against each other by virtue of eating up that budget, and that types are not actually a dramatically better (and are often a worse) way of using said budget. Slow builds are only one of the ways that I mention this can happen but are not an essential feature of the argument.

2. I don’t think this statement is at all true and I think this is you assuming that all the other languages are like Haskell: “However, every modern static language worth its salt has an interpreter, same as every dynamic language”

If you’re taking “interpreter” to mean “repl” then it’s close to true but still not really (e.g. Rust. Also a lot of statically typed languages in wide use are not “modern”), but in the actually useful sense of “A fast mode to run your code that can handle reloading” it’s basically not true at all. Many, and probably most, statically typed languages with repls just invoke the compiler and load the bytecode, which is not really substantially faster than actually compiling the code normally. This is especially true of JVM or .NET languages like Scala or F# where there’s hardly an optimizer pass to bypass because they rely on the JIT for that anyway.

My impression is that this is a common pattern in the functional programming world (e.g. you also see it in MLs and Lisps) and very rare outside it.

So even accepting that the problem is a solved one for Haskell (which I’m still not entirely convinced by but am significantly more convinced than I was) this is still a pretty widespread problem.

3. I don’t find the robots.txt example very convincing because the library is tiny (BTW the link is broken - missing https at the beginning). Everything is going to be fast at that size, or if it’s not then that’s slightly horrifying. The slow builds issue is really more of a problem as projects get larger. How does this compare on larger projects?

Mark Wotton on 2016-10-25 20:17:34:

I did say I agreed with most of the argument :)
And I can’t really speak for languages outside Haskell/Lisp/ML world, I suppose - I’d certainly not want to use many of them for unrelated reasons, but not having an interpreter available would be a big problem for me.

Still, within that world, I find I can write code faster & more reliably with types than without: experimentation and prototyping is significantly easier than my previous decade in Ruby, for instance. In my case, types seem to expand my budget rather than deplete it.

Just tested the same technique on yesod-core, which is 6500 lines of code and 2600 lines of tests: on that codebase, it took 0.4s. (I had to comment out the second test suite, though - ghcid doesn’t like it when you have two main entry points).

Usually in that case I’d use hspec’s --rerun functionality to only run the failing tests to keep you in the groove of what’s failing. (Can’t do this here as yesod doesn’t use hspec).

david on 2016-10-25 21:10:56:

> Still, within that world, I find I can write code faster & more reliably with types than without: experimentation and prototyping is significantly easier than my previous decade in Ruby, for instance. In my case, types seem to expand my budget rather than deplete it.

Fair enough.

My experience with static typing is that this manages to be true at the relatively simple level of typing but that types can very rapidly blow your budget as they get too complicated, and the trend amongst statically typed languages is towards the significantly more complicated types.

But in general I almost completely distrust self-reports of productivity (including mine). Everyone tells you that their favourite methodology is the most productive and best, which either means that what works is hopelessly individual specific or that you can’t actually trust self-reporting of productivity metrics (I’m pretty sure it’s the latter, and I’d need to do some digging but I seem to recall I’ve read research to back that up).

> Usually in that case I’d use hspec’s --rerun functionality to only run the failing tests to keep you in the groove of what’s failing. (Can’t do this here as yesod doesn’t use hspec).

Well, in general it sounds like the world of testing within Haskell is a lot better than I thought it was. Thanks for sharing! It genuinely significantly increases my chances of trying Haskell again.

Mark Wotton on 2016-10-25 20:21:07:

Also, I guess the reason I’m focusing on Haskell here is that I think one counterexample is enough to break a proposed necessary tradeoff: even if the tradeoff fits every other static language in the world, having one static language for which it is demonstrably not true is enough.

david on 2016-10-25 20:33:33:

You mean like the example of OCaml which was already mentioned in the article? ;-)