Author Archives: david

An Unkind Question about Software Correctness Research

Dear Lazyweb,

I’m about to be something of a jerk, and I’d like your help in doing it.

I have noticed a common pattern in a lot of software correctness research, which is that the papers often follow this format:

  1. Oh god software, it’s everywhere.
  2. And it doesn’t work very well at all, amirite?
  3. This is only going to get worse.
  4. We’re basically all doomed.
  5. Therefore in order to fix this we propose the following.
  6. We have made an improvement.
  7. To a category of tool that literally nobody uses.
  8. Happy to help.
  9. Love, The Authors.

What are your “favourite” examples of this genre? I’m particularly interested in examples from software testing research but I’ll take anything vaguely correctness related.

The “That nobody actually uses” part of it is also somewhat inessential – I’m mostly interested in good examples of the motivation section, and in many ways having examples from things people actually use is quite helpful because it gives me papers I can point to without feeling too mean about it!

Answers accepted in comments, emails, tweets, whatever.


Edit to add: I do not mean to suggest that such papers are necessarily bad papers. Many of them are interesting and intelligent pieces of research.

I’m also not interested in being mean to individual papers or authors (I originally approved a comment which I have since retracted because I decided it was too far in that direction).

What I want to call into question is our priorities. We have all of these introductions which claim that what we’re trying to do is solve these real world problems, but if we’re not addressing the fundamental question of why nobody is using our stuff then maybe we need to admit that that’s not actually what we’re trying to do.

As a pure mathematician I’m perfectly on board with a program of pure research, but as a software developer I am much more interested in the question of what a research program that actually cared about these questions would look like.

This entry was posted in programming.

Refining L* search

Attention conservation notice: If you don’t care about fine grained algorithmic improvements to inference algorithms for regular languages, you don’t care about this post.

Also, apologies, I know mathjax isn’t currently working so you will see some raw LaTeX. If you turn on unsafe scripts (i.e. “not loaded over https”) it will work, but due to some technical hiccups this is currently difficult for me to fix. I’m on the case.

L* search (Angluin, D. Learning regular sets from queries and counterexamples. Inf. Comput. 75, 2 (1987), 87–106) is a really neat algorithm that lets you infer a deterministic finite automaton (DFA) for an unknown language.

I’ve long been of the suspicion that it would be useful in Hypothesis. Unfortunately to date reality has disagreed with me. The key problem is that L* search is really expensive and any attempt to use it would almost immediately blow Hypothesis’s normal budget for number of tests run.

Starting from a trivial automaton that accepts no strings, it proceeds in two stages:

  1. It uses membership queries for the language to complete the DFA, finding transitions from states that provably do not go to any of the existing known states, and stopping only when it has a consistent transition table for every known state and every member of the alphabet.
  2. It then asks for a counter-example to its conjectured DFA. If there is none, we have correctly inferred the language. If there is one, it is used to improve our ability to distinguish states, enlarging the number of states by at least one.

Because L* search is maintaining a minimal DFA, where the only states that are distinguished are ones that it can actually prove are inequivalent, this guarantees that the algorithm terminates with a correct DFA after at most \(n\) counter-example steps, where \(n\) is the number of states in the minimal DFA of the language.

It does this by maintaining a set of experiments. These are strings that can be used to distinguish states – if by following the experiment from two states you end up in an accepting state in one and a non-accepting state in another, the two states must be inequivalent.
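To make that concrete, here is a rough Python sketch of how experiments distinguish states. The names and representation here are mine, not anything from Hypothesis or from Angluin’s paper: states are represented by their access strings, and member is the membership oracle.

    # States are represented by access strings (here: byte strings), and an
    # experiment is a suffix appended before asking the membership oracle.

    def row(member, state, experiments):
        """A state's membership pattern over the current experiments."""
        return tuple(member(state + e) for e in experiments)

    def provably_inequivalent(member, s, t, experiments):
        """Two states are provably inequivalent if some experiment separates them."""
        return row(member, s, experiments) != row(member, t, experiments)

    # Example: the language of byte strings containing a zero byte.
    def contains_zero(s):
        return 0 in s

    print(provably_inequivalent(contains_zero, b"", b"\x00", [b""]))  # True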

The rest of this post assumes you understand L* search. If you don’t, I recommend reading my prior more detailed explanation of it.

I use an improvement proposed in “Rivest, R. L., and Schapire, R. E. Inference of finite automata using homing sequences. Inf. Comput. 103, 2 (1993), 299–347” that lets you take the cost of the counter-example step down to logarithmic in the length of the counter-example and ensures that each counter-example only adds one experiment to the list.

The problem is that the completion step is still extremely expensive: it requires \(|A||S||E|\) membership queries, where \(A\) is the alphabet, \(S\) is our current set of states, and \(E\) our current set of experiments (you can skip some of these queries because some transitions are provable from the representation of states as strings – \(s\) always transitions to \(sa\) under \(a\) if both strings are in the state set – but you can’t remove more than \(|S| - 1\) evaluations this way).

One of the problems I have with using this in Hypothesis is that \(|A| = 256\), which is both intrinsically fairly large and also actually larger than Hypothesis’s typical testing budget. Thus there’s no way to fit even one L* pass into a normal test run!

So, can we get that \(|A|\) factor down?

It turns out, yes, at the cost of potentially needing more counter-examples (which is a cost I’m entirely willing to pay).

The observation follows from one in “Owens, S., Reppy, J. H., and Turon, A. Regular-expression derivatives re-examined. J. Funct. Program. 19, 2 (2009), 173–190.”, which is that when you have a large alphabet you will typically find that at a given state in a DFA many characters are equivalent to each other (that is, they result in a transition to the same state). For example, when \(|A| \geq |S|\), even though there are \(|A|\) possible valid characters at a given state there are at most \(|S|\) up to equivalence.

If we take advantage of this, we could reduce the number of queries substantially, and it turns out that it is easy to do.

The idea is simple: When we create a state, we initialise a partition refinement data structure alongside it. Initially this is set to consider all members of the alphabet to be in the same equivalence class.

When expanding our table, we now assume that any characters in the same part of the partition are equivalent, and we only actually perform membership queries with, say, the smallest element of each part. Thus the \(|A||S|\) factor becomes \(\sum |P_i|\), where \(|P_i|\) is the number of parts in the partition associated with state \(i\). In particular this is \(O(|S| \min(|A|, |S|))\).
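Here is a minimal sketch of what that completion step looks like with the partitions in place – again, all of the names are mine and this is just an illustration of the idea, not the actual implementation:

    # Each state carries a partition of the alphabet. Characters in the same
    # part are assumed equivalent, so we only issue membership queries for the
    # smallest ("canonical") character of each part.

    ALPHABET = [bytes([b]) for b in range(256)]

    def row(member, state, experiments):
        return tuple(member(state + e) for e in experiments)

    def successor_rows(member, state, partition, experiments):
        """Rows for the successors of `state`: |P_i| * |E| membership queries
        for this state, rather than |A| * |E|."""
        rows = {}
        for part in partition:  # each part is a sorted list of characters
            canonical = part[0]
            rows[canonical] = row(member, state + canonical, experiments)
        return rows

    # Initially every state's partition lumps the whole alphabet together,
    # so the call below issues one membership query instead of 256:
    initial_partition = [list(ALPHABET)]
    successor_rows(lambda s: 0 in s, b"", initial_partition, [b""])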

Then when we find an experiment \(e\) and a transition \(s \to t\) via \(c\), instead of immediately adding \(e\) to our experiment set we check whether we just have too coarse a partition. We do this by checking whether \(sce \in \mathcal{L} \iff sc'e \in \mathcal{L}\), where \(c'\) is the “canonical” element of the part that we selected when determining that this was the correct transition (if \(c = c'\) we can skip this bit). If they disagree then we refine the partition, splitting the part containing \(c\) into two according to whether \(sde \in \mathcal{L}\) for each character \(d\) in it. Either way, we then add the experiment to the list and rebuild the DFA. If this does not result in expanding our list of states (which can only happen if we did end up refining the partition), we then remove the experiment from the list! This can happen because we can end up with parts that our existing experiments were sufficient to distinguish but where we never checked, and by not adding redundant experiments in that case we bound the number of experiments by the number of states.
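A sketch of that splitting step, under the same assumptions and made-up names as the earlier snippets (in the real thing you would only call this when \(sce\) and \(sc'e\) disagree, and only on the part containing \(c\)):

    def split_if_too_coarse(member, state, part, canonical, experiment):
        """Split `part` in two according to whether state + d + experiment is in
        the language, judged against the canonical character's answer. Returns
        the part unchanged if every character agrees."""
        target = member(state + canonical + experiment)
        agree, disagree = [], []
        for d in part:
            (agree if member(state + d + experiment) == target else disagree).append(d)
        return [group for group in (agree, disagree) if group]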

This is correct for the same reasons that L* is correct – at any given stage, either our partitions are correct, or we refine them. However it will take an additional \(\sum (|P_i| - 1)\) counter-example steps to converge (where the \(P_i\) here are the partitions from our final state, i.e. the true partitions).

Counter-examples are somewhat costly – regardless of how much it costs to find one (which may be “free” in some sense), a counter-example \(u\) takes \(O(\log |u|)\) membership queries to process. If we assume all counter-examples are minimized, so that \(|u| \leq n\) where \(n\) is the number of states in the true minimal DFA, then that’s \(O(\log n)\) membership queries per counter-example.

Suppose the true minimal DFA has \(n\) states and that \(m = \sum |P_i|\) in it. Then at each stage we either add one or more states, or we increase the number of parts, thus we perform at most \(n + \sum (|P_i| - 1) = n + m - n = m\) counter-example queries.

Each completion step takes \(O(m |E|)\) to run, and we only keep experiments in the case where we increased the number of states, so \(|E| \leq n\) and each step runs in \(O(m n)\); the entire thing therefore runs in at most \(O((mn + \log n) m)\). Since \(m \geq n \geq \log n\), we can discard the \(\log n\) term and the whole thing runs in \(O(m^2 n)\).

In contrast, in classic L* the steps take \(O(|A| n^2)\) and we run at most \(n\) of them, so the complexity is \(O(|A| n^3)\). So this is a complexity improvement if and only if \(n^2 |A| \geq m^2\) asymptotically, or in other words if \(\frac{m}{n} \leq \sqrt{|A|}\) (technically this is an invalid way to reason about asymptotics, but those were pretty tight estimates).
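To spell the comparison out (this is just a restatement of the estimates above, nothing new):

\[
m^2 n \le |A| n^3 \iff \left(\frac{m}{n}\right)^2 \le |A| \iff \frac{m}{n} \le \sqrt{|A|}.
\]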

\(\frac{m}{n}\) is the average number of distinct equivalent classes over states. For my use case I expect this to be very low – typically the number of equivalence classes will be one!

For the more general use case this isn’t obviously a win – if \(|A|\) is small it almost certainly isn’t. e.g. when \(|A| = 2\) this condition is equivalent to saying that 60% of states don’t depend on the character to determine their transition at all.

I haven’t attempted to reason about it much, but one possible fix would be an adaptive algorithm that tracks the average number of equivalence classes per state found so far, and as soon as it exceeds \(\sqrt{|A|}\) switches over to normal L*. My suspicion is that if you measure cost purely in terms of membership queries, this adaptive version is a substantial saving on almost any language model.

This entry was posted in Automata Theory.

Novelty Requires Explanation

Epistemic status: Reasonably confident, but I should probably try to back this up with numbers about how often elementary results actually do get missed.

Attention conservation notice: More than a little rambling.

Fairly regularly you see news articles about how some long-standing problem that has stumped experts for years has been solved, usually with some nice simple solution.

This might be a proof of some mathematical result, a translation of the Voynich manuscript, a theory of everything. Those are the main ones I see, but I’m sure there are many others that I don’t see.

These are almost always wrong, and I don’t even bother reading them any more.

The reason is this: If something is both novel and interesting, it requires an explanation: Why has nobody thought of this before?

Typically, these crackpot solutions (where they’re not entirely nonsensical) are so elementary that someone would surely have discovered them before now.

Even for non-crackpot ideas, I think this question is worth asking when you discover something new. As well as being a useful validity check for finding errors and problems, if there is a good answer then it can often be enlightening about the problem space.

Potentially, it could also be used as a heuristic in the other direction: If you want to discover something new, look in places where you would have a good answer to this question.

There are a couple of ways this can play out, but most of them boil down to numbers: If a lot of people have been working on a problem for a long time during which they could have discovered your solution, they probably would have. As nice as it would be to believe that we were uniquely clever compared to everyone else, that is rarely the case.

So an explanation basically needs to show some combination of:

  1. Why not many people were working on the problem
  2. Why the time period during which they could have discovered your technique is small

The first is often a bad sign! If not many people work on the problem, it might not be very interesting.

This could also be a case of bad incentives. For example, I’ve discovered a bunch of new things about test case reduction, and I’m pretty sure most of that is because not many people work on test case reduction: It’s a useful tool (and I think the problem is interesting!), but it’s a very niche problem at a weird intersection of practical needs and academic research where neither side has much of a good incentive to work on it.

As a result, I wouldn’t be surprised if an appreciable percentage of the person-hours ever spent on test-case reduction were done by me! Probably not 10%, but maybe somewhere in the region of 1-5%. This makes it not very surprising for me to have discovered new things about it even though the end result is useful.

More often I find that I’m just interested in weird things that nobody else cares about, which can be quite frustrating and can make it difficult to get other people excited about your novel thing. If that’s the case, you’re probably going to have a harder time marketing your novel idea than you did discovering it.

The more interesting category of problem is the second: Why have the people who are already working on this area not previously thought of this?

The easiest way out of this is simply incremental progress: If you’re building on some recent discovery then there just hasn’t been that much time for them to discover it, so you’ve got a reasonable chance of being the first to discover it!

Another way is by using knowledge that they were unlikely to have – for example, by applying techniques from another discipline with little overlap in practice with the one the problem is from. Academia is often surprisingly siloed (but if the problem is big enough and the cross-disciplinary material is elementary enough, this probably isn’t sufficient. It’s not that siloed).

An example of this seems to be Thomas Royen’s  recentish proof of the Gaussian Correlation Inequality (disclaimer: I don’t actually understand this work). He applied some fairly hairy technical results that few people working on the problem were likely to be familiar with, and as a result was able to solve something people had been working on for more than 50 years.

A third category of solution is to argue that everyone else had a good chance of giving up before finding your solution: e.g. if the solution is very complicated or involved, it has a much higher chance of being novel (and also a much higher chance of being wrong, of course)! Another way this can happen is if the approach looks discouraging in some way.

Sometimes all of these combine. For example, I think the core design of Hypothesis is a very simple, elegant, idea, that just doesn’t seem to have been implemented before (I’ve had a few people dismissively tell me they’ve encountered the concept before, but they never could point me to a working implementation).

I think there are a couple reasons for this:

  1. Property-based testing just doesn’t have that many people working on it. The number might top 100, but I’d be surprised if it topped 200 (Other random testing approaches could benefit from this approach, but not nearly as much. Property-based testing implements lots of tiny generators and thus feels many of the problems more acutely).
  2. Depending on how you count, there’s maybe been 20 years during which this design could have been invented.
  3. Simple attempts at this approach work very badly indeed (In a forthcoming paper I have a hilarious experiment in which I show that something only slightly simpler than what we do completely and totally fails to work on the simplest possible benchmark).

So there aren’t that many people working on this, they haven’t had that much time to work on it, and if they’d tried it it probably would have looked extremely discouraging.

In contrast I have spent a surprising amount of time on it (largely because I wanted to and didn’t care about money or academic publishing incentives), and I came at it the long way around, starting from a system I knew worked, so it’s not that surprising that I was able to find it when nobody else had (and it does not require any “I’m so clever” explanations).

In general there is of course no reason that there has to be a good explanation of why something hasn’t been discovered before. There’s no hard cut off line where something goes from “logically must have been discovered” to “it’s completely plausible that you’re the first” (discontinuous functions don’t exist!), it’s just a matter of probabilities. Maybe it’s very likely that somebody hasn’t discovered it before, but maybe you just got lucky. There are enough novel things out there that somebody is going to get lucky on a fairly regular basis, it’s probably just best not to count on it being you.

PS. I think it very unlikely this point is novel, and I probably even explicitly got it from somewhere else and forgot where. Not everything has to be novel to be worthwhile.

This entry was posted in rambling nonsense.

Times to exhaust finite distributions when sampling with replacement

Due to reasons a maths problem has been utterly hijacking my brain since Friday. I think I now have it solved to my satisfaction. I’m morally certain this is nothing new and has been solved a thousand times before, but I couldn’t  figure out the right keywords to Google and each time I posed it clearly enough to ask for help I realised a new avenue to explore and so ended up solving it before I had to.

The problem boils down to this:

Suppose I have some random variable \(X\), taking values \(1, \ldots, n\) with probabilities \(P(X = i) = p_i > 0\).

Update: steve mcc points out that the uniform version of this is normally called the Coupon Collector’s Problem. I have definitely heard that before but could not for the life of me remember it. Given that term it’s easy to find a bunch of prior art on this. I’ve yet to do a review. I still found the specific details interesting and it was fun to work on, but it’s up to you if you want to read this given that…

Now suppose I have infinitely many independent \(X_i\) each with the same distribution as \(X\).

How long do we expect it to take until we’ve seen every value \(1, \ldots, n\)? i.e. if \(T\) is a random variable whose value is the first \(i\) such that for each \(1 \leq k \leq n\) there is some \(j \leq i\) with \(X_j = k\), what is \(E(T)\)?

I don’t have an exact calculation for \(E(T)\) and my experiments on special cases suggest that there is no nice formula. I nevertheless consider the problem solved to my satisfaction by the following three results:

  1. \(E(T) \geq n H(n)\), where \(H(n) = 1 + \frac{1}{2} + \ldots + \frac{1}{n}\) is the nth harmonic number, and this bound is achieved when \(X\) is the uniform distribution.
  2. \(E(T) \leq \sum \frac{1}{p_i}\), and this bound is achievable in the limit (and possibly exactly but I don’t think so) by some distributions I will describe below.
  3. \(E(T) \geq \frac{n^2 - 1}{4 E(X)}\). This bound is not especially tight but is good enough for asymptotics.

In particular, if \(X\) comes from a family of distributions in which \(n\) is allowed to vary and \(E(X)\) remains bounded, \(E(T)\) is asymptotically at least quadratic.
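Here is a quick Monte Carlo sanity check of these bounds – this is just my own throwaway sketch (estimate_T and all the parameter choices are mine), not anything from the analysis below:

    # T is the number of draws needed until every value 0..n-1 has been seen.
    import random

    def estimate_T(probs, trials=2000, seed=0):
        """Monte Carlo estimate of E(T) for the given distribution."""
        rng = random.Random(seed)
        values = list(range(len(probs)))
        total = 0
        for _ in range(trials):
            seen, draws = set(), 0
            while len(seen) < len(probs):
                seen.add(rng.choices(values, weights=probs)[0])
                draws += 1
            total += draws
        return total / trials

    n = 5
    uniform = [1.0 / n] * n
    print(estimate_T(uniform))                        # roughly 11.4
    print(n * sum(1.0 / k for k in range(1, n + 1)))  # n * H(n), about 11.42
    print(sum(1.0 / p for p in uniform))              # sum of 1 / p_i = 25.0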

Proofs:

Suppose \(f(n)\) is a lower bound on \(E(T)\) for distributions on \(n\) values. Suppose we draw \(i\) for our first draw. Then we may reduce exhausting \(\{1, \ldots, n\}\) to exhausting the other values, which takes at least \(f(n - 1)\) draws of values other than \(i\). But each such draw takes in expectation \(\frac{1}{1 - p_i}\) draws of \(X\), as the time to draw a value other than \(i\) is a geometric distribution with parameter \(1 - p_i\).

Therefore \(E(T) \geq 1 + f(n - 1)\sum \frac{p_i}{1 - p_i}\). This sum is minimized when the \(p_i\) all have the same value, which must be \(\frac{1}{n}\). So by substituting in, we have \(E(T) \geq 1 + \frac{n}{n - 1} f(n - 1)\), with equality when \(X\) is uniform. Thus we may take \(f(n) = 1 + \frac{n}{n - 1} f(n - 1)\). Now, \(n H(n) = n (H(n - 1) + \frac{1}{n}) = 1 + n H(n - 1) = 1 + \frac{n}{n - 1} (n - 1) H(n - 1)\), and thus \(n H(n)\) is the solution to this recurrence relationship.

Thus \(E(T) \geq n H(n)\) and this bound is tight when \(X\) is uniform.

To see the upper bound, consider the following process: Let \(S_0 = 0\) and let \(S_k\) be the first \(i > S_{k - 1}\) with \(X_i = k\). Then certainly \(S_n \geq T\), as by that point we must have seen each value, and so \(E(T) \leq E(S_n)\). But \(S_k - S_{k - 1}\) is geometrically distributed with parameter \(p_k\), so \(E(S_k - S_{k - 1}) = \frac{1}{p_k}\). Thus by summing up the terms we have \(E(S_n) = \sum \frac{1}{p_k}\) as desired.

To see that this is tight is a bit of a mess and I’m going to omit the calculations. The family of distributions that demonstrates it are as follows: Let \(p_n = \epsilon^{n - 1}\) and \(p_i = \epsilon^{i - 1}(1 - \epsilon)\) for \(i < n\) (the intuition here is that each \(i\) captures \(1 - \epsilon\) worth of the remaining probability). Then both the bound and \(E(T)\) end up as \(\epsilon^{1 - n} + o(\epsilon^{1 - n})\), so as \(\epsilon \to 0\) their ratio converges to \(1\).

The final bound is the most interesting one, and I think captures a lot of the intuition that \(E(T)\) gets large as the distribution moves away from uniformity. Because the value of \(E(T)\) is invariant under permutations of \(\{1, \ldots, n\}\), we can reorder the values such that \(p_{i + 1} \leq p_i\).

For distributions satisfying this constraint, \(E(X)\) is strictly maximized by the uniform distribution (that is, the maximum value is obtained on the uniform distribution and any other distribution attains a strictly smaller value).

To prove the bound, first observe that if \(p = P(X \geq i)\) then \(E(T) \geq \frac{n - i}{p}\) (in fact \(E(T) \geq \frac{(n - i) H(n)}{p}\), but the calculations become messier with that factor in so I chose to drop the \(H(n)\) term).

The reason is that to finish we must make at least \(n - i\) draws of values strictly greater than \(i\), and each draw exceeds \(i\) with probability at most \(p\), so in expectation we need at least \(\frac{1}{p}\) draws per such value, because at least \(1 - p\) worth of draws end up drawing values \(< i\).

But we can then use Markov’s inequality, \(P(X \geq i) \leq \frac{E(X)}{i}\) (which applies because \(X \geq 0\)). Thus by combining these we have \(E(T) \geq \frac{i (n - i)}{E(X)}\).

\(i\) is arbitrary, so we can choose it to maximize this expression. If \(n\) is even it is maximised by \(i = \frac{n}{2}\); if \(n = 2k + 1\) it is maximized by \(i = k\). Combining these cases we get a maximized value of at least \(\frac{n^2 - 1}{4}\). Plugging these in to our original bound, we get the desired result.

You can get more bounds by considering the same argument with \(E(X^k)\) for arbitrary \(k\). The general bound is actually that \(E(T) \geq \frac{n^{k + 1} (1 + k^{-1})^{-k}}{(k + 1) E(X^k)} + O(n^k)\), but the details are messy so I’ve omitted them here. These bounds are still interesting, as when \(E(X^k)\) is small this indicates that \(X\) is very tightly concentrated around small values of \(i\), which causes discovering all of the values to take a very long time. As we saw with the example for demonstrating that the \(\sum \frac{1}{p_i}\) bound was tight, it can even be exponential in \(n\)!
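For what it’s worth, here is the optimisation step that I believe produces that general bound – my own working, so treat it as a sketch: combine \(E(T) \geq \frac{(n - i) i^k}{E(X^k)}\) (the same Markov-style argument as before) with the choice of \(i\) that maximises \((n - i) i^k\):

\[
\frac{d}{di} (n - i) i^k = k i^{k - 1} (n - i) - i^k = 0 \implies i = \frac{k n}{k + 1},
\]

\[
E(T) \geq \frac{n}{k + 1} \cdot \frac{1}{E(X^k)} \left(\frac{k n}{k + 1}\right)^k = \frac{n^{k + 1}}{E(X^k)} \cdot \frac{k^k}{(k + 1)^{k + 1}} = \frac{n^{k + 1} (1 + k^{-1})^{-k}}{(k + 1) E(X^k)}.
\]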

You can also get similar bounds from exponential moments, e.g. \(E(T) \geq \frac{e^{s(n - 1)}}{E(e^{sX})}\), so if \(s\) is small enough that \(E(e^{sX})\) is bounded as \(n\) varies then we will see this sort of exponential growth (you can get slightly tighter bounds if \(s \geq \frac{1}{n}\) but it’s probably not worth bothering).

If you found this post interesting, may I highly recommend “Birthday paradox, coupon collectors, caching algorithms and self-organizing search”, which I found only after steve mcc put me on to the right search terms. It doesn’t contain these bounds, so is in that sense orthogonal, but it has some really cool general techniques.


So what’s this all about?

I was interested in the question of how well random testing can do at exploring its space. In this model each \(X = i\) represents the random tester finding a distinct example up to equivalence.

Typical experience is that random testing “plateaus” – its rate of discoveries goes down over time – and that seems to be more or less backed by this model: \(E(T)\) is the amount of time it takes to explore its feasible search space, and this grows superlinearly in \(n\).

In the case where you manage to be fully uniform over inequivalent states, this isn’t too bad – a logarithmic slow down is pretty manageable – but if there is any sort of concentration around some common point (which will be \(i = 1\) after reordering), it is likely that finding new examples becomes much more expensive over time as each new example is discovered.

In order to say how applicable this is we’d need to do some sort of study of what random testers actually do, and we’d need some sort of equivalence oracle, which is basically impossible, so for the moment I don’t have any real way of applying this result to the problem that motivated it, but it is at least suggestive.

This entry was posted in Numbers are hard.

Can you write correct software in a statically typed language?

Epistemic status: Mostly trolling. Mostly.

Attention Conservation Notice: Dispatches from the type system wars.


I’ve been writing Rust recently. There have been some frustrating aspects to it (mostly syntax), but one of the things I’m really enjoying is having the productivity benefits of a good type system again.

The fast feedback loop from the type checker really improves my ability to develop software – it’s almost impossible to write code without making lots of trivial errors, and having an automated system that catches those errors and tells me about them in a detailed, localised way really smooths the entire development process. The focus on modularity, and the ability to make certain behaviours actually impossible rather than merely discouraged, also cuts down on a lot of worries that would otherwise slow me down. It’s something I’ve missed a lot over the last fiveish years of mostly writing in dynamic languages.

But I have a concern…

One of the things I’ve gotten used to in my time in dynamic language land is the correctness benefits. How much of that am I giving up for this productivity boost?

I’m not saying you can’t write correct software in statically typed languages. Of course you can. You can write correct software in anything if you’re prepared to try hard enough (except maybe Malbolge).

Of course you can write bad software in dynamic languages. It’s never going to be hard to write bad software.

But it sure seems harder to write correct software in a statically typed language, doesn’t it?

Writing correct software requires that you get a lot of the project workflow right. All other things being equal, static checks help, but they’re mostly dominated by other, far more significant factors: code review, run time assertions, a focus on user experience, processes, check-lists and, obviously, testing – hopefully the idea that you should be testing your code is uncontroversial at this point. Not even the static typing advocates really believe they can write correct code with less than 100% coverage, do they?

The problem is that even though we like to pretend that “If It Compiles It Works!” is just a tasteless joke and that nobody actually believes it, people don’t seem to act like they entirely disbelieve it either, and as a result they come to rely on the compiler for things the compiler never really promised to give them.

I think it’s that the type system makes you sloppy. It sits there telling you “Hey I checked your code. LGTM” and even though you know it’s not a substitute for a real code review, it’s still your most common piece of feedback because it happens on every change you make to your code, and over time you start to believe it.

It also tells you that you can fix it later. One of the great productivity benefits of static typing is how that fast feedback loop helps you change your code. I spent almost an hour doing major surgery on this project yesterday with the code in a completely non-compiling state. It would have been hard doing that in a dynamic language, because the feedback when your project isn’t working at all is just too non-local and coarse grained: if nothing works then of course your tests are all going to fail, and the way they fail isn’t going to be that informative.

So given a choice between shipping and putting in place the measures that are required for correctness, static typing says “Oh, don’t worry, it’s probably good enough, and if it’s not you can fix it later”.

Alas, this is a lie. Code that hasn’t been reviewed, tested, and put in front of a user before shipping may be more likely to be correct in a statically typed language, but that still isn’t likely, and static types are a lovely adjunct to but inadequate substitute for dynamic checks (possibly unless you’re writing in something like Idris, which I do still want to try seriously at some point).

You can do all of these things in a statically typed language, and I’m certain responsible developers do, but they all somehow seem less urgent, don’t they? 95% coverage is probably good enough, and it compiles, so it’s probably fine, and the compiler did a pretty good job of checking the obvious so a detailed code review doesn’t matter that much.

And there’s always that subconscious feeling that if these things do turn out to matter you can fix it later.

Unfortunately, that too is a lie. Statically typed code is easy to refactor, but the thing you’re going to need to change later isn’t the code, it’s the process, and the process is made of people. People are surprisingly resistant to change, as we find out every time Twitter shifts something a pixel to the right and eleven billion people complain that Twitter just don’t understand their user base and how dare they move that thing.

You can fix process later, but the will to do that needs to be there, and there’s always the sense that the status quo is probably good enough, especially with the compiler sitting there telling you its sweet lies about how everything is fine.

In contrast, a dynamic language gives you no such illusions. You’re not just operating without a safety net, you’re operating with a Tullock Spike. That fast feedback that the statically typed language gave you about your code, the dynamically typed language will give you about your process: if you care about correctness (and I acknowledge that many people don’t, even among those who choose to write in dynamically typed languages), you feel the pain points in your process fast, and you fix them early while your project is still flexible.

And yet, I really enjoy the productivity benefits of static typing, and I’m reluctant to give them up.

I think the thing to do here is probably to just try to carry the correctness lessons of dynamic typing over when writing in statically typed languages. In the same way that learning Haskell can improve your productivity when writing in Python, maybe once you’ve learned Python it’s possible to write correct code in Haskell.

I said I was writing in Rust recently, and that’s true, but it’s actually more complicated than that: It’s a hybrid Ruby/Rust project, so I’ll get to enjoy the benefits (and drawbacks) of both.

This might offer another way to have my cake and eat it too, because it means that I can choose which language to write each part of the code in depending on its function and trade offs. Code that I really need to be correct can go in Ruby, but where I need flexibility and freedom to experiment, I’m still able to write it in a more productive language such as Rust.

Postscript

People seem to be failing to get the joke, so I’m going to explain it. Apologies if this ruins it for you.

Should you conclude from reading this post that it is impossible to write correct code in a statically typed language? No. And if you concluded that you should, frankly, feel bad about your reasoning skills (even disregarding the fact that I said right at the top of the post that I was trolling).

I did not present any empirical evidence. I didn’t even present a balanced non-empirical picture of the landscape: I cherry-picked a bunch of effects, added a bit of hyperbole, backed it all up with condescension and anecdata, and from there concluded that because these factors existed they must dominate the outcome.

You know who else does that? Every single person arguing that static types are required for correctness, (and every single person arguing that dynamic types are better for productivity, but honestly my impression having spent quite a lot of time on both sides of the language fence is that it’s mostly the static typing people who are picking the fights). I just thought it would be interesting to flip the argument on its head for once.

So, if you found my arguments unconvincing, good. But you should also have found the arguments for why static typing helps correctness or hurts productivity equally unconvincing.

Does static typing help correctness? Does dynamic typing help productivity?

Probably not, no. But the converse is probably not any more true.

The empirical studies on the question are really pretty inconclusive. Partly that’s because we’re rubbish at doing empirical software engineering, but maybe it’s because there really isn’t that much of a difference to observe, or because other differences dominate.

Either way, if you’re going to condescendingly claim that your side is obviously correct and the other side is awful, maybe you should wait until you have some actual evidence backing you up.

And maybe you should stop pretending that the things that we know do help correctness are somehow less necessary in statically typed languages.

This entry was posted in programming.