Refining L* search

Attention conservation notice: If you don’t care about fine grained algorithmic improvements to inference algorithms for regular languages, you don’t care about this post.

Also, apologies, I know MathJax isn’t currently working so you will see some raw LaTeX. If you turn on unsafe scripts (i.e. “not loaded over https”) it will work, but due to some technical hiccups this is currently difficult for me to fix. I’m on the case.

L* search (Angluin, D. Learning regular sets from queries and counterexamples. Inf. Comput. 75, 2 (1987), 87–106) is a really neat algorithm that lets you infer a deterministic finite automaton (DFA) for an unknown language.

I’ve long suspected that it would be useful in Hypothesis. Unfortunately, to date reality has disagreed with me. The key problem is that L* search is really expensive, and any attempt to use it would almost immediately blow Hypothesis’s normal budget for number of tests run.

Starting from a trivial automaton that accepts no strings, it proceeds in two stages (roughly sketched in code below):

  1. It uses membership queries for the language to complete the DFA, adding a new state whenever a transition provably does not go to any of the existing known states, and stopping only when it has a consistent transition table for every known state and every member of the alphabet.
  2. It then asks for a counter-example to its conjectured DFA. If there is none, we have correctly inferred the language. If there is one, it is used to improve our ability to distinguish states, enlarging the number of states by at least one.
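
To make the structure concrete, here is a rough Python sketch of the basic loop, assuming a membership oracle `member` for the language and an equivalence oracle `find_counterexample` (both names are mine, and this is a naive variant rather than Hypothesis’s code – in particular the counter-example handling shown is the simple one that the Rivest–Schapire improvement below replaces):

```python
# Rough sketch only: `member` is a membership oracle for the unknown language,
# `find_counterexample` returns a misclassified string (or None), and
# `alphabet` is an iterable of single-character strings.

def lstar(member, find_counterexample, alphabet):
    states = [""]        # states are identified by their access strings
    experiments = [""]   # "" distinguishes accepting from non-accepting states

    def row(s):
        # A state's behaviour under every experiment we currently know about.
        return tuple(member(s + e) for e in experiments)

    while True:
        # Stage 1: complete the transition table, adding a new state whenever
        # a transition's row matches no existing state. (This recomputes rows
        # wastefully; a real implementation would cache them.)
        transitions = {}
        changed = True
        while changed:
            changed = False
            rows = {row(s): s for s in states}
            for s in list(states):
                for a in alphabet:
                    target = row(s + a)
                    if target not in rows:
                        states.append(s + a)
                        rows[target] = s + a
                        changed = True
                    transitions[(s, a)] = rows[target]

        def accepts(string):
            state = ""
            for a in string:
                state = transitions[(state, a)]
            return member(state)

        # Stage 2: ask for a counter-example to the conjectured DFA.
        counterexample = find_counterexample(accepts)
        if counterexample is None:
            return states, transitions

        # Naive handling: add every suffix of the counter-example as an
        # experiment. The Rivest–Schapire step described below replaces this.
        for i in range(len(counterexample)):
            if counterexample[i:] not in experiments:
                experiments.append(counterexample[i:])
```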

Because L* search maintains a minimal DFA, in which the only states that are distinguished are ones that it can actually prove inequivalent, this guarantees that the algorithm terminates with a correct DFA after at most \(n\) counter-example steps, where \(n\) is the number of states of the minimal DFA of the language.

It distinguishes states by maintaining a set of experiments. These are strings that can be used to tell states apart – if following the experiment from two states ends in an accepting state from one and a non-accepting state from the other, the two states must be inequivalent.

The rest of this post assumes you understand L* search. If you don’t, I recommend reading my prior more detailed explanation of it.

I use an improvement proposed in “Rivest, R. L., and Schapire, R. E. Inference of finite automata using homing sequences. Inf. Comput. 103, 2 (1993), 299–347” that lets you take the cost of the counter-example step down to logarithmic in the length of the counter-example and ensures that each counter-example only adds one experiment to the list.
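
The heart of that improvement is a binary search over the counter-example for a point where the hypothesis and the membership oracle start to disagree. A minimal sketch, assuming a membership oracle `member` and a helper `access_string(prefix)` that returns the access string of the hypothesis state reached on `prefix` (both names are hypothetical):

```python
# Sketch of the Rivest–Schapire counter-example step; `w` is a counter-example,
# i.e. the membership oracle and the hypothesis DFA disagree on w.

def experiment_from_counterexample(member, access_string, w):
    def classify(i):
        # Replace the first i characters of w by the access string of the
        # hypothesis state they reach, then ask the membership oracle.
        return member(access_string(w[:i]) + w[i:])

    # classify(0) is the oracle's answer on w, while classify(len(w)) is the
    # hypothesis's answer, so the value flips somewhere; binary search for it.
    lo, hi = 0, len(w)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if classify(mid) == classify(0):
            lo = mid
        else:
            hi = mid

    # The suffix just after the flip distinguishes two strings that the
    # hypothesis currently maps to the same state, so it is our new experiment.
    return w[hi:]
```

This uses \(O(\log(|w|))\) membership queries and yields exactly one new experiment, which is the behaviour the cost analysis below relies on.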

The problem is that the completion step is still extremely expensive: It requires \(|A||S||E|\) membership queries, where \(A\) is the alphabet, \(S\) is the set of states we currently have, and \(E\) is the set of experiments we currently have (you can remove some of these because some state transitions are provable from the representation of the states as strings – \(s\) always transitions to \(sa\) under \(a\) if both strings are in the state set – but you can’t remove more than \(|S| – 1\) transitions this way).

One of the problems I have with using this in Hypothesis is that \(|A| = 256\), which is both intrinsically fairly large and also actually larger than Hypothesis’s typical testing budget. Thus there’s no way to fit even one L* pass into a normal test run!

So, can we get that \(|A|\) factor down?

It turns out, yes, at the cost of potentially needing more counter-examples (which is a cost I’m entirely willing to pay).

The observation follows one made in “Owens, S., Reppy, J. H., and Turon, A. Regular-expression derivatives re-examined. J. Funct. Program. 19, 2 (2009), 173–190.”, which is that when you have a large alphabet you will typically find that at a given state in a DFA many characters are equivalent to each other (that is, they result in a transition to the same state). For example, when \(|A| \geq |S|\), even though there are \(|A|\) possible valid characters at a given state there are at most \(|S|\) up to equivalence.

If we take advantage of this, we could reduce the number of queries substantially, and it turns out that it is easy to do.

The idea is simple: When we create a state, we initialise a partition refinement data structure alongside it. Initially this considers all members of the alphabet to be in the same equivalence class.

When expanding our table, we now assume that any characters in the same part of the partition are equivalent, and we only actually perform membership queries with, say, the smallest element of each part. Thus instead of the \(|A||S|\) term we only perform \(\sum_i |P_i|\) queries, where \(P_i\) is the set of parts in the partition associated with state \(i\). In particular this is \(O(|S| \min(|A|, |S|))\).
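
Here is a minimal sketch of such a per-state partition structure (illustrative only – the class name and methods are mine, not Hypothesis’s):

```python
class AlphabetPartition:
    """Partition of the alphabet into sets of provisionally-equivalent characters."""

    def __init__(self, alphabet):
        # Start with every character in a single part.
        self.parts = [sorted(alphabet)]

    def canonical(self, c):
        # The representative we actually run membership queries with.
        for part in self.parts:
            if c in part:
                return part[0]
        raise ValueError(c)

    def refine(self, predicate):
        # Split every part according to the predicate, e.g. whether s + d + e
        # is in the language for each character d in the part.
        new_parts = []
        for part in self.parts:
            yes = [d for d in part if predicate(d)]
            no = [d for d in part if not predicate(d)]
            new_parts.extend(group for group in (yes, no) if group)
        self.parts = new_parts
```

When completing the table for a state \(s\), we then only compute the row of `s + part[0]` for one representative per part and assume every other character in that part behaves identically.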

Then when we find an experiment \(e\) and a transition \(s \to t\) via \(c\), instead of immediately adding \(e\) to our experiment set we check whether we just have too coarse a partition. We do this by checking whether \(sce \in \mathcal{L} \iff sc’e \in \mathcal{L}\), where \(c’\) is the “canonical” element of the partition that we selected when determining that this was the correct transition (if \(c = c’\) we can skip this bit). If this does not hold then we refine the partition, splitting it in two according to whether \(sde \in \mathcal{L}\) for each element \(d\) of the partition. We then add the experiment to the list and rebuild the DFA. If this does not result in expanding our list of states (which can only happen if we did end up refining the partition), we then remove the experiment from the list! This can happen because we can end up with partitions that our existing experiments were already sufficient to distinguish but where we never checked, and by not adding redundant experiments in that case we bound the number of experiments by the number of states.
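
As a rough sketch of the handling just described (reusing the hypothetical `member` and `AlphabetPartition` from the earlier sketches; `rebuild` stands in for re-running the completion step and returning the resulting number of states):

```python
def process_counterexample(s, c, e, partition, state_count, member,
                           experiments, rebuild):
    # s --c--> t is the transition the counter-example went through, and e is
    # the experiment extracted from it (e.g. by the Rivest–Schapire step).
    canonical = partition.canonical(c)
    if c != canonical and member(s + c + e) != member(s + canonical + e):
        # The partition at s was too coarse: c does not behave like its
        # canonical element under e, so split the part containing c.
        partition.refine(lambda d: member(s + d + e))

    experiments.append(e)
    if rebuild() == state_count:
        # The new experiment did not distinguish any new states (this can only
        # happen when we refined a partition above), so drop it again to keep
        # the experiment list no larger than the state list.
        experiments.pop()
```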

This is correct for the same reasons that L* is correct – at any given stage, either our partitions are correct, or we refine them. However it will take an additional \(\sum (|P_i| – 1)\) counter-example steps to converge (where the \(P_i\) here are the partitions from our final state, i.e. the true partitions).

Counter-examples are somewhat costly – regardless of how much it costs to find one (which may be “free” in some sense), a counter-example \(u\) takes \(O(\log(|u|))\) membership queries to process. If we assume all counter-examples are minimized and so \(|u| \leq n\), where \(n\) is the number of states in the true minimal DFA, then that’s \(O(\log(n))\) each.

Suppose the true minimal DFA has \(n\) states and that \(m = \sum |P_i|\) in it. Then at each stage we either add one or more states, or we increase the number of parts in some partition, thus we perform at most \(n + \sum (|P_i| – 1) = n + (m – n) = m\) counter-example steps.

Each completion step takes \(O(m |E|)\) queries to run, and we only keep experiments that increase the number of states, so \(|E| \leq n\) and each step runs in \(O(m n)\). The entire thing therefore runs in at most \(O((mn + \log(n)) m)\). \(mn \geq \log(n)\), so we can discard the \(\log(n)\) term and the whole thing runs in \(O(m^2 n)\).

In contrast, in classic L* each step takes \(O(|A| n^2)\) queries and we run at most \(n\) steps, so the complexity is \(O(|A| n^3)\). So this is a complexity improvement if and only if \(m^2 n \leq |A| n^3\) asymptotically, or in other words if \(\frac{m}{n} \leq \sqrt{|A|}\) (technically this is an invalid way to reason about asymptotics, but those were pretty tight estimates).

\(\frac{m}{n}\) is the average number of distinct equivalence classes per state. For my use case I expect this to be very low – typically the number of equivalence classes will be one!

For the more general use case this isn’t obviously a win – if \(|A|\) is small it almost certainly isn’t. e.g. when \(|A| = 2\) the condition is roughly equivalent to saying that 60% of states don’t depend on the character at all to determine their transition.

I haven’t attempted to reason about it much, but one possible fix would be an adaptive algorithm that tracks this average over the states and partitions found so far, and as soon as it exceeds \(\sqrt{|A|}\) switches over to normal L*. My suspicion is that if you measure cost purely in terms of membership queries, this adaptive version is a substantial saving on almost any language.


Novelty Requires Explanation

Epistemic status: Reasonably confident, but I should probably try to back this up with numbers about how often elementary results actually do get missed.

Attention conservation notice: More than a little rambling.

Fairly regularly you see news articles about how some long-standing problem that has stumped experts for years has been solved, usually with some nice simple solution.

This might be a proof of some mathematical result, a translation of the Voynich manuscript, a theory of everything. Those are the main ones I see, but I’m sure there are many others that I don’t see.

These are almost always wrong, and I don’t even bother reading them any more.

The reason is this: If something is both novel and interesting, it requires an explanation: Why has nobody thought of this before?

Typically, these crackpot solutions (where they’re not entirely nonsensical) are so elementary that someone would surely have discovered them before now.

Even for non-crackpot ideas, I think this question is worth asking when you discover something new. As well as being a useful validity check for finding errors and problems, if there is a good answer then it can often be enlightening about the problem space.

Potentially, it could also be used as a heuristic in the other direction: If you want to discover something new, look in places where you would have a good answer to this question.

There are a couple ways this can play out, but most of them boil down to numbers: If a lot of people have been working on a problem for a long time during which they could have discovered your solution, they probably would have. As nice as it would be to believe that we were uniquely clever compared to everyone else, that is rarely the case.

So an explanation basically needs to show some combination of:

  1. Why not many people were working on the problem
  2. Why the time period during which they could have discovered your technique is small

The first is often a bad sign! If not many people work on the problem, it might not be very interesting.

This could also be a case of bad incentives. For example, I’ve discovered a bunch of new things about test case reduction, and I’m pretty sure most of that is because not many people work on test case reduction: It’s a useful tool (and I think the problem is interesting!), but it’s a very niche problem at a weird intersection of practical needs and academic research where neither side has much of a good incentive to work on it.

As a result, I wouldn’t be surprised if an appreciable percentage of the person-hours ever spent on test-case reduction were done by me! Probably not 10%, but maybe somewhere in the region of 1-5%. This makes it not very surprising for me to have discovered new things about it even though the end result is useful.

More often I find that I’m just interested in weird things that nobody else cares about, which can be quite frustrating and can make it difficult to get other people excited about your novel thing. If that’s the case, you’re probably going to have a harder time marketing your novel idea than discovering it.

The more interesting category of problem is the second: Why have the people who are already working on this area not previously thought of this?

The easiest way out of this is simply incremental progress: If you’re building on some recent discovery then there just hasn’t been that much time for anyone else to discover it, so you’ve got a reasonable chance of being the first!

Another way is by using knowledge that they were unlikely to have – for example, by applying techniques from another discipline with little overlap in practice with the one the problem is from. Academia is often surprisingly siloed (but if the problem is big enough and the cross-disciplinary material is elementary enough, this probably isn’t sufficient. It’s not that siloed).

An example of this seems to be Thomas Royen’s recentish proof of the Gaussian Correlation Inequality (disclaimer: I don’t actually understand this work). He applied some fairly hairy technical results that few people working on the problem were likely to be familiar with, and as a result was able to solve something people had been working on for more than 50 years.

A third category of solution is to argue that everyone else had a good chance of giving up before finding your solution: e.g. if the solution is very complicated or involved, it has a much higher chance of being novel (and also a much higher chance of being wrong, of course)! Another way this can happen is if the approach looks discouraging in some way.

Sometimes all of these combine. For example, I think the core design of Hypothesis is a very simple, elegant idea that just doesn’t seem to have been implemented before (I’ve had a few people dismissively tell me they’ve encountered the concept before, but they could never point me to a working implementation).

I think there are a couple reasons for this:

  1. Property-based testing just doesn’t have that many people working on it. The number might top 100, but I’d be surprised if it topped 200 (Other random testing approaches could benefit from this approach, but not nearly as much. Property-based testing implements lots of tiny generators and thus feels many of the problems more acutely).
  2. Depending on how you count, there’s maybe been 20 years during which this design could have been invented.
  3. Simple attempts at this approach work very badly indeed (In a forthcoming paper I have a hilarious experiment in which I show that something only slightly simpler than what we do completely and totally fails to work on the simplest possible benchmark).

So there aren’t that many people working on this, they haven’t had that much time to work on it, and if they’d tried it it probably would have looked extremely discouraging.

In contrast I have spent a surprising amount of time on it (largely because I wanted to and didn’t care about money or academic publishing incentives), and I came at it the long way around so I was starting from a system I knew worked, so it’s not that surprising that I was able to find it when nobody else had (and does not require any “I’m so clever” explanations).

In general there is of course no reason that there has to be a good explanation of why something hasn’t been discovered before. There’s no hard cut-off line where something goes from “logically must have been discovered” to “it’s completely plausible that you’re the first” (discontinuous functions don’t exist!); it’s just a matter of probabilities. Maybe it’s very unlikely that nobody has discovered it before, and you just got lucky. There are enough novel things out there that somebody is going to get lucky on a fairly regular basis; it’s probably just best not to count on it being you.

PS. I think it very unlikely this point is novel, and I probably even explicitly got it from somewhere else and forgot where. Not everything has to be novel to be worthwhile.


Times to exhaust finite distributions when sampling with replacement

Due to reasons, a maths problem has been utterly hijacking my brain since Friday. I think I now have it solved to my satisfaction. I’m morally certain this is nothing new and has been solved a thousand times before, but I couldn’t figure out the right keywords to Google, and each time I posed it clearly enough to ask for help I realised a new avenue to explore and so ended up solving it before I had to ask.

The problem boils down to this:

Suppose I have some random variable \(X\), taking values \(1, \ldots, n\) with probabilities \(P(X = i) = p_i > 0\).

Update: steve mcc points out that the uniform version of this is normally called the Coupon Collector’s Problem. I have definitely heard that before but could not for the life of me remember it. Given that term it’s easy to find a bunch of prior art on this. I’ve yet to do a review. I still found the specific details interesting and it was fun to work on, but it’s up to you if you want to read this given that…

Now suppose I have infinitely many independent \(X_i\) each with the same distribution as \(X\).

How long do we expect it to take until we’ve seen every value \(1, \ldots, n\)? i.e. If \(T\) is a random variable whose value is the first \(i\) such that \(X_j = k\) for some \(j \leq i\) and each \(1 \leq k \leq n\), what is \(E(T)\)?

I don’t have an exact calculation for \(E(T)\) and my experiments on special cases suggest that there is no nice formula. I nevertheless consider the problem solved to my satisfaction by the following three results:

  1. \(E(T) \geq n H(n)\), where \(H(n) = 1 + \frac{1}{2} + \ldots + \frac{1}{n}\) is the nth harmonic number, and this bound is achieved when \(X\) is the uniform distribution.
  2. \(E(T) \leq \sum \frac{1}{p_i}\), and this bound is achievable in the limit (and possibly exactly but I don’t think so) by some distributions I will describe below.
  3. \(E(T) \geq \frac{n^2 – 1}{4 E(X)}\). This bound is not especially tight but is good enough for asymptotics.

In particular, if \(X\) comes from a family of distributions in which \(n\) is allowed to vary and \(E(X)\) remains bounded, \(E(T)\) is asymptotically at least quadratic.
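
As a sanity check (not part of the original problem, and all the names here are mine), here is a small Monte Carlo sketch that estimates \(E(T)\) for a given distribution and prints it alongside the three bounds:

```python
import random


def sample_T(probabilities):
    # Draw X_1, X_2, ... until every one of the n values has been seen, and
    # return how many draws that took (a single sample of T).
    n = len(probabilities)
    values = list(range(n))
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(random.choices(values, weights=probabilities)[0])
        draws += 1
    return draws


def check_bounds(probabilities, runs=10_000):
    n = len(probabilities)
    estimate = sum(sample_T(probabilities) for _ in range(runs)) / runs
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    # Reorder so that p_{i+1} <= p_i before computing E(X), as in the post.
    ordered = sorted(probabilities, reverse=True)
    mean_x = sum((i + 1) * p for i, p in enumerate(ordered))
    print("estimated E(T):               ", estimate)
    print("lower bound n H(n):           ", n * harmonic)
    print("upper bound sum of 1/p_i:     ", sum(1.0 / p for p in probabilities))
    print("lower bound (n^2 - 1)/(4E(X)):", (n * n - 1) / (4 * mean_x))


# For the uniform distribution on four values the estimate should sit right
# on the n H(n) lower bound (4 * 25/12, i.e. roughly 8.33).
check_bounds([0.25, 0.25, 0.25, 0.25])
```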

Proofs:

Suppose \(f(n)\) is a lower bound on \(E(T)\) that holds for every distribution on \(n\) values. Suppose we draw \(i\) on our first draw. Then exhausting \(\{1, \ldots, n\}\) reduces to exhausting the other values, which takes at least \(f(n – 1)\) draws of values other than \(i\). But each such draw takes in expectation \(\frac{1}{1 – p_i}\) draws of \(X\), as the time to draw a value other than \(i\) is geometrically distributed with parameter \(1 – p_i\).

Therefore \(E(T) \geq 1 + f(n – 1)\sum \frac{p_i}{1 – p_i}\). This sum is minimized when the \(p_i\) all have the same value, which must be \(\frac{1}{n}\). So by substituting in, we have \(E(T) \geq 1 + \frac{n}{n – 1} f(n – 1)\), with equality when \(X\) is uniform. Thus we may take \(f(n) = 1 + \frac{n}{n – 1} f(n – 1)\). \(n H(n) = n (H(n – 1) + \frac{1}{n}) = 1 + n H(n – 1) = 1 + \frac{n}{n – 1} (n – 1) H(n – 1)\), and thus \(n H(n)\) is the solution to this recurrence relation.

Thus \(E(T) \geq n H(n)\) and this bound is tight when \(X\) is uniform.

To see the upper bound, consider the following process: Let \(S_0 = 0\) and let \(S_k\) be the first \(i > S_{k – 1}\) with \(X_i = k\). Then certainly \(S_n \geq T\), as by that point we must have seen each value, and so \(E(T) \leq E(S_n)\). But \(S_k – S_{k – 1}\) is geometrically distributed with parameter \(p_k\), so \(E(S_k – S_{k – 1}) = \frac{1}{p_k}\). Thus by summing up the terms we have \(E(S_n) = \sum \frac{1}{p_k}\) as desired.

To see that this is tight is a bit of a mess and I’m going to omit the calculations. The family of distributions that demonstrates it are as follows: Let \(p_n = \epsilon^{n – 1}\) and \(p_i = \epsilon^{i – 1}(1 – \epsilon)\)  for \(i < n\) (The intuition here is that each \(i\) captures \(1 – \epsilon\) worth of the remaining probability). Then both the bound and \(E(T)\) end up as \(\epsilon^{1 – n} + o(\epsilon^{1 – n} )\), so as \(\epsilon \to 0\) their ratio converges to \(1\).

The final bound is the most interesting one, and I think captures a lot of the intuition that \(E(T)\) is maximized by being “more uniform”. Because the value of \(E(T)\) is invariant under permutations of \(\{1, \ldots, n\}\), we can reorder the values such that \(p_{i + 1} \leq p_i\).

For distributions satisfying this constraint, \(E(X)\) is strictly maximized by the uniform distribution (that is, the maximum value is obtained on the uniform distribution and any other such distribution attains a strictly smaller value).

To prove the bound, first observe that if \(p = P(X \geq i)\) then \(E(T) \geq \frac{n – i}{p}\) (in fact \(E(T) \geq \frac{(n – i) H(n)}{p}\), but the calculations become messier with that factor in so I chose to drop the \(H(n)\) term).

The reason is that to finish we must draw all of the values which are greater than or equal to \(i\), of which there are at least \(n – i\); each draw lands on such a value with probability only \(p\), because \(1 – p\) worth of draws end up drawing values \(< i\), so it takes an expected number of draws of at least \(\frac{n – i}{p}\) to see them all.

But we can then use Markov’s inequality, \(P(X \geq i) \leq \frac{E(X)}{i}\) (which applies because \(X \geq 0\)). Thus by combining these we have \(E(T) \geq \frac{i (n – i)}{E(X)}\).

\(i\) is arbitrary, so we can choose it to maximize this expression. If \(n\) is even it is maximized by \(i = \frac{n}{2}\); if \(n = 2k + 1\) it is maximized by \(i = k\). Combining these cases we get a maximized value of at least \(\frac{n^2 – 1}{4}\). Plugging this in to our original bound, we get the desired result.

You can get more bounds by considering the same argument with \(E(X^k)\) for arbitrary \(k\). The general bound is actually that \(E(T) \geq \frac{n^{k + 1}}{(k + 1)(1 + k^{-1})^k E(X^k)} + O(n^k)\), but the details are messy so I’ve omitted them here. These bounds are still interesting, as when \(E(X^k)\) is small this indicates that \(X\) is very tightly concentrated around small values of \(i\), which causes discovering all of the values to take a very long time. As we saw with the example demonstrating that the \(\sum \frac{1}{p_i}\) bound was tight, it can even be exponential in \(n\)!

You can also get similar bounds by using \(e^{sX}\) in place of \(X^k\), e.g. \(E(T) \geq \frac{e^{s(n – 1)}}{E(e^{sX})}\), so if \(s\) is small enough that \(E(e^{sX})\) is bounded as \(n\) varies then we will see this sort of exponential growth (you can get slightly tighter bounds if \(s \geq \frac{1}{n}\), but it’s probably not worth bothering).

If you found this post interesting, may I highly recommend “Birthday paradox, coupon collectors, caching algorithms and self-organizing search.“, which I found only after steve mcc put me on to the right search terms. It doesn’t contain these bounds, so is in that sense orthogonal, but it has some really cool general techniques.


So what’s this all about?

I was interested in the question of how well random testing can do at exploring its space. In this model each value \(i\) represents a distinct example up to equivalence, and drawing \(X = i\) represents the random tester finding that example.

Typical experience is that random testing “plateaus” – its rate of discoveries goes down over time – and that seems to be more or less backed by this model: \(E(T)\) is the amount of time it takes to explore its feasible search space, and this grows superlinearly in \(n\).

In the case where you manage to be fully uniform over inequivalent states, this isn’t too bad – a logarithmic slow down is pretty manageable – but if there is any sort of concentration around some common point (which will be \(i = 1\) after reordering), it is likely that finding new examples becomes much more expensive over time as each new example is discovered.

In order to say how applicable this is we’d need to do some sort of study of what random testers actually do, and we’d need some sort of equivalence oracle, which is basically impossible, so for the moment I don’t have any real way of applying this result to the problem that motivated it, but it is at least suggestive.


Can you write correct software in a statically typed language?

Epistemic status: Mostly trolling.

Attention Conservation Notice: Dispatches from the type system wars.

I’ve been writing Rust recently. There have been some frustrating aspects to it (mostly syntax), but one of the things I’m really enjoying is having the productivity benefits of a good type system again.

The fast feedback loop from the type checker really improves my ability to develop software – it’s almost impossible to write code without making lots of trivial errors, and having an automated system that catches those errors and tells me about them in a detailed, localised, way, really smooths the entire development process, and the focus on modularity and ability to make certain behaviours actually impossible rather than merely discouraged really cuts down on a lot of worries that would otherwise slow me down. It’s something I’ve missed a lot over the last fiveish years of mostly writing in dynamic languages.

But I have a concern…

One of the things I’ve gotten used to in my time in dynamic language land is the correctness benefits. How much of that am I giving up for this productivity boost?

I’m not saying you can’t write correct software in statically typed languages. Of course you can. You can write correct software in anything if you’re prepared to try hard enough (except maybe Malbolge).

Of course you can write bad software in dynamic languages. It’s never going to be hard to write bad software.

But it sure seems harder to write correct software in a statically typed language, doesn’t it?

Writing correct software requires that you get a lot of the project workflow right. All other things being equal, static checks help, but they’re mostly dominated by other far more significant factors: Code review, run time assertions, a focus on user experience, processes, check-lists and, obviously, testing, but hopefully that you should be testing your code is uncontroversial at this point. Not even the static typing advocates really believe they can write correct code with less than 100% coverage, do they?

The problem is that even though we like to pretend that “If It Compiles It Works!” is just a tasteless joke and that nobody actually believes it, people don’t seem to act like they entirely disbelieve it either, and as a result they come to rely on the compiler for things the compiler never really promised to give them.

I think it’s that the type system makes you sloppy. It sits there telling you “Hey I checked your code. LGTM” and even though you know it’s not a substitute for a real code review, it’s still your most common piece of feedback because it happens on every change you make to your code, and over time you start to believe it.

It also tells you that you can fix it later. One of the great productivity benefits of static typing is how that fast feedback loop helps you change your code. I spent almost an hour doing major surgery on this project yesterday with the code in a completely non-compiling state. It would have been hard doing that in a dynamic language, because the feedback when your project isn’t working at all is just too non-local and coarse grained: If nothing works then of course your tests are all going to fail, and the way they fail isn’t going to be that informative.

So given a choice between shipping and putting in place the measures that are required for correctness, static typing says “Oh, don’t worry, it’s probably good enough, and if it’s not you can fix it later”.

Alas, this is a lie. Code that hasn’t been reviewed, tested, and put in front of a user before shipping may be more likely to be correct in a statically typed language, but that still isn’t likely, and static types are a lovely adjunct to but inadequate substitute for dynamic checks (possibly unless you’re writing in something like Idris, which I do still want to try seriously at some point).

You can do all of these things in a statically typed language, and I’m certain responsible developers do, but they all somehow seem less urgent, don’t they? 95% coverage is probably good enough, and it compiles, so it’s probably fine, and the compiler did a pretty good job of checking the obvious so a detailed code review doesn’t matter that much.

And there’s always that subconscious feeling that if these things do turn out to matter you can fix it later.

Unfortunately, that too is a lie. Statically typed code is easy to refactor, but the thing you’re going to need to change later isn’t the code, it’s the process, and the process is made of people. People are surprisingly resistant to change, as we find out every time Twitter shifts something a pixel to the right and eleven billion people complain that Twitter just don’t understand their user base and how dare they move that thing.

You can fix process later, but the will to do that needs to be there, and there’s always the sense that the status quo is probably good enough, especially with the compiler sitting there telling you its sweet lies about how everything is fine.

In contrast, a dynamic language gives you no such illusions. You’re not just operating without a safety net, you’re operating with a Tullock Spike. That fast feedback that the statically typed language gave you about your code, the dynamically typed language will give you about your process: If you care about correctness (and I acknowledge that many people don’t, even among those who choose to write in dynamically typed languages), you feel the pain points in your process fast, and you fix them early while your project is still flexible.

And yet, I really enjoy the productivity benefits of static typing, and I’m reluctant to give them up.

I think the thing to do here is probably to just try to carry the correctness lessons of dynamic typing over when writing in statically typed languages. In the same way that learning Haskell can improve your productivity when writing in Python, maybe once you’ve learned Python it’s possible to write correct code in Haskell.

I said I was writing in Rust recently, and that’s true, but it’s actually more complicated than that: It’s a hybrid Ruby/Rust project, so I’ll get to enjoy the benefits (and drawbacks) of both.

This might offer another way to have my cake and eat it too, because it means that I can choose which language to write each part of the code in depending on its function and trade offs. Code that I really need to be correct can go in Ruby, but where I need flexibility and freedom to experiment, I’m still able to write it in a more productive language such as Rust.

Postscript

People seem to be failing to get the joke, so I’m going to explain it. Apologies if this ruins it for you.

Should you conclude from reading this post that it is impossible to write correct code in a statically typed language? No. And if you concluded that you should, frankly, feel bad about your reasoning skills (even disregarding the fact that I said right at the top of the post that I was trolling).

I did not present any empirical evidence. I didn’t even present a balanced non-empirical picture of the landscape: I cherry-picked a bunch of effects, added a bit of hyperbole, backed it all up with condescension and anecdata, and from there concluded that because these factors existed they must dominate the outcome.

You know who else does that? Every single person arguing that static types are required for correctness, (and every single person arguing that dynamic types are better for productivity, but honestly my impression having spent quite a lot of time on both sides of the language fence is that it’s mostly the static typing people who are picking the fights). I just thought it would be interesting to flip the argument on its head for once.

So, if you found my arguments unconvincing, good. But you should also have found the arguments for why static typing helps correctness or hurts productivity equally unconvincing.

Does static typing help correctness? Does dynamic typing help productivity?

Probably not, no. But the converse is probably not any more true.

The empirical studies on the question are really pretty inconclusive. Partly that’s because we’re rubbish at doing empirical software engineering, but maybe it’s because there really isn’t that much of a difference to observe, or because other differences dominate.

Either way, if you’re going to condescendingly claim that your side is obviously correct and the other side is awful, maybe you should wait until you have some actual evidence backing you up.

And maybe you should stop pretending that the things that we know do help correctness are somehow less necessary in statically typed languages.


Why does everybody in the galaxy speak English?

Epistemic status: Really silly headcanon. Total retcon. Not to be taken too seriously.

Attention conservation notice: This is probably the nerdiest post I have ever written. It is only of interest if you’re a Stargate nerd like me.

I make no secret of the fact that I love Stargate. I can take or leave Atlantis, and I’m mildly hostile towards Universe, but Stargate SG-1 is one of my favourite series. One of my most read pieces ever (top 10 certainly, maybe top 5) is a piece of Stargate fanfiction about software testing.

But let’s be honest, Stargate makes no sense. It’s ridiculous and it knows it.

Not that that stops me trying to make sense of it.

One of the things I have never ever been able to come up with a convincing explanation for (and as far as I can tell neither has anyone else) is why everyone in every galaxy speaks English. On the long list of things that make no sense about Stargate, this is basically top of the list. To the best of my knowledge nobody has ever even tried to explain it in series (though I think it gets lamp-shaded once or twice).

I think I finally have a headcanon that explains it. It’s not even too much of a stretch.

Here are the things we have to explain:

  1. Almost everybody speaks linguistically modern English.
  2. The people on Abydos had to be taught English.
  3. Almost everyone everywhere else spoke it automatically.
  4. Many non-English words are used and have to be translated, especially early on in the series.
  5. There are many languages that most people don’t speak (Goa’uld, Ancient, Russian).
  6. When people who went through the stargate come back to Earth they do not magically acquire fluency in any other Earth language.
  7. The written language is typically not English.
  8. People far away from stargates still speak English (I’m reading the Stargate Atlantis Legacy series at the moment, which is actually pretty decent, but at some point they posit that the stargate is doing translation. This is manifestly not a sufficient explanation)
  9. When people come to earth, they are still speaking English, even when talking with people who have not been through the stargate.

This rules out all sorts of explanations. Here are some non-explanations:

  1. “English is actually the shared language of the human race”. No. Just no. No I don’t care about your clever theory about how Merlin did it. Setting aside all of my aesthetic objections to this theory, the English they speak is much too linguistically modern given the isolation of earth.
  2. “The stargate is translating” fails to explain both how people communicate when on a ship far away from a stargate, and also how people speak non-english (e.g. Goa’uld) around it.
  3. “Translator microbe plague”, again, fails to explain how people speak non-English languages without being understood, and why people who have been through the stargate don’t come back understanding other Earth languages.
  4. “Daniel taught them all the universal language based on ancient Egyptian that everyone speaks”. Implausible given timeline, doesn’t explain people in Pegasus who are very unlikely to speak the same universal language even in the unlikely event that they did before they became isolated populations. Also doesn’t explain how alien visitors to earth speak English to people who have no reason to know about the stargate. Also language acquisition is hard, especially for people who aren’t used to it.

The core problem is that what we have is non-universal translation. If everyone understood everyone all the time, sure translator microbe plague, whatever, but we have to explain not only why people understand each other most of the time but sometimes they speak a completely different language and aren’t understood.

At this point the sensible thing to do would be to shrug and say “Eh it’s a TV series. It doesn’t have to make logical sense” but no I demand an explanation.

And finally I have one.

In this explanation I posit two things:

  1. The fact that everybody “speaks English” is a translation convention for the sake of TV. Everyone is speaking a common language, but it is often not English and it would be too tedious for viewers to subtitle everything so this just gets rendered as English in some cases where they’re actually speaking some other shared language. So although we must still explain why they share a language, we need not explain why it is English.
  2. The stargate can rewrite the genetics of people who travel through it on the fly. Given the other things we see ancient tech do, and given that it is canon that the stargate does dematerialise and rematerialise you rather than just physically sending you through, I do not feel this is too much of a reach.

Given these two assumptions, we can explain everything.

We know that the ancients can encode memory in genetics. After all, they created the Goa’uld (I acknowledge that this is technically not canon, but it makes so much sense of everything that I just consider it canon now, sorry).

We also know that the Stargate is a horrible bodge of a project subject to massive feature creep and poor quality control at the whims of its committee, so what’s one more feature? (same)

So, at some point, the committee had a problem: It’s really annoying to travel somewhere and not speak the language. To solve this, they had the following bright idea: Why not have the stargate preload everyone who comes through it with a primer pack for the local language? What could go wrong? Stargates are perfectly reliable technology after all, and it sure would be convenient.

This feature was made with the following cultural assumptions:

  1. Everyone who matters already speaks Ancient
  2. Planets are long-term homogeneous stable civilizations, so only speak one language. Linguistic isolation is only possible between different worlds, between which, prior to the invention of the stargate, communication and travel were much slower.

As a result, every Stargate has a language pack associated with it. All (well, almost all) incoming travellers to that gate automatically acquire that language pack on arrival.

These language packs obey the following rules:

  1. They will never teach you Ancient, because you either do or should know it already.
  2. Being loaded with a language you already know is confusing, so if you seem to already know it they won’t give you the pack. Also if you have already been through a stargate (in either direction) that stargate will avoid giving you a new language pack, for the same reason.
  3. If enough people going out through a stargate don’t speak its currently configured language and do speak some other common language, the stargate will build a language pack by modelling them and eventually set its current configuration to that pack.

I believe this explains everything.

The overwhelming majority of stargates in the milky way galaxy are set to the common Jaffa dialect, as they are the primary users of the stargate network, and this system has a rich get richer phenomenon: If people learn Jaffa automatically, it becomes a lingua franca and people will tend to learn it even if they don’t go through the stargate. Thus when they first enter a stargate they already speak Jaffa and do not trigger the check for whether an outgoing traveller does not speak the local language.

This knowledge is genetically encoded, so even if the local population are not stargate users (most aren’t in the Milky Way), the Jaffa language becomes essentially heritable. Therefore almost everyone in the Milky Way is in fact speaking Jaffa, including the people from Earth who come through the stargate. There may be local dialect differences and loan words, but the language people are speaking is overwhelmingly a Jaffa dialect, and regular infusions from outside keep it fairly stable as such even when there are no outgoing travellers to reset the local stargate. Unfortunately Jaffa has no written form (all official Jaffa documents where such exist are written in Goa’uld), so the stargate doesn’t teach anyone that.

In Pegasus people are very regularly travelling between worlds using the stargate network and have been doing so for a very long time. This has resulted in a shared Pegasus trade dialect that serves much the same role as Jaffa. There is a written form, but most significant local civilizations also speak and write at least one local language and most of their documents are written in that.

The earth stargate on the other hand has primarily been used by the USA military and its civilian contractors, and on the first time out none of them speak Jaffa, so it is configured to prime you with English. Non-native English speakers do not get any extra grasp of English from this because they have inevitably been through the outgoing stargate. Thus, anyone who comes to Earth through its stargate automatically speaks English. Even once the stargate program becomes more international, the overwhelming majority of people leaving through it speak English, so no single other language manages to unseat it.

The Abydonians do not speak Jaffa because Ra has kept them a deliberately isolated and subservient population (and Abydos was not configured to Jaffa at the time they got there).

The initial confusion over some words (Chapa’i) is because the concept of a stargate was not yet routine enough to have become embedded in the core English language that the stargate network picked up. Now people are much more familiar with stargates and stargates are much more familiar with English, so the problem doesn’t crop up so much.

One thing this doesn’t explain is why we don’t see people deliberately switching languages more – stargate teams could speak in English when on other planets and not be understood, or in Jaffa when on earth, similarly. I’m going to cop out and say that this happens occasionally but we mostly don’t see it on screen, and where we do it’s just covered up by the translation convention.

I’m sure there are some edge cases in the series that this doesn’t cover. There’s a lot of it, and I don’t remember every part, but nevertheless I think this makes so much more sense than everyone speaking English that I’m happy to provide some elaborate explanation for special cases that this doesn’t cover. Also note that if someone comes to Earth the long way, you can just round trip them through the Alpha Site (which of course is also set to English) and they get the English language pack.

So, there you have it. This is why everyone speaks English. It’s not just bad world building for the sake of story telling convenience, it’s because A Stargate Did It. Now we can all rest easy at night.
