Why does everybody in the galaxy speak English?

Epistemic status: Really silly headcanon. Total retcon. Not to be taken too seriously.

Attention conservation notice: This is probably the nerdiest post I have ever written. It is only of interest if you’re a Stargate nerd like me.

I make no secret of the fact that I love Stargate. I can take or leave Atlantis, and I’m mildly hostile towards Universe, but Stargate SG-1 is one of my favourite series. One of my most read pieces ever (top 10 certainly, maybe top 5) is a piece of Stargate fanfiction about software testing.

But let’s be honest, Stargate makes no sense. It’s ridiculous and it knows it.

Not that that stops me trying to make sense of it.

One of the things I have never ever been able to come up with a convincing explanation for (and as far as I can tell neither has anyone else) is why everyone in every galaxy speaks English. On the long list of things that make no sense about Stargate, this is basically top of the list. To the best of my knowledge nobody has ever even tried to explain it in series (though I think it gets lamp-shaded once or twice).

I think I finally have a headcanon that explains it. It’s not even too much of a stretch.

Here are the things we have to explain:

  1. Almost everybody speaks linguistically modern English.
  2. The people on Abydos had to be taught English.
  3. Almost everyone everywhere else spoke it automatically.
  4. Many non-English words are used and have to be translated, especially early on in the series.
  5. There are many languages that most people don’t speak (Goa’uld, Ancient, Russian).
  6. When people who went through the stargate come back to Earth they do not magically acquire fluency in any other Earth language.
  7. The written language is typically not English
  8. People far away from stargates still speak English (I’m reading the Stargate Atlantis Legacy series at the moment, which is actually pretty decent, but at some point they posit that the stargate is doing translation. This is manifestly not a sufficient explanation)
  9. When people come to earth, they are still speaking English, even when talking with people who have not been through the stargate.

This rules out all sorts of explanations. Here are some non-explanations:

  1. “English is actually the shared language of the human race”. No. Just no. No I don’t care about your clever theory about how Merlin did it. Setting aside all of my aesthetic objections to this theory, the English they speak is much too linguistically modern given the isolation of earth.
  2. “The stargate is translating” fails to explain both how people communicate when on a ship far away from a stargate, and also how people speak non-english (e.g. Goa’uld) around it.
  3. “Translator microbe plague”, again, fails to explain how people speak non-English languages without being understood, and why people who have been through the stargate don’t come back to Earth magically understanding other Earth languages.
  4. “Daniel taught them all the universal language based on ancient Egyptian that everyone speaks”. Implausible given timeline, doesn’t explain people in Pegasus who are very unlikely to speak the same universal language even in the unlikely event that they did before they became isolated populations. Also doesn’t explain how alien visitors to earth speak English to people who have no reason to know about the stargate. Also language acquisition is hard, especially for people who aren’t used to it.

The core problem is that what we have is non-universal translation. If everyone understood everyone all the time then sure, translator microbe plague, whatever, but we have to explain not only why people understand each other most of the time but also why they sometimes speak a completely different language and aren’t understood.

At this point the sensible thing to do would be to shrug and say “Eh it’s a TV series. It doesn’t have to make logical sense” but no I demand an explanation.

And finally I have one.

In this explanation I posit two things:

  1. The fact that everybody “speaks English” is a translation convention for the sake of TV. Everyone is speaking a common language, but it is often not English and it would be too tedious for viewers to subtitle everything so this just gets rendered as English in some cases where they’re actually speaking some other shared language. So although we must still explain why they share a language, we need not explain why it is English.
  2. The stargate can rewrite the genetics of people who travel through it on the fly. Given the other things we see ancient tech do, and given that it is canon that the stargate does dematerialise and rematerialise you rather than just physically sending you through, I do not feel this is too much of a reach.

Given these two assumptions, we can explain everything.

We know that the ancients can encode memory in genetics. After all, they created the Goa’uld (I acknowledge that this is technically not canon, but it makes so much sense of everything that I just consider it canon now, sorry).

We also know that the Stargate is a horrible bodge of a project subject to massive feature creep and poor quality control at the whims of its committee, so what’s one more feature? (same)

So, at some point, the committee had a problem: It’s really annoying to travel somewhere and not speak the language. To solve this, they had the following bright idea: Why not have the stargate preload everyone who comes through it with a primer pack for the local language? What could go wrong? Stargates are perfectly reliable technology after all, and it sure would be convenient.

This feature was made with the following cultural assumptions:

  1. Everyone who matters already speaks Ancient
  2. Planets are long-term homogeneous, stable civilizations, and so speak only one language. Linguistic isolation is only possible between different worlds, between which communication and travel were much slower prior to the invention of the stargate.

As a result, every Stargate has a language pack associated with it. All (well, almost all) incoming travellers to that gate automatically acquire that language pack on arrival.

These language packs obey the following rules:

  1. They will never teach you Ancient, because you either do or should know it already.
  2. Being loaded with a language you already know is confusing, so if you seem to already know it they won’t give you the pack. Also, if you have already been through a given stargate (in either direction), that stargate will avoid giving you a new language pack, for the same reason.
  3. If enough people going out through a stargate don’t speak its currently configured language and do speak some other common language, the stargate will build a language pack by modelling them and will eventually set its current configuration to that pack.

I believe this explains everything.

The overwhelming majority of stargates in the Milky Way galaxy are set to the common Jaffa dialect, as the Jaffa are the primary users of the stargate network, and this system has a rich-get-richer phenomenon: if people learn Jaffa automatically, it becomes a lingua franca and people will tend to learn it even if they don’t go through the stargate. Thus by the time they first enter a stargate they already speak Jaffa, and so never trigger the check for outgoing travellers who don’t speak the locally configured language.

This knowledge is genetically encoded, so even if the local population are not stargate users (most aren’t in the Milky Way), the Jaffa language becomes essentially heritable. Therefore almost everyone in the Milky Way is in fact speaking Jaffa, including the people from Earth who come through the stargate. There may be local dialect differences and loan words, but the language people are speaking is overwhelmingly a Jaffa dialect, and regular infusions from outside keep it fairly stable as such even when there are no outgoing travellers to reset the local stargate. Unfortunately Jaffa has no written form (all official Jaffa documents where such exist are written in Goa’uld), so the stargate doesn’t teach anyone that.

In Pegasus people are very regularly travelling between worlds using the stargate network and have been doing so for a very long time. This has resulted in a shared Pegasus trade dialect that serves much the same role as Jaffa. There is a written form, but most significant local civilizations also speak and write at least one local language and most of their documents are written in that.

The Earth stargate, on the other hand, has primarily been used by the US military and its civilian contractors, none of whom spoke Jaffa on their first trip out, so it is configured to prime you with English. Non-native English speakers do not get any extra grasp of English from this because they have inevitably already been out through the Earth stargate. Thus, anyone who comes to Earth through its stargate automatically speaks English. Even once the stargate program becomes more international, the overwhelming majority of people leaving through it speak English, so no single other language manages to unseat it.

The Abydonians do not speak Jaffa because Ra has kept them a deliberately isolated and subservient population (and Abydos was not configured to Jaffa at the time they got there).

The initial confusion over some words (Chapa’i) is because the concept of a stargate was not yet routine enough to have become embedded in the core English language that the stargate network picked up. Now people are much more familiar with stargates and stargates are much more familiar with English, so the problem doesn’t crop up so much.

One thing this doesn’t explain is why we don’t see people deliberately switching languages more – stargate teams could speak English on other planets and not be understood, or similarly speak Jaffa on Earth. I’m going to cop out and say this does happen occasionally but we mostly don’t see it on screen, and where we do it’s just covered up by the translation convention.

I’m sure there are some edge cases in the series that this doesn’t cover. There’s a lot of it, and I don’t remember every part, but nevertheless I think this makes so much more sense than everyone speaking English that I’m happy to come up with elaborate explanations for the special cases it doesn’t cover. Also note that if someone comes to Earth the long way, you can just round trip them through the Alpha Site (which of course is also set to English) and they get the English language pack.

So, there you have it. This is why everyone speaks English. It’s not just bad world building for the sake of story telling convenience, it’s because A Stargate Did It. Now we can all rest easy at night.

This entry was posted in Stargate on by .

Selection pressures on company behaviour

Epistemic status: Speculative. Almost certainly not an idea that is original to me, but it’s hard to google for this without being swamped in people who don’t actually understand what evolution is.

Evolution is not only real, it’s everywhere. The set of criteria required for an evolutionary selection process is so mild that it’s almost weird to find things where it isn’t a factor (though it may not be the dominant one).

In order to be subject to evolution, roughly all you need is the following:

  1. Heritability – the ability to reproduce and create new things that are in some way “based” on the original and share its traits.
  2. Mutation – a certain degree of random changes to traits as you reproduce.
  3. Selection – reproductive success must be influenced by inherited traits, usually due to some sort of interaction with the environment (e.g. “survival of the fittest”, but also things like traits that make you more able to attract mates)

This absolutely includes evolution in the classical, biological sense, but it is not limited to it. A while back Dawkins proposed memes (in the original sense of “ideas that are subject to selection pressure”, rather than pictures of cats with words on them), there’s a fairly significant evolutionary process in how languages change, and really it’s just about everything: programming languages evolve, natural languages evolve, genres of literature evolve, etc.

Often there is some mix of evolution and design going on (evolution is not design!) but the evolution is still there, the design is just a major part of the inheritance and mutation processes.

Note that evolve doesn’t mean “improve”, it just means “adapt to selection pressures”, and often that adaptation isn’t one we’d particularly think of as a good thing. As Dawkins (again, sorry. In my defence this was all from before his transformation into his current role as chief New Atheist ranter) pointed out in The Selfish Gene, the evolution of our genetics doesn’t select in any way for our well-being, it just does whatever makes the numbers go up (which is why you get things like genes that work great when you have one of them and kill you when you have two. Statistically it’s a great deal!)

One thing I’ve been noticing for a while is that company cultures are absolutely in the camp of things that are subject to evolutionary pressures. A company culture:

  1. Reproduces as people leave it and join or form other companies, or as people outside the company are exposed to ideas about how the company works.
  2. Mutates, as people change their opinions over time through their natural growth and change and through exposure to other companies.
  3. Is selected for, because company culture is a large part of determining company success.

I’ve previously thought of this as a mostly neutral thing with some interesting implications. e.g. it probably tends to mean that over time we evolve companies which are more or less rational economic actors even if they’re not made out of rational economic actors. Not that I necessarily think that’s a good thing, but it’s at least a useful thing for reasoning about company behaviour.

But this morning I realised a rather unfortunate problem: although being a highly profitable company that treats its employees well and makes economically rational decisions is one perfectly viable reproductive strategy in this model (in that case reproduction tends to come more from people trying to emulate you than from people leaving, although there tends to be enough turnover in even great companies that it’s both), there is another extremely adaptive model.

That model is to basically just be incredibly dysfunctional and haemorrhage employees as a result.

This might not be good for the survival of the company, but it’s not the company that we’re considering here, it’s the population of company cultures. If the company goes under and all its employees go off and form five different companies that look exactly the same as their parent company, that’s evolutionarily speaking a great success.

The problem is that people tend to reproduce the cultures they’re used to. Even when they realise that those cultures are bad, they will typically only pick up on a few of the ways in which they were bad, and will tend to just carry the rest of them along without ever really noticing.

As a result, dysfunctional company cultures are fairly actively selected for, and will tend to thrive even as they destroy the companies they live in.

This is perhaps analogous to r/K selection theory – some animals reproduce by having a small number of offspring that they invest a lot of effort in to ensure they thrive (K-selection, roughly analogous to well functioning company cultures that reproduce slowly but tend to promote the survival of the companies that have them) and those that produce many offspring, most of whom will die but enough of whom will survive to adulthood to continue on to another generation (r-selection, roughly analogous to dysfunctional company cultures that destroy their companies but are passed on when the employees leave for other companies). The analogy isn’t perfect because the mechanism of “gene” transmission is so different, but it’s at least suggestive.

Which one is more adaptive varies from niche to niche, and there are plenty of niches, so it’s unlikely that either type of culture or organism is ever going to out-compete the other and replace it, but anecdotally it certainly seems like we have enough niches for the dysfunctional company cultures to thrive in right now.

So that’s the problem. What’s the solution?

Good question. Unfortunately I’m not sure of the answer. Culture is hard.

But I do think that whatever the solution is, it’s going to have to work with the evolutionary dynamic, not against it – evolution is going to happen regardless of whether we want it to or not, but we can try to shape it so that it selects for things we want.

So here are some ideas for how we can try to guide each of the three aspects of that evolution:

  1. To control how dysfunctional cultures reproduce, we can reduce how much hiring we do along friendship and network lines – when a company fails and people leave, they tend to bring their friends at that company along with them. This is a perfectly natural thing to do, but it tends to perpetuate the culture they are used to. (That being said, I don’t think it’s a good idea to actively avoid hiring people from dysfunctional companies – that will just send them to companies that will happily reproduce their culture. Also it punishes individuals unfairly for things that are not their fault. Just… maybe treat their opinions on how things should be done with a grain of salt.)
  2. To control how culture mutates, we can be more actively involved in designing our cultures for the traits we want – culture isn’t just a thing that happens, it’s a thing we create. If we let it emerge purely organically, we will tend to reproduce what has come before. If we design it, and in particular design it to avoid things we know are dysfunctional, the mutation will more actively push us in the directions we want.
  3. To control how dysfunction is selected for, we as individuals can more actively avoid working for dysfunctional companies – investigate the culture before we join, and leave sooner rather than later once we realise there are major problems we can’t fix. Where possible, if we know a company has actively bad culture we should warn people off, though whisper networks and doing this in public have problems that I don’t entirely know how to solve.

I don’t necessarily think those ideas are either necessary or sufficient, but I do think they will probably help. Evolution is hard to manage, and tends to do things we don’t expect, but a certain amount of selective breeding is possible, and I think we need to start thinking about how to do that for the cultures we create.

Of course, when we’re doing that, we need to be careful about going too far in the other direction.

(Thanks to Jess for the conversation that led to me thinking about this stuff again and led to the realisation that sparked this post, and Kelsey for acting as my consulting biologist.)

This entry was posted in Work on by .

An algorithmic puzzle

I posted the following problem on Twitter yesterday: I have intervals \(L_1, \ldots, L_m\) which are represented as half-open intervals \(L_k = [u_k, v_k)\). I know that the intervals are all distinct, and that for any two intervals \(L_i, L_j\) with \(i \neq j\) either one is contained in the other or the two are disjoint.

I want to sample uniformly from all pairs \(\{(i, j) : L_i \cap L_j = \emptyset \}\), or determine that there are no such pairs. How?

A solution now follows. It might not be the optimal solution, but it’s pretty close.

The two observations that make the solution work are:

  1. If every interval is disjoint from at least one other interval, then rejection sampling runs in expected time at most \(O(m)\): conditioned on any choice of \(i\) there is at least one valid \(j\), so the probability of accepting a random pair is at least \(\frac{1}{m}\).
  2. A root interval (one containing everything else) appears in no disjoint pair and thus can be removed from the list without changing the distribution. If there is no root interval then every interval is disjoint from at least one other.

To unpack the second observation: if there is no root interval, take some \(L_i\). Then \(L_i\) is contained in some interval of maximal length (possibly itself). That interval is not a root interval, so there is some \(L_j\) not contained in it; \(L_j\) cannot contain it either (it would have to be strictly longer), so the two must be disjoint, and such an \(L_j\) is therefore also disjoint from \(L_i\).

It’s easy to find a root interval in \(O(m)\), but removing it might introduce a new root interval (in the degenerate case the whole list could be a chain of the form \(L_i = [i, m)\)).

Here’s how to simultaneously remove all root intervals in \(O(m \log(m))\):

First note that a root interval is necessarily of maximum length. Thus if we sort the intervals in order of decreasing length (that’s the \(O(m \log(m))\) bit; you can do it in \(O(m)\) using bucket sort if the interval boundaries are not-too-large integers), the root intervals must form some initial prefix of the list. By renumbering where necessary, let’s assume that the \(L_i\) are sorted in that order.

Now calculate the intervals \(B_i = [\min\limits_{k \geq i} u_k, \max\limits_{k \geq i} v_k)\). These are the bounding intervals of the tail sets and can easily be worked out in \(O(m)\) by just iterating backwards over the list.

\(L_i\) contains all \(L_k\) with \(k \geq i\) if and only if it contains \(B_i\). So we find the first index that is not a root index just by iterating forward through the \(L_i\) until \(B_i\) is not contained in \(L_i\).

We now take all intervals from that point onwards and discard everything earlier. If the resulting set is empty, we raise an error because there are no disjoint pairs. If it’s not empty, we can now run the rejection sampling algorithm on it as desired and have it terminate in expected \(O(m)\) time.
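To make that concrete, here is a rough Python sketch of the whole procedure. The function names and the representation of intervals as (u, v) tuples are just illustrative choices of mine, not part of the original problem:

import random
 
def disjoint(a, b):
    # Half-open intervals [u, v) are disjoint iff one ends before the other starts.
    return a[1] <= b[0] or b[1] <= a[0]
 
def sample_disjoint_pair(rng, intervals):
    # intervals: a list of distinct half-open intervals as (u, v) tuples, any two
    # of which are either nested or disjoint. Returns the indices of a uniformly
    # random disjoint pair, or raises ValueError if there is none.
    m = len(intervals)
    # Sort positions by decreasing length so that any root intervals form a prefix.
    order = sorted(
        range(m), key=lambda k: intervals[k][1] - intervals[k][0], reverse=True
    )
    # Suffix bounding intervals B_i = [min u_k, max v_k) over k >= i.
    bounds = [None] * m
    lo = hi = None
    for pos in reversed(range(m)):
        u, v = intervals[order[pos]]
        lo = u if lo is None else min(lo, u)
        hi = v if hi is None else max(hi, v)
        bounds[pos] = (lo, hi)
    # Discard the leading root intervals: those containing their suffix bound.
    start = 0
    while start < m:
        u, v = intervals[order[start]]
        blo, bhi = bounds[start]
        if not (u <= blo and bhi <= v):
            break
        start += 1
    survivors = order[start:]
    if not survivors:
        raise ValueError("No disjoint pairs")
    # Every survivor is disjoint from at least one other survivor, so this
    # rejection loop terminates in O(len(survivors)) expected iterations.
    while True:
        i, j = rng.choice(survivors), rng.choice(survivors)
        if i != j and disjoint(intervals[i], intervals[j]):
            return i, j
 
# e.g. sample_disjoint_pair(random.Random(0), [(0, 10), (0, 3), (4, 7), (5, 6)])
# returns one of the pairs involving (0, 3), never the root (0, 10).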

I think there’s another solution that may sometimes be better, involving explicitly tracking the tree structure and counting the number of disjoint pairs that live in each sub-interval, but that solution fell apart the more I looked at it and I wasn’t convinced it was ever actually better.

This entry was posted in Uncategorized on by .

Lazy Fisher-Yates shuffling for precise rejection sampling

This is a trick I figured out a while ago. It came up in a problem I was working on, so I thought I’d write it up.

I haven’t seen it anywhere else, but I would be very surprised if it were in any way original to me rather than a reinvention. I think it’s niche and not very useful, so it’s hard to find prior art.

Attention conservation notice: My conclusion at the end is going to be that this is a neat trick that is probably not worth bothering with. You get a slight asymptotic improvement for a worse constant factor. I mention some variants and cases where it might be useful at the end, but it’s definitely more of an interesting intellectual exercise than a practical tool.

Suppose you have the following problem: You want to implement the following function:

def sample(random, n, f):
    """returns a number i in range(n) such that f(i)
    is True. Raises ValueError if no such i exists."""

What is the right way to do this?

The obvious way to do it is this:

def sample(random, n, f):
    choices = [i for i in range(n) if f(i)]
    if not choices:
        raise ValueError("No such i!")
    return random.choice(choices)

We calculate all of the possible outcomes up front and then sample from those.

This works, but it is always \(O(n)\). We can’t hope to do better than that in general – in the case where \(f(i)\) always returns False, we must call it \(n\) times to verify that and raise an error – but it is highly inefficient if, say, f always returns True.

For cases where we know a priori that there is at least one i with f(i) returning True, we can use the following implementation:

def sample(random, n, f):
    while True:
        i = random.randrange(0, n)
        if f(i):
            return i

Here we do rejection sampling: We simulate the distribution directly by picking from the larger uniform distribution and selecting on the conditional property that \(f(i)\) is True.

How well this works depends on a parameter \(A\) – the number of values in \([0, n)\) such that \(f(i)\) is True. The probability of stopping at any loop iteration is \(\frac{A}{n}\) (the probability of the result being True), so the number of loops is a geometric distribution with that parameter, and so the expected number of loop iterations is \(\frac{n}{A}\). i.e. we’ve effectively sped up our search by a factor of \(A\).

So in expectation rejection sampling is always at least as good and usually better than our straightforward loop, but it has three problems:

  1. It will occasionally take more than \(n\) iterations to complete because it may try the same \(i\) more than once.
  2. It loops forever when \(A=0\)
  3. Even when \(A > 0\) its worst-case may take strictly more iterations than the filtering method.

So how can we fix this?

If we fix the first by arranging so that it never calls \(f\) with the same \(i\) more than once, then we also implicitly fix the other two: At some point we’ve called \(f\) with every \(i\) and we can terminate the loop and error, and if each loop iteration returns a fresh \(i\) then we can’t have more than \(n\) iterations.

We can do this fairly naturally by shuffling the order in which we try the indices and then returning the first one that returns True:

def sample(random, n, f):
    indices = list(range(n))
    random.shuffle(indices)
    for i in indices:
       if f(i):
           return i
    raise ValueError("No such i!")

Effectively at each iteration of the loop we are uniformly at random selecting an \(i\) among those we haven’t seen yet.

So now we are making fewer calls to \(f\) – we only call it until we first find an example, as in our rejection sampling. However we’re still paying an \(O(n)\) cost for actually doing the shuffle!

We can fix this partially by inlining the implementation of the shuffle as follows (this is a standard Fisher-Yates shuffle, though we’re doing it backwards):

def sample(random, n, f):
    indices = list(range(n))
    for j in range(n):
        # Lazily perform the j-th step of the Fisher-Yates shuffle.
        k = random.randrange(j, n)
        indices[j], indices[k] = indices[k], indices[j]
        i = indices[j]
        if f(i):
            return i
    raise ValueError("No such i!")

This works because after \(j\) iterations of the loop in a Fisher-Yates shuffle the indices up to and including \(j\) are shuffled. Thus we can effectively fuse our search into the loop – if we know we’re going to terminate here, we can just stop and not bother shuffling the rest.

But we’re still paying \(O(n)\) cost for initialising the list of indices. Can we do better? Yes we can!

The idea is that we can amortise the cost of our reads of indices into our writes of it, and we only do \(O(\frac{n}{A})\) writes. If we haven’t written to a position in the indices list yet, then its value must be equal to its position.

We can do this naturally in python using defaultdict (there are better data structures for this, but a hashmap gets us the desired amortized complexity) as follows:

from collections import defaultdict
 
class key_dict(defaultdict):
    def __missing__(self, key):
        # Positions we have never written to implicitly hold their own index.
        return key
 
def sample(random, n, f):
    indices = key_dict()
    for j in range(n):
        k = random.randrange(j, n)
        indices[j], indices[k] = indices[k], indices[j]
        i = indices[j]
        if f(i):
            return i
    raise ValueError("No such i!")

So we have now completely avoided any \(O(n)\) costs (assuming we’re writing Python 3 and so range is lazy. If you’re still using legacy Python, replace it with an xrange).

What is the actual expected number of loop iterations?

Well if \(A = n\) then we only make one loop iteration, and if \(A = 0\) we always make \(n\). In general we certainly make no more than \(n - A + 1\) iterations, because after \(n - A\) iterations we’ve exhausted all the values for which \(f\) is False.

I, err, confess I’ve got a bit lost in the maths for what the expected number of iterations of this loop is. If \(L\) is a random variable that is the position of the loop iteration on which this terminates with a successful result (with \(L = n + 1\) if there are no valid values) then a counting argument gives us that \(P(L > k) = \frac{(n - A)! (n - k)!}{n! (n - A - k)!}\), and we can calculate \(E(L) = \sum\limits_{k=0}^{n - A} P(L > k)\), but the sum is fiddly and my ability to do such sums is rusty. Based on plugging some special cases into Wolfram Alpha, I think the expected number of loop iterations is something like \(\frac{n+1}{A + 1}\), which at least gives the right numbers for the boundary cases. If that is the right answer then I’m sure there’s some elegant combinatorial argument that shows it, but I’m not currently seeing it. Assuming this is right, this is asymptotically no better than the original rejection sampling.
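If you would rather check that conjecture empirically than trust the algebra, here is a small Monte Carlo sketch. The helper names, and the choice of treating the first \(A\) values as the ones satisfying \(f\), are just mine for illustration (which values satisfy \(f\) doesn’t affect the distribution):

import random
from statistics import mean
 
def loop_iterations(rng, n, A):
    # One run of the shuffled sampler, counting loop iterations, where exactly
    # the first A of the n values satisfy f. Returns n if A == 0.
    indices = list(range(n))
    for j in range(n):
        k = rng.randrange(j, n)
        indices[j], indices[k] = indices[k], indices[j]
        if indices[j] < A:
            return j + 1
    return n
 
def estimated_expected_iterations(n, A, trials=100000, seed=0):
    rng = random.Random(seed)
    return mean(loop_iterations(rng, n, A) for _ in range(trials))
 
# e.g. compare estimated_expected_iterations(20, 4) against (20 + 1) / (4 + 1) = 4.2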

Regardless, the expected number of loop iterations is definitely \(\leq \frac{n}{A}\), because we can simulate this algorithm by running our rejection sampling and counting an iteration only when we see a previously unseen index, so it is strictly dominated by the expected number of iterations of the pure rejection sampling. So we do achieve our promised result of an algorithm that strictly improves on both rejection sampling and filter-then-sample – its expected number of iterations is at most that of the former (and it terminates even when \(A = 0\)), and it has the same worst-case complexity as the latter.

Is this method actually worth using? Ehhh… maybe? It’s conceptually pleasing to me, and it’s nice to have a method that complexity-wise strictly out-performs either of the two natural choices, but in practice the constant overhead of using the hash map almost certainly is greater than any benefit you get from it.

The real problem that limits this being actually useful is that either \(n\) is small, in which case who cares, or \(n\) is large, in which case the chances of drawing a duplicate are sufficiently low that the overhead is negligible.

There are a couple of ways this idea can still be useful in the right niche though:

  • This does reduce the constant factor in the number of calls to \(f\) (especially if \(A\) is small), so if \(f\) is expensive then the constant factor improvement in number of calls may be enough to justify this.
  • If you already have the values you want to sample from in an array, and the order of the array is arbitrary, then you can use the lazy Fisher-Yates shuffle trick directly without the masking hash map.
  • If you’re genuinely unsure about whether \(A > 0\) and do want to be able to check, this method allows you to do that (but you could also just run rejection sampling \(n\) times and then fall back to filter and sample and it would probably be slightly faster to do so).

If any of those apply, this might be a worthwhile trick. If they don’t, hopefully you enjoyed reading about it anyway. Sorry.

This entry was posted in Python on by .

A pathological example for test-case reduction

This is an example I cut from a paper I am currently writing about test-case reduction because I needed to save space and nobody except me actually cares about the algorithmic complexity of test-case reduction.

Suppose you have the following test case:

from hypothesis import given, strategies as st
 
K = 64
 
INTS = st.integers(0, 2 ** K - 1)
 
@given(INTS, INTS)
def test_are_not_too_far_apart(m, n):
    assert abs(m - n) > 1

Then if this test ever fails, with initial values \(m\) and \(n\), and if reduction can lower one of the ints at a time but can’t change both at once, the reduction will take at least \(\frac{m + n}{2}\) test executions: to remain a failing example you must keep \(|m - n| \leq 1\), so any single change can reduce \(m + n\) by at most \(2\). The fastest possible path is therefore to at each step replace the larger of \(m\) and \(n\) with one less than the smaller, and the whole reduction takes roughly \(\frac{m + n}{2}\) steps.
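To see the counting argument play out, here is a toy model of such a reducer. This is emphatically not Hypothesis’s actual shrinker, just an illustration: it makes the single greedy move described above and counts the successful shrink steps, which is a lower bound on the number of test executions a real reducer would need under the single-value restriction:

def toy_reduce(m, n):
    # Greedily shrink one value at a time while keeping the test failing,
    # i.e. while keeping abs(m - n) <= 1, and count the successful shrink
    # steps needed to reach the minimal failing example (0, 0).
    steps = 0
    while (m, n) != (0, 0):
        if m >= n:
            m = max(n - 1, 0)  # drop the larger value to just below the smaller
        else:
            n = max(m - 1, 0)
        steps += 1
    return steps
 
# toy_reduce(10, 10) == 11, roughly (10 + 10) / 2; with values anywhere near
# 2 ** 63 the step count is astronomical, which is the point.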

(Side note: You can get lucky with Hypothesis and trigger some special cases that make this test faster. if you happen to have \(m = n\) then it can reduce the two together, but if you start with \(m = n \pm 1\) then it will currently never trigger because it will not ever have duplicates at the entry to that step. Hypothesis will also actually find this bug immediately because it will try it with both examples set to zero. Trivial modifications to the test can be made to avoid these problems, but I’m going to ignore them here).

The interesting thing about this from a Hypothesis point of view is that \(m + n\) is potentially exponential in \(K\), and the data size is linear in \(K\), so Hypothesis’s test case reduction is of exponential complexity (which doesn’t really cause a performance problem in practice because the number of successful reductions gets capped, but does cause an example quality problem because you then don’t run the full reduction). But this isn’t specifically a Hypothesis problem – I’m morally certain every current property-based testing library’s test case reduction is exponential in this case (except for ones that haven’t implemented reduction at all), possibly with one or two patches to avoid trivial special cases like always trying zero first.

Another way to get around this is to almost never trigger this test case with large values! Typically property-based testing libraries will only generate an example like this with very small values. But it’s easy for that not to be the case – mutational property-based testing libraries like Hypothesis or Crowbar can in theory fairly easily find this example for large \(m\) and \(n\) (Hypothesis currently doesn’t. I haven’t tried with Crowbar). Another way you could easily trigger it is with distributions that special-case large values.

One thing I want to emphasise is that regardless of the specific nature of the example and our workarounds for it, this sort of problem is inherent. It’s easy to make patches that avoid this particular example (especially in Hypothesis which has no problem making simultaneous changes to \(m\) and \(n\)).

But if you fix this one, I can just construct another that is tailored to break whatever heuristic it was that you came up with. Test-case reduction is a local search method in an exponentially large space, and this sort of problem is just what happens when you block off all of the large changes your local search method tries to make but still allow some of the small ones.

You can basically keep dangling the carrot in front of the test-case reducer going “Maybe after this reduction you’ll be done”, and you can keep doing that indefinitely because of the size of the space. Pathological examples like this are not weird special cases, if anything the weird thing is that most examples are not pathological like this.

My suspicion is that we don’t see this problem cropping up much in practice for a couple of reasons:

  1. Existing property-based testing libraries are very biased towards finding small examples in the first place, so even when we hit cases with pathological complexity, \(n\) is so small that it doesn’t actually matter.
  2. This sort of boundary case relies on what are essentially “weird coincidences” in the test case. They happen when small local changes unlock other small local changes that were previously locked. This requires subtle dependency between different parts of the test case, and where that subtle dependency exists we are almost never finding it. Thus I suspect the fact that we are not hitting exponential slow downs in our test case reduction on a regular basis may actually be a sign that there are entire classes of bug that we are just never finding because the probability of hitting the required dependency combination is too low.
  3. It may also be that bugs just typically do not tend to have that kind of sensitive dependencies. My suspicion is that this is not true given the prevalence of off-by-one errors.
  4. It may also be that people are hitting this sort of problem in practice and aren’t telling us because they don’t care that much about the performance of test case reduction or example quality.

This entry was posted in Hypothesis, Python on by .