A hybrid voting system for scheduling

I was thinking recently about the problem of scheduling a group outing (e.g. for a meal). Using something like Doodle you can all vote on a day, and the day with the most votes wins. It’s basically approval voting or, if you allow the “if needs be” option, range voting.

But this runs into a problem I’ve pointed out with voting on group meals before: If you do it often, you consistently make the same sort of choices over and over again, excluding the same people each time. If the group is mostly fine with Wednesdays or Thursdays but has a slight Thursday preference, and I can never do Thursdays, then I’m out of luck.

As in the previous lunch post, random ballot solves this: It gives you proportional representation of interests over time.

But while this worked reasonably well for lunch places, it doesn’t work so well for scheduling: Typical scheduling problems have any given person being genuinely indifferent between four or five options. Why force them to express a strict preference for one of those options, especially when it may be that some of those days work much better for some people than others?

So for scheduling it seems like approval voting is in fact very well suited. But is there a system that combines the benefits of both?

There is! It’s in fact an extremely direct hybrid of the two, but weirdly not one I’ve seen before.

  1. Everyone casts a vote as in approval voting. That is, they specify a list of acceptable options.
  2. When tallying, you pick a random voter and restrict the set of possible winners to the ones on their ballot.
  3. Amongst that restricted set, votes are tallied as per normal approval voting, picking the one that the largest number of people approve of.

And that’s all there is to it. It satisfies the proportionality criterion that any voting bloc will on average get an acceptable answer at least in proportion to the size of the bloc (they may get it much more than that, but that’s intrinsic to the problem: e.g. the set of people who are happy with every answer will get an acceptable answer 100% of the time), while still being broadly majoritarian because of the approval voting component.
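To make the tallying step concrete, here’s a minimal sketch in Python. The function name and the ballots-as-sets representation are just illustrative choices of mine, not part of any existing tool:

import random
from collections import Counter

def hybrid_tally(ballots):
    # ballots: one set of acceptable options per voter (approval-style).
    # Step 2: pick a random voter; only options on their ballot can win.
    restriction = random.choice(ballots)
    # Step 3: among those, pick the option with the most approvals overall.
    approvals = Counter(option for ballot in ballots for option in ballot)
    return max(restriction, key=lambda option: approvals[option])

Ties, and the degenerate case where the chosen voter approves of nothing, would need handling in practice, but that’s the whole mechanism.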

You can also add a threshold if you like. Because each person votes for multiple options, doing so is much simpler than in normal random ballot, and much more reasonable, so you could quite easily decide to e.g. exclude from the schedule any options that fewer than half of the people can go to (or use an even more complicated criterion, e.g. that an option’s approval count be at least some fixed fraction of the approval winner’s). The result is that you only end up excluding people who really can’t make any day that most other people in the group can. Which is fair enough really, though a bit unfortunate.
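Here’s a sketch of that threshold variant, building on the hybrid_tally sketch above (again, the names and the choice of threshold are purely illustrative):

def hybrid_tally_with_threshold(ballots, threshold=0.5):
    # Drop options that fewer than `threshold` of the voters can make,
    # then drop voters with no remaining acceptable options, and run
    # the hybrid tally on whatever is left.
    approvals = Counter(option for ballot in ballots for option in ballot)
    viable = {o for o, count in approvals.items() if count >= threshold * len(ballots)}
    trimmed = [ballot & viable for ballot in ballots]
    return hybrid_tally([ballot for ballot in trimmed if ballot])

If nothing clears the threshold this falls over, so in practice you would want to fall back to the unthresholded version in that case.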

Is this applicable to non-scheduling problems? I don’t know. It might be! I think it adds a strong centrist bias to random ballot, which can be both good and bad.

The properties of scheduling which matter here are I think:

  • Every voter typically has many acceptable options
  • It’s something done often enough that the randomness tends to balance out
  • It’s something where, despite that, consensus is probably more important than true proportionality.

It also has the nice property that experimenting with voting systems for scheduling is rather low impact. I’d want to do a more formal analysis of this before proposing it for anything more serious, but I suspect it might still have some nice properties that make it worth considering.

This entry was posted in voting.

Programmer at Large: Does that work?

This is the latest chapter in my web serial, Programmer at Large. The first chapter is here and you can read the whole archives here or on the Archive of Our Own mirror. This chapter is also mirrored at Archive of Our Own.

Sam and I worked in companionable silence for about five kiloseconds, but eventually they had to go lead a Krav Maga session, so we kissed each other goodbye. Kimiko was still flagged as busy, so I took the opportunity to retreat to a pod to do some real work.

Uh, not that social network analysis isn’t real work you understand, it’s just not exactly in my remit. It’s a useful skill to keep your hand in on, but I try to avoid the trap of becoming a generalist.

I reviewed where I was on my current task: I still didn’t know much about what was going wrong, but had a hint that it was something to do with temperature events.

At this point I could do an exhaustive analysis and try to binary search out the exact problem. It would take ages and require a lot of detail work, but it would almost certainly work in narrowing down at least one real problem.

But I wasn’t really in the right frame of mind for detail work, so I decided to gamble instead.

“Ide, show me something interesting to do with the current task.”

“I have a temperature control program marked as critical that exhibits anomalous command output prior to the event and currently has a failing build. Is that suitable?”


Almost too perfect in fact. I wondered why that hadn’t flagged up before.

“How many other equally interesting things could you have shown me?”



“OK, call up the specs for this program.”

Subject: Stochastic Temperature Control Feedback Regulation Unit 3
Origin: New Earth 2
Language: Go#
Importance: Critical
Reliability: High
Obsolescence: High
Fragility: High
Notes: It might be best to leave now, you probably shouldn't touch this.

That wasn’t encouraging.  Also, I wasn’t thrilled by the idea of learning about another weird Grounder programming language.

I sighed. Still, I wasn’t just going to stop without looking into it a bit.

“Wiki, show me the specs for Go#”

Subject: Go#
Category: Programming language, text based.
Lineage: Pre-diaspora, began as a dialect of Go in 2021.
Common Tags: Archaic, Esoteric, Moderate Complexity, Evolutionary Dead End, Poorly Thought Out.
Normalised Rating: Please tell me you're not still using this.

Definitely less than encouraging.

“OK, show me the failing build step.”

Ide displayed a bunch of code for me. I can’t say I understood any of it, but one thing stood out.

“Wiki, what’s Hypothesis?”

“Hypothesis is a generic term for a family of testing tools that were popular for a period of approximately five gigaseconds before and around the diaspora. They work by generating random data to run a conventional unit test against.”

“Wow, really? Does that work?”

“Significantly better than the methods that predate it. Unassisted humans tend to be very bad at writing correct software, which results in many shallow bugs that simple random testing can uncover. However, it has largely been supplanted by modern symbolic execution and formal methods, as the number of bugs it finds grows logarithmically.”

“Ide, how long did it take to find this particular bug?”

“Approximately nine gigaseconds of compute time.”

Wow. This code had run for most of a crew lifespan before eventually finding a bug. That was rather adorable. I vaguely saluted whatever grounder was responsible for this thing, and reflected on how grateful I was to not be them and to have access to modern tooling.

“How long would it have taken given appropriate formal methods?”

“Difficult to estimate due to low availability of formal models for this language. However, based on the execution trace this is a known bug in OpenSSH, where the bug was found within the first four seconds of active testing of it under a more modern test suite written as a student exercise in a class on software archaeology on the Star Struck three gigaseconds ago, coordinated time.”

That was about what I expected.

“Wiki, what’s OpenSSH?”

“It is a secure network communication protocol, originally designed to provide remote access to a system via a local PTY.”

“What’s a PTY?”

“Warning: This information has been tagged as a memetic hazard, subcategory can of worms. Do you wish to proceed?”

I blinked. That was unexpected. I was almost curious enough to proceed anyway, but these warnings were usually worth taking seriously and they didn’t normally get attached to interesting awful information. Besides, this really wasn’t that relevant.

“No, that’s fine.”

I thought for a bit. I was pretty sure why this known bug was still present, but decided to check anyway.

“Ide, why has this bug not been fixed despite being known?”

“Due to the rating of this process as high in all of criticality, stability and fragility, it was flagged as an ultra-low priority fix.”

That’s what I thought. It works, but trying to fix it is probably more likely to break it, and then the plumbing backs up. Not unlike the problem I’d run into with the ramscoop, but the difference was that this one was in my remit.

“Is this bug being triggered in the wild?”

“Unknown as to whether the particular sequence of events Hypothesis has found is present in the wild, but logs indicate that the underlying OpenSSH bug is triggered.”

“Is it being triggered in the vicinity of the anomalous event?”


OK. So this was definitely a plausible culprit.

“Can we run a simulation of what would happen if this bug was fixed?”

“Warning: At current resource availability, such a simulation would take 0.8 gigaseconds to complete.”

“Ugh. Show me the cost curve.”

I looked at the cost curve. Right. All those game theory simulations the programmers at arms were running were taking up most of our spare capacity, and I didn’t have budget to outbid them on anything except the very tiny subset of capacity I had priority on.

Which wasn’t wholly unfair. But I now had evidence that a critical system was misbehaving and might be triggering an anomalous plumbing event, which was serious. Granted it was less important than the fate of humanity, but it might be higher priority. Time to try and free up some budget for simulation.

I sighed, and started putting together a report.

Next chapter when it happens. Nominally in two weeks’ time, but the schedule has become very erratic.

Like this? Why not support me on Patreon! It’s good for the soul, and will result in you seeing chapters as they’re written (which is currently not very much earlier than they’re published, but hopefully I’ll get back on track soon).

This entry was posted in Fiction, Programmer at Large.

The other half of binary search

Binary search is one of those classic algorithms that most people who know about algorithms at all will know how to do (and many will even be able to implement correctly! Probably fewer than think they can though – it took me a long time to go from thinking I could implement binary search correctly to actually being able to implement it correctly).

Some of this is because the way people think about binary search is somewhat flawed. It’s often treated as being about sorted arrays of data, when that’s really only one application of it. So let’s start with a review of what the right way to think about binary search is.

We have two integers \(a\) and \(b\) (probably non-negative, but it doesn’t matter), with \(a < b\). We also have some function \(f\) that takes integers \(a \leq i \leq b\), with \(f(a) \neq f(b)\). We want to find \(c\) with \(a \leq c < b\) such that \(f(c) \neq f(c + 1)\).

i.e. we’re looking to find a single exact point at which a function changes value. For functions that are monotonic (that is, either non-increasing or non-decreasing) and take only two values, this point will be unique, but in general it may not be.

To recover the normal idea of binary search, suppose we have some sorted array \(xs\) of length \(n\). We want to find the smallest insertion point for some value \(v\).

To do this, we can use the function \(f(i)\) that returns whether \(xs[i] < v\). Either this function is constantly true (in which case every element is \(< v\) and \(v\) should be inserted at the end), constantly false (in which case \(v\) should be inserted at the beginning), or the index \(i\) with \(f(i) \neq f(i + 1)\) is the point after which \(v\) should be inserted.

This also helps clarify the logic for writing a binary search:

def binary_search(f, lo, hi):
    # Invariant: f(hi) != f(lo)
    while lo + 1 < hi:
        assert f(lo) != f(hi)
        mid = (lo + hi) // 2
        if f(mid) == f(lo):
            lo = mid
        else:
            hi = mid
    return lo

Every iteration we cut the interval in half - because we know the gap between lo and hi is at least two, the midpoint lies strictly between them, so this must reduce the length of the interval. If \(f\) gives the same value to the midpoint as to lo, it must be our new lower bound; if not, it must be our new upper bound (note that generically in this case we might not have \(f(mid) = f(hi)\), though in the typical case where \(f\) only takes two values we will).

Anyway, all of this is beside the point of this post, it's just scene setting.

Because the point of this post is this: Is this actually optimal?

Generically, yes it is. If we consider the functions \(f_k(i) = i < k\), each value we examine can only cut out half of these functions, so we must ask at least \(\log_2(b - a)\) questions, which makes binary search optimal. But that's the generic case. In a lot of typical cases we have something else going for us: often we expect changes to be quite frequent, or at least we expect the change point to be very close to the lower bound. For example, suppose we were binary searching for a small value in a sorted list. Chances are good it's going to be a lot closer to the left-hand side than the right, but we're going to do a full \(\log_2(n)\) calls every single time. We can solve this by starting the binary search with an exponential probe - we try small values, growing the gap by a factor of two each time, until we find one that gives a different value. This then gives us a (hopefully smaller) upper bound, and a lower bound somewhat closer to that.

def exponential_probe(f, lo, hi):
    gap = 1
    while lo + gap < hi:
        if f(lo + gap) == f(lo):
            lo += gap
            gap *= 2
        else:
            return lo, lo + gap
    return lo, hi

We can then put these together to give a better search algorithm, by using the bounds returned by the exponential probe as the starting bounds for our binary search:

def find_change_point(f, lo, hi):
    assert f(lo) != f(hi)
    return binary_search(f, *exponential_probe(f, lo, hi))

When the value found is near or after the middle, this will end up being more expensive by a factor of about two - we have to do an extra \(\log_2(n)\) calls to probe up to the midpoint - but when the heuristic wins it potentially wins big - it will often take the \(\log_2(n)\) factor (which although not huge can easily be in the 10-20 range for reasonably sized gaps) and turn it into 1 or 2. Complexity-wise, this will run in \(O(\log(k - lo))\), where \(k\) is the value returned, rather than the original \(O(\log(hi - lo))\).
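To tie this back to the sorted-array case from earlier, here's a usage sketch; insertion_point is just an illustrative helper of mine, not something defined above:

def insertion_point(xs, v):
    # Smallest index at which v can be inserted into the sorted list xs
    # while keeping it sorted.
    if not xs or not (xs[0] < v):
        return 0
    if xs[-1] < v:
        return len(xs)
    # Now f(0) is True and f(len(xs) - 1) is False, so the change point c
    # satisfies xs[c] < v <= xs[c + 1], and we insert just after it.
    return find_change_point(lambda i: xs[i] < v, 0, len(xs) - 1) + 1

For example insertion_point([1, 2, 2, 5, 8], 2) is 1, found after only a couple of comparisons, and in general the number of calls grows with where the value lands rather than with the length of the list.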

This idea isn't as intrinsically valuable as binary search, because it doesn't really improve the constant factors or the response to random data, but it's still very useful in a lot of real world applications. I first came across it in the context of timsort, which uses this to find a good merge point when merging two sublists in its merge step.

Edit to note: It was pointed out to me on Twitter that I'm relying on Python's bigints to avoid the overflow problem that binary search will have if you implement it on fixed-size integers. I did know this at one point, but I confess I had forgotten. The above code works fine in Python, but if int is fixed size you want the following slightly less clear versions:

def midpoint(lo, hi):
    if lo <= 0 and hi >= 0:
        return (lo + hi) // 2
    else:
        return lo + (hi - lo) // 2

def binary_search(f, lo, hi):
    # Invariant: f(hi) != f(lo)
    while lo + 1 < hi:
        assert f(lo) != f(hi)
        mid = midpoint(lo, hi)
        if f(mid) == f(lo):
            lo = mid
        else:
            hi = mid
    return lo

def exponential_probe(f, lo, hi):
    gap = 1
    midway = midpoint(lo, hi)
    while True:
        if f(lo + gap) == f(lo):
            lo += gap
            if lo >= midway:
                break
            gap *= 2
        else:
            hi = lo + gap
            break
    return lo, hi

These avoid calculating any intermediate integers which overflow in the midpoint calculation:

  • If \(lo \leq 0\) and \(hi \geq 0\) then \(lo \leq hi + lo \leq hi\), so is representable.
  • Otherwise either \(lo > 0\), in which case \(0 \leq hi - lo \leq hi\), or \(hi < 0\), in which case \(0 \leq hi - lo \leq -1 - lo\); either way it is representable.

The reason we need the two different cases is that e.g. if \(lo\) were INT_MIN and \(hi\) were INT_MAX, then \(hi - lo\) would overflow but \(lo + hi\) would be fine. Conversely if \(lo\) were INT_MAX - 1 and \(hi\) were INT_MAX, \(hi - lo\) would be fine but \(hi + lo\) would overflow.

The following should then be a branch-free way of doing the same:

def midpoint(lo, hi):
    large_part = lo // 2 + hi // 2
    small_part = ((lo & 1) + (hi & 1)) // 2
    return large_part + small_part

We calculate (x + y) // 2 as x // 2 + y // 2, and then we fix up the rounding error this causes by calculating the midpoint of the low bits correctly. The intermediate parts don't overflow because we know the first sum fits in \([lo, hi]\), and the second fits in \([0, 1]\). The final sum also fits in \([lo, hi]\) so also doesn't overflow.

I haven't verified this part too carefully, but Hypothesis tells me it at least works for Python's big integers, and I think it should still work for normal C integers.

This entry was posted in programming, Python.

Linear time (ish) test case reduction

A problem I’m quite interested in, primarily from my work on Hypothesis, is test case reduction: Taking an example that produces a bug, and producing a smaller version that triggers the same bug.

Typically a “test-case” here means a file that when fed to a program triggers a crash or some other wrong behaviour. In the abstract, I usually think of sequences as the prototypical target for test case reduction. You can view a file as a sequence in a number of usefully different ways – e.g. as bytes, as lines, etc. and they all are amenable to broadly similar algorithms.

The seminal work on test-case reduction is Zeller and Hildebrandt’s “Simplifying and Isolating Failure-Inducing Input”, in which they introduce delta debugging. Delta debugging is essentially an optimisation to the greedy algorithm which removes one item at a time from the sequence and sees if that still triggers the bug. It repeatedly applies this greedy algorithm to increasingly fine partitions of the test case, until the partition is maximally fine.

Unfortunately the greedy algorithm as described in their paper, and as widely implemented, contains an accidentally quadratic bug. This problem is not present in the reference implementation, but it is present in many implementations of test case reduction found in the wild, including Berkeley delta and QuickCheck. Hypothesis gets this right, and has for so long that I forgot until quite recently that this wasn’t more widely known.

To see the problem, let’s look at a concrete implementation. In Python, the normal implementation of the greedy algorithm looks something like this:

def greedy(test_case, predicate):
    while True:
        for i in range(len(test_case)):
            attempt = list(test_case)
            del attempt[i]
            if predicate(attempt):
                test_case = attempt
                break
        else:
            break
    return test_case

We try deleting each index in turn. If that succeeds, we start again. If not, we’re done: No single item can be deleted from the list.

But consider what happens if our list is, say, the numbers from 1 through 10, and we succeed at deleting a number from the list if and only if it’s even.

When we run this algorithm we try the following sequence of deletes:

  • delete 1 (fail)
  • delete 2 (succeed)
  • delete 1 (fail)
  • delete 3 (fail)
  • delete 4 (succeed)
  • delete 1 (fail)
  • and so on

Every time we succeed in deleting an element, we start again from scratch. As a result, we have a classic accidentally quadratic bug.

This is easy to avoid though. Instead of starting again from scratch, we continue at our current index:

def greedy(test_case, predicate):
    deleted = True
    while deleted:
        deleted = False
        i = 0
        while i < len(test_case):
            attempt = list(test_case)
            del attempt[i]
            if predicate(attempt):
                test_case = attempt
                deleted = True
            else:
                i += 1
    return test_case

At each stage we either successfully reduce the size of the test case by 1 or we advance the loop counter by one, so this loop makes progress. It stops in the same case as before (when we make it all the way through with no deletions), so it still achieves the same minimality guarantee that we can’t delete any single element.

The reason this is only linear time “ish” is that you might need to make up to n iterations of the loop (this is also true with the original algorithm) because deleting an element might unlock another element. Consider e.g. the predicate that our sequence must consist of the numbers 1, …, k for some k – we can only ever delete the last element, so we must make k passes through the list. Additionally, if we consider the opposite where it must be the numbers from k to n for some k, this algorithm is quadratic while the original one is linear!

I believe the quadratic case to be essential, and it’s certainly essential if you consider only algorithms that are allowed to delete at most one element at a time (just always make the last one they try in any given sequence succeed), but anecdotally most test cases found in the wild don’t have nearly this high a degree of dependency among them, and indexes that previously failed to delete tend to continue to fail to delete.

A model with a strong version of this as its core assumption shows up the complexity difference: Suppose you have \(k\) indices out of \(n\) which are essential, and every other index can be deleted. Then this algorithm will always run in \(n + k\) steps (\(n\) steps through the list the first time, \(k\) steps at the end to verify), while the classic greedy algorithm will always run in \(O(k^2 + n)\) steps.
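Here's a quick sketch of that model, instrumenting the predicate with a call counter and running the non-restarting greedy from above on it (the helper is mine, purely for illustration):

def count_calls(n, essential):
    # A test case is "interesting" iff it still contains every essential value.
    calls = [0]

    def predicate(attempt):
        calls[0] += 1
        return essential <= set(attempt)

    result = greedy(list(range(n)), predicate)
    return sorted(result), calls[0]

# With k = 3 essential values out of n = 100 this makes
# n + k = 103 predicate calls and leaves just the essential core.
print(count_calls(100, {10, 50, 90}))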

Although this model is overly simplistic, anecdotally examples found in the wild tend to have something like this which acts as an “envelope” of the test case – there’s some large easily deletable set and a small essential core. Once you’re down to the core you have to do more work to find deletions, but getting down to the core is often the time consuming part of the process.

As a result I would be very surprised to find any instances in the wild where switching to this version of the algorithm was a net loss.

This entry was posted in programming, Python.

Startup Life

Regular readers of this blog who quite sensibly don’t use Twitter (or do use Twitter and don’t follow me there) have probably not noticed that I have a new short story out: Startup Life.

As with most of my fiction, this one is about exploring important philosophical and political questions.

Specifically it’s about exploring the important political and philosophical question: What would happen if Dr Vicky Frankenstein joined Vampire Ada Lovelace’s cybernetics startup?

Um. OK. Maybe it’s not that important or philosophical. But I’m really pleased with how it turned out and if you like my fiction at all I recommend reading it.

This entry was posted in Fiction.