So I’ve done my first Proper open source library in ages. It’s a Python implementation of the majority judgement voting algorithm.
If, you know, you find yourself in need of an implementation of majority judgement in python. I admit that’s not very likely.
Majority judgement basically works as follows: Every voter assigns a grade to each candidate in a discrete list of ordinal grades (e.g. terrible, bad, ok, good, great). You then tally these grades and rearrange the grades for each candidate in a sequence. You then compare these sequences lexicographically to determine who won. The first element is always the (lower) median, and each point in the sequence is the lower median of the tail from that point on.
So e.g. a candidate with three OK grades, two bad ones and one great one would get the sequence:
OK, OK, Bad, OK, Bad, Great
A candidate two had two OK grades, two bad ones and two great ones would get
OK, OK, Bad, Great, Bad, Great
So would differ in the fourth position with the second candidate getting “great” and the first getting “OK”, so the second candidate would win.
However, implementing it this way for very large electorates is on the inefficient side, so the library I released contains an optimisation: Basically instead of generating a list it generates a run length encoded list. This is nice because you can often tell how many runs you’re going to need in advance so you don’t have to generate the whole list and compress. It’s also much faster to compare heavily compressed run length encoded lists – you can compare long stretches at a time rather than having to go through one element at a time.
Another optimisation that this library performs is that the evaluation is lazy: We only ever work out as much of the sequence as is needed.
An optimisation this library doesn’t perform yet but that I want it to is that there’s another type of structure I’d like to compress: Frequently you end up with situations where you’ve got a long sequence which looks like Ok, Good, Ok, Good, OK, Good, etc. repeating for a long stretch of time (e.g. you can get this when you’ve got just two grades assigned to a candidate and the same number of each). I’d like to compress those down in a similar manner to the run length encoding. I roughly know how the code for that will look, I just haven’t
Although this is a very early stage of the library, I’m actually pretty happy to label it production ready. It’s not going to change much except in implementation details as the interface is basically “It looks like a list but behind the scenes it’s doing cunning things”, and the test coverage is probably the highest of any project I’ve worked on – it’s got about half again as many lines of tests as lines of code and all those tests are heavily parametrized so there are effectively nearly 600 test cases for < 200 lines of code. Branch coverage is 100% and I intend to keep it that way.