
The littlest infinity

Note: This is mostly all stuff you’d probably learn in your first six months of a maths degree, I just like occasionally chewing on the fundamentals and seeing how they fit together. There’s nothing very deep here, but it might be pedagogically interesting.

You’re probably familiar with the natural numbers: They’re a set \(\mathbb{N}\) with a distinguished element \(0 \in \mathbb{N}\) and a successor function \(S: \mathbb{N} \to \mathbb{N}\) which is injective, has \(0 \ne S(n)\) for all \(n \in \mathbb{N}\), and has the property that if \(A \subseteq \mathbb{N}\) with \(0 \in A\) and \(S(A) \subseteq A\) then \(A = \mathbb{N}\) (“the induction principle”).

The natural numbers are the most basic example of an infinite set, and are usually postulated to exist as an axiom of set theory. We then construct all other infinite sets starting from \(\mathbb{N}\) using e.g. power set operations.

But what is an infinite set?

There turn out to be more or less three ways of defining this, all of which are equivalent (under certain assumptions).

One way of defining it is as any set which is not a finite set, but that reduces to the question “What is a finite set?”. To which the answer is: any set which is in bijection (one-to-one correspondence) with one of the sets \(\{i \in \mathbb{N} : i < n\}\) for some \(n \in \mathbb{N}\) (you need to define the relation \(<\) on \(\mathbb{N}\) for this, but let’s assume we’ve done that in some standard way, e.g. by induction).

You can then show inductively that \(\mathbb{N}\) has no injection into a finite subset of itself (and thus is infinite) as follows: It certainly has no injection into the empty set, which is \(\{i: i < 0\}\). Suppose you had an injection \(f: \mathbb{N} \to \{i \in \mathbb{N} : i < n + 1\}\). By composing it with a swap if necessary, you can assume that either \(f(0) = n\) or \(n\) is not in the range of \(f\) at all. Either way \(f \circ S\) is an injection into \(\{i \in \mathbb{N} : i < n \}\), which contradicts our induction hypothesis.

Another definition of a set being infinite is that it contains a bijective copy of \(\mathbb{N}\) (i.e. there is an injection from \(\mathbb{N}\) into it).

If there is such an injection then the set is certainly infinite in the sense of not being finite, as otherwise we’d be able to compose the injection from \(\mathbb{N}\) into it with its bijection to a finite set, which contradicts what we showed above about \(\mathbb{N}\) being infinite.

To show the opposite direction we need the axiom of choice (we actually only need a weak form of it, but we’ll ignore that detail): For any set \(X\) there is a choice function \(c : \mathcal{P}(X) \setminus \{\emptyset\} \to X\) with \(c(A) \in A\).

If \(X\) is infinite in the sense of not being finite, we can then construct a function \(f: \mathbb{N} \to X\) as follows:

  • \(f(0) = c(X)\)
  • \(f(S(n)) = c(X \setminus \{f(0), \ldots, f(n)\})\)

This is well defined because the set we’re passing to \(c\) must be non-empty: if it were empty, we’d have constructed a bijection between \(X\) and \(\{0, \ldots, n\} = \{i : i < n + 1\}\), making \(X\) finite.
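To make the construction concrete, here’s a minimal sketch in Python. The axiom of choice is of course not constructive, so the sketch substitutes a specific computable choice function: \(X\) is the even integers and \(c\) picks the least element not excluded. Both are illustrative assumptions, not part of the argument.

```python
def c(excluded):
    """Illustrative choice function: the least even integer not in `excluded`."""
    n = 0
    while n in excluded:
        n += 2
    return n

def f(n):
    """f(0) = c(X), f(S(n)) = c(X minus {f(0), ..., f(n)})."""
    chosen = set()
    for _ in range(n + 1):
        x = c(chosen)
        chosen.add(x)
    return x

print([f(k) for k in range(5)])  # [0, 2, 4, 6, 8] -- injective, as promised
```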

This definition of infinity justifies the fundamental role of the natural numbers (and the title) to some degree: They’re literally the smallest infinite set, because a copy of them is contained in every other infinite set.

But finally, there is a third definition of infinity which has no reference to the natural numbers at all: \(X\) is infinite if there is some injective function \(f: X \to X\) whose range is a proper subset of \(X\). i.e. \(f\) is injective but not surjective. You can justify this by analogy with the natural numbers as our basic example of an infinite set: \(S\) is such a function, because it is injective but does not map anything to \(0\). You can also show inductively that all of the sets \(\{i: i < n\}\) do not have this property (more or less by the same argument we used to show that there was no injection from \(\mathbb{N}\) into them).

Let’s see that this is equivalent to the previous definition.

First, suppose we’ve got some injection \(h: \mathbb{N} \to X\). We can construct such an \(f\) as follows:

  • If \(x \neq h(n)\) for all \(n\), then \(f(x) = x\)
  • Let \(f(h(n)) = h(S(n))\)

i.e. we define our function to be the successor function on our copy of \(\mathbb{N}\) and the identity elsewhere.

This is still injective, because it’s injective on each member of the partition into \(h(\mathbb{N})\) and \(X \setminus h(\mathbb{N})\), and it doesn’t map either part into the other. It is however not surjective, because it maps no values to \(h(0)\).

So a set which contains a copy of \(\mathbb{N}\) must necessarily be infinite in this sense.
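Here’s a small sketch of this direction in Python, taking \(X\) to be the integers and \(h(n) = n\) as our injected copy of \(\mathbb{N}\) (both purely illustrative choices):

```python
def f(x):
    """Successor on the copy of N, identity elsewhere."""
    if x >= 0:        # x = h(n) for some n
        return x + 1  # f(h(n)) = h(S(n))
    return x          # x is not in the image of h

# Injective on each part of the partition, maps neither part into the
# other, and nothing maps to h(0) = 0:
assert all(f(x) != 0 for x in range(-100, 100))
```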

Now, suppose we have such an \(f\). We can use this to construct an injection from \(\mathbb{N}\) into \(X\) as follows:

  • Pick \(h(0)\) as an arbitrary element of \(X \setminus f(X)\)
  • Let \(h(S(n)) = f(h(n))\)

We can prove this is injective by inductively showing that \(h(n) \not\in \{h(m): m < n\}\).

Proof: Trivially true for \(0\) because that’s the empty set. For \(S(n)\), note first that \(h(S(n)) \ne h(0)\), because by choice of \(h(0)\) we must have \(f(x) \ne h(0)\) for any \(x\).

Supposing \(h(S(n)) = h(m)\) for some \(0 < m \leq n\), we must have \(f(h(n)) = h(m)\). But we can write \(m = S(k)\) for some \(k\) because \(m \neq 0\), so we have \(f(h(n)) = f(h(k))\). \(f\) is an injection, so \(h(n) = h(k)\). But \(k < m \leq n\), so this contradicts our inductive hypothesis that \(h(n) \not\in \{h(m): m < n\}\). Therefore \(h\) is an injection as desired.
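A concrete run of this construction in Python, taking \(X\) to be the set of strings of “a”s and “b”s and \(f\) to be appending an “a” (illustrative choices; appending a letter is injective but misses every string not ending in “a”):

```python
def f(x):
    return x + "a"

def h(n):
    x = "b"              # h(0): any element outside the range of f
    for _ in range(n):   # h(S(n)) = f(h(n))
        x = f(x)
    return x

print([h(n) for n in range(4)])  # ['b', 'ba', 'baa', 'baaa'] -- all distinct
```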

So we have three notions of infinity, all equivalent (up to the axiom of choice). Why do we care?

Well, one interesting thing about the last definition is that it’s a notion of infinity that has nothing to do with the natural numbers. This has the nice benefit that we can run it backwards: If we have a set that is infinite in that sense, then we can construct the natural numbers out of it. This is a pretty strong justification of the natural numbers as our fundamental notion of an infinite set.

To see how to do this, suppose we’ve got a set \(X\) and an injective function \(S : X \to X\) which is not surjective.

Pick some distinguished element \(0 \in X \setminus S(X)\).

Now consider the family \(\mathcal{F}\) of sets \(U \subseteq X\) with the properties that \(0 \in U\) and \(S(U) \subseteq U\). This family is nonempty because trivially \(X \in \mathcal{F}\).

Now, define \(N = \bigcap \mathcal{F}\).

We must have \(N \in \mathcal{F}\), because \(0 \in N\) and if \(U \in \mathcal{F}\) then \(S(N) \subseteq S(U) \subseteq U\), so taking intersections over all \(U \in \mathcal{F}\) we have \(S(N) \subseteq \bigcap \mathcal{F} = N\).

Moreover, if \(A \subseteq N\) with \(0 \in A\) and \(S(A) \subseteq A\), then \(A \in \mathcal{F}\), and thus \(N \subseteq A\), and so \(N = A\).

But these are precisely the requirements for \(N\) to be a model of the natural numbers, \(\mathbb{N}\)! We have a distinguished element \(0\), an injective function \(S\) that doesn’t map anything to \(0\), and it satisfies the induction principle. And, importantly, we got here without assuming that anything that looked like the natural numbers existed: All we required was the apparently weaker claim about some form of infinite set existing.
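We can’t directly compute an intersection over every member of \(\mathcal{F}\), but \(N\) coincides with the orbit \(\{0, S(0), S(S(0)), \ldots\}\): the orbit is contained in every member of \(\mathcal{F}\) (by induction), and is itself a member of \(\mathcal{F}\). Here’s a sketch with \(X\) the integers, \(S(x) = 2x\) (injective but not surjective) and \(1\) playing the role of \(0\), all illustrative choices:

```python
def S(x):
    return 2 * x

def initial_segment(k):
    """The first k elements of N, i.e. of the orbit of our "0" under S."""
    x, out = 1, []  # 1 is not in S(X) (the even integers), so it can be "0"
    for _ in range(k):
        out.append(x)
        x = S(x)
    return out

print(initial_segment(6))  # [1, 2, 4, 8, 16, 32]
```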

So pretty much wherever you start in defining infinite sets, you’re going to end up with the natural numbers at some point.

In the absence of at least some weak form of the axiom of choice these equivalences break down a bit, but they’re still compelling enough and strong enough that they fairly firmly place the natural numbers as the foundational object of infinite set theory.


Democracy for lunch

“The History of every major Galactic Civilization tends to pass through three distinct and recognizable phases, those of Survival, Inquiry and Sophistication, otherwise known as the How, Why, and Where phases. For instance, the first phase is characterized by the question ‘How can we eat?’ the second by the question ‘Why do we eat?’ and the third by the question ‘Where shall we have lunch?’”

Douglas Adams, The Restaurant at the End of the Universe

As well as being the mark of a sophisticated civilization, the question “Where shall we have lunch?” has another interesting feature: It’s one of the most common applications of small-scale democratic decision making we regularly run into in our lives.

Suppose we’re a group of friends or coworkers trying to decide on a venue for a group lunch. How do we make this decision? And how should we make this decision?

The most important step in the process is the one everyone is already doing: Talk it out. People can raise options, either well known ones (“Why don’t we go back to our usual Thai place?”), or new ones that might not be broadly known (“There’s a great new steak restaurant that’s just opened up”). Importantly, people can also veto options, e.g. “I have a peanut allergy, so Thai food tends to be an excitingly lethal game for me”, or partial vetoes such as “Only if the steak restaurant has a vegetarian menu”.

Eventually after a number of expansions and contractions of the list of options, you tend to reach an impasse: There are multiple options that everyone finds acceptable, but no obvious winner.

You could keep talking about it until some sort of consensus is reached, but by this point people are probably bored of talking about it, and any remaining disagreement is down to fundamental matters of preference and taste rather than something you can easily talk people around to anyway.

When you reach this state, how do you finally pick a venue?

In my experience usually what ends up happening is that someone (often me) gets fed up, says “OK, I propose we go to this option. Does anyone want to object?” and attempts to build a consensus starting from there. This is OK as a solution, but tends to prioritise the needs of the most assertive and/or least patient members of the group.

Another, slightly less common, solution which is better at respecting everyone’s desires is voting: Everyone in the group who has an opinion on the matter votes on the options and the winning option is chosen.

But how do we vote on these options? Casual use of voting tends to devolve to bad choices of voting systems, usually plurality voting (everyone picks their most preferred option, you go for the one the most people picked). This is far from the best option, even for relatively low importance decisions like lunch.

Instead there are two relatively simple variants on it, either of which is a strict improvement. Which one you should use depends a bit on context, but there’s a very simple test to decide between them: Just answer the question “Do we definitely want to go along with the majority opinion?”

The answer to this question isn’t obviously yes, and depends on a number of factors, the biggest one being how often we are making this sort of decision.

If this is our weekly lunch then we actually don’t want to always follow the majority opinion: If we’ve got 4 vegetarians and 5 meat eaters, going out to a burger place (they do veggie burgers! It’s fine and we’re totally respecting your preferences!) every single week on the basis of a thin majority is a bit harsh. If on the other hand you’re planning the big annual dinner then you probably want to go with the majority view (though ideally you’d have discussed things enough that you’ll find an option with more than a bare majority).

Once you’ve answered that question, I think there are two obvious best options: If you want a majority outcome, use approval voting. If you instead want a fairer spread over many decisions, use random ballot. These aren’t the best options for everything, but for low impact single-winner decisions like this where simplicity of voting system matters a lot they’re very hard to beat.

These work as follows:

Approval voting is normal plurality voting where people can vote for more than one option. You go through each option and everyone who likes that option raises their hand. The one that got the most hands wins (break ties arbitrarily, or by running another vote between the tied options). People can raise their hands for as many different options as they like.
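For concreteness, here’s a minimal sketch of an approval count in Python, with invented ballots (each ballot is just the set of options that voter raised a hand for):

```python
from collections import Counter

ballots = [
    {"thai", "steak"},
    {"steak"},
    {"thai", "pizza"},
    {"steak", "pizza"},
]

counts = Counter(option for ballot in ballots for option in ballot)
print(counts.most_common())  # [('steak', 3), ('thai', 2), ('pizza', 2)]
winner = counts.most_common(1)[0][0]  # a tie would need a second vote
```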

Random ballot uses chance to spread out the decision making across the whole group: You pick one of the group by lot and go with their favourite choice. Over multiple decisions this will tend to average out – if 40% of the group are vegetarian, you’ll pick highly vegetarian friendly options about 40% of the time. If you’ve got that one guy in 10 who is completely obsessed with that Brazilian barbecue joint, you’ll go there about 10% of the time.

Random ballot is a little trickier to implement than approval voting, because you need some sort of randomization device, but almost any device will work. The two easiest options are to draw cards from a deck and have the high card choose, or to put pieces of paper into a hat and pick one out. The latter is better but slightly fiddlier. The former has the problem that by picking the person and then making them choose you put them on the spot a bit and make them feel much more accountable for their choice, which will tend to bias them towards the majority and reduce the proportionality. That may also be a feature for some people.
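As a sketch, the whole of random ballot is one random choice (names and preferences invented for the example):

```python
import random

favourites = {"alex": "thai", "sam": "steak", "kim": "thai", "pat": "burgers"}

def random_ballot(favourites):
    voter = random.choice(list(favourites))  # the "piece of paper in a hat"
    return favourites[voter]

print(random_ballot(favourites))  # "thai" with probability 2/4, etc.
```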

Either of these will be a significant improvement over normal plurality voting, and neither of them are particularly hard to use. Obviously the random ballot solution is dearer to my heart, but for many groups approval voting will be an easier sell, and for rare events it’s definitely the way to go because you can’t rely on the randomness to average things out.

You can also use these systems mixed in with the initial talking phase rather than keeping them clearly separate – if you want to use approval voting, just let anyone call out a new option, and discuss whether you want to veto it before voting. If you’re using random ballot, use the version where you pick a dictator by lot and then have them lead the discussion: they listen to suggestions and vetoes but make the final decision.

An Amusing Variation

There’s an idea called Quadratic Voting which works by letting people buy votes, with the cost being proportional to the square of the number of votes: If one vote costs you £1, then two votes cost you £4, three votes cost you £9, etc. The money paid is then distributed amongst the voters equally.

I’m fairly ambivalent about quadratic voting in general, but it’s actually quite well suited for the use case of going for lunch, because there’s an obvious thing to do with the money spent: Put it towards the bill!

You can combine this idea with either of the above voting systems. For manual counting it’s actually slightly more convenient with random ballot, because you don’t have to do arithmetic: Either people get dealt extra cards or get to put multiple items in the hat.
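Here’s a sketch of the combination with random ballot, with invented names and numbers: each voter buys tickets for the hat at a quadratic cost, and the pot goes towards the bill.

```python
import random

# (option, number of tickets bought) per voter; tickets cost tickets**2 pounds
purchases = {
    "alex": ("thai", 1),
    "sam": ("steak", 2),
    "kim": ("burgers", 1),
}

hat = [choice for choice, tickets in purchases.values() for _ in range(tickets)]
pot = sum(tickets ** 2 for _, tickets in purchases.values())

print(random.choice(hat))  # "steak" is twice as likely as either alternative
print(pot)                 # 6 pounds towards the bill
```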

Why am I writing about this?

I’d like to see more casual use of democratic methods, and the use case of lunch is nicely illustrative of how that can come about: Voting doesn’t have to happen at large scales with big life changing (and destroying) results. It can happen at small scales whenever we’re hanging out with our friends and want to make a decision, and it can make life easier when it does.

In general I think we tend to assume this sort of process and formalism is very heavyweight and only appropriate for Serious Decisions, but I think that overstates the difficulty a lot, and this sort of casual use is important if these tools are ever to become widespread.

Also, it really annoys me when we struggle to decide where to go for lunch.


Metric space completeness is a restricted form of topological compactness

This is one of those facts that is obvious once you’ve seen it but I sometimes forget isn’t well known.

Topological compactness admits the following slightly less common but directly equivalent definition:

A family of sets has the finite intersection property if every finite subfamily of it has non-empty intersection. A topological space is said to be compact if every family of closed sets with the finite intersection property has non-empty intersection.

(The reason this is directly equivalent is that a family of closed sets with empty intersection is precisely the set of complements of some open cover.)

Starting from this definition, we can get an alternative definition for what it means for a metric space to be complete that makes the relationship between completeness and compactness much more obvious:

A metric space is complete if and only if every family of closed sets with the finite intersection property that contains sets of arbitrarily small diameter has non-empty intersection.

i.e. metric completeness is “compactness for arbitrarily small sets”.
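As a quick sanity check, here’s a non-example (a standard one, spelled out under the new definition): in \(\mathbb{Q}\), the sets \(F_n = \{q \in \mathbb{Q} : q \geq 0, |q^2 - 2| \leq \frac{1}{n}\}\) are closed, nested and of arbitrarily small diameter, so in particular they have the finite intersection property. But their intersection is empty, because any point of it would be a non-negative rational square root of \(2\). So \(\mathbb{Q}\) fails the condition, as an incomplete space should.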

Let’s show that this is equivalent to the normal Cauchy sequence definition.

First, assume a metric space has this property, let’s show it’s complete in the Cauchy sequence sense.

So, let \(x_n\) be a Cauchy sequence. Now, define \(F_n = \overline{\{x_m: m \geq n\}}\).

The family \(\{F_n\}\) is nested and consists of non-empty sets, so it certainly has the finite intersection property. Its elements are closed by construction.

To see that it contains sets of arbitrarily small diameter, we just need to show that the tail sets \(\{x_m: m \geq n\}\) can have arbitrarily small diameter, as the diameter of the closure of a set is the same as the diameter of the set. But \(x_n\) is a Cauchy sequence, so for \(\epsilon > 0\) we can find \(N\) such that for \(m, n \geq N\), \(d(x_n, x_m) < \epsilon\). Necessarily then the tail set \(\{x_m: m \geq N\}\) has diameter \(\leq \epsilon\).

Now suppose \(x \in \bigcap F_n\). Then \(x_n \to x\): pick \(N\) such that \(diam(F_N) < \frac{1}{2}\epsilon\). Because \(x\) is in the closure of the tail set, we can find \(x_m\) with \(m \geq N\) and \(d(x_m, x) < \frac{1}{2}\epsilon\). For any \(n \geq N\) both \(x_n\) and \(x_m\) lie in \(F_N\), so necessarily \(d(x_n, x) \leq d(x_n, x_m) + d(x_m, x) < \frac{1}{2}\epsilon + \frac{1}{2}\epsilon = \epsilon\), and the sequence converges to \(x\) as desired.

The other direction now. Suppose we have a metric space that is complete in the Cauchy sequence sense, and a family of closed sets with the finite intersection property containing sets of arbitrarily small diameter.

Let \(E_n\) be a sequence of sets in that family with \(diam(E_n) \to 0\).

Pick \(x_n \in \bigcap\limits_{m \leq n} E_m\) (we can do this because of the finite intersection property). Then \(x_n\) is a Cauchy sequence: Pick \(N\) such that \(diam(E_N) < \epsilon\), then for \(m, n \geq N\), \(x_m, x_n \in E_N\), so necessarily \(d(x_m, x_n) \leq diam(E_N) < \epsilon\).

Because it’s a Cauchy sequence, it converges, say to \(x\). Because the \(E_n\) are closed, \(x\) is necessarily a member of \(\bigcap E_n\).

Suppose now that \(F\) is some other member of our family. We can repeat the above construction replacing \(E_n\) with \(E_n \cap F\), and get an element \(y\) with \(y \in F \cap \bigcap E_n\). But \(x, y \in E_n\), so \(d(x, y) \leq diam(E_n) \to 0\). So \(d(x, y) = 0\) and thus \(x = y\). Hence \(x\) must be in the intersection of the whole family.

QED

Why does this characterisation matter?

Well, clearly it doesn’t matter that much. The vast majority of people who do analysis have probably never encountered it and their lives are not substantially worse off for that lack.

There are basically two major reasons I like this fact:

  1. I think it makes the relationship between compactness and completeness much clearer.
  2. What Zorn’s lemma does for transfinite induction, this does for sequences in analysis – it essentially lets you factor out a repetitive mechanical part of the proof and just reuse it without having to repeat the machinery over and over again. I’m slightly allergic to sequences, so this tends to play well with how I like to do analysis.

Generalized Single Transferable Vote

One of the interesting things I’ve noticed with my recent work on deterministic Hare-Clark is that this method generalises to let you take any ranked voting system and turn it into a single transferable vote variant. I’m unsure how useful this is, but I think it’s at least notable.

(Note: I’ve never seen this before, but it’s sufficiently obvious now that I have that I would be surprised if it was truly original to me)

There is a caveat though: Although it takes any ranked voting system, the n=1 case does not correspond to the winner for that ranked voting system. Instead it corresponds to what I’m going to call the instant runoff variant of that voting system. Normal instant runoff voting is then the instant runoff variant of plurality voting.

Let’s see how this works.

Instant Runoff Variants

Suppose people cast votes in some unspecified manner – scores, rankings, etc. All we need to be able to tell from these votes is, for a given candidate, which other candidates the voter strictly prefers to it (ties are OK and we don’t care about them). We also need to be able to take a vote and restrict it to a subset of the candidates.

Suppose we then have some voting system that takes these votes and produces a total ordering of the candidates (this will need some tie breaker).

The normal version of this system is to output the highest ranked candidate. The instant runoff version of this system instead aims to produce a candidate with broader majoritarian support, and proceeds as follows:

A vote supports a candidate if there are no other (non-disqualified) candidates that it strictly prefers. We disqualify candidates until there is some candidate that is supported by a strict majority of votes (clearly this must be possible – if we disqualify all but one candidate that candidate is supported by 100% of the votes).

We repeatedly iterate the following process:

Run the voting procedure on all votes with all remaining candidates. Now, starting from the highest ranked candidate and working downwards check if this candidate is supported by a strict majority of votes. If it is, that candidate is elected. Stop and return it.

If we get to the end with no candidates elected, disqualify the lowest ranked candidate and start again.

Normal IRV voting is just this construction applied to plurality voting on the highest ranked non-disqualified candidate (the ordering step is a red herring there because at most one candidate can be supported by a majority of first preference votes, but the drop-out is the same once you account for tie breaking).
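To pin the construction down, here’s a minimal sketch in Python. It assumes ballots are full strict rankings (lists, best first) and that the underlying system is supplied as a function returning the remaining candidates ordered best to worst; both representations are simplifying assumptions rather than part of the definition.

```python
from collections import Counter

def supports(ballot, candidate, remaining):
    """A ballot supports its favourite non-disqualified candidate."""
    return next(c for c in ballot if c in remaining) == candidate

def instant_runoff_variant(ballots, candidates, rank):
    remaining = set(candidates)
    while True:
        ordering = rank(ballots, remaining)
        for candidate in ordering:  # highest ranked first
            support = sum(supports(b, candidate, remaining) for b in ballots)
            if 2 * support > len(ballots):  # strict majority
                return candidate
        remaining.discard(ordering[-1])  # disqualify the lowest ranked

def plurality_rank(ballots, remaining):
    """Plurality as the base system: order by first preference counts."""
    counts = Counter({c: 0 for c in remaining})
    for b in ballots:
        counts[next(c for c in b if c in remaining)] += 1
    return [c for c, _ in counts.most_common()]  # ties broken arbitrarily
```

With `rank=plurality_rank` this is ordinary IRV, modulo tie breaking.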

This will in general produce quite different results from the original voting system – the difference between IRV and plurality shows that, but here’s another example with Majority Judgement. Suppose we’re running majority judgement with four grades (Reject, Tolerate, Accept, Support) and suppose we have three candidates and three voters who grade as follows:

  • Accept A, Reject B, Support C
  • Reject A, Reject B, Tolerate C
  • Support A, Reject B, Reject C

The first median grades are Accept A, Reject B, Tolerate C, so A wins the majority judgement immediately.

But A doesn’t win the runoff version (even without any dropout steps!), because a majority supports C and does not support A: The first two voters both support C as their best option, so C wins despite the relatively lower grade.

Unsurprisingly this will tend to destroy a lot of the character of the underlying voting system and make it behave significantly more like IRV than the original system does, but I still think it retains some of the interesting underlying character because it both prioritises winners who are ranked higher in the underlying system and drops out losers from the underlying system.

Generalised Single Transferable Vote

The above then generalises directly to single transferable vote as follows:

Instead of looking for support from a strict majority, we look merely for a quota’s worth of support. When we find it, we remove a quota worth of votes from the set of available votes (either randomly or using a prioritisation mechanism of some sort), mark the candidate as elected and start the process again with those votes and that candidate excluded. We repeat until we have elected a full house of candidates.
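Continuing the sketch from the previous section (and reusing its `supports` and `rank` functions), the multi-winner version looks something like this. It removes a quota worth of supporting ballots arbitrarily; a random or prioritised selection would also fit the description above.

```python
def generalised_stv(ballots, candidates, rank, seats):
    quota = len(ballots) // (seats + 1) + 1  # Droop quota
    remaining, elected = set(candidates), []
    ballots = list(ballots)
    while len(elected) < seats:
        ordering = rank(ballots, remaining)
        for candidate in ordering:  # highest ranked first
            supporters = [b for b in ballots
                          if supports(b, candidate, remaining)]
            if len(supporters) >= quota:
                elected.append(candidate)
                remaining.discard(candidate)
                for b in supporters[:quota]:  # remove a quota worth of votes
                    ballots.remove(b)
                break
        else:
            remaining.discard(ordering[-1])  # nobody elected: drop the lowest
    return elected
```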

In the same way that IRV results from applying the single-winner construction to plurality voting, Hare-Clark STV is what you get if you apply this generalised construction to plurality voting.

All of the adjustments in my previous post – prioritisation, reallocation, etc – apply just as well in this case, but I’ll omit them here for the sake of brevity. Given those adjustments though, what I described is essentially applying this construction to plurality voting with a tie breaker defined by another voting system.

Conclusion

I don’t know how significant or useful this is, but I’ve been wondering for a while how to turn non-ranked systems, or better ranked systems, into an analogue of STV (which I generally think is a bit of a rubbish system, but for the niche it fills it’s also the best system we currently have).

There’s Reweighted Range Voting, which I only came across recently, but I think it’s a less than ideal system. In particular it lacks the coalition forming property that I’ve only recently been convinced is a good idea: It always elects the winner of the single-winner version as one of the winners (this may be bad because that winner might win as a compromise between two coalitions who would each be able to get their own seat in a larger house). I’m also fairly ambivalent on the subject of range voting (and will moderate out the inevitable comments that come from its advocates trying to tell me I’m an idiot for not thinking it’s the best thing ever), but this approach would work just as well for range voting and will probably produce a better result than RRV.

I’m far from convinced that this will prove to be the ideal way to produce more general proportional representation systems though, because I think the result is still too similar to STV and will likely inherit many of the failure modes of it.

In particular it will certainly sometimes turn monotonic systems into non-monotonic ones, because if you take plurality voting (a monotonic system, for all its faults) and apply the construction then you get IRV (a non-monotonic system, among its faults). I don’t in general think non-monotonicity is the worst thing ever, but it’s certainly pretty bad.

Still, it seems likely that for some choices of ranked voting system this will prove to be a strict improvement on STV. e.g. if you apply a Condorcet ranking system then in many cases you are likely to get a better result. It also opens the door for more interesting classes of ballots being cast in proportional multi-seat elections, which is promising.


Deterministic Hare-Clark STV voting, take 2

I ended up doing a lot of redesign of my deterministic variant on Hare-Clark STV while driving around. At this point I’m pretty happy with it conceptually, and after about an afternoon of experimenting with it I think it might genuinely advance the state of the art in STV counting methods. I’m not currently prepared to say it does because I don’t think my current level of testing is adequate to make that claim, but at the very least initial results are promising.

Here’s how it works.

First we need some definition of terms (many of these are the same as standard STV):

  • We define a priority order over candidates. The idea is that this defines a strict tie breaking order over candidates – when all else is equal, we do the thing that benefits higher priority candidates.
  • We also define a priority order over votes. The idea is that higher priority votes should be “more able to transfer” because they could still be useful. The vote priority will change as candidates get elected.
  • At any point in time, any given vote is either free or assigned to a candidate. A vote may be assigned to at most one candidate.
  • At any point in time, any given candidate is either elected, hopeful or disqualified.
  • A vote is available to a candidate if the vote is free and every candidate the vote has ranked higher than that candidate is either elected or disqualified. A (full) vote is always available to exactly one hopeful candidate and may be available to multiple elected candidates.

As usual the goal is to get the desired number of candidates to the elected state.

One of the big differences so far is the introduction of the “free” state for a vote. Instead of having votes all be assigned to candidates and transferring between them, votes only transfer between free and assigned. Further, the only candidates with assigned votes are the elected ones, and we maintain the invariant that every elected candidate is always assigned exactly a quota worth of votes. Votes may then move between allocated and free states, but never directly from one candidate to another.

The vote and candidate priorities require more explanation, but they’re fairly complicated and would distract from the overall method, so I’ll give the details later.

Elections then proceed in a number of rounds as follows. In each round either one or more candidates move from a hopeful state to an elected state or exactly one candidate moves from a hopeful state to a disqualified state. Additionally, votes may move between free and allocated states in ways that preserve the invariant that every candidate has exactly zero (if hopeful or disqualified) or quota (if elected) allocated votes.

Each round first consists of an elect stage. This works as follows:

For each hopeful candidate with at least the quota worth of votes available to it, mark it as elected. Now for each candidate that was elected just now, assign a quota worth of the votes that were previously available to it before the change in status (to prevent a candidate accidentally stealing votes another one needs out from under it). They each pick the quota worth of votes with the lowest priority available to them.

We repeatedly do this until no new candidates are elected.

If no candidates were elected in the previous stage, we perform a dropout stage: The worst hopeful candidate is marked as disqualified.

Worst is picked more or less as usual: It’s the hopeful candidate with the fewest votes available to it. In the event of a tie, it’s the lowest priority candidate amongst those tied.

If on the other hand some candidates were elected, we now perform reallocation.

The idea of reallocation is that if there are low priority votes that are currently free that could replace higher priority votes that are currently allocated, they should.

Reallocation works as follows: For each free vote, starting from the lowest priority and working upwards, go through each candidate they are available to in the order they voted for. If that candidate has any votes allocated to it that are higher priority than the current vote, allocate the current vote to that candidate and mark the current highest priority vote allocated to that candidate as free.

Repeat this process until it runs to completion without making any changes.

This process of alternating election followed by either reallocation or dropout continues until the full set of candidates have been elected.
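Here’s a heavily compressed sketch of that round structure in Python. The vote and candidate priority functions are assumed to be supplied, the reallocation stage is omitted, and details like tie handling and partial ballots are glossed over – treat this as a map of the control flow rather than a faithful implementation of the method.

```python
def count(ballots, candidates, seats, vote_priority, candidate_priority):
    # Assumes: ballots are full strict rankings (lists, best first);
    # vote_priority sorts ballots from lowest to highest priority;
    # candidate_priority(c) is smaller for lower priority candidates.
    quota = len(ballots) // (seats + 1) + 1  # Droop quota
    hopeful, elected, disqualified = set(candidates), [], set()
    assigned = {}  # id(ballot) -> candidate; ballots not present are free

    def available(ballot, candidate):
        """Free, and everything ranked above `candidate` is decided."""
        above = ballot[:ballot.index(candidate)]
        return (id(ballot) not in assigned and
                all(c in elected or c in disqualified for c in above))

    while len(elected) < seats:
        anyone_elected = False
        while True:  # elect stage: repeat until no new candidate gets in
            newly = [c for c in hopeful
                     if sum(available(b, c) for b in ballots) >= quota]
            if not newly:
                break
            # Fix each pool before changing any statuses, so a candidate
            # can't steal votes another newly elected candidate needs.
            pools = {c: sorted((b for b in ballots if available(b, c)),
                               key=vote_priority) for c in newly}
            for c in newly:
                taken = 0
                for b in pools[c]:  # lowest priority votes first
                    if id(b) not in assigned:
                        assigned[id(b)] = c
                        taken += 1
                    if taken == quota:
                        break
                hopeful.discard(c)
                elected.append(c)
            anyone_elected = True
        if len(elected) >= seats:
            break
        if not anyone_elected:  # dropout stage: disqualify the worst hopeful
            worst = min(hopeful, key=lambda c: (
                sum(available(b, c) for b in ballots), candidate_priority(c)))
            hopeful.discard(worst)
            disqualified.add(worst)
        # (otherwise the reallocation stage, omitted here, would run)
    return elected
```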

Conceptually, that’s pretty much all there is to it other than the definition of priorities.

What constitutes a good priority for candidates is a somewhat open question, but I’ve been working on the assumption that it’s basically a rank aggregation order for the votes. I’ve been using Borda Majority Judgement, for no particularly good reason other than it seems to produce slightly nicer results in general than Borda score and given that I had some code around for calculating Majority Judgement it was easy enough to do. This still requires some experimentation though.

Given the priority order for candidates though, there is a natural and effective way of calculating a priority order for votes.

The idea is that we first prioritise votes that are less satisfied, then given two equally satisfied votes we prioritise ones that agree more with the priority order for candidates.

We work with the reduced vote that you get after you’ve discarded all disqualified candidates.

Now, to determine which of two votes is higher priority we first look at the indices in each vote of the candidates which have not been elected. (So e.g. if I voted “A, B, C” and A and C have been elected then my list here would be “1”, as indices start from 0.) Given two votes, we compare these lists of indices lexicographically. If one is before the other, that vote is higher priority.

Or, in other words, if the first unelected candidate in my ranking is higher ranked than in yours, my vote is higher priority and vice versa. If they are the same, we look at the next unelected candidates, and so on.

If this results in a tie, we now compare priority orders. We look only at the unelected candidates on our ballot. Find the first candidate where the two votes differ. The vote with the higher priority candidate there is higher priority.

If still tied, do the same but for elected candidates (this step doesn’t matter, but prevents us having to break ties arbitrarily). If the votes are still tied after that then they are identical up to disqualified candidates (which don’t matter), so pick whichever.
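As a sketch, the whole comparison can be expressed as a single sort key in Python, assuming each vote is a list of candidates (best first, with disqualified candidates already removed) and that `priority[c]` gives a candidate’s position in the candidate priority order (lower is better). Lexicographically smaller keys then mean higher priority votes, so sorting by this key lists votes from highest to lowest priority:

```python
def vote_priority_key(vote, elected, priority):
    # Indices of unelected candidates, compared lexicographically first.
    unelected_indices = [i for i, c in enumerate(vote) if c not in elected]
    # Then the priority order positions of the unelected candidates...
    unelected_priorities = [priority[c] for c in vote if c not in elected]
    # ...and finally of the elected ones, to avoid arbitrary tie breaks.
    elected_priorities = [priority[c] for c in vote if c in elected]
    return (unelected_indices, unelected_priorities, elected_priorities)
```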

Small Elections and Anecdotal Results

This seems to work pretty well. In a lot of cases it just agrees with Gregory or Meek counting – in the large majority of STV elections it doesn’t seem to matter very much which method you pick, it’s only in edge cases that it makes a difference.

But there are some interesting edge cases where I think it comports itself fairly well.

The following election is one:

  • Two votes: 0, 1, 2, 3
  • Three votes: 0, 2, 3, 1
  • Two votes: 2, 0, 3, 1

In this election it produces a result that no fractional counting method can, and which happens with quite low probability in standard Hare-Clark voting, but I feel that it’s quite defensible as the correct result: It elects candidates 0, 1 and 2. Gregory or Meek counting would elect candidates 0, 2, and 3.

The reason for this is that the quota is two, and there are exactly two votes who would support 1 given the option (you could achieve the same effect with a larger number of votes and have the number supporting be slightly above the quota if you liked). This means that if any fraction of their vote transfers away then it is no longer possible for candidate 1 to be elected.

When Meek and Gregory counting disagree it doesn’t always seem to side with one or the other. e.g. in the following election it agrees with Gregory counting but not Meek:

  • Four votes: 0, 3, 2, 1
  • Five votes: 3, 0, 1, 2
  • One vote: 3, 2, 1, 0

When electing three candidates, Gregory and this method elect 0, 1 and 3 while Meek elects 0, 2 and 3.

There doesn’t appear to be any deep reason behind which method it agrees with here: The difference between Gregory and Meek is about which candidate drops out after 0 and 3 have been elected – under Meek candidate 2 retains a slightly higher score, so stays in.

In this method however there are no disqualifications at all: it just finds a mutually amicable assignment of votes. No drop out happens because everything worked out fine in the elect stage (anecdotally this seems to happen a lot). A reallocation is performed, but it doesn’t affect the results if you disable it.

This could easily have gone the other way. The following election picks the same candidates as Meek which is different from Gregory:

  • Three votes of 0, 1, 2, 3
  • Two votes of 0, 3, 1, 2
  • One vote of 1, 0, 2, 3
  • One vote of 1, 0, 3, 2

When electing three candidates, Gregory elects 0, 1 and 2. Meek and this method elect 0, 1 and 3.

The reasons are much the same as before: The difference between Gregory and Meek counting comes down to which candidate drops out after all the possible candidates have been elected, whereas this method just happily goes and finds a set of compatible votes:

  • Candidate 0 gets two “0, 1, 2, 3” votes
  • Candidate 1 gets “1, 0, 3, 2” and “1, 0, 2, 3”
  • Candidate 3 gets two “0, 3, 1, 2” votes

Unfortunately sometimes this system can demonstrate arguably worse results. In the following election it exhibits non-monotonic behaviour where the fractional methods behave correctly (though there are cases where the fractional methods also demonstrate non-monotonic behaviour – it’s intrinsic to STV):

  • Three votes: 0, 1, 2, 3, 4
  • Four votes: 1, 0, 4, 3, 2
  • Two votes: 4, 0, 2, 1, 3

When electing 4 candidates, all three methods agree this should elect 0, 1, 3 and 4. However, if we take one of the final votes of “4, 0, 2, 1, 3” and raise 3 from last to first place, all of a sudden we elect candidate 2 instead of 3 under this method, while the fractional methods’ results remain unchanged.

I don’t have a straightforward explanation for why this happens here. For some reason this causes 4 to get elected using the votes that would have been used for candidate 3 instead of the ones it used previously, but it’s not completely obvious to me why. It’s possible this might be a bug in my implementation. I’ll update this section when I know more.

Update: This happens because the rule for electing multiple candidates simultaneously has unexpected consequences. In the original case, 0, 1 and 4 are all elected in the same round. In the case where a ranking for 3 has been increased, 4 is elected in a separate round. I’ll look into whether changing the rule here is an improvement.

Even if this is correct, I’m not too worried. Non-monotonicity is a known flaw of STV so it’s not surprising this method exhibits it, even if it is slightly surprising it exhibits it in a place where Gregory counting does not.

In none of these examples did this method need to either disqualify candidates or redistribute votes. Here’s an example where it does need to redistribute votes:

  • One vote 0, 1, 2, 3
  • One vote 3, 0, 1, 2
  • Four votes 3, 1, 2, 0

When electing three candidates, Gregory, Meek and this method all agree that the correct candidates to elect here are 0, 1 and 3.

However, if you disable reallocation then this method does not agree, and elects 2 in place of 0. The reason for this is that the vote “3, 0, 1, 2” originally gets used to elect candidate 3, but once candidate 1 has become elected it gets swapped out for a “3, 1, 2, 0”. This gives 0 an extra vote, which causes it to become elected instead of dropping out.

This example would be different given a different priority order on candidates, but similar ones exist for any order.

Conclusion

For the cases where no candidates drop out when this method is run I feel from a philosophical point of view that it’s just obviously better than fractional methods: It successfully arranges a quota worth of actual voters for each candidate, each of whom is assigned a candidate from an initial prefix of their vote in which everyone ranked higher is also elected. You can imagine a process of mutual negotiation and swapping amongst voters to get there, at which point the single transferable vote requirement is satisfied.

Anecdotally this seems to happen a lot – the system is very good at finding such a satisfaction. I suspect this is in small or large part a function of how I’m generating examples to look at, but it’s still an encouraging sign.

For the cases where drop-out then happens I’m less convinced about the system making equitable choices, but it’s probably no worse than STV normally is (the drop out round is in many ways the least convincing part about STV), and may be better depending on how you calculate the priority order.

All told this is a really promising looking system. I’ll put together some proper open source code for trying it out at some point in the near future, along with some more serious evaluations.
