Category Archives: voting

An election protocol for implementing random ballot in an accountable manner

Long ago, I proposed that the use of random ballot was a good way to elect a house of representatives. At the time I was only half serious. These days I genuinely think it’s a good idea.

To remind you how this works: Conceptually the idea is that everyone votes as they would under first past the post. Then when it comes time to actually pick the winning candidate, rather than the candidate with the most votes winning you pick a vote at random and use that. Thus the more votes you have, the higher the probability of winning, but you are never certain to win unless you get 100% of the votes.

This subject occasionally comes up in conversation and people complain that the problem with random ballot is that elections have to be accountable and that it’s impossible to make it accountable: You cannot verify the randomization because it’s not repeatable.

Fortunately, this isn’t true. This post is a proposed protocol for how to implement random ballot in a way that has the randomization be fully repeatable and difficult to tamper with. It uses physical sources of randomness as inputs to an electronic voting procedure. The electronic voting is open source, fully deterministic and all inputs to it are published so they may be verified by third parties.

The basic approach is that rather than actually picking a vote at random we tally the votes as we do under first past the post and then use a random number generator to draw a candidate from the resulting distribution. The only tricky bit, which is what this election protocol is designed to ensure, is being able to trust and verify our random number generator.

Edit: It’s been pointed out to me since that there exist much more sensible cryptographic schemes for generating random numbers reliably. Some variation on these should be used instead of the one I proposed here. e.g. each candidate + the people organising the vote counting simultaneously provide a random seed according to this scheme after the votes are counted, then the seeds are revealed, then the calculation is done. So the current details of random number generation are wrong, but I’ll leave them in for posterity. The rest remains valid.

The goals of this design are as follows:

  1. No individual constituency can conspire to change their answer
  2. No central power can conspire to change the overall answers
  3. Once we have counted our votes and obtained our initial random data, everything from that point on is deterministic and verifiable

It involves three components:

  1. Whatever is currently used for tallying votes. i.e. we count up the votes each candidate gets in exactly the same way we currently do
  2. A number of lottery ball machines with 256 balls in it labelled 0 to 255
  3. An open source program which implements our voting procedure

These are deployed as follows:

Firsts votes are counted. These vote counts are published as they currently are. Recounts may be demanded at this point. No recounts are permitted once we start rolling the metaphorical dice, so the whole election is stalled until everyone agrees to stop recounting.

A thing worth noting here is that recounting is much less valuable for random ballot than it is for first past the post: In first past the post if you have 51% of the vote and your opponent has 49%, you’ve won everything and so your opponent really really wants a recount because they’ve got nothing to lose by it. In random ballot you’re near as dammit tied and a recount can make your opponent’s situation worse as well as better, and isn’t likely to do much of either.

Now we have our votes counted it’s time to generate random numbers.

This is where our lottery balls come in. The goal is to generate 64 bits of random data for each constituency, i.e. 8 draws of one byte balls from the lottery machines.

First, each constituency draws 4 balls. They write down the numbers for these but do not publish them.

Then two central authorities which are physically separated and kept out of communication each draw two balls. These results are published.

The random data for each constituency is then the sequence of 4 balls they’ve drawn and 4 balls drawn by the central authority.

What’s the reasoning here?

Well, if the constituency is solely responsible for choosing the random data then they have the possibility of doing something like rejection sampling – they rerun the vote until they get the result they want.

If a single central authority then adds a source of randomness to the mix and they have access to the numbers from some of the constituencies (which they’re not supposed to have, but information can leak), then they can in theorydo similar: Rerun the experiment until they get a more pleasing distribution of seats.

By having two central authorities who are not communicating with eachother you remove the possibility of either of them doing this (they could manage to find a side channel and communicate, but this is hard enough that it makes an already hard problem essentially impossible).

So now we have our vote counts and we have our random numbers. What do we do?

Well, we publish all of these in the open for a start. This makes it possible to verify our calculations.

We now feed them in to our program. The program does the following:

  1. It concatenates all the numbers together into a single 64-bit integer
  2. It hashes that integer (this makes the ability to control any small number of balls in the sequence much less useful)
  3. It uses this hashed integer to seed a pseudo random number generator. I don’t know enough about PRNGs to comment on what is a good one here, but we don’t need fast performance here (even taking seconds per number would be more than fast enough enough) so lets assume it’s one good enough for cryptographic use.
  4. It sorts all the candidates by name
  5. It shuffles that list (this helps a little bit in protecting us against attempts to bias the PRNG towards low or high numbers. I’m not sure it’s a necessary step)
  6. We now generate a random number between 0 and the total number of votes
  7. We use this number to sample from the distribution we got from our vote count (basically counting off from the left until the running total is larger than the vote number we picked)
  8. That sample is our elected candidate

Although there are a lot of “If you could get a hidden communication channel in here and run a lot of simulations really fast and somehow rig the lottery machine to produce the answer you wanted” holes in this, I think they’re almost impossibly difficult to get anything through. The way we combine different information sources means that you need to subvert a lot of different features of the system in order to get a useful influence on the output. I’m reasonably confident that any interference with this procedure is significantly harder and more likely to be detected than more traditional forms of vote rigging, and that the way the information is used in the system is transparent enough to satisfy any reasonable desire for accountability.

This entry was posted in voting on by .

Update on resampling voting systems

In my previous post I wrote about resampling the results of voting systems. I defined a function \(S(n)\) which was the probability of the result changing when you sampled N voters from the population with replacement.

One of the questions I asked was what happens as \(N \to \infty\). I now have a partial answer to that.

This solution only addresses voting systems where you can get the answer by looking at the proportions of votes. Such voting systems can be regarded as functions

\[ \nu : \mathbb{P}(V) \to \mathbb{P}(C) \]

Where \(\mathbb{P}(X)\) is the set of probability distributions on \(X\), \(V\) is the set of possible votes and \(C\) is the set of possible candidates.

I’m pretty sure most conventional voting systems can be viewed in this way, or can be modified to be viewed this way. My only slight concern is that most of them are concerned with integer tallies. However I think they mostly have scaling properties that mean they can be viewed this way anyway (I haven’t checked this properly and I’m pre-coffee at time of writing this post, so this may be an obviously stupid belief).

At the very least however my two examples in the previous post can definitely be viewed this way: First past the post is the function that takes the highest element and produces a probability distribution that returns that with probability 1 (it needs a tie breaking rule. A reasonable one might be a uniform distribution across the highest values). Random ballot we have \(V = C\) and the function is just the identity function. Picking a random Condorcet winner and alternative vote can both be converted with only slightly more effort, so I think there’s at least a reasonably large subset of voting systems which can adopt this formalism.

Anyway, this formalism done, we can answer the question for most such voting systems about what the limiting behaviour of \(S(n)\) is:

Given a voting system \(\nu\) and a point \(x \in \mathbb{P}(V)\), if \(\nu\) is continuous (when regarding \(\mathbb{P}(V)\) as a subset of \(\mathbb{R}^{|V|}\)) at \(x\) (the original distribution of votes in our population) then \(S(n) \to \sum_{c \in C} \nu(x)(c)^2 \), the probability that running the election twice will produce the same result.

Why? Well, basically because by continuity we can find a neighbourhood of \(x\) which makes the distribution of running the voting system on anything in that neighbourhood arbitrarily close to the distribution of \(\nu(x)\). We can then make \(n\) sufficiently arbitrarily large that by the law of large numbers the distribution of the population is going to be within that neighbourhood with probability arbitrarily close to 1. So we can make the distribution of the vote outcomes arbitrarily close to the distribution of the original outcome by making \(n\) large enough, and hence the result follows.

When is this continuity assumption satisfied?

Well, deterministic voting systems are never continuous for all \(x\). Why? Topology! The input space is a connected set, the output space is a discrete set (it’s the set of vectors which have one coordinate set to 1 and all others set to 0). However most of them are continuous away from the boundaries I think – that is, if you’ve not had to invoke some tie breaker mechanism you’re probably continuous at that point (I saw some good visualisations of this a while ago but I can’t find them right now). First past the post in particular is continuous if you don’t have a tie.

Anyway, given this result, I’m largely satisfied as to what the large \(n\) behaviour of \(S(n)\) is.

I’m still interested in the small \(N\) question. I think it’s fairly clear that \(S(1)\) may be quite far from 1 indeed. For example in alternative vote, \(S(1)\) will give you the probability of having the highest number of first votes and will completely ignore the crucial secondary and higher votes. I suspect \(S(3)\) is an interesting number, as it’s the first point at which you can have interesting majority behaviours starting to take effect.

This entry was posted in voting on by .

A resampling statistic for voting systems

Suppose we have a population of voters, each of them casting a vote in some (not necessarily deterministic) voting system.

Let w be the winning candidate.

Consider the following experiment:

Fix some number N. Draw a sample of N voters with replacement (i.e. you may draw the same voter more than once). Run the election again. What is the probability of the result of the vote being different from the vote on the whole population (if the voting system is non-deterministic, rerun the system on the whole population too)?

Call this number S(N).

I suspect S(N) might reveal interesting behaviours of voting systems, but I haven’t yet analyzed this in any great detail or thought about it much.

It does however have the interesting property that it isn’t dependent on the type of voting system used – it works equally well for ranked, graded or scored votes, and for deterministic and non-deterministic systems, so it provides an interesting way of comparing potentially very different voting systems.

Some interesting questions one may reasonably ask:

  • What is the limiting behaviour of S(N) as \(N \to \infty\)?
  • Is S(N) monotonic increasing?
  • What is the behaviour like for very small N? (I don’t have a precise formulation of this yet)
  • Is there anything special about S(original number of voters)?

Two examples:

For first past the post voting, \(S(1)\) is the fraction of the population who voted for the candidate, and \(S(N) \to 1\) as \(N \to \infty\), because as \(N \to \infty\) the fractions of the samples who vote for a given candidate concentrate on the fractions in the original sample. I think \(S(N)\) is monotone increasing but have only proven it for the two candidate case. It’s intuitively plausible though.

For random ballot, \(S(N) = \sum_{i = 1}^m p_i^2 \), where we have candidates \(1 \ldots m\) and \(p_i\) is the fraction of the population who voted for candidate \(i\). Note that this is independent of N. This is because picking N candidates then picking a random one of them is exactly the same as just picking a random candidate in the first place, so \(S(N)\) is just the probability of running the election twice and getting the same result.

I haven’t worked out the answers for other, more complicated, systems. I would be interested to do so, and may well at some point, but if someone wants to do it for me or tell me if it’s already been done that’d be cool too.

This entry was posted in voting on by .

An interesting experiment in social choice

The following experiment design has occurred to me. I think it would be interesting, but I’m not sure it would be quite interesting enough for me to put in the effort to create it.

The domain of experiment is as follows: Given a presumed homogenous group trying to make a decision about a shared goal, what is the best decision procedure (i.e. voting system) for combining their opinions about it into a single decision?

The setup is as follows.

We take a strategy game. I picked reversi as a nice sweet spot between complicated and simple (it also has the nice property that the number of valid moves is quite constrained). Teams of people are assigned to black and white respectively.

A turn is played as follows:

The set of valid moves is generated. If there are no valid moves or only one play proceeds in the obvious constrained fashion without user input.
If there are more than some cap of valid moves pick a random subset of them (I’m not sure this is necessary. The maximum number of valid moves in reversi shouldn’t be very large).
The team then votes in some manner on which move to take.
The move which the voting system declares to be the winner is taken.

Note that this requires that the decision procedure not allow ties. This isn’t a terrible constraint as you can just add a random tie breaker to the decision procedure.

What’s the output of this experiment?

Each game generates the following data point:

Black had N players using voting procedure X
White had M players using voting procedure Y
C won

(it also generates the complete history of the game if that’s useful or interesting, but I suspect that’s too complicated to properly analyze)

A sufficiently large sample of this data can be used to answer a number of questions. The following are particularly interesting:

  1. Given the same size of group, does voting procedure X tend to beat voting procedure Y?
  2. Given a fixed voting procedure X, what is P(X applied to a group of size N beats a single opponent) as a function of N?
  3. Given a fixed voting procedure X, what is P(X applied to a group of size N beats X applied to a group of size M) as a function of M, N

Obviously you can’t then go on from this to say “This is the best voting procedure” as it’s applied to a very simple case, but I think the results would be interesting.

This entry was posted in voting on by .

Ha ha, only serious

So 12,640,417 people told me the following joke earlier:

Q: If I ask you if I should listen to your advice, and you tell me “no”, what should I do?
A: I don’t know, but I bet the UK is going to spend the next couple decades finding out.

This entry was posted in voting on by .