Long ago, I proposed that the use of random ballot was a good way to elect a house of representatives. At the time I was only half serious. These days I genuinely think it’s a good idea.
To remind you how this works: Conceptually the idea is that everyone votes as they would under first past the post. Then when it comes time to actually pick the winning candidate, rather than the candidate with the most votes winning you pick a vote at random and use that. Thus the more votes you have, the higher the probability of winning, but you are never certain to win unless you get 100% of the votes.
This subject occasionally comes up in conversation and people complain that the problem with random ballot is that elections have to be accountable and that it’s impossible to make it accountable: You cannot verify the randomization because it’s not repeatable.
Fortunately, this isn’t true. This post is a proposed protocol for how to implement random ballot in a way that has the randomization be fully repeatable and difficult to tamper with. It uses physical sources of randomness as inputs to an electronic voting procedure. The electronic voting is open source, fully deterministic and all inputs to it are published so they may be verified by third parties.
The basic approach is that rather than actually picking a vote at random we tally the votes as we do under first past the post and then use a random number generator to draw a candidate from the resulting distribution. The only tricky bit, which is what this election protocol is designed to ensure, is being able to trust and verify our random number generator.
Edit: It’s been pointed out to me since that there exist much more sensible cryptographic schemes for generating random numbers reliably. Some variation on these should be used instead of the one I proposed here. e.g. each candidate + the people organising the vote counting simultaneously provide a random seed according to this scheme after the votes are counted, then the seeds are revealed, then the calculation is done. So the current details of random number generation are wrong, but I’ll leave them in for posterity. The rest remains valid.
The goals of this design are as follows:
- No individual constituency can conspire to change their answer
- No central power can conspire to change the overall answers
- Once we have counted our votes and obtained our initial random data, everything from that point on is deterministic and verifiable
It involves three components:
- Whatever is currently used for tallying votes. i.e. we count up the votes each candidate gets in exactly the same way we currently do
- A number of lottery ball machines with 256 balls in it labelled 0 to 255
- An open source program which implements our voting procedure
These are deployed as follows:
Firsts votes are counted. These vote counts are published as they currently are. Recounts may be demanded at this point. No recounts are permitted once we start rolling the metaphorical dice, so the whole election is stalled until everyone agrees to stop recounting.
A thing worth noting here is that recounting is much less valuable for random ballot than it is for first past the post: In first past the post if you have 51% of the vote and your opponent has 49%, you’ve won everything and so your opponent really really wants a recount because they’ve got nothing to lose by it. In random ballot you’re near as dammit tied and a recount can make your opponent’s situation worse as well as better, and isn’t likely to do much of either.
Now we have our votes counted it’s time to generate random numbers.
This is where our lottery balls come in. The goal is to generate 64 bits of random data for each constituency, i.e. 8 draws of one byte balls from the lottery machines.
First, each constituency draws 4 balls. They write down the numbers for these but do not publish them.
Then two central authorities which are physically separated and kept out of communication each draw two balls. These results are published.
The random data for each constituency is then the sequence of 4 balls they’ve drawn and 4 balls drawn by the central authority.
What’s the reasoning here?
Well, if the constituency is solely responsible for choosing the random data then they have the possibility of doing something like rejection sampling – they rerun the vote until they get the result they want.
If a single central authority then adds a source of randomness to the mix and they have access to the numbers from some of the constituencies (which they’re not supposed to have, but information can leak), then they can in theorydo similar: Rerun the experiment until they get a more pleasing distribution of seats.
By having two central authorities who are not communicating with eachother you remove the possibility of either of them doing this (they could manage to find a side channel and communicate, but this is hard enough that it makes an already hard problem essentially impossible).
So now we have our vote counts and we have our random numbers. What do we do?
Well, we publish all of these in the open for a start. This makes it possible to verify our calculations.
We now feed them in to our program. The program does the following:
- It concatenates all the numbers together into a single 64-bit integer
- It hashes that integer (this makes the ability to control any small number of balls in the sequence much less useful)
- It uses this hashed integer to seed a pseudo random number generator. I don’t know enough about PRNGs to comment on what is a good one here, but we don’t need fast performance here (even taking seconds per number would be more than fast enough enough) so lets assume it’s one good enough for cryptographic use.
- It sorts all the candidates by name
- It shuffles that list (this helps a little bit in protecting us against attempts to bias the PRNG towards low or high numbers. I’m not sure it’s a necessary step)
- We now generate a random number between 0 and the total number of votes
- We use this number to sample from the distribution we got from our vote count (basically counting off from the left until the running total is larger than the vote number we picked)
- That sample is our elected candidate
Although there are a lot of “If you could get a hidden communication channel in here and run a lot of simulations really fast and somehow rig the lottery machine to produce the answer you wanted” holes in this, I think they’re almost impossibly difficult to get anything through. The way we combine different information sources means that you need to subvert a lot of different features of the system in order to get a useful influence on the output. I’m reasonably confident that any interference with this procedure is significantly harder and more likely to be detected than more traditional forms of vote rigging, and that the way the information is used in the system is transparent enough to satisfy any reasonable desire for accountability.