Open data and fair elections are incompatible

Due to, ah, reasons which will soon become apparent, I’ve been interested in opening up the voting data from elections which use more appropriate voting systems than FPTP. Unfortunately it turns out that this is a bad idea.

You should read the paper. It’s short, well written and very interesting. But the summary of it is that when you’ve got something like Majority Judgment or a ranked voting system like AV there are a lot of redundant bits of information in your vote – so much so that this means that pretty much any individual vote will be unique. Malicious parties can then force you to encode messages in your vote – e.g. by saying “Vote for this person first, then vote for the remainder in this distinctive order”. If they then have access to the anonymised set of votes cast they can look for that distinctive pattern in the results and thus confirm that you have voted as you were told to vote. Its absence proves that you didn’t, and its presence proves that with very high probability you did.

This attack is pretty universal – it holds for almost any voting system that allows you to express a complicated enough set of preferences to produce a fair result. I do however feel the need to point out that there is at least one fair voting system that is immune to this…

P.S. This post represents two new things I’m planning to try with this blog in the near future: More promotion of “this is a thing that I found interesting” that is a little bit better curated than you just reading my pinboard feed and shorter blog posts which just point something out rather than constituting a fully formed essay. We’ll see how this plan goes.

This entry was posted in voting on by .

5 thoughts on “Open data and fair elections are incompatible

  1. Nick

    At least in AV, one could mostly fix this by truncating the published results such that you only publish each person’s first n votes, where n is the number of rounds necessary to decide the election. In general this will hide almost all identifiable information. (Although it’s true one could make a vote identifiable by getting somebody to vote for a bizarre unique combination of tiny parties in their first three places, this gives you the quantum effect of being able to buy a useful vote, or an observable vote, but not both at once.)

    A slightly more general fix that could apply to ranking systems in general would be to truncate each vote to its longest prefix with multiplicity above a certain number, which should be picked randomly within a range (say 5 – 20) and not published. Any votes that remain below the threshold with a prefix length of 1 remain unpublished. So you might get results of the form

    50 ABCDEF
    35 BCAEDF
    30 ABC*
    20 BCA*
    25 A*
    19 other votes

    This doesn’t seem quite right. It seems like it might be possible to recover the multiplicity cutoff by looking at the small numbers in the vote tally, and then if you found out the cutoff was 7 and you had 6 flunkies vote for BFEDCA you can infer that the person you blackmailed to vote that way didn’t. However, it makes it much more difficult, and impossible to guarantee, since you don’t know the cutoff beforehand.

    1. david Post author

      The problem with the multiplicity solution is that you can’t actually recreate the election from the results: You can seed the tiny parties throughout the vote to make most prefixes unique.

      I think a similar problem breaks the AV solution: Given enough tiny parties on the ballot, all you need to do is force people to vote for certain combinations of them first. These will drop out in the initial rounds, but will still show up in the truncated ballot.

  2. Nick

    It’s true that you could manage to make most prefixes unique, though it’s not clear that there’s any benefit in doing so: by forcing that ordering, all you ensure is that few detailed results are visible. I guess this harms transparency, but unless the lack of transparency lets you completely rig the election, you can’t really gain from it.

    Likewise in the av case – you could force unique prefixes, but then you stand a large chance of somebody being elected before enough tiny parties are eliminated to make your purchased votes useful.

  3. Terence Eden

    Is it a problem for UK elections? The most candidates ever to stand were 15 (Sedgefield 2005). Sure, with that many candidates your vote is liable to be unique.
    With 8 candidates, there are 8! == 40,320 possible voting combinations. Although, if you assume that lots of people are voting for candidate #1, that’s down to 7! == 5,040. That provides some “guarantee” that your vote won’t be unique.

    Although, I suppose, it’s possible to craft a voting order that is unlikely for anyone to “naturally” vote for – Labour, BNP, Green, etc…

    1. david Post author

      It’s hard to answer this question. Part of the reason why we have so few candidates in UK elections is because we have a system that is not remotely amenable to large numbers of candidates: There’s very little incentive to vote for anyone who is not one of the top couple of candidates, most people will not run without a party, and you have a deposit that you lose unless you get 5% of the vote (admittedly the deposit is not massive). It seems very likely that given a better voting system, more people would run.

      Even supposing you didn’t get more people running, you could probably still get a reasonable amount of mileage out of a more common signature – instead of assigning a unique signature to everyone you assign an implausible signature and count the number that satisfy that. If it’s not enough, punish people you were coercing (either all of them or a random subset of them subject to how many voted that way). Certainly this is a less useful exploit though.

Comments are closed.