David R. MacIver's Blog: Turning deterministic strategy games into nondeterministic games of voting

Turning deterministic strategy games into nondeterministic games of voting

8 May 2012

I posted yesterday about using classic deterministic strategy games like Reversi (aka Othello) as a test bed for voting systems. The problem is that the experiment I described is actually quite hard to coordinate.

After some refinement of the idea I came up with the following randomization strategy for producing a game that, over enough runs, will produce something not totally dissimilar from this experiment.

The initial idea is to use a voting mechanism which can be repurposed into a variety of systems. I don’t know if this system has a name. I refer to it as gin ranking for historical reasons (I initially started using it on types of gin for a blind taste testing). It’s very simple: You first assign grades to everything, you then put the items in each grade in a strict order of preference.

This can then be turned into a vote in a variety of ways because it provides both grades and a strict ordering of the candidates: You can use a condorcet system, you can use approval voting on anything >= a certain grade, you can use range voting, you can use majority judgement, etc. So we don’t have to decide just yet what voting system we’re going to use (more on this later).

But we can also use this ranking in a single player game. Here is how you play Randomized Ranked Reversi:

It’s played on a normal reversi board with the normal reversi rules. All that is changed is that you may be started in an arbitrary board position and the way you make your move.

Your turn consists of being presented with a list of options for moves. I still don’t have a good sense of whether providing the list of all possible moves will be an overwhelming number. If is, you get presented a random subset.

You gin rank this set of moves. A move is then randomly selected based on your rankings. The exact mechanism doesn’t matter terrible - all that matters is that if you ranked A better than B you should be more likely to get A, and if you graded A better than B you should be significantly more likely to get it.

Your opponent can either be an AI or another player. For our purposes it will be an AI.

This gives a strong incentive to get your vote correct in all the particulars - ranking something well makes it likely to be picked, ranking something badly makes it less likely, and your grades then influence those probabilities even more strongly.

How do we use this game to run the previous experiment?

Well, the idea is we make a slight simplification and ignore history. All we care about is a board position (including whose turn it is) and the votes by users on moves from that position. So every play of the random game is a contributing a vote.

At any given time we have a population of people playing the game and a set of board positions we’re interested in the answer to. The goal is to get a minimum number of votes on each interesting board position.

When a new player starts they are given a random board position that we are currently interested in gathering more votes for. They then play randomized ranked reversi against an opponent who behaves as follows:

If there’s a move I could make that would put the player in a board position I’m currently interested, make that. If there is more than one, pick the one which currently has the most data already gathered
If there are existing votes for my position, pick one at random (preferring ones in which the player won) and run the RRR algorithm to pick a move from that
Fall back to some fairly dumb AI. Might even just pick a move at random

We can now use this to answer the experiment I previously proposed, as long as you restrict yourself to voting strategies that can be applied to gin ranking. You do it as follows:

In order to play a game as strategy X with N voters, you proceed as follows:

If we are in a state where we have M votes for that state we do one of three things:

If M is >= N, we sample a subset of size N from M (with or without replacement? Not sure), run the strategy on that sample and play the winning move
If M is < N but still pretty large we generate a population of size N by sampling with replacement and run the strategy
If M is too small, we submit it to our players as an interesting board position and get them to fill it out. We pause the simulation until they have done so.

What are the benefits of doing it this way?

has much lower coordination costs - people don’t need to wait for the other people in their group to vote in order to make progress
by not imposing a voting system up front you get a lot more reuse out of the data
It might actually be fun to play. I suspect playing the real version of the experiment would be a nightmare slog of waiting for other people to finish voting
If you institute some sort of leader board as an incentive to win, the players are given a fairly strong incentive to vote honestly and accurately

The main downside is that it suffers badly from sparsity of data - I expect 90% or more of positions in a given game will be unique. There are a lot of valid reversi boards. I don’t know how many, but I expect it to be on the order of 2^100. The fact that the valid moves are quite constraining and people will tend to make similar decisions should help, but I don’t know how much. It might be possible to help this by steering some subset of the players randomly assigned moves towards boards we want, but you can only do that so much before people cotton on and it starts to distort the results.

I still don’t know if I actually care enough to implement this, but in this incarnation it sounds like an interesting enough project that I might at some point.