The worst election

No, this isn’t about the recent US presidential elections.

In one of those “Yay I get to combine all my weird obsessions” moments, I’ve been playing around with using linear programming to construct provably minimal examples of specific election results. It actually works pretty well – it’s easy to to do for things like plurality voting, borda count and Condorcet winners, and it turns out to be almost possible to do for instant runoff voting (I haven’t figured out how to specify the instant runoff winner in a single linear problem, but you can specify an exact instant runoff drop-off order and then brute force by just considering all the relevant permutations).

Anyway, in the course of doing this I came up with the following really nasty small election example:

  • 6 votes of A, C, B, D
  • 3 votes of B, D, C, A
  • 3 votes of D, B, C, A
  • 1 vote of C, D, B, A

Why is this election so nasty? Basically the electorate is divided into two groups: A near majority who think A is great, and a bare majority who can’t agree on much of anything except that they hate A. If you want an interpretation of how these results came about, you could imagine that A is a right wing candidate in a slightly majority left-wing constituency, C is a centrist candidate, and B and D are two left wing candidates who are running against each other and have split the vote.

But regardless of interpretation, what results is an election that is very sensitive to which voting system you use. In particular four fairly common choices of system each elect a different one of the four candidates:

  • The plurality winner is A, with 6 votes.
  • The Borda count winner is B: The Borda scores are A at 31,  B at 35, C at 34, and D at 30.
  • The Condorcet winner is C:  7 voters (the first and last group) prefer it to B and D and 7 voters (everyone but the first group) prefer it to A.
  • The Instant Runoff Voting winner is D: In the first round C drops out with only 1 vs 3, 3 and 6 first choice preferences. This transfers a vote to D, causing B to drop out in the next round at 3 vs 4 and 6 first choice preferences. B then transfers their first choice votes to D as well, and finally D has a majority at 7 out of 13 votes, so D wins.

This is the smallest election with these winners and this specific drop-off order, in the sense that it has the fewest number of distinct votes cast (caveat: Where no individual vote has more than 100 voters casting it. This probably doesn’t matter), and amongst those elections with four distinct votes having this property it also has the smallest number of total votes (it also does some minimizing amongst those, but that’s more for aesthetics and doesn’t correspond to an obviously interesting property).

I originally thought all of the voting systems were doing badly on this election, but actually on further reflection I think they’re all behaving in an extremely typical fashion for them.

To frame this in terms of the interpretation above:

  • Plurality voting elects the right wing candidate because the left wing vote got split.
  • IRV collapses the two left wing candidates together and elects the one with slightly larger support due to the centrist tie breaker vote.
  • Borda vote does similarly but breaks the tie in the opposite direction because it counts the fact that the A voters prefer B over D for something.
  • The Condorcet candidate is a centrist compromise between the two sides that everybody can tolerate but nobody is thrilled about.

So which candidate you think is correct really depends on what you’re going for – there’s a legitimate question to be asked about how centrist you really want your elected leader to be, and about how justified that is in this case. If you decide that because you’ve got a bare majority the you should elect a more left-wing leader, the question of B vs D seems to come down to whether you want to go for a candidate the other side hates or one the other side really hates.

I don’t know about you, but right now having leaders the other side only hates sound pretty good to be honest. So although I didn’t intend this to be an argument for any of these systems, maybe Borda count comes out of this looking pretty good.

Given that, I feel the title is rather over promising matters. So here’s an even worse election:

  • 6 votes A, B, D, C
  • 5 votes C, A, B, D
  • 5 votes D, C, B, A
  • 2 votes B, C, A, D
  • 2 votes B, D, C, A

This has more or less the same properties as the above one: A is the Plurality winner, B is the Borda winner, D is the IRV winner.

However, C is no longer the Condorcet winner because there no longer is a Condorcet winner. Instead C is merely the Kemeny-Young winner, a system which always elects the Condorcet winner when there is one and has some claim to being the “best” such electoral system (in the sense of being the closest to capturing the same spirit as the Condorcet criterion), but is horribly impractical for any significantly large number of candidates.

To see why this happens, lets look at the majority preferences amongst the candidates:

  • 11 voters prefer A to B
  • 13 voters prefer A to D
  • B and C are tied with 10 votes each
  • 15 voters prefer B to D
  • 14 voters prefer C to A
  • 13 voters prefer D to C

So we have a Condorcet cycle where A is strictly preferred to B, B is strictly preferred to D, D is strictly preferred to C, and C is strictly preferred to A.

The Kemeny Young rule works by looking at all rankings of the candidates and giving them a cost associated by how much they disagree with this ordering. For each voter and each pair of candidates, the ordering incurs a cost of one if the voter disagrees with that ordering of that pair.

There are better algorithms for this than just considering all orderings of the candidates, but fortunately we only have four candidates so we can just brute force it to see why this works out this way.

The full list isn’t very informative, but here’s a partial list:

This gives us the following scores:

  • “C, A, B, D” scores 50
  • “A, B, D, C”, “B, C, A, D”, “B, D, C, A” and “C, B, A, D” each score 52
  • “B, A, D, C “scores 54
  • Everything else scores 58 or higher
  • “D, B, A, C” scores 70

(The example was specifically constructed to give “C, A, B, D” as the unique Kemeny-Young best ordering: In much the same way as with IRV, it’s easy to specify the Kemeny-Young ordering but hard to specify just the Kemeny-Young winner)

Because it’s the leader of the unique minimum cost ordering, C wins under the Kemeny-Young rule.

As you can see it’s quite a close fight – there isn’t a huge amount of difference between the lowest cost and the runner up cost, and in the second lowest cost orderings don’t really agree on who should win – the only thing that seems to be consistent is that it shouldn’t be D. This sort of close race tends to be why Kemeny-Young calculations become difficult very quickly.

I’m finding it hard to come up with an interpretation of this election, so lets run the instant runoff vote and see if it’s enlightening (I’m ambivalent about IRV as a system, but watching how votes transfer can help elaborate on coalitions).

Initially B drops out, having only four first choice votes. These split equally between C and D, leaving us with the following configuration:

  • 6 votes A, D, C
  • 7 votes C, A, D
  • 7 votes D, C, A

Now A, the previous plurality winner, has the lowest first choice votes so drops out and transfers votes to D:

  • 7 votes C, D
  • 13 votes D, C

So D wins the runoff over C.

So I think maybe the way to look at this is that A is an independent candidate of some sort that is distracting from the main stream set of candidates B, C and D. Lets see what happens to the election if we start by taking A out:

  • 8 votes B, D, C
  • 5 votes C, B, D
  • 5 votes D, C, B
  • 2 votes B, C, D

So B becomes the plurality candidate, with fully half the first choice vote. The majority preferences remain as they were (because they are not affected by the removal of A): C is still majority tied with B, B is still strictly preferred to D, D is still strictly preferred to C. So there’s still no Condorcet winner.

Removing has made instant runoff voting result in ambiguity though: C and D are tied as to who will drop out. If C drops out then it transfers votes to B, which wins. If D drops out, it transfers votes to C, which is now tied with B. In neither case does D, the original IRV winner, win! This is particularly interesting because the voters who put A first all strictly prefer B to D, and C to D, so removing their first choice candidate arguably results in a better result for them (if you break ties by flipping a coin, they get a better option 3/4 of the time).

The Borda winner (which can change by dropping a candidate) is still B.

I don’t know how enlightening that was, but it seems to reinforce the interpretation that B, C and D are an extremely fractious lot who are probably not going to get along, and unless we elect RON instead (the only candidate who can promise change!) we’re probably going to see the continuation of voting through other means.


If you enjoy these small election examples, I’m putting together a small ebook collating some of them. Let me know if you’d like to be a reader of an early draft.

Also (you’re going to be seeing messages like this a lot over the next year), if you like reading about this sort of thing and haven’t already, do consider voting with your wallet and sending some money to my Patreon for supporting my blogging here.

This entry was posted in voting on by .

Updating in favour of two-round delayed runoffs

I’m not a huge fan of IRV (Also known as AV or Hare voting), despite its status as being probably the most widely used single-winner ranked voting system in practice. It seems unclear whether it’s actually better than plurality voting, but it’s certainly not a lot better than plurality voting. About its only real redeeming value is that Single Transferable Vote is built out of it (and my generalized single transferable vote is still basically a form of IRV in disguise).

Still, people use it, which makes it worth thinking about and studying.

However, some examples I ran into yesterday have caused me to start to think that there are simple variations that are strictly better than IRV in the same way that Approval voting is strictly better than Plurality voting: It’s not that these systems are intrinsically amazing, it’s just that they are (or in this case may be) in all ways better than the systems they replace.

The variation is this: Instead of running potentially as many rounds as candidates, you only ever run two rounds. In the first round, everyone scores as per their plurality (first choice) votes. In the second round, the top two candidates from the first round stay in and everyone else drops out. You then elect whichever of those two candidates the majority prefers.

I’m not yet wholly convinced this is a strict improvement on IRV, but it leads to some things that I think in many cases will be.

The major reason I prefer it to IRV is that it acts as a rather pleasant blend of plurality and Condorcet method: As long as the Condorcet winner is one of the top two plurality winners, this will always elect the Condorcet winner.

Naturally the Condorcet winner is not always one of the top two candidates, but this is still a much stronger claim than IRV can make. The following election is one where the Condorcet winner is also the Plurality winner, but loses in IRV!

23 voters are electing a winner from 5 candidates: A, B, C, D and E.

  • 7 votes of A, B, C, D, E
  • 6 votes of D, A, B, C, E
  • 5 votes of B, A, C, D, E
  • 3 votes of C, B, A, D, E
  • 2 votes of E, D, A, B, C

The Plurality and Condorcet winner is A, but the IRV winner is B. The two round IRV winner is of course A because it is both plurality and Condorcet winner.

It can of course also go the other way when the Condorcet winner is not in the top two Plurality winners:


14 voters are electing a winner from 4 candidates: A, B, C and D.

  • 5 votes of D, B, A, C
  • 4 votes of A, B, C, D
  • 3 votes of B, A, C, D
  • 2 votes of C, B, A, D

The IRV and Condorcet winner is B, but the two-round IRV winner is A (the Plurality winner is D here): In this case the Condorcet winner is the third place candidate with the Plurality votes. In full IRV when C drops out it transfers enough votes to B to keep it in the race so A drops out instead of it and then it beats D, but in two-round IRV only A and D make it to the second round, and then A beats out D on majority preferences.

In many ways Two Round IRV is out-IRV-ing IRV here: The way in which IRV tends to fail to elect Condorcet winners is that it gives too strong a weight to first choice preferences. So this is a failure mode that is already present, it just happens that IRV manages to avoid it in this example and two-round IRV does not. I do think that B would be a much more solid choice of winner here (as well as being the Condorcet winner, it’s also the Borda winner), but I’m not very surprised that a system would get it wrong.

So my current feeling from these examples is that two-round IRV is not obviously worse than IRV and that the promise of “often” giving the Condorcet winner is a modestly strong recommendation for it over IRV.

I haven’t actually done the simulations to check (I couldn’t get the original code to compile and haven’t yet put in the effort to write my own), but I would also expect two-round IRV to look a lot smoother than full IRV because of this criterion – IRV has a lot of weird spiky edges in the geometry of voting that I would expect to be damped down by the fact that it usually agrees with the Condorcet winner where that’s in the same rough region as the Plurality winner.

So that’s my evidence for two-round IRV being surprisingly competitive with full IRV and possibly better. Additionally, it’s easier to explain and simpler to follow the reasoning of an election. IRV is not itself that hard to explain or follow the reasoning of, but it’s nice when a system can both be better and simpler.

In and of itself these differences aren’t very interesting and I suspect there’s likely not much in it, but it leads to additiona interesting variations. In particular:

  1. If we’re only doing two rounds, why on earth are we using such a bad voting system for selecting who makes it into the second round? Why not e.g. use Borda count and then do a majority runoff between the top two?
  2. If we’re only doing two rounds, is it really worth making it an instant runoff?

Both of these questions stem from a single root question: Why are we making voters go to all the work of providing a full ranking and then using so little of it?

Full rankings are quite a lot of work (even partial rankings are a fair amount of work), and while IRV in theory uses most of your ranking, two round IRV uses very little of it – it only looks at your first choice preferences and then the order in which you put two pairs. Why not just split this out into a normal non-instant runoff?

And, while you’re at it, if you’re using a normal non-instant runoff why are you bothering to use plurality voting for the first round? As I mentioned at the top: Plurality voting is a sign you should be using Approval voting.

So, if you’re using IRV why not consider the following non instant variation? My suspicion is that it will be outright better:

  1. in the first round, vote for candidates with Approval voting.
  2. In the second round, pick the top two candidates from the first round and do a simple majority vote between them.

It does require people to go out to the polls twice, which may be a reason to prefer IRV and variants for large scale elections (though there are plenty of examples of people doing it anyway). However for a lot of use cases it’s very practical to just run two rounds. e.g. I discussed the best democratic solutions for picking a lunch venue recently, where the polls are literally just people raising their hands. It’s not substantially harder to just pick the top two options from the previous election and hold another election for lunch venue.

So, for many and possibly most use cases if you’re tempted to use IRV I would recommend using the above delayed runoff system instead.

This may also be an improvement on straight Approval voting too: One of the problems I have with Approval voting is that it lacks the ability to express a nuanced distinction between two candidates: You can say “I’d be OK with either of these candidates” but you can’t say “But given a choice between them I’d definitely pick this one”. Adding a follow up runoff election adds some of that nuance back in.

One question is whether you always need the runoff election. Suppose only one choice makes it through the first election with a majority of approvals. Should you still run a runoff between that and the second choice?

My suspicion is yes, not because I expect it to reverse the decision in that case but because I think it’s likely to reduce tactical voting in the first stage: If there’s the possibility of no runoff election, you’ll want to very carefully choose your approvals so that your favourite candidate makes it to majority and any other candidate you favour doesn’t.

I don’t know how big a factor that is, and always holding the runoff certainly doesn’t eliminate tactical voting (nothing will except ignoring peoples’ votes in the first round and replacing it with a pure lottery over all the candidates. For many cases this might be a reasonable thing to do, but people seem not to like it very much).

This piece will end without a resounding conclusion, because I haven’t done enough work or reading to decide for certain, but I do feel fairly confident in the claim that a delayed two-round runoff with approval voting in the first round will often be a better choice than IRV and you should consider using it.

Also, if you enjoy these small election examples, I’m putting together a small ebook collating some of them. Let me know if you’d like to be a reader of an early draft.

Finally (you’re going to be seeing messages like this a lot over the next year), if you like reading about this sort of thing and haven’t already, do consider voting with your wallet and sending some money to my Patreon for supporting my blogging here.

This entry was posted in voting on by .

Small elections comparing Majority Judgement, Range Voting and Condorcet winners

I was interested in how range voting, majority judgement, and Condorcet winners interacted, so I thought I’d put together some small elections to demonstrate this. Perhaps unsurprisingly the answer is they’re fairly orthogonal.

The setup is that we’re considering 5 point scores, 0 through 4 (or if you prefer you can interpret these as “Terrible, bad, OK, good, great” for the majority judgement case).

Here is an election where the range winner is the Condorcet winner. We have three voters voting between two candidates, and they assign scores as follows:

  • 3, 4
  • 3, 2
  • 0, 1

The first candidate has a range voting score of 6 and the second has a range voting score of 7, so the second wins the range vote. But the median score for the first is 3, while the median for the second is 2, so the first candidate wins the majority judgement vote. However, two out of three voters (the first and the third voter) prefer the second candidate to the first, so the second is also the Condorcet winner.

Conversely, in the following election the majority judgement winner is the Condorcet winner:

  • 0, 1
  • 0, 1
  • 3, 0

The first candidate is the range voting winner because it totals 3 rather than 2, but it’s the majority judgement loser because it has a median score of 0 rather than 1. The Condorcet winner is also the second candidate, because the first two voters prefer it.

Sometimes range voting and majority judgement agree but they agree on someone who isn’t the Condorcet winner:

  • 0, 1
  • 1, 2
  • 2, 0
  • 2, 0
  • 2, 3

The first candidate scores 7 vs the second’s 6, so it’s the range winner. It also has a median of 2 vs the second candidate’s 1, so it’s the majority judgement winner. But the first two and the last voters strictly prefer the second candidate, so the second candidate is the Condorcet winner.

Finally, here’s an example where all three disagree:

  • 1, 2, 0
  • 2, 0, 3
  • 3, 4, 3
  • 3, 4, 3
  • 4, 0, 3

The first candidate is the range voting winner (13 vs 10 and 12), the second candidate is the Condorcet winner (3 out of 5 voters strictly prefer it) and the third candidate is the majority judgement winner (median 3 vs 2 for the second. It ties on the first median with the first candidate, but on removing one 3 score from each candidate the first candidate breaks downwards to 2 while the third stays at 3).

I don’t have strong opinions on what the right answers for any of these elections are, other than “it’s complicated”. In particular I don’t think any of the three options (Condorcet, Range and Majority Judgement) obviously produce the best outcome in all of these examples.

My rough intuitive judgement (formed without explicitly reminding myself of which systems won which, though I’m probably still biased by that knowledge) goes:

  • There’s not much in it but the second is slightly better (range vote, which is also Condorcet)
  • This election sucks, but the second candidate sucks less (majority judgement, which is also Condorcet)
  • First candidate seems a much better compromise (majority judgement and range voting)
  • Third candidate  seems pretty universally popular other than the first voter who kinda hates everyone, so seems the best option (majority judgement only).

So on this blatantly biased sampling of four elections I like the range voting winner on three of them, the majority judgement winner on three of them, and the Condorcet winner on two of them. I don’t think that significantly updates my opinions about any of the three mechanisms, but it’s still interesting to see the comparison.

This entry was posted in voting on by .

2017 Goal: Recurring Revenue

It’s a bit early for new years resolutions, but I don’t believe in new years resolutions anyway so oh well.

I’ve been self-employed for nearly two years now, and an ongoing question is whether I should stay that way. I keep moving the goal posts for when it’s time to quit and get a day job, so this post is my attempt to fix a set of goal posts (or, well, to fix the derivative of the goal post’s position over time. More on that in a bit), as setting concrete decision criteria is the best way to deal with the various biases my brain throws out.

First, cards on the table: The reason this is a question at all is that from a financial point of view my attempts to go it alone are so far a failure. I am currently earning significantly less than I did in my first year as a software developer.

I didn’t go into this for the money – instead I wanted the freedom to work on things like Hypothesis, which I genuinely think makes the world of software development a much better place but nobody would pay me a salary to work on – but unfortunately I do still need money to live, and the financial aspect has become a significant enough source of stress that it’s come to dominate my decision making around this.

Additionally, I have the major problem that none of this is recurring – everything I’ve earned in the last year has been the result of artisanal hand-crafted contracts, largely of a form which have not been amenable to repeating. On top of that, there have been a number of potential contracts which I sunk a lot of work into getting that ended up not panning out, which is extra depressing when it happens.

The result is a weird sort of boom and bust cycle where when things are going well I’m making a tonne of money, but things are not going well often enough that it adds up to doing well in aggregate, and I can never count on any of it until the money is more or less in the bank.

Some/all of this is because I am bad at sales and marketing, and i should learn to do better at sales and marketing, but my track record so far does not suggest that I am going to do that at a rate which is sufficient to help.

The result is that this is all significantly more stressful than just earning a salary would be, and that it sucks most of my desire to do this because as well as making the whole process unenjoyable it also causes me to significantly doubt that what I’m doing is worthwhile – if people tell you that they love your stuff, but that the price they’re willing to pay for it is £0, is that really a worthwhile thing to be doing?

So it’s time to refocus. If the problem is this boom and bust cycle, the solution is to get a steady baseline. As you have probably guessed from the title, my goal for 2017 is recurring revenue.

The following are the rules that I’m setting for myself:

  • My goal is £2000/month recurring revenue.
  • My base deadline for achieving this goal is end of December 2017.
  • Every £5000 of non-recurring revenue allows me to push the deadline date back one month (but I should think long and hard about whether it’s worth doing so).
  • If things look obviously hopeless before then I may quit early.
  • If by the time the deadline arrives I have not achieved the goal, I am going to go get a normal person day job (though I won’t abandon the things I’m doing to get the recurring revenue).

This isn’t a lot of money. £2000/month is still less than my first year salary as a developer (which was £27.5k / year + a bonus scheme), but the important thing about it is that it’s more than enough to live on, which makes any revenue I make on top of that a nice bonus and makes money stop being a source of stress.

Note that the goal is recurring revenue, not passive income. I’m not looking to earn money for doing no work, I’m just looking to reliably earn money. Retainer schemes where I do, say, a day of work a month for a company sound great.

Concrete planning and how you can help

On my end, I have two or three possible projects – one in the works, one in planning, one in a vague concept stage – that should help with this. I don’t think between them they’ll make £2000/month by the end of 2017, but they might get me a decent chunk of the way there. I will promote these heavily to you when they are in a suitable state for that.

I now have a Patreon for this blog. I don’t expect it to be even close to enough to achieve this goal on its own, but in my most optimistic moments I think it might maybe get me a quarter of the way there. So if you enjoy my writing and would like me to continue it at my fairly prolific rate, please do consider becoming a patron. Note: This Patreon will self-destruct on the recurring revenue deadline date if it has not reached $500/month by then.

Finally, if you work for a software company (or any other sort of company!), do consider if there’s some sort of way I can help you on an ongoing basis – from a corporate point of view, £2000/month is peanuts. If I can get, say, five companies who are interested in paying for a day of work a month from me on an ongoing basis then I would immediately blow this goal out of the water and I could go back to worrying about more interesting things. So, think about it. Email me if you want to discuss how I can help (with no implied commitment on your part – I’m happy to just talk about it) – it doesn’t have to be Hypothesis related at all. It turns out I know quite a lot about this software lark and I’m more than happy to help people develop it in any way they’d find useful. I’d also be happy to help with mentoring new developers if you want someone to take the load off your existing senior developers.

This entry was posted in Business, life on by .

The littlest infinity

Note: This is mostly all stuff you’d probably learn in your first six months of a maths degree, I just like occasionally chewing on the fundamentals and seeing how they fit together. There’s nothing very deep here, but it might be pedagogically interesting.

You’re probably familiar with the natural numbers: They’re a set \(\mathbb{N}\) with a distinguished element \(0 \in \mathbb{N}\) and a successor function \(S: \mathbb{N} \to \mathbb{N}\) which is injective, has \(0 \ne S(n)\) for all \(n \in \mathbb{N}\) and has the property that if \(A \subseteq N\) with \(0 \in A\) and \(S(A) \subseteq A\) then \(A = \mathbb{N}\) (“the induction principle”).

The natural numbers are the most basic example of an infinite set, and are usually postulated to exist as an axiom of set theory. We then construct all other infinite sets starting from \(\mathbb{N}\) using e.g. power set operations.

But what is an infinite set?

There turn out to be more or less three ways of defining this, all of which are equivalent (under certain assumptions).

One way of defining it is any set which is not a finite set, but that reduces to the question “What is a finite set?”. To which the answer is any set which is in bijection (one to one correspondence) with one of the sets \(\{i \in \mathbb{N} : i < n\}\) for some \(n \in \mathbb{N}\) (you need to define the relation \(<\) on \(\mathbb{N}\) for this, but lets assume we’ve done that in some standard way e.g. by induction).

You can show inductively then that \(\mathbb{N}\) has no injection into a finite subset of itself (and thus is infinite) as follows: It’s certainly not in bijection with the empty set, which is \(\{i: i < 0\}\). Suppose you had \(f: \mathbb{N} \to \{i \in \mathbb{N} : i < n + 1\}\) a bijection. By composing it with a swap if necessary, you can assume that \(f(0) = n\). But then \(f \cdot S\) is a bijection with \(\{i \in \mathbb{N} : i < n \}\), which contradicts our induction hypothesis.

Another definition of a set being infinite is that it contains a bijective copy of \(\mathbb{N}\) (i.e. there is an injection from \(\mathbb{N}\) into it).

If there is such an injection then the set is certainly infinite in the sense of not being finite, as otherwise we’d be able to compose the injection from \(\mathbb{N}\) into it with its bijection to a finite set, which contradicts what we showed above about \(\mathbb{N}\) being infinite.

To show the opposite direction we need the axiom of choice (we actually only need a weak form of it, but we’ll ignore that detail): For any set \(X\) there is a choice function \(c : \mathcal{P}(X) \setminus \{\emptyset\} \to X\) with \(c(A) \in A\).

If \(X\) is infinite in the sense of not being finite, we can then construct a function \(f: \mathbb{N} \to X\) as follows:

  • \(f(0) = c(X)\)
  • \(f(S(n)) = c(X \setminus \{f(0), \ldots f(n)\})\)

This is well defined because the set we’re passing to \(c\) must be non-empty, or we’d have constructed a bijection with \(\{0, \ldots n\} = \{i : i < n + 1\}\).

This definition of infinity justifies the fundamental role of the natural numbers (and the title) to some degree: They’re literally the smallest infinite set, because a copy of them is contained in every other infinite set.

But finally, there is a third definition of infinity which has no reference to the natural numbers at all: \(X\) is infinite if there is some injective function \(f: X \to X\) whose range is a proper subset of \(X\). i.e. \(f\) is injective but not surjective. You can justify this by analogy with the natural numbers as our basic example of an infinite set: \(S\) is such a function, because it is injective but does not map anything to \(0\). You can also show inductively that all of the sets \(\{i: i < n\}\) do not have this property (more or less by the same argument we used to show that there was no injection from \(\mathbb{N}\) into them).

Lets see that this is equivalent to the previous definition.

First, suppose we’ve got some injection \(h: \mathbb{N} \to X\). We can construct such an \(f\) as follows:

  • if \(x \neq h(n)\) for any \(n\), then \(f(x) = x\)
  • Let \(f(h(n)) = h(S(n))\)

i.e. we define our function to be the successor function on our copy of \(\mathbb{N}\) and the identity elsewhere.

This is still injective, because it’s injective on each member  of the partition into \(h(\mathbb{N})\) and \(X \setminus h(\mathbb{N})\), and it doesn’t map either part into the other. It is however not surjective, because it maps no values to \(h(0)\).

So a set which contains a copy of \(\mathbb{N}\) must necessarily be infinite in this sense.

Now, suppose we have such an \(f\). We can use this to construct an injection from \(\mathbb{N}\) into \(X\) as follows:

  • Pick \(h(0)\) as an arbitrary element of \(X \setminus f(X)\)
  • Let \(h(S(n)) = f(h(n))\)

We can prove this is injective by inductively showing that \(h(n) \not\in \{h(m): m < n\}\)

Proof: Trivially true for \(0\) because that’s the empty set. For \(S(n)\), note that again \(h(S(n)) \ne 0\) because by choice of \(h(0)\) we must have \(f(x) \ne h(0)\) for any \(x\).

Supposing \(h(S(n)) = h(m)\) for some \(0 < m \leq n\), we must have \(f(h(n)) = h(m)\). But we can write \(m = S(k)\) for some \(k\) because \(m \neq 0\), so we have \(f(h(n)) = f(h(k))\). \(f\) is an injection, so \(h(n) = h(k)\). By our inductive hypothesis, \(h\) is an injection on \(\{1, \ldots, n\}\), so we must have \(n = k\). Therefore \(h\) is an injection as desired.

So we have three notions of infinity, all equivalent (up to the axiom of choice). Why do we care?

Well, one interesting thing about the last definition is that it’s a notion of infinity that has nothing to do with the natural numbers. This has the nice benefit that we can run it backwards: If we have a set that is infinite in that sense, then we can construct the natural numbers out of it. This is a pretty strong justification of the natural numbers as our fundamental notion of an infinite set.

To see how to do this, suppose we’ve got a set \(X\) and an injective function \(S : X \to X\) which is not surjective.

Pick some distinguished element \(0 \in X \setminus S(X)\).

Now consider the family \(\mathcal{F}\) of sets \(U \subseteq X\) with the properties that \(0 \in U\) and \(S(U) \subseteq U\). This family is nonempty because trivially \(X \in \mathcal{F}\).

Now, define \(N = \bigcap \mathcal{F}\).

We must have \(N \in \mathcal{F}\), because \(0 \in N\) and if \(U \in \mathcal{F}\) then \(S(N) \subseteq S(U) \subseteq U\), so taking intersections over all \(U \in \mathcal{F}\) we have \(S(N) \subseteq \bigcap \mathcal{F} = N\).

Moreover, if \(A \subseteq N\) with \(0 \in A\) and \(S(A) \subseteq A\), then \(A \in \mathcal{F}\), and thus \(N \subseteq A\), and so \(N = A\).

But these are precisely the requirements for \(N\) to be a model of the natural numbers, \(\mathbb{N}\)! We have a distinguished element \(0\), an injective function \(S\) that doesn’t map anything to \(0\), and it satisfies the induction principle. And, importantly, we got here without assuming that anything that looked like the natural numbers existed: All we required was the apparently weaker claim about some form of infinite set existing.

So pretty much wherever you start in defining infinite sets, you’re going to end up with the natural numbers at some point.

In the absence of at least some weak form of the axiom of choice these equivalences break down a bit, but they’re still compelling enough and strong enough that they fairly firmly place the natural numbers as the foundational object of infinite set theory.

This entry was posted in Numbers are hard on by .