Category Archives: Uncategorized

Binary tournaments

One of the problems with the tournament model I’ve been discussing is that it’s not a very realistic model of how tournaments are normally played.

A more normal tournament design is to split the set of players into two groups, let the two groups duke it out recursively and then play the winners.

You actually can model this in the tournament setup I’ve described (you draw the hierarchy then pick a pair with a lowest common ancestor), but it’s not very natural. This seems like it needs special treatment.

It also helps that there are far fewer such tournaments. You can canonically represent every such tournament as a list of players which you recursively divide down the middle, so there are at most \(O(n!)\) such tournaments. Given the bounds on the set of all tournaments this is actually a huge improvement.
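For concreteness, here is a minimal sketch of playing out such a tournament (this is my own illustration, not the pushed code; all the names here are mine): split the list of players down the middle, recurse on each half, then play the two winners against each other.

```python
import numpy as np

def binary_tournament_winner(players, victory, rng):
    """Play out one binary tournament: split the players down the
    middle, recurse on each half, then play the two winners off.
    victory[i, j] is the probability that i beats j."""
    if len(players) == 1:
        return players[0]
    mid = len(players) // 2
    left = binary_tournament_winner(players[:mid], victory, rng)
    right = binary_tournament_winner(players[mid:], victory, rng)
    return left if rng.random() < victory[left, right] else right

# Example: player 0 usually beats 1, 1 usually beats 2, etc.
victory = np.array([
    [0.5, 0.9, 0.9],
    [0.1, 0.5, 0.9],
    [0.1, 0.1, 0.5],
])
rng = np.random.default_rng(0)
winner = binary_tournament_winner([0, 1, 2], victory, rng)
```

Sampling repeatedly from this gives an estimate of the victory distribution for a fixed bracket order; permuting the list of players is exactly the rigging freedom described above.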

I’ve pushed some initial code for such tournaments. I don’t currently know what the best way to optimise them is though. For small sets of candidates we can simply brute force try all the tournaments, but there should be a more efficient way.

One notable difference is that you can’t rig these tournaments quite so thoroughly: e.g. the strategy for boosting a candidate previously was to make everyone else fight before they reached this candidate, whereas now every candidate fights \(O(\log n)\) other candidates to be the winner. In many cases this isn’t a massive problem because you can just pair the candidate against much weaker candidates, but it still helps balance things out a bit.

This entry was posted in Uncategorized on by .

The geometry and optimisation of tournaments

So in the type of game we’re playing (part 1, part 2), we play a binary tournament.

Formally, we have a set \(\mathcal{C} = \{1, \ldots, n\}\). A tournament on these candidates is a function \(f\) on the subsets of \(\mathcal{C}\) of size \(> 2\) such that \(f(A)\) is a probability distribution over two-element subsets of \(A\). Let \(\mathcal{T}(X)\) be the set of tournaments on \(X\) (the set of tournaments on a set of fewer than two elements is defined but empty).

Given tournaments \(t_1, \ldots, t_n\) we can form a convex combination of them by playing tournament \(i\) with probability \(p_i\) at each step. Some inspection shows that every tournament is a convex combination of (potentially rather a lot of) deterministic tournaments – that is, tournaments such that at each step they produce a single pair with probability 1.

How many such deterministic tournaments are there?

Well, there are \(n \choose k\) subsets of size \(k\). Each of these then has \(k \choose 2\) possibilities to map to. So there are

\(\tau(n) = \prod\limits_{k=3}^n {k \choose 2}^{n \choose k}\)

This is O(pretty hardcore). \(\tau(3) = 3\), which is fine, but \(\tau(4) = 486\) and \(\tau(5) = 4591650240\). I’ve not done precise bounds calculations, but it’s at least \(O(2^{2^n})\).
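The count is easy to check numerically. A small sketch (the function name is mine):

```python
from math import comb

def tau(n):
    """Count the deterministic tournaments on n candidates:
    tau(n) = prod_{k=3}^{n} C(k, 2) ** C(n, k),
    since each of the C(n, k) subsets of size k can map to any
    of its C(k, 2) two-element subsets."""
    result = 1
    for k in range(3, n + 1):
        result *= comb(k, 2) ** comb(n, k)
    return result

# tau(3) = 3, tau(4) = 486, tau(5) = 4591650240
```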

Anyway, the point being that this space is bloody large. Although convex and linear minimizations are something we can do relatively easily, this space is so infeasibly large that this is not an option. You probably can run the simplex algorithm on a 4591650240-dimensional space, but it’s going to take a long time.

So why did I think I could do optimisations on it?

Well for some functions, you can. Unfortunately this turns out to be a subset of the functions I used the relevant algorithm for. Basically the question is whether you can optimise a tournament based on optimising its subtournaments. If you can then this is “easy” in the sense of only having to perform \(O(2^n)\) optimisations on \(O(n)\) dimensional spaces. Compared to our above super-exponential function this isn’t so bad at all.

Let \(t\) be a tournament and let \(A \subseteq \mathcal{C}\). Through a slight abuse of notation, define the restriction \(t|_A\) to be the restriction of \(t\) to subsets of \(A\).

Let \(V\) be a victory distribution. That is, \(V : \mathcal{C}^2 \to \mathbb{R}\) with \(0 \leq V(i, j) \leq 1\) and \(V(i, j) = 1 - V(j, i)\). \(V(i, j)\) is the probability that \(i\) beats \(j\).

We define the collapse of a tournament \(t\) with respect to \(V\), which we will write \(c(t, V)\), to be the probability distribution on \(\mathcal{C}\) you get by playing the tournament with this victory function and returning the winner: That is, at each stage from the remaining candidates you use the tournament to pick a pair and the victory function to decide which one of that pair drops out.
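As an illustration, here is how the collapse can be computed for the simplest tournament: the one that at each step picks a pair uniformly at random from the remaining candidates (presumably what the even selector in the later post does; this is my own sketch, not code from the library).

```python
from functools import lru_cache
from itertools import combinations

def collapse_even(victory):
    """Victory distribution of the tournament that picks each remaining
    pair uniformly at random. victory[i][j] is P(i beats j).
    Returns a dict mapping candidate -> probability of winning overall."""
    n = len(victory)

    @lru_cache(maxsize=None)
    def dist(candidates):
        if len(candidates) == 1:
            return {candidates[0]: 1.0}
        pairs = list(combinations(candidates, 2))
        result = dict.fromkeys(candidates, 0.0)
        for i, j in pairs:
            # Either i beats j (j drops out) or vice versa; recurse on
            # the remaining candidates and weight by the pair choice.
            for loser, p_win in ((j, victory[i][j]), (i, victory[j][i])):
                remaining = tuple(c for c in candidates if c != loser)
                for cand, p in dist(remaining).items():
                    result[cand] += p_win * p / len(pairs)
        return result

    return dist(tuple(range(n)))
```

The memoisation over subsets is what makes this \(O(2^n)\) rather than super-exponential: the collapse of a tournament depends only on the collapses of its subtournaments, which is precisely the structure the theorem below exploits.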

Theorem: Let \(f : \bigcup\limits_{A \subseteq \mathcal{C}} \mathcal{T}(A) \to \mathbb{R}\). Suppose there exists a victory function \(V\) and a linear function \(g : \mathcal{L}(\mathcal{C}) \to \mathbb{R}\) (where \(\mathcal{L}(A)\) is the set of random variables taking values in \(A\)) such that \(f(t) = g(c(t, V))\). That is, \(f\) is \(g\) applied to the collapsed distribution. Then \(f|_{\mathcal{T}(\mathcal{C})}\) is maximized by a tournament \(t\) such that for every \(A \subseteq \mathcal{C}\), \(t|_A\) also maximizes \(f|_{\mathcal{T}(A)}\). Further, every tournament maximizing \(f|_{\mathcal{T}(A)}\) is the restriction of one which maximizes \(f|_{\mathcal{T}(\mathcal{C})}\). Further, these maxima or minima are achieved by a deterministic tournament.

Proof:

This is because \(c(t|_A, V)\) is a non-negative linear combination of \(\{ c(t|_{A \setminus \{x\}}, V) : x \in A\}\), and thus, since \(g\) is linear, \(f(t|_A)\) is a non-negative linear combination of \(\{f(t|_{A \setminus \{x\}}) : x \in A\}\). Thus any increase or decrease in these corresponds to an increase or decrease in \(f(t|_A)\). Further, because \(f(t|_A)\) is a function only of the \(f(t|_{A \setminus \{x\}})\), any choice of maxima/minima for the subtournaments will work.

Why can we pick a deterministic tournament? Well because linear functions take their extreme values on extreme points, and the extreme points are the deterministic tournaments.

QED

In particular this works absolutely fine for finding the best and worst victory probabilities for a given candidate, because these are simple linear functions of the victory distribution (indeed they’re just coordinate functions). So those numbers I quoted last time are perfectly fine.

However, the entropy numbers I quoted… might happen to be the true extremes, thanks to the extreme simplicity of the example I used, but in general adopting this approach to finding entropy extrema is completely bogus. Because entropy is concave, the entropy maximization approach will probably work pretty well (the entropy of a mixture is at least the corresponding mixture of the entropies of the subtournaments). It might even find the maximum entropy deterministic tournament, I’m not entirely sure, but it will not in general find the maximum.

 


Let’s play a game

Edit: I’ve noticed that my optimisation functions, well, don’t actually optimise. They successfully produce a somewhat optimal value of the function they’re optimising and produce valid tournaments, but in order to produce a global optimum they make certain assumptions about the shape of the function they’re optimising that are not valid. So take any claims of optimality in this post with a pinch of salt.

The game we’re going to play isn’t very fun. But that’s OK, because it’s not actually going to be us playing it. This is a very simple game that is designed as a starting point for exploring some of the properties of the tournament rigging library I’m working on.

The game we’re going to play is good-card bad-card. Each player has a deck. This deck has two types of cards in it: Good and bad.

Play proceeds as follows: Each player draws a card. If one of them draws a good card and the other draws a bad card, the player with the good card wins. Else if they both have the same card, they shuffle their cards back into their respective decks and play another round. They do this until someone wins a round.

The interesting feature of the game comes from the fact that different players have different fractions of good cards in their decks (every player has at least one good card and at least one bad card, preventing the possibility that both of them will be sitting there endlessly drawing the same card). This means that in any given situation one player may have a clear advantage.

Suppose we have players \(0, \ldots, n-1\), and suppose player \(i\) has a fraction of good cards \(g_i\). Then the probability of player \(i\) winning a round against player \(j\) is \(g_i - g_i g_j\) (because this is the probability of \(i\) drawing a good card minus the probability of them both drawing a good card). This means the probability of \(i\) winning the whole game is \(\frac{g_i - g_i g_j}{g_i + g_j - 2 g_i g_j}\) (because the probability of winning the game is proportional to the probability of winning any given round).
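For illustration, here is how I would compute the resulting pairwise victory matrix (a sketch of what I’d expect a function like the library’s good_card_bad_card to produce; the actual library code may differ):

```python
import numpy as np

def good_card_bad_card(g):
    """Pairwise victory matrix for good-card bad-card. Entry (i, j) is
    the probability that player i wins the game against player j, given
    the fractions of good cards g (each strictly between 0 and 1)."""
    g = np.asarray(g, dtype=float)
    round_wins = g[:, None] * (1 - g[None, :])  # g_i - g_i g_j
    # Probability some round is decisive: g_i + g_j - 2 g_i g_j.
    decisive = round_wins + round_wins.T
    return round_wins / decisive

A = good_card_bad_card([0.9, 0.8])
# A[0, 1] = 0.18 / 0.26, roughly the 69% head-to-head figure quoted below.
```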

Let’s do a worked example of what this looks like:

In [1]: import tournament as t
In [2]: import numpy as np
In [3]: x = np.array([ 0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])
In [4]: Ax = t.good_card_bad_card(x)
In [5]: t.victory_probabilities(Ax, t.even_selector)
Out[5]: array([ 0.00128291,  0.00392595,  0.00865896,  0.01682629,  0.03099347,
        0.05650568,  0.10608786,  0.21727215,  0.55844674])

This is the probability of victory when each pair of contenders is picked at random from the set of remaining pairs. In this particular model of tournaments (where there’s no state with which to pair off people evenly) this seems the “fairest”, and, unsurprisingly, the better your score the better your chances of winning. To almost a surprising degree, really: The deck with 90% good cards is more than twice as likely to win as the deck with 80% good cards, despite the fact that in a head to head fight it only has a 69% chance of winning.

Two other… I won’t say obvious, but at least somewhat natural, choices are to maximize and minimize the entropy of the results. Maximizing the entropy could be regarded as keeping the competition exciting, minimizing could be regarded as trying to highlight the actually best candidate.

In [7]: t.victory_probabilities(Ax, t.max_entropy_selector)
Out[7]: 
array([ 0.10300318,  0.10353773,  0.08934311,  0.07525026,  0.06495158,
        0.05973063,  0.06176968,  0.08015912,  0.36225472])
 
In [8]: t.victory_probabilities(Ax, t.min_entropy_selector)
Out[8]: 
array([ 0.00162647,  0.00362818,  0.0052852 ,  0.00680589,  0.00863982,
        0.01165419,  0.01830958,  0.08015912,  0.86389154])

The max entropy tournament is… weird. It results in the lower scoring items being more likely to win than the mid scoring items! Why does that happen? Well, basically because even competitions are higher entropy than unbalanced ones (entropy is approximately “how much information does knowing the result of this event give me on average?”; when an event is a foregone conclusion, knowing its outcome gives you very little information).
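To make the “even competitions are higher entropy” point concrete, here’s a small entropy function (my own sketch, not the library’s; natural log appears consistent with the numbers quoted in this post):

```python
import numpy as np

def entropy(p):
    """Shannon entropy, in nats, of a probability distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log(0) = 0 by convention
    return float(-(p * np.log(p)).sum())

# A fair coin flip is maximally uncertain; a foregone conclusion
# carries almost no information.
even = entropy([0.5, 0.5])
lopsided = entropy([0.99, 0.01])
```

So a selector chasing entropy prefers close matches, which means keeping weak candidates around to fight other weak candidates rather than letting the strong ones knock them out.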

An interesting thing is how these relate to the tournaments which are rigged for or against the top candidate:

In [9]: t.victory_probabilities(Ax, t.boosting_selector(8))
Out[9]: 
array([ 0.00162647,  0.00362818,  0.0052852 ,  0.00680589,  0.00863982,
        0.01165419,  0.01830958,  0.08015912,  0.86389154])
 
In [10]: t.victory_probabilities(Ax, t.killing_selector(8))
Out[10]: 
array([ 0.00178168,  0.01591364,  0.01091917,  0.03665989,  0.06297012,
        0.11254326,  0.13593336,  0.26102417,  0.36225472])

Interesting to note is that the probability of the top candidate winning is the best possible under the minimum entropy tournament and the worst possible under the maximum entropy tournament. This isn’t terribly surprising given what they’re doing: In the minimum entropy tournament it pays to keep the top candidate around. In the max entropy tournament it pays to kill them off.

A thing in there I’m a little suspicious of is that the min entropy tournament appears to correspond exactly to the tournament designed to kill off the top candidate. This is suspicious because when the top candidate has been eliminated this tournament falls back to even selection, which is not the minimum entropy solution in general. I think this is a bug in my entropy minimization/maximization code which I’m going to have to think about how to solve. The problem is likely that while it correctly chooses the single pair that will optimize the entropy of the child, that’s not actually necessarily the best way to optimize the overall entropy, because we can choose a distribution over tournaments. For minimization I think we want to always pick a single tournament, and for maximization I think we might want to pick a more complicated distribution over tournaments. So take this section with a grain of salt: These are still valid tournaments with low and high entropies respectively, but they may not be the global minima and maxima.

One thing that does seem to be correct is if we look at the entropies of all the tournaments designed to boost a single candidate, the entropy decreases as the score of the candidate increases:

In [11]: np.array([t.entropy(t.victory_probabilities(Ax, t.boosting_selector(i))) for i in xrange(9)])
Out[11]: array([ 1.96741165,  1.89865646,  1.80417052,  1.69265673,  1.56505675,
        1.41703054,  1.23658575,  0.99217258,  0.5873777 ])

Let’s take a look at what the best and worst probabilities for each candidate are:

In [12]: np.array([t.victory_probabilities(Ax, t.boosting_selector(i))[i] for i in xrange(9)])
Out[12]: 
array([ 0.10300318,  0.18244198,  0.24852718,  0.30746763,  0.36347069,
        0.4205425 ,  0.48460246,  0.57025324,  0.86389154])
In [13]: np.array([t.victory_probabilities(Ax, t.killing_selector(i))[i] for i in xrange(9)])
Out[13]: 
array([  9.35042740e-10,   3.05782789e-07,   8.93073705e-06,
         1.02246579e-04,   7.25760000e-04,   3.93070198e-03,
         1.83095795e-02,   8.01591233e-02,   3.62254715e-01])

So unsurprisingly these are still increasing with score. Further, your worst candidate can never be that likely to win, and your best candidate is never that likely to lose: The worst victory probability for the top candidate is still more than three times better than the best victory probability for the worst candidate.

I’m not really sure where I’m going with this, except to say “yep, for this very simple game tournaments mostly align with our intuition”. For this simple game with a straight linear ordering of the candidates in terms of quality, you can distort the results quite far, but not impossibly far.

Next I’m going to look at some more interesting games where you can have non-transitive victory probabilities. I’m not yet sure, but I suspect you can get much more interesting behaviour there.


Consider the tournament

Due to reasons, I found myself thinking about the following problem:

We have a two-player game with players \(1, \ldots, n\). Player \(i\) beats player \(j\) with probability \(A_{ij}\) (so \(A_{ij} = 1 - A_{ji}\)).

We wish to run a tournament with these players. The tournament is played as a sequence of games between pairs. The loser of each game drops out, and we choose another pair from the remainder until there is only one player left (assume that a player’s chance of winning any game they participate in is not affected by previous games they’ve played).

My question is this: How much can we influence the result of the tournament by changing which pairs play against each other?

The short answer is “a lot”. There are clear upper and lower bounds on a given player winning (the upper bound on player \(i\) winning is \(\max\limits_{j} A_{ij}\) – they don’t play until the very end and then play against the player they are most likely to beat – while the lower bound is \(\prod\limits_{j} A_{ij}\) – they play against every other player), but there’s a lot of variation in there and these bounds won’t be achievable for all tournaments.
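These two bounds are straightforward to compute from the victory matrix (a sketch; the names are mine, not the module’s):

```python
import numpy as np

def victory_bounds(A, i):
    """Crude bounds on player i's chance of winning any tournament with
    victory matrix A: at best they sit out until the final and then meet
    their easiest opponent, at worst they must beat every other player
    in turn."""
    others = [j for j in range(len(A)) if j != i]
    upper = max(A[i][j] for j in others)
    lower = float(np.prod([A[i][j] for j in others]))
    return lower, upper
```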

So how to resolve this question? Well, I wrote some code. Have ye a nice little python module for calculating how to rig tournaments.

(Well, it doesn’t actually tell you how to rig tournaments, though it could easily be modified to do so. Instead it tells you the probability of victory under various strategies. Some of those strategies are “recursively maximize this function”).

I’m going to explore this more later, but if I write any more on this right now I’ll be late to work, so in the meantime this is just a “Here’s some code! If you’re interested in this question, feel free to play around with it”.


Well orderings, comparability of cardinals and aleph_1

Preliminaries

This is a continuation of my previous posts on infinity.

It’s a bit out of order in that I tried to write this post with a lot of introductory material and it just bogged it down too much. I’ll attempt to backfill later with a more introductory post, or at least to find something else to link to. For now though if you don’t know what the following words mean you’re probably going to struggle. Sorry. :-(

You should understand the concepts of:

  • equivalence relation
  • equivalence class
  • partial ordering
  • total ordering
  • order isomorphism

It would be useful to have previously encountered well-orderings, but I’ll introduce the bits of those we need.

Actual content

So far it’s very non-obvious how we might go about proving that for any two infinite sets, \(|A| \leq |B|\) or \(|B| \leq |A|\). Even proving the Cantor-Schroeder-Bernstein theorem was really hard work, and we had some concrete objects to work with in the form of the injections in each direction. Given two completely unknown and apparently unrelated infinite sets, where would we even begin?

The problem is that just a set on its own has very little structure to it – it’s just a collection of stuff. Without some sort of structure to get a handle on there’s not a lot we can do to it. So, for now, we will park the question of arbitrary sets and focus on a specific class of sets that will prove fruitful.

Specifically the sets we will consider are those which have a total ordering relationship on them which is well ordered. That is, every non-empty subset of them has a smallest element. The reason this is a useful property is that it allows us to generalise induction on the natural numbers: Given a property, we can prove that it holds for all elements of a well ordered set by proving that if it holds for all smaller elements then it holds for the current one. This will be invaluable. It will also turn out that there is a rich class of sufficiently large well ordered sets that will allow us to get a handle on infinity.

Why do we want well-ordered sets?

Well, because we can do induction on them. Suppose \(X\) is a well-ordered set and \(p\) is some logical property such that if \(p(y)\) for all \(y < x\) then \(p(x)\). Then \(p\) holds for all \(x \in X\). Why? Well, if not then the set \(\{x \in X : \neg p(x)\}\) is non-empty, so it has a smallest element \(x\). But then for \(y < x\) we have \(p(y)\), which by hypothesis implies \(p(x)\) (note: \(x\) can be the smallest element of \(X\) here, in which case this holds vacuously). This contradicts \(x\) being in that set, and hence the non-emptiness of that set. Hence \(p(x)\) for all \(x \in X\).

We can extend this further: We can also define functions inductively.

Suppose we have some function \(h : P(Y) \to Y\) and a well-ordered set \(X\). There is a unique function \(f : X \to Y\) such that \(f(x) = h(\{ f(y) : y < x\})\). Why? Proof by induction: Let \(p(x)\) be the property that there is a unique such function on the set \(\{y : y \leq x\}\). If such a function exists call it \(f_x\). For \(y < x\) let \(f_x(y) = f_y(y)\). Let \(f_x(x) = h(\{f_y(y) : y < x\})\). I’m going to leave checking the details of this as an exercise for the interested reader because I’m a big jerk.
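For a finite well-order this kind of inductive definition is easy to sketch in code (a toy illustration only; nothing transfinite is going on here):

```python
def define_inductively(X, h):
    """Given a finite well-ordered set X (as a list, ordered by <) and a
    function h from sets of values to values, build the unique f with
    f(x) = h({f(y) : y < x})."""
    f = {}
    for x in sorted(X):
        # Everything already in f is strictly below x, so this set is
        # exactly {f(y) : y < x}.
        f[x] = h({f[y] for y in f})
    return f

# With h = len, f(x) counts the predecessors of x: f is the rank function.
ranks = define_inductively([10, 3, 7], len)
# ranks == {3: 0, 7: 1, 10: 2}
```

The well-ordering is what guarantees the loop has somewhere to start and that every value is determined by the time we need it; the transfinite version in the theorem is the same idea with induction replacing iteration.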

It turns out there is a strict hierarchy of well ordered sets.

Let \(X\) be a well ordered set. \(A \subseteq X\) is called a segment (this terminology is slightly non-standard) if for \(x \in A\) and \(y < x\) we must have \(y \in A\). It is called an initial segment (this terminology is standard) if it is of the form \(I_a = \{x : x < a \}\) for some \(a\).

We’ll basically be interested in when one well-ordered set embeds as a segment of another. First we need to get a handle on the structure of segments:

Lemma: Let \(X\) be well-ordered and \(A \subseteq X\) be a segment. Then either \(A = X\) or \(A\) is an initial segment.

Proof: Suppose \(A \neq X\). Then the set \(X \setminus A\) is non-empty. Let \(a\) be the minimal element of this set. Then if \(x < a\) we must have \(x \in A\) by minimality. If \(y \geq a\) and \(y \in A\) then we must have \(a \in A\), because \(A\) is a segment. Therefore \(x \in A\) if and only if \(x < a\), i.e. \(A = I_a\). QED

Lemma: Let \(A \subseteq B \subseteq C\). Let \(B\) be a segment of \(C\) and \(A\) a segment of \(B\). Then \(A\) is a segment of \(C\).

Proof: This is mostly just definition chasing. Let \(a \in A\) and \(c \in C\) with \(c < a\). Then because \(a \in B\) we have \(c \in B\) (because \(B\) is a segment of \(C\)). Therefore \(c \in A\), because \(A\) is a segment of \(B\). QED

The following relationships will be at the heart of how well-ordered sets interact with each other:

Let \(A, B\) be well ordered sets. Write \(A \preceq B\) if \(A\) is isomorphic to a segment of \(B\). Write \(A \sim B\) if \(A\) and \(B\) are isomorphic. Write \(A \prec B\) if \(A \preceq B\) and not \(A \sim B\).

Proposition: If \(A \preceq B \preceq C\) then \(A \preceq C\).

Proof: Let \(f : A \to B\) and \(g : B \to C\) be order isomorphisms onto segments. Then \(h = g \circ f\) is an order-isomorphism \(h : A \to h(A) \subseteq C\). We need only show that \(h(A)\) is a segment.

Because \(f(A)\) is a segment and \(g\) is an order-isomorphism we know that \(h(A) = g(f(A))\) is a segment of \(g(B)\). But a segment of a segment is a segment, so \(h(A)\) is a segment of \(C\) and we’re done. QED

Clearly if \(A \prec B\) then \(A\) must be isomorphic to an initial segment of \(B\) (because it’s isomorphic to a segment which isn’t \(B\)). This is also a sufficient condition:

Theorem: A well-ordered set is not isomorphic to any of its initial segments.

Proof:

Suppose \(f : A \to I_a\) is order preserving and injective. Then necessarily \(f(a) < a\) (because \(f(a) \in I_a\)). But then, because \(f\) is order preserving and injective, \(f(f(a)) < f(a)\). Indeed, \(a > f(a) > \ldots > f^n(a) > \ldots\).

This means that the set \(\{ f^n(a) : n \in \mathbb{N}\}\) has no minimal element, contradicting the well-orderedness of \(A\). QED

Corollary: \(A \prec B\) if and only if \(A\) is order isomorphic to an initial segment of \(B\).

Corollary: If \(A \preceq B\) and \(B \preceq A\) then \(A \sim B\).

Proof: Otherwise \(A \prec B\) and \(B \prec A\), so \(A\) is isomorphic to an initial segment of \(B\) and vice versa. Composing the two embeddings makes \(A\) isomorphic to an initial segment of itself, contradicting the previous theorem. QED

Corollary: If \(A \prec B\) then \(A\) is isomorphic to a unique initial segment of \(B\) (because if it were isomorphic to two, the larger of the two would be isomorphic to an initial segment of itself)

So basically \(\preceq\) is a partial order on isomorphism classes of well-orderings, i.e. it’s a partial order except that in the antisymmetry condition we have isomorphism instead of equality.

It turns out it’s actually a total order:

Theorem: Let \(A\), \(B\) be well-ordered sets. Then \(A \preceq B\) or \(B \preceq A\).

Proof:

Let \(T\) be some element not in \(B\). We’ll define a function \(f : A \to B^+ = B \cup \{T\}\) by transfinite induction as follows: Let \(g : \mathcal{P}(B^+) \to B^+\) be defined by \(g(U) = \min(B \setminus U)\) if \(B \setminus U\) is non-empty, and \(g(U) = T\) otherwise. Let \(f\) be the function this defines, i.e. if \(f(a) \in B\) then \(f(a) = \min(B \setminus f(I_a))\).

The idea is that \(T\) is our “stopping point” which says “hey, I’ve run out of elements of \(B\)”. If we ever hit it then we’ve covered \(B\) with an order isomorphism from some segment of \(A\). If we don’t then we’ve embedded \(A\) in a segment of \(B\).

Specifically, let \(S_A = \{a \in A : f(a) \neq T\}\) and let \(S_B = f(S_A)\). We’ll show that both \(S_A\) and \(S_B\) are segments and at least one of \(S_A = A\) or \(S_B = B\) holds.

Proof that \(S_B\) is a segment: Suppose \(f(a) = b\) and \(b' < b\). Since \(b\) was chosen to be the minimal element of \(B\) not of the form \(f(x)\) for \(x < a\), we must have that \(f(x) = b'\) for some \(x < a\), else \(b'\) would contradict that minimality.

Proof that \(S_A\) is a segment: If \(a \in S_A\) and \(a' < a\) then \(f(a) \in B \setminus f(I_{a'})\), so this set is non-empty and \(f(a') \neq T\).

Now suppose \(S_B \neq B\). Then there exists \(b \in B \setminus S_B\). But then \(b \in B \setminus f(I_a)\) for all \(a \in A\), so this set is never empty and thus we never have \(f(a) = T\). Hence \(A = S_A\).

\(f|_{S_A}\) is injective by construction, because if \(a' < a\) we always choose \(f(a) \neq f(a')\). We need only show that it’s order preserving.

But the same proof that \(S_B\) was a segment will show that \(f(I_a)\) is a segment for any \(a\). Thus if \(a < a'\) we must have \(f(a') > f(a)\): otherwise \(f(a') < f(a) \in f(I_{a'})\) and so, since \(f(I_{a'})\) is a segment, \(f(a') \in f(I_{a'})\), which would contradict injectivity.

Therefore \(f|_{S_A}\) is order-preserving and injective, and so is an order isomorphism between \(S_A\) and \(S_B\). If \(S_A = A\) then \(A \preceq B\). Else \(S_B = B\) and so \((f|_{S_A})^{-1} : B \to S_A\) demonstrates that \(B \preceq A\). QED

So what does this all have to do with sizes of infinity then?

Well we’ve demonstrated that for any two well ordered sets there is an injection from one to the other. So if every infinite set can be well-ordered we must have that for any two sets \(A, B\), \(|A| \leq |B|\) or \(|B| \leq |A|\).

Can every infinite set be well-ordered? It seems highly implausible.

It turns out the answer is “sort of”. There is an axiom of set theory which asks for a very natural structure on infinite sets which will be enough to guarantee a well-ordering of every set.

But we don’t need that to demonstrate its plausibility. It turns out that if we want it to be true that \(|A| \leq |B|\) or \(|B| \leq |A|\) always holds we must have every set be able to be well-ordered.

Why? Well, because for any set \(A\) there is a natural example of a well-ordered set \(W(A)\) such that \(|W(A)| \not\leq |A|\), i.e. there is no injection \(W(A) \to A\). Thus if we want all cardinals to be comparable we must have an injection \(f : A \to W(A)\). But then we can define a well-ordering on \(A\) by \(x \leq y\) iff \(f(x) \leq f(y)\).

Let’s construct such a set:

Theorem: Let \(A\) be any infinite set. Let \(W(A)\) be the set of isomorphism classes of well-orderings of subsets of \(A\). Define \([S] \leq [T]\) if \(S \preceq T\). This is well-defined because of how \(\preceq\) interacts with isomorphism. Then \(W(A)\) is a well-ordered set and there is no injective function \(f : W(A) \to A\).

Proof:

We’ve already shown that it’s a totally ordered set by our previous results. We need only show that it’s well-ordered.

It suffices to show that every initial segment of \(W(A)\) is well-ordered: Then if \(U \subseteq W(A)\) and \(u \in U\), either \(u\) is a minimal element of \(U\) or \(I_u \cap U\) is a non-empty subset of \(I_u\) and thus has a minimal element which is in turn a minimal element of \(U\).

So let \(x \in W(A)\). We can write \(x = [T]\) for some well ordered set \(T\). Claim: \(I_x \sim T\).

Proof: \(y < x\) iff \(y = [S]\) for some \(S \prec T\), by definition of our ordering on \(W(A)\). Thus there is a unique initial segment \(I_t\) of \(T\) such that \(S \sim I_t\). Define \(f : I_x \to T\) by mapping \([S]\) to the element defining the unique initial segment isomorphic to \(S\): i.e. if \(S \sim I_t\) then \(f([S]) = t\).

This is injective because no two initial segments of a well-ordered set are isomorphic. It’s order preserving because if \([S'] \prec [S]\) then we have an order-isomorphism between \(S'\) and an initial segment of \(S\). If \(f([S]) = t\) we can compose this with the order isomorphism \(S \to I_t\) and get an order isomorphism between \(S'\) and an initial segment of \(I_t\). This means that \(S' \sim I_y \subseteq I_t\) and so necessarily \(f([S']) = y < t\).

This is surjective because if \(S\) is a well-ordering of some subset of \(A\) then so are all its initial segments, and \([I_s] < [S]\) for \(s \in S\). This proves our claim: Every initial segment of \(W(A)\) is isomorphic to a well-ordered set.

So we now know that \(W(A)\) is a well-ordered set. We must show that there is no injection \(f : W(A) \to A\).

Suppose such an \(f\) exists. Then we can define a well-ordering on the set \(T = f(W(A))\) by \(f(x) < f(y)\) when \(x < y\).

But then \([T] \in W(A)\), and by what we proved in the claim we have \(I_{[T]} \sim W(A)\). But a well-ordered set cannot be order isomorphic to an initial segment of itself. This contradicts the existence of such a function.

QED

So we now know that there are “arbitrarily large” well-ordered sets, and so if we want all cardinalities to be comparable in fact all sets can be well-ordered. This gives a strong plausibility argument for the idea that this is possible. But how might we prove it?

Well we still run into that problem I mentioned at the beginning: Arbitrary sets have very little structure to play with. The solution is to demand that they have slightly more structure.

There is a set theoretic axiom called the axiom of choice. Historically it was somewhat controversial. These days it’s considered basically uncontroversial, and modern set theory is mostly the study of set theories which obey the axiom of choice. Simply put, the axiom is this:

Let \(X\) be a non-empty set. Then there exists a choice function \(f : P(X) \setminus \{\emptyset\} \to X\) such that \(f(U) \in U\).

i.e. we have some way of picking an “arbitrary” member of any given non-empty set. If you like your intuition can be that we pick at random, though it turns out to be hard to make that intuition make rigorous sense.

Given the axiom of choice, we can now straightforwardly use transfinite induction to construct an injection \(A \to W(A)\), which will give us a well-ordering on \(A\).

Theorem:  A set \(A\) can be well-ordered if and only if there is a choice function on \(P(A) \setminus \{\emptyset\}\). In particular given the axiom of choice every set can be well-ordered.

Proof:

If \(A\) can be well ordered then the function \(h(U) = \min U\) is a choice function. So that takes care of the “only if” part.

We now must construct a well-order from a choice function. We’ll do this by bijecting \(A\) with a subset of \(W(A)\). This will let us inherit the well order from it.

Let \(h : P(A) \to A\) be a choice function, extended with \(h(\emptyset) = h(A)\) (it doesn’t really matter what we do with the empty set).

Let \(f : W(A) \to A\) be the function defined by induction as \(f(x) = h(A \setminus f(I_x))\).

By our previous theorem this cannot be an injection. But if \(A \setminus f(I_x)\) were never empty then \(f\) would be an injection, because we would always choose \(f(x) \neq f(y)\) for \(y < x\).

So pick the minimal \(x\) such that \(f(I_x) = A\). Then \(f|_{I_x} : I_x \to A\) is a bijection, so its inverse is an injection \(A \to W(A)\) and the result is proved.

QED

And this more or less concludes what I wanted to cover in this post: Every set can be well-ordered, and thus any two cardinalities are comparable.

I will finish by now finally explaining what \(\aleph_1\) means.

Suppose \(A\) is any uncountable well-ordered set. Let \(x = \min \{a \in A : |I_a| > \aleph_0\}\) (if no such \(a\) exists, first extend \(A\) by adding a new top element, so that \(A\) itself becomes an initial segment).

Then by construction \(|I_x| > \aleph_0\), but every initial segment of \(I_x\) is countable. This means that regardless of which well-ordered set we started out with, the results will be isomorphic: one must be isomorphic to a segment of the other, and since all the initial segments are countable they must be isomorphic to the whole set.

This means additionally that for any set \(B\) with \(|B| > \aleph_0\) we must have \(|I_x| \leq |B|\): well-order \(B\) and perform the same construction. Then \(I_x\) is order isomorphic to a segment of \(B\), and thus there is an injection \(I_x \to B\).

We write \(|I_x| = \aleph_1\): It is the smallest size an uncountable set can be.

I still haven’t told you what these mystery \(\aleph\)s mean of course, but you’ll have to wait till next time for that one.

This entry was posted in Numbers are hard, Uncategorized on by .