Author Archives: david

A technical lemma I am finding surprisingly useful

Let \(\mathcal{P}_n = \{ x \in [0, \infty)^n : \sum x = 1 \}\) be the set of probability distributions on n items.

Let \(x \in [0, 1]\). Define the selection function \(s_x : \mathcal{P}_n \to \{1, \ldots, n\}\) as \(s_x(p) = \mathrm{min} \{k : \sum\limits_{i=1}^k p_i \geq x \}\).

Then:

  1. \(s_x\) is continuous except on a closed set of measure zero.
  2. If \(U\) is chosen uniformly at random from \([0, 1]\) then \(P(s_U(p) = k) = p_k\).

Proof of 1: It is continuous except on the set \(\bigcup\limits_k \{ p \in \mathcal{P} : \sum\limits_{i=1}^k p_i = x \}\) as small variations won’t change the result if it’s not on a boundary. This is a closed set of measure zero.

Proof of 2: \(s_U(p) = k\) iff \(\sum_{i < k} p_i < U\) and \(U \leq \sum_{i \leq k} p_i \). i.e. if \(U \in (\sum_{i < k} p_i, \sum_{i \leq k} p_i]\). This is an interval of length \(p_k\) and \(U\) is uniform, so \(P(U \in (\sum_{i < k} p_i, \sum_{i \leq k} p_i]) = p_k\). Therefore \(P(S_U(p) = k) = p_k\) as desired.QED

Why is this a useful lemma?

Well, because I often find myself in a situation at work right now where I want to be able to experiment with randomized algorithms in a way that lets me see the effect of small tweaks. Unfortunately a change which changes the probabilities slightly requires me to re-roll the dice, so it’s hard to tell whether a given change is the result of randomness or the change I’ve made.

By deterministically seeding a PRNG and using it to pick the selection function we get something which is almost always continuous in the probabilities used, so it’s mostly stable against the probabilities used and will only make big changes in response to big changes in the probability.

This entry was posted in Numbers are hard on by .

The other randomized voting system

The title of this post is of course misleading. There are myriad randomized voting systems. But there are two (well, sortof three) which have a specific special property: That of being immune to tactical voting. In “Manipulation of Schemes That Mix Voting with Chanceh” by Allan F. Gibbard in Econometrica (which I’ve never actually read due to the great academic firewall) demonstrated that if you have a ranked voting system such that when there are at least three candidates running and:

  1. There is no dictator
  2. Anyone who receives a unanimous votes wins
  3. There is no incentive for a voter to lie about their preferences

then it is one of the following:

  1. Random ballot. i.e. pick a voter at random, use their first choice
  2. Random pair. Pick two candidates at random, use the one that more people have ranked first
  3. A mixture of the two in the sense that we randomly choose between them with some probability

EMERGENCY EDIT: this version of the theorem, which I originally got from rangevoting.org’s reporting of it, turns out to be a complete lie and a much richer class of systems is permitted. I owe you one correct accounting of the actual theorem.

I’ve written quite a lot about random ballot. You might have noticed. I’ve never written about random pair as far as I can remember, despite the fact that I think it’s quite interesting.

Why? Well because it looked like it was impossible to make work in practice. It’s so easy to game – people can’t feasibly rank a large number of candidates, so you have to make do with a small number of candidates, and then it’s easy to win by stuffing the ballots with people you agree with so that people get a choice between two options which you both like.

I realised in the shower this morning (I say that a lot. The shower is where I do some of my best thinking) that this reasoning is in fact completely false. There is a practical voting system which is in some sense equivalent to random ballot (the way it’s run may distort the results, but if people truly have a transitive ordering amongst candidates decided in advance then they have no incentive to vote otherwise in this system) and you can easily run it on as many candidates as you want.

In particular you can run it on a sortition. I suspect for many cases where a sortition is desirable this may be a better system. It will tend to shift things towards the prevailing biases of the country (e.g. it’s possible that it will be more gender biased than a sortition), but it may also bias more heavily towards competence.

How?

Well the key realisation is that the only preferences that matter are between the candidates who are randomly chosen. So you don’t need to ask about the other preferences.

This is what you do: Rather than rank then choose, you choose then vote. You pick two candidates at random from the eligible populace, then you do a run-off vote between them in which people simply select which candidate they prefer and the one with the majority of votes wins. If people vote in line with their transitive preferences, this is equivalent to random ballot.

In order to do this in a way that doesn’t bias towards recognisability, here’s what you do in practice: A year before the election you select the two candidates (people have the opportunity to refuse, in which case you select a new one). You now pay each of them a decent salary for the next year and give them a staff of campaign advisors. They now have the next year to campaign for why they should be elected MP. At the end of that year you do the vote.

What are the characteristics of this? I don’t know. It’s obviously vastly more unstable than random ballot – the chances of the same MP being elected twice in a row are essentially zero. This is a downside. On the other hand, you lose some of the privilege bias where recognition leads to votes.

A good solution might be to adopt a mixture approach – you can either do this with a coin-flip for selecting which one you use or you can do this by electing two MPs per constituency, maybe on alternating cycles.

Potentially you could also use one for the commons and one for the lords. This might be an interesting way to elect the lords if every time a new lord is called for you run this procedure over the whole population.

In all honesty I’m still not sure this is a good idea, but it’s gone from a completely impractical idea to one that we can usefully reason about the consequences of, and that makes it interesting to think about.

This entry was posted in voting on by .

Ordinal and cardinal numbers

Continuing my series about infinity.

We’ve previously made a lot of references to things like \(\aleph_0\), \(\aleph_1\), etc. What are these mysterious alephs? What do they mean?

They’re essentially a kind of “infinite number” called a cardinal number. In much the same way that we used the sets \(\{1, \ldots, n\}\) as canonical representatives of finite sets of a given size and said those sets had cardinality \(n\), we will find canonical representatives of every size of set and use those to label cardinalities.

It turns out that this isn’t quite the right place to start. Rather than canonical representatives for sizes, we’ll want canonical representatives for well orderings – they’re easier to build, and then our canonical representative for sizes will be the smallest well-ordering of that size as per our construction of \(\aleph_1\) at the end of the last post.

So how are we going to construct these things?

There turn out to be a lot of clues for how such a thing would work in our construction of \(W(A)\) in the last post. We implicitly proved the following theorem:

Theorem: Let \(A\) be a well-ordered set. Then \(A\) is isomorphic to the set \(\{I_a : a \in A\}\) ordered by \(\preceq\).

We can of course replace the \(I_a\) with any other sets isomorphic to it. In particular, if we have canonical representatives for every well-ordered set, we can replace the \(I_a\) with their canonical representatives.

This gives us a strategy: We’ll define our canonical representatives by transfinite induction (recall that we showed as part of our work around \(W(A)\) that \(\preceq\) well-ordered isomorphism classes of well-ordered sets). The representative for \(A\) is the set of all representatives for initial segments of \(A\).

Definition: Let \(A\) be a well-ordered set. Define \(\mathrm{ord}(A) = \{ \mathrm{ord}(I_a) : a \in A\}\). Give \(\mathrm{ord}(A)\) the ordering \(\preceq\).

As a corollary to our previous result, \(A \sim \mathrm{ord}(A)\). So this gives us our canonical representatives. We call something that is \(\mathrm{ord}(A)\) for some well-ordered \(A\) and ordinal number.

Note that we’re doing something subtly different from normal: The class of all well-ordered sets is not itself a set (because if it were it were you could construct \(W(A)\) which is a strictly larger well-ordered set). However, for any well-ordered set the isomorphism classes of well ordered sets \(\preceq\) it is a set, so we can still do transfinite induction like this.

First we need to show that this is a canonical representative:

Theorem: If \(A \sim B\) then \(\mathrm{ord}(A) = \mathrm{ord}(B)\).

Proof:

Suppose \(\mathrm{ord}(A) \neq \mathrm{ord}(B)\).

Let \(f: A \to B\) be an order isomorphism. Consider the initial segments of \(A\). If \(\mathrm{ord}(I_a) \neq \mathrm{ord}(f(I_a))\) for some \(I_a\), replace \(A\) with the minimal such \(I_a\). Else, for every initial segment of \(A\), \(\mathrm{ord}(I_a) = \mathrm{ord}(I_b)\)

But \(\mathrm{ord}(A) = \{ \mathrm{ord}(I_a) : a \in A\}\). And because \(f\) is an isomorphism, \(\{f(I_a) : a \in A \} = \{I_b : b \in B\}\). So \(\mathrm{ord}(A) = \{ \mathrm{ord}(I_a) : a \in A\} = \{ \mathrm{ord}(f(I_a)) : a \in A\} = \{ \mathrm{ord}(I_b) : b \in B\} = \mathrm{ord}(B)\). QED

So the ordinals are indeed canonical representatives in the sense that they give us a way of choosing a distinct element out of every isomorphism class.

We’re now pretty much done.

A cardinal is an ordinal \(\alpha\) such that for \(\beta < \alpha\), \(|\beta| < |\alpha|\). i.e. every smaller ordinal is strictly smaller in size. The cardinality of a set is the unique cardinal number of the same size as that set.

Why does this cardinality always exist?

Well, form a well-ordering of \(A\) and take its ordinal \(\alpha\). Either this ordinal is a cardinal or \(\{\beta : \beta < \alpha, |\beta| = |A|\}\) is non-empty and the minimal element of that is a cardinal.

So these give us our canonical sets representing the cardinality of any given set. \(|A|\) can thus be considered as being that cardinal number.

But how do we write down cardinal numbers?

Well, that’s mostly just notation.

First we need some notation for ordinal numbers. We’re just going to name a few for now. In particular the ordinal numbers corresponding to finite sets (there’s a unique up to isomorphism total order of any finite size) we will just write as their size. So \(n\) is the ordinal number corresponding to ordering a finite set of size \(n\). For the normal order on \(\mathbb{N}\) we will write \(\omega\). These are enough ordinals for us for now.

Our notation for cardinal numbers comes as follows. We want to label cardinals by ordinals, so that the cardinal \(\aleph_\alpha\) is the \(\alpha\)th cardinal.

The easiest way to do this is by transfinite induction. Let \(\aleph_\alpha\) be the smallest infinite cardinal that is greater than \(\aleph_\beta\) for every \(\beta < \alpha\). This is well-defined because any set of cardinals is well ordered and because there are arbitrarily large cardinals so there will always be some cardinal greater than any set of cardinals (e.g. the power set of the union).

Note that \(\aleph_0 = \omega\): If \(\alpha < \omega\) then \(\alpha\) is isomorphic to an initial segment of \(\mathbb{N}\) so is finite. Note also that by construction if \(\alpha > \beta\) then \(\aleph_\alpha > \aleph_\beta\).

Lemma: \(\alpha \leq \aleph_\alpha\)

Proof: Intuitively this makes sense because there are large gaps in the ordinals between each cardinal. In practice we’re just going to prove it by transfinite induction.

Suppose \(\beta \leq \aleph_\beta < \aleph_\alpha\) for \(\beta < \alpha\).

Then if \(\alpha_\alpha < \alpha\) we would have \(\aleph_\alpha < \aleph_\alpha\), which is a contradiction. QED

It might be tempting to think that \(\alpha < \aleph_\alpha\) for all \(\alpha\) – at the earlier cardinals (e.g. \(\aleph_0, \aleph_1\)) the corresponding ordinal is unimaginably smaller than its \(\aleph\). This turns out to not be necessarily true, but providing you with a counter-example requires a lot of theory I’m disinclined to get into right now.

Theorem: Every cardinal is \(\aleph_\alpha\) for some ordinal \(\alpha\).

Proof: Let \(\kappa\) be some cardinal. Then \(\aleph_\kappa \geq \kappa\), so there is at least one ordinal such that its \(\aleph\) is larger than \(\kappa\).

Let \(\alpha\) be the smallest ordinal such that \(\aleph_\alpha \geq \kappa\).

Then \(\kappa > \aleph_\beta\) for \(\beta < \alpha\). Because \(\aleph_\alpha\) is chosen to be the minimal upper bound for the \(\aleph_\beta\), this means that \(\aleph_\alpha \leq \kappa\) and thus \(\aleph_\alpha = \kappa\) as desired. QED

And there you have it. The final piece in what I promised at the beginning of this series (modulo skipping a bit). I’ve taught you just enough set theory to make sense of \(\aleph_0\), \(\aleph_1\) and \(2^{\aleph_0}\), and probably more than enough to convince you that you don’t care.

There’s a hell of a lot more to this subject. However there’s not really a hell of a lot more to this subject that I know. Once you fill out the details I’ve elided, I’ve probably blogged about 10% of my set theory knowledge in this series of posts. I doubt I’ve even touched on 1% of what you need to get started on modern set theory research. Still, I hope it’s been an interesting journey and you now have a slightly better grasp of mathematical infinity.

This entry was posted in Numbers are hard on by .

Well orderings, comparability of cardinals and aleph_1

Preliminaries

This is a continuation of my previous posts on infinity.

It’s a bit out of order in that I tried to write this post with a lot of introductory material and it just bogged it down too much. I’ll attempt to backfill later with a more introductory post, or at least to find something else to link to. For now though if you don’t know what the following words mean you’re probably going to struggle. Sorry. :-(

You should understand the concepts of:

  • equivalence relation
  • equivalence class
  • partial ordering
  • total ordering
  • order isomorphism

It would be useful to have previously encountered well-orderings, but I’ll introduce the bits of those we need.

Actual content

So far it’s very non-obvious how we might go about proving that for any two infinite sets, \(|A| \leq |B|\) or \(|B| \leq |A|\). Even proving the Cantor-Schroeder-Bernstein theorem was really hard work, and we had some concrete objects to work with in the form of the injections in each direction. Given two completely unknown and apparently unrelated infinite sets, where would we even begin?

The problem is that just a set on its own has very little structure to it – it’s just a collection of stuff. Without some sort of structure to get a handle on there’s not a lot we can do to it. So, for now, we will park the question of arbitrary sets and focus on a specific class of sets that will prove fruitful.

Specifically the sets we will consider are those which have a total ordering relationship on them which is well ordered. That is, every non-empty subset of them has a smallest element. The reason this is a useful property is that it allows us to generalise induction on the natural numbers: Given a property, we can prove that it holds for all elements of a well ordered set by proving that if it holds for all smaller elements then it holds for the current one. This will be invaluable. It will also turn out that there is a rich class of sufficiently large well ordered sets that will allow us to get a handle on infinity.

Why do we want well-ordered sets?

Well because we can do induction on them. Suppose \(X\) is a well-ordered set and \(p\) is some logical property such that if \(p(y)\) for all \(y < x\) then \(p(x)\). Then \(p\) holds for all \(x \in X\). Why? Well, if not then the set \(\{x \in X : \mathrm{not} p(x)\}\) is non-empty, so it has a smallest element \(x\). But then for \(y < x\) we have \(p(y)\), which by hypothesis implies \(p(x)\) (note: \(x\) can be the smallest element of \(X\) here, in which case this holds vacuously). This contradicts the existence of \(x\), hence the non-emptiness of that set. Hence \(p(x)\) for all \(x \in X\).

We can extend this further: We can also define functions inductively.

Suppose we have some function \(h : P(Y) \to Y\) and a well-ordered set \(X\). There is a unique function \(f : X \to Y\) such that \(f(x) = h(\{ f(y) : y < x\})\). Why? Proof by induction: Let \(p(x)\) be the property that there is a unique such function on the set \(\{y : y \leq x\}\). If such a function exists call it \(f_x\). For \(y < x\) let \(f_x(y) = f_y(y)\). Let \(f_x(x) = h(\{f_y(y) : y \in X\})\). I’m going to leave checking the details of this as an exercise for the interested reader because I’m a big jerk.

It turns out there is a strict hierarchy of well ordered sets.

Let \(X\) be a well ordered set. \(A \subseteq X\) is called a segment (this terminology is slightly non-standard) if for \(x \in A\) and \(y < x\) we must have \(y \in A\). It is called an initial segment (this terminology is not) if it is of the form \(I_a = \{x : x < a \}\) for some \(a\).

We’ll basically be interested in when one well-ordered set embeds as a segment of another. First we need to get a handle on the structure of segments:

Lemma: Let \(X\) be well-ordered and \(A \subseteq X\) be a segment. Then either \(A = X\) or \(A\) is an initial segment.

Proof: Suppose \(A \neq X\). Then the set \(X \setminus A\) is non-empty. Let \(a\) be the minimal element of this set. Then if \(x < a\) we must have \(x \in A\) by minimality. If \(y \geq a\) and \(y \in A\) then we must have \(a \in A\), because \(A\) is a segment. Therefore \(x \in A\) if and only if \(x < a\), i.e. \(A = I_a\). QED

Lemma: Let \(A \subeteq B \subseteq C\). Let \(B\) be a segment of \(C\) and \(A\) a segment of \(B\). Then \(A\) is a segment of \(C\).

Proof: This is mostly just definition chasing. Let \(a \in A\) and \(c \in C\) with \(c < a\). Then because \(a \in B\) we have \(c \in B\) (because \(B\) is a segment of \(C\)\). Therefore \(c \in A\), because \(A\) is a segment of \(B\).

The following relationships will be at the heart of how well-ordered sets interact with eachother:

Let \(A, B\) be well ordered sets. Write \(A \preceq B\) if \(A\) is isomorphic to a segment of \(B\). Write \(A \sim B\) if \(A\) and \(B\) are isomorphic. Write \(A \prec B\) if \(A \preceq B\) and not \(A \sim B\).

Proposition: If \(A \preceq B \preceq C\) then \(A \preceq C\).

Proof: Let \(f : A \to B\) and \(g : B \to C\) be order isomorphisms onto segments. Then \(h = g \cdot f\) is an order-isomorphism \(h : A \to h(A) \subseteq C\). We need only show that \(h(A)\) is a segment.

Because \(f(A)\) is a segment and \(g\) is an order-isomorphism we know that \(h(A) = g(f(A))\) is a segment of \(g(B)\). But a segment of a segment is a segment, so \(h(A)\) is a segment of \(C\) and we’re done. QED

Clearly if \(A \prec B\) then \(A\) must be isomorphic to an initial segment of \(B\) (because it’s isomorphic to a segment which isn’t \(B\)). This is also a sufficient condition:

Theorem: A well-ordered set is not isomorphic to any of its initial segments.

Proof:

Suppose \(f : A \to I_a\) is order preserving and injective. Then necessarily \(f(a) < a\) (because \(f(a) \in I_a\). But then because it’s order preserving and injective, \(f(f(a)) < f(a)\). Indeed, \(a > f(a) > \ldots > f^n(a) > \ldots\).

This means that the set \(\{ f^n(a) : n \in \mathbb{N}\}\) has no minimal element, contradicting the well-orderedness of A. QED

Corollary: \(A \prec B\) if and only if \(A\) is order isomorphic to an initial segment of \(A\).

Corollary: If \(A \preceq B\) and \(B \preceq A\) then \(A \sim B\).

Proof: Otherwise \(A \preceq B \sim I_a\).

Corollary: If \(A \prec B\) then \(A\) is isomorphic to a unique initial segment of \(B\) (because if it were isomorphic to two, the larger of the two would be isomorphic to an initial segment of itself)

So basically \(\preceq\) is a partial order on equivalence classes of well-orderings. i.e. it’s a partial order except for instead of equality in the anti-symmetric clause we have isomorphic.

It turns out it’s actually a total order:

Theorem: Let \(A\), \(B\) be well-ordered sets. Then \(A \preceq B\) or \(B \preceq A\).

Proof:

Let \(T\) be some element not in \(B\). We’ll define a function \(f : A \to B^+ = B \cup \{T\} \) by transfinite induction as follows: Let \(g : \mathcal{P}(B^+) : \to \B^+\) be defined by \(g(U) = \mathrm{min}(B \setminus U)\) if \(B \setminus U\) is non-empty, else \(g(U) = T\). Let \(f\) be the function this defines. i.e. if \(f(a) \in B\) then \(f(a) = \mathrm{min} B \setminus f(I_a)\).

The idea is that \(T\) is our “stopping point” which says “hey I’ve run out of elements of B”. If we ever hit it then we’ve covered \(B\) with an order preserving isomorphism from some segment of \(A\). If we don’t then we’ve embedded \(A\) in a segment of \(B\).

Specifically, let \(S_A = \{a \in A : f(a) \neq T\}\) and let \(S_B = f(S_a)\). We’ll show that both \(S_A\) and \(S_B\) are segments and at least one of \(S_A = A\) or \(S_B = B\) holds.

Proof that \(S_B\) is a segment: Suppose \(f(a) = b\) and \(b’ < b\). Then because we chose \(b\) to be minimal such that \(f(x) \neq b\) for \(x < a\), we must have that \(f(x) = b’\) for some \(x < a\), else \(b’\) would be an example contradicting minimality.

Proof that \(S_A\) is a segment: If \(a \in S_A\) and \(a’ < a\) then \(f(a) \in B \setminus f(I_{a’})\) so \(f(a’) \neq T\).

Now suppose \(S_B \neq B\). Then there exists \(b \in B \setminus S_B\). But then \(b \in B \setminus f(I_a)\) for all \(a \in A\), so this set is never empty and thus we never have \(f(a) = T\). Hence \(A = S_A\).

\(f|_{S_A}\) is injective by construction because if \(a’ < a\) we always choose \(f(a) \neq f(a’)\). We need only show that it’s order preserving.

But the same proof that \(S_B\) was a segment will show that \(f(I_a)\) is a segment for any \(a\). Thus if \(a < a’\) we must have \(f(a’) > f(a)\), as else \(f(a’) \in f(I_a)\) which would contradict injectivity.

Therefore \(f|_{S_A}\) is order-preserving and injective, and so is an order isomorphism between \(S_A\) and \(S_B\). If \(S_A = A\) then \(A \preceq B\). Else \(S_B = B\) and so \((f_{S_A})^-1 : B \to S_A\) demonstrates that \(B \preceq A\). QED

So what does this all have to do with sizes of infinity then?

Well we’ve demonstrated that for any two well ordered sets there is an injection from one to the other. So if every infinite set can be well-ordered we must have that for any two sets \(A, B\), \(|A| \leq |B|\) or \(|B| \leq |A|\).

Can every infinite set be well-ordered? It seems highly implausible.

It turns out the answer is “sort of”. There is an axiom of set theory which asks for a very natural structure on infinite sets which will be enough to guarantee a well-ordering of every set.

But we don’t need that to demonstrate its plausibility. It turns out that if we want it to be true that \(|A| \leq |B|\) or \(|B| \leq |A|\) always holds we must have every set be able to be well-ordered.

Why? Well because for any set \(A\) there is a natural example of a well-ordered set \(W(A)\) such that \(W(A) \not\leq A\). i.e. there is no injection \(W(A) \to A\). Thus if we want all cardinals to be comparable we must have an injection \(f : A \to W(A)\). But then we can define a well-ordering on \(A\) by \(x \leq y\) iff \(f(x) \leq f(y)\).

Lets construct such a set:

Theorem: Let \(A\) be any infinite set. Let \(W(A)\) be the set of isomorphism classes of well-orderings of subsets of \(A\). Define \([A] \leq [B]\) if \(A \preceq B\). This is well-defined because of how \(\preceq\) interacts with isomorphism. Then \(W(A)\) is a well-ordered set and there is no injective function \(f : W(A) \to A\).

Proof:

We’ve already shown that it’s a totally ordered set by our previous results. We need only show that it’s well-ordered.

It suffices to show that every initial segment of \(W(A)\) is well-ordered: Then if \(U \subseteq W(A)\) and \(u \in U\), either \(u\) is a minimal element of \(U\) or \(I_u \cap U\) is a non-empty subset of \(I_u\) and thus has a minimal element which is in turn a minimal element of \(U\).

So let \(x \in W(A)\). We can write \(x = [T]\) for some well ordered set. Claim: \(I_x \sim T\).

Proof: \(y < x\) iff \(y = [S]\) for some \(S \prec T\) by definition of our ordering on \(W(A)\). Thus there is a unique initial segment of \(T\) such that \(S \sim I_s\). Define \(f : I_x \to T\) by mapping \([S]\) to the element corresponding to the unique initial segment isomorphic to \(S\). i.e. if \(S \sim I_t\) then \(f([S]) = t\).

This is injective because no two initial segments a well-ordered set are isomorphic. It’s order preserving because if \([S’] \prec [S]\) then we have an order-isomorphism between \(S’\) and an initial segment of \(S\). If \(f(S) = x\) we can compose this with the order isomorphism we’ve got \(S \to I_x\) and get an order isomorphism between \(S’\) and an initial segment of \(I_x\). This means that \(S’ \sim I_y \subseteq I_x\) and so necessarily \(f(S’) = y < x\).

This is surjective because if \(S\) is a well-ordering of some subset of \(A\) then so are all its initial segments, and \([I_s] < [S]\) for \(s \in S\). This proves our claim: Every initial segment of \(W(A)\) is isomorphic to a well-ordered set.

So we now know that \(W(A)\) is a well-ordered set. We must show that there is no injection \(f : W(A) \to A\).

Suppose such an \(f\) exists. Then we can define a well-ordering on the set \(T = f(W(A))\) by \(f(x) < f(y)\)  when \(x < y\).

But then \([T] \in W(A)\), and by what we proved in the claim we have \(I_{[T]} \tilde W(A)\). But a well-ordered set cannot be order isomorphic to an initial segment of itself. This contradicts the existence of such a function.

QED

So we now know that there are “arbitrarily large” well-ordered sets, and so if we want all cardinalities to be comparable in fact all sets can be well-ordered. This gives a strong plausibility argument for the idea that this is possible. But how might we prove it?

Well we still run into that problem I mentioned at the beginning: Arbitrary sets have very little structure to play with. The solution is to demand that they have slightly more structure.

There is a set theoretic axiom called the axiom of choice. Historically it was somewhat controversial. These days it’s basically considered entirely uncontroversial and modern set theory is very much mostly the study of set theories which obey the axiom of choice. Simply put, the axiom is this:

Let \(X\) be a non-empty set. Then there exists a choice function \(f : P(X) \setminus \{\emptyset\} \to X\) such that \(f(U) \in U\).

i.e. we have some way of picking an “arbitrary” member of any given non-empty set. If you like your intuition can be that we pick at random, though it turns out to be hard to make that intuition make rigorous sense.

Given the axiom of choice, we can now straightforwardly use transfinite induction to construct an injection \(A \to W(A)\), which will give us a well-ordering on \(A\).

Theorem:  A set \(A\) can be well-ordered if and only if there is a choice function on \(P(A) \setminus \{\emptyset\}\). In particular given the axiom of choice every set can be well-ordered.

Proof:

If \(A\) can well ordered then the function \(h(U) = \min U\) is a choice function. So that takes care of the “only if” part.

We now must construct a well-order from a choice function. We’ll do this by bijecting \(A\) with a subset of \(W(A)\). This will let us inherit the well order from it.

Let \(h : P(X) \to X\) be a choice function extended with \(h(\emptyset) = h(X)\) (it doesn’t really matter what we do with the empty-set).

Let \(f : W(A) \to A\) be the function defined by induction as \(f(x) = h(X \setminus f(I_x))\).

By our previous theorem this cannot be an injection. But if \(X \setminus f(I_x)\) is never empty then this must be an injection because otherwise we always choose \(f(x) \neq f(y)\) for \(y < x\).

Pick the minimal \(x\) such that \(f(I_x) = X\). Then \(f|_{I_x} : I_x \to A\) is a bijection, so its inverse is an injection \(A \to W(A)\) and the result is proved.

QED

And this more or less concludes what I wanted to cover in this post: Every set can be well-ordered, and thus any two cardinalities are comparable.

I will finish by now finally explaining what \(\aleph_1\) means.

Suppose \(A\) is any uncountable well-ordered set. Let \(x = \mathrm{min} \{a \in A : |I_a| > \aleph_0\}\).

Then by construction \(|I_x| > \aleph_0\) but any initial segment of \(I_x\) has cardinality \(\aleph_0\). This means that regardless of what well-ordered set we started out with the results will be isomorphic: One must be isomorphic to a segment of the other, and the initial segments are all countable so they must be isomorphic to the whole set.

This means additionally that for any \(|B| > \aleph_0\) we must have \(|I_x| \leq |B|\): Well-order \(B\) and perform the same construction. Then \(I_x\) is order isomorphic to a segment of \(B\) and thus there is an injection \(I_x \to B\).

We write \(|I_x| = \aleph_1\): It is the smallest size an uncountable set can be.

I still haven’t told you what these mystery \(\aleph\)s mean of course, but you’ll have to wait till next time for that one.

This entry was posted in Numbers are hard, Uncategorized on by .

Maybe I’m doing it wrong

There’s a joke I sometimes tell at family gatherings.

I have this vision of my ancestors of Clan MacIver. We’re preparing for battle. A line of hairy bearded Scotsmen on the top of a ridge. We face the enemy, raise our axes, and as one we charge, shouting our fierce battle cry.

“NO. YOU’RE DOING IT WRONG!”

This joke is of course wildly inaccurate. Setting aside the ludicrous cultural stereotyping, I think the people in our family who the you’re doing it wrong tendency come from most are on the maternal line (my mother and my father’s mother) – my father and his father before him are/were definitely perfectionists, but it’s more self-directed than that. But still, annoyance at other’s mistakes and failure to understand The Right Way Of Doing Things is definitely a family thing.

And, as you’ve probably noticed, I exemplify it somewhat. I don’t think I always know the right way of doing things, but boy do I have opinions. And I will often rant angrily about them. Here, on twitter, on IRC, in the pub, if you happen to stop me on the street to ask for directions, etc.

And for some reason it’s taken me 30 years to start to wonder… is this useful?

As far as I can tell, about 80% of our industry, and I suspect that number is generously low, is suffering from at least one of Dunning-Kruger or Impostor Syndrome (I’m of course in the remaining 20%. Yes. For sure. After all, I wouldn’t self-describe as being either of those, so I must be in the 20%, right?).

So when I shout angrily about how people are shit at writing software, who is listening?

On the one hand we’ve got the overconfident types. They might be listening, but of course they’re already doing everything right, so while they may treat my anger as a hilarious spectator sport, it doesn’t really apply to them does it? Really I’m just giving them another tool to beat people with.

On the other hand, we’ve got the people who are already insecure about their own skills. They’re afraid they’re shit at software development, and they’ll take angry criticism as validating that this is the case. I know from past experience (learning to drive) that having someone shout at you about how bad you are at the thing you know you’re bad at is a good way to completely destroy what little confidence you have left and ruin any chances of getting good at it by making you give up and go do something else.

So when I shout at people, who is listening, and what does it do to them?

There’s certainly a place for anger, and it’s a good tool for getting people to listen long enough for the mind control to take effect. But when it’s a post about making yourself better at a useful skill, who would I rather derive the benefit from it? And even if I do want the larger community to better itself, shouldn’t I be more careful about the people caught in the crossfire?

This post is full of questions, and I don’t really know the answers. I dropped out of my mind-reading classes because the instructor shouted at me too much, so I don’t actually know how my shouting affects people. But it’s a possibility that I’m worried enough about that I’m going to keep an eye on it. There probably aren’t going to be any overnight changes, but there may be a gradual shift in tone.

This entry was posted in Uncategorized on by .