Direct proofs from the well ordering theorem

In the course of writing yesterday’s post about Zorn’s lemma I noticed a different proof of the theorem that every vector space has a basis:

Let \(V\) be a vector space and define some well-ordering \(<\) on it. Let \(B\) be the set of \(x\) such that \(x\) is not in the span of \(\{y: y < x\}\).

Claim: B is a basis.

Proof: Certainly \(B\) is linearly independent: If not, let \(x_1 < \ldots < x_n \in B\) be linearly dependent. Then \(x_n\) is in the span of \(x_1, \ldots, x_{n – 1}\), which are \(< x_n\), contradicting the definition of \(B\).

Claim: For all \(x\), \(x\) is in the span of \(\{y \in B: y \leq x\}\).

Proof: Transfinite induction. Either \(x\) is not in the span of \(\{y: y < x\}\), and thus \(x\) is actually in this set and thus certainly in the span. Else, we can find \(y_1, \ldots, y_n < x\) such that \(x\) is in their span. But by induction assumption, each of these is in the span of \(\{y \in B: y < x\}\), and thus so is \(X\).


To some extent this is just the normal proof in disguise, but I thought it was cute. I highly doubt it’s in any way original to me, but I hadn’t seen it before.

This made me wonder what other things you could prove this way. I’ve rarely seen the well ordering theorem used as a direct proof method except when you want to make some careful argument about cardinalities (e.g. one way the Continuum Hypothesis is often used is to put a well-ordering on \(\mathbb{R}\) such that all the initial segments are countable).

For starters, the above proof can relatively easily be turned into a proof of the Teichmüller–Tukey lemma using essentially the same construction:

Let \(X\) be some set, and let \(L \subseteq \mathcal{P}(X)\) be some family of sets of finite character. Define the function \(f : X \to \{0, 1\}\) as follows:

Let \(T = \{y < x: f(y) = 1\}\).

Let \(f(x) = 1\) if \(T \cup \{x\} \in L\), else \(f(x) = 0\).

Claim: \(S = \{x : f(x) = 1\}\) is a maximal element of \(L\).

Proof: First we show that it’s in \(L\). If not, there is some finite subset \(x_1 < \ldots \x_n \in S\) such that \(\{x_1, \ldots, x_n\} \not\in L\). But then \(f(x_n) = 1\), so with \(T\)  as in our definition of \(f\),  \(T \cup \{x_n\} \in L\). Thus every finite subset of this is in \(L\), and in particular \(\{x_1, \ldots, x_n\} \in L\), contradicting our assumption.

Now we show it’s maximal: Suppose \(S \cup \{x\} \in L\). Then we must have \(f(x) = 1\), because \(T \cup \{x\} \subseteq S \cup \{x\}\), and families of sets of finite character are closed under taking subsets. Thus \(x \in S\).


Slightly further adaptation allows us to prove Zorn’s lemma itself via this construction, and I think it actually is a slightly nicer proof than the one I presented before:

Let \(T\) be some partially ordered set with relation \(\prec\) such that every chain in \(T\) has an upper bound.

Let \(<\) be a well-order on \(T\) (not necessarily compatible with \(\prec\)\).

Define \(f : T \to \{0, 1\}\) as \(f(x) = 1\) if \(y \prec x\) for every \(y < x\) such that \(f(y) = 1\), else \(f(x) = 0\).

Let \(C = \{x: f(x) = 1\}\).

Claim: \(C\) is a chain for \(\prec\).

Proof: Let \(x, y \in C\) with \(x \neq y\). By reordering if necessary we can assume \(x < y\). Then by construction of \(f\), we must have \(x \prec y\).

Thus, by assumption, there is some upper bound for \(C\), call it \(s\).

Claim: \(s\) is maximal.

Proof: We will show that \(s \in C\). This proves that \(s\) is maximal because if there were another element \(t > s\), this would also be an upper bound, which would mean it will also have to be in \(C\), which contradicts that \(s\) is an upper bound for \(C\).

To show that \(s\) is in \(C\): We must have \(f(s) = 1\), because \(\{y : y < x, f(y) = 1 \} \subseteq C\}\), \(y \preceq  s\) for all \(y \in C\), and \(y \neq s\) for \(y < s\). These are the conditions in our definition of \(f\), so \(f(s) = 1\) and thus \(s \in C\).


Side note 1: Note that all we’ve really done in the above proof is inlined a proof of the Hausdorff maximal principle. This follows fairly straightforwardly from the Teichmüller–Tukey lemma, so having its own proof by this mechanism is not actually super useful.

Side note 2: You may have only seen the well-ordering theorem proven using Zorn’s lemma so this may seem circular, but you can prove it directly from Hartog’s lemma (which does not require the axiom of choice) as follows:

Let \(X\) be some set, and let \(c\) be a choice function for subsets of \(X\). Let \(W\) be some well ordered set that does not inject into \(X\). Define \(f : W \to X \cup \{\infty\}\) as:

If \(X \subseteq \{ f(s) : s < t \}\) then \(f(t) = \infty\). Else, \(f(t) = c(X \setminus \{ f(s) : s < t \})\).

This must eventually hit \(\infty\) as otherwise it would be injective. Restricted to \(\{t: f(t) \neq \infty\}\), f gives a bijection between \(X\) and a well-ordered set, so you can use this to define a well-ordering on \(X\).


Is this style of proof really any better than the alternatives? I don’t know. You probably still want to use Zorn’s lemma and similar most of the time anyway. I like these proofs though, and they seem like a slightly less messy way to get a feel for transfinite induction.

This entry was posted in Numbers are hard on by .