For reasons that aren’t relevant here, I was reminded of the power of fixed point theorems on partially ordered sets to prove some fundamental results in set theory earlier. This is an expository post about that. If you don’t care about set theory you probably won’t care about this.
I’m going to mention two fixed point theorems and use each of them to give a simple proof of a well known result in set theory.
Preliminaries
A poset is a set \(X\) with a relation \(\leq\) on it that is
- Reflexive: \( \forall x. x \leq x \)
- Antisymmetric: \(\forall x, y. x \leq y\) and \(y \leq x \implies x = y\)
- Transitive: \(\forall x, y, z. x \leq y\) and \(y \leq z \implies x \leq z\)
Additional reminders:
- A poset is said to be totally ordered if additionally it is Total: \( \forall x, y. x \leq y\) or \(y \leq x\).
- A chain is a subset of a poset which is totally ordered under the restriction of the relationship to it.
- \(x\) is an upper bound for a set \(A\) if \(\forall a \in A. a \leq x \)
- \(x\) is a least upper bound for a set \(A\) if it is an upper bound and \(x \leq y\) for all upper bounds of A \(y\). When such an upper bound exists it is unique (by anti-symmetry) and we write it as \(\sup A\)
- A poset is said to be complete (resp. chain-complete) if every subset (resp chain) has a least upper bound
- Every chain complete poset has a least element (the least upper bound of the empty set).
Examples of posets are any of the normal sets of numbers – e.g. \(\mathbb{N}, \mathbb{Q}, \mathbb{R}\). All of these are total orders, none of them are complete. \([0, 1] \subseteq \mathbb{R}\) is a complete total order.
For any \(X\), the power set \(\mathcal{P}(X)\) is a complete poset with the order being set inclusion.
Our fixed point theorems will concern functions
Our fixed point theorems are both about particular types of mappings of posets to themselves.
Knaster–Tarski fixed point theorem
(This is actually a slightly weakened form of the theorem because it’s slightly easier to prove and we don’t need the full thing)
Theorem: Let \(X\) be a complete poset and \(f : X \to X\) be order presering. That is \(x \leq y \implies f(x) \leq f(y)\). Then \(f\) has a fixed point.
Proof:
Let \(A = \{ x : x \leq f(x) \} \).
If \(x \in A\) then \(f(x) \in A\) as if \(x \leq f(x)\) then \(f(x) \leq f(f(x))\) due to \(f\) being order preserving.
Suppose now that \(y = \sup A\). Then if \(x \in A\) we have \(x \leq y\) and so \(f(x) \leq f(y)\). But \(x \leq f(x)\) by hypothesis of \(x \in A\), so we have \(x \leq f(x) \leq f(y)\). Therefore \(f(y)\) is also an upper bound of \(A\) and thus, because \(y\) is the least upper bound, \(y \leq f(y)\) and thus \(y \in A\).
But then as per above we must also have \(f(y) \in A\), and thus \(f(y) \leq y\). Therefore \(f(y) = y\).
QED.
This then lets us provide a very short proof of one of the basic results of set theory.
Cantor–Bernstein–Schroeder theorem
Let \(A, B\) be sets, and let \(f : A \to B\) and \(g : B \to A\) be injections. There exists a bijection \(h : A \to B\)
Proof:
The idea is that we want to find some subset \(H \subseteq A\) for which we can define our function h as \(h|_H = f\) and \(h|_{H^c} = g^{-1}\).
Claim: If \(g(f(H)^c) = H^c\) then the function h is well defined and is a bijection.
Well defined is easy: By the assumption, every element of \(H^c\) is in the image of \(g\), so \(g^{-1}\) is well defined on it.
Injective: It’s injective when restricted to \(H\) and to \(H^c\), the only question is whether we can have \(h(x) = h(y)\) with \(x \in H\) and \(y \in H^c\). But by assumption and injectivity of \(g\) we have \(g^{-1}(H^c) \subseteq f(H)^c\), so this is impossible.
Surjective: By assumption, \(g^{-1}(H^c) = f(H)^c\), so the function is surjective.
So now all that remains is to find such an H. Which we will now do by using Knaster-Tarski to pull it out of a hat.
Rewrite the requirement as \(H = g(f(H)^c)^c\)
i.e. we have the function \(S(A) = g(f(A)^c)^c\) and are looking for \(S(H) = H\).
Suppose \(A \subseteq B\). Then \(f(A) \subseteq f(B)\), so \(f(B)^c \subseteq f(A)^c\), so \(g(f(B)^c) \subseteq g(f(A)^c)\), so \( g(f(A)^c)^c \subseteq g(f(B)^c)^c\). i.e. \(S(A) \subseteq S(B)\).
So \(S\) is an order preserving map on the complete poset \((\mathcal{P}, \subseteq)\). Thus by the Knaster–Tarski fixed point theorem it gives us \(H\) we want.
QED
I find Knaster-Tarski a particularly “magic” fixed point theorem usage here because there’s no obvious connection between \(A\) and \(S(A)\) – we’ve just defined \(S(A)\) so that if it had a fixed point it would be the thing we wanted, and thus a solution appears.
Our next fixed point theorem is one I find much more intuitive, which makes it perhaps slightly odd that the proof we’ll use it in is intimately connected with the Axiom of Choice (which is intuitive itself but produces unintuitive results) and Zorn’s lemma (which is too technical to really have much intuition about).
Bourbaki–Witt fixed point theorem
Let \(X\) be a chain-complete poset and \(f : X \to X\) such that \(f(x) \geq x\). For every \(x \ni X\), \(f\) has a fixed point \(y \geq x\)
Proof:
The proof is by transfinite induction. That is, we’ll take some ordinal \(\kappa\) with no injection into \(X\) (such an ordinal exists by Hartog’s Lemma) and define \(g : \kappa \to X\) as follows:
- \(g(0) = x\)
- \(g(\alpha + 1) = f(g(\alpha))\)
- \(g(\beta) = \sup \{ g(\alpha) : \alpha < \beta\}\) when \(\beta\) is a limit ordinal (this is well defined because g produces a chain by construction and \(X\) is chain complete)
\(g\) is, by construction and that \(x \leq f(x)\), a monotone increasing function. i.e. \(\alpha \leq \beta \implies g(\alpha) \leq g(\beta)\). It’s not injective, therefore we can find some \(\alpha < \beta\) with \(g(\alpha) = g(\beta)\). Then \(\alpha + 1 \leq \beta\), so \(g(\alpha) \leq g(\alpha + 1) \leq g(\beta) = g(\alpha)\). Therefore \(g(\alpha) = g(\alpha + 1) = f(g(\alpha))\) and \(g(\alpha)\) is our desired fixed point (and is \(\geq x\) because \(g(0) = x\). QED Now, how do we use this? Well, it provides a very nice proof of Zorn's lemma by way of a closely related result.
Hausdorff maximality theorem
Let \(X\) be a poset. Every chain \(C \subseteq X\) is contained in a maximal chain (that is one not properly contained in any other chain).
This is a theorem very closely related to Zorn’s lemma – either can very easily be proved from the other. But Zorn’s lemma is a simple corollary of Hausdorff’s maximality theorem (whileas the other direction requires at least a bit of machinery), and this way round we get to apply a nice fixed point theorem to give a very simple proof.
Proof:
Let \(\mathcal{C}\) be the set of chains of X ordered by inclusion.
Claim: This is a chain complete poset.
Proof: Let \(\mathcal{A} \subseteq \mathcal{C}\) be a chain. Let \(A = \bigcup \mathcal{A}\). If \(x, y \in A\) then \(x, y \in B\) for some \(B \in \mathcal{A}\) (because it’s a chain, so find one containing each and take the larger of the two). Therefore \(x \leq y\) or \(y \leq x\). Hence \(A \in \mathcal{C}\). Clealy it’s a least upper bound for \(\mathcal{A}\) as it is one in the poset \(\mathcal{P}(X)\).
So, we’ll now apply the Bourbaki Witt fixed point theorem. Define \(f : \mathcal{C} \to \mathcal{C}\) as follows:
- If A is maximal, \(f(A) = A \)
- If A is not maximal, use the axiom of choice to arbitrarily pick some chain strictly containing \(A\) as \(f(A)\)
By construction \(f(A) \supseteq A\), by the above \(\mathcal{C}\) is chain complete, therefore by Bourbaki Witt \(f\) has a fixed point above every element of \(\mathcal{C}\). The elements of \(\mathcal{C}\) are the chains of \(X\) and the fixed points of \(f\) must be maximal. Hence the result is proved.
QED
As promised, Zorn’s lemma is just a simple corollary:
Zorn’s Lemma
Let \(X\) be a poset such that every chain has an upper bound (not necessarily a least one). \(X\) has a maximal element.
Proof:
Pick a maximal chain, \(C\). Let \(x\) be an upper bound for \(C\). If there were some element \(y \) with \(x < y\) then \(C \cup \{y\}\) would be a chain strictly containing \(C\). Therefore no such \(y\) exists and \(x\) is maximal.
Conclusion
A lot of the proofs presented for both of these results are pretty inscrutable. They’re not hard to understand, but there are quite a lot of details to get right.
I’ve got to admit, this mostly exhausts my list of cute things in set theory you can prove by way of fixed point theorems. I’m sure there are more, and I’d be interested to hear about them, but it’s been a while since I’ve looked at this and these are the only two I remember.
Update: In talking to Imre Leader, who taught me the course in which I learned Bourbaki Witt, he points out that as far as set theory is concerned Bourbaki Witt is almost entirely useless. It can essentially be regarded as “the choice free part of Zorn’s lemma”, and as a result no one actually uses it because the hypotheses of Zorn’s lemma are so much easier to verify.
f(x) belongs to A simply by symmetry (namely, f(x) <= f(x)); there's no need to invoke the monotonicity of f.
I’m sorry, I don’t follow at all. The condition for \(f(x) \in A\) is that \(f(x) \leq f(f(x))\), not that \(f(x) \leq f(x)\). What am I missing?
David, I’m so sorry for not following up properly to your reply. You’re right, I misunderstood the condition on A. Please disregard my intervention.
A problem in the proof of Knaster–Tarski fixed point theorem is the non-emptiness of the set A. The proof will fail in case A is an empty set. Am I right?
Pingback: An introduction to infinity | David R. MacIver
Very nice article.
This is a nice blog article. I have a couple of comments.
To the commenter yang: there is no problem, because the empty poset is not complete. Completeness means that every subset, including the empty subset, has a least upper bound, i.e., an element y such that y is an upper bound such that y <= x for any upper bound x. In the case of the empty subset, this would mean the existence of a bottom element. Since the empty poset has no bottom element, it can't be complete.
Second, it's possible to "beta-reduce" the proofs so that they become less inscrutable. Take for example the proof of Cantor-Bernstein from Knaster-Tarski. The point is that Knaster-Tarski is not a black box but is quite explicit. In this case, the fixed point H is explicitly the union of all subsets S such that S is contained in g(f(S)^c)^c. This is essentially the same as defining H = {a in A: {a} <= g( {f(a)}^c )^c}, in other words H = {a in A: a not= g(b) whenever b not= f(a)}. Of course, this H might be empty, but no matter. We go ahead and define the bijection as before: h(a) = f(a) if a in H, and h(a) = g^{-1}(a) if a not in H. [Notice that if a is not in H, then by definition of H there is some b != f(a) such that g(b) = a (and of course this b is unique), and so the function h is well-defined and everything works out.] If you work through this, it turns out to be actually the same as the other commonly seen proof (attributed to Julius Koenig) involving a back-and-forth argument, but to my eyes this formulation looks less fussy and more easily remembered.
Ugh, please remove my prior comment (and this one). Thanks, and sorry.