Category Archives: Numbers are hard

Stating the probabilistic Gibbard-Sattherthwaite Theorem

So as mentioned in an emergency edit, the version of the Gibbard-Sattherthwaite Theorem for non-deterministic voting systems that I mentioned in my last post is wrong. The set of voting systems permitted by it is in fact notably larger than I claimed (I got my version of it from this rangevoting.org post. Ah, trusting third party reporting of science by people with an agenda).

So now I’m reading the original paper. I’m struggling a bunch with the details, but I at least understand the result and think I can see my way towards constructing an alternate proof that makes more sense to me.

Here is the true version of the theorem as near as I can tell:

Consider a voting system under the following setup:

  1. There are N candidates
  2. Each of M voters must submit a ballot consisting of a full ordering of the N candidates
  3. The votes are aggregated in some way to produce a lottery over the N candidates

Further, assume that voters are Von Neumann-Morgenstern rationalists with full preferences over lotteries arising from expected values of utility functions, and that they have strict preferences amongst the candidates (i.e. there are no two candidates to which they assign the same utility).

A voter’s ballot is honest if whenever they rank candidate x better than candidate y their utility function \(U\) has \(U(x) > U(y)\).

A voting system is strategy-proof if there are never circumstances where a single voter can change their vote from an honest one to a dishonest one and get a lottery they strictly prefer in expected utility.

The theorem is that any strategy-proof voting system is a probabilistic mixture of the following types of system (that is, we choose from any number of items of this type with fixed probabilities independently of the votes cast):

  1. Fixed, meaning it always elects a specific candidate
  2. Unilateral, meaning there is a single voter such that any other voter can change their ballot without affecting the outcome
  3. Duple, in the sense that there are only two candidates it can elect (fixed is a special case of this where it can only actually elect one)

Three examples of this are sortition (it is an equal mixture of each of the possible fixed systems), random ballot (it is an equal mixture of each of the unilateral systems which elect the candidate’s top choice) and random pair (it’s an equal mixture of majority vote between each of two possible pairs of candidates).

But these aren’t the only examples. For example, a system which randomly chooses between a fixed voter’s top k choices is unilateral and strategy-free. So is a system which chooses between two fixed candidates, picking a candidate with probability equal to the fraction of people who prefer that candidate. There is a lot of flexibility in terms of how you construct the unilateral and duple systems which is ignored by random ballot and random pair.

How might we prove the theorem? Well… I haven’t quite understood the proof yet, truth be told. I’ve got bits of it lying disassembled in pieces hanging around on the shop floor of my brain, and I’m getting there, but so far I don’t have anything coherent enough to blog about. Once I do I’m hoping to try to slim it down and blog about it. We’ll see. For now though, I just thought it was important to have a correct statement of the theorem online.

 

This entry was posted in Numbers are hard, voting on by .

A technical lemma I am finding surprisingly useful

Let \(\mathcal{P}_n = \{ x \in [0, \infty)^n : \sum x = 1 \}\) be the set of probability distributions on n items.

Let \(x \in [0, 1]\). Define the selection function \(s_x : \mathcal{P}_n \to \{1, \ldots, n\}\) as \(s_x(p) = \mathrm{min} \{k : \sum\limits_{i=1}^k p_i \geq x \}\).

Then:

  1. \(s_x\) is continuous except on a closed set of measure zero.
  2. If \(U\) is chosen uniformly at random from \([0, 1]\) then \(P(s_U(p) = k) = p_k\).

Proof of 1: It is continuous except on the set \(\bigcup\limits_k \{ p \in \mathcal{P} : \sum\limits_{i=1}^k p_i = x \}\) as small variations won’t change the result if it’s not on a boundary. This is a closed set of measure zero.

Proof of 2: \(s_U(p) = k\) iff \(\sum_{i < k} p_i < U\) and \(U \leq \sum_{i \leq k} p_i \). i.e. if \(U \in (\sum_{i < k} p_i, \sum_{i \leq k} p_i]\). This is an interval of length \(p_k\) and \(U\) is uniform, so \(P(U \in (\sum_{i < k} p_i, \sum_{i \leq k} p_i]) = p_k\). Therefore \(P(S_U(p) = k) = p_k\) as desired.QED

Why is this a useful lemma?

Well, because I often find myself in a situation at work right now where I want to be able to experiment with randomized algorithms in a way that lets me see the effect of small tweaks. Unfortunately a change which changes the probabilities slightly requires me to re-roll the dice, so it’s hard to tell whether a given change is the result of randomness or the change I’ve made.

By deterministically seeding a PRNG and using it to pick the selection function we get something which is almost always continuous in the probabilities used, so it’s mostly stable against the probabilities used and will only make big changes in response to big changes in the probability.

This entry was posted in Numbers are hard on by .

Ordinal and cardinal numbers

Continuing my series about infinity.

We’ve previously made a lot of references to things like \(\aleph_0\), \(\aleph_1\), etc. What are these mysterious alephs? What do they mean?

They’re essentially a kind of “infinite number” called a cardinal number. In much the same way that we used the sets \(\{1, \ldots, n\}\) as canonical representatives of finite sets of a given size and said those sets had cardinality \(n\), we will find canonical representatives of every size of set and use those to label cardinalities.

It turns out that this isn’t quite the right place to start. Rather than canonical representatives for sizes, we’ll want canonical representatives for well orderings – they’re easier to build, and then our canonical representative for sizes will be the smallest well-ordering of that size as per our construction of \(\aleph_1\) at the end of the last post.

So how are we going to construct these things?

There turn out to be a lot of clues for how such a thing would work in our construction of \(W(A)\) in the last post. We implicitly proved the following theorem:

Theorem: Let \(A\) be a well-ordered set. Then \(A\) is isomorphic to the set \(\{I_a : a \in A\}\) ordered by \(\preceq\).

We can of course replace the \(I_a\) with any other sets isomorphic to it. In particular, if we have canonical representatives for every well-ordered set, we can replace the \(I_a\) with their canonical representatives.

This gives us a strategy: We’ll define our canonical representatives by transfinite induction (recall that we showed as part of our work around \(W(A)\) that \(\preceq\) well-ordered isomorphism classes of well-ordered sets). The representative for \(A\) is the set of all representatives for initial segments of \(A\).

Definition: Let \(A\) be a well-ordered set. Define \(\mathrm{ord}(A) = \{ \mathrm{ord}(I_a) : a \in A\}\). Give \(\mathrm{ord}(A)\) the ordering \(\preceq\).

As a corollary to our previous result, \(A \sim \mathrm{ord}(A)\). So this gives us our canonical representatives. We call something that is \(\mathrm{ord}(A)\) for some well-ordered \(A\) and ordinal number.

Note that we’re doing something subtly different from normal: The class of all well-ordered sets is not itself a set (because if it were it were you could construct \(W(A)\) which is a strictly larger well-ordered set). However, for any well-ordered set the isomorphism classes of well ordered sets \(\preceq\) it is a set, so we can still do transfinite induction like this.

First we need to show that this is a canonical representative:

Theorem: If \(A \sim B\) then \(\mathrm{ord}(A) = \mathrm{ord}(B)\).

Proof:

Suppose \(\mathrm{ord}(A) \neq \mathrm{ord}(B)\).

Let \(f: A \to B\) be an order isomorphism. Consider the initial segments of \(A\). If \(\mathrm{ord}(I_a) \neq \mathrm{ord}(f(I_a))\) for some \(I_a\), replace \(A\) with the minimal such \(I_a\). Else, for every initial segment of \(A\), \(\mathrm{ord}(I_a) = \mathrm{ord}(I_b)\)

But \(\mathrm{ord}(A) = \{ \mathrm{ord}(I_a) : a \in A\}\). And because \(f\) is an isomorphism, \(\{f(I_a) : a \in A \} = \{I_b : b \in B\}\). So \(\mathrm{ord}(A) = \{ \mathrm{ord}(I_a) : a \in A\} = \{ \mathrm{ord}(f(I_a)) : a \in A\} = \{ \mathrm{ord}(I_b) : b \in B\} = \mathrm{ord}(B)\). QED

So the ordinals are indeed canonical representatives in the sense that they give us a way of choosing a distinct element out of every isomorphism class.

We’re now pretty much done.

A cardinal is an ordinal \(\alpha\) such that for \(\beta < \alpha\), \(|\beta| < |\alpha|\). i.e. every smaller ordinal is strictly smaller in size. The cardinality of a set is the unique cardinal number of the same size as that set.

Why does this cardinality always exist?

Well, form a well-ordering of \(A\) and take its ordinal \(\alpha\). Either this ordinal is a cardinal or \(\{\beta : \beta < \alpha, |\beta| = |A|\}\) is non-empty and the minimal element of that is a cardinal.

So these give us our canonical sets representing the cardinality of any given set. \(|A|\) can thus be considered as being that cardinal number.

But how do we write down cardinal numbers?

Well, that’s mostly just notation.

First we need some notation for ordinal numbers. We’re just going to name a few for now. In particular the ordinal numbers corresponding to finite sets (there’s a unique up to isomorphism total order of any finite size) we will just write as their size. So \(n\) is the ordinal number corresponding to ordering a finite set of size \(n\). For the normal order on \(\mathbb{N}\) we will write \(\omega\). These are enough ordinals for us for now.

Our notation for cardinal numbers comes as follows. We want to label cardinals by ordinals, so that the cardinal \(\aleph_\alpha\) is the \(\alpha\)th cardinal.

The easiest way to do this is by transfinite induction. Let \(\aleph_\alpha\) be the smallest infinite cardinal that is greater than \(\aleph_\beta\) for every \(\beta < \alpha\). This is well-defined because any set of cardinals is well ordered and because there are arbitrarily large cardinals so there will always be some cardinal greater than any set of cardinals (e.g. the power set of the union).

Note that \(\aleph_0 = \omega\): If \(\alpha < \omega\) then \(\alpha\) is isomorphic to an initial segment of \(\mathbb{N}\) so is finite. Note also that by construction if \(\alpha > \beta\) then \(\aleph_\alpha > \aleph_\beta\).

Lemma: \(\alpha \leq \aleph_\alpha\)

Proof: Intuitively this makes sense because there are large gaps in the ordinals between each cardinal. In practice we’re just going to prove it by transfinite induction.

Suppose \(\beta \leq \aleph_\beta < \aleph_\alpha\) for \(\beta < \alpha\).

Then if \(\alpha_\alpha < \alpha\) we would have \(\aleph_\alpha < \aleph_\alpha\), which is a contradiction. QED

It might be tempting to think that \(\alpha < \aleph_\alpha\) for all \(\alpha\) – at the earlier cardinals (e.g. \(\aleph_0, \aleph_1\)) the corresponding ordinal is unimaginably smaller than its \(\aleph\). This turns out to not be necessarily true, but providing you with a counter-example requires a lot of theory I’m disinclined to get into right now.

Theorem: Every cardinal is \(\aleph_\alpha\) for some ordinal \(\alpha\).

Proof: Let \(\kappa\) be some cardinal. Then \(\aleph_\kappa \geq \kappa\), so there is at least one ordinal such that its \(\aleph\) is larger than \(\kappa\).

Let \(\alpha\) be the smallest ordinal such that \(\aleph_\alpha \geq \kappa\).

Then \(\kappa > \aleph_\beta\) for \(\beta < \alpha\). Because \(\aleph_\alpha\) is chosen to be the minimal upper bound for the \(\aleph_\beta\), this means that \(\aleph_\alpha \leq \kappa\) and thus \(\aleph_\alpha = \kappa\) as desired. QED

And there you have it. The final piece in what I promised at the beginning of this series (modulo skipping a bit). I’ve taught you just enough set theory to make sense of \(\aleph_0\), \(\aleph_1\) and \(2^{\aleph_0}\), and probably more than enough to convince you that you don’t care.

There’s a hell of a lot more to this subject. However there’s not really a hell of a lot more to this subject that I know. Once you fill out the details I’ve elided, I’ve probably blogged about 10% of my set theory knowledge in this series of posts. I doubt I’ve even touched on 1% of what you need to get started on modern set theory research. Still, I hope it’s been an interesting journey and you now have a slightly better grasp of mathematical infinity.

This entry was posted in Numbers are hard on by .

Well orderings, comparability of cardinals and aleph_1

Preliminaries

This is a continuation of my previous posts on infinity.

It’s a bit out of order in that I tried to write this post with a lot of introductory material and it just bogged it down too much. I’ll attempt to backfill later with a more introductory post, or at least to find something else to link to. For now though if you don’t know what the following words mean you’re probably going to struggle. Sorry. :-(

You should understand the concepts of:

  • equivalence relation
  • equivalence class
  • partial ordering
  • total ordering
  • order isomorphism

It would be useful to have previously encountered well-orderings, but I’ll introduce the bits of those we need.

Actual content

So far it’s very non-obvious how we might go about proving that for any two infinite sets, \(|A| \leq |B|\) or \(|B| \leq |A|\). Even proving the Cantor-Schroeder-Bernstein theorem was really hard work, and we had some concrete objects to work with in the form of the injections in each direction. Given two completely unknown and apparently unrelated infinite sets, where would we even begin?

The problem is that just a set on its own has very little structure to it – it’s just a collection of stuff. Without some sort of structure to get a handle on there’s not a lot we can do to it. So, for now, we will park the question of arbitrary sets and focus on a specific class of sets that will prove fruitful.

Specifically the sets we will consider are those which have a total ordering relationship on them which is well ordered. That is, every non-empty subset of them has a smallest element. The reason this is a useful property is that it allows us to generalise induction on the natural numbers: Given a property, we can prove that it holds for all elements of a well ordered set by proving that if it holds for all smaller elements then it holds for the current one. This will be invaluable. It will also turn out that there is a rich class of sufficiently large well ordered sets that will allow us to get a handle on infinity.

Why do we want well-ordered sets?

Well because we can do induction on them. Suppose \(X\) is a well-ordered set and \(p\) is some logical property such that if \(p(y)\) for all \(y < x\) then \(p(x)\). Then \(p\) holds for all \(x \in X\). Why? Well, if not then the set \(\{x \in X : \mathrm{not} p(x)\}\) is non-empty, so it has a smallest element \(x\). But then for \(y < x\) we have \(p(y)\), which by hypothesis implies \(p(x)\) (note: \(x\) can be the smallest element of \(X\) here, in which case this holds vacuously). This contradicts the existence of \(x\), hence the non-emptiness of that set. Hence \(p(x)\) for all \(x \in X\).

We can extend this further: We can also define functions inductively.

Suppose we have some function \(h : P(Y) \to Y\) and a well-ordered set \(X\). There is a unique function \(f : X \to Y\) such that \(f(x) = h(\{ f(y) : y < x\})\). Why? Proof by induction: Let \(p(x)\) be the property that there is a unique such function on the set \(\{y : y \leq x\}\). If such a function exists call it \(f_x\). For \(y < x\) let \(f_x(y) = f_y(y)\). Let \(f_x(x) = h(\{f_y(y) : y \in X\})\). I’m going to leave checking the details of this as an exercise for the interested reader because I’m a big jerk.

It turns out there is a strict hierarchy of well ordered sets.

Let \(X\) be a well ordered set. \(A \subseteq X\) is called a segment (this terminology is slightly non-standard) if for \(x \in A\) and \(y < x\) we must have \(y \in A\). It is called an initial segment (this terminology is not) if it is of the form \(I_a = \{x : x < a \}\) for some \(a\).

We’ll basically be interested in when one well-ordered set embeds as a segment of another. First we need to get a handle on the structure of segments:

Lemma: Let \(X\) be well-ordered and \(A \subseteq X\) be a segment. Then either \(A = X\) or \(A\) is an initial segment.

Proof: Suppose \(A \neq X\). Then the set \(X \setminus A\) is non-empty. Let \(a\) be the minimal element of this set. Then if \(x < a\) we must have \(x \in A\) by minimality. If \(y \geq a\) and \(y \in A\) then we must have \(a \in A\), because \(A\) is a segment. Therefore \(x \in A\) if and only if \(x < a\), i.e. \(A = I_a\). QED

Lemma: Let \(A \subeteq B \subseteq C\). Let \(B\) be a segment of \(C\) and \(A\) a segment of \(B\). Then \(A\) is a segment of \(C\).

Proof: This is mostly just definition chasing. Let \(a \in A\) and \(c \in C\) with \(c < a\). Then because \(a \in B\) we have \(c \in B\) (because \(B\) is a segment of \(C\)\). Therefore \(c \in A\), because \(A\) is a segment of \(B\).

The following relationships will be at the heart of how well-ordered sets interact with eachother:

Let \(A, B\) be well ordered sets. Write \(A \preceq B\) if \(A\) is isomorphic to a segment of \(B\). Write \(A \sim B\) if \(A\) and \(B\) are isomorphic. Write \(A \prec B\) if \(A \preceq B\) and not \(A \sim B\).

Proposition: If \(A \preceq B \preceq C\) then \(A \preceq C\).

Proof: Let \(f : A \to B\) and \(g : B \to C\) be order isomorphisms onto segments. Then \(h = g \cdot f\) is an order-isomorphism \(h : A \to h(A) \subseteq C\). We need only show that \(h(A)\) is a segment.

Because \(f(A)\) is a segment and \(g\) is an order-isomorphism we know that \(h(A) = g(f(A))\) is a segment of \(g(B)\). But a segment of a segment is a segment, so \(h(A)\) is a segment of \(C\) and we’re done. QED

Clearly if \(A \prec B\) then \(A\) must be isomorphic to an initial segment of \(B\) (because it’s isomorphic to a segment which isn’t \(B\)). This is also a sufficient condition:

Theorem: A well-ordered set is not isomorphic to any of its initial segments.

Proof:

Suppose \(f : A \to I_a\) is order preserving and injective. Then necessarily \(f(a) < a\) (because \(f(a) \in I_a\). But then because it’s order preserving and injective, \(f(f(a)) < f(a)\). Indeed, \(a > f(a) > \ldots > f^n(a) > \ldots\).

This means that the set \(\{ f^n(a) : n \in \mathbb{N}\}\) has no minimal element, contradicting the well-orderedness of A. QED

Corollary: \(A \prec B\) if and only if \(A\) is order isomorphic to an initial segment of \(A\).

Corollary: If \(A \preceq B\) and \(B \preceq A\) then \(A \sim B\).

Proof: Otherwise \(A \preceq B \sim I_a\).

Corollary: If \(A \prec B\) then \(A\) is isomorphic to a unique initial segment of \(B\) (because if it were isomorphic to two, the larger of the two would be isomorphic to an initial segment of itself)

So basically \(\preceq\) is a partial order on equivalence classes of well-orderings. i.e. it’s a partial order except for instead of equality in the anti-symmetric clause we have isomorphic.

It turns out it’s actually a total order:

Theorem: Let \(A\), \(B\) be well-ordered sets. Then \(A \preceq B\) or \(B \preceq A\).

Proof:

Let \(T\) be some element not in \(B\). We’ll define a function \(f : A \to B^+ = B \cup \{T\} \) by transfinite induction as follows: Let \(g : \mathcal{P}(B^+) : \to \B^+\) be defined by \(g(U) = \mathrm{min}(B \setminus U)\) if \(B \setminus U\) is non-empty, else \(g(U) = T\). Let \(f\) be the function this defines. i.e. if \(f(a) \in B\) then \(f(a) = \mathrm{min} B \setminus f(I_a)\).

The idea is that \(T\) is our “stopping point” which says “hey I’ve run out of elements of B”. If we ever hit it then we’ve covered \(B\) with an order preserving isomorphism from some segment of \(A\). If we don’t then we’ve embedded \(A\) in a segment of \(B\).

Specifically, let \(S_A = \{a \in A : f(a) \neq T\}\) and let \(S_B = f(S_a)\). We’ll show that both \(S_A\) and \(S_B\) are segments and at least one of \(S_A = A\) or \(S_B = B\) holds.

Proof that \(S_B\) is a segment: Suppose \(f(a) = b\) and \(b’ < b\). Then because we chose \(b\) to be minimal such that \(f(x) \neq b\) for \(x < a\), we must have that \(f(x) = b’\) for some \(x < a\), else \(b’\) would be an example contradicting minimality.

Proof that \(S_A\) is a segment: If \(a \in S_A\) and \(a’ < a\) then \(f(a) \in B \setminus f(I_{a’})\) so \(f(a’) \neq T\).

Now suppose \(S_B \neq B\). Then there exists \(b \in B \setminus S_B\). But then \(b \in B \setminus f(I_a)\) for all \(a \in A\), so this set is never empty and thus we never have \(f(a) = T\). Hence \(A = S_A\).

\(f|_{S_A}\) is injective by construction because if \(a’ < a\) we always choose \(f(a) \neq f(a’)\). We need only show that it’s order preserving.

But the same proof that \(S_B\) was a segment will show that \(f(I_a)\) is a segment for any \(a\). Thus if \(a < a’\) we must have \(f(a’) > f(a)\), as else \(f(a’) \in f(I_a)\) which would contradict injectivity.

Therefore \(f|_{S_A}\) is order-preserving and injective, and so is an order isomorphism between \(S_A\) and \(S_B\). If \(S_A = A\) then \(A \preceq B\). Else \(S_B = B\) and so \((f_{S_A})^-1 : B \to S_A\) demonstrates that \(B \preceq A\). QED

So what does this all have to do with sizes of infinity then?

Well we’ve demonstrated that for any two well ordered sets there is an injection from one to the other. So if every infinite set can be well-ordered we must have that for any two sets \(A, B\), \(|A| \leq |B|\) or \(|B| \leq |A|\).

Can every infinite set be well-ordered? It seems highly implausible.

It turns out the answer is “sort of”. There is an axiom of set theory which asks for a very natural structure on infinite sets which will be enough to guarantee a well-ordering of every set.

But we don’t need that to demonstrate its plausibility. It turns out that if we want it to be true that \(|A| \leq |B|\) or \(|B| \leq |A|\) always holds we must have every set be able to be well-ordered.

Why? Well because for any set \(A\) there is a natural example of a well-ordered set \(W(A)\) such that \(W(A) \not\leq A\). i.e. there is no injection \(W(A) \to A\). Thus if we want all cardinals to be comparable we must have an injection \(f : A \to W(A)\). But then we can define a well-ordering on \(A\) by \(x \leq y\) iff \(f(x) \leq f(y)\).

Lets construct such a set:

Theorem: Let \(A\) be any infinite set. Let \(W(A)\) be the set of isomorphism classes of well-orderings of subsets of \(A\). Define \([A] \leq [B]\) if \(A \preceq B\). This is well-defined because of how \(\preceq\) interacts with isomorphism. Then \(W(A)\) is a well-ordered set and there is no injective function \(f : W(A) \to A\).

Proof:

We’ve already shown that it’s a totally ordered set by our previous results. We need only show that it’s well-ordered.

It suffices to show that every initial segment of \(W(A)\) is well-ordered: Then if \(U \subseteq W(A)\) and \(u \in U\), either \(u\) is a minimal element of \(U\) or \(I_u \cap U\) is a non-empty subset of \(I_u\) and thus has a minimal element which is in turn a minimal element of \(U\).

So let \(x \in W(A)\). We can write \(x = [T]\) for some well ordered set. Claim: \(I_x \sim T\).

Proof: \(y < x\) iff \(y = [S]\) for some \(S \prec T\) by definition of our ordering on \(W(A)\). Thus there is a unique initial segment of \(T\) such that \(S \sim I_s\). Define \(f : I_x \to T\) by mapping \([S]\) to the element corresponding to the unique initial segment isomorphic to \(S\). i.e. if \(S \sim I_t\) then \(f([S]) = t\).

This is injective because no two initial segments a well-ordered set are isomorphic. It’s order preserving because if \([S’] \prec [S]\) then we have an order-isomorphism between \(S’\) and an initial segment of \(S\). If \(f(S) = x\) we can compose this with the order isomorphism we’ve got \(S \to I_x\) and get an order isomorphism between \(S’\) and an initial segment of \(I_x\). This means that \(S’ \sim I_y \subseteq I_x\) and so necessarily \(f(S’) = y < x\).

This is surjective because if \(S\) is a well-ordering of some subset of \(A\) then so are all its initial segments, and \([I_s] < [S]\) for \(s \in S\). This proves our claim: Every initial segment of \(W(A)\) is isomorphic to a well-ordered set.

So we now know that \(W(A)\) is a well-ordered set. We must show that there is no injection \(f : W(A) \to A\).

Suppose such an \(f\) exists. Then we can define a well-ordering on the set \(T = f(W(A))\) by \(f(x) < f(y)\)  when \(x < y\).

But then \([T] \in W(A)\), and by what we proved in the claim we have \(I_{[T]} \tilde W(A)\). But a well-ordered set cannot be order isomorphic to an initial segment of itself. This contradicts the existence of such a function.

QED

So we now know that there are “arbitrarily large” well-ordered sets, and so if we want all cardinalities to be comparable in fact all sets can be well-ordered. This gives a strong plausibility argument for the idea that this is possible. But how might we prove it?

Well we still run into that problem I mentioned at the beginning: Arbitrary sets have very little structure to play with. The solution is to demand that they have slightly more structure.

There is a set theoretic axiom called the axiom of choice. Historically it was somewhat controversial. These days it’s basically considered entirely uncontroversial and modern set theory is very much mostly the study of set theories which obey the axiom of choice. Simply put, the axiom is this:

Let \(X\) be a non-empty set. Then there exists a choice function \(f : P(X) \setminus \{\emptyset\} \to X\) such that \(f(U) \in U\).

i.e. we have some way of picking an “arbitrary” member of any given non-empty set. If you like your intuition can be that we pick at random, though it turns out to be hard to make that intuition make rigorous sense.

Given the axiom of choice, we can now straightforwardly use transfinite induction to construct an injection \(A \to W(A)\), which will give us a well-ordering on \(A\).

Theorem:  A set \(A\) can be well-ordered if and only if there is a choice function on \(P(A) \setminus \{\emptyset\}\). In particular given the axiom of choice every set can be well-ordered.

Proof:

If \(A\) can well ordered then the function \(h(U) = \min U\) is a choice function. So that takes care of the “only if” part.

We now must construct a well-order from a choice function. We’ll do this by bijecting \(A\) with a subset of \(W(A)\). This will let us inherit the well order from it.

Let \(h : P(X) \to X\) be a choice function extended with \(h(\emptyset) = h(X)\) (it doesn’t really matter what we do with the empty-set).

Let \(f : W(A) \to A\) be the function defined by induction as \(f(x) = h(X \setminus f(I_x))\).

By our previous theorem this cannot be an injection. But if \(X \setminus f(I_x)\) is never empty then this must be an injection because otherwise we always choose \(f(x) \neq f(y)\) for \(y < x\).

Pick the minimal \(x\) such that \(f(I_x) = X\). Then \(f|_{I_x} : I_x \to A\) is a bijection, so its inverse is an injection \(A \to W(A)\) and the result is proved.

QED

And this more or less concludes what I wanted to cover in this post: Every set can be well-ordered, and thus any two cardinalities are comparable.

I will finish by now finally explaining what \(\aleph_1\) means.

Suppose \(A\) is any uncountable well-ordered set. Let \(x = \mathrm{min} \{a \in A : |I_a| > \aleph_0\}\).

Then by construction \(|I_x| > \aleph_0\) but any initial segment of \(I_x\) has cardinality \(\aleph_0\). This means that regardless of what well-ordered set we started out with the results will be isomorphic: One must be isomorphic to a segment of the other, and the initial segments are all countable so they must be isomorphic to the whole set.

This means additionally that for any \(|B| > \aleph_0\) we must have \(|I_x| \leq |B|\): Well-order \(B\) and perform the same construction. Then \(I_x\) is order isomorphic to a segment of \(B\) and thus there is an injection \(I_x \to B\).

We write \(|I_x| = \aleph_1\): It is the smallest size an uncountable set can be.

I still haven’t told you what these mystery \(\aleph\)s mean of course, but you’ll have to wait till next time for that one.

This entry was posted in Numbers are hard, Uncategorized on by .

An amusing problem to solve

I was thinking about coding questions, and how you would write a coding question that tested peoples’ ability to explore algorithms in areas they weren’t likely to be that familiar with without necessarily being able to come up with optimal solutions.

I mulled over this a bit and came up with one which is entirely too evil to use. However I rather feel like I’m going to have to try to implement it anyway because it’s going to bug me. I figured I would inflict this on the rest of you, because misery loves company.

The problem is this:

You are given a non-negative symmetric \(n \times n\) matrix \(A\) with \(A_{ii} = 0\), with real numbers represented as double precision. This is your potential matrix.

Find an assignment of coordinates \(x_1, \ldots, x_n \in [0, 1]\) that has a low value for the energy function \(E = \sum\limits_{1 \leq i < j \leq n} \frac{A_{ij}}{|x_i – x_j|}\). You are not required to minimize it, but if you can that would be lovely.

Some initial thoughts on how one might solve it:

  1. Any optimal solution for a non-zero \(A\) assigns at least one point to 0 and at least one point to 1.
  2. For any given starting position there is a relatively straightforward gradient descent algorithm to find a local minimum
  3. If \(A_{ij} > 0\) for all \(i \neq j\) then there are at least \(n!\) local minima.
  4. If you’ve placed \(k\) items and you want to place a \(k + 1th\), as long as it has a non-zero coordinate in \(A\) with at least one of the items you’ve placed so far there is a unique place to put the k+1th item (though finding this is O(k)). This probably makes a greedy algorithm quite productive for smallish \(n\).
  5. If the graph with edges \(\{ \{i, j\} : A_{ij} \neq 0\}\)  is not connected you can run the problem separately on each component and merge the results, because they don’t affect each-other.
  6. Some form of e.g. simulated annealing with the mutation operator being to swap pairs of elements then run a local optimisation would probably be quite productive.

As an additional comment: Trying to do combinatorial optimisation on permutations of indices is an exercise in frustration. Believe me, I know. This problem may prove more annoying than it’s worth.

This entry was posted in How do I computer?, Numbers are hard on by .