# Some simple theorems about dominance of probability distributions

This is a post about some very simple results about a particular partial order on probability distributions. It came about because I was talking with Paul Crowley about whether I believed in there being a valid total ordering on probability distributions which says that one is strictly better than the other. My answer was that I did not. After some additional reflection I realised that there was a partial ordering I considered valid. After some additional additional reflection I realised it had a quite nice structure. This is a write up of the things I noticed about it. It’s all quite easy to prove, so the results are probably more interesting than the proofs themselves.

Let $$X, Y$$ be natural number valued random variables.

Say $$X \preceq Y$$ if $$P(X \geq n) \leq P(Y \geq n)$$ for all $$n$$. $$X \prec Y$$ if $$X \preceq Y$$ and $$X \neq Y$$. Read this as “Y dominates X” or “X is dominated by Y”. The idea is simply that we always expect $$Y$$ to produce better results than $$X$$, if we read larger as better.

$$\preceq$$ is a partial order: it’s the intersection of the preorders $$P(X \geq n) \leq P(Y \geq n)$$, and it’s antisymmetric because the tail probabilities $$P(X \geq n)$$ determine the distribution.
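To make the definition concrete, here is a throwaway sketch (not from the post; all names are mine) that checks dominance for finitely supported distributions, represented as lists with `pmf[n] = P(X = n)`:

```python
from itertools import zip_longest

def tails(pmf):
    """Tail probabilities P(X >= n) for n = 0, 1, ..., len(pmf) - 1."""
    out, acc = [], sum(pmf)
    for p in pmf:
        out.append(acc)
        acc -= p
    return out

def dominated_by(pmf_x, pmf_y):
    """True iff X ⪯ Y, i.e. P(X >= n) <= P(Y >= n) for every n."""
    pairs = zip_longest(tails(pmf_x), tails(pmf_y), fillvalue=0.0)
    return all(tx <= ty + 1e-12 for tx, ty in pairs)

# A uniform pick from {1, 2} dominates a uniform pick from {0, 1} ...
print(dominated_by([0.5, 0.5], [0.0, 0.5, 0.5]))  # True
# ... but a point mass at 1 and a 50/50 mix of 0 and 2 are incomparable
# in both directions, which is why this is only a partial order.
print(dominated_by([0.0, 1.0], [0.5, 0.0, 0.5]))  # False
print(dominated_by([0.5, 0.0, 0.5], [0.0, 1.0]))  # False
```

The incomparable pair is exactly the sort of example that rules out a total order: each distribution beats the other at some threshold.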

For $$p \in [0,1]$$, let $$L(p, A, B)$$ be the random variable which consists of drawing from $$A$$ with probability $$p$$ and from $$B$$ with probability $$1 - p$$.

Theorem: If $$A, B \preceq C$$ then $$L(p, A, B) \preceq C$$. Similarly, if $$C \preceq A, B$$ then $$C \preceq L(p, A, B)$$.

Proof: $$P(L(p,A,B) \geq n) = p P(A \geq n) + (1 - p) P(B \geq n) \leq p P(C \geq n) + (1 - p) P(C \geq n) = P(C \geq n)$$ as desired. Similarly the other way.
QED
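The key identity in that proof, $$P(L(p,A,B) \geq n) = p P(A \geq n) + (1 - p) P(B \geq n)$$, is easy to sanity-check numerically for finitely supported distributions. A small sketch (helper names are mine, not from the post):

```python
def tails(pmf):
    """Tail probabilities P(X >= n) for n = 0, 1, ..., len(pmf) - 1."""
    out, acc = [], sum(pmf)
    for p in pmf:
        out.append(acc)
        acc -= p
    return out

def mixture(p, pmf_a, pmf_b):
    """Pmf of L(p, A, B): draw from A with probability p, else from B."""
    n = max(len(pmf_a), len(pmf_b))
    a = pmf_a + [0.0] * (n - len(pmf_a))
    b = pmf_b + [0.0] * (n - len(pmf_b))
    return [p * x + (1 - p) * y for x, y in zip(a, b)]

p, pmf_a, pmf_b = 0.3, [0.5, 0.5, 0.0], [0.0, 0.2, 0.8]
lhs = tails(mixture(p, pmf_a, pmf_b))
rhs = [p * ta + (1 - p) * tb
       for ta, tb in zip(tails(pmf_a), tails(pmf_b))]
print(all(abs(x - y) < 1e-9 for x, y in zip(lhs, rhs)))  # True
```

Since the mixture's tails are a convex combination of the tails of $$A$$ and $$B$$, dominance by a common upper bound $$C$$ follows termwise, exactly as in the proof.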

Lemma: $$E(X) = \sum\limits_{n=1}^\infty P(X \geq n)$$
Proof: Let $$p_n = P(X = n)$$.

$\sum\limits_{n = 1}^\infty P(X \geq n) = \sum\limits_{n=1}^\infty \sum\limits_{m=n}^\infty p_m$

A term $$p_k$$ appears $$k$$ times on the right hand side: it is present in the inner sum exactly when $$n \leq k$$, i.e. for $$n = 1, \ldots, k$$. So we can rearrange the sum (the terms are all non-negative, so this is valid without worrying about convergence issues) to get

$\sum\limits_{n = 1}^\infty P(X \geq n) = \sum\limits_{n=1}^\infty n p_n = \sum\limits_{n=0}^\infty n p_n = E(X)$
QED
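For a finitely supported distribution the lemma can be checked directly. A quick sketch (names are mine):

```python
def expectation(pmf):
    """E(X) computed directly from the pmf."""
    return sum(n * p for n, p in enumerate(pmf))

def tail_sum(pmf):
    """Sum over n >= 1 of P(X >= n)."""
    return sum(sum(pmf[n:]) for n in range(1, len(pmf)))

pmf = [0.1, 0.2, 0.3, 0.4]  # P(X = 0), ..., P(X = 3); both sums are 2
print(abs(expectation(pmf) - tail_sum(pmf)) < 1e-9)  # True
```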

Theorem: If $$X \prec Y$$ then $$E(X) < E(Y)$$.

Proof: Suppose $$X \prec Y$$. Then $$P(X \geq n) < P(Y \geq n)$$ for some $$n$$, as it's always $$\leq$$, and if it were always equal then the distributions would be the same, which they're not by hypothesis. Therefore $$\sum P(X \geq n) < \sum P(Y \geq n)$$, because all the other terms are $$\leq$$. But by the lemma this inequality is exactly $$E(X) < E(Y)$$, as desired.
QED

Lemma: Let $$x_n$$ be a monotonic decreasing real-valued sequence converging to 0 such that $$x_0 = 1$$. Then there is a random variable $$X$$ with $$P(X \geq n) = x_n$$.

Proof: Let $$p_n = x_n - x_{n+1}$$. Because $$x_n$$ is monotonic decreasing we have $$p_n \geq 0$$. Also, $$p_n \leq x_n \leq x_0 = 1$$. Further, $$\sum\limits_{k=0}^n p_k = x_0 - x_1 + x_1 - x_2 + \ldots + x_n - x_{n+1} = 1 - x_{n+1}$$. So $$\sum\limits_{k=0}^\infty p_k = \lim\limits_{n \to \infty} 1 - x_{n+1} = 1$$. So $$p_n$$ is a probability distribution. Let $$X$$ be a random variable such that $$P(X = n) = p_n$$. Then $$P(X \geq n) = \sum\limits_{k = n}^\infty p_k = \sum\limits_{k = n}^\infty x_k - x_{k+1} = x_n - x_{n+1} + x_{n+1} - x_{n+2} + \ldots = x_n$$.
QED

Theorem: The set of random variables with $$\preceq$$ forms a lattice, i.e. for random variables $$X, Y$$ there is a random variable $$X \wedge Y$$ such that $$Z \preceq X, Y$$ iff $$Z \preceq X \wedge Y$$, and similarly there is $$X \vee Y$$ such that $$X, Y \preceq Z$$ iff $$X \vee Y \preceq Z$$.

Proof: To get $$X \wedge Y$$, take the sequence $$n \to \mathrm{min}(P(X \geq n), P(Y \geq n))$$. This satisfies the conditions of the lemma, so we can find a random variable $$Z$$ such that this is $$P(Z \geq n)$$. By the definition of min this satisfies the desired properties. Similarly with max to get $$X \vee Y$$.
QED
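The lemma's construction and the lattice meet both translate directly into code for finitely supported distributions. A sketch under my own naming conventions, truncating the (in general infinite) tail sequence to a finite one with everything past the end treated as 0:

```python
def tails(pmf):
    """Tail probabilities P(X >= n) for n = 0, 1, ..., len(pmf) - 1."""
    out, acc = [], sum(pmf)
    for p in pmf:
        out.append(acc)
        acc -= p
    return out

def pmf_from_tails(xs):
    """Invert the lemma: given decreasing xs with xs[0] = 1 (finite, so
    entries past the end count as 0), return p_n = x_n - x_{n+1}."""
    xs = list(xs) + [0.0]
    return [xs[n] - xs[n + 1] for n in range(len(xs) - 1)]

def meet(pmf_x, pmf_y):
    """X ∧ Y: the distribution whose tails are min(P(X >= n), P(Y >= n))."""
    n = max(len(pmf_x), len(pmf_y))
    tx = tails(pmf_x) + [0.0] * (n - len(pmf_x))
    ty = tails(pmf_y) + [0.0] * (n - len(pmf_y))
    return pmf_from_tails([min(a, b) for a, b in zip(tx, ty)])

# The meet of "uniform on {0, 2}" and "point mass at 1" is uniform on {0, 1}:
print(meet([0.5, 0.0, 0.5], [0.0, 1.0]))  # [0.5, 0.5, 0.0]
```

Swapping `min` for `max` gives the join in the same way; the pointwise min (or max) of two valid tail sequences is again decreasing with first entry 1, which is what makes the lemma applicable.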

This entry was posted in Numbers are hard.

## 5 thoughts on “Some simple theorems about dominance of probability distributions”

1. Veky

Why is P(pA+(1-p)B)=pP(A)+(1-p)P(B)?

1. david Post author

Because pA + (1-p)B is an utter abuse of notation which I completely failed to explain! Note that A and B are distributions, not random variables. I was using pA + (1-p)B to refer to the combined lottery of choosing A with probability p, else choosing B.

Edit: On reflection this notation is simply wrong and I will fix it.