# Some simple theorems about dominance of probability distributions

This is a post about some very simple results about a particular partial order on probability distributions. It came about because I was talking with Paul Crowley about whether I believed in there being a valid total ordering on probability distributions which says that one is strictly better than the other. My answer was that I did not. After some additional reflection I realised that there was a partial ordering I considered valid. After some additional additional reflection I realised it had a quite nice structure. This is a write up of the things I noticed about it. It’s all quite easy to prove, so the results are probably more interesting than the proofs themselves.

Let $$X, Y$$ be natural number valued random variables.

Say $$X \preceq Y$$ if $$P(X \geq n) \leq P(Y \geq n)$$ for all $$n$$. $$X \prec Y$$ if $$X \preceq Y$$ and $$X \neq Y$$. Read this as “Y dominates X” or “X is dominated by Y”. The idea is simply that we always expect $$Y$$ to produce better results than $$X$$, if we read larger as better.

$$\preceq$$ is a partial order: it’s the intersection of the preorders $$P(X \geq n) \leq P(Y \geq n)$$, and it’s antisymmetric because the tail probabilities $$P(X \geq n)$$ determine the distribution.
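To make the definition concrete, here is a throwaway sketch (not from the post; all names are mine) that checks dominance for finitely supported distributions, represented as lists with `pmf[n] = P(X = n)`:

```python
from itertools import zip_longest

def tails(pmf):
    """Tail probabilities P(X >= n) for n = 0, 1, ..., len(pmf) - 1."""
    out, acc = [], sum(pmf)
    for p in pmf:
        out.append(acc)
        acc -= p
    return out

def dominated_by(pmf_x, pmf_y):
    """True iff X ⪯ Y, i.e. P(X >= n) <= P(Y >= n) for every n."""
    pairs = zip_longest(tails(pmf_x), tails(pmf_y), fillvalue=0.0)
    return all(tx <= ty + 1e-12 for tx, ty in pairs)

# A uniform pick from {1, 2} dominates a uniform pick from {0, 1} ...
print(dominated_by([0.5, 0.5], [0.0, 0.5, 0.5]))  # True
# ... but a point mass at 1 and a 50/50 mix of 0 and 2 are incomparable
# in both directions, which is why this is only a partial order.
print(dominated_by([0.0, 1.0], [0.5, 0.0, 0.5]))  # False
print(dominated_by([0.5, 0.0, 0.5], [0.0, 1.0]))  # False
```

The incomparable pair is exactly the sort of example that rules out a total order: each distribution beats the other at some threshold.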

For $$p \in [0,1]$$, let $$L(p, A, B)$$ be the random variable which consists of drawing from $$A$$ with probability $$p$$ and from $$B$$ with probability $$1 - p$$.

Theorem: If $$A, B \preceq C$$ then $$L(p, A, B) \preceq C$$. Similarly, if $$C \preceq A, B$$ then $$C \preceq L(p, A, B)$$.

Proof: $$P(L(p,A,B) \geq n) = p P(A \geq n) + (1 - p) P(B \geq n) \leq p P(C \geq n) + (1 - p) P(C \geq n) = P(C \geq n)$$ as desired. Similarly the other way.
QED
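The key identity in that proof, $$P(L(p,A,B) \geq n) = p P(A \geq n) + (1 - p) P(B \geq n)$$, is easy to sanity-check numerically for finitely supported distributions. A small sketch (helper names are mine, not from the post):

```python
def tails(pmf):
    """Tail probabilities P(X >= n) for n = 0, 1, ..., len(pmf) - 1."""
    out, acc = [], sum(pmf)
    for p in pmf:
        out.append(acc)
        acc -= p
    return out

def mixture(p, pmf_a, pmf_b):
    """Pmf of L(p, A, B): draw from A with probability p, else from B."""
    n = max(len(pmf_a), len(pmf_b))
    a = pmf_a + [0.0] * (n - len(pmf_a))
    b = pmf_b + [0.0] * (n - len(pmf_b))
    return [p * x + (1 - p) * y for x, y in zip(a, b)]

p, pmf_a, pmf_b = 0.3, [0.5, 0.5, 0.0], [0.0, 0.2, 0.8]
lhs = tails(mixture(p, pmf_a, pmf_b))
rhs = [p * ta + (1 - p) * tb
       for ta, tb in zip(tails(pmf_a), tails(pmf_b))]
print(all(abs(x - y) < 1e-9 for x, y in zip(lhs, rhs)))  # True
```

Since the mixture's tails are a convex combination of the tails of $$A$$ and $$B$$, dominance by a common upper bound $$C$$ follows termwise, exactly as in the proof.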

Lemma: $$E(X) = \sum\limits_{n=1}^\infty P(X \geq n)$$
Proof: Let $$p_n = P(X = n)$$.

$\sum\limits_{n = 1}^\infty P(X \geq n) = \sum\limits_{n=1}^\infty \sum\limits_{m=n}^\infty p_m$

A term $$p_k$$ appears $$k$$ times on the right hand side: it is present in the inner sum exactly when $$n \leq k$$, i.e. for $$n = 1, \ldots, k$$. So we can rearrange the sum (the terms are all non-negative, so this is valid without worrying about convergence issues) to get

$\sum\limits_{n = 1}^\infty P(X \geq n) = \sum\limits_{n=1}^\infty n p_n = \sum\limits_{n=0}^\infty n p_n = E(X)$
QED
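For a finitely supported distribution the lemma can be checked directly. A quick sketch (names are mine):

```python
def expectation(pmf):
    """E(X) computed directly from the pmf."""
    return sum(n * p for n, p in enumerate(pmf))

def tail_sum(pmf):
    """Sum over n >= 1 of P(X >= n)."""
    return sum(sum(pmf[n:]) for n in range(1, len(pmf)))

pmf = [0.1, 0.2, 0.3, 0.4]  # P(X = 0), ..., P(X = 3); both sums are 2
print(abs(expectation(pmf) - tail_sum(pmf)) < 1e-9)  # True
```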

Theorem: If $$X \prec Y$$ then $$E(X) < E(Y)$$.

Proof: Suppose $$X \prec Y$$. Then $$P(X \geq n) < P(Y \geq n)$$ for some $$n$$, as it's always $$\leq$$, and if it were always equal then the distributions would be the same, which they're not by hypothesis. Therefore $$\sum P(X \geq n) < \sum P(Y \geq n)$$, because all the other terms are $$\leq$$. But by the lemma this inequality is exactly $$E(X) < E(Y)$$, as desired.
QED

Lemma: Let $$x_n$$ be a monotonic decreasing real-valued sequence converging to 0 such that $$x_0 = 1$$. Then there is a random variable $$X$$ with $$P(X \geq n) = x_n$$.

Proof: Let $$p_n = x_n - x_{n+1}$$. Because $$x_n$$ is monotonic decreasing we have $$p_n \geq 0$$. Also, $$p_n \leq x_n \leq x_0 = 1$$. Further, $$\sum\limits_{k=0}^n p_k = x_0 - x_1 + x_1 - x_2 + \ldots + x_n - x_{n+1} = 1 - x_{n+1}$$. So $$\sum\limits_{k=0}^\infty p_k = \lim\limits_{n \to \infty} 1 - x_{n+1} = 1$$. So $$p_n$$ is a probability distribution. Let $$X$$ be a random variable such that $$P(X = n) = p_n$$. Then $$P(X \geq n) = \sum\limits_{k = n}^\infty p_k = \sum\limits_{k = n}^\infty x_k - x_{k+1} = x_n - x_{n+1} + x_{n+1} - x_{n+2} + \ldots = x_n$$.
QED

Theorem: The set of random variables with $$\preceq$$ forms a lattice, i.e. for random variables $$X, Y$$ there is a random variable $$X \wedge Y$$ such that $$Z \preceq X, Y$$ iff $$Z \preceq X \wedge Y$$, and similarly there is $$X \vee Y$$ such that $$X, Y \preceq Z$$ iff $$X \vee Y \preceq Z$$.

Proof: To get $$X \wedge Y$$, take the sequence $$n \to \mathrm{min}(P(X \geq n), P(Y \geq n))$$. This satisfies the conditions of the lemma, so we can find a random variable $$Z$$ such that this is $$P(Z \geq n)$$. By the definition of min this satisfies the desired properties. Similarly with max to get $$X \vee Y$$.
QED
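The lemma's construction and the lattice meet both translate directly into code for finitely supported distributions. A sketch under my own naming conventions, truncating the (in general infinite) tail sequence to a finite one with everything past the end treated as 0:

```python
def tails(pmf):
    """Tail probabilities P(X >= n) for n = 0, 1, ..., len(pmf) - 1."""
    out, acc = [], sum(pmf)
    for p in pmf:
        out.append(acc)
        acc -= p
    return out

def pmf_from_tails(xs):
    """Invert the lemma: given decreasing xs with xs[0] = 1 (finite, so
    entries past the end count as 0), return p_n = x_n - x_{n+1}."""
    xs = list(xs) + [0.0]
    return [xs[n] - xs[n + 1] for n in range(len(xs) - 1)]

def meet(pmf_x, pmf_y):
    """X ∧ Y: the distribution whose tails are min(P(X >= n), P(Y >= n))."""
    n = max(len(pmf_x), len(pmf_y))
    tx = tails(pmf_x) + [0.0] * (n - len(pmf_x))
    ty = tails(pmf_y) + [0.0] * (n - len(pmf_y))
    return pmf_from_tails([min(a, b) for a, b in zip(tx, ty)])

# The meet of "uniform on {0, 2}" and "point mass at 1" is uniform on {0, 1}:
print(meet([0.5, 0.0, 0.5], [0.0, 1.0]))  # [0.5, 0.5, 0.0]
```

Swapping `min` for `max` gives the join in the same way; the pointwise min (or max) of two valid tail sequences is again decreasing with first entry 1, which is what makes the lemma applicable.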

This entry was posted in Numbers are hard.

## 5 thoughts on “Some simple theorems about dominance of probability distributions”

1. Veky

Why is P(pA+(1-p)B)=pP(A)+(1-p)P(B)?

1. david Post author

Because pA + (1-p)B is an utter abuse of notation which I completely failed to explain! Note that A and B are distributions, not random variables. I was using pA + (1-p)B to refer to the combined lottery of choosing A with probability p, else choosing B.

Edit: On reflection this notation is simply wrong and I will fix it.