A theorem about certain families of random variables

This is a bit of pure mathematics. I was originally interested in it for decision theoretic reasons, but it turned out to be the wrong structure for what I was looking for. Still, I found the result interesting enough in its own right that I thought I would write it up.

Let \(S\) be some non-empty finite set. Let \(\mathcal{C}\) be the set of non-empty subsets of \(S\). Let \(\mathcal{R}\) be the set of random variables taking values in \(S\). A strategy is a function \(s : \mathcal{C} \to \mathcal{R}\) such that \(P(s(U) \in U) = 1\).

A strategy is said to be subset consistent if when \(U \subseteq V\) and \(x \in U\) then \(P(s(U) = x) = P(s(V) = x) P(s(U) \in V)\). (The intuition here is “It doesn’t matter what order we add pieces of information in”).

It turns out that there’s a very nice characterisation of all subset-consistent decision strategies.


Let \(L_1, \ldots, L_n\) be a partition of \(S\) into non-empty sets. Let \(w : S \to (0, \infty)\). Define a strategy as follows:

Given \(U \in \mathcal{C}\), define \(U’ = L_i \cap U\), where i is the largest such that this set is non-empty. Now pick \(x \in U’\) with probability proportional to \(w(x)\).

This strategy is subset-consistent.


Note that \(s(U) = s(U’)\).

Let \(V \subseteq U\), and let \(L_i\) be as above. If \(L_i \cap V = \emptyset\) then \(P(s(U) \in V) = 0\), and in particular for \(x \in V\) we have \(P(s(U) = x) = 0\), so the result is proved in this case as both sides of the equation are 0.

In the case where \(V \cap L_i \neq 0\), we have \(V’ = V \cap L_i\), and \(s(V)\) is chosen from \(V’\). Note that \(s(U) \in V\) iff \(s(U) \in V’\).

P(s(U) = x) &= \frac{w(x)}{\sum\limits_{y \in U’} w(y)} \\
&= \frac{w(x)}{\sum\limits_{y \in V’} w(y)} \frac{\sum\limits_{y \in V’} w(y)}{\sum\limits_{y \in U’} w(y)} \\
&= P(s(V’) = x) P(s(U) \in V’) \\
&= p(s(V) = x) P(s(U) \in V)\\

As desired.


We will spend the rest of this proving that in fact every subset-consistent strategy arises this way.

From now on let \(s\) be some fixed strategy.

We’ll need some notational convenience. Let \(\rho_U(x) = P(s(U) = x)\). Let \(\tau(x,y) = \rho_{\{x,y\}}(x)\) if \(x \neq y\) and \(\tau(x,x) = \frac{1}{2}\).

The following lemma is practically just definition shuffling, but is surprisingly useful:

Lemma: Let \(x, y \in U\). Then \(\rho_U(x) = (\rho_U(x) + \rho_U(y)) \tau(x, y)\).

Proof: If \(x = y\) then this just boils down to \(\rho_U(x) = 2 \rho_U(x) \frac{1}{2}\). So suppose \(x \neq y\).

Let \(V = \{x, y\}\). By subset consistency we have \(\rho_U(x) = \rho_V(x) P(s(U) \in V)\). But \(\rho_V(x) = \tau(x,y)\) by definition, and \(s(U) \in V\) iff \(s(U) = x\) or \(s(U) = y\), so \(P(s(U) \in V) = \rho_U(x) + \rho_U(y)\). Combining these gives the desired result.


Say \(x \prec y\) if \(\tau(x,y) = 0\). Say \(x \sim y\) if neither \(x \prec y\) nor \(y \prec x\). Note that because \(\tau(x,y) = 1 – \tau(y, x)\) it’s not possible to have both \(x \prec y\) and \(y \prec x\).

Lemma: If \(x, y \in U\) with \(x \prec y\) then \(\rho_U(x) = 0\).

Proof: \(\rho_U(x) = (\rho_U(x) + \rho_U(y)) \tau(x, y) = 0\).

This has a partial converse:

Lemma: If \(x, y \in U\) with \(\rho_U(x) = 0\) and \(rho_U(y) \neq 0\) then \(x \prec y\).


By the preceding lemma we have \(\tau(x,y) = \frac{\rho_U(x)}{\rho_U(x) + \rho_U(y)} = \frac{0}{0 + \rho_U(y)} = 0\). This is well defined because \(\rho_U(y) \neq 0\).


Lemma: If \(x \prec y\) and \(y \sim z\) then \(x \prec z\). Similarly if \(x \sim z\) then \(z \prec y\).


We’ll only prove the first version. The second will follow similarly.

Let \(V = \{x,y,z\}\). Then because \(y \prec z\) we have \(\rho_U(y) = 0\). Then \(\rho_U(x) = \rho_U(x) \tau(x, y)\). But because \(x \sim y\) we have that \(tau(x, y)\) is neither \(0\) nor \(1\), so the only possible solution to this is that \(\rho_U(x) = 0\).


Lemma: If \(x \prec y\) and \(y \prec z\) then \(x \prec z\).

Proof: Again consider \(V = \{x,y,z\}\). We must have \(\rho_V(x) = \rho_V(y) = 0\) because \(x \prec y\) and \(y \prec z\) respectively. Therefore \(\rho_V(x) \neq 0\). Therefore by a previous lemma we have \(x \prec z\).

Lemma: \(\sim\) is an equivalence relationship.

We have \(\tau(x,x) = \frac{1}{2} \neq 0, 1\) by definition, so \(\sim\) is reflexive.

It is symmetric because \(\tau(x, y) = 1 – \tau(y, x)\), so if \(\tau(x,y) \neq 0, 1\) then \(\tau(y,x) \neq 0, 1\).

We now must show transitivity. Suppose \(x \sim y\) and \(y \sim z\). If it is not the case that \(x \sim z\) then we must have either \(x \prec z\) or \(z \prec x\). Suppose without loss of generality that \(x \prec z\). Then by a previous lemma we must have \(y \prec z\), contradicting our assumption that \(y \sim z\).


Theorem: There exists a partition of \(S\) into finitely many sets \(L_1, \ldots, L_n\) such that for any \(U \in \mathcal{C}\) we have \(P(s(U) \in U \cap L_i) = 1\) where \(i\) is maximal such that this set is non-empty.


We will take as our partition the set of \(\sim\) equivalence classes. These equivalence classes are given a total ordering by \(\prec\), because if \(w \sim x\) and \(y \sim z\) then \(x \prec y\) implies \(w \prec z\). There are finitely many of them, so we can put them in an order \(L_1, \ldots, L_n\) such that if \(x \in L_i\) and \(y \in L_j\) with \(i < j\) then \(x \prec y\). If \(L_i\) is maximal such that \(L_i \cap U \neq \emptyset\), then for any \(x \in L_j\) with \(j < i\) we must have \(\rho_U(x) = 0\), because for some \(y \in U\) we have \(y \in L_i\) and thus \(x \prec y\). Therefore \(P(s(U) \not\in L_i \cap U) = 0\) as desired. QED We now need only define our weight function. Define \(w(x) = \rho_{L_i}(x)\) where \(x \in L_i\). Note that \(w(x) \neq 0\) because if it were 0 then we'd have \(x \prec y\) for some \(y \in L_i\), contradicting that \(x \in L_i\). Theorem: Let \(s(U)\) is chosen by choosing from \(U'\) with probability proportional to \(w(x)\). Proof: We have \(P(s(U) = x) = P(s(U') = x)\) by the previous theorem. Now, \(U' \subseteq L_i\). Therefore by subset consistency we must have \(P(s(U') = x) = \frac{P(s(L_i) = x)}{P(s(L_i) \in U'\)} = \frac{w(x)}{\sum\limits_{y \in U'} w(y) }\). This was the desired result. QED We have now shown the converse to our original theorem: Every subset-consistent strategy can be constructed from a partition and a weight function. As an additional note, I've thought a bit about how this works if you lift the restriction that \(S\) must be finite. A lot of this carries through naturally, but some of it doesn't. I think if you replace the subset consistency hypothesis with the natural infinitary equivalent that if \(U \subseteq V \subseteq W\) then \(P(s(W) \in U) = P(s(W) \in V) P(s(V) \in U)\) and add the additional restriction that for every \(U\) there exists an \(x \in U\) with \(P(s(U) = x) \neq 0\) then a lot of this carries over naturally with the partition replaced by one whose reverse is a well-ordered set and each partition is countable. But I haven’t worked through the details yet, and I’m not sure I’m interested in doing so.

This entry was posted in Uncategorized on by .

One thought on “A theorem about certain families of random variables

  1. Pingback: Best of drmaciver.com | David R. MacIver

Comments are closed.