Let \(\mathcal{P}_n = \{ x \in [0, \infty)^n : \sum x = 1 \}\) be the set of probability distributions on n items.

Let \(x \in [0, 1]\). Define the selection function \(s_x : \mathcal{P}_n \to \{1, \ldots, n\}\) as \(s_x(p) = \mathrm{min} \{k : \sum\limits_{i=1}^k p_i \geq x \}\).

Then:

- \(s_x\) is continuous except on a closed set of measure zero.
- If \(U\) is chosen uniformly at random from \([0, 1]\) then \(P(s_U(p) = k) = p_k\).

Proof of 1: It is continuous except on the set \(\bigcup\limits_k \{ p \in \mathcal{P} : \sum\limits_{i=1}^k p_i = x \}\) as small variations won’t change the result if it’s not on a boundary. This is a closed set of measure zero.

Proof of 2: \(s_U(p) = k\) iff \(\sum_{i < k} p_i < U\) and \(U \leq \sum_{i \leq k} p_i \). i.e. if \(U \in (\sum_{i < k} p_i, \sum_{i \leq k} p_i]\). This is an interval of length \(p_k\) and \(U\) is uniform, so \(P(U \in (\sum_{i < k} p_i, \sum_{i \leq k} p_i]) = p_k\). Therefore \(P(S_U(p) = k) = p_k\) as desired.QED

Why is this a useful lemma?

Well, because I often find myself in a situation at work right now where I want to be able to experiment with randomized algorithms in a way that lets me see the effect of small tweaks. Unfortunately a change which changes the probabilities slightly requires me to re-roll the dice, so it’s hard to tell whether a given change is the result of randomness or the change I’ve made.

By deterministically seeding a PRNG and using it to pick the selection function we get something which is almost always continuous in the probabilities used, so it’s mostly stable against the probabilities used and will only make big changes in response to big changes in the probability.

Pingback: The Top-to-Random Shuffle II | Eventually Almost Everywhere