So as part of my new resolution to start reading the books on my shelves, I recently read through Probability with Martingales.

I’d be lying if I said I fully understood all the material: It’s quite dense, and my ability to read mathematics has atrophied a lot (I’m now doing a reread of Rudin to refresh my memory). But there’s one very basic point that stuck out as genuinely interesting to me.

When introducing measure theory, it’s common to treat sigma-algebras as this annoying detail you have to suffer through in order to get to the good stuff. They’re that family of sets that it’s really annoying that it isn’t the whole power set. And we would have gotten away with it, if it weren’t for that pesky axiom of choice.

In Probability with Martingales this is not the treatment they are given. The sigma algebras are a first class part of the theory: You’re not just interested in the largest sigma algebra you can get, you care quite a lot about the structure of different families of sigma algebras. In particular you are very interested in sub sigma algebras.

Why?

Well. If I may briefly read too much into the fact that elements of a sigma algebra are called measurable sets… what are we measuring them with?

It turns out that there’s a pretty natural interpretation of sub-sigma algebras in terms of measurable functions: If you have a sigma-algebra \(\mathcal{G}\) on \(X\) and a family of measurable functions \(\{f_\alpha : X \to Y_\alpha : \alpha \in A \}\) then you can look at the the smallest sigma-algebra \(\sigma(f_\alpha) \subseteq \mathcal{G}\) for which all these functions are still measurable. This is essentially the measurable sets which we can observe by only asking questions about these functions.

It turns out that every sub sigma algebra can be realised this way, but the proof is disappointing: Given \(\mathcal{F} \subseteq \mathcal{G}\) you just consider the identify function \(\iota: (X, \mathcal{F}) \to (X, \mathcal{G})\) and \(\mathcal{G}\) is the sigma-algebra generated by this function.

One interesting special case of this is sequential random processes. Suppose we have a set of random variables \(X_1, \ldots, X_n, \ldots\) (not necessarily independent, identically distributed, or even taking values in the same set). Our underlying space then captures an entire infinite chain of random variables stretching into the future. But we are finite beings and can only actually look at what has happened so far. This then gives us a nested sequence of sigma algebras \(\mathcal{F_1} \subseteq \ldots \subseteq \mathcal{F_n} \subseteq \ldots \) where \(\mathcal{F_n} = \sigma(X_1, \ldots, X_n)\) is the collection of things we we can measure at time n.

One of the reasons this is interesting is that a lot of things we would naturally pose in terms of random variables can instead be posed in terms of sigma-algebras. This tends to very naturally erase any difference between single random variables and families of random variables. e.g. you can talk about independence of sigma algebras (\(\mathcal{G}\) and \(\mathcal{H}\) are independent iff for \(\mu(G \cap H) = \mu(G) \mu(H)\) for \(G \in \mathcal{G}, H \in \mathcal{H}\)) and two families of random variables are independent if and only if the generated sigma algebras are independent.

A more abstract reason it’s interesting is that it’s quite nice to see the sigma-algebras play a front and center role as opposed to this annoyance we want to forget about. I think it makes the theory richer and more coherent to do it this way.