In this post I will talk about conditional expectation and disintegration of a measure with respect to a $\sigma$-algebra. All of this is classical probability theory, but I think not many people (me included) come across it in a standard probability course. These tools and ideas are quite useful in ergodic theory. An example of this is the proof of the ergodic decomposition theorem that I present in this post.
— 1. Conditional expectation —
I will start by trying to give some intuition about the notion of conditional expectation with an example. Suppose that you want to know how many people will visit a particular beach on a given day (say you sell ice-cream). This is a random variable $X$. The first approximation you can make for the expected value of $X$ is the statistical average. However, a better approximation for $X$ can be made if you take other factors into account. For instance, let’s say that the number of visitors depends on the temperature of the given day and suppose that you know at the beginning of the day what the temperature will be. Thus, what you want to know is how many visitors to expect, conditional on the event that the temperature is some particular value $t$ (say). Let’s call the random variable that represents the temperature $T$. The expected number of visitors on a day where the temperature is $t$ can be denoted by $\mathbb{E}[X\mid T=t]$ or $\mathbb{E}[X\mid T](t)$. The function $t\mapsto\mathbb{E}[X\mid T](t)$ is what is called the conditional expectation of $X$ with respect to $T$.
Note that $\mathbb{E}[X\mid T]$ (or, more precisely, the composition $\mathbb{E}[X\mid T]\circ T$) can also be thought of as a random variable (it is a number that depends on the random temperature $T$). Indeed this random variable is the best approximation for $X$ if all you know is the temperature $T$. It is not hard to see that the expectation of $\mathbb{E}[X\mid T]$ (the average over all possible temperatures of the expected number of visitors in a day with that temperature) is exactly the expectation of $X$ (the expected number of visitors, regardless of temperature).
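To make the beach example concrete, here is a minimal Python sketch on a made-up finite sample space (the probabilities, temperatures and visitor counts are hypothetical, chosen only for illustration). It computes $\mathbb{E}[X\mid T=t]$ for each temperature $t$ and checks that averaging these values over the law of $T$ recovers $\mathbb{E}[X]$.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy data: each outcome is (probability, temperature, visitors).
outcomes = [
    (Fraction(1, 4), 20, 100),
    (Fraction(1, 4), 20, 140),
    (Fraction(1, 8), 25, 200),
    (Fraction(3, 8), 25, 280),
]


def expectation(outcomes):
    """E[X]: the plain average number of visitors."""
    return sum(p * x for p, _, x in outcomes)


def temperature_law(outcomes):
    """The distribution of T, as a dict t -> P(T = t)."""
    law = defaultdict(Fraction)
    for p, t, _ in outcomes:
        law[t] += p
    return dict(law)


def conditional_expectation(outcomes):
    """The map t -> E[X | T = t], as a dict."""
    law = temperature_law(outcomes)
    weighted = defaultdict(Fraction)
    for p, t, x in outcomes:
        weighted[t] += p * x
    return {t: weighted[t] / law[t] for t in law}


cond = conditional_expectation(outcomes)
law = temperature_law(outcomes)
# Average of E[X | T = t] over all temperatures t, weighted by P(T = t).
average_of_cond = sum(law[t] * cond[t] for t in law)
```

With these numbers the conditional expectation is 120 visitors at temperature 20 and 260 at temperature 25, and both `expectation(outcomes)` and `average_of_cond` equal 190.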
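Definition 1 Let $(\Omega,\mathcal{F},\mu)$ be a probability space, let $\mathcal{D}\subset\mathcal{F}$ be a $\sigma$-algebra and let $X\in L^1(\Omega,\mathcal{F},\mu)$ be a random variable. The conditional expectation of $X$ with respect to $\mathcal{D}$ is the function $\mathbb{E}[X\mid\mathcal{D}]\in L^1(\Omega,\mathcal{D},\mu)$ such that for all $D\in\mathcal{D}$ we have
$$\int_D\mathbb{E}[X\mid\mathcal{D}]\,d\mu=\int_D X\,d\mu.$$
Informally, the conditional expectation $\mathbb{E}[X\mid\mathcal{D}]$ is the function in $L^1(\Omega,\mathcal{D},\mu)$ that best approximates $X$. This sentence is made more precise below in Theorem 6. From an information theory point of view, $\mathbb{E}[X\mid\mathcal{D}]$ is the best guess of the value of $X$ when all the information we have is $\mathcal{D}$. For instance if we have no information at all (so $\mathcal{D}=\{\emptyset,\Omega\}$), then $\mathbb{E}[X\mid\mathcal{D}]$ is the constant function $\mathbb{E}[X]=\int_\Omega X\,d\mu$, and that’s the best guess one can have for $X$. At the other extreme, when we have complete information (i.e. when $\mathcal{D}=\mathcal{F}$), then $\mathbb{E}[X\mid\mathcal{D}]=X$ and our “guess” for $X$ is $X$ itself.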
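On a finite probability space a $\sigma$-algebra $\mathcal{D}$ is generated by a partition, and the conditional expectation is just the average of $X$ over each block of the partition. The sketch below, with a hypothetical six-point space and made-up values for $X$, checks the identity $\int_D X\,d\mu=\int_D\mathbb{E}[X\mid\mathcal{D}]\,d\mu$ for every $D\in\mathcal{D}$ and illustrates the two extreme cases of no information and complete information.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite probability space: six points with uniform measure.
omega = range(6)
mu = {w: Fraction(1, 6) for w in omega}
X = {0: 3, 1: 5, 2: 1, 3: 1, 4: 4, 5: 10}


def cond_exp(X, partition, mu):
    """E[X | D], where D is the sigma-algebra generated by `partition`."""
    result = {}
    for block in partition:
        mass = sum(mu[w] for w in block)
        average = sum(mu[w] * X[w] for w in block) / mass
        for w in block:
            result[w] = average
    return result


def integral(f, D, mu):
    """Integral of f over the set D with respect to mu."""
    return sum(mu[w] * f[w] for w in D)


def sigma_algebra(partition):
    """All sets in D: the unions of blocks of the partition."""
    for r in range(len(partition) + 1):
        for blocks in combinations(partition, r):
            yield [w for block in blocks for w in block]


partition = [(0, 1), (2, 3), (4, 5)]
Y = cond_exp(X, partition, mu)

# The defining property, checked on every set of the sigma-algebra.
defining_property = all(
    integral(X, D, mu) == integral(Y, D, mu) for D in sigma_algebra(partition)
)

# No information: the trivial sigma-algebra gives the constant E[X].
trivial = cond_exp(X, [tuple(omega)], mu)
# Complete information: the full sigma-algebra gives X itself.
full = cond_exp(X, [(w,) for w in omega], mu)
```

Here `Y` takes the values 4, 1 and 7 on the three blocks, `trivial` is constantly 4 (the mean of `X`), and `full` coincides with `X`.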
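We need to show that the conditional expectation exists and is unique in the space $L^1(\Omega,\mathcal{D},\mu)$:
Proposition 2 Let $(\Omega,\mathcal{F},\mu)$ be a probability space, let $\mathcal{D}\subset\mathcal{F}$ be a $\sigma$-algebra and let $X\in L^1(\Omega,\mathcal{F},\mu)$ be a random variable. The conditional expectation $\mathbb{E}[X\mid\mathcal{D}]$ exists and is unique.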
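Proof: To prove existence we define the complex measure $\nu:\mathcal{D}\to\mathbb{C}$ by $\nu(D)=\int_D X\,d\mu$ for every $D\in\mathcal{D}$ (if you are not comfortable with complex measures, split $X=X_1-X_2+i(X_3-X_4)$ with each $X_j$ a non-negative real valued function and apply the proof to each $X_j$ separately). It is easy to check that this is indeed a complex measure. Moreover if $D\in\mathcal{D}$ is such that $\mu(D)=0$ then $\nu(D)=0$ as well. In other words we have $\nu\ll\mu$. Therefore we can apply the Radon-Nikodym theorem to find a derivative $Y=\frac{d\nu}{d\mu}\in L^1(\Omega,\mathcal{D},\mu)$. By definition of $Y$ we have that for every $D\in\mathcal{D}$
$$\int_D Y\,d\mu=\nu(D)=\int_D X\,d\mu.$$
Thus $Y$ is a conditional expectation of $X$ with respect to $\mathcal{D}$. To prove uniqueness, assume that $Y,Z\in L^1(\Omega,\mathcal{D},\mu)$ are both conditional expectations of $X$ with respect to $\mathcal{D}$. Then for each $D\in\mathcal{D}$ we have $\int_D Y\,d\mu=\int_D X\,d\mu=\int_D Z\,d\mu$, which implies $\int_D(Y-Z)\,d\mu=0$.
If the set $\{Y\neq Z\}$ has positive measure then without loss of generality (swapping $Y$ and $Z$, or multiplying both by $i$, if necessary) the set $\{\operatorname{Re}(Y-Z)>0\}$ has positive measure. Hence there is some $\epsilon>0$ such that the set $D=\{\operatorname{Re}(Y-Z)>\epsilon\}$ has positive measure. But since both $Y$ and $Z$ are measurable in $\mathcal{D}$ we conclude that $D\in\mathcal{D}$ and hence
$$0=\operatorname{Re}\int_D(Y-Z)\,d\mu\ge\epsilon\,\mu(D)>0,$$
which is a contradiction. This shows the uniqueness of $\mathbb{E}[X\mid\mathcal{D}]$.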
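We will need the following basic fact about conditional expectations:
Proposition 3 Let $(\Omega,\mathcal{F},\mu)$ be a probability space, let $\mathcal{D}\subset\mathcal{F}$ be a $\sigma$-algebra and let $X\in L^\infty(\Omega,\mathcal{F},\mu)$ be a real valued random variable. Then almost surely
$$\operatorname{ess\,inf}X\le\mathbb{E}[X\mid\mathcal{D}]\le\operatorname{ess\,sup}X.$$
In fact, if $X$ takes values in a convex set in a Banach space, then $\mathbb{E}[X\mid\mathcal{D}]$ also takes values in that convex set, but the proof is technically more cumbersome.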
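Proof: We prove only the first inequality, $\operatorname{ess\,inf}X\le\mathbb{E}[X\mid\mathcal{D}]$; the second can be easily derived from the first one by considering the random variable $-X$. Fix $\epsilon>0$ and let $D=\{\omega\in\Omega:\mathbb{E}[X\mid\mathcal{D}](\omega)\le\operatorname{ess\,inf}X-\epsilon\}\in\mathcal{D}$. We have
$$(\operatorname{ess\,inf}X)\,\mu(D)\le\int_D X\,d\mu=\int_D\mathbb{E}[X\mid\mathcal{D}]\,d\mu\le(\operatorname{ess\,inf}X-\epsilon)\,\mu(D)$$
which simplifies to $\epsilon\,\mu(D)\le0$, and thus $\mu(D)=0$. Since $\epsilon>0$ was arbitrary we conclude that almost surely $\mathbb{E}[X\mid\mathcal{D}]\ge\operatorname{ess\,inf}X$.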
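Lemma 4 Let $(\Omega,\mathcal{F},\mu)$ be a probability space, let $\mathcal{D}\subset\mathcal{F}$ be a $\sigma$-algebra and let $X\in L^1(\Omega,\mathcal{F},\mu)$ be a real valued random variable. Then almost surely
$$|\mathbb{E}[X\mid\mathcal{D}]|\le\mathbb{E}[|X|\mid\mathcal{D}].\qquad(1)$$
Proof: Let $Y=\mathbb{E}[X\mid\mathcal{D}]$ and $Z=\mathbb{E}[|X|\mid\mathcal{D}]$. Let $D=\{|Y|>Z\}\in\mathcal{D}$ be the set of points where the inequality (1) fails. Let $D^+=D\cap\{Y\ge0\}$ and let $D^-=D\cap\{Y<0\}$; both sets are in $\mathcal{D}$. We have
$$\int_D|Y|\,d\mu=\int_{D^+}Y\,d\mu-\int_{D^-}Y\,d\mu=\int_{D^+}X\,d\mu-\int_{D^-}X\,d\mu\le\int_D|X|\,d\mu=\int_D Z\,d\mu.$$
Since $D$ is the set of points where the inequality (1) fails, i.e. where $|Y|-Z>0$, we conclude that $\mu(D)=0$ as desired.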
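Here is a small sanity check, on a hypothetical four-point space with a sign-changing $X$, of the pointwise bound $|\mathbb{E}[X\mid\mathcal{D}]|\le\mathbb{E}[|X|\mid\mathcal{D}]$. Integrating this bound gives $\|\mathbb{E}[X\mid\mathcal{D}]\|_{L^1}\le\|X\|_{L^1}$, which the sketch also checks. The space, the weights and the values of $X$ are made up for illustration.

```python
from fractions import Fraction

# Hypothetical four-point probability space with non-uniform weights.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
X = {0: 3, 1: -5, 2: 2, 3: -2}
partition = [(0, 1), (2, 3)]  # generates the sigma-algebra D


def cond_exp(X, partition, mu):
    """E[X | D]: the mu-weighted average of X on each block."""
    result = {}
    for block in partition:
        mass = sum(mu[w] for w in block)
        average = sum(mu[w] * X[w] for w in block) / mass
        for w in block:
            result[w] = average
    return result


def l1_norm(f, mu):
    """The L^1 norm of f with respect to mu."""
    return sum(mu[w] * abs(f[w]) for w in mu)


Y = cond_exp(X, partition, mu)                            # E[X | D]
Z = cond_exp({w: abs(X[w]) for w in mu}, partition, mu)   # E[|X| | D]

# Pointwise bound |E[X | D]| <= E[|X| | D].
pointwise_bound = all(abs(Y[w]) <= Z[w] for w in mu)

# A constant function is fixed by conditional expectation, so its norm is kept.
one = {w: 1 for w in mu}
constant_is_fixed = cond_exp(one, partition, mu) == one
```

In this example cancellation inside each block makes the inequality strict: `l1_norm(Y, mu)` is $1/4$ while `l1_norm(X, mu)` is $13/4$.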
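Proposition 5 Let $(\Omega,\mathcal{F},\mu)$ be a probability space and let $\mathcal{D}\subset\mathcal{F}$ be a $\sigma$-algebra. The map $X\mapsto\mathbb{E}[X\mid\mathcal{D}]$ is a continuous linear operator from $L^1(\Omega,\mathcal{F},\mu)$ to $L^1(\Omega,\mathcal{D},\mu)$.
Proof: Linearity follows from the uniqueness of the conditional expectation. We show that actually the norm of the operator is $1$: Let $X\in L^1(\Omega,\mathcal{F},\mu)$. By Lemma 4 we have
$$\|\mathbb{E}[X\mid\mathcal{D}]\|_{L^1}=\int_\Omega|\mathbb{E}[X\mid\mathcal{D}]|\,d\mu\le\int_\Omega\mathbb{E}[|X|\mid\mathcal{D}]\,d\mu=\int_\Omega|X|\,d\mu=\|X\|_{L^1},$$
with equality when $X$ is constant.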
Finally I will present another way to think about conditional expectation for a function $X\in L^2(\Omega,\mathcal{F},\mu)$. In this case we can use the Hilbert space structure to give a different characterization of the conditional expectation.
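Theorem 6 Let $(\Omega,\mathcal{F},\mu)$ be a probability space, let $\mathcal{D}\subset\mathcal{F}$ be a $\sigma$-algebra and let $X\in L^2(\Omega,\mathcal{F},\mu)$. Then $\mathbb{E}[X\mid\mathcal{D}]=PX$, where $P$ is the orthogonal projection from $L^2(\Omega,\mathcal{F},\mu)$ onto the closed subspace $L^2(\Omega,\mathcal{D},\mu)$.
Proof: By definition of orthogonal projection, for any function $Y\in L^2(\Omega,\mathcal{D},\mu)$ we have $\langle X-PX,Y\rangle=0$. If $D\in\mathcal{D}$ then the indicator function $1_D$ of $D$ is in $L^2(\Omega,\mathcal{D},\mu)$. Therefore
$$\int_D X\,d\mu=\langle X,1_D\rangle=\langle PX,1_D\rangle=\int_D PX\,d\mu$$
and hence $PX=\mathbb{E}[X\mid\mathcal{D}]$ as desired.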
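The projection picture can also be tested numerically. In the sketch below (a hypothetical four-point space; all numbers are made up) the residual $X-\mathbb{E}[X\mid\mathcal{D}]$ is orthogonal to the indicator of every block of the partition generating $\mathcal{D}$, and a brute-force search over a small grid of $\mathcal{D}$-measurable functions finds that $\mathbb{E}[X\mid\mathcal{D}]$ is the one closest to $X$ in $L^2$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical four-point probability space with uniform measure.
mu = {w: Fraction(1, 4) for w in range(4)}
X = {0: 2, 1: 6, 2: 1, 3: 3}
partition = [(0, 1), (2, 3)]  # generates the sigma-algebra D


def cond_exp(X, partition, mu):
    """E[X | D]: the mu-weighted average of X on each block."""
    result = {}
    for block in partition:
        mass = sum(mu[w] for w in block)
        average = sum(mu[w] * X[w] for w in block) / mass
        for w in block:
            result[w] = average
    return result


def inner(f, g, mu):
    """The L^2 inner product of two real valued functions."""
    return sum(mu[w] * f[w] * g[w] for w in mu)


def sq_dist(f, g, mu):
    """Squared L^2 distance between f and g."""
    diff = {w: f[w] - g[w] for w in mu}
    return inner(diff, diff, mu)


Y = cond_exp(X, partition, mu)

# Orthogonality: X - Y is orthogonal to the indicator of each block,
# hence to every D-measurable function.
residual = {w: X[w] - Y[w] for w in mu}
indicators = [{w: int(w in block) for w in mu} for block in partition]
orthogonal = all(inner(residual, ind, mu) == 0 for ind in indicators)

# Brute force: D-measurable functions are constant on each block, so try
# every pair of integer constants (a, b) in a small grid.
candidates = [{0: a, 1: a, 2: b, 3: b} for a, b in product(range(7), repeat=2)]
best = min(candidates, key=lambda Z: sq_dist(X, Z, mu))
```

Here `Y` takes the value 4 on the first block and 2 on the second, and `best` is exactly this function, at squared distance $5/2$ from `X`.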
— 2. Disintegration of measures —
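A good example to keep in mind when talking about disintegration of measures is the following: Let $\Omega=\{(x,y)\in[0,1]^2:y\le x\}$ be the lower triangle in the unit square, let $\mathcal{F}$ be the Borel $\sigma$-algebra over $\Omega$ and let $\mu$ be the $2$-dimensional Lebesgue measure on $\Omega$, normalized so that $\mu(\Omega)=1$. Now let $\mathcal{D}\subset\mathcal{F}$ be the $\sigma$-algebra defined by $D\in\mathcal{D}$ if and only if $D\in\mathcal{F}$ and $D$ is a union of vertical lines (more precisely, for every point $(x,y)\in D$ we have $(x,z)\in D$ for all $z\in[0,x]$).
Let $I=[0,1]$ be the projection of $\Omega$ onto the $x$-axis, let $\mathcal{B}$ be the Borel $\sigma$-algebra on $I$ and let $\nu$ be the measure on $I$ that has density $2x$ with respect to the Lebesgue measure. Then the probability space $(\Omega,\mathcal{D},\mu)$ is equivalent to the system $(I,\mathcal{B},\nu)$. More precisely, the map $\pi:(x,y)\mapsto x$ from $\Omega$ to $I$ is an isomorphism of probability spaces. The meaning of this is quite intuitive: $\pi$ induces a bijection between the $\sigma$-algebras $\mathcal{D}$ and $\mathcal{B}$ that matches sets with the same measure.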
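The measure $\mu$ can then be recovered from its pieces along the vertical lines: for every integrable function $f$ on $\Omega$ we have
$$\int_\Omega f\,d\mu=\int_0^1\left(\int_\Omega f\,d\mu_x\right)2x\,dx\qquad(2)$$
where $\mu_x$ is a probability measure on $\Omega$. More precisely, $\mu_x$ is the normalized Lebesgue measure on the vertical segment $\{x\}\times[0,x]$, so that $\mu_x(\{x\}\times[a,b])=\frac{b-a}{x}$ for any interval $[a,b]\subset[0,x]$.
Note that if we started with the unit square instead of the triangle, then all the measures $\mu_x$ would be the same (Lebesgue measure on a vertical segment of length $1$), and equation (2) is essentially Fubini’s theorem. Thus the disintegration of measures can be viewed as an inverted version of Fubini’s theorem.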
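The triangle example can be checked numerically. The sketch below assumes that $\mu$ is Lebesgue measure on the triangle normalized to total mass $1$, that the marginal on the $x$-axis has density $2x$, and that $\mu_x$ is normalized Lebesgue measure on the vertical segment above $x$. The test function $f(x,y)=xy+1$ and the grid sizes are arbitrary choices. It compares a direct midpoint-rule approximation of $\int f\,d\mu$ with the iterated integral $\int_0^1\big(\int f\,d\mu_x\big)\,2x\,dx$.

```python
def f(x, y):
    """An arbitrary test function on the triangle {0 <= y <= x <= 1}."""
    return x * y + 1.0


def integral_over_triangle(f, n=200):
    """Midpoint rule for the integral of f against mu = 2 * Lebesgue."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(i + 1):
            y = (j + 0.5) * h
            # Cells on the diagonal lie only half inside the triangle.
            weight = 0.5 if j == i else 1.0
            total += weight * f(x, y) * h * h
    return 2.0 * total


def fiber_average(f, x, m=200):
    """Integral of f(x, .) against normalized Lebesgue measure on [0, x]."""
    k = x / m
    return sum(f(x, (j + 0.5) * k) for j in range(m)) * k / x


def disintegrated_integral(f, n=200):
    """Integrate the fiber averages against the density 2x on [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += fiber_average(f, x) * 2.0 * x * h
    return total


direct = integral_over_triangle(f)
iterated = disintegrated_integral(f)
```

For this $f$ the exact value of both integrals is $5/4$, and the two approximations agree with it up to the discretization error.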
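Theorem 7 Let $X$ be a compact metric space, let $\mathcal{B}$ be the $\sigma$-algebra of Borel sets on $X$ and let $\mu:\mathcal{B}\to[0,1]$ be a probability measure. Let $\mathcal{D}\subset\mathcal{B}$ be a $\sigma$-algebra. Then for almost every $x\in X$ there exists a probability measure $\mu_x$ on $(X,\mathcal{B})$ such that for every $f\in L^1(X,\mathcal{B},\mu)$:
$$\mathbb{E}[f\mid\mathcal{D}](x)=\int_X f\,d\mu_x\qquad\text{for a.e. }x\in X.\qquad(3)$$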
This result applies more generally than to compact metric spaces, but this restriction makes the proof technically easier.
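Proof: Since $X$ is a compact metric space, the space $C(X)$ of continuous functions from $X$ to $\mathbb{R}$ is separable (with the topology of uniform convergence, equivalently, the supremum norm). Let $\{f_n\}_{n\in\mathbb{N}}$ be a countable dense set in $C(X)$. For each $n\in\mathbb{N}$, the conditional expectation $\mathbb{E}[f_n\mid\mathcal{D}]$ is defined $\mu$-a.e. on $X$. Thus there is a set $X_0\subset X$ of full measure such that $\mathbb{E}[f_n\mid\mathcal{D}](x)$ is defined on $X_0$ for all $n\in\mathbb{N}$.
For each $x\in X_0$ define $L_x(f_n)=\mathbb{E}[f_n\mid\mathcal{D}](x)$. Note that, by Proposition 3 we have
$$|L_x(f_n)|\le\|f_n\|_\infty.$$
Thus $L_x$ can be extended to a continuous functional on $C(X)$. By the Riesz representation theorem there exists a measure $\mu_x$ on $X$ such that $L_x(f)=\int_X f\,d\mu_x$ for every $f\in C(X)$.
For each $n\in\mathbb{N}$ we have that $\int_X f_n\,d\mu_x=\mathbb{E}[f_n\mid\mathcal{D}](x)$, hence the function $x\mapsto\int_X f_n\,d\mu_x$ is in $L^1(X,\mathcal{D},\mu)$. Since $\{f_n\}$ is a dense set in $C(X)$ and $C(X)$ is a dense set in $L^1(X,\mathcal{B},\mu)$, we conclude that $\{f_n\}$ is a dense set in $L^1(X,\mathcal{B},\mu)$. It follows from Proposition 5 that the function $x\mapsto\int_X f\,d\mu_x$ is in $L^1(X,\mathcal{D},\mu)$ for any $f\in C(X)$.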
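Finally for each $n\in\mathbb{N}$ and each $x\in X_0$ we have
$$\mathbb{E}[f_n\mid\mathcal{D}](x)=L_x(f_n)=\int_X f_n\,d\mu_x$$
and since the sequence $\{f_n\}$ is a dense set in $C(X)$ we conclude that (3) holds for any $f\in C(X)$. Now given $f\in L^1(X,\mathcal{B},\mu)$, one can find $g_n\in C(X)$ such that $f=\sum_n g_n$ a.e. and $\sum_n\|g_n\|_{L^1}<\infty$. From Proposition 5 this implies that $\sum_n\|\mathbb{E}[g_n\mid\mathcal{D}]\|_{L^1}<\infty$ and hence $\mathbb{E}[f\mid\mathcal{D}]=\sum_n\mathbb{E}[g_n\mid\mathcal{D}]$ a.e. For each $x\in X_0$ for which this series converges we have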