## The Ergodic Theorem

— 1. Introduction —

One can argue that (modern) ergodic theory started with the ergodic Theorem in the early 1930s. Vaguely speaking, the ergodic theorem asserts that in an ergodic dynamical system (essentially a system where “everything” moves around) the statistical (or time) average is the same as the space average. For instance, if a bird flies “ergodically” over the Earth’s surface, and one measures the proportion of the time that the bird is over an ocean, then after enough time (say, a couple of centuries) that proportion (the time average) would get close to the actual proportion of the Earth’s surface covered by oceans (the space average).

To make things more interesting there is more than just one ergodic theorem, and both von Neumann and Birkhoff proved a version of the ergodic theorem at around the same time. While von Neumann’s result concerns ${L^2}$ convergence, has a quick proof and is valid for any averaging scheme (and indeed any discrete amenable group action), Birkhoff’s theorem is about pointwise convergence, holds for any function in ${L^1}$, has a harder proof and is not suitable for all averaging schemes.

In this post, a measure preserving system (m.p.s. for short) is a triple ${(X,\mu, T)}$ where ${(X,\mu)}$ is a probability space and ${T:X\rightarrow X}$ preserves the measure, i.e. for any ${A\subset X}$ we have ${\mu(T^{-1}A)=\mu(\{x\in X:Tx\in A\})=\mu(A)}$. (The ${\sigma}$-algebra is omitted from the notation because it plays no important part in this discussion; all sets and functions will be assumed to be measurable.) A m.p.s. is ergodic if the only sets ${A\subset X}$ such that ${T^{-1}A=A}$ have measure ${\mu(A)=0}$ or ${\mu(A)=1}$. Equivalently, every function ${f\in L^1(X)}$ such that ${f(Tx)=f(x)}$ a.e. is constant a.e. We now state both versions:

Theorem 1 (von Neumann’s ergodic Theorem) Let ${(X,\mu, T)}$ be an ergodic m.p.s. and let ${f\in L^2(X)}$. Then

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nf(T^nx)=\int_X fd\mu\qquad\text{ in }L^2(X)$

Theorem 2 (Birkhoff’s ergodic Theorem) Let ${(X,\mu, T)}$ be an ergodic m.p.s. and let ${f\in L^1(X)}$. Then

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nf(T^nx)=\int_X fd\mu\qquad\text{a.e.}$
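To get a feeling for the statement, here is a minimal numerical sketch in Python. The system is an assumption chosen for illustration and does not appear elsewhere in this post: the rotation ${Tx=x+\alpha \bmod 1}$ of the circle ${[0,1)}$ by an irrational ${\alpha}$, which is ergodic for Lebesgue measure, with ${f}$ the indicator function of ${[0,0.3)}$. Floating point arithmetic only approximates an irrational rotation, so this is an illustration and not a proof.

```python
import math

def birkhoff_average(f, T, x, N):
    """Return (1/N) * sum_{n=1}^{N} f(T^n x), the time average in Theorems 1 and 2."""
    total = 0.0
    for _ in range(N):
        x = T(x)          # advance one step along the orbit
        total += f(x)
    return total / N

# Hypothetical example system: rotation of [0, 1) by an irrational angle.
ALPHA = (math.sqrt(5) - 1) / 2

def rotate(x):
    return (x + ALPHA) % 1.0

def indicator(x):
    # f = indicator of [0, 0.3); its space average (Lebesgue integral) is 0.3
    return 1.0 if x < 0.3 else 0.0

space_avg = 0.3
time_avg = birkhoff_average(indicator, rotate, 0.1, 100_000)
print(time_avg, space_avg)
```

The starting point ${x=0.1}$ and the number of steps are arbitrary choices; other starting points should give a time average close to ${0.3}$ as well.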

— 2. von Neumann’s Ergodic Theorem —

The proof of von Neumann’s Theorem (also known as the mean ergodic Theorem) reveals that the result is intrinsically related to unitary operators on Hilbert spaces. It is this feature that makes the result suitable for so many generalizations and so useful in establishing recurrence results.

The following Lemma is itself sometimes called von Neumann’s Ergodic Theorem.

Lemma 3 Let ${H}$ be a Hilbert space and let ${U:H\rightarrow H}$ be a unitary operator. Let ${I=\{f\in H:Uf=f\}}$ be the subspace of invariant vectors and let ${P:H\rightarrow I}$ be the orthogonal projection onto ${I}$. Then for any ${f\in H}$ we have

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^NU^nf=Pf$

where convergence is in the norm topology.
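The lemma can be checked numerically in a toy case. The sketch below is an illustration under assumptions that are not part of this post: it takes ${H={\mathbb C}^2}$ and the unitary ${U=\mathrm{diag}(1,e^{i\theta})}$ with ${\theta=1}$, so that ${I}$ is spanned by ${(1,0)}$ and ${Pf}$ is simply the first coordinate of ${f}$.

```python
import cmath

THETA = 1.0  # hypothetical rotation angle, not a multiple of 2*pi

def apply_U(v):
    """Apply the unitary U = diag(1, e^{i*THETA}) to a vector of C^2."""
    return (v[0], cmath.exp(1j * THETA) * v[1])

def ergodic_average(v, N):
    """Return (1/N) * sum_{n=1}^{N} U^n v."""
    acc = (0j, 0j)
    for _ in range(N):
        v = apply_U(v)
        acc = (acc[0] + v[0], acc[1] + v[1])
    return (acc[0] / N, acc[1] / N)

def norm(v):
    return (abs(v[0]) ** 2 + abs(v[1]) ** 2) ** 0.5

def error(f, N):
    """Distance between the N-th average of f and its projection Pf."""
    Pf = (f[0], 0j)  # orthogonal projection onto I = span{(1, 0)}
    avg = ergodic_average(f, N)
    return norm((avg[0] - Pf[0], avg[1] - Pf[1]))

f = (2 + 1j, 3 - 4j)
for N in (10, 100, 1000, 10000):
    print(N, error(f, N))
```

In this toy case the error decays like ${1/N}$, because the second coordinate of the average is a geometric sum divided by ${N}$. For a general unitary operator no rate of convergence is guaranteed.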

Before we prove this lemma, let’s see how it implies the ergodic Theorem 1:

First take ${H=L^2(X)}$ and note that the map ${U:H\rightarrow H}$ defined by ${(Uf)(x)=f(Tx)}$ (known as the Koopman operator of ${T}$) is a unitary operator when ${T}$ is invertible; in general it is only an isometry, but Lemma 3 remains valid for isometries. Also observe that, because the system is ergodic, the subspace ${I}$ of invariant functions contains only the constant functions, and the orthogonal projection ${P}$ is the integral operator, i.e. ${Pf=\int_Xfd\mu}$. Thus Theorem 1 is just a rephrasing of Lemma 3 for this case.

We now prove Lemma 3.

Proof: The convergence clearly holds if ${f\in I}$. On the other hand, if ${f=g-Ug}$ for some ${g\in H}$, then for any ${h\in I}$ we have

$\displaystyle \langle f,h\rangle=\langle g,h\rangle-\langle Ug,h\rangle=\langle g,h\rangle-\langle Ug,Uh\rangle=\langle g,h\rangle-\langle g,h\rangle=0$

hence ${f}$ is orthogonal to ${I}$ and so ${Pf=0}$. Moreover the sum telescopes: ${\sum_{n=1}^NU^nf=\sum_{n=1}^N(U^ng-U^{n+1}g)=Ug-U^{N+1}g}$, which has norm at most ${2\|g\|}$. After dividing by ${N}$, the limit in the lemma is indeed ${0}$.

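Let ${J}$ be the subspace of vectors of the form ${g-Ug}$. We claim that ${H=I\oplus\overline{J}}$, where ${\overline{J}}$ is the closure of ${J}$. This concludes the proof: the averaging operators ${f\mapsto\frac1N\sum_{n=1}^NU^nf}$ have norm at most ${1}$, so the convergence to ${0}$ on ${J}$ extends to ${\overline{J}}$ by approximation. To prove the claim, let ${f\perp J}$. Since ${f-Uf\in J}$ we have ${\langle f,f-Uf\rangle=0}$, and hence: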

$\displaystyle \begin{array}{rcl} \displaystyle\|f-Uf\|^2&=&\displaystyle\|f\|^2+\|Uf\|^2-2Re\langle f,Uf\rangle\\&=&\displaystyle2\|f\|^2-2Re\langle f,Uf\rangle-2Re\langle f,f-Uf\rangle=2\|f\|^2-2Re\langle f,f\rangle=0 \end{array}$

so ${f\in I}$ and hence ${I=J^\perp}$ and this finishes the proof. $\Box$

— 3. Birkhoff’s Ergodic Theorem —

The first proof I learned of this Theorem uses a so-called maximal inequality. Here I will present (or rather sketch) a different proof, more combinatorial in nature.

First we can easily reduce to non-negative functions, so we will assume that ${f\geq0}$. Moreover we will only deal with bounded functions (so we will assume wlog that ${f\leq1}$); the general case follows from some standard estimates. Let ${A_Nf(x)=\frac1N\sum_{n=0}^{N-1}f(T^nx)}$ and let ${f^+(x)=\limsup_{N\rightarrow\infty} A_Nf(x)}$ and ${f^-(x)=\liminf_{N\rightarrow\infty} A_Nf(x)}$. It suffices to prove that ${f^+(x)\leq\int_Xfd\mu}$ a.e., because applying this to ${f'=1-f}$ then gives ${f^-(x)\geq\int_Xfd\mu}$ as well.

Now fix some ${\epsilon>0}$. It suffices to prove that ${f^+(x)\leq\int_Xfd\mu+\epsilon=:\alpha}$ a.e., because we can then let ${\epsilon\rightarrow0}$. Let ${B=\{x:f^+(x)>\alpha\}}$. Since ${f}$ is bounded, ${A_Nf(Tx)-A_Nf(x)=\frac1N\left(f(T^Nx)-f(x)\right)\rightarrow0}$, so ${f^+(Tx)=f^+(x)}$ and hence ${T^{-1}B=B}$. By ergodicity either ${\mu(B)=1}$ or ${\mu(B)=0}$. Since we need to prove that ${\mu(B)=0}$, it will suffice to prove that ${\mu(B)<1}$.

Let ${f_n(x)=f(x)+f(Tx)+...+f(T^{n-1}x)-n\alpha}$ (with the convention ${f_0=0}$), let ${F_N=\max\{0,f_1,...,f_N\}}$ and let ${B_N=\{x:F_N(x)>0\}}$. For each ${0\leq n\leq N}$ we have ${F_N\geq f_n}$, so ${F_N(Tx)\geq f_n(Tx)}$ and hence ${F_N(Tx)+f(x)\geq f_n(Tx)+f(x)=f_{n+1}(x)+\alpha}$. For ${x\in B_N}$ we have ${F_N(x)=\max\{f_1(x),...,f_N(x)\}}$, so taking the maximum over ${n=0,...,N-1}$ gives ${F_N(Tx)+f(x)\geq F_N(x)+\alpha}$, or ${f(x)-\alpha\geq F_N(x)-F_N(Tx)}$. Integrating over ${B_N}$ we get

$\displaystyle \begin{array}{rcl} \displaystyle\int_{B_N}(f-\alpha) d\mu&\geq&\displaystyle\int_{B_N}\left(F_N(x)-F_N(Tx)\right)d\mu=\int_{B_N}F_Nd\mu-\int_{B_N}F_N(Tx)d\mu \\&=&\displaystyle\int_XF_Nd\mu-\int_{B_N}F_N(Tx)d\mu\geq\int_X\left(F_N(x)-F_N(Tx)\right)d\mu=0 \end{array}$

Rewriting we get that

$\displaystyle \mu(B_N)\leq\frac1\alpha\int_{B_N}fd\mu\leq\frac1\alpha\int_Xfd\mu=1-\epsilon/\alpha<1$

Since ${B_N\subset B_{N+1}}$ and the bound above does not depend on ${N}$, we have that ${\mu\left(\bigcup B_N\right)<1}$ as well. Now note that ${B\subset\{x:f_n(x)>0\text{ for infinitely many }n\}\subset\bigcup B_N}$, because ${f_n(x)=n(A_nf(x)-\alpha)}$, so we finally have that ${\mu(B)<1}$ and we are done.

— 4. Averages over Følner sequences —

As stated (and proved), Birkhoff’s Theorem seems stronger than von Neumann’s (although almost everywhere pointwise convergence does not imply ${L^2}$ convergence in general, it is possible to deduce von Neumann’s Theorem from Birkhoff’s Theorem). However, the proof of von Neumann’s Theorem also proves the following result:

Theorem 4 Under the hypotheses of Theorem 1, let ${\{M_n\},\{N_n\}}$ be sequences of integers such that ${M_n-N_n\rightarrow\infty}$. Then

$\displaystyle \lim_{n\rightarrow\infty}\frac1{M_n-N_n}\sum_{k=N_n+1}^{M_n}f(T^kx)=\int_Xfd\mu\qquad\text{ in }L^2$
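As a sanity check, here is a small Python sketch of averages over moving windows. Everything specific in it is an assumption made for illustration: the system is the rotation ${Tx=x+\alpha \bmod 1}$ with ${\alpha=\sqrt2-1}$, the function is ${f(x)=e^{2\pi ix}}$ (so ${\int_Xfd\mu=0}$), and the windows are ${[n^2+1,n^2+n]}$. For this choice ${f(T^kx)=e^{2\pi ik\alpha}f(x)}$, so the ${L^2}$ norm of the window average is the modulus of a single complex coefficient.

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1  # hypothetical irrational rotation angle

def window_coefficient(N, M):
    """Return (1/(M-N)) * sum_{k=N+1}^{M} e^{2*pi*i*k*ALPHA}."""
    total = sum(cmath.exp(2j * math.pi * k * ALPHA) for k in range(N + 1, M + 1))
    return total / (M - N)

def l2_norm_of_window_average(N, M):
    # For f(x) = e^{2*pi*i*x} the window average equals the coefficient times f(x),
    # and f has L^2 norm 1, so the L^2 norm of the average is |coefficient|.
    return abs(window_coefficient(N, M))

for n in (10, 100, 1000):
    N, M = n * n, n * n + n   # window of length n starting far from the origin
    print(n, l2_norm_of_window_average(N, M))
```

The printed norms shrink as the window length grows, regardless of where the window starts, which is the behaviour the theorem describes.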

If one attempts to modify the proof of Birkhoff’s Theorem to obtain an analogous result, one runs into difficulties, and indeed such a result is not true. The importance of this generalization lies in the fact that sequences of intervals ${\{[N_n,M_n]\}}$ with ${M_n-N_n\rightarrow\infty}$ are essentially all the Følner sequences of ${{\mathbb N}}$, and it turns out that the von Neumann Ergodic Theorem holds for any amenable group and any Følner sequence.

We now present an example, due to Akcoglu and del Junco, which shows that the pointwise ergodic Theorem fails along general Følner sequences. We mention that Lindenstrauss found sufficient conditions for a Følner sequence to satisfy the pointwise ergodic theorem, and that every Følner sequence has a “good” subsequence (and hence every discrete amenable group has such a Følner sequence).

Theorem 5 Let ${(X,\mu,T)}$ be an invertible ergodic m.p.s. with ${\mu(\{x\})=0}$ for each ${x\in X}$. Let ${\{a(N)\}_N}$ be a non-decreasing unbounded sequence of integers such that ${a(N)/N\rightarrow0}$. Then there exists a set ${A}$ such that the averages

$\displaystyle \frac 1{a(N)}\sum_{n=N+1}^{N+a(N)}1_A(T^nx)$

fail to converge on a set of positive measure.

For the proof we will need to use the Rokhlin lemma:

Lemma 6 (Rokhlin Lemma) Let ${(X,\mu,T)}$ be an invertible ergodic m.p.s. with ${\mu(\{x\})=0}$ for each ${x\in X}$, let ${\epsilon>0}$ and let ${N\in{\mathbb N}}$ be arbitrary. Then there exists some set ${A\subset X}$ such that ${\{T^nA,n=0,...,N\}}$ are disjoint and

$\displaystyle \mu\left(X\setminus\bigcup_{n=0}^NT^nA\right)<\epsilon$

We can now prove Theorem 5.

Proof: For each ${k\in{\mathbb N}}$, let ${n_k}$ be large enough so that ${2^ka(n_k) < n_k}$, and let ${D_k}$ be the set provided by the Rokhlin Lemma with ${N=n_k+a(n_k)}$ and ${\epsilon=1/10}$. Let ${\displaystyle B_k=\bigcup_{i=1}^{n_k-a(n_k)}T^iD_k}$ and ${\displaystyle A_k=\bigcup_{i=n_k+1}^{n_k+a(n_k)}T^iD_k}$. We observe that for ${k}$ sufficiently large (so that ${a(n_k)<2/7 n_k}$) we have ${\mu(B_k)>1/2}$ and ${\mu(A_k)<1/2^k}$.
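The two measure estimates can be verified as follows. This is only a sketch, and it assumes that the condition on ${n_k}$ reads ${2^ka(n_k) < n_k}$ and that ${k}$ is large enough for ${a(n_k)\geq1}$. Write ${N=n_k+a(n_k)}$. The ${N+1}$ sets ${T^iD_k}$, ${0\leq i\leq N}$, are disjoint, have the same measure (because ${T}$ is invertible and measure preserving) and cover all but ${1/10}$ of ${X}$, so ${\frac9{10(N+1)} < \mu(D_k)\leq\frac1{N+1}}$. Therefore

$\displaystyle \mu(A_k)=a(n_k)\mu(D_k)\leq\frac{a(n_k)}{n_k+a(n_k)+1} < \frac{a(n_k)}{n_k} < \frac1{2^k}$

and

$\displaystyle \mu(B_k)=(n_k-a(n_k))\mu(D_k) > \frac9{10}\cdot\frac{n_k-a(n_k)}{n_k+a(n_k)+1}$

Since ${a(n_k)/n_k < 2^{-k}}$ and ${n_k > 2^k}$, the right-hand side tends to ${9/10}$ as ${k\rightarrow\infty}$, so it exceeds ${1/2}$ once ${k}$ is large enough.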

Now setting ${A=\bigcup_{k\geq2} A_k}$ we have ${\mu(A)<1/2}$, and defining ${B}$ by ${1_B=\limsup1_{B_k}}$ we have (using Fatou’s Lemma) that ${\mu(B)>1/2>0}$. With this setup, the result follows if we show that for each ${x\in B}$ we have

$\displaystyle \limsup_{N\rightarrow\infty}\frac1{a(N)}\sum_{n=N+1}^{N+a(N)}1_A(T^nx)=1\ \ \ \ \ (1)$

because if the averages were to converge almost everywhere then that ${\limsup}$ would be ${\lim}$ and by the Dominated convergence Theorem we would have

$\displaystyle \begin{array}{rcl} \displaystyle\frac12&>&\displaystyle\mu(A)=\int_X\frac1{a(N)}\sum_{n=N+1}^{N+a(N)}1_A(T^nx)d\mu= \lim_{N\rightarrow\infty}\int_X\frac1{a(N)}\sum_{n=N+1}^{N+a(N)}1_A(T^nx)d\mu\\&=&\displaystyle \int_X\lim_{N\rightarrow\infty}\frac1{a(N)}\sum_{n=N+1}^{N+a(N)}1_A(T^nx)d\mu\geq\int_B\lim_{N\rightarrow\infty}\frac1{a(N)}\sum_{n=N+1}^{N+a(N)}1_A(T^nx)d\mu=\mu(B)>\frac12 \end{array}$

which is a contradiction. To show that equation (1) holds for ${x\in B}$, note that if ${x\in B_k}$ then ${x\in T^iD_k}$ for some ${i\in\{1,...,n_k-a(n_k)\}}$. Then, setting ${N=n_k-i}$ we have that ${a(n_k)\leq N < n_k}$ and ${T^{N+1}x\in A, T^{N+2}x\in A,..., T^{N+a(n_k)}x\in A}$, since ${T^{N+j}x\in T^{n_k+j}D_k\subset A_k}$ for ${j=1,...,a(n_k)}$. In particular, because ${a(n)}$ is non-decreasing we have ${a(N)\leq a(n_k)}$, and hence

$\displaystyle \frac1{a(N)}\sum_{n=N+1}^{N+a(N)}1_A(T^nx)=1$

Now if ${x\in B}$, then ${x\in B_k}$ for arbitrarily large ${k}$. Since ${N\geq a(n_k)}$, ${a(n)}$ is unbounded and ${n_k\rightarrow\infty}$, we conclude that for each ${x\in B}$ we can take ${N}$ in the previous equation to be arbitrarily large. But that is exactly equation (1). $\Box$

As a final remark we note that more than a mere counter-example for a possible pointwise ergodic theorem for arbitrary Følner sequences, this Theorem shows that such a result fails in every (non-trivial) m.p.s.


### 6 Responses to The Ergodic Theorem

1. Robert says:

Hi, it’s curious that even though Birkhoff’s ergodic theorem is stated for the non-negative integers, the generalizations are for amenable groups, not semigroups. Do you know if semigroups provide extra difficulties or do these results also apply for semigroups?

• Joel Moreira says:

I know that von Neumann’s theorem holds in every discrete countable semigroup (and in more general semigroups but then one needs to worry about the topology).
Lindenstrauss’s sufficient condition on a Følner sequence to be “good” for pointwise convergence uses inverses, so it hints that his result may not hold for semigroups, but perhaps one could adapt his argument to hold on semigroups.
If you are asking about Akcoglu and del Junco’s example, then I think you will find difficulties with the Rokhlin Lemma, even in certain groups.