## Sated extensions

I recently learned of a promising technique in ergodic Ramsey theory, useful for establishing multiple recurrence and the convergence of nonconventional ergodic averages. The trick is to reduce the general statement to a special class of systems, called sated systems.

A common trick in ergodic theory, and indeed in other areas of mathematics, is to decompose a function (or the ${L^2}$ space) as a sum of a structured component and a random component. The idea is then to show that the averages of the orbit of the random component converge to ${0}$, while the orbit of the structured component can be analyzed using the structure it has. What the random and the structured components are will depend on the particular problem at hand, but intuitively, the “more random” we choose the random component, the easier it becomes to establish convergence to ${0}$, but the less structure we obtain for the structured component, and hence the harder it is to deal with. At this vague level, sated systems are those for which low structure implies higher structure (or, equivalently, low randomness implies higher randomness).

The key observation is that every measure preserving system has a sated extension. In other words, for any system there exists an extension (i.e. a larger system which contains the original system as a factor) which is sated.

These ideas were first developed by Tim Austin in order to give a proof of convergence for the nonconventional ergodic averages associated with commuting measure preserving transformations (the first proof of that result, due to Tao, was finitary in nature). Later Host managed to improve Austin’s proof in the sense that the sated extension used was more concretely constructed (to be precise, the extension in Host’s work is a finite power of the original system). This may be useful to gain some information about the limit. Eventually, Austin was able to derive the multidimensional Szemerédi theorem using sated extensions (together with an infinitary removal lemma) and then also the density Hales–Jewett theorem, where the word sated is used for the first time (that I am aware of; before this, sated extensions had been called pleasant, magical or isotropized).

This trick came to my attention because it was recently used (among other tools) by Austin to give a proof of a generalization of Szemerédi’s theorem to amenable groups. Around the same time, Qing Chu and Pavel Zorin-Kranich used this idea to extend a result on large recurrence times for large intersections of commuting actions of amenable groups (both preprints appeared on the arXiv in the same week).

In this post I will use the idea of passing to a sated extension to deduce the convergence of the nonconventional ergodic averages that arise when studying the ergodic Roth theorem. This convergence was first established by Furstenberg in the same paper where he gave an ergodic theoretic proof of Szemerédi’s theorem. The proof using sated extensions is a very particular case of the work of Austin. I will follow the approach in Austin’s thesis.

Theorem 1 Let ${(X,{\cal B},\mu,T)}$ be a measure preserving system and let ${f\in L^\infty(X)}$. Then the limit

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^NT^nfT^{2n}f$

exists in ${L^2}$.
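Although it plays no role in the proofs below, the statement can be sanity-checked numerically. The following sketch (my own toy example, not taken from any of the papers mentioned) approximates the averages for the irrational rotation ${Tx=x+\alpha\ (\mathrm{mod}\ 1)}$ with ${f(x)=\cos(2\pi x)}$, and checks that successive averages become close in ${L^2}$:

```python
import numpy as np

# Toy sanity check (my own example): for the irrational rotation
# T x = x + alpha (mod 1) and f(x) = cos(2 pi x), the averages
# A_N(x) = (1/N) sum_{n=1}^N f(x + n alpha) f(x + 2 n alpha)
# should converge in L^2.  We evaluate A_N on a grid of [0,1) and check
# that the empirical L^2 distance between successive averages is small.
alpha = np.sqrt(2) - 1                             # an irrational rotation number
x = np.linspace(0.0, 1.0, 2000, endpoint=False)    # grid discretizing [0,1)

def f(t):
    return np.cos(2 * np.pi * t)

def average(N):
    # A_N evaluated at every grid point (cos is 1-periodic, so no mod is needed)
    acc = np.zeros_like(x)
    for n in range(1, N + 1):
        acc += f(x + n * alpha) * f(x + 2 * n * alpha)
    return acc / N

# empirical L^2 distances between A_N and A_{2N} at two scales
d1 = np.sqrt(np.mean((average(200) - average(400)) ** 2))
d2 = np.sqrt(np.mean((average(2000) - average(4000)) ** 2))
```

For this particular system the limit can be computed by hand using ${\cos A\cos B=\frac12\big(\cos(A-B)+\cos(A+B)\big)}$: it is ${0}$, and indeed both distances come out tiny.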

— 1. Sated extensions —

Instead of defining a general notion of sated systems, I will restrict this post to the notions needed to prove Theorem 1. The purpose is to illustrate the main ideas behind how this technique can be applied.

Definition 2 (${{\mathbb Z}^2}$-system) By a ${{\mathbb Z}^2}$-system we mean a probability space ${(X,{\cal B},\mu)}$ together with two commuting measure preserving transformations ${T}$ and ${S}$. We represent this as ${{\bf X}=(X,{\cal B},\mu,T,S)}$.

Definition 3 (Factor and extension) Let ${{\bf X}=(X,{\cal B},\mu,T_1,T_2)}$ and ${{\bf Y}=(Y,{\cal C},\nu,S_1,S_2)}$ be two ${{\mathbb Z}^2}$-systems and let ${\pi:Y\rightarrow X}$ be a map such that ${\nu\big(\pi^{-1}(B)\big)=\mu(B)}$ for every ${B\in{\cal B}}$ and ${\pi\circ S_i=T_i\circ\pi}$ for each ${i\in\{1,2\}}$.

Then we say that ${{\bf X}}$ is a factor of ${{\bf Y}}$ or (equivalently) that ${{\bf Y}}$ is an extension of ${{\bf X}}$. To be precise, the factor (or the extension) is the system ${{\bf X}}$ (or the system ${{\bf Y}}$) together with the map ${\pi}$.

As is usual in the study of multiple recurrence and convergence of non-conventional averages, factors and extensions play an important role in this post. We can associate with a factor ${\pi:{\bf Y}\rightarrow{\bf X}}$, where ${{\bf X}=(X,{\cal B},\mu,T_1,T_2)}$ and ${{\bf Y}=(Y,{\cal C},\nu,S_1,S_2)}$, the system ${(Y,\pi^{-1}({\cal B}),\nu,S_1,S_2)}$. Thus we can think of factors of ${{\bf Y}}$ as ${\sigma}$-subalgebras of ${{\cal C}}$ that are globally invariant under ${S_1}$ and ${S_2}$.

Moreover, a factor ${\pi:{\bf Y}\rightarrow{\bf X}}$ induces an inclusion ${L^2(X)\subset L^2(Y)}$ by associating a function ${f\in L^2(X)}$ with ${f\circ\pi\in L^2(Y)}$. If ${f\in L^2(Y)}$ we can consider the orthogonal projection of ${f}$ onto the subspace ${L^2(X)}$, given by the conditional expectation operator ${\mathop{\mathbb E}[f\mid\pi^{-1}({\cal B})]}$, where ${{\cal B}}$ is the ${\sigma}$-algebra of the system ${{\bf X}}$. We may also represent this by the (intuitive) notation ${\mathop{\mathbb E}[f\mid{\bf X}]}$. The trivial fact that the projection of a vector already in the image subspace is the vector itself can be stated as

$\displaystyle \mathop{\mathbb E}[f\circ\pi\mid{\bf X}]=f\circ\pi\qquad\forall f\in L^2(X)$

We will frequently associate ${\sigma}$-algebras with the subspaces of ${L^2}$ they induce. Thus, for instance we can write ${f\in {\cal D}}$ to denote that ${f\in L^2({\cal D})}$, or ${f\perp{\cal D}}$ to mean that for every ${g\in L^2({\cal D})}$ we have ${\langle f,g\rangle=0}$.
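To make the projection picture concrete, here is a toy computation (an assumed finite example, not from the post's sources) where the conditional expectation onto a factor is just averaging over the fibers of ${\pi}$; it verifies that the operator is an orthogonal projection and that lifted functions are fixed, as in the identity above:

```python
import numpy as np

# Toy illustration (assumed finite example): Y = {0,...,5} with uniform
# measure and factor map pi(y) = y mod 3 onto X = {0,1,2}.  Conditional
# expectation onto pi^{-1}(B) replaces a function by its average on each
# fiber pi^{-1}({x}), and is an orthogonal projection of L^2(Y).
rng = np.random.default_rng(0)
pi = np.arange(6) % 3                      # the factor map

def cond_exp(g):
    # E[g | pi^{-1}(B)]: average g over each fiber of pi
    out = np.empty_like(g, dtype=float)
    for xval in range(3):
        fiber = (pi == xval)
        out[fiber] = g[fiber].mean()
    return out

g = rng.standard_normal(6)
h = rng.standard_normal(6)

# orthogonal projection: idempotent and self-adjoint for <u,v> = E[u v]
idempotent = np.allclose(cond_exp(cond_exp(g)), cond_exp(g))
self_adjoint = np.isclose(np.mean(cond_exp(g) * h), np.mean(g * cond_exp(h)))

# a lifted function f o pi is constant on fibers, hence fixed by the projection
f_on_X = np.array([1.0, -2.0, 0.5])        # a function on X
lifted_fixed = np.allclose(cond_exp(f_on_X[pi]), f_on_X[pi])
```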

If ${{\cal B}_1}$ and ${{\cal B}_2}$ are two ${\sigma}$-algebras in some space ${X}$, we denote by ${{\cal B}_1\vee{\cal B}_2}$ the ${\sigma}$-algebra generated by the union ${{\cal B}_1\cup{\cal B}_2}$.

Definition 4 We will denote by ${\mathfrak C}$ the class of ${{\mathbb Z}^2}$-systems ${{\bf X}=(X,\mathcal{B},\mu,T,S)}$ such that ${\mathcal B=\mathcal I_T \vee \mathcal I_{T=S}}$, where ${\mathcal I_T:=\{B\in\mathcal B:T^{-1}B=B\}}$ is the ${\sigma}$-algebra of ${T}$-invariant sets and ${\mathcal I_{T=S}:=\{B\in\mathcal B:T^{-1}B=S^{-1}B\}}$ is the ${\sigma}$-algebra of ${TS^{-1}}$-invariant sets.

Given a ${{\mathbb Z}^2}$-system ${{\bf X}=(X,{\cal B},\mu,T,S)}$, we denote by ${{\mathfrak C}{\bf X}}$ the ${\sigma}$-algebra ${{\mathfrak C}{\bf X}:=\mathcal I_T \vee \mathcal I_{T=S}}$. In other words ${{\mathfrak C}{\bf X}}$ is the largest ${\sigma}$-subalgebra ${{\cal C}}$ of ${{\cal B}}$ such that the system ${(X,{\cal C},\mu,T,S)}$ is in the class ${\mathfrak C}$.
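On a finite system these ${\sigma}$-algebras are just partitions, and ${{\mathfrak C}{\bf X}}$ can be computed as a common refinement. Here is a small sketch, with the assumed toy example ${X={\mathbb Z}_2\times{\mathbb Z}_3}$, ${T(a,b)=(a+1,b)}$ and ${S(a,b)=(a+1,b+1)}$:

```python
import itertools

# Finite sketch (assumed example): X = Z_2 x Z_3 with the commuting maps
# T(a,b) = (a+1, b) and S(a,b) = (a+1, b+1).  On a finite space a
# sigma-algebra is a partition: I_T is the partition of X into T-orbits,
# I_{T=S} is the partition into orbits of T S^{-1} (here (a,b) -> (a, b-1)),
# and the sigma-algebra CX = I_T v I_{T=S} is their common refinement.
X = list(itertools.product(range(2), range(3)))
T = lambda p: ((p[0] + 1) % 2, p[1])
TSinv = lambda p: (p[0], (p[1] - 1) % 3)   # the map T S^{-1}

def orbit_partition(phi):
    # partition of X into orbits of the permutation phi
    seen, parts = set(), []
    for p in X:
        if p in seen:
            continue
        orbit, q = set(), p
        while q not in orbit:
            orbit.add(q)
            q = phi(q)
        seen |= orbit
        parts.append(frozenset(orbit))
    return set(parts)

def join(P, Q):
    # atoms of P v Q are the nonempty intersections of atoms of P and of Q
    return {A & B for A in P for B in Q if A & B}

I_T = orbit_partition(T)          # three atoms Z_2 x {b}
I_TeqS = orbit_partition(TSinv)   # two atoms {a} x Z_3
CX = join(I_T, I_TeqS)            # here: all six singletons
```

In this example the join is the full ${\sigma}$-algebra (all singletons), so the system is itself in the class ${\mathfrak C}$.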

We can now define what a sated system for our situation will be.

Definition 5 (Sated system) A ${{\mathbb Z}^2}$-system ${{\bf X}=(X,{\cal B},\mu,T_1,T_2)}$ is sated for the class ${\mathfrak C}$ if for any extension ${\pi:{\bf Y}\rightarrow{\bf X}}$ (where ${{\bf Y}=(Y,{\cal C},\nu,S_1,S_2)}$) and any function ${f\in L^2(X,{\cal B})}$ we have

$\displaystyle \mathop{\mathbb E}[f\mid{\mathfrak C}{\bf X}]\circ\pi=\mathop{\mathbb E}[f\circ\pi\mid{\mathfrak C}{\bf Y}]$

A way to think about sated extensions is in terms of the ${L^2}$ spaces. As mentioned above, an extension ${\pi:(Y,{\cal C},\nu,S_1,S_2)\rightarrow(X,{\cal B},\mu,T_1,T_2)}$ induces the inclusion (which by abuse of language we denote by the same letter) ${\pi:L^2(X,{\cal B})\rightarrow L^2(Y,{\cal C})}$. It is clear that ${\pi\big(L^2(X,{\mathfrak C}{\bf X})\big)\subset L^2(Y,{\mathfrak C}{\bf Y})}$. Thus we have the trivial inclusion:

$\displaystyle \pi\big(L^2(X,{\mathfrak C}{\bf X})\big)\subset\pi\big(L^2(X,{\cal B})\big)\cap L^2(Y,{\mathfrak C}{\bf Y})$

The system ${{\bf X}}$ is sated for the class ${\mathfrak C}$ if and only if for every extension ${\pi:{\bf Y}\rightarrow{\bf X}}$ we also have the reversed inclusion:

$\displaystyle \pi\big(L^2(X,{\mathfrak C}{\bf X})\big)=\pi\big(L^2(X,{\cal B})\big)\cap L^2(Y,{\mathfrak C}{\bf Y})$

Going back to the philosophical discussion at the beginning of the post, if a function ${f}$ is in the space ${\pi\big(L^2(X,{\cal B})\big)\cap L^2(Y,{\mathfrak C}{\bf Y})}$ (which means that ${f}$ has a certain structure) and the system ${{\bf X}}$ is sated, we can deduce that ${f}$ actually belongs to the a priori smaller (and hence more structured) subspace ${\pi\big(L^2(X,{\mathfrak C}{\bf X})\big)}$.

Theorem 6 For every ${{\mathbb Z}^2}$-system ${{\bf X}}$ there exists an extension ${\pi:{\bf Y}\rightarrow{\bf X}}$ such that ${{\bf Y}}$ is sated for the class ${\mathfrak C}$.

— 2. Proof of Theorem 1 —

The idea of the proof is to pass to a sated extension where most of the work is done. This is the content of the following lemma:

Lemma 7 Let ${(X,\mathcal{B},\mu)}$ be a probability space and let ${T,S}$ be commuting measure preserving transformations. Assume that the ${{\mathbb Z}^2}$-system ${{\bf X}:=(X,\mathcal{B},\mu,T,S)}$ is sated for the class ${\mathfrak{C}}$. Then for any ${f_1,f_2\in L^\infty(X,\mathcal{B})}$ such that ${f_1\perp \mathfrak C{\bf X}}$ we have

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^NT^nf_1S^nf_2=0$

in the ${L^2}$ norm.
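The proof below rests on the van der Corput trick: if ${\frac1H\sum_{h=1}^H\limsup_N\big|\frac1N\sum_{n=1}^N\langle a_{n+h},a_n\rangle\big|\rightarrow0}$, then ${\frac1N\sum_{n=1}^Na_n\rightarrow0}$ in norm. Here is a scalar illustration (Hilbert space ${{\mathbb C}}$; my own example, not part of the argument) with the classical sequence ${a_n=e^{2\pi in^2\alpha}}$:

```python
import numpy as np

# Scalar illustration (my own example) of the van der Corput trick: if the
# correlation averages (1/N) sum_n <a_{n+h}, a_n> are Cesaro-small in h, then
# (1/N) sum_n a_n -> 0.  The classical sequence a_n = e^{2 pi i n^2 alpha}
# exhibits both phenomena, since <a_{n+h}, a_n> = e^{2 pi i (2nh + h^2) alpha}
# equidistributes in n for each fixed h.
alpha = np.sqrt(2) - 1
n = np.arange(1, 100001, dtype=np.float64)   # floats avoid integer overflow
a = np.exp(2j * np.pi * (n ** 2) * alpha)

avg = abs(a.mean())                          # |(1/N) sum_n a_n|, close to 0

def corr_avg(h):
    # |average over n of a_{n+h} * conj(a_n)| for the first N - h terms
    return abs(np.mean(a[h:] * np.conj(a[:-h])))

# the correlation averages are small for each small h, as the trick requires
corr = max(corr_avg(h) for h in range(1, 11))
```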

Proof: We will apply the van der Corput trick. Let ${a_n=T^nf_1S^nf_2\in L^2(X,\mathcal{B})}$. For each ${h\in{\mathbb N}}$ we have

$\displaystyle \langle a_{n+h},a_n\rangle=\int_XT^{n+h}f_1\cdot S^{n+h}f_2\cdot T^nf_1\cdot S^nf_2d\mu$

Observe that ${T^{n+h}f=T^nT^hf}$ for any ${f\in L^2(X)}$. Also, ${S(f\cdot f')=Sf\cdot Sf'}$ for any ${f,f'\in L^2(X)}$. Thus we can pull ${S^n}$ out of the four terms inside the integral to get

$\displaystyle \langle a_{n+h},a_n\rangle=\int_XS^n\Big(S^{-n}T^nT^hf_1\cdot S^hf_2\cdot S^{-n}T^nf_1\cdot f_2\Big)d\mu$

Since ${S}$ preserves the measure, ${\int_XSfd\mu=\int_Xfd\mu}$ for any ${f\in L^1(X,{\cal B})}$. Thus we get

$\displaystyle \begin{array}{rcl} \displaystyle\langle a_{n+h},a_n\rangle&=&\displaystyle\int_XS^{-n}T^nT^hf_1\cdot S^hf_2\cdot S^{-n}T^nf_1\cdot f_2d\mu\\&=&\displaystyle\int_X\left(TS^{-1}\right)^nT^hf_1\cdot S^hf_2\cdot\left(TS^{-1}\right)^nf_1\cdot f_2d\mu\\&=&\displaystyle\int_X\left(TS^{-1}\right)^n(T^hf_1\cdot f_1)\cdot\left(S^hf_2\cdot f_2\right)d\mu \end{array}$

Taking the Cesàro limit as ${N\rightarrow\infty}$ and applying the mean ergodic theorem to the measure preserving transformation ${TS^{-1}}$ we get

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\langle a_{n+h},a_n\rangle=\int_X\mathop{\mathbb E}\left[T^hf_1\cdot f_1\mid\mathcal I\right]\cdot\left(S^hf_2\cdot f_2\right)d\mu$

where ${{\cal I}:=\{B\in{\cal B}:(TS^{-1})^{-1}B=B\}}$ is the ${\sigma}$-algebra of the ${(TS^{-1})}$-invariant sets. Since the conditional expectation operator ${\mathop{\mathbb E}[\ \bullet\mid\mathcal I]:L^2(X,{\cal B})\rightarrow L^2(X,{\cal I})}$ is an orthogonal projection, we have that ${\langle \mathop{\mathbb E}[f\mid\mathcal I],f'\rangle=\langle \mathop{\mathbb E}[f\mid\mathcal I],\mathop{\mathbb E}[f'\mid\mathcal I]\rangle}$ for any ${f,f'\in L^2(X,{\cal B})}$. In particular we have

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\langle a_{n+h},a_n\rangle=\int_X\mathop{\mathbb E}\left[T^hf_1\cdot f_1\mid\mathcal I\right]\cdot\mathop{\mathbb E}\left[S^hf_2\cdot f_2\mid\mathcal I\right]d\mu\ \ \ \ \ (1)$

Now define the ${{\mathbb Z}^2}$-system ${{\bf Y}=(Y,{\cal C},\nu,\tilde T,\tilde S)}$ where the underlying space is ${Y=X\times X}$, the ${\sigma}$-algebra is ${{\cal C}={\cal B}\otimes{\cal B}}$, the measure ${\nu}$ is defined by

$\displaystyle \nu(A\times B)=\int_X\mathop{\mathbb E}[1_A\mid{\cal I}]\mathop{\mathbb E}[1_B\mid{\cal I}]d\mu\qquad\forall A,B\in{\cal B}$

(this defines a measure on ${(Y,{\cal C})}$ by the Hahn–Kolmogorov theorem) and the transformations are ${\tilde T(x,y)=(Tx,Sy)}$ and ${\tilde S(x,y)=(Sx,Sy)}$. We need to check that both ${\tilde T}$ and ${\tilde S}$ are measure preserving. Since ${{\cal C}={\cal B}\otimes{\cal B}}$, it suffices to check that ${\nu\big(\tilde T^{-1}(A\times B)\big)=\nu\big(\tilde S^{-1}(A\times B)\big)=\nu(A\times B)}$ for every ${A,B\in{\cal B}}$. Indeed, for each ${A,B\in{\cal B}}$ we have:

$\displaystyle \begin{array}{rcl} \displaystyle\nu\big(\tilde T^{-1}(A\times B)\big)&=&\displaystyle\nu\big(T^{-1}A\times S^{-1}B\big)=\int_X\mathop{\mathbb E}[T1_A\mid{\cal I}]\cdot\mathop{\mathbb E}[S1_B\mid{\cal I}]d\mu\\&=&\displaystyle\int_XT1_A\cdot\mathop{\mathbb E}[S1_B\mid{\cal I}]d\mu\\&=&\displaystyle\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\int_XT1_A\cdot\big(TS^{-1}\big)^n(S1_B)d\mu\\&=&\displaystyle\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\int_XT\Big(1_A\cdot\big(TS^{-1}\big)^{n-1}(1_B)\Big)d\mu\\&=&\displaystyle\lim_{N\rightarrow\infty}\frac1N\sum_{n=0}^{N-1}\int_X1_A\cdot\big(TS^{-1}\big)^n(1_B)d\mu=\int_X1_A\cdot\mathop{\mathbb E}[1_B\mid{\cal I}]d\mu\\&=&\displaystyle\nu(A\times B) \end{array}$

To prove that ${\tilde S}$ preserves ${\nu}$ we will use the observation that ${TS^{-1}f=f}$ for every function ${f\in L^2(X,{\cal I})}$. We have

$\displaystyle \begin{array}{rcl} \displaystyle\nu\big(\tilde S^{-1}(A\times B)\big)&=&\displaystyle\nu\big(S^{-1}A\times S^{-1}B\big)=\int_X\mathop{\mathbb E}[S1_A\mid{\cal I}]\cdot\mathop{\mathbb E}[S1_B\mid{\cal I}]d\mu\\&=&\displaystyle\int_XS1_A\cdot TS^{-1}\mathop{\mathbb E}[S1_B\mid{\cal I}]d\mu\\&=&\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\int_XS1_A\cdot\big(TS^{-1}\big)^n(S1_B)d\mu\\&=&\displaystyle\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\int_XS\Big(1_A\cdot\big(TS^{-1}\big)^n(1_B)\Big)d\mu\\&=&\displaystyle\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\int_X1_A\cdot\big(TS^{-1}\big)^n(1_B)d\mu=\int_X1_A\cdot\mathop{\mathbb E}[1_B\mid{\cal I}]d\mu\\&=&\displaystyle\nu(A\times B) \end{array}$
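The two verifications above can be replicated numerically on a small finite system (an assumed toy example, with ${X={\mathbb Z}_2\times{\mathbb Z}_3}$ and uniform ${\mu}$), by building ${\nu}$ on singletons from its defining formula and checking invariance directly:

```python
import itertools
import numpy as np

# Finite sanity check (assumed toy example) of the two computations above:
# X = Z_2 x Z_3 with uniform mu, T(a,b) = (a+1, b), S(a,b) = (a+1, b+1),
# and I the (T S^{-1})-invariant sets; here T S^{-1} is (a,b) -> (a, b-1),
# so the atoms of I are the two sets {a} x Z_3.  We build nu on singletons
# from nu(A x B) = int E[1_A | I] E[1_B | I] d mu and check invariance.
X = list(itertools.product(range(2), range(3)))
T = lambda p: ((p[0] + 1) % 2, p[1])
S = lambda p: ((p[0] + 1) % 2, (p[1] + 1) % 3)
atom = lambda p: p[0]                       # label of the I-atom containing p
mu = np.full(len(X), 1.0 / len(X))

def E_ind(p):
    # E[1_{p} | I] as a vector over X: 1/3 on the atom of p, 0 elsewhere
    return np.array([1.0 / 3.0 if atom(q) == atom(p) else 0.0 for q in X])

nu = {(p, q): float(np.sum(E_ind(p) * E_ind(q) * mu)) for p in X for q in X}

Ttilde = lambda y: (T(y[0]), S(y[1]))       # (x, y) -> (Tx, Sy)
Stilde = lambda y: (S(y[0]), S(y[1]))       # (x, y) -> (Sx, Sy)

total = sum(nu.values())                    # nu is a probability measure
# both maps are bijections of Y, so invariance reduces to nu(phi(y)) = nu(y)
T_preserves = all(abs(nu[Ttilde(y)] - nu[y]) < 1e-12 for y in nu)
S_preserves = all(abs(nu[Stilde(y)] - nu[y]) < 1e-12 for y in nu)
# first marginal of nu is mu (each point of X has mass 1/6)
marginal_is_mu = all(abs(sum(nu[(p, q)] for q in X) - 1.0 / 6.0) < 1e-12
                     for p in X)
```

The last check, that the first marginal of ${\nu}$ is ${\mu}$, anticipates the computation below showing that the coordinate projection is a factor map.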

This concludes the proof that ${(Y,{\cal C},\nu,\tilde T,\tilde S)}$ is a ${{\mathbb Z}^2}$-system. We can now rewrite equation (1) as

$\displaystyle \begin{array}{rcl} \displaystyle\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\langle a_{n+h},a_n\rangle&=&\displaystyle\int_Y(T^hf_1\cdot f_1)\otimes(S^hf_2\cdot f_2)d\nu\\&=&\displaystyle\int_Y(T^hf_1\otimes S^hf_2)\cdot(f_1\otimes f_2)d\nu\\&=&\displaystyle\int_Y\tilde T^h(f_1\otimes f_2)\cdot(f_1\otimes f_2)d\nu\end{array}$

Now taking the Cesàro limit as ${H\rightarrow\infty}$ we obtain

$\displaystyle \lim_{H\rightarrow\infty}\frac1H\sum_{h=1}^H\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\langle a_{n+h},a_n\rangle=\int_Y\mathop{\mathbb E}\big[f_1\otimes f_2\mid\tilde{\cal I}\big]\cdot(f_1\otimes f_2)d\nu$

where ${\tilde{\cal I}=\{C\in{\cal C}:\tilde T^{-1}C=C\}}$ is the ${\sigma}$-algebra of ${\tilde T}$-invariant sets. Now observe that ${f_1\otimes f_2=(1\otimes f_2)\cdot(f_1\otimes 1)}$, where ${1}$ denotes the constant function ${1}$ on ${X}$. Thus we can rewrite the previous equation as

$\displaystyle \lim_{H\rightarrow\infty}\frac1H\sum_{h=1}^H\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\langle a_{n+h},a_n\rangle=\int_Y\mathop{\mathbb E}\big[f_1\otimes f_2\mid\tilde{\cal I}\big]\cdot(1\otimes f_2)\cdot(f_1\otimes1)d\nu\ \ \ \ \ (2)$

We have ${\big(\tilde S^{-1}\tilde T\big)(1\otimes f_2)=(S^{-1}T1)\otimes(S^{-1}Sf_2)=1\otimes f_2}$. Thus ${1\otimes f_2}$ is invariant under ${\tilde S^{-1}\tilde T}$, i.e. ${1\otimes f_2\in{\cal I}_{\tilde T=\tilde S}}$.

Recall that ${\mathfrak C{\bf Y}}$ denotes the largest factor of ${{\bf Y}}$ in the class ${\mathfrak C}$. Since ${\mathop{\mathbb E}\big[f_1\otimes f_2\mid\tilde{\cal I}\big]}$ is ${\tilde T}$-invariant and ${1\otimes f_2}$ is ${\tilde S^{-1}\tilde T}$-invariant, both functions (and hence their product) are in ${\mathfrak C{\bf Y}}$. We deduce that the right hand side of equation (2) becomes

$\displaystyle \int_Y\mathop{\mathbb E}\big[f_1\otimes f_2\mid\tilde{\cal I}\big]\cdot(1\otimes f_2)\cdot\mathop{\mathbb E}[f_1\otimes1\mid\mathfrak C{\bf Y}]d\nu$

We will show that ${\mathop{\mathbb E}[f_1\otimes1\mid\mathfrak C{\bf Y}]=0}$, which will imply that the integral vanishes and hence by the van der Corput trick, we conclude that

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Na_n=\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^NT^nf_1S^nf_2=0$

in ${L^2}$ as desired.

To prove that ${\mathop{\mathbb E}[f_1\otimes1\mid\mathfrak C{\bf Y}]=0}$ we claim that ${\pi:Y\rightarrow X}$ defined by ${\pi(x,y)=x}$ induces a factor map ${\pi:{\bf Y}\rightarrow{\bf X}}$ of ${{\mathbb Z}^2}$-systems. Indeed, let ${B\in{\cal B}}$. Then

$\displaystyle \nu(\pi^{-1}B)=\nu(B\times X)=\int_X\mathop{\mathbb E}[1_B\mid{\cal I}]d\mu=\mu(B)$

and hence ${\pi}$ is measure preserving. Also, if ${(x,y)\in Y}$ we have ${\pi\big(\tilde T(x,y)\big)=\pi(Tx,Sy)=Tx=T\big(\pi(x,y)\big)}$ and ${\pi\big(\tilde S(x,y)\big)=\pi(Sx,Sy)=Sx=S\big(\pi(x,y)\big)}$, so ${\pi\circ\tilde T=T\circ\pi}$ and ${\pi\circ\tilde S=S\circ\pi}$, and ${\pi}$ is indeed a factor map.

Now is the point where we use the fact that ${{\bf X}}$ is sated for the class ${\mathfrak C}$. Observe that ${f_1\otimes1=f_1\circ\pi}$ and hence by satedness (and then the hypothesis on ${f_1}$) we conclude that

$\displaystyle \mathop{\mathbb E}[f_1\otimes1\mid\mathfrak C{\bf Y}]=\mathop{\mathbb E}[f_1\mid\mathfrak C{\bf X}]\circ\pi=0$

$\Box$

We can finally prove Theorem 1:

Proof: Let ${T}$ be a measure preserving transformation on the probability space ${(X,{\cal B},\mu)}$. Let ${{\bf X}}$ be the ${{\mathbb Z}^2}$-system defined by ${{\bf X}=(X,{\cal B},\mu,T,T^2)}$ and let ${\pi:{\bf Y}=(Y,{\cal C},\nu,S_1,S_2)\rightarrow{\bf X}}$ be an extension of this ${{\mathbb Z}^2}$-system. Let ${f\in L^\infty({\bf X})}$ be arbitrary. Observe that for every ${N\in{\mathbb N}}$ we have

$\displaystyle \frac1N\sum_{n=1}^NT^nf\cdot T^{2n}f=\mathop{\mathbb E}\left[\left.\frac1N\sum_{n=1}^NS_1^n(f\circ\pi)\cdot S_2^n(f\circ\pi)\ \right|\,{\bf X}\right]$

Since the conditional expectation operator is a contraction in ${L^2}$, the result will follow if we prove that for every ${g\in L^\infty({\bf Y})}$ the limit

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^NS_1^ng\cdot S_2^ng$

exists in ${L^2({\bf Y})}$.

By Theorem 6 we can take ${{\bf Y}}$ to be sated for the class ${{\mathfrak C}}$. Let ${g\in L^\infty({\bf Y})}$ and assume (without loss of generality) that ${\|g\|_{L^\infty}=1}$. Decompose ${g=g_1+g_2}$, where ${g_2\in{\mathfrak C}{\bf Y}}$ and ${g_1\perp{\mathfrak C}{\bf Y}}$. It follows from Lemma 7 that

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^NS_1^ng_1\cdot S_2^ng=0$

so it suffices to show that the limit

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^NS_1^ng_2\cdot S_2^ng$

exists. Since ${g_2\in{\mathfrak C}{\bf Y}={\cal I}_{S_1}\vee{\cal I}_{S_1=S_2}}$, and since linear combinations of products of functions measurable with respect to each ${\sigma}$-algebra are dense in ${L^2}$ of the join, for every ${\epsilon>0}$ there exist functions ${f_1,\dots,f_k\in L^\infty({\cal I}_{S_1})}$ and ${h_1,\dots,h_k\in L^\infty({\cal I}_{S_1=S_2})}$ such that

$\displaystyle \left\|g_2-\sum_{i=1}^kf_ih_i\right\|_{L^2}<\epsilon\ \ \ \ \ (3)$
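On a finite probability space this approximation step is exact: the atoms of ${{\cal B}_1\vee{\cal B}_2}$ are the intersections ${A\cap B}$ of atoms, and ${1_{A\cap B}=1_A\cdot1_B}$, so every function measurable with respect to the join is literally a finite sum ${\sum f_ih_i}$. A quick sketch with assumed toy partitions:

```python
import numpy as np

# On a finite space the approximation (3) is exact: the atoms of B_1 v B_2
# are the intersections A n B of atoms, and 1_{A n B} = 1_A * 1_B, so any
# function measurable in the join is a finite sum of products f_i * h_i.
# Assumed toy partitions of {0,...,5} in "general position":
pts = np.arange(6)
P1 = [np.isin(pts, [0, 1, 2]), np.isin(pts, [3, 4, 5])]        # atoms of B_1
P2 = [np.isin(pts, [0, 3]), np.isin(pts, [1, 4]), np.isin(pts, [2, 5])]

g = np.array([3.0, -1.0, 4.0, 1.0, -5.0, 9.0])  # measurable in the join

# take f = c_{AB} 1_A and h = 1_B, where c_{AB} is the (constant) value of
# g on the join atom A n B; summing over pairs of atoms recovers g exactly
terms = []
for A in P1:
    for B in P2:
        cap = A & B
        if cap.any():
            c = g[cap][0]
            terms.append((c * A.astype(float), B.astype(float)))

reconstruction = sum(f * h for f, h in terms)
```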

For each ${i\in\{1,\dots,k\}}$ we have ${S_1^n(f_i\cdot h_i)=S_1^nf_i\cdot S_1^nh_i=f_i\cdot S_2^nh_i}$, since ${S_1^nf_i=f_i}$ and ${S_1^nh_i=S_2^nh_i}$. Thus for every ${N\in{\mathbb N}}$ we have

$\displaystyle \frac1N\sum_{n=1}^NS_1^n(f_i\cdot h_i)\cdot S_2^ng=\frac1N\sum_{n=1}^Nf_i\cdot S_2^n(h_i\cdot g)=f_i\cdot\frac1N\sum_{n=1}^NS_2^n(h_i\cdot g)$

Observe that the last expression converges in ${L^2}$ as ${N\rightarrow\infty}$ by the mean ergodic theorem. Thus we have that

$\displaystyle \ell_i:=\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^NS_1^n(f_i\cdot h_i)\cdot S_2^ng\ \ \ \ \ (4)$

exists for each ${i}$. Finally, observe that by Hölder's inequality:

$\displaystyle \begin{array}{rcl} \displaystyle\left\|S_1^ng_2\cdot S_2^ng-\sum_{i=1}^kS_1^nf_ih_i\cdot S_2^ng\right\|_{L^2}&=&\displaystyle\left\|S_1^n\left(g_2-\sum_{i=1}^kf_ih_i\right)\cdot S_2^ng\right\|_{L^2}\\&=&\displaystyle\left\|\left(g_2-\sum_{i=1}^kf_ih_i\right)\cdot (S_1^{-1}S_2)^ng\right\|_{L^2}\\&\leq&\displaystyle\left\|g_2-\sum_{i=1}^kf_ih_i\right\|_{L^2}\cdot \left\|(S_1^{-1}S_2)^ng\right\|_{L^\infty}\\&<&\displaystyle\epsilon\|g\|_{L^\infty}=\epsilon\end{array}$

We can now finish the proof. Fix ${\epsilon>0}$ and let ${f_1,\dots,f_k,h_1,\dots,h_k}$ be as before and satisfying (3). Let ${\ell_i}$ be defined by (4) for each ${i\in\{1,\dots,k\}}$ and let ${N\in{\mathbb N}}$ be large enough so that

$\displaystyle \left\|\ell_i-\frac1N\sum_{n=1}^NS_1^n(f_i\cdot h_i)\cdot S_2^ng\right\|_{L^2}<\frac\epsilon k\qquad\forall i\in\{1,\dots,k\}$

Thus we have:

$\displaystyle \begin{array}{rcl} \displaystyle\left\|\frac1N\sum_{n=1}^NS_1^ng_2\cdot S_2^ng-\sum_{i=1}^k\ell_i\right\|_{L^2}&\leq&\displaystyle\left\|\frac1N\sum_{n=1}^NS_1^ng_2\cdot S_2^ng-\sum_{i=1}^k\frac1N\sum_{n=1}^NS_1^n(f_i\cdot h_i)\cdot S_2^ng\right\|_{L^2}+\epsilon\\&\leq&\displaystyle \frac1N\sum_{n=1}^N\left\|S_1^ng_2\cdot S_2^ng-\sum_{i=1}^kS_1^nf_ih_i\cdot S_2^ng\right\|_{L^2}+\epsilon\\&\leq&\displaystyle2\epsilon\end{array}$

Since ${\epsilon>0}$ was arbitrary, this shows that the sequence of averages ${\frac1N\sum_{n=1}^NS_1^ng_2\cdot S_2^ng}$ is Cauchy in ${L^2}$, and hence it converges.

$\Box$