## Szemerédi Theorem Part VI – Dichotomy between weak mixing and compact extension

This is the sixth and final post in a series about Szemerédi’s theorem. In this post I complete the proof of the Multiple Recurrence Theorem, which I showed in a previous post of this series to be equivalent to Szemerédi’s theorem. The last step missing (cf. third point of Theorem 7 in my previous post) is the following:

Proposition 1 Let ${(X,{\cal B},\mu,T)}$ be a measure preserving system and let ${{\mathcal A}\subset{\mathcal B}}$ be a factor. If the extension ${{\cal A}\subset{\cal B}}$ is not weak mixing, then there exists a non-trivial extension ${{\cal A}\subset{\cal D}\subset{\cal B}}$ of ${{\cal A}}$ which is compact.

The definitions are all given precisely in this earlier post from this series. Most of this post is adapted from Chapter 7.8 of the book by Einsiedler and Ward.

— 1. Some Lemmas —

First we will need a few lemmas:

Lemma 2 If an extension ${{\cal A}\subset{\cal B}}$ of factors of a measure preserving system ${(X,{\cal B},\mu,T)}$ is not weak mixing, then the relatively independent self joining ${(X\times X,{\cal B}\otimes{\cal B},\lambda,T\times T)}$ is not ergodic.

The converse of this lemma is also true; in other words, it provides an alternative description of weakly mixing extensions. We omit the proof of the converse because we do not need it. One good place to look up the proof of the converse direction is Einsiedler and Ward’s book mentioned above.

Proof: I will use some facts and definitions from this previous post. In particular recall that the measure ${\lambda}$ is defined by

$\displaystyle \int_{X\times X}f(x)g(y)d\lambda(x,y)=\int_X\mathop{\mathbb E}[f\mid{\cal A}]\mathop{\mathbb E}[g\mid{\cal A}]d\mu$

Now assume that ${(X\times X,{\cal B}\otimes{\cal B},\lambda,T\times T)}$ is an ergodic system and take ${f,g\in L^2({\cal B}\mid{\cal A})}$ such that ${\mathop{\mathbb E}[f\mid{\cal A}]=0}$. We need to show (using the notion of uniform Cesàro averages defined in a previous post) that

$\displaystyle UC-\lim|\langle T^nf,g\rangle_{({\cal B}\mid{\cal A})}|=0\qquad\text{ in }L^2({\cal A}) \ \ \ \ \ (1)$

By definition, we have that ${\langle T^nf,g\rangle_{({\cal B}\mid{\cal A})}=\mathop{\mathbb E}[T^nfg\mid{\cal A}]}$. Using the triangle inequality in ${L^2({\mathcal A})}$, (1) will follow from

$\displaystyle UC-\lim\big\|\mathop{\mathbb E}[T^nfg\mid{\mathcal A}]\big\|_{L^2}=0 \ \ \ \ \ (2)$

We can rewrite the norm of the conditional expectation in terms of the relatively independent self joining ${\lambda}$ as

$\displaystyle \big\|\mathop{\mathbb E}[T^nfg\mid{\mathcal A}]\big\|_{L^2}^2=\int_X\mathop{\mathbb E}[T^nfg\mid{\mathcal A}]^2d\mu=\int_{X\times X}(g\otimes g)(T\times T)^n(f\otimes f)d\lambda$
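To spell out the last equality (a short check, using only the identity that defines ${\lambda}$ and assuming ${f,g}$ are real-valued): writing ${u=T^nf\cdot g}$, the integrand ${(g\otimes g)(T\times T)^n(f\otimes f)}$ is the function ${(x,y)\mapsto u(x)u(y)}$, and therefore

$\displaystyle \int_{X\times X}(g\otimes g)(T\times T)^n(f\otimes f)d\lambda=\int_{X\times X}u(x)u(y)d\lambda(x,y)=\int_X\mathop{\mathbb E}[u\mid{\mathcal A}]^2d\mu$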

Taking uniform Cesàro limits and applying the ergodic theorem in the relatively independent self joining we conclude

$\displaystyle \begin{array}{rcl} \displaystyle UC-\lim\big\|\mathop{\mathbb E}[T^nfg\mid{\mathcal A}]\big\|_{L^2}^2&=&\displaystyle UC-\lim\int_{X\times X}(g\otimes g)(T\times T)^n(f\otimes f)d\lambda\\&=&\displaystyle \int_{X\times X}(g\otimes g)d\lambda\int_{X\times X}(f\otimes f)d\lambda\\&=&\displaystyle \int_{X\times X}(g\otimes g)d\lambda\int_X\mathop{\mathbb E}[f\mid{\cal A}]^2d\mu \\&=&0 \end{array}$

which finishes the proof. $\Box$

Although I managed to avoid doing it so far, it seems that the most convenient way to prove Proposition 1 is to appeal to the disintegration of measures relative to a factor (see also this post for a description of additional properties of disintegration of measures). The only alternative I know of is presented in this post of Tao.

Definition 3 (Disintegration of a measure) Using the same notation and conventions as in the previous posts in this series, given a factor ${{\mathcal A}\subset{\mathcal B}}$ we define, for almost every ${x\in X}$, the measure ${\mu_x:{\mathcal B}\rightarrow[0,1]}$ by ${\int_Xfd\mu_x=\mathop{\mathbb E}[f\mid{\mathcal A}](x)}$.
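As a sanity check on the definition (this example is only an illustration and is not used in the proof), suppose that ${{\mathcal A}}$ is generated by a finite partition ${X=P_1\cup\cdots\cup P_k}$ with every ${\mu(P_i)>0}$. For ${x\in P_i}$ the conditional expectation ${\mathop{\mathbb E}[f\mid{\mathcal A}](x)}$ is the average of ${f}$ over ${P_i}$, so ${\mu_x}$ is the normalized restriction of ${\mu}$ to ${P_i}$:

$\displaystyle \mu_x(B)=\frac{\mu(B\cap P_i)}{\mu(P_i)}\qquad\text{ for }x\in P_i\text{ and }B\in{\mathcal B}$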

Observe that if ${f\in L^2({\mathcal B}\mid{\mathcal A})}$, then the ${L^2}$ norm of ${f}$ with respect to each of the measures ${\mu_x}$ is (almost everywhere) uniformly bounded and, in particular, ${f}$ belongs to almost every space ${L^2(\mu_x)}$.
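Concretely, assuming (as in the earlier posts) that the conditional norm is ${\|f\|_{L^2({\mathcal B}\mid{\mathcal A})}=\mathop{\mathbb E}\big[|f|^2\mid{\mathcal A}\big]^{1/2}}$, applying Definition 3 to ${|f|^2}$ gives, for almost every ${x}$,

$\displaystyle \|f\|_{L^2(\mu_x)}^2=\int_X|f|^2d\mu_x=\mathop{\mathbb E}\big[|f|^2\mid{\mathcal A}\big](x)=\|f\|_{L^2({\mathcal B}\mid{\mathcal A})}^2(x)$

so the fiberwise norms are bounded by the ${L^\infty}$ norm of the function ${\|f\|_{L^2({\mathcal B}\mid{\mathcal A})}}$.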

The relatively independent self joining measure ${\lambda}$ can be decomposed as the integral ${\int_X\mu_x\times\mu_xd\mu(x)}$, in the sense that

$\displaystyle \int_{X\times X}F(z,y)d\lambda(z,y)=\int_X\left(\int_{X\times X}F(z,y)d\mu_x(z)d\mu_x(y)\right)d\mu(x)\ \ \ \ \ (3)$

for any ${F\in L^1(X\times X,\lambda)}$.
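For instance, for a product ${F(z,y)=f(z)g(y)}$ the inner integral in (3) factors, and Definition 3 turns it into a product of conditional expectations:

$\displaystyle \int_{X\times X}f(z)g(y)d\mu_x(z)d\mu_x(y)=\int_Xfd\mu_x\int_Xgd\mu_x=\mathop{\mathbb E}[f\mid{\mathcal A}](x)\mathop{\mathbb E}[g\mid{\mathcal A}](x)$

Integrating over ${x}$ with respect to ${\mu}$ recovers the identity that defines ${\lambda}$, so (3) is consistent with that definition on product functions.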

— 2. Finding one conditionally compact function —

In this section we show that there exists at least one function ${f\in L^2({\mathcal B}\mid{\mathcal A})}$ which is conditionally compact but is not in ${L^2({\mathcal A})}$ (observe that any function in ${L^2({\mathcal A})}$ is trivially conditionally compact).

Since we are assuming that the extension ${{\mathcal A}\subset{\mathcal B}}$ is not weak mixing, it follows from Lemma 2 that the relatively independent self joining ${(X\times X,{\mathcal B}\otimes{\mathcal B},\lambda,T\times T)}$ is not ergodic. Let ${H\in L^\infty({\mathcal B}\otimes{\mathcal B},\lambda)}$ be a ${T\times T}$-invariant function which is not a constant (${\lambda}$ almost everywhere).

Next, it follows from (3) that ${H\in L^\infty(\mu_x\times\mu_x)}$ for almost every ${x}$ and hence the operator ${\Phi_x:L^2(\mu_x)\rightarrow L^2(\mu_x)}$ given by

$\displaystyle (\Phi_x f)(z)=\int_XH(z,y)f(y)d\mu_x(y)$

is Hilbert-Schmidt and thus compact.
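The Hilbert-Schmidt property can be checked directly (a sketch, using only that ${\mu_x}$ is a probability measure and that ${H}$ is essentially bounded with respect to ${\mu_x\times\mu_x}$): the kernel of ${\Phi_x}$ is square integrable, since

$\displaystyle \|\Phi_x\|_{HS}^2=\int_{X\times X}|H(z,y)|^2d\mu_x(z)d\mu_x(y)\leq\|H\|_{L^\infty(\mu_x\times\mu_x)}^2<\infty$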

We can glue the ${\Phi_x}$ together to obtain an operator ${\Phi:L^2(\mathcal{B}\mid{\mathcal A})\rightarrow L^2({\mathcal B}\mid{\mathcal A})}$ defined (almost everywhere) by ${(\Phi f)(x)=(\Phi_xf)(x)}$. Since ${H}$ is invariant under ${T\times T}$, it follows that ${\Phi(Tf)=T(\Phi f)}$.
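In more detail, here is a formal computation of this commutation relation. It assumes that ${\int_Xg\circ T\,d\mu_x=\int_Xg\,d\mu_{Tx}}$ for almost every ${x}$, which is a restatement of ${\mathop{\mathbb E}[Tg\mid{\mathcal A}]=T\mathop{\mathbb E}[g\mid{\mathcal A}]}$, and it uses the invariance ${H(Tx,Ty)=H(x,y)}$ in the second step:

$\displaystyle \Phi(Tf)(x)=\int_XH(x,y)f(Ty)d\mu_x(y)=\int_XH(Tx,Ty)f(Ty)d\mu_x(y)=\int_XH(Tx,y)f(y)d\mu_{Tx}(y)=(\Phi f)(Tx)$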

Lemma 4 There exists some ${f\in L^2({\mathcal B}\mid{\mathcal A})}$ such that ${\Phi f}$ is bounded but does not belong to ${L^2({\mathcal A})}$.

Proof: One can write ${H}$ as a (possibly infinite) linear combination of functions of the form ${h_1(x)h_2(y)}$, so we will simply assume that ${H(x,y)=h_1(x)h_2(y)}$. We will first show that ${h_1\notin L^2({\mathcal A})}$. Indeed, if ${h_1\in L^2({\mathcal A})}$, then we claim that ${H(x,y)=H(y,y)}$ in ${L^2(\lambda)}$. To see this, observe that also ${\overline{h_1}\in L^2({\mathcal A})}$ and hence

$\displaystyle \begin{array}{rcl} \displaystyle\int_{X\times X}\big|H(x,y)-H(y,y)\big|^2d\lambda&=&\displaystyle\int_X |h_1|^2\mathop{\mathbb E}\big[|h_2|^2\mid{\mathcal A}\big]-h_1\mathop{\mathbb E}\big[\overline{h_1}|h_2|^2\mid{\mathcal A}\big]\\&&\displaystyle-\ \overline{h_1}\mathop{\mathbb E}\big[h_1|h_2|^2\mid{\mathcal A}\big]+\mathop{\mathbb E}\big[|h_1h_2|^2\mid{\mathcal A}\big]d\mu\\&=&0 \end{array}$

Recall that ${(X\times X,\lambda)}$ is an extension of ${(X,\mu)}$ through the projection in either coordinate. Therefore, the function ${y\mapsto H(y,y)}$ is invariant under ${T}$ and belongs to ${L^2(X,{\mathcal B},\mu)}$. Invoking the ergodicity of ${X}$ we deduce that ${H}$ would have to be constant, which is a contradiction. We conclude that ${h_1\notin L^2({\mathcal A})}$.

Since ${H}$ is not ${0}$ almost everywhere, the set ${P=\{y:h_2(y)\neq0\}}$ has positive measure. For any ${y\in P}$, let ${x\in X}$ be such that ${H(x,y)\neq0}$. Then also ${0\neq H(Tx,Ty)=h_1(Tx)h_2(Ty)}$, so ${Ty\in P}$. We conclude that ${P}$ is ${T}$-invariant, so again by ergodicity it follows that ${P}$ has full measure. Finally, choose ${f=1/h_2}$. We deduce that

$\displaystyle (\Phi f)(x)=h_1\mathop{\mathbb E}[fh_2\mid{\mathcal A}]=h_1\notin L^2({\mathcal A})$

As a last step, one can truncate ${f}$ to ensure that ${\Phi f}$ is bounded.

$\Box$

Next, we will show that if ${f}$ is the function given by Lemma 4, then ${\Phi f}$ is conditionally compact. Indeed ${f\in L^2({\mathcal B}\mid{\mathcal A})}$, which implies that ${B:=\big\|\|f\|_{L^2({\mathcal B}\mid{\mathcal A})}\big\|_{L^\infty(\mathcal A)}<\infty}$ or, equivalently, that ${\|f\|_{L^2(\mu_x)}\leq B}$ for almost every ${x}$. Moreover, ${\|Tf\|_{L^2(\mu_x)}=\|f\|_{L^2(\mu_{Tx})}}$, and we deduce that the set ${\{T^nf:n\in{\mathbb Z}\}}$ is a bounded subset of ${L^2(\mu_x)}$, for almost every ${x\in X}$. Since each ${\Phi_x:L^2(\mu_x)\rightarrow L^2(\mu_x)}$ is compact, we deduce that, for almost every ${x\in X}$, the orbit

$\displaystyle \{T^n(\Phi f):n\in{\mathbb Z}\}=\Phi\big(\{T^nf:n\in{\mathbb Z}\}\big)\subset L^2({\mathcal B},\mu_x)$

is pre-compact. However, the number of functions necessary to ${\epsilon}$-cover the orbit of ${\Phi f}$ in ${L^2(\mu_x)}$ may depend on ${x}$.

Define, for every ${\epsilon>0}$, the function ${M=M_\epsilon:X\rightarrow{\mathbb N}}$ such that ${\big\{\Phi f,T(\Phi f),\dots, T^{M(x)}(\Phi f)\big\}}$ is ${\epsilon}$ dense in the whole orbit with respect to the ${L^2({\mathcal B},\mu_x)}$ norm. In symbols, we have

${M(x):=}$

$\displaystyle \min\Big\{M\in{\mathbb N}:(\forall n\in{\mathbb Z})(\exists j\in[M])\text{ s.t. } \big\|T^n(\Phi f)-T^j(\Phi f)\big\|_{L^2({\mathcal B},\mu_x)}<\epsilon\Big\}$

It is easy to check that ${M}$ is measurable with respect to ${{\mathcal A}}$, and it follows from the compactness of each ${\Phi_x}$ that ${M}$ is almost everywhere finite. Let ${M_0\in{\mathbb N}}$ be such that the set ${A\in{\mathcal A}}$ defined by ${A:=\{x\in X:M(x)\leq M_0\}}$ has positive measure. We will show that the orbit of ${\Phi f}$ in ${L^2({\mathcal B},\mu_x)}$ can be ${\epsilon}$-covered by ${M_0}$ functions, for almost every ${x\in X}$. Indeed, for each ${j=0,\dots,M_0}$, define the function

$\displaystyle g_j(x)=\left\{\begin{array}{ccl}\displaystyle T^j(\Phi f)(x)&\text{ if } & x\in A \\ \displaystyle g_j(T^mx) & \text{ if } & T^mx\in A\text{ but }x,Tx,\dots,T^{m-1}x\notin A\end{array}\right.$

Observe that, by ergodicity, ${g_j}$ is defined almost everywhere. For every ${n\in{\mathbb Z}}$ and every ${x\in X}$ where the ${g_j}$ are defined we have

$\displaystyle \min_{j\in[M_0]}\big\|T^n(\Phi f)-g_j\big\|_{L^2({\cal B},\mu_x)}=\min_{j\in[M_0]}\big\|T^{n-m}(\Phi f)-g_j\big\|_{L^2({\cal B},\mu_{T^mx})}\leq\epsilon$

where ${m\in\{0,1,\dots\}}$ is the first hitting time of ${A}$, more precisely, ${m=\min\{i\geq0:T^ix\in A\}}$. We just proved that ${\Phi f}$ is indeed conditionally compact.

— 3. Finishing the proof —

In the previous section we showed that the subspace

$\displaystyle F=\{f\in L^\infty({\mathcal B}):f\text{ is conditionally compact}\}$

is not contained in ${L^2({\mathcal A})}$. We claim that ${F}$ is in fact closed under products. Indeed, let ${f,g\in F}$, let ${\epsilon>0}$ and choose ${f_1,\dots,f_r,g_1,\dots,g_r\in L^2({\mathcal B}\mid{\mathcal A})}$ such that for every ${n\in{\mathbb Z}}$ we have

$\displaystyle \Big\|\min_{1\leq t\leq r}\|T^nf-f_t\|_{L^2({\mathcal B}\mid{\mathcal A})}\Big\|_{L^\infty}<\frac\epsilon{\|g\|_{L^\infty}}\quad \text{ and }\quad \Big\|\min_{1\leq t\leq r}\|T^ng-g_t\|_{L^2({\mathcal B}\mid{\mathcal A})}\Big\|_{L^\infty}<\frac\epsilon{\|f\|_{L^\infty}}$

Observe that truncating each ${f_t}$ by ${\|f\|_{L^\infty}}$ does not alter the above inequality, so we will simply assume that ${\|f_t\|_{L^\infty}\leq\|f\|_{L^\infty}}$. Therefore for each ${n\in{\mathbb Z}}$, choosing ${t}$ and ${s}$ appropriately, we have

$\displaystyle \begin{array}{rcl} \displaystyle \big\|T^n(fg)-f_tg_s\big\|_{({\mathcal B}\mid{\mathcal A})}(x)&\leq&\displaystyle \big\|T^n(fg)-f_tT^ng\big\|_{({\mathcal B}\mid{\mathcal A})}(x)+\big\|f_tT^ng-f_tg_s\big\|_{({\mathcal B}\mid{\mathcal A})}(x)\\&\leq&\displaystyle \|T^ng\|_{L^\infty}\|T^nf-f_t\|_{L^2({\mathcal B}\mid{\mathcal A})}+\|f_t\|_{L^\infty}\|T^ng-g_s\|_{L^2({\mathcal B}\mid{\mathcal A})}\\&\leq&2\epsilon \end{array}$

We just showed that ${F}$ is an algebra. Next let

$\displaystyle {\mathcal D}:=\{D\in{\mathcal B}:1_D\in\bar F\}$

Since ${F}$ is an algebra and ${\bar F}$ is closed, the set ${{\mathcal D}}$ is a ${\sigma}$-algebra and ${F\cap L^2({\mathcal D})}$ is dense in ${L^2({\mathcal D})}$.

The last step is to prove that ${F\subset L^\infty({\mathcal D})}$. Let ${f\in F}$ and let ${D=\{x\in X:f(x) < a\}}$ for some ${a\in{\mathbb R}}$. By Weierstrass’ approximation theorem, the function ${1_{(-\infty,a)}\in L^2({\mathbb R},f_*\mu)}$ (where ${f_*\mu}$ is the pushforward measure of ${\mu}$ through ${f}$) can be approximated by a polynomial (even though ${1_{(-\infty,a)}}$ is not continuous, it has only one point of discontinuity, so it can be altered in a set of arbitrarily small measure to become continuous). This means that ${1_D}$ can be approximated by ${p\circ f}$ for the same polynomial ${p\in{\mathbb R}[x]}$. Since ${F}$ is an algebra, ${p\circ f\in F}$ and so ${1_D\in \bar F}$ and hence ${f\in L^\infty({\mathcal D})}$.
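The passage from ${{\mathbb R}}$ to ${X}$ in this argument is the change of variables formula for the pushforward measure. For any polynomial ${p}$, and with ${D=\{x\in X:f(x) < a\}}$, we have

$\displaystyle \|1_D-p\circ f\|_{L^2(\mu)}^2=\int_X\big|1_{(-\infty,a)}(f(x))-p(f(x))\big|^2d\mu(x)=\int_{\mathbb R}\big|1_{(-\infty,a)}-p\big|^2d(f_*\mu)$

so an approximation of ${1_{(-\infty,a)}}$ by ${p}$ in ${L^2({\mathbb R},f_*\mu)}$ is the same as an approximation of ${1_D}$ by ${p\circ f}$ in ${L^2(\mu)}$.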

Finally, note that ${F}$ is invariant under ${T}$, thus ${{\mathcal D}}$ is invariant under ${T}$ and therefore it is a compact extension of ${{\mathcal A}}$. This finishes the proof of Proposition 1, and hence of the Multiple Recurrence Theorem of Furstenberg, and hence of Szemerédi’s theorem.