## Proof of Roth’s Theorem using Ergodic Theory

In 1975 Szemerédi proved what is now known as Szemerédi’s theorem on arithmetic progressions, answering an old question by Erdös and Turán. Twenty years earlier Roth had proved the (much simpler) case of arithmetic progressions of length 3.

Theorem 1 (Roth) Let ${E\subset{\mathbb Z}}$ with positive upper density:

$\displaystyle \overline d(E):=\limsup_{n\rightarrow\infty}\frac{|E\cap\{-n,-n+1,...,n-1,n\}|}{2n+1}>0$

Then there exists an arithmetic progression of length ${3}$ in ${E}$.

In 1977 Furstenberg gave a new proof of Szemerédi’s theorem using ergodic theory. Thus he also found a new proof of Roth’s theorem. In this post I will present an ergodic theoretical proof of Roth’s theorem, which sheds some light on the ergodic theoretical proof of the full Szemerédi’s theorem. This proof does not quite follows Furstenberg’s original argument, and is more close to a second (somewhat different) ergodic theoretical proof by Furstenberg, Katznelson and Ornstein. Other sources where this proof is presented include the chapter 7 of the recent book of Einsiedler and Ward, the section 4 of this Bergelson’s survey and this exposition by Zhao.

In this post I will focus on the technical aspects of the proof and my goal is to write a short but complete proof, assuming only some basic knowledge on ergodic theory. For all the (vast and important) motivational aspects of the proof I recommend any of the above sources.

Also, to keep things short, I will not prove the so-called Koopman-von Neumann decomposition (Theorem 8 bellow) which is a fundamental piece of this proof, I plan to write a post soon about that (there are also other not so important ergodic theoretical results which I don’t prove here, such as the Ergodic Decomposition, Theorem 4 bellow or the mean ergodic theorem, I plan to also address this results in a future post)

— 1. Some reductions —

The first step is to transform Theorem 1 into a statement on ergodic theory. This was achieved by Furstenberg using his Correspondence Principle. On this post, by a measure preserving system I mean a quadruple ${(X,{\cal B},\mu,T)}$ where ${(X,{\cal B})}$ is a Borel space (i.e. ${X}$ is a Borel subset of a compact metric space and ${{\cal B}}$ is the restriction to ${X}$ of the Borel ${\sigma}$-algebra), ${\mu:{\cal B}\rightarrow[0,1]}$ is a probability measure and ${T:X\rightarrow X}$ is an invertible measure preserving transformation, i.e., for any ${A\in{\cal B}}$ we have ${\mu(TA)=\mu(A)}$ (usually one does not assume invertibility, but since the system that arises from the correspondence principle as stated is invertible I only need to deal with invertible maps).

Theorem 2 (Furstenberg’s Correspondence Principle) Let ${E\subset{\mathbb Z}}$ be a subset with positive upper density. Then there exists a measure preserving system ${(X,{\cal B},\mu,T)}$ and a set ${A\in{\cal B}}$ with ${\mu(A)=\bar d(E)}$ and such that for all ${n_1,...,n_k\in{\mathbb Z}}$ we have

$\displaystyle \mu(T^{-n_1}A\cap T^{-n_2}A\cap...\cap T^{-n_k}A)\leq\bar d((E-n_1)\cap(E-n_2)\cap...\cap(E-n_k))$

I presented a proof of this Theorem in my previous post in a somewhat more general way.

Using this Correspondence Principle, Theorem 1 becomes (I prove a stronger implication formally at the end of this section):

Theorem 3 Let ${(X,{\cal B},\mu,T)}$ be a measure preserving system and ${A\in{\cal B}}$ with ${\mu(A)>0}$. Then there exists ${n\in{\mathbb N}}$ such that ${\mu(A\cap T^{-n}A\cap T^{-2n}A)>0}$.

Another important reduction is to assume that the system is ergodic, this means roughly that there are no invariant subsystems. More precisely a measure preserving system ${(X,{\cal B},\mu,T)}$ is ergodic if any set ${A\in{\cal B}}$ which is invariant (i.e. ${T^{-1}A=A}$) we have either ${\mu(A)=0}$ or ${\mu(A)=1}$. To be able to reduce to the case of an ergodic system one can use the Ergodic Decomposition:

Theorem 4 (Ergodic Decomposition) Let ${(X,{\cal B},\mu,T)}$ be a measure preserving system. Then there is a Borel probability space ${(Y,{\cal B}_Y,\nu)}$ and a measurable map ${y\mapsto\mu_y}$ such that the quadruple ${(X,{\cal B},\mu_y,T)}$ forms an ergodic measure preserving system for ${\nu}$-a.e. ${y\in Y}$ and such that for any ${A\in{\cal B}}$ one has ${\mu(A)=\int_Y\mu_y(A)d\nu(y)}$.

I will not prove this Theorem here, as it is a basic tool in Ergodic Theory with a somewhat technical proof. For reference it is Theorem 6.2 on Einsiedler and Ward’s book.

Putting everything together one can now reduce Theorem 1 to the following:

Theorem 5 Let ${(X,{\cal B},\mu,T)}$ be an ergodic measure preserving system and let ${A\in{\cal B}}$ have positive measure. Then there exists ${n\in{\mathbb N}}$ such that ${\mu(A\cap T^{-n}A\cap T^{-2n}A)>0}$.

I now show how one can use the Correspondence Principle and the Ergodic Decomposition to deduce theorem 1 from Theorem 5:

Proof: (of Theorem 1, assuming Theorem 5) Let ${E\subset{\mathbb Z}}$ have positive upper density. Let ${(X,{\cal B},\mu,T)}$ be the measure preserving system one gets by applying the Correspondence Principle (Theorem 2) and let ${A\in{\cal B}}$ be the set. Now let ${(Y,{\cal B}_Y,\nu)}$ be the probability space one gets from the Ergodic Decomposition (Theorem 4).

Let ${Y'\subset Y}$ be the set ${\{y:\mu_y(A)>0\}}$. We have ${\nu(Y')>0}$. Applying Theorem 5 to the system ${(X,{\cal B},\mu_y,T)}$ (for some ${y\in Y'}$ that makes the system ergodic) and the set ${A}$ we get some ${n=n_y\in{\mathbb N}}$ such that ${\mu_y(A\cap T^{-n}A\cap T^{-2n}A)>0}$. Let ${m\in{\mathbb N}}$ be such that the set ${Y_m:=\{y\in Y':n_y=m\}}$ has ${\nu(Y_m)>0}$. Such ${m}$ exists because ${Y'}$ is the countable union of the sets ${Y_m}$ with ${m=1,2,...}$. Now let ${\epsilon>0}$ be such that the set ${Y_{m,\epsilon}=\{y\in Y_m:\mu_y(A\cap T^{-m}A\cap T^{-2m}A)>\epsilon\}}$ has positive ${\nu}$ measure. Again such ${\epsilon}$ exists because ${Y_m}$ is the countable union of the sets ${Y_{m,1/n}}$ with ${n=1,2,...}$. Let ${Y^*:=Y_{m,\epsilon}}$.

Now by the Ergodic Decomposition

$\displaystyle \begin{array}{rcl} \displaystyle\mu(A\cap T^{-m}A\cap T^{-2m}A)&=&\displaystyle\int_Y\mu_y(A\cap T^{-m}A\cap T^{-2m}A)d\nu(y)\\ &\geq&\displaystyle\int_{Y^*}\mu_y(A\cap T^{-m}A\cap T^{-2m}A)d\nu(y)\geq \epsilon\nu(Y^*)>0 \end{array}$

Finally by the Correspondence Principle we conclude that ${\bar d(E\cap(E-n)\cap(E-2n))>0}$ and in particular that intersection is non-empty. Let ${a\in E\cap(E-n)\cap(E-2n)}$, then ${\{a,a+n,a+2n\}}$ is a three term arithmetic progression on ${E}$.

$\Box$

— 2. Koopman-von Neumann decomposition —

To prove Theorem 5 a crucial step is to split the characteristic function ${1_A}$ of the set ${A}$ as the sum ${1_A=f+g}$ of two orthogonal functions from which we have some information. It turns out that the right splitting is given by the Koopman-von Neumann Decomposition. In this section I state without proof this decomposition, I plan to prove this in a future post.

In this section (short by the lack of proofs) I will assume that we are in the conditions of Theorem 5. Let ${H}$ be the Hilbert space ${H=L^2(X,\mu)}$ and let ${U:H\rightarrow H}$ be the unitary operator defined by ${(Uf)(x)=f(Tx)}$.

Definition 6 Let ${f\in H}$. We say that ${f}$ is a compact or almost periodic function if the orbit closure ${\overline{\{U^nf|n\in{\mathbb Z}\}}\subset H}$ is compact as a subset of ${H}$ with the strong topology. The set of all compact functions is denoted by ${H_c}$.

Definition 7 Let ${g\in H}$. We say that ${g}$ is a weak-mixing function if

$\displaystyle \lim_{n\rightarrow\infty}\frac1n\sum_{k=1}^n|\langle U^kg,g\rangle|=0$

The set of all weak-mixing functions is denoted by ${H_{wm}}$.

To briefly explain the names, it turns out that a measure preserving system is weak mixing if and only if all non-constant ${L^2}$ functions are weak mixing. Also, if all functions are compact then the system is isomorphic to a rotation on a compact group.

Theorem 8 (Koopman-von Neumann) The set ${H_c}$ is a closed subspace of ${H}$ and it’s orthogonal complement is ${H_{wm}}$.

We will also need another fact about compact functions.

Proposition 9 Let ${f}$ and ${g}$ be bounded functions in ${H_c}$. Then the product ${fg}$ is also a compact function.

This proposition expresses the fact that the set of compact functions form a factor, but we will not use that term in this post.

— 3. The van der Corput Trick and the weak-mixing component —

In the conditions of Theorem 5 let ${1_A}$ be the characteristic function of ${A}$. Write ${1_A=f+g}$ where ${f\in H_c}$ and ${g\in H_{wm}}$. In order to prove that there exists ${n}$ such that ${\mu(A\cap T^{-n}A\cap A^{-2n}A)>0}$ it suffices to prove that the Cesàro limit

$\displaystyle \liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\mu(A\cap T^{-n}A\cap T^{-2n}A) \ \ \ \ \ (1)$

is positive. Note that

$\displaystyle \mu(A\cap T^{-n}A\cap T^{-2n}A)=\int_X1_AU^n1_AU^{2n}1_Ad\mu \ \ \ \ \ (2)$

Replacing ${1_A}$ with ${f+g}$ one can express ${\mu(A\cap T^{-n}A\cap T^{-2n}A)}$ as a sum of ${8}$ terms. One of those terms is ${\int_XfU^nfU^{2n}fd\mu}$, all the others have at least one copy of ${g}$. I will show that terms with a copy of ${g}$ average down to ${0}$ using the fact that ${g\in H_{wm}}$, and then prove that the term ${\int_XfU^nfU^{2n}fd\mu}$ has a positive Cesàro liminf. For the first part I will need the van der Corput Trick:

Lemma 10 (van der Corput Trick) Let ${\{u_n\}}$ be a bounded sequence in some Hilbert space. Assume that

$\displaystyle \lim_{H\rightarrow\infty}\frac1H\sum_{h=1}^H\limsup_{N\rightarrow\infty}\left|\frac1N\sum_{n=1}^N\langle u_{n+h},u_n\rangle\right|=0$

Then ${\displaystyle\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu_n=0}$.

I have used this trick before in this blog, for instance here, but not in the language of Hilbert spaces. It turns out that the phenomenon (and the proof) is exactly the same.

Proof: Fix ${\epsilon>0}$ and assume without loss of generality that ${\|u_n\|\leq1}$ for all ${n}$. For each ${H}$, if ${N}$ is large enough we have

$\displaystyle \left|\frac1N\sum_{n=1}^Nu_n-\frac1H\sum_{h=1}^H\frac1N\sum_{n=1}^Nu_{n+h}\right|<\epsilon$

Now fix ${H}$ and ${N}$ very large, to be precise, using the hypothesis, choose ${H_0}$ such that for any ${H>H_0}$ one has ${\displaystyle\frac1H\sum_{h=1}^H\limsup_{N\rightarrow\infty}\left|\frac1N\sum_{n=1}^N\langle u_{n+h},u_n\rangle\right|<\epsilon}$ and make ${H>H_0/\epsilon}$. Then choose ${N>H/\epsilon}$ large enough so that for any ${H'\geq H_0}$, ${H' one has ${\displaystyle\frac1{H'}\sum_{h=1}^{H'}\left|\frac1N\sum_{n=1}^N\langle u_{n+h},u_n\rangle\right|<2\epsilon}$.

Note in particular that if ${1\leq h\leq H}$ then

$\displaystyle \left|\frac1N\sum_{n=1+h_2}^{N+h_2}\langle u_{n+h_1-h_2},u_n\rangle-\frac1N\sum_{n=1}^N\langle u_{n+h_1-h_2},u_n\rangle\right|<\frac{2H}N<2\epsilon$

Putting everything together we have

$\displaystyle \begin{array}{rcl} \displaystyle\left\|\frac1H\sum_{h=1}^H\frac1N\sum_{n=1}^Nu_{n+h}\right\|^2&\leq&\displaystyle \frac1N\sum_{n=1}^N\left\|\frac1H\sum_{h=1}^Hu_{n+h}\right\|^2\\&=&\displaystyle \frac1N\sum_{n=1}^N\frac1{H^2}\sum_{h_1=1}^H\sum_{h_2=1}^H\langle u_{n+h_1},u_{n+h_2}\rangle\\&=& \displaystyle\frac1N\sum_{n=1}^N\frac1{H^2}\sum_{h_1=1}^H\|u_{n+h_1}\|^2+ 2\frac1{H^2}\sum_{h_1=1}^H\sum_{h_2> h_1}\frac1N\sum_{n=1+h_1}^{N+h_1}\langle u_{n+h_2-h_1},u_n\rangle\\&\leq&\displaystyle \epsilon+ \frac2H\sum_{h_1=1}^{H-H_0}\frac1H\sum_{h_2> h_1}\left|\frac1N\sum_{n=1}^N\langle u_{n+h_1-h_2},u_n\rangle\right|+\frac{H_0}H\\&\leq&\epsilon+2(2\epsilon)+\epsilon=6\epsilon \end{array}$

Since ${\epsilon>0}$ was arbitrary we conclude that indeed ${\displaystyle\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu_n=0}$ as desired.

$\Box$

Now I’m ready to show that terms with the weak mixing component average down to ${0}$:

Lemma 11 Let ${g_1,g_2\in H}$, at least one of them is in ${H_{wm}}$. Then

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^NU^ng_1U^{2n}g_2=0$

Proof: Let ${u_n=U^ng_1U^{2n}g_2}$, I am going to use the van der Corput Trick (Lemma 10). We have

$\displaystyle \langle u_n,u_{n+h}\rangle=\int_XU^ng_1U^{2n}g_2U^{n+h}\overline{g_1}U^{2n+2h}\overline{g_2}d\mu= \int_X(g_1U^h\overline{g_1})U^n(g_2U^{2h}\overline{g_2})d\mu$

Since ${T}$ is ergodic, the only ${U}$-invariant vectors are the constants, so the Mean Ergodic Theorem says that

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^NU^n(g_2U^{2h}g_2)=\int_Xg_2U^{2h}\overline{g_2}d\mu=\langle g_2,U^{2h}g_2\rangle$

The convergence is in the strong ${H}$ topology, and hence also in the weak topology. Thus we get

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\langle u_n,u_{n+h}\rangle= \int_Xg_1U^h\overline{g_1}d\mu\langle g_2,U^{2h}g_2\rangle=\langle g_1,U^hg_1\rangle\langle g_2,U^{2h}g_2\rangle$

Since either ${g_1}$ or ${g_2}$ is assumed to be in ${H_{wm}}$ we conclude that

$\displaystyle \lim_{H\rightarrow\infty}\frac1H\sum_{h=1}^H\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\langle u_n,u_{n+h}\rangle=0$

and by the van der Corput Trick we get the desired conclusion.

$\Box$

This lemma can now be used to reduce the Cesàro limit in equation (1) to a Cesàro limit involving only compact functions:

Lemma 12 In the conditions of Theorem 5, write ${1_A=f+g}$ where ${f\in H_c}$ and ${g\in H_{wm}}$. Then

$\displaystyle \liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\mu(A\cap T^{-n}A\cap T^{-2n}A)=\liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\int_XfU^nfU^{2n}fd\mu$

Proof: Applying the Lemma 11 with ${(g_1,g_2)=(g,1_A)}$ gives

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N U^ngU^{2n}1_A=0$

Since convergence in the strong topology implies convergence in the weak topology and ${g=(1_A-f)}$, we have

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\int_X1_AU^n(1_A-f)U^{2n}1_Ad\mu=0$

Using the elementary fact that if ${a_n\rightarrow0}$ then ${\liminf (a_n+b_n)=\liminf b_n}$ one gets

$\begin{array}{rcl} \displaystyle \liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\mu(A\cap T^{-n}A\cap T^{-2n}A)&=&\displaystyle\liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\int_X1_AU^n1_AU^{2n}1_Ad\mu\\&=& \displaystyle\liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\int_X1_AU^nfU^{2n}1_Ad\mu \end{array}$

Applying again Lemma 11 with ${(g_1,g_2)=(f,g)}$ we get that

$\displaystyle \liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\mu(A\cap T^{-n}A\cap T^{-2n}A)= \liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\int_X1_AU^nfU^{2n}fd\mu$

By Proposition 9 the functions ${U^nfU^{2n}f}$ are compact for each ${n}$, so by the Koopman-von Neumann Decomposition (Theorem 8) they are always orthogonal to ${g}$. Therefore I conclude:

$\displaystyle \liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\mu(A\cap T^{-n}A\cap T^{-2n}A)= \liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N-1}\int_XfU^nfU^{2n}fd\mu$

$\Box$

— 4. The compact component —

All that remains to prove is the following Lemma:

Lemma 13 Let ${(X,{\cal B},\mu,T)}$ be an ergodic measure preserving system and let ${f\geq0}$ be a bounded compact function such that ${\int fd\mu>0}$. Then

$\displaystyle \liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\int_XfU^nfU^{2n}fd\mu>0$

Before being able to prove this Lemma I need a different characterization of compact functions

Lemma 14 Let ${f}$ be a compact function. Then for any ${\epsilon>0}$ the set ${\{n\in{\mathbb Z}:\|f-U^nf\|<\epsilon\}}$ is syndetic.

This reciprocate is also true but we don’t need it here.

Proof: Since ${f}$ is a compact function, the orbit closure ${\overline{\left\{U^nf:n\in{\mathbb Z}\right\}}}$ is a compact set. Thus there is some finite set ${F\subset{\mathbb Z}}$ such that the balls (in the ${L^2}$ norm) ${\{B(U^nf,\epsilon)\}_{n\in F}}$ cover the orbit closure. Let ${S=\{n\in{\mathbb Z}:\|f-U^nf\|<\epsilon\}}$ and fix an arbitrary ${m\in{\mathbb Z}}$. We have to show that we can write ${m=a+b}$ with ${a\in F}$ and ${b\in S}$.

By construction, ${U^mf}$ is in some ball ${B(U^af,\epsilon)}$ with ${a\in F}$. Since ${U}$ is unitary we get that ${\|U^mf-U^af\|=\|U^{m-a}f-f\|<\epsilon}$. We conclude that ${m-a\in S}$ as desired.

$\Box$

We can now prove Lemma 13

Proof: (of Lemma 13) Assume without loss of generality that ${f}$ is bounded by ${1}$ and fix ${\epsilon>0}$ small enough (to be determined later). Let ${n\in{\mathbb Z}}$ be such that ${\|f-U^nf\|<\epsilon}$. Then since ${U}$ is unitary, using the triangular inequality, we get ${\|U^{2n}f-f\|\leq\|U^{2n}f-U^nf\|+\|U^nf-f\|= 2\|U^nf-f\|<2\epsilon}$.

$\displaystyle \int_XfU^nfU^{2n}fd\mu=\int_Xf^3\mu+\int_Xf^2(U^{2n}f-f)d\mu+\int_Xf(U^n-f)U^{2n}fd\mu$

By Cauchy-Schwartz inequality the last two integrals are bounded by ${2\epsilon+\epsilon=3\epsilon}$, so we have

$\displaystyle \int_XfU^nfU^{2n}fd\mu>\int_Xf^3d\mu-3\epsilon$

Making ${\epsilon}$ small enough we get that ${\int_XfU^nfU^{2n}fd\mu}$ is larger than a positive constant. Since ${f}$ is almost periodic, by the previous Lemma we get that the set of ${n}$ for which this happens is syndetic, so averaging we conclude

$\displaystyle \liminf_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\int_XfU^nfU^{2n}fd\mu>0$

$\Box$

This finishes the proof of Theorem 5.

UPDATE: It turns out that it doesn’t quite finishes, the (short) proof is given at the end of my next post.