In 1975 Szemerédi proved what is now known as Szemerédi’s theorem on arithmetic progressions, answering an old question by Erdös and Turán. Twenty years earlier Roth had proved the (much simpler) case of arithmetic progressions of length 3.
Then there exists an arithmetic progression of length in .
In 1977 Furstenberg gave a new proof of Szemerédi’s theorem using ergodic theory. Thus he also found a new proof of Roth’s theorem. In this post I will present an ergodic theoretical proof of Roth’s theorem, which sheds some light on the ergodic theoretical proof of the full Szemerédi’s theorem. This proof does not quite follows Furstenberg’s original argument, and is more close to a second (somewhat different) ergodic theoretical proof by Furstenberg, Katznelson and Ornstein. Other sources where this proof is presented include the chapter 7 of the recent book of Einsiedler and Ward, the section 4 of this Bergelson’s survey and this exposition by Zhao.
In this post I will focus on the technical aspects of the proof and my goal is to write a short but complete proof, assuming only some basic knowledge on ergodic theory. For all the (vast and important) motivational aspects of the proof I recommend any of the above sources.
Also, to keep things short, I will not prove the so-called Koopman-von Neumann decomposition (Theorem 8 bellow) which is a fundamental piece of this proof, I plan to write a post soon about that (there are also other not so important ergodic theoretical results which I don’t prove here, such as the Ergodic Decomposition, Theorem 4 bellow or the mean ergodic theorem, I plan to also address this results in a future post)
— 1. Some reductions —
The first step is to transform Theorem 1 into a statement on ergodic theory. This was achieved by Furstenberg using his Correspondence Principle. On this post, by a measure preserving system I mean a quadruple where is a Borel space (i.e. is a Borel subset of a compact metric space and is the restriction to of the Borel -algebra), is a probability measure and is an invertible measure preserving transformation, i.e., for any we have (usually one does not assume invertibility, but since the system that arises from the correspondence principle as stated is invertible I only need to deal with invertible maps).
I presented a proof of this Theorem in my previous post in a somewhat more general way.
Using this Correspondence Principle, Theorem 1 becomes (I prove a stronger implication formally at the end of this section):
Theorem 3 Let be a measure preserving system and with . Then there exists such that .
Another important reduction is to assume that the system is ergodic, this means roughly that there are no invariant subsystems. More precisely a measure preserving system is ergodic if any set which is invariant (i.e. ) we have either or . To be able to reduce to the case of an ergodic system one can use the Ergodic Decomposition:
Theorem 4 (Ergodic Decomposition) Let be a measure preserving system. Then there is a Borel probability space and a measurable map such that the quadruple forms an ergodic measure preserving system for -a.e. and such that for any one has .
I will not prove this Theorem here, as it is a basic tool in Ergodic Theory with a somewhat technical proof. For reference it is Theorem 6.2 on Einsiedler and Ward’s book.
Putting everything together one can now reduce Theorem 1 to the following:
Proof: (of Theorem 1, assuming Theorem 5) Let have positive upper density. Let be the measure preserving system one gets by applying the Correspondence Principle (Theorem 2) and let be the set. Now let be the probability space one gets from the Ergodic Decomposition (Theorem 4).
Let be the set . We have . Applying Theorem 5 to the system (for some that makes the system ergodic) and the set we get some such that . Let be such that the set has . Such exists because is the countable union of the sets with . Now let be such that the set has positive measure. Again such exists because is the countable union of the sets with . Let .
Now by the Ergodic Decomposition
Finally by the Correspondence Principle we conclude that and in particular that intersection is non-empty. Let , then is a three term arithmetic progression on .
— 2. Koopman-von Neumann decomposition —
To prove Theorem 5 a crucial step is to split the characteristic function of the set as the sum of two orthogonal functions from which we have some information. It turns out that the right splitting is given by the Koopman-von Neumann Decomposition. In this section I state without proof this decomposition, I plan to prove this in a future post.
In this section (short by the lack of proofs) I will assume that we are in the conditions of Theorem 5. Let be the Hilbert space and let be the unitary operator defined by .
Definition 6 Let . We say that is a compact or almost periodic function if the orbit closure is compact as a subset of with the strong topology. The set of all compact functions is denoted by .
Definition 7 Let . We say that is a weak-mixing function if
The set of all weak-mixing functions is denoted by .
To briefly explain the names, it turns out that a measure preserving system is weak mixing if and only if all non-constant functions are weak mixing. Also, if all functions are compact then the system is isomorphic to a rotation on a compact group.
We will also need another fact about compact functions.
This proposition expresses the fact that the set of compact functions form a factor, but we will not use that term in this post.
— 3. The van der Corput Trick and the weak-mixing component —
In the conditions of Theorem 5 let be the characteristic function of . Write where and . In order to prove that there exists such that it suffices to prove that the Cesàro limit
Replacing with one can express as a sum of terms. One of those terms is , all the others have at least one copy of . I will show that terms with a copy of average down to using the fact that , and then prove that the term has a positive Cesàro liminf. For the first part I will need the van der Corput Trick:
I have used this trick before in this blog, for instance here, but not in the language of Hilbert spaces. It turns out that the phenomenon (and the proof) is exactly the same.
Proof: Fix and assume without loss of generality that for all . For each , if is large enough we have
Now fix and very large, to be precise, using the hypothesis, choose such that for any one has and make . Then choose large enough so that for any , one has .
Note in particular that if then
Putting everything together we have
Since was arbitrary we conclude that indeed as desired.
Now I’m ready to show that terms with the weak mixing component average down to :
Proof: Let , I am going to use the van der Corput Trick (Lemma 10). We have
Since is ergodic, the only -invariant vectors are the constants, so the Mean Ergodic Theorem says that
The convergence is in the strong topology, and hence also in the weak topology. Thus we get
Since either or is assumed to be in we conclude that
and by the van der Corput Trick we get the desired conclusion.
This lemma can now be used to reduce the Cesàro limit in equation (1) to a Cesàro limit involving only compact functions:
Lemma 12 In the conditions of Theorem 5, write where and . Then
Proof: Applying the Lemma 11 with gives
Since convergence in the strong topology implies convergence in the weak topology and , we have
Using the elementary fact that if then one gets
Applying again Lemma 11 with we get that
— 4. The compact component —
All that remains to prove is the following Lemma:
Before being able to prove this Lemma I need a different characterization of compact functions
Lemma 14 Let be a compact function. Then for any the set is syndetic.
This reciprocate is also true but we don’t need it here.
Proof: Since is a compact function, the orbit closure is a compact set. Thus there is some finite set such that the balls (in the norm) cover the orbit closure. Let and fix an arbitrary . We have to show that we can write with and .
By construction, is in some ball with . Since is unitary we get that . We conclude that as desired.
We can now prove Lemma 13
Proof: (of Lemma 13) Assume without loss of generality that is bounded by and fix small enough (to be determined later). Let be such that . Then since is unitary, using the triangular inequality, we get .
By Cauchy-Schwartz inequality the last two integrals are bounded by , so we have
Making small enough we get that is larger than a positive constant. Since is almost periodic, by the previous Lemma we get that the set of for which this happens is syndetic, so averaging we conclude
This finishes the proof of Theorem 5.
UPDATE: It turns out that it doesn’t quite finishes, the (short) proof is given at the end of my next post.