Szemerédi’s Theorem Part II – Overview of the proof

This is the second in a series of {6} posts about Szemerédi’s theorem. In the first post I presented the first step in the proof of Szemerédi’ theorem, namely applying the correspondence principle of Furstenberg to transform the problem into one of ergodic theory. More precisely, I showed that Szemerédi’s theorem is equivalent to the following:

Theorem 1 Let {(X,{\cal B},\mu, T)} be a measure preserving system, let {f\in L^\infty({\cal B})} be such that {f\geq0} and {\int_Xfd\mu>0} and let {k\in{\mathbb N}}. Then there exists {n\in{\mathbb N}} such that

\displaystyle \int_Xf(x)f(T^nx)f(T^{2n}x)\cdots f(T^{kn}x)d\mu(x)>0

It turns out that much more is given by Furstenberg’s proof. Indeed he showed that

Theorem 2 Let {(X,{\cal B},\mu, T)} be a measure preserving system, let {f\in L^\infty({\cal B})} be such that {f\geq0} and {\int_Xfd\mu>0} and let {k\in{\mathbb N}}. Then

\displaystyle \liminf_{N\rightarrow\infty}\inf_{M\in{\mathbb N}}\frac1N\sum_{n=M}^{M+N}\int_Xf(x)f(T^nx)f(T^{2n}x)\cdots f(T^{kn}x)d\mu(x)>0

Theorem 2 implies not only Szemerédi’s theorem, it gives some information about the common difference of the arithmetic progression:

Corollary 3 Let {E\subset{\mathbb N}} be such that {d^*}{(E)>0} and let {k\in{\mathbb N}}. Then the set

\displaystyle R:=\big\{s\in{\mathbb N}:\{a,a+s,a+2s,\dots,a+ks\}\subset E\text{ for some }a\in{\mathbb N}\big\}

is syndetic (this means that it has bounded gaps).

The proof of this corollary follows the same lines as the deduction of Szemerédi’s theorem from Theorem 1 in the previous post.


Let {E\subset{\mathbb N}} have positive Banach upper density and let {k\in{\mathbb N}}. We apply Furstenberg’s correspondence principle to find a m.p.s. {(X,{\cal B},\mu,T)} and a set {A\in{\cal B}} such that {\mu(A)=d^*(E)} and

\displaystyle  d^*(E\cap E-n\cap E-2n\cap\cdots\cap E-kn)\geq\mu(A\cap T^{-n}A\cap\cdots\cap T^{-kn}A) \ \ \ \ \ (1)

Apply Theorem 2 with {f=1_A}. We can find {N\in{\mathbb N}} such that

\displaystyle \forall M\in{\mathbb N}\qquad\frac1N\sum_{n=M}^{M+N}\mu(A\cap T^{-n}A\cap T^{-2n}A\cap\cdots\cap T^{-kn}A)>0

Applying (1) we deduce that

\displaystyle \forall M\in{\mathbb N}\qquad\frac1N\sum_{n=M}^{M+N}d^*(E\cap E-n\cap E-2n\cap\cdots\cap E-kn)>0

In particular, for each {M\in{\mathbb N}} there exists some {n\in[M,M+N]} such that the intersection {E\cap E-n\cap\cdots\cap E-kn} is nonempty. If {a} is in that intersection, then {\{a,a+n,\dots,a+kn\}\subset E} and so {n\in R}. We concluded that for every {M\in{\mathbb N}} the set {R} has nonempty intersection with the interval {[M,M+N]}. Thus {R} has bounded gaps and hence is syndetic. \Box

The proof of Theorem 2, which will occupy the rest of this series on Szemerédi’s theorem, is based on a structure theorem for measure preserving systems. This structure theorem was devised by Furstenberg in order to prove Theorem 2 and has been widely used since then (together with variants) to prove many other dynamical results which have direct combinatorial consequences.

In this post I will introduce some notions of ergodic theory that we will need, and then I will motivate and present the structure theorem (Theorem 8 below).

— 1. A note about uniform Cesàro limits —

Before jumping in to the main part of this post I want to make a remark on the averaging scheme used in this series on Szemerédi’s theorem. As mentioned above, Furstenberg’s multiple recurrence theorem (2 above) gives more that just Szemerédi’s theorem, namely we can obtain Corollary 3. However, to prove Theorem 2 we need to deal with averages along any increasing sequence of intervals (or, equivalently, take limiting averages with respect to an arbitrary F\o lner sequence on {{\mathbb N}}). While it is possible to just average over the interval {[1,N]} and still obtain Szemerédi’s theorem, I chose to use a more general averaging scheme to stress the advantage of the ergodic theoretical proof. Besides, the proof does not get any longer or harder, if we introduce the right notation:

Definition 4 (Uniform Cesàro averages) Let {H} be a vector space (usually some function space or {{\mathbb C}}) and let {(h_n)_{n\in{\mathbb N}}} be a sequence taking values in {H}. We use the notation

\displaystyle UC-\lim h_n:=\lim_{N-M\rightarrow\infty}\frac1{N-M}\sum_{n=M}^Nh_n=c

to say that for every {\epsilon>0} there exists some {K} such that if {M,N\in{\mathbb N}} satisfy {N-M>K}, then {\displaystyle\left|\frac1{N-M}\sum_{n=M}^Nh_n-c\right|<\epsilon}.

It is not hard to show (if one knows all the involving definitions) that

\displaystyle \lim_{N-M\rightarrow\infty}\frac1{N-M}\sum_{n=M}^Nh_n=c

is equivalent to

\displaystyle \forall\text{ F\o lner sequence }(F_N)_{N\in{\mathbb N}}\qquad\lim_{N\rightarrow\infty}\frac1{|F_N|}\sum_{n\in F_N}h_n=c

— 2. Factors and extensions —

In this section I will develop some general facts of ergodic theory that will play an important role in what follows. We identify two {\sigma}-subalgebras (say {{\cal A}} and {{\cal D}}) of {{\cal B}} if the spaces {L^1({\cal A})} and {L^1({\cal D})} of equivalence classes of functions are equal (in the sense that for every equivalence class {{\bf f}\in L^1({\cal A})} there exists exactly one equivalence class {{\bf g}\in L^1({\cal D})} such that {{\bf f}\cap{\bf g}\neq\emptyset}). In other words, {{\cal A}} and {{\cal D}} are equivalent if for every {A\in{\cal A}} there exists some {D\in{\cal D}} such that {A\bigtriangleup D=(A\cup D)\setminus(A\cap D)} has {0} measure, and vice versa. Thus every time we speak of {\sigma}-subalgebras of {{\cal B}} we are strictly speaking referring to equivalence classes of {\sigma}-subalgebras under this equivalent relation. Observe that if the {\sigma}-subalgebras {{\cal A}} and {{\cal D}} are equivalent, then the m.p.s. {(X,{\cal A},\mu,T)} and {(X,{\cal D},\mu,T)} are isomorphic.

Definition 5 (Factor) Let {(X,{\cal B},\mu, T)} be a measure preserving system. A {\sigma}-subalgebra {{\cal D}} of {{\cal B}} such that {T} is measurable with respect to {{\cal D}} (in other words, such that {{\cal D}} is {T}-invariant) is called a factor of {(X,{\cal B},\mu,T)}. More precisely {{\cal D}} is a factor if and only if for all {D\in{\cal D}} the pre-image {T^{-1}D} is also in {{\cal D}}.

There are several simple examples of factors. Of course {{\cal B}} is itself a factor, and the trivial {\sigma}-algebra {\{\emptyset,X\}} is also a factor. If {T} is not invertible then, for each {n\in{\mathbb N}}, the {\sigma}-algebra {\{T^{-n}B:B\in{\cal B}\}} is a nontrivial factor.

The example that best illustrates the idea behind the notion of a factor is the following: Let {(X,{\cal B},\mu,T)} and {(Y,{\cal C},\nu,S)} be m.p.s. and let {Z=X\times Y}, {{\cal D}={\cal B}\otimes{\cal C}} (this is the smallest {\sigma}-algebra on {Z} containing all the rectangles {B\times C} with {B\in{\cal B}} and {C\in{\cal C}}), let {\lambda=\mu\otimes\nu} and {R=T\times S}. Then {(Z,{\cal D},\lambda,R)} is a m.p.s., called the product of the systems {(X,{\cal B},\mu,T)} and {(Y,{\cal C},\nu,S)}. The {\sigma}-subalgebras {\tilde{\cal B}:=\{B\times Y:B\in{\cal B}\}\subset{\cal D}} and {\tilde{\cal C}:=\{X\times C:C\in{\cal C}\}\subset{\cal D}} are factors of {(Z,{\cal D},\lambda,R)}. Observe that the systems {(Z,\tilde{\cal B},\lambda,R)} and {(Z,\tilde{\cal C},\lambda,R)} are isomorphic, respectively, to {(X,{\cal B},\mu,T)} and {(Y,{\cal C},\nu,S)}.

Another important example is the case of a skew product. Instead of giving a general definition I will just consider a particular case: Let {X={\mathbb T}^2} be the two dimensional torus with the Borel {\sigma}-algebra and the Haar-Lebesgue measure. Let {\alpha\in{\mathbb T}} and {T(x,y)=(x+\alpha,y+x)}. Note that the {T} is not the product of two transformations on {{\mathbb T}}, so this system can not be decomposed as the product of two systems. Now let {{\cal C}=\{A\times{\mathbb T}:A\in Borel({\mathbb T})\}}. It is not hard to see that {{\cal C}} is a factor of {(X,{\cal B},\mu,T)}. Moreover, the system {(X,{\cal C},\mu,T)} is isomorphic to the rotation {Sx=x+\alpha} on the torus {{\mathbb T}} with the Borel {\sigma}-algebra and the Haar-Lebesgue measure. More generally, if {{\cal D}} is a factor of {(X,{\cal B},\mu,T)}, the system {(X,{\cal D},\mu,T)} is “simpler” than the original system. For instance if {{\cal D}=\{\emptyset,X\}} is the trivial {\sigma}-algebra, then {(X,{\cal D},\mu,T)} is isomorphic as a measure preserving system to the one point trivial system. The idea behind this is that in ergodic theory one only cares about measurable sets (the elements of the underlying {\sigma}-algebra) and hence in a smaller {\sigma}-algebra one has less sets to worry about.

Definition 6 (Sz factor) A factor {{\cal A}} of {(X,{\cal B},\mu,T)} is called Sz if {(X,{\cal A},\mu,T)} satisfies the conclusion of the Theorem 2.

For instance, the factor {\{\emptyset,X\}} is trivially Sz (the only measurable functions are the constants). Theorem 2 can be rephrased as saying that {{\cal B}} itself is a Sz factor. The idea to prove Theorem 2 is to find, for each Sz factor {{\cal A}} of (and strictly contained in) {{\cal B}}, a larger Sz factor {{\cal D}} containing {{\cal A}}. Thus one can show that no maximal Sz factor can exist except {{\cal B}} itself.

Definition 7 (Extension) Let {{\cal A}} and {{\cal D}} be factors of {(X,{\cal B},\mu,T)}. If {{\cal A}\subset{\cal D}}, then we say that {{\cal D}} is an extension of {{\cal A}}. We say that {{\cal D}} is a non-trivial extension of {{\cal A}} if {{\cal A}\neq{\cal D}\neq{\cal B}}.

Our task is to prove that any proper factor {{\cal A}\neq{\cal B}} which is Sz has a (non-trivial) extension which is also Sz. But how does one construct an extension of a given factor? The answer lies with almost periodic functions!

Let’s start with the trivial factor {{\cal A}:=\{\emptyset,X\}}, which is Sz. An extension of {{\cal A}} is simply any factor of {{\cal B}}. Consider the collection {AP} of all almost periodic functions, i.e., all functions {f\in L^\infty({\cal B}} such that the set {\{f\circ T^n:n\in{\mathbb N}\}} is precompact in {L^2({\cal B})}. It turns out that the family {\{D\in{\cal B}:1_D\in AP\}} forms a factor of {{\cal B}} (I presented a proof of this fact in a previous post). Moreover, because of the almost periodicity, it is not hard to show that this factor is Sz. But what if there are no almost periodic functions? Well then it turns out that the system is weak mixing, and then proving multiple recurrence for the whole system (i.e. showing that {{\cal B}} is Sz) is easy. This shows how one gets one extension of the trivial factor to be Sz. To get a Sz extension of a general factor, more technical definitions are needed. It turns out that one can still adapt the ideas of the previous paragraph with the trivial factor replaced with any factor {{\cal A}}. Thus one can construct an extension {{\cal D}} consisting of almost periodic functions relative to {{\cal A}} (the precise definitions will be given in the next post), and it is possible to leverage the special structure of this extension to prove that the Sz property lifts from {{\cal A}} to {{\cal D}}. Finally, if no such extension of {{\cal A}} exists, then {{\cal B}} is itself a weak mixing extension of {{\cal A}}, and again this property can be used to lift the Sz property to {{\cal B}}.

I will not define precisely either compact or weak mixing extensions until the next post. For now it suffices to know that an extension of m.p.s. can take two special forms: it can be a compact extension or a weak mixing extension (and of course there are many extensions which are neither compact nor weak mixing). The ideas of the previous paragraph can be summarized in the next theorem:

Theorem 8

  • Let {{\cal A}\subset{\cal D}} be an extension between factors of {{\cal B}}. If the extension is weak mixing and {{\cal A}} is a Sz factor, then also {{\cal D}} is a Sz factor.
  • Let {{\cal A}\subset{\cal D}} be an extension between factors of {{\cal B}}. If the extension is compact and {{\cal A}} is a Sz factor, then also {{\cal D}} is a Sz factor.
  • Let {{\cal A}} be a factors of {{\cal B}}. If the extension {{\cal A}\subset{\cal B}} is not weak mixing, then there exists a non-trivial extension {{\cal D}} of {{\cal A}} which is compact.

In the next post in this series I will give all the precise definitions and I will show how Theorem 8 implies Theorem 2. Theorem 8 will be proved in the three last post of this series, one point in each post.

This entry was posted in Combinatorics, Ergodic Theory and tagged , , , , , , , , . Bookmark the permalink.

4 Responses to Szemerédi’s Theorem Part II – Overview of the proof

  1. Pingback: Szemerédi’s Theorem Part III – Precise definitions | I Can't Believe It's Not Random!

  2. Pingback: Szemerédi’s Theorem Part IV – Weak mixing extensions | I Can't Believe It's Not Random!

  3. Pingback: Szemerédi’s Theorem Part V – Compact extensions | I Can't Believe It's Not Random!

  4. Pingback: Szemerédi Theorem Part VI – Dichotomy of extensions | I Can't Believe It's Not Random!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s