## Szemerédi’s Theorem Part I – Equivalent formulations

The theorem of van der Waerden on arithmetic progressions, whose precise statement and proof can be found in a previous post of mine, states that in a finite partition of the set of positive integers, one of the pieces contains arbitrarily long arithmetic progressions. This was one of the first results in what we call now Ramsey theory. According to the principles of Ramsey theory, the partition result of van der Waerden suggests that any “large” subset of the integers should contain arbitrarily long arithmetic progressions. Erdös and Turán conjectured, in 1936, that the right notion of largeness here is having positive upper density:

Definition 1 (Upper density) Let ${A\subset {\mathbb N}}$ be a subset of the positive integers. Its upper density is defined by

$\displaystyle \bar d(A):=\limsup_{N\rightarrow\infty}\frac{|A\cap\{1,2,\dots,N\}|}N$

A set ${A\subset {\mathbb N}}$ has positive upper density if ${\bar d(A)>0}$.

This turns out to be the right notion of largeness for this result (well, at least one good notion of largeness. A famous unsolved conjecture by Erdös claims that it suffices ${\sum_A\frac1a=\infty}$ for ${A}$ to have arbitrarily long arithmetic progressions), and that fact was proved by Szemerédi in 1975:

Theorem 2 (Szemerédi) Let ${E\subset {\mathbb N}}$ be such that ${\bar d(E)>0}$. Then ${E}$ contains arbitrarily long arithmetic progressions.

Szemerédi used a complicated combinatorial argument. Two years later Furstenberg provided a different proof of this result, using ergodic theory and a correspondence principle which allows one to derive conclusions about sets of integers by studying dynamics on certain measure spaces. It should also be mentioned that a third proof of the Szemerédi’s Theorem was discovered by Gowers in 2001, generalizing Roth’s harmonic analysis proof of the existence of progressions of length ${3}$ in a set of positive density, to progressions of arbitrary length and having the advantage of providing concrete bounds for the finitistic version. More recently a number of different proofs appeared. A good discussion on the different proofs and similarities between them can be found on chapters 10 and 11 of the Tao and Vu’s book.

This post is the first in a sequence of ${6}$ posts I am writing about Szemerédi’s theorem which will include the ergodic theoretical proof of Furstenberg. In fact I will not quite follow the original paper by Furstenberg, but instead I will use an approach more closely related to the one in his paper with Katznelson and Ornstein. This first post is a general introduction with some equivalent forms of Szemerédi’s theorem, including the ergodic theoretical version. The plan for the following posts is the following: In the second post I will present a high level draft of the proof, in the third post I will set up some machinery which will be used in the last three (and more technical) posts where the proof will be finished.

Some good sources for the ergodic approach to Szemerédi’s theorem are the chapter 4.1 of this survey by Bergelson and the chapter 7 of the book of Einsiedler and Ward.

— 1. Finitistic version —

An equivalent statement of Szmerédi’s theorem is the following, called a “finitistic” version. We use the following convention: for a positive integer ${n}$ we denote the set ${\{1,2,\dots,n\}}$ simply by ${[n]}$.

Theorem 3 Let ${\delta>0}$ and ${k\in{\mathbb N}}$. Then there exists a number ${N=N(\delta,k)}$ such that for every ${n\geq N}$ and every set ${A\subset[n]}$ with ${|A|>\delta n}$, there exists a ${k}$-term arithmetic progression in ${A}$.

In fact it was this finitistic form that was proved in Szemerédi’s original paper. To see why Theorem 3 implies Theorem 2, note simply that if ${\bar d(A)>\delta}$, then there exists an arbitrarily large positive integer ${n}$ such that ${|A\cap[n]|>\delta n}$ (this follows directly from the definition of upper density). On the other hand it takes a small trick to get the converse.

Proof that Theorem 2 implies Theorem 3: Assume that Theorem 3 is false, let ${\delta>0}$ and ${k\in{\mathbb N}}$ be such that no ${N(\delta,k)}$ exists. Thus for each ${n\in{\mathbb N}}$ there exists a set ${A_n\subset[n]}$ with ${|A|>\delta n}$ and such that ${A_n}$ has no ${k}$ terms in arithmetic progression. We can now form a set ${A}$ by gluing the sets ${A_n}$ in the following way:

• Let ${B_1=A_1}$.
• For ${m>1}$ let ${n=\max B_{m-1}}$ and let ${B_m=B_{m-1}\cup(A_n+2n)}$.
• Let ${\displaystyle A=\bigcup_{m=1}^\infty B_m}$

Note that ${A_n+2n\subset[3n]}$, and hence, letting ${n(m)=\max B_{m-1}}$ we have

$\displaystyle |B_m\cap[3n(m)]|\geq|A_{n(m)}+2n(m)|=|A_{n(m)}|\geq\delta n(m)$

Therefore we have

$\displaystyle \bar d(A)=\limsup_{n\rightarrow\infty}\frac{|A\cap[n]|}n\geq\limsup_{m\rightarrow\infty}\frac{|A\cap[3n(m)]|}{3n(m)}$

$\displaystyle \geq\limsup_{m\rightarrow\infty}\frac{|B_m\cap[3n(m)]|}{3n(m)}\geq\frac\delta3$

It follows from Theorem 2 that ${A}$ must contain a ${k}$-term arithmetic progression, and this implies that some ${B_m}$ must contain a ${k}$-term arithmetic progression (because arithmetic progressions are finite objects). Assume also that ${m}$ is the smallest such that ${B_m}$ contains an arithmetic progression. Then the progression cannot be contained in ${B_{m-1}}$. If some part of the progression is in ${B_{m-1}}$ and the rest is in ${A_{n(m)}+2n(m)}$, then because of the gap of length ${n}$ in between these two portions of ${B_m}$ we conclude that the gap between consecutive terms of the progression is at least ${n(m)}$. However this is impossible for ${k\geq3}$ since ${B_m\subset[3n(m)]}$. Finally we conclude that the progression must be entirely contained in ${A_{n(m)}+2n(m)}$, which is also impossible as this would imply that ${A_{n(m)}}$ contains a ${k}$-term arithmetic progression, contradicting the construction. $\Box$

The finitistic version of Szemerédi’s theorem implies a slightly stronger version of Theorem 2. We will need the following definition

Definition 4 (Banach upper density) Let ${A\subset{\mathbb N}}$. The Banach upper density is defined as

$\displaystyle d^*(A)=\limsup_{N\rightarrow\infty}\max_{M\in{\mathbb N}}\frac{|A\cap[M+1,M+N]|}N$

For instance, the set ${A=\bigcup[n^3,n^3+n]}$ has ${\bar d(A)=0}$ but ${d^*(A)=1}$. Also it is clear that ${d^*(A)\geq\bar d(A)}$ for any set ${A\subset{\mathbb N}}$. Thus the following version of Szemerédi’s theorem covers more sets than the formulation of Theorem 2:

Theorem 5 Let ${A\subset{\mathbb N}}$ be such that ${d^*(A)>0}$. Then ${A}$ contains arbitrarily long arithmetic progressions.

This formulation clearly implies Theorem 2. It is also not hard to see how it follows from Theorem 3.

Proof: Let ${A\subset{\mathbb N}}$ be such that ${d^*(A)+\delta>0}$. Let ${k\in{\mathbb N}}$ and let ${N=N(\delta/2,k)\in{\mathbb N}}$ be given by Theorem 3. Let ${n>N}$ be such that ${\max_{M\in{\mathbb N}}|A\cap[M+1,M+n]|>n\delta/2}$ and let ${M\in{\mathbb N}}$ be such that ${|A\cap[M+1,M+n]|>n\delta/2}$. Then ${|(A-M)\cap[n]|>n\delta/2}$ and since ${n>N(k,\delta/2)}$ we deduce that ${(A-M)\cap[n]}$ contains a ${k}$-term arithmetic progression. It follows that also ${A}$ contains such a progression. Since ${k}$ was arbitrary we conclude that ${A}$ contains arbitrarily long arithmetic progressions. $\Box$

— 2. Szemerédi’s theorem as multiple recurrence —

Let ${A\subset{\mathbb N}}$. To say that ${A}$ contains the ${k}$-term arithmetic progression ${\{a,a+s,\dots,a+(k-1)s\}\subset A}$ is the same as saying that ${a\in A\cap(A-s)\cap\cdots\cap(A-(k-1)s)}$. Thus, to say that ${A}$ contains arbitrarily long arithmetic progressions is equivalent to saying that

$\displaystyle A\cap(A-s)\cap(A-2s)\cap\cdots\cap(A-(k-1)s)\neq\emptyset\ \ \ \ \ (1)$

for some ${s\in{\mathbb N}}$. If one defines the map ${T:{\mathbb N}\rightarrow{\mathbb N}}$ by ${Tx=x+1}$, then one can rewrite (1) as

$\displaystyle A\cap T^{-s}A\cap T^{-2s}A\cap\cdots\cap T^{-(k-1)s}A\neq\emptyset$

Now this expression starts to look a lot like certain expressions one encounters in ergodic theory. In ergodic theory though one takes the measure of that intersection. Indeed Szemerédi’s theorem follows if one can show that for every set ${A\subset{\mathbb N}}$ with ${d^*(A)>0}$ and every ${k\in{\mathbb N}}$ there exists some ${s\in{\mathbb N}}$ such that

$\displaystyle d^*(A\cap T^{-s}A\cap T^{-2s}A\cap\cdots\cap T^{-(k-1)s}A)>0$

Again drawing the analogy with ergodic theory, to show that this is positive for some ${s\in{\mathbb N}}$, we can show that the average over all ${s\in{\mathbb N}}$ is positive. Here “average” is in the following sense:

$\displaystyle \limsup_{N\rightarrow\infty}\frac1N\sum_{s=1}^Nd^*(A\cap T^{-s}A\cap T^{-2s}A\cap\cdots\cap T^{-(k-1)s}A)>0 \ \ \ \ \ (2)$

From the above discussion it should be clear that Szemerédi’s theorem follows from the claim that if ${d^*(A)>0}$, then we have (2). Although the upper density ${\bar d}$, or even the upper Banach density ${d^*}$ are quite good notions of largeness, they are not countably additive (each singleton ${\{n\}}$ has ${0}$ density, but the countable union of all of them has density ${1}$), hence it is not a measure and many of the tools of measure theory can not be applied to the space ${({\mathbb N},d^*)}$ such as the Dominated Convergence Theorem. To deal with this problem, Furstenberg created a correspondence principle, which allows one to solve problems in ${{\mathbb N}}$ (or indeed quite more general settings) by looking at measure spaces. We will now formulate the version of Furstenberg’s correspondence principle which we will use. Recall that a probability measure preserving system is a quadruple ${(X,{\cal B},\mu,T)}$ where ${{\cal B}}$ is a ${\sigma}$-algebra over ${X}$, ${\mu}$ is a measure defined on ${{\cal B}}$ and ${T:X\rightarrow X}$ is a ${{\cal B}}$-measurable map which preserves ${\mu}$, in the sense that ${\mu(T^{-1}B)=\mu(B)}$ for every ${B\in{\cal B}}$.

Theorem 6 (Furstenberg’s correspondence principle) Let ${E\subset {\mathbb Z}}$. There exists a probability measure preserving system ${(X,{\cal B}, \mu,T)}$ and a set ${A\in {\cal B}}$ such that ${\mu(A)=d^*(E)}$ and for any ${n}$ and ${k}$,

$\displaystyle d^*(E\cap E-n\cap E-2n\cap...\cap E-kn)\geq \mu(A\cap T^{-n}A\cap...\cap T^{-nk}A) \ \ \ \ \ (3)$

As it becomes clear from the proof, one can take ${X}$ to be a compact metric space (with ${{\cal B}}$ the Borel ${\sigma}$-algebra) and ${T}$ an homeomorphism, but we will not need these facts. The proof of (a more general version of) this theorem can be found in my previous post.

Applying the correspondence principle, Szemerédi’s theorem reduces to:

Theorem 7 (Multiple recurrence theorem) Let ${(X,{\cal B},\mu,T)}$ be a probability measure preserving system. Then for all ${f\in L^\infty(\cal B)}$ with ${f\geq0}$ and ${\int_Xfd\mu>0}$, and all ${k\in {\mathbb N}}$ there exists some ${n\in{\mathbb N}}$ such that:

$\displaystyle \int_Xf(x)f(T^nx)...f(T^{nk}x)d\mu(x)>0$

One can easily verify that Theorems 6 and 7 together imply Theorem 2. Indeed given ${E}$ with positive density and ${k\in{\mathbb N}}$, using the correspondence principle we find some probability preserving system ${(X,{\cal B},\mu,T)}$ and some subset ${A\in {\cal B}}$ satisfying equation (3). Then using the multiple recurrence on that system with ${f=1_A}$ we find ${n\in{\mathbb N}}$ such that

$\displaystyle 0<\int_X1_A(x)1_A(T^nx)...1_A(T^{nk}x)d\mu(x)=\mu(A\cap T^{-n}A\cap\cdots\cap T^{-kn}A)$

Next we use equation (3) to get

$\displaystyle \bar d(E\cap E-n\cap E-2n\cap...\cap E-kn)\geq \mu(A\cap T^{-n}A\cap...\cap T^{-nk}A)>0$

In particular that intersection is non-empty. Let ${a\in E\cap (E-n)\cap (E-2n)\cap...\cap (E-kn)}$. We have then ${a,a+n,a+2n,...,a+kn\in E}$ and thus ${E}$ contains an arithmetic progression of the arbitrary length ${k+1}$.

The main effort of the following posts in this sequence will be to prove Theorem 7. By the discussion above, this will be a proof of Szemerédi’s theorem. I will finish this post going in the opposite direction, and showing how one can use the combinatorial version (Theorem 2) of Szemerédi’s theorem to deduce the dynamical formulation (Theorem 7).

Proof: Let ${(X,{\cal B},\mu,T)}$ be a probability measure preserving system, let ${f\in L^\infty(\cal B)}$ with ${f\geq0}$ and ${\int_Xfd\mu>0}$, and let ${k\in {\mathbb N}}$. Let

$\displaystyle B=\left\{x\in X:f(x)>\frac12\int_X fd\mu\right\}$

Clearly ${\mu(B)>0}$. For each ${n\in{\mathbb N}}$ defined the function ${f_n}$ which measures the proportion of the points in the ${n}$-th orbit of a point which are inside ${B}$:

$\displaystyle f_n(x)=\frac{|\{i\in[n]:T^ix\in B\}|}n=\frac1n\sum_{i=1}^n1_B(T^ix)=\frac1n\sum_{i=1}^n1_{T^{-i}B}(x)$

Integrating we get

$\displaystyle \int_Xf_nd\mu=\int_X\frac1n\sum_{i=1}^n1_{T^{-i}B}d\mu=\frac1n\sum_{i=1}^n\mu(T^{-i}B)=\mu(B)$

By Fatou’s lemma we obtain

$\displaystyle 0<\mu(B)=\liminf_{n\rightarrow\infty}\int_Xf_nd\mu\leq\int_X\liminf_{n\rightarrow\infty}f_nd\mu$

Let ${g=\liminf_{n\rightarrow\infty}f_n}$ and define the set ${C:=\{x\in X:g(x)>0\}}$, which must have positive measure. For each ${x\in C}$, let ${A(x)=\{i\in{\mathbb N}:T^ix\in B\}}$. Since ${x\in C}$ we have

$\displaystyle \begin{array}{rcl} 0&<&\displaystyle g(x)=\liminf_{n\rightarrow\infty}f_n(x)=\liminf_{n\rightarrow\infty}\frac{|\{i\in[n]:T^ix\in B\}|}n\\&\leq&\displaystyle \limsup_{n\rightarrow\infty}\frac{|\{i\in[n]:T^ix\in B\}|}n=\bar d\big(A(x)\big)\end{array}$

We can now apply Theorem 2 to find an arithmetic progression of length ${k}$ inside ${A(x)}$, say ${a(x)+js(x)\in A(x)}$ for all ${i=0,\dots,k}$. Now the pair ${(a(x),s(x))}$ takes values on the countable set ${{\mathbb N}^2}$. Therefore, we have a partition of ${C}$ into countably many pieces, depending on ${a(x)}$ and ${s(x)}$. Hence one of those pieces will still have positive measure.

There is a subtle detail here, regarding measurability of the pieces. However, note that for any fixed pair ${(a,s)\in{\mathbb N}^2}$, we can describe the set

$\displaystyle \big\{x\in C:\{a+js:j=0,\dots,k\}\subset A(x)\big\}=C\cap \bigcap_{j=0}^kT^{-js}\big(T^{-a} B)$

Thus by choosing, for each ${x\in C}$, the smallest pair ${(a,s)}$ (on some ordering of ${{\mathbb N}^2}$) which works for ${x}$ we can assure that the pieces are measurable.

Let ${D\subset C}$ be such that ${\mu(D)>0}$ and ${(a(x),s(x))}$ do not depend on ${x\in D}$. Let ${a=a(x)}$ and ${s=s(x)}$ for some (hence all) ${x\in D}$. Then we have that

$\displaystyle \begin{array}{rcl} 0&<&\displaystyle \mu(D)=\mu\left(C\cap T^{-a}B\cap T^{-a-s}B\cap T^{-a-2s}B\cap\cdots\cap T^{-a-ks}B\right)\\&\leq&\displaystyle \mu\left(T^{-a}\big(B\cap T^{-s}B\cap T^{-2s}B\cap\cdots\cap T^{-ks}B\big)\right)\\&=&\displaystyle \mu\left(B\cap T^{-s}B\cap T^{-2s}B\cap\cdots\cap T^{-ks}B\right)=\int_X1_B(x)\cdot1_B(T^sx)\cdots1_B(T^{ks}x)d\mu\end{array}$

Finally note that ${1_B\leq 2f/\int_Xfd\mu}$. We conclude that

$\displaystyle 0<\int_X1_B(x)\cdot1_B(T^sx)\cdots1_B(T^{ks}x)d\mu\leq\left(\frac2{\int_Xfd\mu}\right)^{k+1}\int_Xf(x)\cdot f(T^sx)\cdots f(T^{ks}x)d\mu$

$\Box$