A topological proof of Van der Waerden’s Theorem

1. Introduction

The van der Waerden’s theorem (vdW for short) was one of the first theorems to be proved in the branch of mathematics called Ramsey theory. I now state it in one of its many equivalent formulations:

Theorem 1 (van der Waerden) Let {r} be a positive integer. Consider a partition (or coloring) of the integers {{\mathbb Z}=C_1\cup C_2\cup...\cup C_r} in finitely many colors. Then there is some color {C_i} that contains arbitrarily long (but finite) arithmetic progressions.

An arithmetic progression is a set of the form {\{a+li, i=0,1,...,k\}}. It is easy to construct an example of a partition of {{\mathbb Z}} into two subsets (or colors) such that no one contains an infinite arithmetic progression. For instance if {\{x_n\}} is any increasing sequence such that {x_{n+1}-x_n} is also strictly increasing, than coloring the intervals {[\pm x_{2n};\pm x_{2n+1}]} with one color and the intervals {[\pm x_{2n-1};\pm x_{2n}]} with the other color, there is no monochromatic infinite arithmetic progression, although both colors have arbitrarily long arithmetic progressions.

Van der Waerden first proof of this theorem in 1927 used only combinatorial methods. Since then, other combinatorial proofs have been found. In this post I will prove vdW with methods from topological dynamics, to illustrate how one can use somewhat heavy machinery to prove combinatorial statements about integers. The approach here mainly follows the one in Furstenberg’s book Recurrence in Ergodic Theory and Combinatorial Number Theory

2. Compactness and vdW numbers

As said before, there are many equivalent formulations of vdW theorem. I present now a stronger looking version and prove the equivalence between this and the previous formulation. This section is independent of the rest of the post.

Theorem 2 (VdW Again) Let {r,k\in {\mathbb N}}. Then there is some {w=w(r,k)} such that if {n>w} and {[-n,n]} is colored with {r} colors then one of the colors contain an arithmetic progression of length {k}.

This is called the finitistic version of vdW. It is an easy exercise to deduce Theorem 1 from Theorem 2. We now present a standard compactness argument, explored in great detail by Terence Tao in his blog, to derive the other implication.

While solving mathematical olympiad problems in combinatorics I got used to see them solved by “considering” some set, usually to double count its cardinality or to create some bijection/injection. In the same spirit, in this case, thinking about the set of all colorings of {{\mathbb N}} is the main step to solve this question (this shouldn’t be seen as a magic step though. Since we want to prove that all colorings have some property it is natural that we consider the set of all colorings. However one may be tempted to try to think about van der Waerden theorem by fixing the coloring since that’s the way the theorem is formulated, so considering this set is a natural but non-trivial step). It is now important to look at this set, let’s call it {C}, from another angle. A coloring of {{\mathbb N}} can be viewed as a function that to each {n\in{\mathbb N}} assigns its color. Thus an element of {C} is a function from {{\mathbb N}} to {\{1,...,r\}}, or as a point in the space {\{1,...,r\}^{\mathbb N}}.

A second step, maybe not so natural but easier to come up with, is to endow {C} with a topology. After all a set with no structure isn’t that useful. We give {\{1,...,r\}} the discrete topology and in {C} we consider the product topology which makes {C} into a metrizable space. A key observation now is that, by Tychonoff’s theorem, {C} is a compact set.

After this set up let’s now prove Theorem 1 using Theorem 2: suppose Theorem 1 isn’t true. Then there is some sequence {\{n_j\}_j} going to infinity and such that there is some coloring {f_j} of {\{1,2,...,n_j\}} in {r} colors with no monochromatic arithmetic progression. Extend the colorings (functions) {f_j} to all {{\mathbb N}} arbitrarily. We have now a sequence {\{f_j\}_j} in our compact metric space {C}, so we can extract a convergent subsequence. Let {f} be the limit of that subsequence. Then, by Theorem 2 we have a monochromatic arithmetic progression of length {k}, say {P}, in {f}. On other words, {f|_P} is constant. We note that, in the topology on {C}, convergence is the pointwise convergence, and since {\{1,...,r\}} has the discrete topology this means that there is some arbitrarily large {j} such that {f_j|_P} is also constant. But when {P\subset \{1,...,N_j\}} (which happens for sufficiently large {j}) then {f_j|_{\{1,...,N_j\}}} has the same monochromatic arithmetic progression, contradicting the construction of {f_j}. This contradiction proves Theorem 1.

With this finitistic formulation, a question arises, namely how does the function {w} depends on {r} and {k}. The proof of vdW presented below gives no answer to this question but finding tight bounds for {w(r,k)} is famous open question. The best bounds known were found by Gowers in his proof of Szemerédi’s theorem:

\displaystyle W(r,k)\leq 2^{\displaystyle2^{r^{2^{2^{k+9}}}}}

This upper bounds are still far from the best know lower bounds.

3. Furstenberg-Weiss multiple recurrence theorem

In this section we will build the tools we need to prove vdW theorem. Let us first introduce some notation. A topological dynamical system is a pair {(X,T)} where {X} is a topological space and {T:X\rightarrow X} is a continuous functions. A good example is {T={\mathbb R}} and {T(x)=x+1} or any topological group with {T} being the multiplication by some fixed element. For a point in {X} we define its orbit as {O(x):=\{T^nx:n\in {\mathbb N}\}}. In our first example we have {O(1)={\mathbb N}}.

A point {x\in X} is said to be recurrent if {x\in \overline{O(x)}}, in other words if there is some sequence of positive integers {n_k\rightarrow \infty} such that {T^{n_k}x\rightarrow x}. If we have several continuous maps {T_1,T_2,...,T_k:X\rightarrow X} we say that {x} is multiply recurrent if there is some sequence {n_k\rightarrow \infty} such that for each {i=1,...,k} we have {T_i^{n_k}x\rightarrow x}. We will see that vdW is about existence of certain multiply recurrent point in a certain topological space. This topological statement, due to Furstenberg and Weiss, is a generalization of a theorem of Birkoff, curiously first proved about the same time van der Waerden proved his theorem on arithmetic progressions.

Theorem 3 (Furstenberg-Weiss Multiple  Recurrence Theorem) Let {X} be a compact metric space and let {T_1,...,T_k} be continuous pairwise commutative maps from {X} to itself. Then there is a multiply recurrent point.

The proof is somewhat lengthy and technical. I present it here in full detain primarily for my own benefit, so the reader is encourage to skip it on a first reading.

Proof: We will split the proof into several parts. First we prove that we can reduce the statement to the case of invertible systems, i.e. we can assume that each {T_i} is not only continuous but is actually a homeomorphism. We also reduce the theorem to the case of a minimal system. Then the argument follows by induction on {k}. The case {k=1} follows then easily, and for {k\geq 2} we reduce the statement to find a recurrent point in the diagonal on {X^k}. Using the induction hypothesis we get that the diagonal is a homogeneous and recurrent subset (this terms will be made precise later) and then we use several technical steps to eventually find a recurrent point.

  1. Reduction to an Invertible System.
    We will use the standard construction to replace {X} by {\Omega:=X^{{\mathbb Z}^k}}, the set of functions from {{\mathbb Z}^k} to {X} with the product topology. For each {i=1,...,k} we define \displaystyle S_i(\omega)(n_1,...,n_i,...,n_k)=\omega(n_1,...,n_i+1,...,n_k)Now let {\tilde X:=\{\omega\in \Omega: S_i\omega=T_i\circ \omega\ \forall 1\leq i\leq k\}}. To prove that this set is non-empty, we use the compactness of {\Omega} (given again by Tychonoff Theorem) to find a limit point of the sequence {\{\omega_n\}} defined by: \displaystyle \omega_n(n_1,...,n_k)=\left\{\begin{array}{ll}T_1^{n_1+n}...T_k^{n_k+n}x,&n_i+n\geq0\\ x,& \text{ otherwise}\end{array}\right.where {x\in X} is arbitrary. We have that {S_i\omega_n(n_1,...,n_k)=T_i\circ\omega_n(n_1,...,n_k)} if {n_1,...,n_k\geq n}. Thus a limit point of the sequence is in {\tilde X}.It is also clear that {\tilde X} is {S_i} in variant. We now verify that if {\omega_0} is a multiple recurrent point in {\tilde X} for all {S_i}, then {x=\omega_0(0,...,0)\in X} is multiple recurrent for all {T_i}. Let the sequence {\{n_j\}_j} be such that {S_i^{n_k}\omega_0\rightarrow \omega_0} and {U\subset X} an open neighborhood of {x}. Then {V:=\{\omega\in \tilde X:\omega(0,...,0)\in U\}} is a neighborhood of {\omega} in {\tilde X}, by the definition of product topology. Since {\omega_0} is multiple recurrent we will have for all {j} large enough and all {1\leq i\leq k} that {S_i^{n_j}\omega\in V}, so {T^{n_j}_ix=T_i^{n_j}(\omega(0,...,0))=S_i^{n_j}\omega(0,...,0)\in U} and so {x} is indeed multiple recurrent for all {T_i}.
  2. Reduction to a Minimal system.
    From now on we assume that all {T_i} are homeomorphisms of {X}. Now consider the family {A} of all closed invariant sets, i.e. {S\in A} if {S} is closed and {T_iS\subset S} for all {1\leq i\leq k}. Note that {A} can be partially ordered by inclusion and for a totally ordered family in {A}, the intersection is still in {A}. Applying Zorn’s lemma (ultimately this could be avoided, but to do so one needs to introduce a measure on {X} so I will use this method) we can find a minimal element {S\in A}. If we find a multiple recurrent point in {S} for all {T_i}, then clearly that point is also multiple recurrent on {X}. Thus we can assume from now on that {X} is minimal, i.e. it contains no closed invariant subsets.
  3. The Induction process.
    We use induction on {k}. For {k=1}, if {x\in X}, we have for its orbit closure, {\overline{O(x)}} is a closed {T}-invariant subset of {X}, so by minimality of {X} we have {\overline{O(x)}=X} and so {x} is recurrent for {T}.If {k\geq 2}, construct {\Delta\subset X^k} as the diagonal, i.e. {\Delta:\{(x,...,x)\in X^k: x\in X\}}. We note that {\Delta} is compact, since it is the diagonal of a Hausdorff space. Let {T:=T_1\times...\times T_k:X^k\rightarrow X^k}, and note that {T:\Delta\rightarrow \Delta}. Note that if {x_0=(x,...,x)\in \Delta} is a recurrent point for {T}, then {x\in X} is a multiple recurrent point for {T_1,...,T_k}.Define, for {i=1,...,k-1} the homeomorphism {R_i=T_iT_k^{-1}} and, using the induction hypothesis, let {y\in X} be a multiple recurrent point for all {R_i}, {1\leq i\leq k-1}. Also let {\{n_j\}_j} be a sequence such that {R_i^{n_j}y\rightarrow y} as {j\rightarrow \infty} for all {i=1,...,k-1}. Let {y_0=(y,...,y)\in \Delta} and {y_j=(T_k^{-n_j}y,...,T_k^{-n_j}y)} for {j\geq1}. Then for all {\delta>0}, if {j} is sufficiently large we have {d(T^{n_j}y_j,y_0)<\delta}. It is here that the commutative hypothesis on the {T_i} is used.
  4. {\Delta} is Homogeneous and Recurrent.
    A subset {A} of a compact metric space {X} is called homogeneous for a continuous function {T:X\rightarrow X} if it is the minimal invariant set of some group of transformations commuting with {T}; and it is called recurrent if {\forall z\in A, \epsilon>0\ \exists y\in A, n\in {\mathbb N}} such that {d(T^ny,z)<\epsilon}.{\Delta} is homogeneous in {X^k} with respect to {T} because it is a minimal invariant set for the continuous functions {\tilde{T_i}} defined by {\tilde{T_i}(x_1,...,x_k)=(T_ix_1,...,T_ix_k)}. Indeed {\Delta} is clearly {\tilde{T_i}} invariant and if {A\subset \Delta} is invariant for all {\tilde{T_i}}, by the minimality of {X}, we have {\Delta\subset A}, so {\Delta=A} and {\Delta} is minimal. Also since the {T_i} commute, also {\tilde{T_i}} commute with {T}.To prove that {\Delta} is recurrent, let {z_0=(z,...,z)\in \Delta} and {\epsilon>0}. Let {\{V_l\}_l} be a finite covering of {\Delta} by ball of diameter {\epsilon}. Let {G} be the group generated by {\tilde{T_i}}, so {G} acts on {X^k} and fixes {\Delta} but no closed proper subset of {\Delta}. Therefore there is a finite subset {G_0\subset G} such that {\{gV_l\}_{g\in G_0}} covers {\Delta} for each {l} (note that there are only finitely many such {l}. Therefore we can find {\delta>0} such that for {a,b\in \Delta}, {d(a,b)<\delta\Rightarrow d(ga,gb)<\epsilon} for all {g\in G_0}.We can now find {g\in G_0} such that {d(gy_0,z_0)<\epsilon}. Also for {j} large enough we have that {d(T^{n_j}y_j,y_0)<\delta}, so {d(T^{n_j}gy_j,gy_0)<\epsilon} (remember that {g} commutes with {T}). Putting both together we get {d(z_0,T^{n_j}gy_j)<2\epsilon}. From this we conclude that {\Delta} is recurrent.
  5. For each {\epsilon>0} there is some {w\in \Delta, m\in {\mathbb N}} such that {d(T^mw,w)<\epsilon}.
    Let {\epsilon_1:=\epsilon/2}. Let {z_0\in \Delta}. For each {l\in {\mathbb N}} find {z_l\in \Delta} and {n_l\in {\mathbb N}} such that {d(T^{n_l}z_l,z_{l-1})<\epsilon_l} and then {0<\epsilon_{l+1}\leq \epsilon_1} such that {d(a,z_l)<\epsilon_{l+1}\Rightarrow d(T^{n_l}a,z_{l-1})<\epsilon_l}. Such {z_l, n_l} can be found because {\Delta} is recurrent and the {\epsilon_l} exist because {T} is continuous.Let {l<j} be such that {d(z_l,z_j)<\epsilon_1} (they exist by compactness). Then clearly {d(z_j,z_j)<\epsilon_{j+1}} so {d(T^{n_j}z_j,z_{j-1})<\epsilon_j} and continuing with this we get {d(T^{n_{l+1}+...+n_{j-1}+n_j}z_j,z_l)<\epsilon_{l+1}\leq \epsilon_1}. Finally this implies that {d(T^{n_{l+1}+...+n_{j-1}+n_j}z_j,z_j)<\epsilon_1+\epsilon_1=\epsilon}.
  6. {\Delta} has a Recurrent Point.
    Let {F:\Delta\rightarrow {\mathbb R}^+} be defined by {F(x)=\displaystyle \inf_{n\geq1} d(T^nx,x)}. By the last point we have that {F} gets arbitrarily close to {0}. Also it is easy to see that {F} is upper semi-continuous, so there is a point of continuity (actually the points of discontinuity form a meager set which, by the Baire Category Theorem, can’t be the whole space).Let {x\in \Delta} be a point where {F} is continuous. If {F(x)=0} we are done (as {x} is our recurrent point in {\Delta}). Otherwise there is some neighborhood of {x}, say {U}, where {F(z)> F(x)/2} for all {z\in U}. But then there is some finite subset {G_0\subset G} such that {\Delta\subset\bigcup_{g\in G_0} g^{-1}U}. Let {\eta>0} be such that {d(a,b)<\eta\Rightarrow d(ga,gb)<F(x)/2}, for all {a,b\in \Delta} and {g\in G_0}.

    Let {w\in \Delta} be such that {F(w)<\eta}. Let {g\in G_0} be such that {w\in g^{-1}U}, so that {gw\in U}. Then for some {n} we have {d(T^nw,w)<\eta} and so {d(T^ngw,gw)<F(x)/2} which implies that {F(w)<F(x)/2}, but this contradicts the fact that {gw\in U}, and this contradiction means that actually {F(x)=0} and {x} is our recurrent point for {\Delta} which can be converted as referred to a multiple recurrent point for {T_1,...,T_k}. \Box

4. Deduction of Theorem 1 from Theorem 3

Now that we have the right tools from topological dynamics we have to reduce vdW theorem to a statement in that language. Let’s fix {r\in {\mathbb N}} the number of colors. Now consider {\Omega:=[r]^{\mathbb N}} the space of all functions from {{\mathbb N}} to {[r]:=\{1,2,...,r\}}, i.e. the space of all colorings of {{\mathbb N}} by {r} colors. In {\Omega} we can define a distance:

\displaystyle d(x,y)=(\min\{i:x(i)\neq y(i)\})^{-1}

It is easy to verify that {d} is indeed a metric on {\Omega}. To illustrate the behavior of this metric we note that for {x,y\in \Omega} we have {x(1)=y(1)\iff d(x,y)<1}. Lets now consider a ball {B(x,\epsilon)} in that metric:

\displaystyle y\in B(x,\epsilon)\iff \min\{i:x(i)\neq y(i)\}>\frac1\epsilon\iff \forall i<\frac1\epsilon, x(i)=y(i)

Therefore the topology in {\Omega} induced by {d} is the product topology arising from the discrete topology in {[r]}. We can apply Tychonoff Theorem to conclude that {\Omega} is a compact space.

We now have that two colorings are close to each other if they start the same way. However, arithmetic progressions are not about the beginning of coloring, but about repetition of the same color along some constant jumps or period. A crucial idea here is to introduce the shift map {T:\Omega\rightarrow \Omega} defined by {(Tx)(i)=x(i+1)}. Notice for instance that if {T^nx=x} for some {n}, then {x} is periodic, and we have actually infinite monochromatic arithmetic progressions. Even if we only know that {T^nx} and {x} are close, then they have the same beginning, and so {x(1)=(T^nx)(1)=x(n+1)}. Notice also that {T} is continuous, since if {d(x,y)<\frac1n} then the first {n} values of {x} and {y} coincide, so {d(Tx,Ty)<\frac1{n-1}}.

Now suppose a coloring {x\in \Omega} has a monochromatic arithmetic progression of length {k}, say {\{1+a+is, i=0,...,k-1\}} (the {1} is just there for notational convenience). Then {(T^ax)(1)=(T^{a+s}x)(1)=...=(T^{a+(k-1)s})(1)}, or in other words, the set {\{T^{a+is}x,i=0,...,k-1\}} has diameter less than {1}. This is actually equivalent to {\{1+a+is, i=0,...,k-1\}} be monochromatic, and in fact we have the following statement:

Theorem 4 (Topological vdW) Let {X} be a compact metric space and {T:X\rightarrow X} a continuous function. Then for each {k\in {\mathbb N}} and {\epsilon>0} there is some {x\in X} and {s\in {\mathbb N}} such that diam{\{x,T^sx,...,T^{(k-1)s}x\}<\epsilon}.

We first deduce Theorem 1 from Theorem 4:

Proof: } Let {x\in \Omega} and consider its orbit closure {X=\overline{\{T^nx,n\in {\mathbb N}_0\}}}. From Theorem 4 there is some {y\in X} and {s\in {\mathbb N}} such that diam{\{y,T^sy,...,T^{(k-1)s}y\}<1}, which is equivalent to {y(1)=(T^sy)(1)=...=(T^{(k-1)s}y)(1)}. Since {y\in X} we can find {a\in {\mathbb N}} such that {d(T^ax,y)<((k-1)s)^{-1}}. Then the first {(k-1)s} values of {y} are the same as those of {T^ax}, so also {(T^ax)(1)=(T^{a+s}x)(1)=...=(T^{a+(k-1)s})(1)} and as seen above this is equivalent to say that {x} has a monochromatic arithmetic progression. \Box

We have reduced vdW Theorem to a topological statement and we will now see how to prove Theorem 4 from Furstenberg-Weiss Theorem 3.

Proof: Let {T_i:=T^i} for {i=1,...,k}. This functions clearly commute. From Theorem 3 we can find a multiple recurrent point {y} for {T_1,...,T_k}. Let {s} be such that {d(T_i^sy,y)<\epsilon} for all {i=1,...,k}. Then {x:=T^sy} satisfies the conditions. \Box

This finishes the proof of van der Waerden theorem on arithmetic progressions.

Advertisements

About Joel Moreira

PhD Student at OSU in Mathematics. I'm portuguese.
This entry was posted in Ramsey Theory, Topological Dynamics. Bookmark the permalink.

13 Responses to A topological proof of Van der Waerden’s Theorem

  1. JCummings says:

    Hi Joel. My name is Jay Cummings, I am a math grad student at UCSD. A few friends and I maintain a math blog at http://whateversuitsyourboat.wordpress.com/. I’ve been writing on Ramsey theory and was wondering if it would be ok to post a link to this page as a nice alternative proof to this theorem. Is that ok?

    • Joel Moreira says:

      Dear JCummings

      Thank you for your interest on this proof. It’s definitely ok to link to this page. If you post a combinatorial proof of van der Waerden’s theorem in your blog I would also be happy to have a link to that post here.

    • JCummings says:

      Awesome, thanks man! Absolutely. I will hopefully post a combinatorial proof of it today. But possibly tomorrow.

  2. Pingback: RT Part 7; Van Der Waerden’s Theorem | whateversuitsyourboat

  3. Pingback: YAMB

  4. Pingback: Hales-Jewett Theorem | YAMB

  5. Pingback: Double van der Waerden | I Can't Believe It's Not Random!

  6. Pingback: Arithmetic progressions and the affine semigroup | I Can't Believe It's Not Random!

  7. Pingback: Szemerédi’s Theorem Part I – Equivalent formulations | I Can't Believe It's Not Random!

  8. Paolo Rampazzo says:

    Hi, I would like to ask if you could me send an email so that I can tell you what is my doubt, since it’s very hard to explain it in a few words here. Thanks !

  9. Pingback: Pomerance Theorem on colinear points in certain paths in a two dimensional lattice | I Can't Believe It's Not Random!

  10. Pingback: Szemerédi’s Theorem Part V – Compact extensions | I Can't Believe It's Not Random!

  11. Pingback: An arithmetic van der Corput trick and the polynomial van der Waerden theorem | I Can't Believe It's Not Random!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s