Affine Images of Infinite sets

— 1. Szemerédi’s theorem as affine images —

Szemerédi’s theorem is usually stated as “every set {A\subset{\mathbb N}} with positive upper density contains arbitrarily long arithmetic progressions”, but it can also be formulated without explicit mention of arithmetic progressions:
Theorem 1 (Szemerédi’s theorem, reformulated) Let {A\subset{\mathbb N}} have positive upper density, i.e.

\displaystyle \bar d(A):=\limsup_{N\rightarrow\infty}\frac{|A\cap\{1,\dots,N\}|}N>0.

Then for any finite set {F\subset{\mathbb N}} there is an affine transformation {\phi:x\mapsto ax+b} with coefficients {a,b\in{\mathbb N}} such that {\phi(F)\subset A}.

To see how this result implies Szemerédi’s theorem, note that for any affine transformation {\phi}, the set {\phi(\{1,2,\dots,k\})} is a {k}-term arithmetic progression.

Conversely, assuming Szemerédi’s theorem, given {A\subset{\mathbb N}} with {\bar d(A)>0} and a finite set {F\subset{\mathbb N}} let {k=\max F}, so that {F\subset\{1,\dots,k\}}. Then there is a arithmetic progression of length {k+1} contained in {A}, say {\{b,b+a,\dots,b+ka\}\subset A}. Letting {\phi(x)=ax+b} it follows that {\phi(i)\in A} for every {i\leq k} and in particular {\phi(F)\subset A}.

The advantage of formulating Szemerédi’s theorem in this form is that it readily allows for analogues in other settings. For instance, the multidimensional Szemerédi theorem of Furstenberg and Katznelson can be formulated as follows.

Theorem 2 (Multidimensional Szemerédi theorem, reformulated) Let {d\in{\mathbb N}} and let {A\subset{\mathbb N}^d} have positive upper density, i.e.

\displaystyle \bar d(A):=\limsup_{N\rightarrow\infty}\frac{|A\cap\{1,\dots,N\}^d|}{N^d}>0.

Then for any finite set {F\subset{\mathbb N}^d} there is an affine transformation {\phi:x\mapsto ax+b} with coefficients {a\in{\mathbb N}}, {b\in{\mathbb N}^d} such that {\phi(F)\subset A}.

Continue reading

Posted in Analysis, Combinatorics, Ramsey Theory | Tagged , , , | Leave a comment

Additive transversality of fractal sets in the reals and the integers

Daniel Glasscock, Florian Richter and I have just uploaded to the arXiv our new paper entitled “Additive transversality of fractal sets in the reals and the integers”. In this paper we introduce a new notion of fractal sets of natural numbers in analogy with {\times r}-invariant subsets of {{\mathbb T}={\mathbb R}/{\mathbb Z}}, and obtain the analogues of several results pertaining {\times r}-invariant subsets of {{\mathbb T}={\mathbb R}/{\mathbb Z}}.

The prototypical example of the kind of fractal subsets of {{\mathbb N}_0=\{0,1,2,\dots\}} that we deal with is the integer middle thirds Cantor set consisting of all {n\in{\mathbb N}_0} that when represented in base {3} use only the digits {0} and {2}. Symbolically

\displaystyle C_{\mathbb Z}:=\left\{\sum_{i=0}^na_i3^i:n\in{\mathbb N}_0, (a_0,\dots,a_n)\in\{0,2\}^{n+1}\right\}.

For comparison, the classical middle thirds Cantor set consists of all {x\in[0,1]} that when written in base {3} use only the digits {0} and {2}, and can be described symbolically as

\displaystyle C_{\mathbb R}:=\left\{\sum_{i=1}^na_i3^{-i}:n\in{\mathbb N}_0, (a_1,\dots,a_n)\in\{0,2\}^{n+1}\right\}.

Continue reading

Posted in Combinatorics, Number Theory, paper | Tagged , , , , , , , , , , , | Leave a comment

Multiple ergodic averages along functions from a Hardy field: convergence, recurrence and combinatorial applications

Vitaly Bergelson, Florian Richter and I have recently uploaded to the arXiv our new paper entitled “Multiple ergodic averages along functions from a Hardy field: convergence, recurrence and combinatorial applications”. We establish a new multiple recurrence result (and consequentially a new generalization of Szemerédi’s theorem) which contains several previously known results as special cases and many new results.

An example of a result that follows as a special case from our main theorem but was currently unknown is that for any positive real numbers {a,b>0}, any set {E\subset{\mathbb N}} with positive upper density, i.e.

\displaystyle \bar d(E):=\limsup_{N\rightarrow\infty}\frac{|E\cap\{1,\dots,N\}|}N>0,

contains a configuration {\{x,x+\lfloor n^a\rfloor,x_\lfloor n^b\rfloor\}} for some {x,n\in{\mathbb N}}. Here, as usual, we denote by {\lfloor\cdot\rfloor} the floor function. This result was known when both {a} and {b} are integers, and when both {a} and {b} are non-integers, but the mixed case was, somewhat surprisingly, still unknown. Below the fold I briefly mention the history of related results and present the main steps in the proof, highlighting the main novelties.

Continue reading

Posted in Combinatorics, Ergodic Theory, paper, Ramsey Theory | Tagged , , , | Leave a comment

Distal systems and Expansive systems

A topological dynamical system is a pair {(X,T)} where {X} is a compact metric space and {T:X\rightarrow X} is a continuous map. This gives rise to an action of the semigroup {({\mathbb N},+)} on {X}, where {n\in{\mathbb N}} acts via the iterated map {T^n=T\circ T^{n-1}} (and as usual {T^0} denotes the identity map). If {T} is invertible (i.e. a homeomorphism), then in fact it induces a {({\mathbb Z},+)}-action. More generally, one may consider an action {T=(T_g)_{g\in G}} of any semigroup {G} on {X} by continuous functions. This means that for each {g\in G} we have a continuous map {T_g:X\rightarrow X} which satisfy the semigroup law {T_g\circ T_h=T_{gh}}. In this case we say that {(X,T)} is a {G}-system. Throughout this post I will denote by {d_X} (or {d} if the underlying space is clear) a compatible metric on {X}.

Two important classes of topological dynamical systems are the class of distal systems and the class of expansive systems.

Definition 1

  • A {G}-system {(X,T)} is distal if

    \displaystyle \forall x,y\in X, \ x\neq y,\ \inf\big\{d(T_g x,T_g y):g\in G\big\}>0.

  • A {G}-system {(X,T)} is expansive if there exists {\delta>0} such that

    \displaystyle \forall x,y\in X, \ x\neq y,\ \sup\big\{d(T_g x,T_g y):g\in G\big\}>\delta.

While not entirely trivial, both properties are preserved by topological conjugacy (i.e., isomorphism in the category of {G}-systems), and in particular don’t depend on the choice of the (compatible) metric {d}.

At a first glance at the definition it looks like the two conditions are quite similar. They both fall into the loose statement that “if two points are distinct, then they are far apart in the future” (at least if the acting semigroup is {{\mathbb N}}). However, it turns out that the two properties are very much different, and in some sense actually incompatible.

In this post I will mention some examples and properties of distal and expansive systems which illustrate this difference between the two classes, and present a proof of the incompatibility result:

Theorem 2 An {{\mathbb Z}}-system which is both distal and expansive must be finite. The same is true for {{\mathbb N}}-systems.

It is clear that any finite system (i.e., where {X} is a finite set) is both distal and expansive.

Continue reading

Posted in Classic results, Topological Dynamics | Tagged , , | 2 Comments

Structure of multicorrelation sequences with integer part polynomial iterates along primes

Andreas Koutsogiannis, Anh Le, Florian Richter and I have just uploaded to the arXiv a new paper entitled “Structure of multicorrelation sequences with integer part polynomial iterates along primes”. This short paper establishes a natural common extension of our earlier result with Le and Richter and a theorem of Koutsogiannis. I decided this was a good opportunity to write more generally about multicorrelation sequences and their relation to nilsequences.

— 1. Single transformation —

Given an invertible measure preserving system {(X,\mu,T)} and bounded functions {f_0,\dots,f_k\in L^\infty(X)} one can form the multicorrelation sequence

\displaystyle  \alpha(n)=\int_Xf_0\cdot T^n f_1\cdots T^{kn}f_k\,\mathrm{d}\mu. \ \ \ \ \ (1)

Here, as usual, we denote by {T^nf} the function {f\circ T^n}. This is a natural extension of the concept of single correlation sequences, which correspond to the case {k=1} and control, for instance, spectral and mixing properties of the system. Multicorrelation sequences (or multiple correlation sequences) appear naturally in the theory of multiple recurrence. For instance, if {f_0=f_1=\cdots=f_k=1_A} for some {A\subset X} with {\mu(A)>0}, then the fact that {\alpha(n)>0} for some {n} is the statement of Furstenberg’s famous multiple recurrence theorem, which, via the correspondence principle, is equivalent to the celebrated Szemerédi theorem on arithmetic progressions.

— 1.1. The Bergelson-Host-Kra theorem —

It turns out that, despite the large number of degrees of freedom involved, multicorrelation sequences are quite rigid. This was first captured in a beautiful result of Bergelson Host and Kra in 2005, which shows that multicorrelation sequences are always only a small perturbation away from a very structured kind of sequence they named nilsequence. Since their debut in the aforementioned paper, nilsequences have became surprisingly important in additive combinatorics and number theory, due largely to the Inverse Theorem for Gowers norms.

Definition 1 Let {G} be a nilpotent Lie group and let {\Gamma\subset G} be a discrete and co-compact subgroup. The (compact) homogeneous space {X=G/\Gamma} is called a nilmanifold and {G} acts naturally on {X} via left multiplication. Given a continuous function {F:X\rightarrow{\mathbb C}}, a point {x\in X} and {g\in G}, the sequence {\phi:{\mathbb N}\rightarrow{\mathbb C}} defined via {\phi(n)=F(g^nx)} is called a basic nilsequence. A sequence {\psi} is called a nilsequence if for every {\epsilon>0} there exists a basic nilsequence {\phi} such that {\|\phi-\psi\|_{\ell^\infty}<\epsilon}.

There are several variants of the notion of nilsequence. In this post I chose to use the definition given in the original paper of Bergelson Host and Kra (in our paper we use the more common terminology to call a nilsequence to what I defined above as a basic nilsequence). In some other variants {F} is allowed different degrees of regularity (Riemann integrability, or Lipschitz or even {C^\infty}).

It will also be convenient to use the Weyl and the Besicovitch seminorms on {\ell^\infty}, defined for {f\in\ell^\infty}, respectively, by the formulas

\displaystyle \|f\|_W:=\limsup_{N-M\rightarrow\infty}\frac1{N-M}\sum_{n=M}^N\big|f(n)\big|\qquad \|f\|_B:=\limsup_{N\rightarrow\infty}\frac1{N}\sum_{n=1}^N\big|f(n)\big|

Theorem 2 (Bergelson-Host-Kra, 2005) Let {\alpha} be a multicorrelation sequence defined by (1) where {(X,\mu,T)} is an ergodic system. Then there exists a nilsequence {\psi} such that {\|\alpha-\psi\|_W=0}.

— 1.2. Averaging along subsequences —

Several enhancements of Szemerédi’s theorem involve restrictions on the common difference of the arithmetic progression to be found inside an arbitrary set with positive density. This leads to the study of averages of multicorrelation sequences along subsequences of natural numbers. It turns out that the structure of multicorrelation sequences survives even when we restrict attention to moderately sparse subsequences. For instance, given {\alpha} defined by (1) and letting {\psi} be given by Theorem 2, we have also that

\displaystyle \lim_{N-M\rightarrow\infty}\frac1{N-M}\sum_{n=M}^N\big|\alpha(n^2)-\psi(n^2)\big|=0.

One can write the previous equation as {\|(\alpha-\psi)\circ p\|_W=0} where {p(n)=n^2}. In fact, the same is true when {p} is any polynomial with integer coefficients (see Theorem 3 below).

On a related direction, when {p(n)} is the increasing enumeration of the prime numbers, or when {p(n)=\lfloor n^c\rfloor} for an arbitrary {c\in{\mathbb R}^{>0}} (where {\lfloor\cdot\rfloor} denotes the floor function), then Le proved that {\|(\alpha-\psi)\circ p\|_B=0}. In this case one can not replace the Besicovitch seminorm with the Weyl seminorm.

Back to polynomials, a far reaching extension of Theorem 1 was obtained by Leibman in two papers published in 2010 and 2015. In particular, Leibman was able to remove the hypothesis that the system defining the multicorrelation sequence is ergodic.

Theorem 3 (Leibman, 2010-2015) Let {(X,\mu,T)} be a measure preserving system and let {f_0,\dots,f_k\in L^\infty(X)}. Let {p_1,\dots,p_k\in{\mathbb Z}[x]} and let

\displaystyle  \alpha(n)=\int_Xf_0\cdot T^{p_1(n)} f_1\cdots T^{p_k(n)}f_k\,\mathrm{d}\mu. \ \ \ \ \ (2)

Then there exists a nilsequence {\psi} such that {\|\alpha-\psi\|_W=0}.

By taking {p_i(n)=in}, we see that the class of multicorrelations defined by (1) is contained in the class defined by (2). On the other hand, multicorrelations of the form (2) appear naturally when considering polynomial extensions of Szemerédi’s theorem or of Furstenberg’s multiple recurrence theorem.

Part of the aforementioned result of Le also applies to multicorrelations {\alpha} of the form (2).

Theorem 4 (Le, 2017) Let {\alpha} be defined by (2), let {\psi} be the nilsequence given by Theorem 3 and let {p(n)} be the increasing enumeration of the prime numbers. Then {\big\|(\alpha-\psi)\circ p\big\|_B=0}.

— 2. Commuting transformations —

Multidimensional extensions of Szemerédi’s theorem (which in turn are a significant special case of the density Hales-Jewett theorem) can also be obtained via ergodic theoretic methods but require the use of measure preserving systems with several commuting transformations. This in turn leads to a more general class of multicorrelation sequences:

\displaystyle  \alpha(n)=\int_Xf_0\cdot T_1^n f_1\cdots T_k^nf_k\,\mathrm{d}\mu, \ \ \ \ \ (3)

where {T_1,\dots,T_\ell} are commuting measure preserving transformations on the probability space {(X,\mu,T)} and {f_0,\dots,f_k\in L^\infty(X)}. Letting {T_i=T^i} we see that the class of multicorrelation sequences defined by (3) contains the class of sequences defined by (1). In view of Theorem 2, the following is a natural conjecture.

Conjecture 5 Let {\alpha} be defined by (3). Does there exist a nilsequence {\psi} such that {\|\alpha-\psi\|_W=0}?

Unfortunately, the proofs of the results mentioned in the previous section all rely on the structure theory of Host and Kra, which allows one to understand multiple ergodic averages by studying characteristic nilfactors, but which is unavailable for commuting transformations. In particular, this conjecture is still open. Nevertheless, in 2014, Frantzikinakis was able to obtain the following

Theorem 6 (Frantzikinakis) Let {\alpha} be defined by (3). Then for every {\epsilon>0} there exists a nilsequence {\psi} such that {\|\alpha-\psi\|_W<\epsilon}.

Remark 1 Since {\|a\|_W\leq\|a\|_{\ell^\infty}} for any {a\in\ell^\infty}, in Theorem 6 one can require {\psi} to be basic nilsequence. To streamline the presentation I will keep the slightly weaker formulation in this and other results below.

A convenient way to encode a tuple of {\ell} commuting measure preserving transformations {T_1,\dots,T_\ell} on a probability space {(X,\mu)} is via a single measure preserving action {T=(T^v)_{v\in{\mathbb Z}^\ell}} of {{\mathbb Z}^\ell}, defined by {T^{e_i}=T_i}, where {e_i} is the {i}-th vector in the canonical basis of {{\mathbb Z}^\ell}. Given a measure preserving {{\mathbb Z}^\ell}-action {T} on a probability space and vector polynomials {p_1,\dots,p_k:{\mathbb Z}\rightarrow{\mathbb Z}^\ell} (i.e. each {p_i=(p_{i,1},\dots,p_{i,\ell})} for some {p_{i,j}\in{\mathbb Z}[x]}) one can consider the multicorrelation sequence

\displaystyle  \alpha(n)=\int_Xf_0\cdot T^{p_1(n)} f_1\cdots T^{p_k(n)} f_k\,\mathrm{d}\mu. \ \ \ \ \ (4)

The class of multicorrelation sequences defined by (4) contains all the other classes of multicorrelation sequences mentioned before in this post and appear naturally when studying multidimensional polynomial extensions of Szemerédi’s theorem. Theorem 6 was proved by Frantzikinakis also in the more general situation where {\alpha} is given by (4).

Earlier this year, Le, Richter and I obtained a refinement of Theorem 6, which stands to it in the same way that Theorem 4 stands to Theorem 2.

Theorem 7 Let {\alpha} be defined by (4) and let {p(n)} be the increasing enumeration of the prime numbers. Then for every {\epsilon>0} there exists a nilsequence {\psi} satisfying simultaneously {\|\alpha-\psi\|_W<\epsilon} and {\big\|(\alpha-\psi)\circ p\|_B<\epsilon}.

— 3. Integer part of real polynomials —

Given a measure preserving {{\mathbb Z}^\ell}-action {T} on a probability space {(X,\mu)}, vector polynomials {p_1,\dots,p_k:{\mathbb R}\rightarrow{\mathbb R}^\ell} (i.e., each {p_i=(p_{i,1},\dots,p_{i,\ell})} with {p_{i,j}\in{\mathbb R}[x]}) and functions {f_0,\dots,f_k\in L^\infty(X)}, one can construct the multicorrelation sequence

\displaystyle  \alpha(n)=\int_Xf_0\cdot T^{\lfloor p_1(n)\rfloor} f_1\cdots T^{\lfloor p_k(n)\rfloor} f_k\,\mathrm{d}\mu, \ \ \ \ \ (5)

where for a vector {x=(x_1,\dots,x_\ell)\in{\mathbb R}^\ell} we denote {\lfloor x\rfloor=(\lfloor x_1\rfloor,\dots,\lfloor x_\ell\rfloor)}. If all polynomials {p_i} have only integer coefficients, then (5) becomes (4), so the class of multicorrelation sequences defined via (5) contains all the other multicorrelation sequences defined in this post.

The extension of Frantzikinakis theorem to multicorrelation sequences of the form (5) was obtained by Koutsogiannis.

Theorem 8 (Koutsogiannis) Let {\alpha} be defined by (5). Then for every {\epsilon>0} there exists a nilsequence {\psi} such that {\|\alpha-\psi\|_W<\epsilon}.

The main result of our paper is the common extension of Theorems 7 and 8.

Theorem 9 Let {\alpha} be defined by (5) and let {p(n)} be the increasing enumeration of the prime numbers. Then for every {\epsilon>0} there exists a nilsequence {\psi} satisfying simultaneously {\|\alpha-\psi\|_W<\epsilon} and {\big\|(\alpha-\psi)\circ p\|_B<\epsilon}.

— 4. Proofs —

As already mentioned above, the proofs of the results listed in Section 1 all require the structure theory. This involves two steps: first is a reduction to nilsystems and then an analysis of the equidistribution properties of orbits in nilsystems. I plan to write another post about these ideas in the near future.

The breakthrough idea of Frantzikinakis, leading to his proof of Theorem 6 can be described in rough terms as follows. Using a common van der Corput-type procedure, one can show that a multicorrelation sequence {\alpha} of the form (3) is orthogonal, with respect to the {\|\cdot\|_W} seminorm, to any sequence {\beta\in\ell^\infty} which is uniform (in a sense analogous to Gowers uniformity in finite groups.) The same is true for any nilsequence. Then, letting {\gamma} be the orthogonal projection of {\alpha} onto the space of all nilsequences (w.r.t. {\|\cdot\|_W}) it follows that the difference {\alpha-\gamma} is also orthogonal to every uniform sequence {\beta}. By (a version of) the inverse theorem for uniformity seminorms, if {\alpha-\gamma} were not negligible, then it would have to correlate with some nilsequence. This is not possible, since {\gamma} is the orthogonal projection of {\alpha} to the space of nilsequences, and it follows that {\|\alpha-\gamma\|_W=0}. Unfortunately, the space of nilsequences is not closed in the {\|\cdot\|_W} seminorm, so {\gamma} itself is not a nilsequence. Nevertheless it belongs to the {\|\cdot\|_W}-closure of the space of nilsequences, so for every {\epsilon>0} there is a nilsequence {\psi} such that {\|\alpha-\psi\|_W<\epsilon}.

When proving Theorem 7, because the primes are involved, we need to employ the {W}-trick. This in turn requires some control on the nilsequences {\psi} that we pick. In order to obtain the necessary additional properties on {\psi} we have to first improve Theorem 6. We do this by describing the (topological) Furstenberg system of a multicorrelation sequence {\alpha}.

Finally, the proof of Theorem 9 follows closely the proof of Theorem 8. The idea is to approximate {\alpha} as defined in (5) by a multicorrelation sequence as defined in (4). To do this we pass to (a multidimensional analogue of) the suspension flow and use the fact that, if {T} is a {{\mathbb R}}-action and {q(n)=a_0+a_1n+\cdots+a_mn^m\in{\mathbb R}[n]} then

\displaystyle T^{q(n)}=T^{a_0}\big(T^{a_1}\big)^n\big(T^{a_2}\big)^{n^2}\cdots\big(T^{a_m}\big)^{n^m}.

Since the measure preserving transformations {T^{a_1},\dots,T^{a_m}} commute, by extending the original {{\mathbb Z}^\ell}-action to a {{\mathbb Z}^{\ell m}}-action one can represent a real polynomial by polynomials with integer coefficients. There are of course some difficulties with implementing this strategy, since one also needs to handle the floor function, but this is ultimately taken care of by the fact that polynomials, and polynomials evaluated along primes, equidistribute mod {1} in a well understood subset of the {[0,1)} interval.

Posted in Ergodic Theory, paper, State of the art | Tagged , , , , | Leave a comment

Tao’s Proof of (logarithmically averaged) Chowla’s conjecture for two point correlations

The Liouville function is the completely multiplicative function {\lambda} with {\lambda(p)=-1} for every prime {p}. The Chowla conjecture predicts that this function behaves randomly. Here is a version of this conjecture.

Conjecture 1 (Chowla) For every finite set {F\subset\{0,1,\dots\}},

\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^N\prod_{i\in F}\lambda(n+i)=0.

This conjecture received a lot of attention in the last decade, since Peter Sarnak derived from it an orthogonality criterion for the Liouville (or, similarly, the Mobius function), now known as the “Sarnak conjecture”. Both Chowla’s and Sarnak’s conjectures remain open, however many weaker versions and special cases have been derived (see, for instance, this survey). Perhaps the most impressive result so far is the following theorem of Tao, which was in turn the starting point for many of the more recent developments.

Theorem 2 (Tao) For every (distinct) {a,b\in{\mathbb N}},

\displaystyle \lim_{N/M\rightarrow\infty}\frac1{\log{(N/M)}}\sum_{n=M}^N\frac1n\lambda(n+a)\lambda(n+b)=0.

Tao’s main theorem in his paper is actually significantly more general than Theorem 2, for it allows more general affine maps inside the argument of {\lambda}, and in fact replaces {\lambda} with a much wider class of multiplicative functions. To see why Theorem 2 follows from Conjecture 1, observe that any sequence {x(n)} satisfying

\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nx(n)=0

must also satisfy

\displaystyle \lim_{N/M\rightarrow\infty}\frac1{\log{N/M}}\sum_{n=M}^N\frac1n x(n)=0.

In the terminology and notation of my earlier post, the conclusion of this theorem states that the uniform logarithmic average of the sequence {n\mapsto\lambda(n+a)\lambda(n+b)} is {0}, i.e.,

\displaystyle \underset{n\rightarrow\infty}{UW\text{-}\lim}~\lambda(n+a)\lambda(n+b)=0

for {W(n)=\log n}.

In this post I will present Tao’s proof restricted to the following special case. (Most of the proof remains unchanged in the general case, but the presentation becomes easier when we avoid general {a,b}).

Theorem 3 Let {\lambda} denote the Liouville function. Then

\displaystyle \lim_{N/M\rightarrow\infty}\frac1{\log(N/M)}\sum_{n=M}^N\frac1n\lambda(n)\lambda(n+1)=0

We will use a correspondence principle to reformulate the problem in the language of dynamical systems to, among other things, simplify the notation. This approach was also carried out by Tao in his blog, and the main motivation for me to write this post was to try to use an easier construction of a Furstenberg system, which would still keep track of the “congruence classes of the points” in the system, as detailed below the fold.

Continue reading

Posted in Number Theory, Probability | Tagged , , , , | Leave a comment

A viewpoint on Katai’s orthogonality criterion

The Liouville function, defined as the completely multiplicative function {\lambda} which sends every prime {p} to {-1}, encodes several important properties of the primes. For instance, the statement that

\displaystyle \frac1N\sum_{n=1}^N\lambda(n)=o(1)

is equivalent to the prime number theorem, while the improved (and essentially best possible) rate of convergence

\displaystyle \frac1N\sum_{n=1}^N\lambda(n)=O_\epsilon\left(\frac1{N^{1/2-\epsilon}}\right)

for every positive {\epsilon} is equivalent to the Riemann hypothesis.

Of particular interest are statements involving correlations (or lack thereof) between {\lambda} and certain “structured” functions. In this direction, Green and Tao established “orthogonality” of {\lambda} with any nilsequence. More precisely, they showed that for any nilpotent Lie group {G}, any discrete subgroup {\Gamma} for which {X:=G/\Gamma} is compact, any Lipschitz function {F:X\rightarrow{\mathbb C}}, any {A>0} and any {b\in G},

\displaystyle \frac1N\sum_{n=1}^N\lambda(n)F(b^n\Gamma)=O_A\left(\frac1{\log^AN}\right),

which was an important ingredient in their proof of the “finite complexity” version of the Dickson/Hardy-Littlewood prime tuple conjecture. Perhaps inspired by this result, Sarnak then conjectured that in fact the Liouville function (actually, the closely related Möbius function) is orthogonal to every deterministic sequence:

Conjecture 1 (Sarnak conjecture for the Liouville function) Let {X} be a compact metric space and let {T:X\rightarrow X} be a homeomorphism. If the topological dynamical system {(X,T)} has {0} topological entropy, then for every continuous {F\in C(X)} and every {x\in X},

\displaystyle \frac1N\sum_{n=1}^N\lambda(n)F(T^nx)=o(1).

Sarnak’s conjecture is still open, despite some remarkable progress (see the recent survey of Ferenczi, Kułaga-Przymus and Lemańczyk for information on the progress made so far).

A useful tool to establish partial cases of Sarnak’s conjecture (and other results not implied by the Sarnak conjecture) is the following orthogonality criterion of Katai.

Theorem 2 (Katai’s orthogonality criterion) Let {u:{\mathbb N}\rightarrow {\mathbb C}} be bounded and let {v:{\mathbb N}\rightarrow S^1} be a completely multiplicative function. If for every distinct primes {h,h'}

\displaystyle  \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu(nh)\overline{u(nh')}=0 \ \ \ \ \ (1)


\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu(n)\overline{v(n)}=0.

Remark 1 One can in fact relax the condition on {v} to be only a multiplicative function. This allows the criterion to be applied, for instance, with {v} being the Möbius function.

The statement of Theorem 2 is reminiscent of the van der Corput trick, which is a ubiquitous tool in the fields of uniform distribution and ergodic Ramsey theory, was the protagonist of a survey Vitaly Bergelson and I wrote and has been mentioned repeatedly in this blog. In fact one can prove both results (Katai’s orthogonality criterion and van der Corput’s trick) using the same simple functional analytic principle. In this post I present this principle and use it to motivate proofs for the two theorems.

Continue reading

Posted in Classic results, Number Theory, Tool | Tagged , , , , | Leave a comment

Entropy of measure preserving systems

A measure preserving system is a quadruple {(X,{\mathcal B},\mu,T)} where {X} is a set, {{\mathcal B}} is a {\sigma}-algebra, {\mu:{\mathcal B}\rightarrow[0,1]} is a probability measure and {T:X\rightarrow X} is a measurable map satisfying {\mu(T^{-1}A)=\mu(A)} for every {A\in{\mathcal B}}.

The notion of isomorphism in the category of measure preserving systems (defined, for instance, in this earlier post) is fairly flexible, as it allows {0} measure perturbations. For instance, the system {([0,1],{\mathcal B},\lambda,T_2)} where {\lambda} is the Lebesgue measure and {T_2x=2x\bmod1} is isomorphic to the Bernoulli system {(\{0,1\}^{\mathbb N},\mu,\sigma)}, where {\sigma:\{0,1\}^{\mathbb N}\rightarrow\{0,1\}^{\mathbb N}} is the left shift and {\mu} is the infinite product of the measure {p} on {\{0,1\}} defined so that {p(\{0\})=p(\{1\})=1/2}.

However, this flexibility can make it quite difficult to decide whether two given systems are isomorphic or not. While several spectral invariants can be used, such as ergodicity, weakly mixing or strongly mixing, they are not sufficient to distinguish even between two Bernoulli shifts. To address this problem, Kolmogorov and Sinai developed the notion of entropy in the late 1950’s. Since then, the entropy of measure preserving systems, together with its sister in topological dynamics, has became a fundamental tool in understanding the structure of dynamical systems, with applications far beyond distinguishing non-isomorphic systems.

In this post I will define and then briefly survey some of the basic properties of entropy. I’ll mostly follow the exposition in the upcoming book of Einsiedler, Lindenstrauss and Ward, whose draft is available here; other good sources are Walter’s book and, in a somewhat different perspective, Austin’s recent notes.

Continue reading

Posted in Classic results, Ergodic Theory, Tool | Tagged | 1 Comment

Optimal intersectivity

In ergodic Ramsey theory, one often wants to prove that certain dynamically defined sets in a probability space intersect (or “recur”) in non-trivial ways. Typically, this is achieved by studying the long term behavior of the sets as the dynamics flow. However, in certain situations, one can establish the desired intersection (or recurrence) using purely combinatorial arguments, and without using the fact that the sets are dynamically defined. In such cases, one ends up obtaining a “static” (as opposed to dynamical) statement. An instance of this situation is the following intersectivity result of Bergelson, first used in this paper, and which I have mentioned before in this blog.

Lemma 1 Let {A_1,A_2,\dots} be sets in a probability space {(X,\mu)} with {\inf_n\mu(A_n)>\delta}. Then there exists an infinite set {I\subset{\mathbb N}} with density {d(I)>\delta}, such that for every non-empty finite set {F\subset I} we have

\displaystyle \mu\left(\bigcap_{n\in F}A_n\right)>0

A different kind of intersection property is the following static modification of Poincaré’s recurrence theorem.

Lemma 2 Let {(X,\mu)} be a probability space and let {A_1, A_2,\dots} be sets with {\inf_n\mu(A_n)>\delta}. Then there exists an infinite subset {I\subset{\mathbb N}} such that for every {n,m\in I},

\displaystyle \mu\left(A_n\cap A_m\right)>\delta^2

Observe that if the events {A_1,A_2,\dots} are independent, then the lower bound {\delta^2} is essentially achieved, and so this lemma is that sense optimal. The purpose of this post is to present a proof of the following common strengthening of Lemmas 1 and 2.

Theorem 3 Let {A_1,A_2,\dots} be sets in a probability space {(X,\mu)} with {\inf_n\mu(A_n)>\delta}. Then there exists an infinite set {I\subset{\mathbb N}} such that for every non-empty finite set {F\subset I} we have

\displaystyle \mu\left(\bigcap_{n\in F}A_n\right)>\delta^{|F|}

Observe that the bound {\delta^{|F|}} is optimal, again by considering the case of independent sets.

I learned this strengthening, as well as its proof from Konstantinos Tyros last December, when we were in Lyon, France attending the conference Ultrafilters, Ramsey Theory and Dynamics.

One could ask whether something more can be said about the set {I} in Theorem 3 other than being infinite . Something one can not hope for is that the set {I} has positive density, as showed in my previous post, using Forest’s theorem that not all sets of recurrence are sets of nice recurrence. On the other hand, one can indeed obtain certain combinatorial structures inside {I}. For instance, assuming the index set of the sets {(A_n)} is given the structure of a homogeneous tree, one can choose {I} to be a strong subtree; this is Theorem 3 in this paper of Dodos, Kanellopoulos and Tyros. Results of this kind are related to density versions of the Hales-Jewett and the Carlson-Simpson theorems due respectively to Furstenberg and Katznelson, and to Dodos, Kanellopoulos and Tyros. Continue reading

Posted in Combinatorics, Probability, Ramsey Theory, Tool | Tagged , , , , | Leave a comment

A proof a sumset conjecture of Erdős

Florian Richter, Donald Robertson and I have uploaded to the arXiv our paper entitled A proof a sumset conjecture of Erdős. The main goal of the paper is to prove the following theorem, which verifies a conjecture of Erdős discussed in this previous post.

Theorem 1 Let {A\subset{\mathbb N}} have positive upper density. Then it contains a sumset {B+C=\{b+c:b\in B, c\in C\}} for infinite sets {B} and {C}.

As usual, the upper density of a set {A\subset{\mathbb N}} is the quantity

\displaystyle \bar d(A):=\limsup_{N\rightarrow\infty}\frac{|A\cap\{1,\dots,N\}|}N.

More generally, given a sequence {\Phi:N\mapsto\Phi_N=\{a_N,a_N+1,\dots,b_N\}} of intervals of integers with lengths {|\Phi_N|} tending to infinity (or even more generally, given any Følner sequence {\Phi:N\mapsto\Phi_N} in {{\mathbb N}}), we define the upper density with respect to {\Phi} as

\displaystyle \bar d_\Phi(A):=\limsup_{N\rightarrow\infty}\frac{\big|A\cap\Phi_N\big|}{|\Phi_N|}.

When the limit exists we say that the density of {A} with respect to {\Phi} exists and denote it by {d(A)} (and whenever we write {d(A)} we are implicitly stating that the limit exists).

As explained in my earlier post mentioned above (see Proposition 12), one can adapt an argument of Di Nasso, Goldbring, Jin, Leth, Lupini and Mahlburg to reduce Theorem 1 to the following statement.

Theorem 2 If {A\subset{\mathbb N}} has positive upper density, then there exist a sequence {\Phi} of intervals, a set {L\subset{\mathbb N}} and {\epsilon>0} such that for every finite subset {F\subset L}, the intersection

\displaystyle  \bigcap_{\ell\in F}(A-\ell)\cap\big\{m\in{\mathbb N}:d_\Phi(A-m\cap L)>0\big\} \ \ \ \ \ (1)

is infinite.

In my previous post I described how Theorem 2 can then be easily checked for weak mixing sets and for Bohr sets, whose definitions we now recall.

Definition 3 A set {A\subset{\mathbb N}} is:

  • a Bohr set if there exist {d\in{\mathbb N}}, a vector {\alpha\in{\mathbb T}^d} and an open set {U\subset{\mathbb T}^d} such that {A\supset\{n\in{\mathbb N}:n\alpha\in U\}\neq\emptyset}.
  • a weak mixing set with respect to the sequence of intervals {\Phi} if the set {\big\{n:\big|d_\Phi((A-n)\cap A)-d_\Phi(A)^2\big|<\epsilon\big\}} has density {1}.

It is well known in the realm of ergodic theory that weak mixing and almost periodicity are extreme points of a dichotomy (see for instance, this post) and so it was natural to try to patch these two observations into a proof of Theorem 2 in general. While this is ultimately how we prove Theorem 2, there were several problems with implementing this idea. Below the fold I explain the problems that arise and (briefly) sketch how we address them. Continue reading

Posted in Combinatorics, paper | Tagged , , , , , , , | Leave a comment