The Liouville function is the completely multiplicative function with for every prime . The Chowla conjecture predicts that this function behaves randomly. Here is a version of this conjecture.
This conjecture received a lot of attention in the last decade, since Peter Sarnak derived from it an orthogonality criterion for the Liouville (or, similarly, the Mobius function), now known as the “Sarnak conjecture”. Both Chowla’s and Sarnak’s conjectures remain open, however many weaker versions and special cases have been derived (see, for instance, this survey). Perhaps the most impressive result so far is the following theorem of Tao, which was in turn the starting point for many of the more recent developments.
Tao’s main theorem in his paper is actually significantly more general than Theorem 2, for it allows more general affine maps inside the argument of , and in fact replaces with a much wider class of multiplicative functions. To see why Theorem 2 follows from Conjecture 1, observe that any sequence satisfying
must also satisfy
In the terminology and notation of my earlier post, the conclusion of this theorem states that the uniform logarithmic average of the sequence is , i.e.,
In this post I will present Tao’s proof restricted to the following special case. (Most of the proof remains unchanged in the general case, but the presentation becomes easier when we avoid general ).
We will use a correspondence principle to reformulate the problem in the language of dynamical systems to, among other things, simplify the notation. This approach was also carried out by Tao in his blog, and the main motivation for me to write this post was to try to use an easier construction of a Furstenberg system, which would still keep track of the “congruence classes of the points” in the system, as detailed below the fold.
— 1. The correspondence principle —
Since the proof of Theorem 3 requires the use of both addition and multiplication (unlike, say, certain proofs of Szemerédi’s theorem in arithmetic progressions, which don’t use the multiplicative action of the integers on themselves), we can not simply quote the classical Furstenberg Correspondence principle, which translates the combinatorial problem into an ergodic theoretic situation which only allows one to use (the dynamical version of) addition. Instead we will build a measure preserving system from scratch using the Stone-Čech compactification of the natural numbers (see my previous post for a construction of using ultrafilters). The idea of obtaining a correspondence principle using was used by Beigleböck in his short proof of Jin’s theorem (discussed in this previous post). I also adapted a similar idea in my paper on patterns to construct a dynamical system where both addition and multiplication were acting.
I will use the fact that can be identified with the space of all ultrafilters over . Recall that an ultrafilter is a collection of subsets of with and satisfying the properties that if and then , if then and . Each natural number gives rise to an ultrafilter, called a principal ultrafilter, defined by . Any ultrafilter not of this form is called non-principal. We identify with the subset of consisting of the principal ultrafilters in the obvious way.
The topology on can be described as follows. For every , the collection of ultrafilters is an open set. In fact open sets of this kind form a basis for the topology on . It is clear that , which implies that is also closed. In fact it is the closure of in . Observe that is a dense subset of .
With this topology, is a compact Hausdorff space, so the Riesz Representation theorem applies and hence Radon measures on can be identified with positive linear functionals on the space of continuous functions on . For each , the point mass is the probability measure on defined by for every . Let denote the logarithmic average of the measures as ranges over . In other words
Observe that, in order to establish Theorem 3 it suffices to show that for every sequences , with , the sequence
has as an accumulation point. Next fix arbitrary increasing sequences and take to be any weak accumulation point of the sequence . Let be the -preserving transformation defined by . Given a function , we denote by the composition .
Since is the Stone-Čech compactification of , every bounded function can be uniquely identified with a continuous function , with the property that for every . We will abuse the notation by referring to simply as and to simply as . Therefore, Theorem 3 is equivalent to the statement that for every probability measure obtained as above
Next we introduce multiplication in the system. Given an ultrafilter and , denote by the ultrafilter , where . We now introduce the transformation given by , and given a function we denote by the composition . To understand how behaves under the multiplication action, take and and observe that
we deduce that
for every measure constructed as above. This equation would not hold in general for measures which correspond to arbitrary invariant means and is the reason why the logarithmic averaging are used in this proof; indeed we will not make reference to the method of summation beyond using (2).
Given a prime , notice that . This means that the continuous functions and on agree on the dense set , and hence are the same function. Using (2) and the distributivity law we have that
To establish (1), which will finish the proof of Theorem 3, we need to show that the left hand side of (3) is . We will do this by showing that the right hand side of (3) gets arbitrarily small for appropriate choices of the set . To achieve this we will use two ingredients: the Matomäki-Radziwill theorem on averages of the Liouville function on short intervals, and the entropy decrement argument of Tao.
— 2. The Matomäki-Radziwill theorem —
The first case of Chowla’s conjecture, Conjecture 1 above, is when is a singleton. In this case it states that the averages tend to as , and this is known to be equivalent to the prime number theorem. On the other hand, if the Chowla conjecture holds, then there will be arbitrarily long intervals where is constant, so one can not replace the average over the initial interval with averages over arbitrary intervals of the form . Nevertheless, Matomäki and Radziwill showed that the average of is small for “most” intervals. As a corollary of their main result, using now logarithmic averages, we have
This result was shortly after improved by Matomäki Radziwill and Tao, allowing some “twisting” by exponential functions.
We can interpret this theorem in the context of the measure preserving system constructed in the previous section as saying that
If is any eigenfunction of the system , say , then
Since eigenfunctions are bounded, we can use Holder’s inequality now to conclude that is orthogonal to every eigenfunction of .
where the sum is taken over primes .
— 3. A probabilistic interpretation —
In order to establish (5) we will use a probabilistic argument. Let be the function
let be the function
and let be the function
Then we can rewrite (5) as
If for some the random variables and are independent, then in particular and are orthogonal for every and hence . Unfortunately, we will not be able to show that and are independent infinitely often (or even for some ), which would imply (6), so instead we will need a quantitative version of this observation.
A quantitative version of independence is provided by the notion of mutual information, defined below. As we will show, as long as the mutual information between and doesn’t grow too fast along some subsequence, then we still obtain (6). I explore the notion of entropy and its connection with “information” in my previous post; here are the relevant definitions.
Observe that and are independent if and only if (see, for instance, this proposition in my earlier post), which is also equivalent to . An application of Jensen’s inequality shows that any random variable taking values in a finite set has entropy at most , with maximum attained when the random variable is uniformly distributed. This and the fact that is uniformly distributed in its range will be important in what follows. In order to show that a lack of mutual information between and suffices to obtain a useful upper bound on , we will use the following Pinsker type inequality.
Proof: This is Lemma 1 in this post of Tao.
Two random variables on the probability space , say and , are independent if and only if the diagonal coupling is the same as the product coupling, in the sense that for every function ,
The next lemma exploits a lack of mutual information between two random variables to establish a quantitative version of this fact (for certain functions ).
Proof: Since is uniformly distributed, for any set we have . A quick computation shows that
Applying Markov’s inequality to the function , whose expectation is , we find that the set consisting of those with is large, in the sense that the union
has measure .
For each , denote by the conditional measure on . Then Lemma 7 implies that for each and every set ,
On the other hand, we can apply Hoeffding’s inequality to deduce that, for each , the set defined by
has cardinality at most . In other words, . Using (7) it follows that for each ,
Now let . We can estimate its measure by
Finally, if is not in , then and , which implies that
Since and , integrating over , we conclude that
which is the goal of the last section.
— 4. The Entropy Decrement Argument —
In this section we make no use of the fact that is the Liouville function. In particular, the argument works (and (8) holds) if we replace with any function with a finite image!
The definition of the entropy of a random variable only depends on the partition of induced by . If is a measure preserving transformation, then the random variable induces a different partition then , but there is a bijection between the two partitions that preserves the measure. Therefore . Given two random variables , the conditional entropy is defined as . Therefore
Recall that . Introducing the new random variables it is clear that and so its entropy does not depend on . On the other hand, , so .
The partition of induced by is invariant under , and therefore
which in turn implies that
Iterating this fact we deduce that, for every ,
Finally we deduce
Recall that we are trying to establish (8). Suppose, for the sake of a contradiction, that
On the other hand, using the prime number theorem we can estimate and
so for every large enough, dividing (9) by we obtain
Finally, let be the first value of for which (10) holds, and let we deduce that
Since the sequence satisfies the recurrence relation , a quick induction implies that for some absolute constant , and hence the right hand side of (11) becomes negative for large enough , which is clearly a contradiction. We conclude that (8) holds and this finishes the proof of Theorem 3.