The Liouville function $\lambda$, defined as the completely multiplicative function which sends every prime to $-1$, encodes several important properties of the primes. For instance, the statement that

$$\lim_{N\to\infty}\frac1N\sum_{n=1}^N\lambda(n)=0$$

is equivalent to the prime number theorem, while the improved (and essentially best possible) rate of convergence

$$\lim_{N\to\infty}\frac1{N^{1/2+\epsilon}}\sum_{n=1}^N\lambda(n)=0$$

for every positive $\epsilon$ is equivalent to the Riemann hypothesis.
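
The first of these statements is easy to watch numerically. The following is a minimal sketch, not part of the original argument: the helper names `liouville_sieve` and `liouville_average` are made up for this illustration. It computes the Liouville function $\lambda$ with a smallest-prime-factor sieve and prints the averages $\frac1N\sum_{n\le N}\lambda(n)$ for a few values of $N$.

```python
import math


def liouville_sieve(limit):
    """Return a list lam with lam[n] = (-1)**Omega(n) for 1 <= n <= limit."""
    spf = list(range(limit + 1))  # spf[n] will hold the smallest prime factor of n
    for p in range(2, math.isqrt(limit) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [1] * (limit + 1)  # lam[0] is unused
    for n in range(2, limit + 1):
        # Complete multiplicativity: lam(n) = lam(p) * lam(n / p) = -lam(n / p).
        lam[n] = -lam[n // spf[n]]
    return lam


def liouville_average(limit):
    """Return (1/N) * sum_{n <= N} lam(n) for N = limit."""
    lam = liouville_sieve(limit)
    return sum(lam[1:]) / limit


if __name__ == "__main__":
    for N in (10**2, 10**3, 10**4, 10**5):
        print(N, liouville_average(N))
```

Of course a finite computation proves nothing about the limit; it only shows the cancellation one expects.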

Of particular interest are statements involving correlations (or lack thereof) between the Liouville function $\lambda$ and certain “structured” functions. In this direction, Green and Tao established “orthogonality” of $\lambda$ with any nilsequence. More precisely, they showed that for any nilpotent Lie group $G$, any discrete subgroup $\Gamma\subset G$ for which $X:=G/\Gamma$ is compact, any Lipschitz function $f:X\to\mathbb{C}$, any $g\in G$ and any $x\in X$,

$$\lim_{N\to\infty}\frac1N\sum_{n=1}^N\lambda(n)f(g^nx)=0,$$

which was an important ingredient in their proof of the “finite complexity” version of the Dickson/Hardy-Littlewood prime tuple conjecture. Perhaps inspired by this result, Sarnak then conjectured that in fact the Liouville function (actually, the closely related Möbius function) is orthogonal to every deterministic sequence:

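Conjecture 1 (Sarnak conjecture for the Liouville function) Let $X$ be a compact metric space and let $T:X\to X$ be a homeomorphism. If the topological dynamical system $(X,T)$ has zero topological entropy, then for every continuous $f:X\to\mathbb{C}$ and every $x\in X$,

$$\lim_{N\to\infty}\frac1N\sum_{n=1}^N\lambda(n)f(T^nx)=0.$$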

Sarnak’s conjecture is still open, despite some remarkable progress (see the recent survey of Ferenczi, Kułaga-Przymus and Lemańczyk for information on the progress made so far).

A useful tool to establish partial cases of Sarnak’s conjecture (and other results not implied by the Sarnak conjecture) is the following orthogonality criterion of Katai.

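Theorem 2 (Katai’s orthogonality criterion) Let $a:\mathbb{N}\to\mathbb{C}$ be bounded and let $f:\mathbb{N}\to\mathbb{C}$ be a bounded completely multiplicative function. If for every pair of distinct primes $p,q$

$$\lim_{N\to\infty}\frac1N\sum_{n=1}^Na(pn)\overline{a(qn)}=0,\qquad(1)$$

then

$$\lim_{N\to\infty}\frac1N\sum_{n=1}^Na(n)f(n)=0.$$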

Remark 1 One can in fact relax the condition on $f$ to be only a multiplicative function. This allows the criterion to be applied, for instance, with $f$ being the Möbius function.

The statement of Theorem 2 is reminiscent of the van der Corput trick, which is a ubiquitous tool in the fields of uniform distribution and ergodic Ramsey theory, was the protagonist of a survey Vitaly Bergelson and I wrote and has been mentioned repeatedly in this blog. In fact one can prove both results (Katai’s orthogonality criterion and van der Corput’s trick) using the same simple functional analytic principle. In this post I present this principle and use it to motivate proofs for the two theorems.

** — 1. The van der Corput trick — **

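Given orthonormal vectors $u_1,\dots,u_n$, Pythagoras’ theorem implies that their sum has norm $\sqrt n$. Therefore if $v$ is another vector whose correlation $\langle u_i,v\rangle$ with each $u_i$ is a constant $c$ independent of $i$, then by the Cauchy-Schwarz inequality,

$$n|c|=\big|\langle u_1+\cdots+u_n,v\rangle\big|\le\|u_1+\cdots+u_n\|\,\|v\|=\sqrt n\,\|v\|,$$

which implies that $|c|\le\|v\|/\sqrt n$.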

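Taking $n\to\infty$ we obtain the following corollary of Bessel’s inequality:

Proposition 3 Let $u_1,u_2,\dots$ be pairwise orthogonal unit vectors in a Hilbert space $H$. If $v\in H$ is such that $\langle u_i,v\rangle$ does not depend on $i$, then $\langle u_i,v\rangle=0$ for all $i$.

*Proof:* Let $c$ be the common value of $\langle u_i,v\rangle$ and, for each $n\in\mathbb{N}$, let $w_n=u_1+\cdots+u_n$. Then $\langle w_n,v\rangle=nc$ for all $n$ and hence, by the Pythagoras theorem and the Cauchy-Schwarz inequality, $n|c|\le\|w_n\|\,\|v\|=\sqrt n\,\|v\|$ for every $n$. This implies that $c=0$ as desired.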

It is possible to extend the above proposition to a setting where one does not have a genuine Hilbert space. Given a sequence $a:\mathbb{N}\to\mathbb{C}$ of complex numbers, define its *Besicovitch seminorm* by

$$\|a\|_B:=\limsup_{N\to\infty}\left(\frac1N\sum_{n=1}^N|a(n)|^2\right)^{1/2}$$

and let $B$ denote the space of all sequences $a$ with $\|a\|_B$ finite. The Minkowski inequality implies that $B$ is a vector space. We can (partially) define an inner product in $B$ via the formula

$$\langle a,b\rangle:=\lim_{N\to\infty}\frac1N\sum_{n=1}^Na(n)\overline{b(n)}\qquad(2)$$

whenever the limit exists. It is clear that if $\langle a,a\rangle$ exists, then it must equal $\|a\|_B^2$. Moreover, it is easy to check that both the Pythagoras theorem and the Cauchy-Schwarz inequality hold in this space, assuming that all the inner products are well defined. Therefore, the proof of Proposition 3 holds for vectors in $B$ and we obtain

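Lemma 4 Let $u_1,u_2,\dots$ be elements of $B$ whose seminorms $\|u_i\|_B$ are uniformly bounded. If $\langle u_i,u_j\rangle=0$ for every $i\neq j$ (and in particular it exists) and $v\in B$ is such that $\langle u_i,v\rangle$ (exists and) does not depend on $i$, then $\langle u_i,v\rangle=0$ for all $i$.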

Remark 2 There was nothing special about the sequence of intervals $\{1,\dots,N\}$ in the above discussion. In fact, one can define all the averages with respect to an arbitrary sequence of intervals whose lengths tend to infinity, or indeed with respect to any Følner sequence in $\mathbb{N}$.

Next let’s introduce the isometry $S:B\to B$ induced by shifting: $(Sa)(n)=a(n+1)$. It follows easily from the definitions that $\langle Sa,Sb\rangle=\langle a,b\rangle$. Thus if $a\in B$ is such that $\langle S^ha,a\rangle=0$ for all $h\in\mathbb{N}$, then taking $u_i=S^ia$ and $v=1$ (the constant sequence) in Lemma 4 we deduce that $\langle a,1\rangle=0$ (assuming it exists).

This is essentially the van der Corput trick, the only difference being that we do not need to assume that $\langle a,1\rangle$ exists. In other words, from Lemma 4 we quickly obtain the following version of the van der Corput trick:

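Proposition 5 (van der Corput trick) Let $a:\mathbb{N}\to\mathbb{C}$ be bounded and suppose that for every $h\in\mathbb{N}$

$$\lim_{N\to\infty}\frac1N\sum_{n=1}^Na(n+h)\overline{a(n)}=0.\qquad(3)$$

Then

$$\lim_{N\to\infty}\frac1N\sum_{n=1}^Na(n)=0.\qquad(4)$$

One can prove this by passing to a subsequence along which the limit in (4) exists, then using the space $B$ with respect to this averaging scheme as in Remark 2, and applying Lemma 4. Instead we adapt the proof of Lemma 4 to establish the following stronger version directly.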

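Proposition 6 (Uniform van der Corput trick) Let $a:\mathbb{N}\to\mathbb{C}$ be bounded and suppose that for every $h\in\mathbb{N}$

$$\lim_{N\to\infty}\frac1N\sum_{n=1}^Na(n+h)\overline{a(n)}=0.$$

Then

$$\lim_{N\to\infty}\sup_{\theta\in[0,1)}\left|\frac1N\sum_{n=1}^Na(n)e(n\theta)\right|=0,$$

where, as usual, we denote by $e(x)$ the complex number $e^{2\pi ix}$.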

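*Proof:* To make the proof look more natural, given two sequences $a,b:\mathbb{N}\to\mathbb{C}$, for each $N\in\mathbb{N}$ we introduce the notation

$$\langle a,b\rangle_N:=\frac1N\sum_{n=1}^Na(n)\overline{b(n)},\qquad\|a\|_N:=\sqrt{\langle a,a\rangle_N}.$$

We write $S$ for the shift $(Sa)(n)=a(n+1)$ and assume, as we may, that $|a(n)|\le1$ for all $n$. Next let $H=H(N)$ be a function which tends to $\infty$ but grows sufficiently slowly, so that

$$\lim_{N\to\infty}H(N)\max_{1\le h\le H(N)}\big|\langle S^ha,a\rangle_N\big|=0\qquad\text{and}\qquad\lim_{N\to\infty}\frac{H(N)^2}{N}=0.$$

In other words, at scale $N$, the first $H$ shifts of $a$ are almost orthogonal to $a$, and hence to each other. Fix $\theta\in[0,1)$ and let $e_\theta(n):=e(-n\theta)$, so that $\langle a,e_\theta\rangle_N=\frac1N\sum_{n=1}^Na(n)e(n\theta)$. Since $\langle S^ha,e_\theta\rangle_N=e(-h\theta)\langle a,e_\theta\rangle_N+O(h/N)$, we will want to rotate each shift of $a$ accordingly. To this effect, let $u_h:=e(h\theta)S^ha$, noticing that now

$$\langle u_h,e_\theta\rangle_N=\langle a,e_\theta\rangle_N+O(h/N)$$

for every $h\in\mathbb{N}$. Let $w:=u_1+\cdots+u_H$ be the sum of the first $H$ (rotated) shifts of $a$. Using the (almost) orthogonality of the $u_h$ we obtain

$$\|w\|_N^2=\sum_{h=1}^H\|u_h\|_N^2+\sum_{h\neq h'}\langle u_h,u_{h'}\rangle_N=O(H),$$

where the constant implicit in the $O$ notation does not depend on $\theta$. Finally, the Cauchy-Schwarz inequality implies that

$$\big|\langle a,e_\theta\rangle_N\big|\le\frac1H\big|\langle w,e_\theta\rangle_N\big|+O\left(\frac HN\right)\le\frac{\|w\|_N\,\|e_\theta\|_N}{H}+O\left(\frac HN\right)=O\left(\frac1{\sqrt H}\right)+O\left(\frac HN\right).$$

Since the bound on the right-hand side is independent of $\theta$, taking the supremum over all $\theta\in[0,1)$, and then the limit as $N\to\infty$, gives the desired result.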
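
To see the van der Corput trick at work on a concrete sequence, here is a small numerical sketch. The choice $a(n)=e(n^2\alpha)$ with $\alpha=\sqrt2$, and the function names `shifted_correlation` and `uniform_average`, are assumptions made for this illustration only. For this sequence the shifted correlations $\frac1N\sum_{n\le N}a(n+h)\overline{a(n)}$ are averages of $e((2nh+h^2)\alpha)$, which are tiny, and the twisted averages $\frac1N\sum_{n\le N}a(n)e(n\theta)$ are correspondingly small for every $\theta$ in a test grid.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2*pi*i*x), with x reduced mod 1 first for accuracy."""
    return cmath.exp(2j * math.pi * (x % 1.0))


def a(n, alpha):
    """The sequence a(n) = e(n^2 * alpha)."""
    return e(n * n * alpha)


def shifted_correlation(N, h, alpha):
    """(1/N) * sum_{n <= N} a(n + h) * conj(a(n))."""
    total = sum(a(n + h, alpha) * a(n, alpha).conjugate() for n in range(1, N + 1))
    return total / N


def uniform_average(N, alpha, thetas):
    """Max over theta in thetas of |(1/N) * sum_{n <= N} a(n) * e(n * theta)|."""
    values = [a(n, alpha) for n in range(1, N + 1)]
    return max(
        abs(sum(v * e(n * theta) for n, v in enumerate(values, start=1))) / N
        for theta in thetas
    )


if __name__ == "__main__":
    N, alpha = 20000, math.sqrt(2)
    for h in range(1, 6):
        print("h =", h, abs(shifted_correlation(N, h, alpha)))
    print("sup over grid:", uniform_average(N, alpha, [k / 16 for k in range(16)]))
```

The grid of 16 values of $\theta$ only samples the supremum, so this is an illustration of the uniform statement rather than a verification of it.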

Remark 3 There are several versions of the van der Corput trick strictly stronger than Proposition 5, allowing each $a(n)$ to be a vector in a Hilbert space, and weakening the condition (3) to hold not for every $h$ but only on average. One could adapt the proof presented to obtain analogous strengthenings of Proposition 6.

** — 2. Katai’s orthogonality criterion — **

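The orthogonality criterion of Katai can be seen as another application of Lemma 4, as follows. Let’s first pretend that the inner product in $B$ given by (2) is invariant under dilation by a prime, in the sense that $\langle D_pa,D_pb\rangle=\langle a,b\rangle$, where $(D_pa)(n)=a(pn)$. Then letting $f$ be a bounded completely multiplicative function we would have $\langle a,\overline f\rangle=\langle D_pa,D_p\overline f\rangle=\langle f(p)D_pa,\overline f\rangle$ for every prime $p$. In this language, the assumption (1) in Theorem 2 states that $\langle D_pa,D_qa\rangle=0$ for every pair of distinct primes $p\neq q$, so that now Lemma 4 (applied to the vectors $u_p=f(p)D_pa$ and $v=\overline f$) implies that $\langle a,\overline f\rangle=\lim_{N\to\infty}\frac1N\sum_{n=1}^Na(n)f(n)=0$ as desired.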

Of course it is not true that the inner product in (2) is invariant under dilation, since the sequence of intervals $\{1,\dots,N\}$ is an additive Følner sequence but not a multiplicative one. Nevertheless, we can make this strategy work by using the fact that the inner product is invariant under dilation *on average* in a strong sense made precise by the Turán-Kubilius lemma:

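Lemma 7 (Turán-Kubilius) Let $P$ be a finite set of primes and let $F(n)=\frac1S\sum_{p\in P}1_{p\mid n}$, where the normalization factor $S=\sum_{p\in P}\frac1p$ is the sum of the reciprocals of the primes in $P$, so that $F$ has average $1$. Then $\|F-1\|_B\le1/\sqrt S$.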

While this lemma is quite powerful and can look a bit surprising at first, it is just an application of the following simple fact in probability.

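Proposition 8 Let $(X,\mu)$ be a probability space, let $k\in\mathbb{N}$, let $A_1,\dots,A_k$ be independent events in $X$ (not all empty), let $S=\sum_{i=1}^k\mu(A_i)$ and let

$$F=\frac1S\sum_{i=1}^k1_{A_i}.$$

Then $\|F-1\|_{L^2(\mu)}\le1/\sqrt S$.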

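*Proof:* Using the fact that $S=\sum_{i=1}^k\mu(A_i)$ we compute

$$(F-1)^2=\frac1{S^2}\left(\sum_{i=1}^k\big(1_{A_i}-\mu(A_i)\big)\right)^2=\frac1{S^2}\sum_{i,j=1}^k\big(1_{A_i}-\mu(A_i)\big)\big(1_{A_j}-\mu(A_j)\big).$$

Independence implies that $\int_X\big(1_{A_i}-\mu(A_i)\big)\big(1_{A_j}-\mu(A_j)\big)\,d\mu=0$ whenever $i\neq j$. Integrating the previous equation over $X$ we conclude

$$\|F-1\|_{L^2(\mu)}^2=\frac1{S^2}\sum_{i=1}^k\big(\mu(A_i)-\mu(A_i)^2\big)\le\frac1{S^2}\sum_{i=1}^k\mu(A_i)=\frac1S.$$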

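*Proof of Lemma 7:* Apply Proposition 8 with $\mu$ being the normalized counting measure on $\{1,\dots,N\}$, where $N$ is divisible by all the primes in $P$, observing that the Chinese remainder theorem implies that the events $A_p=\{n\le N:p\mid n\}$, $p\in P$, are independent. Since $F$ is periodic with period $\prod_{p\in P}p$, the averages of $|F-1|^2$ over $\{1,\dots,N\}$ converge to the average over one period, which gives the bound on the Besicovitch seminorm.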
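
The computation behind Lemma 7 can be checked exactly on a small example. The sketch below is an illustration under assumptions of my choosing (the set of primes $\{2,3,5,7\}$ and the helper name `turan_kubilius_check` are not from the text). It uses exact rational arithmetic to compare the mean of $(F(n)-1)^2$ over one full period $N=\prod_{p\in P}p$ with the variance $\frac1{S^2}\sum_{p\in P}\frac1p\left(1-\frac1p\right)$ predicted by independence, and with the bound $1/S$.

```python
from fractions import Fraction
from math import prod


def turan_kubilius_check(primes):
    """Return (mean of (F - 1)^2 over one period, predicted variance, 1/S)."""
    S = sum(Fraction(1, p) for p in primes)
    N = prod(primes)  # one full period of F

    def F(n):
        # F(n) = (1/S) * #{p in primes : p divides n}
        return Fraction(sum(1 for p in primes if n % p == 0)) / S

    mean_square = sum((F(n) - 1) ** 2 for n in range(1, N + 1)) / N
    predicted = sum(Fraction(1, p) * (1 - Fraction(1, p)) for p in primes) / S**2
    return mean_square, predicted, 1 / S


if __name__ == "__main__":
    mean_square, predicted, bound = turan_kubilius_check([2, 3, 5, 7])
    print(mean_square, predicted, bound)
```

Because $N$ is a multiple of every prime in the set, the events "$p$ divides $n$" are exactly independent under the counting measure, so the first two returned values agree exactly and not just approximately.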

Lemma 7 will be useful to us when combined with the following straightforward computation.

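Lemma 9 Let $a:\mathbb{N}\to\mathbb{C}$, let $f$ be a completely multiplicative function and let $p$ be a prime. Then for every $N\in\mathbb{N}$

$$\frac1N\sum_{n=1}^N1_{p\mid n}\,a(n)f(n)=\frac{f(p)}N\sum_{n=1}^{\lfloor N/p\rfloor}a(pn)f(n).$$

*Proof:* Writing $n=pm$ and using $f(pm)=f(p)f(m)$,

$$\frac1N\sum_{n=1}^N1_{p\mid n}\,a(n)f(n)=\frac1N\sum_{m=1}^{\lfloor N/p\rfloor}a(pm)f(pm)=\frac{f(p)}N\sum_{m=1}^{\lfloor N/p\rfloor}a(pm)f(m).$$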

We can now apply the method outlined in the first paragraph of this section to prove the Katai orthogonality criterion.

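*Proof of Theorem 2:* Assume without loss of generality that $|a(n)|\le1$ and $|f(n)|\le1$ for all $n$. Fix a finite set of primes $P$, let $S$ be the sum of their reciprocals and let $F(n)=\frac1S\sum_{p\in P}1_{p\mid n}$ be the function from Lemma 7. Using Lemma 7 and the Cauchy-Schwarz inequality we have

$$\limsup_{N\to\infty}\left|\frac1N\sum_{n=1}^Na(n)f(n)-\frac1N\sum_{n=1}^Na(n)f(n)F(n)\right|\le\|af\|_B\,\|F-1\|_B\le\frac1{\sqrt S}.$$

Using Lemma 9 and again the Cauchy-Schwarz inequality, we then get

$$\left|\frac1N\sum_{n=1}^Na(n)f(n)F(n)\right|^2=\left|\frac1N\sum_{n=1}^Nf(n)\,\frac1S\sum_{p\in P}f(p)1_{pn\le N}\,a(pn)\right|^2\le\frac1{S^2}\sum_{p,q\in P}f(p)\overline{f(q)}\,\frac1N\sum_{n=1}^{\lfloor N/\max(p,q)\rfloor}a(pn)\overline{a(qn)}.$$

Finally, using the hypothesis (1), the terms with $p\neq q$ tend to $0$ as $N\to\infty$, while each term with $p=q$ is at most $|f(p)|^2/p\le1/p$, and so we have

$$\limsup_{N\to\infty}\left|\frac1N\sum_{n=1}^Na(n)f(n)F(n)\right|^2\le\frac1{S^2}\sum_{p\in P}\frac1p=\frac1S.$$

Combined with the first estimate, this gives $\limsup_{N\to\infty}\left|\frac1N\sum_{n=1}^Na(n)f(n)\right|\le2/\sqrt S$. Since the sum of the reciprocals of the primes diverges, $S$ can be made arbitrarily large, which implies, after sending $S\to\infty$, that $\lim_{N\to\infty}\frac1N\sum_{n=1}^Na(n)f(n)=0$ as desired.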
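
As a sanity check of the criterion on a classical example, the following sketch takes $a(n)=e(n\alpha)=e^{2\pi in\alpha}$ with $\alpha=\sqrt2$ and $f$ equal to the Liouville function $\lambda$. These choices, and the names `liouville_sieve` and `katai_demo`, are assumptions made for illustration. For this $a$ the hypothesis of the criterion is an average of $e((p-q)n\alpha)$, which tends to $0$ for distinct primes $p,q$ because $\alpha$ is irrational, and the conclusion predicts that $\frac1N\sum_{n\le N}\lambda(n)e(n\alpha)$ is small.

```python
import cmath
import math


def liouville_sieve(limit):
    """Return lam with lam[n] = (-1)**Omega(n) for 1 <= n <= limit."""
    spf = list(range(limit + 1))  # spf[n] will hold the smallest prime factor of n
    for p in range(2, math.isqrt(limit) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [1] * (limit + 1)  # lam[0] is unused
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]  # complete multiplicativity
    return lam


def e(x):
    """e(x) = exp(2*pi*i*x), with x reduced mod 1 first for accuracy."""
    return cmath.exp(2j * math.pi * (x % 1.0))


def katai_demo(N, alpha, p, q):
    """Return (|hypothesis average|, |conclusion average|) for a(n) = e(n * alpha)."""
    lam = liouville_sieve(N)
    # a(pn) * conj(a(qn)) = e((p - q) * n * alpha)
    hypothesis = sum(e((p - q) * n * alpha) for n in range(1, N + 1)) / N
    conclusion = sum(lam[n] * e(n * alpha) for n in range(1, N + 1)) / N
    return abs(hypothesis), abs(conclusion)


if __name__ == "__main__":
    print(katai_demo(50000, math.sqrt(2), 2, 3))
```

Both returned numbers shrink as `N` grows, although a finite computation only illustrates the statement and says nothing about rates.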

Remark 4 One could adapt the proof presented to deal with multiplicative functions (as opposed to completely multiplicative). I chose not to do so in order to keep the main idea clear; a good presentation of the full proof can be found in Tao’s blog post on the subject.

The proof of Theorem 2 was presented in a soft form, with no mention of the parameter . By being somewhat more careful with the proof one could hope to extract some quantitative decay in the correlation between and as , in terms of the decay assumed in (1). For instance, letting be such that

then, under the conditions of Theorem 2, for every and every finite set of primes all smaller than , the proof gives the inequality

Unfortunately, this is not very useful, because even assuming all the time, the term goes to slower than , unless contains primes which are larger than , which would render the Turán-Kubilius lemma useless (as seen by the fact that the second term explodes in this scenario).