## A viewpoint on Katai’s orthogonality criterion

The Liouville function, defined as the completely multiplicative function ${\lambda}$ which sends every prime ${p}$ to ${-1}$, encodes several important properties of the primes. For instance, the statement that

$\displaystyle \frac1N\sum_{n=1}^N\lambda(n)=o(1)$

is equivalent to the prime number theorem, while the improved (and essentially best possible) rate of convergence

$\displaystyle \frac1N\sum_{n=1}^N\lambda(n)=O_\epsilon\left(\frac1{N^{1/2-\epsilon}}\right)$

for every positive ${\epsilon}$ is equivalent to the Riemann hypothesis.

Of particular interest are statements involving correlations (or lack thereof) between ${\lambda}$ and certain “structured” functions. In this direction, Green and Tao established “orthogonality” of ${\lambda}$ with any nilsequence. More precisely, they showed that for any nilpotent Lie group ${G}$, any discrete subgroup ${\Gamma}$ for which ${X:=G/\Gamma}$ is compact, any Lipschitz function ${F:X\rightarrow{\mathbb C}}$, any ${A>0}$ and any ${b\in G}$,

$\displaystyle \frac1N\sum_{n=1}^N\lambda(n)F(b^n\Gamma)=O_A\left(\frac1{\log^AN}\right),$

which was an important ingredient in their proof of the “finite complexity” version of the Dickson/Hardy-Littlewood prime tuple conjecture. Perhaps inspired by this result, Sarnak then conjectured that in fact the Liouville function (actually, the closely related Möbius function) is orthogonal to every deterministic sequence:

Conjecture 1 (Sarnak conjecture for the Liouville function) Let ${X}$ be a compact metric space and let ${T:X\rightarrow X}$ be a homeomorphism. If the topological dynamical system ${(X,T)}$ has ${0}$ topological entropy, then for every continuous ${F\in C(X)}$ and every ${x\in X}$,

$\displaystyle \frac1N\sum_{n=1}^N\lambda(n)F(T^nx)=o(1).$

Sarnak’s conjecture is still open, despite some remarkable progress (see the recent survey of Ferenczi, Kułaga-Przymus and Lemańczyk for information on the progress made so far).

A useful tool to establish partial cases of Sarnak’s conjecture (and other results not implied by the Sarnak conjecture) is the following orthogonality criterion of Katai.

Theorem 2 (Katai’s orthogonality criterion) Let ${u:{\mathbb N}\rightarrow {\mathbb C}}$ be bounded and let ${v:{\mathbb N}\rightarrow S^1}$ be a completely multiplicative function. If for every distinct primes ${h,h'}$

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu(nh)\overline{u(nh')}=0 \ \ \ \ \ (1)$

then

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu(n)\overline{v(n)}=0.$

Remark 1 One can in fact relax the condition on ${v}$ to be only a multiplicative function. This allows the criterion to be applied, for instance, with ${v}$ being the Möbius function.

The statement of Theorem 2 is reminiscent of the van der Corput trick, which is a ubiquitous tool in the fields of uniform distribution and ergodic Ramsey theory, was the protagonist of a survey Vitaly Bergelson and I wrote and has been mentioned repeatedly in this blog. In fact one can prove both results (Katai’s orthogonality criterion and van der Corput’s trick) using the same simple functional analytic principle. In this post I present this principle and use it to motivate proofs for the two theorems.

— 1. The van der Corput trick —

Given ${k}$ orthonormal vectors ${u_1,\dots,u_k}$, Pythagoras’ theorem implies that their sum ${s=u_1+\cdots+u_k}$ has norm ${\|s\|=\sqrt{k}}$. Therefore if ${v}$ is another vector whose correlation ${\langle u_i,v\rangle}$ with each ${u_i}$ is a constant ${c}$ independent of ${i}$, then by the Cauchy-Schwarz inequality,

$\displaystyle \sqrt{k}\|v\|\geq\left|\left\langle s,v\right\rangle\right|= k|c|$

which implies that ${|c|\leq\|v\|/\sqrt{k}}$.

Taking ${k\rightarrow\infty}$ we obtain the following corollary of Bessel’s inequality:

Proposition 3 Let ${u_1,u_2,\dots}$ be pairwise orthogonal unit vectors in a Hilbert space ${H}$. If ${v\in H}$ is such that ${|\langle u_i,v\rangle|}$ does not depend on ${i}$, then ${\langle u_i,v\rangle=0}$ for all ${i}$.

Proof: Let ${c}$ be the common value ${c=|\langle u_i,v\rangle|}$ and, for each ${i\in{\mathbb N}}$, let ${v_i=\overline{\langle u_i,v\rangle}u_i}$. Then ${\langle v_i,v\rangle=c^2}$ for all ${i\in{\mathbb N}}$ and hence by the Pythagoras theorem and the Cauchy-Schwarz inequality, ${c^2<\|v\|/\sqrt{n}}$ for every ${n\in{\mathbb N}}$. This implies that ${c=0}$ as desired. $\Box$

It is possible to extend the above proposition to a setting where one does not have a genuine Hilbert space. Given a sequence ${u(n)}$ of complex numbers, define its Besicovitch seminorm by

$\displaystyle \|u\|:=\limsup_{N\rightarrow\infty}\sqrt{\frac1N\sum_{n=1}^N\big|u(n)\big|^2}.$

and let ${L^2({\mathbb N})}$ denote the space of all sequences ${u:{\mathbb N}\rightarrow{\mathbb C}}$ with ${\|u\|<\infty}$. The Minkowski inequality implies that ${L^2({\mathbb N})}$ is a vector space. We can (partially) define an inner product in ${L^2({\mathbb N})}$ via the formula

$\displaystyle \langle u,v\rangle:=\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu(n)\overline{v(n)}\ \ \ \ \ (2)$

whenever the limit exists. It is clear that if ${\langle u,u\rangle}$ exists, then it must equal ${\|u\|^2}$. Moreover, it is easy to check that both the Pythagoras theorem and the Cauchy-Schwarz inequality hold in this space, assuming that all the inner products are well defined. Therefore, the proof of Proposition 3 holds for vectors in ${L^2({\mathbb N})}$ and we obtain

Lemma 4 Let ${u_1,u_2,\dots}$ be elements of ${L^2({\mathbb N})}$. If ${\langle u_i,u_j\rangle=0}$ for every ${i\neq j}$ (and in particular it exists) and ${v\in L^2({\mathbb N})}$ is such that ${|\langle v,u_i\rangle|}$ (exists and) does not depend on ${i}$, then ${\langle v,u_i\rangle=0}$ for all ${i}$.

Remark 2 There was nothing special about the sequence of intervals ${\{1,\dots,N\}}$ in the above discussion. In fact, one can define all the averages with respect to an arbitrary sequence of intervals whose lengths tend to infinity, or indeed with respect to any Følner sequence in ${({\mathbb N},+)}$.

Next let’s introduce the isometry ${T:L^2({\mathbb N})\rightarrow L^2({\mathbb N})}$ induced by shifting: ${Tu:n\mapsto u(n+1)}$. It follows easily from the definitions that ${\langle Tu,Tv\rangle=\langle u,v\rangle}$. Thus if ${u\in L^2({\mathbb N})}$ is such that ${\langle T^nu,u\rangle=0}$ for all ${n}$, then taking ${u_i=T^iu}$ and ${v=1}$ in Lemma 4 we deduce that ${\langle u,1\rangle=0}$ (assuming it exists).

This is essentially the van der Corput trick, the only difference being that we do not need to assume that ${\langle u,1\rangle}$ exists. In other words, from Lemma 4 we quickly obtain the following version of the van der Corput trick:

Proposition 5 (van der Corput trick) Let ${u\in L^2({\mathbb N})}$ and suppose that for every ${h\in{\mathbb N}}$

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu(n+h)\overline{u(n)}=0. \ \ \ \ \ (3)$

Then

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu(n)=0. \ \ \ \ \ (4)$

One can prove this by passing to a subsequence ${(N_k)}$ along which the limit in (4) exists and then using the space ${L^2}$ with respect to this averaging scheme as in Remark 2 and apply Lemma 4. Instead we adapt the proof of Lemma 4 to establish the following stronger version directly.

Proposition 6 (Uniform van der Corput trick) Let ${u\in L^2({\mathbb N})}$ and suppose that for every ${h\in{\mathbb N}}$

$\displaystyle \lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu(n+h)\overline{u(n)}=0.$

Then

$\displaystyle \lim_{N\rightarrow\infty}\sup_{\theta\in{\mathbb R}}\left|\frac1N\sum_{n=1}^Nu(n)e(-\theta n)\right|=0.$

where, as usual, we denote by ${e(\alpha)}$ the complex number ${e^{2\pi i\alpha}}$.

Proof: To make the proof look more natural, given two sequences ${v,w:{\mathbb N}\rightarrow{\mathbb C}}$, for each ${N\in{\mathbb N}}$ we introduce the notation

$\displaystyle \langle u,v\rangle_N:=\frac1N\sum_{n=1}^Nu(n)\overline{v(n)}\qquad\|u\|_N:=\sqrt{\langle u,u\rangle_N}$

Next let ${H=H(N)}$ be a function which tends to ${\infty}$ but grows sufficiently slow, so that

$\displaystyle \max_{1\leq h\leq H}\big| \langle T^hu,u\rangle_N\big|=o(1)$

In other words, at scale ${N}$, the first ${H}$ shifts of ${u}$ are orthogonal to ${u}$, and hence to each other. Since ${\langle T^hu,e(\theta n)\rangle_N=e(\theta h)\langle u,e(\theta n)\rangle_N+o(1)}$, we will want to rotate each shift of ${u}$ accordingly. To this effect, let ${v_{h,\theta}(n)=e(-h\theta)u(n+h)}$, noticing that now

$\displaystyle \big\langle v_{h,\theta},e(n\theta)\big\rangle_N=\big\langle u,e(n\theta)\big\rangle_N+o(1)$

for every ${h\leq H}$. Let ${u_{H,\theta}:=v_{1,\theta}+\cdots+v_{H,\theta}}$ be the sum of the first ${H}$ (rotated) shifts of ${u}$. Using the orthogonality of the ${v_{h,\theta}}$ we obtain

$\displaystyle \|u_{H,\theta}\|^2_N=\sum_{i,j=1}^H\langle v_{i,\theta},v_{j,\theta}\rangle_N=\sum_{h=1}^H\|v_{j,\theta}\|^2+o(1)=H\|u\|_N^2+o(1)$

where the constant implicit in the ${o(1)}$ notation does not depend on ${\theta}$. Finally, the Cauchy-Schwarz inequality implies that

$\displaystyle \Big|\big\langle u,e(n\theta)\big\rangle_N\Big|=\frac1H\Big|\big\langle u_{H,\theta},e(n\theta)\big\rangle_N\Big|+o(1)\leq\frac1H\sqrt{H}\|u\|_N+o(1)$

Since the ${o(1)}$ term is independent of ${\theta}$, taking the supremum over all ${\theta\in{\mathbb R}}$, and then the limit as ${N\rightarrow\infty}$ gives the desired result. $\Box$

Remark 3 There are several versions of the van der Corput trick strictly stronger than Proposition 5, allowing each ${u(n)}$ to be a vector in a Hilbert space, and weakening the condition (3) to hold not for every ${h}$ but only on average. One could adapt the proof presented to obtain analogous strengthenings of Proposition 6.

— 2. Katai’s orthogonality criterion —

The orthogonality criterion of Katai can be seen as another application of Lemma 4, as follows. Let’s first pretend that the inner product in ${L^2({\mathbb N})}$ given by (2) is invariant under dilation by a prime, in the sense that ${\langle u,v\rangle=\langle D^hu,D^hv\rangle}$, where ${D^hu:n\mapsto u(hn)}$. Then letting ${v:{\mathbb N}\rightarrow S^1}$ be a completely multiplicative function we would have ${\big|\langle D^hu,v\rangle\big|=\big|\langle u,v\rangle\big|}$ for every prime ${h}$. In this language, the assumption (1) in Theorem 2 states that ${\langle D^hu,D^{h'}u\rangle=0}$ for every ${h\neq h'}$, so that now Lemma 4 implies that ${\langle u,v\rangle=0}$ as desired.

Of course it is not true that the inner product in (2) is invariant under dilation, since the sequence of intervals ${\{1,\dots,N\}}$ is an additive Følner sequence but not a multiplicative one. Nevertheless, we can make this strategy work by using the fact that the inner product is invariant under dilation on average in a strong sense made precise by the Turán-Kubilius lemma:

Lemma 7 (Turán-Kubilius) Let ${H\subset\mathbb{P}}$ be a finite set of primes and let ${f(n)=\frac1{A}\sum_{h\in H}^H1_{h{\mathbb Z}}(n)}$, where the normalization factor ${A:=\sum_{h=1}^H1/h}$ is the sum of the reciprocals of the primes in ${H}$, so that ${f}$ has average ${1}$. Then ${\|f-1\|\leq\frac1{\sqrt{A}}}$.

While this lemma is quite powerful and can look a bit surprising at first, it is just an application of the following simple fact in probability.

Proposition 8 Let ${(X,\mu)}$ be a probability space, let ${H\in{\mathbb N}}$, let ${E_1,\dots,E_h}$ be independent events in ${X}$ (not all empty), let ${A=\mu(E_1)+\cdots+\mu(E_H)}$ and let

$\displaystyle f=\frac1A\sum_{h=1}^H1_{E_h}$

Then ${\|f-1\|_{L^2(\mu)}<1/\sqrt{A}}$.

Proof: Using the fact that ${1_E1_F=1_{E\cap F}}$ we compute

$\displaystyle |f-1|^2=f^2-2f+1=\frac1{A^2}\sum_{h,h'=1}^H1_{E_h\cap E_{h'}}-\frac2H\sum_{h=1}^H1_{E_h}+1$

Independence implies that ${\int_X 1_{E_h\cap E_{h'}}~ d\mu=\mu(E_h\cap E_{h'})=\mu(E_h)\mu(E_{h'})}$ whenever ${h\neq h'}$. Integrating the previous equation over ${X}$ we conclude

$\displaystyle \begin{array}{rcl} \displaystyle\|f-1\|_{L^2(\mu)}^2 &=& \displaystyle\frac1{A^2}\left[\sum_{h=1}^H\mu(E_h)+\sum_{h\neq h'}\mu(E_h)\mu(E_{h'})\right]-\frac2H\sum_{h=1}^H\mu(E_h)+1 \\&\leq& \displaystyle \frac1A \end{array}$

$\Box$

Proof of Lemma 7: Apply Proposition 8 with ${\mu}$ being the normalized counting measure on ${X:=\{1,\dots,N\}}$, where ${N}$ is divisible by all the primes in ${H}$, observing that the Chinese remainder theorem implies that the events ${E_h:=1_{h{\mathbb Z}}\cap X}$ are independent. $\Box$

Lemma 7 will be useful to us when combined with the following straightforward computation.

Lemma 9 Let ${u\in L^2({\mathbb N})}$, let ${v:{\mathbb N}\rightarrow S^1}$ be a completely multiplicative function and let ${h}$ be a prime. Then

$\displaystyle \langle u1_{h{\mathbb Z}},v\rangle=\frac{\overline{v(h)}}h\langle D^hu,v\rangle$

Proof:

$\displaystyle \begin{array}{rcl} \displaystyle\langle u1_{h{\mathbb Z}},v\rangle &=& \displaystyle\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^Nu(n)1_{h{\mathbb Z}}(n)\overline{v(n)} \\&=& \displaystyle\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N/h}u(nh)\overline{v(nh)} \\&=& \displaystyle\frac{\overline{v(h)}}h\lim_{N\rightarrow\infty}\frac1N\sum_{n=1}^{N}D^hu(n)\overline{v(n)} \\&=& \displaystyle\frac{\overline{v(h)}}h\langle D^hu,v\rangle \end{array}$

$\Box$

We can now apply the method outlined in the first paragraph of this section to prove the Katai orthogonality criterion.

Proof of Theorem 2: Fix a finite set ${H}$ of primes and let ${A}$ be the sum of their reciprocals. Using Lemma 7 and the Cauchy-Schwarz inequality we have

$\displaystyle \langle u,v\rangle = \frac1A\sum_{h\in H}\langle u1_{h{\mathbb Z}},v\rangle+o_{A\rightarrow\infty}(1)$

Using Lemma 9 and again the Cauchy-Schwarz inequality, we then get

$\displaystyle \langle u,v\rangle = \frac1A\sum_{h\in H}\frac{\overline{v(h)}}h\langle D^hu,v\rangle+o_{A\rightarrow\infty}(1)\leq \left\| \frac1A\sum_{h\in H}\frac{\overline{v(h)}}hD^hu\right\|+o_{A\rightarrow\infty}(1)$

Finally, using the hypothesis (1) we have

$\displaystyle \left\|\sum_{h\in H}\frac{\overline{v(h)}}hD^hu\right\|^2=\sum_{h,h'\in H}\frac{\overline{v(h)}v(h')}{hh'}\langle D^hu,D^{h'}u\rangle=\sum_{h\in H}\frac{|v(h)|^2}{h^2}\|D^hu\|^2=O_{A\rightarrow\infty}(1)$

which implies, after sending ${A\rightarrow\infty}$, that ${\langle u,v\rangle=0}$ as desired. $\Box$

Remark 4 One could adapt the proof presented to deal with multiplicative functions (as opposed to completely multiplicative). I chose not to do so to keep the main idea more clear, one can find a good presentation of the full proof in Tao’s blog post on the subject.

The proof of Theorem 2 was presented in a soft form, with no mention of the parameter ${N}$. By being somewhat more careful with the proof one could hope to extract some quantitative decay in the correlation between ${u}$ and ${v}$ as ${N\rightarrow\infty}$, in terms of the decay assumed in (1). For instance, letting ${\phi_M(N)}$ be such that

$\displaystyle \max_{1\leq h,h'\leq M}~\max_{N'\geq N}~\left|\frac1{N'}\sum_{n=1}^{N'}u(hn)\overline{u(h'n)}\right|\leq\phi_M(N),$

then, under the conditions of Theorem 2, for every ${N\in{\mathbb N}}$ and every finite set ${H}$ of primes all smaller than ${M}$, the proof gives the inequality

$\displaystyle \frac1N\sum_{n=1}^Nu(n)\overline{v(n)}\leq\frac1A+\frac1N\prod_{h\in H}h+\sqrt{\frac1A+\phi_M(N/M)}$

Unfortunately, this is not very useful, because even assuming ${\phi_M(N)=0}$ all the time, the term ${1/A}$ goes to ${0}$ slower than ${1/\log\log N}$, unless ${H}$ contains primes which are larger than ${N}$, which would render the Turán-Kubilius lemma useless (as seen by the fact that the second term explodes in this scenario).

This entry was posted in Classic results, Number Theory, Tool and tagged , , , , . Bookmark the permalink.