## On different ways to take limits

One of the most fundamental notions in modern mathematics is the notion of limit. To define the limit of a sequence, the only structure needed is a topology (Hausdorff, if we want the limit to be unique). One definition of the limit of a sequence is:

Definition 1 (Limit) Let ${\{x_n\}_{n\in {\mathbb N}}}$ be a sequence in a Hausdorff space. Then we say that the limit of the sequence is ${x}$ if for any open neighborhood ${U}$ of ${x}$ there is a finite set ${A\subset {\mathbb N}}$ such that ${x_n\in U}$ for all ${n\notin A}$.

This may not be exactly the definition used most often, but it is clearly equivalent. One issue that arises with the limit is that it may not exist, so any time one wants to say something about the limit of a sequence, one has to prove first that the limit exists. Several classical techniques are used to do this; for instance, for a sequence of real numbers, both the ${\limsup}$ and the ${\liminf}$ of the sequence are always defined, and they coincide if and only if the sequence has a limit. Another crucial idea, for sequences in complete metric spaces, is that one can tell whether a sequence is convergent using only information intrinsic to the sequence, namely the property of being a Cauchy sequence.
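
To make the ${\limsup}$/${\liminf}$ criterion concrete, here is a small numerical sketch (my addition, in Python; the truncation-based quantities are only finite approximations of the actual limit superior and inferior):

```python
# Approximate limsup/liminf of a real sequence by inspecting the tail of a
# long finite truncation. For a convergent sequence the two approximations
# nearly agree; for (-1)^n they stay 2 apart.

def approx_limsup(xs, tail=1000):
    return max(xs[-tail:])

def approx_liminf(xs, tail=1000):
    return min(xs[-tail:])

N = 100_000
convergent = [1 / n for n in range(1, N + 1)]       # converges to 0
oscillating = [(-1) ** n for n in range(1, N + 1)]  # has no limit

assert abs(approx_limsup(convergent) - approx_liminf(convergent)) < 1e-3
assert approx_limsup(oscillating) - approx_liminf(oscillating) == 2
```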

However, sometimes having a limit as defined above is too much to ask of a sequence. One weaker requirement that has, perhaps surprisingly, many useful applications is that some subsequence converges. This is, of course, always the case if the space is compact. Another way to weaken the notion of limit is to consider weaker topologies (with fewer open sets). This has been used extensively in applied functional analysis and in the study of differential equations.

We now turn our attention to sequences taking values in normed vector spaces (over ${{\mathbb C}}$). If ${\displaystyle\lim_{n\rightarrow \infty} x_n=x}$ then it is easy to conclude that ${\displaystyle\lim_{n\rightarrow \infty} \mathop{\mathbb E}_{k\in [n]} x_k=x}$, where ${\mathop{\mathbb E}_{k\in [n]}}$ denotes the average ${\frac1n\sum_{k=1}^n}$. If ${\{x_n\}}$ satisfies the second condition, we say that ${x}$ is the Cesàro limit of the sequence. A sequence can be Cesàro convergent without being convergent in the usual sense: for instance, ${x_n=(-1)^n}$ is clearly not convergent, but it converges in the Cesàro sense to ${0}$. This weaker notion of limit is used, for instance, in Fourier analysis. A perhaps more famous example is the Law of Large Numbers, which we can state as: let ${X_1,X_2,...}$ be independent and identically distributed random variables on some probability space ${\Omega}$, with expectation ${m}$ and bounded variance. Then for almost all ${\omega\in \Omega}$ we have that ${X_n(\omega)}$ converges to ${m}$ in the Cesàro sense. This illustrates the fact that with Cesàro limits one can find some order in random (or chaotic) behavior in the long run.
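
As a quick numerical illustration (a sketch I am adding, not from the original post), one can watch the Cesàro averages of ${x_n=(-1)^n}$ collapse to ${0}$:

```python
# Cesàro averages of the non-convergent sequence x_n = (-1)^n tend to 0.

def cesaro_average(x, n):
    """Return the average (1/n) * sum_{k=1}^{n} x(k)."""
    return sum(x(k) for k in range(1, n + 1)) / n

def alternating(n):
    return (-1) ** n

# The averages equal 0 for even n and -1/n for odd n, so they tend to 0.
for n in (10, 100, 1000):
    assert abs(cesaro_average(alternating, n)) <= 1 / n
```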

We now give a definition of Cesàro convergence, which is clearly equivalent to the one discussed above.

Definition 2 (Cesàro limit) Let ${\{x_n\}}$ be a sequence in a normed vector space. We say that ${x_n}$ converges in the Cesàro sense to ${x}$ if ${\displaystyle \lim_{n\rightarrow \infty}\mathop{\mathbb E}_{k\in [n]}(x_k-x)=0}$.

This definition was written in this form to suggest a slight strengthening, which we will call strong Cesàro convergence:

Definition 3 (Strong Cesàro limit) Let ${\{x_n\}}$ be a sequence in a normed vector space. We say that ${x_n}$ converges in the strong Cesàro sense to ${x}$ if ${\displaystyle \lim_{n\rightarrow \infty}\mathop{\mathbb E}_{k\in [n]}|x_k-x|=0}$.

It’s easy to see that the sequence ${\{(-1)^n\}}$, which converges in the Cesàro sense to ${0}$, does not converge in the strong Cesàro sense, so this is indeed a stronger notion.
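
The contrast between Definitions 2 and 3 can be checked numerically; the following is an illustrative sketch of my own (the helper `avg` is just a convenience name):

```python
# For x_n = (-1)^n and candidate limit x = 0: the plain Cesàro averages of
# (x_k - 0) vanish, but the averages of |x_k - 0| are identically 1.

def avg(values):
    return sum(values) / len(values)

n = 1000
xs = [(-1) ** k for k in range(1, n + 1)]

cesaro = avg([xk - 0 for xk in xs])       # tends to 0 as n grows
strong = avg([abs(xk - 0) for xk in xs])  # equals 1 for every n

assert abs(cesaro) <= 1 / n
assert strong == 1.0
```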

In a Hilbert space ${H}$, let ${U}$ be a unitary operator with no fixed points (i.e. ${Ux\neq x}$ for all ${x\in H\setminus\{0\}}$). Then for all ${x\in H}$ we have that ${U^nx}$ converges to ${0}$ in the (regular) Cesàro sense (and clearly if this holds then ${U}$ fixes no point). This result is called the (mean) ergodic theorem, and so such an operator is called ergodic. The name comes from the fact that if ${(X,{\cal B},\mu,T)}$ is a probability preserving system (meaning that ${\mu(X)=1}$ and for all ${A\in{\cal B}}$ we have ${\mu(T^{-1}A)=\mu(A)}$), then the Koopman operator (i.e. the operator ${U:L^2(X,\mu)\rightarrow L^2(X,\mu)}$ defined by ${Uf(x)=f(Tx)}$; note that ${U}$ is unitary since ${T}$ preserves the measure) is ergodic on ${L^2_0(X,\mu)}$ (the set of functions ${f\in L^2}$ such that ${\int f=0}$) if and only if the system is ergodic, in the classical sense that ${T^{-1}A=A}$ implies ${\mu(A)=0}$ or ${\mu(A)=1}$.
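
Here is a minimal numerical sketch of the mean ergodic theorem (a toy example of my own, taking ${H={\mathbb C}}$ and ${U}$ to be multiplication by ${e^{2\pi i\alpha}}$ with ${\alpha}$ irrational, so ${U}$ fixes no nonzero vector):

```python
import cmath

# The unitary U x = e^(2 pi i alpha) x on H = C, alpha irrational, has no
# fixed points; its Cesàro averages (1/n) sum U^k x tend to 0.

alpha = 2 ** 0.5 - 1                  # an irrational rotation number
u = cmath.exp(2j * cmath.pi * alpha)  # U acts by multiplication by u

def cesaro_orbit_average(x, n):
    """(1/n) * sum_{k=1}^{n} U^k x."""
    return sum(u ** k * x for k in range(1, n + 1)) / n

# Geometric series bound: |average| <= 2 / (n * |1 - u|), hence it tends to 0.
for n in (10, 100, 1000):
    assert abs(cesaro_orbit_average(1.0, n)) <= 2 / (n * abs(1 - u))
```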

Now assume that ${U}$ has no eigenvectors (equivalently, no nontrivial finite dimensional invariant subspace; in particular, no fixed points). This is equivalent to ${\langle U^nx,y\rangle}$ converging to ${0}$ in the strong Cesàro sense for all ${x,y\in H}$ (note how two weaker notions of convergence are combined here: if ${\lim \langle x_n,y\rangle=0}$ for all ${y\in H}$ then ${x_n\rightarrow 0}$ in the weak topology; here we require that to happen only in the strong Cesàro sense). Such an operator is called weakly mixing, again in analogy with the ergodic theory definition.

Furthermore, given a unitary operator ${U}$ on the Hilbert space ${H}$, one can decompose the space as ${H=H_{fix}\oplus H_{erg}}$, where ${H_{fix}:=\{x\in H: Ux=x\}}$ and its orthogonal complement is ${\displaystyle H_{erg}:=\{x\in H: \lim_{n\rightarrow \infty}U^nx= 0 \text{ in the Ces\`aro sense}\}}$ (and thus ${U}$ is ergodic if ${H=H_{erg}}$). Another decomposition is ${H=H_{c}\oplus H_{WM}}$, where ${H_{c}}$ is the closed subspace spanned by the eigenvectors ${\{x\in H:Ux=\lambda x \text{ for some }\lambda\in {\mathbb C}\}}$ (${c}$ stands for compact, as ${\overline{\{U^n x: n\in {\mathbb Z}\}}}$ is compact for ${x\in H_c}$) and its orthogonal complement is ${\displaystyle H_{WM}:=\{x\in H: \forall y\in H, \ \lim_{n\rightarrow \infty}\langle U^nx,y\rangle=0\text{ in the strong Ces\`aro sense}\}}$.
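
The decomposition ${H=H_{fix}\oplus H_{erg}}$ can be watched numerically in the toy case ${H={\mathbb C}^2}$ with ${U=\mathrm{diag}(1, e^{2\pi i\alpha})}$ (an example of my own choosing, not from the post): the Cesàro averages of ${U^nx}$ converge to the orthogonal projection of ${x}$ onto ${H_{fix}}$.

```python
import cmath

# U = diag(1, e^(2 pi i alpha)) on C^2, alpha irrational. H_fix is the first
# coordinate axis; the Cesàro averages of U^n x project onto it.

alpha = 3 ** 0.5 - 1
rot = cmath.exp(2j * cmath.pi * alpha)

def apply_U(v):
    """Identity on the first coordinate, rotation on the second."""
    return (v[0], rot * v[1])

def cesaro_average(v, n):
    """(1/n) * sum_{k=1}^{n} U^k v, computed coordinate-wise."""
    total = [0, 0]
    w = v
    for _ in range(n):
        w = apply_U(w)
        total[0] += w[0]
        total[1] += w[1]
    return (total[0] / n, total[1] / n)

avg = cesaro_average((1.0, 1.0), 5000)
assert abs(avg[0] - 1.0) < 1e-9   # the H_fix component survives
assert abs(avg[1]) < 0.01         # the H_erg component averages out
```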

These decompositions are simple examples of the fruitful idea of separating a system into a structured component (such as ${H_{fix}}$ or ${H_c}$) and a noisy (or pseudo-random) component (such as ${H_{erg}}$ or ${H_{WM}}$). In the long run the noisy component cancels out and becomes negligible, and the behavior of the structured component, which is easier to control and predict, becomes dominant.

Another nice feature of strong Cesàro convergence is that if a sequence converges in the strong Cesàro sense, then it also converges in density, as defined below:

Definition 4 (Convergence in density) Let ${\{x_n\}}$ be a sequence in a Hausdorff space. Then we say that the sequence converges to ${x}$ in density if for every open neighborhood ${U}$ of ${x}$ there exists a subset ${A\subset {\mathbb N}}$ of ${0}$ density such that ${x_n\in U}$ for all ${n\notin A}$.

Here ${A}$ having ${0}$ density means that ${\displaystyle\overline{d}(A):=\limsup_{n\rightarrow \infty}\frac1n|A\cap[1,n]|=0}$.

Compare this definition with the first definition of the (usual) limit. Notice that this notion is independent of (regular) Cesàro convergence, in the sense that ${\{(-1)^n\}}$ converges in the Cesàro sense but not in density, while the sequence ${\{x_n\}}$ defined by ${x_n=n}$ if ${n=2^k}$ for some ${k}$ and ${x_n=0}$ otherwise converges in density to ${0}$ but not in the Cesàro sense.
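
The second sequence in this comparison is easy to probe numerically; the following sketch (my addition) checks that the exceptional set of powers of ${2}$ has small density while the Cesàro averages stay bounded away from ${0}$:

```python
# x_n = n when n is a power of 2, and x_n = 0 otherwise. The exceptional set
# {2^k} has zero density, so x_n -> 0 in density, yet the Cesàro averages do
# not tend to 0.

def x(n):
    # n is a power of 2 exactly when n & (n - 1) == 0 (for n >= 1)
    return n if n & (n - 1) == 0 else 0

def exceptional_density(n):
    """Proportion of indices k in [1, n] with x_k != 0."""
    return sum(1 for k in range(1, n + 1) if x(k) != 0) / n

def cesaro(n):
    return sum(x(k) for k in range(1, n + 1)) / n

n = 2 ** 12
assert exceptional_density(n) < 0.01   # zero-density exceptional set
assert cesaro(n) > 1                   # averages stay bounded away from 0
```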

I will now prove the claimed relation between the strong Cesàro limit and the limit in density.

Proposition 5 Let ${\{x_n\}}$ converge in the strong Cesàro sense to ${x}$. Then there is a subset ${S\subset {\mathbb N}}$ of ${0}$ density such that ${\displaystyle\lim_{n\notin S}x_n=x}$.

For ${\epsilon>0}$, the set ${S_\epsilon:=\{n\in {\mathbb N}:|x_n-x|>\epsilon\}}$ has ${0}$ density. Indeed, suppose ${\overline{d}(S_\epsilon)>2\delta>0}$. Then there is some sequence ${\{n_k\}_k}$ increasing to ${\infty}$ with ${|S_\epsilon\cap[1,n_k]|>\delta n_k}$ for large enough ${k}$, and since each term indexed by ${S_\epsilon}$ exceeds ${\epsilon}$, ${\displaystyle\frac1{n_k}\sum_{i=1}^{n_k}|x_i-x|\geq \frac1{n_k}\sum_{\substack{i\in S_\epsilon\\i\leq n_k}}|x_i-x|>\delta\epsilon}$, contradicting the strong Cesàro convergence.

Let ${N_0=0}$ and for each ${k\in {\mathbb N}}$ let ${N_k>N_{k-1}}$ be such that if ${N\geq N_k}$ then ${\displaystyle\frac1N|S_{\frac1{k+1}}\cap[1,N]|<\frac1k}$. Then let ${\displaystyle S=\bigcup_{k=1}^\infty S_{1/k}\cap[N_{k-1},N_k]}$. We claim that ${S}$ satisfies the required conditions. Indeed ${S\cap[1,N_{k+1}]\subset S_{1/(k+1)}\cap[1,N_{k+1}]}$, since for ${j\leq k+1}$ we have ${S_{1/j}\subset S_{1/(k+1)}}$. Therefore, if ${N\in [N_k,N_{k+1}]}$,

$\displaystyle \frac1N|S\cap[1,N]|\leq \frac1N|S_{\frac1{k+1}}\cap[1,N]|<\frac1k$

so ${\overline d(S)=0}$. Also, if ${n>N_k}$ and ${n\notin S}$, then ${n\notin S_{1/j}}$ for the ${j>k}$ with ${n\in(N_{j-1},N_j]}$, so ${|x_n-x|\leq\frac1j<\frac1k}$. Hence ${\displaystyle\lim_{n\notin S}x_n=x}$. $\Box$
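
The first step of the proof (that each ${S_\epsilon}$ has zero density) can be sanity-checked numerically; this sketch (my addition) uses the sequence with ${x_n=1}$ on powers of ${2}$ and ${x_n=0}$ otherwise, which converges to ${0}$ in the strong Cesàro sense:

```python
# For a sequence converging to x in the strong Cesàro sense, every level set
# S_eps = {n : |x_n - x| > eps} must have zero density. We check both facts
# numerically for the indicator of the powers of 2.

def x(n):
    return 1 if n & (n - 1) == 0 else 0   # indicator of the powers of 2

def strong_cesaro_avg(n, limit=0):
    """(1/n) * sum_{k=1}^{n} |x_k - limit|."""
    return sum(abs(x(k) - limit) for k in range(1, n + 1)) / n

def level_set_density(eps, n, limit=0):
    """Proportion of [1, n] lying in S_eps."""
    return sum(1 for k in range(1, n + 1) if abs(x(k) - limit) > eps) / n

n = 2 ** 14
assert strong_cesaro_avg(n) < 0.001        # strong Cesàro averages -> 0
assert level_set_density(0.5, n) < 0.001   # and S_{1/2} has small density
```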

Looking at Definitions 1 and 4, we can formulate a general notion of limit by saying that ${x_n\rightarrow x}$ if for every open neighborhood ${U}$ of ${x}$ there is a co-null subset ${A}$ of ${{\mathbb N}}$ such that ${x_n\in U}$ for all ${n\in A}$. In my previous post on co-null sets I didn’t discuss co-null sets in ${{\mathbb N}}$; however, it is quite reasonable to classify finite sets and sets of ${0}$ density as null, since the union of finitely many sets in either of those collections is still in that collection, and because ${{\mathbb N}}$ is a countable set we can’t expect to do better than finite unions.

A somewhat artificial way to introduce a notion of co-null sets in ${{\mathbb N}}$ is through the use of a non-principal ultrafilter, i.e. a collection ${p}$ of subsets of ${{\mathbb N}}$ (which will be our co-null sets) satisfying the properties one would expect co-null sets to satisfy, namely:

1. ${\emptyset\notin p}$,
2. If ${A\in p}$ and ${A\subset B}$ then ${B\in p}$,
3. If ${A}$ and ${B}$ are in ${p}$, then ${A\cap B}$ is also in ${p}$,

and two more properties to ensure we don’t run into trivial issues:

4. If ${A\notin p}$ then ${{\mathbb N}\setminus A\in p}$,
5. No finite set is in ${p}$.

A collection ${p}$ satisfying conditions ${1}$ to ${4}$ is called simply an ultrafilter. It’s easy to see that, for each ${n\in {\mathbb N}}$, the family of all sets containing ${n}$ is an ultrafilter; such ultrafilters are called principal. Condition ${5}$ prevents exactly that from happening, so if ${p}$ also satisfies condition ${5}$ it is called a non-principal ultrafilter. The existence of such a ${p}$ requires the axiom of choice.

Definition 6 (p-limit) Let ${p}$ be an ultrafilter and ${X}$ a Hausdorff space. Given a sequence ${\{x_n\}}$ we define its limit along ${p}$ by ${\displaystyle p\text{-}\lim_{n\rightarrow \infty} x_n=x}$ if for every neighborhood ${U}$ of ${x}$ there is some ${A\in p}$ such that ${x_n\in U}$ for all ${n\in A}$.

One nice feature of this notion is that sequences in compact spaces always have a limit along ${p}$, without the need to pass to a subsequence.
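
A short argument for this claim (a sketch I am adding here; it is not spelled out in the post) uses only properties 1, 3 and 4 above together with compactness:

```latex
% Claim: if $X$ is compact Hausdorff and $p$ is an ultrafilter, then every
% sequence $\{x_n\}$ in $X$ has a (unique) $p$-limit.
%
% Sketch: suppose no $x \in X$ is a $p$-limit. Then each $x$ has an open
% neighborhood $U_x$ with $A_x := \{n : x_n \in U_x\} \notin p$. By
% compactness, finitely many of these neighborhoods cover $X$:
\[
  X = U_{x_1} \cup \dots \cup U_{x_m}
  \quad\Longrightarrow\quad
  \mathbb{N} = A_{x_1} \cup \dots \cup A_{x_m}.
\]
% By property 4 each $\mathbb{N} \setminus A_{x_i}$ lies in $p$, so by
% property 3 their intersection lies in $p$; but that intersection is
% $\mathbb{N} \setminus (A_{x_1} \cup \dots \cup A_{x_m}) = \emptyset$,
% contradicting property 1. Uniqueness follows since $X$ is Hausdorff.
```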

This entry was posted in Classic results.

### 5 Responses to On different ways to take limits

1. Jon Lam says:

Hi, Joel. Do you think one could replace N in Def. 1, 2, 3, 4 by a more general space? I think at least we have Def. 1 for general top. space X, with A compact set. Cesaro sum looks like normalized integral to me, with Dirac measure in your particular case.
With all these, do we still have Prop. 5(with appropriate definition of density) ?

2. Joel Moreira says:

One can indeed define limits for a function $x(n)$ with domain in a general topological space (instead of just functions with support in $\mathbb{N}$, which are sequences) when $n$ goes to infinity (and now $n$ can be in any topological space) replacing in definition 1 the condition on A to be a compact set.

In definitions 2 and 3 we take the average over some set, and then we change the set and consider the limit of those averages. A way I can see to make a more general definition in the spirit of definition 2 is the following:
Let $X$ be a measure space, and for each $n\in \mathbb{N}$ let $A_n\subset X$ be a measurable set such that the sequence $\{A_n\}$ is either increasing and $\bigcup A_n=X$ or decreasing and $\bigcap A_n=\{x_0\}$ for some $x_0\in X$. Then for a function $f:X\to \mathbb{C}$ its Cesàro limit along $\{A_n\}$ is the limit of the averages of the function on $A_n$ (if this limit exists).

An application of this formulation (in the case when $X=\mathbb{R}^n$ and $A_n$ are shrinking balls around some fixed point $x_0$) is the Lebesgue Differentiation Theorem.