## Primes of the form x^2+2y^2

In this post I will present a quite nice proof of the following fact from elementary number theory:

Theorem 1 Let ${p\in{\mathbb N}}$ be a prime number. There are ${x,y\in{\mathbb N}}$ such that ${x^2+2y^2=p}$ if and only if ${-2}$ is a quadratic residue ${\mod p}$.

Recall that ${-2}$ is a quadratic residue ${\mod p}$ if there exist ${z,k\in{\mathbb N}}$ such that ${z^2+2=kp}$, or equivalently if ${z^2\equiv-2\mod p}$. I recently learned a short proof of Theorem 1 using a nice result of Minkowski in geometry.

One of the directions of Theorem 1 is quite simple: if there are ${x,y\in{\mathbb N}}$ such that ${x^2+2y^2=p}$, then ${y}$ is co-prime to ${p}$ and hence there exists ${u\in{\mathbb N}}$ such that ${uy\equiv1\mod p}$. We have

$\displaystyle y^2\big((ux)^2+2\big)=(yux)^2+2y^2\equiv x^2+2y^2\equiv0\mod p$

Therefore ${y^2\big((ux)^2+2\big)}$ is a multiple of ${p}$ and hence ${(ux)^2\equiv-2\mod p}$. In the next section I will give an elementary proof of the other direction. Observe that, as a corollary of Theorem 1 we obtain the following:

Corollary 2 There are infinitely many primes of the form ${x^2+2y^2}$.

Proof: Assume, for the sake of a contradiction, that there are only finitely many primes, say ${p_1,p_2,\dots,p_n}$. Note that ${2}$ is in this list. Let

$\displaystyle z=\left(\prod_{i=1}^n p_i\right)^2+2$

Observe that ${x\equiv2\mod 4}$, so ${z/2}$ is odd. Let ${p}$ be a prime that divides ${z/2}$. Then ${p}$ is odd, and hence it is not on the list ${p_1,\dots,p_n}$ (because otherwise ${p}$ would divide ${z}$ and ${z-2}$). However, ${-2}$ is a quadratic residue ${\mod p}$ and by Theorem 1 ${p}$ is of the form ${x^2+2y^2}$, which is the desired contradiction. $\Box$

— 1. From number theory to geometry —

Let ${p}$ be a prime number for which ${-2}$ is a quadratic residue. The idea to prove (the hard direction of) Theorem 1 is that it suffices to find ${x,y\in{\mathbb N}}$ such that

$\displaystyle x^2+2y^2<2p\qquad\text{ and }\qquad x^2+2y^2\equiv0\mod p$

With this in mind we define the sets

$\displaystyle A=\big\{(x,y)\in{\mathbb R}^2:x^2+2y^2<2p\big\} \qquad B=\big\{(x,y)\in{\mathbb Z}^2:x^2+2y^2\equiv0\mod p\big\}$

We need to show that ${A\cap B\neq\{(0,0)\}}$. Let ${z\in{\mathbb N}}$ be such that ${z^2\equiv-2\mod p}$ and ${z. Note that both ${(z,1)\in B}$ and ${(p,0)\in B}$. Let ${C}$ be the subgroup of ${{\mathbb R}^2}$ generated by ${(z,1)}$ and ${(p,0)}$ and note that ${C\subset B}$. Let ${F}$ be the parallelogram with vertices ${(0,0),(z,1),(p,0),(z+p,1)}$. In other words

$\displaystyle F=\big\{(x,y)\in {\mathbb R}^2:0\leq y<1\quad \text{ and }\quad zy\leq x

Observe that translations of ${F}$ by elements in ${C}$ form a tilling of the plane. More precisely, I claim that for any ${(a,b)\in{\mathbb R}^2}$ there exists some point ${(x,y)\in C}$ such that ${(a-x,b-y)\in F}$. Indeed, let

$\displaystyle (x,y)=\left(\lfloor b\rfloor z+\left\lfloor\frac{a-\{b\}z}p\right\rfloor p,\lfloor b\rfloor\right)=\lfloor b\rfloor(z,1)+\left\lfloor\frac{a-\{b\}z}p\right\rfloor(p,0)$

where ${\lfloor b\rfloor}$ is the unique integer in the interval ${[b,b+1)}$ and ${\{b\}=b-\lfloor b\rfloor}$. Then ${(x,y)\in C}$, ${0\leq b-y=\{b\}<1}$ and

$\displaystyle \begin{array}{rcl} \displaystyle z\{b\}\leq a-x

which proves the claim. We now turn to ${A}$. Adopting the usual convention

$\displaystyle \frac A2:=\left\{\left(\frac x2,\frac y2\right):(x,y)\in A\right\}=\left\{(x,y)\in{\mathbb R}^2:x^2+2y^2<\frac p2\right\}$

We find that the area of the ellipse ${A/2}$ is ${\pi p/(2\sqrt{2})>p}$. On the other hand, the area of the parallelogram ${F}$ is ${p}$. Thinking again of the tilling of ${{\mathbb R}^2}$ by shifts of ${F}$, we conclude that there exist two points in ${A/2}$ which belong to different shifts of the same point in ${F}$. More precisely, there exist ${(a_1,b_1),(a_2,b_2)\in A/2}$ and ${(x,y)\in C}$ such that

$\displaystyle (a_1,b_1)-(a_2,b_2)=(x,y)\neq(0,0)$

Since ${A/2}$ is an ellipse centered at ${0}$ we conclude that ${-(a_2,b_2)\in A/2}$ and thus ${(a_1,b_1)-(a_2,b_2)\in A}$. The point ${(x,y)}$ just found is then in both ${A}$ and ${C\subset B}$ and is not ${(0,0)}$, so we have ${|x|,|y|\in{\mathbb N}}$ (because ${p}$ is not a square, so ${x}$ can’t be ${0}$ and neither can ${y}$) and ${|x|^2+|y|^2=p}$ as desired.

— 2. Minkowski theorem —

In this less elementary section I will explore the argument used in the previous section and formalize it. Let ${d\in{\mathbb N}}$. A lattice in ${{\mathbb R}^d}$ is a discrete subgroup ${\Lambda}$ which linearly generates the whole ${{\mathbb R}^d}$ as a vector space over ${{\mathbb R}}$ or, equivalently, it is a subgroup such that the quotient group ${F:={\mathbb R}^d/\Lambda}$ is compact (for the quotient topology). We will denote by ${\lambda}$ the natural projection from ${{\mathbb R}^d}$ to ${F}$.

As an example take ${d=2}$ and let ${\Lambda}$ be the set ${C}$ in the proof of Theorem 1 above. Another example is when ${\Lambda={\mathbb Z}^2}$. A non-example is ${\Lambda={\mathbb Z}\times\{0\}}$, still with ${d=2}$, because the quotient space can be identified with an infinite vertical strip of width ${1}$ and is not compact.

Lemma 3 Let ${d\in{\mathbb N}}$, let ${\Lambda}$ be a lattice in ${{\mathbb R}^d}$ and let ${F={\mathbb R}^d/\Lambda}$ be the quotient group. Denote by ${\ell}$ the (${d}$-dimensional) Lebesgue measure and, for a Borel set ${A\subset F}$, let

$\displaystyle \mu(A)=\inf\left\{\ell(\tilde A):A\subset\lambda(\tilde A)\right\}\ \ \ \ \ (1)$

Then ${\mu}$ is (a normalization of) the Haar measure in ${F}$.

Proof: We have trivially that ${\mu(\emptyset)=0}$ and ${\mu(A)\geq0}$ for every measurable set ${A}$. Thus all that is needed for ${\mu}$ to be a measure is being countably additive.

Let ${(A_n)_{n\in{\mathbb N}}}$ be a sequence of disjoint measurable sets in ${F}$ and let ${A=\bigcup_{n\in{\mathbb N}}A_n}$. Fix ${\epsilon>0}$ and, for each ${n\in{\mathbb N}}$, let ${B_n\subset{\mathbb R}^d}$ be such that ${A_n\subset\lambda(B_n)}$ and ${\mu(A_n)>\ell(B_n)-\epsilon/2^n}$. Intersecting ${B_n}$ with ${\lambda^{-1}(A_n)}$ if needed, we can assume that ${(B_n)}$ is a disjoint family (to see this, assume ${x\in B_n\cap B_m}$, then ${\lambda(x)\in A_n\cap A_m=\emptyset}$ which is a contradiction). Thus we have that ${B:=\bigcup_{n\in{\mathbb N}}B_n}$ has measure

$\displaystyle \ell(B)=\sum_{n\in{\mathbb N}}\ell(B_n)<\sum_{n\in{\mathbb N}}\mu(A_n)+\epsilon$

Since ${A\subset\lambda(B)}$ and ${\epsilon>0}$ was arbitrary, we conclude that ${\mu(A)\leq\sum_{n\in{\mathbb N}}\mu(A_n)}$.

To prove the other inequality, let ${B\subset{\mathbb R}^d}$ be such that ${A\subset\lambda(B)}$ and ${\mu(A)>\ell(B)-\epsilon}$. For each ${n\in{\mathbb N}}$, let ${B_n=B\cap\lambda^{-1}(A_n)}$. Since ${A\subset\lambda(B)}$, we have ${A_n\subset\lambda(B_n)}$. Thus ${\mu(A_n)\leq\ell(B_n)}$ for each ${n}$. Since ${(A_n)}$ are disjoint, then also ${(B_n)}$ are disjoint. Summing over ${n}$ we deduce

$\displaystyle \sum_{n\in{\mathbb N}}\mu(A_n)\leq\sum_{n\in{\mathbb N}}\ell(B_n)\leq\ell(B)<\mu(A)+\epsilon$

Since ${\epsilon}$ is arbitrary we have ${\mu(A)=\sum_{n\in{\mathbb N}}\mu(A_n)}$. This implies that ${\mu}$ is a measure on ${F}$.

The fact that ${\mu}$ is invariant follows directly from the fact that ${\ell}$ is invariant and ${\lambda}$ is a group homomorphism. $\Box$

Equivalently one can identify ${F}$ with a subset of ${{\mathbb R}^d}$ (a fundamental domain) and take ${\mu}$ to be the restriction of ${\ell}$ to ${F}$, but we will not do this here.

We can now formulate Minkowski’s theorem, which is really behind the proof of Theorem 1:

Theorem 4 (Minkowski) Let ${d\in{\mathbb N}}$ and let ${\Lambda}$ be a lattice in ${{\mathbb R}^d}$. Let ${F={\mathbb R}^d/\Lambda}$ and let ${\mu}$ be the normalization of the Haar measure on ${F}$ given by (1). Let ${A\subset{\mathbb R}^d}$ be symmetric with respect to the origin and convex. If ${\ell(A)>2^d\mu(F)}$, there exists a nonzero point ${x\in A\cap\Lambda}$.

Proof: Let ${B=A/2=\left\{x/2:x\in A\right\}}$. Then ${\ell(B)>\mu(F)}$ and I claim that there exist two points ${x,x'\in B}$ such that ${\lambda(x)=\lambda(x')}$.

To see this we can use equation (1) to find a set ${\tilde F\subset{\mathbb R}^d}$ such that ${\ell(\tilde F)<\ell(B)}$ and ${F=\lambda(\tilde F)}$. For each ${\gamma\in\Lambda}$ define

$\displaystyle B_\gamma=\{x\in B:\exists y\in \tilde F:x-y=\gamma\}=B\cap(\tilde F+\gamma)$

Since ${\gamma(\tilde F)=F}$ we have that for every ${x\in{\mathbb R}^d}$ there exists some ${y\in\tilde F}$ such that ${\lambda(x)=\lambda(y)}$ and hence ${x\in \tilde F+\gamma}$ for some ${\gamma\in\Lambda}$. In particular we have that

$\displaystyle B\subset\bigcup_{\gamma\in\Lambda}B_\gamma$

and hence

$\displaystyle \ell(B)=\sum_{\gamma\in\Lambda}\ell(B_\gamma)$

Using the fact that ${\ell}$ is translation invariant we deduce that

$\displaystyle \ell(B)=\sum_{\gamma\in\Lambda}\ell(B_\gamma-\gamma)$

But we have that ${B_\gamma-\gamma\subset\tilde F}$, by construction. Thus the sets ${\{B_\gamma-\gamma:\gamma\in\Lambda\}}$ can not be disjoint. Let ${\gamma_1\neq\gamma_2}$ be in ${\Lambda}$ and such that ${(B_{\gamma_1}-\gamma_1)\cap(B_{\gamma_2}-\gamma_2)\neq\empty}$ and let ${y}$ be in that intersection. Then both ${y+\gamma_1,y+\gamma_2}$ are in ${B}$ and ${\lambda(y+\gamma_1)=\lambda(y+\gamma_2)}$, proving the claim.

So let ${x\neq x'\in B}$ be such that ${x-x'\in\Lambda}$. Then also ${-x'\in B}$ because ${X}$ is symmetric with respect to the origin, and hence so is ${B}$. Since ${A}$ is convex we conclude that ${x-x'\in A}$. $\Box$

The method from this post illustrates how one can use a variety of tools, from measure theory and geometry to algebra, to deduce a result in number theory. It looks like this could be the beginning of general result, but the fact that the set ${C}$ in the proof of Theorem 1 is a lattice does not generalize to higher dimensions. One can still find a lattice of solutions for the local restriction, but the volume of the fundamental domain will be of the form ${p^{d-1}}$, which does not work well with the rest of the argument.

This entry was posted in Classic results, Number Theory and tagged , , . Bookmark the permalink.