Primes of the form x^2+2y^2

In this post I will present a quite nice proof of the following fact from elementary number theory:

Theorem 1 Let {p\in{\mathbb N}} be a prime number. There are {x,y\in{\mathbb N}} such that {x^2+2y^2=p} if and only if {-2} is a quadratic residue {\mod p}.

Recall that {-2} is a quadratic residue {\mod p} if there exist {z,k\in{\mathbb N}} such that {z^2+2=kp}, or equivalently if {z^2\equiv-2\mod p}. I recently learned a short proof of Theorem 1 using a nice result of Minkowski in geometry.

One of the directions of Theorem 1 is quite simple: if there are {x,y\in{\mathbb N}} such that {x^2+2y^2=p}, then {y} is co-prime to {p} and hence there exists {u\in{\mathbb N}} such that {uy\equiv1\mod p}. We have

\displaystyle y^2\big((ux)^2+2\big)=(yux)^2+2y^2\equiv x^2+2y^2\equiv0\mod p

Therefore {y^2\big((ux)^2+2\big)} is a multiple of {p} and hence {(ux)^2\equiv-2\mod p}. In the next section I will give an elementary proof of the other direction. Observe that, as a corollary of Theorem 1 we obtain the following:

Corollary 2 There are infinitely many primes of the form {x^2+2y^2}.

Proof: Assume, for the sake of a contradiction, that there are only finitely many primes, say {p_1,p_2,\dots,p_n}. Note that {2} is in this list. Let

\displaystyle z=\left(\prod_{i=1}^n p_i\right)^2+2

Observe that {x\equiv2\mod 4}, so {z/2} is odd. Let {p} be a prime that divides {z/2}. Then {p} is odd, and hence it is not on the list {p_1,\dots,p_n} (because otherwise {p} would divide {z} and {z-2}). However, {-2} is a quadratic residue {\mod p} and by Theorem 1 {p} is of the form {x^2+2y^2}, which is the desired contradiction. \Box

— 1. From number theory to geometry —

Let {p} be a prime number for which {-2} is a quadratic residue. The idea to prove (the hard direction of) Theorem 1 is that it suffices to find {x,y\in{\mathbb N}} such that

\displaystyle x^2+2y^2<2p\qquad\text{ and }\qquad x^2+2y^2\equiv0\mod p

With this in mind we define the sets

\displaystyle A=\big\{(x,y)\in{\mathbb R}^2:x^2+2y^2<2p\big\} \qquad B=\big\{(x,y)\in{\mathbb Z}^2:x^2+2y^2\equiv0\mod p\big\}

We need to show that {A\cap B\neq\{(0,0)\}}. Let {z\in{\mathbb N}} be such that {z^2\equiv-2\mod p} and {z<p}. Note that both {(z,1)\in B} and {(p,0)\in B}. Let {C} be the subgroup of {{\mathbb R}^2} generated by {(z,1)} and {(p,0)} and note that {C\subset B}. Let {F} be the parallelogram with vertices {(0,0),(z,1),(p,0),(z+p,1)}. In other words

\displaystyle F=\big\{(x,y)\in {\mathbb R}^2:0\leq y<1\quad \text{ and }\quad zy\leq x<zy+p\big\}

Observe that translations of {F} by elements in {C} form a tilling of the plane. More precisely, I claim that for any {(a,b)\in{\mathbb R}^2} there exists some point {(x,y)\in C} such that {(a-x,b-y)\in F}. Indeed, let

\displaystyle (x,y)=\left(\lfloor b\rfloor z+\left\lfloor\frac{a-\{b\}z}p\right\rfloor p,\lfloor b\rfloor\right)=\lfloor b\rfloor(z,1)+\left\lfloor\frac{a-\{b\}z}p\right\rfloor(p,0)

where {\lfloor b\rfloor} is the unique integer in the interval {[b,b+1)} and {\{b\}=b-\lfloor b\rfloor}. Then {(x,y)\in C}, {0\leq b-y=\{b\}<1} and

\displaystyle \begin{array}{rcl} \displaystyle z\{b\}\leq a-x<z\{b\}+p&\iff&\displaystyle z\{b\}\leq a-\lfloor b\rfloor z-\left\lfloor\frac{a-\{b\}z}p\right\rfloor p<z\{b\}+p\\&\iff&\displaystyle 0\leq a-bz-\left\lfloor\frac{a-\{b\}z}p\right\rfloor p<p\\&\iff&\displaystyle 0\leq \frac{a-bz}p-\left\lfloor\frac{a-\{b\}z}p\right\rfloor <1\end{array}

which proves the claim. We now turn to {A}. Adopting the usual convention

\displaystyle \frac A2:=\left\{\left(\frac x2,\frac y2\right):(x,y)\in A\right\}=\left\{(x,y)\in{\mathbb R}^2:x^2+2y^2<\frac p2\right\}

We find that the area of the ellipse {A/2} is {\pi p/(2\sqrt{2})>p}. On the other hand, the area of the parallelogram {F} is {p}. Thinking again of the tilling of {{\mathbb R}^2} by shifts of {F}, we conclude that there exist two points in {A/2} which belong to different shifts of the same point in {F}. More precisely, there exist {(a_1,b_1),(a_2,b_2)\in A/2} and {(x,y)\in C} such that

\displaystyle (a_1,b_1)-(a_2,b_2)=(x,y)\neq(0,0)

Since {A/2} is an ellipse centered at {0} we conclude that {-(a_2,b_2)\in A/2} and thus {(a_1,b_1)-(a_2,b_2)\in A}. The point {(x,y)} just found is then in both {A} and {C\subset B} and is not {(0,0)}, so we have {|x|,|y|\in{\mathbb N}} (because {p} is not a square, so {x} can’t be {0} and neither can {y}) and {|x|^2+|y|^2=p} as desired.

— 2. Minkowski theorem —

In this less elementary section I will explore the argument used in the previous section and formalize it. Let {d\in{\mathbb N}}. A lattice in {{\mathbb R}^d} is a discrete subgroup {\Lambda} which linearly generates the whole {{\mathbb R}^d} as a vector space over {{\mathbb R}} or, equivalently, it is a subgroup such that the quotient group {F:={\mathbb R}^d/\Lambda} is compact (for the quotient topology). We will denote by {\lambda} the natural projection from {{\mathbb R}^d} to {F}.

As an example take {d=2} and let {\Lambda} be the set {C} in the proof of Theorem 1 above. Another example is when {\Lambda={\mathbb Z}^2}. A non-example is {\Lambda={\mathbb Z}\times\{0\}}, still with {d=2}, because the quotient space can be identified with an infinite vertical strip of width {1} and is not compact.

Lemma 3 Let {d\in{\mathbb N}}, let {\Lambda} be a lattice in {{\mathbb R}^d} and let {F={\mathbb R}^d/\Lambda} be the quotient group. Denote by {\ell} the ({d}-dimensional) Lebesgue measure and, for a Borel set {A\subset F}, let

\displaystyle \mu(A)=\inf\left\{\ell(\tilde A):A\subset\lambda(\tilde A)\right\}\ \ \ \ \ (1)


Then {\mu} is (a normalization of) the Haar measure in {F}.

Proof: We have trivially that {\mu(\emptyset)=0} and {\mu(A)\geq0} for every measurable set {A}. Thus all that is needed for {\mu} to be a measure is being countably additive.

Let {(A_n)_{n\in{\mathbb N}}} be a sequence of disjoint measurable sets in {F} and let {A=\bigcup_{n\in{\mathbb N}}A_n}. Fix {\epsilon>0} and, for each {n\in{\mathbb N}}, let {B_n\subset{\mathbb R}^d} be such that {A_n\subset\lambda(B_n)} and {\mu(A_n)>\ell(B_n)-\epsilon/2^n}. Intersecting {B_n} with {\lambda^{-1}(A_n)} if needed, we can assume that {(B_n)} is a disjoint family (to see this, assume {x\in B_n\cap B_m}, then {\lambda(x)\in A_n\cap A_m=\emptyset} which is a contradiction). Thus we have that {B:=\bigcup_{n\in{\mathbb N}}B_n} has measure

\displaystyle \ell(B)=\sum_{n\in{\mathbb N}}\ell(B_n)<\sum_{n\in{\mathbb N}}\mu(A_n)+\epsilon

Since {A\subset\lambda(B)} and {\epsilon>0} was arbitrary, we conclude that {\mu(A)\leq\sum_{n\in{\mathbb N}}\mu(A_n)}.

To prove the other inequality, let {B\subset{\mathbb R}^d} be such that {A\subset\lambda(B)} and {\mu(A)>\ell(B)-\epsilon}. For each {n\in{\mathbb N}}, let {B_n=B\cap\lambda^{-1}(A_n)}. Since {A\subset\lambda(B)}, we have {A_n\subset\lambda(B_n)}. Thus {\mu(A_n)\leq\ell(B_n)} for each {n}. Since {(A_n)} are disjoint, then also {(B_n)} are disjoint. Summing over {n} we deduce

\displaystyle \sum_{n\in{\mathbb N}}\mu(A_n)\leq\sum_{n\in{\mathbb N}}\ell(B_n)\leq\ell(B)<\mu(A)+\epsilon

Since {\epsilon} is arbitrary we have {\mu(A)=\sum_{n\in{\mathbb N}}\mu(A_n)}. This implies that {\mu} is a measure on {F}.

The fact that {\mu} is invariant follows directly from the fact that {\ell} is invariant and {\lambda} is a group homomorphism. \Box

Equivalently one can identify {F} with a subset of {{\mathbb R}^d} (a fundamental domain) and take {\mu} to be the restriction of {\ell} to {F}, but we will not do this here.

We can now formulate Minkowski’s theorem, which is really behind the proof of Theorem 1:

Theorem 4 (Minkowski) Let {d\in{\mathbb N}} and let {\Lambda} be a lattice in {{\mathbb R}^d}. Let {F={\mathbb R}^d/\Lambda} and let {\mu} be the normalization of the Haar measure on {F} given by (1). Let {A\subset{\mathbb R}^d} be symmetric with respect to the origin and convex. If {\ell(A)>2^d\mu(F)}, there exists a nonzero point {x\in A\cap\Lambda}.

Proof: Let {B=A/2=\left\{x/2:x\in A\right\}}. Then {\ell(B)>\mu(F)} and I claim that there exist two points {x,x'\in B} such that {\lambda(x)=\lambda(x')}.

To see this we can use equation (1) to find a set {\tilde F\subset{\mathbb R}^d} such that {\ell(\tilde F)<\ell(B)} and {F=\lambda(\tilde F)}. For each {\gamma\in\Lambda} define

\displaystyle B_\gamma=\{x\in B:\exists y\in \tilde F:x-y=\gamma\}=B\cap(\tilde F+\gamma)

Since {\gamma(\tilde F)=F} we have that for every {x\in{\mathbb R}^d} there exists some {y\in\tilde F} such that {\lambda(x)=\lambda(y)} and hence {x\in \tilde F+\gamma} for some {\gamma\in\Lambda}. In particular we have that

\displaystyle B\subset\bigcup_{\gamma\in\Lambda}B_\gamma

and hence

\displaystyle \ell(B)=\sum_{\gamma\in\Lambda}\ell(B_\gamma)

Using the fact that {\ell} is translation invariant we deduce that

\displaystyle \ell(B)=\sum_{\gamma\in\Lambda}\ell(B_\gamma-\gamma)

But we have that {B_\gamma-\gamma\subset\tilde F}, by construction. Thus the sets {\{B_\gamma-\gamma:\gamma\in\Lambda\}} can not be disjoint. Let {\gamma_1\neq\gamma_2} be in {\Lambda} and such that {(B_{\gamma_1}-\gamma_1)\cap(B_{\gamma_2}-\gamma_2)\neq\empty} and let {y} be in that intersection. Then both {y+\gamma_1,y+\gamma_2} are in {B} and {\lambda(y+\gamma_1)=\lambda(y+\gamma_2)}, proving the claim.

So let {x\neq x'\in B} be such that {x-x'\in\Lambda}. Then also {-x'\in B} because {X} is symmetric with respect to the origin, and hence so is {B}. Since {A} is convex we conclude that {x-x'\in A}. \Box

The method from this post illustrates how one can use a variety of tools, from measure theory and geometry to algebra, to deduce a result in number theory. It looks like this could be the beginning of general result, but the fact that the set {C} in the proof of Theorem 1 is a lattice does not generalize to higher dimensions. One can still find a lattice of solutions for the local restriction, but the volume of the fundamental domain will be of the form {p^{d-1}}, which does not work well with the rest of the argument.

This entry was posted in Classic results, Number Theory and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s