There are fundamental rules of the universe that I don't yet understand. And for some reason, one of them seems to spell out: "thou shalt use complex-valued analysis to study the behavior of prime numbers".
I am not at all a number theory expert, and I'm not quite sure what shape of explanation you are looking for here. One possible explanation, though, is that you might be missing the forest for the trees. From my outsider perspective the connection is already visible in your introduction: prime numbers -> modular arithmetic ~= arithmetic on circles -> complex numbers.
If you have a problem involving operations on circles, complex analysis seems like exactly the kind of tool you would want to throw at it. The arrow "prime numbers -> modular arithmetic" actually seems more worthy of a good compressed explanation.
The problem I'm trying to understand is more of a meta/proof-theoretic one: why do some arithmetical claims have a proof only when passing through non-arithmetical language?
Formally, we have the Peano Axioms, which characterize the first-order properties of $\mathbb{N}$. Take the ternary Goldbach conjecture as an example statement:

$$\forall n > 5 \;\big(n \text{ odd} \to \exists p_1, p_2, p_3 \text{ prime}:\; n = p_1 + p_2 + p_3\big).$$

This statement has a proof in the Peano Axioms, but the only proof found so far passes through complex analysis. This means that the only known way to prove it in PA is to encode all the relevant properties of complex analysis within the language of natural numbers.
This sounds horribly inefficient; intuitively, it seems like any "natural" statement provable in PA should be provable using tools from within that system, not by encoding concepts from a different field.
This does relate to some interesting concepts in logic, like Gödel's speed-up theorem. I didn't mention these considerations at all in my post; they might be good material for a part II 🙂
I also find instances of this phenomenon very interesting. One example I can think of is in differential geometry, where statements about manifolds are often more easily proved (and in many cases, so far only proved) by routing through Riemannian geometry (since, via partitions of unity, a manifold can always be given a Riemannian metric) and manipulating the manifold primarily through its metric structure (e.g. Ricci flow) - even though the original statement doesn't invoke metrics at all!
The problem I'm trying to understand is more of a meta/proof-theoretic one: why do some arithmetical claims have a proof only when passing through non-arithmetical language?
I agree this is an interesting question. Thanks for pointing me to the speed-up theorem. I didn't know about that one. :)
This sounds horribly inefficient; intuitively, it seems like any "natural" statement provable in PA should be provable using tools from within that system, not by encoding concepts from a different field.
Yeah, I don't share that intuition. It feels like if that were true, there would be no other fields and everyone would be using arithmetic for everything at all times. I guess your phrasing of "natural" is doing a lot of work here.
Yeah, there are definitely a few relevant things here:
The post author seems to already know a lot of math, so I guess they're looking for a deeper kind of answer.
I found this Terry Tao blog post helpful. In particular, this paragraph,
It is difficult to prove that no conspiracy between the primes exist. However, it is not entirely impossible, because we have been able to exploit two important phenomena. The first is that there is often a “all or nothing dichotomy” (somewhat resembling the zero-one laws in probability) regarding conspiracies: in the asymptotic limit, the primes can either conspire totally (or more precisely, anti-conspire totally) with a multiplicative function, or fail to conspire at all, but there is no middle ground. (In the language of Dirichlet series, this is reflected in the fact that zeroes of a meromorphic function can have order 1, or order 0 (i.e. are not zeroes after all), but cannot have an intermediate order between 0 and 1.) As a corollary of this fact, the prime numbers cannot conspire with two distinct multiplicative functions at once (by having a partial correlation with one and another partial correlation with another); thus one can use the existence of one conspiracy to exclude all the others. In other words, there is at most one conspiracy that can significantly distort the distribution of the primes. Unfortunately, this argument is ineffective, because it doesn’t give any control at all on what that conspiracy is, or even if it exists in the first place!
But I'm not sure how much this is just restating the problem.
One of the weirdest things in mathematics is how completely unrelated fields tend to connect with one another.
A particularly interesting case is the play between number theory (the study of natural numbers N={0,1,2…}) and complex analysis (the study of functions on C={a+bi:a,b∈R}). One is discrete and uses modular arithmetic and combinatorics; one is continuous and uses integrals and epsilon-delta proofs. And yet, many papers in modern number theory use complex analysis. Even if not directly, basically all of them rely on some result with a complex analytic flavor.
What's going on?
This post gives an example of a fundamental result in number theory that uses complex analysis, and tries to explain where that usage comes from. My writing is semi-technical: I'm not trying to formalize everything, mostly doing this as an exercise to clear up my own confusion (and hopefully to convince others that this mathematical connection is valuable).
Technical details in the proof follow Ang Li’s notes; exposition, intuition and possible mistakes are mine.
I Dirichlet's Theorem
We have known since around 300 BC that there are infinitely many prime numbers. Over 2000 years later, Dirichlet proved that there are infinitely many primes even if we restrict ourselves to an arithmetic progression of the form $a, a+q, a+2q, \dots$, as long as $a$ and $q$ have no common factors.
I'll focus on a concrete example: there are infinitely many prime numbers ending in the digit 7, that is, primes of the form p=10n+7.
This statement can be understood by a curious 7-year-old: she could learn about prime numbers, then write out examples of them - her own age, 17, 37, ... and then be told that, although such numbers ending in 7 become harder and harder to find, they will always be there, beyond whatever limits we imagine.
If the statement is so easy to phrase, why does it require complex analysis?
Do we really need to use a number defined as the square root of -1, as well as computing infinite sums and integrals, just to handle primes ending in 7?
You might say that this is necessary because we're working with an "infinite" object, and that necessarily requires tools from analysis. But that doesn't dissipate the mystery. Here's a simpler statement:

There exists a prime number of the form $p = 10n + 7$ with $n > 10^{10^{100}}$.
This statement has a finite "elementary" proof. In fact, I can describe how to construct one: start with $N = 10^{10^{100}} + 1$ and keep adding 1 until you find that $10N + 7$ is prime (as verified by your favorite primality test). This search stops at some point, at a definite prime. Done.
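The search template can be sketched in code. We obviously can't start from $N = 10^{10^{100}}$, so this sketch (with a naive trial-division primality test) runs it from a small, arbitrarily chosen starting point:

```python
# Sketch of the "elementary" proof template: start at some N and keep
# incrementing until 10*N + 7 is prime. The real N = 10^(10^100) + 1 is
# unrunnable, so we demo with a small starting value instead.

def is_prime(m: int) -> bool:
    """Trial division; fine for small demo values."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def first_prime_ending_in_7(start: int) -> int:
    """Search N = start, start+1, ... until 10*N + 7 is prime."""
    n = start
    while not is_prime(10 * n + 7):
        n += 1
    return 10 * n + 7

print(first_prime_ending_in_7(1000))  # a prime ending in the digit 7
```

The proof-theoretic point is that this loop provably terminates, even though we could never run it from the real starting value.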
This template of a proof clearly isn't enough; how do we know for sure that the search will stop? We can't actually run it, because just writing out $N$ requires more space than there are atoms in the known universe.
And yet, we know for sure that this algorithm stops at a definite prime. And we got this knowledge from properties of $\log(z)$ and $i = \sqrt{-1}$. But why?
II Euler's proof for infinitude of primes
First we'll prove the easier result: that there are infinitely many primes, without restricting their last digit.
Euler came up with an interesting proof that shows something stronger: the sum $\sum_p \frac{1}{p}$, taken over all primes $p$, diverges to infinity.
This implies the infinitude of primes: for a sum of finite terms to have an infinite value, it must have infinitely many terms. It's also a nice trick for turning a "combinatorial" problem (about the number of primes) into an "analytic" problem (about the divergence of a certain sum).
How would we go about proving something like that?

Well, we have one classic example of a divergent infinite sum - the harmonic series:

$$\sum_{n \ge 1} \frac{1}{n} = \infty.$$

We want some way to express this sum in terms of prime numbers.
Primes are the building blocks of the natural numbers: each number can be written as a unique product of prime numbers. So it makes sense to write this sum in terms of products involving prime numbers.
Let's first sum over natural numbers whose only prime factors are 2 and 3. We'll denote this set by $N_{2,3} := \{2^a 3^b \mid a, b \ge 0\}$. We have:

$$\sum_{n \in N_{2,3}} \frac{1}{n} = \sum_{a,b \ge 0} \frac{1}{2^a 3^b}.$$

We can use the distributive law to write this as the product of two separate sums:
$$\sum_{n \in N_{2,3}} \frac{1}{n} = \Big(\sum_{a \ge 0} \frac{1}{2^a}\Big) \cdot \Big(\sum_{b \ge 0} \frac{1}{3^b}\Big).$$

Each sum on the right-hand side is an infinite geometric series. We know that $\sum_{m \ge 0} r^m = \frac{1}{1-r}$ whenever the sum converges. So we have a simple closed formula:
$$\sum_{n \in N_{2,3}} \frac{1}{n} = \frac{1}{1 - 1/2} \cdot \frac{1}{1 - 1/3}.$$

This is when summing over numbers whose only prime factors are 2 and 3. If we add 5 as a prime factor, by summing over $N_{2,3,5} := \{2^a 3^b 5^c \mid a, b, c \ge 0\}$, we get:
$$\sum_{n \in N_{2,3,5}} \frac{1}{n} = \sum_{a,b,c \ge 0} \frac{1}{2^a 3^b 5^c} = \Big(1 + \frac{1}{2} + \frac{1}{2^2} + \cdots\Big) \cdot \Big(1 + \frac{1}{3} + \frac{1}{3^2} + \cdots\Big) \cdot \Big(1 + \frac{1}{5} + \frac{1}{5^2} + \cdots\Big) = \frac{1}{1 - 1/2} \cdot \frac{1}{1 - 1/3} \cdot \frac{1}{1 - 1/5}.$$

And we carry on, each time adding a new prime factor to our set. This makes the left-hand side sum over more natural numbers, while the right-hand side gains more factors.
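We can sanity-check this identity numerically. The sketch below truncates the exponents (which only loses a negligible geometric tail) and compares the direct sum over $N_{2,3,5}$ to the product of the three closed forms:

```python
# Numerical check of the finite Euler product: summing 1/n over all
# n of the form 2^a 3^b 5^c agrees with the product of the three
# geometric closed forms 1/(1 - 1/p).

from itertools import product

# Truncate the exponent ranges; the dropped tail is tiny because
# the terms decay geometrically.
lhs = sum(1 / (2**a * 3**b * 5**c)
          for a, b, c in product(range(60), range(40), range(30)))

rhs = (1 / (1 - 1/2)) * (1 / (1 - 1/3)) * (1 / (1 - 1/5))  # = 2 * 3/2 * 5/4

print(lhs, rhs)  # both ≈ 3.75
```

The same pattern extends factor by factor as more primes are added, which is exactly the limit identity $(*)$ below.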
Since all natural numbers are products of some prime numbers, in the limit we get the identity:
$$\sum_{n \ge 1} \frac{1}{n} = \prod_p \frac{1}{1 - 1/p} \qquad (*)$$

where $p$ ranges over all prime numbers. This is the key identity for our purposes.[1]
By itself, $(*)$ is already enough to show the infinitude of primes: the LHS is infinite, while the RHS is a product with one factor for each prime number. This only makes sense if the product has infinitely many factors.
But still, we want to show that $\sum_p \frac{1}{p}$ diverges, which is a stronger requirement. We wish to use $(*)$, but it involves a product over prime numbers, not a sum over prime numbers.
Which mathematical machinery can we use to turn products into sums?
Oh, right; the natural logarithm satisfies log(xy)=log(x)+log(y). This holds for infinite products/sums as well:
$$\log\Big(\prod_p \frac{1}{1 - 1/p}\Big) = \sum_p \log\Big(\frac{1}{1 - 1/p}\Big) \;\Longrightarrow\; \log\Big(\sum_{n \ge 1} \frac{1}{n}\Big) = \sum_p \log\Big(\frac{1}{1 - 1/p}\Big). \qquad (**)$$

The function $\log\frac{1}{1-x}$ has a nice Taylor expansion: $\log\frac{1}{1-x} = x + \frac{x^2}{2} + \frac{x^3}{3} + \cdots$. We only care about asymptotic behavior, so for small $x$ we can approximate $\log\frac{1}{1-x} \approx x$.[2]
Substituting that into (∗∗), we get:
$$\log\Big(\sum_{n \ge 1} \frac{1}{n}\Big) \approx \sum_p \frac{1}{p}. \qquad (\star)$$

Since the left-hand side diverges, the right-hand side diverges too, so $\sum_p \frac{1}{p} = \infty$. Done.
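As a numerical illustration of $(\star)$ (a sketch only: the approximation hides a bounded constant, but both sides grow together, at a log-log pace):

```python
# Comparing log of the harmonic partial sum with the partial sum of 1/p
# over primes up to N. The two differ by a bounded constant (absorbed by
# the approximation log(1/(1-x)) ≈ x), but both diverge together.

import math

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, ok in enumerate(sieve) if ok]

N = 10**5
harmonic = sum(1 / n for n in range(1, N + 1))
prime_sum = sum(1 / p for p in primes_up_to(N))

print(math.log(harmonic), prime_sum)
# both around 2.5-2.7 for N = 10^5; the gap stays bounded as N grows
```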
III Trying the same for Dirichlet
This was a cool trick; we proved a purely number-theoretic result using tools from analysis, like taking the Taylor series of a logarithm.
Can we do the same thing for proving the infinitude of primes of the form 10n+7?
Let $\delta_7(n)$ denote the function that returns 1 whenever $n \equiv 7 \pmod{10}$, and 0 otherwise. We want to show that infinitely many primes satisfy $\delta_7(p) = 1$. Dirichlet actually shows the stronger claim that $\sum_p \frac{\delta_7(p)}{p} = \infty$ (remember that $p$ ranges over prime numbers while $n$ ranges over all natural numbers). How do we do that?
We can try a similar technique to Euler's proof. The analogue of identity (∗) would be:
$$\sum_{n \ge 1} \frac{\delta_7(n)}{n} = \prod_p \Big(1 + \frac{\delta_7(p)}{p} + \frac{\delta_7(p)^2}{p^2} + \cdots\Big) = \prod_p \frac{1}{1 - \delta_7(p)/p}.$$

Unfortunately, this doesn't hold. The second equality is still valid, but the first one fails, because $\delta_7$ is not a multiplicative function; we have $\frac{1}{ab} = \frac{1}{a} \cdot \frac{1}{b}$, but we don't have $\delta_7(ab) = \delta_7(a) \cdot \delta_7(b)$. (Concretely, $\delta_7(3) = \delta_7(9) = 0$ but $\delta_7(3 \cdot 9) = 1$.)
But this identity does hold whenever we replace δ7 with a multiplicative function χ, one that satisfies χ(ab)=χ(a)χ(b) for all natural numbers a,b. That is, we have for such χ:
$$\sum_{n \ge 1} \frac{\chi(n)}{n} = \prod_p \frac{1}{1 - \chi(p)/p}.$$

Euler's original proof used this identity for the constant function $\chi(n) = 1$.
How do we use this property for analyzing δ7?
The main idea is to write $\delta_7$ as a linear combination of multiplicative functions. If we analyze each $\chi$ separately, combining the results gives us insight into $\delta_7$ itself.
Well, $\delta_7(n)$ is a periodic function on the integers with period 10. It's a signal, a heartbeat: it beats with the value 1 every 10 integers, and is flat elsewhere.
We already have a technique for writing a general periodic signal as a sum of well-behaved components - a Fourier sum:

$$f(x) = a_0 + \sum_{n \ge 1} \big(a_n \cos(nx) + b_n \sin(nx)\big).$$
Fourier series and Fourier transforms are examples of a much more general concept. I won't get into the details here, but if you have a group G satisfying some properties, then you can write every (suitably regular) function f:G→C as a combination of multiplicative functions χ:G→C×. Fourier transforms act on G=R, Fourier sums act on the circle group G=R/Z, while our example acts on the group G=(Z/10Z)∗ of invertible elements modulo 10. (Don't worry if you are not familiar with groups.)
The main point is that, given a function f:N→R with period 10, we can always[3] write it as a linear combination of multiplicative functions χ:N→C of the same period. These multiplicative components are called characters.
Allowing $\chi$ to take complex values is necessary; just using real-valued functions isn't enough. This is similar to how Fourier sums require both a cosine and a sine component: considering only real-valued components gives us just the cosine part. This is the reason why complex numbers are fundamental to our proof - we need roots of unity $\omega = e^{2\pi i/k}$ in order to have a rich choice of characters (the real numbers only offer $+1, -1$ as possible character values).
Finding all characters modulo 10 is rather easy. Here they are:

| $n$ | 1 | 3 | 7 | 9 |
|---|---|---|---|---|
| $\chi_0(n)$ | 1 | 1 | 1 | 1 |
| $\chi_1(n)$ | 1 | $i$ | $-i$ | $-1$ |
| $\chi_2(n)$ | 1 | $-i$ | $i$ | $-1$ |
| $\chi_3(n)$ | 1 | $-1$ | $-1$ | 1 |
There are 4 such characters in total: $\chi_0, \chi_1, \chi_2, \chi_3$. Since they are 10-periodic, it's enough to write out their values on the set $\{0, 1, \dots, 9\}$. Also, $\chi(n) = 0$ whenever $n$ and 10 have a common divisor, so we can throw out all multiples of 2 and 5; we're left with just 4 values $n = 1, 3, 7, 9$ on which $\chi$ is non-zero. These values are described in the table above.
One can check that all characters defined above are multiplicative: χ(ab)=χ(a)χ(b). (This is when working modulo 10, so for example, 3⋅9≡7.)
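This check is easy to do by brute force. The sketch below hard-codes the character values on $\{1, 3, 7, 9\}$ (with the convention $\chi_1(3) = i$, which matches the decomposition $\delta_7 = \frac{1}{4}(\chi_0 + i\chi_1 - i\chi_2 - \chi_3)$ used in this post) and verifies both multiplicativity and that decomposition:

```python
# The four characters mod 10, written out on the units {1, 3, 7, 9}.
# (3 generates this group: 3, 3^2 = 9, 3^3 = 27 ≡ 7, 3^4 ≡ 1, so a
# character is determined by its value on 3, a fourth root of unity.)
CHARS = {
    'chi0': {1: 1, 3: 1,   7: 1,   9: 1},
    'chi1': {1: 1, 3: 1j,  7: -1j, 9: -1},
    'chi2': {1: 1, 3: -1j, 7: 1j,  9: -1},
    'chi3': {1: 1, 3: -1,  7: -1,  9: 1},
}

def chi(name, n):
    """Extend to all of N: zero when gcd(n, 10) > 1, else 10-periodic."""
    return CHARS[name].get(n % 10, 0)

# Multiplicativity: chi(ab) = chi(a) * chi(b) for all a, b.
for name in CHARS:
    for a in range(1, 50):
        for b in range(1, 50):
            assert chi(name, a * b) == chi(name, a) * chi(name, b)

# The decomposition of delta_7 as a linear combination of characters:
def delta7(n):
    return 1 if n % 10 == 7 else 0

for n in range(1, 200):
    combo = (chi('chi0', n) + 1j * chi('chi1', n)
             - 1j * chi('chi2', n) - chi('chi3', n)) / 4
    assert abs(combo - delta7(n)) < 1e-12

print("all checks pass")
```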
Finally, we can see that the following identity holds (one can check it explicitly for n=1,3,7,9):
$$\delta_7(n) = \frac{1}{4}\big(\chi_0(n) + i\chi_1(n) - i\chi_2(n) - \chi_3(n)\big).$$

Going back: we wanted to show that $\sum_p \frac{\delta_7(p)}{p}$ diverges. We'll write it as a sum of these 4 components:
$$\sum_p \frac{\delta_7(p)}{p} = \frac{1}{4}\Big(\sum_p \frac{\chi_0(p)}{p} + i\sum_p \frac{\chi_1(p)}{p} - i\sum_p \frac{\chi_2(p)}{p} - \sum_p \frac{\chi_3(p)}{p}\Big). \qquad (\dagger)$$

We know that the first term diverges: $\sum_p \frac{\chi_0(p)}{p} = \infty$. This is because all primes besides $p = 2, 5$ satisfy $\chi_0(p) = 1$, so this is essentially the sum $\sum_p \frac{1}{p}$, which was already proven to diverge.
What about the three other components?
They involve sums of the form $\sum_p \frac{\chi(p)}{p}$, where $\chi$ is a non-trivial character: it's not always equal to 1, and may take the values $-1, +i, -i$ as well.
Looking again at the table above, we see that besides $\chi_0$, every row sums to 0. That is, $\chi$ sums to 0 when summing over all integers modulo 10:

$$\sum_{n=0}^{9} \chi(n) = 0.$$

Fourier series exhibit a similar phenomenon; there is the "constant" component $\cos(0x) = 1$, but all other components $\cos(nx), \sin(nx)$ for $n \ge 1$ form a periodic "wave" that cancels out:
$$\int_0^{2\pi} \cos(nx)\,dx = \int_0^{2\pi} \sin(nx)\,dx = 0.$$

This is not a coincidence, because nothing is ever a coincidence.
Since $\chi(p)$ has this cancelling-out behavior, we might expect that, unlike for $\chi_0$, the sum $\sum_p \frac{\chi(p)}{p}$ remains bounded; this is like how $\int_1^\infty \frac{\sin(x)}{x}dx < \infty$ even though $\int_1^\infty \frac{1}{x}dx = \infty$. This does turn out to be the case.
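We can watch this cancellation happen numerically. The sketch below uses the real character mod 10 (the one taking $+1$ on $1, 9$ and $-1$ on $3, 7$, called $\chi_3$ in this post) and compares its partial sums over primes against the divergent $\sum_p \frac{1}{p}$:

```python
# Partial sums over primes p ≤ N: with the trivial character the sum of
# 1/p keeps growing, while with a non-trivial character the signs cancel
# and the partial sums stay bounded.

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, ok in enumerate(sieve) if ok]

def chi3(n):
    """The real non-trivial character mod 10."""
    return {1: 1, 3: -1, 7: -1, 9: 1}.get(n % 10, 0)

primes = primes_up_to(10**5)
divergent = sum(1 / p for p in primes)        # grows like log log N
bounded = sum(chi3(p) / p for p in primes)    # hovers near a constant

print(divergent, bounded)
```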
After proving that, we look back at $(\dagger)$: the right-hand side is a sum of 4 components. The first one is infinite, while the other three are bounded. Intuitively (and this can be formalized), divergent sum + bounded sum = divergent sum. Thus, the entire expression diverges. Since the left-hand side is a sum over all primes $p \equiv 7 \pmod{10}$ (these are the only primes with $\delta_7(p) \ne 0$), there must be infinitely many of them. Done.
IV Wrapping up (or: getting started)
Of course, we're not really done. We still have to prove that $\sum_p \frac{\chi(p)}{p}$ is bounded whenever $\chi$ is a non-trivial character modulo 10.
A first step would be to revisit formula $(\star)$, which relates sums over all natural numbers to sums over all prime numbers. The same reasoning as above, using the fact that $\chi$ is multiplicative, gives us:[4]
$$\log\Big(\sum_{n \ge 1} \frac{\chi(n)}{n}\Big) = \log\Big(\prod_p \frac{1}{1 - \chi(p)/p}\Big) \approx \sum_p \frac{\chi(p)}{p}.$$

This means that to show that $\sum_p \frac{\chi(p)}{p}$ is bounded, it's enough to show that the infinite series $L(\chi) := \sum_{n \ge 1} \frac{\chi(n)}{n}$ converges to a non-zero number. (If $L(\chi) = 0$ then $\log(L(\chi))$ diverges, breaking our argument.)
Showing that $\sum_{n \ge 1} \frac{\chi(n)}{n}$ converges to a finite number turns out to be pretty easy; we use summation by parts, a discrete analogue of integration by parts and the same technique that shows $\int_1^\infty \frac{\sin(x)}{x}dx$ converges.
So, the only thing keeping us from showing that there are infinitely many primes ending in 7 is the following claim:

For every non-trivial character $\chi$ modulo 10, we have $L(\chi) \ne 0$.
There are three sums to check: $L(\chi_1), L(\chi_2), L(\chi_3)$ (for the characters defined in the table above). Each one can be calculated explicitly to a high enough degree of accuracy, showing that it stays away from 0. So, after an additional finite calculation, we are truly done.
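For instance, $L(\chi_3)$ can be estimated numerically (a sketch: the period-10 blocks cancel like an alternating series, so a truncated sum is already accurate, and it lands visibly far from 0):

```python
# Numerically evaluating L(chi3) = sum of chi3(n)/n and checking that it
# stays away from 0. The terms come in cancelling blocks of period 10,
# so the partial sums settle down quickly.

def chi3(n):
    """The real non-trivial character mod 10."""
    return {1: 1, 3: -1, 7: -1, 9: 1}.get(n % 10, 0)

L = sum(chi3(n) / n for n in range(1, 10**6))
print(L)  # ≈ 0.64 - comfortably non-zero
```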
The next step would be to prove the general form of Dirichlet's theorem: if $a, q$ are coprime, then the progression $a + nq$ contains infinitely many prime numbers. Our proof only worked for $q = 10$.
The same techniques work for other $q$, but now we can't just calculate each $L(\chi)$ explicitly; since $q$ is arbitrary, there are infinitely many characters to check.
In the general case, proving $L(\chi) \ne 0$ is very hard - in fact, this is the main difficulty of the proof!
This is a typical phenomenon in analytic number theory: given some problem involving prime numbers, there is a standard way to encode it in the language of complex numbers (typically using the Riemann zeta function or other character sums like L(χ)), and the hard part is to show that some analytic bound holds.
V Looking ahead
After studying enough number theory, proofs like the one above started feeling more intuitive to me. I know how the arguments unfold, what role each component plays, how they interact with each other, and where the main difficulties lie.
At some point, I was even able to give my own new observations and prove them using similar techniques.
Unfortunately, after all that, I notice that I am still confused.
There are fundamental rules of the universe that I don't yet understand. And for some reason, one of them seems to spell out: "thou shalt use complex-valued analysis to study the behavior of prime numbers".
This is not always the case; some well-known statements in number theory that were first proven with complex-analytic methods were later proved by other means. In fact, Dirichlet's theorem is one of them! But I am still not satisfied, because:
This is far from being the only mystery in mathematics. When going deep enough into some field, we see more and more surprising connections. Homotopy type theory is one of them; Monstrous moonshine is another. I chose to introduce a specific connection I'm more familiar with. If anyone has insights into these other connections, please share them here.
Maybe studying more math will resolve my confusion. But until then, I remain awestruck by the unreasonable deepness of number theory.
Since the sum $\sum_{n \ge 1} 1/n$ diverges, algebraic manipulations on this series are tricky; they can result in nonsense like $1 + 2 + 3 + 4 + \cdots = -1/12$. The formal proof works with the series $\sum_{n \ge 1} 1/n^s$ instead, for $s > 1$, and studies the asymptotic behavior as $s \to 1^+$. I ignore this (important) detail here.
Formally, we say that $\log(1/(1-x)) = x + O(x^2)$. Instead of saying things are approximately equal, we show an explicit bound on the error term, and note that the total error $\sum_p p^{-2}$ is finite.
Assuming f(n)=0 whenever n and 10 have a common divisor.
Defining $\log(z)$ over the complex numbers is non-trivial, as log is a multi-branched function. Li's notes (linked above) show how to settle this over the required domain.