Sums and products

Ege Erdil; Metaculus

This is a linkpost for https://www.metaculus.com/notebooks/10464/sums-and-products/

Much of number theory is about trying to understand the interaction between two different structures on the integers (or the natural numbers): the additive structure and the multiplicative structure.

To clarify what I mean, consider the twin prime conjecture, one of the famous open problems in analytic number theory. The conjecture asserts that there are infinitely many twin primes, in other words, primes such that $p + 2$ is also prime. Being a prime number is a multiplicative property: a natural number $\geq 2$ is defined to be prime if it's only divisible by itself and $1$ . In contrast, adding two to a number is an additive operation. Therefore the twin prime conjecture is asking a question about how these two structures interact with each other.

A substantial chunk of the open problems in number theory are about trying to get a better understanding of the interaction between these two structures. In this notebook, I'll be focusing on one of the most powerful conjectures in this area: the $a b c$ conjecture.

Randomness

Before I talk about any specific problems, I'll take the time to introduce the concept of heuristics based on randomness. These are suggestive probabilistic arguments that "prove" a result under false assumptions that different events are independent when they are really not so.

Let's see how we can "prove" the twin prime conjecture using such methods. Suppose that we pick a large natural number $N$ and we pick a natural number $n$ from the interval $[N, 2 N]$ uniformly at random. What's the chance that $n$ and $n + 2$ form a twin prime pair?

Well, we know by the prime number theorem that there are roughly $\sim N / log N$ primes in this interval, so if we say $E_{1}$ is the event of $n$ being prime and $E_{2}$ is the event of $n + 2$ being prime, we'll have roughly $P (E_{1}) \approx P (E_{2}) \sim 1 / log N$ . We want the probability that both of them are prime, in other words, we want $P (E_{1} \cap E_{2})$ .

So far everything we've done is rigorous. Now we'll introduce the heuristic: we'll assume that the events $E_{1}$ and $E_{2}$ are independent, which amounts to assuming that in this case the additive and multiplicative structures on the integers are "orthogonal" in some sense. Under this assumption, we can deduce

$P (E_{1} \cap E_{2}) = P (E_{1}) \cdot P (E_{2}) \sim \frac{1}{{log}^{2} (N)}$

In other words, the probability of a natural number chosen uniformly at random from $[N, 2 N]$ being the smaller of a pair of twin primes is some constant times $1 / {log}^{2} (N)$ . That tells us that there are roughly $N / {log}^{2} (N)$ such numbers from $N$ to $2 N$ , and now it's easy to see that the total number of twin primes will be infinite since this expression grows without bound as $N \to \infty$ .

This argument strongly suggests that the twin prime conjecture should hold: if it didn't hold, there would have to be a strong anticorrelation between the events of $n$ being prime and $n + 2$ being prime.

On the other hand, it's important to underline the limitations of this kind of argument. Notice that reasoning along the same lines would also give us a strongly suggestive argument that there ought to be infinitely many triplets of primes: in other words, infinitely many primes $p$ such that both $p + 2$ and $p + 4$ are also prime. This is, however, false. One way to see that it's false is to note that no matter what $p$ is, at least one of these three numbers must be divisible by $3$ . Therefore the only way they can all be prime is if one of them is actually equal to $3$ , so the only prime triplet is $3, 5, 7$ . Here our heuristic fails because we've ran into a situation in which there is some interaction happening between the additive and multiplicative structures on the integers.

The general rule is that if we know a particular bit of structure is there, we can adjust our heuristics to incorporate it and sharpen up our results. If we're in the dark, however, then heuristic arguments can't go beyond being suggestive of a result. For an actual proof we have to demonstrate that our heuristic hasn't left out any "hidden structure".

The strongest kind of partial evidence we have of results in number theory is often a suggestive heuristic coupled with similar results being proven in other structures that we might think of as analogous. As I'll show in the next section, both of these are true of the $a b c$ conjecture.

The abc conjecture

The $a b c$ conjecture has a rather peculiar looking statement. It's about the simplest equation we can think of: $a + b = c$ where $a, b, c$ are all natural numbers. The conjecture states that for any given real number $ε > 0$ there are only finitely many solutions of $a + b = c$ such that $a, b, c$ are pairwise coprime and $c > rad (a b c)^{1 + ε}$ .

Here $rad (n)$ is the radical of the natural number $n$ . It's defined to be the product of the distinct primes dividing $n$ . For example, $rad (12) = 6$ , $rad (90) = 30$ , et cetera. The "pairwise coprime" condition means that $a$ and $b$ don't share any prime divisor in common.

Now, let's try to understand what this statement is all about, since on first sight it appears rather bizarre. The intuitive idea of the statement is that almost every solution of the equation $a + b = c$ with $a, b$ coprime should involve a large "mass" of prime factors spread across all of $a, b, c$ if we weigh each prime factor by its logarithm (equivalently, its number of digits). The condition that $a, b$ are coprime is essential for this, since otherwise we could simply take $2^{n} + 2^{n} = 2^{n + 1}$ for any natural number $n$ and we'd have infinitely many counterexamples to the conjecture. Indeed, in this case $rad (a b c) = rad (2^{3 n + 1}) = 2$ and so all we have to do is make $n$ very big and we'll be done.

If you're wondering about the funny $ε$ in the exponent, it does need to be there for the statement to not be trivially false. We'll see the reason behind this when we look at the heuristics behind why we should expect the conjecture to hold, but for the moment it's easy to give a counterexample to the conjecture where $ε = 0$ : let $p$ be prime and take $a = 1, b = 2^{p (p - 1)} - 1, c = 2^{p (p - 1)}$ . The radical of the product in this case is clearly $2 rad (b)$ and since $2^{p (p - 1)} - 1$ is always divisible by $p^{2}$ (this is due to Euler's theorem) we get that $rad (a b c) = 2 rad (b) \leq 2 b / p < 2 c / p$ . By picking $p$ to be a large prime, we can obviously make the ratio of $c$ to $rad (a b c)$ arbitrarily large, so we indeed can't remove the $ε > 0$ from the exponent of the radical.

Morally, the $a b c$ conjecture is saying that, in some sense, $a$ and $b$ sharing prime factors is the only obstruction to the prime factors spread across $a, b, c$ behaving in a random fashion. I'll give a heuristic justification for why randomness should imply the conjecture later - for now let's investigate some of its consequences.

Perhaps the most famous theorem in all of mathematics is Fermat's last theorem: the claim that $a^{n} + b^{n} = c^{n}$ has no nonzero integer solutions when $n \geq 3$ . It was an extremely difficult theroem to prove, but actually if we assume the $a b c$ conjecture the hard part of the problem becomes trivial. This is because the $a b c$ conjecture says that for any $ε > 0$ all but finitely many primitive solutions of the equation must satisfy

$c^{n} < rad ((a b c)^{n})^{1 + ε} = rad (a b c)^{1 + ε} \leq (a b c)^{1 + ε} \leq c^{3 (1 + ε)}$

and now if we pick some value like $ε = 1 / 6$ then we get a contradiction for any $n \geq 4$ . So the $a b c$ conjecture is sufficient to show that there are at most finitely many primitive counterexamples to Fermat's last theorem for any $n \geq 4$ . The conjecture doesn't quite prove that the only solutions are the trivial ones, but it comes very close to doing so.

In fact the inability of the $a b c$ conjecture to get close to proving Fermat's last theorem in the case $n = 3$ is not surprising: this is because other heuristic methods are also unable to see that the theorem should hold for $n = 3$ . For example, a simple heuristic argument is that if we pick $a, b$ to be uniformly distributed on $[N / 2, N]$ , the probability that $a^{3} + b^{3}$ is a perfect cube should be roughly $\sim 1 / N^{2}$ . Since there are $\sim N^{2}$ pairs $(a, b)$ we can choose, this suggests that we should expect a constant number of counterexamples to Fermat's last theorem in every such interval $[N / 2, N]$ and we get an infinite expected value for the total number of counterexamples.

It turns out that the structure that is being missed by the heuristic here is that $a^{3} + b^{3} = c^{3}$ is actually a (homogenized) elliptic curve. I won't go into the details here, but it's another good illustration of how imprudent use of heuristics can lead one to wrong conclusions.

Now that we've seen how the conjecture can be useful, let's get on with the reasons for expecting that it might be true.

Analogous structures

The $a b c$ conjecture has an easy proof in the case of polynomials over a field of characteristic $0$ . If you don't know what a field of characteristic $0$ is, in this section I'll be working with the complex numbers for the sake of concreteness.

The statement has to be modified somewhat when we're working with polynomials over $C$ . It becomes the following: for all nonconstant polynomials $a (x), b (x), c (x)$ , if $gcd (a, b) = 1$ and $a (x) + b (x) = c (x)$ as polynomials then $deg c < deg rad (a b c)$ .

The proof is also surprisingly easy and uses a derivative trick. The key fact is that for a polynomial $p$ the degree of its radical $deg rad (p)$ counts the number of its distinct zeroes. In this case, the $a b c$ conjecture is saying that the number of zeroes that $a, b, c$ have in total has to be bigger than the degree of $c$ . It turns out that we can also identify zeroes of polynomials which have multiplicity by taking greatest common divisors with their derivative, so we have that $rad (p) = \frac{p}{gcd (p, p^{'})}$ . (Here the equality can be up to some overall constant - this doesn't matter for our purposes.)

For example, suppose we're working with $p (X) = X^{2} (X + 1)^{3}$ . If we differentiate this we get $p^{'} (X) = 2 X (X + 1)^{3} + 3 X^{2} (X + 1)^{2} = X (X + 1)^{2} (5 X + 2)$ , and if we now take greatest common divisors we get $gcd (p, p^{'}) = X (X + 1)^{2}$ . We can then compute the radical by

$rad (p) = \frac{p}{gcd (p, p^{'})} = \frac{X^{2} (X + 1)^{3}}{X (X + 1)^{2}} = X (X + 1)$

The reason this alternative way of computing the radical helps us is because the derivative also behaves well with respect to addition: it simply commutes with it. This means in the case of polynomials we have this tool to connect the additive and multiplicative structures with each other and this makes our life a lot easier.

Now for the proof: write the original equation as

$\frac{a}{b} + 1 = \frac{c}{b}$

and multiply through by $b^{2}$ after taking derivatives to obtain

$F = a^{'} b - b^{'} a = c^{'} b - b^{'} c$

Now, $F$ is divisible by all of $gcd (a, a^{'})$ , $gcd (b, b^{'})$ and $gcd (c, c^{'})$ for obvious reasons. Since $a, b, c$ are all pairwise coprime, this means $F$ is actually divisible by their product, and so

$deg F \geq deg gcd (a, a^{'}) + deg gcd (b, b^{'}) + deg gcd (c, c^{'})$ $= deg a + deg b + deg c - deg rad (a b c)$

On the other hand, $deg F = deg a + deg b - 1$ , so it follows that

$deg c + 1 \leq deg rad (a b c)$

which is what we wanted to prove.

Since polynomial rings over fields are analogous to the integers in many ways, we might think that even if we don't have access to differentiation as a tool to prove results over the integers, a similar result to the one we've proved here should still hold. If it didn't, there would have to be some relevant difference between the two structures which would explain their divergent properties.

Heuristics

The second source of partial evidence for the $a b c$ conjecture comes from the kind of randomness heuristics that I've discussed before. I'll be borrowing heavily from Terence Tao's excellent post on this subject, you should read it if you want further details about this argument.

The key fact is that if we fix some radical $r$ then we have $N^{o (1)}$ positive integers smaller than $N$ with radical equal to $r$ - here $o (1)$ simply means some function of $N$ which goes to $0$ as $N \to \infty$ . Now, we fix a large natural number $N$ and model sampling a counterexample to the $a b c$ conjecture as follows:

We pick the radicals $r_{a}, r_{b}, r_{c}$ uniformly at random from pairwise coprime squarefree integers in $[1, N]$ subject to $r_{a} r_{b} r_{c} < N^{1 - ε}$ .
We pick $c$ from among the numbers in $[N, 2 N)$ having radical $r_{c}$ uniformly at random.
We pick $a, b$ from the numbers in $[1, 2 N)$ having radical $r_{a}$ and $r_{b}$ respectively, once again uniformly at random.
We check if $a + b = c$ or not. If so, then we've found a counterexample.

Now let's roughly compute the number of counterexamples we get from this process in total. The volume of the region we're sampling over in the first case turns out to be $\sim N^{1 - ε} {log}^{2} (N)$ , so at the expense of making $ε$ slightly smaller we can throw away the logarithm factor and pretend this is just $\sim N^{1 - ε}$ .

In the second and third steps, thanks to the result I've cited above we know that the number of total choices available is $N^{o (1)}$ .

Finally, in the last step we've found a counterexample if $a + b = c$ . Once again we introduce the heuristic assumption here: if $a, b$ are coprime then there's no structural reason for why $a + b$ should be equal to $c$ , so since both $a + b$ and $c$ are about as big as $N$ we can say that their probability of being equal is $\sim 1 / N$ .

This gives us an expected number of counterexamples $\sim N^{1 - ε + o (1)} / N = N^{- ε + o (1)}$ . Now, since this covers all cases where $N \leq c < 2 N$ , we simply take the infinite sum with $N$ ranging over powers of $2$ to find the total number of expected counterexamples to the $a b c$ conjecture. In this case, this ends up being a geometric series with its main term $\sim 2^{- ε k}$ and so the sum is finite.

Since the sum is finite, this tells us that if the heuristic were a good model seeing infinitely many counterexamples would be an event of probability zero. We can therefore say that this heuristic is suggestive of the $a b c$ conjecture being true.

Recent history

So far I've only talked about the mathematics behind the conjecture. The recent history is in fact interesting as well.

In 2012, Shinichi Mochizuki claimed that he had proven the conjecture using methods (named inter-universal Teichmuller theory, abbreviated as IUT) that he had developed in a series of relatively obscure papers over the years. Since nobody was familiar with his body of work at the time, it proved to be impossible for independent mathematicians to verify or disprove his claim, and to this day it's an open question whether Mochizuki's proof is actually valid or not.

The most recent attempt to resolve the issue was by Peter Scholze and Jakob Stix, who traveled to Kyoto in 2018 to meet Mochizuki in person in an attempt to understand his proof better. They came back declaring that Mochizuki's proof had a fundamental flaw, while Mochizuki accused them of misunderstanding his work. In the end, I believe the meeting changed the minds of few people about whether the proof is correct or not.

After a long delay, Mochizuki's proof was submitted to Publications of the Research Institute for Mathematical Sciences (abbreviated as RIMS), a journal of which Mochizuki was the chief editor. Mochizuki recused himself from reviewing the paper, but the paper was accepted and published in 2021. It is unusual for this to happen when substantial doubts remain about the validity of the argument presented in a mathematics paper. In the meantime, Mochizuki has continued to publish papers about IUT and his body of work on the subject continues to grow larger with every passing year.

I would have liked to go into Mochizuki's attempted proof in more detail, but there are probably only a few people in the world who are qualified to speak about this subject and I'm not one of them. I'll only say that there are domain experts on both sides of the correctness dispute and the issue has by no means been resolved. Personally I believe with some confidence that Mochizuki's proof isn't correct, but this is based more on outside view considerations than any understanding of the mathematics involved in the proof.

Forecasts

I think there are two generic but important questions to ask about the conjecture: is it true, and when will it be settled?

For the truth of the conjecture, I can say that the above arguments are quite persuasive to me and I think I'm about 95% confident that the conjecture is true. The reason my confidence is so high is because there's evidence from two directions: both from analogous structures and from heuristic arguments.

As for when it will be settled, this seems less clear to me. I think there is a non-negligible chance that Mochizuki's proof is actually correct - I would put this at around 10%. If it is not correct, then while the conjecture itself is not that old, the base rate for problems of this difficulty being settled seems low to me - I think the conjecture is of comparable difficulty to the Riemann hypothesis, and the base rate of Millennium Prize problems being settled using a Jeffreys prior is ~ 1%/year. I'll be more optimistic than this, partly due to considerations about potential growth acceleration in the next century, and put the conditional median somewhere around 2070.

Finally, there is a question that's peculiar to the abc conjecture, and that's about Mochizuki's claimed proof. So far Mochizuki has been adamant that his proof is correct and people only think it's not because of their misunderstandings. An important question is therefore whether Mochizuki himself will at some point no longer endorse his proof in the form that it was published in RIMS. I think the likelihood of this is rather low but find it hard to go below 10%.

Conclusion

The abc conjecture is uniquely interesting because it's both one of the most important open problems in number theory and it's the only case I know of in recent history in which a dispute about the correctness of a proof to such a high profile problem has persisted for so long.

It's possible that the controversy will only truly be resolved once computer formalization and verification of proofs advances to such a degree that most mathematics proofs are verified by computers. We're currently far from this state of affairs, however, and the dispute is likely to go unresolved for the foreseeable future.

[-]nose262y70

This is one of the most fun descriptions of the weird, pseudorigorous intuition you pick up eventually doing while math. It's really hard to explain to someone who hasn't done it, or even worked mainly in a different area, why one hand-waved statement is better than another which seems equivalent. This comes remarkably close, with both the fast and dirty method as well as why it sort of works and why it doesn't work completely. Fantastic work.

[-]Mitchell_Porter2y60

the interaction between ... the additive structure and the multiplicative structure

The Collatz conjecture is another famous example of this.

[-]Oscar_Cunningham2y50

I gave some more examples here: https://www.reddit.com/r/math/comments/9gyh3a/if_you_know_the_factors_of_an_integer_x_can_you/e68u5x4/

[-]Ege Erdil2y30

This is true. The Collatz conjecture of course has a similar heuristic justification for why it should hold: if is odd then on average $3 n + 1$ is divisible by $2$ two times, so on average when given an odd integer we multiply it by $3$ and then divide it by $4$ . This process should eventually get us to the stable orbit starting at $1$ . Unfortunately I think the Collatz conjecture is one of the most difficult open problems in mathematics today; definitely far more difficult than any of the Millennium Prize problems for instance.

There are also interesting relationships between Collatz-type problems and busy beavers. If a Turing machine with a small number of states wants to run for a large but finite number of steps, the best strategy we know is often for it to implement a Collatz-like algorithm which has the property that it is known to eventually halt. I don't think anyone understands this very well but it suggests that sufficiently difficult Collatz-like problems may actually be probing too deeply into the properties of arithmetic for PA or even ZFC to be able to decide them. The famous Collatz conjecture itself is probably not this difficult, but it's impossible to know for certain.

[-]Zorger742y40

I upvoted this post; it's really well written. The introduction made the conjecture less scary, which I appreciated. :)

I had vaguely heard about this conjecture before but never looked into it; after reading this post, I feel like I have a basic understanding.

The fact that there could be a correct proof but there aren't very many people qualified enough to actually verify it is funny to me. Is that normal? It seems like a bit of a problem, if people care about determining the correctness of this type of conjecture.

[-]Ege Erdil2y20

It's good to know that the post was accessible - it's something I can never quite be sure of when reviewing it on my own.

It's not normal. The problem with Mochizuki is that he has been working by himself on IUT for one to two decades and in 2012 the only people who had any expertise on his theory were his own graduate students and associates. Nobody outside of his "network" was following his work that closely, and people had to scramble to see what it was all about when he claimed a proof of the conjecture in 2012. His work is particularly dense and so it took years for anyone to even acquire a sufficient understanding of it to try to figure out if his proof was valid or not.

Generally mathematicians aren't this solitary when working - most mathematics is done in a more interactive and social way nowadays. There are a few exceptions, however: Andrew Wiles, Grigori Perelman and Shinichi Mochizuki are all mathematicians who worked in isolation from the rest of the mathematics community. Mochizuki's difference from the others is that he seems to have built up a whole theoretical apparatus with much more ambitious goals than just proving the abc conjecture. For him this conjecture is just a corollary, not something that he set out to prove. Since he has this vast machinery that he's built up over the years, it makes it that much harder for anyone to understand a proof which heavily relies on it.

[-]philh2y30

An important question is therefore whether Mochizuki himself will at some point no longer endorse his proof in the form that it was published in RIMS. I think the likelihood of this is rather low but find it hard to go below 10%.

Would "the proof as written contains errors but is basically correct; here's how to fix them" count? These errors might include typos but also things like "the proof of lemma 3.2 relied on theorem 2.9, but the preconditions of that theorem weren't satisfied. But here's another way to prove a weaker version of the lemma, which is sufficient for our purposes".

(IIRC Wiles' proof of Fermat's Last Theorem had a significant error corrected only shortly before he revealed it publicly.)

I think this depends on how big the error is. No mathematical proof written in natural language is fully explicit, so there's always room for deductions which aren't fully justified, details which are not fully elaborated, etc.

If the error is big enough that it breaks the whole argument and a major new idea is needed to salvage the proof, then I would count that as Mochizuki no longer endorsing the proof in the form it appeared in RIMS. This was actually true for the error in Wiles' proof of FLT; he needed a whole new idea to save his proof. His original idea didn't work in some case and he had to realize that he had in the past tried something else which failed in general but was suited for handling this specific case.

If Wiles had published his proof in its original form (before he noticed the error) and then publicly stated that this proof had an error in it, I think the question should resolve positively. If it's just a typo or some computational mistake which doesn't affect the rest of the proof, however, then it shouldn't resolve on that basis alone.

[-]Yitz2y30

I am upvoting this post despite my eyes glazing over the details of the math, because it is very early in the morning for me and I’m quite confident that when I have more mental energy I’ll be able to fully appreciate this post. Even without an understanding of the actual math involved, a really important insight here is the basic presentation of using heuristics in math, and the potential pitfalls that come with using them, which I really appreciate.

[-]Said Achmiz2y30

What’s a “structure on the integers” (or on anything else, for that matter)?

And what, specifically, is “the additive structure” and/or “the multiplicative structure”? How do these things relate to addition and/or multiplication?

What does it mean for such “structures” to “interact”?

EDIT: Also, in what sense are you using the word “morally” (in “Morally, the abc conjecture is saying …”)? It doesn’t seem to be the usual one, I don’t think…?

[-]Dirichlet-to-Neumann2y40

"Morally" here is a bit of mathematical slang - it means intuitively.

The additive structure is the addition, the multiplicative structure is the multiplication.

A structure over a set is a collection of relations between elements of the set (2+3=5 is a relation between 2, 3, and 5). The point is that we can abstract away everything until we are left with the basic properties of a structure - for the addition those are the associativity (2+3)+5=2+(3+5), the neutral element 3+0=3, the existence of symmetric elements 3+(-3)=0, and the commutativity 3+2=2+3. Any structure with those properties tends to be called an addition, regardless of the set. And the first three properties defines the group structure.

Now we can do the same thing for the multiplication (which for the integers has the same basic properties, except the existence of a symmetric). And then we can wonder how these structures interact - for example, a*(b+c)=ab+ac, this is called the distributivity of the multiplication over the addition.

LESSWRONG
LW