
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

EDIT: Added a section on Euler's conjecture.

# The Goldbach conjecture is likely

The Goldbach conjecture is that "every even integer above two is the sum of two primes". For example, $4 = 2 + 2$, $6 = 3 + 3$, $8 = 3 + 5$, and so on.

Though this is a mathematically precise statement, we can talk about the "probability" of it being correct. How so?

Well, by the prime number theorem, the probability of a random number less than $N$ being prime is roughly $1/\log(N)$. So if we sum up all the pairs of primes less than $N$, we get roughly $N^2/\log(N)^2$ different sums; these sums will all be less than $2N$.
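As a quick sanity check on these counts, here is a small sieve (my own illustration, not from the post) comparing the number of primes below $N$, and hence the number of pair-sums, against the prime number theorem estimate:

```python
import math

def primes_below(n):
    """Sieve of Eratosthenes: all primes strictly below n."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

N = 100_000
ps = primes_below(N)
print(len(ps))                 # actual prime count: 9592
print(round(N / math.log(N)))  # prime number theorem estimate: 8686
print(len(ps) ** 2)            # number of (ordered) pair-sums of primes below N
```

The estimate is about 10% low at this scale (the theorem is asymptotic), but the counts are of the right order, which is all the heuristic needs.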

So, is $2N$ itself one of these sums? Well, the "probability" that it's not the total of any given sum is $1 - \frac{1}{2N}$; therefore the probability of it being the total of none of the sums is:

$$\left(1 - \frac{1}{2N}\right)^{N^2/\log(N)^2} \approx e^{-N/\log(N)^2}$$

(ignoring constant factors in the exponent).

So the probability of $2N$ being the total of such a sum is roughly:

$$1 - e^{-N/\log(N)^2}$$

Therefore, the probability of all such numbers being the total of such a sum is roughly:

$$\prod_{N=2}^{\infty}\left(1 - e^{-N/\log(N)^2}\right)$$

Now, the infinite product converges to a non-zero number if and only if the sum $\sum_N e^{-N/\log(N)^2}$ converges to a finite number. That series can be seen to be convergent (for example, by noting that $e^{-N/\log(N)^2} \leq 1/N^2$ for large enough $N$ and using the comparison test).

If we use computers to get an estimate of this product, we get a pretty low probability. However, most of that improbability mass is on the low numbers, and the Goldbach conjecture has been tested up to around $4 \cdot 10^{18}$. So, if we assume it's valid up to $4 \cdot 10^{18}$ and take the product only over larger $N$, we numerically get a probability almost exactly equal to $1$.
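This estimate is easy to reproduce; the sketch below (function name and cutoffs are my own) computes the product $\prod_N (1 - e^{-N/\log(N)^2})$ from a given starting point:

```python
import math

def goldbach_prob(start, stop=10 ** 5):
    """Heuristic product over N >= start of (1 - exp(-N / log(N)^2))."""
    log_p = 0.0
    for n in range(start, stop):
        t = math.exp(-n / math.log(n) ** 2)
        if t < 1e-300:       # remaining terms are negligible
            break
        log_p += math.log1p(-t)
    return math.exp(log_p)

print(goldbach_prob(4))        # starting low: a small probability
print(goldbach_prob(10 ** 4))  # starting past a verified range: essentially 1
```

Starting the product at small $N$ gives a tiny probability, while starting beyond an already-verified range gives a probability indistinguishable from $1$, which is the point of the paragraph above.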

So the Goldbach conjecture is pretty likely, and the more examples we discover where it holds, the more likely it is to hold all the way to infinity.

## "Probabilities" of logical facts

The above reasoning seems dubious. The primes are not defined by random sampling among the natural numbers; quite to the contrary, they come from a mathematical rule of extreme precision. So what do these probabilities mean?

Let $S$ be an infinite set of numbers, selected from the natural numbers in a way that looks like the prime number theorem (eg the $n$-th number is approximately $n \log(n)$). Then what we've shown is that, if such an $S$ obeys the "$S$-Goldbach conjecture" up to $4 \cdot 10^{18}$, then we'd expect it to go all the way to infinity.

Thus the Goldbach conjecture can be restated as "in terms of sums of two elements, the prime numbers behave like a typical sequence selected in a prime-number-theorem way".

So the Goldbach conjecture is not saying that there is something special about the primes; in fact, it's saying the opposite, that the primes are typical of similar sequences, that nothing in the specific ways that the primes are selected has an impact on the sum of two primes. So the Goldbach conjecture is essentially saying "there is no obstruction to the primes being typical in this way".

## One obstruction

Did you notice that, so far, at no point did I require the target number to be even? But all the primes except for $2$ are odd. So the distribution of sums of primes is very (very!) heavily skewed towards even numbers; most odd numbers will not appear at all. So, that is one clear obstruction to the possible values of the sum, coming from the way the primes are constructed. The Goldbach conjecture is therefore saying that there are no additional obstructions beyond this one condition on parity.
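The parity skew is easy to see directly; a small check (my own illustration) over the primes below $100$:

```python
def primes_below(n):
    """Sieve of Eratosthenes: all primes strictly below n."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

ps = primes_below(100)
sums = {p + q for p in ps for q in ps}
evens_missing = [n for n in range(4, 101, 2) if n not in sums]
odds_present = sorted(n for n in sums if n % 2 == 1 and n <= 100)
print(evens_missing)   # → [] (Goldbach holds in this range)
print(odds_present)    # only numbers of the form 2 + (odd prime)
```

Every even number from 4 to 100 is covered, while the only odd sums are of the form $2 + q$ for an odd prime $q$: the parity obstruction in action.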

In fact, the Goldbach conjecture has changed over time; $1$ used to be seen as a prime number, and the original conjecture included $2 = 1 + 1$ as another example. Then $1$ was removed from the list of prime numbers, and it turned out, as far as we can tell, that $2$ was the only even number we lost from the list of sums.

If we removed $2$ from the list of primes, we'd only lose $4 = 2 + 2$ as an even sum. Similarly, if we strike out the first $k$ primes, we expect - on probabilistic grounds - that "all even numbers greater than a given $m$ are the sums of two primes (first $k$ primes not included)". If that were to fail, then there's a really interesting obstruction out there.

# Fermat's last theorem was likely (for n>3)

We can show, similarly, that Fermat's last theorem was very likely on probabilistic grounds. The theorem states that, for $n \geq 3$, there do not exist positive integers $x$, $y$, and $z$ such that $x^n + y^n = z^n$.

Fix $n$ and $z$. Counting $0$ and $z$ itself, there are $z + 1$ natural numbers less than or equal to $z$. Therefore there are $(z+1)^2$ possible sums $x^n + y^n$ with $x, y \leq z$, all less than $2z^n$. So the probability that any two of these $n$-th powers sum to $z^n$ is roughly $(z+1)^2/(2z^n) \approx z^{2-n}/2$.

So the probability that there are no $x$'s and $y$'s such that $x^n + y^n = z^n$, for any $z$, is:

$$\prod_{z \geq 2}\left(1 - \frac{z^{2-n}}{2}\right)$$

The sum $\sum_z z^{2-n}$ converges for any fixed $n \geq 4$. Moreover, we can also sum over $n$: $\sum_{n \geq 4} \sum_{z \geq 2} z^{2-n} = \sum_{z \geq 2} \frac{1}{z(z-1)}$. This also converges. So the probability of Fermat's last theorem was non-zero, at least for $n > 3$; add on the fact that the theorem was proved for many $n$ and checked for many $n$ and $z$, and, even before it was proved, it was very probably correct.
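Both sums can be checked numerically; in the sketch below (bounds are mine), the inner geometric series $\sum_{n \geq 4} z^{2-n}$ collapses to $\frac{1}{z(z-1)}$, and the outer sum telescopes to $1$:

```python
# Inner geometric series over n >= 4, for a fixed z: sum_n z^(2-n) = 1/(z(z-1))
z = 3
inner = sum(z ** (2 - n) for n in range(4, 60))
print(abs(inner - 1 / (z * (z - 1))) < 1e-12)  # → True

# Outer sum over z: sum_z 1/(z(z-1)) = sum_z (1/(z-1) - 1/z) telescopes to 1
total = sum(1 / (z * (z - 1)) for z in range(2, 10 ** 6))
print(total)  # ≈ 1
```

So the total expected number of "accidental" Fermat solutions across all $n \geq 4$ and all $z$ is finite, which is why a non-zero probability survives.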

So Andrew Wiles's genius was in showing there were no unexpected obstructions for the "likely" outcome to be true. That's why the proof is so hard: he was trying to prove something very "likely", and show an absence of structure, rather than a presence, without knowing what that structure could be.

# Euler's conjecture was unlikely

Euler's conjecture was that you needed to sum at least $n$ $n$-th powers to get another $n$-th power; Fermat's last theorem establishes this for $n = 3$, and Euler theorised that this extended to all higher powers.

Euler's conjecture is in fact false; for $n = 4$ we have three fourth powers that sum to another fourth power:

$$95800^4 + 217519^4 + 414560^4 = 422481^4$$

There are counterexamples known for $n = 5$ as well, so the conjecture is false, and not just for one value of $n$.
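Both counterexamples are easy to verify with exact integer arithmetic (the identities are Frye's for $n = 4$ and Lander and Parkin's for $n = 5$):

```python
# Frye (1988): smallest known counterexample for n = 4
lhs4 = 95800 ** 4 + 217519 ** 4 + 414560 ** 4
print(lhs4 == 422481 ** 4)  # → True

# Lander and Parkin (1966): counterexample for n = 5
lhs5 = 27 ** 5 + 84 ** 5 + 110 ** 5 + 133 ** 5
print(lhs5 == 144 ** 5)  # → True
```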

More interesting from our perspective, we expect it to be false on probabilistic grounds. Recall that the argument about Fermat's last theorem does not work for $n = 3$; it fails because the crucial sum is of the type $\sum_z z^{2-3} = \sum_z 1/z$, which diverges.

Similarly, if we estimate the probability of Euler's conjecture for a given $n$, we get terms like the following (for some constants $c_n > 0$):

$$\prod_z \left(1 - \frac{c_n}{z}\right)$$

This goes to zero for the same reason as the $n = 3$ case: the relevant sum $\sum_z c_n/z$ diverges.

So, on probabilistic grounds, we expect Fermat's last theorem to be true for $n > 3$, and we expect Euler's conjecture to be false.

The only unexpected result here is that Fermat's last theorem and Euler's conjecture are true for $n = 3$. So something about the structure of the problem for $n = 3$ is moving the result away from the probabilistic outcome.

# The "Stuart conjecture"

Based on what I wrote about Euler's conjecture, I'll hazard a conjecture myself, which I believe to be true on probabilistic grounds. Namely that if there are $k$ integers whose $n$-th powers sum non-trivially to another $n$-th power, then $k$ is greater than or equal to $n - 2$.
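The counting behind this is the same as before; as a sketch (my own reconstruction of the heuristic):

```latex
\Pr\!\big[x_1^n + \cdots + x_k^n = z^n \text{ for some } x_i \le z\big]
  \;\approx\; \frac{z^k}{k\,z^n} \;=\; \frac{z^{k-n}}{k},
\qquad
\sum_{z \ge 2} \frac{z^{k-n}}{k} < \infty
  \;\Longleftrightarrow\; k - n < -1
  \;\Longleftrightarrow\; k \le n-2.
```

For $k \leq n - 2$ the expected number of solutions is finite, so on probabilistic grounds solutions should only start appearing once $k \geq n - 1$; conjecturing the bound at $n - 2$ is the more conservative claim.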

Fermat's last theorem shows this is true for $n = 3$ and $n = 4$.

# Comments

Reminiscent of Freeman Dyson's 2005 answer to the question: "what do you believe is true even though you cannot prove it?":

> Since I am a mathematician, I give a precise answer to this question. Thanks to Kurt Gödel, we know that there are true mathematical statements that cannot be proved. But I want a little more than this. I want a statement that is true, unprovable, and simple enough to be understood by people who are not mathematicians. Here it is.
>
> Numbers that are exact powers of two are 2, 4, 8, 16, 32, 64, 128 and so on. Numbers that are exact powers of five are 5, 25, 125, 625 and so on. Given any number such as 131072 (which happens to be a power of two), the reverse of it is 270131, with the same digits taken in the opposite order. Now my statement is: it never happens that the reverse of a power of two is a power of five.
>
> The digits in a big power of two seem to occur in a random way without any regular pattern. If it ever happened that the reverse of a power of two was a power of five, this would be an unlikely accident, and the chance of it happening grows rapidly smaller as the numbers grow bigger. If we assume that the digits occur at random, then the chance of the accident happening for any power of two greater than a billion is less than one in a billion. It is easy to check that it does not happen for powers of two smaller than a billion. So the chance that it ever happens at all is less than one in a billion. That is why I believe the statement is true.
>
> But the assumption that digits in a big power of two occur at random also implies that the statement is unprovable. Any proof of the statement would have to be based on some non-random property of the digits. The assumption of randomness means that the statement is true just because the odds are in its favor. It cannot be proved because there is no deep mathematical reason why it has to be true. (Note for experts: this argument does not work if we use powers of three instead of powers of five. In that case the statement is easy to prove because the reverse of a number divisible by three is also divisible by three. Divisibility by three happens to be a non-random property of the digits.)
>
> It is easy to find other examples of statements that are likely to be true but unprovable. The essential trick is to find an infinite sequence of events, each of which might happen by accident, but with a small total probability for even one of them happening. Then the statement that none of the events ever happens is probably true but cannot be proved.
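Dyson's below-a-billion check is a one-minute computation; a sketch (names mine):

```python
def is_power_of_five(m):
    """True if m is 5^k for some k >= 0."""
    while m % 5 == 0:
        m //= 5
    return m == 1

found = []
power = 2
while power < 10 ** 9:
    if is_power_of_five(int(str(power)[::-1])):
        found.append(power)
    power *= 2
print(found)  # → [] : no power of two below a billion reverses to a power of five
```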

How does the randomness of the digits imply that the statement cannot be proven? Superficially the quote seems to use two different notions of randomness, namely "we cannot detect any patterns" (i.e. a pure random generator is the best predictor we have) and "we have shown that there can be no patterns" (i.e. we have shown no other predictor can do any better). Is this a known result from Ergodic Theory?

I was about to type out a rebuttal of this, but halfway through I realized I actually agree with you. The "some non-random property" of the digits of the powers of two is that they are all digits found in order inside of powers of two. I would even go so far as to say that if the statement really can't be proven (even taking into account the fact that the digits aren't truly random) then there's a sense in which it isn't true. (And if it can't be proven false, then I'd also say it isn't false.)

If it can't be proven false, then it definitely isn't false!

Equivalently: If it's false, then it can be proven false.

Why do I say that? Well, if it's false, then there exists a power of two that is the reverse of a power of five. But that has a very short proof: just write down the smallest example.

As to the case where it can't be proven either way: I would say that it has to be true in that case, but this might be one of those things that sufficiently diehard constructivists would agree with you on.

"If it can't be proven false, then it definitely isn't false"
Hmm, if you are applying that to mathematical conjectuire, then those statements dont seem compatible with Godel's theorem to me.

You need to add some assumptions to make it work. For example, I believe the following works:

"In second order arithmetic, we can prove that NP1 implies NF, where NP1 is the statement 'there exists no first order proof of the conjecture' and NF is the statement 'the conjecture isn't false'."

I meant this specific conjecture, not all conjectures. More generally it applies to all conjectures of the form "there is no number n such that Q(n)" where Q is straightforward to check for a particular n.

I am extremely interested in examples of heuristic arguments like this where the naive version leads you astray and it takes much more sophisticated arguments to fix the problem. I'd be happy to pay a \$500 bounty for any examples that I find really interesting, in number theory or elsewhere. (Usual disclaimers, my judgment of what counts as interesting will be completely arbitrary and unaccountable, etc.)

The best example I found was the Chebyshev bias that primes are very slightly more likely to be 3 mod 4 than 1 mod 4. The simplest explanation I know of this goes through an explicit formula for the von Mangoldt function's distribution mod 4 as explained by Terry Tao here.

(Roughly speaking: the prime powers should be uniform mod 4, based on an argument about L-functions that I don't understand. But the prime squares are all 1 mod 4, so the primes need to be 3 mod 4 slightly more often to cancel out. It's very surprising that it works this way---if you made me guess I would have quite strongly predicted that the prime powers would skew towards 1 mod 4, instead of the primes accommodating the squares like that.)
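The Chebyshev bias is visible at small scale; a quick count (my own check) of primes below $10^5$ by residue mod 4:

```python
def primes_below(n):
    """Sieve of Eratosthenes: all primes strictly below n."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

ps = primes_below(100_000)
c1 = sum(1 for p in ps if p % 4 == 1)
c3 = sum(1 for p in ps if p % 4 == 3)
print(c1, c3)  # 3 mod 4 is slightly ahead at this scale
```

The lead does change hands infinitely often, but at most cutoffs, including this one, the 3 mod 4 class is ahead.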

That post also describes another bias where prime gaps are surprisingly non-uniform, though once it's pointed out that one feels much more obvious to me.

Another example is that the strong form of Cramer's conjecture is false, i.e. the prime gaps are not as consistently small as you would expect based on a simple random model. I don't have a good understanding of why this happens, and people don't seem shocked by it (but right now I feel kind of shocked).

I'm not really interested in diophantine equations where you can override a simple heuristic argument by factorization / analysis modulo fixed primes / dividing by common factors. It's pretty easy to see why that happens, the heuristic argument seems like a good prima facie guess, and it's easy to see how to adjust it once you notice new arguments.

It so happens that I asked a relevant question on MathOverflow recently. Very difficult to find any elementary explanation for this phenomenon.

That's another great example.

Thinking through the heuristic argument:

• About $1/\log(x)$ of the numbers are prime, and primes have an odd number of prime factors (namely one).
• As we multiply by more and more factors, the parity of the number of prime factors should alternate between odd and even.
• But starting at one factor should introduce a bias towards odd.
• It's not clear how big the bias would be. A natural guess would be in the ballpark of $1/\log(x)$.
• A bit more precisely, I think the number of prime factors is roughly $1$ plus a Poisson with mean $\log\log(x)$. (I'm just getting that from some random google results, e.g. here. But it seems intuitively right: each number $k$ has a probability of about $1/k$ of dividing $x$ and a probability of about $1/\log(k)$ of being prime, and the sum of $\frac{1}{k \log(k)}$ up to $x$ is about $\log\log(x)$.)
• Poisson distributions are more likely to be even than odd (and we're adding 1 so more likely to be odd than even). This computation gives you a bias of order $e^{-2\lambda}$. Putting in $\lambda = \log\log(x)$ we get a bias of order $1/\log(x)^2$.
• That's still way bigger than the real bias, which is on the order of $1/\sqrt{x}$.

So tentatively I'm pretty surprised by how close this is to 50-50. I do feel like I'm probably overlooking something or messed up that argument. If not then this is a bit of a different situation from the other surprises, since there's a heuristic argument suggesting a big bias that we don't actually see. It would be good to understand what's up here.

(From my perspective this is probably the more interesting kind of failure, since it suggests that this is a case where the heuristic argument really "was wrong" instead of merely overlooking something that could be pointed out by a more sophisticated argument.)
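For reference, the parity bias of a Poisson variable used above is the identity $E[(-1)^X] = e^{-2\lambda}$; a quick numerical confirmation (parameters mine):

```python
import math

lam = math.log(math.log(10 ** 9))  # λ = log log x, here for x = 10^9
# P(X even) - P(X odd) = E[(-1)^X], which for Poisson(λ) equals e^{-2λ}
diff = sum((-1) ** k * math.exp(-lam) * lam ** k / math.factorial(k)
           for k in range(100))
print(diff, math.exp(-2 * lam))  # the two agree
```

Note that $e^{-2\log\log(x)} = 1/\log(x)^2$, which is where the bias estimate in the bullet points comes from.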

The Poisson approximation is not good: it's actually a known theorem (Erdős–Kac) that the number of prime factors of $n$, for large $n$, behaves like it's distributed normally with mean $\log\log(n)$ and standard deviation $\sqrt{\log\log(n)}$. Since the whole even/odd distinction relies crucially on the Poisson approximation, it also fails here. (This part is incorrect, the distinction between Poisson and normal doesn't matter in this case because both approximations are too low resolution to be useful. See my next comment in the thread.)

It's easy to give a heuristic justification for this: the basic idea is that if you look at an interval $[N, 2N]$ for $N$ large, then the indicator functions $X_p$, which take the value $1$ if $p$ divides the input and $0$ otherwise, are all "independent enough" for, say, $p \leq \log(N)$.

Now, you want to know the distribution of

$$\sum_{p} X_p.$$

Primes between $\sqrt{N}$ and $N$ don't contribute much to this sum, because the sum of the reciprocals of the primes in that range scales like $\log\log(N) - \log\log(\sqrt{N}) = \log(2)$, so you can pretend they don't exist. When you do that, the remaining indicator functions are all uncorrelated enough with each other thanks to the Chinese remainder theorem that you can apply some version of the central limit theorem to conclude this random variable defined on $[N, 2N]$ basically behaves as if it's normally distributed. It's easy to check both its mean and variance are roughly $\log\log(N)$.

That's the same heuristic argument I was imagining for why it would be Poisson (except I wasn't being careful and thinking about CRT, or explicitly cutting out the large primes, just naively assuming independence by default). When mean and variance are equal, isn't a Poisson distribution extremely close to a normal distribution? Or put differently: if you are adding up a lot of coins whose probability of heads is close to 0, I'd expect the Poisson approximation to be as good as the normal approximation. There are higher-order differences between the two distributions, but is it specifically known to be more normal than Poisson?

I guess the big problem is that you need an extremely accurate approximation to get at the parity, and neither of these approximations is very accurate? E.g. adding 1 is a pretty small impact on the distribution, but a huge impact on the parity.

Another pass at the heuristic argument:

• The number of copies of a prime $p$ that divides $x$ is geometric. The probability that the count is odd is $\frac{1}{p+1}$. I'd guess these events are all pretty independent for $p \leq \log(x)$ based on what you said.
• If we have a bunch of coins with probability $p_i$ of being odd, the probability of the sum being even is $\frac{1}{2}\left(1 + \prod_i (1 - 2p_i)\right)$.
• That's a bias of order $1/\log(x)^2$ again.
• So it seems like the count of small prime divisors should have a very significant bias towards being even.
• So I guess all the action is in the large prime divisors, and probably from the anticorrelation with the count of small prime divisors (seems like the magic is that 0 isn't a valid answer). I guess we kind of knew that anyway, since it's going to flip the sign of the bias. In expectation there are O(1) large prime divisors, but that's still enough to totally dominate the calculus.

This doesn't feel like a promising line of attack. Would be interesting to check the claim that the count of small prime divisors with multiplicities is quite even-skewed.
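Here is such a check (cutoffs mine): for each $x$ up to $10^6$, count the prime factors below $20$ with multiplicity, and compare the even fraction with the independence prediction $\frac{1}{2}\big(1 + \prod_p \frac{p-1}{p+1}\big)$:

```python
import math

small_primes = [2, 3, 5, 7, 11, 13, 17, 19]
N = 10 ** 6

def small_factor_count(n):
    """Number of prime factors of n below 20, counted with multiplicity."""
    c = 0
    for p in small_primes:
        while n % p == 0:
            n //= p
            c += 1
    return c

even = sum(1 for n in range(1, N + 1) if small_factor_count(n) % 2 == 0)
pred = (1 + math.prod((p - 1) / (p + 1) for p in small_primes)) / 2
print(even / N, pred)  # both noticeably above 1/2
```

The empirical fraction matches the independence prediction closely, so the count of small prime divisors is indeed skewed towards even.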

I agree with all of that, and I think if you simulated it you'd indeed find a large bias for the number of small prime divisors to be even. The problem is that $1/\log(x)^2$ is such a small bias in expectation that it will already be offset by just the primes in the interval, which contribute on the order of $1/\log(x)$ to the average in the opposite direction, since all primes trivially have an odd number of prime divisors.

Furthermore, there are subtleties here about exactly how much independence you need. If you want everything to be jointly independent then you can really only work with the primes up to $\log(N)$ while being safe - this is because the product of the primes up to $\log(N)$ is roughly of order $e^{\log(N)} = N$. Once you go past that, while correlations involving only a small number of primes are still fine, correlations involving lots of primes break down, and for parity problems you need to control the entire joint distribution in a fine way.

This is not a problem for the normal approximation because to show convergence in distribution to a normal distribution, you just need to show all the moments converge to the right values and use a result like Stone-Weierstrass to approximate any continuous test function uniformly by a polynomial. You can do this just by working with primes up to some cutoff depending on the exact moment you're studying. However, this result is really "low resolution", as you correctly identify.

A couple of examples from quadratic residue patterns modulo primes:

Example 1. Let $p$ be a large prime. How many elements $x$ modulo $p$ are there such that both $x$ and $x+1$ are quadratic residues?

Since half of the elements mod $p$ are quadratic residues and the events "$x$ is a QR" and "$x+1$ is a QR" look like they are independent, a reasonable guess is $p/4$. This is the correct main term, but what about the error? A natural square-root error term is not right: one can show that the error is $O(1)$, its exact value depending only on whether $p$ is $1$ or $3$ mod $4$. (The proof is by elementary manipulations with the Legendre symbol, see here.) So there's hidden structure that makes the error smaller than what a naive randomness heuristic suggests.

Example 2. Let $p$ be a large prime. How many elements $x$ are there such that all of $x$, $x+1$ and $x+2$ are quadratic residues?

Again, the obvious guess for the main term is correct (there are roughly $p/8$ such $x$), so consider the error term. The error is not $O(1)$ this time! The next guess is a square-root error term. Perhaps as $p$ ranges over the primes, the error term is (after suitable normalization) normally distributed (as is motivated by the central limit theorem)? This is not correct either!

The error is in fact bounded in absolute value by $2\sqrt{p}$, following from a bound on the number of points on elliptic curves modulo $p$. And for the distribution of the error, if one normalizes the error by dividing by $2\sqrt{p}$ (so that the resulting value is in $[-1, 1]$), the distribution behaves like $\cos(\theta)$, where $\theta$ is uniformly distributed on $[0, \pi]$ as $p$ ranges over the primes (see here). This is a deep result, which is not easy to motivate in a couple of sentences, but again there's hidden structure that the naive randomness heuristic does not account for.

(And no, one does not get normal distribution for longer streaks of quadratic residues either.)
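Both examples are cheap to verify by brute force; the check below (primes and bounds mine) confirms the $O(1)$ error in Example 1 and the square-root-scale error in Example 2:

```python
def quadratic_residues(p):
    """Nonzero quadratic residues modulo a prime p."""
    return {(x * x) % p for x in range(1, p)}

for p in [101, 103, 997, 1009]:
    qr = quadratic_residues(p)
    pairs = sum(1 for x in range(1, p - 1)
                if x in qr and x + 1 in qr)
    assert abs(pairs - p / 4) < 2                 # Example 1: O(1) error
    triples = sum(1 for x in range(1, p - 2)
                  if x in qr and x + 1 in qr and x + 2 in qr)
    assert abs(triples - p / 8) < p ** 0.5 + 2    # Example 2: sqrt-scale error
print("all checks pass")
```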

Readers interested in this topic should probably read Terence Tao's excellent discussions of probabilistic heuristics in number theory, e.g. this post discussing Fermat's last theorem, the ABC conjecture, and twin primes or this post on biases in prime number gaps. Those posts really helped improve my understanding of how such heuristic arguments work, and there's some cool surprises.

Isn't the Stuart conjecture an extremely weak form of the Lander, Parkin and Selfridge conjecture? If you specialize their conjecture to a single $n$-th power on one side, then it implies your conjecture but with $n-1$ instead of $n-2$. (I'm not sure why you picked $n-2$.)

I think the fundamental thing about Fermat's last theorem for n=3 is that you can divide by any common divisor of x and y. Once you consider a minimal solution with (x, y, z) relatively prime, and consider the factorization $x^3 + y^3 = (x + y)(x^2 - xy + y^2)$, the n=3 case is also very straightforward to argue heuristically. [ETA: this is not true, it actually seems to fundamentally be a subtle claim related to the elliptic curve structure.]

> I'm not sure why you picked $n-2$.

Because it's the first case I thought of where the probability numbers work out, and I just needed one example to round off the post :-)

I don't think it's a fair deduction to conclude that Goldbach's conjecture is "probably true" based on an estimate of the measure (or probability) of the set of possible counterexamples being small. The conjecture is either true or false, but more to the point I think you are using the words probability and probable in two different ways (the measure-theoretic sense, and in the sense of uncertainty about the truth value of a statement), which obfuscates (at least to me) what exactly the conclusion of your argument is.

There is of course a case to be made about whether it matters if Goldbach's conjecture should be considered as true if the first counterexample is larger than any number that could possibly and reasonably manifest in physical reality. Maybe this was what you are getting at, and I don't really have a strong or well thought out opinion either way on this.

Lastly, I wonder whether there are examples of heuristic calculations which make the wrong prediction about the truth value of the conjecture to which they pertain. I'm spitballing here, but it would be interesting to see what the heuristics for Fermat's theorem say about Euler's sum of powers conjecture (of which Fermat's last theorem is the K = 2 case), since we know that the conjecture is false for K = 4. More specifically, how can we tell a good heuristic from a bad one? I'm not sure, and I also don't mean to imply that heuristics are useless, more that maybe they are useful because they (i) give one an idea of whether to try to prove something or look for a counterexample, and (ii) give a rough idea of why something should be true or false, and what direction a proof should go in (e.g. for Goldbach's conjecture, it seems like one needs to have precise statements about how the primes behave like random numbers).

Note that the probabilistic argument fails for n=3 for Fermat's last theorem; call this (3,2) (power=3, number of summands is 2).

So we know (3,2) is impossible; Euler's conjecture is the equivalent of saying that (n+1,n) is also impossible for all n. However, the probabilistic argument fails for (n+1,n) the same way as it fails for (3,2). So we'd expect Euler's conjecture to fail, on probabilistic grounds.

In fact, the surprising thing on probabilistic grounds is that Fermat's last theorem is true for n=3.

> There is of course a case to be made about whether it matters if Goldbach's conjecture should be considered as true if the first counterexample is larger than any number that could possibly and reasonably manifest in physical reality. Maybe this was what you are getting at, and I don't really have a strong or well thought out opinion either way on this.

That would be sufficient, but there are more easily met conditions like:

1)

if you have a bunch of sequences of observations (100 or 1000 years' worth)

and none of them include the counterexample as an observation with high probability

2)

The frequency is low enough that it isn't worth accounting for.

For example, if you flip a coin it comes up heads or tails - is it worth bringing up another possibility?

> So Andrew Wiles's genius was in showing there were no unexpected obstructions for the "likely" outcome to be true. That's why the proof is so hard: he was trying to prove something very "likely", and show an absence of structure, rather than a presence, without knowing what that structure could be.

This is a poor description of Wiles's proof; in fact, I would call it diametrically wrong. Wiles proved the presence of a very rigid structure - not the absence - and the presence of this structure implied FLT via the work of other mathematicians.

I don't have a great understanding of the broader point you are making, so I don't know how big an issue this mistake presents. However, be aware that the paradigm you've extracted from the ideas in this post has led to at least one incorrect prediction.

I'll try to explain how Wiles's proof diverges from your model of it by way of analogy. Suppose that we instead wanted to prove Fermat's first theorem:

Fermat's first theorem: For every even integer $n \geq 2$, there are no nontrivial integer solutions to the equation $x^n + y^n = -z^n$.

Further suppose that in our world, mathematicians know about the notion of positivity and absolute values, but the proof of the following fact has long evaded them.

Positivity conjecture: For every integer $x$, we have $x^2 \geq 0$.

The positivity conjecture is a very important structural fact about the integers. And it immediately implies Fermat's first theorem (since the left-hand side must be non-negative and the right-hand side non-positive, both sides must be zero, which forces x, y, z to all be 0). So Fermat's first theorem follows from an important structural fact.

However, in our supposed world, mathematicians don't have access to the positivity conjecture. They might perform the exact same analysis in your post (it goes through verbatim!), and conclude that if you check Fermat's first theorem for enough n, then it is probable to be true. However, it is not true that the proof of FFT via the positivity conjecture is "proving an absence of structure" - quite the opposite!

The analogue of the positivity conjecture in the real world is the Modularity theorem. This is what Wiles actually proved, and it was already known that the Modularity theorem implies FLT. And as with the positivity conjecture, the Modularity theorem is a very powerful structural result. To give a slogan, it says that every elliptic curve over $\mathbb{Q}$ is "modular," meaning that it relates in an appropriate way to an object called a modular form.

> Wiles proved the presence of a very rigid structure - not the absence - and the presence of this structure implied FLT via the work of other mathematicians.

If you say that "Wiles proved the Taniyama–Shimura conjecture" (for semistable elliptic curves), then I agree: he's proved a very important structural result in mathematics.

If you say he proved Fermat's last theorem, then I'd say he's proved an important-but-probable lack of structure in mathematics.

So yeah, he proved the existence of structure in one area, and (hence) the absence of structure in another area.

And "to prove Fermat's last theorem, you have to go via proving the Taniyama–Shimura conjecture", is, to my mind, strong evidence for "proving lack of structure is hard".

I'm not sure what's going on with the small numbers though. When I use the formula to try and get the probability of Goldbach's conjecture holding for all numbers $> 2$, I get around $10^{-15}$.

More sophisticated formulae that take into account the even/odd pattern still get around $10^{-6}$. Of course, this is sensitively dependent on exactly where you start. But it seems that this probabilistic reasoning, showing so strongly that a small even number shouldn't be a sum of primes, indicates a problem with the probabilistic model.

Yes, I got that result too. The problem is that the prime number theorem isn't a very good approximation for small numbers. So we'd need a slightly more sophisticated model that has more low numbers.

I suspect that moving from "sampling with replacement" to "sampling without replacement" might be enough for low numbers, though.

For a more precise probabilistic approach to Fermat's last theorem by Feynman, which takes into account the distribution of the $n$th powers, see this amazing article http://www.lbatalha.com/blog/feynman-on-fermats-last-theorem

Thanks! It's cool to see his approach.

Since $a + b = b + a$, shouldn't the total number of "different sums" be half of what you give? Fortunately, the rest of the argument works completely analogously.

You can see this as sampling $N^2/\log(N)^2$ times sorta-independently, or as sampling $N^2/(2\log(N)^2)$ times with less independence (ie most sums are sampled twice).

Either view works, and as you said, it doesn't change the outcome.