Context: Logical Induction is a framework that makes sense of intuitively plausible statements like "the probability that the th digit of is odd is about ".

People often do this sort of informal reasoning about mathematical conjectures. Like "The Collatz conjecture has been checked up to , and held for all those - updating on this, I increase my likelyhood that the conjecture is true in general". Logical induction seems to provide, in principle, a set of rules that such updates should follow. How many of these rules are known?

Some example rules that seem very plausible (here all my variables are implicitly natural numbers):

  • The observation that is true does not decrease the likelyhood of .
  • Updating on the observations " for all ", the probability of goes to as

Do these hold for logical inductors?

New to LessWrong?

New Answer
New Comment

1 Answers sorted by

Donald Hobson

Aug 09, 2020

130
Updating on the observations "ϕ(n) for all 0≤n≤N", the probability of ∀nϕ(n) goes to 1 as N→∞

Logical induction doesn't have this property. No alternatives can either.

To make sense of this answer, I recommend reading

https://www.lesswrong.com/posts/3FoMuCLqZggTxoC3S/logical-pinpointing

and

https://www.lesswrong.com/posts/i7oNcHR3ZSnEAM29X/standard-and-nonstandard-numbers

In short, logical induction can't be sure if its operating in a nonstandard context or not. So suppose that is true for all standard numbers, but false for some nonstandard number. Logical Induction has no way to know whether it is operating in the standard or non-standard numbers, so assigns a probability strictly between 0 and 1 to it.

Suppose you had an alternate algorithm that didn't do that. A Turing machine can be formally specified in PA, so we can talk about the output of the algorithm. The concept of limits can be discussed in PA using of foundational calculus. (The and actually need to be Natural numbers, but you can encode rational numbers in them with prime factors)

So for every formula in PA there exists another formula that says that your algorithm assigns probability 1 to in the limit. (And is computable from )

Let be a PA formula with one free variable. takes in a number . If does not encode a PA formula with one free variable, then is false. If encodes , then . Now consider an integer encoding . And consider .

This is a straightforward modification of godels incompleteness theorem. If you have any procedure that can perfectly distinguish between true and false statements within the structure of PA, then you can make a "this statement is false" and get a contradiction.

Technical Note. I am also assuming that your proposed alternative to logical induction respects negation and implication in the limit. (Ie if and .)

Some expressions in PA have multiple qualifiers Suppose we know that for every particular . Then by finite combinations of probabilities, and so . As , then it follows that any attempt at logical induction with this property must fail.

EDIT:

I have realized that I was confusing two separate concepts. You can't have any function from formulas to booleans that has the property . and the trivial preservation of logical operations. (definable within a formal first order theory with the same symbols)

However, if you have a process that returns True, False or Maybe, and is never wrong, then you can extend it to another process like this. Take the first function. If is True or False, define otherwise, if and then =True. Otherwise Maybe.

You can't always combine infinite sequences of your own beliefs, but you can combine infinite sequences of beliefs from some weaker system, and each time you do that you move further up the arithmetic hierarchy.

https://en.wikipedia.org/wiki/Arithmetical_hierarchy

As totally computable functions are in , the limit of a sequence of boolean functions is in . So as taking a forall over a function makes it a the highest you can get while still having

Updating on the observations "ϕ(n) for all 0≤n≤N", the probability of ∀nϕ(n) goes to 1 as N→∞

Is . This means that you can have your asymtotic property compared to any function.

Limits in the rationals add a to the front, puting them in . Meaning that your asymtotic property holds for all sets.

What this means is that if you have a computable function which tends to 1 as if and only if then you can construct which tends to 1 if and only if .

The downside of this approach is that it assigns probability almost 1 to false statements.

The boolean function case is to take a function that searches for an explicit counterexample, and return a probability that tends to one if no counter example has been found yet.

Sorry if these are stupid questions...

logical induction can't be sure if its operating in a nonstandard context or not.

The question specified "all my variables are implicitly natural numbers". Why can't there be traders that specialize on questions specifically about standard numbers and ignore others? (I assume that the natural numbers are standard numbers, correct?) Also, what's the connection between nonstandard numbers and your Godel-like proof?

If you have any procedure that can perfectly distinguish between true and false statements within the structu

... (read more)
4Donald Hobson4y
Your self referential maths is nearly right. Technically the limit is defined as ∃N∀n>N:ϕ(n) as your ∀n:ϕ(n) could fail because ϕ(1) is false, and logical induction only behaves well in the limit. The paper says that Pn=" the logical inductor will assign probability <0.5 to Pn after n inference steps" tend to 0.5 as n→∞. But I havn't yet worked out the infinite case. You can't do that because non-standard numbers look really similar to standard numbers from the inside. There is no formula ϕ(x) that is true on all standard numbers, and false on nonstandard numbers. Suppose you had a logical induction procedure that had the property that after updating on ϕ(1),ϕ(2)... it believed ∀x:ϕ(x). So suppose you had p1,p2...∈[0,1] with each pn depending only on ϕ(1)..ϕ(n). and pn→1 if ∀x:ϕ(x) else 0 as n→∞. (Produced by your alternative to logical induction, these represent the probability assigned to ∀x:ϕ(x)) Suppose you prove all that in PA. As PA has nonstandard models, the proof also holds in those. But we can pick a ϕ such that ∀x:ϕ(x) is true in the standard model, but false for some nonstandard x. There must exist nonstandard n such that 1−pn is infinitesimal. Otherwise the formula ∃m:1−pm<1/k would distinguish standard k from nonstandard k. Assigning infinitesimal probability to something that actually happend is a form of bad behaviour that I think can be transported back to the standard domain. I think that the resulting behaviour is consistent, but results in poor behaviour. Will post more as I think of it.

Thank you very much!

I guess an argument of this type rules out a lot of reasonable-seeming inference rules - if a computable process can infer "too much" about universal statements from finite bits of evidence, you do this sort of Gödel argument and derive a contradiction. This makes a lot of sense, now that I think about it.

3 comments, sorted by Click to highlight new comments since: Today at 11:26 PM

I think short timescale behavior of logical induction is model-dependent. I'm not sure whether your first conjecture is true, and I'd guess that it's false in some models. 

I find myself a little confused. Isn't it the case that the probability of statement converges to 1 if and only if it is provable?

Is there a reason to think this would be different than any other kind of induction or Bayesian reasoning? We use probabilities to describe things for which there is a true answer that we happen not to know all the time. Probability is often (arguably always) subjective in that way. For example, what is the probability that you, Eigil Rischel, have any siblings? The answer, in an objective sense, is either 0 or 1. The answer, from your subjective perspective, is either very close to 0 or very close to 1. But from my perspective not knowing anything about you, I'm going to put it at 0.7. If I wanted a better estimate, I could actually look up what fraction of people have siblings and use that. If I wanted an even better estimate, I could ask you. But right now, from my perspective, the probability of you having siblings is 0.7. This seems straight forward for physical truths, I don't see any real difference for mathematical truths. You should be able to use all the standard rules of probability theory, Bayes theorem, etc.


I'm unsure that your second bullet point follows. For that limit to work, I should be able to pick a (finite) N such that if psi(n) for all 0<=n<=N, then the probability of "for all n psi(n)" is greater than or equal to .9. I don't know how to find such an N. How do I know that the limit isn't 0.8? Intuitively I feel like just checking more and more values of n should not get us arbitrarily close to certainty, but I don't know how to justify that intuition rigorously. Infinities are weird. Possibly infinities give us different rules for certain mathematical truths, I don't know. I would be curious to hear other people's thoughts.

Eigel is asking a specific (purely mathematical!) question about "logical induction", which is defined in the paper they linked to. Your comment seems to miss the question.