Logic as Probability

This raises an issue: what about betting on the last digit of the 3^^^3'th prime?

This is a go-to example of using subjective probability under resource constraints. There are plenty of more familiar examples, such as having to answer a multiple-choice question on a test in 30 seconds and having to estimate the probability of each answer in order to pick the likeliest. Everyone has done it, and used basically the same tools as for the so-called "true" probabilities.
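As a rough illustration of betting under resource constraints: nobody can compute the 3^^^3'th prime, but a cheap empirical check on small primes already suggests how to spread a bet over its last digit. The sketch below is only an assumption-laden stand-in (the helper names and the cutoff of 100,000 are arbitrary choices, not anything from the post); it counts the last digits of primes above 5, which can only be 1, 3, 7, or 9.

```python
from collections import Counter


def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes <= n (n must be >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Cross out every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def last_digit_frequencies(limit):
    """Relative frequency of each last digit among primes in (5, limit]."""
    digits = Counter(p % 10 for p in primes_up_to(limit) if p > 5)
    total = sum(digits.values())
    return {d: count / total for d, count in sorted(digits.items())}


freqs = last_digit_frequencies(100_000)
for digit, freq in freqs.items():
    print(digit, round(freq, 3))
```

Each of the four digits comes out close to one quarter, so a bettor with no time to do better would plausibly assign about 0.25 to each.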

[-]somervta12y20

the law of the excluded middle (P(A|not-A)=0)

Isn't this the LNC?

[-]philh12y50

(I had to look it up: Law of Non-Contradiction.)

[-]tristanhaze12y00

Yeah, this looks more like the Law of Non-Contradiction than the Law of Excluded Middle to me (which makes Manfred's jokey response seem doubly foolish).

[+]Manfred12y-70

[-]cousin_it12y20

it turns out that there is one true probability distribution over mathematical statements, given the axioms

I guess you meant to say "mathematical statements without quantifiers"?

[-]Manfred12y10

If you want quantifiers, you can just program your robot to respond to the symbol "for all" so that when it sees "for all x, x=y" it writes all the implications in the notebook, and when x=y for all x, it writes "for all x, x=y". This is an infinite amount of writing to do, but there was always an infinite amount of writing to do - the robot is infinitely fast, and anyway is just a metaphor for the rules of our language.

[-]cousin_it12y00

Sorry, I should've said "statements that are provable or disprovable from the axioms", mentioning quantifiers was kinda irrelevant. Are you saying that your robot will eventually write out truth values for statements that are independent of the axioms as well? (Like the continuum hypothesis in ZFC.)

[-]Manfred12y00

I feel like the robot metaphor may be outside of its domain of validity by now. Anyhow, I replied over in the other branch.

[-][anonymous]12y00

So if you give your robot the axioms of ZFC, it will eventually tell you if the continuum hypothesis is true or false?

[This comment is no longer endorsed by its author]

[-][anonymous]12y00

Are you assuming that x can only range over the natural numbers? If x can range over reals or sets, or some arbitrary kind of objects described by the axioms, then it's harder to describe what the robot should do. The first problem is that an individual x can have no finite description. The second, more serious problem is that translating statements with quantifiers into statements of infinite length would require the robot to use some "true" model of the axioms, but often there are infinitely many models by Lowenheim-Skolem and no obvious way of picking out a true one.

Also, my original comment was slightly misleading - the "one true distribution" would in fact cover many statements with quantifiers, and miss many statements without quantifiers. The correct distinction is between statements that are provable or disprovable from the axioms, and statements that are independent of the axioms. If the axioms are talking about natural numbers, then all statements without quantifiers should be covered by the "one true distribution", but in general that doesn't have to be true.

[This comment is no longer endorsed by its author]

[-]Manfred12y00

Well, it's certainly a good point that there are lots of mathematical issues I'm ignoring. But for the topics in this sequence, I am interested not in those issues themselves, but in how they are different between classical logic and probabilistic logic.

This isn't trivial, since statements that are classically undetermined by the axioms can still have arbitrary probabilities (Hm, should that be its own post, do you think? I'll have to mention it in passing when discussing the correspondence between inconsistency and limited information). But in this post, the question is whether there is no difference for statements that are provable or disprovable from the axioms. I'm claiming there's no difference. Do you think that's right?

[-]cousin_it12y10

Yeah, I agree with the point that classical logic would instantly settle all digits of pi, so it can't be the basis of a theory that would let us bet on digits of pi. But that's probably not the only reason why we want a theory of logical uncertainty. The value of a digit of pi is always provable (because it's a quantifier-free statement), but our math intuition also allows us to bet on things like Con(PA), which is independent, or P!=NP, for which we don't know if it's independent. You may or may not want a theory of logical uncertainty that can cover all three cases uniformly.

[-]Kurros12y00

But it turns out that there is one true probability distribution over mathematical statements, given the axioms. The right distribution is obtained by straightforward application of the product rule - never mind that it takes 4^^^3 steps - and if you deviate from the right distribution that means you violate the product rule at some point.

This does not seem right to me. I feel like you are sneakily trying to condition all of the robot's probabilities on mathematical proofs that it does not have a priori. E.g. consider A, A->B, therefore B. To learn that P(A->B)=1, the robot has to do a big calculation to obtain the proof. After this, it can conclude that P(B|A,A->B)=1. But before it has the proof, it should still have some P(B|A)!=1.

Sure, it seems tempting to call the probabilities you would have after obtaining all the proofs of everything the "true" probabilities, but to me it doesn't actually seem different to the claim that "after I roll my dice an infinity of times, I will know the 'true' probability of rolling a 1". I should still have some beliefs about a one being rolled before I have observed vast numbers of rolls.

In other words I suggest that proof of mathematical relationships should be treated exactly the same as any other data/evidence.

edit: in fact surely one has to consider this so that the robot can incorporate the cost of computing the proof into its loss function, in order to decide if it should bother doing it or not. Knowing the answer for certain may still not be worth the time it takes (not to mention that even after computing the proof the robot may still not have total confidence in it; if it is a really long proof, the probability that cosmic rays have caused lots of bit-flips to mess up the logic may become significant). If the robot knows it cannot ever get the answer with sufficient confidence within the given time constraints, it must choose an action which accounts for this. And the logic it uses should be just the same as how it knows when to stop rolling dice.
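The trade-off described in this edit can be sketched as a toy value-of-information calculation. Everything here is a hypothetical illustration: the function names, the stake, and the proof costs are made up, and the sketch assumes a completed proof is fully trustworthy (so it ignores the bit-flip worry above).

```python
from fractions import Fraction


def expected_payoff_guessing(p_true, stake):
    """Bet on whichever side currently looks likelier; win `stake` if right."""
    return max(p_true, 1 - p_true) * stake


def expected_payoff_proving(stake, proof_cost):
    """Pay for the proof first, then bet with certainty (assumed error-free)."""
    return stake - proof_cost


def should_compute_proof(p_true, stake, proof_cost):
    """True when paying for the proof beats guessing from the current belief."""
    return expected_payoff_proving(stake, proof_cost) > expected_payoff_guessing(p_true, stake)


# With a 50/50 belief and a stake of 10, guessing is worth 5 in expectation.
cheap = should_compute_proof(Fraction(1, 2), stake=10, proof_cost=3)   # 7 vs 5
costly = should_compute_proof(Fraction(1, 2), stake=10, proof_cost=6)  # 4 vs 5
print(cheap, costly)
```

Under these made-up numbers the robot should buy the cheap proof and skip the costly one, acting on its 0.5 belief instead.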

edit2: I realised I was a little sloppy above; let me make it clearer here:

The robot knows P(B|A,A->B)=1 a priori. But it does not know "A->B" is true a priori. It therefore calculates

P(B|A) = P(B|A,A->B) P(A->B|A) + P(B|A,not A->B) P(not A->B|A) = P(A->B|A)

After it obtains proof that "A->B", call this p, we have P(A->B|A,p) = 1, so

P(B|A,p) = P(B|A,A->B,p) P(A->B|A,p) + P(B|A,not A->B,p) P(not A->B|A,p)

collapses to

P(B|A,p) = P(B|A,A->B,p) = P(B|A,A->B) = 1

But I don't think it is reasonable to skip straight to this final statement, unless the cost of obtaining p is negligible.
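A minimal numeric sketch of this calculation, assuming the robot starts with P(A->B|A) = 1/2 (that prior and the function name are illustrative assumptions). It uses P(B|A,A->B) = 1 and P(B|A,not A->B) = 0, the latter because "not (A->B)" means "A and not-B".

```python
from fractions import Fraction


def p_b_given_a(p_impl_given_a):
    """Law of total probability over whether A->B holds.

    P(B|A) = P(B|A, A->B) P(A->B|A) + P(B|A, not A->B) P(not A->B|A)
    """
    p_b_if_impl = Fraction(1)      # B is certain given A and A->B
    p_b_if_not_impl = Fraction(0)  # not (A->B) means A and not-B
    return (p_b_if_impl * p_impl_given_a
            + p_b_if_not_impl * (1 - p_impl_given_a))


before_proof = p_b_given_a(Fraction(1, 2))  # robot unsure whether A->B
after_proof = p_b_given_a(Fraction(1))      # proof p in hand: P(A->B|A,p) = 1
print(before_proof, after_proof)
```

In this sketch P(B|A) simply tracks P(A->B|A): it equals 1/2 before the proof and 1 after it.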

edit3: If this somehow violates Savage or Cox's theorems I'd like to know why :).

[-]Manfred12y10

If this somehow violates Savage or Cox's theorems I'd like to know why

Well, Cox's theorem has as a requirement that when your axioms are completely certain, you assign probability 1 to all classical consequences of those axioms. Assigning probability 0.5 to any of those consequences thus violates Cox's theorem. But this is kind of unsatisfying, so: where do we violate the product rule?

Suppose our robot knows that P(wet outside | raining) = 1. And it observes that it's raining, so P(rain)=1. But it's having trouble figuring out whether it's wet outside within its time limit, so it just gives up and says P(wet outside)=0.5. Has it violated the product rule? Yes. P(wet outside) >= P(wet outside and raining) = P(wet outside | rain) * P(rain) = 1.

If we accept that the axioms have probability 1, we can deduce the consequences with certainty using the product rule. If at any point we stop deducing the consequences with certainty, this means we have stopped using the product rule.
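The rain example can be checked mechanically. The following is only a sketch with made-up helper names; it encodes the bound P(B) >= P(A and B) = P(B|A) * P(A) and tests the robot's answer of 0.5 against it.

```python
from fractions import Fraction


def product_rule_lower_bound(p_b_given_a, p_a):
    """P(B) >= P(A and B) = P(B|A) * P(A)."""
    return p_b_given_a * p_a


def violates_product_rule(p_b, p_b_given_a, p_a):
    """True when the stated P(B) falls below the bound the product rule forces."""
    return p_b < product_rule_lower_bound(p_b_given_a, p_a)


p_rain = Fraction(1)            # the robot observes rain
p_wet_given_rain = Fraction(1)  # the robot knows P(wet outside | raining) = 1
robot_answer = Fraction(1, 2)   # the robot gives up and says 0.5

bound = product_rule_lower_bound(p_wet_given_rain, p_rain)
print(bound)
print(violates_product_rule(robot_answer, p_wet_given_rain, p_rain))
```

With these inputs the bound is 1, so any answer below 1 for P(wet outside) is flagged as inconsistent.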

[-]Kurros12y-10

Hmm this does not feel the same as what I am suggesting.

Let me map my scenario onto yours:

A = "raining"

B = "wet outside"

A->B = "It will be wet outside if it is raining"

The robot does not know P("wet outside" | "raining") = 1. It only knows P("wet outside" | "raining", "raining->wet outside") = 1. It observes that it is raining, so we'll condition everything on "raining", taking it as true.

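We need some priors. Let P("wet outside") = 0.5. We also need a prior for "raining->wet outside", let that be 0.5 as well. From this it follows that

P("wet outside" | "raining") = P("wet outside" | "raining", "raining->wet outside") P("raining->wet outside" | "raining") + P("wet outside" | "raining", not "raining->wet outside") P(not "raining->wet outside" | "raining") = P("raining->wet outside" | "raining") = P("raining->wet outside") = 0.5

according to our priors [the first and second equalities are the same as in my first post; the third equality follows since whether or not it is "raining" is not relevant for figuring out if "raining->wet outside"].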

So the product rule is not violated.

P("wet outside") >= P("wet outside" and "raining") = P("wet outside" | "raining") P("raining") = 0.5

Where the inequality is actually an equality because our prior was P("wet outside") = 0.5. Once the proof p that "raining->wet outside" is obtained, we can update this to

P("wet outside" | p) >= P("wet outside" and "raining" | p) = P("wet outside" | "raining", p) P("raining" | p) = 1

But there is still no product rule violation because

In a nutshell: you need three pieces of information to apply this classical chain of reasoning; A, B, and A->B. All three of these propositions should have priors. Then everything seems fine to me. It seems to me you are neglecting the proposition "A->B", or rather assuming its truth value to be known, when we are explicitly saying that the robot does not know this.

edit: I just realised that I was lucky for my first inequality to work out; I assumed I was free to choose any prior for P("wet outside"), but it turns out I am not. My priors for "raining" and "raining->wet outside" determine the corresponding prior for "wet outside", in order to be compatible with the product rule. I just happened to choose the correct one by accident.

[-]Manfred12y00

It seems to me you are neglecting the proposition "A->B"

Do you know what truth tables are? The statement "A->B" can be represented on a truth table. A and B can be possible. not-A and B can be possible. Not-A and not-B can be possible. But A and not-B is impossible.

A->B and the four statements about the truth table are interchangeable. Even though when I talk about the truth table, I never need to use the "->" symbol. They contain the same content because A->B says that A and not-B is impossible, and saying that A and not-B is impossible says that A->B. For example, "it raining but not being wet outside is impossible."

In the language of probability, saying that P(B|A)=1 means that A and not-B is impossible, while leaving the other possibilities able to vary freely. The product rule says P(A and not-B) = P(A) * P(not-B | A). What's P(not-B | A) if P(B | A)=1? It's zero, because it's the negation of our assumption.
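To make the truth-table point concrete, here is a small sketch. The joint distribution over the four (A, B) rows is a hypothetical choice; the only constraint taken from the argument is that the "A and not-B" row gets probability zero.

```python
from fractions import Fraction
from itertools import product


def implies(a, b):
    """Material implication A->B."""
    return (not a) or b


# A->B holds in exactly the rows where "A and not-B" fails.
rows = list(product([True, False], repeat=2))
same_content = all(implies(a, b) == (not (a and not b)) for a, b in rows)

# Hypothetical joint distribution over (A, B); only the zero is forced.
joint = {
    (True, True): Fraction(1, 2),
    (True, False): Fraction(0),  # A and not-B is impossible
    (False, True): Fraction(1, 4),
    (False, False): Fraction(1, 4),
}


def prob(event):
    """Total probability of the rows where event(a, b) is true."""
    return sum(p for world, p in joint.items() if event(*world))


def cond(event, given):
    """Conditional probability P(event | given); requires P(given) > 0."""
    return prob(lambda a, b: event(a, b) and given(a, b)) / prob(given)


p_a = prob(lambda a, b: a)
p_b_given_a = cond(lambda a, b: b, lambda a, b: a)
p_notb_given_a = cond(lambda a, b: not b, lambda a, b: a)
p_a_and_notb = prob(lambda a, b: a and not b)

print(same_content, p_b_given_a, p_notb_given_a)
print(p_a_and_notb == p_a * p_notb_given_a)
```

Here P(B|A) comes out as 1 and P(not-B|A) as 0, and the product rule P(A and not-B) = P(A) * P(not-B|A) holds with both sides equal to zero, while the other three rows remain free to vary.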

Writing out things in classical logic doesn't just mean putting P() around the same symbols. It means making things behave the same way.

[-]tristanhaze12y00

'They contain the same content because A->B says that A and not-B is impossible, and saying that A and not-B is impossible says that A->B. For example, "it raining but not being wet outside is impossible."'

If you're talking about standard propositional logic here, without bringing in probabilistic stuff, then this is just wrong or at best very misleadingly put. All 'A->B' says is that it is not the case that A and not-B - nothing modal.

[-]Kurros12y-10

Ok sure, so you can go through my reasoning leaving out the implication symbol, but retaining the dependence on the proof "p", and it all works out the same. The point is only that the robot doesn't know that A->B, therefore it doesn't set P(B|A)=1 either.

You had "Suppose our robot knows that P(wet outside | raining) = 1. And it observes that it's raining, so P(rain)=1. But it's having trouble figuring out whether it's wet outside within its time limit, so it just gives up and says P(wet outside)=0.5. Has it violated the product rule? Yes. P(wet outside) >= P(wet outside and raining) = P(wet outside | rain) * P(rain) = 1."

But you say it is doing P(wet outside)=0.5 as an approximation. This isn't true though, because it knows that it is raining, so it is setting P(wet outside|rain) = 0.5, which was the crux of my calculation anyway. Therefore when it calculates P(wet outside and raining) = P(wet outside | rain) * P(rain) it gets the answer 0.5, not 1, so it is still being consistent.

[-]Manfred12y00

I'm just going to give up and hope you figure it out on your own.

[-]Kurros12y-20

You haven't been very specific about what you think I'm doing incorrectly so it is kind of hard to figure out what you are objecting to. I corrected your example to what I think it should be so that it satisfies the product rule; where's the problem? How do you propose that the robot can possibly set P("wet outside"|"rain")=1 when it can't do the calculation?

[-]Manfred12y00

In your example, it can't. Because the axioms you picked do not determine the answer. Because you are incorrectly translating classical logic into probabilistic logic. And then, as one would expect, your translation of classical logic doesn't reproduce classical logic.

[-]Kurros12y-10

It was your example, not mine. But you made the contradictory postulate that P("wet outside"|"rain")=1 follows from the robot's prior knowledge and the probability axioms, and simultaneously that the robot was unable to compute this. To correct this I alter the robot's probabilities such that P("wet outside"|"rain")=0.5 until such time as it has obtained a proof that "rain" correlates 100% with "wet outside". Of course the axioms don't determine this; it is part of the robot's prior, which is not determined by any axioms.

You haven't convinced nor shown me that this violates Cox's theorem. I admit I have not tried to follow the proof of this theorem myself, but my understanding was that the requirement you speak of is that the probabilistic logic reproduces classical logic in the limit of certainty. Here, the robot is not in the limit of certainty because it cannot compute the required proof. So we should not expect to get the classical logic until updating on the proof and achieving said certainty.

[-]VAuroch12y00

It was your example, not mine.

No, you butchered it into a different example. Introduced the Lewis Carroll Paradox, even.

You haven't convinced nor shown me that this violates Cox's theorem.

He showed you. You weren't paying attention.

Here, the robot is not in the limit of certainty because it cannot compute the required proof.

It can compute the proof. The laws of inference are axioms; P(A|B) is necessarily known a priori.

such that P("wet outside"|"rain")=0.5 until such time as it has obtained a proof that "rain" correlates 100% with "wet outside".

There is no such time. Either it's true initially, or it will never be established with certainty. If it's true initially, that's because it is an axiom. Which was the whole point.

[-]Jiro12y20

The laws of inference are axioms; P(A|B) is necessarily known a priori.

It does not follow that because someone knows some statements they also know the logical consequences of those statements.

[-]VAuroch12y00

When the someone is an idealized system of logic, it does. And we're discussing an idealized system of logic here. So it does.

[-]Kurros12y00

No we aren't, we're discussing a robot with finite resources. I obviously agree that an omnipotent god of logic can skip these problems.

[-]VAuroch12y00

The limitations imposed by bounded resources are the next entry in the sequence. For this, we're still discussing the unbounded case.

[-]Kurros12y00

Very well, then I will wait for the next entry. But I thought the fact that we were explicitly discussing things the robot could not compute made it clear that resources were limited. There is clearly no such thing as logical uncertainty to the magic logic god of the idealised case.

[-][anonymous]12y00

[This comment is no longer endorsed by its author]

[-]William_Quixote12y00

Liked this post. One suggestion to improve readability would be for the first mention of a concept in this post (e.g. Savage's theorem) to hyperlink to the previous post that described it, or to a wiki article with details.

[-]Manfred12y20

Thanks! I'll do that now.

[-]Gurkenglas12y00

Never mind. Would be nice if you could actually delete comments in the first minute after posting them.

[This comment is no longer endorsed by its author]
