Logical Inductors that trust their limits

by Scott Garrabrant, 20th Sep 2016

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Here is another open question related to Logical Inductors. I have not thought about it very long, so it might be easy.

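Does there exist a logical inductor $\mathbb{P}$ over PA such that for all $\phi$:

  1. PA proves that $\mathbb{P}_\infty(\phi)$ exists and is in $[0,1]$, and

  2. $\mathbb{E}_n(\mathbb{P}_\infty(\phi)) \eqsim_n \mathbb{P}_n(\phi)$?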


Note that $\mathbb{P}_\infty$ need not be computable, so this does not happen by default.

For example, consider the logical inductor described in the paper with the extra condition that if the deductive state is ever discovered to be inconsistent, all probabilities are set to 0 forever. This clause will never be executed, since PA is consistent, but because the clause exists, PA can prove that $\mathbb{P}_\infty(\phi)$ exists. (PA proves that if PA is consistent, the limit exists, using the proof in the paper, and that if PA is inconsistent, the limit exists and equals 0.)

However, this logical inductor will not satisfy the second property. Consider a sentence $\phi$ that PA proves, such as $\top$. $\mathbb{P}_n(\phi)$ will converge to 1, while $\mathbb{E}_n(\mathbb{P}_\infty(\phi))$ will converge to the probability according to $\mathbb{P}_\infty$ that PA is consistent. (PA proves that if PA is consistent, the limit is 1, using the proof in the paper, and that if PA is inconsistent, the limit is 0.)
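
The computation behind this counterexample can be sketched as follows. This is only a sketch under stated assumptions: it uses the paper's notation ($\mathbb{P}_n$ for the beliefs at stage $n$, $\mathbb{E}_n$ for expectations, $\mathbb{P}_\infty$ for the limit), it takes $\phi$ to be any theorem of PA, and it relies on the paper's results that expectations of indicators track probabilities and that the limit assigns probability less than 1 to sentences PA does not prove (non-dogmatism).

```latex
% Assumption: \phi is a theorem of PA; Con(PA) is the usual consistency sentence.
% Case split that PA can prove about the modified inductor:
\mathrm{PA} \vdash \mathrm{Con}(\mathrm{PA}) \rightarrow \mathbb{P}_\infty(\phi) = 1,
\qquad
\mathrm{PA} \vdash \neg\mathrm{Con}(\mathrm{PA}) \rightarrow \mathbb{P}_\infty(\phi) = 0.

% So, provably in PA, \mathbb{P}_\infty(\phi) is the indicator of Con(PA), giving
\mathbb{E}_n\big(\mathbb{P}_\infty(\phi)\big) \eqsim_n \mathbb{P}_n\big(\mathrm{Con}(\mathrm{PA})\big).

% Property 2 would require the left side to track \mathbb{P}_n(\phi) \to 1,
% but PA does not prove Con(PA), so by non-dogmatism
\lim_{n \to \infty} \mathbb{P}_n\big(\mathrm{Con}(\mathrm{PA})\big) < 1 = \lim_{n \to \infty} \mathbb{P}_n(\phi).
```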

If a Logical Inductor with the above property is found, there are many follow-up questions you could ask.

Can you make an analogue of the self-trust property that works for $\mathbb{P}_\infty$? Does the above property imply that self-trust property?

Is there some simple extra condition that could be added to the definition of a Logical Inductor that implies everything we could want about beliefs about $\mathbb{P}_\infty$?


A good place to start with this question might be to analyze a logical inductor that, in the spirit of the Demski prior, adds sentences to the deductive state only if they are propositionally consistent with all previous sentences. This way, PA will prove that the algorithm defines a logical inductor over some consistent propositional theory (even if it does not know whether that theory is PA).
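
As a toy illustration of the filtering step only, here is a minimal Python sketch. It is not the construction from the paper: sentences are modeled as Boolean functions over a small fixed set of atoms, propositional consistency is checked by brute-force truth tables, and the names `is_satisfiable`, `filtered_deductive_process`, `ATOMS`, and `CANDIDATES` are hypothetical, introduced just for this example.

```python
from itertools import product


def is_satisfiable(sentences, atoms):
    """Brute-force check: does some truth assignment make every sentence true?"""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(sentence(world) for sentence in sentences):
            return True
    return False


def filtered_deductive_process(candidates, atoms):
    """Yield the names in each successive deductive state.

    A candidate is accepted only if it is propositionally consistent
    with everything accepted so far; otherwise it is skipped.
    """
    accepted = []  # list of (name, sentence) pairs
    for name, sentence in candidates:
        trial = [s for _, s in accepted] + [sentence]
        if is_satisfiable(trial, atoms):
            accepted.append((name, sentence))
        yield [n for n, _ in accepted]


# Hypothetical toy input: two atoms and four candidate sentences.
ATOMS = ["a", "b"]
CANDIDATES = [
    ("a", lambda w: w["a"]),
    ("a -> b", lambda w: (not w["a"]) or w["b"]),
    ("not b", lambda w: not w["b"]),  # inconsistent with the first two
    ("b", lambda w: w["b"]),
]

states = list(filtered_deductive_process(CANDIDATES, ATOMS))
print(states[-1])  # ['a', 'a -> b', 'b']
```

By construction every state produced this way is propositionally consistent, whatever the candidate stream is, which is the feature that would let PA verify consistency of the resulting theory without knowing that the stream enumerates PA.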

2 comments

I still feel like I don't understand most things in the paper, but I think this construction should work:

Say we start with some deductive sequence that is PA-complete. Now redefine the computation of this deductive sequence to include the condition that if ever at some stage we find that PA is inconsistent (i.e. that is propositionally inconsistent) we continue by computing according to where is the lexicographically-first sentence propositionally consistent with .

Since PA is consistent this deductive process is extensionally identical to the one we had before, but now PA proves that it's a deductive process: assuming consistency it's a PA-complete deductive process, and assuming inconsistency it's over some very arbitrary, but complete and computable, propositional theory. We can now carry out the construction in the paper to get the necessary logical inductor over PA which we denote by . Here PA proves the convergence of the logical inductor as in the paper.

Fix and , and write and . We want to say that eventually , but by self-trust and linearity it's enough to show that eventually . This would follow from eventually having (as PA proves that which gives that )

[edit2: this paragraph is actually false as written, I have a modified proof that works for PA+¬Con(PA), but it suggests that this construction is broken] From the proof that we have that there is some such that PA proves that for . Define to be . This is an efficiently computable sequence of sentences of which all but finitely many are theorems, so we get that (consider starting the inductor at some later point after all the non-theorems have passed) which means that there is some where implies which proves the result. [edit: some polishing to this paragraph]

Note that this ignores some subtleties arising from the fact that the limiting probability could be irrational and therefore not a bona fide Logically Uncertain Variable. These concerns seem beside the point, as we can get the necessary behavior by formalizing Logically Uncertain Variables slightly differently, in terms of Dedekind cuts or, perhaps more appropriately, CDFs: Cumulative Distribution Formulae.

I initially played around with the idea of freezing the final probability assignments when you find a contradiction, but couldn't get it to work. The difficulty was that if you assumed the inconsistency of PA you got that the limit was simply the assignment at some future time, but you couldn't defer to this because the function was not computable according to PA (by the consistency results in the paper). Something along these lines might still work.

As I worked on this approach I noticed a statement that seems both true and interesting which I delayed proving because I wasn't sure how to get everything else to work. Let then it seems to me that we should have for any deferral function

The reason why I think it's true is that conditioning on which is true should only improve our estimate of , but if this estimate is actually improved it will be exploited by some poly-time trader. It would be very interesting if this weren't the case since then conditioning on a true statement might actually be bad for us, but if it is true we automatically get the same for instead by some simple algebra (we could then say that infinitary consistency and inconsistency are irrelevant for determining probabilities at any computable times).

Edit, Proof: Let be constructed as in the paper. Then proves that and are logical inductors. We get that

Here the final expression can be expanded out as

Fix we can pick large enough that believes that is within an interval of size with probability at least . Then believes that is equivalent to or (depending on whether is above or below some threshold) with probability at least . In both of these cases we get that (or perhaps some extra epsilons here)

Now we can cancel and get that the expression is eventually within (for some ) of