Naturalistic trust among AIs: The parable of the thesis advisor's theorem

[-]passive_fist12y100

Forgive me if this is a stupid question, but all of this seems to be very fragile and dependent on subtle properties. It seems like even the slightest modifications, like giving the theorem prover a (possibly very tiny, but nonzero) probability of error, which would always occur in a real-life scenario, would ruin both the Lobian argument and the solution given here. How is this argument at all relevant to real-life ai development?

[-]Benya12y130

Yes, a real-life reasoner would have to use probabilistic reasoning to carry out these sorts of inference. We do not have a real understanding yet of how to do probabilistic reasoning about logical statements, though, although there has been a bit of work about it in the past. This is one topic MIRI is currently doing research on. In the meantime, we also examine problems of self-reference in ordinary deductive logic, since we understand it very well. It's not certain that the results there will carry over in any way into the probabilistic setting, and it's possible that these problems simply disappear if you go to probabilistic reasoning, but there's no reason to consider this overwhelmingly likely, and if they don't it seems likely that at least some insights gained while thinking about deductive logic will carry over. In addition, when an AI reasons about another AI, it seems likely that it will use deductive logic when reasoning about the other AI's source code, even if it also has to use probabilistic reasoning to connect the results obtained that way to the real world, where the other AI runs on an imperfect processor and its source code isn't known with certainty.

More here.

[-]Luke_A_Somers12y40

I'll accept that doing everything probabilistically is expensive, but I really don't see how it wouldn't solve the problem to at least assign probabilities to imported statements. The more elements in the chain of trust, the weaker it is. Eventually, someone needs it reliably enough that it becomes necessary to check it.

And of course any chain of trust like that ought to have a system for providing proof upon demand, which will be invoked roughly every N steps of trust. The recipients of the proof would then become nodes of authority on the issue.

This seems rather how actual people operate (though we often skip the 'where to get proof of this' step), and so any proof that it will become unworkable has a bit of a steep hill to climb.

[-]passive_fist12y10

I see. Thanks for the link.

[-]cousin_it12y60

Great post. Thank you for improving my intuition.

[-]V_V12y50

I'm under the impression that all these "Löbian obstacles" essentially boil down to the "you can't be sure that you are not insane" kind of stuff.

IIUC, your thesis advisors scenario can be solved by having a first thesis advisor who only outputs theorems which are provable within some formal system and every other thesis advisor trusting by axiom his predecessor.
This doesn't work in your (MIRI's) framework because you want the agent to trust not its ancestors but its successors, and you want this to hold for an unbounded number of generations.
But this may just be a problem of your framework, not some fundamental propriety of computational agents. At least, you haven't produced a satisfactory argument for that.

Anyway, I think that framing these problems in terms of provability logic might be inappropriate, as concepts like "beliefs", "trust" or "agents" are difficult to formalize in that language, hence you get a mix of formal and informal reasoning, which is the traditional recipe for paradoxes and errors.

It's much better to frame these problems directly in terms of computer programs. This way agents, internal states, world physics, etc., can be defined in a much more precise way.

In this framework the main "Gödelian" result is Rice's theorem, which, among other things, implies that functional equivalence between arbitrary programs is uncomputable. Nevertheless, there exist under-approximations of functional equivalence which are computable, and this can be exploited by "cliquing" strategies in scenarios where programs have to "trust" each other (e.g. program-equilibrium prisoner's dilemma).
Cliquing based on a lexical equivalence predicate is restrictive, obviously, but it can be generalized on any predicate that under-approximates functional equivalence.
Of course, this assumes that you start with a program that is free of bugs, OS and compilers are free of bugs, hardware never glitches, and so on. But, unrealistic as it is, this seems to be an unavoidable assumption: no program will ever be able to debug itself, or to certify that the hardware it runs on works as advertised.

[-]cousin_it12y10

Yeah, good points. Previously discussed in this post and this comment. (I see you also commented there.) I'd love to see more progress in that direction.

[-]V_V12y00

Thanks. Added a comment in the WD/EY subthread.

[-]RolfAndreassen12y50

Probably a stupid question: What is the difference between a proof, and a proof that something is provable? I understand the difference between a constructive proof and an existence proof, but in the case of proofs it seems that they should be the same.

Do we have any actual examples of proof-by-proving-provability?

[-]Benya12y20

There is a way to write a predicate Proves(p,f) in the language of PA which is true if f is the Gödel number of a formula and p is the Gödel number of a proof of that formula from the axioms of PA. You can then define a predicate Provable(f) := exists p. Proves(p,f); then Provable(f) says that f is the Gödel number of a provable formula. Writing "A" for the Gödel number of the formula A, we can then write

PA |- Provable("A")

to say that there's a proof that A is provable, and

PA |- Provable("Provable("A")")

to say that there's a proof that there's a proof that A is provable. (Of course, PA |- A just says that there is a proof of A.)

On the metalevel, we know that

PA |- A

if and only if

PA |- Provable("A").

On the other hand, PA proves neither A -> Provable("A") nor Provable("A") -> A for general A.

(Hope that helps?)

[-]RolfAndreassen12y20

Let me see if I can put that in my own words; if not, I didn't understand it. You are saying that humans, who do not operate strictly by PA, know that a proof of the existence of a proof is itself a proof; but a reasoner strictly limited to PA would not know any such thing, because it's not a theorem of PA. (PA being just an example - it could be any formal system, or at least any formal system that doesn't include the concept of proofs among its atoms, or concepts.) So such a reasoner can be shown a proof that a proof of A exists, but will not know that A is therefore a theorem of PA. Correct?

To me this seems more like a point about limitations of PA than about AI or logic per se; my conclusion would be "therefore, any serious AI needs a formal system with more oomph than PA". Is this a case of looking at PA "because that's where the light is", ie it's easy to reason about; or is there a case that solving such problems can inform reasoning about more realistic systems?

[-]CalmCanary12y30

Strictly speaking, Lob's Theorem doesn't show that PA doesn't prove that the provability of any statement implies that statement. It just shows that if you have a statement in PA of the form (If S is provable, then S), you can use this to prove S. The part about PA not proving any implications of that form for a false S only follows if we assume that PA is sound.

Therefore, replacing PA with a stronger system or adding primitive concepts of provability in place of PA's complicated arithmetical construction won't help. As long as it can do everything PA can do (for example, prove that it can prove things it can prove), it will always be able to get from (If S is provable, then S) to S, even if S is 3*5=56..

[-]RolfAndreassen12y00

Let me see what happens if I put in a specific example. Suppose that

If 3*5=35 is a theorem of PA, then 3*5=35

is a theorem of PA. Let me refer to "3*5=35 is a theorem" as sentence 1; "3*5=35" as sentence 2, and the implication 1->2 as sentence 3. Now, if 3 is a theorem, then you can use PA to prove 2 even without actually showing 1; and then PA has proven a falsehood, and is inconsistent. Is that a correct statement of the problem?

If so... I seem to have lost track of the original difficulty, sorry. Why is it a worry that PA will assert that it can prove something false, but not a worry that it will assert something false? If you're going to worry that sentence 3 is a theorem, why not go straight to worrying that sentence 2 is a theorem?

[-]TobyBartels12y00

We generally take it for granted that sentence 2 (because it is false) is not a theorem (and therefore sentence 1 is false), and the argument is meant to show that sentence 3 is therefore also not a theorem (even though it is true, since sentence 1 is false). This is a problem because we would like to use reasoning along the lines of sentence 3.

To see the real issue, you should replace sentence 2 with a conjecture whose truth you don't yet know but would like to know (and modify sentences 1 and 3 correspondingly). Now wouldn't you like sentence 3 to be a true? If you knew that sentence 1 was true, wouldn't you like to conclude that sentence 2 is true? Yet if you're PA, then you can't do that.

[-]xnn12y30

"therefore, any serious AI needs a formal system with more oomph than PA"

The problem with that is that the same argument goes through in exactly the same way with any stronger system replacing PA. You might first try something like adding a rule "if PA proves that PA proves S, then S". This solves your original problem, but introduces new ones: there are now new statements that your system can prove that it can prove, but that it can't prove. Eliezer discusses this system, under the name PA+1, in You Provably Can't Trust Yourself .

[-]Mitchell_Porter12y20

our agent treats proofs checked inside the boundaries of its own head different from proofs checked somewhere in the environment

Why is that a problem? Except in the same sense that gravity is a problem when you want to lift something.

It's just a fact that people and machines make mistakes. The ideal approach to results proved by other beings is to treat them as less than certain. Indeed, an ideal reasoner might want to extend this principle to its own thoughts too, if only a sensible way to do this could be found.

If you want an ideal reasoner to treat conclusions it has derived itself, and conclusions derived by another reasoner which is probably also ideal, with the same degree of credence, then the ideal approach is for the first reasoner to doubt its own conclusions in the way that it doubts the conclusions of the other reasoner.

Throughout, I have meant "ideal" from a perspective of epistemic caution, where you value not believing wrong things. If you're in a hurry, or really do trust certain other reasoners, then of course you can simply design a reasoner to treat the output of certain other reasoners as apriori valid.

[-][anonymous]12y20

The "tiling agents" issue mostly wasn't relevant to this article, but that is the goal of getting around Lob's Theorem. If and only if you can get around Lob's Theorem and prove things regarding formal systems as complex as yourself, then and only then can you construct a reasoning system more powerful than yourself and set it to work on your own task.

Otherwise, you can't prove that an agent with a higher-order logic than yourself will function according to your goals or, put another way, retain your beliefs (ie: I don't want to replace myself with an agent who will reason that the sky is green when I'm quite sure it's blue).

[-]ThisSpaceAvailable12y00

Can the concept of "my" reasoning, as opposed to "someone else's" reasoning, be formalized? Suppose I manage to write an AI that can run on my laptop. I boot my laptop up, run the program, and it finds a proof of X. It writes "a proof of X exists" on my hard drive in the "what statements has proofs database". I shut my laptop down, reboot my laptop, and restart the program. It checks the "what statements has proofs database", and finds the "a proof of X exists" statement. Is that "its" reasoning? It was created by that program. But it wasn't created by that instance of the program. What if a bunch of people are running AIs, all with the same source code, on different laptops. If one of those AIs queries another, is that "its own reasoning", or another AI's? What if Saturday, an AI says "I knew on Friday that X is true, I didn't record my proof, but I know that I am a rational thinker, so I know it's true". But the reason it knew on Friday that X is true was that it knew on Thursday that X is true, and so on. Is this the same paradox?

Also, there is a difference between creating an AI that you trust, and finding one. One can imagine a universe in which an infinite past chain of AIs exists, but the idea of creating an infinite chain of reasoning is a lot harder to imagine.

There are some problems with the idea that PA can be perfectly substantiated physically, but putting that aside, there's even further problems in extrapolating into the past. What's to stop me from creating an AI with the same source code as you, and putting "2+2 =5 is true" in its memory banks? Can you tell the difference between an AI that has put "there is a proof of X" in its memory banks according to its programming, and an AI that was simply created with "there is a proof of X" in its memory banks to begin with? Even if it "remembers" going through the process of coming up with a proof, that memory could have been there to begin with. If it remembers all of its reasoning, you could check that that reasoning is consistent with its source code, but that would take as much work as just running the calculations over again.

[-]lmm12y00

Is biting the bullet on the thesis advisor analogy really such a problem? Given an infinite human history, it seems like if the proposition was actually false then at some point someone would have noticed.

[-]Roxolan12y80

if the proposition was actually false then at some point someone would have noticed.

You're thinking of real human beings, when this is just a parable used to make a mathematical point. The "advisors" are formal deterministic algorithms without the ability to jump out of the system and question their results.

LESSWRONG
LW

LESSWRONG
LW

37

Naturalistic trust among AIs: The parable of the thesis advisor's theorem

37

37