by Karl

# 35

Let's consider the agent given in A model of UDT with a halting oracle. One will notice that that agent is not quite well defined because it doesn't tell us in what order we are supposed to consider actions in step 1. But surely that doesn't matter, right? Wrong.

Let's consider the prisoner dilemma with payment matrix given by

 1: C 1:  D 2: C (3, 3) (5, 0) 2: D (0, 5) (2, 2)

and consider agent A which consider whether there is a proof that A()≠D before considering whether there is a proof that A()≠C and agent A' which do things in the opposite order. If A or A' is pitted against itself everything is well and mutual cooperation is the result of the game but what if A is pitted against A'? Then A break down and cry.

Let's call the utility functions of A U and the utility function of A' U' and consider a model of PA in which PA is inconsistent (such a model must exist if PA is consistent). In such a model we will have A()=D and A'()=C and so U()=5 and U'()=0. That means that A will not be able to prove that A()=D => U()=u for any u different from 5 and so either A will defect and A' will cooperate or A will break down and cry, but A' will not cooperate because it cannot prove A'()=C => U()=u' for any u' except possibly 0, so A will break down and cry. QED

More generally if M is a model of PA in which PA is inconsistent, an agent defined in this way will never be able to prove that A()=a => U()=u (where a is the first action considered in step 1) except possibly for u=u0 where u0 is the value of U() in M. That seems to create a huge problem for that approach to UDT.

# 35

New Comment

Well, yeah. My post only wanted to claim that the proof transfers to the symmetric Prisoner's Dilemma, the one with identical agents. If the other guy is not identical, but a version of you with an innocent-looking modification, should you go ahead and cooperate anyway, even though you can't prove you won't be the sucker?

If you want different agents to cooperate, the question is how different you allow them to be. You could come up with parameterized families of agents that pairwise cooperate, etc. But since the basic idea of UDT is essentially single-player (looking for perfect logical correlates of yourself within the world), we're probably missing some big new insight to solve the multiplayer problems that people want UDT to solve. Some people think Lobian cooperation is a promising alternative, but it has problems as well.

(I apologize in advance if this comment is off the mark. I've been staying away from LW and SI-related things.)

Why was this downvoted? (I wasn't aware of this particular ordering problem, and it's interesting. Thanks, Karl!)

It's common for top-level posts to get downvotes early on. This is mysterious.

It only takes one cantankerous person who thinks that certain topics shouldn't be discussed on LW or whatever. Eh. Whoever's doing this, your behavior is mysterious enough that I suspect it only confuses people and doesn't successfully disincentivize anyone because they have no idea what's being disapproved of, and if you're actually trying to change what gets discussed on LW or whatever you should comment in addition to downvoting.

Regarding this particular post, I didn't downvote, but I didn't like the way the first paragraph ended. It seemed excessively confrontational and made me not want to read the rest of the post.

[-]Karl20

I didn't like the way the first paragraph ended. It seemed excessively confrontational and made me not want to read the rest of the post.

My intention was not to appear confrontational. It actually seemed obvious when I began thinking about this problem that the order in which we check actions in step 1 shouldn't matter but that ended up being completely wrong. That was what I was trying to convey though I admit I might have done so in a clumsy manner.

I liked the style.

I didn't downvote, but I am confused by 'break down and cry'.

[-]Karl50

To quote step 2 of the original algorithm:

For every possible action a, find some utility value u such that S proves that A()=a ⇒ U()=u. If such a proof cannot be found for some a, break down and cry because the universe is unfair.

That's a reference to the linked post by cousin_it.

yyyep. That's a problem. Nicely put.

Now, embarrassingly, I have forgotten why we play chicken with the universe in the first place. Can someone remind me what goes wrong if we don't?

Now, embarrassingly, I have forgotten why we play chicken with the universe in the first place. Can someone remind me what goes wrong if we don't?

(The important thing to note is that if you can prove both (A() = 1 => U() = 5) and (A() = 1 => U() = 10), then it follows that (A() != 1), and therefore this is impossible if we play chicken and our proof system is sound.)

Ahhh, yes. Thank you.

Interesting. I think the other approach that cousin_it and Karl wrote about is more elegant (*) anyway, but cousin_it was dissatisfied with it because it involved an arbitrary time limit. Here's an idea for making it less arbitrary. Suppose instead of a halting oracle, we had an oracle for General Turing Machines that (in one cycle) outputs the final output of a given GTM if it converges. Then we can use it to look through all theorems of the form A()=a => U()>=u, and return the action with the highest u, or an error ("break down and cry") if no such action exists.

(*) More elegant because 1) it doesn't require the "play chicken" step and 2) it doesn't require that the set of possible actions be finite.

[-]Karl10

If I understand what you're proposing correctly then that doesn't work either.

Suppose it is a theorem of PA that this algorithm return an error, then all theorems of the form A()=a => U()>=u are demonstrable in PA and so there is no action with the highest u and the algorithm return an error, because that is demonstrable in PA the algorithm must return an error by Löb's theorem.

You're right, and I don't know how to fix the problem without adding the equivalent of "playing chicken". But now I wonder why the "playing chicken" step needs to consider the possible actions one at a time. Instead suppose that using an appropriate oracle, the agent looks through all theorems of PA, and returns action a as soon as it sees a theorem of the form A()≠a, for any a. This was my original interpretation of cousin_it's post. Does this have the same problem, if two otherwise identical agents look through the theorems of PA in different orders?

[-]Karl00

Well, depend how different the order is...

If there are theorem in PA of the form "If there are theorems of the form A()≠a and of the form A'()≠a' then the a and the a' such that the corresponding theorem come first in the appropriate ordering must be identical." then you should be okay in the prisoner dilemma setting but otherwise there will be a model of PA in which the two players end up playing different actions and we end up in the same situation as in the post.

More generally, no matter how you try to cut it, there will always be a model of PA in which all theorems are provable and your agent behavior will always depend on what happen in that model because PA will never be able to prove anything which is false in that model.