Thoughts on the 5-10 Problem

[-]Dagon6y60

I think I missed the indirection required to use Lob's theorem (which I thought was about not being able to prove that a statement is unprovable, not about proving false things - that is, for our formal systems, we accept incompleteness but not incorrectness).

Mainly I don't see where you have the actual proposition setup in your "proof" - the knowledge that you're choosing between $5 and $10, not between $5 and "something else, which we are allowed to assume is $0". If you don't know (or your program ignores) that you will be offered $10 iff you reject the $5, you'll OF COURSE take the $5. You can test this in real humans: go offer someone $5 and don't say anything else. If they turn it down, congratulate them on their decision-making and give them $10. If they take the $5, tell them they fell victim to the 5-10 problem and laugh like a donkey.

FactorialCode's comment _is_ an application that shows the problem - it's clearly "among things I can prove, pick the best", not "evaluate the decision of passing up the $5". I'd argue that it's _ALSO_ missing the important part of the setup - you can prove that taking $10 gives you $10 as easily as that $5 gives you $5, and the "look for proof" is handwaved (better than absent) without showing why that particular proof would be the one found.

[-]Liam Donovan6y10

My (possibly very incorrect) takeaway from the post, as someone with very little background in mathematical logic, was that "If I can prove x has higher utility than y, then I will do x" (statement 1) is a bad heuristic for an EDT agent that can reason about its own source code, because outputting x will be a fixed point of this decision process* even when this does not return higher utility. Specifically, an EDT agent will choose action x iff the utility of choosing x is higher than that of choosing y (assuming the utilities are different). Thus, assuming statement 1 is equivalent to assuming "if I can prove x has higher utility than y, x has higher utility than y" (statement 2) for the EDT agent. Because assuming statement 2 leads to absurd conclusions (like the agent taking the 5 dollar bill), assuming it is a bad heuristic.

This use of Lob's theorem seems to do exactly what you want: show that we can't prove a statement is unprovable. If we prove a statement of the form "if a is provable then a is true", then the contrapositive "if a is not true then it is not provable" follows. However, I thought the point of the post was that we can't actually prove a statement of this form, namely the statement "if x does not have higher utility than y, then I cannot prove that x has higher utility than y" (statement 3). Statement 3 is necessary for the heuristic in statement 1 to be useful, but the post shows that it is in fact false.
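For reference, here is the standard statement of Lob's theorem that the argument above leans on, written with the ☐ ("is provable") notation used elsewhere in this thread. The theory name T is my own label, assumed to be something like Peano Arithmetic or stronger.

```latex
% Lob's theorem, external form: for any sentence P,
\text{if } T \vdash \Box P \to P \text{, then } T \vdash P.

% Lob's theorem, internalized form:
T \vdash \Box(\Box P \to P) \to \Box P

% Contrapositive reading relevant to "statement 3":
% T proves (\neg P \to \neg \Box P) only when T already proves P.
```

Read this way, a statement like statement 3 can only be proved for those x that the system can already prove to have higher utility, which is why it cannot serve as a general safeguard.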

The point of the post isn't to prove something false, it's to show that we can't prove a statement is unprovable.

*I'm not sure if I'm using these terms correctly and precisely :/

[-]Tofly6y10

Yes, you could make the code more robust by allowing the agent to act once it's found a proof that any action is superior. Then, it might find a proof like

U(F) = 5

U(~F) = 10

10 > 5

U(~F) > U(F)

However, there's no guarantee that this will be the first proof it finds.

When I say "look for a proof", I mean something like "for each of the first 10^(10^100) Godel numbers, see if it encodes a proof. If so, return that action."

In simple cases like the one above, it likely will find the correct proof first. However, as the universe gets more complicated (as our universe is), there is a greater chance that a spurious proof will be found first.
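A toy sketch of why enumeration order matters. This is not a real proof checker: each "proof" is just a hypothetical labelled tuple, `None` stands for a number that encodes no proof, and the names `first_proof_agent`, `honest`, and `spurious` are made up for illustration.

```python
from typing import Iterable, Optional, Tuple

# (description of the argument, action it recommends). Purely illustrative:
# a real search would verify derivations instead of trusting labels.
Proof = Tuple[str, str]


def first_proof_agent(candidates: Iterable[Optional[Proof]]) -> Optional[str]:
    """Return the action recommended by the first proof encountered."""
    for candidate in candidates:
        if candidate is None:  # this "Godel number" encodes no proof
            continue
        _description, action = candidate
        return action
    return None  # search budget exhausted without finding a proof


honest = ("U(F) = 5, U(~F) = 10, 10 > 5, so U(~F) > U(F)", "~F")
spurious = ("a Lobian argument concluding that F is superior", "F")

# Same two proofs, different enumeration order, different action.
print(first_proof_agent([None, honest, spurious]))
print(first_proof_agent([None, spurious, honest]))
```

The agent's behaviour here is fixed entirely by which proof comes first in the enumeration, which is the worry in more complicated universes.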

[-]Chris_Leong6y*30

I suggested a similar approach in deconfusing logical counterfactuals where we erase information about the agent so that we end up with multiple possible agents, though I wouldn't be surprised if other people have also tried asking their agent to ask about how other agents reason. Your approach is different in that the original agent isn't included among the set of agents considered and that seems like a useful adaption that I hadn't considered as long as we can provide appropriate justification. I also provide some more just

Anyway, it's good to see someone else thinking along a similar (but different) track and I'd be curious to hear what you think about my approach.

[-]FactorialCode6y20

If I can try to make your solution concrete for the original 5-10 problem, would it look something like this?

A() := 
    let f(x) :=
         Take A() := x as an axiom instead of A() := this function
         Take U() := to be the U() in the original 5-10 problem
         Look for a proof of "U() = y"
         return y
    in 
         return argmax f(x) 
    where x in {5,10}
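A rough runnable version of the pseudocode above, under one big simplifying assumption: directly evaluating U with the agent replaced by a constant stands in for "look for a proof of U() = y". That is only reasonable because this toy universe is trivially computable; the parameter `agent` and the lambda are my own scaffolding.

```python
from typing import Callable


def U(agent: Callable[[], int]) -> int:
    """The 5-10 universe: the payout equals the bill the agent picks."""
    choice = agent()
    if choice == 10:
        return 10
    if choice == 5:
        return 5
    raise ValueError(f"unexpected action: {choice}")


def f(x: int) -> int:
    # Take "A() := x" as the axiom, i.e. swap in a constant agent, and
    # report the y for which U() = y. Evaluation replaces proof search here.
    return U(lambda: x)


def A() -> int:
    # argmax of f over the two available actions
    return max((5, 10), key=f)


print(A())  # 10
```

Because f never reasons about A's real source code, there is no self-reference for a Lobian argument to exploit in this sketch.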

[-]Gurkenglas6y20

Edit: So the reason we don't get the 5-and-10 problem is that we don't get ☐(☐(A=5=>U=5 /\ A=10=>U=0) => (A=5=>U=5) /\ A=10=>U=0), because ☐ doesn't have A's source code as an axiom. Okay. (Seems like this solves the problem by reintroducing a Cartesian barrier by which we can cleanly separate the decision process from all else.) (My own favorite solution A = argmax_a(max_u(☐(A=a=>U>=u))) also makes ☐ unable to predict the outcome of A's source code, because ☐ doesn't know it won't find a proof for A=10=>U>=6.)

[-]FactorialCode6y20

How would f() map 10 to 0? Wouldn't that require that from

A() := 10
U() := 
     if A() = 10
         return 10
     if A() = 5
         return 5

there's a proof of

U() = 0

My understanding is that in the original formulation, the agent takes its own definition along with a description of the universe and looks for proofs of the form

[A() = 10 -> U() = x] & [A() = 5 -> U() = y ]

But since "A()" is the same on both sides of the expression, one of the implications is guaranteed to be vacuously true. So the output of the program depends on the order in which it looks for proofs. But here f looks for theorems starting from different axioms depending on its input, so "A()" and "U()" in f(5) can be different from "A()" and "U()" in f(10).
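To make the vacuous-truth point concrete, here is a small check using material implication. The helper names `implies` and `claim_holds` are mine, and U is the simple "payout equals action" universe from the snippet above.

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def U(action: int) -> int:
    """5-10 universe: the payout equals the chosen bill."""
    return action


def claim_holds(actual_action: int, x: int, y: int) -> bool:
    """Evaluate [A() = 10 -> U() = x] & [A() = 5 -> U() = y]
    in the world where A() really returns actual_action."""
    u = U(actual_action)
    return implies(actual_action == 10, u == x) and implies(actual_action == 5, u == y)


# If the agent in fact takes the 5, any x makes the first conjunct true.
print(claim_holds(5, x=0, y=5))    # True
print(claim_holds(5, x=10, y=5))   # True
# If the agent in fact takes the 10, the false claim x=0 is rejected.
print(claim_holds(10, x=0, y=5))   # False
```

So for an agent that takes the 5, "A() = 10 -> U() = 0" is just as true as "A() = 10 -> U() = 10", and which one gets used depends on search order.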

[-]NothingnessAbove6y10

As far as I can tell, this problem is an exercise in logical uncertainty. Consider an example agent A which makes a decision between, say, options a and b, with possible outcomes U=0, U=5, and U=10. In general, the agent uses its logical uncertainty estimator to compare the expected utilities 0*P(U=0|a)+5*P(U=5|a)+10*P(U=10|a) to 0*P(U=0|b)+5*P(U=5|b)+10*P(U=10|b). Of course, this causes a divide by zero error if A is certain of which action it will take. To avoid this, if A ever proves that it will take an action, it will immediately take a different action, regardless of the expected utility assigned to that action. So, if A ever proves what action it takes in advance of taking it, it will be wrong, and unsound. Thus if A is sound it cannot prove in advance what action it will take. In the $5 and $10 game, A will correctly assess the expected values, and choose the $10 because it is higher. It will be able to correctly assess the expected value of the $5 because it will hold nonzero probability of taking the $5 before it makes its decision. Why will it hold this nonzero probability, when $10 is the obvious choice? Because, since by Lob's theorem A can't prove itself sound, and if it is unsound, it might prove in advance that it will take the $10, and therefore take the $5 instead.
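A minimal sketch of the two ingredients described above, with the hard part assumed away: the joint distribution over (action, utility) is simply handed in as a dictionary rather than produced by a logical uncertainty estimator. The function names and the `proved_action` parameter are hypothetical.

```python
from typing import Dict, Optional, Tuple

Joint = Dict[Tuple[str, int], float]  # P(action, U) supplied by assumption


def expected_utility(joint: Joint, action: str) -> float:
    """E[U | action] = sum_u u * P(action, u) / P(action)."""
    p_action = sum(p for (a, _u), p in joint.items() if a == action)
    if p_action == 0:
        # The "divide by zero" case: the agent is certain it won't do this.
        raise ZeroDivisionError(f"P({action}) = 0, cannot condition on it")
    return sum(u * p for (a, u), p in joint.items() if a == action) / p_action


def choose(joint: Joint, actions: Tuple[str, ...], proved_action: Optional[str] = None) -> str:
    if proved_action is not None:
        # If the agent has proved it will take some action, it takes a
        # different one, regardless of expected utility.
        return next(a for a in actions if a != proved_action)
    return max(actions, key=lambda a: expected_utility(joint, a))


# a = take the $5, b = take the $10; both keep nonzero probability.
joint = {("a", 5): 0.25, ("b", 10): 0.75}
print(choose(joint, ("a", "b")))                     # b
print(choose(joint, ("a", "b"), proved_action="b"))  # a
```

With both actions at nonzero probability the comparison is well defined and picks the $10; once an action has probability zero, `expected_utility` raises instead of returning a number.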

I don't know if this was clear but this is not a full answer, because logical uncertainty is hard and I'm just assuming agent A is somehow good at it.

Edit: How does A calculate the expected utility of another agent being in its position, when it is nontrivially embedded in its environment? Of course, if the agent is not embedded, the 5-10 problem ceases to be an issue (AIXI is not bothered by it), for precisely this reason: it's easy to see what the counterfactual world where the agent decided to take another action looks like, unlike in this case, where that counterfactual world might be logically impossible or ill-defined.

[-]Liam Donovan6y10

What was wrong with specifying an agent that uses "[decision theory] unless it's sure it'll make a decision, then it makes the opposite choice"?

[-]Tofly6y20

Doesn't that mean the agent never makes a decision?

[This comment is no longer endorsed by its author]

[-]Suh Dude6y20

Not really. It means that the agent never mathematically proves that it will make a decision before it makes it. The agent makes decisions by comparing expected utilities on different actions, not by predicting what it will do.
