Value learners & wireheading

by Manfred

3rd Feb 2016

3 comments

[anonymous]

"Even if the Go-playing AI couldn't modify itself to only care about the current way it computes values of actions, it might make suboptimal moves that limit its future options, because its future self will compute values of actions the 'wrong' way."

Please correct me if I'm misunderstanding something: why would a value-learner care about retaining its current values? I'm having trouble seeing the jump from the Go planning process to the statement that a Dewey learner of sufficient intelligence would want to self-sabotage.

Manfred

Attempt one:

Suppose that you were a hedonist whose decision-making process only cared about the next three years. So you have a genius plan: you'll take out a loan that you don't have to pay back for three years, then spend the money as hedonistically as possible. After those three years are up you'll probably lose your house or get convicted of fraud or something, but whatever.

But then you realize that your future selves also care about the next three years, for them. And so in two years your future self is going to be all stressed out and focused on paying off the loan or going into hiding in Zimbabwe or something, which detracts from your genius plan. So the really genius plan that gets the most utility over the next three years would both take out the loan, and also somehow ensure your future self had a good time and didn't, like, worry about paying back the loan.

Attempt two:

Check out the example used in this paper of a "sophisticated planner" (figure 1). It realizes that its decision-making criteria are going to drift over time, so it takes a suboptimal route so that its future self can't screw up the genius plan. When we approve of the past agent's values we call this "forward thinking" and "sophisticated," but when we don't favor the past agent over its future selves, we call it "self-sabotage."
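To make the naive-versus-sophisticated distinction concrete, here is a minimal toy sketch of the loan story. It is not taken from the paper; the payoff numbers, option names, and the two-stage structure are all made up for illustration. The present self picks a first move, the future self then picks among whatever outcomes that move leaves open, and the only difference between the two planners is which utility function they expect the future self to maximize.

```python
# Toy two-stage decision problem. All names and numbers are hypothetical,
# chosen only to illustrate the argument above.

# How the present self scores each final outcome.
U_NOW = {"keep_partying": 10, "worry_about_loan": 0, "locked_in_fun": 7}
# How the future self (whose values have drifted) scores the same outcomes.
U_LATER = {"keep_partying": 2, "worry_about_loan": 5, "locked_in_fun": 1}

# Each first move maps to the outcomes the future self can still choose from.
TREE = {
    "stay_flexible": ["keep_partying", "worry_about_loan"],
    "precommit": ["locked_in_fun"],
}


def plan(predicted_future_utility):
    """Pick the first move that is best by U_NOW, given a prediction of
    which utility function the future self will maximize."""
    def value(move):
        predicted_outcome = max(TREE[move], key=predicted_future_utility.get)
        return U_NOW[predicted_outcome]
    return max(TREE, key=value)


def realized(move):
    """Score, by U_NOW, the outcome the future self actually picks."""
    actual_outcome = max(TREE[move], key=U_LATER.get)
    return U_NOW[actual_outcome]


# Naive planner: assumes the future self still maximizes U_NOW.
naive_move = plan(U_NOW)
# Sophisticated planner: predicts the drift and plans around it.
sophisticated_move = plan(U_LATER)

print(naive_move, realized(naive_move))
print(sophisticated_move, realized(sophisticated_move))
```

With these made-up numbers the naive planner stays flexible and ends up with the outcome its present self likes least, while the sophisticated planner gives up its best-case outcome to take the choice away from its future self. Whether you call that "forward thinking" or "self-sabotage" depends on whether you side with `U_NOW` or `U_LATER`.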

[anonymous]

This helps, thank you. I almost objected by saying something like "I have a lot of goals that would be better achieved by a better, or just different, decision-making process," but once you've altered that process, there's no perfect guarantee that your goals will remain the same.

I actually typed out a bunch of responses, but got to the point where I'm not on-topic anymore. I think I understand the challenge a little better now, though!
