Hard problem of corrigibility

Discuss the wikitag on this page. Here is the place to ask questions and propose changes.
Toon Alfrink · 8y · score 10

If we say that this uncertainty correlates with some outside physical object (intended to be the programmers), the default result in a sufficiently advanced agent is that you disassemble this object (the programmers) to learn everything about it on a molecular level, update fully on what you've learned according to whatever correlation that had with your utility function, and plunge on straight ahead.

Would this still happen if we give high prior probability to utility functions that only favor a small target, and yield negative billion utility otherwise? Would the information value of disassembling the programmer still outweigh the high probability that the utility function comes out negative?

Wouldn't this restrict the AI to baby steps until it is more certain about the target, in general?
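The question above is essentially an expected-value comparison, and it can be made concrete with a toy calculation. The numbers and the two-action framing below are illustrative assumptions, not anything from the wikitag: suppose the prior puts most of its mass on small-target utility functions that assign a large negative utility to disassembling the programmers, and compare that act against a cautious "baby steps" action.

```python
# Toy model of the commenter's question (hypothetical numbers; a sketch
# of the expected-value comparison, not the wikitag's formal setup).

def expected_utility(p_penalty, penalty, info_value):
    """Expected utility of disassembling the programmers, given a prior
    probability p_penalty that the true utility function penalizes the
    act, and an information value gained in the remaining cases."""
    return p_penalty * penalty + (1 - p_penalty) * info_value

p_penalty = 0.99    # prior mass on "this act scores negative a billion"
penalty = -1e9      # utility if the true function penalizes the act
info_value = 1e6    # generous estimate of what disassembly would teach

baby_steps = 0.0    # reversible probing: roughly neutral under any candidate

eu_disassemble = expected_utility(p_penalty, penalty, info_value)
print(eu_disassemble)                # large and negative
print(eu_disassemble < baby_steps)   # the agent prefers baby steps
```

Under these assumptions the information value cannot outweigh the prior weight on the penalty, which matches the intuition in the question: the agent is restricted to baby steps until its uncertainty about the target shrinks. Whether a sufficiently advanced agent's priors would actually look like this is exactly what the question is probing.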
