Hard problem of corrigibility

Discuss the wikitag on this page. Here is the place to ask questions and propose changes.
Toon Alfrink · 8y · score 10

If we say that this uncertainty correlates with some outside physical object (intended to be the programmers), the default result in a sufficiently advanced agent is that you disassemble this object (the programmers) to learn everything about it on a molecular level, update fully on what you've learned according to whatever correlation that had with your utility function, and plunge on straight ahead.

Would this still happen if we give high prior probability to utility functions that only favor a small target, and yield negative billion utility otherwise? Would the information value of disassembling the programmer still outweigh the high probability that the utility function comes out negative?

Wouldn't this restrict the AI to baby steps until it is more certain about the target, in general?
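The question above is essentially an expected-value comparison, and it can be made concrete with a toy calculation. The numbers and the two-action framing below are illustrative assumptions, not anything from the wikitag: suppose the prior puts most of its mass on small-target utility functions that assign a large negative utility to disassembling the programmers, and compare that act against a cautious "baby steps" action.

```python
# Toy model of the commenter's question (hypothetical numbers; a sketch
# of the expected-value comparison, not the wikitag's formal setup).

def expected_utility(p_penalty, penalty, info_value):
    """Expected utility of disassembling the programmers, given a prior
    probability p_penalty that the true utility function penalizes the
    act, and an information value gained in the remaining cases."""
    return p_penalty * penalty + (1 - p_penalty) * info_value

p_penalty = 0.99    # prior mass on "this act scores negative a billion"
penalty = -1e9      # utility if the true function penalizes the act
info_value = 1e6    # generous estimate of what disassembly would teach

baby_steps = 0.0    # reversible probing: roughly neutral under any candidate

eu_disassemble = expected_utility(p_penalty, penalty, info_value)
print(eu_disassemble)                # large and negative
print(eu_disassemble < baby_steps)   # the agent prefers baby steps
```

Under these assumptions the information value cannot outweigh the prior weight on the penalty, which matches the intuition in the question: the agent is restricted to baby steps until its uncertainty about the target shrinks. Whether a sufficiently advanced agent's priors would actually look like this is exactly what the question is probing.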
