Am I understanding the problem of fully updated deference correctly?

The problem is more about corrigibility - making an AI that will shut down if you ask it to. The idea is that uncertainty about utility function isn't a good way to implement corrigibility. When you tell the AI to shut down because you know more about the true utility function, it might reply "nah, I won't shut down, I've learned enough to optimize the true utility now".

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

3

Am I understanding the problem of fully updated deference correctly?

3

3