My thought is that, instead of having an AI purely maximize a utility function, we give it the goal of reaching a certain utility level; each time it reaches that goal, it shuts off, and we can decide whether to set the utility ceiling higher. Clearly this could have some negative effects, but it could be useful in concert with other safety precautions.
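Roughly, the loop I'm imagining looks like this. This is only a minimal sketch: the ceiling of 25, the toy "widgets" measure, and the random "acting" step are placeholders for illustration, not a real design.

```python
import random

UTILITY_CEILING = 25  # hypothetical target chosen by the operators

def utility(world_state):
    """Toy stand-in for whatever the agent's utility function actually measures."""
    return world_state["widgets"]

def run_until_ceiling(world_state, ceiling):
    """Act only until the ceiling is reached, then halt (the 'shut off' step)."""
    while utility(world_state) < ceiling:
        # Placeholder for whatever planning/acting the real agent would do.
        world_state["widgets"] += random.randint(0, 2)
    return world_state  # control returns to the humans, who may raise the ceiling

state = {"widgets": 0}
state = run_until_ceiling(state, UTILITY_CEILING)
print("Agent halted at utility", utility(state))
```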

Maybe_a (Apr 11, 2022):


Thanks for giving it a think.

Turning off is not a solved problem; see e.g. https://www.lesswrong.com/posts/wxbMsGgdHEgZ65Zyi/stop-button-towards-a-causal-solution

Finite utility doesn't help as long as the agent still has to reason with probabilities: a 95% chance of 1 unit of utility is worse than a 99% chance, which is worse than a 99.9% chance, and so on, so the agent keeps optimizing for certainty. If you then apply the same capping trick to the probabilities, you get a quantilizer, and that doesn't work either: https://www.lesswrong.com/posts/ZjDh3BmbDrWJRckEb/quantilizer-optimizer-with-a-bounded-amount-of-output-1
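To make the probability point concrete, here's a toy calculation (the plan names and success probabilities are made-up assumptions): with a capped utility of 1 for meeting the goal, expected utility is just the success probability, so the ranking over plans is unchanged and the agent still prefers the more extreme plan.

```python
# Bounded utility: 1 if the goal (utility >= ceiling) is met, 0 otherwise.
# Expected utility of a plan is then just its probability of success.
plans = {
    "modest plan": 0.95,                      # assumed success probabilities
    "grab extra resources": 0.99,
    "convert everything to compute": 0.999,
}

U_SUCCESS = 1.0  # capped utility for achieving the goal
for name, p in plans.items():
    print(f"{name}: E[U] = {p * U_SUCCESS:.3f}")

# The ranking by expected utility is exactly the ranking by probability,
# so capping utility doesn't remove the incentive to keep optimizing.
```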

The point is that the AI turns itself off once its goal is met, just like a called function returning. It doesn't maximize utility; it only has to make utility greater than 25. There is no ghost in the machine that wants to be alive: once the goal is satisfied, the system is inert.

None of the objections listed in the 'stop button' hypothetical apply.

I'm not sure I understand your objection to finite utility. Here's my model of what you're saying:

I have a superintelligent agent with a utility ceiling of 25, where utility is equivalent to paperclips

...