ArieSlobbe — LessWrong

Superintelligence 11: The treacherous turn

ArieSlobbe, 12y

I'm trying to think through the following idea for an AI safety measure.

Could we design a system that is tuned to produce AGI, but with one "supreme goal" added to its utility function? If the AI is boxed, for instance, we could program its supreme goal to consist of acquiring a secret code that allows it to run a script which shuts it down and prints the message "I Win". The catch is as follows: as long as everything goes according to plan, the AI has no way to get the code and do that thing which its utility function r...