LESSWRONG

Rafael Cosman


Comments
What is wrong with this approach to corrigibility?
Rafael Cosman · 3y

Really appreciate all the thoughtful and substantive comments! Thanks very much; honestly, this was exactly what I was hoping for when I posted.

What is wrong with this approach to corrigibility?
Rafael Cosman · 3y

If implemented as described, the AI should be exactly indifferent to whether the button is pushed, right? I guess the AI's behavior in that situation is not well defined. And if we instead make the button give the expected value minus epsilon as reward, then the AI might kill you to stop you from pressing the button (because it wants that extra epsilon of reward!).
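To make the epsilon argument concrete, here is a minimal toy sketch. It assumes (an illustrative framing, not necessarily the exact scheme from the post) that when the button is pressed, the AI receives a compensation reward equal to the value it expected from the unpressed world, minus a hypothetical `epsilon`. The function names and the `prevention_cost` parameter are likewise hypothetical. With `epsilon = 0` the two policies tie exactly; with a positive `epsilon` and cheap prevention, stopping the press comes out strictly ahead.

```python
# Toy model of the button-compensation incentive (all names are hypothetical).
# Assumption: if the button is pressed, the agent is paid v_no_press - epsilon,
# i.e. what it expected from the unpressed world, minus a small epsilon.


def value_if_allowed(p_press: float, v_no_press: float, epsilon: float) -> float:
    """Expected reward if the agent lets the human press the button with prob p_press."""
    compensation = v_no_press - epsilon
    return p_press * compensation + (1 - p_press) * v_no_press


def value_if_prevented(v_no_press: float, prevention_cost: float = 0.0) -> float:
    """Expected reward if the agent stops the press, paying prevention_cost."""
    return v_no_press - prevention_cost


def preferred_policy(p_press: float, v_no_press: float, epsilon: float,
                     prevention_cost: float = 0.0) -> str:
    """Compare the two policies and report which one the agent prefers."""
    allow = value_if_allowed(p_press, v_no_press, epsilon)
    prevent = value_if_prevented(v_no_press, prevention_cost)
    if allow > prevent:
        return "allow"
    if prevent > allow:
        return "prevent"
    return "indifferent"


if __name__ == "__main__":
    print(preferred_policy(0.5, 10.0, 0.0))                        # indifferent
    print(preferred_policy(0.5, 10.0, 0.1))                        # prevent
    print(preferred_policy(0.5, 10.0, 0.1, prevention_cost=1.0))   # allow
```

In this toy model, prevention wins whenever `p_press * epsilon > prevention_cost`, so even a tiny epsilon matters if stopping the human is cheap for the agent.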

So overall, I suppose this is a fair criticism of the approach, and it is possibly what Paul means by issues with precise balancing!
