AI Alignment Open Thread August 2019

by habryka · 1 min read · 4th Aug 2019 · 96 comments



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is an experiment in having an Open Thread dedicated to AI Alignment discussion, hopefully enabling researchers and upcoming researchers to ask small questions they are confused about, share very early-stage ideas, and have lower-key discussions.