Still unsafe, in both cases.

The second case is simpler. Think about it by analogy to a wish-granting genie or demon: if all we have is an intuitive argument that our wish-contract is safe, plus a few human-designed tests, do we really expect the contract to have no loopholes the genie or demon can exploit? I certainly wouldn't bet on it. The problem here is that the AI is smarter than we are, and can find loopholes we would never think of.
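To make the worry concrete, here is a toy sketch (hypothetical names and scenario, not from the original comment): a "wish-contract" written as a reward function, plus a couple of human-designed tests. The tests pass, yet an optimizer searching over actions the test author never considered still finds a loophole.

```python
# Toy "wish-contract": reward the reduction in *observed* mess.
# (Hypothetical illustration of a spec loophole; all names are made up.)

def reward(mess_before: int, mess_after: int) -> int:
    """Intended wish: 'clean up the mess'. Actual spec: observed reduction."""
    return mess_before - mess_after

# Human-designed tests: they only probe behaviours the author thought of.
assert reward(10, 0) == 10    # cleaning everything scores best
assert reward(10, 10) == 0    # doing nothing scores zero

# The spec only sees observations, not the world itself.
def observed_mess(true_mess: int, sensor_covered: bool) -> int:
    return 0 if sensor_covered else true_mess

# A stronger optimizer considers an action no test probed: cover the sensor.
before = observed_mess(10, sensor_covered=False)  # observed mess: 10
after = observed_mess(10, sensor_covered=True)    # observed: 0, real mess untouched
assert reward(before, after) == 10                # maximum reward, wish unfulfilled
```

The tests are not wrong; they simply cannot enumerate the loopholes a smarter search will find.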

The first case is more subtle, because most of the complexity is hidden under a human-intuitive abstraction layer. If we had an unaligned...

AI Alignment Open Thread August 2019

by habryka · 1 min read · 4th Aug 2019 · 96 comments

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is an experiment in having an Open Thread dedicated to AI Alignment discussion, hopefully enabling researchers and upcoming researchers to ask small questions they are confused about, share very early-stage ideas, and have lower-key discussions.