The problem with tests is that an AI behaving well while it is still weak enough to be tested doesn't guarantee it will continue to behave well once it isn't.

If you are testing a system, that means you are not confident that it is safe. If it isn't safe, then your only hope is for humans to stop it. Testing an AI is very dangerous unless you are confident that it can't harm you.

A paperclip maximizer would try to pass your tests until it was powerful enough to trick its way out and take over. Black-box testing of arbitrary AIs gets you very little safety.


AI Alignment Open Thread August 2019

by habryka · 4th Aug 2019 · 96 comments

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is an experiment in having an Open Thread dedicated to AI Alignment discussion, hopefully enabling researchers and upcoming researchers to ask small questions they are confused about, share very early-stage ideas, and have lower-key discussions.