First, when I say "proof-level guarantees will be easy", I mean "team of experts can predictably and reliably do it in a year or two", not "hacker can do it over the weekend".

This was also what I was imagining. (Well, actually, I was also considering more than two years.)

we are missing some fundamental insights into what it means to be "aligned".

It sounds like our disagreement is the one highlighted in Realism about rationality. When I say we could check whether the AI is deceiving humans, I don't mean that we h...

AI Alignment Open Thread August 2019

by habryka | 1 min read | 4th Aug 2019 | 96 comments


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is an experiment in having an Open Thread dedicated to AI Alignment discussion, hopefully enabling researchers and upcoming researchers to ask small questions they are confused about, share very early-stage ideas, and have lower-key discussions.