I believe the empirical claim. As I see it, the main issue is Goodhart: an AGI is probably going to be optimizing something, and open-ended optimization tends to go badly. The main purpose of proof-level guarantees is to make damn sure that the optimization target is safe. (You might imagine something other than a utility-maximizer, but at the end of the day it's either going to perform open-ended optimization of something, or it's not going to be very powerful.)
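To make the Goodhart point concrete, here is a toy sketch (my own illustration, not from the comment): a proxy metric that correlates with the true objective over a narrow range comes apart from it under heavier optimization pressure. All names and numbers here are made up for the example.

```python
# Toy Goodhart illustration: the proxy "bigger x is better" agrees with the
# true objective while x <= 1, but diverges from it past that point.
def true_value(x):
    return -(x - 1.0) ** 2  # true objective: best at x = 1

def proxy(x):
    return x  # proxy metric: just rewards pushing x higher

def optimize(metric, candidates):
    # Pick the candidate that scores best on the given metric.
    return max(candidates, key=metric)

# Mild optimization searches a small space; open-ended optimization a huge one.
mild = optimize(proxy, [i * 0.1 for i in range(0, 21)])    # x in [0, 2]
hard = optimize(proxy, [i * 0.1 for i in range(0, 1001)])  # x in [0, 100]

# The harder optimizer scores far better on the proxy...
assert proxy(hard) > proxy(mild)
# ...and far worse on the true objective: it overshot x = 1 by a huge margin.
assert true_value(hard) < true_value(mild)
```

The point of the sketch is just that "optimize the proxy harder" and "do better on the true objective" come apart once the search is open-ended, which is why the optimization target itself has to be right.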

The best analogy here is something like an unaligned wish-granting genie/demon. You want to be really ca...

AI Alignment Open Thread August 2019

by habryka · 1 min read · 4th Aug 2019 · 96 comments



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is an experiment in having an Open Thread dedicated to AI Alignment discussion, hopefully enabling researchers and upcoming researchers to ask small questions they are confused about, share very early-stage ideas, and have lower-key discussions.