I believe the empirical claim. As I see it, the main issue is Goodhart: an AGI is probably going to be optimizing something, and open-ended optimization tends to go badly. The main purpose of proof-level guarantees is to make damn sure that the optimization target is safe. (You might imagine something other than a utility-maximizer, but at the end of the day it's either going to perform open-ended optimization of something or not be very powerful.)

The best analogy here is something like an unaligned wish-granting genie/demon. You want to be really ca...

AI Alignment Open Thread August 2019

