I just skimmed through "On the Controllability of Artificial Intelligence" and am wondering if others have read it and what they think about it. It made me quite scared.

In particular: is AI Alignment simply unsolvable, or at least not fully solvable?

2 Answers

Charlie Steiner

Jun 10, 2022

Thanks for the link!

I think there's space for the versions of "AI control" he lays out to be impossible, while it's still possible to build AI that makes the future go much better than it otherwise would have.

For example, one desideratum he has is that our current selves shouldn't be bossed around (via the AI) by versions of ourselves that have e.g. gone through some simulated dispute-resolution procedure. That's a defensible consequence of "control," but I think it's way too strong if all we want is for the future to be good.

Thanks for your reaction!

After thinking about it a bit more, I think this is generally my view as well.

It also seems to me that if there's really no way at all to make an agent smarter than you do things that are good for you, then an agent that realized this wouldn't FOOM.

Jeff Rose

Jun 10, 2022

Yes, AI Alignment is not fully solvable. In particular, if an AGI has the ability to self-improve arbitrarily and has a complicated utility function, it will not be possible to guarantee that an aligned AGI remains aligned.