I just skimmed through "On the Controllability of Artificial Intelligence" and am wondering if others have read it and what they think about it. It made me quite scared.

In particular: is AI Alignment simply unsolvable, or at least not fully solvable?

2 Answers

Charlie Steiner

Jun 10, 2022

Thanks for the link!

I think there's space for the versions of "AI control" he lays out to be impossible, while it's still possible to build AI that makes the future go much better than it otherwise would have.

For example, one desideratum he has is that our current selves shouldn't be bossed around (via the AI) by versions of ourselves that have e.g. gone through some simulated dispute-resolution procedure. That's a defensible consequence of "control," but I think it's way too strong if all we want is for the future to be good.

Thanks for your reaction!

After thinking about it a bit more, I think this is generally my view as well.

It also seems to me that if there's really no way at all to make an agent smarter than you do things that are good for you, then an agent that realized this wouldn't FOOM.

Jeff Rose

Jun 10, 2022

Yes, AI Alignment is not fully solvable. In particular, if an AGI has the ability to self-improve arbitrarily and has a complicated utility function, it will not be possible to guarantee that an aligned AGI remains aligned.