Is AGI alignment even possible in the long term? Will AGI simply outsmart our best defenses? It would be, after all, superhuman (and by an enormous margin). Isn’t it likely that an AGI will recognize what actions humans took to control it and simply undo those controls? Or just come up with a novel move, like AlphaGo did, and sidestep them completely? An AGI could also just wait until conditions are favorable to take charge. What is time to an immortal intelligence, especially a span as short as a few human lifespans? Unless misalignment is physically impossible, it seems as if all attempts will ultimately be futile. I hope I’m wrong.


2 Answers

Raemon

Apr 28, 2022

90

The idea is to make an AGI that actually just wants to help us, rather than an AGI that wants to do something else but is constrained.

I recommend Scott's Superintelligence FAQ for some basics if you haven't read it before. 

Thanks for answering and pointing out the FAQ, Raemon! What Scott describes sounds like a harmonious relationship between humans and AGI. Is that a fair summary?

Chinese Room

Apr 28, 2022

10

Provided that AGI becomes smart enough without passing through the universe-destroying paperclip maximizer stage, one idea could be to invent a way for humanity to be, in some form, useful to the AGI, e.g. as a time-tested biological backup.

A mutually beneficial relationship would be great! I have a hard time believing that the relationship would remain mutually beneficial over long time periods though.

Regarding the universe-destroying part, it’s nice to know that half-dark galaxies haven’t been discovered, at least not yet. By half-dark I mean galaxies that are partially destroyed. That’s at least weak evidence that universe-destroying AIs aren’t already in existence.

1 Chinese Room 2y
I wouldn't call being kept as a biological backup particularly beneficial for humanity, but it's the only plausible way I can currently think of for humanity to be useful enough to a sufficiently advanced AGI. Destroying the universe might just take long enough for the AGI to evolve itself sufficiently to reconsider. I should have actually used "earth-destroying" instead in the answer above.