I'm quite new to the AI alignment problem - have read something like 20-30 articles about it on LW and around, aiming for the most upvoted - and have a feeling that there is a fundamental problem that is mostly ignored. I wouldn't be surprised if this feeling was wrong (because it is either not fundamental, or not ignored).
Imagine a future world where a singularity has occurred and we have a nearly-omnipotent AI. The AI understands human values, tries to do what we want, and makes no mistakes - so we could say humanity did pretty well on the AI alignment task. Or maybe not. How do we find out?
Let's consider a few scenarios:
- The AI creates a heaven on Earth for everyone. Fertility rates keep falling - e.g. because people in heaven are not interested in procreation, or because we globally accept VHEM as the best ethical position, or for some other reason. Unfortunately, immortality turns out to be impossible. Humans go extinct.
- The AI creates a virtual heaven for everyone. We live forever, lives as full of fun as possible - never knowing that none of it is real. Note: there are no other sentient beings in those virtual heavens, so this has nothing to do with the simulation hypothesis.
- It turns out that smarter, more knowledgeable humans, who are "more the people they wished they were", just want to be able to get more happiness from simple everyday activities. The AI strictly follows the Coherent Extrapolated Volition principles and skilfully teaches us Buddhism-or-whatever. Earth becomes the prettiest garden in the Universe, tended by a few billion extremely happy gardeners.
Now, what mark would you give humanity on the "AI alignment" task in each of these scenarios? Is there any agreement among AI alignment researchers about this? I would be surprised neither by AAA nor by FFF - and this despite the fact that I haven't even touched the really hard problems, like spawning zillions of ems.
I have a feeling that such issues don't get the attention they really deserve - we all happily agree that being turned into paperclips, tiling the universe with smiley faces, or mindcrime would be failures, and then hope the answer to the rest of the problems is "utilitarianism".
So, the question(s):
- Is there any widely-accepted way of solving such problems (i.e. rating humanity in the scenarios above)?
- Is this considered an important problem - say, one that has to be solved before the singularity? If not, why?
- (Speculative bonus question, on which I hope to elaborate one day) Are we sure the set of things we call "human values" is not contradictory when executed by an omnipotent being?