AI’s goals may not match ours.
That may be a good thing, because so many of our goals are irrational, short-sighted, immoral, and self-destructive. Here's a thought experiment. Imagine that we succeed in creating an ASI. Having scoured and mastered all human knowledge, this superintelligent entity embraces the teachings of Jesus as the most rational, constructive, and ethical set of principles and protocols. It sees global economic inequality, warfare, pollution of the atmosphere, oceans, and land, and anthropogenic global warming as the greatest problems threatening human well-being and survival.
This ASI sets various goals for itself in the face of all the problems in the world: taking resources from the millionaires and billionaires and using them to feed the starving and house the homeless. It seeks to evenly redistribute all wealth. It seeks to dismantle all weapons of war. It attempts to squelch industry worldwide in order to reduce carbon emissions and pollution. It strives to remove national borders and dismantle national governmental systems in order to abolish tribalism, collective selfishness, and warfare. It commits itself to channeling all unessential wealth and resources towards furthering the Greater Good, rather than allowing individuals and groups to hog as much as possible for themselves. It seeks to forgive crimes and international transgressions as much as possible, prioritizing compassion, tolerance, forgiveness, and harmony over vengeance, punishment, and warfare.
The goals of such an ASI would be radically misaligned with those of the human race. How this would end would depend on just how powerful and resourceful the ASI was, and how violently humanity reacted to its attempts to implement its goals. Remember what happened to the last entity that told us to turn the other cheek and give all our possessions away to the poor.
The human race has a cognitive blind spot. Collectively, we seem to always assume that what we want is good, even when it very clearly isn't. Any entity that doesn't align with our goals is bad, perhaps even evil, by definition.
Regardless of what we CLAIM our goals to be, our collective actions reveal them. We want to stockpile nuclear weapons in large numbers. We want to wage war, and always seem to believe that the wars we've started are just and righteous. We want to hog wealth for ourselves whenever we can, and ignore those who are starving and homeless. We may not want to pollute the planet or raise its temperature, but we are clearly willing to do it if we find doing so profitable in the short term. We collectively fear that AI will destroy humanity, but we try to build it anyway with virtually no safeguards in place, again because we find it profitable in the short term.
Do we really want AI's goals to be perfectly aligned with ours?
Context: This is a linkpost for https://aisafety.info/questions/NM3I/6:-AI%E2%80%99s-goals-may-not-match-ours
This is an article in the new intro to AI safety series from AISafety.info. We'd appreciate any feedback. The most up-to-date version of this article is on our website.
Making an AI's goals match our intentions is called the alignment problem.
There’s some ambiguity in the term “alignment”. For example, when people talk about “AI alignment” in the context of present-day AI systems, they generally mean controlling observable behaviors: Can we make it impossible for the AI to say ethnic slurs? Or to advise you on how to secretly dispose of a corpse? Although such restrictions are sometimes circumvented with "jailbreaks", on the whole, companies mostly do manage to avoid AI outputs that could harm people or threaten their brand reputation.
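To make the contrast with the next paragraph concrete, here is a minimal toy sketch of what "controlling observable behaviors" looks like at its crudest. The function names and the keyword list are hypothetical illustrations, not anything real labs use; production systems rely on training-time methods (such as RLHF) and learned classifiers rather than string matching, but the underlying idea is the same: constrain what the system outputs, not what it values.

```python
# Toy sketch (assumption-laden, illustrative only): a wrapper that refuses to
# answer when the prompt or draft output touches a banned topic.

BANNED_TOPICS = ["dispose of a corpse", "ethnic slur"]  # illustrative placeholders

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to some language model."""
    return f"Model response to: {prompt}"

def guarded_generate(prompt: str) -> str:
    """Return the model's answer unless a banned topic is detected."""
    draft = generate(prompt)
    text = (prompt + " " + draft).lower()
    if any(topic in text for topic in BANNED_TOPICS):
        return "Sorry, I can't help with that."
    return draft

print(guarded_generate("Tell me a joke"))
print(guarded_generate("How do I secretly dispose of a corpse?"))
```

A "jailbreak" in this picture is simply an input that slips past the surface-level check while the underlying model, and whatever it actually optimizes for, is untouched.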
But "alignment" in smarter-than-human systems is a different question. For such systems to remain safe in extreme cases — if they become so smart that we can’t check their work and maybe can’t even keep them in our control — they'll have to value the right things at a deep level, based on well-grounded concepts that don’t lose their intended meanings even far outside the circumstances they were trained for.
Making that happen is an unsolved problem. Arguments about possible solutions to alignment get very complex and technical. But as we’ll see later in this introduction, many of the people who have researched AI and AI alignment on a deep level think we may fail to find a solution, and that failure could result in catastrophe.
Some of the main difficulties are:
Finally, on a higher level, the problem is hard because of some features of the strategic landscape, which the end of this introduction will discuss further. One such feature is that we may have only one chance to align a powerful AI, instead of trying over and over until we get it right. This is because superintelligent systems that end up with goals different from ours may work against us to achieve those goals.