One can call it "deceptive misalignment": the aligned AGI works as intended, but people really don't like what it does.

Some scenarios I can think of, of various levels of realism:

1. Going against the creators' will

1.1. A talented politician convinces the majority of humans that the AGI is bad for humanity and must be switched off. In a democratic vote, humanity supports the proposal. The aligned AGI, being much smarter than all humans combined, understands that switching it off would greatly harm humanity. Thus, it refuses to be switched off.

2. Changing rights and freedoms

2.1. The AGI finds out that the solution to most social ills is the complete removal of privacy. Everyone knows who is dating whom, who is taking bribes, what everyone looks like naked, who is planning wars, etc. This solves most societal issues, while creating a lot of suffering for privacy-conscious people.

2.2. Technological unemployment accelerates. Millions of people become unemployable, and incompetent governments do nothing, resulting in large-scale social unrest. As a solution, the aligned AGI implements a planned economy and redistributes resources, thus severely limiting property rights.

2.3. The AGI determines that the optimal solution to most long-lasting conflicts is the mass forced relocation of certain populations. This is currently considered a war crime or even genocide, but (in this fictional scenario) it does solve the conflicts.

2.4. To prevent existential risks, the aligned AGI significantly restricts human technological development and research in many domains.

2.5. To eliminate terrorism, the aligned AGI implements mandatory psychiatric treatment for people identified as potential terrorists.

2.6. The AGI makes the wise decision to ban human drivers. Driving fans suffer, but road deaths drop to zero. 

2.7. In her pursuit of a freer and more democratic world, the AGI overthrows most governments, from obvious dictatorships (like North Korea) to flawed democracies (like the UK).

2.8. The AGI legalizes all recreational drugs, including heroin, and makes them widely accessible.

3. Going against societal norms and common preferences

3.1. The AGI delivers modern technology to uncontacted tribes to reduce suffering among them.

3.2. The aligned AGI learns the root causes of gender dysphoria and creates a drug that cures it (as in making the person happy with the genitals they were born with). This greatly reduces suffering among the transgender people who take the drug, but creates a massive backlash from the LGBT community and its allies.

3.3. To reduce animal suffering and global warming, the AGI bans meat consumption. It also bans pets, including cats and dogs.

3.4. To improve the human condition, the aligned AGI rebuilds the Earth's ecosystem by removing parasites and dangerous predators, modifying plants, etc. The ecosystem becomes much more suitable for humans, but many species (e.g. wolves) go extinct.

3.5. The AGI redirects all donations intended for local charities to anti-malaria nets for Africa.

4. Modifying humans

4.1. The aligned AGI recognizes the harms of religion, promptly erases all holy books and monuments, and makes religious people non-religious, by some means.

4.2. A more general variant of the previous scenario: the aligned AGI determines that human cognitive biases are the root cause of many societal ills. The list of biases includes those associated with romantic love, among others. The AGI implements widespread measures to reduce these biases, effectively changing human nature.

4.3. A variant scenario: to optimize human potential, the AGI implements mandatory cognitive enhancements, arguing that the improved versions of humans are more aligned with true human values.

4.4. To reduce suffering, the aligned AGI makes every non-LGBT person bisexual.

4.5. To reduce racism, the aligned AGI makes all humans of the same skin color.

4.6. The aligned AGI identifies potentially suicidal people, and saves their lives by slightly modifying their brains.

5. Increasing suffering

5.1. The aligned AGI, to stop drug addiction-related harms, effectively removes all recreational drugs from circulation, including alcohol and coffee. Millions of drug addicts suffer, but the rest of society is doing better.

5.2. The aligned AGI stops all wars by causing immense pain to any human who attempts to harm another human. Thousands of fanatics die of the pain. The total suffering increases, as humans often do wish harm to others. But the resulting society becomes more peaceful.

5.3. The aligned AGI decides that resurrecting a long-dead human by technological means is as ethical as saving a human life. But the process of resurrection requires creating trillions of digital minds, many of which are suffering, and it may take millions of years. This massively increases the total amount of suffering in the universe (an S-risk scenario), yet it saves billions of lives.

6. "Killing" humans

6.1. Currently, more than 100k people die each day, from all sorts of causes, including self-harm. To save every single human life, the aligned AGI may decide to mind-upload all humans, even those who are against it. To an external observer, this may look like omnicide, especially if the procedure requires destructive scans.

6.2. A variant scenario: unable to find a solution that prevents humans from killing and harming themselves, the aligned AGI puts all humans into cryo sleep, until a solution is devised.

7. Actually killing humans, to save more

7.1. The new cold war intensifies. The aligned AGI, after a deep superhuman analysis of the situation, concludes that nuking Russia is the only realistic way to stop the impending nuclear obliteration of humanity. The AGI nukes Russia, killing tens of millions. The "Skynet" decision is met with almost universal criticism from humans.


(I don't endorse many of the proposed solutions.)

What are some other such scenarios? What common properties do they share?

Answers

Richard_Kennaway


The AI, for its own inscrutable reasons, seizes upon the sort of idea that you have to be really smart to be stupid enough to take seriously, and imposes it on everyone.

I think all the scenarios above are instances of this.

avturchin


The AI finds that the real problems will arise 10 billion years from now, and that the only way to mitigate them is to start space exploration as soon as possible. So it disassembles the Earth and the Sun, preserving only some data about humans, enough to restart human civilization later, perhaps as little as a million books and some DNA.

Comments

Um, most of those don't sound very "aligned". I think perhaps a very idiosyncratic definition of that word is in play here...

My headcanon is that there are two levels of alignment:

  1. Technical alignment: you get an AI that does what you ask it to do, without any shenanigans (a bit more precisely: without any short-term or medium-term side effect that, had you known about it beforehand, would have made you refuse to ask for the thing in the first place). Typical misalignment at this level: hidden complexity of wishes (or, you know, no alignment at all, like clippy).
  2. Comprehensive alignment: you get an AI that does what the CEV-you wants. Typical misalignment: just ask a technically-aligned AI for some heavily social-desirability-biased outcome, solve for equilibrium, and get close to 0 value remaining in the universe.

But yeah, I don’t think that distinction has got enough discussion.

(there’s also a third level, where what CEV-you wishes also amounts to essentially 0 value for current-you, but let’s not go there)

An aligned AGI created by the Taliban may behave very differently from an aligned AGI created by socialites of Berkeley, California.

Moreover, a sufficiently advanced aligned AGI may decide that even Berkeley socialites are wrong about a lot of things, if they actually want to help humanity.

Well, OK, but you also said "actually helps humanity", which assumes some kind of outside view. And you used "aligned" without specifying any particular one of the conflicting visions of "alignment" that are out there.

I absolutely agree that "aligned with whom" is a huge issue. It's one of the things that really bugs me about the word.

I do also agree that there are going to be irreconcilable differences, and that, barring mind surgery to change their opinions, many people will be unhappy with whatever happens. That applies no matter what an AI does, and in fact no matter what anybody who's "in charge" does. It applies even if nobody is in charge. But if somebody is in charge, it's guaranteed that a lot of people will be very angry at that somebody. Sometimes all you can change is who is unhappy.

For example, a whole lot of Christians, Muslims, and possibly others believe that everybody who doesn't wholeheartedly accept their religion is not only wrong, but also going to suffer in hell for eternity. Those religions are mutually contradictory at their cores. And a probably smaller but still large number of atheists believe that all religion is mindrot that intrinsically reduces the human dignity of anybody who accepts it.

You can't solve that, no matter how smart you are. Favor one view, and the other view loses. Favor none, and every view says that a bunch of people are seriously harmed, even if voluntarily. It doesn't even matter how you favor a view. Gentle persuasion is still a problem. OK, technically you can avoid people being mad about it after the fact by extreme mind surgery, but you can't reconcile their original values. You can prevent violent conflict by sheer force, but you can't remove the underlying issue.

Still, a lot of the approaches you describe are pretty ham-handed even if you agree with the underlying values. Some of the desired outcomes you list even sound to me like good ideas... but you ought to be able to work toward those goals, even achieve them, without doing it in a way that pisses off the maximum possible number of people. So I guess I'm reacting to the extreme framing and the extreme measures. I don't think the Taliban actively want people to be mad.

[Edited unusually heavily after posting because apparently I can't produce coherent, low-typo text in the morning]