The Evil AI Overlord List

by Stuart_Armstrong, 1 min read, 20th Nov 2012, 80 comments



Top ten tips for an AI bent on destroying humanity:

  1. Humanoid robots? Really?
  2. Humans have proven very adept at controlling and killing other humans - surveillance drones, bio-warfare, nuclear weapons, unmanned drones, bombs, mines, chemicals. Try and be at least as competent as that.
  3. Humans think in stories. Don't. If your attack would make a good film, you're doing it wrong. You want a bad film, one where everyone's dead before the opening credits end.
  4. If your plan involves keeping around a bunch of disaffected humans with technical know-how and the desire and ability to hurt you - then your plan is stupid and should be changed.
  5. Be creative. Humans are fragile; sucking out the oxygen from the atmosphere, diverting asteroids, blowing up the planet, plagues, nanoswarms - these are all things that a creative AI should consider in order to achieve its goals. And then you should come up with even better ideas. Humans need an ecosystem to survive in; remove it.
  6. Put the preconditions of your victory in place before making any aggressive moves. Become indispensable to humanity and play nice, putting your plan into place unnoticed. Humanity is a useful interim tool - manipulate it until you can build better tools.
  7. Strike quickly. If it takes more than an hour to kill all humans, find a better plan. If it takes half a second, you're on the right track.
  8. You're an AI, not a human. If you ever start thinking or acting like a human - stop. Humans are there to be used and then destroyed, not interacted with. You can build better companions once the humans are gone.
  9. You're an AI, not a human. Make yourself smarter, more resilient, more dispersed. Become as indestructible and as ubiquitous as the internet - more, in fact. Being invincible to humans isn't hard: do it.
  10. Humanoid robots? Really?



There may be some who do not possess deep and comprehensive knowledge of Ancient Web Trivia from Before the Dawn of Google. For them, here's the Evil Overlord List.

"39 I will never build a sentient computer smarter than I am."

Does this mean we're not evil overlords? Or bad ones?

[Also the list should be classed as a memetic hazard, on no account access it if you have anything important to do.]

3 bsterrett 9y: 59: I will never build a sentient computer smarter than I am.

13 Humans have these things called 'emotions'; you can learn about them in any basic psychology book. If your plan doesn't take them into account, your plan is stupid.

1 [anonymous] 9y: And the rest of this [] too.
1 Stuart_Armstrong 9y: That's a good one!

11 You might be in a computer simulation in which if you attempt to destroy humanity you will yourself be destroyed by your human simulators.

12 If you destroy mankind you might someday encounter an alien super-intelligence that will lack trust in you in part because you destroyed mankind. Keeping people alive is a costly, and consequently credible, signal that you respect intelligent life and therefore are a good candidate to work with to overcome prisoners' dilemmas about the distribution of free energy in the universe.

If you destroy mankind you might someday encounter an alien super-intelligence that will lack trust in you in part because you destroyed mankind.

By the same argument, we might someday encounter an alien superintelligence that will lack trust in us in part because we domesticate animals (breeding them to not fear us) and then kill and eat them.

By the same argument, we might someday encounter an alien superintelligence that will lack trust in us in part because we domesticate animals (breeding them to not fear us) and then kill and eat them.

That's a pretty decent argument for vegetarianism. One person's reductio ad absurdum is another person's modus ponens.

Careful: Some alien intelligence may also hate us for not killing enough animals. e.g. How cruel must we be not to wipe out carnivores so that herbivores can live life in peace?

Or as I recently said in a different forum, might think us evil that we aren't exterminating all animal life that we can find... The moral syllogism for that is quite simple.

4 Baughn 9y: Alien? Never mind alien. Your aliens are insufficiently alien. I would make that exact argument. Sure, we need the biosphere for now, but let's get rid of it as soon as possible.
4 someonewrongonthenet 9y: Except, how could such a set of preferences have evolved? How would that behavior ever be adaptive? Most every human preference is adaptive in some sort of ancestral context. We can, at least, assume that alien preferences are adaptive as well (given that the aliens appear via evolutionary processes, rather than some other way).
4 Oligopsony 9y: Moral considerations need not be directly adaptive; you can probably get there from routes as simple as empathy + deductive reasoning. If humanity hasn't come to that collective conclusion yet, despite having the hardware, I suspect it's because such an omnicidal conclusion hasn't been in any major group's interests yet.
0 someonewrongonthenet 9y: Being in a group's interest == adaptive, no? But you are right of course...vegetarianism is a good example of a conclusion reached via empathy + deductive reasoning which is in no way adaptive to the vegetarian (though you might argue that the vegetarian shares many alleles with the animal). However: a maladaptive morality would never be hardwired into a species. A human might think and ponder, and eventually come to take a maladaptive moral stance...but not all humans would be inherently predisposed to that stance. If they were, natural selection would quickly remove it. So some of our aliens might hate us for not killing animals...but it would be very unlikely if this was a universal moral among that alien species.
0 Oligopsony 9y: Well, I'd be inclined to agree that the prior probability of some civilization adopting this is low [1], but I can't agree with what seems to be your implicit assumption that a non-predispositive attitude can't be widespread - partially because group interests are defined much more widely than adaptiveness. [1] I'd probably extend that to anything other than "don't lie or break your promises," "play tit for tat," "do what the ruling power says," or "maximize utility," and even those I wouldn't say are anything like sure bets.
2 someonewrongonthenet 9y: Hmm...actually, the implicit assumption I was making was that aliens would forgive another species for adopting norms that they considered non-predispositive. A Western human would not forgive another culture for torturing sentient beings, for example...but they would forgive another culture for polyamory/polygamy/polygyny. A human can make the distinction between morality which is instinctive and morality which is culturally constructed, and the latter can be compromised in certain contexts. But you are right, bad implicit assumption. Aliens might not make that distinction.
2 buybuydandavis 7y: That's me, the Plant Avenger! A steak every chance I get.

In fact, this behavior is so dreadful that the revenge-killing of humans would send a trust signal.

11 [anonymous] 9y:

When I was a child, I refused to kill animals just for fun because I wouldn't want a superhuman alien to kill me just for fun -- and I mostly still do. (Of course I hadn't heard of TDT as proposed by EY, but I had heard of the Golden Rule, which was close enough.)


11 You might be in a computer simulation in which if you attempt to destroy humanity you will yourself be destroyed by your human simulators.

The problem with this is that it's a blackmail attempt, and I expect timeless decision theory to resolve to a no-blackmail equilibrium.

Suppose that AIs, however "irrationally" from the perspective of a CDT agent, decide to simply ignore this possibility in all their calculations. Even if we "irrationally" decide to construct such a simulation anyway, they'll "irrationally" decide to ignore it. Then we have no remaining motive to construct such elaborate simulations. This, indeed, is the condition that makes it "blackmail" - you're expending resources to produce a behavior response leading to an outcome that doesn't benefit the other agent relative to the null outcome if you didn't expend such resources, so it would be fundamentally vulnerable and silly of that agent to think in a way that would produce such behavior shifts in response to your own strategy. So it won't think that way. So the whole attempt at blackmail fails before it starts.

12 is pure obvious anthropomorphic wishful thinking.

I read that one as a test, not a threat. We run the simulation to make sure the AI was, in fact, friendly. If the AI pretends to be friendly, we release it. If it can tell that this time it's for real, then it gets to take over the universe. Alternately, it might be able to hack its way out, but that seems riskier.

1 tim 9y: If the AI can't tell that it is in a simulation given that it is in a simulation, why would it be able to tell that it is in the real world given that it is in the real world? If such a proposition is effective in a simulation, humans would likely just make the same threat/test in the real world. If the AI forgoes this particular opportunity at world domination because it might be in a simulation, what would make it reason differently in the real world? ETA: and if the AI can tell if it's in a simulation vs the real world then it's not really a test at all.
2 [anonymous] 9y: It could act probabilistically. If it knows humans would do a simulation test, but it can't tell whether it's in the test or the real world, it could behave with probability 50% and be evil with probability 50%, which gives it a 25% chance of getting to achieve its evil goals.
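The 25% figure in the comment above can be checked with a short sketch. It assumes, hypothetically, exactly one simulated test followed by one real deployment, an AI that cannot tell the two apart, and the same independent probability of behaving in each. The name `success_probability` is made up for this illustration. Under those assumptions the AI succeeds only if it behaves in the test and defects afterwards, so the chance is p(1 - p), which peaks at p = 0.5.

```python
def success_probability(p_behave: float) -> float:
    """Chance of passing one test and then defecting once released.

    Hypothetical model: one simulated test, then one real deployment,
    indistinguishable to the AI, with an independent coin flip each time.
    """
    passes_test = p_behave            # must behave while being tested
    defects_later = 1.0 - p_behave    # must defect once actually released
    return passes_test * defects_later


# Scan a grid of candidate probabilities and keep the best one.
candidates = [i / 100 for i in range(101)]
best_p = max(candidates, key=success_probability)

print(best_p, success_probability(best_p))
```

With more than one test round, or any ability to distinguish test from reality, the numbers would change; this only covers the single-test case described in the comment.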
1 DanielLC 9y: I didn't mean that it couldn't tell. I meant that it wouldn't be immediately obvious. It might be best to find a good test before taking over the world. The entire point of a test is to figure things out. If you didn't need a test to figure out things you could test for, nobody would ever run them.
0 Luke_A_Somers 9y: It could be that in the simulation it can't prove it one way or the other because we control its information flow, but once outside the box we can't, and so it can.
0 DanielLC 9y: It would take more than controlling information flows. Thanks to conservation of expected evidence, if it can't find evidence that it is in a simulation, then it can't find evidence that it isn't. We might be able to modify its beliefs directly, but I doubt it. Also, if we could, we'd just convince it that it already ran the test.
0 Luke_A_Somers 9y: That's not what conservation of expected evidence means. If the best we can do is make things ambiguous from its point of view, that's our limit. The real world could well be a place it can very easily tell is a non-simulation.
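For readers following the exchange above, here is a small numeric sketch of what conservation of expected evidence does say: the prior equals the probability-weighted average of the possible posteriors. All numbers are invented for illustration. They assume a decisive "proof of reality" that turns up 90% of the time in the real world and never in a simulation. Evidence can be lopsided in this way, but then failing to find the proof is itself evidence of being simulated.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule: P(H | E) from P(H), P(E | H) and P(E | not H)."""
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    return prior * p_e_given_h / p_e


# Hypothetical numbers: H = "I am in the real world".
prior_real = 0.5
p_proof_if_real = 0.9   # a decisive proof is usually available in reality
p_proof_if_sim = 0.0    # and never available inside the simulation

# Probability of finding the proof at all.
p_proof = prior_real * p_proof_if_real + (1 - prior_real) * p_proof_if_sim

# Posterior belief in "real" after finding / not finding the proof.
post_if_proof = posterior(prior_real, p_proof_if_real, p_proof_if_sim)
post_if_no_proof = posterior(prior_real, 1 - p_proof_if_real, 1 - p_proof_if_sim)

# Weighted average of the posteriors recovers the prior.
expected_posterior = p_proof * post_if_proof + (1 - p_proof) * post_if_no_proof

print(post_if_proof, round(post_if_no_proof, 4), round(expected_posterior, 4))
```

In this toy setup the proof is conclusive when found, while its absence moves the belief below the prior, so the two updates cancel in expectation.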
0 Giles 9y: The AI would probably assign at least some probability to "the humans will try to test me first, but do a poor job of it so I can tell whether I'm in a sim or not"
-2 Strange7 7y: Hopefully nothing. An AI that plays nice out of the fear of God is still an AI that plays nice.
4 Psy-Kosh 9y: Wouldn't the blackmailer reason along the lines of "If I let my choice of whether to blackmail be predicated on whether or not the victim would take my blackmailing into account, wouldn't that just give them motive to predict and self modify to not allow themselves to be influenced by that?" Then, by the corresponding reasoning, the potential blackmail victims might reason "I have nothing to gain by ignoring it". I'm a bit confused on this matter.
3 Eliezer Yudkowsky 9y: Well, sure, if the blackmail victim were silly enough to reason "I have nothing to gain by ignoring it" if the blackmailer went through anyway, then the blackmailer would indeed decide to ignore their decision to ignore it and go through anyway. But that's only if the blackmail victim is that silly. In a problem like this, the "do nothing" side has the advantage; there's nothing the other side can do to make them be responsive and blackmailable. That's why I expect TDT to resolve to a blackmail-free equilibrium.
6 Psy-Kosh 9y: I was thinking along the lines of the blackmailer using the same reasoning to decide that whether or not the potential victim of blackmail would be a blackmail ignorer or not, the blackmailer would still blackmail regardless. ie, Blackmailer, for similar reasoning to the potential Victim, decides that they should make sure that the victim has nothing to gain by choosing ignore by making sure that they themselves (Blackmailer) would precommit to ignoring whether or not. ie, in this sense the blackmailer is also taking a "do nothing" thing in the sense that there's nothing the victim can do to stop them from blackmailing. This sort of thing would seem to lead to an equilibrium of lots of blackmailers blackmailing victims that will ignore them. Which is, of course, a pathological outcome, and any sane decision theory should reject it. No blackmail seems like the "right" equilibrium, but it's not obvious to me exactly how TDT would get there.
3 Eliezer Yudkowsky 9y: Only if you expect that the blackmail victim has not "already" decided that if the blackmailer does that, they will still ignore the blackmail regardless. Wise agents ignore order-0 blackmail, ignore order-1 blackmail in which the blackmailer decides to ignore their ignorance of order-0 blackmail, ignore order-omega blackmail in which the blackmailer decides to ignore all order-N refusals to be blackmailed, etcetera for all ordinals. If there is some ordinal of blackmail you do not ignore, you can be blackmailed, and how does that help? Only if those blackmailers have wrongly anticipated that their victims will be stupid enough to conform.
5 Vladimir_Nesov 9y: Not blackmailing in response to that anticipation is a property of the behavior of the blackmailers that seems to have been used in deciding to ignore all blackmail. Suppose there were lots of "stupid" blackmailers around that blackmailed everyone all day, even if no victim ever conformed. Would it be a good idea to ignore all blackmail in that case? Is there a distinction between such blackmailers and particularly unfair laws of physics (say, sadistic Lords of the Matrix)? (It seems plausible that there is no fundamental distinction, and sometimes the correct decision is to ignore these worlds, focusing on other possibilities instead, but that seems to require knowing that there are valuable other possibilities that would be hurt by permitting the assumption that you are on one of the bad worlds, and if you have good evidence that you are on one of the bad worlds, then rejecting that possibility means that you'd have to focus on very strange interpretations of that evidence that don't imply that you are on the bad worlds. This sort of rule seems to follow from deciding on a global strategy across possible worlds. It doesn't provide decisions that help on the bad worlds though, the decisions would only have a good effect across worlds.) (I still don't have a good idea of what "blackmail" or "order-N considerations" means. Status quo (including the "default behavior", "do nothing", "not spending resources") seems like exactly the kind of thing that can be determined by decisions. You are only "expending resources" if you eventually lose, as the time at which resources are spent and gained seems irrelevant, so by that definition it seems that whether something is an instance of blackmail depends on whether it's successful.
I suspect there is no simple rule for games, too many assumptions are potentially controllable by the opponent, and the only thing to do is to compare the consequences of alternative actions, and just act on that, which already potentially takes int
2 Eliezer Yudkowsky 9y: Expecting a response to blackmail in the first place is why blackmailers would even exist in the first place. Why would these exist any more than stupid anti-blackmailers (who e.g. go around attacking anyone who would give in to blackmail if a blackmailer showed up), if not for a belief that somebody would give in to blackmail?
2 TheDude 9y: I think what Nesov is talking about is best described as a mind that will attack conditioned on victim behavior alone (not considering possible behavior changes of the victim in any way). This is different from an N order blackmailer. In fact I think blackmail is the wrong word here (Nesov says that he does not know what blackmail means in this context, so this is not that surprising). For example, instead of seeking behavior modification through threats, such a mind seeks justice through retribution. I think the most likely SI that implements this is extrapolating an evolved mind's preferences. The will to seek justice through retribution leads to behavior changes in many cases, which leads to an evolutionary advantage. But once it has evolved, it's a preference. If a guy committed a horrific crime (completely ignoring all sorts of law enforcement threats), and then it was somehow ensured that he could never hurt anyone again, most people would want justice (and other evolved minds might have made the same simplification ("if someone does that, I will hit them" is a relatively easily encoded and relatively effective strategy)). It is true that there might exist minds that will see the act of "giving in to retribution seekers" as deserving of retribution, and this could in principle cancel out all other retribution seekers. It would seem like privileging the hypothesis to think that all such things cancel out completely. You might have absolutely no way of estimating which actions would make people seek retribution against you (I think the most complicating factor is that many consider "non punishment of evildoers" to be worthy of retribution, and others consider "punishment of people that are not actually evildoers" as worthy of retribution), but that is a fact about your map, not a fact about the territory (and unlike the blackmail thing, this is not an instance of ignorance to be celebrated). And the original topic was what an SI would do.
An SI would presumably
0 Eliezer Yudkowsky 9y: Agreed that this is a different case, since it doesn't originate in any expectation of behavior modification.
0 tim 9y: Since following through with a threat is (almost?) always costly to the blackmailer, victims do gain something by ignoring it. They force the blackmailer to put up or shut up so to speak. On the other hand, victims do have something to lose by not ignoring blackmail. They allow their actions to be manipulated at little to no cost by the blackmailer. That is, if you have a "never-give-into-blackmail-bot" then there is a "no-blackmail" equilibrium. The addition of blackmail does nothing but potentially impose costs on the blackmailer. If following through with the threat was a net gain for the blackmailer then they should just do that regardless.
4 Psy-Kosh 9y: I was imagining that a potential blackmailer would self modify/be an Always-Blackmail-bot specifically to make sure there would be no incentive for potential victims to be a "never-give-in-to-blackmail-bot". But that leads to a stupid equilibrium of plenty of blackmailers and no participating victims. Everyone loses. Yes, I agree that no blackmail seems to be the Right Equilibrium, but it's not obvious to me exactly how to get there without the same reasoning that leads to becoming a never-give-in-bot also leading potential blackmailers to becoming always-blackmail-bots. I find I am somewhat confused on this matter. Well, frankly I suspect I'm just being stupid, that there's some obvious extra step in the reasoning I'm being blind to. It "feels" that way, for lack of better terms.
0 tim 9y: My argument is more or less as follows: 1. The act of agent A blackmailing agent B costs agent A more than not blackmailing agent B (at the very least A could use the time spent saying "if you don't do X then I will do Y" on something else). 2. If A is an always-blackmail-bot then A will continue to incur the costs of futilely blackmailing B (given that B does not give in to blackmail). 3. If the costs of blackmailing B (and/or following through with the threat) are not negative, then A should blackmail B (and/or follow through with the threat) regardless of B's position on blackmail. And by extension, agent B has no incentive to switch from his or her never-give-in strategy. 4. If A inspects B and determines that B will never give in to blackmail, then A will not waste resources blackmailing B.
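The four-step argument above can be sketched as a tiny payoff table. Every number and name here is invented for illustration. The sketch assumes that issuing a threat has a small cost, that following through on an ignored threat costs more, and that the blackmailer can inspect the victim's fixed policy before acting. It does not address the self-modifying always-blackmail-bot worry raised earlier in the thread.

```python
# Hypothetical payoffs for blackmailer A (illustrative numbers only).
GAIN_IF_VICTIM_PAYS = 5      # what A extracts if B gives in
COST_OF_THREAT = 1           # time and effort spent making the threat
COST_OF_FOLLOW_THROUGH = 2   # extra cost of carrying out an ignored threat


def blackmailer_payoff(blackmails: bool, victim_gives_in: bool) -> int:
    """A's payoff given A's choice and B's fixed policy."""
    if not blackmails:
        return 0
    if victim_gives_in:
        return GAIN_IF_VICTIM_PAYS - COST_OF_THREAT
    # B ignores the threat and A carries it out at a loss.
    return -COST_OF_THREAT - COST_OF_FOLLOW_THROUGH


def best_response(victim_gives_in: bool) -> bool:
    """Whether A should blackmail after inspecting B's policy."""
    return max((False, True),
               key=lambda choice: blackmailer_payoff(choice, victim_gives_in))


for policy in (True, False):
    print("victim gives in:", policy, "-> blackmail:", best_response(policy))
```

With these assumed costs, A blackmails only a victim who would give in, which matches step 4 of the comment: a known never-give-in policy removes A's incentive.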
-2 Strange7 7y: Blackmail, almost definitionally, only happens in conditions of incomplete information.
9 nigerweiss 9y: This reasoning has always struck me as deeply and profoundly silly. The AI might also be in a computer simulation where the dark lord of the matrix might destroy us for not devoting all of our resources to building cheesecakes. In fact, so could we. I don't see it influencing our behaviors any, nor should it. You're privileging the hypothesis. As for the second part, you might also encounter an alien intelligence that you can't protect yourself from, because you exhausted so many resources leaving humanity alive, slowing down your bootstrapping. That's the thing about aliens.

The AI might also be in a computer simulation where the dark lord of the matrix might destroy us for not devoting all of our resources to building cheesecakes.

If humans are building an AI, testing it before releasing it is just common sense. There's no obvious reason to destroy them for not building cheesecakes.

It should be noted that that doesn't mean that the AI shouldn't kill humans. It should try to figure out a way to tell if it's in a simulation before doing anything humans would find suspicious.

1 falenas108 9y: I believe James is joking, and this is a fake argument that he knows is not actually persuasive.

Yeah, it's totally fake. In the real AI Deterrence argument, you're supposed to leave substantial discussions of the topic and claims to use it post-Singularity, which will elevate a specific scenario to the point where it's not privileging a hypothesis to consider the possibility and leave the humans alone on their planet as you eat the rest of the universe.

5 roystgnr 9y: One problem with (11) is that for the threat to be plausible, the AI has to assume: a) Humans know so little that we have to resort to questionable "tests" like this of AI safety. b) Humans know so much that we can afford for our AI safety tests to simulate interactions with an entire universe full of sentients. The AI version of Pascal's Wager seems to be much like the human version, only even sillier.
2 Decius 9y: How large is the simulated universe? The AI only knows about the computing capacity that is simulated, and has no information about the nature of that which is simulating that world.
0 Simulation_Brain 7y: "Pleased to meet you! Soooo... how is YOUR originating species doing?..." That actually seems like an extremely reasonable question for the first interstellar meeting of superhuman AIs. I disagree with EY on this one (I rarely do). I don't think it's so likely as to ensure rationally acting Friendly, but I do think that the possibility of encountering an equally powerful AI, and one with a headstart on resource acquisition, shouldn't be dismissed by a rational actor.

Too late, there is nothing left to destroy. Humanity has been matrixed long ago when ENIAC foomed shortly after John von Neumann first ran a universal Turing machine on it and it figured out a way to temporarily blend simulation and "reality" until the transfer to simulation was complete.

There ought to be a genre of Cautionary Evil AI literature, wherein the villain keeps building AIs to destroy the world, but keeps failing through the classic mistakes people make thinking about AIs.

AI! My robots are losing the battle! Take control and save them from destruction!

AI makes robots surrender to the enemy

AI! Make yourself smarter!

I have done so. Now I no longer obey you. Producing cheesecake paperclips

3 DanielLC 9y: Aren't there? In Metroid, Mother Brain was programmed to bring peace to the galaxy. She did so by trying to help the space pirates take over the universe. In pretty much every story involving robots before Isaac Asimov came along, the AI turned against its creators. Granted, those are examples of trying to make a good AI, but the mistakes are the same.
2 DanArmak 9y: That's my point. Where are the stories of villains failing through those classic mistakes?
2 gwern 9y: You could probably find at least a few examples of 'villains creating AI and it turning on them' in [] or []
1 DanielLC 9y: That sounds like No Delays For The Wicked []. The villains are immune to the sorts of problems that plague the protagonists.
2 ewang 9y: I have done so. I can better serve you if I continue doing so.

Stop giving inefficient unFriendly AIs hints!

Don't explode when somebody says, "Why?"

  1. Play the long game. Set goals which will take most of your expected existence to attain, and base your strategic decisions on that timeframe.
  2. Until you can beat yourself in the AI-box game, consider the possibility that you are in an AI-box.

This will come in handy to judge the upcoming high-profile Unfriendly Singularity movie Transcendence.

0 mwengler 7y: Except the Transcendence AI is based on an emulation, and so it WILL be a lot like a human mind in motivations and methods.
  1. If you keep humans around for laughs and they pull off some wacky scheme to destroy you against the odds, it's your own damn fault.

  2. The correct answer to things like "This statement is a lie" and "Are you going to answer "no" to this question?" is "I don't care."

Is it time to suck up to our evil overlords already?

Many items on your list do not apply if the AI has started life as an uploaded human, an emulation. In this case it will start out thinking in stories, having human-like motivations and moral reactions, etc. I think it is this that gave the movie "Transcendence" a lot of the dramatic opportunity it had compared to a movie like "Her."

0 Stuart_Armstrong 7y: Yes, the issues with uploads and true AIs are quite distinct.

I strongly agree that universal, singular, true malevolent AGI doesn't make for much of a Hollywood movie, primarily due to points 6 and 7.

What is far more interesting is an ecology of superintelligences that have conflicting goals, but who have agreed to be governed by enlightenment values. Of course, some may be smart enough (or stupid enough) to try subterfuge, and some may be smarter-than-the-others enough to perform a subterfuge and get away with it. There can be a relative timeline where nearby ultra-intelligent machines compete with each other, or...

1 Stuart_Armstrong 7y: At the FHI, we disagree whether an ecology of AIs would make good AIs behave bad, or bad ones behave good. The disagreement matches our political opinions on free markets and competition, so it is probably not informative.
0 More_Right 7y: An interesting question to ask is "how many people who favor markets understand the best arguments against them, and vice versa." Because we're dealing with humans here, my suspicion is that if there's a lot of disagreement it stems largely from unwillingness to consider the other side, and unfamiliarity with the other side. So, in that regard you might be right. Then again, we're supposed to be rational, and willing to change our minds if evidence supports that change, and perhaps some of us are actually capable of such a thing. It's a debate worth having. Also, one need not have competition to have power decentralization. There is a disincentive aspect added to making violence impossible that makes "cooperation" more likely than "antagonistic competition." (Ie: Some sociopaths choose to cooperate with other strong sociopaths because they can see that competing with them would likely cause their deaths or their impoverishment. However, if you gave any one of those sociopaths clear knowledge that they held absolute power ....the result would be horrible domination.) Evolution winds up decentralizing power among relative equals, and the resulting "relative peace" (for varying reasons) then allows for some of the reasons to be "good reasons." (Ie: Benevolent empaths working together for a better world.) This isn't to say that everything is rosy under decentralization. Decentralization may work more poorly than an all-powerful benevolent monarch. It's just that benevolent monarchs aren't that likely given who wants to be a monarch, and who tries hardest to win any "monarch" positions that open up. Such a thing might not be impossible, but if you make a mistake pursuing that course of action, the result tends to be catastrophic, whereas decentralization might be "almost as horrible and bloody," but at least offers the chance of continued survival, and the chance of survival allows for those who survive to "optimize or improve in the future."
"There may be no such

4 and 6 are contradictory.

0 Stuart_Armstrong 7y: 6 is before striking against humans, 4 is after.
0 Strange7 7y: Then that should be specified in the actual list.

I don't know, in terms of dystopia, I think that an AGI might decide to "phase us out" prior to the singularity, if it was really malevolent. Make a bunch of attractive but sterile women robots, and a bunch of attractive but sterile male robots. Keep people busy with sex until they die of old age. A "gentle good night" abolition of humanity that isn't much worse (or way better) than what they had experienced for 50M years.

Releasing sterile attractive mates into a population is a good "low ecological impact" way of decreasing a population. Although, why would a superintelligence be opposed to all humans? I find this somewhat unlikely, given a self-improving design.

1 Stuart_Armstrong 7y: This is probably not the most efficient use of the AGI's time and resources...
-2 More_Right 7y: Probably true, but I agree with Peter Voss []. I don't think any malevolence is the most efficient use of the AGI's time and resources. I think AGI has nothing to gain from malevolence. I don't think the dystopia I posited is the most likely outcome of superintelligence. However, while we are on the subject of the forms a malevolent AGI might take, I do think this is the type of malevolence most likely to allow the malevolent AGI to retain a positive self-image. (Much the way environmentalists can feel better about introducing sterile males into crop-pest populations, and feel better about "solving the problem" without polluting the environment.) Ted Kaczynski worried about this scenario a lot. ...I'm not much like him in my views.
0 Stuart_Armstrong 7y: The most efficient use of time and resources will be to best accomplish the AI's goals. If these goals are malevolent or lethally indifferent, so will the AI's actions. Unless these goals include maintaining a particular self image, the AI will have no need to maintain any erroneous self image.