A core claim that AI risk arguments rely on is the following:

Claim: "A sufficiently capable AI system could cause human extinction, if this was something it aimed for."[1]

Below I explain why I believe this claim is true. The target audience is people who have thought "I just don't see how an AI could take over the world, even if it wanted to". 

I first unpack some underlying intuitions in a fictional dialogue. After that I cover more direct arguments for the claim, including stories about how an AI could indeed cause human extinction.

Computer chess metaphor

It's 1995. Alice and Bob are talking about these "chess-playing computers" and how good they could become.

Alice: I expect that in the future chess-playing computers will be vastly better than humans.

Bob: What do you mean by "vastly better"?

Alice: I mean, like, the computers winning almost surely against the best humans, or a team of humans, even if the computer played as black.

Bob: It's not clear to me that this is possible even in theory with best possible play - white has a real advantage, you know - let alone in practice. What makes you believe that?

Alice: Computers currently beat humans in checkers, and I would expect checkers-playing computers to become better over time. Chess doesn't feel like a fundamentally different game.

Bob: Checkers is a lot simpler than chess: there are fewer pieces, fewer moves to consider, and many moves are forced. That makes it a lot easier to calculate move sequences.

Alice: No denying that. Another reason is more abstract. Like, it just feels that humans aren't the limit of what's possible, and it should be possible to play a lot better.

Bob: I find it plausible that computers could be competitive or even better than the best humans in chess, yes. However, your claim about winning on black is much stronger. Is that even possible? How would the computer play?

Alice: Obviously I cannot say what specific moves the computer would make - otherwise I would be the world champion. Heck, I couldn't even explain which moves you should play to win against me, but I've noticed that such moves do in fact exist.

Bob: Fair enough, I can't expect you to do that. I don't feel much closer to understanding your view, though.

Alice: Clearly humans fall short in many respects. For example, you could play like humans, except make mistakes less often. Computers are also able to calculate faster than humans - imagine if you had ten or a hundred times more time to think! Also, computers definitely do certain things a lot better than humans, like exhaustive search to see what happens far ahead. I bet there are plenty of other advantages computers could have over humans - it's like in chess, where I know there are better moves than those I could think of but I don't know what they are.

Bob: This still doesn't tell us whether even perfect play is sufficient for the level you described.

Alice: Admittedly it's difficult to know how good perfect play is. I wish there were other games where computers were superhuman and we could see how superhuman they are. 

Bob: Keep in mind that one has to be very careful when extrapolating to different types of games. 

One reason for my suspicion is that historically there has been a lot of competition in chess, and so far no single player has dominated the field at the level you describe. 

Alice: One has to be very careful when extrapolating to different types of players. Clearly the difference between computers and humans is bigger than the difference between individual humans!

Spelling out the metaphor

To reiterate the ideas in terms of the topic "advanced AI could cause human extinction if it aimed for this":

Firstly, there are numerous tasks where computers / AIs are vastly superhuman, which suggests that the same could eventually become true of real-world tasks. One could object that one cannot extrapolate from these precedents to the real world because the real world is far more complex. However, I don't see why added complexity would prevent AI from becoming vastly better than humans there too. In fact, humans are likely farther from "perfect play" in more complicated settings (e.g. humans can play tic-tac-toe perfectly, but are far from perfect at chess).

Secondly, there are several aspects in which human cognitive capabilities could be improved and in which computers are already vastly better. One example is speed: ordinary human actions and thinking take place on timescales of seconds, or milliseconds at best, whereas computers perform individual steps on timescales of micro- or nanoseconds.
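To make the size of this speed gap concrete, here is a small Python sketch of the timescale ratios mentioned above. The step durations are round illustrative values (a second or a millisecond per human step, a microsecond or a nanosecond per computer step), not measurements of any particular brain or machine.

```python
# Illustrative step durations expressed in nanoseconds (round values, not measurements).
NS_PER = {
    "second": 1_000_000_000,
    "millisecond": 1_000_000,
    "microsecond": 1_000,
    "nanosecond": 1,
}


def speed_ratio(human_step: str, computer_step: str) -> int:
    """How many computer steps fit into one human step (integer ratio)."""
    return NS_PER[human_step] // NS_PER[computer_step]


if __name__ == "__main__":
    for human in ("second", "millisecond"):
        for computer in ("microsecond", "nanosecond"):
            print(f"1 {human} / 1 {computer} = {speed_ratio(human, computer):,}")
```

Even the least favourable pairing here (a fast, millisecond-scale human step against a slow, microsecond-scale computer step) leaves a factor of a thousand.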

Other examples:

  • Computing power: Estimates of human brain computation range around , and top supercomputers have  to  times more performance.[2] (This video may help you grasp what it would be like to think a lot faster.)
  • Attention/parallelization: computers suffer far smaller costs from multitasking than humans do.
  • Memory: human short-term memory is very limited, long-term memory is somewhat unreliable (and still kinda limited). Humans forget things.
  • Software improvements: computers can perform some tasks much more cheaply than humans (e.g. arithmetic operations and quantitative reasoning in general).

Human cognitive capabilities really are not the limit!

There are some counter-arguments to this post's main claim. One was mentioned in the dialogue, namely that historically there has been a lot of competition for power, but so far no single entity has "taken over the world". Another objection is that AI has little access to the physical world, which drastically limits its ability to actually cause harm, run experiments, build technology and so on.

My take on these objections: I agree that an argument for why AI could cause human extinction should not go through with "AI" replaced by "a malicious human", and it should discuss how physical control is achieved or why it isn't needed. That said, I think this is pretty much as far as those objections go - given a story for AI takeover that is overall plausible, one shouldn't retreat to the objection "no human has taken over the world!".

So, this is a good time to move to stories regarding AI takeover and see how they do.


Zvi Mowshowitz with a 50-word description:

Hack most computers and internet-enabled things at once, analyze all the info, scale up, make or steal a lot of money, use to [let’s say bribe, hire and blackmail] many people, have your humans use your resources to take over, have them build your robots, your robots kill them.

If your response is ‘you can’t hack all those things’ then I give up, what does smarter even mean to you. If you think people wouldn’t let the rest of it play out, they’d grow spines and fight back and not dig their own graves, read more history.

Regarding the "AI has limited control of the physical world" objection: as I see it, the issue is getting started. Once you have something capable of tasks like "open boxes delivered to your door", you can bootstrap to human-level motor control (and human appearance, if you wish). One way you could achieve this:

First, get money. (You can e.g. search "how to make money online" and follow the instructions, like doing remote work or writing software that people buy.)

Second, start a robotics company and hire people for it. Have them build things sufficient for your purposes, or (more easily) just order stuff from Boston Dynamics. Of course, have the robots connected to the company's computers.

Third, get rid of your employees[3]. Maybe the company goes bankrupt, or maybe you transfer your employees to another location.

You now have a space to work in with basic robotics set up and no humans nearby to worry about.

A single AI with an internet connection could perform the following actions to gain more resources:

  • Get money (by various means)
  • Get more computing resources (by buying cloud computing time or hacking into computers)
  • Replicate itself (which basically amounts to using more compute)
  • Get control over the physical environment (see e.g. the previous story)

Avenues through which an AI could cause harm, some more plausible than others, include:

  • One can hire people to do things 
  • One can influence information people receive via the internet
  • Firearms, bombs, drones, etc.
  • Probably a reasonable portion of critical infrastructure (e.g. electricity and water facilities) is connected to the internet and could be attacked
  • Biological, chemical and nuclear weapons exist
  • Control over computers allows one to block humans out of them. (This move is one-time-use only, in the sense that it likely blows your cover.)

I think this situation is reasonably framed as a war. The AI side's advantages include much better coordination, the element of surprise, the ability to replicate, and the ability to think and react faster and better. The human side's advantages are... better control over the physical environment and not much else?[4] And again, see the previous story for control over the physical world.

Describing such a war in detail would take work, but I do think the outcome is clear.

Summary of an AI takeover scenario by Holden Karnofsky:

The core idea is that training current machine learning models requires much more resources than running them. Karnofsky uses the estimate that, once we are able to train one transformative AI system, we could run several hundred million copies of it (based on this report).

This would have profound implications: "several hundred million" corresponds to a few percent of the whole human population! For example, the AIs could do human-level work and hence acquire more resources to further increase their number, or they could do research, including on how to use computing power more efficiently.
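As a quick check of the "few percent" figure, the sketch below divides a range of copy counts by a world population of roughly 8 billion. The 200 to 500 million range is an assumed reading of "several hundred million", not a number taken from Karnofsky's text or the report it cites.

```python
# Approximate world population (passed 8 billion in late 2022).
WORLD_POPULATION = 8_000_000_000


def percent_of_population(copies: int, population: int = WORLD_POPULATION) -> float:
    """Number of AI copies as a percentage of the human population."""
    return copies * 100 / population


# Assumed range for "several hundred million" copies (illustrative only).
for copies in (200_000_000, 300_000_000, 500_000_000):
    print(f"{copies:,} copies = {percent_of_population(copies):.2f}% of humanity")
```

Under these assumptions the copies amount to roughly 2.5% to 6.25% of humanity, i.e. a few percent.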

See Karnofsky's text for a longer discussion, including how the AI could avoid shutdown.

Eliezer Yudkowsky (p. 26) proposes the following scenario for how a single AI could take over, with an emphasis on speed:

1. Crack the protein folding problem, to the extent of being able to generate DNA strings whose folded peptide sequences fill specific functional roles in a complex chemical interaction.

2. Email sets of DNA strings to one or more online laboratories which offer DNA synthesis, peptide sequencing, and FedEx delivery. (Many labs currently offer this service, and some boast of 72-hour turnaround times.)[5]

3. Find at least one human connected to the Internet who can be paid, blackmailed, or fooled by the right background story, into receiving FedExed vials and mixing them in a specified environment.

4. The synthesized proteins form a very primitive “wet” nanosystem which, ribosome-like, is capable of accepting external instructions; perhaps patterned acoustic vibrations delivered by a speaker attached to the beaker.

5. Use the extremely primitive nanosystem to build more sophisticated systems, which construct still more sophisticated systems, bootstrapping to molecular nanotechnology—or beyond.

A common reaction to this scenario is suspicion over whether the strategy is physically possible. Personally I don't know enough about nanotechnology to be confident. However, reading Feynman's talk "There's Plenty of Room at the Bottom" convinced me that it is possible to manufacture things at a far smaller scale than humans typically do. Even substantially weaker forms of small-scale manufacturing combine well with other approaches to AI takeover.[6]


Concluding remarks

Upon hearing specific takeover scenarios, some might respond that we should restrict AI access to wetlabs (for the nanotechnology scenario) or to the internet (for many scenarios). I've even heard the suggestion that we could abolish digital currency (for "AI hires people / buys things online").

This is a completely backwards way to approach the problem! It's like: You are playing chess against a computer that's better than you. You lose. You study the game and think of the critical move where things fell apart. It was the 9th move, when the computer moved queen to h4. You modify the program so that on move 9 it cannot play queen to h4. Surely you'll win the next game?

(Not that I'm against preventing advanced AI systems from accessing the internet and allowing only restricted people to use them.[7])

Fundamentally, the issue is that you do not know what the AI will do. A better player comes up with better strategies and better moves. Naively patching up holes in our current position won't cut it.

(How do you solve the problem, then? Fundamentally, by making sure the AI never aims for humanity's extinction in the first place[8], rather than relying on being able to stop it if it does.)

  1. ^

    This post does not discuss why an AI system would aim for human extinction, but rather takes it as a premise. (In such a case I very strongly expect human extinction to be useful for other goals the system might have, rather than being a final goal in itself.)

  2. ^

    One could argue that one should look at the computation power of AI training clusters. The results are similar: top models currently use around  operations, and assuming training takes three months, that amounts to roughly  operations per second (i.e.  times what human brains perform). Also note the fast growth of computation used.

  3. ^

    ...after waiting long enough so that it's not suspicious, of course.

  4. ^

    One could suggest that humans' control of the AI's shutdown button is an advantage for humans. Note that if the AI manages to spread copies of itself across the internet, you lose this advantage, so this has to be prevented. (Preventing the escape might not be easy either: the AI presumably wants to appear nice and safe, you cannot be sure there are no security holes, and so on. And remember, there are no second tries.)

  5. ^

    Note that the text is from 2006.

  6. ^

    Example: drones two or three orders of magnitude smaller than current consumer drones would be undetectable to the human eye.

  7. ^

    Note that many current systems, while not yet dangerous, are widely available and used - for example, ChatGPT reportedly has 100 million monthly users. 

  8. ^

    This is a harder problem than it looks, for reasons beyond the scope of this post (but let me still mention instrumental convergence and the orthogonality thesis).

