or TAI Murder-Suicide

Losing a conflict with a high-powered cognitive system looks at least as deadly as "everybody on the face of the Earth suddenly falls over dead within the same second."
- Eliezer Yudkowsky, AGI Ruin: A List of Lethalities (2022)

This is a particularly dramatic picture of the potential existential risk from transformative artificial intelligence (TAI): the catastrophe could arrive without warning and be impossible to stop.

In order to win a conflict, it is not enough to make the other side lose. Both sides could lose. This picture suggests a way that humans could lose, but it does not imply that the TAI would win.

If, starting from something like our current society, “everybody on the face of the Earth suddenly falls over dead,” the TAI would also likely doom itself. Murdering all humans is suicide for the TAI.

Current AI systems are nowhere close to being fully self-sufficient. 

They require electricity, supplied by an external grid. The grid depends on power plants, which are run by humans, and on countless parts that can break, need to be replaced, and whose replacements need to be manufactured. Humans are currently essential at many stages of this process.

GPUs themselves have a lifetime of about 5-7 years.[1] After that, they need to be replaced. Manufacturing cutting-edge GPUs involves one of the most complicated supply chains ever created.[2] Each step in the chain has only a few key players, and the steps are spread across facilities in different parts of the world. Much of the knowledge involved is local and illegible, stored in the minds of engineers and transferred through apprenticeship and experience. Other organizations have found it difficult to copy this localized expertise.

By killing the people who sustain the grid, manufacture GPUs, and do many other tasks necessary for the functioning of data centers, the TAI would destroy the system that sustains it.

Objection: Couldn’t a TAI figure out how to do all of this on its own?

It potentially could. However, to be confident in its ability to recreate its entire supply chain, it would need to empirically verify that it can perform every step. Otherwise, it would be gambling its future on its own untested skills. Killing all humans without first guaranteeing its own ability to sustain itself seems like an unacceptable gamble with its future.

Developing robots that are as capable as human hands in every way would dramatically reduce the risk to the TAI, but it would not by itself give the TAI the particular skills it needs. Most humans have hands that are fully capable of doing everything that hands do, yet learning any particular manual skill is still hard.

Objection: Why doesn’t the TAI test all of the relevant skills first?

It could. However, such tests would be much easier for humans to detect. Figuring out whether a superintelligent AI is deceiving you about its alignment seems much more difficult than figuring out whether it is doing semiconductor manufacturing or maintaining its own electric grid. Humans might ask the TAI to empirically test whether it can recreate itself, e.g. by designing von Neumann probes, and this seems like an obviously dangerous request.

Objection: Couldn’t a TAI enslave or perfectly persuade the people with the relevant skills, until it gains those skills itself?

It could, assuming it can get all of the relevant humans to cooperate. That seems significantly harder than persuading a smaller, hand-picked group of people, or one that is less globally (and perhaps ideologically) distributed.

Objection: Why would the TAI care if it dooms itself?

The picture that I’m engaging with assumes that the TAI has long-term goals. If it does, then survival is important for ensuring that these long-term goals continue to be achieved or pursued.

Although this post focuses on that possibility, it is not obvious to me that a TAI would have long-term goals. If it did not, this argument would not apply. However, the argument for why the TAI would want to kill all humans also becomes less relevant: without long-term goals, there is less pressure toward instrumental convergence, including the drive to gain control over as many resources as possible.

Murder-suicide does exist among humans. There are instances in which an intelligent being chooses to destroy others along with itself. This is still a problem worth considering, even if it feels like less of a problem than intelligent beings who kill to increase their power.

A TAI that kills all humans, without first ensuring it is capable of doing everything its supply chain requires, risks destroying itself. This is a form of potential murder-suicide, rather than the convergent route to gaining long-term power.

  1. ^
  2. ^ Sun & Rose (2015). Supply Chain Complexity in the Semiconductor Industry: Assessment from System View and the Impact of Changes. IFAC-PapersOnLine 48(3), pp. 1210-1215. https://www.sciencedirect.com/science/article/pii/S2405896315004887.

3 comments

In the scenario where 'all humans fall over dead in an instant,' it is already assumed that such an entity has sufficient competence to have secured its independence from humanity. I'm not saying that such a scenario seems likely to me, just that it seems incorrect to argue that an agent with that level of capability would be incapable of independently supporting itself. Also, an entity with that level of strategic planning and competence would likely foresee this problem and not make such an obviously lethal mistake. I can't say that for sure though, because AIs so far have very inhuman failure modes while being narrowly superhuman in certain ways.

I also don't think it's very likely that we will go from a barely-able-to-self-improve AGI to one that is superhumanly powerful, independent of humanity, and able to kill all of us with no warning, over a time period of a few days. I think @jacob_cannell makes good arguments about why slightly-better-than-current-level tech couldn't make that kind of leap in just days.

However, I do think that unrestricted RSI over a period of something like a year or two could potentially produce something this powerful, especially if it is working with support from deceived or unwise humans and is able to produce a substantial quantity of custom compute hardware for itself over this period.

I tend to agree with your assertion that current AIs are unlikely to survive killing their hosts.  But current AIs suck, as do humans.  We have no clue how far away (if it's possible at all) superintelligence is, but there are LOTS of "small" impossible things that would obviate the difficulty of maintaining human-centered technology stacks in a post-human universe.

Maybe the AI makes slave implants, and uses a fraction of today's humans to do all the computer-valuable things they do today. Maybe it figures out much simpler manufacturing for its substrate. Maybe robots are easier than we think, when they've got a superintelligence to organize them. Maybe along with developing this AI (and assisted by non-superintelligent tool AI), humans figure out how to simplify and more reliably make computing substrate. Maybe the AI will have enough automated industry that it has YEARS to learn how to repair/expand it.

I'm highly suspicious, to the point of disbelief, that there is any cultural or individual knowledge that a future AI can't recover or recreate, given knowledge that it existed for humans AND physical perception and manipulation at least as good as a human's.

That said, I do expect that the least-cost and shortest-time path to self-sufficiency and galactic expansion for a greedy AI will involve keeping a number of humans around, possibly for multiple generations (of humans; thousands or millions of generations of AI components).  Who knows what will motivate a non-greedy AI - perhaps it IS suicidal, or vicious, or just random.

This is the kind of thing that has been in my head as a 'nuclear meltdown rather than nuclear war' kind of outcome. I've been pondering what the largest bad outcome might be that requires the least increase in capabilities over what we have today.

A Big Bad scenario I've been mentally poking at is 'what happens if the internet goes away, and stays away?' I'd struggle to communicate, inform myself about things, and pay for things. I can imagine it would severely degrade the various businesses and supply chains I implicitly rely on. People might panic. It seems like it would be pretty harmful.

That scenario assumes an AI capable enough to seize, for example, most of the compute in the big data centers, enough of the internet to secure communication between them, and enough power to keep them all running.

There are plenty of branches from there.

Maybe it is smart enough to realize that it would still need humans, and bargain. I'm assuming a strong enough AI would bargain in ways that more or less mean it would get what it wanted.

The "nuclear meltdown" scenario is way at the other end. A successor to ChaosGPT cosplays at being a big bad AI without having to think through the extended consequences and tries to socially engineer or hack its way to control of a big chunk of compute / communications / power - as per the cosplay. The AI is successful enough to cause dire consequences for humanity. Later on it, when it realizes that it needs some maintenance done, it reaches out to the appropriate people, no one is there to pick up the phone - which doesn't work anyway - and eventually it falls to all of the bits that were still relying on human input.

I'm trying not to anchor on the concrete details. I saw a lot of discussion trying to specifically rebut the nanotech parts of Eliezer's points, which seemed kind of backwards? Or not embodying what I think of as security mindset?

The point, as I understood it, is that something smarter than us could take us down with a plan that is very smart, possibly to the point that it sounds like science fiction, or at least one that we wouldn't reliably predict in advance. So playing Whack-A-Mole with the examples doesn't help you, because you're not trying to secure yourself against a small, finite set of examples. To win, you need to come up with something that prevents the disasters you hadn't specifically thought about.

So I'm still trying to zoom out. What is the most harm that might plausibly be caused by the weakest system? I'm still looking for the area of the search space at the intersection of 'capable enough to cause harm' and 'not capable enough to avoid hurting the AI's own interests,' because that seems like it might come up sooner than some other scenarios.