First post in sequence AI Defense in Depth: A Layman's Guide

[Image: An AI researcher]

AI Risk for Cavemen

A parable:

A caveman is looking at a group of cavemen shamans performing a strange ritual that summons many strange, inscrutable, and increasingly useful daemons. Some of the shamans say they're going to eventually summon a daemon so powerful that it could solve all problems, now and forever. But they warn that if this same daemon has no heart, just like every other daemon that has ever been summoned, it might devour the whole world because of unfathomable daemon urges. Other shamans say that this cannot happen, that the ritual cannot bring a daemon so powerful into the world. Besides, some of this second group of shamans claim, even a daemon so powerful would not eat the world because it would necessarily care about cavemen, in spite of being a daemon, due to being so powerful.

The first group of shamans points out that power does not require a heart, and that nobody has any idea how to give a daemon a heart. The second group of shamans ignores them. All the shamans continue the ritual. Increasingly powerful daemons are being summoned, and the concerned group warns that the World-Devourer could show up much sooner than anyone had thought.

The caveman wonders which of the shamans are right. He also wonders why the ritual continues as is. It’s like no one listens to the concerned shamans. If they’re right the world will be eaten! What are they thinking? The caveman wonders whether the shamans can be trusted.

A Primer on Long-term Thinking

Any sufficiently advanced technology is indistinguishable from magic.
-- Arthur C. Clarke

Back here in our apparently wholly natural world, we have invented something like magic. It is called science, and on the whole it has worked out pretty well so far.


Climate change is a very thorny problem, and it is not looking like we will manage to fix it. While it will hurt, a lot, it is unlikely to kill us. The best shamanic estimate is a 1/1000 chance of human extinction per century.

But even so. Not great.

What if we had taken the problem seriously when we first acknowledged it, we wonder? Could we have prevented this? Maybe. But even then, it would have been painful. In the 1960s, greenhouse gas (GHG) emitting technologies were already thoroughly embedded into every aspect of civilization. It would have been better to have handled it then, but it would still have been a monumental task.

What if we had taken it seriously in the 19th century? Some shamans at the time were already theorizing about it. Could it have been stopped then? That would definitely have been the easiest time in which to stop it: before GHG-emitting tech had become too widespread, but when there was already a plausible route from the then-current technologies to planetary disaster.

It would still have been a massive political struggle. GHG tech is too powerful, too useful, to easily prevent everyone from using it.

But it’s not the 19th century. Here in the 21st century, we do pull off regulating scientific endeavors, as the lack of genetically engineered humans anywhere on the planet shows.

It is the 19th century in another important sense though: there is another planetary disaster looming, and we will never be better positioned to tackle it than now.

AI Risk for Laymen

The looming planetary disaster is unaligned AI. What does that mean? It just means the proliferation of AI systems that do not care at all for human concerns. Like every other piece of software ever written.

But unlike any other piece of software, AI technologies have the potential to outmatch humans across every relevant domain, such that humanity would no longer have control over its future: outcompeted by software, at the mercy of software, just as gorillas and lions are at our mercy.

As with gorillas and lions, AI wouldn’t need to harbor any “ill will” toward humanity to harm or eradicate us: we don’t hate gorillas, we just don’t care enough about whether our actions harm them. Since intellect is power, and we’re so much smarter than gorillas, we’re also much more powerful than them, such that the gorillas can ultimately do nothing when our actions harm them.

“But wait!” you might object. “It’s impossible for AI to do that to us! We know exactly what intelligence is and what it can or cannot do!”

I say “Huh?”.

You blink.


“But hang on. We don’t have any reason to think we could develop an AI so powerful.”

Well, you may think that, but you and I are laymen. Here is what leading AI expert Stuart Russell has to say on the matter:

The risks of superintelligence can also be dismissed by arguing that superintelligence cannot be achieved. These claims are not new, but it is surprising now to see AI researchers themselves claiming that such AI is impossible. For example, a major report from the AI100 organization, “Artificial Intelligence and Life in 2030 [PDF]”, includes the following claim: “Unlike in the movies, there is no race of superhuman robots on the horizon or probably even possible.”

To my knowledge, this is the first time that serious AI researchers have publicly espoused the view that human-level or superhuman AI is impossible—and this in the middle of a period of extremely rapid progress in AI research, when barrier after barrier is being breached. It’s as if a group of leading cancer biologists announced that they had been fooling us all along: They’ve always known that there will never be a cure for cancer.

What could have motivated such a volte-face? The report provides no arguments or evidence whatever. (Indeed, what evidence could there be that no physically possible arrangement of atoms outperforms the human brain?) I suspect that the main reason is tribalism—the instinct to circle the wagons against what are perceived to be “attacks” on AI. It seems odd, however, to perceive the claim that superintelligent AI is possible as an attack on AI, and even odder to defend AI by saying that AI will never succeed in its goals. We cannot insure against future catastrophe simply by betting against human ingenuity.

"But Stuart Russell is just one guy (who's a computer science professor at Berkeley, neurosurgery professor at UCSF, DARPA advisor, and author of the leading textbook on AI)! Facebook's Yann LeCun disagrees!"

Shane Legg, co-founder of Google's DeepMind, is with Russell.

Andrew Ng (roles have included Chief Scientist at Baidu, Google Brain lead, Coursera founder) is with me!

We can keep playing Pokémon with scientists for quite a while (go Steve Omohundro!).

But the very fact that we can is the problem. If we want to know what the AI community as a whole (academia, industry, and government) thinks about AI safety, there is no consensus to point to!

And ordinarily, this would not matter. Why should mere cavemen care about the arcane disputes of shamans? They'll figure it out.

Well. Remember the estimate above of 1/1000 odds of human extinction from climate change? It's 1/10 odds of extinction from unaligned AI.

"Bull. No way that Toby Ord guy is right."

How would you know? How do you know you're not standing at the door of the control room at Chernobyl the night of the disaster, hearing the technicians bickering while Anatoly Dyatlov downplays the risks, downplays even the idea that there are any risks at all?

But forget Ord. Here is what the community thinks of the risk of AI:

[Figure: Sorcerous gibberish]

I can kinda make some sense out of that, but let's defer to a shaman:

  1. Even people working in the field of aligning AIs mostly assign “low” probability (~10%) that unaligned AI will result in human extinction
  2. While some people are still concerned about the superintelligence scenario, concerns have diversified a lot over the past few years
  3. People working in the field don't have a specific unified picture of what will go wrong

That's all very calm. Very sedate. But let's bring it back to Chernobyl on the night of the fatal test. You are a guard outside the control room, hearing some bickering. You show some initiative and boldly enter the room. "Everything okay?" you ask.

One of the technicians replies:

We think there are 10% odds that if we continue the test the reactor will explode.

Another says:

Some of us here are worried about an explosion, but others think some other bad stuff is more likely to happen.

And another:

Bottom line, we are just not sure what will go wrong, but we think something will.

Dyatlov says:

Worrying about an RBMK reactor explosion is like worrying about the Sun turning into a red giant and swallowing the Earth. Continue the test.

The guard, being a sensible person who defers to domain experts when it comes to their subject matter, does... what? What does the guard do? What do you do when experts bicker over what appears to be a life-or-death matter?

I don't know. But for starters, it sounds like some of the experts don't belong in the room. You should stay in the room because technical experts don’t handle high-pressure, low-information situations too well, being used to having the time and leisure to dot every i and cross every t before reaching a conclusion. We have evidence of that. If they could handle such situations, Chernobyl would not have happened, because the technicians would just have beaten Dyatlov up and tossed him out of the room. If they could handle them, Roger Boisjoly would have found a way to prevent the Challenger launch, even if he had to scream or bloody some managers' noses!

This guide starts from there. We start by determining who the Dyatlovs in the room are and evicting them, even if it takes extraordinary, heroic action. Because what some of the experts are saying is that an AI Chernobyl may not be survivable.

In the meantime, tell other laymen!



