The very short version of the AI risk argument is that an AI that is *better than people at achieving arbitrary goals in the real world* would be a very scary thing, because whatever the AI tried to do would then actually happen. As stories of magically granted wishes and sci-fi dystopias point out, it's really hard to specify a goal that can't backfire, and current techniques for training neural networks are generally terrible at specifying goals precisely. If having a wish granted by a genie is dangerous, having a wish granted by a genie that can't hear you clearly is even more dangerous.

Current AI systems certainly fall far short of being able to achieve arbitrary goals in the real world better than people, but there's nothing in physics or mathematics that says such an AI is *impossible*, and progress in AI often takes people by surprise. Nobody knows what the actual deadline is, and unless we humans have a good plan *before* someone makes a scary AI with a goal, things are going to go very badly.

I've posted versions of this two-paragraph argument in various places online and used it in person, and it usually goes over pretty well; I think it explains pretty clearly and simply what the AI x-risk community is actually afraid of. I figured I'd post it here for everyone's convenience.


Okay, I'm gonna take my skeptical shot at the argument, I hope you don't mind! 

> an AI that is *better than people at achieving arbitrary goals in the real world* would be a very scary thing, because whatever the AI tried to do would then actually happen

It's not true that whatever the AI tried to do would happen. What if an AI wanted to travel faster than the speed of light, or prove that 2+2=5, or destroy the sun within one second of being turned on?

You can't just say "arbitrary goals"; you have to actually explain which goals would be realistically achievable by a realistic AI that could actually be built in the near future. If those abilities fall short of "destroy all of humanity", then there is no x-risk.

> As stories of magically granted wishes and sci-fi dystopias point out, it's really hard to specify a goal that can't backfire

This is fictional evidence. Genies don't exist, and if they did, it probably wouldn't be that hard to add enough caveats to your wish to prevent global genocide. A counterexample might be the use of laws: sure, there are loopholes, but none big enough that the law would let you off after a broad-daylight killing spree.

> Current AI systems certainly fall far short of being able to achieve arbitrary goals in the real world better than people, but there's nothing in physics or mathematics that says such an AI is *impossible*

Well, there are laws of physics and mathematics that put limits on available computational power, which in turn put a limit on what an AI can actually achieve. For example, a perfect Bayesian reasoner is forbidden by the laws of mathematics.

> This is fictional evidence. Genies don't exist

Also, it's an argument from selective stupidity. An ASI doesn't have to interpret things literally as a result of cognitive limitations.

Eh, I think it seems fairly easy for an AI to understand what our wish is at a common-sense level; GPT-4 can clearly understand to a degree. However, it has yet to be proven that we can make them care about it (i.e., no deceptive alignment).

Part of the problem is that humans themselves are often bad at knowing what they want. :/

If anyone here has a better phrasing for something in the two paragraphs, feel free to let me know. I'm hoping for something that people can link to, copy/paste, or paraphrase out loud when someone asks why we think AI risk is a real thing.

Something to consider: Most people already agree that AI risk is real and serious. If you're discussing it in areas where it's a fringe view, you're dealing with very unusual people, and might need to put together very different types of arguments, depending on the group. That said...

stop.ai's one-paragraph summary is:

> OpenAI, DeepMind, Anthropic, and others are spending billions of dollars to build godlike AI. Their executives say they might succeed in the next few years. They don't know how they will control their creation, and they admit humanity might go extinct. This needs to stop.

The rest of the website has a lot of well-written stuff.

Some might be receptive to things like Yudkowsky's TED talk:

> Nobody understands how modern AI systems do what they do. They are giant, inscrutable matrices of floating-point numbers that we nudge in the direction of better performance until they inexplicably start working. At some point, the companies rushing headlong to scale AI will cough out something that's smarter than humanity. Nobody knows how to calculate when that will happen. My wild guess is that it will happen after zero to two more breakthroughs the size of transformers.
>
> What happens if we build something smarter than us that we understand that poorly? Some people find it obvious that building something smarter than us that we don't understand might go badly. Others come in with a very wide range of hopeful thoughts about how it might possibly go well. Even if I had 20 minutes for this talk and months to prepare it, I would not be able to refute all the ways people find to imagine that things might go well.
>
> But I will say that there is no standard scientific consensus for how things will go well. There is no hope that has been widely persuasive and stood up to skeptical examination. There is nothing resembling a real engineering plan for us surviving that I could critique. This is not a good place in which to find ourselves.

And of course, you could appeal to authority by linking the CAIS letter, and maybe the Bletchley Declaration if statements from the international community will mean anything.

(None of those are strictly two-paragraph explanations, but I hope it helps anyway.)