In another post, I published four hypotheses on uncontrollable AI as an existential risk. Here I give some background on why I chose this particular framing.

Talking about existential risks from AI with people outside of the AI alignment or effective altruism communities can be quite frustrating. Often, what seems obvious to people in the community is met with deep skepticism by others, making a productive discussion difficult. As a result, what is arguably the most urgent problem we face today is almost completely neglected by politicians and the general scientific community.

I think there are many reasons for this, and I don’t claim to have analyzed them thoroughly, so there is more work to be done here. But from my own experience, there are some key factors that make it difficult for people to see that there is a real problem that needs to be addressed now. The most important one in my opinion is that existential risks from AI are usually linked to terms like “AGI”, “transformative AI (TAI)” or even “superintelligence”. This has several drawbacks. 

First of all, these terms are quite vague, so it is easy for two people to have very different understandings of them. Also, vague problems are much easier to ignore than concrete ones. 

Second, these terms (with the possible exception of TAI) are anthropomorphic: The human mind is defined as the ultimate benchmark of intelligence and it is implicitly assumed that as long as AIs are not “smarter than us”, there is no existential risk. Also, the expected timeline for AGI depends heavily on one’s view of how complex the human mind really is. If, for example, you believe that “consciousness” is in principle unachievable in a machine, you’ll probably think that AGI is impossible, or at least a very long way off, and therefore there is no reason to be concerned. Even if you think that consciousness can in principle be achieved in computers, you might equate “developing AGI” with “simulating the human brain” or at least “fully understanding the workings of the brain”. This is grossly misleading. While it seems plausible that AGI or superintelligence would indeed pose an existential risk, it is by no means clear that superiority to the human mind in most aspects is a necessary condition for that. In particular, “consciousness” in the way philosophers or psychologists usually understand it is probably not needed. 

Third, AGI and superintelligence are often associated with science fiction, so it is easy to dismiss them as “not real”, especially in the absence of concrete proof that we’re close to developing AGI.

Fourth, there is an inherent conflict of interest: Leading labs like DeepMind and OpenAI are committed to developing AGI, so any attempt to ban it or even slow down research because it might pose an existential risk would likely be met with fierce opposition. This may also be true for decision-makers outside of AI who are committed to “free markets” and/or have a strongly positive view of technology in general. For this reason, people concerned about existential risks from AI are sometimes compared to Luddites or ridiculed, as Andrew Ng did with his famous comparison to “fear of overpopulation on Mars”.

To avoid these problems and foster a productive discussion about the benefits and risks of advanced AI, I propose that we talk about the risks of “uncontrollable AI” instead of AGI or superintelligence. By “uncontrollable”, I mean that the AI is able to counter most measures humans take to either limit its capabilities to act or correct its decisions. More details can be found in hypothesis 1 here. Apart from avoiding most of the problems mentioned above, “uncontrollable AI” is clearly a term that invites caution. Most people will see an “uncontrollable” technology as inherently bad and something to be avoided. I guess few AI developers would object to the claim that “uncontrollable AI would be a bad thing”.

Yampolskiy and others have given convincing arguments that any superintelligent AI would be uncontrollable. But it is not at all clear that to be uncontrollable, an AI has to be superintelligent. It may be sufficient that it is good enough at manipulating humans and/or technology to beat us at what I call the “dominance game”. It is currently unclear exactly what the necessary conditions for that are, so this is a promising and important field for research.

It should be obvious that an uncontrollable AI pursuing the wrong goal would pose an existential threat. On the other hand, it may be possible that an uncontrollable AI pursuing the “right” goal could be beneficial to the future of humanity (although I am personally doubtful of that). However, in this case, the burden of proof clearly lies with the one proposing to develop such an AI. It seems reasonable, for example, to ban the development of uncontrollable AI unless it is provably beneficial. Even without a formal ban, a common global understanding among AI developers that uncontrollable AI is to be avoided under all circumstances, at least until the value alignment problem has been solved, would significantly reduce the risk of creating such a system.

By reframing the problem from AGI/superintelligence risks towards risks from uncontrollable AI, I hope that we’ll have a more open and productive discussion about the specific problems in AI development we need to avoid. It might enable us to research in more detail what exactly makes an AI uncontrollable, and where to draw “red lines” so that we can safely develop advanced AI.

Comments

I very much agree with your arguments here for re-focussing public explanations around not developing ‘uncontrollable AI’.

Two other reasons to switch framing:

  1. For control/robotics engineers and software programmers, I can imagine that ‘AGI’ is often a far-fetched idea with no grounding in concrete gears-level principles of engineering and programming. But ‘uncontrollable’ (or unexplainable, or unpredictable) AI is something I imagine many non-ML engineers and programmers in industry would feel intuitively and firmly against. Like, you do not want your software architecture to crash or your manufacturing plant to destabilise because of uncontrollable AI.

  2. Some AI x-safety researchers’ writings and discussions about ‘AGI’ and ‘superintelligence’ seem to have prompted, and lent legitimacy to, initiatives by technology start-up leaders to develop ‘AGI’ they can ‘scientifically understand’ and control with engineering techniques to e.g. solve all the world’s problems. Sam Altman and Elon Musk founded OpenAI after each reading and publicly commending Nick Bostrom’s book ‘Superintelligence’. If we keep publicly talking about how powerful, intelligent and generally utilisable AI could become, but that it might be uncontrollably unsafe, then we risk pulling entrepreneurial techno-optimists’ focus toward the former, exciting-sounding part of our messaging. This leaves the latter part as an afterthought (‘Of course I’ll recruit smart researchers too to make it safe’).

I also think past LessWrong presentations of ideas around a superintelligent singleton with coherent preferences/goal structure have contributed to shaping a kind of ideological bubble that both AI capability researchers and x-safety researchers are engaging in. A coherent single system seems, all else being equal, the easiest to control the parameters of. I have reasons to think that this is a mistaken representation of what generally-capable self-learning machines would actually end up looking like.

Thank you for your comments, which I totally agree with.

I like “rogue AI” over “uncontrollable AI” because it substitutes a one-syllable word for a five-syllable one, but otherwise I agree.

Also, my experience in talking with people about this topic is that most “normies” find AI scary & would prefer it not be developed, but for whatever reason the argument for a singularity or intelligence explosion, in which human-level artificial intelligence is expected to rapidly yield superhuman AGI, is unconvincing or silly-seeming to most people outside this bubble, including technical people. I’m not really sure why.


That's what I have experienced as well. I think one reason is that people find it difficult to imagine exponential growth - it's not something our brains are made for. If we think about the future, we intuitively look at the past and project a linear trend we seem to recognize. 

I also think that if something is a frequent topic in science fiction books and movies, people see it as less likely to become real, so we SF writers may actually make it more difficult to think clearly about the future, even though sometimes developers are inspired by SF. Most of the time, people realize only in hindsight that some SF scenarios may actually come true.

I think it's amazing how fast we go from "I don't believe that will ever be possible" to "that's just normal". I remember buying my first laptop computer with a color display in the nineties. If someone had told me that not much more than ten years later there would be an iPhone with the computing power of a supercomputer in my pocket, I'd have shaken my head in disbelief.

I kind of feel like it’s the opposite: people actually do anchor their imagination about the future on science fiction, and this is part of the problem here. Lots of science fiction features a world with a bunch of human-level AIs walking around but where humans are still comfortably in charge and non-obsolete, even though it’s hard to argue for why this would actually happen.

Yes, that's also true: There is always a lonely hero who in the end puts the AGI back into the box or destroys it. Nothing would be more boring than writing a novel about how in reality the AGI just kills everyone and wins. :( I think both are possible - that people imagine the wrong future and at the same time don't take it seriously.