
I think one reason machine learning researchers don't think AI x-risk is a problem is that they haven't given it the time of day. And on some level, they may be right not to!

We all need to do meta-level reasoning about what to spend our time and effort on. If you value your time, even giving an idea or argument the time of day requires it to cross a fairly high bar. Ultimately, in evaluating whether a putative issue is worth considering (like the extinction of humanity at the hands (graspers?) of a rogue AI), you must rely on heuristics; by the time you've given the argument the time of day, you've already conceded a significant amount of resources to it! Moreover, you risk privileging the hypothesis or falling victim to Pascal's Mugging.

Unfortunately, the case for x-risk from out-of-control AI systems seems to fail many powerful and generally accurate heuristics. At first glance, this can put proponents of the issue in a position similar to that of flat-earth conspiracy theorists. My goal here is to enumerate the heuristics that arguments for AI takeover scenarios fail.

Ultimately, I think machine learning researchers should not refuse to consider AI x-risk when presented with a well-made case by a person they respect or have a personal relationship with, but I'm ambivalent as to whether they have any obligation to consider the case if they've only seen a few headlines about Elon. I do find it a bit hard to understand how one avoids thinking about the consequences of super-human AI, since the topic seems obviously impactful and fascinating. But I'm a very curious (read "distractible") person...


A list of heuristics that say not to worry about AI takeover scenarios:

  • Outsiders not experts: This concern is being voiced exclusively by non-experts like Elon Musk, Stephen Hawking, and the talkative crazy guy next to you on the bus.
  • Luddism has a poor track record: For every new technology, there's been a pack of alarmist naysayers and doomsday prophets. And then, instead of falling apart, the world got better.
  • No concrete threat model: When someone raises a hypothetical concern but can't give you a good explanation for how it could actually happen, it's much less likely to actually happen. Is the paperclip maximizer the best you can do?
  • It's straight out of science fiction: AI researchers didn't come up with this concern, Hollywood did. Science fiction is constructed based on entertaining premises, not realistic capabilities of technologies.
  • It's not empirically testable: There's no way to falsify the belief that AI will kill us all. It's purely a matter of faith. Such beliefs don't have good track records of matching reality.
  • It's just too extreme: Whenever we hear an extreme prediction, we should be suspicious. To the extent that extreme changes happen, they tend to be unpredictable. While extreme predictions sometimes contain a seed of truth, reality tends to be more mundane and boring.
  • It has no grounding in my personal experience: When I train my AI systems, they are dumb as doorknobs. You're telling me they're going to be smarter than me? In a few years? So smart that they can outwit me, even though I control the very substrate of their existence?
  • It's too far off: It's too hard to predict the future and we can't really hope to anticipate specific problems with future AI systems; we're sure to be surprised! We should wait until we can envision more specific issues, scenarios, and threats, not waste our time on what comes down to pure speculation.

I'm pretty sure this list is incomplete, and I plan to keep adding to it as I think of or hear new suggestions. Suggest away!

Also, to be clear, I am writing these descriptions from the perspective of someone who has had very limited exposure to the ideas underlying concerns about AI takeover scenarios. I think a lot of these reactions indicate significant misunderstandings both about what people working on mitigating AI x-risk believe and about matters of fact (e.g. a number of experts have voiced concerns about AI x-risk, and a significant portion of the research community seems to agree that these concerns are at least somewhat plausible and important).

Comments:

Here's another: AI being x-risky makes me the bad guy.

That is, if I'm an AI researcher and someone tells me that AI poses x-risks, I might hear that as being told I'm a bad person for working on something that makes the world worse. This is threatening because I derive important parts of my sense of self from being an AI researcher: it's my profession, my source of income, my primary source of status, and a huge part of what makes my life meaningful to me. If what I am doing is bad or dangerous, much of that is at risk (assuming I also want to think of myself as a good person, I would have to either stop doing AI work or stop thinking of myself as good), and an easy solution is to dismiss the arguments.

This is more generally a kind of motivated cognition or rationalization, but I think it's worth considering a specific mechanism because it better points towards ways you might address the objection.

This doesn't seem like it belongs on a "list of good heuristics", though!

Another important improvement I should make: rephrase these to have the type signature of "heuristic"!

I pushed this post out since I think it's good to link to it in this other post. But there are at least 2 improvements I'd like to make and would appreciate help with:

I helped make this list in 2016 for a post by Nate, partly because I was dissatisfied with Scott's list (which includes people like Richard Sutton, who thinks worrying about AI risk is carbon chauvinism):

Stuart Russell’s Cambridge talk is an excellent introduction to long-term AI risk. Other leading AI researchers who have expressed these kinds of concerns about general AI include Francesca Rossi (IBM), Shane Legg (Google DeepMind), Eric Horvitz (Microsoft), Bart Selman (Cornell), Ilya Sutskever (OpenAI), Andrew Davison (Imperial College London), David McAllester (TTIC), and Jürgen Schmidhuber (IDSIA).

These days I'd probably make a different list, including people like Yoshua Bengio. AI risk stuff is also sufficiently in the Overton window that I care more about researchers' specific views than about "does the alignment problem seem nontrivial to you?". Even if we're just asking the latter question, I think it's more useful to list the specific views and arguments of individuals (e.g., note that Rossi is more optimistic about the alignment problem than Russell), list the views and arguments of the similarly prominent CS people who think worrying about AGI is silly, and let people eyeball which people they think tend to produce better reasons.

Is there a better reference for "a number of experts have voiced concerns about AI x-risk"? I feel like there should be by now...

I hope someone actually answers your question, but FWIW, the Asilomar principles were signed by an impressive list of prominent AI experts. Five of the items are related to AGI and x-risk. The statements aren't really strong enough to declare that those people "voiced concerns about AI x-risk", but it's a data-point for what can be said about AI x-risk while staying firmly in the mainstream.

My experience in casual discussions is that it's enough to just name one example to make the point, and that example is of course Stuart Russell. When talking to non-ML people—who don't know the currently-famous AI people anyway—I may also mention older examples like Alan Turing, Marvin Minsky, or Norbert Wiener.

Thanks for this nice post. :-)

Yeah I've had conversations with people who shot down a long list of concerned experts, e.g.:

  • Stuart Russell is GOFAI ==> out-of-touch
  • Shane Legg doesn't do DL, does he even do research? ==> out-of-touch
  • Ilya Sutskever (and everyone at OpenAI) is crazy, they think AGI is 5 years away ==> out-of-touch
  • Anyone at DeepMind is just marketing their B.S. "AGI" story or drank the koolaid ==> out-of-touch

But then, even the big 5 of deep learning have all said things that can be used to support the case....

So it kind of seems like there should be a compendium of quotes somewhere, or something.

Sounds like their problem isn't just misleading heuristics, it's motivated cognition.

Oh sure, in some special cases. I don't think this experience was particularly representative.

Sort of related to a couple of points you already brought up (no grounding in personal experience, outsiders not experts, science fiction): worrying about AI x-risk is also weird. It's not a thing everyone else is worrying about, so publicly worrying about it spends some of your weirdness points, and most people have very low weirdness budgets (not enough status to afford more weirdness, low psychological openness, etc.).

Flo's summary for the Alignment Newsletter:

Because human attention is limited and a lot of people try to convince us of the importance of their favourite cause, we cannot engage with everyone’s arguments in detail. Thus we have to rely on heuristics to filter out insensible arguments. Depending on the form of exposure, the case for AI risks can fail on many of these generally useful heuristics, eight of which are detailed in this post. Given this outside view perspective, it is unclear whether we should actually expect ML researchers to spend time evaluating the arguments for AI risk.

Flo's opinion:

I can remember being critical of AI risk myself for similar reasons and think that it is important to be careful with the framing of pitches to keep these heuristics from firing. This is not to say that we should avoid criticism of the idea of AI risk, but criticism is a lot more helpful if it comes from people who have actually engaged with the arguments.

My opinion:

Even after knowing the arguments, I find six of the heuristics quite compelling: technology doomsayers have usually been wrong in the past, there isn't a concrete threat model, it's not empirically testable, it's too extreme, it isn't well grounded in my experience with existing AI systems, and it's too far off to do useful work now. All six make me distinctly more skeptical of AI risk.

There is an issue of definition here. There are categories of scenarios where it is unclear whether they constitute an "AI takeover", even though a real and likely risk of some type is recognized. Almost everyone stakes out a position at one of the binary extremes of outcome, good or bad, without much consideration of the plausible quasi-equilibrium states in the middle that fall out of some risk models. For researchers working on those middle-ground models, the debate will feel a bit like a false dichotomy.

As another heuristic: the inability to arrive at a common set of elementary computational assumptions, grounded in physics, from which AI risk models are derived is sufficient reason to be skeptical of any particular AI risk model, without knowing much else.

I think this list is interesting and potentially useful, and I think I'm glad you put it together. I also generally think it's a good and useful norm for people to seriously engage with the arguments they (at least sort-of/overall) disagree with.

But I'm also a bit concerned about how this is currently presented. In particular:

  • This is titled "A list of good heuristics that the case for AI x-risk fails".
  • The heuristics themselves are stated as facts, not as something like "People may believe that..." or "Some claim that..." (using words like "might" could also help).
    • A comment of yours suggests you've already noticed this. But I think it'd be pretty quick to fix.
  • Your final paragraph, a very useful caveat, comes after listing all the heuristics as facts.

I think these things will have relatively small downsides, given the likely quite informed and attentive audience here. But a bunch of psychological research I read a while ago (2015-2017) suggests there could still be some downside. E.g.:

Information that initially is presumed to be correct, but that is later retracted or corrected, often continues to influence memory and reasoning. This occurs even if the retraction itself is well remembered. The present study investigated whether the continued influence of misinformation can be reduced by explicitly warning people at the outset that they may be misled. A specific warning--giving detailed information about the continued influence effect (CIE)--succeeded in reducing the continued reliance on outdated information but did not eliminate it. A more general warning--reminding people that facts are not always properly checked before information is disseminated--was even less effective. In an additional experiment, a specific warning was combined with the provision of a plausible alternative explanation for the retracted information. This combined manipulation further reduced the CIE but still failed to eliminate it altogether.

And also:

Information presented in news articles can be misleading without being blatantly false. Experiment 1 examined the effects of misleading headlines that emphasize secondary content rather than the article’s primary gist. [...] We demonstrate that misleading headlines affect readers’ memory, their inferential reasoning and behavioral intentions, as well as the impressions people form of faces. On a theoretical level, we argue that these effects arise not only because headlines constrain further information processing, biasing readers toward a specific interpretation, but also because readers struggle to update their memory in order to correct initial misconceptions.

Based on that sort of research (for a tad more info on it, see here), I'd suggest:

  • Renaming this to something like "A list of heuristics that suggest the case for AI x-risk is weak" (or even "fails", if you've said something like "suggest" or "might")
  • Rephrasing the heuristics so they're stated as disputable (or even false) claims, rather than facts. E.g., "Some people may believe that this concern is being voiced exclusively by non-experts like Elon Musk, Stephen Hawking, and the talkative crazy guy next to you on the bus." ETA: Putting them in quote marks might be another option for that.
  • Moving what's currently the final paragraph caveat to before the list of heuristics.
  • Perhaps also adding sub-points about the particularly disputable dot points. E.g.:
    • "(But note that several AI experts have now voiced concern about the possibility of major catastrophes from advanced AI system, although there's still not consensus on this.)"

I also recognise that several of the heuristics really do seem good, and probably should make us at least somewhat less concerned about AI. So I'm not suggesting trying to make the heuristics all sound deeply flawed. I'm just suggesting perhaps being more careful not to end up with some readers' brains, on some level, automatically processing all of these heuristics as definite truths that definitely suggest AI x-risk isn't worthy of attention.

Sorry for the very unsolicited advice! It's just that preventing gradual slides into false beliefs (including from well-intentioned efforts that do actually contain the truth in them!) is sort of a hobby-horse of mine.

Also, one other heuristic/proposition that, as far as I'm aware, is simply factually incorrect (rather than "flawed but in debatable ways" or "actually pretty sound") is "AI researchers didn't come up with this concern, Hollywood did. Science fiction is constructed based on entertaining premises, not realistic capabilities of technologies." So there, it may also be worth pointing out in some manner that, in reality, prominent AI researchers raised somewhat similar concerns quite early on.

E.g., I. J. Good apparently wrote in 1959:

Whether [an intelligence explosion] will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.