Cross-posted to the EA forum.
- In August 2020, we conducted an online survey of prominent AI safety and governance researchers. You can see a copy of the survey at this link.
- We sent the survey to 135 researchers at leading AI safety/governance research organisations (including AI Impacts, CHAI, CLR, CSER, CSET, FHI, FLI, GCRI, MILA, MIRI, Open Philanthropy and PAI) and a number of independent researchers. We received 75 responses, a response rate of 56%.
- The survey aimed to identify which AI existential risk scenarios (which we will refer to simply as “risk scenarios”) those researchers find most likely, in order to (1) help with prioritising future work on exploring AI risk scenarios, and (2) facilitate discourse and understanding within the AI safety and governance community, including between researchers who have different views.
- In our view, the key result is that there was considerable disagreement among researchers about which risk scenarios are the most likely, and high uncertainty expressed by most individual researchers about their estimates.
- This suggests that there is a lot of value in exploring the likelihood of different AI risk scenarios in more detail, especially given the limited scrutiny that most scenarios have received. This could look like:
- Fleshing out and analysing the scenarios mentioned in this post which have received less scrutiny.
- Doing more horizon scanning or trying to come up with other risk scenarios, and analysing them.
- At this time, we are only publishing this abbreviated version of the results. We have a version of the full results that we may publish at a later date. Please contact one of us if you would like access to this, and include a sentence on why the results would be helpful or what you intend to use them for.
- We welcome feedback on any aspects of the survey.
It has been argued that AI could pose an existential risk. The original risk scenarios were described by Nick Bostrom and Eliezer Yudkowsky. More recently, these have been criticised, and a number of alternative scenarios have been proposed. There has been some useful work exploring these alternative scenarios, but much of this is informal. Most pieces are only presented as blog posts, with neither the detail of a book, nor the rigour of a peer-reviewed publication. For further discussion of this dynamic, see work by Ben Garfinkel, Richard Ngo and Tom Adamczewski.
The result is that it is no longer clear which AI risk scenarios experts find most plausible. We think this state of affairs is unsatisfactory for at least two reasons. First, since many of the proposed scenarios seem underdeveloped, there is room for further work analysing them in more detail. But this is time-consuming and there is a wide range of scenarios that could be analysed, so knowing which scenarios leading experts find most plausible is useful for prioritising this work. Second, since the views of top researchers will influence the views of the broader AI safety and governance community, it is important to make the full spectrum of views more widely available. The survey is intended to be a first step in this direction.
We asked researchers to estimate the probability of five AI risk scenarios, conditional on an existential catastrophe due to AI having occurred. There was also a catch-all “other scenarios” option.
These were the five scenarios we asked about, and the descriptions we gave in the survey:
- A single AI system with goals that are hostile to humanity quickly becomes sufficiently capable for complete world domination, and causes the future to contain very little of what we value, as described in “Superintelligence”.
- Part 2 of “What failure looks like”
- This involves multiple AIs accidentally being trained to seek influence, and then failing catastrophically once they are sufficiently capable, causing humans to become extinct or otherwise permanently lose all influence over the future.
- Part 1 of “What failure looks like”
- This involves AIs pursuing easy-to-measure goals, rather than the goals humans actually care about, causing us to permanently lose some influence over the future (excluding cases where the “Superintelligence” scenario or Part 2 of “What failure looks like” also occur).
- Some kind of war between humans, exacerbated by developments in AI, causes an existential catastrophe. AI is a significant risk factor in the catastrophe, such that no catastrophe would have occurred without the developments in AI. The proximate cause of the catastrophe is the deliberate actions of humans, such as the use of AI-enabled, nuclear or other weapons. See Dafoe (2018) for more detail.
- Intentional misuse of AI by one or more actors causes an existential catastrophe (excluding cases where the catastrophe was caused by misuse in a war that would not have occurred without developments in AI). See Karnofsky (2016) for more detail.
We chose these five scenarios because they have been most prominent in previous discussions about different AI risk scenarios. For more details about the survey, you can find a copy of it at this link.
There was considerable disagreement among researchers about which risk scenarios are most likely
If you compare the median responses across scenarios, the (conditional) probabilities are fairly similar (between 10% and 12.5% for the five given scenarios, and 20% for “other scenarios”). However, individual responses vary greatly around those medians. For instance, most respondents thought at least one scenario was quite unlikely:
- 96% of respondents assigned ≤10% (conditional) probability to at least one scenario.
- 89% of respondents assigned ≤10% (conditional) probability to at least two scenarios.
- 64% of respondents assigned ≤10% (conditional) probability to at least three scenarios.
There were a number of outliers: for each scenario, at least one respondent estimated it to have ≥70% (conditional) probability.
For each scenario (including “other scenarios”), the mean absolute deviation of responses was somewhere between 9% and 18%.
- E.g. for the “Superintelligence” scenario, the mean absolute deviation was 13%. This means that the average (absolute) distance from the mean estimate was 13 percentage points.
- To help interpret this: recall that the means are all between 15% and 25% (see footnote 4), so the mean absolute deviations are relatively large compared to the (conditional) probabilities themselves.
For each scenario (including “other scenarios”), the interquartile range of responses was somewhere between 15% and 31%.
- E.g. for the “Superintelligence” scenario, the first quartile response was 5% and the third quartile response was 20% (so the interquartile range was 15%).
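As a minimal sketch of how the two dispersion statistics above are computed, the snippet below uses Python's standard library. The function names and the `example_responses` list are hypothetical illustrations, not the survey data, and the "inclusive" quartile method is an assumption, since the post does not say which quartile convention was used.

```python
from statistics import mean, quantiles


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    centre = mean(values)
    return mean(abs(v - centre) for v in values)


def interquartile_range(values):
    """Third quartile minus first quartile.

    Assumption: the 'inclusive' quartile method; other conventions
    can give slightly different quartiles on small samples.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical responses in percentage points for one scenario.
# These are made-up numbers for illustration, NOT the survey responses.
example_responses = [0, 5, 5, 10, 10, 15, 20, 20, 50]

print(mean_absolute_deviation(example_responses))
print(interquartile_range(example_responses))
```

Both statistics are expressed in percentage points, so they can be read directly against the mean and median estimates for the same scenario.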
These statistics suggest considerable disagreement among researchers about which risk scenarios are the most likely.
Researchers are uncertain about which risk scenarios are most likely
The median self-reported confidence level given by respondents was 2, on a seven-point Likert scale from 0 to 6, where:
- Confidence level 0 was labelled “completely uncertain, I selected my answers randomly”, and
- Confidence level 6 was labelled “completely certain, like probability estimates for a fair dice”.
Researchers put substantial credence on “other scenarios”
The “other scenarios” option had the highest median probability, at 20%. Some researchers left free-form comments describing these other scenarios. Most of them have seen no public write-up, and the others have been explored in less detail than the five scenarios we asked about.
Together, these three results suggest that there is a lot of value in exploring the likelihood of different risk scenarios in more detail. This could look like:
- Fleshing out and analysing the scenarios mentioned in this post, in more detail.
- This seems especially important given that the median and mean probability estimates were similar for all the scenarios, and yet the “Superintelligence” scenario has received far more scrutiny than the others.
- Doing more horizon scanning or trying to come up with other risk scenarios, and analysing them.
- Other than the “Superintelligence” scenario, almost all other risk scenarios (including those mentioned in this post) were only made salient in the last four years or so.
Recent “failure stories” by Andrew Critch and Paul Christiano - which seem to have been well-received and appreciated - also suggest that there is value in exploring different risk scenarios in more detail. Likewise, Rohin Shah advocates for this kind of work, and AI Impacts has recently compiled a collection of stories to clarify, explore or appreciate possible future AI scenarios.
One important caveat is the tractability of exploring the likelihood of different AI risk scenarios in more detail. The existence of considerable disagreement, despite there having been some attempts to clarify and discuss these issues, could suggest that making progress on this is difficult. However, we think there has been relatively little effort towards this kind of work so far, and that there is still a lot of low-hanging fruit.
Additionally, there were a number of limitations in the survey design, which are summarised in this document. If we were to run the survey again, we would do many things differently. Whilst we think that our main findings stand up to these limitations, we nonetheless advise taking them cautiously, and as just one piece of evidence - among many - about researchers’ views on AI risk.
Other notable results
Most of this community’s discussion about existential risk from AI focuses on scenarios involving one or more powerful, misaligned AI systems that take control of the future. This kind of concern is articulated most prominently in “Superintelligence” and “What failure looks like”, corresponding to three scenarios in our survey (the “Superintelligence” scenario, part 1 and part 2 of “What failure looks like”). The median respondent’s total (conditional) probability on these three scenarios was 50%, suggesting that this kind of concern about AI risk is still prevalent, but far from the only kind of risk that researchers are concerned about today.
69% of respondents reported that they have lowered their probability estimate for the “Superintelligence” scenario (as described above) since the first year they were involved in AI safety/governance. This may be because they now assign relatively higher probabilities to other risk scenarios happening first, and not necessarily because they think that fast takeoff or other premises of the “Superintelligence” scenario are less plausible than they originally did.
At this time, we are only publishing this abbreviated version of the results. We have a version of the full results that we may publish at a later date. Please contact one of us if you would like access to this, and include a sentence on why the results would be helpful or what you intend to use them for.
We would like to thank all researchers who participated in the survey. We are also grateful for valuable comments and feedback from JJ Hepburn, Richard Ngo, Ben Garfinkel, Max Daniel, Rohin Shah, Jess Whittlestone, Rafe Kennedy, Spencer Greenberg, Linda Linsefors, David Manheim, Ross Gruetzemacher, Adam Shimi, Markus Anderljung, Chris McDonald, David Kreuger, Paolo Bova, Vael Gates, Michael Aird, Lewis Hammond, Alex Holness-Tofts, Nicholas Goldowsky-Dill, the GovAI team, the AI:FAR group, and anyone else we ought to have mentioned here. This project grew out of AISC and FHI SRF. All errors are our own.
We will not look at any responses from now on; this is intended just to show what questions were asked, and in case any readers are interested in thinking through their own responses. ↩︎
AI existential risk scenarios are sometimes called threat models. ↩︎
Bostrom describes many scenarios in the book “Superintelligence”. We think that this scenario is the one that most people remember from the book, but nonetheless, we think it was probably a mistake to refer to this particular scenario by this name. ↩︎
Likewise, the mean responses for the five given scenarios are all between 15% and 18%, and the mean response for “other scenarios” was 25%. ↩︎
Other similar results: 77% of respondents assigned ≤5% (conditional) probability to at least one scenario; 51% of respondents assigned ≤5% (conditional) probability to at least two scenarios. ↩︎
For another way of interpreting this, consider that if respondents were evenly split into six completely “polarised” camps, each of which put 100% probability on one option and 0% on the others, then the mean absolute deviation for each scenario would be ~28%. ↩︎
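The ~28% figure in this footnote can be checked with a short calculation. This is a sketch under the footnote's own assumption of equally sized camps; the function name `polarised_mad` is hypothetical.

```python
from fractions import Fraction


def polarised_mad(num_options):
    """Mean absolute deviation for one option when respondents split evenly
    into `num_options` camps, each putting 100% on a different option."""
    share = Fraction(1, num_options)  # fraction of respondents in each camp
    # One camp answers 1 (100%) for this option; everyone else answers 0.
    mean_response = share * 1 + (1 - share) * 0
    return share * abs(1 - mean_response) + (1 - share) * abs(0 - mean_response)


mad = polarised_mad(6)  # five given scenarios plus "other scenarios"
print(mad, f"{float(mad):.1%}")  # 5/18 27.8%
```

With six camps the mean response for each option is 1/6, so the deviation is (1/6)(5/6) + (5/6)(1/6) = 5/18, or roughly 28 percentage points.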
As per footnote 3, the particular scenario we are referring to here is not the only scenario described in “Superintelligence”. ↩︎
This technically seems to include cases like: AGI is not developed by 2050, and a nuclear war in the year 2050 causes an existential catastrophe, but if an aligned AGI had been developed by then, it would have prevented the nuclear war. I don't know if respondents interpreted it that way.
Thanks for pointing this out. We did intend for cases like this to be included, but I agree that it's unclear if respondents interpreted it that way. We should have clarified this in the survey instructions.
Thanks for your comment! I think your critique is justified.
My best guess is that this consideration was not salient for most participants and probably didn't distort the results in meaningful ways, but it's of course hard to tell and DanielFilan's comment suggests that it was not irrelevant.
We are aware of a number of other limitations, especially with regards to the mutual exclusivity of different scenarios. We've summarized these limitations here.
Overall, you should take the results with a grain of salt. They should only be seen as signposts indicating which scenarios people find most plausible.
As a respondent, I remember being unsure whether I should include those catastrophes.
That seems like a really bad conflation? Is one question combining the risk of "too much" AI use and "too little" AI use?
That's even worse than the already widely smashed distinctions between "can we?", "should we?" and "will we?"
Yes, it is. Combining these cases seems reasonable to me, though we definitely should have clarified this in the survey instructions. They're both cases where humanity could have avoided an existential catastrophe by making different decisions with respect to AI.
But the action needed to avoid/mitigate in those cases is very different, so it doesn't seem useful to get a feeling for "how far off of ideal are we likely to be" when that is composed of:
1. What is the possible range of AI functionality (as constrained by physics)? - ie what can we do?
2. What is the range of desirable outcomes within that range? - ie what should we do?
3. How will politics, incumbent interests, etc. play out? - ie what will we actually do?
Knowing that experts think we have a (say) 10% chance of hitting the ideal window says nothing about what an interested party should do to improve those chances. It could be "attempt to shut down all AI research" or "put more funding into AI research" or "it doesn't matter because the two majority cases are 'General AI is impossible - 40%' and 'General AI is inevitable and will wreck us - 50%'".
Thanks for the reply - a couple of responses:
No, these cases aren't included. The definition is: "an existential catastrophe that could have been avoided had humanity's development, deployment or governance of AI been otherwise". Physics cannot be changed by humanity's development/deployment/governance decisions. (I agree that cases 2 and 3 are included).
That's correct. The survey wasn't intended to understand respondents' views on interventions. It was only intended to understand: if something goes wrong, what do respondents think that was? Someone could run another survey that asks about interventions (in fact, this other recent survey does that). For the reasons given in the Motivation section of this post, we chose to limit our scope to threat models, rather than interventions.
Planned summary for the Alignment Newsletter:
(Moderation note: added to the Alignment Forum from LessWrong.)
Great project. I'd love to hear more details. Somehow I missed this post when it was released but it was pointed out to me yesterday.
I've been developing a project for the past couple of months that lines up quite closely (specifically, the goal of exploring additional scenarios as you highlighted in the takeaways). I have a very short time horizon for the completion of this particular part of the project (which has interfered with refining the survey as much as I'd have liked) but I'd be happy to share any results.
The project I've been cobbling together is broadly similar. I compiled a list of AI scenario "dimensions" (key aspects of different scenarios), with three to four conditions for each dimension. Conditions are basically the directions each dimension could go in a scenario (e.g. "takeoff" would be a dimension, and "fast, slow, moderate (controlled), and (uncontrolled)" would be example conditions; or "AI paradigm" would be a dimension, and deep learning, hybrid, new paradigm, embodiment, and deep learning plus something else would be the conditions).
The plan so far is to try and get judgments on the individual components or plausible components of each scenario and then use a scenario mapping tool (based on GMA, with some variations) to cluster all possible combinations.
I have a longer version of the survey for both impact and likelihood, and a short version for just likelihood that's easier to complete. GMA doesn't usually use elicitation, so this could be interesting, but thus far the questions have been a challenge.
This should provide a large grouping of possible combinations to explore. I'm requesting likelihood (and impact) rankings on each, which should reduce the number of options, and then we can parse different clusters to explore unique potential futures (without rankings, the output is in the tens of millions of options). A more detailed overview is here if you're curious, or shoot me a direct message. I hope to try and put together a more comprehensive version later in the year with other data sources as well.