This is my list of resources I send to machine learning (ML) researchers when presenting arguments about AI safety. New resources have been coming out fast, and I've also been user-testing these, so the top part of this post contains my updated (Nov 2022) recommendations. The rest of the post (originally posted June 2022) has been reorganized but mostly left for reference; I make occasional additions to it (last updated June 2023).

Core recommended resources

Core readings for ML researchers[1]

Overall

Arguments for risk from advanced AI systems

Orienting 

Research directions

 

Core readings for the public

 

Core readings for EAs

(Readings that are more philosophical, involving x-risk and discussion of AGI-like systems. I expect ML researchers to like these less (I have some limited data suggesting this), but they're anecdotally well-liked by EAs.)

Getting involved for EAs

If you haven't read Charlie's writeup about research or Gabe's writeup about engineering, they're worth a look! Richard Ngo's AGI safety career advice is also good. Also, if you're interested in theory, see John Wentworth's writeup about independent research, and Vivek wrote some alignment exercises to try (also see John Wentworth's work in general). With respect to outreach, I'd try a more technical pitch than the one Vael used; I think Sam Bowman's pitch is pretty great, and Marius also has a nice writeup of his pitch (not specific to NLP).


Full list of recommended resources

These reading choices are drawn from various other reading lists (see also Victoria Krakovna's); this isn't original in any way, just something to draw from if you're trying to send someone some of the more accessible resources.

Public-oriented

Central Arguments

Technical Work on AI alignment

How does this lead to x-risk / killing people, though?

Forecasting (When might advanced AI be developed?)

Calibration and Forecasting

Common Misconceptions

Counterarguments to AI safety (messy doc): 

Collection of public surveys about AI


 

Miscellaneous older text

Text I'm no longer actively using, but still refer back to sometimes.
 

If you’re interested in getting into this:



 

Introduction to large-scale risks to humanity, including "existential risks" that could lead to the extinction of humanity

Chapter 3 is on natural risks, including risks of asteroid and comet impacts, supervolcanic eruptions, and stellar explosions. Ord argues that we can appeal to the fact that we have already survived for 2,000 centuries as evidence that the total existential risk posed by these threats from nature is relatively low (less than one in 2,000 per century).

Chapter 4 is on anthropogenic risks, including risks from nuclear war, climate change, and environmental damage. Ord estimates these risks as significantly higher, each posing about a one in 1,000 chance of existential catastrophe within the next 100 years. However, the odds are much higher that climate change will result in non-existential catastrophes, which could in turn make us more vulnerable to other existential risks.

Chapter 5 is on future risks, including engineered pandemics and artificial intelligence. Worryingly, Ord puts the risk of engineered pandemics causing an existential catastrophe within the next 100 years at roughly one in thirty. With any luck the COVID-19 pandemic will serve as a "warning shot," making us better able to deal with future pandemics, whether engineered or not. Ord's discussion of artificial intelligence is more worrying still. The risk here stems from the possibility of developing an AI system that both exceeds every aspect of human intelligence and has goals that do not coincide with our flourishing. Drawing upon views held by many AI researchers, Ord estimates that the existential risk posed by AI over the next 100 years is an alarming one in ten.

Chapter 6 turns to questions of quantifying particular existential risks (some of the probabilities cited above do not appear until this chapter) and of combining these into a single estimate of the total existential risk we face over the next 100 years. Ord's estimate of the latter is one in six.

 

How AI could be an existential risk

  • AI alignment researchers disagree a surprisingly large amount about how AI could constitute an existential risk, so I hardly think the question is settled. Some plausible scenarios people are considering (copied from the paper):
  • "Superintelligence"
    • A single AI system with goals that are hostile to humanity quickly becomes sufficiently capable for complete world domination, and causes the future to contain very little of what we value, as described in “Superintelligence”. (Note from Vael: Where the AI has an instrumental incentive to destroy humans and uses its planning capabilities to do so, for example via synthetic biology or nanotechnology.)
  • Part 2 of “What failure looks like”
    • This involves multiple AIs accidentally being trained to seek influence, and then failing catastrophically once they are sufficiently capable, causing humans to become extinct or otherwise permanently lose all influence over the future. (Note from Vael: I think we might have to pair this with something like "and in loss of control, the environment then becomes uninhabitable to humans through pollution or consumption of important resources for humans to survive")
  • Part 1 of “What failure looks like”
    • This involves AIs pursuing easy-to-measure goals, rather than the goals humans actually care about, causing us to permanently lose some influence over the future. (Note from Vael: I think we might have to pair this with something like "and in loss of control, the environment then becomes uninhabitable to humans through pollution or consumption of important resources for humans to survive")
  • War
    • Some kind of war between humans, exacerbated by developments in AI, causes an existential catastrophe. AI is a significant risk factor in the catastrophe, such that no catastrophe would have occurred without the developments in AI. The proximate cause of the catastrophe is the deliberate actions of humans, such as the use of AI-enabled, nuclear, or other weapons. See Dafoe (2018) for more detail. (Note from Vael: Though there's a recent argument that nuclear weapons may be unlikely to cause an extinction event, and that it would instead "just" be catastrophically bad. One could probably still do it with synthetic biology, though, to reach all of the remote people.)
  • Misuse
    • Intentional misuse of AI by one or more actors causes an existential catastrophe (excluding cases where the catastrophe was caused by misuse in a war that would not have occurred without developments in AI). See Karnofsky (2016) for more detail.
  • Other
     

 

Governance, aimed at highly capable systems in addition to today's systems

It seemed like a lot of your thoughts about AI risk went through governance, so I wanted to mention what that space looks like (spoiler: it's preparadigmatic), in case you haven't seen it yet!

 

AI Safety in China

AI Safety community building, student-focused (see academic efforts above)

 

If they're curious about other existential / global catastrophic risks:

Large-scale risks from synthetic biology 

Large-scale risks from nuclear

Why I don't think we're on the right timescale to worry most about climate change:


List for "Preventing Human Extinction" class

I've also included a list of resources that I had students read through for the Stanford first-year course "Preventing Human Extinction".

When might advanced AI be developed?

Why might advanced AI be a risk?

Thinking about making advanced AI go well (technical)

Thinking about making advanced AI go well (governance)

Optional (large-scale risks from AI)

Natural science sources

 

  1. ^

    See https://www.lesswrong.com/posts/gpk8dARHBi7Mkmzt9/what-ai-safety-materials-do-ml-researchers-find-compelling

  2. ^

    I swear I didn't set out to self-promote here-- it's just doing weirdly well in user testing with both EAs and ML researchers at the moment (this is partly because it's relatively current; I expect it'll do less well over time).

    Note: I've written a new version of this talk that goes over the AI risk arguments through March 2023, and there's a new website talking about my interview findings (ai-risk-discussions.org).

  3. ^

    Hi X, 

    [warm introduction]

    In the interests of increasing your options, I wanted to reach out and say that I'd be particularly happy to help you explore synthetic biology pathways more, if you were so inclined. I think it's pretty plausible we'll see another, worse pandemic in our lifetimes, and it's worth investing a career, or part of one, in working on it. Especially since so few people will make that choice, a single person probably matters a lot compared to entering other, more popular careers.

    No worries if you're not interested though-- this is just one option out of many. I'm emailing you in a batch instead of individually so that hopefully you feel empowered to ignore this email and be done with this class :P. Regardless, thanks for a great quarter and hope you have great summers!

    If you are interested:

Comments

I notice that Eliezer and MIRI are missing. Why is this? Low prestige amongst the academic community? Harsh writing style?

I don't mean to open a can of worms or anything. It just seems worth engaging with reality and not shying away from it.

A great point, thanks! I've just edited the "There's also a growing community working on AI alignment" section to include MIRI, and also edited some of the academics' names and links.

I don't think it makes sense for me to list Eliezer's name in the part of that section where I'm listing names, since I'm only listing some subset of academics who (vaguely gesturing at a cluster) are sort of actively publishing in academia, mostly tenure track and actively recruiting students, and interested in academic field-building. I'm not currently listing names of researchers in industry or non-profits (e.g. I don't list Paul Christiano, or Chris Olah), though that might be a thing to do. 

Note that I didn't choose this list of names very carefully, so I'm happy to take suggestions! This doc came about because I had an email draft that I was haphazardly adding things to as I talked to researchers and needed to send them resources promptly, and it got gradually refined as I spotted issues. I thus consider it a work-in-progress and appreciate suggestions.

With respect to the fact that I don't immediately point people at LessWrong or the Alignment Forum (I actually only very rarely include the "Rationalist" section in the email-- not unless I've decided to bring it up in person and they've reacted positively), there are different philosophies on AI alignment field-building. One of the active disagreements right now is how much we want new people coming into AI alignment to be the type of person who enjoys LessWrong, or whether it's good to target a broader audience.

I'm personally currently of the opinion that we should be targeting a broader audience, where there's a place for people who want to work in academia or industry separate from the main Rationalist sphere. The people who are drawn towards the Rationalists will find their way there either on their own (I find people tend to do this pretty easily when they start Googling) or with my nudging, if they seem to be that kind of person.

I don't think this is much "shying away from reality" -- it feels more like engaging with it: trying to figure out if and how we want AI alignment research to grow, and how best to make that happen given the different types of people, with different motivations, involved.

I'm personally currently of the opinion that we should be targeting a broader audience

Is the implication that, in order to target a broader audience, you think it would be wise to avoid mentions of LessWrong? Is that because you fear such mentions would turn them off?

If so, that seems like an important thing to take note of. Such a perception seems like a bad thing that we should try to fix. On the other hand, it is also possible that it is a net positive because it keeps the community from being "diluted".

I don't think this is much "shying away from reality"

I didn't mean to imply that you personally were. What I meant when I used that phrase is that this feels like a touchy subject that I myself wanted to flinch away from, but I don't actually think I should flinch away from.

There's a mention of the rationalist community.

True, but despite that fact, it still feels like Eliezer and MIRI are purposefully left out.

How it feels depends on how much prominence you want them to have.

Don't sleep on this stuff Vael Gates keeps putting out. They're doing the lord's work.

Love this! Added it to our list of AI safety curricula, reading lists, and courses.

Thanks for sharing this.  

Thanks for doing that Kat!

Amazing! Would you be happy for some of the content here to be used as a basis for Stampy answers?

Sure! This isn't novel content; the vast majority of it is drawn from existing lists, so it's not even particularly mine. I think just make sure the things within are referenced correctly, and you should be good to go!