LESSWRONG
Tags: Collections and Resources, AI Alignment Fieldbuilding, AI Risk, AI Safety Public Materials, AI
[ Question ]

Best introductory overviews of AGI safety?

by JakubK
13th Dec 2022
This is a linkpost for https://forum.effectivealtruism.org/posts/aa6wwy3zmLxn7wLNb/best-introductory-overviews-of-agi-safety
3 Answers, sorted by top scoring

Thomas Larsen

Dec 13, 2022

My favorite for AI researchers is Ajeya's "Without specific countermeasures", because I think it does a really good job of being concrete about a training setup leading to deceptive alignment. It is also sufficiently non-technical that a motivated person not familiar with AI could understand the key points.

JakubK, 3y

Forgot to include this. It's sort of a more opinionated and ML-focused version of Carlsmith's report and has a corresponding video/talk (as does Carlsmith).


Michael Tontchev

Jun 07, 2023

Want to add this one:

https://www.lesswrong.com/posts/B8Djo44WtZK6kK4K5/outreach-success-intro-to-ai-risk-that-has-been-successful

This is the note I wrote internally at Meta - it's had over 300 reactions, as well as people reaching out to me saying it has convinced them to switch to working on alignment.

JakubK, 2y

Thanks for writing and sharing this. I've added it to the doc.


Tor Økland Barstad

Dec 13, 2022

Good initiative.

Regarding introductions for a popular audience, I feel like Tim Urban wrote an intro that is also worth mentioning: Part 1 - Part 2 - Reply from Luke Muehlhauser

Another one is A Response to Steven Pinker on AI (Rob Miles)

Btw, I sometimes recommend Superintelligence by Nick Bostrom (but that's an entire book)

It will be interesting to see what kinds of introductions are available a year or a few years from now. Some people have created good introductions, but I do feel as if there is room for improvement.

Btw, I think Rob Miles is working on a collaborative FAQ: https://stampy.ai/wiki/Main_Page (which he talks about here)

JakubK, 3y

Yeah, Tim Urban's is perhaps the most enjoyable / fun read. But I worry that skeptics won't take it seriously.

3 comments, sorted by top scoring

Ebenezer Dukakis, 2y

Just saw this https://www.lesswrong.com/posts/5rsa37pBjo4Cf9fkE/a-newcomer-s-guide-to-the-technical-ai-safety-field

JakubK, 2y

Thanks, I added it to the doc.

the gears to ascension, 3y

I appreciate already having a big list of candidates, so I can't comment with one!

I'm interested in what people think are the best overviews of AI risk for various types of people. Below I've listed as many good overviews as I could find (excluding some drafts), split into "good for a popular audience" and "good for AI researchers." I'd also like to hear whether people think some of these intros are better than others (prioritizing between intros). I'd be interested to hear about podcasts and videos as well.

I am maintaining a list at this Google doc to incorporate people's suggestions.

Popular audience: 

  • Vox (Kelsey Piper)
  • AI alignment (Wikipedia -- perhaps the most important to get right!)
  • Why alignment could be hard with modern DL (Ajeya Cotra)
  • Stampy wiki and aisafety.info (Stampy team)
  • The most important century blog post series summary and AI could defeat all of us combined and Why would AI "aim" to defeat humanity (Holden Karnofsky)
  • Current work in AI alignment (Paul Christiano)
  • Future of Life Institute (Ariel Conn)
  • Why worry about future AI? (Gavin Leech)
  • 80k full profile (Benjamin Hilton)
  • AGI Ruin: A list of lethalities (Eliezer Yudkowsky)
  • Extinction Risk from Artificial Intelligence (Michael Cohen)
  • Set Sail For Fail? On AI risk (Nintil)
  • A shift in arguments for AI risk (Tom Adamczewski)
  • Is power-seeking AI an x-risk (Joseph Carlsmith)
  • More is different for AI (Jacob Steinhardt)
  • The Basic AI Drives (Steve Omohundro)
  • AI Risk Intro 1: Advanced AI Might Be Very Bad (TheMcDouglas and LRudL)
  • Four Background Claims (Nate Soares)

AI researchers:

  • CHAI bibliography
  • Unsolved Problems in ML Safety (Dan Hendrycks)
  • AI safety from first principles (Richard Ngo)
  • The alignment problem from a DL perspective (Richard Ngo)
  • X-risk analysis for AI research (Dan Hendrycks and Mantas Mazeika)
  • Why I think more NLP researchers should engage with AIS concerns (Sam Bowman)
  • More thoughts on outreach to researchers: Marius Hobbhahn, AISFB and Resources I sent to AI researchers about AIS (Vael Gates)

Alignment landscape:

  • My Overview of the AI Alignment Landscape: A Bird's Eye View (Neel Nanda)
  • My understanding of what everyone in technical alignment is doing and why (Thomas Larsen and Eli Lifland) and Alignment org cheat sheet (Thomas Larsen and Akash Wasil)

Podcasts and videos:

  • Intro to AI Safety, Remastered (Rob Miles)
  • Researcher Perceptions of Current and Future AI (Vael Gates)
  • AI Fire Alarm (Connor Leahy)
  • Eliezer Yudkowsky interview with Sam Harris -- full audio is still only available to Sam Harris subscribers
  • Richard Ngo and Paul Christiano on AXRP
  • Ensuring smarter-than-human intelligence has a positive outcome (Nate Soares)
  • AI Alignment: Why It's Hard, and Where to Start (Eliezer Yudkowsky)
  • Brian Christian and Ben Garfinkel on the 80,000 Hours Podcast
  • Ajeya Cotra and Rohin Shah on the Future of Life Institute Podcast
  • Some audio recordings of the readings above (e.g. Cold Takes audio, reading of 80k intro, EA Forum posts, EA Radio, Astral Codex Ten Podcast, Less Wrong Curated Podcast, Nonlinear Library)
  • "Recommended EA talks/videos for university groups" includes some generally good channels to look for more videos from in the future