I'd like to distill AI Safety posts and papers, and I'd like to see more distillations generally. Ideally, posts and papers would meet the following criteria:

  • Potentially high-impact for more people to understand
  • Uses a lot of jargon or is generally complex and difficult to understand
  • Not as well-known as you think they should be (in the AI X-risk space)

What posts meet these criteria?


Thomas Kwa

Aug 25, 2023

167

Academic papers seem more valuable to distill than posts: posts are often already distilled (except for things like Paul Christiano's blog posts), and the x-risk space is something of an info bubble. There is a list of safety-relevant papers from ICML here, but I don't totally agree with it; two papers I think it missed are

  • HarsanyiNet, an architecture for small neural nets that restricts intermediate features so that exact Shapley value contributions of the inputs can be computed cheaply (see the sketch after this list for what that computation normally involves)
  • This other paper on importance functions, which got an oral presentation.
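For readers unfamiliar with the Shapley value framing in the HarsanyiNet bullet above, here is a minimal brute-force sketch of what "Shapley value contributions of inputs" means. To be clear, this is not HarsanyiNet's method (the paper's point is that its architecture avoids this exponential computation); the baseline-masking convention and all names here are illustrative assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, inputs, baseline):
    """Exact Shapley value of each input feature, by brute force.

    model:    callable from a feature tuple to a scalar output
    inputs:   the actual feature values
    baseline: values substituted for "absent" features (one common
              masking convention; an illustrative assumption here)

    Costs O(2^n) model evaluations, which is why generic networks need
    approximations and why an architecture that makes this cheap is notable.
    """
    n = len(inputs)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                present = set(subset)
                # Model output with coalition `subset`, with and without feature i.
                with_i = tuple(inputs[j] if (j in present or j == i) else baseline[j] for j in range(n))
                without_i = tuple(inputs[j] if j in present else baseline[j] for j in range(n))
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

# Toy check: for a linear model with a zero baseline, each feature's
# Shapley value is exactly its weighted contribution.
model = lambda x: 2 * x[0] + 3 * x[1] + x[2]
print(shapley_values(model, (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)))  # [2.0, 3.0, 1.0]
```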

If you want to get a sense of how to do this, first get fast at understanding papers yourself, then read Rohin Shah's old Alignment Newsletters and the technical portions of Dan Hendrycks's AI Safety Newsletters.

To get higher-value technical distillations than this, you basically have to talk to people in person and add detailed critiques, which is what Lawrence did with his distillations of shard theory and natural abstractions.

Edit: Also, most papers are low-quality or irrelevant; my (relatively uninformed) guess is that 92% of big-3 conference papers have little relevance to alignment, and of the remaining 8%, two-thirds of posters and one-third of orals are too low-quality to be worth distilling. That leaves only around 2-3% of all papers worth distilling, so you need to have good taste.

trevor

Aug 25, 2023

80

Raemon's new rationality paradigm (it might be better to wait until the launch test is finished). The CFAR Handbook is also pretty distillable.

The Superintelligence FAQ (allegedly one of the best ways to introduce someone to AI safety)

OpenAI's paper on the use of AI for manipulation (important for AI macrostrategy)

Cyborgism

Please don't throw your mind away (massive distillation potential, but trickier than it looks)

The Yudkowsky-Christiano debate (I tried showing this to my 55-year-old dad, but he couldn't parse it and bounced off; he knows software but not econ. The AI chapter from The Precipice later got him to take AI safety seriously.)

Stuff on AI timelines is generally pretty good, but the authors have a tangible fear of getting lampooned by the general public/journalists/trolls for making the tiniest mistake, so they make the papers long and hard to read; if you distill them, that responsibility is diffused.