LESSWRONG

AXRP - the AI X-risk Research Podcast

Sep 26, 2023 by DanielFilan

Transcripts of AXRP episodes.

AXRP Episode 1 - Adversarial Policies with Adam Gleave
AXRP Episode 2 - Learning Human Biases with Rohin Shah
AXRP Episode 3 - Negotiable Reinforcement Learning with Andrew Critch
AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger
AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy
AXRP Episode 6 - Debate and Imitative Generalization with Beth Barnes
AXRP Episode 7 - Side Effects with Victoria Krakovna
AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell
AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant
AXRP Episode 10 - AI’s Future and Impacts with Katja Grace
AXRP Episode 11 - Attainable Utility and Power with Alex Turner
AXRP Episode 12 - AI Existential Risk with Paul Christiano
AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo
AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy
AXRP Episode 15 - Natural Abstractions with John Wentworth
AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong
AXRP Episode 19 - Mechanistic Interpretability with Neel Nanda
AXRP Episode 20 - ‘Reform’ AI Alignment with Scott Aaronson
AXRP Episode 21 - Interpretability for Engineers with Stephen Casper
AXRP Episode 22 - Shard Theory with Quintin Pope
AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu
AXRP Episode 24 - Superalignment with Jan Leike
AXRP Episode 25 - Cooperative AI with Caspar Oesterheld
AXRP Episode 26 - AI Governance with Elizabeth Seger
AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
AXRP Episode 28 - Suing Labs for AI Risk with Gabriel Weil
AXRP Episode 29 - Science of Deep Learning with Vikrant Varma
AXRP Episode 30 - AI Security with Jeffrey Ladish
AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
AXRP Episode 32 - Understanding Agency with Jan Kulveit
AXRP Episode 33 - RLHF Problems with Scott Emmons
AXRP Episode 34 - AI Evaluations with Beth Barnes
AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
AXRP Episode 37 - Jaime Sevilla on Forecasting AI
AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
AXRP Episode 38.1 - Alan Chan on Agent Infrastructure
AXRP Episode 38.2 - Jesse Hoogland on Singular Learning Theory
AXRP Episode 39 - Evan Hubinger on Model Organisms of Misalignment
AXRP Episode 38.3 - Erik Jenner on Learned Look-Ahead
AXRP Episode 38.4 - Shakeel Hashim on AI Journalism
AXRP Episode 38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
AXRP Episode 38.6 - Joel Lehman on Positive Visions of AI
AXRP Episode 38.7 - Anthony Aguirre on the Future of Life Institute
AXRP Episode 38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability
AXRP Episode 41 - Lee Sharkey on Attribution-based Parameter Decomposition
AXRP Episode 42 - Owain Evans on LLM Psychology
AXRP Episode 43 - David Lindner on Myopic Optimization with Non-myopic Approval
AXRP Episode 44 - Peter Salib on AI Rights for Human Safety
AXRP Episode 45 - Samuel Albanie on DeepMind’s AGI Safety Approach