Paul Christiano on Dwarkesh Podcast

by ESRogs
3rd Nov 2023

This is a linkpost for https://www.dwarkeshpatel.com/p/paul-christiano

Dwarkesh's summary:

Paul Christiano is the world’s leading AI safety researcher. My full episode with him is out!

We discuss:

  • Does he regret inventing RLHF, and is alignment necessarily dual-use?
  • Why he has relatively modest timelines (40% by 2040, 15% by 2030),
  • What do we want the post-AGI world to look like (do we want to keep gods enslaved forever)?
  • Why he's leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon,
  • His current research into a new proof system, and how this could solve alignment by explaining models' behavior,
  • and much more.