My Objections to "We’re All Gonna Die with Eliezer Yudkowsky"
## Introduction

I recently watched Eliezer Yudkowsky's appearance on the Bankless podcast, where he argued that AI was nigh-certain to end humanity. Since the podcast, some commentators have offered pushback against the doom conclusion. However, one sentiment I saw was that optimists tended not to engage with the specific arguments pessimists like Yudkowsky offered.

Economist Robin Hanson points out that this pattern is very common for small groups which hold counterintuitive beliefs: insiders develop their own internal language, which skeptical outsiders usually don't bother to learn. Outsiders then make objections that focus on broad arguments against the belief's plausibility, rather than objections that focus on specific insider arguments.

As an AI "alignment insider" whose current estimate of doom is around 5%, I wrote this post to explain some of my many objections to Yudkowsky's specific arguments. I've split this post into chronologically ordered segments of the podcast in which Yudkowsky makes one or more claims with which I particularly disagree.

I have my own view of alignment research: shard theory, which focuses on understanding how human values form, and on how we might guide a similar process of value formation in AI systems. I think that human value formation is not that complex, and does not rely on principles very different from those which underlie the current deep learning paradigm. Most of the arguments you're about to see from me are less:

> I think I know of a fundamentally new paradigm that can fix the issues Yudkowsky is pointing at.

and more:

> Here's why I don't agree with Yudkowsky's arguments that alignment is impossible in the current paradigm.

## My objections

### Will current approaches scale to AGI?

Yudkowsky apparently thinks not ...and that the techniques driving current state-of-the-art advances, by which I think he means the mix of generative pretraining + small amounts of reinforcement learning such as with ChatGPT, aren't r