16

24th Nov 2020

1 min read

16 Ω 8

What are some substantial critiques of the agent foundations research agenda?

Where by agent foundations I am referrring the area of research referred to by Critch in this post, which I understand as developing concepts and theoretical solutions for idealized problems related to AI safety such as logical induction.

Agent FoundationsAI

Frontpage

16 Ω 8

Critiques of the Agent Foundations agenda?

12Rob Bensinger

6technicalities

0technicalities

New Answer

New Comment

3 Answers sorted by
top scoring

Rob Bensinger

Dec 04, 2020

120

Some that come to mind (note: I work at MIRI):

2016: Open Philanthropy Project, Anonymized Reviews of Three Recent Papers from MIRI’s Agent Foundations Research Agenda (separate reply from Nate Soares, and comments by Eliezer Yudkowsky)
2017: Daniel Dewey, My current thoughts on MIRI's "highly reliable agent design" work (replies from Nate Soares in the comments)
2018: Richard Ngo, Realism about rationality
2018: Wolfgang Schwarz, On Functional Decision Theory
2019: Will MacAskill, A Critique of Functional Decision Theory (replies from Abram Demski in the comments)

I'd also include arguments of the form 'we don't need to solve agent foundations problems, because we can achieve good outcomes from AI via alternative method X and it's easier to just do X'. E.g., (from 2015) Paul Christiano's Abstract Approval-Direction.

Also, some overviews that aren't trying to argue against agent foundations may still provide useful maps of where people disagree (though I don't think e.g. Nate would 100% endorse any of these), like:

2016: Jessica Taylor, My current take on the Paul-MIRI disagreement on alignability of messy AI
2017: Jessica Taylor, On motivations for MIRI's highly reliable agent design research
2020: Issa Rice, Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate

technicalities

Dec 04, 2020

Nostalgebraist (2019) sees it as equivalent to solving large parts of philosophy: a noble but quixotic quest. (He also argues against short timelines but that's tangential here.)

Here is what this ends up looking like: a quest to solve, once and for all, some of the most basic problems of existing and acting among others who are doing the same. Problems like “can anyone ever fully trust anyone else, or their future self, for that matter?” In the case where the “agents” are humans or human groups, problems of this sort have been wrestled with for a long time using terms like “coordination problems” and “Goodhart’s Law”; they constitute much of the subject matter of political philosophy, economics, and game theory, among other fields.

The quest for “AI Alignment” covers all this material and much more. It cannot invoke specifics of human nature (or non-human nature, for that matter); it aims to solve not just the tragedies of human coexistence, but the universal tragedies of coexistence which, as a sad fact of pure reason, would befall anything that thinks or acts in anything that looks like a world.

It sounds misleadingly provincial to call such a quest “AI Alignment.” The quest exists because (roughly) a superhuman being is the hardest thing we can imagine “aligning,” and thus we can only imagine doing so by solving “Alignment” as a whole, once and forever, for all creatures in all logically possible worlds. (I am exaggerating a little in places here, but there is something true in this picture that I have not seen adequately talked about, and I want to paint a clear picture of it.)

There is no doubt something beautiful – and much raw intellectual appeal – in the quest for Alignment. It includes, of necessity, some of the most mind-bending facets of both mathematics and philosophy, and what is more, it has an emotional poignancy and human resonance rarely so close to the surface in those rarefied subjects. I certainly have no quarrel with the choice to devote some resources, the life’s work of some people, to this grand Problem of Problems. One imagines an Alignment monastery, carrying on the work for centuries. I am not sure I would expect them to ever succeed, much less to succeed in some specified timeframe, but in some way it would make me glad, even proud, to know they were there.

I do not feel any pressure to solve Alignment, the great Problem of Problems – that highest peak whose very lowest reaches Hobbes and Nash and Kolomogorov and Gödel and all the rest barely began to climb in all their labors...

#scott wants an aligned AI to save us from moloch; i think i'm saying that alignment would already be a solution to moloch

technicalities

Dec 04, 2020

Stretching the definition of 'substantial' further:

Beth Zero was an ML researcher and Sneerclubber with some things to say. Her blog is down unfortunately but here's her collection of critical people. Here's a flavour of her thoughtful Bulverism. Her post on the uselessness of Solomonoff induction and the dishonesty of pushing it as an answer outside of philosophy was pretty good.

Sadly most of it is against foom, against short timelines, against longtermism, rather than anything specific about the Garrabrant or Demski or Kosoy programmes.

Moderation Log

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

16

[ Question ]

Critiques of the Agent Foundations agenda?

16

Ω 8

16

Ω 8

3 Answers sorted by
top scoring

Dec 04, 2020

Dec 04, 2020

Dec 04, 2020

16

[ Question ]

Critiques of the Agent Foundations agenda?

16

Ω 8

16

Ω 8

3 Answers sorted by top scoring

Dec 04, 2020

Dec 04, 2020

Dec 04, 2020

3 Answers sorted by
top scoring