Tags: Goodhart's Law, AI

New Paper Expanding on the Goodhart Taxonomy

by Scott Garrabrant
14th Mar 2018

This is a linkpost for https://arxiv.org/pdf/1803.04585.pdf

Ben Pace (8y ago)

Woo! Good ideas -> papers! I like this and that it happened. Nice going especially to David Manheim.

Davidmanheim (8y ago)

Thanks!

Davidmanheim (8y ago)

We'd love any feedback people have on the write-up.

Note: I'm in the middle of writing an extension of this work that gets much more into adversarial situations.

adam_shimi (7y ago)

First, thanks to both of you for writing this really nice paper. I have two questions, which are more about my understanding (or lack thereof) than about issues with the work.

My first question is about an example I have in mind, and whether my classification of it makes sense. In this example, the regulator is a grant-maker, and the agents are researchers in a very theoretical field. Our regulator wants to optimize for as many concrete applications as possible, regardless of what the researchers are interested in (any similarity with the real world is obviously unintended). To reach this goal, the regulator prioritizes funding grant proposals that promise applications. Yet the researchers could just write about applications while only pursuing the theory (still no intended similarity with the real world...).

It seems clear to me that this is an instance of adversarial Goodhart, and more specifically of adversarial misalignment Goodhart. Do you also think that is the case? If not, why?

My second question is more of a request: what "concrete" example could you give me of non-causal cobra effect Goodhart? I have some trouble visualizing it.
