Goodhart's Law

Edited by Ruby, Vladimir_Nesov, et al. last updated 19th Mar 2023

Goodhart's Law states that when a proxy for some value becomes the target of optimization pressure, the proxy will cease to be a good proxy. One form of Goodhart is demonstrated by the Soviet story of a factory graded on how many shoes it produced (a good proxy for productivity): it soon began producing a higher number of tiny shoes. The shoes were useless, but the numbers looked good.

Goodhart's Law is of particular relevance to AI Alignment. Suppose you have something which is generally a good proxy for "the stuff that humans care about". It would be dangerous to have a powerful AI optimize for that proxy: in accordance with Goodhart's Law, the proxy will break down.

Goodhart Taxonomy

In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting:

  • Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
  • Causal Goodhart - When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.
  • Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
  • Adversarial Goodhart - When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.
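
The regressional case can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not something taken from Garrabrant's post: the true goal is assumed to be standard normal, the proxy is assumed to be the goal plus independent Gaussian noise, and the function name `regressional_goodhart` and its parameters are hypothetical.

```python
import random
import statistics


def regressional_goodhart(n=10_000, top_k=100, noise_sd=1.0, seed=0):
    """Select the top_k candidates by proxy score and compare proxy vs. goal.

    Assumption (illustrative only): proxy = goal + independent Gaussian noise.
    Returns (mean proxy of the selected, mean true goal of the selected).
    """
    rng = random.Random(seed)
    # Each candidate is a (proxy, goal) pair.
    candidates = []
    for _ in range(n):
        goal = rng.gauss(0.0, 1.0)
        proxy = goal + rng.gauss(0.0, noise_sd)
        candidates.append((proxy, goal))
    # Apply optimization pressure to the proxy: keep the highest proxy scores.
    selected = sorted(candidates, key=lambda c: c[0], reverse=True)[:top_k]
    mean_proxy = statistics.mean(p for p, _ in selected)
    mean_goal = statistics.mean(g for _, g in selected)
    return mean_proxy, mean_goal


mean_proxy, mean_goal = regressional_goodhart()
print(f"mean proxy of selected: {mean_proxy:.2f}")
print(f"mean goal of selected:  {mean_goal:.2f}")
```

Under these assumptions the selected candidates score noticeably higher on the proxy than on the goal, because selecting on the proxy also selects for favorable noise. Setting `noise_sd=0.0` makes the proxy identical to the goal, and the gap disappears.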

See Also

  • Groupthink, Information cascade, Affective death spiral
  • Adaptation executers, Superstimulus
  • Signaling, Filtered evidence
  • Cached thought
  • Modesty argument, Egalitarianism
  • Rationalization, Dark arts
  • Epistemic hygiene
  • Scoring rule
Posts tagged Goodhart's Law

  • Humans are not automatically strategic, by AnnaSalamon (621 karma, 278 comments, 15y)
  • How much do you believe your results?, by Eric Neyman (512 karma, 18 comments, 2y)
  • Why Agent Foundations? An Overly Abstract Explanation, by johnswentworth (312 karma, 60 comments, 4y, Ω)
  • Noticing the Taste of Lotus, by Valentine (239 karma, 81 comments, 7y)
  • Replacing Karma with Good Heart Tokens (Worth $1!), by Ben Pace, habryka (226 karma, 173 comments, 4y)
  • Goodhart Taxonomy, by Scott Garrabrant (218 karma, 34 comments, 8y, Ω)
  • Is Clickbait Destroying Our General Intelligence?, by Eliezer Yudkowsky (210 karma, 68 comments, 7y)
  • Embedded Agency (full-text version), by Scott Garrabrant, abramdemski (210 karma, 17 comments, 7y, Ω)
  • When is Goodhart catastrophic?, by Drake Thomas, Thomas Kwa (180 karma, 30 comments, 2y, Ω)
  • Goodhart's Law in Reinforcement Learning, by jacek, Joar Skalse, OliverHayman, charlie_griffin, Xingjian Bai (126 karma, 22 comments, 2y, Ω)
  • Optimization Amplifies, by Scott Garrabrant (119 karma, 12 comments, 7y, Ω)
  • Soft optimization makes the value target bigger, by Jeremy Gillen (119 karma, 20 comments, 3y, Ω)
  • The Importance of Goodhart's Law, by blogospheroid (117 karma, 123 comments, 16y)
  • Robust Delegation, by abramdemski, Scott Garrabrant (116 karma, 10 comments, 7y, Ω)
  • Scaling Laws for Reward Model Overoptimization, by leogao, John Schulman, Jacob_Hilton (103 karma, 13 comments, 3y, Ω)