Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Goodhart’s Law states that "any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." However, this is not a single phenomenon. I propose that there are (at least) four different mechanisms through which proxy measures break when you optimize for them.

The four types are Regressional, Causal, Extremal, and Adversarial. In this post, I will go into detail about these four different Goodhart effects using mathematical abstractions as well as examples involving humans and/or AI. I will also talk about how you can mitigate each effect.

Throughout the post, I will use V to refer to the true goal and use U to refer to a proxy for that goal which was observed to correlate with V and which is being optimized in some way.

Quick Reference

Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.

Model: When U is equal to V+X, where X is some noise, a point with a large U value will likely have a large V value, but also a large X value. Thus, when U is large, you can expect V to be predictably smaller than U.

Example: height is correlated with basketball ability, and does actually directly help, but the best player is only 6'3", and a random 7' person in their 20s would probably not be as good.

Causal Goodhart - When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.

Model: If V causes U (or if V and U are both caused by some third thing), then a correlation between V and U may be observed. However, when you intervene to increase U through some mechanism that does not involve V, you will fail to also increase V.

Example: someone who wishes to be taller might observe that height is correlated with basketball skill and decide to start practicing basketball.
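
To make the causal structure concrete, here is a minimal Python sketch of the height example. It assumes a toy structural model in which height (V) feeds into basketball skill (U) but not the other way around. The function `sample_population`, the `practice` parameter, and the standardized normal units are hypothetical choices for illustration only.

```python
import math
import random


def sample_population(practice=0.0, n=10_000, seed=0):
    """Toy structural model (an assumption): height causes skill, never the reverse."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        height = rng.gauss(0.0, 1.0)                     # V, standardized units
        skill = height + rng.gauss(0.0, 1.0) + practice  # U, caused by V plus practice
        people.append((height, skill))
    return people


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(pairs):
    """Pearson correlation between the two coordinates of each pair."""
    mx = mean(p[0] for p in pairs)
    my = mean(p[1] for p in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x, _ in pairs)
    var_y = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


before = sample_population(practice=0.0)
after = sample_population(practice=2.0)  # intervene on U through practice only

observed_corr = pearson(before)
gain_in_skill = mean(s for _, s in after) - mean(s for _, s in before)
gain_in_height = mean(h for h, _ in after) - mean(h for h, _ in before)

print(f"observed correlation: {observed_corr:.2f}")
print(f"gain in skill (U):    {gain_in_skill:.2f}")
print(f"gain in height (V):   {gain_in_height:.2f}")
```

Under these assumptions, the observational correlation is clearly positive, yet the intervention moves only U. Practice enters the model downstream of height, so V is left exactly where it was.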

Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.

Model: Patterns tend to break at simple joints. One simple subset of worlds is those worlds in which U is very large. Thus, a strong correlation between U and V observed for naturally occurring U values may not transfer to worlds in which U is very large. Further, since there may be relatively few naturally occurring worlds in which U is very large, extremely large U may coincide with small V values without breaking the statistical correlation.

Example: the tallest person on record, Robert Wadlow, was 8'11" (2.72m). He grew to that height because of a pituitary disorder, and he would have struggled to play basketball because he "required leg braces to walk and had little feeling in his legs and feet."
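
The claim that extreme U can coincide with small V without breaking the overall correlation can be checked in a small Python sketch. Everything specific here is an assumption made for illustration: the function `true_value`, the breakdown threshold of 3 standard deviations, and the choice that V flips to -U past that threshold.

```python
import math
import random


def true_value(u, rng, breakdown=3.0):
    """Hypothetical V: tracks U in the ordinary range, collapses past `breakdown`."""
    if u < breakdown:
        return u + rng.gauss(0.0, 0.5)
    return -u  # assumed failure mode in the extreme regime


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
us = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
vs = [true_value(u, rng) for u in us]

overall_corr = pearson(us, vs)

# Optimize the proxy as hard as possible: pick the single largest U.
best_index = max(range(len(us)), key=us.__getitem__)
u_at_max, v_at_max = us[best_index], vs[best_index]

print(f"overall correlation: {overall_corr:.2f}")
print(f"largest U: {u_at_max:.2f}, its V: {v_at_max:.2f}")
```

Because only a tiny fraction of samples fall past the threshold, the correlation across the whole population stays high, while the point selected by maximizing U sits squarely in the regime where the relationship has broken.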

Adversarial Goodhart - When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.

Model: Consider an agent A with some different goal W. Since they depend on common resources, W and V are naturally opposed. If you optimize U as a proxy for V, and A knows this, A is incentivized to make large U values coincide with large W values, thus stopping them from coinciding with large V values.

Example: aspiring NBA players might just lie about their height.
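
A toy version of the height-lying example can be sketched in Python as follows. The population parameters (mean 78 inches, standard deviation 3 inches), the 20% share of adversaries, and the 3-inch inflation are all made-up numbers for illustration, and `select_tallest` is a hypothetical stand-in for whoever optimizes the proxy.

```python
import random


def select_tallest(reported, k):
    """Return the indices of the k largest reported heights."""
    order = sorted(range(len(reported)), key=reported.__getitem__, reverse=True)
    return order[:k]


rng = random.Random(0)
n, k = 10_000, 100
true_height = [rng.gauss(78.0, 3.0) for _ in range(n)]  # V, inches (assumed)
is_adversary = [rng.random() < 0.2 for _ in range(n)]   # assumed 20% misreport
inflation = 3.0                                         # assumed size of the lie

honest_reports = list(true_height)
gamed_reports = [
    h + inflation if adv else h
    for h, adv in zip(true_height, is_adversary)
]


def mean_true_height(indices):
    return sum(true_height[i] for i in indices) / len(indices)


honest_pick = select_tallest(honest_reports, k)
gamed_pick = select_tallest(gamed_reports, k)

honest_mean = mean_true_height(honest_pick)
gamed_mean = mean_true_height(gamed_pick)
adversary_share = sum(is_adversary[i] for i in gamed_pick) / k

print(f"mean true height, honest reports: {honest_mean:.2f}")
print(f"mean true height, gamed reports:  {gamed_mean:.2f}")
print(f"share of adversaries selected:    {adversary_share:.2f}")
```

In this sketch the adversaries' goal W is simply to be selected. Once they know that reported height is being optimized, large U lines up with large W, and the selected group ends up with lower true height than it would under honest reporting.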

Regressional Goodhart

When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.

Abstract Model

When U is equal to V+X, where X is some noise, a point with a large U value will likely have a large V value, but also a large X value. Thus, when U is large, you can expect V to be predictably smaller than U.
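
As a rough illustration of this model, the following Python sketch simulates U = V + X and looks at the points with the largest U. The standard normal distributions, the independence of V and X, the sample size, and the 1% cutoff are assumptions chosen for illustration, and `simulate_regressional` is a hypothetical helper name.

```python
import random
import statistics


def simulate_regressional(n=100_000, top_fraction=0.01, seed=0):
    """Draw V and X independently, set U = V + X, and keep the top U values."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)  # true goal (assumed standard normal)
        x = rng.gauss(0.0, 1.0)  # noise (assumed independent, same variance)
        samples.append((v + x, v, x))

    # Select on the proxy U only.
    samples.sort(key=lambda t: t[0], reverse=True)
    k = max(1, int(n * top_fraction))
    top = samples[:k]

    return {
        "mean_U": statistics.fmean(t[0] for t in top),
        "mean_V": statistics.fmean(t[1] for t in top),
        "mean_X": statistics.fmean(t[2] for t in top),
    }


result = simulate_regressional()
print(result)
```

Under these assumptions V and X have equal variance, so among the selected points roughly half of the excess in U comes from V and the other half from X. Selecting on U has selected for the noise just as strongly as for the goal.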

The above description applies when U is meant to be an estimate of V. When U is only meant to be correlated with V, a similar effect can be seen by looking at percentiles. A sample chosen as a typical member of the top p percent of all U values will have a lower V value than a typical member of the top p percent of all V values. As a special case, when you select the highest
