Review

This is a more technical followup to the last post, putting precise bounds on when regressional Goodhart leads to failure or not. We'll first show conditions under which optimization for a proxy fails, and then some conditions under which it succeeds. (The second proof will be substantially easier.)

Related work

In addition to the related work sections of the previous post, this post makes reference to the textbook An Introduction to Heavy-Tailed and Subexponential Distributions, by Foss et al. Many similar results about random variables are present in the textbook, though we haven't seen this post's results elsewhere in the literature. We mostly adopt their notation here, and cite a few helpful lemmas.

Main result: Conditions for catastrophic Goodhart

Suppose that $X$ and $V$ are independent real-valued random variables. We're going to show, roughly, that if

  • $X$ is subexponential (a slightly stronger property than being heavy-tailed), and
  • $V$ has lighter tails than $X$ by more than a linear factor, meaning that the ratio of the tails of $X$ and the tails of $V$ grows superlinearly,[1]

then $\lim_{t \to \infty} E[V \mid X + V > t] = E[V]$.

Less formally, we're saying something like "if it requires relatively little selection pressure on $X$ to get more of $X$ and asymptotically more selection pressure on $V$ to get more of $V$, then applying very strong optimization towards $X + V$ will not get you even a little bit of optimization towards $V$ - all the optimization power will go towards $X$."
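As a rough numerical illustration of this claim (not part of the proof), the sketch below estimates the conditional expectation of V given X + V > t by one-dimensional numerical integration, weighting each value v by the density of V times Pr(X > t - v). The specific distributions are our assumptions for illustration: V is standard normal (so E[V] = 0), the heavy-tailed X is Pareto with minimum 1 and shape 2, and the light-tailed X used for contrast is standard normal. The function names are hypothetical.

```python
import math

def normal_pdf(v):
    """Density of V ~ N(0, 1), so E[V] = 0 (illustrative assumption)."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def pareto_tail(x, alpha=2.0):
    """Pr(X > x) for a heavy-tailed Pareto X with minimum 1 and shape alpha."""
    return 1.0 if x < 1.0 else x ** (-alpha)

def normal_tail(x):
    """Pr(X > x) for a light-tailed X ~ N(0, 1), used only for contrast."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def conditional_mean_v(t, tail_x, dv=0.005):
    """Midpoint-rule estimate of E[V | X + V > t] for independent X and V.

    Each v is weighted by f_V(v) * Pr(X > t - v); the grid spacing
    cancels in the ratio, so it is not multiplied in.
    """
    lo, hi = -12.0, t + 12.0  # V's density is negligible outside this window
    n = int(round((hi - lo) / dv))
    num = den = 0.0
    for i in range(n):
        v = lo + (i + 0.5) * dv
        w = normal_pdf(v) * tail_x(t - v)
        num += v * w
        den += w
    return num / den

for t in (5.0, 10.0, 20.0, 40.0):
    heavy = conditional_mean_v(t, pareto_tail)
    light = conditional_mean_v(t, normal_tail)
    print(f"t={t:5.1f}  heavy-tailed X: {heavy:8.4f}   light-tailed X: {light:8.4f}")
```

Under these assumptions the heavy-tailed column should shrink towards E[V] = 0 as t grows, while the light-tailed column should grow roughly like t/2.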

Proof sketch and intuitions

The conditional expectation $E[V \mid X + V > t]$ is given by $\frac{\int_{-\infty}^{\infty} v f_V(v) \Pr(X > t - v)\,dv}{\int_{-\infty}^{\infty} f_V(v) \Pr(X > t - v)\,dv}$,[2] where $f_V$ is the PDF of $V$. We divide the integral in the numerator into 4 regions, showing that each region's effect on the conditional expectation of $V$ is similar to that of the corresponding region in the unconditional expectation $E[V]$.

The regions are defined in terms of a slow-growing function $h(t)$ chosen such that the fiddly bounds on different pieces of the proof work out. Roughly, we want it to go to infinity so that $\lvert V \rvert$ is likely to be less than $h(t)$ in the limit, but grow slowly enough that the shape of $V$'s distribution within the interval $[-h(t), h(t)]$ doesn't change much after conditioning.

In the table below, we abbreviate the condition $X + V > t$ as $c$.

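| Region | Why its effect on $E[V \mid c]$ is small | Explanation |
|---|---|---|
| $r_1 = (-\infty, -h(t)]$ | $P[V \in r_1 \mid c]$ is too low | In this region, $\lvert V \rvert > h(t)$ and $X > t + h(t)$, both of which are unlikely. |
| $r_2 = (-h(t), h(t))$ | $E[V \mid V \in r_2, c] \approx E[V \mid V \in r_2]$ | The tail distribution of $X$ is too flat to change the shape of $V$'s distribution within this region. |
| $r_3 = [h(t), t - h(t)]$ | $P[V \in r_3 \mid c]$ is low, and $V < t$. | There are increasing returns to each bit of optimization for $X$, so it's unlikely that both $X$ and $V$ have moderate values.[3] |
| $r_4 = (t - h(t), \infty)$ | $P[V \in r_4 \mid c]$ is too low | $X$ is heavier-tailed than $V$, so the condition that $V > t - h(t)$ is much less likely than $X > t - h(t)$ in $r_2$. |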

Here's a diagram showing the region boundaries at $-h(t)$, $h(t)$, and $t - h(t)$ for an example choice of $t$ and $h(t)$, along with a negative log plot of the relevant distribution:

Note that up to a constant vertical shift of normalization, the green curve is the pointwise sum of the blue and orange curves.
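To get a feel for how lopsided the four-region split is, here is a small sketch that computes how much conditional probability each region receives. Everything specific in it is an assumption made for illustration: the regions are taken to be split at -h, h, and t - h; V is standard normal; X is Pareto with minimum 1 and shape 2; and h = sqrt(t) is a hypothetical choice of slow-growing function, not necessarily one that satisfies the lemmas of the full proof.

```python
import math

def normal_pdf(v):
    """Density of V ~ N(0, 1) (illustrative assumption)."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def pareto_tail(x, alpha=2.0):
    """Pr(X > x) for Pareto X with minimum 1 and shape alpha (illustrative)."""
    return 1.0 if x < 1.0 else x ** (-alpha)

def region_masses(t, h, dv=0.001):
    """Conditional probability, given X + V > t, that V falls in each region.

    Regions (assumed): (-inf, -h], (-h, h), [h, t - h], (t - h, inf).
    Uses a midpoint rule on the unnormalized density f_V(v) * Pr(X > t - v).
    """
    lo, hi = -12.0, t + 12.0  # V's density is negligible outside this window
    n = int(round((hi - lo) / dv))
    masses = [0.0, 0.0, 0.0, 0.0]
    for i in range(n):
        v = lo + (i + 0.5) * dv
        w = normal_pdf(v) * pareto_tail(t - v)
        if v <= -h:
            k = 0
        elif v < h:
            k = 1
        elif v <= t - h:
            k = 2
        else:
            k = 3
        masses[k] += w
    total = sum(masses)
    return [m / total for m in masses]

t = 25.0
h = math.sqrt(t)  # hypothetical slow-growing function, chosen for illustration
for idx, mass in enumerate(region_masses(t, h), start=1):
    print(f"region {idx}: conditional mass {mass:.3e}")
```

With these choices nearly all of the conditional mass should land in the second region, which matches the intuition that only that region matters for the conditional expectation.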

Full proof

To be more precise, we're going to make the following definitions and assumptions:

  • Let $f_V(v)$ be the PDF of $V$ at the value $v$. We assume for convenience that $f_V$ exists, is integrable, etc., though we suspect that this isn't necessary, and that one could work through a similar proof just referring to the tails of $V$. We won't make this assumption for $X$.
  • Let $F_X(x) = \Pr(X \le x)$ and $\bar F_X(x) = \Pr(X > x)$, and similarly for $F_V$ and $\bar F_V$.
  • Assume $V$ has a finite mean: $\int_{-\infty}^{\infty} v f_V(v)\,dv$ converges absolutely.
  • $X$ is subexponential.
    • Formally, this means that $\lim_{x \to \infty} \frac{\Pr(X_1 + X_2 > x)}{\Pr(X > x)} = 2$, where $X_1$ and $X_2$ are independent copies of $X$.
    • This happens roughly whenever $X$ has tails that are heavier than $e^{-cx}$ for any $c > 0$ and is reasonably well-behaved; counterexamples to the claim "long-tailed implies subexponential" exist, but they're nontrivial to exhibit.
    • Examples of subexponential distributions include log-normal distributions, anything that decays like a power law, the Pareto distribution, and distributions with tails asymptotic to $e^{-x^a}$ for any $0 < a < 1$.
  • We require for $V$ that its tail function is substantially lighter than $X$'s, namely that $\lim_{t \to \infty} \frac{t^p \bar F_V(t)}{\bar F_X(t)} = 0$ for some $p > 1$.[1]
    • This implies that $\bar F_V(t) = O(\bar F_X(t) / t)$.
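The two main assumptions can be sanity-checked numerically for a concrete pair of distributions. The sketch below is only an illustration under our own choices: X is Pareto with minimum 1 and shape 2, V is standard normal, and the exponent p = 2 is picked arbitrarily among values above 1. It evaluates the subexponential ratio Pr(X1 + X2 > x) / Pr(X > x), which should approach 2, and the tail ratio t^p Pr(V > t) / Pr(X > t), which should approach 0. All function names are hypothetical.

```python
import math

ALPHA = 2.0  # assumed Pareto shape for X (illustrative choice)

def tail_x(x):
    """Pr(X > x) for Pareto X with minimum 1 and shape ALPHA."""
    return 1.0 if x < 1.0 else x ** (-ALPHA)

def pdf_x(x):
    """Density of X: ALPHA * x^(-ALPHA - 1) for x >= 1, else 0."""
    return 0.0 if x < 1.0 else ALPHA * x ** (-ALPHA - 1.0)

def tail_v(v):
    """Pr(V > v) for standard normal V (illustrative choice)."""
    return 0.5 * math.erfc(v / math.sqrt(2.0))

def two_sum_tail(x, n=200_000):
    """Pr(X1 + X2 > x) for x > 2, with X1 and X2 independent copies of X.

    If X1 > x - 1 the sum exceeds x automatically because X2 >= 1;
    otherwise integrate pdf(y) * Pr(X2 > x - y) over y in [1, x - 1]
    with a midpoint rule.
    """
    dy = (x - 2.0) / n
    total = tail_x(x - 1.0)
    for i in range(n):
        y = 1.0 + (i + 0.5) * dy
        total += pdf_x(y) * tail_x(x - y) * dy
    return total

def subexp_ratio(x):
    """Pr(X1 + X2 > x) / Pr(X > x); tends to 2 for subexponential X."""
    return two_sum_tail(x) / tail_x(x)

def tail_ratio(t, p=2.0):
    """t^p * Pr(V > t) / Pr(X > t); the tail assumption asks this to vanish."""
    return t ** p * tail_v(t) / tail_x(t)

for x in (10.0, 100.0, 1000.0):
    print(f"x={x:7.1f}  subexponential ratio: {subexp_ratio(x):.4f}")
for t in (2.0, 5.0, 10.0):
    print(f"t={t:5.1f}  tail ratio with p=2: {tail_ratio(t):.3e}")
```

A finite computation like this cannot establish a limit, of course; it only shows the ratios moving in the direction the assumptions require for this particular example.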

With all that out of the way, we can move on to the proof. 

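The unnormalized PDF of $V$ conditioned on $X + V > t$ is given by $f_V(v) \bar F_X(t - v)$. Its expectation is given by $\frac{\int_{-\infty}^{\infty} v f_V(v) \bar F_X(t - v)\,dv}{\int_{-\infty}^{\infty} f_V(v) \bar F_X(t - v)\,dv}$.

Meanwhile, the unconditional expectation of $V$ is given by $\int_{-\infty}^{\infty} v f_V(v)\,dv$.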

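We'd like to show that these two expectations are equal in the limit for large $t$. To do this, we'll introduce $Q(v) = \frac{\bar F_X(t - v)}{\bar F_X(t)}$. (More pedantically, this should really be $Q_t(v)$, which we'll occasionally use where it's helpful to remember that this is a function of $t$.)

For a given value of $t$, $Q(v)$ is just a scaled version of $\bar F_X(t - v)$, so the conditional expectation of $V$ is given by $\frac{\int_{-\infty}^{\infty} v f_V(v) Q(v)\,dv}{\int_{-\infty}^{\infty} f_V(v) Q(v)\,dv}$. But because $Q(0) = 1$, the numerator and denominator of this fraction are (for small $v$) close to the unconditional expectation and $1$, respectively.

We'll aim to show that for all $\epsilon > 0$ we have for sufficiently large $t$ that $\left\lvert \int_{-\infty}^{\infty} v f_V(v) Q_t(v)\,dv - \int_{-\infty}^{\infty} v f_V(v)\,dv \right\rvert < \epsilon$ and $\int_{-\infty}^{\infty} f_V(v) Q_t(v)\,dv \in [1 - \epsilon, 1 + \epsilon]$, which implies (exercise) that the two expectations have limiting difference zero. But first we need some lemmas.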
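The two target bounds can be illustrated numerically before any lemmas are proved. The sketch below assumes that Q_t(v) is the tail of X at t - v divided by the tail of X at t, and picks example distributions of our own: X is Pareto with minimum 1 and shape 2, and V is normal with mean 1 and unit variance, so that the unconditional expectation is 1 rather than 0. It then evaluates the Q-weighted numerator and denominator for increasing t. The names are hypothetical.

```python
import math

MU = 1.0  # assumed mean of V, so the unconditional expectation E[V] is 1

def pdf_v(v):
    """Density of V ~ N(MU, 1) (illustrative assumption)."""
    return math.exp(-0.5 * (v - MU) ** 2) / math.sqrt(2.0 * math.pi)

def tail_x(x, alpha=2.0):
    """Pr(X > x) for Pareto X with minimum 1 and shape alpha (illustrative)."""
    return 1.0 if x < 1.0 else x ** (-alpha)

def q(t, v):
    """Q_t(v) = Pr(X > t - v) / Pr(X > t); equals 1 at v = 0."""
    return tail_x(t - v) / tail_x(t)

def weighted_integrals(t, dv=0.001):
    """Midpoint-rule estimates of (integral of v f_V Q_t, integral of f_V Q_t).

    The window covers 12 standard deviations either side of MU; outside it
    f_V is so small that the omitted mass is negligible for the t used here.
    """
    lo, hi = MU - 12.0, MU + 12.0
    n = int(round((hi - lo) / dv))
    num = den = 0.0
    for i in range(n):
        v = lo + (i + 0.5) * dv
        w = pdf_v(v) * q(t, v) * dv
        num += v * w
        den += w
    return num, den

for t in (10.0, 100.0, 1000.0, 10000.0):
    num, den = weighted_integrals(t)
    print(f"t={t:8.1f}  numerator={num:.5f}  denominator={den:.5f}  ratio={num / den:.5f}")
```

For this example the numerator should drift towards E[V] = 1 and the denominator towards 1 as t grows, which is the behaviour the epsilon bounds formalize.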

Lemmas

Lemma 1: There is a function $h(t)$, depending on the distribution of $X$, such that:

  • (a) 
  • (b) 
  • (c)