The best laid schemes of mice and men

Go often askew,

And leave us nothing but grief and pain,

For promised joy!

- Robert Burns (translated)

Consider the following question:

A team of decision analysts has just presented the results of a complex analysis to the executive responsible for making the decision. The analysts recommend making an innovative investment and claim that, although the investment is not without risks, it has a large positive expected net present value... While the analysis seems fair and unbiased, she can’t help but feel a bit skeptical. Is her skepticism justified?^{1}

Or, suppose Holden Karnofsky of charity-evaluator GiveWell has been presented with a complex analysis of why an intervention that reduces existential risks from artificial intelligence has astronomical expected value and is therefore the type of intervention that should receive marginal philanthropic dollars. Holden feels skeptical about this 'explicit estimated expected value' approach; is his skepticism justified?

Suppose you're a business executive considering $n$ alternatives whose 'true' expected values are $\mu_1, \ldots, \mu_n$. By 'true' expected value I mean the expected value you would calculate if you could devote unlimited time, money, and computational resources to making the expected value calculation.^{2} But you only have three months and \$50,000 with which to produce the estimate, and this limited study produces estimated expected values $V_1, \ldots, V_n$ for the alternatives.

Of course, you choose the alternative $i^*$ that has the highest estimated expected value $V_{i^*}$. You implement the chosen alternative and get the realized value $x_{i^*}$.

Let's call the difference $x_{i^*} - V_{i^*}$ the 'postdecision surprise'.^{3} A positive surprise means your option brought about more value than your analysis predicted; a negative surprise means you were disappointed.

Assume, too kindly, that your estimates are unbiased. And suppose you use this decision procedure many times, for many different decisions. It seems reasonable to expect that *on average* you will receive the estimated expected value of each decision you make in this way. Sometimes you'll be positively surprised, sometimes negatively surprised, but on average you should get the estimated expected value for each decision.

Alas, this is not so; your outcome will usually be *worse* than what you predicted, even if your estimate was unbiased!

Why?

...consider a decision problem in which there are $k$ choices, each of which has true estimated [expected value] of 0. Suppose that the error in each [expected value] estimate has zero mean and standard deviation of 1, shown as the bold curve [below]. Now, as we actually start to generate the estimates, some of the errors will be negative (pessimistic) and some will be positive (optimistic). Because we select the action with the *highest* [expected value] estimate, we are obviously favoring overly optimistic estimates, and that is the source of the bias... The curve in [the figure below] for $k = 3$ has a mean around 0.85, so the average disappointment will be about 85% of the standard deviation in [expected value] estimates. With more choices, extremely optimistic estimates are more likely to arise: for $k = 30$, the disappointment will be around twice the standard deviation in the estimates.^{4}

This is "the optimizer's curse." See Smith & Winkler (2006) for the proof.
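A quick Monte Carlo check of the figures in the quote. This is a minimal sketch assuming exactly the setup Russell & Norvig describe: every option's true value is 0, every estimate carries independent standard-normal error, and the realized value is taken to be the chosen option's true value. The function name, seed, and trial count are illustrative choices, not anything from the sources.

```python
import random
import statistics


def average_disappointment(k: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the average disappointment V_{i*} - x_{i*} when all
    k true values are 0 and each estimate has independent N(0, 1) error."""
    rng = random.Random(seed)
    surprises = []
    for _ in range(trials):
        # True values are all 0, so each estimate is just its error term.
        estimates = [rng.gauss(0.0, 1.0) for _ in range(k)]
        chosen_estimate = max(estimates)  # pick the highest estimate
        realized = 0.0                    # the chosen option's true value
        surprises.append(realized - chosen_estimate)  # postdecision surprise
    # Disappointment is the negative of the average surprise.
    return -statistics.fmean(surprises)


for k in (1, 3, 30):
    print(f"k = {k:2d}: average disappointment = {average_disappointment(k):.2f}")
```

The printed values should land near 0 for $k = 1$, about 0.85 for $k = 3$, and a little over 2 for $k = 30$. With a single option there is no selection and therefore no bias; the bias comes entirely from taking the maximum.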

#### The Solution

The solution to the optimizer's curse is rather straightforward.

...[we] model the uncertainty in the value estimates explicitly and use Bayesian methods to interpret these value estimates. Specifically, we assign a prior distribution on the vector of true values $\mu = (\mu_1, \ldots, \mu_n)$ and describe the accuracy of the value estimates $V = (V_1, \ldots, V_n)$ by a conditional distribution $V \mid \mu$. Then, rather than ranking alternatives based on the value estimates, after we have done the decision analysis and observed the value estimates $V$, we use Bayes’ rule to determine the posterior distribution for $\mu \mid V$ and rank and choose among alternatives based on the posterior means... The key to overcoming the optimizer’s curse is conceptually very simple: treat the results of the analysis as uncertain and combine these results with prior estimates of value using Bayes’ rule before choosing an alternative. This process formally recognizes the uncertainty in value estimates and corrects for the bias that is built into the optimization process by adjusting high estimated values downward. To adjust values properly, we need to understand the degree of uncertainty in these estimates and in the true values.^{5}
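As a concrete instance of this recipe, here is the standard normal-normal case. It assumes, for illustration only, independent priors $\mu_i \sim \mathcal{N}(m, \tau^2)$ and estimation errors $V_i \mid \mu_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$ with known $\sigma_i$; the symbols $m$, $\tau$, and $\sigma_i$ are introduced here and are not part of the quoted text, whose method allows general priors.

```latex
\begin{aligned}
\mu_i &\sim \mathcal{N}(m, \tau^2), \qquad V_i \mid \mu_i \sim \mathcal{N}(\mu_i, \sigma_i^2),\\
\mathbb{E}[\mu_i \mid V_i] &= \frac{\tau^2 V_i + \sigma_i^2 m}{\tau^2 + \sigma_i^2}
  = m + \frac{\tau^2}{\tau^2 + \sigma_i^2}\,(V_i - m),\\
\operatorname{Var}[\mu_i \mid V_i] &= \frac{\tau^2 \sigma_i^2}{\tau^2 + \sigma_i^2}.
\end{aligned}
```

Because the weight $\tau^2 / (\tau^2 + \sigma_i^2)$ lies between 0 and 1, every estimate is pulled toward the prior mean $m$, and estimates from rougher analyses (larger $\sigma_i$) are pulled harder. When the alternatives differ in $\sigma_i$, this can change which one ranks first.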

To return to our original question: *Yes*, some skepticism is justified when considering the option before you with the highest expected value. To minimize your prediction error, treat the results of your decision analysis as uncertain and use Bayes' Theorem to combine its results with an appropriate prior.
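A small simulation of that advice, under a hypothetical normal model: true values are drawn from a prior $\mathcal{N}(m, \tau^2)$, and each alternative gets an unbiased estimate with a known error standard deviation (half the alternatives analyzed carefully, half roughly; all numbers are made up for illustration). It compares choosing by the raw estimate with choosing by the posterior mean and records the average postdecision surprise of each rule. The realized value is taken to be the chosen option's true expected value, which leaves the averages unchanged if outcome noise is independent of the estimates. Function names are hypothetical.

```python
import random
import statistics


def posterior_mean(v: float, sigma: float, m: float, tau: float) -> float:
    """Normal-normal posterior mean: pull the estimate v toward the prior mean m.

    Assumes true value ~ N(m, tau^2) and estimate | true value ~ N(true value, sigma^2).
    """
    weight = tau**2 / (tau**2 + sigma**2)
    return m + weight * (v - m)


def simulate(n: int = 10, m: float = 0.0, tau: float = 1.0,
             trials: int = 20_000, seed: int = 1) -> dict:
    """Compare choosing by raw estimate vs. by posterior mean (hypothetical setup)."""
    rng = random.Random(seed)
    # Hypothetical: half the alternatives were analyzed carefully, half roughly.
    sigmas = [0.5] * (n // 2) + [2.0] * (n - n // 2)
    naive_surprise, bayes_surprise = [], []
    naive_value, bayes_value = [], []
    for _ in range(trials):
        mu = [rng.gauss(m, tau) for _ in range(n)]                # true expected values
        v = [rng.gauss(mu_i, s) for mu_i, s in zip(mu, sigmas)]  # unbiased estimates
        post = [posterior_mean(v_i, s, m, tau) for v_i, s in zip(v, sigmas)]

        i_naive = max(range(n), key=lambda i: v[i])     # highest raw estimate
        i_bayes = max(range(n), key=lambda i: post[i])  # highest posterior mean

        # Postdecision surprise = realized (here: true expected) value - predicted value.
        naive_surprise.append(mu[i_naive] - v[i_naive])
        bayes_surprise.append(mu[i_bayes] - post[i_bayes])
        naive_value.append(mu[i_naive])
        bayes_value.append(mu[i_bayes])
    return {
        "naive_surprise": statistics.fmean(naive_surprise),
        "bayes_surprise": statistics.fmean(bayes_surprise),
        "naive_value": statistics.fmean(naive_value),
        "bayes_value": statistics.fmean(bayes_value),
    }


for name, value in simulate().items():
    print(f"{name}: {value:+.2f}")
```

When the prior is correct, the surprise under the posterior-mean rule should average out near zero, while the raw-estimate rule shows a clearly negative average. The posterior-mean rule should also pick options with higher average true value, because it stops favoring the noisiest estimates.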

#### Notes

^{1} Smith & Winkler (2006).

^{2} Lindley et al. (1979) and Lindley (1986) talk about 'true' expected values in this way.

^{3} Following Harrison & March (1984).

^{4} Quote and (adapted) image from Russell & Norvig (2009), pp. 618–619.

^{5} Smith & Winkler (2006).

#### References

Harrison & March (1984). Decision making and postdecision surprises. *Administrative Science Quarterly, 29*: 26–42.

Lindley, Tversky, & Brown (1979). On the reconciliation of probability assessments. *Journal of the Royal Statistical Society, Series A, 142*: 146–180.

Lindley (1986). The reconciliation of decision analyses. *Operations Research, 34*: 289–295.

Russell & Norvig (2009). *Artificial Intelligence: A Modern Approach, Third Edition*. Prentice Hall.

Smith & Winkler (2006). The optimizer's curse: Skepticism and postdecision surprise in decision analysis. *Management Science, 52*: 311–322.

But all you've done by "adjusting" the expected value estimates is produce a new batch of expected value estimates, which shows either that the original expected value estimates were not done very carefully (if there was an improvement) or that you face the same problem all over again...

Am I missing something?

In statistics the solution you describe is called hierarchical or multilevel modeling. You assume that your data is drawn from a set of distributions whose parameters are themselves drawn from another distribution. This automatically shrinks your estimates of the distributions towards the mean. I think it's a pretty useful trick to know, and it would be good to do a writeup, but I think you might need to have a decent grasp of Bayesian statistics first.
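A minimal sketch of that shrinkage, using method-of-moments empirical Bayes, a simpler cousin of a full hierarchical model: the prior mean and spread are estimated from the group of estimates themselves instead of being supplied by hand. It assumes a normal model in which every estimate shares the same known error standard deviation `sigma`; the function name and inputs are hypothetical. A full hierarchical model would also place priors on the group mean and spread rather than plugging in point estimates.

```python
import statistics


def empirical_bayes_shrink(estimates: list[float], sigma: float) -> list[float]:
    """Shrink each estimate toward the group mean, with hyperparameters estimated
    from the estimates themselves (method of moments).

    Assumes V_i = mu_i + error, mu_i ~ N(m, tau^2), error ~ N(0, sigma^2), sigma known.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    m_hat = statistics.fmean(estimates)
    # Marginally Var(V_i) = tau^2 + sigma^2, so subtract the known error variance
    # (floored at 0 when the estimates vary less than pure noise would).
    tau2_hat = max(0.0, statistics.variance(estimates) - sigma**2)
    weight = tau2_hat / (tau2_hat + sigma**2)
    return [m_hat + weight * (v - m_hat) for v in estimates]


print(empirical_bayes_shrink([2.0, 4.0, 6.0, 8.0, 10.0], sigma=2.0))
```

For these made-up inputs the sample variance is 10 and the error variance is 4, so the estimated weight is 0.6: each value moves 40% of the way toward the group mean of 6, the highest estimate drops the most in absolute terms, and the ranking is preserved because all estimates share one `sigma`.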

The central point of the optimizer's curse is not one I have seen before, and it is a very interesting point.

The solution, however, leaves me feeling slightly unhappy. It isn't obvious to me what prior one should use in this sort of context. I suspect that a rough estimate made by simply using the rule of thumb that the more complicated a logical chain, the more likely there is a problem in it, might do similar work at a weaker level.

Have you tried to apply this sort of reasoning explicitly to various existential risk considerations? If so, what did you get?

Reminds me of the winner's curse in auctions - the selected bid is the one that is the highest and so most likely to be due to overconfidence/bias.

Am I missing something, or does the post just say that we shouldn't use frequentist "unbiased estimators" as if they were Bayesian posterior expected values?

Is there an example where applying this correction to the expected values changes the decision?

In any group there's going to be random noise, and if you choose an extreme value, chances are that value was inflated by noise. In Bayesian terms, *given* that something has the highest value, it probably had *positive noise*, not just positive signal. So the correction is to subtract out the expected positive noise you get from explicitly choosing the highest value. Naturally, this correction is greater when the noise is bigger.

So imagine choosing between black boxes. Each black box has some number of gold coins in it, and also two numbers written on it. The first number, A, on the box is like the estimated expected value, and the second number, B, is like the variance. What happened is that someone rolled two distinct dice with B sides, subtracted die 1 from die 2, and added that to the number of gold coins in the box.

So if you see a box with 40, 3 written on it, you know that it has an expected value of 40 gold coins, but might have as few as 38 or as many as 42.

Now comes the problem: I put 10 boxes in front of you, and tell you to choose the one with the most gold coins. The first box is 50, 1 - a very low-variance box. But the last 9 boxes are all high-uncertainty, all with ...
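The black-box mechanism can be made concrete. This is a minimal sketch assuming fair dice numbered 1 to B. It first computes the exact distribution of the label noise (die 2 minus die 1), which is symmetric with mean 0, so each label is an unbiased estimate of the coins inside; for B = 3 the noise runs from -2 to +2. It then simulates a hypothetical case, not the one in the truncated comment: every box holds the same number of coins, one box has B = 1 and nine have B = 20, and the function reports how far the largest label overstates the chosen box's contents on average. Function names and numbers are illustrative.

```python
import random
import statistics
from collections import Counter
from fractions import Fraction


def label_noise_distribution(sides: int) -> dict[int, Fraction]:
    """Exact distribution of (die 2 - die 1) for two fair dice numbered 1..sides."""
    counts = Counter(d2 - d1
                     for d1 in range(1, sides + 1)
                     for d2 in range(1, sides + 1))
    total = sides * sides
    return {diff: Fraction(c, total) for diff, c in sorted(counts.items())}


def mean_overstatement_of_top_label(coins: int, sides: list[int],
                                    trials: int = 20_000, seed: int = 2) -> float:
    """Hypothetical case: every box holds `coins` gold coins and box j uses dice
    with sides[j] faces. Pick the box with the largest label and return the
    average amount by which that label exceeds the coins actually inside."""
    rng = random.Random(seed)
    overstatements = []
    for _ in range(trials):
        # label = coins + (die 2 - die 1), as in the comment's mechanism
        labels = [coins + rng.randint(1, b) - rng.randint(1, b) for b in sides]
        overstatements.append(max(labels) - coins)
    return statistics.fmean(overstatements)


print(label_noise_distribution(3))
print(mean_overstatement_of_top_label(40, [1] + [20] * 9))
```

Every individual label is unbiased, yet the box with the largest label is almost always one of the noisy boxes, and its label overstates its contents by a wide margin on average.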

Would this issue also apply to picking a contractor for a project based on the lowest bid?

Sometimes contractors run out of money before finishing and you have to pay more or they leave you with a half-finished project :(

Lukeprog, if I've understood you correctly, then this is no good; this is a corner case. The question to be answered here is whether we should expect a "common sense" executive who favors plans with a high prior estimate to do better than a "technical" analyst who favors plans that perform well according to the formal estimation criteria. By assuming that all prior estimates are identical except for bias, this assumption e...

I'm not sure how exactly this differs from the GiveWell blog post along the same lines. You both seem to be dealing with roughly the same problem (decision making under uncertainty) and reach the same conclusion (pay attention to the standard deviation, use Bayesian updates).

I did find your graph in the middle a rather useful illustration, but otherwise don't feel like I've come away with anything really new...

Well, to start with, Luke has provided an actual mechanism by which this mistake occurs.

This is interesting, but I don't see how to apply the solution. Presumably I either have no priors, or the priors are going to be generated by the same process I use to generate the values I am combining them with.

The resulting bias should be smaller if you choose the top 2 or 3 alternatives. E.g., give to 3 charities, not to 1.
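A quick check of the top-2-or-3 suggestion, in the same setup as the Russell & Norvig quote: all $k = 30$ true values are 0 and errors are standard normal, so each estimate is pure error. The sketch compares the average error of the single top pick with the average error across the top 3 picks. The function name, seed, and trial count are illustrative.

```python
import random
import statistics


def mean_error_of_top(k: int, top: int, trials: int = 20_000, seed: int = 3) -> float:
    """All k true values are 0 and errors are N(0, 1), so each estimate equals its
    error. Returns the average error of the `top` highest estimates, averaged over trials."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        errors = sorted((rng.gauss(0.0, 1.0) for _ in range(k)), reverse=True)
        results.append(statistics.fmean(errors[:top]))  # mean error of the chosen set
    return statistics.fmean(results)


for top in (1, 3):
    print(f"top {top}: average error = {mean_error_of_top(30, top):.2f}")
```

Spreading over the top 3 lowers the bias but does not remove it. Because all true values are equal here, the sketch says nothing about how much expected value is given up by not concentrating on the top estimate.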

How do market traders deal with this problem?

If I understand this correctly, there's an empirical problem.

How optimistic your most optimistic estimate is will be a matter of temperament and knowledge for individuals, and of group culture for groups. It seems to me that the correction would need to be determined by experience. Or is this the "appropriate prior" problem?

When I'd only seen the title for this article, I thought it was going to be about the question of how much effort you should put into optimizing.

This is nit-picky, but I don't think you should attribute to Robert Burns anything other than the words he actually wrote. Meanings change a lot in translation, and it's not quite fair to do that through invisible sleight of hand. "Robert Burns (standard English translation)" would serve to CYA.

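The original lines:

The best-laid schemes o' mice an' men

Gang aft agley,

An' lea'e us nought but grief an' pain,

For promis'd joy!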

are little different than the version Luke quoted, and are mostly understandable (with the exception of "gang aft agley") to a sophisticated English reader with no special knowledge. I am somewhat inclined to call that version a *rewrite* rather than a translation, just as I would consider some modernized versions of Shakespeare to be not translations but rewrites.

The standard problem of drawing lines in a continuum rears its head again. There are some reasonable arguments for calling Scots from this time a dialect of English, and many others for calling it a separate language. This is complicated by people's personal and national identities being involved. Questions like these generally end up being settled more by politics than by details of the different linguistic varieties involved.

Scots Gaelic is not Scots (is not Scottish English, though modern speakers of Scots do generally code-switch into it with ease, sometimes in a continuous way). Scots Gaelic is a Gaelic, Celtic language. Scots is Germanic. Burns wrote in Scots.

Scots Gaelic is a thing, but it is not the language in which Burns wrote. That's just called Scots. I wouldn't ordinarily have mentioned it, but... you're coming off as a bit snobby here. (O wad some Power the giftie gie us, am I right?)

This may be high status in certain social circles (having interacted with the snooty Ivy League-educated New York poets also, I can say they certainly think so), but to a lot of people doing so comes across as obnoxious and pretentious, that is, as an attempt to blatantly signal high status in a way that signals low status.

The highest status thing to do (and just optimal as far as I can tell for actually conveying information) is to include the original and the translation also.

I find it interesting that everyone here is focusing on status; couldn't it just be that crediting translations is absolutely necessary for the basic scholarly purpose of judging the authority and trustworthiness of the translation and even the original text? And that failing to provide attribution demonstrates a lack of academic expertise, general ignorance of the slipperiness of translation ('hey, how important could it be?'), and other such problems.

I know I find such information indispensable for my anime *Evangelion* research (I treat translations coming from ADV very differently from translations by Olivier Hague, and those differently from translations by Bochan_bird, and so on, to give a few examples), so how much more so for real scholarship?

Well, what I originally [see edit] wrote was "It's wrong (deprives the translator of rightful credit) -- and, FWIW, it's also low-status." I think people found the "low-status" part of my claim more interesting, but it wasn't the primary reason I reacted badly to seeing a translation uncredited as such.

Edit: on reflection, this wasn't my original justification. I simply reacted with gut-level intuition, *knowing* it was wrong. Every other explanation is after-the-fact, and therefore suspect.

Note Carl Shulman's counterargument to the assumption of a normal prior here and the comments traded between Holden and Carl.

"If your prior was that charity cost-effectiveness levels were normally distributed, then no conceivable evidence could convince you that a charity could be 100x as good as the 90th percentile charity. The probability of systematic error or hoax would always be ludicrously larger than the chance of such an effective charity. One could not believe, even in hindsight, that paying for Norman Borlaug’s team to work on the Green Revo...

Quick feedback, or a question.

In this part: Assume, too kindly, that your estimates are unbiased. And suppose you use this decision procedure many times, for many different decisions, and your estimates are unbiased.

The second mention of "unbiased" makes no sense to me and looks like a typo.

If X = Skill + Luck, with Skill and Luck both random variables, then selecting max(X) will get you something that has high Skill and high Luck.

If Estimate = TrueVal + Error, then max(Estimate) will have both high TrueVal and high Error.

This obvious insight has many applications, especially when the selection is done over a very large number of entities, e.g. trying to emulate the habits of billionaires in order to become rich.
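The Estimate = TrueVal + Error decomposition can be checked numerically. This is a minimal sketch assuming TrueVal and Error are independent standard normals and the winner is picked from 100 options (both choices are arbitrary). It records the winner's true value and error separately. With equal variances, the winner's expected true value and expected error come out equal, so on average half of the winning estimate is noise.

```python
import random
import statistics


def decompose_winner(k: int = 100, trials: int = 10_000, seed: int = 4):
    """Each option has TrueVal ~ N(0, 1) and Error ~ N(0, 1); the winner has the largest
    Estimate = TrueVal + Error. Returns the winner's average TrueVal and average Error."""
    rng = random.Random(seed)
    true_vals, errors = [], []
    for _ in range(trials):
        options = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(k)]
        # Select the option with the largest Estimate = TrueVal + Error.
        t, e = max(options, key=lambda te: te[0] + te[1])
        true_vals.append(t)
        errors.append(e)
    return statistics.fmean(true_vals), statistics.fmean(errors)


mean_true, mean_error = decompose_winner()
print(f"TrueVal of winner: {mean_true:.2f}, Error of winner: {mean_error:.2f}")
```

If Error had a larger variance than TrueVal, the winner's margin would be even more dominated by noise, which is the situation the billionaire example points at.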

Very interesting. I'm going to try my hand at a short summary:

Assume that you have a number of different options you can choose, that you want to estimate the value of each option, and that you have to make your best guess as to which option is most valuable. In step 1, you generate individual estimates using whatever procedure you think is best. In step 2, you make the final decision by choosing the option that had the highest estimate in step 1.

The point is: even if you have unbiased procedures for creating the individual estimates in step 1 (i.e. procedur...