Contaminated by Optimism

by Eliezer Yudkowsky, 6th Aug 2008


Personal Blog

Followup to: Anthropomorphic Optimism, The Hidden Complexity of Wishes

Yesterday, I reprised in further detail The Tragedy of Group Selectionism, in which early biologists believed that predators would voluntarily restrain their breeding to avoid exhausting the prey population; the given excuse was "group selection".  Not only does it turn out to be nearly impossible for group selection to overcome a countervailing individual advantage; but when these nigh-impossible conditions were created in the laboratory - group selection for low-population groups - the actual result was not restraint in breeding, but, of course, cannibalism, especially of immature females.

I've made even sillier mistakes, by the way - though about AI, not evolutionary biology.  And the thing that strikes me, looking over these cases of anthropomorphism, is the extent to which you are screwed as soon as you let anthropomorphism suggest ideas to examine.

In large hypothesis spaces, the vast majority of the cognitive labor goes into noticing the true hypothesis.  By the time you have enough evidence to consider the correct theory as one of just a few plausible alternatives - to represent the correct theory in your mind - you're practically done.  Of this I have spoken several times before.
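One rough way to see the point about large hypothesis spaces is to count bits of evidence. The sketch below is only an illustration under assumptions that are not in the text above: every hypothesis starts out equally likely, and the space size (2^20) and shortlist size (4) are made-up numbers.

```python
import math

def bits_to_narrow(candidates_before: int, candidates_after: int) -> float:
    """Bits of evidence needed to shrink a uniform set of candidates.

    Assumes every candidate is equally likely beforehand (an assumption
    made for illustration only).
    """
    return math.log2(candidates_before / candidates_after)

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
total_hypotheses = 2 ** 20   # size of the whole hypothesis space
shortlist = 4                # "one of just a few plausible alternatives"

bits_to_reach_shortlist = bits_to_narrow(total_hypotheses, shortlist)
bits_to_pick_winner = bits_to_narrow(shortlist, 1)

# Share of the total evidence spent just getting the true hypothesis
# onto the shortlist.
share_spent_noticing = bits_to_reach_shortlist / (
    bits_to_reach_shortlist + bits_to_pick_winner
)

print(bits_to_reach_shortlist, bits_to_pick_winner, share_spent_noticing)
```

Under these assumed numbers, 18 of the 20 bits go into getting the right hypothesis onto the shortlist at all, which is the sense in which you are "practically done" by then.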

And by the same token, my experience suggests that as soon as you let anthropomorphism promote a hypothesis to your attention, so that you start wondering if that particular hypothesis might be true, you've already committed most of the mistake.

The group selectionists did not deliberately extend credit to the belief that evolution would do the aesthetic thing, the nice thing.  The group selectionists were doomed when they let their aesthetic sense make a suggestion - when they let it promote a hypothesis to the level of deliberate consideration.

It's not like I knew the original group selectionists.  But I made analogous mistakes as a teenager, and have since watched others make the mistake many times over.  So I do have some experience whereof I speak, when I speak of instant doom.

Unfortunately, the prophylactic against this mistake is not a recognized technique of Traditional Rationality.

In Traditional Rationality, you can get your ideas from anywhere.  Then you weigh up the evidence for and against them, searching for arguments on both sides.  If the question hasn't been definitely settled by experiment, you should try to do an experiment to test your opinion, and dutifully accept the result.

"Sorry, you're not allowed to suggest ideas using that method" is not something you hear, under Traditional Rationality.

But it is a fact of life, an experimental result of cognitive psychology, that when people have an idea from any source, they tend to search for support rather than contradiction - even in the absence of emotional commitment (see link).

It is a fact of life that priming and contamination occur: just being briefly exposed to completely uninformative, known false, or totally irrelevant "information" can exert significant influence on subjects' estimates and decisions.  This happens on a level below deliberate awareness, and that's going to be pretty hard to beat on problems where anthropomorphism is bound to rush in and make suggestions - but at least you can avoid deliberately making it worse.

It is a fact of life that we change our minds less often than we think.  Once an idea gets into our heads, it is harder to get it out than we think.  Only an extremely restrictive chain of reasoning, that definitely prohibited most possibilities from consideration, would be sufficient to undo this damage - to root an idea out of your head once it lodges.  The less you know for sure, the easier it is to become contaminated - weak domain knowledge increases contamination effects.

It is a fact of life that we are far more likely to stop searching for further alternatives at a point when we have a conclusion we like, than when we have a conclusion we dislike.

It is a fact of life that we hold ideas we would like to believe, to a lower standard of proof than ideas we would like to disbelieve.  In the former case we ask "Am I allowed to believe it?" and in the latter case ask "Am I forced to believe it?"  If your domain knowledge is weak, you will not know enough for your own knowledge to grab you by the throat and tell you "You're wrong!  That can't possibly be true!"  You will find that you are allowed to believe it.  You will search for plausible-sounding scenarios where your belief is true.  If the search space of possibilities is large, you will almost certainly find some "winners" - your domain knowledge being too weak to definitely prohibit those scenarios.

It is a fact of history that the group selectionists failed to relinquish their folly.  They found what they thought was a perfectly plausible way that evolution (evolution!) could end up producing foxes who voluntarily avoided reproductive opportunities(!).  And the group selectionists did in fact cling to that hypothesis.  That's what happens in real life!  Be warned!

To beat anthropomorphism you have to be scared of letting anthropomorphism make suggestions.  You have to try to avoid being contaminated by anthropomorphism (to the best extent you can).

As soon as you let anthropomorphism generate the idea and ask, "Could it be true?" then your brain has already swapped out of forward-extrapolation mode and into backward-rationalization mode.  Traditional Rationality contains inadequate warnings against this, IMO.  See in particular the post where I argue against the Traditional interpretation of Devil's Advocacy.

Yes, there are occasions when you want to perform abductive inference, such as when you have evidence that something is true and you are asking how it could be true.  We call that "Bayesian updating", in fact.  An occasion where you don't have any evidence but your brain has made a cute little anthropomorphic suggestion, is not a time to start wondering how it could be true.  Especially if the search space of possibilities is large, and your domain knowledge is too weak to prohibit plausible-sounding scenarios.  Then your prediction ends up being determined by anthropomorphism.  If the real process is not controlled by a brain similar to yours, this is not a good thing for your predictive accuracy.

This is a war I wage primarily on the battleground of Unfriendly AI, but it seems to me that many of the conclusions apply to optimism in general.

How did the idea first come to you, that the subprime meltdown wouldn't decrease the value of your investment in Danish deuterium derivatives?  Were you just thinking neutrally about the course of financial events, trying to extrapolate some of the many different ways that one financial billiard ball could ricochet off another?  Even this method tends to be subject to optimism; if we know which way we want each step to go, we tend to visualize it going that way.  But better that, than starting with a pure hope - an outcome generated because it ranked high in your preference ordering - and then permitting your mind to invent plausible-sounding reasons it might happen.  This is just rushing to failure.

And to spell out the application to Unfriendly AI:  You've got various people insisting that an arbitrary mind, including an expected paperclip maximizer, would do various nice things or obey various comforting conditions:  "Keep humans around, because diversity is important to creativity, and the humans will provide a different point of view."  Now you might want to seriously ask if, even granting that premise, you'd be kept in a nice house with air conditioning; or kept in a tiny cell with life support tubes and regular electric shocks if you didn't generate enough interesting ideas that day (and of course you wouldn't be allowed to die); or uploaded to a very small computer somewhere, and restarted every couple of years.  No, let me guess, you'll be more productive if you're happy.  So it's clear why you want that to be the argument; but unlike you, the paperclip maximizer is not frantically searching for a reason not to torture you.

Sorry, the whole scenario is still about as unlikely as your carefully picking up ants on the sidewalk, rather than stepping on them, and keeping them in a happy ant colony for the sole express purpose of suggesting blog comments.  There are reasons in my goal system to keep sentient beings alive, even if they aren't "useful" at the moment.  But from the perspective of a Bayesian superintelligence whose only terminal value is paperclips, it is not an optimal use of matter and energy toward the instrumental value of producing diverse and creative ideas for making paperclips, to keep around six billion highly similar human brains.  Unlike you, the paperclip maximizer doesn't start out knowing it wants that to be the conclusion.

Your brain starts out knowing that it wants humanity to live, and so it starts trying to come up with arguments for why that is a perfectly reasonable thing for a paperclip maximizer to do.  But the paperclip maximizer itself would not start from the conclusion that it wanted humanity to live, and reason backward.  It would just try to make paperclips.  It wouldn't stop, the way your own mind tends to stop, if it did find one argument for keeping humans alive; instead it would go on searching for an even superior alternative, some way to use the same resources to greater effect.  Maybe it would just keep 20 humans and randomly perturb their brain states a lot.

If you can't blind your eyes to human goals and just think about the paperclips, you can't understand what the goal of making paperclips implies.  It's like expecting kind and merciful results from natural selection, which lets old elephants starve to death when they run out of teeth.

If you want a nice result that takes 10 bits to specify, then a priori you should expect a 1/1024 probability of finding that some unrelated process generates that nice result.  And a genuinely nice outcome in a large outcome space takes a lot more information than the English word "nice", because what we consider a good outcome has many components of value.  It's extremely suspicious if you start out with a nice result in mind, search for a plausible reason that a not-inherently-nice process would generate it, and, by golly, find an amazingly clever argument.
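The 1/1024 figure is just 2 to the power of minus 10, and the same arithmetic shows how fast the odds collapse as the specification grows. This is a minimal sketch, assuming the unrelated process is equally likely to land on any outcome of the given bit length; the function name and the larger bit counts are illustrative choices, not figures from the argument.

```python
from fractions import Fraction

def chance_of_unrelated_hit(bits_to_specify: int) -> Fraction:
    """Prior chance that a process indifferent to your preferences lands
    on one particular outcome that takes `bits_to_specify` bits to pin down.

    Assumes all outcomes of that bit length are equally likely
    (an illustrative assumption, not a claim about any real process).
    """
    return Fraction(1, 2 ** bits_to_specify)

# 10 bits is the example from the text; 20 and 40 are hypothetical
# stand-ins for "more components of value".
for bits in (10, 20, 40):
    print(bits, "bits ->", chance_of_unrelated_hit(bits))
```

Every extra requirement that costs one more bit halves the prior chance again, so a clever argument that the nice outcome happens anyway has that much more to overcome.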

And the more complexity you add to your requirements - humans not only have to survive, but have to survive under what we would consider good living conditions, etc. - the less you should expect, a priori, a non-nice process to generate it.  The less you should expect to, amazingly, find a genuine valid reason why the non-nice process happens to do what you want.  And the more suspicious you should be, if you find a clever-sounding argument why this should be the case.  To expect this to happen with non-trivial probability is pulling information from nowhere; a blind arrow is hitting the center of a small target.  Are you sure it's wise to even search for such possibilities?  Your chance of deceiving yourself is far greater than the a priori chance of a good outcome, especially if your domain knowledge is too weak to definitely rule out possibilities.

No more than you can guess a winning lottery ticket should you expect a process not shaped by human niceness to produce nice results in a large outcome space.  You may not know the domain very well, but you can understand that, a priori, "nice" results require specific complexity to happen for no reason, and complex specific miracles are rare.

I wish I could tell people:  "Stop!  Stop right there!  You defeated yourself the moment you knew what you wanted!  You need to throw away your thoughts and start over with a neutral forward extrapolation, not seeking any particular outcome."  But the inferential distance is too great; and then begins the slog of, "I don't see why that couldn't happen" and "I don't think you've proven my idea is wrong."

It's Unfriendly superintelligence that tends to worry me most, of course.  But I do think the point generalizes to quite a lot of optimism.  You may know what you want, but Nature doesn't care.
