SotW: Avoid Motivated Cognition

by Eliezer Yudkowsky, 28th May 2012



(The Exercise Prize series of posts is the Center for Applied Rationality asking for help inventing exercises that can teach cognitive skills.  The difficulty is coming up with exercises interesting enough, with a high enough hedonic return, that people actually do them and remember them; this often involves standing up and performing actions, or interacting with other people, not just working alone with an exercise booklet and a pencil.  We offer prizes of $50 for any suggestion we decide to test, and $500 for any suggestion we decide to adopt.  This prize also extends to LW meetup activities and good ideas for verifying that a skill has been acquired.  See here for details.)

The following awards have been made:  $550 to Palladias, $550 to Stefie_K, $50 to lincolnquirk, and $50 to John_Maxwell_IV.  See the bottom for details.  If you've earned a prize, please PM StephenCole to claim it.  (If you strongly believe that one of your suggestions Really Would Have Worked, consider trying it at your local Less Wrong meetup.  If it works there, send us some participant comments; this may make us update enough to test it.)

Lucy and Marvin are walking down the street one day, when they pass a shop showing a large chocolate cake in the window.

"Hm," says Lucy, "I think I'll buy and eat that chocolate cake."

"What, the whole thing?" says Marvin.  "Now?"

"Yes," says Lucy, "I want to support the sugar industry."

There is a slight pause.

"I don't suppose that your liking chocolate cake has anything to do with your decision?" says Marvin.

"Well," says Lucy, "I suppose it could have played a role in suggesting that I eat a whole chocolate cake, but the reason why I decided to do it was to support the sugar industry.  Lots of people have jobs in the sugar industry, and they've been having some trouble lately."

Motivated cognition is the way (all? most?) brains generate false landscapes of justification in the presence of attachments and flinches.  It's not enough for the human brain to attach to the sunk cost of a PhD program, so that we are impelled in our actions to stay - no, that attachment can also go off and spin a justificational landscape to convince the other parts of ourselves, even the part that knows about consequentialism and the sunk cost fallacy, to stay in the PhD program.

We're almost certain that the subject matter of "motivated cognition" isn't a single unit, probably more like 3 or 8 units.  We're also highly uncertain of where to start teaching it.  Where we start will probably end up being determined by where we get the best suggestions for exercises that can teach it - i.e., end up being determined by what we (the community) can figure out how to teach well.

The cognitive patterns that we use to actually combat motivated cognition seem to break out along the following lines:

  1. Our conceptual understanding of 'motivated cognition', and why it's defective as a cognitive algorithm - the "Bottom Line" insight.
  2. Ways to reduce the strength of the rationalization impulse, or restore truth-seeking in the presence of motivation: e.g., Anna's "Become Curious" technique.
  3. Noticing the internal attachment or internal flinch, so that you can invoke the other skills; realizing when you're in a situation that makes you liable to rationalize.
  4. Realigning the internal parts that are trying to persuade each other: belief-alief or goal-urge reconciliation procedures.

And also:

  • Pattern recognition of the many styles of warped justification landscape that rationalization creates - being able to recognize "motivated skepticism" or "rehearsing the evidence" or "motivated uncertainty".
  • Specific counters to rationalization styles, like "Set betting odds" as a counter to motivated uncertainty.

Exercises to teach all of these are desired, but I'm setting apart the Rationalization Patterns into a separate SotW, since there are so many that I'm worried 1-4 won't get fair treatment otherwise.  This SotW will focus on items 1-3 above; #4 seems like more of a separate unit.

Conceptual understanding / insights / theoretical background:

The core reasons why rationalization doesn't work are given in The Bottom Line and Rationalization.  The Bayesian analysis of selective search is given in What Evidence Filtered Evidence? and Conservation of Expected Evidence.

For further discussion, see the entire Against Rationalization sequence, also The Meditation on Curiosity (for the Litany of Tarski).

Some key concepts (it'd be nice if some exercise taught a gut-level understanding thereof, although as always the goal is to teach skills rather than concepts):

  • Once you write down the answer on the bottom line of a piece of paper in pen, it's already right or already wrong, and won't change regardless of what clever arguments you write on the lines above.
  • What determines your life outcome isn't how cleverly you argue for the foregone conclusion - what determines life outcomes is the algorithm that chooses which side to argue for, what you actually do.
  • Rationality isn't something you can use to argue for your side; you can never say "Please come up with a rational argument for X"; the only chance for rationality to operate is when you're deciding which side to be on.
  • Evidence that has passed through a filter takes on a different Bayesian import.  (Currently handled in the Bayes unit.)
  • It is impossible for a rational agent to search for evidence to look at that will send their beliefs in a predetermined direction.  (Currently handled in the Bayes unit.)
  • There's such a thing as a correct probability to assign to an uncertain proposition, and this in turn determines the weight to lend that possibility in our actions.  Even when things are uncertain, any cognition that makes you put too much action-weight on the wrong belief or choice is screwing you up.
  • If you're selective about where you look for flaws, or how hard you look for flaws, every new fallacy you learn how to detect makes you that much stupider.
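The claim that a rational agent cannot pick evidence that will push its beliefs in a predetermined direction can be checked with a little arithmetic. The sketch below is only an illustration: the function names and the example numbers (a prior of 3/10, likelihoods of 4/5 and 1/5) are assumptions chosen for this example and do not come from the Bayes unit. It shows that the probability-weighted average of the two possible posteriors equals the prior, so any chance of moving upward is balanced by a chance of moving downward.

```python
from fractions import Fraction


def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H|E), P(H|not E)) using Bayes' rule.

    Assumes 0 < P(E) < 1, so that both outcomes are possible.
    """
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    post_if_e = prior * p_e_given_h / p_e
    post_if_not_e = prior * (1 - p_e_given_h) / (1 - p_e)
    return p_e, post_if_e, post_if_not_e


def expected_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Average the two possible posteriors, weighted by how likely each is."""
    p_e, post_if_e, post_if_not_e = posterior(prior, p_e_given_h, p_e_given_not_h)
    return p_e * post_if_e + (1 - p_e) * post_if_not_e


# Hypothetical numbers, chosen only for illustration.
prior = Fraction(3, 10)
p_e, up, down = posterior(prior, Fraction(4, 5), Fraction(1, 5))

# Seeing E raises the belief, not seeing E lowers it...
assert down < prior < up
# ...and before looking, the expected posterior is exactly the prior.
assert expected_posterior(prior, Fraction(4, 5), Fraction(1, 5)) == prior
```

Exact fractions are used instead of floats so that the equality holds without rounding error.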

(We might also need an exercise just for getting people to understand the concept of motivated cognition at all.  When Anna and Michael ran their first session on motivated cognition, they found that while most participants immediately recognized the notion of 'rationalization' from examples like Lucy above, several people had no idea what they were talking about - they didn't see why anyone would ever want to use a technique like the Litany of Tarski.  Yes, we know you're skeptical, we also couldn't see how that could possibly be true a priori, but sometimes the evidence just punches you in the nose.  After some investigation, it seems entirely possible that Alicorn has simply never rationalized, ever.  Other cases (not Alicorn's) suggest that some people might have a very low need for verbal justification; even if they feel guilty about breaking their diet, they feel no urge to invent an elaborate excuse - they just break their diet.  On the other hand, LW!Hermione failed to reproduce this experiment - she couldn't find anyone who didn't immediately recognize "rationalization" after 10 tries with her friends.  We notice we are confused.)

(The upshot is that part of the challenge of constructing a first unit on motivated cognition may be to "Explain to some participants what the heck a 'rationalization' is, when they don't remember any internal experience of that" or might even be "Filter out attendees who don't rationalize in the first place, and have them do a different unit instead."  Please don't be fascinated by this problem at the expense of the primary purpose of the unit, though; we're probably going to award at most 1 prize on this subtopic, and more likely 0, and there's an existing thread for further discussion.)

Countering the rationalization impulse / restoring truth-seeking:

The Tarski method:  This is the new name of what we were previously calling the Litany of Tarski:  "If the sky is blue, I want to believe the sky is blue; if the sky is not blue, I want to believe the sky is not blue; let me not become attached to beliefs I may not want."

Example:  Suppose you walk outside on a fall day wearing a short-sleeved shirt, when you feel a slightly chill breath of air on your arms.  You wonder if you should go back into the house and get a sweater.  But that seems like work; and so your mind quickly notes that the Sun might come out soon and then you wouldn't need the sweater.


| | It stays cold enough to require a sweater | It gets warm enough that no sweater is needed |
|---|---|---|
| You believe you need a sweater | A warm walk in a toasty sweater. | Your walk is ruined forever by the need to carry an extra sweater. |
| You believe you don't need a sweater | You are cold!  Cold cold cold!  Why didn't you get a sweater? | Free and unencumbered, you stroll along as the warm Sun comes out overhead. |

Visualizing all 4 quadrants of this binary proposition - the world is like A and I believe A, the world is like B and I believe A, etc. - should, in principle, emotionally confirm the truth of the proposition:  "If it will be cold, I want to believe it's cold; if it's not cold, I want to believe it's not cold; let me not become attached to beliefs I may not want."
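As a rough sketch of why the four quadrants support the Tarski proposition, the sweater example can be written as a small payoff table. The payoff numbers, the `PAYOFF` table, and the `best_belief` helper are all invented for illustration; only the four outcomes come from the example. The point is structural: the world is fixed first, and within each fixed world the accurate belief is the one that scores best.

```python
# Hypothetical payoffs in arbitrary units; the numbers are assumptions,
# only the four outcomes are taken from the sweater example.
PAYOFF = {
    ("cold", "believe_cold"): 2,   # a warm walk in a toasty sweater
    ("cold", "believe_warm"): -3,  # you are cold, with no sweater
    ("warm", "believe_cold"): -1,  # carrying a sweater you did not need
    ("warm", "believe_warm"): 2,   # free and unencumbered in the sun
}


def best_belief(world):
    """Start from the world: given a fixed world, pick the best belief."""
    beliefs = [b for (w, b) in PAYOFF if w == world]
    return max(beliefs, key=lambda b: PAYOFF[(world, b)])


# In each world, the belief that matches the world comes out ahead.
assert best_belief("cold") == "believe_cold"
assert best_belief("warm") == "believe_warm"
```

Note that `best_belief` takes the world as its input and never the other way around, which mirrors the idea that we choose what to believe within a fixed world, not which world to live in.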

Eliezer and Anna, when using this method against the temptation to believe X, visualize only the quadrant "The world is not like X and I believe X" to remind themselves of the consequences; e.g. we would only visualize the "You are cold!" quadrant.  Michael Smith (aka "Val", short for Valentine) says that after some practice on this technique as a kata, he was able to visualize all 4 quadrants quickly and that visualizing all 4 seemed to help.

Val also used an upside-down W-diagram with the two worlds at the top and the four beliefs at the bottom, to emphasize the idea that the world is there first, and is fixed, and we have only a choice of what to believe within a fixed world, not a choice of which background world to live in.  The Tarski Method embodies a "Start from the world" mental process in which you visualize the world being there first, and your belief coming afterward; a similar "Start from the world" rule is likewise emphasized in the Bayes unit, wherein one starts from a world and asks about the probability of the evidence, rather than starting from the evidence and trying to make it match up with a world.

When we actually tested a unit based on asking people to draw Tarski squares, it didn't work very well - possibly because people didn't seem to understand what it was for, or when they would use it; possibly because it wasn't a group exercise.  In any case, we already tried teaching this the obvious way ("Go draw Tarski squares!") and it didn't work.  But it still seems worth teaching if someone can invent a better exercise, because it's something that multiple CfAR people actually use to counter the rationalization impulse / restore truthseeking in real life.

Become Curious:  Detect non-curiosity and become curious.  Anna's main alarm signal is noticing that she's not curious in the middle of a conversation - that she doesn't have an impulse-to-find-out the answer - at which point she tries to make herself curious about the subject of discussion.  Besides visualizing the not-X-and-believe-X quadrant of the Tarski diagram, this is also something you may be able to do by brute introspection - remember the feeling of curiosity, and try to call it up.  (This is probably in the top 3 most important things I learned from Anna. -- EY)

Take Pride in Your Objectivity:  Julia teaches this as a primary counter in her Combat Reflexes unit (how to avoid instantly defending or attacking).  Eliezer does this every time he admits he's wrong on the Internet - congratulates himself on being such a great rationalist, in order to apply counter-hedons to the flash of pain that would otherwise be associated.

Visualize a Fixed Probability:  This is what Eliezer used as a child to stop being scared of the dark - he would deliberately visualize a murderer standing with a knife behind a door, then visualize his own thoughts having no effect on the fixed probability that any such murderer was actually present.  In other words, the notion of a "true probability" that his thoughts couldn't affect, countered the fear of thoughts affecting reality.  Visualizing there being a fixed frequency of worlds, or a lawful probability that a Bayesian agent would assign, can help in perceiving the futility of rationalization because you're trying to use arguments to move a lawful probability that is fixed.  This is also part of the domain of Lawful Uncertainty, the notion that there are still rules which apply even when we're unsure (not presently part of any unit).

Imagine the Revelation:  Anna imagines that the answer is about to be looked up on the Internet, that Omega is about to reveal the answer, etc., to check whether her thoughts would change if she were potentially about to be embarrassed right now.  This detects belief-alief divergence, but also provides a truth-seeking impulse.

Knowing the Rules:  And finally, if you have sufficient mastery of probability theory or decision theory, you may have a procedure to follow which is lawful enough, and sufficiently well-understood, that rationalization can't influence it much without the mistake being blatant even to you.  (In a sense, this is what most of Less Wrong is about - reducing the amount of self-honesty required by increasing the obviousness of mistakes.)

Noticing flinches and attachments, and raising them to conscious attention:

A trigger for use of curiosity-restoration or the Tarski Method:  Noticing what it feels like for your mind to:

  • Quickly glimpse a disliked argument before sweeping it under a mental rug (flinch)
  • Glimpse a conclusion, find it unacceptable, quickly start generating arguments against it (flinch)
  • Be centered on a conclusion, automatically counter all arguments against it (attachment)
  • Instantly attack a new idea, instantly defend an old idea (this is the subject of Julia's Combat Reflexes unit)

Learning to notice these events introspectively seems extremely important - we all use it heavily in daily practice - but we don't know how to teach that.

Anna observes that Rejection Therapy is often a good time to observe oneself rationalizing, as apparently many participants reported that their mind started generating crazy reasons not to approach someone with a request.

Anna also says that she's been self-rewarding each time she notices a flinch or attachment, i.e., she's trying to train her inner pigeon to notice (not, one hopes, training the flinching or attachment!).  It's possible we could ask participants to self-reward each event of "noticing the flinch or attachment" while doing Rejection Therapy, but we still need other ideas.

Along similar lines of internal behaviorism, Eliezer avoids rewarding himself for rationalizing by repeating the phrase "Only congratulate yourself for actually changing a probability estimate or policy" on any occasion where he hasn't changed his mind after argument - as opposed to e.g. feeling any sense of reward for having defeated an incoming argument; even if the incoming argument happens to be wrong, still, "Only congratulate yourself for actually changing a probability estimate or policy."

Another thing most of us do is name attachments or flinches out loud, in conversation, as we notice them, in order to reduce their strength, e.g. "This is probably a complete post-facto rationalization, but..." (Eliezer) or "I may just be trying to avoid having my status reduced, but..." (Anna).  (Note:  This requires enough trust that nearby people also know they're flawed themselves, that you don't feel embarrassed for confessing your own flaws in front of them.  In other words, you have to tell embarrassing stories about your own failures of rationality, before other people will feel that they can do this around you.)

Anna's anti-rationalization makes heavy use of noticing suspect situations where the outside view says she might rationalize - cases where her status is at stake, and so on - and specific keywords like "I believe that" or "No, I really believe that".  She wants to try training people to notice likely contexts for rationalization, and to figure out keywords that might indicate rationalization in themselves.  (Eliezer has never tried to train himself to notice keywords because he figures his brain will just train itself to avoid the trigger phrase; and he worries about likely-context training because he's seen failure modes where no amount of evidence or sound argument is enough to overcome the suspicion of rationalization once it's been invoked.)

"Look toward the painful thought instead of away from it" is an important reflex to install to counter flinches, but would probably require some sort of hedonic support - like a strong, pre-existing pride in objectivity, or a social support group that applauds, or something to stop this from being pure negative reinforcement.

Awards for previous SotW suggestions:

$550 to Palladias for the Monday-Tuesday game, which has been tested ($50) and now adopted ($500) into the Be Specific unit (though it might be moved to some sort of Anticipation unit later on).

$550 to Stefie_K for her suggestion to have the instructor pretend to be someone who really wants you to invest in their company, but is never specific; also $50 to daenrys for the  "More Specific!" improv-game suggestion.  In combination these inspired the Vague Consultant game ("Hi, I'm a consultant, I'm here to improve your business processes!"  "How?"  "By consulting with stakeholders!") which has now been adopted into the Be Specific unit.

$50 to lincolnquirk for the "Channel Paul Graham" game, which we tested.  We all thought this would work - it was our highest-rated candidate suggestion - but it didn't get positive audience feedback.  Congratulations to lincolnquirk on a good suggestion nonetheless.

We haven't yet tested, but definitely intend to at least test, and are hence already awarding $50 to, the following idea:

$50 to John_Maxwell_IV for the Choose Your Own Adventure suggestion for the Consequentialism unit.

To claim a prize, send a LessWrong private message (so we know it originates from the same LW user account) to StephenCole.