I feel like you omit the possibility that the trait of motivated reasoning is like the “trait” of not-flying. You don’t need an explanation for why humans have the trait of not-flying, because not-flying is the default. Why didn’t this “trait” evolve away? Because there aren’t really any feasible genomic changes that would “get rid” of not-flying (i.e. that would make humans fly), at least not without causing other issues.
RE “evolutionarily-recent”: I guess your belief is that “lots of other mammals engaging in motivated reasoning” is not the world we live in. But is that right? I don’t see any evidence either way. How could one tell whether, say, a dog or a mouse ever engages in motivated reasoning?
My own theory (see [Valence series] 3. Valence & Beliefs) is that planning and cognition (in humans and other mammals) works by an algorithm that is generally very effective, and has gotten us very far, but which has motivated reasoning as a natural and unavoidable failure mode. Basically, the algorithm is built to systematically search for thoughts that seem good rather than bad. If some possibility is unpleasant, the algorithm will naturally discover the strategy of “just don’t think about the unpleasant possibility”; that’s just what it does. There isn’t any elegant way to avoid this problem, other than to evolve an entirely different algorithm for practical intelligence / planning / etc., if indeed such an alternative algorithm even exists at all.
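To make that failure mode concrete, here's a minimal toy sketch (my own illustration, with made-up thoughts and valence numbers, not anything taken from the Valence series): a planner that greedily picks whichever candidate thought currently feels best will, as a pure side effect, never visit the unpleasant-but-important thought.

```python
# Toy illustration (hypothetical names and numbers): a greedy, valence-guided
# thought search. Nothing here needs to "intend" to avoid bad news; the
# unpleasant possibility simply never wins the argmax.

candidate_thoughts = {
    "plan the presentation":              {"valence": +0.2, "useful": True},
    "imagine the talk going well":        {"valence": +0.8, "useful": False},
    "consider that the demo might crash": {"valence": -0.6, "useful": True},
}

def next_thought(thoughts):
    """Pick whichever thought currently feels best (highest valence)."""
    return max(thoughts, key=lambda t: thoughts[t]["valence"])

print(next_thought(candidate_thoughts))  # -> "imagine the talk going well"
# "consider that the demo might crash" is never selected, even though
# attending to it would be the most useful thing to do.
```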
Our brain has a hack-y workaround to mitigate this issue, namely the “involuntary attention” associated with anxiety, itches, etc., which constrains your thoughts so as to make you unable to put (particular types of) problems out of your mind. In parallel, culture has also developed some hack-y workarounds, like Reading The Sequences, or companies that have a red-teaming process. But none of these workarounds completely solves the issue, and/or they come along with their own bad side-effects.
Anyway, the key point is that motivated reasoning is a natural default that needs no particular explanation.
Once one learns to spot motivated reasoning in one's own head, the short term planner has a much harder problem. It's still looking for outputs-to-rest-of-brain which will result in e.g. playing more Civ, but now the rest of the brain is alert to the basic tricks. But the short term planner is still looking for outputs, and sometimes it stumbles on a clever trick: maybe motivated reasoning is (long-term) good, actually? And then the rest of the brain goes "hmm, ok, sus, but if true then yeah we can play more Civ" and the short term planner is like "okey dokey let's go find us an argument that motivated reasoning is (long-term) good actually!".
In short: "motivated reasoning is somehow secretly rational" is itself the ultimate claim about which one would motivatedly-reason. It's very much like the classic anti-inductive agent, which believes that things which have happened more often before are less likely to happen again: "but you've been wrong every time before!" "yes, exactly, that's why I'm obviously going to be right this time". Likewise, the agent which believes motivated reasoning is good actually: "but your argument for motivated reasoning sure seems pretty motivated in its own right" "yes, exactly, and motivated reasoning is good so that's sensible".
... which, to be clear, does not imply that all arguments in favor of motivated reasoning are terrible. This is meant to be somewhat tongue-in-cheek; there's a reason it's not in the post. But it's worth keeping an eye out for motivated arguments in favor of motivated reasoning, and discounting appropriately (which does not mean dismissing completely).
I think motivated reasoning is mostly bad, but there is some value to having some regularization towards consistency with past decisions. For example, there are often two almost-equally-good choices you can make, but you need to commit to one instead of indefinitely waffling between the two, which is way worse than either option. Having some delusional confidence via motivated reasoning can help you commit to one of the options and see it through. I've personally found that my unwillingness to accept motivated reasoning also has the side effect that I spend a lot more time in decision paralysis.
Diffusion planning is entirely made from motivated reasoning and performs pretty well. This is imo a reasonable exemplar for a slightly broader belief I have, which is that the simplest hypothesis for motivated reasoning is that reasoning from percept-feature to outcome-feature (prediction), from outcome-feature to motor-feature (control), and from outcome-feature to percept-feature (wishful thinking) are not trivially distinguishable when you have something confusable with a "big ol' diffusion model over whatever", and so avoiding motivated reasoning is hard in a messy system. There's enough pressure to avoid a lot of it, but given that reasoning from outcome-feature to motor-feature is a common need, going through correlational features that mix representations is totally allowed-by-substrate and thus common anywhere there isn't sufficient pressure against it.
Of course, I'm being kind of sloppy in my claims here.
(This is a less-specific version of saying "it's just active inference", because the active inference math hasn't clicked for me yet, so I can't claim that it's exactly active inference; but it does seem like in general, planning-by-inference ought to be the default, as hinted by the fact that you can get it just by jiggling stuff around diffusion style.)
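Here's roughly the picture I have in mind, as a runnable toy (entirely my own made-up illustration, not actual diffusion planning or active inference math): fit a single joint model over percept, motor, and outcome features, and notice that prediction, control, and wishful thinking are all the same conditioning operation pointed in different directions.

```python
# Toy sketch: one joint (linear-Gaussian) model over percept p, motor m, outcome o.
# All numbers are made up; the point is only that the three queries below use
# identical machinery.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
p = rng.normal(size=n)                             # percept feature
m = rng.normal(size=n)                             # motor feature
o = 0.7 * p + 0.5 * m + 0.1 * rng.normal(size=n)   # outcome depends on both

cov = np.cov(np.stack([p, m, o]))                  # joint covariance, rows: p, m, o

def conditional_mean(cov, target, given, value):
    """E[target | given = value] for a pair of zero-mean jointly Gaussian variables."""
    return cov[target, given] / cov[given, given] * value

print("prediction       E[o | p=1]:", conditional_mean(cov, 2, 0, 1.0))  # percept -> outcome
print("control          E[m | o=1]:", conditional_mean(cov, 1, 2, 1.0))  # outcome -> motor
print("wishful thinking E[p | o=1]:", conditional_mean(cov, 0, 2, 1.0))  # outcome -> percept
# Nothing in the substrate distinguishes the third query from the first two;
# keeping wishful thinking out requires extra structure, not less.
```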
Your explanation about the short-term planner optimizing against the long-term planner seems to suggest we should only see motivated reasoning in cases where there is a short-term reward for it.
It seems to me that motivated reasoning also occurs in cases like gamblers thinking their next lottery ticket has positive expected value, or competitors overestimating their chances of winning a competition, where there doesn't appear to be a short-term benefit (unless the belief itself somehow counts as a benefit). Do you posit a different mechanism for these cases?
I've been thinking for a while that motivated reasoning sort of rhymes with reward hacking, and might arise any time you have a generator-part Goodharting an evaluator-part. Your short-term and long-term planners might be considered one example of this pattern?
I've also wondered if children covering their eyes when they get scared might be an example of the same sort of reward hacking (instead of eliminating the danger, they just eliminate the warning signal from the danger-detecting part of themselves by denying it input).
Motivated reasoning is a misfire of a generally helpful heuristic: try to understand why what other people are telling you makes sense.
In a high trust setting, people are usually well-served by assuming that there’s a good reason for what they’re told, what they believe, and what they’re doing. Saying, “figure out an explanation for why your current plans make sense” is motivated reasoning, but it’s also a way to just remember what the heck you’re doing and to coordinate effectively with others by anticipating how they’ll behave.
The thing to explain, I think, is why we apply this heuristic in less-than-full-trust settings. My explanation for that is that this sense-making is still adaptive even in pretty low-trust settings. The best results you can get in a low-trust (or parasitic) setting are worse than you’d get in a higher-trust setting, but sense-making typically leads to better outcomes than not.
In particular, while it’s easy in retrospect to pick a specific action (playing Civ all night) and say “I shouldn’t have sense-made that,” it’s hard to figure out in a forward-looking way which settings or activities do or don’t deserve sense-making. We just do it across the board, unless life has made us into experts on how to calibrate our sense-making. This might look like having enough experience with a liar to disregard everything they’re saying, and perhaps even to sense-make “ah, they’re lying to me like THIS for THAT reason.”
In summary, motivated reasoning is just sense-making, which is almost always net adaptive. Specific products, people and organizations take advantage of this to exploit people’s sense-making in limited ways. If we focus on the individual misfires in retrospect, it looks maladaptive. But if you had to predict in advance whether or not to sense-make any given thing, you’d be hard-pressed to do better than you’re already doing, which probably involves sense-making quite a bit of stuff most of the time.
I remember the BBQ benchmark, which had LLMs(!) exhibiting such reasoning. Maybe motivated reasoning is more adaptive than we think, as I conjectured back when Eli Tyre first asked this question?
LLMs mimic human text. That is the first and primary thing they are optimized for. Humans motivatedly reason, which shows up in their text. So, LLMs trained to mimic human text will also mimic motivated reasoning, insofar as they are good at mimicking human text. This seems like the clear default thing one would expect from LLMs; it does not require hypothesizing anything about motivated reasoning being adaptive.
I also see an additional mechanism for motivated reasoning to emerge. Suppose we have an agent that is unsure of its capabilities (e.g. GPT-5, which arguably believed its time horizon to be 20-45 mins). Then the best thing the agent could do to increase its capabilities would be to attempt[1] tasks a bit more difficult than the edge of its capabilities, and either succeed by chance, do something close to success, and/or have its capabilities increase from the mere act of trying, which is the case at least in Hebbian networks. Then the humans who engaged in such reasoning found it easier to keep trying, and it was that persistence, not motivated reasoning itself, that correlated with success.
Or, in the case of LLMs, have the hosts assign such a task and let the model attempt it.
There’s a standard story which says roughly "motivated reasoning in humans exists because it is/was adaptive for negotiating with other humans". I do not think that story stands up well under examination; when I think of standard day-to-day examples of motivated reasoning, that pattern sounds like a plausible generator for some-but-a-lot-less-than-all of them.
Examples
Suppose it's 10 pm and I've been playing Civ all evening. I know that I should get ready for bed now-ish. But... y'know, this turn isn't a very natural stopping point. And it's not that bad if I go to bed half an hour late, right? Etc. Obvious motivated reasoning. But man, that motivated reasoning sure does not seem very socially-oriented? Like, sure, you could make up a story about how I'm justifying myself to an imaginary audience or something, but it does not feel like one would have predicted the Civ example in advance from the model "motivated reasoning in humans exists because it is/was adaptive for negotiating with other humans".
Another class of examples: very often in social situations, the move which will actually get one the most points is to admit fault and apologize. And yet, instead of that, people instinctively spin a story about how they didn't really do anything wrong. People instinctively spin that story even when it's pretty damn obvious (if one actually stops to consider it) that apologizing would result in a better outcome for the person in question. Again, you could maybe make up some story about evolving suboptimal heuristics, but this just isn't the behavior one would predict in advance from the model "motivated reasoning in humans exists because it is/was adaptive for negotiating with other humans".
That said, let’s also include an example where "motivated reasoning in humans exists because it is/was adaptive for negotiating with other humans" does seem like a plausible generator. Suppose I told a partner I’d pick them up on my way home at 6:00 pm, but when 6:00 pm rolls around I’m deep in an interesting conversation and don’t want to stop. The conversation continues for a couple hours. My partner is unhappy about this. But if I can motivatedly-reason my way to believing that my choice was justified (or at least not that bad), then I will probably have a lot easier time convincing my partner that the choice was justified - or at least that we have a reasonable disagreement about what’s justified, as opposed to me just being a dick. Now personally I prefer my relationships be, uh, less antagonistic than that whole example implies, but you can see where that sort of thing might be predicted in advance by the model "motivated reasoning in humans exists because it is/was adaptive for negotiating with other humans".
Looking at all these examples (and many others) together, the main pattern which jumps out to me is: motivated reasoning isn't mainly about fooling others, it's about fooling oneself. Or at least a part of oneself. Indeed, there's plenty of standard wisdom along those lines: "the easiest person to fool is yourself", etc. Yes, there are some examples where fooling oneself is instrumentally useful for negotiating with others. But humans sure seem to motivatedly-reason and fool themselves in lots of situations which don’t involve any other humans (like the Civ example), and situations in which the self-deception is net harmful socially (like the apology class of examples). The picture as a whole does not look like "motivated reasoning in humans exists because it is/was adaptive for negotiating with other humans".
So why do humans motivatedly reason, then?
I’m about to give an alternative model. First, though, I should flag that the above critique still stands even if the alternative model is wrong. "Motivated reasoning in humans exists because it is/was adaptive for negotiating with other humans" is still basically wrong, even if the alternative I’m about to sketch is also wrong.
With that in mind, model part 1: motivated reasoning simply isn’t adaptive. Even in the ancestral environment, motivated reasoning decreased fitness. The obvious answer is just correct.
What? But then why didn’t motivated reasoning evolve away?
Humans are not nearly fitness-optimal, especially when it comes to cognition. We have multiple arguments and lines of evidence for this fact.
First, just on priors: humans are approximately the stupidest thing which can cognitively “take off”, otherwise we would have taken off sooner in ancestral history, when we were less smart. So we shouldn’t expect humans to be optimal minds with all the bugs worked out.
Second, it sure does seem like humans have been evolving at a relatively quick clip, especially the brain. It’s not like we’ve been basically the same for tens of millions of years; our evolution is not at equilibrium, and wasn’t at equilibrium even before agriculture.
Third, it sure does seem like humans today have an awful lot of cognitive variation which is probably not fitness-neutral (even in the ancestral environment). The difference between e.g. an IQ-70 human and an IQ-130 human is extremely stark, mostly genetic, and does not seem to involve comparably large tradeoffs on other axes of fitness in the ancestral environment (e.g. IQ-130 humans do not get sick twice as often or burn twice as many calories as IQ-70 humans).
So in general, arguments of the form “<apparently-suboptimal quirk of human reasoning> must be adaptive because it didn’t evolve away” just… aren’t that strong. It’s not zero evidence, but it’s relevant mainly when the quirk is something which goes back a lot further in the ancestral tree than humans.
(This does mean that e.g. lots of other mammals engaging in motivated reasoning, in a qualitatively similar way to humans, would be much more compelling evidence that motivated reasoning is adaptive.)
Ok, but then why do humans motivatedly reason?
Even if we accept that humans are not nearly fitness-optimal, especially when it comes to cognition, that doesn’t tell us which particular cognitive bugs humans have. It doesn’t predict motivated reasoning specifically, out of the bajillions of possibilities in the exponentially large space of possible cognitive bugs. It doesn’t positively predict motivated reasoning; it just negates the argument that motivated reasoning must somehow be fitness-optimal.
Our above argument does predict that motivated reasoning must have shown up recently in human evolutionary history (otherwise it would have evolved away). And motivated reasoning does seem innate to humans by default (as opposed to e.g. being installed by specific cultural memes), so it must have come from one or a few genetic changes. And those changes must have increased fitness overall, otherwise they wouldn’t have spread to the whole population. So, insofar as we buy those premises… motivated reasoning must be a side-effect of some other evolutionarily-recent cognitive changes which were overall beneficial, despite motivated reasoning itself being net negative.
Can we guess at what those changes might be?
Observation: in examples of motivated reasoning, it feels like our brains have two internal plan-evaluators. One of them is a relatively short-sighted, emotionally-driven plan evaluator. The other is focused more on the long term, on reputation and other people’s reactions, on all the things one has been told are good or bad, etc; that one is less myopic. The basic dynamic in motivated reasoning seems to be the shorter-range plan-evaluator trying to trick the longer-range plan evaluator.
Thus, model part 2: the longer-range plan evaluator is a recent cognitive innovation of the human lineage. Other animals sometimes do long-range-oriented things, but usually not in a general-purpose way; general-purpose long-range planning seems pretty human-specific. The shorter-sighted plan evaluator is still just doing basically the same thing it’s always done: it tries to find outputs it can feed to the rest of the brain which will result in good-feeling stuff short term. In humans, that means the short-sighted search process looks for outputs it can feed to the long-range planner which will result in good-feeling stuff short term. Thus, motivated reasoning: the short-sighted search process is optimizing against the long-range planner, just as an accident of working the same way the short-sighted process always worked throughout evolutionary history.
For example, when I’m playing Civ at 10 pm, my long-range planner is like “ok, bedtime now”, but my short-range planner is like “oh no, that will lose good-feeling stuff right now, let’s try spitting some other outputs into rest-of-brain to see if we can keep the good-feeling stuff”. And sometimes it hits on thoughts like “y'know, this turn isn't a very natural stopping point” or “it's not that bad if I go to bed half an hour late, right?”, which mollify the long-range planner enough to keep playing Civ. In an ideal mind, the short-range and long-range planners wouldn’t optimize against each other like this; both do necessary work sometimes. But humans aren’t ideal minds: the long-range planner is brand spanking new (evolutionarily) and all the bugs haven’t been worked out yet. The two planners just kinda both got stuck in one head and haven’t had time to evolve good genetically hardcoded cooperative protocols yet.
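To make that dynamic concrete, here's a toy sketch (made-up outputs and numbers, purely illustrative, not a claim about actual brain architecture): the short-range planner just maximizes immediate good-feeling over whatever outputs the long-range planner won't veto, and rationalizations win simply because they're the best-feeling outputs that pass the filter.

```python
# Toy illustration of the two-planner dynamic. "long_range_ok" stands in for
# whether the long-range planner is mollified by a given output; all values
# are made up for the Civ-at-10pm example.

candidate_outputs = {
    "stop now, go to bed":                       {"short_term": -0.5, "long_range_ok": True},
    "just keep playing, ignore bedtime":         {"short_term": +0.9, "long_range_ok": False},
    "this turn isn't a natural stopping point":  {"short_term": +0.8, "long_range_ok": True},
    "half an hour late isn't that bad, right?":  {"short_term": +0.7, "long_range_ok": True},
}

def short_range_planner(options):
    """Maximize immediate good-feeling among outputs the long-range planner won't veto."""
    admissible = {k: v for k, v in options.items() if v["long_range_ok"]}
    return max(admissible, key=lambda k: admissible[k]["short_term"])

print(short_range_planner(candidate_outputs))
# -> "this turn isn't a natural stopping point"
# The search never needed to intend deception; the rationalization wins simply
# because it's the highest-feel-good output the long-range planner lets through.
```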