What if every “cognitive bias” is actually rational behavior under correctly modeled incentives? What if the problem isn’t that people are irrational, but that we’ve been modeling their incentive structures wrong?
This isn’t a semantic trick. It’s a testable claim with falsification criteria. If the framework I’m presenting holds, it means most of what we call “irrationality” is actually us failing to model which incentives dominate in a given context. Confirmation bias often looks like a cognitive defect, but it can be rational protection of belonging when truth-seeking threatens group membership. Sunk cost “fallacy” can be rational risk avoidance when switching costs are uncertain and immediate.
I’m going to make this case below. I expect some of you will want to call this determinism or excuse-making. It isn’t. It’s a structural claim: behavior tracks perceived payoffs, so the leverage is in changing the payoff landscape, not moralizing at individuals. If you give me a counterexample that survives a steelman incentive model, I’ll treat it as a real update, not a debate point.
The Law of Incentive Primacy
People respond to what pays. This isn't cynicism, it's observation. Across every domain of human activity, from individual psychology to institutional behavior, the same pattern holds: when the structure rewards something, people do it. When it punishes something, they avoid it. What we call culture, morality, or institutional norms are downstream effects of underlying incentive architectures that determine which behaviors succeed and which ones die out.
Here “pays” means anything the nervous system treats as net-positive expected value across the tiers introduced below, including social reward, risk reduction, and identity protection, not just money.
The mistake most analyses make is treating incentives as one variable among many, something that influences behavior alongside values, leadership quality, or ideological commitment. The Incentive Primacy Framework argues otherwise. Incentives aren't just influential; they're primary. They define what counts as rational within any given structure, and they do so with enough consistency that outcomes become predictable once you map them correctly. Values, temperament, intelligence, culture, and history still matter, but they function as parameters shaping how incentives are perceived and weighted, not as independent causes that float free of consequence.
Most incentive processing is not conscious. People do not sit there and compute tiers. The nervous system continuously estimates payoffs, including safety, belonging, reputational risk, effort, uncertainty, and relief. What we experience as “a choice” is often the final output of that weighting, followed by a story that makes the behavior legible to ourselves and others.
This doesn't eliminate agency. People still choose, but choice is the output layer, not the control panel. Free will, as traditionally conceived, is mostly illusory. Not because people are robots, but because the “will” is largely the felt experience of an autonomic valuation process choosing among constrained options, given learned priors and perceived payoffs. The freedom people actually have is real, but it exists inside the incentive terrain, not outside it.
Individual choices occur within landscapes where some paths lead to reward and others to punishment, where some options are visible and others hidden, where short-term payoffs often overwhelm long-term consequences. The framework's central claim is simple: if you want to understand why systems produce the outcomes they do, why they succeed, why they fail, why they persist in obvious dysfunction, you need to stop asking what people should do and start asking what they're rewarded for doing.
Most failure isn't moral. It's structural. Responsibility doesn’t disappear, it relocates. Blaming individuals for responding rationally to misaligned incentives is like blaming water for flowing downhill. The problem isn't the water; it's the terrain.
Two common misreads are worth killing early. First, this is not nihilism, it is leverage. If behavior tracks payoffs, then redesigning payoffs changes outcomes. Second, this is not “anything goes”. Structural causality is exactly why accountability should target terrain design, not moral scolding of predictable behavior.
Prediction: In domains where truth-seeking is socially punished, debiasing education will underperform interventions that change reputational payoffs for accuracy.
Incentives at the Individual Level
This operates at every scale, including inside people's heads. When someone makes a choice that looks irrational from the outside, choosing short-term relief over long-term health, consuming polarizing media, staying in obviously dysfunctional situations, the typical response is to diagnose it as a failure of willpower or intelligence. But that's almost never what's happening. What's happening is that the person is responding rationally to their perceived incentive structure, which weights things differently than an external observer might. That weighting is often autonomic, and partially opaque even to the person themselves.
Ease matters. Certainty matters. Social validation matters. The effort required to verify information, to resist conformity, to make a decision that puts you at odds with your peer group, these are real costs, and people weigh them against real benefits. The aggregation of millions of these individual rational responses produces systemic outcomes. Misaligned systems don't just persist because powerful actors exploit them; they persist because ordinary people follow the path of least resistance within the incentive environment they perceive. What people call akrasia or irrationality is usually tier conflict under opacity. Short-horizon relief outcompetes long-horizon goals when consequences are delayed, uncertain, or socially costly.
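To make the tier-conflict point concrete, here is a minimal toy sketch (my own illustrative numbers and an assumed hyperbolic discount form, not a fitted model) of how an immediate relief option can outcompete a much larger long-horizon goal once delay, uncertainty, and social cost are weighted in:

```python
# Toy illustration, not a fitted model: why delayed, uncertain, socially costly
# payoffs lose to immediate relief. Every parameter value here is an assumption
# chosen for illustration.

def perceived_value(payoff, delay_days, probability, social_cost, k=0.1):
    """Discount a payoff hyperbolically by delay, scale it by its perceived
    probability, and subtract the immediate social cost of pursuing it."""
    return payoff * probability / (1 + k * delay_days) - social_cost

# Immediate relief: small payoff, no delay, certain, no social friction.
relief = perceived_value(payoff=10, delay_days=0, probability=1.0, social_cost=0)

# Long-horizon goal: ten times the payoff, but a year away, uncertain, and it
# puts you at odds with your peer group today.
goal = perceived_value(payoff=100, delay_days=365, probability=0.6, social_cost=5)

print(f"immediate relief:  {relief:.1f}")  # 10.0
print(f"long-horizon goal: {goal:.1f}")    # about -3.4 under these assumptions
# Under this weighting, the "irrational" choice of relief is locally rational.
```

The numbers are arbitrary; the point is structural. Any payoff that is delayed, probabilistic, and socially expensive has to clear a much higher bar than its nominal size suggests.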
The Delegation Problem
One of the most predictable patterns is that people will delegate agency to institutions when doing so reduces cognitive load, uncertainty, and personal responsibility. This isn't laziness; it's rational. Self-governance requires sustained effort, delayed rewards, and potential social friction. Institutional dependence offers psychological safety, social legitimacy, and relief from the burden of decision-making.
When systems reward compliance more than they reward active participation, people naturally choose dependence over autonomy. This creates a feedback loop: institutions expand because people reward them with trust and compliance, and people become more dependent because institutions provide structured certainty. Over time, dependency becomes the rational equilibrium, even when it produces worse outcomes at the societal level.
This isn't isolated to healthcare or welfare systems, though those are obvious examples. It shows up anywhere the incentive structure makes it more rewarding to outsource decisions than to make them yourself: in financial planning, in education, in political engagement. The mechanism is the same: when the structure makes passivity profitable and agency costly, don't be surprised when people choose passivity.
Self-Opacity and Social Distortion
The incentive distortion doesn't start at the institutional level. It starts inside individual psychology. People continuously calibrate their behavior based on expected social reward and anticipated reputational cost. This calibration is rarely a conscious strategy; it is largely automatic threat-and-reward processing. In environments where social norms are authentic and transparent, this calibration reinforces cooperation. But when perceived norms are artificially elevated, through curated social media, competitive self-presentation, or institutional signaling, individuals face escalating pressure to conceal rather than reveal.
Most personal opacity isn't malicious. It's rational defense. When there's a gap between real behavior and perceived norms, and when visibility carries asymmetric personal cost, concealment becomes the obvious strategy. People don't hide because they're immoral. They hide because the environment punishes honesty and rewards performance.
Prediction: Systems will drift toward fragmentation or centralization in proportion to opacity. Partial transparency produces oscillation between the two.
The Two-Factor Model: Incentives and Opacity
Incentives don't operate in isolation. Their behavioral force is shaped by the degree of opacity in the system, whether consequences are visible, delayed, or hidden. The two factors interact:
When incentives are aligned and consequences are transparent, cooperation and truth-seeking become rational. When incentives are misaligned and consequences are obscured, exploitation and dysfunction become rational. Aligned incentives without transparency create stagnation or gaming, because beneficial behaviors lack visible reinforcement. Misaligned incentives with transparency create rapid correction or collapse, because harmful behaviors can no longer hide.
Opacity is a force multiplier. It modulates the payoff landscape by obscuring risks and externalities. This is what distinguishes this framework from traditional economic models: behavior isn't shaped by incentive structures alone, but by how clearly those incentives and their consequences are perceived across time and social context.
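As a rough sketch of the interaction (a toy model with made-up numbers and an assumed linear form, not a claim about the true functional shape): opacity discounts the perceived consequences of a behavior, so the same incentive structure can make exploitation rational or irrational depending on how visible the downside is.

```python
# Toy sketch of the two-factor interaction. Opacity runs from 0 (fully
# transparent) to 1 (fully hidden) and discounts the consequence term;
# the benefit is always felt in full. Numbers are illustrative assumptions.

def perceived_payoff(benefit, consequence, opacity):
    """Perceived net payoff of a behavior whose consequence is only felt
    to the extent that it is visible."""
    visibility = 1.0 - opacity
    return benefit - consequence * visibility

# A behavior that benefits the actor but harms the system (a misaligned incentive).
for opacity in (0.0, 0.5, 0.9):
    net = perceived_payoff(benefit=5, consequence=8, opacity=opacity)
    choice = "do it" if net > 0 else "refrain"
    print(f"opacity={opacity:.1f}  perceived net={net:+.1f}  -> {choice}")

# opacity=0.0  perceived net=-3.0  -> refrain  (transparency: harmful behavior corrects)
# opacity=0.5  perceived net=+1.0  -> do it
# opacity=0.9  perceived net=+4.2  -> do it    (opacity: exploitation becomes rational)
```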
The Hierarchy of Incentive Dominance
Not all incentives carry equal weight. Across evolutionary psychology, anthropology, and behavioral economics, a recurring finding is that humans weight incentives according to perceived survival, identity, and stability value. Financial gain matters, but it rarely outranks kinship loyalty, social belonging, or existential meaning when these incentives conflict.
These tiers are not beliefs people explicitly endorse, they are dominant weighting regimes that activate under different perceived conditions.
The framework recognizes six tiers of incentive dominance:
Kin: Kinship and in-group survival sit at the foundation. Evolutionary selection favors protecting genetic lineage and close allies, which manifests as nepotism, familial loyalty in hiring and business, and preferential treatment of in-group members. These incentives routinely override financial or institutional concerns.
Tribe: Social belonging and reputation operate through the neurological and psychological machinery that maintains status and avoids exclusion. The need to belong isn't optional, it's wired in. This shows up as group signaling, political tribalism, public moral alignment, and avoidance of dissent even at personal cost. Reputation incentives often dominate rational economic choice when social identity is at stake.
Safety: Personal stability and risk avoidance reflect the drive to minimize exposure to loss, uncertainty, and volatility. People sacrifice upside opportunity to maintain stability. This manifests as compliance with norms, avoidance of whistleblowing, and preference for predictable environments over uncertain but potentially better ones.
Means: Resources and proxies constitute what most people think of as "incentives": money, assets, time, social credit. But here's the critical insight: these function as a conversion mechanism, not as terminal goals. People don't pursue money for its own sake; they pursue what money converts into within their dominant tier. For someone operating from kinship incentives, resources mean security and continuity. For someone driven by belonging, they're proof of worth and status signals. For someone focused on purpose or transcendence, they represent freedom to experiment or sustain coherent principles.
Because this tier serves every other tier, it appears universally motivating. This creates the illusion that financial incentives are primary, which explains why economic policy and market design focus almost exclusively on them, and why they consistently fail to produce predicted outcomes when deeper incentives are misaligned.
Meaning: Purpose and legacy activate when lower tiers are sufficiently satisfied. This is where meaning, innovation, and long-term contribution enter the picture. People operating from this tier care about visible impact, collective benefit, and leaving something behind that matters.
Principle: Transcendence and stewardship represent the rarest but most transformative tier. This is where actors redesign incentive environments themselves, founding new governance models, creating transparency infrastructures, pursuing long-term stewardship. It's rare because it requires freedom from lower-tier constraints, but when it activates, it produces systemic change.
Understanding this hierarchy matters because it explains why purely economic incentives fail when social or existential incentives are misaligned, why reputation often outweighs money, and why people make choices that look irrational until you map which tier is actually driving them.
I’m treating the tiers as a dominance ordering here for clarity, but the more precise version is a context-sensitive weighting model; I’m deliberately not formalising that in this short summary.
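For readers who prefer code to prose, this is roughly what I mean by a context-sensitive weighting model rather than a strict ordering. The tier names come from the framework; the contexts, weights, option scores, and linear combination are illustrative assumptions, not measurements:

```python
# Crude sketch of context-sensitive tier weighting. An option is scored as a
# weighted sum of how well it serves each tier; which option comes out as
# "locally rational" depends on which weighting regime the context activates.
# All numbers below are illustrative assumptions.

TIERS = ["kin", "tribe", "safety", "means", "meaning", "principle"]

# How strongly each context weights each tier (each row sums to 1).
CONTEXT_WEIGHTS = {
    "job_under_threat": {"kin": 0.25, "tribe": 0.20, "safety": 0.40,
                         "means": 0.10, "meaning": 0.04, "principle": 0.01},
    "secure_and_established": {"kin": 0.15, "tribe": 0.15, "safety": 0.05,
                               "means": 0.10, "meaning": 0.35, "principle": 0.20},
}

# How well each option serves each tier, on an arbitrary 0-1 scale.
OPTIONS = {
    "blow_the_whistle": {"kin": 0.2, "tribe": 0.1, "safety": 0.1,
                         "means": 0.2, "meaning": 0.9, "principle": 0.9},
    "stay_quiet": {"kin": 0.7, "tribe": 0.8, "safety": 0.9,
                   "means": 0.6, "meaning": 0.2, "principle": 0.1},
}

def locally_rational_choice(context):
    weights = CONTEXT_WEIGHTS[context]
    scores = {name: sum(weights[t] * served[t] for t in TIERS)
              for name, served in OPTIONS.items()}
    return max(scores, key=scores.get), scores

for context in CONTEXT_WEIGHTS:
    choice, scores = locally_rational_choice(context)
    print(context, "->", choice, {k: round(v, 3) for k, v in scores.items()})

# Under threat, staying quiet dominates; once the lower tiers are secure, the
# same scoring makes whistleblowing the locally rational choice.
```

The only point of the sketch is that “which tier dominates” is a function of context rather than a fixed ranking, which is why the same person can look loyal in one setting and principled in another.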
Cultural Modulation
The tier structure itself appears universal, but the relative weighting and activation thresholds vary by culture. Cross-cultural research shows predictable differences: collectivist cultures consistently weight belonging over economic gain, high power-distance cultures maintain stability as dominant even when reform pressures build, and individualist or post-materialist cultures activate purpose and transcendence earlier because baseline survival needs are more secure.
This doesn't undermine the framework, it enhances it. Cultural variance is an input into incentive weighting, not an exception to incentive causality. The order of tiers remains constant; what shifts is the threshold at which a higher tier overtakes a lower one.
The Myth of Irrationality
Traditional models rely on "irrationality" to explain behavior that deviates from objective logic or financial utility. (By “irrationality” I mean an unexplained primitive. Errors, noise, limited cognition, and internal conflict are compatible with incentive primacy.) The framework rejects this entirely. What looks irrational is actually a failure of the observer to model the actor's subjective incentive structure accurately.
Most actions are locally rational relative to the weighted incentive environment at the moment of decision, even when they turn out to be globally self-defeating, because time horizons differ. This doesn't mean all decisions are optimal in hindsight or that self-harm and regret don't exist. It means most actions maximize the actor's highest-weighted incentive at that time, whether that's financial gain, reputational protection, avoidance of shame, preservation of identity, cognitive ease, ideological coherence, or survival.
When perception is altered, by neurological conditions, trauma, information asymmetry, or environmental toxicity, behavior remains rational to that altered perception. Someone experiencing obsessive-compulsive patterns responds rationally to an environment that feels threatening even when external observers see no threat. Someone with substance dependency responds rationally to reward pathways that have been recalibrated to overweight immediate relief.
The framework doesn't explain why perceptual differences arise. It claims that behavior follows from whatever incentive landscape is perceived, and that “irrationality” is often just the observer failing to model incentives, constraints, or perception. The rationality is in the response to perception, not in the accuracy of perception itself.
This has immediate implications: any system treating behavior as irrational will misdiagnose root causes and design failed interventions. Systems that model the full incentive structure, including non-monetary drivers like identity, status, belonging, and perceived safety, produce predictable behavioral outcomes.
Morality is often post-hoc rationalisation. Not in the sense that ethics are fake, but in the sense that humans frequently use moral language to narrate and justify behaviors that were selected by incentives first. Moral stories help coordinate groups and preserve identity, but they are a weak tool for predicting behavior under pressure. Incentives predict. Moral narratives explain after.
Human behavior doesn't need to be morally scolded into shape. It needs incentive landscapes where the desired behavior is the locally rewarded equilibrium. The failure of existing systems isn't human irrationality; it's structural misalignment between what individuals are rewarded for and what systems claim to want.
If this framework holds, the implications are significant. “Teaching rationality” fails consistently not because people are stubborn or stupid, but because you can’t teach people to override their incentive structures. You can only change the terrain so truth-seeking becomes the locally rational strategy.
I’m most interested in:
Where does this model make predictions that differ from standard rationality frameworks?
What evidence would falsify the core claim that behavior is rational within its perceived incentive structure?
Are there domains where incentive primacy clearly fails to predict outcomes?
Note on AI assistance: I developed this framework over several months of independent work. For this writeup, I used Claude (Anthropic) to help convert dense academic prose into clearer language, stress-test arguments for weak points, and identify which objections to pre-empt. The core ideas, tier structure, falsification criteria, and predictions are my own. Claude's contribution was editorial: "what if you said it this way?" and "that section is unclear." I'm disclosing this because LessWrong's policy requires transparency about AI use, and because I'm arguing that transparency restores accountability; it would be hypocritical not to model it here. As a systems thinker, I find the trade-off for whatever ability I have there is a corresponding weakness at administration and documentation; AI has been like a personal PA, doing the grunt work. I hope (and trust) that I won't be penalised for that.