Rationalists lose when others choose

At various times, we've argued over whether rationalists always win.  I posed Augustine's paradox of optimal repentance to argue that, in some situations, rationalists lose.  One criticism of that paradox is that its strongest forms posit a God who penalizes people for being rational.  My response was, So what?  Who ever said that nature, or people, don't penalize rationality?

There are instances where nature penalizes the rational.  For instance, revenge is irrational, but being thought of as someone who would take revenge gives advantages.1

EDIT:  Many many people immediately jumped on this, because revenge is rational in repeated interactions.  Sure.  Note the "There are instances" at the start of the sentence.  If you admit that someone, somewhere, once faced a one-shot revenge problem, then cede the point and move on.  It's just an example anyway.

Here's another instance that more closely resembles the God who punishes rationalism, in which people deliberately punish rational behavior:

If rationality means optimizing expected utility, then both social pressures and evolutionary pressures tend, on average, to bias us towards altruism.  (I'm going to assume you know this literature rather than explain it here.)  An employer and a lover would both rather have someone who is irrationally altruistic.  This means that, on this particular (and important) dimension of preference, rationality correlates with undesirability.2

<ADDED>: I originally wrote "optimizing expected selfish utility", merely to emphasize that an agent, rational or not, tries to maximize its own utility function.  I do not mean that a rational agent appears selfish by social standards.  A utility-maximizing agent is selfish by definition, because its utility function is its own.  Any altruistic behavior that results, happens only out of self-interest.  You may argue that pragmatics argue against this use of the word "selfish" because it thus adds no meaning.  Fine.  I have removed the word "selfish".

However, it really doesn't matter.  Sure, it is possible to make a rational agent that acts in ways that seem unselfish. Irrelevant.  Why would the big boss settle for "unselfish" when he can get "self-sacrificing"?  It is often possible to find an irrational agent that acts more in your interests, than any rational agent will.  The rational agent aims for equitable utility deals.  The irrational agent can be inequitable in your favor.

This whole barrage of attacks on using the word 'selfish' is yet again missing the point.  If you read the entire post, you'll see that it doesn't matter if you think that rational agents are selfish, or that they can reciprocate.  You just have to admit that most persons A would rather deal with an agent B having an altruistic bias, or a bias towards A's utilities, than an agent having no such bias.  The level of selfishness/altruism of the posited rational agent is irrelevant, because adding a bias towards person A's utility is always better for person A.  Comparing "rational unbiased person" to "altruistic idiot" is not the relevant comparison here.  Compare instead "person using decision function F with no bias" vs. "person using decision function F with excess altruism".3
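A toy version of that comparison can be sketched in a few lines of Python.  Everything here is an illustrative assumption, not something from the literature: the option names, the made-up payoffs, and the choice to model "bias towards A" as an extra weight w that agent B puts on person A's utility inside the same decision function F.

```python
# Hypothetical options open to agent B, as (utility to B, utility to person A).
# The numbers are invented purely for illustration.
OPTIONS = {
    "shirk": (5.0, 0.0),
    "fair_deal": (3.0, 3.0),
    "overtime": (0.0, 5.0),
}


def decide(w):
    """Decision function F: pick the option maximizing B's utility + w * A's utility."""
    return max(OPTIONS, key=lambda name: OPTIONS[name][0] + w * OPTIONS[name][1])


def payoff_to_A(w):
    """What person A ends up with when B decides using weight w."""
    return OPTIONS[decide(w)][1]


# Same decision function, three different amounts of bias towards A.
for w in (0.0, 1.0, 2.0):
    print(w, decide(w), payoff_to_A(w))
```

In this toy setup, raising w never makes person A worse off, whatever baseline you start from.  That is the only sense of "better for person A" the sketch is meant to capture.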

(Also note that, in the fMRI example, people don't get to see your utility function.  They can't tell that you have a wonderful  Yudkowskian utility function that will make you reliable.  They can only see that you don't have the bias most people do that would make most people a better employee.)

The real tricky point of this argument is whether you can define "irrational altruism" in a way that doesn't simply mean "utility function that values altruism".  You could rephrase "Choice by others encourages bias toward altruism" as "Choice by others selects for utility functions that value altruism highly".

Does an ant have an irrationally high bias towards altruism?  It may make more sense to say that an ant is less of an individual, and more of a subroutine, than a human is.  So it is perfectly all right with me if you prefer to say that these forces select for valuing altruism, rather than saying that they select for bias.  The outcome is the same either way:  When one agent gets to choose what other agents succeed, and that agent can observe their biases and/or decision functions, those other agents are under selection pressure to become less like individuals and more like subroutines of the choosing agent.  You can call this "altruistic bias" or you can call it "less individuality".
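The selection pressure described above can be illustrated with a deliberately crude sketch.  The assumptions are mine and are only for illustration: each agent is reduced to a single number (the weight it puts on the chooser's utility), the chooser observes that number perfectly, keeps the top half, and the survivors are copied without mutation.

```python
def select(population, keep):
    """The chooser keeps the `keep` agents with the highest observed weight."""
    return sorted(population, reverse=True)[:keep]


def next_generation(survivors, size):
    """Refill the population by copying survivors in round-robin order (no mutation)."""
    return [survivors[i % len(survivors)] for i in range(size)]


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical starting weights on the chooser's utility.
population = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25]
history = [mean(population)]

for _ in range(3):
    survivors = select(population, keep=3)
    population = next_generation(survivors, size=6)
    history.append(mean(population))

print(history)
```

With these rules the average weight on the chooser's utility can only rise or stay level from one round to the next, and the population ends up uniform at the highest weight that was present at the start.  Whether you read that number as "bias" or as "a utility function that values the chooser" makes no difference to the dynamics.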


There are a lot of other situations where one person chooses another person, and they would rather choose someone who is biased, in ways encouraged by society or by genetics, than someone more rational.  When giving a security clearance, for example, you would rather give it to someone who loved his country emotionally, than to someone who loved his country rationally; the former is more reliable, while the rational person may suddenly reach an opposite conclusion on learning one new fact.

It's hard to tell how altruistic someone is.  But the May 29, 2009 issue of Science has an article called "The Computation of Social Behavior".  It's extremely skimpy on details, especially for a 5-page article; but the gist of it is that they can use functional magnetic resonance imaging to monitor someone making decisions, and extract some of that person's basic decision-making parameters.  For example (they mention this, although it isn't clear whether they can extract this particular parameter), their degree of altruism (the value they place on someone else's utility vs. their own utility).  Unlike a written exam, the fMRI exam can't be faked; your brain will reveal your true parameters even if you try to lie and game the exam.

So, in the future, being rational may make you unemployable and unlovable, because you'll be unable to hide your rationality.

Or maybe it already does?


Here is the big picture:  The trend in the future is likely to be one of greater and greater transparency of every agent's internal operations, whether this is via fMRI or via exchanging source code.  Rationality means acting to achieve your goals.  There will almost always be other people who are more powerful than you and who have resources that you need, and they don't want you to achieve your goals.  They want you to achieve their goals.  They will have the power and the motive to select against rationality (or to avoid building it in in the first place).

All our experience is with economic and behavioral models that assume independent self-interested agents.  In a world where powerful people can examine the utility functions of less-powerful people, and reward them for rewriting their utility functions (or just select ones with utility functions that are favorable to the powerful people, and hence irrational), having rational, self-interested agents is not the equilibrium outcome.

In a world in which agents like you or me are manufactured to meet the needs of more powerful agents, even more so.

You may claim that an agent can be 'rational' while trying to attain the goals of another agent.  I would instead say that it isn't an agent anymore; it's just a subroutine.

The forces I am discussing in this post try to turn agents into subroutines.  And they are getting stronger.


1 Newcomb's paradox is, strangely, more familiar to LW readers.  I suggest replacing discussions of one-boxing by discussions of taking revenge; I think the paradoxes are very similar, but the former is more confusing and further-removed from reality.  Its main advantage is that it prevents people from being distracted by discussing ways of fooling people about your intentions - which is not the solution evolution chose to that problem.

2 I'm making basically the same argument that Christians make when they say that atheists can't be trusted.  Empirical rejection of that argument does not apply to mine, for two reasons:

  1. Religions operate on pure rewards-based incentives, and hence destroy the altruistic instinct; therefore, I intuit that religious people have a disadvantage rather than an advantage compared to altruists WRT altruism.
  2. Religious people can sometimes be trusted more than atheists; the problem is that some of the things they can be trusted to do are crazy.

3 This is something LW readers do all the time:  Start reading a post, then stop in the middle and write a critical response addressing one perceived error whose truth or falsity is actually irrelevant to the logic of the post.

55 comments

It's truly amazing just how much of the posts and discussions on LW you repeatedly ignore, Phil. There is a plurality opinion here that it can be rational to execute a strategy which includes actions that don't maximize utility when considered as one-shot actions, but such that the overall strategy does better.

I can genuinely understand disagreement on this proposal, but could you at least acknowledge that the rest of us exist and say things like "first-order rationality finds revenge irrational" or "altruistic sacrifices that violate causal decision theory" instead?

"first-order rationality finds revenge irrational"

I'm not sure what you mean by "first order rationality". But whatever the definition, it seems that it's not first order rationality itself that finds revenge irrational, but your own judgment of value, that depends on preferences. An agent may well like hurting people who previously hurt it (people who have a property of having previously hurt it).

Huh— a Google search returns muddled results. I had understood first-order (instrumental) rationality to mean something like causal decision theory: that given a utility function, you extrapolate out the probable consequences of your immediate options and maximize the expected utility. The problem with this is that it doesn't take into account the problems with being modeled by others, and thus leaves you open to being exploited (Newcomblike problems, Chicken) or losing out in other ways (known-duration Prisoner's Dilemma).

I was also taking for granted what I assumed to be the setup with the revenge scenario: that the act of revenge would be a significant net loss to you (by your utility function) as well as to your target. (E.g. you're the President, and the Russians just nuked New York but promised to stop there if you don't retaliate; do you launch your nukes at Russia?)

Phil's right that a known irrational disposition towards revenge (which evolved in us for this reason) could have deterred the Russians from nuking NYC in the first place, whereas they knew they could get away with it if they knew you're a causal decision theorist. But the form of decision process I'm considering (optimizing over strategies, not actions, while taking into account others' likely decision algorithms given a known strategy for me) also knowably avenges New York, and thus deters the Russians.
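A minimal sketch of the difference between the two decision processes, under loudly stated assumptions: the payoffs are invented, the attacker is assumed to predict the defender's response perfectly, and `action_level_response` and `strategy_level_response` are hypothetical names standing in roughly for "optimizing over actions" and "optimizing over strategies".

```python
# Made-up payoffs as (defender, attacker). Retaliating costs the defender extra.
PAYOFF = {
    ("no_attack", None): (0, 0),
    ("attack", "hold"): (-10, 5),
    ("attack", "retaliate"): (-15, -20),
}
RESPONSES = ["hold", "retaliate"]


def action_level_response():
    """Once attacked, pick the response with the best immediate payoff for the defender."""
    return max(RESPONSES, key=lambda r: PAYOFF[("attack", r)][0])


def attacker_move(response):
    """The attacker predicts the response perfectly and attacks only if that pays."""
    gain = PAYOFF[("attack", response)][1]
    return "attack" if gain > PAYOFF[("no_attack", None)][1] else "no_attack"


def outcome(response):
    """Payoffs that result when the defender is known to follow `response`."""
    move = attacker_move(response)
    return PAYOFF[(move, response if move == "attack" else None)]


def strategy_level_response():
    """Pick the response policy whose overall outcome, given the attacker's prediction, is best."""
    return max(RESPONSES, key=lambda r: outcome(r)[0])


print(action_level_response(), outcome(action_level_response()))
print(strategy_level_response(), outcome(strategy_level_response()))
```

In this toy game the action-level agent would hold, is therefore attacked, and ends at -10.  The strategy-level agent is known to retaliate, is therefore never attacked, and ends at 0 without ever paying the cost of retaliation.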

EDIT: First paragraph was a reply to Vladimir's un-edited comment, in which he also asked what definition of first-order rationality I meant.

Okay. First-order rationality finds revenge irrational. I'm not ignoring it. It is simply irrelevant to the point I was making. A person who does your will because it makes them happy to do so, or because they are irrationally biased to do so, is more reliable than one who does your will as long as his calculus tells him to.

A person who does your will because it makes them happy to do so, or because they are irrationally biased to do so, is more reliable than one who does your will as long as his calculus tells him to.

Not if the latter explicitly exhibits the form of that calculus; then you can extrapolate their future decisions yourself, more easily than you can extrapolate the decisions of the former. Higher-order rationality includes finding a decision algorithm which can't be exploited if known in this manner.

Of course, actually calculating and reliably acting accordingly is a high standard for unmodified humans, and it's a meaningful question whether incremental progress toward that ideal will lead to a more reliable or less reliable agent. But that's an empirical question, not a logical one.

Not if the latter explicitly exhibits the form of that calculus; then you can extrapolate their future decisions yourself, more easily than you can extrapolate the decisions of the former.

More easily? It's more easy to predict decisions based on a calculus, than decisions based on stimulus-response? That's simply false.

Note that in the fMRI example, it is impossible to examine the calculus. You can only examine the level of bias. There is no way for somebody to say, "Oh, he's unbiased, but he has an elaborate Yudkowskian utility function that will lead him to act in ways favorable to me."

If rationality means optimizing expected selfish utility

...but it doesn't (except in the trivial sense that says any action I take to achieve my values is thus "selfish").

And it may be perfectly rational (of high instrumental value) to be significantly altruistic (in your behavior), even if you place no terminal value whatsoever on helping other people, if it's what it takes to live comfortably in society, and you value your own comfort...

Yes, thank you. I think Eliezer, Nick, and the others complaining about this are confusing "acting selfishly" with "acting in a way that society judges as selfish".

If rationality means optimizing expected selfish utility

This is a convenient word swap. Simplifying slightly, and playing a little taboo, we get:

"If you have a strictly selfish utility function, and you have a system of thinking that is especially good at satisfying this function, people will never trust you where your interests may coincide."

Well, yes. Duh.

But if people actually liked your utility function, they'd want you to be more, not less, rational. That is, if both my lover and I value each other's utility about as much as our own, we both want each other to be rational, because we'd be maximizing a very similar utility function. If, as your example requires, my coefficient for my lover's utility is zero, they'd want me to be irrational precisely because they want my behaviour to maximize a term that has no weight in my utility function (unless of course their utility function also has a zero coefficient for their utility, which would be unusual).

Rationality, as generally used on this site, refers to a method of understanding the world rather than a specific utility function. Because it has been redefined here, this seems neither insightful nor a serious problem for rationality.

That was pretty close to what "instrumental rationality" means. Utility functions are not /necessarily/ selfish - but the ones biology usually makes are.

Yes, it's trivial. That doesn't make it untrue. "Selfish" = trying to achieve your values, rather than a blend of your values and other people's values.

'selfish', as it's used in ethics and ordinary speech, is a vice involving too much concern for oneself with respect to others. If virtue theory is correct, acting selfishly is bad for oneself.

There are instances where nature penalizes the rational. For instance, revenge is irrational, but being thought of as someone who would take revenge gives advantages.

I would generally avoid calling a behavior irrational without providing specific context. Revenge is no more irrational than a peacock's tail. They are both costly signals that can result in a significant boost to your reputation in the right social context...if you are good enough to pull them off.

Well, always revenge is more rational than always forgive, anyway. I would expect most people here to know about Axelrod's tit-for-tat, so maybe Phil means something else by revenge than the obvious.

Not all revenge takes place in prisoner dilemmas. I think somebody, preferably somebody more informed than me, should write LW posts on the dynamics of repeated Chicken (there was some literature out there on this last time I looked).

There are instances where nature penalizes the rational. For instance, revenge is irrational, but being thought of as someone who would take revenge gives advantages.

My decision theory which this margin is too small to contain, would, in fact, take revenge, as well as one-boxing on Newcomb's Problem, keeping its promise to Parfit's Hitchhiker, etcetera, so long as it believed the other could correctly simulate it, or attached high probability to being correctly simulated. (Nor would it be particularly difficult to simulate! The decision is straightforward enough.)

If rationality means optimizing expected selfish utility

And having gotten that far, I gave up on the article.

My decision theory which this margin is too small to contain,

I am probably not the only individual who remains curious as to when you might stumble upon a sufficiently spacious margin for this purpose.

If you want to be thought of as someone who would take revenge, then it's rational to do what you can to obtain this status, which may or may not include actually taking revenge (you could boast about taking revenge on someone that no one you lied to is likely to meet, for example).

As for being subjected to a fMRI exam, I don't see how it's relevant. If nothing you can possibly do can have any effect on the result of the exam, then rationality (or irrationality) doesn't enter into it. Rationality is about decision-making and the beliefs that inform it; if the desired future is impossible to reach at the moment you make your decision, you haven't 'lost', because you were never in the game to begin with.

So, in the future, being rational may make you unemployable and unlovable, because you'll be unable to hide your rationality.

This seems either irrelevant or contradictory. If we're incapable of altering our behavior to account for rationality-punishers, then the issue is moot, it's just plain discrimination against a minority like any other. If we are capable, and we don't account for them, then we're not being rational.

If altering your behavior to account for rationality-punishers requires training yourself to be irrational, the issue is not moot.

I still think what you're saying is contradictory. We're using "rationality" to mean "maximizing expected utility", correct? If we are aware that certain classes of attempts to do so will be punished, then we're aware that they will not in fact maximize our expected utility, so by definition such attempts aren't rational.

It seems like you're picking and choosing which counterfactuals "count" and which ones don't. How does punishment differ from any other constraint? If I inhabited a universe in which I had an infinite amount of time and space with which to compute my decisions, I'd implement AIXI and call it good. The universe I actually inhabit requires me to sacrifice that particular form of optimality, but that doesn't mean it's irrational to make theoretically sub-optimal decisions.

This is Bayesians vs. Barbarians all over again. If it's better for you to be seen as someone who precommits to certain behaviors, be that kind of person, even if the local choices made in accordance with the precommitment look disadvantageous. By failing to follow a commitment on one occasion, you may demolish the whole cause for which precommitment was made, and so if that cause is dear to you, don't be "clever", just stick to the plan.

The Bayesians vs. Barbarians scenario is more complicated, because one can argue ways that a rational society could fend off the barbarians.

In this scenario, however, someone looks into your brain and sees how biased you are, and deliberately rejects you if you're too rational. There's no arguing around it.

But, yes, maybe this is too similar to stuff we've already gone over.

Newcomb's problem and taking revenge seem like specific instances of the more general problem of making credible commitments. Keeping promises is also a candidate for an example taken from this class of problems.

The idea in general is interesting, but this argument itself is (still) rather incoherent. "Irrational" in this context does not seem to be different from "having an unusually high coefficient for the utility of the deciding entity," and if it does, I'm really curious as to what that is. If you give actual examples of what a real irrational person would be like, it would make this argument much more coherent. Basically, you seem to be baking part of a utility function into rational and irrational, which seems wholly inappropriate.

If your idea of rational v. irrational is that if Bob has to decide between hiring "Joe" and "Joe who really, really values Bob's utility a whole lot but is in no other respect different," then it seems like you don't have much of a point. Employers/lovers/decision makers will not be facing this dichotomy, and so it is of no real concern.

Also, employers care not about how much you value their goals, but how well you accomplish them, and rationality seems to be a relevant positive in this respect.

Oh, and I'm pretty sure I can want to help my spouse accomplish her goals without being a subroutine of my spouse. The whole subroutine argument seems convoluted and, well, unrelated to rationality.

Although the consensus seems to be that this post by PhilGoetz is an unhelpful, uninformed one, I believe I got something out of it:

1) I had never before even realized the similarity between Newcomb's problem and revenge. Sorry. bows head

2) It suggests to me a better way to phrase the problem:

a) Replace Omega with "someone who's really good at reading people" and give example of how she (makes more sense as a she) caught people in lies based on subtle facial expressions, etc.

b) Restate the question as "Are you the sort of person who would one-box?" Or "Do/should you make it a habit of one-boxing in cases like this?" rather than "Would you one-box?" This subtle difference is important.

If the above is all obvious, it's because I've done a poor job following the Newcomb threads, as many here seem to think Phil did, since they didn't interest me.

For instance, revenge is irrational,

Says whom? It seems to me that revenge can easily be rationally justifiable, even if what motivates people to actually do it is usually non-rational emotional states.

It's rational for birds to build nests, but they don't do so because they possess a rational understanding of why. They don't use rationality. They don't have it. But the rational justification for their actions still exists.

I think the idea is that revenge requires time, effort, and resources, while also breeding further malcontent between you and the person you take revenge against, leaving you with a larger field of people who would not wish to help you, or who would work against you.

Alternately, if you were to try and make the same person like you better (though that's not always possible), it would confer more advantages to you generally.

It would, of course, depend on the situation. Perhaps not the word "revenge", but "retribution" can indeed be a calculated and well thought out effective response. Note "response", not "reaction". Revenge is a heuristic reaction, not a thought-out response.

Even if Phil's specific examples don't work, the general point does. There exists a situation in which rationality must lose:

An agent, because it is irrational or has strange motivations or for another reason, chooses to reward those agents that are irrational and punish those that are rational. It is smart enough to tell the difference.