I've seen people on LessWrong taking cognitive structures that I consider to be biases as terminal values. Take risk aversion for example:

Risk Aversion

For a rational agent with goals that don't include "being averse to risk", risk aversion is a bias. The correct decision theory acts on expected utility, with utility of outcomes and probability of outcomes factored apart and calculated separately. Risk aversion does not factor them.

EDIT: There is some contention on this. Just substitute "that thing minimax algorithms do" for "risk aversion" in my writing. /EDIT

A while ago, I was working through the derivation of A* and minimax planning algorithms from a Bayesian and decision-theoretic base. When I was trying to understand the relationship between them, I realized that strong risk aversion, aka minimax, saves huge amounts of computation compared to the correct decision theory, and actually gets closer to optimal as the environment becomes more influenced by rational opponents. The best way to win is to deny the opponents any opportunity to weaken you. That's why minimax is a good algorithm for chess.
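The contrast can be sketched in a few lines of Python. This is a toy game tree of my own invention, not anything from the original derivation: expectimax (the "correct" decision theory) needs a probability distribution over the opponent's replies, while minimax only needs the worst case.

```python
# Toy two-ply game tree; leaves are utilities for the agent.
tree = {
    "left":  [3, 3, 3],     # safe branch: every opponent reply gives 3
    "right": [10, 10, -5],  # risky branch: big payoff unless the opponent picks -5
}

def expectimax_value(leaves, probs):
    """Expected utility, assuming known probabilities over the opponent's moves."""
    return sum(p * u for p, u in zip(probs, leaves))

def minimax_value(leaves):
    """Assume a fully adversarial opponent: take the worst case."""
    return min(leaves)

# Against a random opponent, "right" looks better in expectation...
uniform = [1 / 3, 1 / 3, 1 / 3]
best_by_expectation = max(tree, key=lambda m: expectimax_value(tree[m], uniform))

# ...but minimax prefers "left", needs no probability estimates at all,
# and is exactly right against a rational adversary.
best_by_minimax = max(tree, key=lambda m: minimax_value(tree[m]))

print(best_by_expectation)  # right
print(best_by_minimax)      # left
```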

Current theories about the origin of our intelligence say that we became smart to outsmart our opponents in complex social games. If our intelligence was built for adversarial games, I am not surprised at risk aversion.

A better theoretical replacement, and a plausible causal history for why we have the bias instead of the correct algorithm, are convincing to me as an argument against risk aversion as a value, the way a rectangular 13x7 pebble heap is convincing to a pebble sorter as an argument against the correctness of a heap of 91 pebbles: it seems undeniable, but I don't have access to the hidden values that would say for sure.

And yet I've seen people on LW state that their "utility function" includes risk aversion. Because I don't understand the values involved, all I can do is state the argument above and see if other people are as convinced as I am.

It may seem silly to take a bias as terminal, but there are examples with similar arguments that are less clear-cut, and some that we take as uncontroversially terminal:

Responsibility and Identity

The feeling that you are responsible for some things and not others, like say, the safety of your family, but not people being tortured in Syria, seems noble and practical. But I take it to be a bias.

I'm no evolutionary psychologist, but it seems to me that feelings of responsibility are a quick hack to kick you into motion where you can affect the outcome and the utility at stake is large. For the most part, this aligns well with utilitarianism; you usually don't feel responsible for things you can't really affect, like people being tortured in Syria, or the color of the sky. You do feel responsible to pull a passed out kid off the train tracks, but maybe you don't feel responsible to give them some fashion advice.

Responsibility seems to be built on identity, so it starts to go weird when you identify or don't identify in ways that didn't happen in the ancestral environment. Maybe you identify as a citizen of the USA, but not of Syria, so you feel shame and responsibility about the US torturing people, but the people being tortured in Syria are not your responsibility, even though both cases are terrible, and there is very little you can do about either. A proper utilitarian would feel approximately the same desire to do something about each, but our responsibility hack emphasizes responsibility for the actions of the tribe you identify with.

You might feel great responsibility to defend your past actions but not those of other people, even tho neither is worth "defending". A rational agent would learn from both the actions of their own past selves and those of other people without seeking to justify or condemn; they would update and move on. There is no tribal council that will exile you if you change your tune or don't defend yourself.

You might be appalled that someone wishes to stop feeling responsibility for their past selves: "but if they don't feel responsibility for their actions, what will prevent them from murdering people, or encourage them to do good?" A rational utilitarian would do good and not do evil because they wish good and non-evil to be done, instead of because of feelings of responsibility that they don't understand.

This argument is a little harder to see and possibly a little less convincing, but again I am convinced that identity and responsibility are inferior to utilitarianism, tho they may have seemed almost terminal.


Justice

Surely justice is a terminal value; it feels so noble to desire it. Again I consider the desire for justice to be a biased heuristic.

In game theory, one of the most successful strategies for the iterated prisoner's dilemma is tit-for-tat: cooperate and be nice, but punish defectors. Tit-for-tat looks a lot like our instincts for justice, and I've heard that the prisoner's dilemma is a simplified analog of many of the situations that came up in the ancestral environment, so I am not surprised that we have an instinct for it.
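The tit-for-tat dynamic can be sketched in a few lines (standard prisoner's dilemma payoffs; the helper names are mine, not from any particular source):

```python
# (my move, their move) -> my payoff; C = cooperate, D = defect
PAYOFFS = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]

def play(opponent_moves):
    """Score tit-for-tat against a fixed sequence of opponent moves."""
    score, their_history = 0, []
    for their_move in opponent_moves:
        my_move = tit_for_tat(their_history)
        score += PAYOFFS[(my_move, their_move)]
        their_history.append(their_move)
    return score

print(play(["C", "C", "C", "C"]))  # 12: mutual cooperation throughout
print(play(["C", "D", "C", "C"]))  # 11: punishes the defection once, then forgives
```

The "nice but retaliatory" shape of the strategy is the point of the analogy to justice instincts: it never defects first, but a defection is answered immediately.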

It's nice that we have a hardware implementation of tit-for-tat, but to the extent that we take it as terminal instead of instrumental-in-some-cases, it will make mistakes. It will work well when individuals might choose to defect from the group for greater personal gain, but what if we discover, for example, that some murders are not calculated defections, but failures of self-control caused by a bad upbringing and lack of education? What if we then further discover that there is a two-month training course that has a high success rate of turning murderers into productive members of society? When Dan the Deadbeat kills his girlfriend, and the psychologists tell us he is a candidate for the rehab program, we can demand justice like we feel we ought to, at a cost of hundreds of thousands of dollars and a good chunk of Dan's life, or we can run Dan thru the two-month training course for a few thousand dollars, transforming him into a good, normal person. People who take punishment of criminals as a terminal value will choose prison for Dan, but people with other interests would say rehab.

One problem with this story is that the two-month murder rehab seems wildly impossible, but so do all of Omega's tricks. I think it's good to stress our theories at the limits; they seem to come out stronger, even for normal cases.

I was feeling skeptical about some people's approach to justice theory when I came up with this one, so I was open to changing my understanding of justice. I am now convinced that justice and punishment instincts are instrumental, and only approximations of the correct game theory and utilitarianism. The problem is, while I was convinced, someone who takes justice as terminal, and is not open to the idea that it might be wrong, is absolutely not convinced. They will say "I don't care if it is more expensive, or that you have come up with something that 'works better'; it is our responsibility to the criminal to punish them for their misdeeds." Part of the reason for this post is that I don't know what to say to this. All I can do is state the argument that convinced me, ask if they have something to protect, and feel like I'm arguing with a rock.

Before anyone who is still with me gets enthusiastic about the idea that knowing a causal history and an instrumentally better way is enough to turn a value into a bias, consider the following:

Love, Friendship, and Flowers

See The Gift We Give To Tomorrow. That post contains plausible histories for why we ended up with nice things like love, friendship, and beauty, and hints that could lead you to 'better' replacements made out of game theory and decision theory.

Unlike the other examples, where I felt a great "Aha!" and decided to use the superior replacements when appropriate, this time I feel scared. I thought I had it all locked out, but I've found some existential angst lurking in the basement.

Love and such seem like something to protect, like I don't care if there are better solutions to the problem they were built to solve; I don't care if game theory and decision theory lead to more optimal replication. If I'm worried that love will go away, then there's no reason I ought to let it. But these are the same arguments the people who take justice as terminal make. What is the difference that makes it right this time?

Worrying and Conclusion

One answer to this riddle is that everyone is right with respect to themselves, and there's nothing we can do about disagreements. There's nothing someone who has one interpretation can say to another to justify their values against some objective standard. By the full power of my current understanding, I'm right, but so is someone who disagrees.

On the other hand, maybe we can do some big million-variable optimization on the contradictory values and heuristics that make up ourselves and come to a reflectively coherent understanding of which are values and which are biases. Maybe none of them have to be biases; it makes sense and seems acceptable that sometimes we will have to go against one of our values for greater gain in another. Maybe I'm asking the wrong question.

I'm confused, what does LW think?


I was confused about this for a while; is it just something that we have to (Gasp!) agree to disagree about? Do we have to do a big analysis to decide once and for all which are "biases" and which are "values"? My favored solution is to dissolve the distinction between biases and values:

All our neat little mechanisms and heuristics make up our values, but they come on a continuum of importance, and some of them sabotage the rest more than others.

For example, all those nice things like love and beauty seem very important, and usually don't conflict, so they are closer to values.

Things like risk aversion and hindsight bias and such aren't terribly important, but because they prescribe otherwise-stupid behavior in the decision-theory/epistemological realm, they sabotage the achievement of other biases/values, and are therefore a net negative.

This can work for the high-value things like love and beauty and freedom as well: Say you are designing a machine that will achieve many of your values, being biased towards making it beautiful over functional could sabotage achievement of other values. Being biased against having powerful agents interfering with freedom can prevent you from accepting law or safety.

So debiasing is knowing how and when to override less important "values" for the sake of more important ones, like overriding your aversion to cold calculation to maximize lives saved in a shut up and multiply situation.

Comments (125)

All preferences have a causal history, and given that those causes tend not to care about efficiency (e.g. evolution, but also society/culture and probably others), I suspect most human "terminal" preferences are like risk-aversion: they seem suited for accomplishing some goal, but there are more efficient or accurate ways of doing so.

So should we self-modify to instead value those more efficient or accurate approaches? In the case of risk-aversion I seem to think the answer is yes, but in the case of love I seem to think that the answer is no. I am not sure why my brain is making this distinction or whether it might be legitimate.

Yup, I'm confused too.

And you didn't even pick the hardest part. One can argue that if we knew more, thought faster, were more the people we wished we were we would be free(er) of cognitive biases like risk aversion (writing that into your utility function seems simply incorrect); and anyone who isn't in favour of wireheading is likely to want to preserve some form of love. Rather, think of justice. I'm inclined to argue that justice-for-the-sake-of-justice is a lost purpose, but I'd expect vigorous disagreement even after "extrapolating"/"improving" people a fair bit.

Why is risk aversion a bias, but love is not? We know that risk aversion is strictly dominated for rational agents, but I think it likely that love is strictly dominated by some clever game-theoretic approach to mating. Why oppose wireheading, for that matter? Like eliminating risk aversion, it's a more efficient way for us to get what we want.

I am still confused.

Have you seen the new conclusion to the OP? Risk aversion has value to us, but it is a 'bias' because it sabotages the achievement of other values. Love has much value to us, but it does not systematically sabotage our other values, so it is a 'value'. The labels 'bias' and 'value' are fuzzy and quantitative.
Love does sabotage my other values. I've made career sacrifices for it (which I don't regret). Given complexity of value, most valuable things require tradeoffs. The difference between a bias and a value may be quantitative, but unless I know how to calculate it that doesn't help very much. Let's try to avoid sticking a "SOLVED" label on this problem before we've properly understood it.
Right. You do have to sacrifice some resources (time, mental energy, etc.) that could be used for other things. All values do that, but some of them sabotage you more than just a reasonable resource cost.

Sure it does. It helps us not be stupid about belief to know that belief is quantitative; likewise with many other things. It gives us the right mental model for thinking about it, even if we can't do actual calculations. That's why everyone is always talking about utility functions, even tho no one actually has access to one. Knowing that bias vs value is a spectrum helps us not get too concerned about placing them in hard categories.

This is a good point. "Solved" is a very serious label.
I'm having to guess about the meaning of the second sentence but if I guessed right then I agree that the mode of decision making used by many people when 'love' comes into it drastically differs from a mode vaguely representing utility maximising - and often not in a healthy way!
Sorry, I had a wrong word in that sentence. Love often comes packed with some shitty thinking, but it doesn't seem to lose its value if we think rationally about it. I wasn't referring to love as a value that has more than a straightforward resource cost, I was referring to stuff like risk aversion, hindsight bias, anger and such that damage your ability to allocate resources, as opposed to just costing resources.
That's how I took it and I agree.
I take it you're aware of Eliezer's ideas about the complexity of (human) value? I'd summarize his ideas as "neither I nor an extrapolated Eliezer want to be wireheaded" (which is an observation, not an argument). I'd also point you at Yvain's Are wireheads happy? (also consider his wanting/liking/approving classification). These posts are a very strong argument against classical "make them want it" wireheading (which wireheads want, but do not actually enjoy all that much or approve of at all). Of course, in the least convenient possible world we'd have more sophisticated wireheading, such that the wireheads do like and approve of wireheading. Of course, this does not mean that non-wireheads desire or approve of wireheading, which brings us back to Eliezer's point. Do you feel LW's thoughts on this subject are incomplete?
I currently do not wish to be wireheaded, but I'm uncertain whether that aversion will turn out to be coherent. A technique for preventing AIs from wireheading has not yet been discovered, and there is no proof that one exists. So in that sense LW's thoughts on the subject are necessarily incomplete. I also don't feel that the least convenient wireheading machine has really been dealt with, that I can recall. Say a machine can really, genuinely convince me that I'm getting all the things that I want and I'm not aware I'm being deceived, so much so that I can't distinguish the real world and the machine, even in principle. I don't know what it would mean for me to say that I didn't prefer that world. What are my preferences about, at that point?
(I agree that the thinking is incomplete but disagree on the detail regarding in which sense the thinking is incomplete.)

While proving things formally about something as complicated as AI is hard, it would be misleading to act as if this makes the question "is it possible to have AIs that don't wirehead" at all an open question. The probability that such an AI is impossible is sufficiently small as to be safely ignored. Objective functions which wirehead are fairly obviously just a subset of objective functions which optimize towards any arbitrary state of the universe. That paperclip maximisers could theoretically exist is not something that is an open question on LW, and so it would be rather incoherent for LW to have much doubt whether something in a far broader class that includes paperclippers could exist.

Where the thoughts are actually more complicated and in doubt is in regards to what humans in general or ourselves in particular actually want. Introspection is far too uncertain in that regard!
While I agree completely with this comment, I'm upvoting almost entirely because of this sentence: It sort of encapsulates what attracted me to this site in the first place.
Thanks. It felt like just leaving the refutation there was more argumentative when I intended more discussion/elaboration, so I tried to preemptively de-escalate and emphasize that we mostly agreed.
I agree completely that human value is particularly hard to ascertain. As for wireheading, well, I guess I am questioning the feasibility of objective paperclippers. If you stick one in a perfect, indistinguishable-in-principle experience machine, what then are its paperclip preferences about? To approach it from a different angle, if we live in many worlds, can we specify which world our preferences are about? It seems likely to me that the answer is yes, but in the absence of an answer to that question, I'm still pretty uncertain.
Yes, that's one place where thinking about AIs is a bit more complex than we're used to. After all, us humans seem to handle things simply - we take our input rather literally and just act. If we are creating an intelligent agent such as a paperclip maximiser, however, we need both to program it to find itself within the universal wavefunction and to tell it which part of the wavefunction to create paperclips in.

It seems like the natural thing to create when creating an 'objective' paperclipper is one which maximises physical paperclips in the universe. This means that the clipper must make a probability estimate with regard to how likely it is to be in a simulation relative to how likely it is to be in the objective reality, and then trade off its prospects for influence. If it thinks it is in the universe without being simulated it'll merrily take over and manufacture. If it predicts that it is in a simulated 'experience machine' it may behave in whatever way it thinks will influence the creators to be most likely to allow a paperclip maximiser (itself or another - doesn't care) to escape.

I would say yes to this one with perhaps less uncertainty - I have probably thought about the question somewhat more while writing a post, and more attention should usually reduce uncertainty. We have a universal wavefunction, we choose a part of it that approximately represents an Everett branch, and program it to "Care Here". After all, if we think about preferences with respect to Many Worlds and maintain preferences that "add up to normal", then this is basically what we are doing ourselves already.
I'm not sure what motivation there might be to call that 'a wireheading machine' and not 'the universe'.
Sure? :-)
Thanks for being confused with me. It's nice to know I'm not the only one.
I would argue that your perception of bias vs. value is based on what you (unconsciously) perceive would signal higher status.
Signalling and status are useful tools, but if they can explain any behavior then they explain nothing. I want status, yes, of course, I'm human. But I also want to be loved. And I want the safety and stability that risk aversion brings. I'm not in danger of confusing every bias with a terminal value. Falling for the conjunction fallacy doesn't seem to help me get anything I want. But I am genuinely uncertain about where, whether, and how much my biases and values overlap.

For a rational agent with goals that don't include "being averse to risk", risk aversion is a bias. The correct decision theory acts on expected utility, with utility of outcomes and probability of outcomes factored apart and calculated separately. Risk aversion does not factor them.

"Risk Aversion," as a technical term, means that the utility function is concave with respect to its input, like in thelittledoctor's example. I think you're thinking of something else, like the certainty effect. But I don't know of anyone who considers the certainty effect to be a terminal goal rather than an instrumental one (woo, I don't have to compute probabilities!).

A proper utilitarian would feel approximately the same desire to do something about each

And we should be proper utilitarians... why?

what if we discover, for example, that some murders are not calculated defections, but failures of self control caused by a bad upbringing and lack of education.

Then we have evidence they will strike again.

What if we then further discover that there is a two-month training course that has a high success rate of turning murderers into productive members of society.

Does tha...

Oh, ok. I mean the effect where people make their utility functions "risk averse" to avoid bad possibilities, or just go ahead and avoid bad possibilities, and I have seen people on LW take "risk aversion" (whatever that means to them) as terminal.

Because it works better for achieving values that don't include "non-utilitarianism"? Why should we be Bayesians either?

Did you see the disclaimer about how this is fictional? I put that there to avoid this... The fictional psychologists assure us that Dan is curable. It would be nice to avoid murders by putting people through the course preemptively, but it's no good to give up on the course afterwards.
You don't "make" your utility function anything; it is what it is. "Risk aversion" just means that the most obvious scale for measuring your utility isn't proportional to your true utility scale.

For example, one's utility for money is generally not proportional to the amount of money, and is approximately proportional only if the amount is not large compared to your current wealth. For larger quantities it's not even close to proportional. In particular, I -- and most people, for that matter -- do not value $2 billion twice as much as $1 billion. The positive impact on my life of gaining $1 billion at my current level of wealth is vastly greater than the additional positive impact of going from $1 billion to $2 billion. Hence if I were offered a choice between a guaranteed $1 billion vs. a 90% chance of gaining $2 billion, I would choose the sure thing -- the expected utility of the second option is greater than 90% but less than 91% of the expected utility of the first option for me.

Likewise, suppose that my net worth is $100,000. An extra $100,000 would be great, but losing $100,000 would be disastrous; it would wipe me out. So my gain in utility from gaining $100,000 is considerably less than my loss of utility from losing $100,000... and it makes sense for me to do things like insure my home.

Both of these examples show me to be risk averse. It is not a strategy I have chosen; it is simply a statement of my personal utility function.
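Both examples can be checked numerically under an assumed concave utility function. Log utility is purely illustrative here; nothing in the comment commits to any particular curve, and the net-worth figure is the comment's own $100,000.

```python
import math

def utility(wealth):
    """Assumed concave utility of total wealth (log is just an illustration)."""
    return math.log(wealth)

current = 100_000  # current net worth, per the comment

# Sure $1B vs. a 90% chance of $2B: the sure thing wins under log utility.
eu_sure = utility(current + 1_000_000_000)
eu_bet = 0.9 * utility(current + 2_000_000_000) + 0.1 * utility(current)
assert eu_sure > eu_bet

# A gain adds less utility than an equal-sized loss removes, so paying a
# modest premium to insure against the loss can be rational.
gain = utility(current + 50_000) - utility(current)  # log(1.5) ~ 0.405
loss = utility(current) - utility(current - 50_000)  # log(2.0) ~ 0.693
assert loss > gain
```

Under any strictly concave utility the same qualitative conclusions hold; log just makes the arithmetic easy to see.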
I know your position is dominant around here, but I intended to tackle it anyway. If you care about doing good, once you've handled your personal expenses, additional marginal dollars have fixed marginal utility (until you're dealing with enough money to seriously impact the global market for marginal utility). Money utility is linear between the amounts where you're worrying about personal expenses, and the amounts where you're impacting the global market for marginal utility. That's most of the range.
This may be true, and so we might expect someone who was very wealthy to lose their risk aversion for decisions where they were sure there was no risk of losses cutting into what they need for personal use. Sounds pretty reasonable for a risk-averse agent to me.
The flaw in this theory is that it assumes the extra money actually gets donated.
Humph. If we are not assuming the money gets used, I'm not sure how we can apply any particular utility to it at all.
We can assume the money gets used on oneself, which is much more likely to happen in the stated scenario.
Does it? Because if you don't use Bayes' Rule to navigate through conditional probabilities, you will come up with answers that are objectively wrong.

Yes, but in any discussion about bias context matters. If people believe that real systems that serve real people work better with justice, but we can imagine a system in which there are no obvious defects from ignoring justice, that doesn't mean those people are biased.

Constructing these sorts of policy questions, particularly centered around a single scenario, typically strikes me as motivated cognition. If we'd like to be kind to murderers (how nice of us!), we can come up with a scenario that suggests that option rather than seeking vengeance (how mean!). But the same justifications can be applied in scenarios that are less contrived, where they look more questionable.

Suppose a productive member of society, Mr. B, murders Mr. A because of a calculated defection and gets caught. Mr. B informs us that Mr. A was the only person he would want to murder, with several compelling reasons attached arguing the rest of us needn't fear him. Should we forgive and forget? We don't even need a 2-month training course, so it's cheaper for society, and we have the assurance of experience that Mr. B is productive.

(I don't particularly want to delve into casuistry. Suffice it to say there are reasons to say Mr. B should pay the price and Dan should not, but it seems to me that those reasons do not carve reality / game theory at the joints.)
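The Bayes' Rule point above can be illustrated with the classic base-rate example. All the numbers here are my own, chosen only to show the size of the error from ignoring the prior:

```python
# Prior and test characteristics (illustrative, not from the thread).
p_guilty = 0.01             # 1% of suspects are actually guilty
p_match_if_guilty = 0.99    # sensitivity of the forensic test
p_match_if_innocent = 0.05  # false positive rate

# P(guilty | match) via Bayes' Rule.
p_match = (p_match_if_guilty * p_guilty
           + p_match_if_innocent * (1 - p_guilty))
p_guilty_given_match = p_match_if_guilty * p_guilty / p_match

# The intuitive-but-wrong answer just reads off the sensitivity (0.99);
# the correct posterior is far lower, because innocents vastly outnumber
# the guilty.
print(round(p_guilty_given_match, 3))  # 0.167
```

Skipping the conditional-probability bookkeeping here doesn't give a slightly different answer; it gives an answer wrong by a factor of about six.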
Does it not? Do we know of a better basis for decision theory? Please tell me what you know.

When we are faced with having to punish someone, we want to get out of it. Punishing people sucks. The question is whether we can avoid giving the punishment, and still credibly hold the threat of punishment against rational defectors. I think in Deadbeat Dan's case, since he is not a rational defector, we can credibly hold the threat against defectors. In Mr. B's case, we don't care if he'll never do it again; we have pre-committed to punish rational defectors, and must follow thru if we wish to maintain that threat.

I don't think this is a case of carving reality into qualitative categories, because we have a quantitative analysis that solves the problem (utility of letting them go vs disutility of undermining rule of law). About excuses
I know this reaction is not rational, but still, my first reaction was: In such an environment (where it is possible to tell the difference between irrational and rational crime, and punish accordingly), becoming rational means losing your "get out of jail once" card, and that's not fair! The more rational you are, the wider the range of your possible crimes that becomes punishable. You are being punished for being rational. Technically, a good person should not care about limiting their own crime range, and (if the good for everyone is their goal) they should actually be happy they have less chance to harm anyone. But still it somehow sucks to know that while I would be punished for doing X (because I am rational and see the consequences), another person would not be punished for doing a similar thing. I guess this intuition is based on the real-world situations, where the psychologists are not perfect, the justice system is not perfect, and therefore any rule like this has a big chance to be heavily abused. (As in: If you have a good lawyer, your crimes will be declared irrational, and you will be sentenced to two weeks of group therapy. Meanwhile the average Joe does the same thing and gets hanged.)
I agree with everything you said, but don't understand why you don't think it's "rational". Remember "good" and "rational" are not the same thing.
Maybe rational defector was the wrong way to put it. I don't mean punish people who test high on rationality, I mean punish in the cases where it's a calculated defection for personal gain. Punish in cases where tit for tat is actually an effective strategy. Some crimes just aren't done for personal gain, and those should have an alternate strategy. Of course, what the alternate strategy is is still open, and distinguishing between them is difficult, as you say: At our level, I don't think we are able to distinguish between crimes that should get punishment, and things where punishment is ineffective. It's just useful to understand that justice is about game theory, not revenge.
I have not seen a satisfactory way to compare utilities, and so believe that actually running a utilitarian calculation is an unsolved (and I would suspect unsolvable) problem. Why should someone with this view ever be given the position of judge? I would even be leery of entrusting a child to their care for an afternoon, let alone an upbringing. (I assume that by "want to get out of it" you mean "expected negative total value" not "expected negative short-term value." One who delights in punishment is a brute, but one who shirks from meting out just punishment is infirm.) No. Next question.
Not nearly straightforward enough to use the "No. Next question." move on. Deception and various forms of active manipulation are possible. They are rational, not omniscient.
Punishing people sucks the same way paying for stuff you take sucks, or working hard to achieve your goals sucks. You should be able to conceive of the fact that short-term suck can pay for long-term good. Pretending that punishment is good because it pays for good is stupid, and you will get confused if you think like that. A judge or parent who understands that punishment is bad is not necessarily going to not do it. They may also understand that following thru on punishment threats is necessary to keep the threat credible.

Those words are loaded with connotation. Why are you using them? Say what is bad about punishing too much or too little without using words like that. You may find that too much punishment is bad because punishment is bad, and not enough punishment is bad because it fails to follow thru on the precommitment to punish that holds up the rule of law.

Really? So there's no such thing as extenuating circumstances where we let someone off, but everyone understands that the threat of punishment is still there? Maybe it was an accident, maybe punishing the weather won't make it sunnier, maybe we should deal with the insane a little bit differently.
Yes, of course. Indeed, there are few long-term goods that can be purchased without short-term suck. But you weren't arguing that punishing criminals was a long-term bad, or even insufficiently good. You were arguing that it was short-term suck.

Invert the order of the sentences, and you have your answer. But I will answer at length: the history of law and order is one of long and painful experience. The common law definition of "assault" did not spring forth from first principles; it was learned. The source of order is deterrence; deterrence rests on expectations; expectations rest on identities. The brute is resisted in a way that the even-handed is not; the infirm are flaunted in a way that the firm are not.

Accepting any excuse reduces the credibility of the commitment. Sometimes you may think that reduction is acceptable, but you should never pretend it was absent.
Yes? Punishing criminals sucks, but it pays for the rule of law. I miss your point. Still don't get it. Agree. Agree. Wat? I don't understand. What has identity got to do with anything? And too many loaded words. What does "even-handed" even mean, apart from "vaguely good and something to do with justice"? Agreed. I thought you meant there weren't cases that were worth it.
If you consider "not being a brute" part of your identity, you are less likely to act like a brute.
It seems like it depends on whether or not we can easily distinguish between "irrational" crime and calculated defections. In the current world, we can't, so there are game-theoretic reasons to justify similar treatment. But if we could relatively reliably differentiate, it seems like a large waste of resources to avoid a cheap treatment that reduces the risk of future irrational crime to negligible levels. And I suspect that's true even if our test was only 75% accurate at telling the difference between "irrational" criminals and calculated defectors. That's an interesting impression to have. Not that I know any better, but I'm doubtful of the reliability of any data because it is irrelevant to the US legal system (except for insanity type defenses, and mitigation in death penalty litigation).
Yep. But I don't see significant reason to expect detection systems to outpace tricking systems. 25 to 87% of inmates report suffering a head injury, compared to 8.5% of the general population. The high variation in reports suggests that the data isn't the best quality / most general, but even with the most conservative estimate, prevalence is three times higher.
Risk aversion is separate from the properties of the utility function. Being risk-averse rather means preferring a guaranteed payoff to a bet with the same expected utility. See here for a numerical example. It is possible to be risk averse even with a convex utility function.
That is a non-standard definition. (Standard definition.) Agents should always be indifferent between bets with identical expected utilities. (They do not always have to be indifferent between bets with identical expected payoffs.) Preferring a guarantee to a bet is the certainty effect, like I claimed in the grandparent.
Rational agents should be. Irrational agents - in this case, prone to risk aversion - would instead be willing to pay a finite cost for the bet to be replaced with the sure deal, thus losing utility. You can fix this by explicitly incorporating risk in the utility function, making the agent rational and not risk-averse any more.
That sounds like a ...drumroll... terminal bias. Enshrining biases as values in your utility function seems like the wrong thing to do.
This strikes me as, though I'm unsure as to which technical term applies here, 'liking your theory too much'. 'Tis necessary to calculate the probability of payoff for each of two exclusive options of identical utility in order to rationally process the choice. If Option A of Utilon 100 occurs with 80% probability and Option B also of Utilon 100 occurs with 79.9% probability, Option A is the more rational choice. To recognise the soundness of Vaniver's following statement, one must acknowledge the necessity of calculating risk. [Additionally, if two options are of unequal utility, differences in payoff probabilities become even more salient as the possible disutility of no payoff should lower one's utility estimate of whichever choice has the lower payoff probability (assuming there is one).]* Honestly the above seems so simple that I very much think I've misunderstood something, in which case please view this as a request for clarification. [...]* This also seems obvious, but on an intuitive mathematical level, thus I don't have much confidence in it; it fit better up there than down here.
Again, what you are saying is a non-standard definition. The commonly used term for the bias you're describing is certainty effect, and risk aversion is used to refer to concave utility functions.
First, concave utility function is just a model for risk aversion which is "the reluctance of a person to accept a bargain with an uncertain payoff rather than another bargain with a more certain, but possibly lower, expected payoff." (wiki) Second, the certainty effect is indeed one of the effects that is captured by my preferred model, but of course it's not limited to it, because it's possible to behave in a risk-averse manner even if none of the offered bets are certain.
Another way to interpret this situation is that the "utility function" being used to calculate the expected value is a fake utility function.
This! If you're risk averse, then you want to avoid risk, and so in the real utility calculation upon which you base your decisions the risk-averse option gets a little extra positive term for being, well, risk-averse. And then the two options no longer have the same expected utility.
Unfortunately, under your new "fixed" utility function there will again be a point of indifference at some slightly different probability/payoff combination, where you, being risk-averse, have to go for the sure deal, so you will end up stuck in an infinite recursion trying to adjust your utility function further and further. I tried to explain this more clearly here.
I don't think that follows. The risk-aversion utility attaches to the choice, not the outcome: I get extra utility for having made a choice with lower expected variance, not for one of the outcomes. If you then offer me a choice between choices, then sure, there will be more risk aversion, but I don't think it's viciously recursive.

I feel like values are defined over outcomes, while biases are defined over cognitive processes.

You could value a bias I suppose, but then you'd be valuing executing particular algorithms over like, saving the world. If that's the case, I think that the people arguing for a bias are looking for an easy way out of a problem, or more attached to their identity than I believe to be useful.

Not that I've reflected that much on it, but that's my intuition coming in.

That seems good, check out my favored solution. I'll have to think about yours more to see if it works.
So how would you apply this idea to the examples in the OP?
This is also my understanding of the distinction.

What atucker said. Someone who falls prey to the Allais paradox can't be Bayesian-rational. Someone who values love, can.

To everyone who proposes to examine the causal origins of human urges: can you try to make that idea more precise? How far back do you go? What if the causal origin is some natural process that doesn't seem to have goals?


Ok let's try this as a solution: All our neat little mechanisms and heuristics make up our values, but they come on a continuum of importance, and some of them sabotage the rest more than others.

For example, all those nice things like love and beauty seem very important, and usually don't conflict, so they are closer to values.

Things like risk aversion and hindsight bias and such aren't terribly important, but because they prescribe otherwise stupid behavior in the decision theory/epistemological realm, they sabotage the achievement of other bias/values, a...

Risk aversion as a terminal value follows pretty naturally from decreasing marginal utility. For example imagine we have a paperclip-loving agent whose utility function is equal to sqrt(x), where x is the number of paperclips in the universe. Now imagine a lottery which either creates 9 or 25 paperclips, each with 50% probability - an expected net gain of 17 paperclips. Now give the agent a choice between 16.5 paperclips or a run of this lottery. Which choice maximizes the agent's expected utility?
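The arithmetic here can be checked directly; a minimal Python sketch using the example's own numbers (sqrt utility, a 50/50 lottery over 9 or 25 paperclips, versus 16.5 for sure):

```python
import math

def utility(paperclips):
    # Concave utility: decreasing marginal value per paperclip.
    return math.sqrt(paperclips)

# Lottery: 9 or 25 paperclips, each with probability 0.5.
lottery_eu = 0.5 * utility(9) + 0.5 * utility(25)  # (3 + 5) / 2 = 4.0
sure_eu = utility(16.5)                            # sqrt(16.5), just above 4

# The lottery yields more expected paperclips (17 vs 16.5),
# but the sure deal yields more expected utility.
assert sure_eu > lottery_eu
```

So a pure expected-utility maximizer with this concave utility function already takes the sure deal, with no extra "risk aversion" term needed.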

That's not risk aversion, it's just decreasing marginal utility. They look different to me. And it's still not a terminal value, it would be instrumental.
They're really mathematically equivalent ways of expressing the same thing. If they look different to you that's a flaw in your intuition, you may want to correct it.
Ok, let's taboo "risk aversion", I'm talking about what a minimax algorithm does, where it comes up with possibilities, rates them by utility, and takes actions to avoid the worst outcomes. This is contrasted to a system that also computes probabilities to get expected utilities, and acts to maximize that. Sure you can make your utility function strongly concave to hack the traits of the minimax system into a utility maximizer, but saying that they are "mathematically equivalent" seems to be missing the point.
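For concreteness, the tabooed contrast can be sketched as a toy chooser (Python; the two actions and their payoffs are invented for illustration, not taken from any real planner):

```python
# Each action maps to a list of (probability, utility) outcomes.
actions = {
    "safe":  [(1.0, 3.0)],
    "risky": [(0.5, 10.0), (0.5, 1.0)],
}

def minimax_choice(actions):
    # Ignores probabilities entirely: pick the action whose
    # worst-case outcome is least bad.
    return max(actions, key=lambda a: min(u for _, u in actions[a]))

def expected_utility_choice(actions):
    # Weights every outcome by its probability and maximizes the sum.
    return max(actions, key=lambda a: sum(p * u for p, u in actions[a]))

print(minimax_choice(actions))           # "safe"  (worst case 3 beats 1)
print(expected_utility_choice(actions))  # "risky" (EU 5.5 beats 3.0)
```

The "hack" mentioned above would be choosing a utility function concave enough that the expected-utility chooser reproduces the minimax answer.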
That's called "certainty effect" and no one is claiming that it's a terminal value.
Ok, thanks for the terminology help.
In your example, given this utility function, risk aversion would correspond to consistently preferring guaranteed 16 paperclips to the bet you describe. In this case, by Savage's theorem (see postulate #4) there must exist a finite number δ > 0 such that you would also prefer a guaranteed payoff of 16 to the bet defined by {P(25) = 0.5 + δ, P(9) = 0.5 - δ}, costing you an expected utility of 2δ > 0.
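Plugging in the sqrt utility from the example and a hypothetical δ = 0.1 makes the 2δ cost explicit:

```python
import math

delta = 0.1  # hypothetical value; any delta > 0 shows the same thing

# The shifted bet: P(25) = 0.5 + delta, P(9) = 0.5 - delta.
bet_eu = (0.5 + delta) * math.sqrt(25) + (0.5 - delta) * math.sqrt(9)  # 4.2
sure_eu = math.sqrt(16)                                                # 4.0

# Preferring the guaranteed 16 forfeits exactly 2 * delta in expected utility.
assert abs((bet_eu - sure_eu) - 2 * delta) < 1e-9
```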
I'm not sure I understand why. The lottery has an expected utility of (sqrt(9)+sqrt(25))/2=4, so shouldn't the agent express indifference between the lottery and 16 guaranteed paperclips? This behavior alone seems risk-averse to me, given that the lottery produces an expected (9+25)/2=17 paperclips. Sidenote, is there a way to use LaTeX on here?
John Maxwell made a LaTeX editor (which gives you Markdown code you can paste into a comment).
Sorry, I made a mistake in the example, it's of course 16 not 15. Edited to correct.
Yes, the agent should - given the defined utility function and that the agent is rational. If, however, the agent is irrational and prone to risk aversion, it will consistently prefer the sure deal to the bet, and therefore be willing to pay a finite cost for replacing the bet with the sure deal, hence losing utility.

This was an interesting exercise in denying human nature. I also wonder why you left love alone. Perhaps you can experiment with behaviors that compromise these things you consider non-terminal in favor of what you identify as their true purpose. I don't know of any argument that could be convincing other than successfully living your life as if they're not terminal. Otherwise, it seems like excessive optimism in telling stories about yourself.

I also don't see how you could maintain the impetus to really live as you suggest, without it being a hugely rewarding p...

I'm not sure I actually understand any of your comments. Can you clarify? This is generally my philosophy about these things, but then I find myself deciding that justice is a stupid heuristic for social order maintenance, but that line of thought carries through to also say things that I actually value are stupid heuristics. So I guess the point here is that there must be something missing in my understanding of why I'm convinced that some things are undesirable biases, because the current understanding also implicates values. Really it does just come down to what I value, but I just think I ought to be able to understand it.
Sorry I couldn't be clearer. I try to have something definite in mind whenever I write, but I don't do a good job communicating the complete context. I'm wondering how much flexibility any of us have in really changing our internal satisfaction points. For me, reasoning "this is really for this purpose, so I can bypass it" seems merely as plausible as any other placebo belief - thus my emphasis is on trying to really live that way for a while, rather than forming elaborate beliefs about how we should work. It's true that there's lots of variation between individuals in what self-concepts they mark as important. And some people seem to be genuinely plastic - amenable to introspective self-therapy. Those few who are can end up in interesting places if they're intelligent and striving toward improving the world, or even just their understanding of it. But as with hypnosis, I always wonder: is the explanation merely in convincing people that they're changed, or are they really changed?
Probably not much. This is what I was having trouble with. It seems like a convincing argument against a bias to know a better way to accomplish its goals and why it's done that way, but then it breaks down on other things that are closer to values. I've solved the problem for myself by dissolving the qualitative distinction between bias and value. Put them all on a bias-value space arranged by how much we like it and how much it interferes with achieving the other biases/values. If something interferes a lot (like a cognitive error), we call it a bias because following it lowers total value, if something doesn't interfere with much and seems really important (like love or beauty), we call it a value. These labels are fuzzy and transient; desire for beauty may become a bias when designing a system that may be harmed by beauty. See the new conclusion on the OP.
One approach is to make this the definition of the difference between bias and value.
This is a good idea, but I'm leaning towards dissolving the difference in favor of a bias-value spectrum based on value minus sabotage or something.

Sheridan: "What do you want?" Kosh: "Never ask that question!"

People are like dogs, they just sort of do things arbitrarily. If you look beyond the smoke and mirrors of your surface preferences, all you're going to find behind them is more smoke and mirrors. A wise man once suggested to me that I should just treat my brain as an oracle for preferences - give it as good data as I can, and as much processing power as it needs, and just take what it spits out as gospel, rather than seeking the underlying principles.

That sounds deeply wise, but I think we can do better.
But don't you want to understand the underlying principles?

Not necessarily; your brain might have this annoying property that understanding a moral principle changes it in such a way that it no longer cares about it.

This seems relatively unlikely. I have yet to meet a person of whom I would predict that once they have a firm understanding of morality they would not have a moral aversion to, say, the possibility of raping and killing their entire family. I would also postulate that of those people who would so easily stop caring about such moral considerations when granted knowledge, few of them would have a significant aversion to something as instinctively morally minor as gaining understanding. Mind you, the fact that people centered a whole morality myth around an original sin of seeking the knowledge of good and evil suggests that some people are scared of that kind of understanding!
Can you unpack your grounds for trusting your predictions about other people's reflectively consistent moral intuitions, whether about this in particular or more generally? As I think about it myself, I conclude that mostly what I have is a desire to believe that, as believing it makes me far less inclined to hide under the table at Thanksgiving. But I've certainly had the experience of having moral beliefs (or at least, things that I would have described as moral beliefs) dissolve in the face of greater understanding, so my thinking about it is muddled enough that my predictions have wide error bars. Certainly, I would agree that someone with sufficient understanding of the world will generally choose not to rape and kill their entire family when there's no net benefit to be gained by doing so, but that's not quite the same thing.
That isn't a position I have or have presented so it would not make sense for me to try to unpack the null. In particular I have huge amount of doubt regarding reflectively consistent intuitions - even my own. (ie. This is a minor and non-malicious "When will you stop beating your wife?" problem.) My prediction was in regards to counterfactual experience of aversive moral feelings in a provided extreme example of those people that I know. I expect them to get squicked out at the possibility of raping and killing their entire family even if they have understanding of the moral principles. (There may be an exception or two among people I suspect have high functioning sociopathic tendencies but these come in under the second observation regarding them not caring anyway.)
Ah, I see. Sorry, I was understanding "firm understanding of morality" in a more from-the-ground-up, nonhuman sense than I think you meant it... I think I've been overtrained on FAI conversations. Sure, an understanding of morality in a normal, real-world sense doesn't preclude -- nor even noticeably inhibit -- moral aversions at the level of "don't rape and kill my family"; absolutely agreed.
well played.

I think the reason the values/biases you described (risk aversion, justice, responsibility) initially caused you confusion is that all of them are (as other commenters pointed out) very similar to behaviors a calculating consequentialist would use to achieve its values, even if it lacked them. For instance, a consequentialist with strong desires for love and beauty, but no desire for justice, would still behave somewhat similarly to a consequentialist with a desire for justice, because it sees how taking action to deter negative behaviors by other agents ...

Why couldn't the same be said about love or beauty?
Yes, well said.

Surely justice [i.e. punishment of criminals] is a terminal value; it feels so noble to desire it.

I don't know that many people who consider punishing criminals an end in itself, as opposed to a means to rehabilitate them and/or deter other potential criminals. (Maybe that's because I'm European; I've heard that that is mainly an American thing.)

It's a recurring theme in the animal-training literature that active positive punishment (that is, doing things to an animal they don't want done, like squirting them with a water bottle or hitting them with something) is often reinforcing for the punisher. I don't doubt that a similar pattern arises when humans punish other humans, whether under the label of justice or not.
What do you think we should conclude from the fact that we evolved this behavior?
We are primates who evolved within status hierarchies. The rules imposed by high-status members of a status hierarchy are often to the direct benefit of those members, and even when they aren't, violations of those rules are nevertheless a challenge to those members' statuses. (Indeed, it's not uncommon for sufficiently intelligent high-status group members to create rules for the sole purpose of signalling their status.) Punishing rules violations (at least, if done consistently) reduces the frequency of those violations, which addresses the former threat. Doing so visibly establishes the punisher's dominance over the violator, which addresses the latter threat. Of course, as with any high-status act, it's also a way for ambitious lower-status individuals to signal status they don't have. Unlike many high-status signalling acts, though, punishing someone is relatively safe, since any attempt to censure the punisher for presumption necessarily aligns the censurer with the lower-status punishee, as well as potentially with the rule-violation itself. It ought not be surprising that we've evolved in such a way that behaviors which benefited our ancestors are reinforcing for us.
I have an off-topic question about this theory of ancestral environment. It seems to me that we would expect the behavior you describe if (1) decision theory says it is beneficial, and (2) our reward centers have sufficiently fuzzy definitions that behavioral conditioning of some kind is effective. By contrast, you seem to be articulating a strong ancestral environment theory that says the beneficial aspects shown by decision theory analysis were a strong enough selection pressure that there actually are processes in the brain devoted to signalling, status, and the like (in the same way that there are processes in the brain devoted to sight, or memory). What sort of evidence would distinguish between these two positions? Relatedly, am I understanding the positions correctly, or have I inadvertently set up a straw man?
evolutionary/cognitive boundary tl;dr: people who talk about signaling are confusing everyone.
I like that essay, which I hadn't seen before. But I'm having trouble deciphering whether it endorses what I called the strong ancestral environment hypothesis.
I'd say it doesn't endorse the strong ancestral environment hypothesis (SAEH). The most relevant part of EY's piece is, "Anything originally computed in a brain can be expected to be recomputed, on the fly, in response to changing circumstances." "Mainstream" evolutionary psychologists uphold the "massive modularity hypothesis," according to which the adaptive demands of the ancestral environment gave rise to hardwired adaptations that continue to operate despite different environmental conditions. They deny that a general purpose learning mechanism is capable of solving specific adaptive problems (recomputed on the fly). The cognitive biases are one of the evidentiary mainstays of SAEH, but they are subject to alternative interpretations. The evidence of the plasticity of the brain is perhaps the strongest evidence against massive modularity. I'd also mention that not all primate species are highly stratified. Although chimps are our closest relatives, it is far from clear that the human ancestral environment included comparable stratification. It isn't even clear that a uniform ancestral human environment existed.
That's just false, and EY really should know better.
You are either setting up a straw man, or you have identified a weakness in my thinking that I'm not seeing clearly myself. If you think it might be the latter, I'd appreciate it if you banged on it some more. Certainly, I don't mean to draw a distinction in this thread between dedicated circuits for "signaling, status, and the like" vs. a more general cognitive capacity that has such things as potential outputs... I intended to be agnostic on that question here, as it was beside my point, although I'm certainly suggesting that if we're talking about a general cognitive capacity, the fact that it routinely gets pressed into service as a mechanism for grabbing and keeping hierarchical status is no accident. But now that you ask: I doubt that any significant chunk of our status-management behavior is hardwired in the way that, say, edge-detection in our visual cortex is, but I doubt that we're cognitively a blank slate in this regard (and that all of our status-management behavior is consequently cultural). As for what sort of evidence I'd be looking for if I wanted to make a more confident statement along these lines... hm. So, I remember some old work on reinforcement learning that demonstrates that while it's a fairly general mechanism in "higher" mammals -- that is, it pretty much works the same way for chaining any response the animal can produce to any stimulus the animal can perceive -- it's not fully general. A dog is quicker to associate a particular smell to the experience of nausea, for example, than it is to associate a particular color to that experience, and more likely to associate a color than a smell to the experience of electric shock. (I'm remembering something from 20 years ago, here, so I'm probably getting it wrong, and it might be outdated anyway. I mean it here only as illustration.) That's the kind of thing I'm talking about: a generalized faculty that is genetically biased towards drawing particular conclusions (whether that bias
To parallel what TheOtherDave said, is it really the case that the retributive theory of justice is essentially rejected in Europe? That said, my impression is that the US is more concerned about this principle than Europe, which I suspect is related to the fact that the US is more religious than Europe.
That's what I thought, until I tried talking to people about how justice could be improved. Some people really do take punishment of criminals as terminal. There are some in this very thread.
I assign greater preference to universes in which those who take certain actions experience outcomes lower in their preferences than they would have if they had not committed those acts, all else being equal. This roughly translates into treating punishment for certain things as a terminal value as well as an instrumental one. This position does not strike me as one particularly out of accord with reasonable human values.
I've since improved my metaethics to acknowledge that I want punishment for criminals, but it is a rather small want, vastly overpowered by social-good game theory considerations.
(I wonder whether they wouldn't like a world without crime because that would mean there's no-one to punish.)
It's not so much that U(punish criminals) is high; it's that U(punish criminal | criminal) is high.

I'm working under the assumption that everything individually is a bias until proven otherwise, and find it very unlikely such a proof will be available before the singularity; after the singularity happens, being biased doesn't matter all that much anyway. This also doubles as a safeguard against attempting to make an FAI on my own implementing only my values, since by the time I finished no such thing would exist in any meaningful sense.

I have a question for those who claim not to value justice. Would you support introducing something like the nine familial exterminations, basically punishing friends and relatives of the criminal, if its deterrent effect was shown to decrease crime?

No, because that sounds way (morally) expensive. Almost any additional punishment would reduce crime. The question is: is it worth it? I think punishments are probably too harsh generally.
I meant that it would reduce crime by an amount greater than the amount of suffering the punishment generated. For example, suppose a man gets fed up and goes on a killing spree, ultimately ending in suicide. However, he has a daughter that he cares about, and you (somehow) know that he would have been less inclined to go on a killing spree if he thought his daughter would be punished as a result. Would you favor punishing the daughter? Are you sure this isn't just because the image of a harsh punishment is more available than a vague dispersed deterrent effect? Keep in mind, the harsher the punishment, the more effective the deterrent, the less the punishment actually gets carried out.
This is an interesting one. The naive answer is that it doesn't matter who gets punished as long as the incentive is strong enough to overcome the disutility of punishment. You'd have a hard time showing that the incentive is enough, tho:

1. This is not a rational defection, so punishment is only partially useful.
2. Punishing someone else is not as strong a disincentive as punishing the perp. Family bonds are usually weaker than self preservation, and in many cases, totally absent. If you modified it to see if the family bonds were strong before punishing someone else, that would create an incentive to not associate with family.
3. This policy puts a load on the family that is present even if no crime is ever committed. Putting this load on everybody for the sake of a decrease in an already small crime rate probably isn't worth it.
Add in a fourth consideration: If my brother is plotting to overthrow the government and this kind of system is in place I have just been given a massive incentive to assist him in every way possible - and nearly all negative incentive is removed.
Suppose the man also has an "irrational" preference that his daughter not be punished. In my example the man committed suicide, so he obviously didn't value his own life as much as his daughter's. There have existed societies where family members were punished for crimes; what you are describing didn't happen in those societies. I'm not quite sure what you mean by this. In any case, two comments on the style of your reply:

1. You seem to be trying to argue that this scenario isn't plausible. I find it much more plausible than your two-month rehab scenario, or for that matter any type of reliable rehab.
2. You seem to be displaying the symptoms of trying to defend a fake utility function, i.e., trying to argue why your stated preferences won't force you to do things you find morally repugnant, rather than trying to find preferences that match your intuitions of what's morally repugnant.
I think it needs to be said that punishment is more about sending a message to future possible-criminals than it is about the guy who actually committed the crime. Hmm. I don't think a magical reliable rehab is likely to be discovered any time soon, but given the magical rehab and magical psychologists, I don't think there are further problems in my reasoning for rejecting punishment in that case. I find the idea that family punishment would be an effective method of law to be somewhat implausible. Oh well. That could be. I will reconsider most of this stuff. Talking right at the level of what I feel about this, without trying to rationalize from other preferences: I am slightly averse to punishing criminals, but accept its necessity in some cases. I am strongly averse to the idea of punishing innocent people, even if it were as effective; I further find it highly unlikely that you could make it effective, but I think that's unrelated to the preferences.
See the original meaning of "hostage" for some historical examples.
Provided the probability of punishment stays the same.

I think you misdiagnosed the source of responsibility. In keeping with your Syria example, suppose you gave some Syrian advice on how to rebel. Then you'd probably feel responsibility for him even if you don't identify with Syrians. I would argue that responsibility is more based on a (possibly implicit) contract (e.g., if you give advice, you are responsible for its quality) than on identity.

I think that at the point where you give them advice, if you identify with your actions, that action becomes part of your identity, so the responsibility-is-a-hack-on-identity (or is it the other way) hypothesis also predicts feelings of responsibility. Maybe you have other examples that might better distinguish them?
For example, while I was a TA I felt much more responsible for helping students during office hours than at other times, even though I don't think how much I identified with them changed during those times.
Hmm, that's a good one. Your contract model seems reasonable enough. I'll think about it some more.

Not to be rude, but this article is terminally confused. The principles of rationality do not tell you what your values should be; rather, they guide you in achieving whatever your values actually are. The principles and processes of rational thought and action are the same for all of us, but they lead to different prescriptions for different people. What is a rational action for me is not always a rational action for you, and vice versa, not only because our circumstances are different (and hence we will get different results) but because our values are ...

Are they really? Our understandings are certainly different, but that can change. It's a bias if it keeps you from achieving your other values. Hmm. Nothing I can say to that except that I think a lot of things aren't worth doing for the satisfaction of punishment, like, for example, ruining someone's life at a cost of hundreds of thousands of dollars. That's a good point, people you have a connection to carry much greater weight in your utility function.
This is nonsense. I have a desire for ice cream, but also a desire to stick to my diet and lose weight. Oh no, my desire to stick to my diet is preventing me from achieving my desire for ice cream, it must be a bias!
Agreed, but nyan_sandwich touches on an interesting point. Certainly, there are lots of situations where I have one set of cognitive structures that encourage me to behave one way (say, eating ice cream, or experimenting to discover what's actually true about my environment, or whatever) and a different set encouraging me to behave a different way (say, avoiding empty calories, or having high confidence in what I was taught as a child, or whatever). It seems to me that when I call one of those structures a "bias" I'm suggesting two things: first, that I don't endorse it, and second, that it's relatively broad in applicability. But that in turn suggests that I can eliminate a bias simply by endorsing its conclusions, which is... not uncontroversial.
If they conflict, one or the other is currently a 'bias'. You get to decide which one you like more. Is eating ice-cream more important than your desire to stay healthy? You must overcome your desire to stay healthy. Is staying healthy more important than eating ice-cream? Then you must overcome the desire to eat ice-cream. 'bias' is a fuzzy category referring to the corner of the conflict*value space where value is low and conflict with other values is high. Stretched all the way over to ice-cream and health, it starts to lose meaning. Just talk about which one you want to overcome.
That's really not how I would understand a bias. I would think of a bias as a feature of your psychology that distorts your decision-making process away from the rational; that is, optimal pursuit of your goals. The planning fallacy is a bias, having conflicting goals is just a feature of my utility function.