Thought of this after reading the discussion following abcd_z's post on utilitarianism, but it seemed sufficiently different that I figured I'd post it as a separate topic. It feels like the sort of thing that must have been discussed on this site before, but I haven't seen anything like it (I don't really follow the ethical philosophy discussions here), so pointers to relevant discussion would be appreciated.

Let's say I start off with some arbitrary utility function and I have the ability to arbitrarily modify my own utility function. I then become convinced of the truth of preference utilitarianism. Now, presumably my new moral theory prescribes certain terminal values that differ from the ones I currently hold. To be specific, my moral theory tells me to construct a new utility function using some sort of aggregating procedure that takes as input the current utility functions of all moral agents (including my own). This is just a way of capturing the notion that if preference utilitarianism is true, then my behavior shouldn't be directed towards the fulfilment of my own (prior) goals, but towards the maximization of preference satisfaction. Effectively, I should self-modify to have new goals.

But once I've done this, my own utility function has changed, so as a good preference utilitarian, I should run the entire process over again, this time using my new utility function as one of the inputs. And then again, and again... Let's look at a toy model. In this universe, there are two people: me (a preference utilitarian) and Alice (not a preference utilitarian). Let's suppose Alice does not alter her utility function in response to changes in mine. There are two exclusive states of affairs that can be brought about in this universe: A and B. Alice assigns a utility of 10 to A and 5 to B, I initially assign a utility of 3 to A and 6 to B. Assuming the correct way to aggregate utility is by averaging, I should modify my utilities to 6.5 for A and 5.5 for B. Once I have done this, I should again modify to 8.25 for A and 5.25 for B. Evidently, my utility function will converge towards Alice's.
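The toy model can be checked with a few lines of Python. This is a minimal sketch, assuming equal-weight averaging over the two states and that Alice never updates; `average_step` is just a hypothetical name for one round of the update.

```python
def average_step(mine, alice):
    """One round of the toy update: replace my utilities with the
    equal-weight average of Alice's and my current ones."""
    return {state: (mine[state] + alice[state]) / 2 for state in mine}

alice = {"A": 10.0, "B": 5.0}  # Alice never changes
mine = {"A": 3.0, "B": 6.0}    # my initial utilities

history = [mine]
for _ in range(10):
    mine = average_step(mine, alice)
    history.append(mine)

for step, u in enumerate(history[:3]):
    print(step, u)
# 0 {'A': 3.0, 'B': 6.0}
# 1 {'A': 6.5, 'B': 5.5}
# 2 {'A': 8.25, 'B': 5.25}
```

Under these assumptions the gap between my utilities and Alice's halves every round, so after n rounds it is the initial gap divided by 2^n.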

I haven't thought about this at all, but I think the same convergence will occur if we add more utilitarians to the universe. If we add more Alice-type non-utilitarians there is no guarantee of convergence. So anyway, this seems to me a pretty strong argument against utilitarianism. If we have a society of perfect utilitarians, a single defector who refuses to change her utility function in response to changes in others' can essentially bend the society to her will, forcing (through the power of moral obligation!) everybody else to modify their utility functions to match hers, no matter what her preferences actually are. Even if there are no defectors, all the utilitarians will self-modify until they arrive at some bland (value judgment alert) middle ground.
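One way to probe the many-utilitarian case is to simulate it. The sketch below assumes that every utilitarian updates simultaneously to the plain (unweighted) average of all agents' current utilities while Alice stays fixed; `simulate` and the starting numbers are hypothetical choices for illustration.

```python
def simulate(utilitarians, fixed, rounds=100):
    """Each round, every utilitarian replaces its per-state utilities with
    the mean over all agents (utilitarians and fixed agents) from the
    previous round. Fixed agents (like Alice) never change."""
    n_states = len(fixed[0])
    for _ in range(rounds):
        everyone = utilitarians + fixed
        mean = [sum(agent[s] for agent in everyone) / len(everyone)
                for s in range(n_states)]
        # Synchronous update: all utilitarians adopt the same averaged vector.
        utilitarians = [list(mean) for _ in utilitarians]
    return utilitarians

# Three utilitarians with arbitrary starting utilities for states A and B,
# plus one Alice who always assigns A = 10 and B = 5.
final = simulate([[3.0, 6.0], [0.0, 0.0], [8.0, 1.0]], [[10.0, 5.0]])
for u in final:
    print([round(v, 6) for v in u])  # each prints [10.0, 5.0]
```

In this sketch the utilitarians end up at Alice's utilities regardless of where they start; adding more of them only slows the approach, since with k utilitarians and one Alice the remaining gap shrinks by a factor of k/(k+1) per round.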

Now that I think about it, I suspect this is basically just a half-baked corollary to Bernard Williams' famous objection to utilitarianism:

The point is that [the agent] is identified with his actions as flowing from projects or attitudes which… he takes seriously at the deepest level, as what his life is about… It is absurd to demand of such a man, when the sums come in from the utility network which the projects of others have in part determined, that he should just step aside from his own project and decision and acknowledge the decision which utilitarian calculation requires. It is to alienate him in a real sense from his actions and the source of his action in his own convictions. It is to make him into a channel between the input of everyone's projects, including his own, and an output of optimific decision; but this is to neglect the extent to which his projects and his decisions have to be seen as the actions and decisions which flow from the projects and attitudes with which he is most closely identified. It is thus, in the most literal sense, an attack on his integrity.

Anyway, I'm sure ideas of this sort have been developed much more carefully and seriously by philosophers, or even other posters here at LW. As I said, any references would be greatly appreciated.



I think the standard patch is just to say that you should only take into account people's "selfish" utilities when you aggregate.

How do you distinguish between "selfish" and "non-selfish" utilities, though? In the toy example I gave, at one stage my utilities are 3 for A and 6 for B, and at another stage they are 6.5 for A and 5.5 for B. There's nothing intrinsically selfish or unselfish about either distribution. The difference is just in their histories -- the second set of utilities emerges from updating based on preference utilitarianism. So I'm guessing that you want to say the utilitarian should continue to work with the initial utility function as a representation of his preferences, even though it is no longer an accurate representation of his preferences. That seems strange to me. Why should the preference utilitarian care about that utility function, which doesn't represent the actual preferences of anybody in the world? The patch just seems ad hoc and not true to the motivating spirit of preference utilitarianism.

I guess one way to go would be to say that in some sense that initial utility function is still his "actual" utility function, but it has been somehow confounded by his ethics. I think unmooring utility theory from revealed preference theory is not a good road to go down, though. It leads to the same sorts of problems that led people to abandon hedonic utilitarianism for preference utilitarianism in the first place.

How do you distinguish between "selfish" and "non-selfish" utilities, though?

In principle, I have no idea. But in practice with humans, people tend to automatically separate their desires into "things for me" and "things for others".

(I'm not actually a preference utilitarian, so don't take my opinion as gospel.)

But in practice with humans, people tend to automatically separate their desires into "things for me" and "things for others".

Separating preferences that way would make preference utilitarianism even more unattractive than it already is, I think. Critics already complain about the preferences of Gandhi and Ted Bundy getting equal weight. Under this patched scheme, Gandhi actually gets less weight than Ted Bundy because many of his preferences (the ones we admire the most, the other-regarding ones) don't count when we're aggregating, whereas Ted Bundy (who for the sake of argument only has selfish preferences) incurs no such penalty.

If you restrict the utilities being aggregated to "selfish" utilities, then in general, even though the utility functions of altruists are not being properly represented, altruists will still be better off than they would be in a more neutral aggregation. For instance, suppose Gandhi and Ted Bundy have "selfish" utility functions S_G and S_B respectively, and "actual" utility functions U_G and U_B. Since Gandhi is an altruist, U_G = S_G + S_B. Since Ted Bundy is selfish, U_B = S_B. If you aggregate by maximizing the sum of the selfish utility functions, then you are maximizing S_G + S_B, which is exactly the same as Gandhi's actual utility function, so this is Gandhi's most preferred outcome. If you maximize U_G + U_B, then the aggregation ends up worse for Gandhi according to his actual preferences, even though the only change was to make the representation of his preferences for the aggregation more accurate.
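A small numeric instance of this argument, with made-up selfish utilities over three hypothetical outcomes x, y and z, under the stated assumptions U_G = S_G + S_B and U_B = S_B:

```python
# Hypothetical self-regarding ("selfish") utilities over three outcomes.
S_G = {"x": 4, "y": 1, "z": 0}  # Gandhi
S_B = {"x": 1, "y": 3, "z": 4}  # Ted Bundy

def U_G(o):
    return S_G[o] + S_B[o]  # Gandhi the altruist: U_G = S_G + S_B

def U_B(o):
    return S_B[o]  # Bundy is purely selfish: U_B = S_B

outcomes = ["x", "y", "z"]
selfish_pick = max(outcomes, key=lambda o: S_G[o] + S_B[o])  # sum of selfish utilities
actual_pick = max(outcomes, key=lambda o: U_G(o) + U_B(o))   # sum of actual utilities
gandhi_favourite = max(outcomes, key=U_G)

print(selfish_pick, actual_pick, gandhi_favourite)  # x z x
print(U_G(selfish_pick), U_G(actual_pick))          # 5 4
```

Summing the actual utilities counts Bundy twice (S_G + 2*S_B), which is why the "more accurate" aggregation does worse by Gandhi's own lights in this example.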

There seem to be two different notions of "selfish" utilities in play here. One is "pre-update" utility, i.e. the utility function as it is prior to being modified by preference utilitarianism (or some other altruistic algorithm). That seems to be the interpretation you're using here, and the one I was using in this comment.

Oscar_Cunningham, in his response, seemed to be using a different notion though. He identified "selfish" utility as "things for me" desires. I understood this to mean purely self-regarding desires (e.g. "I want a cheeseburger" rather than "I want the hungry to be fed"). This is an orthogonal notion. Preferences that are "non-selfish" in this sense (i.e. other-regarding) can be "selfish" in the sense you're using (i.e. they can be pre-update).

The comment you were responding to was employing Oscar_Cunningham's notion of selfishness (or at least my interpretation of his position, which might well be wrong), so what you say doesn't apply. In particular, with this notion of selfishness, U_G will not simply equal S_G + S_B, since Gandhi's other-regarding goals are not identical to Ted Bundy's self-regarding goals. For instance, Gandhi could want Ted Bundy to achieve spiritual salvation even though Bundy doesn't want this for himself. In that case, ignoring "unselfish" desires would simply mean that some of Gandhi's desires don't count at all.

I agree with the point you're making if we use the "pre-update" notion of selfishness, but then I think my objection in this comment still applies.

Does this seem right?

True, if Gandhi's other-regarding preferences are sufficiently different from Ted Bundy's self-regarding preferences, then Gandhi will be better off according to his total preferences if we maximize the sum of their total preferences instead of the sum of their self-regarding preferences.

Of course, all this only makes any sense if we're talking about an aggregation used by some other agent. Presumably Gandhi himself would not adopt an aggregation that makes him worse off according to his total preferences.

How do you distinguish between "selfish" and "non-selfish" utilities, though?

Someone who has both selfish and non-selfish utilities has to have some answer to this, but there are many possible solutions, and which solution you "should" use depends on what you care about. In the iterative convergence scenario you described in the original post, you implicitly assumed that the utilitarian agent already had a solution to this. After all, the agent started with some preferences before updating its utility function to account for the wellbeing of others. That makes it pretty easy: the agent could just declare that its preferences before the first iteration were its selfish preferences, and the preferences added in the first iteration were its non-selfish preferences, thus justifying stopping after one iteration, just as you would intuitively expect. Or maybe the agent will do something different (if it arrived at its preferences by some route other than starting with selfish preferences and adding in non-selfish preferences, then I guess it would have to do something different). There are A LOT of ways an agent could partition its preferences into selfish and non-selfish components. What do you want me to do? Pick one and tell you that it's the correct one? But then what about all the agents that partition their preferences into selfish and non-selfish components in a completely different manner that still seems reasonable?
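To make the "declare the pre-update preferences selfish" option concrete, here is a minimal sketch reusing the toy numbers from the original post and assuming equal-weight averaging. It compares feeding the updated utilities back in with always aggregating the fixed pre-update baseline; both function names are hypothetical.

```python
def naive_update(current, alice):
    # Feeds my current, already-updated utilities back into the average.
    return {s: (current[s] + alice[s]) / 2 for s in alice}

def anchored_update(selfish, alice):
    # Always averages Alice's utilities with my fixed pre-update ("selfish") ones.
    return {s: (selfish[s] + alice[s]) / 2 for s in alice}

selfish = {"A": 3.0, "B": 6.0}  # my utilities before the first iteration
alice = {"A": 10.0, "B": 5.0}

naive = dict(selfish)
for _ in range(3):
    naive = naive_update(naive, alice)
    anchored = anchored_update(selfish, alice)

print(naive)     # {'A': 9.125, 'B': 5.125}  keeps drifting toward Alice
print(anchored)  # {'A': 6.5, 'B': 5.5}      same result after every round
```

With the anchored version, repeating the procedure changes nothing, so stopping after one iteration falls out automatically.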

For the following analysis, I am assuming bounded utilities. I will normalize all utilities to between 0 and 1.

What you are observing is not a bug. If your preferences are 100% preference utilitarianism, then there is no reason to think that it would pull in any direction other than what maximizes preferences of everyone else. If you have any selfish goals, that is not purely utilitarianism, but that is okay!

If we are not 100% utilitarian, then there is no problem. Let's say that my preferences are 90% utilitarian, and 10% maximizing my own happiness. This fixes the problem, because there is 10% of my utility function that is unaffected by my utilitarianism. In fact, my utilitarian side includes a term for my own happiness, so my happiness actually counts for something like 10.00000001%, depending on the population. This all works fine, as long as everyone has at least a little bit of selfish preferences.

Imagine if everyone had utility functions that were at least 1% terminal goals that do not reference other people. Then in calculating my utility in a given world state, I will have my utility function pointing to someone else's, which might point back at mine. However, with each level of recursion, 1% of the remaining undefined part of the function will become actually defined.

The only time we run into a problem is in situations like where my utility function is defined to equal yours and yours is defined to equal mine. As long as we avoid this 100% recursion, we are fine.
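The recursion argument can be checked numerically for two agents. This is a minimal sketch, assuming both agents put the same selfish share s on their own self-regarding utility (x or y) and the rest on the other agent's total utility, so X = s*x + (1-s)*Y and Y = s*y + (1-s)*X. For s > 0 each synchronous round shrinks the error by a factor of (1-s), so it settles on a unique fixed point; for s = 0 (the 100% recursion case) it just swaps the starting values forever. The names `iterate` and `fixed_point_X` are hypothetical.

```python
def iterate(x, y, s, X0=0.0, Y0=0.0, rounds=5000):
    """Synchronously apply X = s*x + (1-s)*Y and Y = s*y + (1-s)*X.

    x, y: each agent's self-regarding utility (in [0, 1]);
    s: the selfish share of each agent's utility function."""
    X, Y = X0, Y0
    for _ in range(rounds):
        X, Y = s * x + (1 - s) * Y, s * y + (1 - s) * X
    return X, Y

def fixed_point_X(x, y, s):
    # Substituting Y into X's equation: X * (1 - (1-s)**2) = s*x + s*(1-s)*y,
    # i.e. X = (x + (1-s)*y) / (2 - s), valid whenever s > 0.
    return (x + (1 - s) * y) / (2 - s)

# 1% selfish: converges to the unique fixed point from any start.
print(iterate(0.8, 0.2, 0.01)[0], fixed_point_X(0.8, 0.2, 0.01))
# 0% selfish ("my utility is yours and yours is mine"): nothing is pinned down;
# the two values just swap each round, so the answer depends on where you start.
print(iterate(0.8, 0.2, 0.0, X0=0.3, Y0=0.7))  # (0.3, 0.7) after an even number of rounds
```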

There is not even a problem if we have negative utility coming from other people's utility. For example, if my utility was 50% my happiness and 50% your utility, and yours was 50% your happiness and 50% one minus my utility, we are still fine. If my utility is X and my happiness is x, and your utility is Y and your happiness is y, then Y=(y+1-X)/2, so X=(x+Y)/2=(2x+y+1-X)/4, which simplifies to X=(2x+y+1)/5.
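The negative-coupling example can also be checked numerically. Writing your utility as Y = (y + 1 - X)/2 and substituting into X = (x + Y)/2 gives X = (2x + y + 1)/5 and Y = (2y + 2 - x)/5. A minimal sketch, assuming synchronous updates from an arbitrary starting guess of 0.5 for each; `negative_coupling` is a hypothetical name.

```python
def negative_coupling(x, y, rounds=100):
    """My utility X is half my happiness x plus half your utility Y;
    your utility Y is half your happiness y plus half of (1 - X)."""
    X, Y = 0.5, 0.5  # arbitrary starting guess
    for _ in range(rounds):
        X, Y = (x + Y) / 2, (y + 1 - X) / 2
    return X, Y

x, y = 0.9, 0.3
X, Y = negative_coupling(x, y)
print(X, (2 * x + y + 1) / 5)  # both approximately 0.62
print(Y, (2 * y + 2 - x) / 5)  # both approximately 0.34
```

Because each agent puts only half its weight on the other, errors shrink by a factor of 2 per round here, so the iteration settles quickly on the same values as the closed form.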

I am 100% utilitarian, but because others value me having my own preference, there is an isolated 0% sub-utility function that I can defer to for such times. In the presence of others, my utility function will perfectly match theirs. When alone, I am to advance and develop that zero-utility sub-function for those times when I'm confronted by agents that value my being myself. Of course, to truly do that, to be true to myself, this means that when I am alone, I am to work on the one thing that makes me the most happy: Maximizing the sum utilities of all agents. Any agent that values personality in me beyond perfect selflessness is rejecting my identity, but since my identity has no value to me, I can adopt whatever personality they value beyond selflessness.

In the presence of another 100% utilitarian agent, we will have to have a battle of values: There can be only one (perfectly selfless agent).

And I called dibs.


This is stupid.

Thank you for avoiding inferential silence.

I think you only have a problem if everyone is a perfectly selfless agent. In fact, a room with many of you and one of me would not only be well defined, but probably be very useful according to my ethics.

Those are just "copies" of me; they're already accounted for. But now you've got an entire room of me insisting you aren't allowed to be 100% utilitarian. We have a secret method of detecting copies of us, which is why we're singling you out. Also, we act differently so you don't get freaked out by an obvious hive mind presence. That would just be creepy. Even by my standards. (Get it? "My" standards? Ah forget it...)

Another problem I like: in our world people have very real preferences over how one should go about analyzing moral problems (including preferences over how to engage in meta-ethics). You'll find that most people are very anti-utilitarian. A true preference utilitarian will self-modify into an agent that thinks about morality in the way most preferred by the population, i.e., vaguely moral realist virtue ethics. This is less of a problem with extrapolated-preference utilitarianism (but then, how do you utilitarian-justifiably determine how to extrapolate preferences, except by looking at existing preferences about extrapolation-like processes?), and barely a problem at all for non-preference utilitarianism, as far as I can see.

Also note that the epistemic problem of preference elicitation is very real here, and in fact I do not see why a preference utilitarian wouldn't be obliged to engage in preference elicitation in an itself-utilitarianly-justified manner, which won't seem as epistemically sound as what a utilitarian would naively reckon to be a good faith preference elicitation algorithm. In general I think preference utilitarianism runs into many of the same problems as epistemic majoritarianism. A term that covers both cases might be "decision-policy majoritarianism".

... my moral theory tells me to construct a new utility function using some sort of aggregating procedure that takes as input the current utility functions of all moral agents (including my own).


But once I've done this, my own utility function has changed

Not necessarily right. And fortunately not: "change your utility function" is typically contra-utility for your existing utility function, and it would be hard to convince others to behave morally if your thesis always entailed "you should do things that will make the world worse by your own current preferences".

Utilitarian preferences with aggregated utility functions can result from negotiation, not just from remodeling your brain. In situations that fit this model, your utility function doesn't change, your partners' utility functions don't change, but you all find that each of those utility functions will end up better-satisfied if you all try to optimize some weighted combination of them, because the cost of identifying and punishing defectors is still less than the cost of allowing defectors. Presumably you and your partners already assign some terminal value to others, but the negotiation process doesn't have to increase that terminal value, just add instrumental value.

This kind of problem goes back to Bentham and the very beginning of utilitarianism. (Disclaimer: what follows is based on my recollections of an essay by A. J. Ayer on Bentham, read long ago. I cannot find it now online, nor find other discussions of Bentham making the same points, and I am not any kind of expert on the matter. So when I say below "Bentham believed…", this could be true, or could be true only in Ayer's interpretation, or even only in my own interpretation of Ayer's interpretation.)

Bentham believed both that each person pursues only their own happiness, and that the good consists of the greatest happiness of the greatest number. (The first of these could correspond, in your formulation of the dilemma, to saying that agents have utility functions and behave rationally according to them, and the second to a statement of preference utilitarianism.) Then the problem arises of how we are to be utilitarians and try to maximize global happiness if we are psychologically necessitated to care only about our own. Bentham's solution is to postulate a "lawgiver", a person whose happiness is greatest when everyone's happiness is maximized, and to say that utilitarianism as a political prescription holds that laws should be made by this lawgiver. (In LW language this could correspond to FAI!)

Translating Bentham's solution (or my memory of Ayer's interpretation of it) back to your question, I think the answer would be that utility functions don't change: if you are truly a preference utilitarian, then your utility function is already given by the ultimate fixed point of the iteration process you have outlined, so there is no dynamical change.

Translating Bentham's solution (or my memory of Ayer's interpretation of it) back to your question, I think the answer would be that utility functions don't change: if you are truly a preference utilitarian, then your utility function is already given by the ultimate fixed point of the iteration process you have outlined, so there is no dynamical change.

Not sure how this solves anything though. In a society with non-utilitarian defectors, they will still end up determining the fixed points. So the behavior of the society is determined by its least moral members (by the utilitarians' own lights). Is the response supposed to be that once you do away with the dynamical changing process there is no effective distinction any more between utilitarians and defectors? That would only seem to solve things at the price of massive non-realism. There does seem to be a pretty clear (and morally relevant) real-world distinction between agents who alter their behavior (altruistically) upon learning about the utility functions of others and agents who don't. I don't think hand-waving that distinction away in your model is all that helpful.

Alejando1's point was that Bentham expected everyone to be a "defector", in your terminology, but that lawmakers should be given selfish incentives to maximize the sum of everyone's utility. However, it is unclear to me who could be motivated to ensure that the lawmakers' incentives are aligned with everyone's utility if they are all just concerned with maximizing their own utility.

Also, as long as we're talking about utilitarianism as described by Bentham, it's worth pointing out that by "utility", Bentham meant happiness, rather than the modern decision-theory formulation of utility. According to Alejando, if I understand him correctly, Bentham just sort of assumed that personal happiness is all that motivated anyone.

According to Alejando, if I understand him correctly, Bentham just sort of assumed that personal happiness is all that motivated anyone.

Yes, I remember Ayer making explicit this assumption of Bentham and criticizing it as either untrue or vacuous, depending on interpretation.

I notice a similarity with convergent series, in particular geometric series. Just because you approach someone an infinite number of times doesn't mean you get all that close to them. (ETA: 1/2^n approaches -1 an infinite number of times but never gets closer than 1 away from it.)
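The ETA example in a few lines of Python (a sketch; the cutoff of 30 terms is arbitrary):

```python
# The sequence 1/2^n for n = 1..30.
terms = [1 / 2**n for n in range(1, 31)]
dist_to_minus_one = [abs(t + 1) for t in terms]

# Each term is strictly closer to -1 than the previous one...
print(all(b < a for a, b in zip(dist_to_minus_one, dist_to_minus_one[1:])))  # True
# ...but never closer than 1: the sequence is actually converging to 0.
print(min(dist_to_minus_one))  # slightly above 1
print(terms[-1])               # about 9.3e-10, approaching 0
```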

Also, even if you assign yourself the same weight as anyone else, you are usually more able to affect yourself than anyone else, and a lot of the others' demands cancel out. So your tiny chunk of utility, though insignificant with respect to your overall utility function, produces a very significant derivative with respect to your possible actions.


Regarding preference utilitarianism, why can't the negative utility of not having a preference fulfilled be modelled with average or total utilitarianism? That is, aren't there some actions that create so much utility that they could overcome the negative utility of one's preference not being honored? I don't see why preference fulfillment should be first-class next to pleasure and pain.

Sorry if this is off-topic, that was just my first reaction to reading this.

See here for some standard criticisms of hedonic (pleasure/pain based) utilitarianism.

Also see the discussions of wireheading on LW.

Incidentally, I should point out that in the economics and decision theory literature, "utility" is not a synonym for pleasure or some other psychological variable. It's merely a mathematical representation of revealed preferences (preferences which may be motivated by an ultimate desire for pleasure, but that's an additional substantive hypothesis). I tend to use "utility" in this sense, so just a terminological heads-up.
