[Adapted from an old post on my personal blog]

There are a lot of long-running arguments on the internet that basically consist of people talking past each other because of differing basic assumptions that they don't know how to make explicit, which prevents them from noticing the fundamental disagreement. I've noticed a few of these and tried to make both sides more explicit. In this post I'd like to try to explicate one.

Let's start with a concrete example: there are a number of people who would say that wireheading is a good thing, which is obviously not the general thinking on LW. What's the source of this disagreement? One possible explanation would be to say that the former are saying "happiness is our only terminal value, all other values are subsidiary to it", while the latter say "hell no it's not", but I think there's more to it than that.

Without yet saying what I think the fundamental distinction is, let me give another example that I think stems from the same disagreement. Consider this essay -- and this isn't the only thing I've seen along these lines -- which takes the point of view that obviously a rational person would kill themselves, while to me this just seems... dumb.

So what's going on here? What's the actual distinction that leads to such arguments? Again, I can't know, but here's my hypothesis. I think there are two sorts of thinking going on here; I'm going to call them "goal-thinking" and "desire-thinking" (these are my own terms, feel free to devise better ones).

So -- goal-thinking is thinking in terms of what I'm calling "goals". Goals are to be accomplished. If you're thinking in terms of goals, what you're afraid of is being thwarted, or having your capacity to act, to effect your goals, reduced -- being somehow disabled or restrained; if your capabilities are reduced, you have less ability to affect the future and steer it towards what you want. (This is important; goal-thinking thinks in terms of preferences about the future.) The ultimate example of this is death -- if you're dead, you can't affect anything anymore. While it's possible in some unusual cases that dying could help accomplish your goals, it's pretty unlikely; most of the time, you're better off remaining alive so that you can continue to affect things. So suicide is almost always unhelpful. Goals, remember, are about the world, external to oneself.

Wireheading is similarly disastrous, because it's just another means of rendering oneself inactive. We can generalize "wireheading" of course to anything that causes one to think one has accomplished one's goals when one hasn't. Or of course to having one's goals altered. We all know this argument; this is just the old "murder pill" argument. Indeed, you've likely noticed by this point that I'm just recapitulating Omohundro's basic AI drives.

Another way of putting this is, goals themselves are driving forces.

So what's the alternative, "desire-thinking", that I'm claiming is how many people think? One answer would be to say, this alternative way of thinking is that "it's all about happiness vs unhappiness" or "it's all about pleasure vs pain", thinking in terms of internal experience rather than the external state of the world -- so for instance, people thinking this way tend to focus on unhappiness, pain, and suffering as the general bad thing, rather than having one's capacity to act reduced.

But, as I basically already said above, I actually don't think this gets at the root of the distinction, because there are still things this fails to explain. For instance, I think it fails to explain the suicide article above, or, say, Buddhism, since taking the goal-thinking point of view and applying it to internal experiences would just lead to hedonism. And presumably there are a number of people thinking that way! (Which may include a number of the "wireheading is good" people.) But we can basically group this in as a variant of goal-thinking. How do we explain the truly troublesome cases above, which don't fit into this?

I think what's actually going on with these cases involves not thinking in terms of goals in the above sense at all, but rather what I'm calling "desires" instead. The distinction is that whereas goals are to be accomplished, desires are to be extinguished. From a goal-thinking point of view, you can model this as having one single goal, "extinguish all desires", which is the only driving force; and the desires themselves are, just, like, objects in the model, not themselves driving forces.

So under the desire-thinking point of view, having one's desires altered can be a good thing, if the new ones are easier. If you can just make yourself not care, great. Wireheading is excellent from this point of view, and even killing oneself can work. Indeed, desire-thinking doesn't really think in terms of preferences about the future, so much as just an anticipation of having preferences in the future (about the then-present).
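To make the contrast concrete, here is a toy sketch in Python. Everything in it is made up for illustration and is not part of the argument itself: the `State` fields, the three actions, and the `FUTURE_VALUE` weight are all hypothetical assumptions. The goal-thinking score looks at the external world and at whether the agent will keep steering it; the desire-thinking score only counts how many desires remain unextinguished.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    world_progress: int   # how much of the external goal is actually accomplished
    open_desires: int     # how many desires are currently unextinguished
    still_pursuing: bool  # will the agent keep steering the world toward the goal?


# Assumed weight on keeping the ability and the will to pursue the goal.
FUTURE_VALUE = 10


def work(s: State) -> State:
    # Changes the world a little and satisfies one desire along the way.
    return replace(s, world_progress=s.world_progress + 1,
                   open_desires=max(s.open_desires - 1, 0))


def wirehead(s: State) -> State:
    # All desires feel satisfied, the world is untouched, the agent goes inactive.
    return replace(s, open_desires=0, still_pursuing=False)


def stop_caring(s: State) -> State:
    # The desires are edited away, so nothing is left to pursue.
    return replace(s, open_desires=0, still_pursuing=False)


ACTIONS = {"work": work, "wirehead": wirehead, "stop_caring": stop_caring}


def goal_score(s: State) -> int:
    # Goal-thinking: preferences about the future state of the world.
    return s.world_progress + (FUTURE_VALUE if s.still_pursuing else 0)


def desire_score(s: State) -> int:
    # Desire-thinking: the single objective is "extinguish all desires".
    return -s.open_desires


def rank(score, state: State) -> list:
    """Return action names sorted from most to least preferred under `score`."""
    return sorted(ACTIONS, key=lambda name: score(ACTIONS[name](state)), reverse=True)


start = State(world_progress=0, open_desires=3, still_pursuing=True)
print("goal-thinking:  ", rank(goal_score, start))
print("desire-thinking:", rank(desire_score, start))
```

Under these assumed numbers, goal-thinking ranks `work` first, while desire-thinking ranks `work` last and is indifferent between `wirehead` and `stop_caring`.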

Now while I, and LW more generally, may sympathize more with the former point of view, it's worth noting that in reality nobody uses entirely one or the other. Or at least, it seems pretty clear that even here people won't actually endorse pure goal-thinking for humans (although it's another matter for AIs; this is one of those times when it's worth remembering that LW really has two different functions -- refining the art of human rationality, and refining the art of AI rationality, and that these are not always the same thing). While I don't have a particular link on hand, this issue has often been discussed here before in terms of preferences regarding flavors of ice cream, and how it's not clear that one should resist modifications to these; this can be explained if one imagines that desire-thinking should be applied to such cases.

Thus when Eliezer Yudkowsky says "I wouldn't want to take a pill that would cause me to want to kill people, because then maybe I'd kill people, and I don't want that", we recognize it as an important principle of decision theory; but when someone says "I don't like spinach, and I'm glad I don't, because if I liked it I'd eat it, and I just hate it", we correctly recognize this as a joke. (Despite it being isomorphic.) Still, despite people not actually being all one way or the other, I think it's a useful way of understanding some arguments that have resulted in a lot of people talking past each other.


11 comments

I bite the bullet: I aim to use only goal-based thinking. (I dare say I don't completely succeed.) I may have goals like "enjoy eating a tasty meal" or "stop feeling hungry" but those are still goals rather than what you're calling desires.

I don't think the two examples in your final paragraph are isomorphic, and I think they can be seen to be non-isomorphic in purely goal-based terms.

  • All else being equal, I prefer people to live rather than die, and I prefer that my preferences be satisfied. Taking a murder-pill would mean that more people die (at my hand, even) or that my preferences go unsatisfied, or both. So (all else being equal) I don't want to take the murder-pill.
  • All else being equal, I prefer to eat things that I like and not things that I don't like. I (hypothetically) don't like spinach right now, so I don't eat spinach. But if I suddenly started liking spinach, I would become able to eat spinach and thereby eat things I like rather than things I don't. So I would expect to have more of my preferences satisfied if I started liking spinach. So (all else being equal) I do want to start liking spinach.

All of this is a matter of goals rather than (in your sense) desires. I want people to live, I want to have my preferences satisfied, I want to eat things I like, I want not to eat things I dislike.

"But", I hear you cry, "you could equally well say in the first place 'I prefer to live according to my moral principles, and at present those principles include not murdering people, but if I took the pill those preferences would change.'. And you could equally well say in the second place 'I prefer not to eat spinach, and if I started liking spinach then I'd start doing that thing I prefer not to.'. And then you'd get the opposite conclusions." But no, I could not equally well say those things: saying those things would give a wrong account of my preferences. Some of my preferences (e.g., more people living and fewer dying) are about the external world. Some (e.g., having enjoyable eating-experiences) are about my internal state. Some are a mixture of both. You can't just swap one for the other.

(There's a further complication, which is that -- so it seems to me, and I know I'm not alone -- moral values are not the same thing as preferences, even though they have a lot in common. I not only prefer people to live rather than die, I find it morally better that people live rather than die, and those are different mental phenomena.)

Some of my preferences (e.g., more people living and fewer dying) are about the external world. Some (e.g., having enjoyable eating-experiences) are about my internal state. Some are a mixture of both. You can't just swap one for the other.

A way you might distinguish these experimentally: If you are correct about your preferences, you will sometimes want to get new desires. If for example you didn't currently enjoy any kind of food, but prefer having enjoyable eating experiences, you will try to start enjoying some. The desire-agent wouldn't.

Consider this essay [...] which takes the point of view that obviously a rational person would kill themselves

That sounded interestingly different from my usual perspective, so I read it, and it doesn't seem to me to be arguing that at all? At best you could say that it's arguing that if humans were more rational, then suicide rates would go up, which seems much less controversial.

Hm, I suppose that's true. But I think the overall point still stands? It's illustrating a type of thinking that doesn't make sense to one thinking in terms of concrete, unmodifiable goals in the external world.

Is that really true? If you can have "have other people not suffer horribly" as a goal, you can have "not suffer horribly yourself" as a goal too. And if, on balance, your life seems likely to involve a lot of horrible suffering, then suicide might absolutely make sense even though it would reduce your ability to achieve your other goals.

This is perhaps an intermediate example, but I do think that once you're talking about internal experiences to be avoided, it's definitely not all the way at the goal-thinking end.

I'm not convinced. To me, at least, my goals that are about me don't feel particularly different in kind from my goals that are about other people, nor do my goals that are about experiences feel particularly different from my goals that are about things other than experiences.

(It's certainly possible to draw your dividing line between, say, "what you want for yourself" and "what other things you want", but I think that's an entirely different line from the one drawn in the OP.)

OK. I think I didn't think through my reply sufficiently. Something seemed off with what you were saying, but I failed to think through what and made a reply that didn't really make sense instead. But thinking things through a bit more now I think I can lay out my actual objection a bit more clearly.

I definitely think that if you're taking the point of view that suicide is preferable to suffering, you're not applying what I'm calling goal-thinking. (Remember here that the description I laid out above is not intended as some sort of intensional definition, just my attempt to explicate this distinction I've noticed.) I don't think goal-thinking would consider nonexistence as some sort of neutral point, as many do.

I think the best way of explaining this, maybe, is that goal-thinking -- or at least the extreme version, which nobody actually uses -- simply doesn't consider happiness or suffering or whatever as separate objects worth considering at all, things that can be good or bad or that should be acted on directly; it treats them purely as indicators of whether one is achieving one's goals -- intermediates to be eliminated. In this point of view, suffering isn't some separate thing to be gotten rid of by whatever means, but simply the internal experience of not achieving one's goals, the only proper response to which is to go out and achieve them. You see?

And if we continue in this direction, one can also apply this to others; so you wouldn't have "not have other people suffer horribly" as a goal in the first place. You would always phrase things in terms of others' goals, and whether they're being thwarted, rather than in terms of their experiences.

Again, none of what I'm saying here necessarily follows from what I wrote in the OP, but as I said, that was never intended as an intensional definition. I think the distinction I'm drawing makes sense regardless of whether I described it sufficiently clearly initially.

I see things slightly differently.

Happiness, suffering, etc., function as internal estimators of goal-met-ness. Like a variable in a computer program that indicates how you're doing. Hence, trying to optimize happiness directly runs the risk of finding ways to change the value of the variable without the corresponding real-world things the variable is trying to track. So far, so good.
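The variable analogy above can be written out literally. This is only a sketch under assumed names (`Agent`, `goal_met_pct`, `happiness_pct`, and both methods are hypothetical), using whole-number percentages so the arithmetic stays exact.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    goal_met_pct: int = 20    # how far the real-world goal is actually met
    happiness_pct: int = 20   # internal estimate of goal_met_pct

    def work_on_goal(self) -> None:
        # Change the world; the estimator then tracks the change.
        self.goal_met_pct = min(100, self.goal_met_pct + 10)
        self.happiness_pct = self.goal_met_pct

    def set_happiness_directly(self) -> None:
        # Overwrite the variable without touching the thing it tracks.
        self.happiness_pct = 100

    def estimator_error(self) -> int:
        # Positive means happiness overestimates how well the goal is met.
        return self.happiness_pct - self.goal_met_pct


honest = Agent()
honest.work_on_goal()

hacked = Agent()
hacked.set_happiness_directly()

print(honest.goal_met_pct, honest.happiness_pct, honest.estimator_error())
print(hacked.goal_met_pct, hacked.happiness_pct, hacked.estimator_error())
```

The second agent reports the higher happiness but has made no progress on the goal; the gap shows up only in `estimator_error`, which is exactly the quantity that optimizing happiness directly never looks at.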

But! That doesn't mean that happiness can't also be a thing we care about. If I can arrange for someone's goals to be 50% met and for them to feel either as if they're 40% met or as if they're 60% met, I probably choose the latter; people like feeling as if their goals are met, and I insist that it's perfectly reasonable for me to care about that as well as about their actual goals. For that matter, if someone has goals I find terrible, I may actually prefer their goals to go unmet but for them still to be happy.

I apply the same to myself -- within reason, I would prefer my happiness to overestimate rather than underestimate how well my goals are being met -- but obviously treating happiness as a goal is more dangerous there because the risk of getting seriously decoupled from my goals is greater. (I think.)

I don't think it's necessary to see nonexistence as neutral in order to prefer (in some cases, perhaps only very extreme ones) nonexistence to existence-with-great-suffering. Suffering is unpleasant. People hate it and strive to avoid it. Yes, the underlying reason for that is that this helps them achieve other goals, but I am not obliged to care only about the underlying reason. (Just as I'm not obliged to regard sex as existing only for the sake of procreation.)

I mean, are you actually disagreeing with me here? I think you're just describing an intermediate position.

I don't know for sure whether we're really disagreeing. Perhaps that's a question with no definite answer; the question's about where best to draw the boundary of an only-vaguely-defined term. But it seems like you're saying "goal-thinking must only be concerned with goals that don't involve people's happiness", and I'm saying I think that's a mistake. I think the fundamental distinction is between (a) doing something as part of a happiness-maximizing process and (b) recognizing the layer of indirection in that and aiming at goals we can see other reasons for, which may or may not happen to involve our own or someone else's happiness.

Obviously you can choose to focus only on goals that don't involve happiness in any way at all, and maybe doing so makes some of the issues clearer. But I don't think "involving happiness" / "not involving happiness" is the most fundamental criterion here; the distinction is actually, as your original terminology makes clear, between different modes of thinking.