You cannot be mistaken about (not) wanting to wirehead

In the comments of Welcome to Heaven, Wei Dai brings up the argument that even though we may not want to be wireheaded now, our wireheaded selves would probably prefer to be wireheaded. Therefore we might be mistaken about what we really want. (Correction: what Wei actually said was that an FAI might tell us that we would prefer to be wireheaded if we knew what it felt like, not that our wireheaded selves would prefer to be wireheaded.)

This is an argument I've heard frequently, one which I've even used myself. But I don't think it holds up. More generally, I don't think any argument that says one is wrong about what they want holds up.

To take the example of wireheading: it is not an inherent property of minds that they'll become desperately addicted to anything that feels sufficiently good. Even from our own experience, we know that there are plenty of things that feel really good, but that we don't immediately crave more of afterwards. Sex might be great, but you can still afterwards get fatigued enough that you want to rest; eating good food might be enjoyable, but at some point you get full. The classic counter-example is that of the rats who could pull a lever stimulating a part of their brain, and ended up compulsively pulling it, to the exclusion of all else. People thought this to mean they were caught in a loop of stimulating their "pleasure center", but it later turned out that wasn't the case. Instead, the rats were stimulating their "wants to seek out things" center.

The systems for experiencing pleasure and for wanting to seek out pleasure are separate ones. One can find something pleasurable, but still not develop a desire to seek it out. I'm sure all of you have had times when you haven't felt the urge to participate in a particular activity, even though you knew you'd enjoy the activity in question if you just got around to doing it. Conversely, one can also have a desire to seek out something, but still not find it pleasurable when it's achieved.

Therefore, it is not an inherent property of wireheading that we'd automatically end up wanting it. Sure, you could wirehead someone in such a way that the person stopped wanting anything else, but you could also wirehead them in such a way that they were indifferent to whether or not it continued. You could even wirehead them in such a way that they enjoyed every minute of it, but at the same time wanted it to stop.

"Am I mistaken about wanting to be wireheaded?" is a wrong question. You might afterwards think you actually prefer to be wireheaded, or think you prefer not to be wireheaded, but that is purely a question of how you define the term "wireheading". Is it a procedure that makes you want it, or is it not? Furthermore, even if we define wireheading so that you'd prefer it afterwards, that says nothing about the moral worth of wireheading somebody.

If you're not convinced about that last bit, consider the case of "anti-wireheading": we rewire somebody so that they experience terrible, horrible, excruciating pain. We also rewire them so that regardless, they seek to maintain their current state. In fact, if they somehow stop feeling pain, they'll compulsively seek a return to their previous hellish state. Would you say it was okay to anti-wirehead them, since an anti-wirehead will realize they were mistaken about not wanting to be an anti-wirehead? Probably not.

In fact, "I thought I wouldn't want to do/experience X, but upon trying it out I realized I was wrong" doesn't make sense. Previously the person didn't want X, but after trying it out they did want X. X has caused a change in their preferences by altering their brain. This doesn't mean that the pre-X person was wrong, it just means the post-X person has been changed. With the correct technology, anyone can be changed to prefer anything.

You can still be mistaken about whether or not you'll like something, of course. But that's distinct from whether or not you want it.

Note that this makes any thoughts along the lines of "an FAI might extrapolate the desires you had if you were more intelligent" tricky. It could just as well extrapolate the desires we had if we'd had our brains altered in some other way. What makes one method of mind alteration more acceptable than another? "Whether we'd consent to it now" is one obvious-seeming answer, but that too is filled with pitfalls. (For instance, what about our anti-wirehead?)


I'm really surprised that on a site called "Less Wrong", there isn't more skepticism about an argument that one can't be wrong about X, especially when X isn't just one statement but a large category of statements. That doesn't scream out "hold on a second!" to anyone?

Eyup. Humans can be wrong about anything. It's like our superpower.

It might be that he can't be wrong about that, even though he doesn't know for sure that he can't be wrong about it. Infallibility and certainty are distinct concepts.

Certainty (confidence, etc.) is in the mind. Fallibility isn't; you can be prone (or immune) to error even if no one thinks you are.

The point is that 'What if I couldn't be wrong about it?' does not express 'What if I could be certain that I couldn't be wrong about it?'; the latter requires that 1 be a probability, but the former does not, since I might be unable to be wrong about X and yet only assign, say, a .8 probability to X's being true (because I don't assign probability 1 to my own infallibility).

Certainty (confidence, etc.) is in the mind. Fallibility isn't; you can be prone (or immune) to error even if no one thinks you are.

Though no one could ever possibly know. Seriously: fallibility is in the mind. It's a measure of how likely something is to fail; likelihoods are probabilities - and probabilities are (best thought of as being) in the mind.

Rigorously, I think the argument doesn't stand up in its ultimate form. But it's tiptoeing in the direction of a very interesting point on how to deal with changing utility functions, especially in circumstances where the changes might be predictable.

The simple answer is "judge everything in your future by your current utility function", but that doesn't seem satisfactory. Nor is "judge everything that occurs in your future by your utility function at the time", because of lobotomies, addictive wireheading, and so on. Some people have utility functions that they expect will change; and the degree of change allowable may vary from person to person and subject to subject (e.g., people opposed to polygamy may have a wide range of reactions to the announcement "in fifty years' time, you will approve of polygamy"). Some people trust their own CEV; I never would, but I might trust it one level removed.

It's a difficult subject, and my upvote was in thanks for bringing it up. Subsequent posts on the subject I'll judge more harshly.

The simple answer is "judge everything in your future by your current utility function", but that doesn't seem satisfactory.

It sounds satisfactory for agents that have utility functions. Humans don't (unless you mean implicit utility functions under reflection, to the extent that different possible reflections converge), and I think it's really misleading to talk as if we do.

Also, while this is just me, I strongly doubt our notional-utility-functions-upon-reflection contain anything as specific as preferences about polygamy.

Also, while this is just me, I strongly doubt our notional-utility-functions-upon-reflection contain anything as specific as preferences about polygamy.

That was just an example; people react differently to the idea that their values may change in the future, depending on the person and depending on the value.

How about "judge by both utility functions and use the most pessimistic result"?

If you take a utility function and multiply all the utilities by 0.01, is it the same utility function? In one sense it is, but by your measure it will always win a "most pessimistic" contest.

Update: thinking about this further, if the only allowable operations on utilities are comparison and weighted sum, then you can multiply by any positive constant or add and subtract any constant and preserve isomorphism. Is there a name for this mathematical object?

Affine transformations. Utility functions are defined up to affine transformation.

In particular, this means that nothing has "positive utility" or "negative utility", only greater or lesser utility compared to something else.

ETA: If you want to compare two different people's utilities, it can't be done without introducing further structure to enable that comparison. This is required for any sort of felicific calculus.
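To make the rescaling objection concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from the thread: the outcome names, the numbers, and the helper names `expected_utility`, `affine`, `ranking` and `pessimistic_choice` are all made up. It shows that a positive affine transformation leaves one agent's ranking of options untouched, while the "judge by both utility functions and use the most pessimistic result" rule changes its answer when one of the two functions is merely rescaled.

```python
from fractions import Fraction


def expected_utility(lottery, utility):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def affine(utility, scale, shift=0):
    """Return scale * u + shift for every outcome; scale must be positive."""
    if scale <= 0:
        raise ValueError("scale must be positive to preserve the ordering")
    return {outcome: scale * u + shift for outcome, u in utility.items()}


def ranking(options, utility):
    """Option names sorted from most to least preferred."""
    return sorted(
        options,
        key=lambda name: expected_utility(options[name], utility),
        reverse=True,
    )


def pessimistic_choice(options, u1, u2):
    """Pick the option whose worse score under u1 and u2 is highest."""
    return max(
        options,
        key=lambda name: min(
            expected_utility(options[name], u1),
            expected_utility(options[name], u2),
        ),
    )


# Hypothetical numbers: a "current" and a "future" utility function.
current = {"stay": Fraction(5), "change": Fraction(1)}
future = {"stay": Fraction(2), "change": Fraction(8)}

options = {
    "stay": {"stay": Fraction(1)},
    "change": {"change": Fraction(1)},
    "coin_flip": {"stay": Fraction(1, 2), "change": Fraction(1, 2)},
}

# Same preferences as `future`, with every utility multiplied by 0.01.
rescaled_future = affine(future, Fraction(1, 100))

# The future self's own ranking is unchanged by the rescaling...
same_ranking = ranking(options, future) == ranking(options, rescaled_future)

# ...but the pessimistic rule picks a different option.
choice_before = pessimistic_choice(options, current, future)
choice_after = pessimistic_choice(options, current, rescaled_future)
```

With these made-up numbers the pessimistic rule picks the coin flip before the rescaling and "change" afterwards, even though the rescaled function represents exactly the same preferences. That is the sense in which comparing two utility functions needs extra structure beyond the functions themselves.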

There's a name I can't remember for the "number line with no zero" where you're only able to refer to relative positions, not absolute ones. I'm looking for a name for the "number line with no zero and no scale", which is invariant not just under translation but under any affine transformation with positive determinant.

I'm in an elementary statistics class right now and we just heard about “levels of measurement” which seem to make these distinctions: your first is the interval scale, and second the ordinal scale.

The "number line with no zero, but a uniquely preferred scale" isn't in that list of measurement types; and it says the "number line with no zero and no scale" is the interval scale.

A utility function is just a representation of preference ordering. Presumably those properties would hold for anything that is merely an ordering making use of numbers.

You also need the conditions of the utility theorem to hold. A preference ordering only gives you conditions 1 and 2 of the theorem as stated in the link.

Good point. I was effectively entirely leaving out the "mathematical" in "mathematical representation of preference ordering". As I stated it, you couldn't expect to aggregate utiles.

You can't aggregate utils; you can only take their weighted sums. You can aggregate changes in utils though.

I completely agree. The argument may be wrong, but the point it raises, that we sloppily assume things about which possible causal continuations of ourselves we care about, is important.

My initial reaction: we can still use our current utility function, but make sure the CEV analysis or whatever doesn't say "what would you want if you were more intelligent, etc.?" but instead "what would you want if you were changed in a way you currently want to be changed?"

This includes "what would you want if we found fixed points of iterated changes based on previous preferences", so that if I currently want to value paperclips more but don't care whether I value factories differently, but if upon modifying me to value paperclips more it turns out I would want to value factories more, then changing my preferences to value factories more is acceptable.

The part where I'm getting confused right now (rather, the part where I notice I'm getting confused :)) is that calculating fixed points almost certainly depends on the order of alteration, so that there are lots of different future-mes that I prefer to current-me that are at local maximums.

Also I have no idea how much we need to apply our current preferences to the fixed-point-mes. Not at all? 100%? Somehow something in-between? Or to the intermediate-state-mes.
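A toy sketch of the order-dependence worry, in Python. It is only an illustration under loud assumptions: the "leisure" value, the endorsement rules, and the function name `iterate_to_fixed_point` are all hypothetical, and a self is reduced to a dictionary of weights. Each modification is applied only if the current self endorses it, and the process repeats until nothing changes.

```python
def iterate_to_fixed_point(state, modifications):
    """Apply each endorsed modification, in order, until the state stops changing.

    `modifications` is a list of (endorses, apply) pairs, where `endorses`
    asks the *current* state whether it wants the change and `apply`
    returns the changed state.
    """
    changed = True
    while changed:
        changed = False
        for endorses, apply in modifications:
            if endorses(state):
                new_state = apply(state)
                if new_state != state:
                    state = new_state
                    changed = True
    return state


# Hypothetical endorsement rules (assumptions for illustration only):
# - I currently want to value paperclips more.
# - Once I value paperclips more, I want to value factories more,
#   unless I have already come to value leisure more.
# - I want to value leisure more, unless I already value factories more.
more_paperclips = (
    lambda s: True,
    lambda s: {**s, "paperclips": 2},
)
more_factories = (
    lambda s: s["paperclips"] >= 2 and s["leisure"] < 2,
    lambda s: {**s, "factories": 2},
)
more_leisure = (
    lambda s: s["factories"] < 2,
    lambda s: {**s, "leisure": 2},
)

start = {"paperclips": 1, "factories": 1, "leisure": 1}

# Same starting self, same candidate changes, different order of alteration.
fixed_a = iterate_to_fixed_point(start, [more_paperclips, more_factories, more_leisure])
fixed_b = iterate_to_fixed_point(start, [more_leisure, more_paperclips, more_factories])
```

In this toy setup both runs stop at a state that endorses no further change, but `fixed_a` ends up valuing factories and `fixed_b` ends up valuing leisure. Every step was approved by the self that existed at the time, so consent at each step does not single out one endpoint.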

I don't think the order issue is a big problem - there is not One Glowing Solution, we just need to find something nice and tolerable.

Also I have no idea how much we need to apply our current preferences to the fixed-point-mes. Not at all? 100%? Somehow something in-between? Or to the intermediate-state-mes.

That is the question.

I think your heuristic is sound - that seemed screamingly wrong to me as well.

Incorrigibility is way too strong an assertion, but there's a sense in which I cannot be completely wrong about my values, since I'm the only source of information about them; except perhaps to the extent that you can infer them from my fellow human beings, and to that extent humanity as a whole cannot be completely mistaken about its values.

I suspect there may be an analogy with Donaldson's observation that if you think penguins are tiny burrowing insects that live in the Sahara, you're not so much mistaken about penguins as not talking about them at all. However, I can't completely make this analogy work.

How about if X is a set of assertions that logical tautologies are true:

http://en.wikipedia.org/wiki/Tautology_(logic)

http://en.wikipedia.org/wiki/Tautology_(logic)#Definition_and_examples

An example along similar lines to this post would be: you can't be wrong about thinking you are thinking about X - if you are thinking about X.

Now that is an overconfidence/independent statements anecdote I'll remember. The '7 is prime probability 1' part too.

Nah, these are not "independent" statements, they are all much the same:

They are "I want X" statements.

P v -p is disputed, so someone is wrong there. Also, if you have ever done a 10+ line proof or 10+ place truth table you know it is trivially (pun intended) easy to get those wrong.

I think the concept of a thought and what it is for a thought to be about something needs to be refined before we can say more about the second example. To begin with, if I see a dragonfly and mistake it for a fairy and then start to think about the fairy I saw, it isn't clear that I really am thinking about a fairy.

This conclusion is too strong, because there's a clear distinction that we (or at least I) make intuitively that is incompatible with this reasoning.

Consider the following:

I don't want to try sushi. A friend convinces/bribes/coerces me to try sushi. It turns out I really like sushi, and eat it all the time afterward.

I don't want to try wireheading. I am convinced/bribed/coerced to try wireheading. I really like wireheading, and don't want to stop doing it.

These sequences are superficially identical. Kaj's construction of want suggests I could not have been mistaken about my desire for sushi. However, intuitively and in common language, it makes sense to say that I was mistaken about my desire for sushi. There is, however, something different about saying I was mistaken in not wanting to wirehead. It's an issue of values.

Consider the ardent vegetarian who is coercively fed beef, and likes beef so much that he lacks the willpower to avoid eating it, even though it causes him tremendous psychic distress to do so. It seems reasonable to say he was correct in not wanting to eat beef, and have this judgement be entirely consistent with my being incorrect about not wanting to eat sushi. The issue is whether my action has a non-hedonic value. Eating sushi (for me) does not. Eating beef for him does. His hedonic values get in the way of his utilitarian values.

This dilemma actually integrates a number of rather complex problems. I'm hereby precommitting to making a top-level post about this before Friday. Let's hope it works.

A possible solution to this: The person who does not want to try sushi thinks he will dislike it and say "Yuck!" He actually enjoys it. He is wrong in that he anticipated something different from what happened. A person who does not want to wirehead will anticipate enjoying it immensely, and this will be accurate. The first person's decision to try to avoid sushi is based on a mistaken anticipation, but the second person's decision to avoid wireheading takes into account a correct anticipation.

See my reply to zero_call below. Yes, in baseline humans and with current technology, it does make sense to use the expression "true desire". As technology improves, however, you'll need to define it more and more rigorously. Defining it by reference to your current values is one way.

What makes one method of mind alteration more acceptable than another?

It so happens that there are people working on this problem right now. See for example the current discussion taking place on Vladimir Nesov's blog.

As a preliminary step we can categorize the ways that our "wants" can change as follows (these are mostly taken from a comment by Andreas):

  1. resolving a logical uncertainty
  2. updating in light of new evidence
  3. correcting a past computational error
  4. forgetting information
  5. committing a new computational error
  6. unintentional physical modification (i.e., brain damage)
  7. intentional physical modification
  8. other

Can we agree that categories 1, 2, and 3 are acceptable, 5 and 6 are unacceptable, and 4, 7, and 8 are "it depends"?

The change that I suggested in my argument belongs to category 2, updating in light of new evidence. I wrote that the FAI would "try to extrapolate what your preferences would be if you knew what it felt like to be wireheaded." Does that seem more reasonable now?

For instance, what about our anti-wirehead?

If the FAI tries to extrapolate whether you'd want to be anti-wireheaded if you knew what it felt like to be anti-wireheaded, the obvious answer is no. You seem to assume that the FAI would instead try to predict whether you'd prefer to be anti-wireheaded after you were actually anti-wireheaded, but that change would be more like category 6.

I'm not entirely sure if it's alright to alter someone's mind to update in light of new evidence if they didn't want to update. The same goes for categories 1 and 3.

But let's assume, for the sake of argument, that we accept your categorization. Or let's at least assume that the person in question doesn't mind the updating. It seems to me that there are two possible kinds of knowledge about what wireheading feels like, and we must be clear about which one we mean.

The first kind is abstract, declarative knowledge. This may affect our (instrumental?) preferences, depending on our existing preferences. For instance, I know that people choosing where to live underestimate the effect travel times have on their happiness and overestimate the effect that the amount of space has on their happiness. Knowing this, and preferring to be happy, I might choose a different home than I otherwise would have. I presume you don't mean this kind of knowledge, as we already know in the abstract that wireheading would be the best feeling we could ever possibly experience.

The second kind is a more visceral, experienced kind of knowledge, the knowledge of what it really feels like. Knowing what it feels like to be a bat, to use Nagel's classic example. Here it becomes tricky. It's an open question to what degree you can really add this kind of knowledge to someone's mind, as the recollection of the experience is necessarily incomplete. We might remember being happy or wireheaded, but just the act of recalling it doesn't return us to a state of mind where we are just as happy as we were back then. Instead we have an abstract memory of having been happy, which possibly activates other emotions in our mind, depending on what sorts of associations have built up around the memory. We might feel an uplifting echo of that happiness, a longing to experience it again, bitterness or sorrow about being unable to relive it, or just a blank indifference.

If an FAI simply simulates a state of mind where knowledge of the experience of wireheadedness has been added, I don't think that will change the person's preferences at all. The recollection of the wirehead state has just become an abstractly recalled piece of knowledge, without any emotional or motivational triggers that would affect one's preferences in any way.

Let me try a different tack here. Suppose you have in front of you two flavors of ice cream. You don't know what they taste like, but you prefer the red one because you like red and that's the only thing you have to go on. Now an FAI comes along and tells you that it predicts if you knew what the flavors taste like, you'd choose the blue one instead. Do you not switch to the blue one?

I presume you don't mean this kind of knowledge, as we already know in the abstract that wireheading would be the best feeling we could ever possibly experience.

Knowing that it's the "best" is hardly having full declarative knowledge, when we don't know how good "best" is.

If an FAI simply simulates a state of mind where knowledge of the experience of wireheadedness has been added, I don't think that will change the person's preferences at all. The recollection of the wirehead state has just become an abstractly recalled piece of knowledge, without any emotional or motivational triggers that would affect one's preferences in any way.

I don't see how that makes any sense, given my ice cream example.

In the ice cream example, yes, I'll switch to the blue one. But that one is like my previous example of choosing where to live: I switched because I gained information that allowed me to better fulfill my intrinsic preferences. It's not that my actual preferences changed. If my preference had been "I want to eat the best ice cream I can have, as long as the taste doesn't come from a blue ice cream" (analogous to "I want to experience the best life there is, as long as the enjoyment doesn't come from wireheading"), I wouldn't have switched.

Knowing that it's the "best" is hardly having full declarative knowledge, when we don't know how good "best" is.

Fair enough. But even if a person declining to be wireheaded was provided information of exactly how much better "best" would be, I doubt that would sway very many of them. (Though it may sway some, and in that case yes, an FAI telling them this could make them switch.)

I don't see how that makes any sense, given my ice cream example.

Sorry, poor wording on my behalf. Let me reword it:

"If an FAI simply simulates a state of mind where a memory of the experience of wireheadedness has been added, I don't think that will change the person's preferences at all. The recollection of the wirehead state is just the previously known 'wireheading is a thousand times better than any other pleasure I could have' knowledge, stored in a different format. But if no emotional or motivational associations are added, having the same information in a different format shouldn't change any preferences."

I think that resolves most of our disagreement, and I'll think a bit more about your current position. (Have to go to sleep now.) In the meantime, can you please make a correction to your post? As you can see, my argument isn't "our wireheaded selves would probably prefer to be wireheaded" but rather "an FAI might tell us that we would prefer to be wireheaded if we knew what it felt like." I guess you had in your mind the previous argument you heard from others, and conflated mine with theirs.

If my preference had been "I want to eat the best ice cream I can have, as long as the taste doesn't come from a blue ice cream" (analogous to "I want to experience the best life there is, as long as the enjoyment doesn't come from wireheading"), I wouldn't have switched.

But such a preference is neurotic. Wire-heading isn't a discrete, easily distinguishable category. Any number of improvements to your mind are possible. If we start at the very lowest end, chances are that you would welcome most of the improvements. Once you have been given those improvements, you would find the next level of improvement desirable. Eventually, you are at the level just below a total wire-head, and you can clearly see that wire-heading is the way to be.

Yet, if you're given the choice upfront, you will refuse to be a wire-head. This is essentially due to pre-conceived (probably wrong) notions of what matters and what wire-heading is. And the FAI would be correct in fixing you, just like it would be correct in fixing a depressed patient.

Good news for you then: Humans are not understimulated rats. There was an experiment where some psychologists gave some subjects electrodes and a device which stimulated their "reward center" (this was back when it was believed that dopamine was the happiness chemical and desire-wireheading was the same as happiness-wireheading) whenever they pushed a button. They also recorded every time the button was pushed. The subjects carried the electrodes for a while (I believe it was a week) and then returned them. All the subjects went about their lives, doing normal things with about their normal amount of motivation. All of them used the button at least a few times and reported that they liked it. But only one guy used it more than ten times per day, and he was intentionally (but unsuccessfully) using it for classical conditioning.

This is the best I can find right now and I need to go to bed. They retell the same anecdote that I referred to at the end of that piece.

Here is the relevant part:

Heath tells us some of his patients were given "self-stimulators" similar to the ones used by Olds' rats. Whenever he felt the urge, the patient could push any of 3 or 4 buttons on the self-stimulator hooked to his belt. Each button was connected to an electrode implanted in a different part of his brain, and the device kept track of the number of times he stimulated each site. ... We ask Heath if human beings are as compulsive about pleasure as the rats of Olds' laboratory that self-stimulated until they passed out. "No," he tells us. "People don't self-stimulate constantly -- as long as they're feeling good. Only when they're depressed does the stimulation trigger a big response. There are so many factors that play into a human being's pleasure response: your experience, your memory system, sensory cues..." he muses.

Though in the version I read several years ago the events were in a different order. And they were actually talking about this as a means to reach the happy equilibrium that Kaj is talking about, so they talked much more about the other subjects in the experiment. I had forgotten that Heath interfered with the gay guy after, because that was kind of downplayed.

I imagine the ultimate wireheading would involve complete happiness and interfacing with the FAI's consciousness, experiencing much more than is possible by a solitary mind.

Now an FAI comes along and tells you that it predicts if you knew what the flavors taste like, you'd choose the blue one instead. Do you not switch to the blue one?

There's a rather enormous leap between the FAI saying, "Y'know, I think you'd like that one more," and the FAI altering your brain so you select that one. Providing new information simply isn't altering someone's mind in this context.

Can we agree that categories 1, 2, and 3 are acceptable, 5 and 6 are unacceptable, and 4, 7, and 8 are "it depends"?

No. If someone -- my next-door neighbor, my doctor, the government, a fictional genie, whoever -- is proposing to rewire my brain, my informed consent beforehand is the only thing that can make it acceptable.

Are you making this as a statement of personal preference, or general policy? What if it becomes practically impossible for a person to give informed consent, as in cases of extreme mental disability?

General policy. For example, if Wei Dai chooses the wirehead route, I might think he's missing out on a lot of other things life has to offer, but that doesn't give me the right to forcibly unwirehead him, any more than he has the right to do the reverse to me.

In other words, he and I have two separate disagreements: of value axioms, whether there should be more to life than wireheading (which is a matter of personal preference), and of moral axioms, whether it's okay to initiate the use of armed force (whether in person or by proxy) to impose one's preferred lifestyle on another (which is a matter of general policy). (And this serves as a nice pair of counterexamples to the theory I have seen floating around that there is a universal set of human values.)

In cases of extreme mental disability, we don't have an entity that is inherently capable of giving informed consent, so indeed it's not possible to apply that criterion. In that case (given the technology to do so) it would be necessary to intervene to repair the disability before the criterion can begin to apply.

rwallace, I'm not sure there is any actual disagreement between us. All I'm saying is that those who have not actually tried wireheading (or otherwise have knowledge about what it feels like to be wireheaded) perhaps shouldn't be so sure that they really prefer not to be wireheaded. And I never mentioned anything about forcibly wireheading people. (Maybe you confused my position with denisbider's?)

The change that I suggested in my argument belongs to category 2, updating in light of new evidence. I wrote that the FAI would "try to extrapolate what your preferences would be if you knew what it felt like to be wireheaded."

I took this to mean that you agreed with denisbider's position of licensing the initiation of force and justifying it based on what the altered version of the victim would prefer after the event -- was that not your intent? If not, then you're right, we don't disagree to anywhere near the extent I had thought.

If this argument is correct, then CEV is very, very bad, since it will produce something that nobody in the world wants.

Thanks, this has clarified some of my thinking on this domain. It also touches on one of my main objections to CEV - I would not trust the opinions of the man that the man I want to be would want to be. And it gets worse the further it goes.

We are some messily programmed machines.

My problem with CEV is that who you would be if you were smarter and better-informed is extremely path-dependent. Intelligence isn't a single number, so one can increase different parts of it in different orders. The order people learn things in, and how fully they integrate that knowledge, and what incidental declarative/affective associations they form with the knowledge, can all send the extrapolated person off in different directions. Assuming a CEV-executor would be taking all that into account, and summing over all possible orders (and assuming that this could be somehow made computationally tractable) the extrapolation would get almost nowhere before fanning out uselessly.

OTOH, I suppose that there would be a few well-defined areas of agreement. At the very least, the AI could see current areas of agreement between people. And if implemented correctly, it at least wouldn't do any harm.

Good point, though I'm not too worried about the path dependency myself; I'm more preoccupied with getting somewhere "nice and tolerable" than somewhere "perfect".

Your examples of getting tired after sex or satisfied after eating are based on current human physiology and neurochemistry, which I think most people here are assuming will no longer confine our drives after AI/uploading. How can you be sure what you would do if you didn't get tired?

I also disagree with the idea that 'pleasure' is what is central to 'wireheading.' (I acknowledge that I may need a new term.) I take the broader view that wireheading is getting stuck in a positive feedback loop that excludes all other activity, and for this to occur, anything positively reinforcing will do.* For example, let's say Jane Doe wants to want to exercise, and so modifies her preferences. Now let's say this modification is not calibrated correctly, and so she ends up on the treadmill 24/7, never wanting to get off of it. Though the activity is not pleasurable, she is still stuck in the loop. Even if we would not make a mistake quite this mundane, it is not difficult to imagine similar problems occurring after a few rounds of 'preference modification' by free transhumans. If someone has a drive to be satisfied, then satisfied he shall be, one way or another. Simple solutions, like putting in a preference for complexity, may not be sufficient safeguards either. Imagine an entity that spends all of its time computing and tracing infinite fractals. Pinnacle of human evolution or wirehead?

*Disclaimer: I haven't yet defined the time parameters. For example, if the loop takes 24 hours to complete as opposed to a few seconds, is it still wireheading? What about 100 years? But I think the general idea is important to consider.

The relevant part of those examples was the fact that it is possible to disentangle pleasure from the desire to keep doing the pleasurable thing. Yes, we could upgrade ourselves to a posthuman state where we don't get tired after eating or sex, and want to keep doing it all the time. But it wouldn't be impossible to upgrade us to a state where pleasure and wanting to do something didn't correlate, either.

I believe the commonly used definition for 'wireheading' mainly centers around pleasure, but your question is also important.

Your examples of getting tired after sex or satisfied after eating are based on current human physiology and neurochemistry, which I think most people here are assuming will no longer confine our drives after AI/uploading. How can you be sure what you would do if you didn't get tired?

I got bored with playing Gran Turismo all the time in less than a week - the timescale might change, but eventually blessed boredom would rescue me from such a loop.

Edit: From most known loops of this type - I agree with your concern about loops in general.

More generally, I don't think any argument that says one is wrong about what they want holds up.

Just to be clear, you don't think one can be mistaken about what one wants? Does this only work in the present tense? If not, the statement "I thought I wanted that, but now I know that I didn't" generates a contradiction - the speaker must be actually lying.

Well, in everyday usage people use the expression the way MrHen put it. If you want to define it like that, then yes, you can be mistaken about what you want.

In fact, "I thought I wouldn't want to do/experience X, but upon trying it out I realized I was wrong" doesn't make sense.

I interpret the confusing language to mean, "I did not predict I would want to do X after doing X or learning more about X." It doesn't explicitly say that, but when I hear people say similar things, it is usually some forecast about their future self, not their current self.

I really like the core ideas of this post but some of the particulars are bothersome to me. For example, it confuses things IMO to talk about wireheading as though it can be modified to be whatever we want -- wireheading is wireheading, and it has a rather clear, explicit meaning. (Although the degree of its strength would need to be qualified.)

Anyways, how do you really know what you want? That's the really key question, which I don't think you've really answered. It's not just about redefining terms, IMO. There's real substance to the idea that we have some innate, true sense of desires, yet whose identities elude us. To take the sushi example, the person who tries sushi and loves it had an innate desire, or interest, all along. It might not have been a "want", but the fact that their preferences changed expresses something true about them. It wasn't just a matter of definitions and perspectives and so on.

Maybe what you're saying is that desires are somewhat irrelevant; they can be redefined, reupdated, or completely neglected, and they have little overall worth. So maybe the more interesting question is more straightforward: knowing we would be completely happy and fulfilled in a life of wireheading, should we do it?

wireheading is wireheading, and it has a rather clear, explicit meaning

We've assumed that it has a clear, explicit meaning, but I don't think that's so.

There's real substance to the idea that we have some innate, true sense of desires, yet whose identities elude us.

In baseline humans and with current technology, yes, it does make sense to use the expression "true desire". Not that particular desires would be any more "true" than others, but there may be some unrealized desires which, if fulfilled, would lead to the person becoming happier than if those desires weren't fulfilled. As technology increases, that distinction becomes less meaningful, as we become capable of rebuilding our minds and transforming any desire to such a "true desire".

If you wanted to keep the distinction even with improving technology, you'd define some class of alterations which are "acceptable" and some which aren't. "True desires" would then be any wants that could be promoted to such a status using "acceptable" means. Wei Dai started compiling one possible list of such acceptable alterations.

You're right that where D is desire and t is time, Dx at t1 is not falsified by D(-x) at t2. Nor is it falsified by D(-x at t1) at t2. But you haven't come close to showing where B is belief, BDx is necessarily true, or as a special case BDwh is necessarily true (wh is wireheading). Since the latter, not the former, is the titular claim of the post, you have some work left.

I'm afraid you're a bit too concise for me to follow. Could you elaborate?

Yeah, sorry. I made the comment right after I got back from my modal logic class, so I was thinking in sentence letters and logical connectives.

For me this is the key passage in your post:

In fact, "I thought I wouldn't want to do/experience X, but upon trying it out I realized I was wrong" doesn't make sense. Previously the person didn't want X, but after trying it out they did want X. X has caused a change in their preferences by altering their brain. This doesn't mean that the pre-X person was wrong, it just means the post-X person has been changed. With the correct technology, anyone can be changed to prefer anything.

This effectively shows that the claim "I desire X", when made right now, can't be falsified by any desires I might have at different times. I actually don't think this is a point about technology, but a point about desires. Two desires held at different times are allowed to be contradictory, and we don't even need to bring up wireheading or fancy technology. This phenomenon occurs all the time. We call it regret or changing our mind.

So you have rebutted a common objection to the claim that someone does not want to wirehead. But it doesn't follow from that that your beliefs about your desires in general, or desires to wirehead in particular, are infallible. Given certain conceptions of what desire/preference means and certain assumptions about the transparency of mental content it might follow that you can't be wrong about desires (to wirehead and otherwise). But that hasn't been shown in the OP even though that seems to be the claim the title is making.

Given certain conceptions of what desire/preference means and certain assumptions about the transparency of mental content it might follow that you can't be wrong about desires (to wirehead and otherwise). But that hasn't been shown in the OP even though that seems to be the claim the title is making.

Yes (as I've stated in the other comments here), if you use a broader definition of "mistaken about a want", then we can easily conclude that one can be mistaken about their wants. I thought the narrowness of the definition of 'want' I was using would have been clear from the context, but I apparently succumbed to the illusion of transparency.

Others have said this already - but your own motives are one of the things that you can be wrong about.

Silly to worry only about the preferences of your present self - you should also act to change your preferences to make them easier to satisfy. Your potential future self matters as much as your present self does.

Silly to worry only about the preferences of your present self - you should also act to change your preferences to make them easier to satisfy. Your potential future self matters as much as your present self does.

Irony? I gather if the "future self" is a rock, which is a state of existence that is easier to satisfy, this rock doesn't matter as much as your present self.

Furthermore, even if we define wireheading so that you'd prefer it afterwards, that says nothing about the moral worth of wireheading somebody.

Agreed.