You cannot be mistaken about (not) wanting to wirehead

by Kaj_Sotala, 26th Jan 2010 (2 min read, 79 comments)


Tags: Wireheading, Modest Epistemology

In the comments of Welcome to Heaven, Wei Dai brings up the argument that even though we may not want to be wireheaded now, our wireheaded selves would probably prefer to be wireheaded. Therefore we might be mistaken about what we really want. (Correction: what Wei actually said was that an FAI might tell us that we would prefer to be wireheaded if we knew what it felt like, not that our wireheaded selves would prefer to be wireheaded.)

This is an argument I've heard frequently, one which I've even used myself. But I don't think it holds up. More generally, I don't think any argument that says one is wrong about what they want holds up.

Take the example of wireheading. It is not an inherent property of minds that they'll become desperately addicted to anything that feels sufficiently good. Even from our own experience, we know that there are plenty of things that feel really good, but that we don't immediately crave more of afterwards. Sex might be great, but you can still get fatigued enough afterwards that you want to rest; eating good food might be enjoyable, but at some point you get full. The classic counter-example is that of the rats who could pull a lever stimulating a part of their brain, and ended up compulsively pulling it, to the exclusion of all else. People took this to mean the rats were caught in a loop of stimulating their "pleasure center", but it later turned out that wasn't the case. Instead, the rats were stimulating their "wants to seek out things" center.

The systems for experiencing pleasure and for wanting to seek out pleasure are separate ones. One can find something pleasurable, but still not develop a desire to seek it out. I'm sure all of you have had times when you haven't felt the urge to participate in a particular activity, even though you knew you'd enjoy the activity in question if you just got around to doing it. Conversely, one can also have a desire to seek out something, but still not find it pleasurable when it's achieved.

Therefore, it is not an inherent property of wireheading that we'd automatically end up wanting it. Sure, you could wirehead someone in such a way that the person stopped wanting anything else, but you could also wirehead them in such a way that they were indifferent to whether or not it continued. You could even wirehead them in such a way that they enjoyed every minute of it, but at the same time wanted it to stop.

"Am I mistaken about wanting to be wireheaded?" is a wrong question. You might afterwards think you actually prefer to be wireheaded, or think you prefer not to be wireheaded, but that is purely a question of how you define the term "wireheading". Is it a procedure that makes you want it, or is it not? Furthermore, even if we define wireheading so that you'd prefer it afterwards, that says nothing about the moral worth of wireheading somebody.

If you're not convinced about that last bit, consider the case of "anti-wireheading": we rewire somebody so that they experience terrible, horrible, excruciating pain. We also rewire them so that regardless, they seek to maintain their current state. In fact, if they somehow stop feeling pain, they'll compulsively seek a return to their previous hellish state. Would you say it was okay to anti-wirehead them, since an anti-wirehead will realize they were mistaken about not wanting to be an anti-wirehead? Probably not.

In fact, "I thought I wouldn't want to do/experience X, but upon trying it out I realized I was wrong" doesn't make sense. Previously the person didn't want X, but after trying it out they did want X. X has caused a change in their preferences by altering their brain. This doesn't mean that the pre-X person was wrong, it just means the post-X person has been changed. With the correct technology, anyone can be changed to prefer anything.

You can still be mistaken about whether or not you'll like something, of course. But that's distinct from whether or not you want it.

Note that this makes any thoughts along the lines of "an FAI might extrapolate the desires you would have if you were more intelligent" tricky. It could just as well extrapolate the desires we would have if we'd had our brains altered in some other way. What makes one method of mind alteration more acceptable than another? "Whether we'd consent to it now" is one obvious-seeming answer, but that too is filled with pitfalls. (For instance, what about our anti-wirehead?)


79 comments

I'm really surprised that on a site called "Less Wrong", there isn't more skepticism about an argument that one can't be wrong about X, especially when X isn't just one statement but a large category of statements. That doesn't scream out "hold on a second!" to anyone?

Eyup. Humans can be wrong about anything. It's like our superpower.

You could be wrong about that.

5Eliezer Yudkowsky11yWhat if I couldn't be wrong about that?

Then you would clearly be immune to hemlock, and therefore weigh the same as a duck.

-1timtyler11yThen you would be 100% certain - and 0 and 1 are not probabilities [http://lesswrong.com/lw/mp/0_and_1_are_not_probabilities/].
3Rob Bensinger8yIt might be that he can't be wrong about that, even though he doesn't know for sure that he can't be wrong about it. Infallibility and certainty are distinct concepts.
0timtyler8yFallibility is in the mind [http://lesswrong.com/lw/oj/probability_is_in_the_mind/].
3Rob Bensinger8yCertainty (confidence, etc.) is in the mind. Fallibility isn't; you can be prone (or immune) to error even if no one thinks you are. The point is that 'What if I couldn't be wrong about it?' does not express 'What if I could be certain that I couldn't be wrong about it?'; the latter requires that 1 be a probability, but the former does not, since I might be unable to be wrong about X and yet only assign, say, a .8 probability to X's being true (because I don't assign probability 1 to my own infallibility).
1timtyler8yThough no one could ever possibly know. Seriously: fallibility is in the mind. It's a measure of how likely something is to fail; likelihoods are probabilities - and probabilities are (best thought of as being) in the mind.
9Stuart_Armstrong11yRigorously, I think the argument doesn't stand up in its ultimate form. But it's tiptoeing in the direction of a very interesting point on how to deal with changing utility functions, especially in circumstances where the changes might be predictable. The simple answer is "judge everything in your future by your current utility function", but that doesn't seem satisfactory. Nor is "judge everything that occurs in your future by your utility function at the time", because of lobotomies, addictive wireheading, and so on. Some people have utility functions that they expect will change; and the degree of change allowable may vary from person to person and subject to subject (e.g., people opposed to polygamy may have a wide range of reactions to the announcement "in fifty years' time, you will approve of polygamy"). Some people trust their own CEV; I never would, but I might trust it one level removed. It's a difficult subject, and my upvote was in thanks for bringing it up. Subsequent posts on the subject I'll judge more harshly.

The simple answer is "judge everything in your future by your current utility function", but that doesn't seem satisfactory.

It sounds satisfactory for agents that have utility functions. Humans don't (unless you mean implicit utility functions under reflection, to the extent that different possible reflections converge), and I think it's really misleading to talk as if we do.

Also, while this is just me, I strongly doubt our notional-utility-functions-upon-reflection contain anything as specific as preferences about polygamy.

0Stuart_Armstrong11yThat was just an example; people react differently to the idea that their values may change in the future, depending on the person and depending on the value.
1CannibalSmith11yHow about "judge by both utility functions and use the most pessimistic result"?
5Paul Crowley11yIf you take a utility function and multiply all the utilities by 0.01, is it the same utility function? In one sense it is, but by your measure it will always win a "most pessimistic" contest. Update: thinking about this further, if the only allowable operations on utilities are comparison and weighted sum, then you can multiply by any positive constant or add and subtract any constant and preserve isomorphism. Is there a name for this mathematical object?
8RichardKennaway11yAffine transformations [http://en.wikipedia.org/wiki/Affine_transformation]. Utility functions are defined up to affine transformation. In particular, this means that nothing has "positive utility" or "negative utility", only greater or lesser utility compared to something else. ETA: If you want to compare two different people's utilities, it can't be done without introducing further structure to enable that comparison. This is required for any sort of felicific calculus [http://en.wikipedia.org/wiki/Felicific_calculus].
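The point in this exchange can be checked on a toy case. The sketch below is only an illustration: the outcome names `A` and `B`, the numbers, and the helper names are all made up for the example, and "most pessimistic" is read here as "pick the outcome whose worse score across the two utility functions is highest". Under that reading, shifting one utility function by a constant leaves its preference ordering untouched but flips the combined choice.

```python
def positive_affine(utility, scale, shift):
    """Return scale * u + shift for every outcome; requires scale > 0."""
    if scale <= 0:
        raise ValueError("scale must be positive to preserve the ordering")
    return {outcome: scale * value + shift for outcome, value in utility.items()}


def ranking(utility):
    """Outcomes sorted from least to most preferred."""
    return sorted(utility, key=utility.get)


def pessimistic_choice(u1, u2):
    """Pick the outcome whose worse score under u1 and u2 is highest."""
    return max(u1, key=lambda outcome: min(u1[outcome], u2[outcome]))


# Hypothetical utilities of a current self and a future self over two outcomes.
u_now = {"A": 1.0, "B": 3.0}
u_future = {"A": 2.0, "B": 0.0}

# Same future preferences, written on a shifted scale (scale 1, shift +10).
u_future_shifted = positive_affine(u_future, scale=1.0, shift=10.0)

print(ranking(u_future) == ranking(u_future_shifted))   # True: same ordering
print(pessimistic_choice(u_now, u_future))              # A
print(pessimistic_choice(u_now, u_future_shifted))      # B
```

So the "most pessimistic result" rule is not well defined until some extra structure fixes a common scale for the two utility functions.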
1Paul Crowley11yThere's a name I can't remember for the "number line with no zero" where you're only able to refer to relative positions, not absolute ones. I'm looking for a name for the "number line with no zero and no scale", which is invariant not just under translation but under any affine transformation with positive determinant.
0kpreid11yI'm in an elementary statistics class right now and we just heard about “levels of measurement” [http://en.wikipedia.org/wiki/Levels_of_measurement] which seem to make these distinctions: your first is the interval scale, and second the ordinal scale.
1pengvado11yThe "number line with no zero, but a uniquely preferred scale" isn't in that list of measurement types; and it says the "number line with no zero and no scale" is the interval scale.
0thomblake11yA utility function is just a representation of preference ordering. Presumably those properties would hold for anything that is merely an ordering making use of numbers.
3RichardKennaway11yYou also need the conditions of the utility theorem [http://en.wikipedia.org/wiki/Utility] to hold. A preference ordering only gives you conditions 1 and 2 of the theorem as stated in the link.
0thomblake11yGood point. I was effectively entirely leaving out the "mathematical" in "mathematical representation of preference ordering". As I stated it, you couldn't expect to aggregate utiles.
0Paul Crowley11yYou can't aggregate utils; you can only take their weighted sums. You can aggregate changes in utils though.
1GuySrinivasan11yI completely agree. The argument may be wrong, but the point it raises, about sloppily assuming things about which possible causal continuations of self I care about, is important. My initial reaction: we can still use our current utility function, but make sure the CEV analysis or whatever doesn't say "what would you want if you were more intelligent, etc.?" but instead "what would you want if you were changed in a way you currently want to be changed?" This includes "what would you want if we found fixed points of iterated changes based on previous preferences", so that if I currently want to value paperclips more but don't care whether I value factories differently, but upon modifying me to value paperclips more it turns out I would want to value factories more, then changing my preferences to value factories more is acceptable. The part where I'm getting confused right now (rather, the part where I notice I'm getting confused :)) is that calculating fixed points almost certainly depends on the order of alteration, so that there are lots of different future-mes that I prefer to current-me that are at local maxima. Also I have no idea how much we need to apply our current preferences to the fixed-point-mes. Not at all? 100%? Somehow something in-between? Or to the intermediate-state-mes.
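The worry about order-dependence can be made concrete with a toy model. This is a minimal sketch, not anything from the CEV proposal: a "self" is just a set of things it values, each rule says "in this state I would endorse adding that value", and the `art` value and all function names are invented for the example. Applying the same endorsed changes in a different order ends at a different fixed point.

```python
# Each rule: (is the change endorsed in this state?, value that gets added).
# The rules are hypothetical: paperclips and art each block the other,
# and valuing paperclips leads to wanting to value factories.
RULES = {
    "paperclips": (lambda state: "art" not in state, "paperclips"),
    "art": (lambda state: "paperclips" not in state, "art"),
    "factories": (lambda state: "paperclips" in state, "factories"),
}


def run_to_fixed_point(start, order):
    """Apply endorsed changes in the given order until nothing changes."""
    state = frozenset(start)
    changed = True
    while changed:
        changed = False
        for name in order:
            endorsed, value = RULES[name]
            if value not in state and endorsed(state):
                state = state | {value}
                changed = True
    return state


future_me_1 = run_to_fixed_point(set(), ["paperclips", "art", "factories"])
future_me_2 = run_to_fixed_point(set(), ["art", "paperclips", "factories"])

print(sorted(future_me_1))  # ['factories', 'paperclips']
print(sorted(future_me_2))  # ['art']
```

Both results are stable, since no further change is endorsed from either one, and each step was endorsed by the self that took it. Nothing inside the model says which of the two future selves the starting self should count as its extrapolation.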
1Stuart_Armstrong11yI don't think the order issue is a big problem - there is not One Glowing Solution, we just need to find something nice and tolerable. That is the question.
3RobinZ11yI think your heuristic is sound - that seemed screamingly wrong to me as well.
1Paul Crowley11yIncorrigibility is way too strong an assertion, but there's a sense in which I cannot be completely wrong about my values, since I'm the only source of information about them; except perhaps to the extent that you can infer them from my fellow human beings, and to that extent humanity as a whole cannot be completely mistaken about its values. I suspect there may be an analogy with Donaldson's observation that if you think penguins are tiny burrowing insects that live in the Sahara, you're not so much mistaken about penguins as not talking about them at all. However, I can't completely make this analogy work.
-1timtyler11yHow about if X is a set of assertions that logical tautologies are true: http://en.wikipedia.org/wiki/Tautology_(logic) and http://en.wikipedia.org/wiki/Tautology_(logic)#Definition_and_examples. An example along similar lines to this post would be: you can't be wrong about thinking you are thinking about X - if you are thinking about X.
9Eliezer Yudkowsky11yhttp://www.spaceandgames.com/?p=27 [http://www.spaceandgames.com/?p=27]
4wedrifid11yNow that is an overconfidence/independent statements anecdote I'll remember. The '7 is prime probability 1' part too.
-1timtyler11yNah, these are not "independent" statements, they are all much the same: They are "I want X" statements.
1Jack11yP v -p is disputed, so someone is wrong there. Also, if you have ever done a 10+ line proof or 10+ place truth table you know it is trivially (pun intended) easy to get those wrong. I think the concept of a thought and what it is for a thought to be about something needs to be refined before we can say more about the second example. To begin with, if I see a dragonfly and mistake it for a fairy and then start to think about the fairy I saw, it isn't clear that I really am thinking about a fairy.

This conclusion is too strong, because there's a clear distinction that we (or at least I) make intuitively that is incompatible with this reasoning.

Consider the following:

I don't want to try sushi. A friend convinces/bribes/coerces me to try sushi. It turns out I really like sushi, and eat it all the time afterward.

I don't want to try wireheading. I am convinced/bribed/coerced to try wireheading. I really like wireheading, and don't want to stop doing it.

These sequences are superficially identical. Kaj's construction of want suggests I could not have been mistaken about my desire for sushi. However, intuitively and in common language, it makes sense to say that I was mistaken about my desire for sushi. There is, however, something different about saying I was mistaken in not wanting to wirehead. It's an issue of values.

Consider the ardent vegetarian who is coercively fed beef, and likes beef so much that he lacks the willpower to avoid eating it, even though it causes him tremendous psychic distress to do so. It seems reasonable to say he was correct in not wanting to eat beef, and have this judgement be entirely consistent with my being incorrect about not wanting to eat sushi. T... (read more)

A possible solution to this: The person who does not want to try sushi thinks he will dislike it and say "Yuck!" He actually enjoys it. He is wrong in that he anticipated something different from what happened. A person who does not want to wirehead will anticipate enjoying it immensely, and this will be accurate. The first person's decision to try to avoid sushi is based on a mistaken anticipation, but the second person's decision to avoid wireheading takes into account a correct anticipation.

8Cyan11yNo top level post? I has a sad.
7Psychohistorian11yAnd commitment devices work, if belatedly. [http://lesswrong.com/lw/1op/a_much_better_life/]
1Cyan11yYay! [http://encefalus.com/wp-content/uploads/2009/02/happy_lolcat.jpg]
1Kaj_Sotala11ySee my reply to zero_call below [http://lesswrong.com/lw/1oc/you_cannot_be_mistaken_about_not_wanting_to/1in3]. Yes, in baseline humans and with current technology, it does make sense to use the expression "true desire". As technology improves, however, you'll need to define it more and more rigorously. Defining it by reference to your current values is one way.

The Onion on informing people their values are wrong:

http://www.theonion.com/content/news_briefs/man_who_enjoys_thing

0Peterdjones8yYikes. Shades of Dennett [http://en.wikipedia.org/wiki/Daniel_Dennett]
-1timtyler11yThough it is The Onion, that link seems pretty relevant!

What makes one method of mind alteration more acceptable than another?

It so happens that there are people working on this problem right now. See for example the current discussion taking place on Vladimir Nesov's blog.

As a preliminary step we can categorize the ways that our "wants" can change as follows (these are mostly taken from a comment by Andreas):

  1. resolving a logical uncertainty
  2. updating in light of new evidence
  3. correcting a past computational error
  4. forgetting information
  5. committing a new computational error
  6. unintentional physical modification (i.e., brain damage)
  7. intentional physical modification
  8. other

Can we agree that categories 1, 2, and 3 are acceptable, 5 and 6 are unacceptable, and 4, 7, and 8 are "it depends"?

The change that I suggested in my argument belongs to category 2, updating in light of new evidence. I wrote that the FAI would "try to extrapolate what your preferences would be if you knew what it felt like to be wireheaded." Does that seem more reasonable now?

For instance, what about our anti-wirehead?

If the FAI tries to extrapolate whether you'd want to be anti-wireheaded if you knew what it felt like to be anti-wireh... (read more)

7rwallace11yNo. If someone -- my next-door neighbor, my doctor, the government, a fictional genie, whoever -- is proposing to rewire my brain, my informed consent beforehand is the only thing that can make it acceptable.
0Kazuo_Thow11yAre you making this as a statement of personal preference, or general policy? What if it becomes practically impossible for a person to give informed consent, as in cases of extreme mental disability?
0rwallace11yGeneral policy. For example, if Wei Dai chooses the wirehead route, I might think he's missing out on a lot of other things life has to offer, but that doesn't give me the right to forcibly unwirehead him, any more than he has the right to do the reverse to me. In other words, he and I have two separate disagreements: of value axioms, whether there should be more to life than wireheading (which is a matter of personal preference), and of moral axioms, whether it's okay to initiate the use of armed force (whether in person or by proxy) to impose one's preferred lifestyle on another (which is a matter of general policy). (And this serves as a nice pair of counterexamples to the theory I have seen floating around that there is a universal set of human values.) In cases of extreme mental disability, we don't have an entity that is inherently capable of giving informed consent, so indeed it's not possible to apply that criterion. In that case (given the technology to do so) it would be necessary to intervene to repair the disability before the criterion can begin to apply.
2Wei_Dai11yrwallace, I'm not sure there is any actual disagreement between us. All I'm saying is that those who have not actually tried wireheading (or otherwise have knowledge about what it feels like to be wireheaded) perhaps shouldn't be so sure that they really prefer not to be wireheaded. And I never mentioned anything about forcibly wireheading people. (Maybe you confused my position with denisbider's?)
0rwallace11yI took this to mean that you agreed with denisbider's position of licensing the initiation of force and justifying it based on what the altered version of the victim would prefer after the event -- was that not your intent? If not, then you're right, we don't disagree to anywhere near the extent I had thought.
6Kaj_Sotala11yI'm not entirely sure if it's alright to alter someone's mind to update in light of new evidence if they didn't want to update. The same goes for categories 1 and 3. But let's assume, for the sake of argument, that we accept your categorization. Or let's at least assume that the person in question doesn't mind the updating. It seems to me that there are two possible kinds of knowledge about what wireheading feels like, and we must distinguish which one we mean. The first kind is abstract, declarative knowledge. This may affect our (instrumental?) preferences, depending on our existing preferences. For instance, I know that people choosing where to live underestimate the effect travel times have on their happiness and overestimate the effect that the amount of space has on their happiness. Knowing this, and preferring to be happy, I might choose a different home than I otherwise would have. I presume you don't mean this kind of knowledge, as we already know in the abstract that wireheading would be the best feeling we could ever possibly experience. The second kind is a more visceral, experienced kind of knowledge, the knowledge of what it really feels like. Knowing what it feels like to be a bat, to use Nagel's classic example. Here it becomes tricky. It's an open question to what degree you can really add this kind of knowledge to someone's mind, as the recollection of the experience is necessarily incomplete. We might remember being happy or wireheaded, but just the act of recalling it doesn't return us to a state of mind where we are just as happy as we were back then. Instead we have an abstract memory of having been happy, which possibly activates other emotions in our mind, depending on what sorts of associations have built up around the memory. We might feel an uplifting echo of that happiness, a longing to experience it again, bitterness or sorrow about being unable to relive it, or just a blank indifference.
If an FAI simply simulates a state of mind
5Wei_Dai11yLet me try a different tack here. Suppose you have in front of you two flavors of ice cream. You don't know what they taste like, but you prefer the red one because you like red and that's the only thing you have to go on. Now an FAI comes along and tells you that it predicts if you knew what the flavors taste like, you'd choose the blue one instead. Do you not switch to the blue one? Knowing that it's the "best" is hardly having full declarative knowledge, when we don't know how good "best" is. I don't see how that makes any sense, given my ice cream example.
6Kaj_Sotala11yIn the ice cream example, yes, I'll switch to the blue one. But that one is like my previous example of choosing where to live: I switched because I gained information that allowed me to better fulfill my intrinsic preferences. It's not that my actual preferences would have changed. If my preference had been "I want to eat the best ice cream I can have, as long as the taste doesn't come from a blue ice cream" (analogous to "I want to experience the best life there is, as long as the enjoyment doesn't come from wireheading"), I wouldn't have switched. Fair enough. But even if a person declining to be wireheaded was provided information of exactly how much better "best" would be, I doubt that would sway very many of them. (Though it may sway some, and in that case yes, an FAI telling them this could make them switch.) Sorry, poor wording on my part. Let me reword it: "If an FAI simply simulates a state of mind where a memory of the experience of wireheadedness has been added, I don't think that will change the person's preferences at all. The recollection of the wirehead state is just the previously known 'wireheading is a thousand times better than any other pleasure I could have' knowledge, stored in a different format. But if no emotional or motivational associations are added, having the same information in a different format shouldn't change any preferences."
4Wei_Dai11yI think that resolves most of our disagreement, and I'll think a bit more about your current position. (Have to go to sleep now.) In the mean time, can you please make a correction to your post? As you can see, my argument isn't "our wireheaded selves would probably prefer to be wireheaded" but rather "an FAI might tell us that we would prefer to be wireheaded if we knew what it felt like." I guess you had in your mind the previous argument you heard from others, and conflated mine with theirs.
2Kaj_Sotala11yCorrection added.
0denisbider11yBut such a preference is neurotic. Wire-heading isn't a discrete, easily distinguishable category. Any number of improvements to your mind are possible. If we start at the very lowest end, chances are that you would welcome most of the improvements. Once you have been given those improvements, you would find the next level of improvement desirable. Eventually, you are at the level just below a total wire-head, and you can clearly see that wire-heading is the way to be. Yet, if you're given the choice upfront, you will refuse to be a wire-head. This is essentially due to pre-conceived (probably wrong) notions of what matters and what wire-heading is. And the FAI would be correct in fixing you, just like it would be correct in fixing a depressed patient.
1Kaj_Sotala11yThe main problem I have with wireheading is the notion of me simply being and not doing anything else. If I could just alter my mind to be maximally or close to maximally happy nearly all the time, but still letting me do all kinds of different things and still be motivated to do various things [http://www.hedweb.com/hedethic/superwell.html], I'd have a much smaller problem.
6tut11yGood news for you then: Humans are not understimulated rats. There was an experiment where some psychologists gave some subjects electrodes and a device which stimulated their "reward center" (this was back when it was believed that dopamine was the happiness chemical and desire-wireheading was the same as happiness-wireheading) whenever they pushed a button. They also recorded every time the button was pushed. The subjects carried the electrodes for a while (I believe it was a week) and then returned them. All the subjects went about their lives, doing normal things with about their normal amount of motivation. All of them used the button at least a few times and reported that they liked it. But only one guy used it more than ten times per day, and he was intentionally (but unsuccessfully) using it for classical conditioning.
6Morendil11yA reference would be nice - please. :)
5tut11yThis [http://paradise-engineering.com/brain/index.htm] is the best I can find right now and I need to go to bed. They retell the same anecdote that I referred to at the end of that piece. Here is the relevant part: Though in the version I read several years ago the events were in a different order. And they were actually talking about this as a means to reach the happy equilibrium that Kaj is talking about, so they talked much more about the other subjects in the experiment. I had forgotten that Heath interfered with the gay guy after, because that was kind of downplayed.
-1denisbider11yI imagine the ultimate wireheading would involve complete happiness and interfacing with the FAI's consciousness, experiencing much more than is possible by a solitary mind.
1Psychohistorian11yThere's a rather enormous leap between the FAI saying, "Y'know, I think you'd like that one more," and the FAI altering your brain so you select that one. Providing new information simply isn't altering someone's mind in this context.

If this argument is correct, then CEV is very, very bad, since it will produce something that nobody in the world wants.

Thanks, this has clarified some of my thinking on this domain. It also touches on one of my main objections to CEV - I would not trust the opinions of the man that the man I want to be would want to be. And it gets worse the further that it goes.

We are some messily programmed machines.

My problem with CEV is that who you would be if you were smarter and better-informed is extremely path-dependent. Intelligence isn't a single number, so one can increase different parts of it in different orders. The order people learn things in, and how fully they integrate that knowledge, and what incidental declarative/affective associations they form with the knowledge, can all send the extrapolated person off in different directions. Assuming a CEV-executor would be taking all that into account, and summing over all possible orders (and assuming that this could be somehow made computationally tractable) the extrapolation would get almost nowhere before fanning out uselessly.

OTOH, I suppose that there would be a few well-defined areas of agreement. At the very least, the AI could see current areas of agreement between people. And if implemented correctly, it at least wouldn't do any harm.

1Stuart_Armstrong11yGood point, though I'm not too worried about the path dependency myself; I'm more preoccupied with getting somewhere "nice and tolerable" than somewhere "perfect".

Your examples of getting tired after sex or satisfied after eating are based on current human physiology and neurochemistry, which I think most people here are assuming will no longer confine our drives after AI/uploading. How can you be sure what you would do if you didn't get tired?

I also disagree with the idea that 'pleasure' is what is central to 'wireheading.' (I acknowledge that I may need a new term.) I take the broader view that wireheading is getting stuck in a positive feed-back loop that excludes all other activity, and for this to occur, anyth... (read more)

5Kaj_Sotala11yThe relevant part of those examples was the fact that it is possible to disentangle pleasure from the desire to keep doing the pleasurable thing. Yes, we could upgrade ourselves to a posthuman state where we don't get tired after eating or sex, and want to keep doing it all the time. But it wouldn't be impossible to upgrade us to a state where pleasure and wanting to do something didn't correlate, either. I believe the commonly used definition for 'wireheading' mainly centers around pleasure, but your question is also important.
2RobinZ11yI got bored with playing Gran Turismo all the time in less than a week - the timescale might change, but eventually blessed boredom [http://lesswrong.com/lw/xr/in_praise_of_boredom/] would rescue me from such a loop. Edit: From most known loops of this type - I agree with your concern about loops in general.

More generally, I don't think any argument that says one is wrong about what they want holds up.

Just to be clear, you don't think one can be mistaken about what one wants? Does this only work in the present tense? If not, the statement "I thought I wanted that, but now I know that I didn't" generates a contradiction - the speaker must be actually lying.

1Kaj_Sotala11yWell, in everyday usage people use the expression the way MrHen put it [http://lesswrong.com/lw/1oc/you_cannot_be_mistaken_about_not_wanting_to/1ihu]. If you want to define it like that, then yes, you can be mistaken about what you want.

In fact, "I thought I wouldn't want to do/experience X, but upon trying it out I realized I was wrong" doesn't make sense.

I interpret the confusing language to mean, "I did not predict I would want to do X after doing X or learning more about X." It doesn't explicitly say that, but when I hear people say similar things it is usually some forecast about their future self, not their current self.

I really like the core ideas of this post but some of the particulars are bothersome to me. For example, it confuses things IMO to talk about wireheading as though it can be modified to be whatever we want -- wireheading is wireheading, and it has a rather clear, explicit meaning. (Although the degree of its strength would need to be qualified.)

Anyways, how do you really know what you want? That's the key question, which I don't think you've really answered. It's not just about redefining terms, IMO. There's real substance to the idea that we have s...

0Kaj_Sotala11yWe've assumed that it has a clear, explicit meaning, but I don't think that's so. In baseline humans and with current technology, yes, it does make sense to use the expression "true desire". Not that particular desires would be any more "true" than others, but there may be some unrealized desires which, if fulfilled, would lead to the person becoming happier than if those desires weren't fulfilled. As technology advances, that distinction becomes less meaningful, as we become capable of rebuilding our minds and transforming any desire into such a "true desire". If you wanted to keep the distinction even with improving technology, you'd define some class of alterations which are "acceptable" and some which aren't. "True desires" would then be any wants that could be promoted to such a status using "acceptable" means. Wei Dai started compiling one possible list [http://lesswrong.com/lw/1oc/you_cannot_be_mistaken_about_not_wanting_to/1ih8] of such acceptable alterations.

You're right that, where D is desire and t is time, Dx at t1 is not falsified by D(-x) at t2. Nor is it falsified by D(-x at t1) at t2. But you haven't come close to showing that, where B is belief, BDx is necessarily true, or as a special case that BDwh is necessarily true (wh is wireheading). Since the latter, not the former, is the titular claim of the post, you have some work left.

1Kaj_Sotala11yI'm afraid you're a bit too concise for me to follow. Could you elaborate?
2Jack11yYeah, sorry. I made the comment right after I got back from my modal logic class, so I was thinking in sentence letters and logical connectives. For me this is the key passage in your post: This effectively shows that the claim "I desire X", when made right now, can't be falsified by any desires I might have at different times. I actually don't think this is a point about technology, but a point about desires. Two desires held at different times are allowed to be contradictory, and we don't even need to bring up wireheading or fancy technology. This phenomenon occurs all the time. We call it regret or changing our mind. So you have rebutted a common objection to the claim that someone does not want to wirehead. But it doesn't follow that your beliefs about your desires in general, or desires to wirehead in particular, are infallible. Given certain conceptions of what desire/preference means and certain assumptions about the transparency of mental content, it might follow that you can't be wrong about desires (to wirehead and otherwise). But that hasn't been shown in the OP even though that seems to be the claim the title is making.
2Kaj_Sotala11yYes (as I've stated in the other comments here), if you use a broader definition of "mistaken about a want", then we can easily conclude that one can be mistaken about their wants. I thought the narrowness of the definition of 'want' I was using would have been clear from the context, but I apparently succumbed to the illusion of transparency [http://wiki.lesswrong.com/wiki/Illusion_of_transparency].

Others have said this already - but your own motives are one of the things that you can be wrong about.

Silly to worry only about the preferences of your present self - you should also act to change your preferences to make them easier to satisfy. Your potential future self matters as much as your present self does.

6Vladimir_Nesov11yIrony? I gather if the "future self" is a rock, which is a state of existence that is easier to satisfy, this rock doesn't matter as much as your present self.

Furthermore, even if we define wireheading so that you'd prefer it afterwards, that says nothing about the moral worth of wireheading somebody.

Agreed.