
What makes us think _any_ of our terminal values aren't based on a misunderstanding of reality?

1 min read · 25th Sep 2013 · 89 comments

22

Let's say Bob's terminal value is to travel back in time and ride a dinosaur.

It is instrumentally rational for Bob to study physics so he can learn how to build a time machine. As he learns more physics, Bob realizes that his terminal value is not only utterly impossible but meaningless. By definition, someone in Bob's past riding a dinosaur is not a future evolution of the present Bob.

There are a number of ways to create the subjective experience of having gone into the past and ridden a dinosaur. But to Bob, it's not the same because he wanted both the subjective experience and the knowledge that it corresponded to objective fact. Without the latter, he might as well have just watched a movie or played a video game.

So if we took the original, innocent-of-physics Bob and somehow calculated his coherent extrapolated volition, we would end up with a Bob who has given up on time travel. The original Bob would not want to be this Bob.

But, how do we know that _anything_ we value won't similarly dissolve under sufficiently thorough deconstruction? Let's suppose for a minute that all "human values" are dangling units; that everything we want is as possible and makes as much sense as wanting to hear the sound of blue or taste the flavor of a prime number. What is the rational course of action in such a situation?

PS: If your response resembles "keep attempting to XXX anyway", please explain what privileges XXX over any number of other alternatives other than your current preference. Are you using some kind of pre-commitment strategy to a subset of your current goals? Do you now wish you had used the same strategy to precommit to goals you had when you were a toddler?


[-][anonymous]11y250

I, for one, have "terminal value" for traveling back in time and riding a dinosaur, in the sense that worlds consistent with that event are ranked above most others. Now, of course, the realization of that particular goal is impossible, but possibility is orthogonal to preference.

The fact is, most things are impossible, but there's nothing wrong with having a general preference ordering over a superset of the set of physically possible worlds. Likewise, my probability distributions are over a superset of the actually physically possible outcomes.

When all the impossible things get eliminated and we move on like good rationalists, there are still choices to be made, and some things are still better than others. If I have to choose between a universe containing a billion paperclips and a universe containing a single frozen dinosaur, my preference for ice cream over dirt is irrelevant, but I can still make a choice, and can still have a preference for the dinosaur (or the paperclips, whatever I happen to think is best).
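The point about ranking a superset of the physically possible worlds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `World` class, the numeric `utility` scores, and the `possible` flags are all hypothetical and are not taken from the comment, which leaves open which option the author actually ranks higher.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class World:
    name: str
    utility: float   # hypothetical score standing in for a preference ordering
    possible: bool   # hypothetical flag: is this world physically reachable?


def best_available(worlds: Iterable[World]) -> Optional[World]:
    """Return the most preferred world among the possible ones, or None."""
    feasible = [w for w in worlds if w.possible]
    if not feasible:
        return None  # nothing reachable, so there is no choice to make
    return max(feasible, key=lambda w: w.utility)


# The ordering covers impossible worlds too; they simply drop out at choice time.
WORLDS = [
    World("travel back in time and ride a dinosaur", 100.0, False),
    World("a single frozen dinosaur", 10.0, True),
    World("a billion paperclips", 1.0, True),
]

choice = best_available(WORLDS)
print(choice.name if choice else "no feasible world")
```

Under these made-up scores the top-ranked world is unreachable, yet the ordering still settles the choice between the two reachable ones.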

I actually don't know what you even mean by my values dissolving, though. Sometimes I learn things that change how I would make choices. Maybe some day I will learn something that turns me into a nihilist such that I would prefer to wail about the meaninglessness of all my desires, but it seems unlikely.

5byrnema11y

In contrast to this comment's sister comment, I don't think this addresses the question. Instead, it describes what it is like when the context for the question isn't the case. Actually, the converse of the answer provides some suggestion as to what it would be like if all of our values were found to be nonsensical... It would mean we would find that we are indifferent to all choices - with the impossible eliminated, we are indifferent to all the choices possible. We might find that we keep on making meaningless choices out of something a bit stronger than 'habit' (which is how I judge the universe we're in) or we have the ability to rationally update our instrumental values in the context of our voided terminal values (for example, if we were able to edit our programs) so that we would after all not bother to make any choices. This is really not so far fetched, and it is not too difficult to come up with some examples. Suppose a person had a terminal goal to eat healthy. Each morning they make choices between eggs and oatmeal, etc. And then they discover they are actually a robot who draws energy from the environment automatically, and after all it is not necessary to eat. If all they cared about was to eat healthily, to optimize their physical well-being, if they then discovered there was no connection between eating and health, they should lose all interest in any choices about food. They would have no preference to eat, or not to eat, or about what they ate. (Unless you refer to another, new terminal value.) Another example is that a person cares very much about their family, and must decide between spending money on an operation for their child or for food for their family. Then the person wakes up and finds that the entire scenario was just a dream, they don't have a family. Even if they think about it a little longer and might decide, while awake, what would have been the best action to take, they no longer have much preference (if any) about what action t

1[anonymous]11y

I agree and think that this part sums up a good response to the above question.

[-]DanielLC11y140

But, how do we know that anything we value won't similarly dissolve under sufficiently thorough deconstruction?

Experience.

I was once a theist. I believed that people were ontologically fundamental, and that there was a true morality written in the sky, and an omniscient deity would tell you what to do if you asked. Now I don't. My values did change a little, in that they're no longer based on what other people tell me is good so I don't think homosexuality is bad and stuff like that, but it wasn't a significant change.

The only part that I think did change because of that was just that I no longer believed that certain people were a good authority on ethics. Had I not believed God would tell us what's right, I'm not sure there'd have been any change at all.

Learning more physics is a comparatively small change, and I'd expect it to correspond to a tiny change in values.

In regards to your Bob example, if I had his values, I'd expect that after learning that someone in the past is by definition not a future evolution of me, I'd change my definition to something closer to the "naive" definition, and ignore any jumps in time so long as the people stay the same when deciding if someone is a future evolution of me. If I then learn about timeless quantum physics and realize there's no such thing as the past anyway, and certainly not pasts that lead to particular futures, I'd settle for a world with a lower entropy, in which a relatively high number of Feynman paths reach here.

5bokov11y

Funny you should say that. I, for one, have the terminal value of continued personal existence (a.k.a. being alive). On LW I'm learning that continuity, personhood, and existence might well be illusions. If that is the case, my efforts to find ways to survive amount to extending something that isn't there in the first place. Of course there's the high probability that we're doing the philosophical equivalent of dividing by zero somewhere among our many nested extrapolations. But let's say consciousness really is an illusion. Maybe the take-home lesson is that our goals all live at a much more superficial level than we are capable of probing. Not that reductionism "robs" us of our values or anything like that... but it may mean that there cannot exist an instrumentally rational course of action that is also perfectly epistemically rational. That being less wrong past some threshold will not help us set better goals for ourselves, only get better at pursuing goals we pre-committed to pursuing.

What do you mean when you say consciousness may be an illusion? It's happening to you, isn't it? What other proof do you need? What would a world look like where consciousness is an illusion, vs. one where it isn't?

2bokov11y

Identical. Therefore consciousness adds complexity without actually being necessary for explaining anything. Therefore, the presumption is that we are all philosophical zombies (but think we're not).

[-]Viliam_Bur11y110

Okay, so what creates the feeling of consciousness in those philosophical zombies? Can we generate more of those circumstances which naturally create that feeling?

If my life is "ultimately" an illusion, how can I make this illusion last as long as possible?

3Rob Bensinger11y

Eliezer:

5endoself11y

http://lesswrong.com/lw/p2/hand_vs_fingers/

1Crux11y

We are all philosophical zombies, but think we're not? We're all X, but think we're Y? What's the difference between X and Y? What would our subjective experience look like if we were actually Y, instead of just thinking we're Y? Unless you can point to something, then we can safely conclude that you're talking without a meaning.

[-]Eliezer Yudkowsky11y120

I'm trying to think of what kind of zombies there could be besides philosophical ones.

Epistemological zombie: My brain has exactly the same state, all the neurons in all the same places, and likewise the rest of the universe, but my map doesn't possess any 'truth' or 'accuracy'.

Ontological zombie: All the atoms are in all the same places but they don't exist.

Existential zombie: All the atoms are in all the same places but they don't mean anything.

Causal zombie: So far as anyone can tell, my brain is doing exactly the same things, but only by coincidence and not because it follows from the laws of physics.

Mathematical zombie: Just like me only it doesn't run on math.

Logical zombie: I got nothin'.

Conceivability zombie: It's exactly like me but it lacks the property of conceivability.

7Tyrrell_McAllister11y

Löwenheim–Skolem zombie: Makes statements that are word-for-word identical to the ones that you make about uncountable sets, and for the same causal reasons (namely, because you both implement the inference rules of ZF in the same way), but its statements aren't about actually uncountable sets, because it lives in a countable model of ZF.

4fubarobfusco11y

Your causal zombie reminds me of Leibniz's pre-established harmony.

0RolfAndreassen11y

Oddly enough, the other day I ran into someone who appears to literally believe a combination of these two.

6Rob Bensinger11y

The eliminativist responds: The world would look the same to me (a complex brain process) if dualism were true. But it would not look the same to the immaterial ghost possessing me, and we could write a computer program that simulates an epiphenomenal universe, i.e., one where every brain causally produces a ghost that has no effects of its own. So dualism is meaningful and false, not meaningless. The dualist responds in turn: I agree that those two scenarios make sense. However, I disagree about which of those possible worlds the evidence suggests is our world. And I disagree about what sort of agent we are — experience reveals us to be phenomenal consciousnesses learning about whether there's also a physical world, not brains investigating whether there's also an invisible epiphenomenal spirit-world. The mental has epistemic priority over the physical. We do have good reason to think we are epiphenomenal ghosts: Our moment-to-moment experience of things like that (ostending a patch of redness in my visual field) indicates that there is something within experience that is not strictly entailed by the physical facts. This category of experiential 'thats' I assign the label 'phenomenal consciousness' as a useful shorthand, but the evidence for this category is a perception-like introspective acquaintance, not an inference from other items of knowledge. You and I agree, eliminativist, that we can ostend something about our moment-to-moment introspective data. For instance, we can gesture at optical illusions. I simply insist that one of those somethings is epistemically impossible given physicalism; we couldn't have such qualitatively specific experiences as mere arrangements of atoms, though I certainly agree we could have unconscious mental states that causally suffice for my judgments to that effect. Eliminativist: Aren't you giving up the game the moment you concede that your judgments are just as well predicted by my interpretation of the data as by yours? If

0Alejandro111y

This is an excellent and fair summary of the debate. I think the one aspect it leaves out is that eliminativists differ from dualists in that they have internalized Quine's lessons about how we can always revise our conceptual schemes. I elaborated on this long ago in this post at my old blog.

3Rob Bensinger11y

I'm pretty confident Chalmers would disagree with this characterization. Chalmers accepts that our concepts can change, and he accepts that if zombies fall short of ideal conceivability — conceivability for a mind that perfectly understands the phenomena in question — then dualism will be refuted. That's why the Mary's Room thought experiment is about an ideally extrapolated reasoner. The weakness of such a thought experiment is, of course, that we may fail to accurately simulate an ideally extrapolated reasoner; but the strength is that this idealization has metaphysical significance in a way that mere failure of contemporary imagination doesn't. If contemporary science's best theory posits fundamental entities, then contemporary science posits fundamental entities. Science is not across-the-board ontologically agnostic or deflationary. Unless I'm misunderstanding you, your claim that a physical theory is equivalent to its Ramsey sentence is a rather different topic. I think Chalmers would respond that although this may be true for physical theories at the moment, it's a contingent, empirical truth — we happen to have discovered that we don't need to perform any ostensive acts, for instance, in fixing the meanings of our physical terms. If science discovered an exception to this generalization, science would not perish; it would just slightly complicate the set of linguistic rituals it currently uses to clarify what it's talking about. This isn't an assumption. It's an inference from the empirical character of introspection. That is, it has a defeasible (quasi-)perceptual basis. Many eliminativists want it to be the case that dualists are question-begging when they treat introspective evidence as evidence, but introspective evidence is evidence. Chalmers does not take it as axiomatic, prior to examining the way his stream of consciousness actually looks, that there is a special class of phenomenal concepts. I'm not a dualist, but I don't think any of Chalmers' a

0Alejandro111y

Thanks for your comments! In the second paragraph you quote, I was not trying to make a strong statement about scientific theories being equivalent to Ramsey sentences, though I see how that is a natural interpretation of it. I meant to support my previous paragraph about the lack of a strong distinction between conceptual implications and definitions, and contingent/nomological laws. For each "fundamental law of physics", there can be one axiomatization of physical theory where it is a contingent relation between fundamental entities, and another one where it is a definition or conceptual relation. It is central for Chalmers' viewpoint that the relation between consciousness and functional states is irreducibly contingent, but this kind of law would be unlike any other one in physics. I think you are mixing two things here: whether introspective evidence is evidence, which I agree to (e.g., when I "feel like I am seeing something green", I very likely am in the state of "seeing something green"); and whether that "stuff" that when we introspect we describe with phenomenal concepts must necessarily be described with those concepts (instead of with more sophisticated and less intuitive concepts, for which the zombie/Mary's Room/etc arguments would fail).

0Rob Bensinger11y

Yeah, Chalmers would agree that adding phenomenal consciousness would be a very profound break with the sort of theory physics currently endorses, and not just because it appears anthropomorphizing. I haven't yet seen a concept that my phenomenal states appear to fall under, that blocks Mary's Room or Zombie World. Not even a schematic, partly-fleshed-out concept. (And this is itself very surprising, given physicalism.)

9TheOtherDave11y

Can you say more about how you get from "X is an illusion" to "X isn't there in the first place"?

To clarify that question a little... suppose I'm thirsty in the desert, and am pursuing an image of water, and I eventually conclude to my disappointment that it is just a mirage. I'm doing two things here:

* I'm correcting an earlier false belief about the world -- my observation is not of water, but of a particular kind of light-distorting system of heated air.
* I'm making an implicit value judgment: I want water, I don't want a mirage, which is why I'm disappointed. The world is worse than I thought it was.

Those are importantly different. If I were, instead, a non-thirsty student of optics, I would still correct my belief but I might not make the same value judgment: I might be delighted to discover that what I'd previously thought was a mere oasis is instead an interesting mirage!

In the same spirit, suppose I discover that continuity, personhood, and existence are illusions, when I had previously thought they were something else (what that "something else" is, I don't really know). So, OK, I correct my earlier false belief about the world. There's still a value judgment left to make though... am I disappointed to realize I'm pursuing a mere illusion rather than the "something else" I actually wanted? Or am I delighted to discover that I'm pursuing a genuine illusion rather than an ill-defined "something else"? Your way of speaking seems to take the former for granted. Why is that?

Well, it will, and it won't. But in the sense I think you mean it, yes, that's right... it won't. Our values are what they are. Being less wrong improves our ability to implement those values, and our ability to articulate those values, which may in turn cause the values we're aware of and pursuing to become more consistent, but it doesn't somehow replace our values with superior values.

1endoself11y

I am confused about this as well. I think the right thing to do here is to recognize that there is a lot we don't know about, e.g. personhood, and that there is a lot we can do to clarify our thinking on personhood. When we aren't confused about this stuff anymore, we can look over it and decide what parts we really valued; our intuitive idea of personhood clearly describes something, even recognizing that a lot of the ideas of the past are wrong. Note also that we don't gain anything by remaining ignorant (I'm not sure if you've realized this yet).

[-]buybuydandavis11y110

What makes us think any of our terminal values aren't based on a misunderstanding of reality?

Much the same thing that makes me think my height isn't based on a misunderstanding of reality. Different category. I didn't understand my way into having terminal values. Understanding can illuminate your perceptions of reality and allow you to better grasp what is, but I don't think that your terminal values were generated by your understanding. Trying to do so is a pathological tail biting exercise.

0torekp11y

I disagree with your implied claim that terminal values are independent of understandings. I can't think of any human values that don't presuppose some facts. Edit: also see this comment by scientism.

1lmm11y

If I enjoy the subjective experience of thinking about something, I can't think of any conceivable fact that would invalidate that.

0torekp11y

Touché. (At least for an instantaneous "I" and instantaneous enjoyment.) Still, there are many terminal values that do presuppose facts.

[-]Dahlen11y100

everything we want is as possible and makes as much sense as wanting to hear the sound of blue or taste the flavor of a prime number

We know it isn't because most of the time we get what we want. You want chocolate, so you go and buy some and then eat it, and the yummy chocolatey taste you experience is proof that it wasn't that futile after all for you to want chocolate.

The feeling of reward we get when we satisfy some of our terminal values is what makes us think that they aren't based on a misunderstanding of reality. So it's probably a pretty good bet to keep wanting at least the things that have led to rewards in the past, even if we aren't as sure about the rest of them, like going back in time.

[-]Douglas_Knight11y70

Peter de Blanc wrote a paper on this topic.

I think this post is asking a very important and valuable question. However, I think it's limiting the possible answers by making some unnecessary and unjustified assumptions. I agree that Bob, as described, is screwed, but I think we are sufficiently unlike Bob that that conclusion does not apply to us.

As TheOtherDave says here,

I don't understand why you say "I want to travel back in time and ride a dinosaur" is meaningless. Even granting that it's impossible (or, to say that more precisely, granting that greater understanding of reality te

...

[-]TheOtherDave11y60

So, I'm basically ignoring the "terminal" part of this, for reasons I've belabored elsewhere and won't repeat here.

I agree that there's a difference between wanting to do X and wanting the subjective experience of doing X. That said, frequently people say they want the former when they would in fact be perfectly satisfied by the latter, even knowing it was the latter. But let us assume Bob is not one of those people, he really does want to travel back in time and ride a dinosaur, not just experience doing so or having done so.

I don't understand w...

[-]lukstafi11y50

All our values are fallible, but doubt requires justification.

[-]scientism11y40

You're right that a meaningless goal cannot be pursued, but nor can you be said to even attempt to pursue it - i.e., the pursuit of a meaningless goal is itself a meaningless activity. Bob can't put any effort into his goal of time travel, he can only confusedly do things he mistakenly thinks of as "pursuing the goal of time travel", because pursuing the goal of time travel isn't a possible activity. What Bob has learned is that he wasn't pursuing the goal of time travel to begin with. He was altogether wrong about having a terminal value of travelling back in time and riding a dinosaur because there's no such thing.

3linkhyrule511y

That seems obviously wrong to me. There's nothing at all preventing me from designing an invisible-pink-unicorn maximizer, even if invisible pink unicorns are impossible. For that matter, if we allow counterfactuals, an invisible-pink-unicorn maximizer still looks like an intelligence designed to maximize unicorns - in the counterfactual universe where unicorns exist, the intelligence takes actions that tend to maximize unicorns.

2lmm11y

How would you empirically distinguish between your invisible-pink-unicorn maximizer and something that wasn't an invisible-pink-unicorn maximizer? I mean, you could look for a section of code that was interpreting sensory inputs as number of invisible-pink-unicorns - except you couldn't, because there's no set of sensory inputs that corresponds to that, because they're impossible. If we're talking about counterfactuals, the counterfactual universe in which the sensory inputs that currently correspond to paperclips correspond to invisible-pink-unicorns seems just as valid as any other.

1TheOtherDave11y

Well, there's certainly a set of sensory inputs that corresponds to /invisible-unicorn/, based on which one could build an invisible unicorn detector. Similarly, there's a set of sensory inputs that corresponds to /pink-unicorn/, based on which one could build a pink unicorn detector. If I wire a pink unicorn detector up to an invisible unicorn detector such that a light goes on iff both detectors fire on the same object, have I not just constructed an invisible-pink-unicorn detector? Granted, a detector is not the same thing as a maximizer, but the conceptual issue seems identical in both cases.

0lmm11y

Maybe. Or maybe you've constructed a square-circle detector; no experiment would let you tell the difference, no? I think the way around this is some notion of which kind of counterfactuals are valid and which aren't. I've seen posts here (and need to read more) about evaluating these counterfactuals via surgery on causal graphs. But while I can see how such reasoning would work for an object that exists in a different possible world (i.e. a "contingently nonexistent" object) I don't (yet?) see how to apply it to a logically impossible ("necessarily nonexistent") object. Is there a good notion available that can say one counterfactual involving such things is more valid than another?

1TheOtherDave11y

Take the thing apart and test its components in isolation. If in isolation they test for squares and circles, their aggregate is a square-circle detector (which never fires). If in isolation they test for pink unicorns and invisible unicorns, their aggregate is an invisible-pink-unicorn detector (which never fires).
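The "take it apart" test can be illustrated with a toy model in Python. Everything here is an assumption made for the sketch rather than something from the thread: the `Thing` class, the rule that an invisible thing has no observable color (which is what makes an invisible pink unicorn unconstructible in this model), the four detector functions, and the `PROBES` list.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Thing:
    shape: str
    visible: bool
    color: Optional[str] = None

    def __post_init__(self) -> None:
        # Modelling assumption: color is only observable on visible things.
        if not self.visible and self.color is not None:
            raise ValueError("an invisible thing has no observable color")


Detector = Callable[[Thing], bool]


def pink_unicorn(t: Thing) -> bool:
    return t.shape == "unicorn" and t.color == "pink"


def invisible_unicorn(t: Thing) -> bool:
    return t.shape == "unicorn" and not t.visible


def square(t: Thing) -> bool:
    return t.shape == "square"


def circle(t: Thing) -> bool:
    return t.shape == "circle"


def conjunction(a: Detector, b: Detector) -> Detector:
    """Wire two detectors together: fire only if both fire on the same object."""
    return lambda t: a(t) and b(t)


PROBES: List[Thing] = [
    Thing("unicorn", True, "pink"),
    Thing("unicorn", False),
    Thing("square", True, "red"),
    Thing("circle", True, "blue"),
]


def firing_profile(d: Detector) -> List[bool]:
    return [d(p) for p in PROBES]


ipu_detector = conjunction(pink_unicorn, invisible_unicorn)
square_circle_detector = conjunction(square, circle)

# As wholes, the two composites are indistinguishable: neither ever fires.
print(firing_profile(ipu_detector) == firing_profile(square_circle_detector))
# In isolation, their components respond to different probes.
print(firing_profile(pink_unicorn) != firing_profile(square))
```

In this model the composites have identical (empty) firing behavior, so only the component-level test separates them, which is the distinction the comment relies on. Whether that settles the objection about logically impossible objects is a separate question the sketch does not address.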

0linkhyrule511y

That does not follow. I'll admit my original example is mildly flawed, but let's tack on something (that's still impossible) to illustrate my point: invisible pink telekinetic unicorns. Still not a thing that can exist, if you define telekinesis as "action at a distance, not mediated through one of the four fundamental forces." But now, if you see an object stably floating in vacuum, and detect no gravitational or electromagnetic anomalies (and you're in an accelerated reference frame like the surface of the earth, etc etc), you can infer the presence of an invisible telekinetic something. Or in general - an impossible object will have an impossible set of sensory inputs, but the set of corresponding sensory inputs still exists.

0Lumifer11y

Yeah, spooky action at a distance :-) Nowadays we usually call it "quantum entanglement" :-D

0linkhyrule511y

... I'm pretty sure no arrangement of entangled particles will create an object that just hovers a half-foot above the Earth's surface.

2bokov11y

Thank you, I think you articulated better than anybody so far what I mean by a goal turning out to be meaningless. Do you believe that a goal must persist down to the most fundamental reductionist level in order to really be a goal? If not, can/should methods be employed in the pursuit of a goal such that the methods exist at a lower level than the goal itself?

0scientism11y

I'm not quite sure what you're saying. I don't think there's a way to identify whether a goal is meaningless at a more fundamental level of description. Obviously Bob would be prone to say things like "today I did x in pursuit of my goal of time travel" but there's no way of telling that it's meaningless at any other level than that of meaning, i.e., with respect to language. Other than that, it seems to me that he'd be doing pretty much the same things, physically speaking, as someone pursuing a meaningful goal. He might even do useful things, like make breakthroughs in theoretical physics, despite being wholly confused about what he's doing.

When you said to suppose that "everything we want is [impossible]", did you mean that literally? Because normally if what you want is impossible, you should start wanting a different thing (or do that super-saiyan effort thing if it's that kind of impossible), but if everything is impossible, you couldn't do that either. If there is no possible action that produces a favorable outcome, I can think of no reason to act at all.

(Of course, if I found myself in that situation, I would assume I made a math error or something and start trying to do thin...

5bokov11y

I didn't mean it literally. I meant, everything on which we base our long-term plans. For example: You go to school, save up money, try to get a good job, try to advance in your career... on the belief that you will find the results rewarding. However, this is pretty easily dismantled if you're not a life-extensionist and/or cryonicist (and don't believe in an afterlife). All it takes is for you to have the realization that 1) If your memory of an experience is erased thoroughly enough (and you don't have access to anything external that will have been altered by the experience) then the experience might as well have not happened. Or, insofar as it altered you through some way other than your memories, it is interchangeable with any other experience that would have altered you in the same way. 2) In the absence of an afterlife, if you die all your memories get permanently deleted shortly after, and you have no further access to anything influenced by your past experiences including yourself. Therefore, death robs you of your past, present, and future, making it as if you had never lived. Obviously other people will remember you for a while, but you will have no awareness of that because you will simply not exist. Therefore, no matter what you do, it will get cancelled out completely. The way around it is to make a superhuman effort at doing the not-literally-prohibited-by-physics-as-far-as-we-know kind of impossible by working to make cryonics, anti-aging, uploading, or AI (which presumably will then do one of the preceding three for you) possible. But perhaps at an even deeper level our idea of what it is these courses of action are attempting to preserve is itself self-contradictory. Does that necessarily discredit these courses of action?

7gjm11y

Why? If I have to choose between "happy for an hour, then memory-wiped" and "miserable for an hour, then memory-wiped" I unhesitatingly choose the former. Why should the fact that I won't remember it mean that there's no difference at all between the two? One of them involves someone being happy for an hour and the other someone being miserable for an hour. How so? Obviously my experience 100 years from now (i.e., no experience since I will most likely be very dead) will be the same as if I had never lived. But why on earth should what I care about now be determined by what I will be experiencing in 100 years? I don't understand this argument when I hear it from religious apologists ("Without our god everything is meaningless, because infinitely many years from now you will no longer exist! You need to derive all the meaning in your life from the whims of an alien superbeing!") and I don't understand it here either.

3Viliam_Bur11y

If you know you will be memory-wiped after an hour, it does not make sense to make long-term plans. For example, you can read a book you enjoy, if you value the feeling. But if you read a scientific book, I think the pleasure from learning would be somewhat spoiled by knowing that you are going to forget this all soon. The learning would mostly become a lost purpose, unless you can use the learned knowledge within the hour. Knowing that you are unlikely to be alive after 100 years prevents you from making some plans which would be meaningful in a parallel universe where you are likely to live 1000 years. Some of those plans are good according to the values you have now, but are outside of your reach. Thus future death does not make life completely meaningless, but it ruins some value even now.

2gjm11y

I do agree that there are things you might think you want that don't really make sense given that in a few hundred years you're likely to be long dead and your influence on the world is likely to be lost in the noise. But that's a long way from saying -- as bokov seems to be -- that this invalidates "everything on which we base our long-term plans". I wouldn't spend the next hour reading a scientific book if I knew that at the end my brain would be reset to its prior state. But I will happily spend time reading a scientific book if, e.g., it will make my life more interesting for the next few years, or lead to higher income which I can use to retire earlier, buy nicer things, or give to charity, even if all those benefits take place only over (say) the next 20 years. Perhaps I'm unusual, or perhaps I'm fooling myself, but it doesn't seem to me as if my long-term plans, or anyone else's, are predicated on living for ever or having influence that lasts for hundreds of years.

3byrnema11y

First of all, I'm really glad we're having this conversation. This question is the one philosophical issue that has been bugging me for several years. I read through your post and your comments and felt like someone was finally asking this question in a way that has a chance of being understood well enough to be resolved! ... then I began reading the replies, and it's a strange thing: the inferential distance is so great in some places that I also begin to lose the meaning of your original question, even though I have the very same question. Taking a step back -- there is something fundamentally irrational about my personal concept of identity, existence and mortality. I walk around with this subjective experience that I am so important, and my life is so important, and I want to live always. On the other hand, I know that my consciousness is not important objectively. There are two reasons for this. First, there is no objective morality -- no 'judger' outside myself. This raises some issues for me, but since Less Wrong can address this to some extent, possibly more fully, let's put this aside for the time being. Secondly, even by my own subjective standards, my own consciousness is not important. In the aspects that matter to me, my consciousness and identity are identical to those of another. My family and I could be replaced by another and I really don't mind. (We could be replaced with sufficiently complex alien entities, and I don't mind, or with computer simulations of entities I might not even recognize as persons, and I don't mind, etc.) So why does everything -- in particular, my longevity and my happiness -- matter so much to me? Sometimes I try to explain it in the following way: although "cerebrally" I should not care, I do exist, as a biological organism that is the product of evolution, and so I do care. I want to feel comfortable and happy, and that is a biological fact. But I'm not really satisfied with this its-just-a-fact-that-I-care explanatio

1TheOtherDave11y

Can you say more about why "it's just a fact that I care" is not satisfying? Because from my perspective that's the proper resolution... we value what we value, we don't value what we don't value, what more is there to say?

1byrnema11y

It is a fact that I care, we agree. Perhaps the issue is that I believe I should not care -- that if I were more rational, I would not care. That my values are based on a misunderstanding of reality, just as the title of this post suggests. In particular, my values seem to be pinned on ideas that are not true -- that states of the universe matter, objectively rather than just subjectively, and that I exist forever/always. This "pinning" doesn't seem to be that critical -- life goes on, and I eat a turkey sandwich when I get hungry. But it seems unfortunate that I should understand cerebrally (to the extent that I am capable) that my values are based on an illusion, but that my biology demands that I keep on as though my values were based on something real. To be very dramatic, it is like some concept of my 'self' is trapped in this nonsensical machine that keeps on eating and enjoying and caring like Sisyphus. Put this way, it just sounds like a disconnect in the way our hardware and software evolved -- my brain has evolved to think about how to satisfy certain goals supplied by biology, which often includes the meta-problem of prioritizing and evaluating these goals. The biology doesn't care if the answer returned is 'mu' in the recursion, and furthermore doesn't care if I'm at a step in this evolution where checking-out of the simulation-I'm-in seems just as reasonable an answer as any other course of action. Fortunately, my organism just ignores those nihilistic opinions. (Perhaps this ignoring also evolved, socially or more fundamentally in the hardware.) I say fortunately, because I have other goals besides Tarski, or finding resolutions to these value conundrums.

0TheOtherDave11y

Well, if they are, and if I understand what you mean by "pinned on," then we should expect the strength of those values to weaken as you stop investing in those ideas. I can't tell from your discussion whether you don't find this to be true (in which case I would question what makes you think the values are pinned on the ideas in the first place), or whether you're unable to test because you haven't been able to stop investing in those ideas in the first place. If it's the latter, though... what have you tried, and what failure modes have you encountered?

0byrnema11y

My values seem to be pinned on these ideas (the ones that are not true) because while I am in the process of caring about the things I care about, and especially when I am making a choice about something, I find that I am always making the assumption that these ideas are true -- that the states of the universe matter and that I exist forever. When it occurs to me to remember that these assumptions are not true, I feel a great deal of cognitive dissonance. However, the cognitive dissonance has no resolution. I think about it for a little while, go about my business, and discover some time later that I forgot again. I don't know if a specific example will help or not. I am driving home, in traffic, and my brain is happily buzzing with thoughts. I am thinking about all the people in cars around me and how I'm part of a huge social network and whether the traffic is as efficient as it could be and civilization and how I am going to go home and what I am going to do. And then I remember about death, the snuffing out of my awareness, and something about that just doesn't connect. It's like I can empathize with my own non-existence (hopefully this example is something more than just a moment of psychological disorder) and I feel that my current existence is a mirage. Or rather, the moral weight that I've given it doesn't make sense. That's what the cognitive dissonance feels like.

0byrnema11y

I want to add that I don't believe I am that unusual. I think this need for an objective morality (objective value system) is why some people are naturally theists. I also think that people who think wire-heading is a failure mode, must be in the same boat that I'm in.

0Lightwave11y

I'm confused about what you mean by this. If there wasn't anything more to say, then nobody would/should ever change what they value? But people's values change over time, and that's a good thing. For example, in medieval/ancient times people didn't value animals' lives and well-being (as much) as we do today. If a medieval person tells you "well we value what we value, I don't value animals, what more is there to say?", would you agree with him and let him go on to burning cats for entertainment, or would you try to convince him that he should actually care about animals' well-being? You are of course using some of your values to instruct other values. But they need to be at least consistent, and it's not really clear which are the "more-terminal" ones. It seems to me byrnema is saying that privileging your own consciousness/identity above others is just not warranted, and if we could, we really should self-modify to not care more about one particular instance, but rather about how much well-being/eudaimonia (for example) there is in the world in general. It seems like this change would make your value system more consistent and less arbitrary, and I'm sympathetic to this view.

1lmm11y

Is that an actual change in values? Or is it merely a change of facts - much greater availability of entertainment, much less death and cruelty in the world, and the knowledge that humans and animals are much more similar than it would have seemed to the medieval worldview?

1TheOtherDave11y

The more I think about this question, the less certain I am that I know what an answer to it might even look like. What kinds of observations might be evidence one way or the other?

0lmm11y

Do people who've changed their mind consider themselves to have different values from their past selves? Do we find that when someone has changed their mind, we can explain the relevant values in terms of some "more fundamental" value that's just being applied to different observations (or different reasoning), or not? Can we imagine a scenario where an entity with truly different values - the good ol' paperclip maximizer - is persuaded to change them? I guess that's my real point - I wouldn't even dream of trying to persuade a paperclip maximizer to start valuing human life (except insofar as live humans encourage the production of paperclips) - it values what it values, it doesn't value what it doesn't value, what more is there to say? To the extent that I would hope to persuade a medieval person to act more kindly towards animals, it would be because of, and in terms of, the values that they already have, which would likely be mostly shared with mine.

1TheOtherDave11y

So, if I start out treating animals badly, and then later start treating them kindly, that would be evidence of a pre-existing valuing of animals which was simply being masked by circumstances. Yes? If I instead start out acting kindly to animals, and then later start treating them badly, is that similarly evidence of a pre-existing lack of valuing-animals which had previously been masked by circumstances? Or does it indicate that my existing, previously manifested, valuing of animals is now being masked by circumstances?

0lmm11y

Either that, or that your present kind-treating of animals is just a manifestation of circumstances, not a true value. Could be either. To figure it out, we'd have to examine those surrounding circumstances and see what underlying values seemed consistent with your actions. Or we could assume that your values would likely be similar to those of other humans - so you probably value the welfare of entities that seem similar to yourself, or potential mates or offspring, and so value animals in proportion to how similar they seem under the circumstances and available information.

0TheOtherDave11y

(nods) Fair enough. Thanks for the clarification.

0Lightwave11y

Well, whether it's a "real" change may be beside the point if you put it this way. Our situation and our knowledge are also changing, and maybe our behavior should also change. If personal identity and/or consciousness are not fundamental, how should we value those in a world where any mind-configurations can be created and copied at will?

0lmm11y

So there's a view that a rational entity should never change its values. If we accept that, then any entity with different values from present-me seems to be in some sense not a "natural successor" of present-me, even if it remembers being me and shares all my values. There seems to be a qualitative distinction between an entity like that and upload-me, even if there are several branching upload-mes that have undergone various experiences and would no doubt have different views on concrete issues than present-me. But that's just an intuition, and I don't know whether it can be made rigorous.

0TheOtherDave11y

Fair enough. Agreed that if someone expresses (either through speech or action) values that are opposed to mine, I might try to get them to accept my values and reject their own. And, sure, having set out to do that, there's a lot more to be relevantly said about the mechanics of how we hold values, and how we give them up, and how they can be altered. And you're right, if our values are inconsistent (which they often are), we can be in this kind of relationship with ourselves... that is, if I can factor my values along two opposed vectors A and B, I might well try to get myself to accept A and reject B (or vice-versa, or both at once). Of course, we're not obligated to do this by any means, but internal consistency is a common thing that people value, so it's not surprising that we want to do it. So, sure... if what's going on here is that byrnema has inconsistent values which can be factored along a "privilege my own identity"/"don't privilege my own identity" axis, and they net-value consistency, then it makes sense for them to attempt to self-modify so that one of those vectors is suppressed. With respect to my statement being confusing... I think you understood it perfectly, you were just disagreeing -- and, as I say, you might well be correct about byrnema. Speaking personally, I seem to value breadth of perspective and flexibility of viewpoint significantly more than internal consistency. "Do I contradict myself? Very well, then I contradict myself, I am large, I contain multitudes." Of course, I do certainly have both values, and (unsurprisingly) the parts of my mind that align with the latter value seem to believe that I ought to be more consistent about this, while the parts of my mind that align with the former don't seem to have a problem with it. I find I prefer being the parts of my mind that align with the former; we get along better.

0Lightwave11y

As humans we can't change/modify ourselves too much anyway, but what if we're able to in the future? What if you can pick and choose your values? It seems to me that, for such an entity, not valuing consistency is like not valuing logic. And then there's the argument that it leaves you open to Dutch booking / blackmail.

0TheOtherDave11y

Yes, inconsistency leaves me open to Dutch booking, which perfect consistency would not. Eliminating that susceptibility is not high on my list of self-improvements to work on, but I agree that it's a failing. Also, perceived inconsistency runs the risk of my being seen as unreliable, which has social costs. That said, being seen as reliable appears to be a fairly viable Schelling point among my various perspectives (as you say, the range is pretty small, globally speaking), so it's not too much of a problem. In a hypothetical future where the technology exists to radically alter my values relatively easily, I probably would not care nearly so much about flexibility of viewpoint as an intrinsic skill, much in the same way that electronic calculators made the ability to do logarithms in my head relatively valueless.
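For readers unfamiliar with the term, here is a minimal sketch of a money pump, the simplest relative of a Dutch book. Everything in it is an assumption made for illustration: the items `A`, `B`, `C`, the fee of 1, and the function name `run_money_pump` are hypothetical and do not come from the discussion above. An agent with cyclic preferences pays a small fee for every trade it strictly prefers and ends up holding what it started with, only poorer; an agent with transitive preferences stops trading once it holds its favorite item.

```python
# Each pair (x, y) means "the agent strictly prefers x to y".
# Both preference sets are hypothetical, chosen only for illustration.
CYCLIC_PREFS = {("A", "B"), ("B", "C"), ("C", "A")}      # A > B > C > A
TRANSITIVE_PREFS = {("A", "B"), ("B", "C"), ("A", "C")}  # A > B > C


def run_money_pump(prefs, start_item, offers, fee, money):
    """Offer items one at a time; the agent pays `fee` to swap whenever it
    strictly prefers the offered item to the one it currently holds."""
    item = start_item
    for offered in offers:
        if (offered, item) in prefs:
            item = offered
            money -= fee
    return item, money


# Three rounds of the same three offers.
offers = ["B", "A", "C"] * 3

cyclic_result = run_money_pump(CYCLIC_PREFS, "C", offers, fee=1, money=100)
transitive_result = run_money_pump(TRANSITIVE_PREFS, "C", offers, fee=1, money=100)

print(cyclic_result)      # the cyclic agent accepts every single offer
print(transitive_result)  # the transitive agent stops after reaching "A"
```

The cyclic agent accepts all nine offers and ends up holding `C` again, nine fees poorer, while the transitive agent pays twice and then refuses everything else. This only models strict pairwise preferences with a fixed fee; real arguments about Dutch books usually concern betting odds rather than item swaps.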

0lmm11y

My position would be that actions speak louder than thoughts. If you act as though you value your own happiness more than that of others... maybe you really do value your own happiness more than that of others? If you like doing certain things, maybe you value those things - I don't see anything irrational in that. (It's perfectly normal to self-deceive to believe our values are more selfless than they actually are. I wouldn't feel guilty about it - similarly, if your actions are good it doesn't really matter whether you're doing them for the sake of other people or for your own satisfaction) The other resolution I can see would be to accept that you really are a set of not-entirely-aligned entities, a pattern running on untrusted hardware. At which point parts of you can try and change other parts of you. That seems rather perilous though. FWIW I accept the meat and its sometimes-contradictory desires as part of me; it feels meaningless to draw lines inside my own brain.

0byrnema11y

Yes, this is where I'm at.

2Ishaan11y

Yes, under the assumption that you only value things that future-you will feel the effects of. If this is true, then all courses of action are equally rational and it doesn't matter what you do - you're at null. If you are the kind of being that values at least one thing that you will not directly experience, then the answer is no, these actions can still have worth. Most humans are like this, even if they don't realize it. Well...you'll still die eventually.

[-]Alex Flint11y20

Another way to think about Bob's situation is that his utility function assigns the same value to all possible futures (i.e. zero) because the one future that would've been assigned a non-zero value turned out to be unrealizable. His real problem is that his utility function has very little structure: it is zero almost everywhere.

I suspect our/my/your utility function is structured in a way that even if broad swaths of possible futures turn out to be unrealizable, the remainder will still contain gradients and local maxima, so there will be some more des...
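A minimal sketch of the contrast described above, in Python. The future names and the numeric utilities are invented purely for illustration, and `best_realizable` is a hypothetical helper, not anything from the comment. It compares a utility function that is zero almost everywhere with one that ranks several futures, after the top-ranked future turns out to be unrealizable.

```python
# Hypothetical utilities over a handful of named futures (illustrative only).
flat_utility = {"ride_dinosaur": 1.0, "study_physics": 0.0,
                "watch_movie": 0.0, "do_nothing": 0.0}

structured_utility = {"ride_dinosaur": 1.0, "study_physics": 0.6,
                      "watch_movie": 0.3, "do_nothing": 0.0}


def best_realizable(utility, unrealizable):
    """Drop the unrealizable futures, then return the best remaining futures
    and whether any preference among the remaining futures survives."""
    remaining = {f: u for f, u in utility.items() if f not in unrealizable}
    top = max(remaining.values())
    best = sorted(f for f, u in remaining.items() if u == top)
    # A preference survives only if the remaining utilities are not all equal.
    has_preference = len(set(remaining.values())) > 1
    return best, has_preference


impossible = {"ride_dinosaur"}
flat_result = best_realizable(flat_utility, impossible)
structured_result = best_realizable(structured_utility, impossible)

print(flat_result)        # every remaining future ties; no preference is left
print(structured_result)  # a single best future remains
```

With the flat function, removing the one valued future leaves a three-way tie and nothing to choose between. With the structured function, the remaining futures are still ranked, so there is still something to aim for.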

[-]cousin_it11y20

This comment by Wei Dai might be relevant, also see steven0461's answer.

No matter what the universe is, all you need for causal decision theory is that you live in a universe in which your actions have consequences, and you prefer some of the possible consequences over others. (You can adjust and alter this sentence for your preferred decision theory.)

What if that doesn't happen? What if you didn't prefer any consequence over any other, and you were quite certain no action you took would make any difference to anything that mattered?

Well, it's not a trick question ... you'll just act in any arbitrary way. It won't matter. All a...

[-][anonymous]11y20

What is the rational course of action in such a situation?

Being able to cast off self-contradictions (A is equal to negation-of-A) is as close as I can offer to a knowable value that won't dissolve. But I may be wrong, depending on what you mean by sufficient deconstruction. If the deconstruction is sufficient, it is sufficient, and therefore sufficient, and you've answered your own question: we cannot know. Which leads to the self-contradiction that we know one thing, and that is that we cannot know anything, including that we cannot know anything.

Self-co...

[-]Scott Garrabrant11y20

I live my life under the assumption that I do have achievable values. If I had no values that I could achieve and I was truly indifferent between all possible outcomes, then my decisions do not matter. I can ignore any such possible worlds in my decision theory.
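One way to make the last step concrete is a small expected-utility sketch. It assumes an expected-utility maximizer; the two actions, the probabilities, and the payoffs are made up for illustration and are not taken from the comment. A possible world in which every action has the same utility adds the same constant to each action's expected utility, so it cannot change which action comes out best.

```python
def expected_utility(action, worlds):
    """Probability-weighted utility of `action` across the given worlds."""
    return sum(w["prob"] * w["utility"][action] for w in worlds)


def best_action(actions, worlds):
    """Return the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, worlds))


ACTIONS = ["work", "rest"]

# Hypothetical worlds. In the second, no outcome is better than any other.
values_achievable = {"prob": 0.7, "utility": {"work": 10.0, "rest": 4.0}}
nothing_matters = {"prob": 0.3, "utility": {"work": 0.0, "rest": 0.0}}

# Decide with the indifferent world included, then with it left out.
# Leaving it out without renormalizing only rescales the totals, which
# does not affect the ranking of actions.
choice_with_both = best_action(ACTIONS, [values_achievable, nothing_matters])
choice_ignoring = best_action(ACTIONS, [values_achievable])

gap_with_both = (expected_utility("work", [values_achievable, nothing_matters])
                 - expected_utility("rest", [values_achievable, nothing_matters]))
gap_ignoring = (expected_utility("work", [values_achievable])
                - expected_utility("rest", [values_achievable]))

print(choice_with_both, choice_ignoring)
print(gap_with_both, gap_ignoring)
```

Both calls pick the same action, and the gap between the two actions is the same either way, because the indifferent world contributes equally to both. The sketch says nothing about how likely such a world is, only that it drops out of the comparison.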

5bokov11y

So, to clarify: We don't know what a perfectly rational agent would do if confronted with all goals being epistemically irrational, but there is no instrumental value in answering this question because if we found ourselves in such a situation we wouldn't care. Is that a fair summary? I don't yet know if I agree or disagree, right now I'm just making sure I understand your position.

0Scott Garrabrant11y

I believe that is a fair summary of my beliefs. Side note: Before I was convinced by EY's stance on compatibilism of free will, I believed in free will for a similar reason.

[-]blacktrance11y10

My terminal value is my own happiness. I know that it exists because I have experienced it, and experience it regularly. I can't imagine a world in which someone convinces me that I don't experience something that I experience.

94hodmt11y

"Happiness" as a concept sounds simple in the same way "a witch did it" sounds simple as an explanation. Most people consider wireheading to be a failure state, and defining "happiness" so as to avoid wireheading is not simple.

0blacktrance11y

Happiness as a feeling is simple, though it may be caused by complex things. If wireheading would make me happy - that is, give me the best possible enjoyable feeling in the world - I'd wirehead. I don't consider that a failure state.

[-]Manfred11y00

Lukeprog's metaethics posts went over this - so how about getting the right answer for him? :)

[-]fubarobfusco11y00

Terminal values are part of the map, not the territory.

1Crux11y

What does this mean? Terminal values are techniques by which we predict future phenomena? Doesn't sound like we're talking about values anymore, but my only understanding of what it would mean for something to be part of the map is that it would be part of how we model the world, i.e. how we predict future occurrences.

0fubarobfusco11y

The agents that we describe in philosophical or mathematical problems have terminal values. But what confidence have we that these problems map accurately onto the messy real world? To what extent do theories that use the "terminal values" concept accurately predict events in the real world? Do people — or corporations, nations, sub-agents, memes, etc. — behave as if they had terminal values? I think the answer is "sometimes" at best. Sometimes humans can be money-pumped or Dutch-booked. Sometimes not. Sometimes humans can end up in situations that look like wireheading, such as heroin addiction or ecstatic religion ... but sometimes they can escape them, too. Sometimes humans are selfish, sometimes spendthrift, sometimes altruistic, sometimes apathetic, sometimes self-destructive. Some humans insist that they know what humans' terminal values are (go to heaven! have lots of rich, smart babies! spread your memes!) but other humans deny having any such values. Humans are (famously) not fitness-maximizers. I suggest that we are not necessarily anything-maximizers. We are an artifact of an in-progress amoral optimization process (biological evolution) and possibly others (memetic evolution; evolution of socioeconomic entities); but we may very well not be optimizers ourselves at all.

0chaosmage11y

They're theories by which we predict future mental states (such as satisfaction) - our own or those of others.

[-]Locaha11y-20

Heh. It's even worse than that. The idea that Bob is a single agent with terminal values is likely wrong. There are several agents comprising Bob and their terminal values change constantly, depending on the weather.

[-]BaconServ11y-20

An agent optimized to humanity's CEV would instantly recognize that trying to skip ahead would be incredibly harmful to our present psychology; without dreams—however irrational—we don't tend to develop well in terms of CEV. If all of our values break down over time, a superintelligent agent optimized for our CEV will plan for the day our dreams are broken, and may be able to give us a helping hand and a pat on the back to let us know that there are still reasons to live.

This sounds like the same manner of fallacy associated with determinism and the ignorance of the future being derived from the past through the present rather than by a timeless external "Determinator."

0TheOtherDave11y

I think you're vastly underestimating the magnitude of that "helping hand." By way of analogy... a superintelligent agent optimized for (or, more to the point, optimizing for) solar system colonization might well conclude that establishing human colonies on Mars is incredibly harmful to our present physiology, since without oxygen we don't tend to develop well in terms of breathing. It might then develop techniques to alter our lungs, or alter the environment of Mars in such a way that our lungs can function better there (e.g., oxygenate it). An agent optimizing for something that relates to our psychology, rather than our physiology, might similarly develop techniques to alter our minds, or alter our environment in such a way that our minds can function better.

-2BaconServ11y

I think you're vastly underestimating the magnitude of my understanding. In the context of something so shocking as having our naive childhood dreams broken, is there some superintelligent solution that's supposed to be more advanced than consoling you in your moment of grief? To be completely honest, I wouldn't expect a humanity CEV agent to even bother trying to console us; we can do that for each other and it knows this well in advance, it's got bigger problems to worry about. Do you mean to suggest that a superintelligent agent wouldn't be able to foresee or provide solutions to some problem that we are capable of dreaming up today? You'll have to forgive me, but I'm not seeing what it is about my comment that gives you reason to think I'm misunderstanding anything here. Do you expect an agent optimized to humanity's CEV is going to use suboptimal strategies for some reason? Will it give us a helping interstellar spaceship when all it really needed to do, to effectively solve whatever spaceflight-unrelated microproblem exists in our psychology at that moment, was give a simple pat on the back?

2TheOtherDave11y

Yes. No. No. No.

0BaconServ11y

Fair enough.