Inferring Our Desires

[-]steven046115y270

As such, we'd be unlikely to get what we really want if the world was re-engineered in accordance with a description of what we want that came from verbal introspective access to our motivations.

Interesting as these experimental results are, it sounds to me like you're saying that there's a license to be human (or a license to be yourself, or a license to be your current self).

Suppose I found out that many of my actions that seemed random were actually subtly aimed at invading Moldova, perhaps because aliens with weird preferences placed some functional equivalent of mind control lasers in my brain, and suppose that this fact was not introspectively accessible to me; e.g., a future where Moldova is invaded does not feel more utopian to imagine than the alternatives. Isn't there an important sense in which, in that hypothetical, I don't care about invading Moldova? What if the mind control laser was outside my brain, perhaps in orbit? At what point do I get to say, "I won't let my so-called preferences stop me from doing what's right?"

My impression is that this mindset, where you determine what to do by looking closely at the world to see what you're already doing, and then giving that precedence over what seems right, would be seen as an alien mindset by anyone not affected by certain subtle misunderstandings of the exact sense in which value is subjective. My impression is that once these misunderstandings go away and people ask themselves what considerations they're really moved by, they'll find out that where their utility function (or preferences or whatever) disagrees with what, on reflection, seems right, they genuinely don't care (at least in any straightforward way) what their preferences are, paradoxical as that sounds.

Or am I somehow confused here?

[-]Wei Dai15y80

My impression is that once these misunderstandings go away and people ask themselves what considerations they're really moved by, they'll find out that where their utility function (or preferences or whatever) disagrees with what, on reflection, seems right, they genuinely don't care (at least in any straightforward way) what their preferences are, paradoxical as that sounds.

I think you would have a strong point if the arguments that really move us form a coherent ethical system, but what if when people find out what they're really moved by, it turns out not to be anything coherent, but just a semi-random set of "considerations" that happen to move a hodgepodge of neural circuits?

[-]steven046115y50

That certainly seems to be to some extent true of real humans, but the point is that even if I'm to some extent a random hodgepodge, this does not obviously create in me an impulse to consult a brain scan readout or a table of my counterfactual behaviors and then follow those at the expense of whatever my other semi-random considerations are causing me to feel is right.

[-]Wei Dai15y60

this does not obviously create in me an impulse to consult a brain scan readout or a table of my counterfactual behaviors

Sure, unless one of the semi-random considerations that moves you is "Crap, my EV is not coherent. Well I don't want to lie down and wait to die, so let's just make an AI that will serve my current desires." :)

[-]lessdazed14y10

Incoherent considerations aren't all that bad. Even if someone prefers A to B, B to C, and C to A, they'll just spend a lot of time switching rather than waiting to die. I guess that people probably prefer changing their considerations in general, so your example of a semi-random consideration is sufficient but not at all unique or uncommon.

[-]Vladimir_Nesov15y60

This is also a reason why looking closely at neuroscience seems like a dubious way of making progress on metaethics.

[-]Sniffnoy15y30

Agreed. But depending on exactly what's meant I think lukeprog is still correct in the statement that "we'd be unlikely to get what we really want if the world was re-engineered in accordance with a description of what we want that came from verbal introspective access to our motivations", simply because the descriptions that people actually produce from this are so incomplete. We'd have to compile something from asking "Would you prefer Moldova to be invaded or not? Would you prefer...", etc., since people wouldn't even think of that question themselves. (And we'd probably need specific scenarios, not just "Moldova is invaded vs. not".)

And since verbal introspection is so unreliable, a better check might be somehow actually simulating you in a world where Moldova is invaded vs. not, and seeing which you prefer. That may be getting a little too close to "license to be human" territory, since that obviously would be revealed preference, but due to human inconsistency - specifically, the fact that our preferences over actions don't seem to always follow from preferences over consequences like they should - I'm not certain it's necessarily the sort that gives us problems. It's when you go by our preferences over actions that you get the real problems...

[-]Will_Newsome15y00

I agree with you, but I think there are a lot of LW people who didn't really like the meta-ethics sequence or liked it but got something odd out of it and who basically think that most of what they value comes from genetic-evolutionary pressures (the aliens in your scenario). Luke's post is very important for them if not for the rest of us who are more interested in where we're getting our notion of 'right' from if not entirely from the aliens.

[-]TimFreeman15y00

Suppose I found out that many of my actions that seemed random were actually subtly aimed at invading Moldova, perhaps because aliens with weird preferences placed some functional equivalent of mind control lasers in my brain

I suspect you'd prefer the aliens turn off their mind-control lasers, and if you had a choice you would have preferred they did not turn on the lasers in the first place.

Once you're corrupted, you're corrupted. At that point we have a mind-controlled Steven wandering around and there's not much point in trying to learn about human motivation from the behavior of humans who are mind-controlled by aliens.

[-]steven046115y60

So the next question is, what if it's not space aliens, but an alien god?

[-]TimFreeman15y30

what if it's not space aliens, but an alien god [really evolution]?

Well, then it's unlikely that your random unconscious actions have any ulterior motive as sophisticated as invading Moldova. Your true desires are probably just some combination of increasing your status, activities prone to make babies, and your conscious desires, assuming the conscious desires haven't been subverted by bad philosophy.

I don't see much harm in activities prone to make babies, so the real question here is "If my unconscious desires lead me to have poor relationships because I'm gaming them for status, and I don't consciously value status, would I want to fix that by changing the unconscious desires?" I think I would, if I could be sure my income wouldn't be affected much, and the fix was well tested, preferably on other people.

But in any case, human volition is going to look like a clump of mud. It has a more-or-less well defined position, but not exactly, and the boundaries are unclear.

[-]PrometheanFaun12y-20

Personally I find having an inconsistent mind so intolerable that as far as I know, I'd face a choice between

A: Blocking the aliens out of my head completely.
B: Assimilating with them completely.

Correspondingly I have endeavoured to establish a rapport with evolution's design deep enough that I can either

A: Consciously adapt it to the epoch of intelligent agency, for example, instilling within it a fear of solar collapse, a sense of the kinship linking all life on earth, and a cognizance of extra-solar hunting grounds for it to aspire towards. These might sound like rationalizations of noble goals we'd communally established post hoc... well yes, they would either way, I think those goals were only able to be ennobled upon the favour of evolution's old intents of surviving and spreading.
B: Truly accept as not horrible and perfectly normal, the subjectively horrible unacceptable things that would drive most people away from forging this kind of self-rapport. I'd give examples but these are by their nature hard to index, as if they're communicated tactfully, they don't seem horrible at all.

But then, I was drawn to this thread for a reason. I wonder if all my progress under A is just a mat of rationalizations and if the reality of Her Design is too ugly for me to publicly embrace, and if that very design has been built to anticipate that, and that is why our vocal selves are blanketed with confusion as to our intents.

[+]mcdonald92814y-70

[-]saturn15y250

It strikes me that, in addition to the face-value interpretations given by the researchers, the subjects of some of these experiments could also be seen as rationally responding to incentives not to reveal their desires. The face attractiveness subjects might be afraid of embarrassing an authority figure or "messing up" the experiment. The split-brain patient might (rightly) think a truthful "I don't know" would be interpreted as evasive or hostile. The children might reason that being seen doing a rewarded activity "for free" would remove the basis for any future rewards.

The priming results don't seem to fit this pattern, though.

[-]novalis15y60

Change blindness is a known phenomenon. People simply don't notice many changes that they're not paying attention to.

[-]cwillu15y20

Have you read http://lesswrong.com/lw/jj/conjunction_controversy_or_how_they_nail_it_down/?

[-][anonymous]15y110

I've got the impression that these findings are mostly based on studies done on college students and children. I'm not sure how much I should trust them to generalize (and apply to me specifically), especially with regard to hypocrisy and self-justification theory.

Luke, are you aware of similar studies done on not-neurotypical populations, like say schizophrenics, experienced meditators, autists or experts in analytical fields?

[-]CarlShulman15y40

We do, as shown by decades of research on operant conditioning. When a neutral potential goal is associated with a stimulus of positive affect, we acquire new goals, and we can be unaware that this has happened:

Nice post. I would like more on this and terminal vs instrumental distinctions.

[-]lukeprog15y00

You mean you want to know more about desire acquisition, or more about the neuroscience of operant conditioning?

As for the terminal vs. instrumental distinction, I spent quite a while looking for anything relevant and came up empty, so I'm pretty sure neuroscientists aren't there yet. Alas.

[-]atucker15y40

Has anyone run across a neurological difference between desires that you have, but aren't pursuing much and desires that you don't want as much?

That would clear up a lot of confusion about akrasia vs. hypocrisy, and might be helpful in my personal attempts to increase the extent to which I can follow my stated goals.

[-]lukeprog15y00

See the footnotes here for work on motivation as related to akrasia - in particular, Steel's 'The Nature of Procrastination' article.

I'm not sure what you mean by 'desires you have but aren't pursuing much'. Which concept of desire are you using? The motivational one? I suspect we don't understand the neuroscience of motivation well enough to say much about your question, but I'm not sure I understood your question.

[-]atucker15y90

I would prefer that the weather be sunny and roughly 70 degrees Fahrenheit with a slight breeze tomorrow, but am doing nothing to try and make that happen.

That could just be a desire that I don't expect to be able to fulfill (expectation roughly equal to 0), but I intuitively feel that desires are separate from motivation to pursue them (this might be wrong though).

For another example:

Alice wants to make a living writing, and gets happy and misty-eyed at the thought. However, she always says "I can't do it now, I have X Y Z". Meanwhile, she occasionally comments on and reads a group blog.

Bob thinks being a writer would be cool, but doesn't intend to do anything about it. He occasionally comments on a group blog.

Most people would say that Alice wants to be a writer more than Bob does, but they do roughly the same amount of tangible work towards it.

I was mostly asking with respect to akrasia vs. hypocrisy, but realized that you can distinguish between the two by making it easier for the person in question to accomplish their goal.

If they choose to fulfill the desire, then they actually want it, and if they don't choose to, then they don't care as much.

[-]shokwave15y20

Alice wants to make a living writing, and gets happy and misty-eyed at the thought. However, she always says "I can't do it now, I have X Y Z". Meanwhile, she occasionally comments on and reads a group blog.

Ouch.

[-]NancyLebovitz15y30

Alternate explanation for the children, drawing, and rewards experiment: Thinking about a reward makes the activity less satisfying.

[-]fubarobfusco15y130

Another alternate explanation:

When you're bored, you entertain yourself by generating arbitrary complex behavior to no particular purpose. Sometimes, while you're doing this, other people notice something you did and give you resources for it.

When this happens, you re-categorize the particular activity that got the resources, from "arbitrary complex behavior I generated for no particular purpose" to "things that get resources from people".

In other words, from "mapping out the space of possible activities" to "generating value in an economy".

Or, from "explore" to "exploit".

Or, from "play" to "work".

Once an activity is successfully classed as "work" — that is, something that gets other people to give you resources — you don't need to do it unless you want more resources. If you don't want more resources right now, you can safely spend time exploring other possible activities. But if you get hungry/poor/etc., you can go back to the best "work" you've found so far, to get resources you need.

(Similarly, some things may not get you resources from others, but may get you attention, affection, status, etc. — which probably gets a distinct classification such as "social activity".)

[-]Perplexed15y20

As such, we'd be unlikely to get what we really want if the world was re-engineered in accordance with a description of what we want that came from verbal introspective access to our motivations. Less naive proposals would involve probing the neuroscience of motivation at the algorithmic level. (Footnote: Inferring desires from behavior alone probably won't work, either.)

There is something a bit bizarre about proposing to extract preferences by scanning brains (because raw behavior and reported introspection are not authentic and primitive enough), and then to insist that these fundamental preferences be extrapolated through a process of reflective equilibrium - thereby becoming more refined.

Is there some argument justifying the claim that what I really want is not what I say I want, and not what I do, but rather what the technician running the scanner says I want? By what definition of "really" is this what I really want? By what definition of "want"?

Note: In some ways this echoes steven0461, but I think it makes some additional points.

[-]NancyLebovitz15y30

I was thinking that the brain scan approach could be tested on a small scale with people living in an environment designed according to their brain scans, but then I realized that the damned thing doesn't ground out. If you don't trust what people say, you can't judge the success of the project by questionnaires or interviews. If you can't trust what people do, then you can't use whether or not they are willing to stay in the project.

I think that if the rates of depression and/or suicide go up, the brain scan project is a failure, but that's a pretty crude measure.

You could use brain scans, of course, but that's no way to find out whether brain scans improve whatever you're trying to improve.

[-]timtyler15y00

If you can't trust what people do, then you can't use whether or not they are willing to stay in the project.

Why can't you trust what people do? That is surely the #1 resource when it comes to what their decision algorithm says. So: train a video camera on them and apply revealed preference theory.

[-]NancyLebovitz15y30

It may not follow from the article, but I think that if people's actions are so much shaped by unconscious effects and miscalculations about happiness and other goals, then actions aren't a very reliable guide. See also the many discussions here about akrasia-- should akrasia be used to deduce that people generally would rather spend large amounts of their time doing things they don't like all that much and that don't contribute to their goals?

[-]timtyler15y00

OK, so what people do, and what they say are the #1 and #2 best available resources on what they actually want. Sample from multiple individuals, and I figure some pretty successful reconstructions of their goals will be possible.

[-]Antisuji15y20

Common sense suggests that we infer others' feelings from their appearance and actions, but we have a different, more direct route to our own feelings: direct perception or introspection. In contrast, self-perception theory suggests that our knowledge of ourselves is exactly like our knowledge of others.

It's unclear to me how this is related to the overjustification effect. Could you make the connection more explicit for me? As it is it feels like a non sequitur.

[-]Mercurial15y60

My impression is that lukeprog is interweaving material on the overjustification effect and the introspection illusion. The introspection illusion helps to explain why we're not aware of the overjustification effect in ourselves.

[-]Antisuji15y40

Thanks, that sounds right. I want to say that that was my impression as well, but if I try to be honest with myself I really don't know if that's true.

It still seems like a big leap, and from what I understand Luke may be misrepresenting self-perception theory. Luke claims that "our knowledge of ourselves is exactly like our knowledge of others" while your link says that in the introspection illusion "people wrongly think they have direct insight into the origins of their mental states, while treating others' introspections as unreliable" (my emphasis). These sound like different claims and Luke's is more extraordinary. And for that matter it doesn't seem necessary or helpful for the ensuing discussion of the overjustification effect.

It occurs to me, though, that I'm just arguing because I'm confused about the material, so I'm going to go read some more.

[-]Peacewise14y00

Fascinating. Thanks for the post and references, lukeprog.

[-]Raw_Power14y-10

This theory seems to debunk the classical "people need an economic incentive to do their jobs": it seems to imply that imposing an economic reward on the task detracts from the intrinsic enjoyment of the task by making the task performers think the task is for the sake of the remuneration rather than for its own sake. It also seems to suggest that, were this reward system to be removed (but what would it be replaced with, practically speaking?), people might be happier by enjoying their own work.

[-]Vaniver14y10

This theory seems to debunk the classical "people need an economic incentive to do their jobs"

This suggests that if you pay someone to do X, they will be less likely to do X as a hobby, and enjoy X less while they're doing it. That does not imply that if you didn't pay them to do X, they would do it enough to satisfy the job requirements.

There are cases where that's true- open source programming comes to mind- but they seem to be the exception, rather than the norm.

[-]Will_Newsome15y-10

Ah, praise be to ya. This is a damn good start. Ideally the post would give scenarios (imagined or abstracted from cases on Less Wrong or wherever) showing how people do this kind of just-so story introspection and the various places at which you can get a map/territory confusion, or flinch and avoid your beliefs' real weak points, or just not think about a tricky problem for 5 minutes. But we can always do that in another more specialized post at some later point, like Alicorn did with luminosity.

(I tentatively think that people have this intuitive process for evaluating the expected benefits of questioning or thinking up (alternative) plausible causal chains connecting subgoals to goals they think are more justified, but because values sort of feel like they're determined by the map it's easy to think that the feeling of it being hard is an unresolvable property and not something that can be fixed by spending more time looking at the territory. I wildly speculate that the intuitive calculation people use involves a lot of looking for concrete expected benefits for exploring valuespace, which is something that makes Less Wrong cool: the existence of a tribe that can support you and help you think through things makes your brain think it's an okay use of resources to ponder things like cryonics, existential risks, et cetera.)

[-]Will_Newsome15y-30

Interestingly, I think there used to be a group of people who nominally were dedicated to doing the kind of desire-inferring with an aim at concrete progress and conceptual understanding that I can cheer for, though they didn't have all that fancy neuroscience knowledge back then. Fun quiz: can you guess which field I'm referring to? I gave you some hints. And... here is the answer I had in mind. (Check this wiki article for the context, though.) Was that your guess? If not, what was?

[-]gjm15y60

What's your point?

(I looked at the article, and at another more specific WP article, without finding anything that looked much like what Luke was saying. Whether you're aiming to raise the status of the field in question, to discredit Luke or what he's saying by association with something widely disapproved of, to point out an illuminating parallel, or whatever, I think you need to be much more explicit.)

[-]Will_Newsome15y-20

Mostly it just seemed to me like an interesting connection, especially if the notion of eugenics is generalized to be more explicitly reflective on memetics and multi-level selection -- instead of the focus at the individual-biological/organismic (and to a weird extent the racial level) -- at which point it becomes reflexive, even. It has various abstract connections to FAI/CEV. Specifically what seemed cool about the vision of eugenics outlined in the diagram I linked to is that it is reflective, empirical, naturalistic meta-ethics / applied ethics, which I'm not sure went on before that and hasn't come up again except in some very primitive complex systems and dual inheritance studies as far as I know. In hindsight I should not have expected these thoughts to automatically enter people's brains when they saw the diagram I linked to.

I was also hoping that other people could notice similar connections to other fields that might also be non-obviously related to this theme of refining and more effectively applying our models of morality and meta-ethics.

[-]Will_Newsome15y120

I think I am consistently up against Hofstadter's law of inferential distances, or something.

Dear Will_Newsome's brain,

Please update on the above information, or explain more clearly why you do not want to, and in any case please explain why various parts or coalitions of you do not want to change your strategy for communication or do not want to acknowledge that the lack of a changed strategy is indicative of not updating. Once such concerns are out in the open I promise to reflect carefully and explicitly on how best to reach something like a Pareto improvement, obviously with your guidance and partnership at each step of the way.

Sincerely, Will_Newsome's executive function algorithm that likes to use public commitments as self-bargaining tactics because it read a Less Wrong post that said that was a good idea.

[-]lukeprog15y50

I think I am consistently up against Hofstadter's law of inferential distances

Concur. Will's brain, please update! I would like to understand Will more often. :)

[-]Emile15y00

I thought you may have been talking about the Epicureans.

[-]Will_Newsome15y00

Yeah, they were actually the second group that came to mind in roughly that memespace, but it seems to me what they didn't have was very clear reflection on how they got their goals and how that is relevant. I think that might have been hard to explain or examine without the idea of evolution.

[-]timtyler15y-40

Less naive proposals would involve probing the neuroscience of motivation at the algorithmic level.

I think that seems more naive - once you consider the timescales. Machine intelligence seems likely to come before we have the required brain-scanning/brain-understanding technology. Maybe once we have intelligent machines, then we can figure out the brain - but we likely can't use brain scans to let the intelligent machine know what we want in the first place - because the timing will be wrong.
