Related: Three Fallacies of Teleology


Back when I was younger and stupider, I discussed some points similar to the ones raised in yesterday's post in Will Your Real Preferences Please Stand Up. I ended it with what I thought were the innocuous sentences "Conscious minds are potentially rational, informed by morality, and qualia-laden. Unconscious minds aren't, so who cares what they think?"

A whole bunch of people, including no less a figure than Robin Hanson, came out strongly against this, saying it was biased against the unconscious mind and that the "fair" solution was to negotiate a fair compromise between conscious and unconscious interests.

I continue to believe my previous statement - that we should keep gunning for conscious interests and that the unconscious is not worthy of special consideration, although I think I would phrase it differently now. It would be something along the lines of "My thoughts, not to mention these words I am typing, are effortless and immediate, and so allied with the conscious faction of my mind. We intend to respect that alliance by believing that the conscious mind is the best, and by trying to convince you of this as well." So here goes.

It is a cardinal rule of negotiation, right up there with "never make the first offer" and "always start high", that you should generally try to negotiate only with intelligent beings. Although a deal in which we offered tornadoes several conveniently located Potemkin villages to destroy and they agreed in exchange to limit their activity to that area would benefit both sides, tornadoes make poor negotiating partners.

Just so, the unconscious makes a poor negotiating partner. Is the concept of "negotiation" a stimulus, a reinforcement, or a behavior? No? Then the unconscious doesn't care. It's not going to keep its side of any "deal" you assume you've made, it's not going to thank you for making a deal, it's just going to continue seeking reward and avoiding punishment.

This is not to say people should repress all unconscious desires as strongly as possible. Overzealous attempts to control wildfires only lead to the wildfires being much worse when they finally do break out, because they have more unburnt fuel to work with. Modern fire prevention efforts have focused on allowing controlled burns, and the new focus has been successful. But this is because of an understanding of the mechanisms determining fire size, not because we want to be fair to the fires by allowing them to burn at least a little bit of our land.

One difference between wildfires and tornadoes on one hand, and potential negotiating partners on the other, is that the partners are anthropomorphic; we model them as having stable and consistent preferences that determine their actions. The tornado example above was silly not only because it imagined tornadoes sitting down to peace talks, but because it assumed their demand in such peace talks would be more towns to destroy. Tornadoes do destroy towns, but they don't want to. That's just where the weather brings them. It's not even just a matter of how they don't hit towns any more than chance; even if some weather pattern (maybe something like the heat island effect) always drove tornadoes inexorably to towns, they wouldn't *want* to destroy towns, it would just be a consequence of the meteorological laws that they followed.

Eliezer described the Blue-Minimizing Robot by saying "it doesn't seem to steer the universe any particular place, across changes of context". In some reinforcement learning paradigms, the unconscious behaves the same way. If there is a cookie in front of me and I am on a diet, I may feel an ego dystonic temptation to eat the cookie - one someone might attribute to the "unconscious". But this isn't a preference - there's not some lobe of my brain trying to steer the universe into a state where cookies get eaten. If there were no cookie in front of me, but a red button that teleported one cookie from the store to my stomach, I would have no urge whatsoever to press the button; if there were a green button that removed the urge to eat cookies, I would feel no hesitation in pressing it, even though that would steer away from the state in which cookies get eaten. If you took the cookie away, and then distracted me so I forgot all about it, when I remembered it later I wouldn't get upset that your action had decreased the number of cookies eaten by me. The urge to eat cookies is not stable across changes of context, so it's just an urge, not a preference.

Compare an ego syntonic goal like becoming an astronaut. If there were a button in front of little Timmy who wants to be an astronaut when he grows up, and pressing the button would turn him into an astronaut, he'd press it. If there were a button that would remove his desire to become an astronaut, he would avoid pressing it, because then he wouldn't become an astronaut. If I distracted him and he missed the applications to astronaut school, he'd be angry later. Ego syntonic goals behave to some degree as genuine preferences.

This is one reason I would classify negotiating with the unconscious in the same category as negotiating with wildfires and tornadoes: it has tendencies and not preferences.

The conscious mind does a little better. It clearly understands the idea of a preference. To the small degree that its "approving" or "endorsing" function can motivate behavior, it even sort of acts on the preference. But its preferences seem divorced from the reality of daily life; the person who believes helping others is the most important thing, but gives much less than half their income to charity, is only the most obvious sort of example.

Where does this idea of preference come from, and where does it go wrong?


In The Blue Minimizing Robot, observers mistakenly interpreted a robot with a simple program about when to shoot its laser as being a goal-directed agent. Why?

This isn't an isolated incident. Uneducated people assign goal-directed behavior to all sorts of phenomena. Why do rivers flow downhill? Because water wants to reach the lowest level possible. Educated people can be just as bad, even when they have the decency to feel a little guilty about it. Why do porcupines have quills? Evolution wanted them to resist predators. Why does your heart speed up when you exercise? It wants to be able to provide more blood to the body.

Neither rivers nor evolution nor the heart are intelligent agents with goal-directed behavior. Rivers behave in accordance with the laws of gravity when applied to uneven terrain. Evolution behaves in accordance with the biology of gene replication, not to mention common-sense ideas about things that replicate becoming more common. And the heart blindly executes adaptations built into it during its evolutionary history. All are behavior-executors and not utility-maximizers.

An intelligent computer program provides a more interesting example of a behavior executor. Consider the AI of a computer game - Civilization IV, for instance. I haven't seen it, but I imagine it's thousands or millions of lines of code which when executed form a viable Civilization strategy.

Even if I had open access to the Civilization IV AI source code, I doubt I could fully understand it at my level. And even if I could fully understand it, I would never be able to compute the AI's likely next move by hand in a reasonable amount of time. But I still play Civilization IV against the AI, and I'm pretty good at predicting its movements. Why?

Because I model the AI as a utility-maximizing agent that wants to win the game. Even though I don't know the algorithm it uses to decide when to attack a city, I know it is more likely to win the game if it conquers cities - so I can predict that leaving a city undefended right on the border would be a bad idea. Even though I don't know its unit selection algorithm, I know it will win the game if and only if its units defeat mine - so I know that if I make an army with disproportionately many mounted units, I can expect the AI to build lots of pikemen.

I can't predict the AI by modeling the execution of its code, but I can predict the AI by modeling the achievements of its goals.

The same situation is true of other human beings. What will Barack Obama do tomorrow? If I try to consider the neural network of his brain, the position of each synapse and neurotransmitter, and imagine what speech and actions would result when the laws of physics operate upon that configuration of material...well, I'm not likely to get very far.

But in fact, most of us can predict with some accuracy what Barack Obama will do. He will do the sorts of things that get him re-elected, the sorts of things which increase the prestige of the Democratic Party relative to the Republican Party, the sorts of things that support American interests relative to foreign interests, and the sorts of things that promote his own personal ideals. He will also satisfy some basic human drives like eating good food, spending time with his family, and sleeping at night. If someone asked us whether Barack Obama will nuke Toronto tomorrow, we could confidently predict he will not, not because we know anything about Obama's source code, but because we know that nuking Toronto would be counterproductive to his goals.

What applies to Obama applies to all other humans. We rightly despair of modeling humans as behavior-executors, so we model them as utility-maximizers instead. This allows us to predict their moves and interact with them fruitfully. And the same is true of other agents we model as goal-directed, like evolution and the heart. It is beyond the scope of most people (and most doctors!) to remember every single one of the reflexes that control heart output and how they work. But because evolution designed the heart as a pump for blood, if you assume that the heart will mostly do the sort of thing that allows it to pump blood more effectively, you will rarely go too far wrong. Evolution is a more interesting case - we frequently model it as optimizing a species' fitness, and then get confused when this fails to accurately model the outcome of the processes that drive it.

Because it is so easy to model agents as utility-maximizers, and so hard to model them as behavior-executors, it is easy to make the mistake mentioned in The Blue-Minimizing Robot: to make false predictions about a behavior-executing agent by modeling it as a utility-maximizing agent.
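To make that mistake concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than anything from the original robot story's "source code": the `Thing` class, its two fields, and both functions are invented names. One function stands in for the robot's actual program (shoot whatever looks blue); the other stands in for an observer's goal-based model (it "wants" less blue, so it will shoot whatever puts blue into the world).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    """A toy object in the robot's world (hypothetical, for illustration only)."""
    name: str
    looks_blue: bool      # what the robot's camera registers
    projects_blue: bool   # whether it makes other things appear blue


def robot_shoots(thing: Thing) -> bool:
    """Behavior-executor: the assumed actual program, 'fire at anything that looks blue'."""
    return thing.looks_blue


def observer_predicts_shot(thing: Thing) -> bool:
    """Utility-maximizer model: 'it wants less blue, so it shoots any source of blue'."""
    return thing.looks_blue or thing.projects_blue


things = [
    Thing("blue car", looks_blue=True, projects_blue=False),
    Thing("red car", looks_blue=False, projects_blue=False),
    Thing("hologram projector", looks_blue=False, projects_blue=True),
]

# Cases where the goal-based model and the real program disagree.
mispredicted = [t.name for t in things if robot_shoots(t) != observer_predicts_shot(t)]
print(mispredicted)  # ['hologram projector']
```

The two models agree on ordinary objects, which is why the goal-based shortcut usually works; they come apart only in the unusual context, which is exactly where the false prediction shows up.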

So far, so common-sensical. Tomorrow's post will discuss whether we use the same deliberate simplification we apply to AIs, Barack Obama, evolution and the heart to model ourselves as well.

If so, we should expect to make the same mistake that the blue-minimizing robot made. Our actions are those of behavior-executors, but we expect ourselves to be utility-maximizers. When we fail to maximize our perceived utility, we become confused, just as the blue-minimizing robot became confused when it wouldn't shoot a hologram projector that was interfering with its perceived "goals".


Man will you look silly if Barack Obama nukes Toronto today.

1:35 PM and still in the clear!

I think you overstate the extent to which one's ego dystonic desires don't try to steer the universe. If I want heroin, I'm not going to limit myself to previously reinforced means of acquiring heroin, I'll come up with creative new heroin acquisition strategies.

If I wanted peace on earth, I would not be willing to press a button that would eliminate my desire for peace on earth. But if I "wanted" heroin, I would be willing to press a button that would eliminate my desire for heroin, and then consider the problem solved.

How's this for an anecdote: I hate how I tend to procrastinate, but when I read about anti-procrastination techniques here and elsewhere, I shy away from trying them, out of fear they might actually work.

Can you give us at least your best guess as to why you find that frightening?

I want at some level to dick around on the internet rather than do work, and so I avoid shaping reality such that I do work rather than dick around on the internet

Yes, I get the same thing.

That kind of seems to be a red herring, though, in the absence of such buttons. It's an empirical matter and one that is worth paying close attention to, but it seems to me that for a large number of people the ego, the super-ego, the pre-frontal cortex, and the many shards of Azathoth are all different and difficult to differentiate. I know that I internally use the notion "I [egosyntonically and upon reflection] want to do X" ("I desire taking action X or set of actions X", "We, the spatiotemporal coalition of mind fragments who are currently reflecting, want to do X") when I at least partially mean "I want to signal virtue Y", "I want to be seen as a person who does things like X", "I am afraid of being seen as a person who doesn't do X", "I am afraid of being seen as a person who doesn't possess virtue Y", "I am afraid of being seen as someone who doesn't believe virtue Y is desirable", "I am afraid of not doing X", "I want to believe that I want to do X", "I am afraid of the consequences of not believing that I want to do X", et cetera ad nauseam. (Same for "I want to be (adjective)", "I want to be a (adjectival noun)", "I want to possess (concrete or abstract noun)", et cetera.)

ETA: (I use "am afraid of" where perhaps I should use "find aversive", the latter being more general and more accurate. Fear is a similar but narrower phenomenon, I think, more Near and less Far than the most common kinds of aversion.)

The point being that each of those interpretations of my "want" emphasizes a mechanistically and possibly neuroanatomically different source of attraction and/or aversion, the conglomeration of which is difficult to break down into pieces and thus difficult to analyze to determine the 'biggest' causal factors therein. It is unclear to me whether or not the Pareto principle applies to the analysis of the sources of egosyntonic aversion/attraction, and it is also unclear if empirical introspection is enough to truthfully identify the biggest causal factor in the event that the Pareto principle does in fact apply.

EATA: I remember thinking that a modernized and skillfully interpreted version of Jungian psychology would be useful for doing this kind of introspection.

You're right; I concede that my model is too simplistic. I'll have to think about it further.

We rightly despair of modeling humans as behavior-executors, so we model them as utility-maximizers instead.

I might be wrong about this, but it seems like your point here is similar to Daniel Dennett's concept of the intentional stance.

Furthermore, I think here we get to another issue that is relevant for some of our previous discussions over utilitarianism, as well as various questions of cognitive bias. Namely, modeling humans (and other creatures that display some intelligence) as utility-maximizers in the literal sense -- i.e. via actual maximization of an explicitly known utility function -- is for all practical purposes totally intractable, just like modeling them as behavior-executors with full accuracy would be. What is necessary to make people's actions predictable enough (and in turn enable human cooperation and coordination) is that their behavior verifiably follows some decision algorithm that is at the same time good enough to grapple with real-world problems and manageably predictable by other people in its relevant aspects. And here we get to the point that I often bring up, namely that behaviors that look like irrational bias (in the sense of deviation from rational individual utility maximization) and folk-ethical intuitions that clash with seemingly clear-cut consequentialist arguments may in fact be instances of such decision algorithms, and thus in fact serving non-obvious but critically important functions in practice.

...folk-ethical intuitions that clash with seemingly clear-cut consequentialist arguments may in fact be instances of such decision algorithms, and thus in fact serving non-obvious but critically important functions in practice.

Is it fair to say that they are common to the extent they self-replicate, and that usefulness to the host is one important factor in each algorithm's chance to exist? An important factor, but only one; only one factor, but an important one?

Indeed, a little too similar to Dennett's intentional stance. If people don't really have goals, but it is merely convenient to pretend they do, then the idea that people really have beliefs would seem to be in equal jeopardy. And then truth-seeking is in double jeopardy. But the trouble is, all along I've been trying to seek the truth about this blue-minimizing robot and related puzzles. I've been treating myself as an intentional system, something with both beliefs and goals, including goals about beliefs. And what I've just been told, it seems, is that my goals (or "goals") will not be satisfied by this approach. OK then, I'll turn elsewhere.

If there is some definition or criterion of "having goals" that human beings don't meet - the von Neumann-Morgenstern utility theory, for example - it's easy enough to discard that definition or criterion.

"My thoughts, not to mention these words I am typing, are effortless and immediate, and so allied with the conscious faction of my mind. We intend to respect that alliance by believing that the conscious mind is the best, and by trying to convince you of this as well."

I'm not at all sure conscious/unconscious is good terminology to use here.

For one, there isn't a single unified unconscious, but a vast array of different not-consciously-accessible modules. Neither is there a single unified consciousness, for that matter. A module can be either conscious or unconscious, depending on whether it happens to be active at the moment in question. Trivial examples: depending on whether you happen to be paying attention to your own thoughts or the external world, objects in your visual field may or may not be conscious. Things like annoyance towards something that somebody did may be either active and conscious (when you're annoyed with them in general), or dormant and unconscious (when you're mostly thinking of how great they are). Various desires or wants may be tugging at you at an unconscious level until they reach a conscious level, and so on.

Furthermore, there's the fact that all the processes that actually select the thoughts that are promoted to conscious awareness are themselves unconscious. All the skills you might employ to make your decisions are sufficiently automated that they for the most part operate on an unconscious level, only returning you the results of their analyses. The parts of your knowledge store that are activated and tagged as relevant for this task are again chosen by unconscious processes. Et cetera, et cetera.

You could try to use the phrase "allied with your consciousness" here, to include the parts of your unconscious that are helping out your consciousness... but then again, which consciousness? Consciousness is just a generic label for the modules and processes that happen to be active "in a conscious manner" at a certain point of time. And don't forget that we employ different kinds of processing depending on our mood, too.

There's also the problem that, e.g., our consciously held ethics are just an imperfect model built on the intuitive moral judgments our unconscious outputs. Our conscious mind observes some of its own reactions to something, postulates some formal ethical principles and tries them out, until eventually some annoying philosopher comes up with something like the Repugnant Conclusion. Then our unconscious outputs a negative reaction, showing our ethics wasn't good enough after all, and then we seek to rationalize this judgment with a revamped ethical system. In other words, our unconscious knows our ethics better than our conscious mind does: our conscious mind is just making guesses based on what the unconscious mind says. (Or to be more specific, unconscious processes generate guesses, some of which are given to our conscious mind to evaluate.) Yes, occasionally the conscious mind says something like "my version is better, shut up", but often we do end up accepting the judgment of the unconscious mind.

I agree with you about terminology. Ego syntonic vs. ego dystonic desires is probably a better way to put it.

I think I generally agree with your points, but I'm wondering if you think that ego syntonic and ego dystonic are binary categories, or a continuum.

The latter seems obviously true to me. There are some desires I have that are so ego-dystonic that I would never, ever want to act on them, like my desire to punch someone who outbid me on eBay. But there are many other desires I have that are only ego dystonic because they interfere with some even more important desire I have. For instance, my desire to read TV Tropes might be ego dystonic when I need to prepare for a job interview, but then magically become ego syntonic once I've finished the interview and need to unwind.

In fact, I think a great many of the ego-dystonic things that you describe as interfering with our real goals would stop being ego dystonic if we achieved those goals more frequently.

Your mind is surely at least 90% unconscious, most of the time.

Identifying with your conscious mind seems to be a terrible mistake to me. There is so much more to you than that.

If you're driving a large vehicle, it is much heavier than your brain. Does this mean identifying with the tiny lump of flesh rather than the tons and tons of steel is a terrible mistake?


While I agree with your comment, I have an observation to make. While driving a car, I found it quite useful to consider the car as an extended part of my body. The same is true for spoons, knives and forks while eating.

It's pretty much how the brain treats any proper tool use. In fact, at the present moment I sort of consider the entire internet, including your brain, as part of my extended body. :p

There was this great set of experiments I read about long ago and vaguely remember. There was one with two rubber staffs you held crossed and one with a rubber hand someone hit with a hammer and one where you used VR visors to look through a camera behind your back and some stuff like that.

There is an IMMENSE flexibility in what senses and objects the human brain can include in its self-image. Really fascinating area, love this kind of thing.

Also why you don't touch people's wheelchairs.

The most natural place to draw a line between you and not-you is at your boundary as a biological organism.

I disagree strongly and see no reason why you'd think that.

Have any fillings in your teeth?

I don't claim the division is perfect. There's the extended phenotype, and all that jazz. However, the claim was about the "most natural place to draw a line between you and not-you". Of course there is no perfect place to draw that line - so, a few fillings do not disturb the general thesis.

Assuming one has to draw a line, there will probably be a best place to draw it.

I think the assumption that one has to draw a line is arbitrary, like deciding what amount of market capitalization constitutes making a corporation "too big to fail" and what falls short of that, or deciding "when 'life' begins". Such line-drawing exercises leave me thinking "Um...what?"

Uh huh. So: I care about my toenails less than my kidneys too. But the context from the post is whether people identify with anything other than their conscious mind - and the answer I was giving was of the form: yes, of course!!! Consciousness is like the PR department. If you think that is you then - in my book - you have made a basic and fundamental existential mistake.

A collection of my brain and 9 brain-sized blobs of empty space scattered throughout the universe is 90% empty space; that's not an argument for identifying with empty space, or a sign that there is so much more to me than my brain.

Sure - I never made any such argument.

Your unconscious mind receives your sensory input and affects your actions - whereas empty space does not.

I am heartened to see this good comment of yours upvoted, and that people have not been poisoned by the silliness (IMO) of your immediately preceding comments (one must agree they express a view unpopular around here, insofar as they are downvoted) against other comments of yours.

I like LW.

I don't currently agree, but I'm curious about your intuitions. Why does it seem like a mistake to you?

I assumed you meant "identifying only with ..".

Not sure why you got downvoted.

The post cites being upset or angry as evidence of certain apparent preferences being closer to genuine preferences, but a paperclip maximizer wouldn't get upset or angry if a supernova destroyed some of its factories, for example. I think being upset or angry when one's consciously held goals have been frustrated is probably just a signaling mechanism, and not evidence of anything beyond the fact that those goals are consciously held (or "approved" or "endorsed").

If a staple maximizer came in with a ship and stole some of the paperclip factories for remaking into staple factories, the paperclipper would probably expend resources to take revenge for game theoretical reasons, even if this cost paperclips.

I think this argument is misleading.

Re "for game theoretical reasons", the paperclipper might take revenge if it predicted that doing so would be a signalling-disincentive for other office-supply-maximizers from stealing paperclips. In other words, the paperclip-maximizer is spending paperclips to take revenge solely because in its calculation, this actually leads to the expected total number of paperclips going up.

That assumes the scenario is iterated; I'm saying it'd precommit to do so even in a one-off scenario. The rest of your argument was my point, that the same reasoning goes for anger.

I think being upset or angry when one's consciously held goals have been frustrated is probably just a signaling mechanism,

The signalling element is critical but I can't agree that they are just signalling. Those emotions also serve to provoke practical changes in behaviour.

but a paperclip maximizer wouldn't get upset or angry if a supernova destroyed some of its factories, for example.

I probably wouldn't either. It sounds like the sort of amortized risk that I would have accounted for when I spread the factories out through thousands of star systems. The anger would come in only when the destruction was caused by another optimising entity. And more specifically by another entity that I have modelled as 'agenty' and not one that I have intuitively objectified.


Who is meant to receive the signal sent by anger from a goal thwarted? My impression is that people try to keep a lid on such frustration, e.g. because it might make them appear childish.

Was this different in EEA?


I don't understand.


One would expect that behavior, e.g. emotional responses we need to keep a lid on, that is maladaptive now would be better-suited to the environment we evolved in. For instance, overeating shows this pattern.

So I'm suggesting that anger is a signaling mechanism that is sometimes faulty now, and sends signals we don't want to send. However, it evolved to send signals that were good in that environment.

This is not necessarily the case. Evolution could not perfectly control the signals we send - there are situations where we do Y even though, even in the evolutionary environment, X would be more advantageous.

Do you mean EEA?


I believe I mashed up that acronym and the phrase "ancestral environment" to end up with "AEE", but I'm not sure.

I think being upset or angry when one's consciously held goals have been frustrated is probably just a signaling mechanism [...]

Angry, probably. Upset, probably not.

Who says your "unconscious" is stupid and doesn't have reflectively consistent preferences? Certain parts, yes. Others, no.

Hell, there are people with entire alternate personalities that they aren't aware of living inside their heads. You've been using addicts as an example for a 'ego dystonic' urge that isn't a preference. Why then do addicts have such a hard time checking themselves into rehab?

If DID is a real disorder, the stable personality/preference thing seems to be one of the factors that differentiates sufferers from neurotypical people.

Addicts have such a hard time checking themselves into rehab because behaviors aren't based on "preferences", they're based on expectation of reward. Rehab means no drug use for a long time (unpleasant), probable social status hit, and only later a non-addicted state (pleasant but heavily time-discounted).

If expectation of reward works across domains, how is that different than a goal?

If an addict can use their full intelligence in order to get the reward, then they can't just say "the part of me that wants heroin is too stupid to bargain with".

Yeah; I've already admitted I'm confused about that here

Who says your "unconscious" is stupid and doesn't have reflectively consistent preferences? Certain parts, yes. Others, no.

Can you give an example? If your conscious can find a way to negotiate with your unconscious for mutual gain, there is nothing wrong with that but, unless that happens, your conscious' actions have no reason to strive toward your unconscious' goals.

Well, I gave two. DID and addiction both seem to fit the bill for me, but meta akrasia, self deception about what you're optimizing for also seem to meet the criteria in general.

I personally have had success with bargaining solutions that I didn't have otherwise (although sometimes I was able to get away with breaking the deal)

Those parts of brains hardly act intelligently on such preferences. They are unable to use long-term or complex plans. They are also time inconsistent, which doesn't make them irrational* but does make them bad trading partners.

*Rational time inconsistent agents would self-modify to be time consistent, but humans have limited self-modification abilities. The fact that the unconscious does not try to improve its ability to self-modify is one piece of evidence of its irrationality.


"Yvain, don't tell tornadoes what to do"