Conflicts Between Mental Subagents: Expanding Wei Dai's Master-Slave Model


Scott Alexander

Related to: Alien Parasite Technical Guy, A Master-Slave Model of Human Preferences

In Alien Parasite Technical Guy, Phil Goetz argues that mental conflicts can be explained as a conscious mind (the "alien parasite") trying to take over from an unsuspecting unconscious.

Last year, Wei Dai presented a model (the master-slave model) with some major points of departure from Phil's: in particular, the conscious mind was a special-purpose subroutine and the unconscious had a pretty good idea what it was doing [1]. But Wei said at the beginning that his model ignored akrasia.

I want to propose an expansion and slight amendment of Wei's model so it includes akrasia and some other features of human behavior. Starting with the signaling theory implicit in Wei's writing, I'll move on to show why optimizing for signaling ability would produce behaviors like self-signaling and akrasia, speculate on why the same model would also promote some of the cognitive biases discussed here, and finish with even more speculative links between a wide range of conscious-unconscious conflicts.

The Signaling Theory of Consciousness

This model begins with the signaling theory of consciousness. In the signaling theory, the conscious mind is the psychological equivalent of a public relations agency. The mind-at-large (hereafter called U for “unconscious” and similar to Wei's “master”) has the socially unacceptable primate drives you would expect of a fitness-maximizing agent: sex, status, and survival. These are unsuitable for polite society, where only socially admirable values like true love, compassion, and honor are likely to win you friends and supporters. U could lie and claim to support the admirable values, but most people are terrible liars and society would probably notice.

So you wall off a little area of your mind (hereafter called C for “conscious” and similar to Wei's “slave”) and convince it that it has only admirable goals. C is allowed access to the speech centers. Now if anyone asks you what you value, C answers "Only admirable things like compassion and honor, of course!" and no one detects a lie because the part of the mind that's moving your mouth isn't lying.

This is a useful model because it replicates three observed features of the real world: people say they have admirable goals, they honestly believe on introspection that they have admirable goals, but they tend to pursue more selfish goals. But so far, it doesn't answer the most important question: why do people sometimes pursue their admirable goals and sometimes not?

Avoiding Perfect Hypocrites

In the simplest case, U controls all the agent's actions and has the ability to set C's values, and C only controls speech. This raises two problems.

First, you would be a perfect hypocrite: your words would have literally no correlation to your actions. Perfect hypocrites are not hard to notice. In a world where people are often faced with Prisoners' Dilemmas against which the only defense is to swear a pact to mutually cooperate, being known as the sort of person who never keeps your word is dangerous. A recognized perfect hypocrite could make no friends or allies except in the very short-term, and that limitation would prove fatal or at least very inconvenient.

The second problem is: what would C think of all this? Surely after the twentieth time protesting its true eternal love and then leaving the next day without so much as a good-bye, it would start to notice it wasn't pulling the strings. Such a realization would tarnish its status as "the honest one"; it couldn't tell the next lover it would remain forever true without a little note of doubt creeping in. Just as your friends and enemies would soon realize you were a hypocrite, so C itself would realize it was part of a hypocrite and find the situation incompatible with its idealistic principles.

Other-signaling and Self-Signaling

You could solve the first problem by signaling to others. If your admirable principle is to save the rainforest, you can loudly and publicly donate money to the World Wildlife Fund. When you give your word, you can go ahead and keep it, as long as the consequences aren't too burdensome. As long as you are seen to support your principles enough to establish a reputation for doing so, you can impress friends and allies and gain in social status.

The degree to which U gives permission to support your admirable principles depends on the benefit of being known to hold the admirable principle, the degree to which supporting the principle increases others' belief that you genuinely hold the principle, and the cost of the support. For example, let's say a man is madly in love with a certain woman, and thinks she would be impressed by the sort of socially conscious guy who believes in saving the rainforest. Whether or not he should donate $X to the World Wildlife Fund depends on how important winning the love of this woman is to him, how impressed he thinks she'd be to know he strongly believes in saving the rainforests, how easily he could convince her he supports the rainforests with versus without a WWF donation - and, of course, the value of X and how easily he can spare the money. Intuitively, if he's really in love, she would be really impressed, and it's only a few dollars, he would do it; but not if he's not that into her, she doesn't care much, and the WWF won't accept donations under $1000.
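The three factors above can be put into a toy decision rule. This is only a minimal sketch, assuming the factors combine linearly; the function name `should_signal`, its parameters, and the numbers are all hypothetical illustrations, not a claim about how U actually computes anything.

```python
def should_signal(benefit, belief_with, belief_without, cost):
    """Toy rule: does U permit a costly show of support for a principle?

    benefit        -- value to U of being believed to hold the principle
    belief_with    -- chance the observer believes it if U pays the cost
    belief_without -- chance the observer believes it anyway
    cost           -- cost of the supporting action, in the same units as benefit

    Assumption (illustrative only): the expected gain is the benefit times
    the increase in the observer's belief, and U acts when gain exceeds cost.
    """
    gain = benefit * (belief_with - belief_without)
    return gain > cost


# Madly in love, she would be very impressed, and it is only a few dollars.
print(should_signal(benefit=1000, belief_with=0.9, belief_without=0.3, cost=5))

# Not that into her, she barely cares, and the minimum donation is $1000.
print(should_signal(benefit=50, belief_with=0.5, belief_without=0.4, cost=1000))
```

The first call comes out in favor of donating and the second against, matching the intuition in the example. A nonlinear rule would work just as well for the argument; all that matters is that permission rises with the benefit and the belief shift and falls with the cost.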

Such signaling also solves the second problem, the problem of C noticing it's not in control - but only partly. If you only give money when you're with a love interest and ey's standing right there, and you only give the minimum amount humanly possible so as to not repulse your date, C will notice that also. To really satisfy C, U must support admirable principles on a more consistent basis. If a stranger comes up and gives a pitch for the World Wildlife Fund, and explains that it would really help a lot of rainforests for a very low price, U might realize that C would get a little suspicious if it didn't donate at least a token amount. This kind of signaling is self-signaling: trying to convince part of your own mind.

This model modifies the original to include akrasia [2] (U refusing to pursue C's goals) and the limitations on akrasia (U pursues C's goals insofar as it has to convince other people - and C itself - that its signaling is genuine).

It also provides a key to explaining some superficially weird behavior. A few weeks ago, I saw a beggar on the sidewalk and walked to the other side of the street to avoid him. This isn't sane goal-directed behavior: either I want beggars to have my money, or I don't. But under this model, once the beggar asks for money, U has to give it or risk C losing some of its belief that it is compassionate and therefore being unable to convince others it is compassionate. But as long as it can avoid being forced to make the decision, it can keep both its money and C's innocence.

Thinking about this afterward, I realized how silly it was, and now I consider myself unlikely to cross the street to avoid beggars in the future. In the language of the model, C focuses on the previously subconscious act of avoiding the beggar and realizes it contradicts its principles, and so U grudgingly has to avoid such acts to keep C's innocence and signaling ability intact.

Notice that this cross-the-street trick only works if U can act without C being fully aware what happened or its implications. As we'll see below, this ability of U's has important implications for self-deception scenarios.

From Rationality to Rationalization

So far, this model has assumed that both U and C are equally rational. But a rational C is a disadvantage for U for exactly the reasons mentioned in the last paragraph; as soon as C reasoned out that avoiding the beggar contradicted its principles, U had to expend more resources giving money to beggars or lose compassion-signaling ability. If C is smart enough to realize that its principle of saving the rainforest means you ought to bike to work instead of taking the SUV, U either has to waste resources biking to work or accept a decrease in C's environmentalism-signaling ability. Far better that C never realizes it ought to bike to work in the first place.

So it's to U's advantage to cripple C. Not completely, or it loses C's language and reasoning skills, but enough that it falls in line with U's planning most of the time.

“How, in detail, does U cripple C?” is a restatement of one of the fundamental questions of Less Wrong and certainly too much to address in one essay, but a few suggestions might be in order:

- The difference between U and C seems to have a lot to do with two different types of reasoning. U seems to reason over neural inputs – it takes in things like sense perceptions and outputs things like actions, feelings, and hunches. This kind of reasoning is very powerful – for example, it can take as an input a person you've just met and immediately output a calculation of their value as a mate in the form of a feeling of lust – but it can also fail in weird ways, like outputting a desire to close a door three dozen times into the head of an obsessive-compulsive, or succumbing to things like priming. C, the linguistic one, seems to reason over propositions – it takes propositions like sentences or equations as inputs, and returns other sentences and equations as outputs. This kind of reasoning is also very powerful, and also produces weird errors like the common logical fallacies.

- When U takes an action, it relays it to C and claims it was C's action all along. C never wonders why its body is acting outside of its control; only why it took an action it originally thought it disapproved of. This relay can be cut in some disruptions of brain function (most convulsions, for example, genuinely seem involuntary), but remains spookily intact in others (if you artificially activate parts of the brain that cause movement via transcranial magnetic stimulation, your subject will invent some plausible sounding reason for why ey made that movement) [3].

- C's crippling involves a tendency for propositional reasoning to automatically cede to neural reasoning and to come up with propositional justifications for its outputs, probably by assuming U is right and then doing some kind of pattern-matching to fill in blanks. For example, if you have to choose to buy one of two cars, and after taking a look at them you feel you like the green one more, C will try to come up with a propositional argument supporting the choice to buy the green one. Since both propositional and neural reasoning are a little bit correlated with common sense, C will often hit on exactly the reasoning U used (for example, if the red car has a big dent in it and won't turn on, it's no big secret why U's heuristics rejected it) but in cases where U's justification is unclear, C will end up guessing and may completely fail to understand the real reasons behind U's choice. Training in luminosity can mitigate this problem, but not end it.

- A big gap in this model is explaining why sometimes C openly criticizes U, for example when a person who is scared of airplanes says “I know that flying is a very safe mode of transportation and accidents are vanishingly unlikely, but my stupid brain still freaks out every time I go to an airport”. This might be justifiable along the lines that allowing C to signal that it doesn't completely control mental states is less damaging than making C look like an idiot who doesn't understand statistics – but I don't have a theory that can actually predict when this sort of criticism will or won't happen.

- Another big gap is explaining how and when U directly updates on C's information. For example, it requires conscious reasoning and language processing to understand that a man on a plane holding a device with a countdown timer and shouting political and religious slogans is a threat, but a person on that plane would experience fear, increased sympathetic activation, and other effects mediated by the unconscious mind.

This part of the model is fuzzy, but it seems safe to assume that there is some advantage to U in changing C partially, but not completely, from a rational agent to a rubber-stamp that justifies its own conclusions. C uses its propositional reasoning ability to generate arguments that support U's vague hunches and selfish goals.

How The World Would Look

We can now engage, with a little bit of cheating, in some speculation about how a world of agents following this modified master-slave model would look.

You'd claim to have socially admirable principles, and you'd honestly believe these claims. You'd act on these principles at the limited level expected by society: for example, if someone comes up to you and asks you to donate money to children in Africa, you might give them a dollar, especially if people are watching. But you would not act on them beyond the level society expects: for example, even though you might consciously believe saving a single African child (estimated cost: $900) is more important than a plasma TV, you would be unlikely to stop buying plasma TVs so you could give this money to Africa. Most people would never notice this contradiction; if you were too clever to miss it you'd come up with some flawed justification; if you were too rational to accept flawed justifications you would just notice that it happens, get a bit puzzled, call it “akrasia”, and keep doing it.

You would experience borderline cases, where things might or might not be acceptable, as moral conflicts. A moral conflict would feel like a strong desire to do something, fighting against the belief that, if you did it, you would be less of the sort of person you want to be. In cases where you couldn't live with yourself if you defected, you would cooperate; in cases where you could think up any excuse at all that allowed you to defect and still consider yourself moral, you would defect.

You would experience morality not as a consistent policy to maximize utility across both selfish and altruistic goals, but as a situation-dependent attempt to maximize feelings of morality, which could be manipulated in unexpected ways. For example, as mentioned before, going to the opposite side of the street from a beggar might be a higher-utility option than either giving the beggar money or explicitly refusing to do so. In situations where you were confident in your morality, you might decide moral signaling was an inefficient use of resources – and you might dislike people who would make you feel morally inferior and force you to expend more resources to keep yourself morally satisfied.

Your actions would be ruled by “neural reasoning” that outputs expectations different from the ones your conscious reasoning would endorse. Your actions might hinge on fears which you knew to be logically silly, and your predictions might come from a model different from the one you thought you believed. If it was necessary to protect your signaling ability, you might even be able to develop and carry out complicated plots to deceive the conscious mind.

Your choices would be determined by illogical factors that influenced neural switches and levers and you would have to guess at the root causes of your own decisions, often incorrectly – but would defend them anyway. When neural switches and levers became wildly inaccurate due to brain injury, your conscious mind would defend your new, insane beliefs with the same earnestness with which it defended your old ones.

You would be somewhat rational about neutral issues, but when your preferred beliefs were challenged you would switch to defending them, and only give in when it was absolutely impossible to keep supporting them without looking crazy and losing face.

You would look very familiar.



1. Wei Dai's model gets the strongest compliment I can give: after reading it, it seemed so obvious and natural to think that way that I forgot it was anyone's model at all and wrote the first draft of this post without even thinking of it. It has been edited to give him credit, but I've kept some of the terminology changes to signify that this isn't exactly the same. The most important change is that Wei thinks actions are controlled by the conscious mind, but I side with Phil and think they're controlled by the unconscious and relayed to the conscious. The psychological evidence for this change in the model is detailed above; some neurological reasons are mentioned in the Wegner paper below.

2. Or more accurately one type of akrasia. I disagreed with Robin Hanson and Bryan Caplan when they said a model similar to this explains all akrasia, and I stand by that disagreement. I think there are at least two other, separate causes: akrasia from hyperbolic discounting, and the very-hard-to-explain but worthy-of-more-discussion-sometime akrasia from wetware design.

3. See Wegner, "The Mind's Best Trick: How We Experience Conscious Will" for a discussion of this and related scenarios.