In my previous post, I suggested that akrasia involves subagent disagreement - or in other words, different parts of the brain having differing ideas about what the best course of action is. The existence of such conflicts raises the question: how does one resolve them?

In this post I will discuss various techniques which could be interpreted as ways of resolving subagent disagreements, as well as some of the reasons why such resolution doesn’t always happen on its own.

A word on interpreting “subagents”

The frame that I’ve had so far is that of the brain being composed of different subagents with conflicting beliefs. That said, one could argue that the subagent interpretation isn’t strictly necessary for many of the examples that I bring up in this post. One could just as well view my examples as talking about a single agent with conflicting beliefs.

The distinction between these two frames isn’t always entirely clear. In “Complex Behavior from Simple (Sub)Agents”, mordinamael presents a toy model where an agent has different goals. Moving to different locations will satisfy the different goals to a varying extent. The agent generates a list of possible moves and picks the move which will bring some goal the closest to being satisfied.

Is this a unified agent, or one made up of several subagents?

One could argue for either interpretation. On one hand, mordinamael's post frames the goals as subagents, and they are in a sense competing with each other. On the other hand, the subagents arguably don’t make the final decision themselves: they just report expected outcomes, and then a central mechanism picks a move based on their reports.

This resembles the neuroscience model I discussed in my last post, where different subsystems in the brain submit various action “bids” to the basal ganglia. Other mechanisms then pick a winning bid based on criteria such as how relevant the subsystem’s concerns are for the current situation, and how accurate the different subsystems have historically been in their predictions.

Likewise, in extending the model from Consciousness and the Brain for my toy version of the Internal Family Systems model, I postulated a system where various subagents vote for different objects to become the content of consciousness. In that model, the winner was determined by a system which adjusted the vote weights of the different subagents based on various factors.
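
To make this voting picture more concrete, here is a minimal code sketch. It is my own illustration with made-up names and numbers, not the actual mechanism from Consciousness and the Brain or from my toy IFS post: subagents cast votes weighted by how reliable they have previously been, the highest-scoring mental object wins access to consciousness, and the weights are then nudged according to how accurate each subagent's prediction turned out to be.

```python
# Illustrative sketch only: weighted voting over candidate contents of
# consciousness, with vote weights adjusted by past prediction accuracy.

class Subagent:
    def __init__(self, name, preferences):
        self.name = name
        self.preferences = preferences  # mental object -> how strongly this subagent votes for it
        self.weight = 1.0               # voting weight, adjusted over time

def select_conscious_content(subagents, objects):
    """Return the object with the highest total weighted vote."""
    totals = {obj: sum(a.weight * a.preferences.get(obj, 0.0) for a in subagents) for obj in objects}
    return max(totals, key=totals.get)

def update_weights(subagents, prediction_errors, learning_rate=0.1):
    """Subagents with low prediction error gain voting weight; inaccurate ones lose it."""
    for agent in subagents:
        agent.weight = max(0.0, agent.weight + learning_rate * (0.5 - prediction_errors[agent.name]))

# Example with made-up numbers:
agents = [Subagent("threat-detector", {"strange noise": 0.9}),
          Subagent("planner", {"tomorrow's meeting": 0.6})]
print(select_conscious_content(agents, ["strange noise", "tomorrow's meeting"]))  # "strange noise"
update_weights(agents, {"threat-detector": 0.8, "planner": 0.1})  # the wrong threat prediction loses weight
```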

So, subagents, or just an agent with different goals?

Here I would draw an analogy to parliamentary decision-making. In a sense, a parliament as a whole is an agent. Various members of parliament cast their votes, with “the voting system” then “making the final choice” based on the votes that have been cast; the result reflects the overall judgment of the parliament as a whole. At the same time, for understanding and predicting how the parliament will actually vote in different situations, it is important to model how the individual MPs influence and broker deals with each other.

Likewise, the subagent frame seems most useful when a person’s goals interact in such a way that applying the intentional stance - thinking in terms of the beliefs and goals of the individual subagents - is useful for modeling the overall interactions of the subagents.

For example, in my toy Internal Family Systems model, I noted that reinforcement learning subagents might end up forming something like alliances. Suppose that a robot has a choice between making cookies, poking its finger at a hot stove, or daydreaming. It has three subagents: “cook” wants the robot to make cookies, “masochist” wants to poke the robot’s finger at the stove, and “safety” wants the robot to not poke its finger at the stove.

By default, “safety” is indifferent between “make cookies” and “daydream”, and might cast its votes at random. But when it votes for “make cookies”, then that tends to avert “poke at stove” more reliably than voting for “daydream” does, as “make cookies” is also being voted for by “cook”. Thus its tendency to vote for “make cookies” in this situation gets reinforced.
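
This dynamic can be shown with a small simulation. The following is a toy sketch with assumed numbers and an assumed update rule, not a claim about the brain's actual learning algorithm: whatever “safety” voted for gets slightly reinforced whenever stove-poking is averted, and voting alongside “cook” averts it most reliably.

```python
import random
from collections import Counter

ACTIONS = ["make cookies", "poke at stove", "daydream"]

# Each subagent's "vote tendency" for every action (made-up starting values).
tendencies = {
    "cook":      {"make cookies": 1.0, "poke at stove": 0.0, "daydream": 0.0},
    "masochist": {"make cookies": 0.0, "poke at stove": 1.0, "daydream": 0.0},
    "safety":    {"make cookies": 0.5, "poke at stove": 0.0, "daydream": 0.5},
}

def cast_vote(tendency):
    actions = list(tendency)
    weights = [tendency[a] + 0.01 for a in actions]  # small floor so no option is impossible
    return random.choices(actions, weights=weights)[0]

for _ in range(5000):
    votes = {name: cast_vote(t) for name, t in tendencies.items()}
    counts = Counter(votes.values())
    top = max(counts.values())
    winner = random.choice([a for a, c in counts.items() if c == top])  # random tie-break
    if winner != "poke at stove":
        # "safety" reinforces whatever it voted for whenever stove-poking was averted.
        tendencies["safety"][votes["safety"]] += 0.01

print(tendencies["safety"])  # "make cookies" typically ends up far above "daydream"
```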

We can now apply the intentional stance to this situation, and say that “safety” has "formed an alliance" with “cook”, as it correctly “believes” that this will avert masochistic actions. If the subagents are also aware of each other and can predict each other's actions, then the intentional stance becomes even more useful.

Of course, we could just as well apply the purely mechanistic explanation and end up with the same predictions. But the intentional explanation often seems easier for humans to reason with, and helps highlight salient considerations.

Integrating beliefs, naturally or with techniques

In any case, regardless of whether we are talking about subagents with conflicting beliefs or just conflicting goals, it still seems like many of our problems arise from some kind of internal disagreement. I will use the term “integration” for anything that acts to resolve such conflicts, and discuss a few examples of things which can be usefully thought of as integration.

In these examples, I am again going to rely on the basic observation from Consciousness and the Brain: that when some subsystem in the brain manages to elevate a mental object into the content of consciousness, multiple subsystems will synchronize their processing around that object. Assuming that the conditions are right, this will allow for the integration of otherwise conflicting beliefs or behaviors.

Why do we need to explicitly integrate beliefs, rather than this happening automatically? One answer is that trying to integrate all beliefs would be infeasible; as CronoDAS notes:

GEB has a section on this.
In order to not compartmentalize, you need to test if your beliefs are all consistent with each other. If your beliefs are all statements in propositional logic, consistency checking becomes the Boolean Satisfiability Problem, which is NP-complete. If your beliefs are statements in predicate logic, then consistency checking becomes PSPACE-complete, which is even worse than NP-complete.
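
To get a feel for why this is infeasible, here is a brute-force sketch (an illustration of the combinatorics only, not a model of how the brain represents beliefs): checking whether a set of propositional beliefs is jointly consistent by exhaustive search requires looking at up to 2^n truth assignments for n variables.

```python
from itertools import product

# Toy illustration: check whether a set of propositional "beliefs" (constraints
# over boolean variables) is jointly satisfiable by brute force. The loop is
# O(2^n) in the number of variables, which is why doing this globally and
# constantly is not an option.

def consistent(variables, beliefs):
    """beliefs: list of functions from an assignment dict to bool."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(belief(assignment) for belief in beliefs):
            return True  # found a model in which every belief holds
    return False

# Example: "a implies b", "b implies not c", "a and c" -- jointly inconsistent.
beliefs = [
    lambda m: (not m["a"]) or m["b"],
    lambda m: (not m["b"]) or (not m["c"]),
    lambda m: m["a"] and m["c"],
]
print(consistent(["a", "b", "c"], beliefs))  # False
```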

Rather than constantly trying to integrate every possible belief and behavior, the brain instead tries to integrate beliefs at times when it notices contradictions. Of course, sometimes we do realize that there are contradictions, but still don’t automatically integrate the subagents; in those cases we can use various techniques for making integration more effective. Why isn’t integration more automatic?

One reason is that integration requires the right conditions, and while the brain has mechanisms for getting those conditions right, integration is still a nontrivial skill. As an analogy, most children learn the basics of talking and running on their own, but they can still explicitly study rhetoric or running techniques to boost their native competencies far above their starting level. Likewise, everyone natively does some integration on their own, but people can also use explicit techniques which make them much better at it.

Resisting belief integration

Lack of skill isn’t the full answer for why we don’t always automatically update, however. Sometimes it seems as if the mind actively resists updating.

One of the issues that commonly comes up in Internal Family Systems therapy is that parts of the mind want to keep some old belief frozen, because re-evaluating it might change the person’s behavior in a way the part considers undesirable. For example, if someone believes that they have a good reason not to abandon their friend, then a part of the mind which values not abandoning the friend in question might resist having this belief re-evaluated. The part may then need to be convinced that knowing the truth only leaves open the option of abandoning the friend; it does not compel it.

Note that this isn’t necessarily true. If there are other subagents which sufficiently strongly hold the opinion that the friend should be abandoned, and the subagent-which-values-the-friend is only managing to prevent that by hanging on to a specific belief, then readjusting that belief might remove the only constraint which was preventing the anti-friend coalition from dumping the friend. Thus from the point of view of the subagent which is resisting the belief update, the update would compel an abandonment of the friend. In such a situation, additional internal work may be necessary before the subagent will agree to let the belief revision proceed.

More generally, subagents may be incentivized to resist belief updating for at least three different reasons (this list is not intended to be exhaustive):

  1. The subagent is trying to pursue or maintain a goal, and predicts that revising some particular belief would make the person less motivated to pursue or maintain the goal.
  2. The subagent is trying to safeguard the person’s social standing, and predicts that not understanding or integrating something will be safer, give the person an advantage in negotiation, or be otherwise socially beneficial. For instance, different subagents holding conflicting beliefs allows a person to verbally believe in one thing while still not acting accordingly - even actively changing their verbal model so as to avoid falsifying the invisible dragon in the garage.
  3. Evaluating a belief would require activating a memory of a traumatic event that the belief is related to, and the subagent is trying to keep that memory suppressed as part of an exile-protector dynamic.

Here’s an alternate way of looking at the issue, which doesn’t use the subagent frame. So far I have been mostly talking about integrating beliefs rather than goals, but humans don’t seem to have a clear value/belief distinction. As Stuart Armstrong discusses in his mAIry’s room article, for humans simply receiving sensory information often also rewires some of their values. Now, Mark Lippman suggests that trying to optimize a complicated network of beliefs and goals means that furthering one goal may hurt other goals, so the system needs to have checks in place to ensure that one goal is not pursued in a way which disproportionately harms the achievement of other goals.

For example, most people wouldn’t want to spend the rest of their lives doing nothing but shooting up heroin, even if they knew for certain that this maximized the achievement of their “experience pleasure” goal. If someone offered them the chance to experience just how pleasurable heroin feels - giving them more accurate emotion-level predictions of the experience - they might quite reasonably refuse, fearing that making this update would make them more inclined to take heroin. Eliezer once noted that if someone offered him a pill which simulated the joy of scientific discovery, he would make sure never to take it.

Suppose that a system has a network of beliefs and goals and it does something like predicting how various actions and their effects - not only their effects on the external world, but on the belief/goal network itself - might influence its goal achievement. If it resists actions which reduce the probability of achieving its current goals, then this might produce dynamics which look like subagents trying to achieve their goals at the expense of the other subagents.

For instance, Eliezer’s refusal to take the pill might be framed as a subagent valuing scientific discovery trying to block a subagent valuing happiness from implementing an action which would make the happiness subagent’s bids for motor system access stronger. Alternatively, it might be framed as the overall system putting value on actually making scientific discoveries, and refusing to self-modify in a way which it predicted would hurt this goal. (You might note that this has some interesting similarities to things like the Cake or Death problem in AI alignment.)
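
Here is a minimal sketch of that mechanistic story, with entirely made-up numbers: the system scores each candidate action by how well the predicted outcome serves its current goals - including actions whose main effect would be to rewrite those goals - and so the “pleasure pill” loses to ordinary research.

```python
# Toy sketch (assumed values): the agent evaluates actions by how well the
# predicted future world scores under its *current* goal weights -- including
# actions that would change those goal weights.

current_goals = {"scientific_discovery": 1.0, "pleasure": 0.3}

def predicted_outcome(action):
    """Return (predicted goal achievement, predicted new goal weights)."""
    if action == "take pleasure pill":
        # Lots of pleasure, but the agent predicts it would stop doing science.
        return ({"scientific_discovery": 0.0, "pleasure": 1.0},
                {"scientific_discovery": 0.1, "pleasure": 1.0})
    if action == "do research":
        return ({"scientific_discovery": 0.8, "pleasure": 0.4}, dict(current_goals))
    return ({"scientific_discovery": 0.0, "pleasure": 0.0}, dict(current_goals))

def score(action):
    achievement, _future_goals = predicted_outcome(action)
    # Crucially, the outcome is scored with the current goal weights, so an
    # action that rewires the goals is judged by the goals it would destroy.
    return sum(current_goals[g] * achievement[g] for g in current_goals)

actions = ["take pleasure pill", "do research", "do nothing"]
print(max(actions, key=score))  # "do research"
```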

In any case, integration is not always straightforward. Even if the system does detect a conflict between its subagents, it may have reasons to avoid resolving it.

Having reviewed some potential barriers for integration, let us move on to different ways in which conflicts can be detected and integrated.

Ways to integrate conflicting subagents

Cognitive Behavioral Therapy

Scott Alexander has an old post where he quotes this excerpt from the cognitive behavioral therapy book When Panic Attacks:

I asked Walter how he was thinking and feeling about the breakup with Paul. What was he telling himself? He said “I feel incredibly guilty and ashamed, and it seems like it must have been my fault. Maybe I wasn’t skillful enough, attractive enough, or dynamic enough. Maybe I wasn’t there for him emotionally. I feel like I must have screwed up. Sometimes I feel like a total fraud. Here I am, a marriage and family therapist, and my own relationship didn’t even work out. I feel like a loser. A really, really big loser.” [...]
I thought the Double Standard Technique might help because Walter seemed to be a warm and compassionate individual. I asked what he’d say to a dear friend who’d been rejected by someone he’d been living with for eight years. I said “Would you tell him that there’s something wrong with him, that he screwed up his life and flushed it down the toilet for good?”
Walter looked shocked and said he’d never say something like that to a friend. I suggested we try a role-playing exercise so that he could tell me what he would say to a friend who was in the same predicament […]
Therapist (role-playing patient’s friend): Walter, there’s another angle I haven’t told you about. What you don’t understand is that I’m impossible to live with and be in a relationship with. That’s the real reason I feel so bad, and that’s why I’ll be alone for the rest of my life.
Patient (role-playing as if therapist is his friend who just had a bad breakup): Gosh, I’m surprised to hear you say that, because I’ve known you for a long time and never felt that way about you. In fact, you’ve always been warm and open, and a loyal friend. How in the world did you come to the conclusion that you were impossible to be in a relationship with?
Therapist (continuing role-play): Well, my relationship with [my boyfriend] fell apart. Doesn’t that prove I’m impossible to be in a relationship with?
Patient (continuing role-play): In all honesty, what you’re saying doesn’t make a lot of sense. In the first place, your boyfriend was also involved in the relationship. It takes two to tango. And in the second place, you were involved in a reasonably successful relationship with him for eight years. So how can you claim that you’re impossible to live with?
Therapist (continuing role-play): Let me make sure I’ve got this right. You’re saying that I was in a reasonably successful relationship for eight years, so it doesn’t make much sense to say that I’m impossible to live with or impossible to be in a relationship with?
Patient (continuing role-play): You’ve got it. Crystal clear.
At that point, Walter’s face lit up, as if a lightbulb had suddenly turned on in his brain, and we both started laughing. His negative thoughts suddenly seemed absurd to him, and there was an immediate shift in his mood…after Walter put the lie to his negative thoughts, I asked him to rate how he was feeling again. His feeling of sadness fell all the way from 80% to 20%. His feelings of guilt, shame, and anxiety fell all the way to 10%, and his feelings of hopelessness dropped to 5%. The feelings of loneliness, embarrassment, frustration, and anger disappeared completely.

At the time, Scott expressed confusion about how just telling someone that their beliefs aren’t rational could be enough to transform those beliefs. But that wasn’t really what happened. Walter was asked whether he’d say something harsh to a friend, and he said no, but that alone wasn’t enough to improve his condition. What did help was putting him in a position where he had to really think through the arguments for why the self-critical thoughts were irrational in order to convince his “friend” - and then, having formulated the arguments once, be convinced by them himself.

In terms of our framework, we might say that a part of Walter’s mind contained a model which output a harsh judgment of himself, while another part contained a model which would output a much less harsh judgment of someone else who was in otherwise identical circumstances. Just bringing up the existence of this contradiction wasn’t enough to change it: it caused the contradiction to be noticed, but didn’t activate the relevant models extensively enough for their contents to be reprocessed.

But when Walter had to role-play a situation where he thought of himself as actually talking with a depressed friend, that required him to more fully activate the non-judgmental model and apply it to the relevant situation. This caused him to blend with the model, taking its perspective as the truth. When that perspective was then propagated to the self-critical model, the easiest way for the mind to resolve the conflict was simply to alter the model producing the self-critical thoughts.

Note that this kind of a result wasn’t guaranteed to happen: Walter’s self-critical model might have had a reason for why these cases were actually different, and pointing out that reason would have been another way for the contradiction to be resolved. In the example case, however, it seemed to work.

Mental contrasting

Another example of activating two conflicting mental models and forcing an update that way comes from psychologist Gabriele Oettingen’s book Rethinking Positive Thinking. Oettingen has studied combining a mental imagery technique known as “mental contrasting” with trigger-action planning.

It is worth noting that this book has come under some heavy criticism and may be based on cherry-picked studies. However, this particular example is presented in the book only as an anecdote, without citing any particular studies in its support. I present it because I’ve personally found the technique useful, and because it feels like a nice, concise illustration of the kind of integration that often works:

Try this exercise for yourself. Think about a fear you have about the future that is vexing you quite a bit and that you know is unjustified. Summarize your fear in three to four words. For instance, suppose you’re a father who has gotten divorced and you share custody with your ex-wife, who has gotten remarried. For the sake of your daughter’s happiness, you want to become friendly with her stepfather, but you find yourself stymied by your own emotions. Your fear might be “My daughter will become less attached to me and more attached to her stepfather.” Now go on to imagine the worst possible outcome. In this case, it might be “I feel distanced from my daughter. When I see her she ignores me, but she eagerly spends time with her stepfather.” Okay, now think of the positive reality that stands in the way of this fear coming true. What in your actual life suggests that your fear won’t really come to pass? What’s the single key element? In this case, it might be “The fact that my daughter is extremely attached to me and loves me, and it’s obvious to anyone around us.” Close your eyes and elaborate on this reality.
Now take a step back. Did the exercise help? I think you’ll find that by being reminded of the positive reality standing in the way, you will be less transfixed by the anxious fantasy. When I conducted this kind of mental contrasting with people in Germany, they reported that the experience was soothing, akin to taking a warm bath or getting a massage. “It just made me feel so much calmer and more secure,” one woman told me. “I sense that I am more grounded and focused.”
Mental contrasting can produce results with both unjustified fears as well as overblown fears rooted in a kernel of truth. If as a child you suffered through a couple of painful visits to the dentist, you might today fear going to get a filling replaced, and this fear might become so terrorizing that you put off taking care of your dental needs until you just cannot avoid it. Mental contrasting will help you in this case to approach the task of going to the dentist. But if your fear is justified, then mental contrasting will confirm this, since there is nothing preventing your fear from coming true. The exercise will then help you to take preventive measures or avoid the impending danger altogether.

As in the CBT example, first one mental model (the one predicting losing the daughter’s love) is activated and intentionally blended with, after which an opposing one is activated, forcing integration. And as in Walter’s example, this is not guaranteed to resolve the conflict in a more reassuring way: the mind can also resolve the conflict by determining that the fear is in fact justified.

Internal Double Crux / Internal Family Systems

On some occasions a single round of mental contrasting, or of the Walter CBT technique, might be enough. In those cases, there were two disagreeing models, and bringing the disagreement into consciousness was enough to reject one of them entirely. But it is not always so clear-cut; sometimes there are subagents which disagree, and both of them actually have some valid points.

For instance, someone might have a subagent which wants the person to do socially risky things, and another subagent which wants to play things safe. Neither is unambiguously wrong: on one hand, some things are so risky that you should never try to do them. On the other hand, never doing anything which others might disapprove of is not going to lead to a particularly happy life, either.

In that case, one may need to actively facilitate a dialogue between the subagents, such as in the CFAR technique of Internal Double Crux (description, discussion and example, example as applied to dieting), iterating it for several rounds until both subagents come to agreement. The CBT and mental contrasting examples above might be considered special cases of an IDC session, where agreement was reached within a single round of discussion.

More broadly, IDC itself can be considered a special case of applying Internal Family Systems, which includes facilitating conversations between mutually opposing subagents as one of its techniques.

Self-concept editing

In the summer of 2017, I found Steve Andreas’s book Transforming Your Self, and applied its techniques to fixing a number of issues in my self-concepts which had contributed to my depression and anxiety. Effects from this work which have lasted include no longer having generalized feelings of shame, no longer needing constant validation to avoid such feelings of shame, no longer being motivated by a desire to prove to myself that I’m a good person, and no longer having obsessive escapist fantasies, among other things.

I wrote an article at the time that described the work. The model in Transforming Your Self is that I might have a self-concept such as “I am kind”. That self-concept is made up of memories of times when I was kind (examples of the concept) and of times when I was not (counterexamples). In a healthy self-concept, both examples and counterexamples are integrated together: you might have memories of how you are kind in general, but also memories of not being very kind at times when you were e.g. under a lot of stress. This allows you to both know your general tendency, as well as letting you prepare for situations where you know that you won’t be very kind.

The book’s model also holds that sometimes a person’s counterexamples might be split off from their examples. This leads to an unstable self-concept: either your subconscious attention is focused on the examples and totally ignores the counterexamples, in which case you feel good and kind, or it swings to the counterexamples and totally ignores the examples, in which case you feel like a terrible horrible person with no redeeming qualities. You need a constant stream of external validation and evidence in order to keep your attention anchored on the examples; the moment it ceases, your attention risks swinging to the counterexamples again.

While I didn’t have the concept back then, what I did could also be seen as integrating true but disagreeing perspectives between two subagents. There was one subagent which held memories of times when I had acted in what it thought of as a bad way, and was using feelings of shame to motivate me to make up for those actions. Another subagent was then reacting to it by making me do more and more things which I could use to prove to myself and others that I was indeed a good person. (This description roughly follows the framing and conceptualization of self-esteem and guilt/shame in the IFS book Freedom from your Inner Critic.)

Under the sociometer theory of self-esteem, self-esteem is an internal evaluation of one’s worth as a partner to others. With this kind of an interpretation, it makes sense to have subagents acting in the ways that I described: if you have done things that your social group would judge you for, then it becomes important to do things which prove your worth and make them forgive you.

This then becomes a special case of an IFS exile/protector dynamic. Under that formulation, the splitting of the counterexamples and the lack of updating actually serves a purpose. The subagent holding the memories of doing shameful things doesn’t want to stop generating the feelings of shame until it has received sufficient evidence that the “prove your worth” behavior has actually become unnecessary.

One of the techniques from Transforming Your Self that I used to fix my self-concept was integrating the examples by adding qualifiers to the counterexamples: “when I was a child, and my executive control wasn’t as developed, I didn’t always act as kindly as I could have”. Under the belief framing, this allowed my memories to be integrated in a way which showed that my selfishness as a child was no longer evidence of me being horrible in general. Under the subagent framing, this communicated to the shame-generating subagent that the things that I did as a child would no longer be held against me, and that it was safe to relax.

Another technique mentioned in Transforming Your Self, which I did not personally need to use, was translating the concerns of subagents into a common language. For instance, someone’s positive self-concept examples might be in the form of mental images, with their negative counterexamples being in the form of a voice which reminds them of their failures. In that case, they might translate the inner speech into mental imagery by visualizing what the voice is saying, turning both the examples and counterexamples into mental images that can then be combined. This brings us to…

Translating into a common language

Eliezer presents an example of two different framings eliciting conflicting behavior in his “Circular Altruism” post:

Suppose that a disease, or a monster, or a war, or something, is killing people. And suppose you only have enough resources to implement one of the following two options:
1. Save 400 lives, with certainty.
2. Save 500 lives, with 90% probability; save no lives, 10% probability.
Most people choose option 1. [...] If you present the options this way:
1. 100 people die, with certainty.
2. 90% chance no one dies; 10% chance 500 people die.
Then a majority choose option 2. Even though it's the same gamble. You see, just as a certainty of saving 400 lives seems to feel so much more comfortable than an unsure gain, so too, a certain loss feels worse than an uncertain one.

In my previous post, I presented a model where the subagents which are most strongly activated by the situation are the ones that get access to the motor system. If you are hungry and have a meal in front of you, the possibility of eating is the most salient and valuable feature of the situation. As a result, subagents which want you to eat get the most decision-making power. On the other hand, if this is a restaurant in Jurassic Park and a velociraptor suddenly charges through the window, then the dangerous aspects of the situation become most salient. That lets the subagents which want you to flee get the most decision-making power.

Eliezer’s explanation of the saving-lives dilemma is that in the first framing, the certainty of saving 400 lives is salient, whereas in the second framing the certainty of losing 100 lives is salient. We can interpret this in similar terms as the “eat or run” dilemma: the action which gets chosen depends on which features are the most salient and how those features activate different subagents (or how those features highlight different priorities, if we are not using the subagent frame).

Suppose that you are someone who was tempted to choose option 1 when you were presented with the first framing, and option 2 when you were presented with the second framing. It is now pointed out to you that these are actually exactly equivalent. You realize that it would be inconsistent to prefer one option over the other depending only on the framing. Furthermore, and maybe even more crucially, realizing this makes both the “certainty of saving 400 lives” and the “certainty of losing 100 lives” features equally salient. That puts the relevant subagents (priorities) on more equal terms, as they are both activated to the same extent.

What happens next depends on the relative strengths of those subagents (priorities) otherwise, and on whether you happen to know about expected value. Maybe you consider the situation and one of the two subagents (priorities) happens to be stronger, so you consistently pick either the certain option or the gamble in both framings. Alternatively, the conflicting priorities may be resolved by introducing the rule that “when detecting this kind of a dilemma, convert both options into an expected value of lives saved, and pick the option with the higher value”.

By converting the options to an expected value, one can get a basis by which two otherwise equal options can be evaluated and chosen between. Another way of looking at it is that this is bringing in a third kind of consideration/subagent (knowledge of the decision-theoretically optimal decision) in order to resolve the tie.
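
As a worked version of that rule, using the numbers from Eliezer's example:

```python
# Expected number of lives saved under each option (the same in both framings).
certain_option = 400                 # 400 saved for sure, i.e. 100 die for sure
gamble_option = 0.9 * 500 + 0.1 * 0  # 450 expected lives saved
print(certain_option, gamble_option)  # 400 vs 450.0 -> the gamble has the higher expected value
```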

Urge propagation

“CFAR and Harvard Effective Altruism” is a video of a lecture given by former CFAR instructors Valentine Smith and Duncan Sabien. In Valentine’s part of the lecture, he describes a few motivational techniques which work by mentally reframing the contents of an experience.

The first example involves having a $50 parking ticket which - unless paid within 30 days - will accrue an additional $90 penalty. This kind of a thing tends to feel ughy to deal with, causing an inclination to avoid thinking about it - even while you remain aware of the need to do something about it. This looks like two different subagents both trying to avoid pain using opposite methods: one by not thinking about unpleasant things, the other by doing things which prevent future unpleasantness.

Val’s suggested approach involves noting that if you instead had a cheque for $90, which would expire in 30 days, then that would not cause such a disinclination. Rather, it would feel actively pleasant to cash it in and get the money.

The structures of the “parking ticket” and “cheque” scenarios are equivalent, in that in both cases you can take an action that leaves you $90 better off after 30 days. If you notice this, then it may be possible for you to re-interpret the action of paying off the parking ticket as something that gains you money, maybe by something like literally looking at it and imagining it as a cheque that you can cash in, until doing so starts feeling actively pleasant.
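
Spelling out the arithmetic behind the reframe (my own working, using the amounts from Val's example):

```python
# 30-day outcomes, relative to otherwise doing nothing about money.
pay_ticket_now = -50          # pay the $50 ticket within 30 days
ignore_ticket = -(50 + 90)    # end up paying $140 once the penalty is added
cash_cheque = +90             # the reframed version: a $90 cheque you can cash
ignore_cheque = 0

print(pay_ticket_now - ignore_ticket)  # 90: acting on the ticket leaves you $90 better off
print(cash_cheque - ignore_cheque)     # 90: the same relative gain as cashing the cheque
```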

Val emphasizes that this is not just an arbitrary motivational hack: it’s important that your reframe is actually bringing in real facts from the world. You don’t want to just imagine the parking ticket as a ticking time bomb, or as something else which it actually isn’t. Rather, you want to do a reframe which integrates both perspectives, while also highlighting the features which will help fix the conflict.

One description of what happens here is that once the pain-avoiding subagent notices that paying the parking ticket can feel like a net gain, and that this net gain describes a real fact about the world, it can drop its objection and you can proceed to take action. Another way of looking at it is that, as with expected value, you are introducing a common currency - the future impact on your finances - which allows the salient features from both subagents’ perspectives to be integrated and the conflict resolved.

Val’s second example involves a case where he found himself not doing push-ups like he had intended to. When examining the reason why not, he noticed that the push-ups felt physically unpleasant: they involved sweating, panting, and a burning sensation, and this caused a feeling of aversion.

Part of how he solved the issue was by realizing that his original goal for getting exercise was to live longer and be in better health. The unpleasant physical sensations were a sign that he was pushing his body hard enough that the push-ups would actually be useful for this goal. He could then create a mental connection between the sensations and his goal of being healthier and living longer: the sensations started feeling like something positive, since they were an indication of progress.

Besides being an example of creating a common representation between the subagents, this can also be viewed as doing a round of Internal Double Crux, something like:

Exercise subagent: We should exercise.
Optimizer subagent: That feels unpleasant and costs a lot of energy, we would have the energy to do more things if we didn’t exercise.
Exercise subagent: That’s true. But the feelings of unpleasantness are actually a sign of us getting more energy in the long term.
Optimizer subagent: Oh, you’re right! Then let’s exercise, that furthers my goals too.

(There's also a bunch of other good stuff in the video that I didn't describe here; you may want to check it out if you haven't already done so.)

Exposure Therapy

So far, most of the examples have assumed that the person already has all the information necessary for solving the internal disagreement. But sometimes additional information might be required.

The prototypical use of exposure therapy is for phobias. Someone might have a phobia of dogs, while at the same time feeling that their fear is irrational, so they decide to get therapy for their phobia.

How the therapy typically proceeds is by exposing the person to their fear in increments that are as small as possible. For instance, a page by Anxiety Canada offers this list of steps that someone might have for exposing themselves to dogs:

Step 1: Draw a dog on a piece of paper.
Step 2: Read about dogs.
Step 3: Look at photos of dogs.
Step 4: Look at videos of dogs.
Step 5: Look at dogs through a closed window.
Step 6: Then through a partly-opened window, then open it more and more.
Step 7: Look at them from a doorway.
Step 8: Move further out the doorway; then further etc.
Step 9: Have a helper bring a dog into a nearby room (on a leash).
Step 10: Have the helper bring the dog into the same room, still on a leash.

The ideal is that each step is enough to make you feel a little scared, but not so scared that it would retraumatize you or otherwise make you feel horrible.

In a sense, exposure therapy involves one part of the mind thinking that the situation is safe, and another part of the mind thinking that the situation is unsafe, and the contradiction being resolved by testing it. If someone feels nervous about looking at a photo of a dog, it implies that a part of their mind thinks that seeing a photo of a dog means they are potentially in danger. (In terms of the machine learning toy model from my IFS post, it means that a fear model is activated, which predicts the current state to be dangerous.)

By looking at photos sufficiently many times, and then afterwards noting that everything is okay, the nervous subagent gets information about having been wrong, and updates its model. Over time, and as the person goes forward in steps, the nervous subagent can eventually conclude that it had overgeneralized from the original trauma, and that dogs in general aren’t that dangerous after all.
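
A toy sketch of that updating process, with assumed numbers in the spirit of the machine-learning toy model mentioned above: the fear model's danger prediction gets nudged toward the observed (safe) outcome after each exposure, so repeated safe exposures gradually drive the prediction toward zero.

```python
# Illustrative sketch only: a simple prediction-error update of a "fear model".
learning_rate = 0.2
predicted_danger = 0.9   # the nervous subagent's initial prediction for "looking at a photo of a dog"

for exposure in range(1, 11):
    observed_danger = 0.0  # each exposure ends with nothing bad happening
    predicted_danger += learning_rate * (observed_danger - predicted_danger)
    print(f"exposure {exposure}: predicted danger {predicted_danger:.2f}")

# In the stepwise version, the same kind of update is repeated at each new,
# slightly scarier step of the exposure hierarchy.
```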

As in the CBT example, one can view this as activating conflicting models and the mind then fixing the conflict by updating the models. In this case, the conflict is between the frightened subagent's prediction that seeing the dog is a sign of danger, and another subagent's later assessment that everything turned out to be fine.

Conclusion to integration methods

I have considered here a number of ways of integrating subagent conflicts. Here are a few key principles that are used in them:

  • Selectively blending with subagents/beliefs to make disagreements between them more apparent. Used in the Cognitive Behavioral Therapy and mental contrasting cases. Also used in a somewhat different form in exposure therapy, where you are partially blended with a subagent that thinks that the situation is dangerous, while getting disagreeing information from the rest of the world.
  • Facilitating a dialogue between subagents “from the outside”. Used in Internal Double Crux and Internal Family Systems. In a sense, the next bullet can also be viewed as a special case of this.
    • Combining aspects of the conflicting perspectives into a whole which allows for resolution. Used in self-concept editing, Eliezer’s altruism example, and urge propagation.
  • Collecting additional information which allows for the disagreement to be resolved. Used in exposure therapy.

I believe that we have evolved to use all of these spontaneously, without necessarily realizing what it is that we are doing.

For example, many people have the experience of it being useful to talk to a friend about their problems, weighing the pros and cons of different options. Frequently just getting to talk about it helps clarify the issue, even if the friend doesn’t say anything (or even if they are a rubber duck). Probably not coincidentally, if you are talking about the conflicting feelings that you have in your mind, then you are frequently doing something like an informal version of Internal Double Crux: you are representing all the sides of a dilemma until you have reached a conclusion and integrated the different perspectives.

To the extent that they are effective, various schools of therapy and self-improvement - ranging from CBT to IDC to IFS - are formalized methods for carrying out such integration more effectively.

Comments

Another reason not to integrate is that integration is actually just bad in some circumstances. You don't want all your heuristics to propagate to all possible domains all at once, since they wouldn't all be applicable, and too many options would likely make your decision-making capabilities worse. Some kinds of drug experiences demonstrate this.

I'm not sure if I'd treat "different heuristics in different domains" as an example of non-integration. At least it feels different from the inside. If someone points out to me that I'm not applying a programming heuristic when dealing with humans, I'm likely to react by "well that's because I'm dealing with humans not code", rather than noticing something that feels like a contradiction.

A contradiction feels more like having the heuristics (when X, do A) and (when Y, do not-A), and it then being pointed out to me that actually in this situation, X and Y both apply.

I'm reminded of this excerpt from a recent paper, Holistic Reinforcement Learning: The Role of Structure and Attention (Trends in Cognitive Sciences, Angela Radulescu, Yael Niv & Ian Ballard 2019; Sci-hub version):

Bayesian nonparametric models group perceptual observations into unobserved ‘latent causes’ (or clusters) [52–55]. For example, consider a serial reversal learning task in which the identity of the high-reward option sporadically alternates. In such tasks, animals initially learn slowly but eventually learn to respond rapidly to contingency changes [56]. Bayesian nonparametric models learn this task by grouping reward outcomes into two latent causes: one in which the first option is better and one in which the second option is better. Once this structure is learned, the model displays one-shot reversals after contingency changes because it infers that the latent cause has changed. This inference about latent causes in the environment has also shed light on several puzzling conditioning effects. When presented with a neutral stimulus such as a tone followed by a shock, animals eventually display a fear response to the tone. The learned fear response gradually diminishes when the tone is later presented by itself (i.e., in extinction) but often returns after some time has passed. This phenomenon is known as spontaneous recovery. Bayesian nonparametric models attribute spontaneous recovery to the inference that extinction signals a new environmental state. This prevents old associations from being updated [57]. Bayesian nonparametric models also predict that gradual extinction will prevent spontaneous recovery, a finding borne out by empirical data [57]. In gradual extinction, the model infers a single latent state and gradually weakens the association between that state and aversive outcome, thereby abolishing the fear memory.

Promoted to curated: I continue to think this whole sequence is about pretty important things, and this post in particular stands out as making connections to a large volume of existing writing both on LessWrong and in the established literature, which I think is particularly key for a topic like this.


In Unlocking the Emotional Brain, Bruce Ecker argues that the same psychological process is involved in all of the processes you mentioned above - memory reconsolidation (which is also the same process that electroconvulsive therapy is accidentally triggering).

According to Ecker, there are 3 steps needed to trigger memory reconsolidation:

1. Reactivate. Re-trigger/re-evoke the target knowledge by presenting salient cues or contexts from the original learning.

2. Mismatch/unlock. Concurrent with reactivation, create an experience that is significantly at variance with the target learning’s model and expectations of how the world functions. This step unlocks synapses and renders memory circuits labile, i.e., susceptible to being updated by new learning.

3. Erase or revise via new learning. During a window of about five hours before synapses have relocked, create a new learning experience that contradicts (for erasing) or supplements (for revising) the labile target knowledge.

There are some problems with the theory that memory reconsolidation is what's going on in experiential therapies like focusing, IFS, and exposure therapy, chief among them IMO that in animal studies, reconsolidation needs to happen within hours of creating the original learning (whereas in these therapies it can happen decades later).

However, I've found the framework incredibly useful for figuring out the essential and non-essential parts of the therapies mentioned above, for troubleshooting when a shift isn't happening with coaching clients, and for creating novel techniques and therapies that apply the above 3 steps in the most straightforward way possible.

I should finally get around to reading that book, thanks for continuing to remind me about it. :-)

I can see how memory reconsolidation would apply for some of the processes. But how would it be involved when you encounter a novel decision, like Eliezer's save-a-life dilemma, where there is presumably no previous memory to reconsolidate?

If Eliezer's goals or beliefs are learned, then it applies. Anything that is learned can be unlearned with memory reconsolidation, although it seems to be particularly effective with emotional learning. An interesting open question is "are humans born with internal conflicts, or do they only result from subsequent learnings?" After playing around with CT Charting and Core Transformation with many clients, I tend to think the latter, but if the former is true then memory reconsolidation won't help for those innate conflicts.

Interestingly, I was just reading a paper from DeepMind which talked about deep reinforcement learning systems learning better if they are supplemented with an episodic memory store which maintains a record of all previous situations. Upon re-encountering a situation similar to one that was encountered in the past, the neural net is restored to a similar state as it was in before:

In episodic meta-RL, meta-learning occurs within a recurrent neural network, as described in the previous section and Box 3. However, superimposed on this is an episodic memory system, the role of which is to reinstate patterns of activity in the recurrent network. As in episodic deep RL, the episodic memory catalogues a set of past events, which can be queried based on the current context. However, rather than linking contexts with value estimates, episodic meta-RL links them with stored activity patterns from the recurrent network's internal or hidden units. These patterns are important because, through meta-RL, they come to summarize what the agent has learned from interacting with individual tasks (see Box 3 for details). In episodic meta-RL, when the agent encounters a situation that appears similar to one encountered in the past, it reinstates the hidden activations from the previous encounter, allowing previously learned information to immediately influence the current policy. In effect, episodic memory allows the system to recognize previously encountered tasks, retrieving stored solutions.
Through simulation work in bandit and navigation tasks, Ritter et al. [39] showed that episodic meta-RL, just like ‘vanilla’ meta-RL, learns strong inductive biases that enable it to rapidly solve novel tasks. More importantly, when presented with a previously encountered task, episodic meta-RL immediately retrieves and reinstates the solution it previously discovered, avoiding the need to re-explore. On the first encounter with a new task, the system benefits from the rapidity of meta-RL; on the second and later encounters, it benefits from the one-shot learning ability conferred by episodic control. [...]
Equally direct links connect episodic meta-RL with psychology and neuroscience. Indeed, the reinstatement mechanism involved in episodic meta-RL was directly inspired by neuroscience data indicating that episodic memory circuits can serve to reinstate patterns of activation in cerebral cortex, including areas supporting working memory (see [40]). Ritter and colleagues [39] (S. Ritter, PhD Thesis, Princeton University, 2019) show how such a function could itself be configured through RL, giving rise to a system that can strategically reinstate information about tasks encountered earlier (see also 50, 51, 52).

This would fit together with the thing about memory reconsolidation being key to adjusting all subagents (if a subagent is something like a memory pattern coding for a specific situation), as well as otherwise fitting with a lot of data about memory change being key to this kind of thing.

Then again, H.M. could learn new skills despite being unable to learn new episodic memories...

What does CT Charting stand for?

Connection Theory, from Leverage Research.

I still don't love the term "subagents", despite everyone getting lots out of it, and despite personally agreeing with the intentional stance and the "alliances" you mention. I think my crux-net is something like

  • agents are strategic
  • fragments of our associative mental structures aren't strategic except insofar as their output calls other game theoretic substructures or you are looking at something like the parliamentary moderator
  • if you think of these as agents, you will attribute false strategy to them and feel stuck more often, when in fact they are easily worked with if you think of their apparent strategy as "using highly simplistic native associations and reinforcements, albeit sometimes by pinging other fragments to do things outside their own purview, to accomplish their goal"

However, it does seem possible to me that the "calling other fragments" step does actually chain so far as to constitute real strategy and offer a useful level of abstraction for viewing such webs as subagents. I haven't seen much evidence for this—does this framing make sense, and do you think it is clear there is something more like Turing-complete webs of strategy within subagents vs merely pseudostrategy? Wish I had a replacement word I liked better than subagent.

do you think it is clear there is something more like Turing-complete webs of strategy within subagents vs merely pseudostrategy?

I don't know. As suggested by this post, I move pretty freely between the subagent framing and the "associative belief structure" framing as seems appropriate to the situation. To me agentness doesn't necessarily require the agents to be particularly strategic. (A thermostat is technically an agent, but not a very strategic one.)

IFS calls subagents just "parts", which I prefer in some contexts; it has fewer connotations of being particularly strategic.

I was reflecting on whether to nominate this post, since a) it seemed probably important, but b) I hadn't made much direct use of it myself nor even read it that thoroughly.

I ended up thinking "you know what? There really should be a button for 'elevate this to other people's consideration for the review', without necessarily nominating it yourself.'" And, well, that button doesn't exist and we're probably not going to build it by tomorrow. So meanwhile I think I just endorse being-the-first-person-to-nominate-something as a hacky way to elevate things to people's attention this year.

Interestingly, an agent with a unitary utility function may still find itself in a situation similar to akrasia, if it can't make a choice between two lines of action which have almost equal weights. This was described as a Buridan's ass situation by Lamport, who shows that the problem doesn't have easy solutions and can cause real-life accidents.

Another part of the problem is that if I have to make a choice between equal alternatives – and situations of choice are always choices between seemingly equal alternatives, or else there would be no need to make a choice – then I have to search for additional evidence about which of the alternatives is better, and as a result my choice is eventually decided by a very small piece of evidence. This makes me vulnerable to adversarial attacks by, say, sellers, who could press me to make a choice by saying "There is a 5 percent discount today."

For cases where the equal weights are both 'positive' or 'negative', one can just 'flip a coin' (and notice any resistance to the outcome), and that's what I've tried to learn to do, particularly for relatively small weights.

But for relatively large weights or, worse, for 'opposing' weights, i.e. one 'positive' and the other 'negative', like a situation where one has to choose between escaping some large negative element but ay the cost of giving up another large positive element simultaneously, this 'akrasia' can feel very much like being (emotionally or psychically) torn in two. Often then the relevant consideration is something like a threshold, e.g. is the large negative element too negative?