A typical paradigm by which people tend to think of themselves and others is as consequentialist agents: entities who can be usefully modeled as having beliefs and goals, who are then acting according to their beliefs to achieve their goals.

This is often a useful model, but it doesn’t quite capture reality. It’s a bit of a fake framework. Or in computer science terms, you might call it a leaky abstraction.

An abstraction in the computer science sense is a simplification which tries to hide the underlying details of a thing, letting you think in terms of the simplification rather than the details. To the extent that the abstraction actually succeeds in hiding the details, this makes things a lot simpler. But sometimes the abstraction inevitably leaks, as the simplification fails to predict some of the actual behavior that emerges from the details; in that situation you need to actually know the underlying details, and be able to think in terms of them.
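A small programming illustration of the same idea, assuming Python's built-in floats as the abstraction: floats let you think in terms of "real numbers" without caring how they are stored in binary, and most of the time that works. The sketch below shows one place where it leaks. The function names `total_simple` and `total_exact` are hypothetical, chosen only for this example.

```python
from fractions import Fraction


def total_simple(amounts):
    """Use the abstraction: treat floats as if they were real numbers."""
    return sum(amounts)


def total_exact(amounts):
    """Drop below the abstraction: rebuild each value as an exact
    rational from its decimal text, then add those instead."""
    return sum(Fraction(str(a)) for a in amounts)


amounts = [0.1, 0.2]

# The abstraction leaks: binary floats cannot store 0.1 or 0.2 exactly,
# so the sum is not equal to 0.3.
leaks = total_simple(amounts) != 0.3

# Knowing the underlying detail lets us pick a representation that works.
exact_ok = total_exact(amounts) == Fraction(3, 10)

print(leaks, exact_ok)
```

To predict the first result you have to know the detail that the abstraction was supposed to hide, which is exactly the situation described above.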

Agent-ness being a leaky abstraction is not exactly a novel concept for Less Wrong; it has been touched upon several times, such as in Scott Alexander’s Blue-Minimizing Robot Sequence. At the same time, I do not think that it has been quite fully internalized yet, and I think that many foundational posts on LW go wrong due to being premised on the assumption of humans being agents. In fact, I would go as far as to claim that this is the biggest flaw of the original Sequences: they were attempting to explain many failures of rationality as being due to cognitive biases, when in retrospect it looks like understanding cognitive biases doesn’t actually make you substantially more effective. But if you are implicitly modeling humans as goal-directed agents, then cognitive biases are the most natural place for irrationality to emerge from, so it makes sense to focus on them the most.

Just knowing that an abstraction leaks isn’t enough to improve your thinking, however. To do better, you need to know about the actual underlying details to get a better model. In this sequence, I will aim to elaborate on various tools for thinking about minds which look at humans in more granular detail than the classical agent model does. Hopefully, this will help us better get past the old paradigm.

My model of what I think our subagents look like draws upon a number of different sources, including neuroscience, psychotherapy and meditation, so in the process of sketching out my model I will be covering a number of them in turn. To give you a rough idea of what I'm trying to do, here's a summary of some upcoming content.

Published posts:

(Note: this list may not always be fully up to date; see the sequence index for the actively maintained version.)

Book summary: Consciousness and the Brain. One of the fundamental building blocks of much of consciousness research is Global Workspace Theory (GWT). This could be described as a component of a multiagent model, focusing on the way in which different agents exchange information with one another. One elaboration of GWT, which focuses on how it might be implemented in the brain, is the Global Neuronal Workspace (GNW) model in neuroscience. Consciousness and the Brain is a 2014 book that summarizes some of the research and basic ideas behind GNW, so summarizing the main content of that book looks like a good place to start our discussion and to get a neuroscientific grounding before we get more speculative.

Building up to an IFS model. One theoretical approach for modeling humans as being composed of interacting parts is that of Internal Family Systems. In my experience and that of several other people in the rationalist community, it’s very effective for this purpose. However, having its origins in therapy, its theoretical model may seem rather unscientific and woo-y. This personally put me off the theory for a long time, as I thought that it sounded fake, and gave me a strong sense of "my mind isn't split into parts like that".

In this post, I construct a mechanistic sketch of how a mind might work, drawing on the kinds of mechanisms that have already been demonstrated in contemporary machine learning, and then end up with a model that pretty closely resembles the IFS one.

Subagents, introspective awareness, and blending. In this post, I extend the model of mind that I've been building up in previous posts to explain some things about change blindness, not knowing whether you are conscious, forgetting most of your thoughts, and mistaking your thoughts and emotions as objective facts, while also connecting it with the theory in the meditation book The Mind Illuminated.

Subagents, akrasia, and coherence in humans. We can roughly describe coherence as the property that, if you become aware that there exists a more optimal strategy for achieving your goals than the one that you are currently executing, then you will switch to that better strategy. For a subagent theory of mind, we would like to have some explanation of when exactly the subagents manage to be collectively coherent (that is, change their behavior to some better one), and what are the situations in which they fail to do so.

My conclusion is that we are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS-style protector) that puts high probability on it being bad, and when we have enough slack in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.
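The three conditions in that conclusion can be written out as a toy decision rule. This is only a minimal sketch under stated assumptions, not the model from the post itself: the `Subagent` class, the `will_switch` function, the weighted-average aggregation, and every numeric threshold below are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subagent:
    name: str
    weight: float    # how much influence this subagent has (assumed scale)
    p_better: float  # its probability that the new behavior is better


def will_switch(subagents, slack, threshold=0.5, veto_share=0.4, veto_p=0.2):
    """Return True if the mind-system as a whole adopts the new behavior.

    All numeric thresholds are illustrative assumptions.
    """
    # Without slack, the new behavior never gets evaluated at all.
    if slack <= 0:
        return False

    total = sum(s.weight for s in subagents)
    # System-level probability: weight-averaged over all subagents.
    system_p = sum(s.weight * s.p_better for s in subagents) / total
    # A highly weighted subagent that is confident the new behavior
    # is bad blocks it, regardless of the average.
    blocked = any(
        s.weight / total >= veto_share and s.p_better <= veto_p
        for s in subagents
    )
    return system_p >= threshold and not blocked


planner = Subagent("planner", weight=3, p_better=0.9)
habit = Subagent("habit", weight=1, p_better=0.6)
protector = Subagent("protector", weight=3, p_better=0.1)

agrees = will_switch([planner, habit], slack=1.0)
vetoed = will_switch([planner, habit, protector], slack=1.0)
no_slack = will_switch([planner, habit], slack=0.0)

print(agrees, vetoed, no_slack)
```

In the second call the weighted average still clears the threshold, yet the protector blocks the change. That is the sense in which akrasia here is disagreement between subagents rather than a failure of a single agent.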

Integrating disagreeing subagents. In the previous post, I suggested that akrasia involves subagent disagreement - or in other words, different parts of the brain having differing ideas on what the best course of action is. The existence of such conflicts raises the question, how does one resolve them?

In this post, I discuss various techniques which could be interpreted as ways of resolving subagent disagreements, as well as some of the reasons why this doesn’t always happen.

Subagents, neural Turing machines, thought selection, and blindspots. In my summary of Consciousness and the Brain, I briefly mentioned that one of the functions of consciousness is to carry out artificial serial operations; or in other words, implement a production system (equivalent to a Turing machine) in the brain.

While I did not go into very much detail about this model in the post, I’ve used it in later articles. For instance, in Building up to an Internal Family Systems model, I used a toy model where different subagents cast votes to modify the contents of consciousness. One may conceptualize this as equivalent to the production system model, where different subagents implement different production rules which compete to modify the contents of consciousness.

In this post, I flesh out the model a bit more, as well as applying it to a few other examples, such as emotion suppression, internal conflict, and blind spots.
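To make the voting picture above a little more concrete, here is a minimal sketch of a production system in which each rule stands for a subagent bidding to replace the current content of consciousness. It assumes a single string as the workspace content and a "strongest bid wins" policy; the `run_workspace` function, the rule format, and the example rules are hypothetical, and the posts themselves may use a different formulation.

```python
def run_workspace(initial, rules, max_steps=10):
    """Run a tiny serial production system.

    rules: list of (subagent, condition, proposal, strength) tuples.
    A rule fires when its condition equals the current content; among
    the rules that fire, the strongest bid becomes the next content.
    """
    content = initial
    trace = [content]
    for _ in range(max_steps):
        bids = [
            (strength, proposal)
            for (_subagent, condition, proposal, strength) in rules
            if condition == content
        ]
        if not bids:
            break  # no subagent reacts to the current content
        _, content = max(bids, key=lambda bid: bid[0])
        trace.append(content)
    return trace


rules = [
    ("planner", "see cake", "think about diet", 0.6),
    ("craving", "see cake", "want cake", 0.8),
    ("protector", "want cake", "feel guilty", 0.7),
]

trace = run_workspace("see cake", rules)
print(trace)
```

Each step changes the content one item at a time, which is the serial behavior being described, while the competition between rules happens in parallel inside a single step.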

Subagents, trauma, and rationality. This post interprets the appearance of subagents as emerging from unintegrated memory networks, and argues that the presence of these is a matter of degree. There’s a continuous progression of fragmented (dissociated) memory networks giving rise to increasingly worse symptoms as the degree of fragmentation grows. The continuum goes from everyday procrastination and akrasia on the “normal” end, to disrupted and dysfunctional beliefs in the middle, and conditions like clinical PTSD, borderline personality disorder, and dissociative identity disorder on the severely traumatized end.

I also argue that emotional work, and exploring one's past traumas in order to heal them, is necessary for effective instrumental and epistemic rationality.

Against "System 1" and "System 2". The terms System 1 and System 2 were originally coined by the psychologist Keith Stanovich and then popularized by Daniel Kahneman in his book Thinking, Fast and Slow. Stanovich noted that a number of fields within psychology had been developing various kinds of theories distinguishing between fast/intuitive thinking on the one hand and slow/deliberative thinking on the other. Often these fields were not aware of each other. The S1/S2 model was offered as a general version of these specific theories, highlighting features of the two modes of thought that tended to appear in all the theories.

Since then, academics have continued to discuss the models. Among other developments, Stanovich and other authors have discontinued the use of the System 1/System 2 terminology as misleading, choosing to instead talk about Type 1 and Type 2 processing. In this post, I will build on some of that discussion to argue that Type 2 processing is a particular way of chaining together the outputs of various subagents using working memory. Some of the processes involved in this chaining are themselves implemented by particular kinds of subagents.

Book summary: Unlocking the Emotional Brain. Written by the psychotherapists Bruce Ecker, Robin Ticic and Laurel Hulley, Unlocking the Emotional Brain claims to offer a neuroscience-grounded, comprehensive model of how effective therapy works. In so doing, it also happens to formulate its theory in terms of belief updating, helping explain how the brain models the world and what kinds of techniques allow us to actually change our minds. Its discussion and models are closely connected to the models about internal conflict and belief revision that are discussed in previous posts, particularly "integrating disagreeing subagents".

A mechanistic model of meditation. Meditation has been claimed to have all kinds of transformative effects on the psyche, such as improving concentration ability, healing trauma, cleaning up delusions, allowing one to track their subconscious strategies, and making one’s nervous system more efficient. However, an explanation for why and how exactly this would happen has typically been lacking. This makes people reasonably skeptical of such claims.

In this post, I want to offer an explanation for one kind of a mechanism: meditation increasing the degree of a person’s introspective awareness, and thus leading to increasing psychological unity as internal conflicts are detected and resolved.

A non-mystical explanation of insight meditation and the three characteristics of existence: introduction and preamble. Insight meditation, enlightenment, what’s that all about?

The sequence of posts starting from this one is my personal attempt at answering that question. It seeks to:

  • Explain what kinds of implicit assumptions build up our default understanding of reality and how those assumptions are subtly flawed.
  • Point out aspects from our experience whose repeated observation will update those assumptions, and explain how this may cause psychological change in someone who meditates.
  • Explain how the so-called “three characteristics of existence” of Buddhism - impermanence, no-self and unsatisfactoriness - are all interrelated, in a way that connects to the previously discussed topics in the sequence.

Farther out (sketched out but not as extensively planned/written yet)

The game theory of rationality and cooperation in a multiagent world. Multi-agent models have a natural connection to Elephant in the Brain-style dynamics: our brains doing things for purposes of which we are unaware. Furthermore, there can be strong incentives to continue systematic self-deception and not integrate conflicting beliefs. For instance, if a mind has subagents which think that specific beliefs are dangerous to hold or express, then they will work to suppress subagents holding those beliefs from coming into conscious awareness.

“Dangerous beliefs” might be ones that touch upon political topics, but they might also be ones of a more personal nature. For instance, someone may have an identity as being “good at X”, and then want to rationalize away any contradictory evidence - including evidence suggesting that they were wrong on a topic related to X. Or it might be something even more subtle.

These are a few examples of how rationality work has to happen on two levels at once: to debug some beliefs (individual level), people need to be in a community where holding various kinds of beliefs is actually safe (social level). But in order for the community to be safe for holding those beliefs (social level), people within the community also need to work on themselves so as to deal with their own subagents that would cause them to attack people with the “wrong” beliefs (individual level). This kind of work also seems to be necessary for fixing “politics being the mind-killer” and collaborating on issues such as existential risk across sharp value differences; but the need to carry out the work on many levels at once makes it challenging, especially since the current environment incentivizes many (sub)agents to sabotage any attempt at this.

(This topic area is also related to that stuff Valentine has been saying about Omega.)

This sequence is part of research done for, and supported by, the Foundational Research Institute.

15 comments

Vaniver has said most of the things I want to say here, but there are some additional things I want to say: 

I think building models of the mind is really hard. I also notice that in myself, building models of the mind feels scary in a way that often prevents me from thinking sanely in many important situations.

I think the causes of why it feels scary are varied and complicated, but a lot of it boils down to this: a purely physically reductionistic approach to modeling minds is often difficult, while my standards for evidence often feel calibrated for domains like physics, other hard sciences, and mathematics. It's also often hard to communicate to others my reasons for believing that minds work a certain way, since a substantial portion of the evidence is internal and difficult to communicate.

But building explicit and broad models of our mind, like this sequence does, strikes me as essential to being effective in the world.

Overall, I think this sequence had a positive effect on me for two reasons: 

  1. It provided me with a set of concrete models of the mind that I have used a few times since then
  2. It rekindled a certain courage in me to allow myself to build these kinds of models in the first place, and I hope it has done the same for others.

I think for me at least the second effect was larger than the first one, though both are pretty substantial. 

Yeah, that used to bother me too. When I learned about multi-agent theory and pondered it, I of course pointed my attention inward, trying to observe it.

Then agents arose and started talking with each other, arguing about the fact that they can't tell if they're actually representatives of underlying structures and coalitions of the neural substrate or just one fanciful part that's engaged in puppet phantasy play. Or what the boundaries between those two even are.

Or if their apparent existence is valid evidence for multi-agent theories being any good. Well, I suppose I wasn't bothered, they were bothered :) I/They just really badly wanted a real-time brain scan to get context for my perceptions.

Eventually, I embraced the triplethink of operational certainty [minimizes internal conflict, preserves scarce neurotransmitters], meta doubt, and meta-meta awareness, that propositions that can be expressed in conscious language can't capture the complexity of the neural substrate, anyway.

All models are wrong, yet modeling is essential.

While this sequence ended up spanning more than 2019, I think this represents some of the best 'psychology' on LW in 2019, and have some hope (like Hazard) that all of it will get represented or collected in some way.

Writing a pitch for the sequence feels like writing a pitch for writing about psychology on LW in general, as the sequence itself has it all: book reviews, highly upvoted posts, clear explanations of detailed models, commentary from other experts in the field. So why care about psychology on LW? Because 1) it's often a source of rapid advances in personal effectiveness, 2) these sorts of problems are often 'adaptive', which makes them difficult to solve and thus fosters learned helplessness or unproductive thrashing, and 3) taking a systematic, rational view helps separate the wheat from the chaff (when it comes to the advice and models) and also helps make the 'squishy' sort of self-development accessible to those suspicious of plans and models with poor justification.

I do think, at least for the IFS material, that it'd be useful to pull in at least this comment, and possibly more of the discussion with pjeby more broadly. 

I really look forward to this Sequence.

I'm very excited to see the rest of this! Last spring I wrote the first post for a sequence that had very similar intents. You posting this has given me a nudge to move forward with mine. Here's a brief outline of things I was going to look at (might be useful for you to further clarify to yourself the specific chunks of this topic you are trying to explore)

  • Give some computer architecture arguments for why it's hard to get something to be agent-like, and why those arguments might apply to our minds.
  • Explore how social pressure to "act like an agent" and conform to the person-hood interface makes it difficult to notice one's own non-agentyness.
  • For me (and I'd guess others), a lot of my intentional S2 frames for valuing people seem to put a lot of weight on how "agenty" someone is. I would like to dwell on a "rescuing the utility function"-like move for agency.

Sounds like our posts could be nicely complementary, I encourage you to continue posting yours! And huh, you scooped me on the "agenthood is a leaky abstraction" idea, I didn't realize it had been previously used on LW. :)

There is an interesting psychotherapeutic technique of calling subpersonalities one by one, called "Voice Dialogue", which was developed by the Stones. I experienced a few surprising results from it, both as a seater and as a subject of the therapy. This technique may be used to demonstrate the soundness of the subpersonalities theory for those who doubt it, or to query the subpersonalities one by one, maybe with the goal of learning their values for AI alignment. This is their site: http://delos-inc.com/

in retrospect it looks like understanding cognitive biases doesn’t actually make you substantially more effective


I'm not convinced that this is true, or that it's an important critique of the original sequences.


Looking at the definition of agent, I'm curious how this matches with Cartesian Frames.

Given that we want to learn to think about humans in a new way, we should look for ways to map the new way of thinking into a native mode of thought

I was very happy to read this pingback, but it's purely anecdotal. There are better sources for this idea.


Overall as a sequence index, it's not clear to me whether this post makes sense for inclusion. I can imagine a few possibilities

  1. Most of the rest of the sequence will be included in the curation, and
    1. This post lays out motivations and definitions that aren't repeated in the sequence, or
    2. The rest of the sequence will function fine without this post
  2. Only the sequence index will be included
    1. The motivation and definitions provide the most value
    2. The post summaries contain most of the value of the sequence
  3. A small subset of posts in the sequence will be included
    1. The subset is standalone ideas
      1. The index serves as motivation, connecting tissue, and a reference point
      2. The index isn't included and the posts stand alone
    2. The subset excludes tangents and only includes a central theme
      1. the index motivates and ties the included posts together
      2. the index isn't necessary as the posts have their own flow

Overall I don't know what to expect from the context of curating this post. Would be interested to hear more from people who have spent more time with the sequence.

This theme has been very confusing for me for the last couple of years, and I am very much looking forward to increasing my understanding of what the mind is and how it works by reading this sequence. Thank you very much in advance for writing it.

Hi Kaj, I would LOVE to read this sequence, but I hate reading long texts on a screen. Is there any ebook version available?

If there is, I haven't heard of it. :) 

Thanks, I might do it myself.

Hey, I had an experience that was very much influenced by this sequence, I think. Any chance that you (or someone else with more context than me) would take a look?


I wish to examine a point in the foundations of your post - to be more precise, a point which leads to the inevitable conclusion that it is not problematic in this discussion to use the term 'agent' while it is understood in a manner which allows a thermostat to qualify as an agent.

A thermostat certainly has triggers/sensors which force a reaction when a condition has been met. However, to argue that this is akin to how a person is an agent is to argue that a rock supposedly "runs" the program known as gravity when it falls. The issue is not a lack of parallels; it is a lack of undercurrent below the parallels (in a sense, this makes the view that a thermostat is an agent a 'leaking abstraction', as you put it). For we have to consider that no actual identification of change (be it through sense or thought or both) is possible when the entity identifying such change lacks the ability to translate it in a setting of its own. By translating I mean something readily evident in the case of human agents - not so evident in the case of ants or other relatively simpler creatures. If your room is on fire you identify this as a change from the normal, but this does not mean there is only one way to identify the changed situation. Someone living next to you will also identify that there is a fire, but chances are the (to use an analogy) code for that in their mind will differ very significantly from your own. Yet on some basic level you will be in agreement that there was a fire, and you had to leave.

Now an ant, another being which has life - unlike a thermostat - picks up changes in its environment. If you try to attack it, it may go into panic mode. This, again, does not mean the act of attacking the ant is picked up as it is; it is once again translated, this time by the ant. How it translates it is not known; however, it seems impossible to argue that it merely picks up the change as something set, some block of truth with the meaning 'change/danger' etc. It picks it up due to its ability (not conscious in the case of the ant) to identify something as set, and something as a change in that original set. A thermostat has no identification of anything set, because not being alive it has no power nor need to sense a starting condition, let alone to have inside it a vortex where translations of changes are formed.

All the above is why I am firmly against the view that "agent" is to be defined in a way that both a human and a thermostat can partake in it, when the discussion is about humans and involves that term.

What do you think of the following taxonomy?

Inactive: Rocks (in isolation*)

Reactive/Reaction Circuits: Thermostat

Decisive/Active/Agents/Conscious: Humans

*A circuit can be made out of dominos.