A typical paradigm by which people tend to think of themselves and others is as consequentialist agents: entities who can be usefully modeled as having beliefs and goals, who are then acting according to their beliefs to achieve their goals.
An abstraction in the computer science sense is a simplification which tries to hide the underlying details of a thing, letting you think in terms of the simplification rather than the details. To the extent that the abstraction actually succeeds in hiding the details, this makes things a lot simpler. But sometimes the abstraction inevitably leaks, as the simplification fails to predict some of the actual behavior that emerges from the details; in that situation you need to actually know the underlying details, and be able to think in terms of them.
Agent-ness being a leaky abstraction is not exactly a novel concept for Less Wrong; it has been touched upon several times, such as in Scott Alexander’s Blue-Minimizing Robot Sequence. At the same time, I do not think that it has been quite fully internalized yet, and I think that many foundational posts on LW go wrong because they are premised on the assumption that humans are agents. In fact, I would go as far as to claim that this is the biggest flaw of the original Sequences: they were attempting to explain many failures of rationality as being due to cognitive biases, when in retrospect it looks like understanding cognitive biases doesn’t actually make you substantially more effective. But if you are implicitly modeling humans as goal-directed agents, then cognitive biases are the most natural place for irrationality to emerge from, so it makes sense to focus on them the most.
Just knowing that an abstraction leaks isn’t enough to improve your thinking, however. To do better, you need to know about the actual underlying details to get a better model. In this sequence, I will aim to elaborate on various tools for thinking about minds which look at humans in more granular detail than the classical agent model does. Hopefully, this will help us better get past the old paradigm.
One particular family of models that I will be discussing is that of multi-agent theories of mind. Here the claim is not that we literally have multiple personalities. Rather, my approach will be similar in spirit to the one in Subagents Are Not A Metaphor:
Here are the parts composing my technical definition of an agent:
1. Values
This could be anything from literally a utility function to highly framing-dependent. Degenerate case: embedded in lookup table from world model to actions.
2. World-Model
Degenerate case: stateless world model consisting of just sense inputs.
3. Search Process
Causal decision theory is a search process. “From a fixed list of actions, pick the most positively reinforced” is another. Degenerate case: lookup table from world model to actions.
Note: this says a thermostat is an agent. Not figuratively an agent. Literally technically an agent. Feature not bug.
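To make the quoted definition concrete, here is a minimal sketch of a thermostat written as an agent with three explicit parts. This is my own illustration, not code from the quoted post: the names `Agent`, `thermostat_values`, `stateless_world_model`, `pick_best`, and the 20-degree `TARGET` setpoint are all hypothetical, and the values function is just one assumed way of encoding "heat when it is cold".

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Action = str
WorldModel = float  # assumption: the whole "model" is one temperature reading

ValueFn = Callable[[WorldModel, Action], float]


@dataclass(frozen=True)
class Agent:
    """Hypothetical container for the three parts named in the definition."""
    values: ValueFn                                  # values
    world_model: Callable[[float], WorldModel]       # world-model
    search: Callable[[WorldModel, Sequence[Action], ValueFn], Action]  # search process
    actions: Sequence[Action]

    def act(self, sense_input: float) -> Action:
        # Build the world model from the sense input, then search over actions.
        model = self.world_model(sense_input)
        return self.search(model, self.actions, self.values)


TARGET = 20.0  # assumed setpoint in degrees Celsius


def thermostat_values(temperature: WorldModel, action: Action) -> float:
    # Score 1.0 for heating when below the setpoint, 1.0 for idling otherwise.
    wants_heat = temperature < TARGET
    return 1.0 if (action == "heat") == wants_heat else 0.0


def stateless_world_model(sense_input: float) -> WorldModel:
    # Degenerate case: the world model is just the current sense input.
    return sense_input


def pick_best(model: WorldModel, actions: Sequence[Action], values: ValueFn) -> Action:
    # "From a fixed list of actions, pick the highest-scoring one."
    return max(actions, key=lambda action: values(model, action))


thermostat = Agent(
    values=thermostat_values,
    world_model=stateless_world_model,
    search=pick_best,
    actions=("heat", "idle"),
)
```

Calling `thermostat.act(15.0)` selects `"heat"` and `thermostat.act(25.0)` selects `"idle"`. Nothing in the structure is specific to thermostats: swapping in a richer world model or a different search process gives a different agent of the same shape, which is the sense in which the definition applies to such a wide range of entities.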
This is a model that can be applied naturally to a wide range of entities, as seen from the fact that thermostats qualify. And the reason why we tend to automatically think of people - or thermostats - as agents, is that our brains have evolved to naturally model things in terms of this kind of an intentional stance; it’s a way of thought that comes natively to us.
Given that we want to learn to think about humans in a new way, we should look for ways to map the new way of thinking into a native mode of thought. One of my tactics will be to look for parts of the mind that look like they could literally be agents (as in the above technical definition of an agent), so that we can replace our intuitive one-agent model with intuitive multi-agent models without needing to make trade-offs between intuitiveness and truth. This will still be a leaky simplification, but hopefully it will be a more fine-grained leaky simplification, so that overall we’ll be more accurate.