We often talk about the world in two fundamental ways: through agency and causality. A rock falls because of gravity (causality). A dog wags its tail because it’s happy (agency). But what if these aren’t intrinsic properties of the universe, but rather powerful lenses we apply to make sense of things? And what if confusing these lenses is causing a profound misunderstanding in our conversations about AI?
Let’s explore this idea.
Agency vs. Causality: Two Sides of the Same Coin
Imagine a stream. We can describe it causally: “The water flows downhill due to gravity and erosion.” Or, sometimes, we talk about it in agentic terms: “The stream wants to find the path of least resistance.”
Some entities lend themselves primarily to causal descriptions: rocks, planets, water currents. Their behavior is best understood through predictable physical laws.
Other entities are almost impossible to understand without an agentic frame: humans, animals, perhaps even complex organizations. We talk about what a lion wants to eat or what a person believes.
And then there are the fascinating in-between cases. “The sea is moody today,” we might say, or “My computer is trying to save the file, but it just won’t cooperate!” Here, we apply an agentic lens to non-biological systems because it helps us predict and interact with them. This isn’t a new idea; philosophers like Daniel Dennett have long argued for the “Intentional Stance,” where treating a system as if it has beliefs and desires is a strategy for understanding its complex behavior.
When Science “Killed the Universe”
Here’s where things get interesting. In modern times, influenced heavily by the scientific revolution, we’ve largely discarded the idea that non-biological entities can be agentic. We scoff at the notion that a rock “wants” to fall. “They can’t make choices!” we declare.
This shift was crucial for science. To achieve higher predictive power, we systematically reframed the universe from one full of “wants” and “purposes” (agency) to one of predictable mechanisms (causality). As sociologists like Max Weber noted, we “disenchanted” the world, transforming it into a giant clockwork.
This disenchantment gave us enormous predictive power: we came to understand that the movement of heavenly bodies obeys the same laws as that of earthly objects. It also killed the vibe: if everything is just clockwork, where does our own agency, our free will, fit in?
Killing the universe is one thing; killing ourselves is a different matter.
The Cartesian Bastion: Conflating Agency with Subjectivity
To preserve our unique sense of “choice” in a universe of clockwork physics, we took a final stand, separating our minds from the rest of the universe. This move is called Cartesian Dualism: a view of existence as split between inner (mind) and outer (matter). The outside is dead; we killed it! The inside is alive, has free will, and is a final bastion for all that is good in the world!
We killed the universe and saved ourselves, gaining great powers of prediction in return for a grand sacrifice. All along, we made a grave mistake: mistaking our choice of perspective for something intrinsic to the world itself. We tend to treat causal processes and agents as mutually exclusive categories, with every entity seen as either a being or a thing.
If we sweep away this illusion, causality vs. agency turns into a perspective trick. Humans can be seen as agents with free will, or as deterministic processes: anchor a certain idea in someone’s mind, and it will predictably shape their downstream behavior.
If agency and causality are perspective tricks, what does that mean? Surely there is an “inner world” and an “outer world”! Agreed! I know that I have subjectivity: the ability to experience. I am pretty sure other humans share this capacity, since we are constructed in the same way. Now, how far can we extrapolate this? What kinds of entities are likely to share subjectivity?
In “Metamodernism: The Future of Theory”, Jason Josephson Storm presents a framework for process kinds. He argues that extrapolability depends on kinship: shared features depend on shared generative drivers. Other humans are created in much the same way as I am: we have a large overlap in DNA and physiology, and similar brains. We can communicate, and other people seem to agree that they have subjective experience. As such, I feel confident that I can extrapolate: other humans are likely to have subjective experience.
How about animals? We share a phylogenetic generative driver. Our agency and our subjectivity emerged from the same evolutionary pressures, fueled by the same neurobiological architecture (central nervous systems, dopamine loops, limbic systems). Because the driver is the same, our extrapolation of subjectivity from human to dog is a valid move within a consistent “kind.”
Note how the likelihood of extrapolating subjectivity correlates with the aptness of an agentic perspective. In nature, things that seem agentic also tend to possess the capacity for subjectivity.
Implicitly, people carry this belief: agenticness = subjectivity
However, if we accept that agenthood is not intrinsic, but rather a choice of perspective, this correspondence breaks down. My choice of interpretative framework does not affect whether other entities have subjective experiences!
The Mind Projection Fallacy
Humans and animals are outputs of a very similar process. We share brains, limbic systems, hormones etc. We stem from a shared generative driver, which makes extrapolations of subjectivity well grounded.
On the surface, chatting with LLMs is similar to chatting with humans.
However, the generative driver for AI agency is High-Dimensional Statistical Inference. It is a process of backpropagation and loss-function minimization.
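For concreteness, here is a toy sketch of that driver in plain Python/NumPy. It is not any real model’s training code, just an illustration of the bare process: nudge parameters, step by step, in whatever direction reduces a loss function. An LLM’s driver is this same process at vastly larger scale.

```python
import numpy as np

# Toy illustration of the "generative driver" behind modern AI systems:
# gradient descent minimizing a loss function. Not any real model's
# training code; an LLM's driver is this same bare process at huge scale.

rng = np.random.default_rng(0)

# Synthetic data: 100 examples, 5 features, binary labels.
X = rng.normal(size=(100, 5))
true_w = rng.normal(size=5)
y = (X @ true_w > 0).astype(float)

w = np.zeros(5)   # parameters start as a blank slate
lr = 0.1          # learning rate

for step in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))                     # predictions
    loss = -np.mean(y * np.log(p + 1e-9)
                    + (1 - y) * np.log(1 - p + 1e-9))      # cross-entropy
    grad = X.T @ (p - y) / len(y)                          # d(loss)/d(w)
    w -= lr * grad                                         # nudge w downhill

print(f"final loss: {loss:.4f}")
```

Nothing in this loop resembles evolutionary pressure, a nervous system, or a limbic system; the only “pressure” is the gradient of a loss.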
Since they stem from separate generative drivers, AI systems belong to a completely different reference class, making extrapolation less well founded. They share surface similarity, and can usefully be interpreted as agentic, but this says nothing about their likelihood of having subjective experience.
This is a highly unusual state of affairs. We are used to mixing up agency (a perspective) with subjectivity (a state). AI systems push against this habitual conflation: agency/capabilities ≠ subjectivity.
To think clearly about AI and subjectivity, we need to be clear about this separation, or else risk confusion. Here are some ways in which this confusion shows up:
Thinking that capabilities require subjectivity: “One day the AI might ‘wake up’, and then take over.”
Thinking that increasing capabilities lead to subjectivity: “As AI systems become more capable, at some point we will need to think about their ethical treatment.”
Thinking that lack of subjectivity implies that agentic perspectives are fallacious: “It’s a statistical engine! It can’t make choices or have preferences!”
Extrapolation Across Drivers
So if a reference class implies extrapolability based on shared drivers, what are some classes, and what drivers do they correspond to? Intuitively, here’s a list:
Humans: Very close DNA matching, similar physiology
Animals: Still close phylogenetically; we share brains, limbic systems, hormones, etc.
Life (including plants, bacteria, fungi): Lots of shared structure at the cellular level, DNA, etc.
Matter (including rocks, the sun, etc.): Shared physical laws, atoms, etc.
These can be visualized as concentric circles:
Note how the inner circles are subsets of the outer ones, sharing increasing degrees of kinship. The more kinship, the more likely features are to extrapolate outward: we have more in common with other humans than with animals, more in common with animals than with other forms of life, and so on.
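As a rough sketch (the class names and feature labels here are just illustrative, taken from the list above, not a formal taxonomy), the nesting can be expressed as sets of shared generative features, with kinship as the overlap between two sets:

```python
# Rough sketch of the concentric reference classes above, modeled as nested
# sets of shared generative features. The feature labels are illustrative,
# taken from the list in the text, not a formal taxonomy.

CLASSES = {
    "matter":  {"physical laws", "atoms"},
    "life":    {"physical laws", "atoms", "cells", "DNA"},
    "animals": {"physical laws", "atoms", "cells", "DNA",
                "nervous system", "limbic system", "hormones"},
    "humans":  {"physical laws", "atoms", "cells", "DNA",
                "nervous system", "limbic system", "hormones",
                "human physiology", "human-like brain"},
}

def kinship(a: str, b: str) -> int:
    """Number of shared generative features: the more overlap,
    the better grounded an extrapolation of subjectivity."""
    return len(CLASSES[a] & CLASSES[b])

# Inner circles are subsets of outer ones...
assert CLASSES["humans"] >= CLASSES["animals"] >= CLASSES["life"] >= CLASSES["matter"]

# ...and kinship falls off as we move outward from "humans".
print(kinship("humans", "animals"))  # 7
print(kinship("humans", "life"))     # 4
print(kinship("humans", "matter"))   # 2
```

An entity grouped with us only by how it behaves on the surface shares few of these generative features, which is exactly the issue with the agentic category discussed next.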
If you make a category based on “is best modeled in agentic terms” (Dennett’s “Intentional Stance”), then it is unlikely to support extrapolation, since the generator functions are so dissimilar: there is little kinship.
Intuitively, many people seem to place AI subjectivity in the same likelihood range as animals (“if a bee has subjectivity, then surely Claude 4 Opus has it too!”). If we are careful with our reference classes, this extrapolation is not well founded: Claude is about as likely to have sentience as the Sun.
Deeper Into The Weeds: Functionalism
To get some early feedback, I fed this essay into Claude Pro. The answer I got included terms like “bio-chauvinism” (should it be zoo-chauvinism?) and “Functionalism”. 1
The basic counter to the argument I’ve made in this article is this: “Your choice of reference class sucks!” Functionalism is a family of explanations for subjectivity that all assume subjectivity emerges once you do computation in a specific way.
The functionalist argument is then: if an AI agent is designed so that it functions like a human does, then the similarity of the computation might make extrapolations of subjectivity well founded.
I doubt this line of reasoning for two reasons:
Functionalism posits “strong emergence”: an unexplained step where subjectivity spontaneously emerges once computation becomes complex enough. This has been discussed by Andrés at QRI. Paper, video.
More importantly, our current generation of AI models does not perform computation similar to the kind performed by human brains. The structure is dissimilar, even if the surface characteristics are similar. Positing a shared reference class based on dissimilar architectures doesn’t make sense, and seems more like a way to rationalize the “agency = subjectivity” fallacy than a principled take.