The Astronaut and the Planet: Part II

by epicurus
12th Sep 2025
5 min read

This is the second in a series of posts on self-models and what they teach us about cognition—both artificial and natural. While I expect some of these ideas will seem novel and others obvious, I hope most readers will find something new to consider. You can find Part I here.

In the first part of this series, we introduced a metaphor for our cognitive selves: our mind is a vast and complex Planet - the seat of unconscious processing, intuition, and raw emotion - orbited by a solitary Astronaut, our conscious, narrative self. The Astronaut's role is not to command the Planet but to observe it, constructing a simplified map of its inner workings from a curated stream of data. This model gives rise to our sense of a unified "I" and our experience of agency.

This framing, however, left us with a crucial and perplexing question. If the Planet handles the vast majority of cognitive and biological processing, and the Astronaut is largely an observer and storyteller, it seems to be a costly, almost superfluous, feature. The Planet does the work, so what is the Astronaut for? Why did a system like this evolve at all?

As many of us will have noticed, our conscious awareness is at its most alert when faced with novel, uncertain situations - learning a new skill, avoiding an accident, or navigating a social faux pas - and least active in predictable, routine ones - waiting in traffic, doing the laundry. This suggests that the Astronaut is especially important in learning from and responding to novel experiences. In this part, we will sketch out two important roles the Astronaut plays in this vein.

The Astronaut as Coach

Today's machines almost universally learn using gradient descent, an algorithm which can be roughly described as an automated way to assign credit for a particular outcome to each part of the neural network. While simple, it has proven to be enormously effective and forms the backbone of modern successes in AI. Nevertheless, there are reasons to hope that we can do better.

Gradient descent can be enormously expensive because it has to calculate updates for each weight. Moreover, it can depend sensitively on hyperparameters, like the learning rate, which are often found only through trial and error. Finally, the gradient descent algorithm itself does not inform us which actions to take to learn most efficiently.
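
To make the credit-assignment picture concrete, here is a minimal sketch of gradient descent on a tiny two-layer network (the network, data, and learning rate are all invented for illustration, not taken from any particular system). Note that every single weight receives its own gradient and its own update at every step, and that the learning rate is a hand-tuned hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network on made-up data (all numbers here are illustrative).
X = rng.normal(size=(32, 4))        # 32 samples, 4 features
y = rng.normal(size=(32, 1))        # made-up regression targets
W1 = rng.normal(size=(4, 8)) * 0.1  # every one of these weights...
W2 = rng.normal(size=(8, 1)) * 0.1  # ...gets its own gradient below

lr = 1e-2  # hand-tuned hyperparameter: too large diverges, too small crawls

for step in range(100):
    # Forward pass.
    h = np.tanh(X @ W1)
    pred = h @ W2
    loss = np.mean((pred - y) ** 2)   # mean squared error (for monitoring)

    # Backward pass: credit assignment for every weight in the network.
    d_pred = 2 * (pred - y) / len(X)
    dW2 = h.T @ d_pred
    dW1 = X.T @ ((d_pred @ W2.T) * (1 - h ** 2))

    # Update each weight by its share of the blame.
    W1 -= lr * dW1
    W2 -= lr * dW2
```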

It seems plausible that the Astronaut is evolution's attempt to circumvent these inefficiencies. From this viewpoint, the Astronaut is most interested in learning the dynamics of the Planet in uncertain situations. This also sidesteps the gap in computational resources between the Planet and the Astronaut: the Planet's dynamics may be far more compressible than the Planet as a whole. By learning a high-level, compositional, interpretable model of the Planet in these situations ("to climb this wall, I will need to use these two handholds and those two footholds in this particular order..."), the Astronaut can guide the Planet to change the specific sub-optimal parts of its process and steer it toward future explorations that maximize information gain.

For example, a tennis player trying to understand why she faulted on a serve is unlikely to credit a crow flying overhead. Her model of herself and the game, her Astronaut, has already identified the key, plausible parameters. She is likely to focus on her wrist movement or shot placement rather than the million other factors a gradient descent algorithm would have to compute credit for, only to assign them low importance after many repeated trials. This makes her learning more sample-efficient and guides her future practice as she works to improve her serve.
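
As a toy illustration of this kind of coaching, the sketch below (everything in it, including the factor names and the error function, is invented for illustration) searches only over the factors a high-level self-model deems plausible. Ignoring the crow from the start is exactly what makes the search sample-efficient.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical serve "parameters"; only a few of them actually matter.
factors = ["wrist_angle", "toss_height", "shot_placement",
           "crow_overhead", "shoe_color", "crowd_noise"]
serve_params = {f: rng.normal() for f in factors}

def serve_error(params):
    # Toy error model (invented for illustration): only three factors
    # influence the outcome at all.
    return ((params["wrist_angle"] - 0.3) ** 2
            + (params["toss_height"] - 1.0) ** 2
            + 0.5 * (params["shot_placement"] - 0.5) ** 2)

# The "Astronaut" prunes the search space before any trial and error:
plausible = ["wrist_angle", "toss_height", "shot_placement"]

# Simple coordinate search over only the factors the self-model blames,
# instead of blindly perturbing all six.
for _ in range(50):
    for name in plausible:
        for delta in (-0.05, 0.05):
            trial = dict(serve_params)
            trial[name] += delta
            if serve_error(trial) < serve_error(serve_params):
                serve_params = trial
```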

The Astronaut as Simulator

The other common decision-making technique we employ in difficult situations is to simulate ourselves in the future. We can play out a choice we might make and predict our emotional and cognitive reactions to its consequences. This is clearly an activity that is enormously helped by having a simplified, high-level model of ourselves—that is, by having an Astronaut.

Of course, this is not the only way we learn. Especially in situations with a high degree of reliability and dense feedback, the Astronaut merely trusts her Planet to figure things out and learn directly from experience. This is the case, for example, in learning to ride a bike. Very few of us have a detailed, conscious mental model of how a bike works or how we steer it; it just seems to happen automatically.

This dichotomy has a clear parallel in modern reinforcement learning (RL) techniques.

A model-free RL agent learns purely through trial and error. It doesn't understand the underlying rules of its environment; it simply learns a set of policies or value estimates. Through millions of attempts, it discovers that doing action A in state S tends to lead to a positive reward and develops a powerful, reactive "intuition." This is the domain of the Planet, and how we learn to ride a bike or instinctively return a tennis volley. The Planet is fast, efficient, and masterful in environments it has been extensively trained on. If the rules suddenly change, however (say, the steering on the bike is reversed), relearning is a slow and painful process: the Planet must first unlearn the old associations and then learn new ones.
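
For concreteness, here is a minimal tabular Q-learning sketch on a toy corridor environment (entirely illustrative, and not a claim about how biological learning is implemented). The agent never builds a model of the corridor; it only caches which action tended to work in which state.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 5-state corridor: the agent starts at the left end and is rewarded
# for reaching the right end. Actions: 0 = left, 1 = right.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(s, a):
    s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

for episode in range(500):
    s = 0
    for t in range(20):
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        # Reactive "intuition": update a value estimate; no world model anywhere.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if r > 0:
            break
```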

A model-based RL agent, on the other hand, does something different. It uses its experiences to build an internal, simplified model of how the world works. Before acting, it can use this model to simulate the likely outcomes of different actions: "If I press this new button, what do I predict will happen?" This is precisely the Astronaut's role as simulator. The Astronaut’s self-model is our internal sketch of the game’s physics. This process is slower and more computationally demanding than a simple reflex, but it is vastly more flexible. When faced with a novel situation, the Astronaut can "think through" the consequences, allowing the system to find a smart solution on the first try, rather than through brute force.
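
A correspondingly minimal sketch of the model-based side, again purely illustrative: instead of acting, the agent queries an internal model for the predicted outcome of each action and picks the best prediction. The `toy_model` and `value` functions here are hand-written stand-ins for what would normally be learned.

```python
import numpy as np

def plan_one_step(s, model, value, n_actions=2, gamma=0.95):
    """One-step lookahead: simulate each action with the internal model
    instead of trying it in the world, then pick the best prediction."""
    scores = []
    for a in range(n_actions):
        s_pred, r_pred = model(s, a)               # imagined outcome, no real action taken
        scores.append(r_pred + gamma * value(s_pred))
    return int(np.argmax(scores))

def toy_model(s, a):
    # Hand-written stand-in for a *learned* transition/reward model
    # of the corridor from the previous sketch.
    s_next = max(0, min(4, s + (1 if a == 1 else -1)))
    return s_next, (1.0 if s_next == 4 else 0.0)

value = lambda s: s / 4.0                          # crude state-value estimate
assert plan_one_step(3, toy_model, value) == 1     # moving right looks better in simulation
```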

While today's model-based RL agents typically build a model of their environment, we are suggesting that evolution took a crucial extra step: it built a model of the agent itself. This self-model is the Astronaut.


If we take these ideas seriously, the obvious next step is to try replacing gradient descent (on a trained foundation model, in certain circumstances) with a second neural network trained specifically to update the weights of the first. We should expect this to succeed, and we should expect the second network to learn a model of both the first network and its training environment. I am currently running small-scale experiments to test this hypothesis and would gladly welcome help from experienced ML people.
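
As a very rough sketch of what this could look like (all names, shapes, and features below are my own illustrative choices, and the update network here is not actually meta-trained), the hand-written rule `w -= lr * grad` is replaced by an update proposed by a second, tiny network that sees each weight of the first model together with its gradient:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of the interface only: a second, tiny "update network" proposes the
# weight changes for the first model, replacing the hand-written rule
# `w -= lr * grad`. meta_W is random here, not meta-trained, so this loop is
# not expected to actually minimize the loss; a real version would train
# meta_W across many tasks.

def update_net(grad, weight, meta_W):
    """Per-parameter update proposed by the second network, computed from
    simple features of each weight of the first model."""
    feats = np.stack([grad, weight, np.sign(grad)], axis=-1)
    return np.tanh(feats @ meta_W).squeeze(-1) * 0.01   # small, bounded update

# First model: a tiny linear regressor on made-up data.
X = rng.normal(size=(32, 4))
y = rng.normal(size=(32, 1))
W = rng.normal(size=(4, 1)) * 0.1
meta_W = rng.normal(size=(3, 1)) * 0.1                  # weights of the update network

for step in range(100):
    pred = X @ W
    grad = 2 * X.T @ (pred - y) / len(X)
    W -= update_net(grad, W, meta_W)                    # learned rule instead of lr * grad
```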

In Part III, I will talk about goals and what can go wrong when the Astronaut is too rigid or static.