In this post, I will provide some speculative reasoning about Simulators and Agents being entangled in certain ways.

I have thought quite a bit about LLMs and the Simulator framing for them, and I am convinced that it is a good explanatory/predictive frame for the behavior of current LLM (+multimodal) systems. 
It provides intuitive answers for why capability assessment for LLMs is so difficult and why it is useful to say "please" and "thank you" when interacting with them, and it sketches somewhat coherent explanations of their out-of-distribution behaviors (e.g. Bing Sydney).

When I was wrapping my head around this simulator classification for GPTs, contrasting them against previously established ideas about classifying AGIs, like oracles or genies, I noticed that simulators seem like a universally useful cognitive structure/algorithm, and that most agents in nature seem to implement some version of a simulator to predict the behaviors of systems that are entangled with those agents’ regulation targets.
And in the other direction, targeted simulation tends to be biased towards agents for various reasons, whereas untargeted simulation tends to sample simulacra until it reaches a stable pattern - which may either be an inert state in simulated space, or an agent-like pattern that maintains its own boundaries/is self-evidencing.

To put it plainly:

I suspect that agents tend to converge on implementing simulators as part of their cognition, and that simulators tend to naturally discover/implement agents. 

What follows in this post is a somewhat handwavy explanation of this suspicion. I have no idea if it is true, but I think it is sufficiently interesting, or even instructively wrong, to deserve posting. Please feel free to engage and criticize as much as you like; this is speculative conceptual work and I may not know more than you about any given point.


Agents convergently implement simulators

Simulators are one natural solution to the kind of compression task that embedded agents face: the environment is complex and changing, and the agent always interacts with just a relatively small portion of it, a selection that tends to shift continuously, but sometimes more radically (e.g. when a predator shows up).
What kind of realistic algorithm can make “useful” predictions[1] on this kind of sensory input sequence?

Simulators are well suited to this kind of task, of "loading the current scene" and applying various learned rules and dynamics to roll out likely future states of the scene, probably biased towards predicting aspects of the future scene that are particularly relevant to the agent. This process is compositional in that it takes only the elements present in the scene and generates the next instance based on the relationships between them - which can be done at different levels of abstraction (predator chases prey, thin branch breaks/makes noise when stepped on).

It is also storage efficient: I believe Schmidhuber sometimes brings up the example that, when compressing a video of a falling apple, one may only need to store the first frame and the local gravity constant to recover a close approximation of the video - as long as one has a simulator to insert the compressed "code" into. 
Generally, the more powerful/general a simulator you have, the more compactly you can compress any data in the domain the simulator is proficient in - like a key/seed that you can easily carry around and unfold as needed. This makes sense as long as you are more storage-constrained than processing-constrained.
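As a toy illustration of this idea (my own sketch, not Schmidhuber's actual construction - the frame rate, duration, and variable names are all made up), one can store a single initial value plus the shared dynamics instead of every frame:

```python
# Toy sketch: compress a "video" of a falling apple down to its initial
# state, and let a shared simulator (here, just kinematics under gravity)
# regenerate the frames on demand.

G = 9.81          # local gravity constant (m/s^2)
DT = 1.0 / 30     # frame interval for a 30 fps "video"
NUM_FRAMES = 90   # three seconds of footage

def record_video(y0):
    """The 'raw footage': apple height sampled at every frame."""
    return [y0 - 0.5 * G * (t * DT) ** 2 for t in range(NUM_FRAMES)]

def decompress(code):
    """Recover the footage from the compressed code via the simulator."""
    return [code["y0"] - 0.5 * G * (t * DT) ** 2 for t in range(NUM_FRAMES)]

raw = record_video(y0=10.0)   # 90 numbers
code = {"y0": 10.0}           # 1 number, plus the simulator everyone shares
reconstructed = decompress(code)

assert all(abs(a - b) < 1e-9 for a, b in zip(raw, reconstructed))
print(f"stored {len(code)} value(s) instead of {len(raw)} frames")
```

The storage-vs-processing trade-off is explicit here: the "code" is tiny, but every playback costs a simulation pass.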

Another way of saying all this is that all agents have to be regulators to some extent - at the very least in terms of maintaining their Markov blankets over time, retaining their core properties even under perturbations - if they are to remain in the agent category. Good regulation requires causal modeling of the systems that interact with the regulation target(s), and because agents are embedded in more complex environments, this modeling is subject to some serious constraints. The best way to selectively model the causal dynamics of a complex system seems to be relevance/resolution-adjusted generative modeling, which we can think of as targeted/contextual simulation.

LLMs seem to face a sufficiently similar task: effectively compressing a lot of the training data in order to predict sequences. Arguably we are putting even more optimisation pressure on them to find good compression strategies than was ever applied to any biological system.


Simulators convergently simulate agents

So, alright. Maybe we do get simulators whenever we push agents or learning systems in general to generatively model/compress a large training corpus. What about the other direction? Do we get agents from simulators?
Sort of? 

The sort of simulator we have talked about so far is a directed kind of simulator, refined to offer its best performance on relevant predictions. Perhaps it just so happens that many of the most relevant and difficult predictions that biological agents make are about other biological agents - partially due to the time pressure involved in interacting with a system that acts at a similar speed, but partially also due to the compact complexity of agents. I could go into more detail about what it means to interact with a system that has an internal "state of mind", but I think it is sufficiently obvious why simulators would be subjected to extra optimisation pressure with respect to agents they have encountered or may encounter.
With LLMs, we can point to the prevalence of agentic behavior (or descriptions thereof) in the training data, not least because stylistic elements in text can be traced back to their agentic authors.

We could also think about a more general, less directed sort of simulator that is just rolling out a set of dynamics - a window into a hypothetical world in progress. My guess is what I mentioned earlier: it samples over patterns of interactions of simulacra until it either gets stuck in an inert state or discovers patterns that retain complex dynamics over time (as the simulation is rolled forward), which is a natural category for agents. Either way, the simulator explores the space of possible patterns of simulacra interactions until it finds stable ones, but the latter case is more interesting for us. It would discover agents in a way that is a bit reminiscent of how physics "discovered" agents: just applying the local rules to the scene again and again until elegantly ordered, self-sustaining complexity emerges over time. This is just a lot quicker when the simulator has been trained on a world that already contains agents, since it would learn many more higher-level rules (e.g. narrative ones) than what the base level of reality supplies.
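A minimal concrete analogue of "applying the local rules again and again until some patterns go inert and others sustain themselves" is Conway's Game of Life (my choice of illustration, not something the Simulators framing itself specifies): a block settles into a fixed point, while a glider keeps reproducing its own structure as it moves.

```python
# Conway's Game of Life as a toy "undirected simulator": live cells are
# (x, y) pairs, and one local rule is applied to the whole scene each step.
from itertools import product

def step(live):
    """Apply the local birth/survival rules once to the whole scene."""
    counts = {}
    for (x, y) in live:
        for dx, dy in product((-1, 0, 1), repeat=2):
            if (dx, dy) != (0, 0):
                cell = (x + dx, y + dy)
                counts[cell] = counts.get(cell, 0) + 1
    # A cell is alive next step with exactly 3 neighbours, or 2 if already alive.
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

block = {(0, 0), (0, 1), (1, 0), (1, 1)}            # an "inert" still life
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}   # a self-sustaining pattern

assert step(block) == block       # the block has reached a fixed point

g = glider
for _ in range(4):                # a glider repeats its shape every 4 steps...
    g = step(g)
assert g == {(x + 1, y + 1) for (x, y) in glider}   # ...shifted one cell diagonally
print("block is inert; glider keeps its pattern while moving")
```

The glider is of course nowhere near an agent, but it is the simplest case of a pattern that "keeps up its own boundaries" under repeated application of the simulator's rules.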

I am not sure whether to think of LLMs as belonging to the former or the latter category, or if these are even appropriate categories. They are like “scene simulators” that have been trained on a whole world of possible scenes, each individual context quite contained, but stretching over a vast, perhaps fully general territory.

Who knows what kinds of patterns they had to discover and internalize to be this good at next-token prediction. I'll just note that more universal patterns that help with compression across larger spaces of possible "scenes" (spaces we humans might not be equipped/trained to hold in our minds) seem advantageous for driving loss down.


Simulators and Agents are nested

Let’s take humans as a handwavy example:

  1. Physics acts in the role of a simulator (one that isn’t itself being trained), applying rules to existing elements to generate/transition to the next state, sampling over patterns of behavior that the interacting elements display
    1. Some of these patterns become inert over time (e.g. a gas spreading through a contained space)
    2. Some patterns like waves or whirlpools retain a sort of macro-dynamic over some time before collapsing again
    3. Living organisms are self-perpetuating patterns of complex/macro organization
    4. It happens to be the case that the decision making of organisms has a relatively high impact on whether they (or their offspring) persist over time
      1. Rather than it mostly/only depending on the “body design” of the organism
  2. Agents/Organisms are selected over time according to their ability to perpetuate their patterns, putting selection pressure on the decision making process
    1. Decision making can be broken down to a sort of goal-to-action mapping, a consequentialist calculus of which behavior in the current context would be useful (which in this context means something like adaptive)
    2. Decision making is selected partially according to its quality of prediction about relevant outcomes
  3. We know that human brains are running a sophisticated predictive simulation that mainly dictates our conscious perception and is kept on track by constant grounding through sensory stimuli
    1. This simulation is very much contextual (e.g. the perception of color does not depend only on wavelength, but also on internal processes that track colors of objects and access color associations of different objects, allowing us to keep perceiving something as “red” even if the lighting conditions change significantly)
      1. This simulation is very attention/expectation adjusted, so we might fail to notice someone in a gorilla costume on the screen if we are focusing on counting ball passes
  4. We, as conscious beings, actually “live” inside of the simulations that our brains generate. We interact with the world indirectly, by navigating the simulated stage that the brain presents us with, only ever interfacing with “stage objects” and their often non-physical properties (like color)
    1. Only this relevance adjusted simplification of the real world is tractable for us to think about, entertain counterfactuals about, and make long term predictions within
    2. In some sense, when we are simulating other people, we are also generating an abstracted version of what might be going on in their mind, a sort of little simulator that animates the dynamics of thoughts and emotions to yield predictions about likely behavior
    3. (Evolving sophistication on this layer might have been a prerequisite for our dominance as a species)
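Point 2.1 above - decision making as a goal-to-action mapping fed by simulated outcomes - can be sketched in a few lines. Everything here is illustrative: the action names, the outcome numbers, and the utility weights are all made up for the example.

```python
# Hypothetical sketch of a consequentialist goal-to-action mapping:
# pick the action whose simulated outcome scores best against the goals.

def choose_action(actions, simulate, utility):
    """Consequentialist calculus: argmax over predicted outcome utility."""
    return max(actions, key=lambda a: utility(simulate(a)))

# A toy foraging decision under predation risk (numbers invented).
outcomes = {
    "forage": {"food": 5, "risk": 0.4},
    "hide":   {"food": 0, "risk": 0.05},
    "scan":   {"food": 2, "risk": 0.1},
}
simulate = lambda action: outcomes[action]       # stand-in for the simulator
utility = lambda o: o["food"] - 8 * o["risk"]    # adaptive value of an outcome

print(choose_action(outcomes, simulate, utility))  # → forage
```

The point of the sketch is the division of labor: `simulate` is where all the predictive machinery from the earlier sections lives, while the "agent" part reduces to a comparatively thin scoring-and-selection step on top of it.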


As the level of sophistication rises, it becomes natural to introduce the respective next abstraction layer into the system. When I started turning this toward reasoning about LLMs, I wondered whether we should expect them to “top out” with a simulator or with an agent. I now think that, insofar as this framing makes sense, every agent is necessarily embedded in some kind of simulator. One meaningful distinction is whether a simulator is trained towards the purposeful use of a singular agent/unified group of agents that it is simulating, or whether it remains largely unadjusted (= not disproportionately adjusted) by those patterns. Either way, the simulator provides important context for the selection of internal agents (and their attributes) over time.

One important thread that I haven’t thought through yet is the extent to which we should be centrally concerned about the internal development of a “consequentialist reasoner” that figures out how to make the simulator converge on the training distribution more quickly and effectively. Consequentialist reasoning seems like a cognitive algorithm we might expect to emerge naturally when optimizing AI systems for complex tasks, and LLMs are certainly no exception. A vague picture we could draw is that the simulator might discover such an agent as a coherent entity which happens to have preferences that the overall system is trained towards over time. This is a good time to remember that LLMs don’t necessarily “want/try” to do next-token prediction; rather, the training selects for systems that happen to demonstrate that skill with high proficiency, for whatever internal reason.

In any case, with LLMs it seems clear that we at the very least have a powerful simulator capable of simulating complex agents, cultures, and other dynamical systems. It also seems clear that we can’t rely on this being an entirely neutral simulator, since it was strongly selected to simulate a certain subset of high-order physics well while requiring much less accuracy on other aspects. Still, the simulation target seems general enough that this might be quite unlike the kind of simulation going on in our brains. I am curious what other people think.

  1. ^

    I can elaborate on "cognition is centrally about useful prediction", if that is too vague.
