Finding agents in raw dynamics
Related: Formalizing «Boundaries» with Markov blankets, «Boundaries» Sequence
Suppose you are given time series data from a simulated ecology (such as Smallville). You can usually tell what the agents in there are doing, at least if the variables are properly labeled. If they are not labeled but are just numeric values, your first task would be to reverse-engineer the mapping to the conceptual model behind the simulation (or whatever generated the time series data). That might not be feasible, but let's ignore that for now, because there is a bigger problem, one that comes with any labelling, whether existing or reverse-engineered: you introduce an ontology. Once you call something an agent (label="player1_position"), you assume that those variables belong to that agent. That works well because our intuitions about agents are pretty good.
Until it doesn't. Our intuitions attribute agency where there is none: our ancestors anthropomorphized nature, and we may attribute intent where there is only coupling. Still, it is more dangerous to overlook an agent than to see one too many.
But we may miss intent when it is distributed or doesn't neatly fit into our ontologies of physically bounded agents.
If we want to find an agent, especially a potentially powerful agent that may work very differently from our intuitions, we need a method that can discover agents in raw, unlabeled data without relying on a prior ontology (or, in other words, without a unit that is already known to be an agent).
That's two problems: 1. doing without a prior ontology, and 2. finding agents in raw data.
For the first, as part of agent foundations work, johnswentworth and others have proposed modeling an agent in terms of its boundary, which shields inner states from outer change. A natural formalization of such a boundary is a Markov blanket.

Quick recap: The blanket is a partitioning[1] of all variables into four sets:
- internal states I (the agent's inner workings),
- external states E (the rest of the world),
- sensory states S (external influences on the inside), and
- active states A (internal influences on the outside),
where the blanket states (S, A) mediate all interaction between inside and outside.
There are some problems with Markov blankets, but these seem solvable:
Thus, it may make more sense to talk about invariant ε-boundaries when referring to such agents: the Markov blanket is not determined from raw observables but over parameters that are invariant[2] under transformation, and ε is chosen to minimize the predictive regret of predicting internal from external (invariant) variables.
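One way to write this relaxed shielding requirement is via conditional mutual information (a sketch in my notation, writing $I(\cdot\,;\cdot \mid \cdot)$ for conditional mutual information and $\mathcal{I}, \mathcal{E}, \mathcal{S}, \mathcal{A}$ for the internal, external, sensory, and active variables from above):

$$I\big(\mathcal{I}_t\,;\ \mathcal{E}_t \mid \mathcal{S}_t, \mathcal{A}_t\big) \;\le\; \varepsilon$$

i.e., given the blanket states, the internal states carry at most ε nats of information about the external states.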
Given such a formalization of an agent as a Markov blanket, how can we find it in the raw data? While there are efficient algorithms to find the blanket of a single variable, and there is some recent work on the relation between Markov Blanket Density and Free Energy Minimization, I'm not aware of any implementation that uses blankets to find agents. One problem is that we do not have a nice causal graph[3] that we could inspect structurally for the blanket property. We have to check a lot of variables statistically.
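To make "check statistically" concrete, here is a minimal sketch of such a test (hypothetical helper names, not the repo's actual code): discretize the columns, then estimate the conditional mutual information between a candidate internal set and the external variables given the candidate blanket. The partition qualifies as an ε-boundary if the score stays below ε.

```python
import numpy as np

def discretize(x, bins=8):
    """Equal-width binning of a 1-D series into integer symbols."""
    edges = np.linspace(x.min(), x.max(), bins + 1)
    return np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)

def cond_mutual_info(x, y, z):
    """Plug-in estimate of I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    for integer-valued columns (2-D arrays of shape (T, d))."""
    def H(*parts):
        joint = np.concatenate(parts, axis=1)            # columns of one joint variable
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log(p)).sum()
    return H(x, z) + H(y, z) - H(x, y, z) - H(z)

def blanket_score(data, internal, external, blanket, bins=8):
    """Epsilon-blanket test: how much do the internal columns still 'see' of the
    external columns once we condition on the blanket columns? Lower is better."""
    binned = np.column_stack([discretize(data[:, j], bins) for j in range(data.shape[1])])
    return cond_mutual_info(binned[:, internal], binned[:, external], binned[:, blanket])

# hypothetical usage: score = blanket_score(data, internal=[1], external=[3, 4, 5], blanket=[0, 2])
```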
Thus, as a first contribution, I offer an implementation of Unsupervised Agent Discovery (UAD), i.e., of how it might be possible to find such agents in unlabeled raw time series data.
The implementation takes a raw dataset and executes these steps:
The implementation also includes a simulation of a configurable number of agents (simple state machines acting as simple controllers) in a simplified shared environment, from which a dataset of all their observables is generated.
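For a concrete picture of the kind of data this produces, here is a toy stand-in I wrote for this post (not the repo's actual simulator): each agent is a two-state controller that senses a shared field at its position, flips its internal mode on a threshold, and moves accordingly; the output is just an unlabeled matrix of numbers.

```python
import numpy as np

def simulate(n_agents=3, steps=2000, seed=0):
    """Toy simulator: returns an unlabeled (steps x 3*n_agents) array with
    interleaved columns [sense_i, mode_i, move_i] per agent."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0, 10, n_agents)                    # positions in a 1-D world
    mode = rng.integers(0, 2, n_agents)                   # internal two-state controllers
    rows = []
    for t in range(steps):
        field = np.sin(0.5 * pos + 0.01 * t)              # shared environment signal
        sense = field + 0.05 * rng.normal(size=n_agents)  # noisy sensory states
        mode = np.where(sense > 0, 1 - mode, mode)        # internal state update
        move = np.where(mode == 1, 1.0, -1.0)             # active states
        pos = pos + 0.1 * move
        rows.append(np.column_stack([sense, mode, move]).ravel())
    return np.array(rows)

data = simulate()   # shape (2000, 9): no labels, just columns of numbers
```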
From implementing and running these simulations, I learned:
Discovering agents is nice. Now that we can do that, what else can we do? We can try to understand the internal states better. We can try to understand how the agent represents itself and its environment. This is the second contribution of the post (note: this is not yet part of the implementation).
We can treat memory as a compression[4] of past inputs that inform future actions.
For each internal variable m∈I and lag k, compute a lagged predictive information such as $\Delta_m(k) = I(m_{t-k};\, I_{t+1} \mid S_t)$, i.e., how much the past of m tells us about the agent's next internal state beyond what the current sensory input already does. If Δm(k) is large, then the past of m predicts the agent's next internal state. We have to be careful, because a variable may look like memory just because it is changing slowly.
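A minimal sketch of estimating this from discretized series (hypothetical helper names; the same plug-in conditional-mutual-information idea as above):

```python
import numpy as np

def plugin_cmi(x, y, z):
    """Plug-in I(X; Y | Z) for 1-D integer arrays: H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    def H(*cols):
        joint = np.stack(cols, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log(p)).sum()
    return H(x, z) + H(y, z) - H(x, y, z) - H(z)

def memory_score(m, internal, sensory, k):
    """Delta_m(k): how much m's value k steps ago predicts the next internal state,
    beyond what the current sensory reading already predicts. Inputs are
    integer-discretized 1-D series of equal length."""
    T = len(m)
    past_m = m[: T - 1 - k]        # m_{t-k}   for t = k .. T-2
    s_now  = sensory[k : T - 1]    # S_t
    i_next = internal[k + 1 :]     # I_{t+1}
    return plugin_cmi(past_m, i_next, s_now)

# hypothetical usage: does an agent's "mode" column carry memory at lag 5?
# score = memory_score(mode_col, mode_col, sense_col, k=5)
```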
If we can track the inputs and outputs of an agent, we can try to infer the policy implied by this input-output relation. We can infer a reward function R(I,A) with inverse reinforcement learning (IRL). Given a prior P(R) that weights reward functions[5], we can use the standard Bayesian formulation $P(R \mid I, A) \propto P(A \mid I, R)\, P(R)$. We could call the inferred R the agent's goals. Though with such simple modeling, we can't yet represent issues like mesa-optimization.
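As a rough sketch of what this could look like over a small, discrete hypothesis space of reward functions (a hypothetical setup with a Boltzmann-rational policy model; not part of the current implementation):

```python
import numpy as np

def bayesian_irl_posterior(candidate_rewards, states, actions, n_actions,
                           beta=2.0, prior=None):
    """P(R | trajectory) ∝ P(R) * Π_t π_R(a_t | s_t), where π_R is a softmax
    ("Boltzmann-rational") policy over the candidate reward R(s, a)."""
    n = len(candidate_rewards)
    log_post = np.log(np.full(n, 1.0 / n) if prior is None else np.asarray(prior))
    for i, R in enumerate(candidate_rewards):
        for s, a in zip(states, actions):
            q = beta * np.array([R(s, b) for b in range(n_actions)])
            log_post[i] += q[a] - (q.max() + np.log(np.exp(q - q.max()).sum()))
    log_post -= log_post.max()                      # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# hypothetical usage: which of two toy goals better explains the observed actions?
goes_right = lambda s, a: 1.0 if a == 1 else 0.0
goes_left  = lambda s, a: 1.0 if a == 0 else 0.0
print(bayesian_irl_posterior([goes_right, goes_left],
                             states=[0, 1, 2, 3], actions=[1, 1, 1, 0], n_actions=2))
```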
If we can find two agents X and Y and their memory, we can ask whether one of them represents the other; technically, whether some memory variable m of X predicts Y beyond what X's current input already does: $\Delta_m = I(m_t;\, Y_{t+1} \mid S^X_t)$. A large Δm means that some part of X is predicting Y.
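The same kind of score works across agents; a small sketch (again with hypothetical names and the same plug-in estimator):

```python
import numpy as np

def plugin_cmi(x, y, z):
    """Same plug-in estimate of I(X; Y | Z) as in the memory sketch above."""
    def H(*cols):
        joint = np.stack(cols, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log(p)).sum()
    return H(x, z) + H(y, z) - H(x, y, z) - H(z)

def representation_score(m_x, y_state, s_x):
    """Delta_m: does X's memory variable m predict Y's next state beyond what X's
    current sensory input already does? (integer-discretized 1-D series)"""
    return plugin_cmi(m_x[:-1], y_state[1:], s_x[:-1])
```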
I'm not sure whether this refutes How an alien theory of mind might be unlearnable or not.
Once you can track what agents know about each other and what objectives they follow, you can, in principle, derive how well these agents do or do not cooperate.
You can calculate:
Cooperation is favored when the fraction $\kappa = \frac{rB}{C} > 1$, where B is the benefit and C the cost of cooperating, and the relatedness r is replaced by a measured alignment between the agents, e.g., the residual mutual information between their inferred rewards or internal models[6]. We can call κ a cooperativity index; a generalization of Hamilton's rule[7]. If we look at the full induced cooperation graph between agents, where edges are weighted by κ, we can use percolation theory to determine at which level cooperation becomes universal (a giant component), as explored in The Evolution of Trust.
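A sketch of that last step (hypothetical helper names, using networkx for the graph handling): add an edge wherever the pairwise κ clears the threshold and check how much of the population ends up in the largest cooperating cluster.

```python
import networkx as nx
import numpy as np

def cooperation_graph(kappa, threshold=1.0):
    """Graph over agents with an edge wherever the pairwise cooperativity
    index kappa[i, j] clears the Hamilton-style threshold (kappa > 1)."""
    n = kappa.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if kappa[i, j] > threshold:
                G.add_edge(i, j, weight=kappa[i, j])
    return G

def giant_component_fraction(G):
    """Fraction of agents in the largest connected (mutually cooperating) cluster."""
    if G.number_of_nodes() == 0:
        return 0.0
    largest = max(nx.connected_components(G), key=len)
    return len(largest) / G.number_of_nodes()

# hypothetical usage with a random, symmetric kappa matrix for 50 agents
rng = np.random.default_rng(0)
kappa = rng.uniform(0.5, 1.5, size=(50, 50))
kappa = (kappa + kappa.T) / 2
print(giant_component_fraction(cooperation_graph(kappa)))
```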
The promise of this approach is that we could look at a complex system with many potential agents large and small, human, legal, or artificial, and determine at least an approximation of the cooperation structure and whether there are adversarial tendencies.
With all these promises, there are potentially serious problems.
The best blanket algorithm fails if we cannot observe sufficient detail, especially the internals of the agents of interest; we certainly can't look inside humans. The theoretical arguments above even model why agents wouldn't want transparency in some cases. And if we use proxies, we lose reliability. As a consolation, since we are most interested in powerful AI, we might at least have access to its internals.
If any variable we overlook is a common cause, we can get the agent boundaries wrong, fail to separate agents, and thereby fail to identify the crucial ones. I'm mostly thinking about complex LLM-like agents here. An LLM is distributed across multiple computers and may depend on human operators for its function, but still be powerful. Agents running on LLMs are also hard to find with slot-based approaches.
Things may change. The method, as currently conceived, is sample-hungry, and if anything changes during the sampling period, this change will show up in the data and may interfere with the discovery. Also, some types of learning that change the agent's policy (which might even happen in response to the discovery process) may manifest only after the sampling period.
There are concerns that the calculations might not be statistically stable and, especially, not computationally feasible. Currently, the algorithm is combinatorial in the number of variables. My current argument here is that we have an existence proof: humans have learned to identify agents in complex data fairly effectively, and it should be possible to reproduce that in a specialized algorithm.
At this point, Unsupervised Agent Discovery doesn't yet give you an algorithm that discovers agents in real-world data. But it provides a precise way to talk about agents, their goals and cooperation, and many other things we care about that usually require an a priori notion of an agent but can now be grounded in physics.
Github repo: https://github.com/GunnarZarncke/agency-detect/tree/master
My initial dense LaTeX/PDF writeup of UAD can be found here.
Many thanks to the reviewers Jonas Hallgren, Chris Pang, and Peter Kuhn. Additional thanks go to the team at AE studio that supported the development of the experiments and write-up with time and compute.
Feedback welcome, especially from the agent foundations people.
by using the condition that the internal states are conditionally independent of the external states given the sensory and active (blanket) states,
from M. D. Kirchhoff, T. Parr, E. Palacios, K. Friston, and J. Kiverstein, “The Markov blankets of life: autonomy, active inference and the free energy principle,” J. R. Soc. Interface, vol. 15, no. 138, 2018.
The invariants may themselves depend on the agents' dynamics, making a simple layer-by-layer inference infeasible.
If we could intervene on the simulation/experiment, we could determine the causal structure as done in the DeepMind Discovering Agents paper. That is also how humans check whether something is an agent or not: we prod it and see if it evades. It is a promising direction but was beyond the scope of this work.
In evolutionary environments, agents that keep memory of past inputs relevant to survival-affecting future outputs will outcompete agents without such memory. And agents with a more compact memory will outcompete agents with a larger memory but the same predictive effect.
This weighting P(R) is often seen as arbitrary or in need of justification, but here we are closer to the underlying substrate. In most environments of interest, we can argue that there will be entropic forces that select for simpler policies and lower prediction errors.
This residual mutual information between agents' actions, internal models, or rewards does not indicate a failure of separation. It captures alignment without leakage, e.g., from shared task structure, common external drivers, following the same conventions, or algorithmic similarity.
Hamilton's rule says that genes for a particular behavior should increase in frequency when rB > C, where r is the genetic relatedness, B the reproductive benefit, and C the reproductive cost.
W. D. Hamilton, “The genetical evolution of social behaviour,” J. Theor. Biol., vol. 7, no. 1, pp. 1–16, 1964.