Assume you are an agent, and you have a model of the world. How do you check that you are in your model, that you exist within it?
To see why I care about this question, let's go back to embedded agency.
Embedded agency is the study of intelligent agents which are part of the world with with they interact. This is to distinguish from cartesian (or dualistic) agency, where agents and world are two separated entities with clearly defined I/O channels. Embedded agents fit more to our intuitions about agents; but they run into many problems.
The Embedded world-models sub-problem of embedded agency captures the issue with formally modelling a world bigger than yourself, which contains you and your map of it. It ranges from logical uncertainty, the use of probability to quantify beliefs about the consequences of known facts, to self-reference issues caused by the infinite recursion of the map containing itself.
An embedded agent will probably build models of the world. And any meaningful model of the world will contain a representation of the agent itself. Hence distinguishing a model of the world with the agent and one without, as well as pinpointing in which part of the model the agent lies, gives us information about how to construct and deal with worlds in which we are embedded.
Notice that the agent considering the world in which it is embedded might cause problems of self-reference. The point of this post is to answer the question of embedding in an outside-view approach: given a description of an agent and a description of a world-model, study when the agent is embedded within the world-model. This separates the issue of embedding from self-reference, and provides a first step towards answering the question in the context of self-reference.
I also see this question as an instance of ontology identification: finding how things we care about are encoded in a world-model. One of the difficulties of the general problem is that the object we are looking for and the world-model are represented in potentially different ways.
But this is not the case here. Our agent will be (hypercomputation nonwithstanding) Turing-equivalent. Hence we represent it as a Turing Machine. On the other hand, the world-model is used for computing predictions about the world. This model is thus also representable by a Turing Machine. And here is the key insight: because both the object we are looking for and the model we are looking into are at the same level of formalization (i.e. Turing Machines), we can study this relation mathematically.
It is simply a relation between Turing machines.
(One tape) Turing Machines are traditionally defined as a tuple, where is the set of states, the alphabet, the transition function, the set of initial states, and the set of final states.
I removed some details of the full definition (like the blank symbol and the input symbols), but that's the gist of it. The magic comes from the transition function , which is of type : for each pair of non-final state and symbol under the tape, it gives the next state, the symbol to write in the current cell, and the move of the tape.
Here are some of my simplifying assumptions.
- I fix to : each cell contains a bit.
- I consider the variants of TMs where there is an additional movement option: N. It simply keeps the head in the same place. Therefore the transition function is of type .
- I focus on deterministic TMs, for which returns a single triple. There are non-deterministic TMs, but they add nothing for my purpose, and equivalent to deterministic TMs anyway.
- I make total. In the general definition, can be a partial function, which causes halting whenever it is not defined for the current pair; we can replace that by a transition without effect (thanks to the N movement) to a halting state.
That's the classical way of defining TMs. But for my purpose, I prefer an alternative presentation: TMs as state diagrams. In this formalism, the TM is represented by a directed graph, whose nodes are the states, and whose edges are the transitions. These transitions are labelled by a triple of type . In this triple, the first element is the symbol read by the head, the second is the symbol written by the head, and the last is the move of the head.
Let's take an example: the Turing Machine I will use for my agent in this post. It simply copies the first cell of the input to the cell on its right, and then go back over the first cell and halts.
Here is its transition function:
And here is its state diagram:
(As is common, I use _ as a placeholder when the transition is independent of the symbol read. That way, I write one transition instead of two).
There are two reasons I favor the state diagram representation of TMs. First, it makes them easier to draw, and drawings are an antidote to the dryness of formalism. Maybe more importantly, the intuitions behind the relations I propose stem from formal methods -- more specifically from Labelled Transition Systems, a formalism quite similar to the state diagram representation of TMs. Hence the intuition will be easier to translate unharmed into this representation.
Requirements for the embeddings
Recall that we have a TM A representing an agent, and a TM M representing a world model. We are looking for a mathematical condition for A to be embedded in M.
Because they stem from the intuitions behind embedded agency, I call them conceptual requirements.
- Dynamic Embedding: we want the agent to possibly stop being simulated in the model. Because an embedded agent act on an environment that contains it, the agent might break itself through its actions.
A classical example is the action of dropping an anvil on the agent's head. We want a good embedded model of the world to represent that the agent might stop working afterwards. On the other hand, dualistic agents, like variants of AIXI, consider themselves as separate from the environment. It doesn't even make sense to consider that the I/O with the environment will stop!
- Concurrent Embedding: we want the agent to be run or simulated concurrently with the rest of the world. This means that the agent has no "special place" in the world, in which it runs uninterrupted until it stops.
This captures the fact that embedded agents don't get to put the world on pause when deciding.
- Interruptible Embedding: we want the agent to possibly stop while computing something. This is a sort of combination between dynamic and concurrent embedding, because being stopped while executing implies that the TM stops running, and that it doesn't run in one block.
The other kind of requirement, I call implementation requirements.
They ask that these embeddings can be implemented tractably.
- Deciding Embedding: We want to decide efficiently if there is such an embedding.
- Computing Embedding: We want to compute efficiently such an embedding.
- Semantic Embedding: We want embeddings to be defined according to the "semantic" of the TM A (what it computes) instead of according to the specific representation of A that we have.
By direct embedding, I mean an embedding of a TM A into a TM M, such that A itself can be found within M. The alternative would be an equivalent version of A (in terms of computability) being within -- this case is treated in the next section.
The simplest embedding I can think of is inspired by simulation preorders. A simulation embedding is a relation between the states of A and the states of M, satisfying:
- an initial state of A, .
What does this mean? The easiest way to represent it is to start from the initial states, and then unroll the relation. For the initial state of A, we have a state of M with . This is not necessarily an initial state of M. Then, for each transition from to a , there is a corresponding transition with the same label from to another state of M. Since is total, this completely defines the transitions from . And by applying this idea recursively to the , we conclude that there is a "replica" of A in M.
Here is an example of such a M (for our example agent defined above):
Another way to phrase it: at some point in its execution, M "calls" A, and follows the computation of A until it ends. The simulation relation entails nothing about what happens before or after this call to A; it simply requires that such a call exists.
How does this embedding satisfy our requirements? Let's start with the conceptual ones.
- Dynamic: the existence of a simulation embedding only entails one call to A inside M. For the same A, we can have a with a simulation embedding with one call to A, and a with a simulation embedding where A is called infinitely many times.
- Not concurrent: Because is total, each call to A inside M happens sequentially, until the "replica" of A halts.
- Not interruptible: For the same reason that the calls to A are not concurrent, they are also not interruptible.
Compared with a dualistic separation of agent and world, simulation embeddings only improve on the dynamic requirement; the rest stays as in dualistic agents.
On the other hand, simulation embeddings satisfy most implementation requirements.
- The existence of a simulation embedding is efficiently computable: We have polynomial algorithms (with small exponent) for computing simulations preorders; these should work for our simulation embeddings.
- A simulation embedding is efficiently computable: Same reason as above.
- Not semantic: The embedding is "syntactic", in the sense that it relies on the specific transitions of A, not what it computes.
So simulation embeddings are not completely worthless. Nonetheless, we want to improve the conceptual part.
Weak Simulation Embedding
The issues with simulation embeddings stem from the way it determines completely the decision function of M during each "call" to A. It thus makes sense to weaken this part of the definition, and see what we end up with.
To do so, let's introduce the notation to mean that there is a path between and in the state diagram representation of our TM. This contrasts with , which means there is a transition between and labelled by .
Then a weak simulation embedding is a relation between the states
of A and the states of M, satisfying:
- an initial state of A,
What do we gain compared to simulation embeddings? Well, now the "calls" to A inside M are both concurrent (there might be other transitions between and ), and interruptible ( implies there exists a path between them, not that all paths starting at pass through ).
For example, the following Turing Machine has a weak simulation embedding
It differs from our previous example through concurrency: something happens between the state corresponding to state and the one corresponding to state (also for and ). Here, this is simply a translation to the right.
This allows things like implementing a virtual memory on which A runs, by giving it specific cells and moving between them after each transition of A.
We can also add interruptions and dynamic changes:
Conceptually, that's great. Do we lose anything in terms of implementation? I'm not sure. While simulation embeddings come straight from formal methods, I never saw a relation exactly like weak simulation (it might be a miss from my part). Intuitively, deciding and computing a weak simulation embedding should not be more expensive than for a simulation embedding, since we just replace the existence of a transition by the existence of a path (which is computable in linear time).
Still, I have no exact algorithm for that, and would be very interested by any reference or intuition on the complexity of weak simulation embeddings.
One thing we did not improve though is the "syntactic constraint": we still define embeddings in terms of how A was written as a TM, not in terms of what A computes.
Which bring us to the next section.
Embeddings up to equivalence
If we assume our agent is halting, then it computes a function on boolean strings. That's the semantic of A; whereas the implementation details of A -- the specific transition function/state machine -- is the syntax of A. When two TM A and A' compute the same function, we say they are equivalent.
Let's then define a weak simulation embedding up to equivalence as
the tuple of:
- is a TM equivalent to A
- is a weak simulation embedding between and M
This has all the benefits of weak simulation embeddings, and satisfies the semantic requirement: what matters is the computation of A, not the way it is written.
Unfortunately, neither the decision nor the search is computable for this
embedding. This follows from the fact that even deciding if two halting
TMs compute the same function is undecidable in general. And that's a fundamental
building block for giving such an embedding.
It thus seems that we have to accept the pain of syntactic embeddings for now, until someone finds a better approach than me (or prove it definitely impossible).
In conclusion, as long as we are looking for a specific representation of an agent (the TM A) inside the specific representation of a world (the TM M), weak simulation embeddings do the work. They capture whether the agent is embedded in the world in a satisfying and efficiently computable way, according to intuitions about embedded agency.
If on the other hand we are looking for a semantic equivalent of an agent inside a representation of the world, then the natural extension of weak simulation becomes uncomputable. Whether there is an efficiently computable approach to deciding and building such an embedding is still open.