A core subproblem in ontology identification is to understand why and how humans and agents break down their world models into distinct, structured concepts like tables, chairs and strawberries. This is important because we want AIs to optimize for the real-world things we care about, but the things we care about are expressed in terms of latent variables in our world models. On the other hand, when an AI plans to achieve its goals in the world, that planning refers to its own internal representations, which means we must understand how those internal representations correspond to latent variables/concepts in our ontologies to ensure that the AI is optimizing for the right things in the right way.
From an external perspective, if our only goal was to explain the functional behavior of the world model, it would seem perfectly valid to treat the world model as a single undifferentiated black-box program that outputs predictions about the world. A black-box program might even be the simplest explanation for the world model's behavior. There doesn't seem to be any obvious reason why we'd need to decompose this black box into well-structured concepts, or why such a decomposition would line up consistently with our own ontologies in a meaningful way.
So behavior alone doesn't seem sufficient to pin down structured concepts. We might be tempted to just project our own ontology onto the black box program by taking our current understanding of how the world works and trying to draw correspondences to different parts of the black box that we've somehow carved up. But this approach won't be robust to ontology shifts: The AI will learn and discover all sorts of new things about the world that we haven't observed or even conceived of, including new laws of physics or novel abstractions alien to us, and these are precisely the kinds of things that won't fit into whatever ontology we're trying to force onto the AI's world model.
If projecting our ontology onto the black box program doesn't work, we need to start from the black box description of the world model and derive the ontology from it somehow. This seems like a really challenging task; it's not even clear what desiderata would let us pin down a decomposition into well-structured abstractions that remains robust to ontology shifts. However, the achievement of "deriving structure starting from a black box" isn't completely unprecedented:
Bayesian networks: In causal discovery for Bayesian networks, we start with a joint probability distribution over a collection of variables $X_1, \dots, X_n$. The joint distribution is a bit similar to our "black box world model program" in that it's not particularly structured, but it tells us everything about what probabilistic predictions it will make (e.g. when you condition on a variable). However, by continually probing and querying the conditional independence properties of this joint distribution, we can gradually rule out causal DAG structures that are inconsistent with those properties, and eventually pin down the causal structure among the variables (up to a Markov equivalence class); a small code sketch of this probing loop appears below.
Algorithmic Markov Condition: The Algorithmic Markov Condition is essentially the AIT version of Bayesian networks. Instead of random variables, we have binary strings, and instead of factorizing a joint probability distribution, we factorize the joint complexity of the strings. We can think of the Algorithmic Markov Condition as specifying the "optimal computational order for compressing strings." For instance, if $A, B, C, D$ are four strings and their joint complexity factorizes according to the DAG below, we can interpret that as saying: the optimal way to compress $A, B, C, D$ together is to start with $A$, then find the shortest program that computes $B$ from $A$ as well as the shortest program that computes $C$ from $A$. Once we obtain $B$ and $C$, we find the shortest program that computes $D$ from $B$ and $C$. Formally, we have:
$$K(A,B,C,D) = K(A) + K(C \mid A) + K(B \mid A) + K(D \mid B,C)$$
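As a rough illustration of this factorization, we can replace the uncomputable $K$ with an off-the-shelf compressor. This is only a sketch: the toy strings and the approximation $K(x \mid y) \approx C(y \cdot x) - C(y)$ are crude, illustrative assumptions, not part of the formalism.

```python
import zlib

def Kc(s: bytes) -> int:
    """Compressed length as a crude proxy for the Kolmogorov complexity K(s)."""
    return len(zlib.compress(s, 9))

def Kc_cond(x: bytes, y: bytes) -> int:
    """Crude proxy for K(x | y): extra compressed bits needed for x once y is known."""
    return Kc(y + x) - Kc(y)

# Toy strings whose generative structure matches the DAG: B and C are
# computed from A, and D is computed from B and C.
A = b"strawberry " * 200
B = A.upper()
C = A.replace(b"strawberry", b"berrystraw")
D = B + C

joint = Kc(A + B + C + D)
factored = Kc(A) + Kc_cond(B, A) + Kc_cond(C, A) + Kc_cond(D, B + C)

print("Kc(A,B,C,D)                      ~", joint)
print("Kc(A)+Kc(B|A)+Kc(C|A)+Kc(D|B,C)  ~", factored)
# The two totals should land in the same ballpark: compressing "along the DAG"
# costs roughly as much as compressing everything jointly.
```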
In both of these examples, it seems much easier to derive and talk about structure once we have multiple entities, such as multiple variables or multiple strings, because once you have multiple entities, you can talk about the relationships between them, such as conditional independence. By continually probing properties of these relationships, we can eventually derive a kind of "structure" by stitching the relational properties together. These formalisms aren't just surface-level analogies to our task in ontology identification either: the approximate version of Bayesian networks is the theoretical basis for natural latents, while the approximate version of the Algorithmic Markov Condition forms the basis for the Solomonoff version of natural latents.
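Returning to the Bayesian-network example, here is a minimal sketch of that probing loop (not a full causal discovery algorithm). The linear-Gaussian chain and the partial-correlation test are illustrative assumptions chosen to keep the example short.

```python
# Probe conditional independences of a joint distribution to rule out
# causal structures, using a simulated linear-Gaussian chain X -> Y -> Z.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Ground-truth chain (unknown to the "discovery" step below).
X = rng.normal(size=n)
Y = 2.0 * X + rng.normal(size=n)
Z = -1.5 * Y + rng.normal(size=n)

def partial_corr(a, b, cond):
    """Correlation of a and b after regressing out the conditioning set."""
    def residual(v):
        coef, *_ = np.linalg.lstsq(cond, v, rcond=None)
        return v - cond @ coef
    ra, rb = residual(a), residual(b)
    return np.corrcoef(ra, rb)[0, 1]

ones = np.ones((n, 1))               # empty conditioning set (intercept only)
cond_Y = np.column_stack([ones, Y])  # conditioning set {Y}

print("corr(X, Z)     =", round(partial_corr(X, Z, ones), 3))    # far from 0
print("corr(X, Z | Y) =", round(partial_corr(X, Z, cond_Y), 3))  # roughly 0

# X and Z are dependent but become independent given Y. This rules out DAGs
# in which X and Z are connected without going through Y, and is consistent
# with the chains X -> Y -> Z, X <- Y <- Z and the fork X <- Y -> Z:
# the Markov equivalence class mentioned above.
```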
Abstraction as redundant computation
Bayesian networks and the Algorithmic Markov Condition are still not quite right for what we want to do in ontology identification because they already assume a particular decomposition into variables or binary strings, and these decompositions are exactly the sorts of things we want to derive in the first place. We want to know why strawberries and chairs are the kinds of things we tend to model as latent variables (instead of, e.g., one half of a strawberry combined with one half of a table as a latent variable). Of course we still want to discover the causal relationships between these variables or understand how to derive higher-level concepts from them, but the first step is to derive these variables themselves without assuming them upfront.
So for ontology identification, we can't start with a particular decomposition into latent variables like we do in Bayesian networks or the Algorithmic Markov Condition. The fact that we had multiple variables was a big reason why we could derive structure in the first place, by probing the relationships between different variables. However, while we can't assume a particular decomposition of the world model, we often have multiple agents with different world models, or multiple plausible hypotheses about the world. We can potentially leverage this multiplicity to derive structure in a way somewhat similar to the Algorithmic Markov Condition.
In particular, when we say that two agents share the same abstraction, one mental picture we might have is that the computations of both agents' world models "route through" the same abstraction. For instance, when two agents share the concept of strawberries, one possible meaning is that they share the same process for computing beliefs about strawberries, but they might differ on how they compute the implications of those beliefs such as downstream predictions or actions:
Similar to the Algorithmic Markov Condition, we can use a directed acyclic graph to talk about the optimal way to compress a collection of world models together. However, instead of the "optimal order for computing strings", we try to capture how multiple world models can be represented as compositions of overlapping abstractions. Taking the example in the image above, suppose that we have two agents represented as two functions $f_1, f_2 : \mathbb{B}^* \to \mathbb{B}^*$ which take sensory observations (about strawberries) and return some action or prediction about the world. When we say that the collection $f_1, f_2$ factorizes according to the DAG above, we mean that there exist three abstractions/functions $P_1, f^*_1, f^*_2 : \mathbb{B}^* \to \mathbb{B}^*$ that satisfy the following:
$K(f_1, f_2) = K(P_1) + K(f^*_1) + K(f^*_2)$: the joint complexity of the world models/agents is equal to the sum of the K-complexities of all "abstractions" in the DAG.
Each function is equivalent to the composition of abstractions specified by the directed acyclic graph. In particular, for our example we have $f_1 = f^*_1 \circ P_1$ and $f_2 = f^*_2 \circ P_1$. Both agents share the same abstraction $P_1$, which computes beliefs about strawberries from sensory observations, but they have different processes $f^*_1$ and $f^*_2$ for computing predictions and actions from those beliefs.
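Here is a minimal, made-up instance of this two-agent setup. The particular "strawberry belief" computed by $P_1$ and the downstream maps are hypothetical placeholders, chosen only to illustrate the compositions $f_1 = f^*_1 \circ P_1$ and $f_2 = f^*_2 \circ P_1$ sharing $P_1$.

```python
def P1(observation: str) -> dict:
    """Shared abstraction: compute a low-dimensional belief about strawberries."""
    return {
        "ripe": "red" in observation,
        "count": observation.count("strawberry"),
    }

def f1_star(belief: dict) -> str:
    """Agent 1's downstream map: turn beliefs into a prediction."""
    return "will taste sweet" if belief["ripe"] else "will taste sour"

def f2_star(belief: dict) -> str:
    """Agent 2's downstream map: turn beliefs into an action."""
    return "pick them" if belief["ripe"] and belief["count"] > 0 else "wait"

# The two world models are compositions that route through the same P1.
def f1(obs: str) -> str:
    return f1_star(P1(obs))

def f2(obs: str) -> str:
    return f2_star(P1(obs))

obs = "three red strawberry strawberry strawberry on the table"
print(f1(obs))  # -> "will taste sweet"
print(f2(obs))  # -> "pick them"
```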
We can also imagine a much larger collection of world models $f_1, \dots, f_n$ that factorizes according to a much more complicated DAG, but the rules are the same: each arrow in the DAG corresponds to an "abstraction"; the joint complexity of the world models is equal to the sum of the K-complexities of all abstractions. Each world model $f_i$ is assigned a final abstraction node $f^*_i$, and the DAG specifies how information propagates: each abstraction receives information from its "parents" in the DAG and passes information to its "children", until reaching the final abstraction nodes $f^*_i$. Each final node then produces the output/predictions of its world model.
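One way to sketch this propagation rule in code is a small DAG evaluator that treats each abstraction as a node reading its parents' outputs (roots read the raw observation). The dictionary-based encoding and the reuse of the toy strawberry functions are illustrative assumptions, not part of the formalism.

```python
from typing import Any, Callable, Dict, List

def evaluate_dag(
    nodes: Dict[str, Callable[..., Any]],   # node name -> abstraction function
    parents: Dict[str, List[str]],          # node name -> names of its parents
    observation: Any,
) -> Dict[str, Any]:
    """Propagate the observation through the DAG; parentless nodes read it directly."""
    values: Dict[str, Any] = {}

    def value(name: str) -> Any:
        if name not in values:
            pars = parents.get(name, [])
            inputs = [value(p) for p in pars] if pars else [observation]
            values[name] = nodes[name](*inputs)
        return values[name]

    for name in nodes:
        value(name)
    return values

# Toy strawberry DAG: P1 is shared; f1*, f2* are the final abstraction nodes.
nodes = {
    "P1":  lambda obs: {"ripe": "red" in obs, "count": obs.count("strawberry")},
    "f1*": lambda belief: "will taste sweet" if belief["ripe"] else "will taste sour",
    "f2*": lambda belief: "pick them" if belief["ripe"] and belief["count"] else "wait",
}
parents = {"f1*": ["P1"], "f2*": ["P1"]}

values = evaluate_dag(nodes, parents, "three red strawberry strawberry strawberry")
print(values["f1*"], "|", values["f2*"])
```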
Going back to our strawberry example, the two conditions that we impose in our factorization imply that: (1) this factorization is one of the optimal ways to represent $f_1$ and $f_2$ from a compression perspective; (2) the factorization breaks down the computation of each world model $f_1, f_2$ into a hierarchy of overlapping abstractions. By tracing down the arrows in the DAG, we can find the abstraction that is shared among both agents ($P_1$). This is a concrete property that we can verify even though we only have access to the "black box" functional behavior of $f_1$ and $f_2$.
Why is this useful?
What's interesting about this generalization of the Algorithmic Markov Condition is that it gives us a concrete formalization of "redundant computation across multiple world models/hypotheses", and redundant computations are exactly the sort of "shared interface" that we need for ontology identification:
Redundant computation seems like the shared interface that is needed for agents to communicate with each other. In particular, we often communicate our beliefs using concepts that are much lower-dimensional than the raw sensory observations. For instance, in our strawberry example, the first agent might receive some sensory input $x^*$ about strawberries and form a belief $P_1(x^*)$ about them. Because the abstraction $P_1$ is shared, the first agent can communicate its belief $P_1(x^*)$ to the second agent instead of sending the raw observation $x^*$, and the second agent can then use that information to update its world model and compute its "counterfactual" predictions or actions $f^*_2(P_1(x^*))$ given the first agent's beliefs.
To encode our values as the AI's optimization target, we need some representation that is both consistent with our own ontologies and expressible in the AI's ontology. Redundant computation between the AI's world model and our own can provide the shared interfaces that allow us to do that.
The fact that we retain the same concepts over time despite continually updating our world model suggests that these concepts are redundant among multiple hypotheses that we have about the world. In particular, redundant computation might explain how our concepts can have stable semantics even as we go through ontology shifts, since two hypotheses can share the same abstractions even when they make drastically different predictions.
Redundant computation might even serve as a framework for analyzing instrumental convergence. If we try to factorize a collection of policies (with different goals) instead of a collection of world models, then the redundant abstractions across those policies can be interpreted as instrumentally convergent strategies that are useful for a wide variety of goals.
Open problems
While our generalization of the Algorithmic Markov Condition provides a particular formalization of redundant computations, this formalization is limited in that it only supports a fixed computational DAG over the collection of world models. But the relationships between abstractions can be much more expressive, involving recursion and shifting structures. We would like a framework that can capture the fact that relationships between abstractions can change according to context. We also want the concept of redundant computation to be robust to ontology shifts. For instance, during an ontology shift we might change the way we compute beliefs about strawberries from sensory observations (e.g., we might learn how to predict macroscopic properties from the molecular composition of strawberries). We want to say that we still retain the concept of a strawberry even though our process for computing beliefs has changed. To capture these scenarios, we can't have a fixed computational DAG that never changes. Instead, we might want to think of abstractions as reusable functions that can make function calls on each other and "adapt" to new abstractions when they're added to the world model.
Our generalization of the Algorithmic Markov Condition tells us that a collection of world models can be factorized into the composition of a collection of abstractions, but it doesn't yet tell us that it has to be factorized that way and not some other way. In other words, we need additional desiderata that allow us to pin down a unique factorization, while making sure those desiderata align with our intuitions and have a sound theoretical basis, so that the resulting factorization is actually consistent with our ontologies.