A core subproblem in ontology identification is to understand why and how humans and agents break down their world models into distinct, structured concepts like tables, chairs, and strawberries. This matters because we want AIs to optimize for the real-world things we care about, but those things are expressed in terms of latent variables in our world models. Meanwhile, when an AI plans to achieve its goals in the world, that planning refers to its own internal representations, which means we must understand how those internal representations correspond to the latent variables/concepts in our ontologies in order to ensure that the AI is optimizing for the right things in the right way.
From an external perspective, if our only goal were to explain the functional behavior of the world model, it would seem perfectly valid to treat the world model as one undifferentiated black box program that outputs predictions about the world. A black box program might even be the simplest explanation of the world model's behavior. There doesn't seem to be any obvious reason why we'd need to decompose this black box into well-structured concepts, or why such a decomposition would line up consistently with our own ontologies in a meaningful way.
So behavior alone doesn't seem sufficient to pin down structured concepts. We might be tempted to just project our own ontology onto the black box program by taking our current understanding of how the world works and trying to draw correspondences to different parts of the black box that we've somehow carved up. But this approach won't be robust to ontology shifts: The AI will learn and discover all sorts of new things about the world that we haven't observed or even conceived of, including new laws of physics or novel abstractions alien to us, and these are precisely the kinds of things that won't fit into whatever ontology we're trying to force onto the AI's world model.
If projecting our ontology onto the black box program doesn't work, we need to start from the black box description of the world model and somehow derive the ontology from it. This is a genuinely challenging task; it's not even clear what desiderata would let us pin down a decomposition into well-structured abstractions that remains robust to ontology shifts. However, the feat of "deriving structure starting from a black box" isn't completely unprecedented:
Algorithmic Markov Condition: The Algorithmic Markov Condition is essentially the AIT version of Bayesian networks. Instead of random variables, we have binary strings, and instead of factorizing a joint probability distribution, we factorize the joint complexity of the strings. We can think of the Algorithmic Markov Condition as specifying the "optimal computational order for compressing strings." For instance, if $x_1, x_2, x_3, x_4$ are four strings and their joint complexity factorizes according to the DAG below, we can interpret that as saying: the optimal way to compress $x_1, x_2, x_3, x_4$ together is to start with $x_1$, then find the shortest program that computes $x_2$ from $x_1$ as well as the shortest program that computes $x_3$ from $x_1$. Once we obtain $x_2$ and $x_3$, we find the shortest program that computes $x_4$ from $x_2$ and $x_3$. Formally we have:

$$K(x_1, x_2, x_3, x_4) \overset{+}{=} K(x_1) + K(x_2 \mid x_1^*) + K(x_3 \mid x_1^*) + K(x_4 \mid x_2^*, x_3^*)$$

where $x^*$ denotes a shortest program for $x$ and $\overset{+}{=}$ denotes equality up to an additive constant.
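To make the "optimal compression order" intuition tangible, here is a toy sketch (not from the original text) that uses zlib compression length as a very crude stand-in for Kolmogorov complexity; the strings and helper names are invented for illustration, and the numbers only loosely track the idealized quantities:

```python
import zlib

def C(*strings: bytes) -> int:
    """Crude stand-in for Kolmogorov complexity: length of the
    zlib-compressed concatenation of the given strings."""
    return len(zlib.compress(b"\x00".join(strings), level=9))

def C_cond(target: bytes, *given: bytes) -> int:
    """Rough proxy for conditional complexity K(target | given): the extra
    compressed length the target adds on top of the conditioning strings."""
    return C(*given, target) - C(*given)

# Four toy strings arranged as in the DAG from the text:
# x1 -> x2, x1 -> x3, and (x2, x3) -> x4.
x1 = b"strawberries are red, sweet, and roughly conical; " * 20
x2 = x1.replace(b"red", b"ripe")        # mostly computable from x1
x3 = x1.replace(b"sweet", b"fragrant")  # also mostly computable from x1
x4 = x2 + x3                            # computable from x2 and x3

# Joint "complexity" versus the sum of terms suggested by the DAG:
joint = C(x1, x2, x3, x4)
factored = C(x1) + C_cond(x2, x1) + C_cond(x3, x1) + C_cond(x4, x2, x3)
print(joint, factored)  # crude estimates of the two sides of the factorization
```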
In both Bayesian networks and the Algorithmic Markov Condition, it becomes much easier to derive and talk about structure once we have multiple entities, such as multiple variables or multiple strings, because we can then talk about the relationships between them, such as conditional independence. By continually probing the properties of these relationships, we can eventually derive a kind of "structure" by stitching the relational properties together. These formalisms aren't just surface-level analogies to our task in ontology identification, either: the approximate version of Bayesian networks is the theoretical basis for natural latents, while the approximate version of the Algorithmic Markov Condition forms the basis for the Solomonoff version of natural latents.
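To make the parallel concrete, here are the two factorizations for a general DAG, written side by side (standard forms, not taken from the original text; $\mathrm{pa}(i)$ denotes the parents of node $i$, $x^*$ a shortest program for $x$, and $\overset{+}{=}$ equality up to an additive constant):

$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\big(x_i \mid x_{\mathrm{pa}(i)}\big) \qquad\longleftrightarrow\qquad K(x_1, \dots, x_n) \overset{+}{=} \sum_{i=1}^{n} K\big(x_i \mid x_{\mathrm{pa}(i)}^{*}\big)$$

In both cases, all of the structure lives in relationships between the entities: conditional independence on the left, conditional complexity on the right.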
Bayesian networks and the Algorithmic Markov Condition are still not quite right for what we want to do in ontology identification because they already assume a particular decomposition into variables or binary strings, and these decompositions are exactly the sorts of things we want to derive in the first place. We want to know why strawberries and chairs are the kinds of things we tend to model as latent variables (instead of e.g. one half of a strawberry combined with one half of a table as a latent variable). Of course we still want to discover the causal relationships between these variables or understand how to derive higher-level concepts from them, but the first step is to derive these variables themselves without assuming them upfront.
So for ontology identification, we can't start with a particular decomposition into latent variables as we do with Bayesian networks or the Algorithmic Markov Condition. The fact that we had multiple variables was a big reason why we could derive structure in the first place, by probing the relationships between different variables. However, while we can't assume a particular decomposition of the world model, we often have multiple agents with different world models, or multiple plausible hypotheses about the world. We can potentially leverage this multiplicity to derive structure in a way somewhat similar to the Algorithmic Markov Condition.
In particular, when we say that two agents share the same abstraction, one mental picture we might have is that the computations of both agents' world models "route through" the same abstraction. For instance, when two agents share the concept of strawberries, one possible meaning is that they share the same process for computing beliefs about strawberries, but differ in how they compute the implications of those beliefs, such as downstream predictions or actions:
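As a cartoon of what "routing through the same abstraction" could look like in code (a minimal sketch; all function names and numbers here are invented for illustration):

```python
from typing import Dict

Obs = Dict[str, float]

def strawberry_beliefs(obs: Obs) -> Dict[str, float]:
    """Shared stage: both agents compute beliefs about the strawberry
    from raw observations in exactly the same way."""
    return {"ripe": obs["redness"] * obs["softness"],
            "edible": 1.0 - obs["mold"]}

def agent_1(obs: Obs) -> str:
    beliefs = strawberry_beliefs(obs)   # routes through the shared abstraction...
    return "eat" if beliefs["ripe"] > 0.5 and beliefs["edible"] > 0.5 else "wait"

def agent_2(obs: Obs) -> float:
    beliefs = strawberry_beliefs(obs)   # ...and so does this one,
    return 2.0 * beliefs["ripe"]        # but it uses the beliefs differently.

obs = {"redness": 0.9, "softness": 0.8, "mold": 0.0}
print(agent_1(obs), agent_2(obs))  # one agent outputs an action, the other a prediction
```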
Similar to the Algorithmic Markov Condition, we can use a directed acyclic graph to talk about the optimal way to compress a collection of world models together. However, instead of capturing the "optimal order for computing strings", we try to capture how multiple world models can be represented as the composition of overlapping abstractions. Taking the example in the image above, suppose that we have two agents represented as two functions $f_1$ and $f_2$, each of which takes sensory observations (about strawberries) and returns some action or prediction about the world. When we say that the collection $\{f_1, f_2\}$ factorizes according to the DAG above, we mean that there exist three abstractions/functions $g_0$, $g_1$, and $g_2$ that satisfy the following: (1) the factorization is optimal from a compression standpoint, i.e. $K(f_1, f_2) \overset{+}{=} K(g_0) + K(g_1) + K(g_2)$; and (2) each agent's computation routes through the shared abstraction, i.e. $f_1 = g_1 \circ g_0$ and $f_2 = g_2 \circ g_0$.
We can also imagine a much larger collection of world models that factorizes according to a much more complicated DAG, but the rules are the same: each arrow in the DAG corresponds to an "abstraction", and the joint complexity of the world models is equal (up to an additive constant) to the sum of the K-complexities of all the abstractions. Each world model $f_i$ is assigned a final abstraction node, and the DAG specifies how information propagates: each abstraction receives information from its "parents" in the DAG and passes information to its "children", until the information reaches the final abstraction nodes. Each final node then produces the output/predictions of its world model.
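In symbols, one way to write the rule sketched above (using the $f$/$g$ notation from the strawberry example, generalized to $n$ world models with one abstraction $g_e$ per arrow $e$ of the DAG):

$$K(f_1, \dots, f_n) \overset{+}{=} \sum_{e \,\in\, \text{arrows}} K(g_e)$$

with each $f_i$ obtained by composing the abstractions $g_e$ along the arrows leading into its final node.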
Going back to our strawberry example, the two conditions that we impose in our factorization imply that: (1) this factorization is one of the optimal ways to represent $f_1$ and $f_2$ from a compression perspective, and (2) the factorization breaks down the computation of each world model into a hierarchy of overlapping abstractions. By tracing down the arrows in the DAG, we can find the abstraction that is shared between both agents ($g_0$). This is a concrete property that we can verify even though we only have access to the "black box" functional behavior of $f_1$ and $f_2$.
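As a toy illustration of the kind of check this makes possible (a minimal, self-contained sketch that only tests the composition condition on sample inputs, not the compression condition; the agents, candidate abstractions, and all names are invented):

```python
from typing import Any, Callable, Dict, Iterable

Obs = Dict[str, float]

# Toy "black box" agents: we only get to call them, not look inside.
def f1(obs: Obs) -> str:
    ripeness = obs["redness"] * obs["softness"]
    return "eat" if ripeness > 0.5 else "wait"

def f2(obs: Obs) -> float:
    ripeness = obs["redness"] * obs["softness"]
    return 2.0 * ripeness  # say, a predicted price for this strawberry

# Candidate factorization: a shared abstraction g0 plus per-agent heads g1, g2.
def g0(obs: Obs) -> float:
    return obs["redness"] * obs["softness"]  # "how ripe is the strawberry?"

def g1(ripeness: float) -> str:
    return "eat" if ripeness > 0.5 else "wait"

def g2(ripeness: float) -> float:
    return 2.0 * ripeness

def factors_through(f: Callable[[Obs], Any],
                    head: Callable[[Any], Any],
                    shared: Callable[[Obs], Any],
                    test_inputs: Iterable[Obs]) -> bool:
    """Check, on sample inputs, that f behaves like head(shared(x))."""
    return all(f(x) == head(shared(x)) for x in test_inputs)

tests = [{"redness": r / 10, "softness": s / 10}
         for r in range(11) for s in range(11)]
print(factors_through(f1, g1, g0, tests))  # True
print(factors_through(f2, g2, g0, tests))  # True
```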
What's interesting about this generalization of the Algorithmic Markov Condition is that it gives us a concrete formalization of "redundant computation across multiple world models/hypotheses", and redundant computations are exactly the sort of "shared interface" that we need for ontology identification: