This dialogue is part of the agent foundations fellowship with Alex Altair, funded by the LTFF. Thank you Dalcy, Alex Altair and Alfred Harwood for feedback and comments.
Context: I (Daniel) am working on a project about ontology identification. I've found conversations to be a good way to discover inferential gaps when explaining ideas, so I'm experimenting with using dialogues as the main way of publishing progress during the fellowship.
We can frame ontology identification as a robust bottleneck for a wide variety of problems in agent foundations & AI alignment. I find this helpful because the upstream problems can often help us back out desiderata that we want to achieve, and allow us to pin down theories/solutions that we're looking for:
Rephrasing in my terms:
- A lot of human concepts are concepts whose relationships generalize across a wide variety of other concepts, which means we want to separate the concept from the specific context that it's in
- In order to do this, we need to structure the concept in a way that contains some relational information about how it interacts with other variables, but leaves out other relational details that we want to abstract over
This part sounds important, but I don't get it.
Does that sound broadly right?
Yep that sounds like what I had in mind
One basic property of such a modeling formalism is making the "relational information" into an explicit variable (rather than a derived thing) that other elements of the formalism can directly access.
And importantly, this allows us to move things like higher-order terms/natural latents across different contexts and still be able to make sense of their meaning in that context.
> This part sounds important, but I don't get it.
So when you have a higher-order term like "behind", it's a term that generalizes across a wide variety of contexts (we can say "A is behind B" for a wide variety of As and Bs). So our mental representation of the word "behind" should contain the "relational information" that tells us how it interacts with a given context, but we also want to abstract over/throw out contextual information that is way too specific (e.g. what objects A and B are in a specific instance: "behind" shouldn't be defined as a particular spatial relation to a table or a cat or a house or any other specific object.)
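To make that a bit more concrete, here's a toy sketch (nothing here is part of any formalism; `Obj`, `position`, and `viewpoint` are made-up stand-ins): the representation of "behind" carries exactly the relational information it needs, and ignores everything context-specific about the objects.

```python
# Toy sketch: "behind" as a relation that packs its own relational information
# (how it applies to any two located objects) while abstracting over what the
# objects concretely are. `Obj`, `position`, and `viewpoint` are invented here.

from dataclasses import dataclass

@dataclass
class Obj:
    name: str            # "table", "cat", "house", ... -- irrelevant to "behind"
    position: float      # 1D stand-in for spatial location

def behind(a: Obj, b: Obj, viewpoint: float = 0.0) -> bool:
    """A is behind B if A is further from the viewpoint than B."""
    return abs(a.position - viewpoint) > abs(b.position - viewpoint)

# The same relation transfers across contexts without being redefined:
print(behind(Obj("cat", 5.0), Obj("table", 2.0)))   # True
print(behind(Obj("house", 1.0), Obj("tree", 3.0)))  # False
```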
Another angle/motivation I'm thinking of is in the context of Solomonoff induction:
This seems exciting but I don’t fully understand! Maybe this example can help clear up where I’m struggling.
Humans have a kind of in-built model of physics which encodes naive pre-Newtonian intuitions like “If I shove a thing, it will move”. As we learn more about physics, we learn that this model of the universe is wrong and we update it with relativity/quantum mechanics/whatever. But if I have to pick up a chair and move it to a different room, I’m not using relativity to work out how I should move it, I’m using my pre-Newtonian intuitions. So in some sense, that instrumental part of my world model has remained unchanged despite me updating my beliefs. But I don’t think that this means that elements of my ontology have stayed the same. Modern physics is ontologically completely different to naive physics. It seems to me that upon learning modern physics, one’s ontology changes completely, but there is still some instrumental value in keeping the old ontology around to be used as a quick and dirty (and computationally cheap) approximation when I need to pick up a chair. But I don’t think this is the same thing as saying that the concepts have remained ‘invariant’ as one goes from using naive physics to modern physics.
For this example would you say that, upon the agent learning modern physics, the ontology has changed almost entirely (because the principles/concepts behind the different models of the world are completely different) or only a little bit (because learning modern physics doesn’t affect the majority of actions that an agent takes)? Or something else?
So in this example we have two possible viewpoints:
I think both of these viewpoints are reasonable and valid, but for the purpose of ontology identification, we want to take the first perspective because:
What this means is that we want to structure our concepts in a way that can adapt to ontology shifts: My mental representation of a chair should only capture the information that is shared between a wide variety of "theories about chairs". I might currently believe that chairs are made of atoms, but if it turns out that they're made of quantum fields, I can still carry on making the same predictions about chair-related things, because my concept of a chair does not rely on a specific theory about "what chairs are".
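As a loose illustration (the class names and "predictions" below are invented, and this obviously isn't a serious model of concepts): the theory-independent regularities live in the concept itself, while "what chairs are made of" is delegated to whichever low-level theory is currently believed.

```python
# Loose illustration: a concept that only stores theory-independent regularities,
# so its predictions survive a shift in the underlying ontology. All names here
# (ChairConcept, AtomTheory, ...) are invented for the example.

class AtomTheory:
    def substrate(self) -> str:
        return "atoms"

class QuantumFieldTheory:
    def substrate(self) -> str:
        return "quantum fields"

class ChairConcept:
    # Theory-independent predictions: unchanged across ontology shifts
    def supports_sitting(self) -> bool:
        return True

    def made_of(self, theory) -> str:
        # Theory-dependent detail, deliberately kept out of the concept itself
        return theory.substrate()

chair = ChairConcept()
print(chair.supports_sitting())              # same prediction under either theory
print(chair.made_of(AtomTheory()))           # "atoms"
print(chair.made_of(QuantumFieldTheory()))   # "quantum fields"
```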
So now I want to introduce some minimal examples for how we can have a "unit" of a world model that "packs" enough relational information inside that unit such that we can interpret its meaning in isolation, without having to reference anything else in the world model. We'll call this property relational completeness, and we write $R(x)$ for "$x$ is relationally complete / we can interpret the semantics of $x$ from $x$ itself".
An example of something that is not relationally complete is the parameters and activations of a particular neuron, because the parameters do not tell us where the neuron is located inside the network, which is part of what defines the "semantics" of the neuron's activation (i.e. what is implied by the neuron's activation).
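A rough sketch of the contrast (the field names are made up): a bare (parameters, activation) pair fails $R(x)$, while a version that bundles in its indexical information is at least a step closer.

```python
# Rough sketch of the neuron example. The field names are invented; the point is
# only that the second representation carries the indexical information that the
# first one leaves implicit in the surrounding network.

from dataclasses import dataclass

@dataclass
class BareNeuron:
    weights: list[float]
    activation: float
    # Not relationally complete: nothing here says which layer/position this is,
    # or which upstream variables feed into it.

@dataclass
class LocatedNeuron:
    weights: list[float]
    activation: float
    layer: int                  # indexical information that was previously
    index_in_layer: int         # only implicit in the surrounding network
    upstream_indices: list[int] # which variables this neuron reads from
```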
To demonstrate a minimal example of something that is "relationally complete", we make the following assumptions:
Given these assumptions, we want to demonstrate that relational completeness is a "compositional" property, where the relational completeness of a component C "enables" the relational completeness of other components that depend on C. We do this by considering the following induction proof sketch:
The property that I want to zoom in on is that each $f$ only specifies its "local" relationship with the variables that it directly interacts with (i.e. the variables that $f$ directly takes as input). But for something to be relationally complete, we would expect that it has to contain information about its global relationship all the way down to the sensory inputs, since that's what it takes for an object to encode an equivalence class over sensory inputs (which is how we define the semantics of an object in this setting). However, in this case it seems like we can achieve relational completeness just by including "local" relational information.
The intuition behind this is that when we have an object that is relationally complete, by definition, all information about the semantics of that object is contained within the object itself; any relevant relational information about how that object is computed from upstream variables is already contained in the object, which means that when we try to derive downstream variables on top of that object, we don't need to go back upstream to retrieve relational information.
In other words, a relationally complete object mediates the semantic information between upstream and downstream variables, and this is what allows relational completeness to be a compositional property, where the relational completeness of upstream objects enables the relational completeness of downstream variables.
An analogy: if you're playing the game of telephone, you can think of a "relationally complete" messenger as one who can fully explain how the current message was derived from the original source message. Once you have access to such a messenger, you don't need to go back upstream to ask the previous messengers anymore, and it also becomes easier for you to become a "relationally complete" messenger yourself, because they pass that information onto you (which is where compositionality comes in).
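Here's a toy sketch of that "passing the information downstream" step (purely illustrative; representing semantics as an explicit dictionary of sensory histories is a made-up simplification): once each upstream value carries its equivalence class of histories, a downstream variable's semantics can be computed locally, without going back to the sensory inputs.

```python
# Toy sketch: if each upstream value a carries its semantics (an equivalence
# class of sensory histories), then for x = f(a) we can derive the semantics of
# each x-value locally, by running f over the a-values and taking unions of the
# upstream classes. The dictionary encoding is a made-up simplification.

def import_semantics(semantics_a: dict, f) -> dict:
    """semantics_a maps each a-value to a set of sensory histories; returns the
    induced semantics of the derived variable x = f(a)."""
    semantics_x: dict = {}
    for a, histories in semantics_a.items():
        semantics_x.setdefault(f(a), set()).update(histories)
    return semantics_x

# Four a-values, each summarizing one history; f coarsens them further.
sem_a = {0: {"h1"}, 1: {"h2"}, 2: {"h3"}, 3: {"h4"}}
sem_x = import_semantics(sem_a, lambda a: a % 2)
print(sem_x)   # {0: {'h1', 'h3'}, 1: {'h2', 'h4'}}
```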
Cool! Let me see if I understand. So you have a proof that if you take a set of relationally complete objects and apply a computable function, then the resulting set (along with a specification of the function) is also relationally complete. This is because you can run the function on all possible a-values to find out which a-values generate which x-values and then 'import' the meaning from the set A to the corresponding elements in set X.
You can then apply this iteratively/inductively, so that repeatedly applying functions leads to more relationally complete sets. You then postulate that sensory input is relationally complete, so that gives the first step upon which you can then build the inductive proof. (Tell me if this is right so far!) Glancing at it, I think I buy this proof.
The thing that I'm not sure about is whether sensory inputs actually are relationally complete in the sense you describe. Are you just postulating that they might be in order to get the proof going, or is there a strong reason for thinking that they are?
Most likely I'm misunderstanding the concept of relational completeness, but how is it possible that the 'meaning' of sensory input is interpretable in isolation? If two people are listening to the same piece of spoken word audio but one of them understands the language being spoken and the other doesn't, they will ascribe a different meaning to it, even if their sensory inputs are exactly the same. Could you flesh out what it means in practice for sensory inputs to be relationally complete? Alternatively, are there any other obvious/simple examples of relationally complete objects?
> You can then apply this iteratively/inductively, so that repeatedly applying functions leads to more relationally complete sets. You then postulate that sensory input is relationally complete, so that gives the first step upon which you can then build the inductive proof. (Tell me if this is right so far!) Glancing at it, I think I buy this proof.
Yep that seems correct to me! (P.S. I intentionally made an error for simplification which I'll mention later)
> The thing that I'm not sure about is whether sensory inputs actually are relationally complete in the sense you describe. Are you just postulating that they might be in order to get the proof going, or is there a strong reason for thinking that they are?
Good question. So I should clarify that when I say an object O is not relationally complete, I expect that I need to add something else in the world model such that "O + that something else" will be relationally complete. In the neural network example, the parameters + activations of a neuron aren't relationally complete because I need to add information about where that neuron is located inside the network relative to everything else.
An implicit assumption is that all information about semantics must come from the world model, and we consider sensory variables relationally complete because they are fundamental in the sense that they are used to derive everything else and aren't derived from anything else.
A longer answer is that sensory observations are macrostates which induce an equivalence class over the set of environments (microstates) that can result in those sensory observations, and that equivalence class is the actual "semantics" of those sensory observations. Importantly, "semantics" in this sense is an objective, observer-independent property, and that still holds even when different observers ascribe different "subjective" meaning to those sensory observations.
So when it comes to ontology identification, we want to make sure that we can isolate relationally complete components from the world model in the "observer-independent" semantics sense. But after that, we have to make sure that we as observers are making the correct interpretations about those relationally complete objects, which is an additional task.
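A tiny illustration of the "observer-independent semantics" point (the environments and the observation function here are of course invented): the meaning of a sensory reading is just the set of environments that could have produced it, regardless of what any particular observer makes of it.

```python
# Tiny illustration: the "meaning" of a sensory reading as the equivalence class
# of environments (microstates) that produce it. The environments and the
# observation function are invented for the example.

environments = ["sunny-park", "sunny-beach", "rainy-park", "rainy-beach"]

def observe(env: str) -> str:
    return "bright" if env.startswith("sunny") else "dim"

def semantics(reading: str) -> frozenset:
    """Observer-independent semantics: the preimage of the reading."""
    return frozenset(e for e in environments if observe(e) == reading)

print(semantics("bright"))   # frozenset({'sunny-park', 'sunny-beach'})
```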
So I actually cheated a little in this step of the proof sketch:
> 4. Just to spell out what this means more concretely:
> - We can treat sensory inputs (histories) $X_0$ as zeroth-order variables, which are relationally complete
> - We can have first-order variables $(f_1, x_1) \in X_1$, where $f_1$ is a function over subsets of $X_0$, which are relationally complete by hypothesis
> - We can have any $n$th-order variables $(f_n, x_n) \in X_n$, where $f_n$ is a function over subsets of $X_{n-1}$, which are relationally complete by induction
because I'm assuming an ordering of which functions are applied after which other functions, but that information is not specified in the variables themselves. For these variables to actually be relationally complete, we need to encode that information within the objects themselves; we can't have any overarching structural information outside of those objects.
To fix this, we need to somehow add another type of entity to the pair $(f, x)$ that allows us to encode the order in which the functions are applied inside the objects themselves, so that we don't have to impose a structure outside of the objects. In addition, we want the resulting relationally complete object to be maximally expressive: For instance, we don't want our relationally complete object to only support a fixed computational DAG; we want the ordering of function composition to be able to dynamically adapt to the context. A useful analogy is to think about function calls in regular programs:
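(Plain Python here, just to illustrate the analogy rather than the formalism itself: the order in which helpers run isn't given by some external wiring diagram; each function's body says what it calls, and that can depend on the runtime context.)

```python
# Ordinary Python, just to illustrate the analogy: the composition order isn't
# specified by an external DAG; each function's body determines which functions
# it calls, and that order can adapt dynamically to the context.

def normalize(x):
    return x / 10

def threshold(x):
    return x > 0.5

def classify(x, strict: bool):
    # The call order lives inside this function and depends on `strict`.
    if strict:
        return threshold(normalize(x))
    return normalize(x)

print(classify(7, strict=True))    # True
print(classify(7, strict=False))   # 0.7
```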
Our goal is to take this sort of structure and use it to encode the order of function composition inside the relationally complete objects themselves, so that we don't need to specify any additional structure on top of those relationally complete objects. To do this, we need to add an object $r$ with a particular type signature, so that each relationally complete object is a tuple $(f, r, x)$, and we should be able to figure out the order of function composition (which may be context dependent) just by looking at the collection of relationally complete objects:
Imagine that we have two variables $x_1, x_2$ with a functional relationship $f(x_1) = x_2$. One of the ways of framing relational completeness is that we want to split the information about this function $f$ into two components $f_1$ and $f_2$, such that we can rederive the relationship between $x_1$ and $x_2$ entirely from the pair $(f_1, x_1), (f_2, x_2)$. We want to think of $f_1$ as the information that "belongs to $x_1$" and $f_2$ as the information that "belongs to $x_2$".
However, if these are the only two variables that we're considering, then it seems like there are various ways of splitting $f$ that are equally valid: We could consider putting all of the information about $f$ into $f_2$ while leaving $f_1$ empty, but the opposite choice of putting all information about $f$ into $f_1$ seems equally valid. In other words, there's no unique, objective way to "pack" relational information inside the objects.
But now suppose that we have $n+1$ variables $x_1, \dots, x_{n+1}$ where $x_{n+1}$ is computed from $x_1, \dots, x_n$ by $x_{n+1} = \bigotimes_{i=1}^{n} f_i(x_i)$, where $\otimes$ represents some form of aggregation of information. In this case, we want to split the $n$ functions $f_i$ into $n+1$ parts $f_i$ ($i \in \{1, \dots, n+1\}$), where $f_i$ represents the relational information associated with $x_i$. Contrary to before, there is an "objectively correct" way of splitting the function in some sense: Namely, if there is some information that is redundantly represented in all (or multiple) $f_i$'s, then we should put that information in $f_{n+1}$, because that allows us to store only one copy of that information (whereas storing it in all of the $f_i$, $i \in \{1, \dots, n\}$, would result in multiple copies of the same information).
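A toy numerical illustration of that "objectively correct" split (everything here, including the `shared` sub-computation, is invented): if every $f_i$ redundantly applies the same sub-computation, we can store its description once, as part of the relational information belonging to $x_{n+1}$, instead of once per $f_i$.

```python
# Toy illustration: each f_i redundantly applies the same sub-computation
# `shared`; the "correct" split stores one copy of `shared` (as part of
# x_{n+1}'s relational information) and leaves only the x_i-specific parts
# in the f_i.

def shared(x):                                   # redundant across every f_i
    return 2 * x + 1

f = [lambda x, i=i: shared(x) + i for i in range(3)]     # before the split

f_specific = [lambda x, i=i: x + i for i in range(3)]    # after: each f_i keeps
                                                         # only what's specific
def aggregate(xs):
    # x_{n+1}'s relational information applies `shared` at aggregation time,
    # so only one copy of its description is stored.
    return sum(f_specific[i](shared(xs[i])) for i in range(3))

xs = [1, 2, 3]
assert aggregate(xs) == sum(f[i](xs[i]) for i in range(3))
print(aggregate(xs))   # 18
```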
Our current formalization of relational completeness does enable this form of function splitting: Ignoring the $r$ component for a moment, consider two objects $(f_1, x_1), (f_2, x_2)$ where $x_1 = f_1(f_2, x_2)$. An equivalent way of expressing this is to curry the function $f_1$, so that it takes in $f_2$ and returns a function that maps $x_2$ to $x_1$. In other words, $f_1(f_2)$ returns another function $g$, and $g(x_2) = x_1$.
We can then consider the case where $f_1$ may take a wide range of other functions/objects $f_i$ as argument, so that each $f_1(f_i)$ returns a corresponding function $g_i$.
Then, if there is some information that is represented in a wide variety of $g_i$'s, a simplicity prior forces us to shift that information into $f_1$, so that we only have to store one copy of the redundant information.
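In code, the currying move looks something like this (plain Python, with the concrete functions made up for the example):

```python
# Minimal currying sketch (the concrete functions are invented): f1 takes another
# object's relational information f2 and returns the map g from x2 to x1,
# i.e. f1(f2)(x2) == x1 in the notation above.

def f2(x):                       # relational information belonging to x2
    return {"scaled": 3 * x}

def f1(f_other):                 # relational information belonging to x1
    def g(x_other):
        return f_other(x_other)["scaled"] + 1
    return g

x2 = 4
x1 = f1(f2)(x2)
print(x1)                        # 13
```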
However, we don't currently have a way of doing the same thing on the output side: Suppose that we have $n+1$ variables where $x_i = g_i(x_1)$, $i \in \{2, \dots, n+1\}$, and we want to split these functions into $n+1$ parts $u_i$ ($i \in \{1, \dots, n+1\}$), where each $u_i$ is the relational information associated with $x_i$. Similar to before, if there is some information redundantly represented across multiple $g_i$'s, we want to shift that information onto $u_1$, so that we only store one copy of the information. The issue is that for a relationally complete object $(f_1, x_1)$, $f_1$ already occupies the role of capturing the redundant relational information on the input side, so we need something else to capture the redundant relational information on the output side.
One simple fix is to add another component $u$ to our relationally complete object, so that each object is defined as $(f, r, u, x)$, where $u$ represents the information that is redundant across the relationships from $(f, r, u, x)$ to other objects that use information from it. Changing $u$ doesn't affect how $x$ is computed from other objects; it only affects how the information from $x$ is used.
We can also think of this modification as a way of adding expressivity to the objects: Originally, once we define how two objects $(f_1, r_1, x_1), (f_2, r_2, x_2)$ aggregate information from other objects (which is defined by $f$ and $r$), that fully defines the functional relationship between $(f_1, r_1, x_1)$ and $(f_2, r_2, x_2)$, and there are no additional degrees of freedom that allow us to change the functional relationship between them (without changing how they aggregate information from other objects). Adding the $u$ component gives us that additional degree of freedom, while also allowing us to capture the redundant relational information on the output side.
Another way of thinking about relational completeness is that we know each variable must be represented in some kind of format, and we want to associate each variable with a description of its format, so that downstream variables can take that description and figure out how to use the information from that variable. The first obvious piece of relevant description of a variable is "how that variable is computed from other variables", and that piece of information is captured by the $f$ and $r$ components, while $u$ represents all the rest of the relevant description. Note that this "description of format" is used by all downstream variables, which reflects the fact that it is redundantly represented across the relational information on the output side.
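For concreteness, here's a very rough sketch of the intended shape of such an object (the field types are placeholders; in particular, nothing here pins down the actual type signature of $r$):

```python
# Very rough sketch of the intended shape of a relationally complete object.
# The field types are placeholders, not a worked-out type signature.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class RCObject:
    f: Callable[..., Any]  # how x aggregates information from upstream objects
    r: Any                 # placeholder: which objects feed in, and in what order
    u: Any                 # description of x's format, shared by all downstream users
    x: Any                 # the value itself
    # A downstream object should only ever need (f, r, u, x); it never has to
    # consult an external wiring diagram to know what x means or how to use it.
```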
Minimal latent across potential hypotheses: I previously mentioned that the features of the ontology we're trying to identify 1. must be derivable from existing observations, since that's all we have access to, and 2. must continue to work in the future as we update our hypothesis. These two assumptions together imply that we're looking for the minimal latent across a wide variety of likely potential hypotheses. Now consider the following learning procedure:
Suppose that we found a $P$ that satisfies this property: Notice that due to the simplicity prior, each $P \cup O$ is a likely hypothesis that reproduces our existing observations, and we can pinpoint different hypotheses by varying $O$, while the $P$ component mostly stays invariant. In other words, $P$ captures exactly the type of minimal latent that is redundantly represented across a wide variety of likely hypotheses given our existing observations. While that doesn't tell us everything we want to know about ontology identification, it does allow us to pinpoint a much smaller part of the search space of what we could be looking for. One of the reasons why relational completeness is important for this setup is that when each object contains all the relevant relational information about itself, modifications and augmentations (the $O$ component) of programs become much more straightforward, because we don't need to specify additional relationships between the modification ($O$) and the original program ($P$): the modification already contains all of that information.
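As a deliberately silly toy (not the learning procedure itself; the "hypotheses" below are just Python functions sharing a component $P$): different hypotheses that all reproduce the observations vary in their $O$ part while the $P$ part stays invariant, and that invariant part is the kind of thing we're after.

```python
# Deliberately silly toy: each "hypothesis" extends the same component P with a
# different O, and all of them reproduce the observations so far. P is the part
# that is redundantly present across the likely hypotheses.

observations = [0, 1, 4, 9, 16]                  # what we've seen so far

P = lambda n: n * n                              # candidate invariant component

O_variants = [                                   # different ways of extending P
    lambda n: P(n),                              # "it keeps being n^2"
    lambda n: P(n) if n < 10 else 0,             # "it breaks down later"
    lambda n: P(n) + n // 100,                   # "tiny correction far out"
]

surviving = [O for O in O_variants
             if all(O(n) == observations[n] for n in range(len(observations)))]
print(len(surviving))   # 3 -- the hypotheses differ in O but share P
```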
Implementing natural latents: Suppose that we're trying to find the natural latent of a collection of relationally complete objects (which we call observables), where the natural latent itself is represented by a relationally complete object. Relational completeness implies that the natural latent will have access to all of the information that defines the semantics of the observables, which makes the task of extracting the relevant latent a lot easier. In addition, we can expect natural latents to generalize to new contexts the same way that higher-order terms can generalize to objects we haven't seen before, because that new context will contain all the semantic information that defines how it should relate to the natural latent. A relationally complete natural latent can "figure out" how to aggregate information from a wide variety of contexts, and once that information is derived, a wide variety of contexts can adapt to that piece of information to make predictions.
In contrast, suppose that we're in a setting without relational completeness, such as trying to find natural latents in activations of a neural network: An immediate challenge is that most of the semantic information of the activations is just missing from the activations, which makes it difficult for us to find the minimal latent of that semantic information. To overcome this challenge, we essentially have to rederive that semantic information from somewhere else, such as by observing a wide range of samples. However, this doesn't tell us anything about how the natural latent should generalize to new activations that we've never seen before, and we have no guarantees that the natural latent will remain invariant since the relational/indexical information about those activations isn't guaranteed to remain invariant.