These all seem to be pointing to different aspects of the same problem.
- Cross-ontology goal translation: given a utility function over a latent variable in one model, find an equivalent utility function over latent variables in another model with a different ontology. One subquestion here is how the first model’s input data channels and action variables correspond to the other model’s input data channels and action variables - after all, the two may not be “in” the same universe at all, or they may represent entirely separate agents in the same universe who may or may not know of each other's existence.
- Correspondence theorems: quantum mechanics should reduce to classical mechanics in places where classical worked well, special relativity should reduce to Galilean relativity in places where Galilean worked well, etc. As we move to new models with new ontologies, when and how should the structure of the old models be reproduced?
- The indexing problem: I have some system containing three key variables A, B, and C. I hire someone to study these variables, and after considerable effort they report that X is 2.438. Apparently they are using different naming conventions! What is this variable X? Is it A? B? C? Something else entirely? Where does their X fit in my model?
- How do different people ever manage to point to the same thing with the same word in the first place? Clearly the word “tree” is not a data structure representing the concept of a tree; it’s just a pointer. What’s the data structure? What’s its type signature? Similarly, when I point to a particular tree, what’s the data structure for the concept of that particular tree? How does the “pointer” aspect of these data structures work?
- When two people are using different words for the same thing, how do they figure that out? What about the same word for different things?
- I see a photograph of a distinctive building, and wonder “Where is this?”. I have some data - i.e. I see the distinctive building - but I don’t know where in the world the data came from, so I don’t know where in my world-model to perform an update. Presumably I need to start building a little side-model of “wherever this picture was taken”, and then patch that side-model into my main world model once I figure out “where it goes”.
- Distributed models and learning: a bunch of different agents study different (but partially overlapping) subsystems of a system - e.g. biologists study different subsystems of a bacterium. Sometimes the agents end up using different names or even entirely different ontologies - e.g. some parts of a biological cell require thinking about spatial diffusion, while others just require overall chemical concentrations. How do we combine submodels from different agents, different ontologies and different data? How can we write algorithms which learn large model structures by stitching together small structures, each learned independently from different subsystems/data?
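To make the indexing problem above concrete, here's a minimal sketch - all variable names, distributions, and numbers are made up for illustration. Given samples of the known variables A, B, C and samples of the mystery variable X from the same system, one crude heuristic is to match X to whichever candidate has the most similar empirical distribution:

```python
import random
import statistics

random.seed(0)

# Toy system: three named variables with distinguishable distributions.
n = 10_000
A = [random.gauss(0.0, 1.0) for _ in range(n)]
B = [random.gauss(2.4, 0.3) for _ in range(n)]
C = [random.expovariate(1.0) for _ in range(n)]

# A collaborator reports samples of some variable "X" under their own
# naming convention. (Here we secretly know X is really B.)
X = [random.gauss(2.4, 0.3) for _ in range(n)]

def moment_distance(xs, ys):
    """Crude distributional distance: compare mean and standard deviation."""
    return (abs(statistics.mean(xs) - statistics.mean(ys))
            + abs(statistics.stdev(xs) - statistics.stdev(ys)))

candidates = {"A": A, "B": B, "C": C}
best = min(candidates, key=lambda name: moment_distance(X, candidates[name]))
print(best)  # matches X to "B", whose distribution it resembles
```

Of course, this only works when the candidates have distinguishable marginal distributions; in general one would need to match on joint statistics with other shared variables, or on causal role, which is part of what makes the real problem hard.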
Abstraction plays a role in these, but it’s not the whole story. It tells us how high-level concepts relate to low-level structure, and why very different cognitive architectures would converge on surprisingly similar abstractions (e.g. neural nets learning similar concepts to humans). If we can ground two sets of high-level abstractions in the same low-level world, then abstraction can help us map from one set of high-level concepts down to the low level and back up to the other. But if two neural networks are trained on different data, and possibly even different kinds of data (like infrared vs visual spectrum photos), then we need a pretty detailed outside model of the shared low-level world in order to map between them.
Humans do not seem to need a shared low-level world model in order to pass concepts around from human to human. Concepts should ultimately be groundable via abstraction from the low level, but it seems like we shouldn’t need a detailed low-level model in order to translate between ontologies.
In some sense, this looks like Ye Olde Symbol Grounding Problem. I do not know of any existing work on that subject which would be useful for something like “given a utility function over a latent variable in one model, find an equivalent utility function over latent variables in another model”, but if anybody knows of anything promising then let me know.
Not Just Easy Mode
After poking at these problems a bit, they usually seem to have an “easy version” in which we fix a particular Cartesian boundary.
In the utility function translation problem, it’s much easier if we declare that both models use the same Cartesian boundary - i.e. same input/output channels. Then it’s just a matter of looking for functional isomorphism between latent variable distributions.
For correspondence theorems, it’s much easier if we declare that all models are predicting exactly the same data, or at least the same observable distribution. Again, the problem roughly reduces to functional isomorphism.
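As a toy illustration of what “functional isomorphism” could mean in easy mode - all the latent names and numbers below are hypothetical - suppose two models of the same binary observable each have a discrete latent variable, with different labels. We can match latents across models by comparing the conditional distributions they induce on the shared observable:

```python
import itertools

# Two toy models of the same binary observable, each with a discrete
# latent variable. Same structure, different latent labels and ordering.
# Each table gives P(obs = 1 | latent):
model_a = {"z0": 0.9, "z1": 0.2}
model_b = {"sunny": 0.2, "rainy": 0.9}

def best_matching(pa, pb):
    """Match latents across models by minimizing the total difference in
    the conditional distributions they induce on the shared observable."""
    names_a, names_b = list(pa), list(pb)
    def cost(perm):
        return sum(abs(pa[a] - pb[b]) for a, b in zip(names_a, perm))
    best = min(itertools.permutations(names_b), key=cost)
    return dict(zip(names_a, best))

print(best_matching(model_a, model_b))  # {'z0': 'rainy', 'z1': 'sunny'}
```

The key assumption doing all the work here is the shared Cartesian boundary: both models condition on and predict the same observable channel, so their conditionals are directly comparable. Drop that assumption and there’s no longer an obvious common space in which to compare.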
Similarly with distributed models/learning: if a bunch of agents build their own models of the same data, then there are obvious (if sometimes hacky) ways to stitch them together. But what happens when they’re looking at different data on different variables, and one agent’s inferred latent variable may be another agent’s observable?
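Here’s a hedged sketch of the easy version of stitching, with toy binary variables and made-up probability tables: two agents each fit part of a system, and as long as they agree on the name and meaning of the shared variable B, the overlap lets us glue their conditional models into one joint distribution:

```python
from itertools import product

# Agent 1 studied variables A and B; agent 2 studied B and C.
# Each reports conditional probability tables over binary variables.
p_a = {0: 0.7, 1: 0.3}                       # agent 1: P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},          # agent 1: P(B | A)
               1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.6, 1: 0.4},          # agent 2: P(C | B)
               1: {0: 0.1, 1: 0.9}}

# Stitch: because both agents refer to the same variable B, the joint
# factors as P(A, B, C) = P(A) * P(B|A) * P(C|B).
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a, b, c in product([0, 1], repeat=3)}

# Sanity check: the stitched model is a proper distribution, and we can
# query marginals neither agent could compute alone, e.g. P(C = 1).
assert abs(sum(joint.values()) - 1.0) < 1e-9
p_c1 = sum(p for (a, b, c), p in joint.items() if c == 1)
print(round(p_c1, 4))  # 0.555
```

The hard version is exactly when this shared-name assumption fails: if agent 2’s “B” is a latent variable agent 2 inferred, rather than a channel both agents observe by the same name, then before any stitch like the above can happen we first have to solve the identification problem of deciding which of agent 1’s variables (if any) it corresponds to.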
The point here is that I don’t just want to solve these on easy mode, although I do think some insights into the Cartesian version of the problem might help in the more general version.
Once we open the door to models with different Cartesian boundaries in the same underlying world, things get a lot messier. To translate a variable from model A into the space of model B, we need to “locate” model B’s boundary in model A, or locate model A’s boundary in model B, or locate both in some outside model. That’s the really interesting part of the problem: how do we tell when two separate agents are pointing to the same thing? And how does this whole "pointing" thing work to begin with?
I’ve been poking around the edges of this problem for about a month, with things like correspondence theorems and seeing how some simple approaches to cross-ontology translation break. Something in this cluster is likely to be my next large project.
Why this problem?
From an Alignment as Translation viewpoint, this seems like exactly the right problem to make progress on alignment specifically (as opposed to embedded agency in general, or AI in general). To the extent that the “hard part” of alignment is translating from human concept-space to some AI’s concept-space, this problem directly tackles the bottleneck. Also closely related is the problem of an AI building a goal into a successor AI - though that’s probably somewhat easier, since the internal structure of an AI is easier to directly probe than a human brain.
Work on cross-ontology transport is also likely to yield key tools for agency theory more generally. I can already do some neat things with embedded world models using the tools of abstraction, but it feels like I’m missing data structures to properly represent certain pieces - in particular, data structures for the “interface” where a model touches the world (or where a self-embedded model touches itself). The indexing problem is one example of this. I think those interface-data-structures are the main key to solving this whole cluster of problems.
Finally, this problem has a lot of potential for relatively-short-term applications, which makes it easier to build a feedback cycle. I could imagine identifying concept-embeddings by hand or by ad-hoc tricks in one neural network or probabilistic model, then using ontology translation tools to transport those concept-embeddings into new networks or models. I could even imagine whole “concept libraries”, able to import pre-identified concepts into newly trained models. This would give us a lot of data on how robust identified abstract concepts are in practice. We could even run stress tests, transporting concepts from model to model to model in a game of telephone, to see how well they hold up.
Anyway, that’s one potential vision. For now, I’m still figuring out the problem framing. Really, the reason I’m looking at this problem is that I keep running into it as a bottleneck to other, not-obviously-similar problems, which makes me think that this is the limiting constraint on a broad class of problems I want to solve. So, over time I expect to notice additional possibilities which a solution would unblock.