Yes, I plan to write a sequence about it some time in the future, but here are some rough high-level sketches:
Note: I haven't thought of the best framing for these ideas yet, but hopefully I'll come back with a better presentation at some point in the future
previously: My decomposition of the alignment problem
A simple model of meta/continual learning
In the framework of Solomonoff induction, we observe an infinite stream of bits and try to predict the next bit by finding the shortest hypothesis (program) which reproduces our observations so far (some caveats here). When we receive an additional bit of observation, we can in principle rule out an infinite number of hypotheses, namely all programs which didn't predict that observation. This creates an opportunity to speed up our induction process on future observations: as we search for the next shortest program which predicts the next bit, we can learn to skip over the programs that have already been falsified by our past observations. The process of "learning how to skip over falsified programs" takes time and computation upfront, but it can pay dividends of computational efficiency for future induction.
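To make this concrete, here is a minimal toy sketch in Python; it is my own illustrative construction, not anything from the original setup. A small finite hypothesis class of periodic bit patterns stands in for the space of programs, with shorter periods standing in for shorter programs. Each observation prunes the falsified hypotheses upfront, so later predictions only ever search the survivors.

```python
from itertools import product

def enumerate_hypotheses(max_period=4):
    """Toy hypothesis class: all periodic bit patterns up to a given period.
    Shorter periods stand in for shorter programs, enumerated shortest-first."""
    for period in range(1, max_period + 1):
        for pattern in product((0, 1), repeat=period):
            yield pattern  # hypothesis: the stream repeats this pattern forever

def predicts(pattern, observations):
    """True if the periodic pattern reproduces every bit observed so far."""
    return all(bit == pattern[i % len(pattern)] for i, bit in enumerate(observations))

class Inductor:
    def __init__(self):
        self.live = list(enumerate_hypotheses())  # every hypothesis starts out live
        self.observations = []

    def observe(self, bit):
        self.observations.append(bit)
        # Upfront cost: prune every hypothesis the new bit falsifies.
        # This is the "learning how to skip over falsified programs" step.
        self.live = [h for h in self.live if predicts(h, self.observations)]

    def predict_next(self):
        # Dividend: prediction only searches the surviving hypotheses and
        # takes the shortest one, instead of re-enumerating everything.
        shortest = self.live[0]
        return shortest[len(self.observations) % len(shortest)]

ind = Inductor()
for bit in [1, 0, 1, 0, 1]:
    ind.observe(bit)
print(ind.predict_next())  # -> 0, from the shortest surviving pattern (1, 0)
print(len(ind.live))       # -> 2; 28 of the original 30 hypotheses are already ruled out
```

The pruning in `observe` is where the upfront cost lives; the payoff is that `predict_next` never has to reconsider hypotheses that earlier bits already ruled out.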
This is my mental model for how agents can "learn how to learn efficiently": an agent that has received more observations can usually adapt to new situations more quickly, because more incorrect hypotheses have already been ruled out, leaving a narrower set of remaining hypotheses to choose from.
More generally, an important question to ask is: given that the underlying space of remaining hypotheses is constantly shrinking as we receive new observations, what sorts of data structures for representing hypotheses should we use to exploit that? How should we represent programs if we don't just want to execute them, but also potentially modify them into other plausible hypotheses? If a world model is selected based on its ability to quickly adapt to new environments, what is the type signature of that world model?
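As one hedged, toy answer to the first of these questions (again an illustrative sketch with made-up names, not a claim about the right representation): we could index the live hypotheses by the next bit they predict, so that each new observation discards an entire bucket in one step and only the survivors pay the cost of being re-indexed.

```python
class BucketedHypothesisSpace:
    """Live hypotheses indexed by the next bit they predict; insertion order
    (shortest-first) stands in for a shortest-program-first ordering."""

    def __init__(self, hypotheses, predict):
        # predict(h, t) -> the bit that hypothesis h expects at time step t
        self.predict = predict
        self.t = 0
        self.buckets = {0: [], 1: []}
        for rank, h in enumerate(hypotheses):
            self.buckets[predict(h, 0)].append((rank, h))

    def observe(self, bit):
        # The inconsistent bucket is discarded wholesale; only the survivors
        # pay the cost of being re-indexed by their next prediction.
        survivors = self.buckets[bit]
        self.t += 1
        self.buckets = {0: [], 1: []}
        for rank, h in survivors:
            self.buckets[self.predict(h, self.t)].append((rank, h))

    def predict_next(self):
        # Follow the lowest-rank ("shortest") surviving hypothesis.
        candidates = [(bucket[0][0], bit) for bit, bucket in self.buckets.items() if bucket]
        return min(candidates)[1]

# Example with periodic-pattern hypotheses, listed shortest-first:
patterns = [(0,), (1,), (0, 1), (1, 0)]
space = BucketedHypothesisSpace(patterns, predict=lambda p, t: p[t % len(p)])
for bit in [1, 0, 1]:
    space.observe(bit)
print(space.predict_next())  # -> 0, following the surviving pattern (1, 0)
```

This is only one possible structure; the broader point is that the representation of hypotheses can be organized around the fact that the space only ever shrinks.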
Quick thoughts
Why this might be relevant for alignment
Transformative AI will often need to modify its ontologies in order to accommodate new observations, which means that if we want to translate our preferences over real-world objects into the AI's world model, we need to be able to stably "point" to real-world objects despite ontology shifts. If efficient learning relies on specific data structures for representing hypotheses, these structures may reveal properties that remain invariant under ontology shifts. By identifying these invariant properties, we can potentially create robust ways to maintain our preferences within the AI's evolving world model.
Furthermore, insofar as humans use a similar data structure to represent their world models, this could provide insights into how our actual preferences remain consistent despite ontology shifts, offering a potential blueprint for replicating this process in AI.