MIRI has said a lot about the issue of embedded agency over the last year. However, I have yet to see them try to make progress in what I see as the most promising areas.
How does one attack a problem that is new, complicated and non-obvious? By constructing toy models and inverting hard questions to make them more tractable.
In general, an inverse problem is harder than the "direct" one, because we are trying to infer unobservables from observables. Wikipedia gives the example of figuring out the position of Neptune from the perturbations in the orbit of Uranus. Another popular example is NP-complete problems: they are famously hard to solve, but it is easy to verify a solution. Another example: when you take a multiple-choice math quiz, it is often faster and easier to get the right answer by plugging the 4 or 5 potential solutions into the stated problem than to solve the problem directly.
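The solve-versus-verify gap can be sketched in a few lines. This is only an illustration, assuming subset sum as the NP-complete problem; the function names `verify` and `solve` are made up for this sketch.

```python
from itertools import combinations


def verify(numbers, target, candidate):
    """Easy direction: check a proposed subset in roughly linear time."""
    pool = list(numbers)
    for x in candidate:
        if x not in pool:      # candidate must be drawn from the given numbers
            return False
        pool.remove(x)
    return sum(candidate) == target


def solve(numbers, target):
    """Hard direction: brute force over all non-empty subsets (exponential)."""
    for size in range(1, len(numbers) + 1):
        for candidate in combinations(numbers, size):
            if sum(candidate) == target:
                return list(candidate)
    return None


numbers = [3, 34, 4, 12, 5, 2]
print(solve(numbers, 9))           # searches many subsets before finding one
print(verify(numbers, 9, [4, 5]))  # checking a given answer is immediate
```

The multiple-choice trick is the same move: `verify` is run on each of the 4 or 5 offered answers instead of running `solve`.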
I'll give an example from my own area. The equations of general relativity are hard to solve except in a few highly symmetric cases. It is a classic inverse problem. But! Any spacetime metric is actually a solution of the Einstein equations, so all one needs to do is to write down a metric and calculate its Einstein tensor to see what kind of matter distribution (and boundary conditions) it is a solution for. Inverting the inverse problem! Of course, life is not that easy. Most solutions correspond to "unphysical" matter, usually with negative energy density, superluminal flows, singularities, infinities, weird topologies etc. However, it is a useful approach if one wants to study some general properties of the equations, and get a feel for (or sometimes a theorem about) what goes wrong, why and how. After a few iterations one can get better at guessing what form a "good" solution might take, and write up an ansatz that, in some cases, helps solve the original problem rather than its inverse.
Another, more familiar example: arithmetic division. Until you learn or figure out the rules, it's hard. But its inverse problem, multiplication, is actually much easier! So to learn more about division, it pays to start with potential solutions and see which multiplication actually solves the division problem. Eventually one can come up with the long division algorithm, which uses nothing but multiplication and subtraction. And voilà, inverting an inverse problem helps us solve the original one.
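Here is a minimal sketch of that idea, assuming non-negative integers in base 10. The name `long_divide` is just for illustration. Each quotient digit is found by guessing: multiply the divisor by a candidate digit and keep the largest one that still fits.

```python
def long_divide(dividend, divisor):
    """Schoolbook long division using only multiplication and subtraction."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    quotient, remainder = 0, 0
    for ch in str(dividend):
        remainder = remainder * 10 + int(ch)       # bring down the next digit
        digit = 0
        while (digit + 1) * divisor <= remainder:  # trial multiplication
            digit += 1
        remainder -= digit * divisor               # subtract what fits
        quotient = quotient * 10 + digit
    return quotient, remainder


print(long_divide(1234, 7))
```

No division operator appears anywhere in the function; the "hard" operation is assembled entirely from checks in the easy direction.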
This approach is common in computer science, as well. Plenty of algorithms, like search, actually rely on solving smaller and simpler inverse problems.
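One possible reading of this, offered as an assumption rather than as the only interpretation: a binary search can compute an integer square root (the hard direction) purely by squaring guesses (the easy direction). The function name `isqrt_by_search` is hypothetical.

```python
def isqrt_by_search(n):
    """Largest integer r with r*r <= n, found by squaring candidate values."""
    if n < 0:
        raise ValueError("expects n >= 0")
    lo, hi = 0, n + 1            # invariant: lo*lo <= n < hi*hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:       # easy direction: just multiply
            lo = mid
        else:
            hi = mid
    return lo


print(isqrt_by_search(10**6))
```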
I contend that a similar approach could be useful for making progress in understanding embedded agency. To that end, let's first restate the original problem of embedded agency (copied from the alignment forum page):
How can one make good models of the world that are able to fit within an agent that is much smaller than the world?
This is a hard inverse problem! There are many facets of it, such as the oft-mentioned problem of logical counterfactuals, that do not seem to yield to direct attacks. So, it seems natural to learn to "seek under the light" before stepping into the darkness, and that includes, you guessed it, constructing toy models and inverting the inverse problems.
What would inverting this problem look like? There are multiple possible formulations, just as the power operation a^b has two inverses: the b-th root (recovering a) and the logarithm (recovering b). Here are a couple of ideas:
- Create a toy universe and look for its representations inside.
- Create a toy model and construct a world around it such that the model represents the world in some way.
Here is an example: a fractal is self-similar, so any subset of it can be thought of as a near-perfect model of the whole. Of course, a model is not enough; one has to figure out what would constitute an agent using this model in this fractal world. But at least it can be a promising and potentially illuminating direction to explore. There are plenty more ideas one can come up with after thinking about it for 5 minutes.
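To make the fractal example a bit more concrete, here is a toy sketch that assumes the Cantor set as the "world" (my choice of fractal, not anything specific to embedded agency). The helper names `cantor` and `zoom_left_third` are made up. It checks that the left third of a finite approximation, stretched back to full size, is exactly the approximation one level coarser: a part that models the whole at slightly lower resolution.

```python
from fractions import Fraction


def cantor(depth):
    """Intervals left after `depth` rounds of middle-third removal on [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(depth):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))   # keep the left third
            refined.append((b - third, b))   # keep the right third
        intervals = refined
    return intervals


def zoom_left_third(intervals):
    """Take the part inside [0, 1/3] and stretch it by 3 back onto [0, 1]."""
    return [(3 * a, 3 * b) for a, b in intervals if b <= Fraction(1, 3)]


world = cantor(5)
model = zoom_left_third(world)
print(model == cantor(4))   # the sub-part reproduces the whole, one level coarser
```

This only covers the "model" half of the picture. What would count as an agent using such a model is left open, as noted above.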
I hope someone at MIRI is either thinking along these directions, or is ready to try to, instead of being stuck analyzing the messy and complicated inverse problem that is the "real world".