We have a computational graph (aka circuit aka causal model) representing an agent and its environment. We’ve chosen a cut through the graph to separate “agent” from “environment” - i.e. a Cartesian boundary. Arrows from environment to agent through the boundary are “observations”; arrows from agent to environment are “actions”.
Let’s highlight a few problems with this as a generic agent model…
My human body interfaces with the world via the entire surface area of my skin, including molecules in my hair randomly bumping into air molecules. All of those tiny interactions are arrows going through the supposed “Cartesian boundary” around my body. These don’t intuitively seem like “actions” or “observations”, at least not beyond some high-level observations of temperature and pressure.
In general, low-level boundaries will have lots of tiny interactions crossing them which don’t conceptually seem like “actions” or “observations”.
When I’m driving, I often identify with the car rather than with my body. Or if I lose a limb, I stop identifying with the lost limb. (Same goes for using the toilet - I’ve heard that it’s quite emotionally stressful for children during potty training to throw away something which came from their physical body, because they still identify with it.)
In general, it’s ambiguous what Cartesian boundary to use; our conceptual boundaries around an “agent” don’t seem to correspond perfectly to any particular physical surface.
An Agent Optimizing Its Own Actions
I could draw a supposed “Cartesian boundary” around a rock, and declare all the interactions between the rock and its environment to be “actions” and “observations”. If someone asks what the rock is optimizing, I’ll say “the actions” - i.e. the rock “wants” to do whatever it is that the rock in fact does.
In general, we intuitively conceive of “agents” as optimizers in some nontrivial sense. Optimizing actions doesn’t cut it; we generally don’t think of something as an agent unless it’s optimizing something out in the environment away from itself.
Solution: Optimization At A Distance
Let’s solve all of these problems in one fell swoop.
We’ll start with the rock problem. One natural answer is to declare that we’re only interested in agents which optimize things “far away” from themselves. What does that mean? Well, as long as we’re already representing the world as a computational DAG, we might as well say that two chunks of our computation DAG are “far apart” when there are many intermediating layers between them. Like this:
If you’ve read the Telephone Theorem post, it’s the same idea.
For instance, if I’m planning a party, then the actions I take now are far away in time (and probably also space) from the party they’re optimizing. The “intermediate layers” might be snapshots of the universe-state at each time between the actions and the party. (... or they might be something else; there are usually many different ways to draw intermediate layers between far-apart things.)
This applies surprisingly well even in situations like reinforcement learning, where we don’t typically think of the objective as “far away” from the agent. If I'm a reinforcement learner optimizing for some reward I’ll receive later, that later reward is still typically far away from my current actions. My actions impact the reward via some complicated causal path through the environment, acting through many intermediate layers.
So we’ve ruled out agents just “optimizing” their own actions. How does this solve the other two problems?
We’re using the same kind of model and the same notion of “far apart” as the Telephone Theorem, so we can carry that theorem over. The main takeaway is that far apart things interact only via a typically-relatively-small “abstract summary”. This summary consists of the information which is arbitrarily well conserved as it propagates through the intermediate layers.
Because the agent only interacts with the far away things-it’s-optimizing via a relatively-small summary, it’s natural to define the “actions” and “observations” as the contents of the summary flowing in either direction, rather than all the low-level interactions flowing through the agent’s supposed “Cartesian boundary”. That solves the microscopic interactions problem: all the random bumping between my hair/skin and air molecules mostly doesn’t impact things far away, except via a few summary variables like temperature and pressure.
This redefinition of “actions” and “observations” also makes the Cartesian boundary flexible. The Telephone Theorem says that the abstract summary consists of information which is arbitrarily well conserved as it propagates through the intermediate layers. So, the summary isn’t very sensitive to which layer we declare to be “the Cartesian boundary”; we can move the boundary around quite a bit without changing the abstract “agent” we’re talking about. (Though obviously if we move the Cartesian boundary to some totally different part of the world, that may change what “agent” we’re talking about.) If we want, we could even stop thinking of the boundary as localized to a particular cut through the graph at all.
Aside: Dynamic Programming
When Adam Shimi first suggested to me a couple years ago that “optimization far away” might be important somehow, one counterargument I raised was dynamic programming (DP): if the agent is optimizing an expected utility function over something far away, then we can use DP to propagate the expected utility function back through the intermediate layers to find an equivalent utility function over the agent’s actions:
This isn’t actually a problem, though. It says that optimization far away is equivalent to some optimization nearby. But the reverse does not necessarily hold: optimization nearby is not necessarily equivalent to some optimization far away. This makes sense: optimization nearby is a trivial condition which matches basically any system, and therefore will match the interesting cases as well as the uninteresting cases.
(Note that I haven’t actually demonstrated here that optimization at a distance is nontrivial, i.e. that some systems do optimize at a distance and others don’t; I’ve just dismissed one possible counterargument. I have several posts planned on optimization at a distance over the next few weeks, and nontriviality will be in one of them.)
I like to picture optimization at a distance like a satellite dish or phased array:
In a phased array, lots of little antennas distributed over an area are all controlled simultaneously, so that their waves add up to one big coherent wave which can propagate over a long distance. Optimization at a distance works the same way: there’s lots of little actions distributed over space/time, all controlled in such a way that their influence can add up coherently and propagate over a long distance to optimize some far-away target.