This is something which has bothered me for a while, but, I'm writing it specifically in response to the recent post on mesa-optimizers.
I feel strongly that the notion of 'optimization process' or 'optimizer' which people use -- partly derived from Eliezer's notion in the sequences -- should be split into two clusters. I call these two clusters 'selection' vs 'control'. I don't have precise formal statements of the distinction I'm pointing at; I'll give several examples.
Before going into it, several reasons why this sort of thing may be important:
- It could help refine the discussion of mesa-optimization. The article restricted its discussion to the type of optimization I'll call 'selection', explicitly ruling out 'control'. This choice isn't obviously right. (More on this later.)
- Refining 'agency-like' concepts like this seems important for embedded agency -- what we eventually want is a story about how agents can be in the world. I think almost any discussion of the relationship between agency and optimization which isn't aware of the distinction I'm drawing here (at least as a hypothesis) will be confused.
- Generally, I feel like I see people making mistakes by not distinguishing between the two (whether or not they've derived their notion of optimizer from Eliezer). I judge an algorithm differently if it is intended as one or the other.
(See also Stuart Armstrong's summary of other problems with the notion of optimization power Eliezer proposed -- those are unrelated to my discussion here, and strike me more as technical issues which call for refined formulae, rather than conceptual problems which call for revised ontology.)
The Basic Idea
Eliezer quantified optimization power by asking how small a target an optimization process hits, out of a space of possibilities. The type of 'space of possibilities' is what I want to poke at here.
First, consider a typical optimization algorithm, such as simulated annealing. The algorithm constructs an element of the search space (such as a specific combination of weights for a neural network), gets feedback on how good that element is, and then tries again. Over many iterations of this process, it finds better and better elements. Eventually, it outputs a single choice.
This is the prototypical 'selection process' -- it can directly instantiate any element of the search space (although typically we consider cases where the process doesn't have time to instantiate all of them), it gets direct feedback on the quality of each element (although evaluation may be costly, so that the selection process must economize these evaluations), the quality of an element of search space does not depend on the previous choices, and only the final output matters.
The term 'selection process' refers to the fact that this type of optimization selects between a number of explicitly given possibilities. The most basic example of this phenomenon is a 'filter' which rejects some elements and accepts others -- like selection bias in statistics. This has a limited ability to optimize, however, because it allows only one iteration. Natural selection is an example of much more powerful optimization occurring through iteration of selection effects.
Now, consider a targeting system on a rocket -- let's say, a heat-seeking missile. The missile has sensors and actuators. It gets feedback from its sensors, and must somehow use this information to decide how to use its actuators. This is my prototypical control process. (The term 'control process' is supposed to invoke control theory.) Unlike a selection process, a controller can only instantiate one element of the space of possibilities. It gets to traverse exactly one path. The 'small target' which it hits is therefore 'small' with respect to a space of counterfactual possibilities, with all the technical problems of evaluating counterfactuals. We only get full feedback on one outcome (although we usually consider cases where the partial feedback we get along the way gives us a lot of information about how to navigate toward better outcomes). Every decision we make along the way matters, both in terms of influencing total utility, and in terms of influencing what possibilities we have access to in subsequent decisions.
So: in evaluating the optimization power of a selection process, we have a fairly objective situation on our hands: the space of possibilities is explicitly given; the utility function is explicitly given; we can compare the true output of the system to a randomly chosen element. In evaluating the optimization power of a control process, we have a very subjective situation on our hands: the controller only truly takes one path, so any judgement about a space of possibilities requires us to define counterfactuals; it is less clear how to define an un-optimized baseline; utility need not be explicitly represented in the controller, so may have to be inferred (or we think of it as parameter, so, we can measure optimization power with respect to different utility functions, but there's no 'correct' one to measure).
I do think both of these concepts are meaningful. I don't want to restrict 'optimization' to refer to only one or the other, as the mesa-optimization essay does. However, I think the two concepts are of a very different type.
Bottlecaps & Thermostats
The mesa-optimizer write-up made the decision to focus on what I call selection processes, excluding control processes:
We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system. [...] For example, a bottle cap causes water to be held inside the bottle, but it is not optimizing for that outcome since it is not running any sort of optimization algorithm.(1) Rather, bottle caps have been optimized to keep water in place.
It makes sense to say that we aren't worried about bottlecaps when we think about the inner alignment problem. However, this also excludes much more powerful 'optimizers' -- something more like a plant.
When does a powerful control process become an 'agent'?
- Bottlecaps: No meaningful actuators or sensors. Essentially inanimate. Does a particular job, possibly very well, but in a very predictable manner.
- Thermostats: Implements a negative feedback loop via a sensor, an actuator, and a policy of "correcting" things when sense-data indicates they are "off". Actual thermostats explicitly represent the target temperature, but one can imagine things in this cluster which wouldn't -- in general, the connection between what is sensed and how things are 'corrected' can be quite complex (involving many different sensors and actuators), so that no one place in the system explicitly represents the 'target'.
- Plants: Plants are like very complex thermostats. They have no apparent 'target' explicitly represented, but can clearly be thought of as relatively agentic, achieving complicated goals in complicated environments.
- Guided Missiles: These are also mostly in the 'thermostat' category, but, guided missiles can use simple world-models (to track the location of the target). However, any 'planning' is likely based on explicit formulae rather than any search. (I'm not sure about actual guided missiles.) If so, a guided missile would still not be a selection process, and therefore lack a "goal" in the mesa-optimizer sense, despite having a world-model and explicitly reasoning about how to achieve an objective represented within that world-model.
- Chess Programs: A chess-playing program has to play each game well, and every move is significant to this goal. So, it is a control process. However, AI chess algorithms are based on explicit search. Many, many moves are considered, and each move is evaluated independently. This is a common pattern. The best way we know how to implement very powerful controllers is to use search inside (implementing a control process using a selection process). At that point, a controller seems clearly 'agent-like', and falls within the definition of optimizer used in the meso-optimization post. However, it seems to me that things become 'agent-like' somewhere before this stage.
(See also: adaptation-executers, not fitness maximizers.)
I don't want to frame it as if there's "one true distinction" which we should be making, which I'm claiming the mesa-optimization write-up got wrong. Rather, we should pay attention to the different distinctions we might make, studying the phenomena separately and considering the alignment/safety implications of each.
This is closely related to the discussion of upstream daemons vs downstream daemons. A downstream-daemon seems more likely to be an optimizer in the sense of the mesa-optimization write-up; it is explicitly planning, which may involve search. These are more likely to raise concerns through explicitly reasoned out treacherous turns. An upstream-daemon could use explicit planning, but it could also be only a bottlecap/thermostat/plant. It might powerfully optimize for something in the controller sense without internally using selection. This might produce severe misalignment, but not through explicitly planned treacherous turns. (Caveat: we don't understand mesa-optimizers; an understanding sufficient to make statements such as these with confidence would be a significant step forward.)
It seems possible that one could invent a measure of "control power" which would rate highly-optimized-but-inanimate objects like bottlecaps very low, while giving a high score to thermostat-like objects which set up complicated negative feedback loops (even if they didn't use any search).
Processes Within Processes
I already mentioned the idea that the best way we know how to implement powerful control processes is through powerful selection (search) inside of the controller.
To elaborate a bit on that: a controller with a search inside would typically have some kind of model of the environment, which it uses by searching for good actions/plans/policies for achieving its goals. So, measuring the optimization power as a controller, we look at how successful it is at achieving its goals in the real environment. Measuring the optimization power as a selector, we look at how good it is at choosing high-value options within its world-model. The search can only do as well as its model can tell it; however, in some sense, the agent is ultimately judged by the true consequences of its actions.
IE, in this case, the selection vs control distinction is a map/territory distinction. I think this is part of why I get so annoyed at things which mix up selection and control: it looks like a map/territory error to me.
However, this is not the only way selection and control commonly relate to each other.
Effective controllers are very often designed through a search process. This might be search taking place within a model, again (for example, training a neural network to control a robot, but getting its gradients from a physics simulation so that you can generate a large number of training samples relatively cheaply) or the real world (evolution by natural selection, "evaluating" genetic code by seeing what survives).
Further complicating things, a powerful search algorithm generally has some "smarts" to it, ie, it is good at choosing what option to evaluate next based on the current state of things. This "smarts" is controller-style smarts: every choice matters (because every evaluation costs processing power), there's no back-tracking, and you have to hit a narrow target in one shot. (Whatever the target of the underlying search problem, the target of the search-controller is: find that target, quickly.) And, of course, it is possible that such a search-controller will even use a model of the fitness landscape, and plan its next choice via its own search!
(I'm not making this up as a weird hypothetical; actual algorithms such as estimation-of-distribution algorithms will make models of the fitness landscape. For obvious reasons, searching for good points in such models is usually avoided; however, in cases where evaluation of points is expensive enough, it may be worth it to explicitly plan out test-points which will reveal the most information about the fitness landscape, so that the best point can be selected later.)
Blurring the Lines: What's the Critical Distinction?
I mentioned earlier that this dichotomy seems more like a conceptual cluster than a fully formal distinction. I mentioned a number of big differences which stick out at me. Let's consider some of these in more detail.
The classical sort of search algorithm I described as my central example of a selection process includes the ability to get a perfect evaluation of any option. The difficulty arises only from the very large number of options available. Control processes, on the other hand, appear to have very bad feedback, since you can't know the full outcome until it is too late to do anything about it. Can we use this as our definition?
I would agree that a search process in which the cost of evaluation goes to infinity becomes purely a control process: you can't perform any filtering of possibilities based on evaluation, so, you have to output one possibility and try to make it a good one (with no guarantees). Maybe you get some information about the objective function (like its source code), and you have to try to use that to choose an option. That's your sensors and actuators. They have to be very clever to achieve very good outcomes. The cheaper it is to evaluate the objective function on examples, the less "control" you need (the more you can just do brute-force search). In the opposite extreme, evaluating options is so cheap that you can check all of them, and output the maximum directly.
While this is somewhat appealing, it doesn't capture every case. Search algorithms today (such as stochastic gradient descent) often have imperfect feedback. Game-tree search deals with an objective function which is much too costly to evaluate directly (the quality of a move), but can be optimized for nonetheless by recursively searching for good moves in subgames down the game tree (mixed with approximate evaluations such as rollouts or heuristic board evaluations). I still think of both of these as solidly on the "selection process" side of things.
On the control process side, it is possible to have perfect feedback without doing any search. Thermostats realistically have noisy information about the temperature of a room, but, you can imagine a case where they get perfect information. It isn't any less a controller, or more a selection process, for that fact.
Choices Don't Change Later Choices
Another feature I mentioned was that in selection processes, all options are available to try at any time, and what you look at now does not change how good any option will be later. On the other hand, in a control process, previous choices can totally change how good particular later choices would be (as in reinforcement learning), or change what options are even available (as in game playing).
First, let me set two complications aside.
- Weird decision theory cases: it is theoretically possible to screw with a search by giving it an objective function which depends on its choices during search. This doesn't seem that interesting for our purposes here. (And that's coming from me...)
- Local search limits the "options" to small modifications of the option just considered. I don't think this is blurring the lines between search and control; rather, it is more like using a controller within a smart search to try to increase efficiency, as I discussed at the end of the processes-within-processes section. All the options are still "available" at all times; the search algorithm just happens to be one which limits itself to considering a smaller list.
I do think some cases blur the lines here, though. My primary example is the multi-armed bandit problem. This is a special case of the RL problem in which the history doesn't matter; every option is equally good every time, except for some random noise. Yet, to me, it is still a control problem. Why? Because every decision matters. The feedback you get about how good a particular choice was isn't just thought of as information; you "actually get" the good/bad outcome each time. That's the essential character of the multi-armed bandit problem: you have to trade off between experimentally trying options you're uncertain about vs sticking with the options which seem best so far, because every selection carries weight.
This leads me to the next proposed definition.
Offline vs Online
Selection processes are like offline algorithms, whereas control processes are like online algorithms.
With offline algorithms, you only really care about the end results. You are OK running gradient descent for millions of iterations before it starts doing anything cool, so long as it eventually does something cool.
With online algorithms, you care about each outcome individually. You would probably not want to be gradient-descent-training a neural network in live user-servicing code on a website, because live code has to be acceptably good from the start. Even if you can initialize the neural network to something acceptably good, you'd hesitate to run stochastic gradient descent on it live, because stochastic gradient descent can sometimes dramatically decrease performance for a while before improving performance again.
Furthermore, online algorithms have to deal with non-stationarity. This seems suitably like a control issue.
So, selection processes are "offline optimization", whereas control processes are "online optimization": optimizing things "as they progress" rather than statically. (Note that the notion of "online optimization" implied by this line of thinking is slightly different from the common definition of online optimization, though related.)
The offline vs online distinction also has a lot to do with the sorts of mistakes I think people are making when they confuse selection processes and control processes. Reinforcement learning, as a subfield of AI, was obviously motivated from a highly online perspective. However, it is very often used as an offline algorithm today, to produce effective agents, rather than as an effective agent. So, that there's been some mismatch between the motivations which shaped the paradigm and actual use. This perspective made it less surprising when black-box optimization beat reinforcement learning on some problems (see also).
This seems like the best definition so far. However, I personally still feel like it is still missing something important. Selection vs control feels to me like a type distinction, closer to map-vs-territory.
To give an explicit counterexample: evolution by natural selection is obviously a selection process according to the distinction as I make it, but it seems much more like an online algorithm than on offline one, if we try to judge it as such.
Internal Features vs Context
Returning to the definition in mesa-optimizers (emphasis mine):
Whether a system is an optimizer is a property of its internal structure—what algorithm it is physically implementing—and not a property of its input-output behavior. Importantly, the fact that a system’s behavior results in some objective being maximized does not make the system an optimizer.
The notion of a selection process says a lot about what is actually happening inside a selection process: there is a space of options, which can be enumerated; it is trying them; there is some kind of evaluation; etc.
The notion of control process, on the other hand, is more externally defined. It doesn't matter what's going on inside of the controller. All that matters is how effective it is at what it does.
A selection process -- such as a neural network learning algorithm -- can be regarded "from outside", asking questions about how the one output of the algorithm does in the true environment. In fact, this kind of thinking is what we do when we think about generalization error.
Similarly, we can analyze a control process "from inside", trying to find the pieces which correspond to beliefs, goals, plans, and so on (or postulate what they would look like if they existed -- as must be done in the case of controllers which truly lack such moving parts). This is the decision-theoretic view.
However, one might argue that viewing selection processes from the outside is viewing them as control -- viewing them as essentially having one shot at overall decision quality. Similarly, viewing control process from inside is essentially viewing it as selection -- the decision-theoretic view gives us a version of a control problem which we can solve by mathematical optimization.
In this view, selection vs control doesn't really cluster different types of object, but rather, different types of analysis. To a large extent, we can cluster objects by what kind of analysis we would more often want to do. However, certain cases (such as a game-playing AI) are best viewed through both lenses (as a controller, in the context of doing well in a real game against a human, and as a selection process, when thinking about the game-tree search).
Overall, I think I'm probably still somewhat confused about the whole selection vs control issue, particularly as it pertains to the question of how decision theory can apply to things in the world.