Selection vs Control

abramdemski

This is something which has bothered me for a while, but, I'm writing it specifically in response to the recent post on mesa-optimizers.

I feel strongly that the notion of 'optimization process' or 'optimizer' which people use -- partly derived from Eliezer's notion in the sequences -- should be split into two clusters. I call these two clusters 'selection' vs 'control'. I don't have precise formal statements of the distinction I'm pointing at; I'll give several examples.

Before going into it, several reasons why this sort of thing may be important:

It could help refine the discussion of mesa-optimization. The article restricted its discussion to the type of optimization I'll call 'selection', explicitly ruling out 'control'. This choice isn't obviously right. (More on this later.)
Refining 'agency-like' concepts like this seems important for embedded agency -- what we eventually want is a story about how agents can be in the world. I think almost any discussion of the relationship between agency and optimization which isn't aware of the distinction I'm drawing here (at least as a hypothesis) will be confused.
Generally, I feel like I see people making mistakes by not distinguishing between the two (whether or not they've derived their notion of optimizer from Eliezer). I judge an algorithm differently if it is intended as one or the other.

(See also Stuart Armstrong's summary of other problems with the notion of optimization power Eliezer proposed -- those are unrelated to my discussion here, and strike me more as technical issues which call for refined formulae, rather than conceptual problems which call for revised ontology.)

The Basic Idea

Eliezer quantified optimization power by asking how small a target an optimization process hits, out of a space of possibilities. The type of 'space of possibilities' is what I want to poke at here.

Selection

First, consider a typical optimization algorithm, such as simulated annealing. The algorithm constructs an element of the search space (such as a specific combination of weights for a neural network), gets feedback on how good that element is, and then tries again. Over many iterations of this process, it finds better and better elements. Eventually, it outputs a single choice.

This is the prototypical 'selection process' -- it can directly instantiate any element of the search space (although typically we consider cases where the process doesn't have time to instantiate all of them), it gets direct feedback on the quality of each element (although evaluation may be costly, so that the selection process must economize these evaluations), the quality of an element of search space does not depend on the previous choices, and only the final output matters.
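As a rough illustration of those four properties, here is a minimal sketch of a selection process. It uses plain random search with keep-the-best rather than simulated annealing, to stay short; the names select_best, candidates, score, and budget are made up for this example.

```python
import random

def select_best(candidates, score, budget, seed=0):
    """Toy selection process: sample options, score each, output the best one seen."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(budget):              # evaluations are limited by a budget
        x = rng.choice(candidates)       # any element can be instantiated at any time
        s = score(x)                     # direct feedback; independent of earlier choices
        if s > best_score:
            best, best_score = x, s
    return best                          # only the final output matters

if __name__ == "__main__":
    space = list(range(-50, 51))
    print(select_best(space, lambda x: -(x - 7) ** 2, budget=30))
```

Nothing is lost by evaluating a bad candidate here, other than the evaluation itself; that is the sense in which only the final output matters.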

The term 'selection process' refers to the fact that this type of optimization selects between a number of explicitly given possibilities. The most basic example of this phenomenon is a 'filter' which rejects some elements and accepts others -- like selection bias in statistics. This has a limited ability to optimize, however, because it allows only one iteration. Natural selection is an example of much more powerful optimization occurring through iteration of selection effects.

Control

Now, consider a targeting system on a rocket -- let's say, a heat-seeking missile. The missile has sensors and actuators. It gets feedback from its sensors, and must somehow use this information to decide how to use its actuators. This is my prototypical control process. (The term 'control process' is supposed to invoke control theory.) Unlike a selection process, a controller can only instantiate one element of the space of possibilities. It gets to traverse exactly one path. The 'small target' which it hits is therefore 'small' with respect to a space of counterfactual possibilities, with all the technical problems of evaluating counterfactuals. We only get full feedback on one outcome (although we usually consider cases where the partial feedback we get along the way gives us a lot of information about how to navigate toward better outcomes). Every decision we make along the way matters, both in terms of influencing total utility, and in terms of influencing what possibilities we have access to in subsequent decisions.
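For contrast, a minimal sketch of a control process, assuming a made-up one-dimensional pursuit problem (the function pursue, the proportional gain, and the noise level are all illustrative, not a model of any real missile). There is a single trajectory, feedback arrives through a noisy sensor, and each correction changes the situation faced at the next step.

```python
import random

def pursue(steps=40, gain=0.5, noise=0.1, seed=0):
    """Toy controller: chase a moving target along one path, using noisy feedback."""
    rng = random.Random(seed)
    position, target = 0.0, 10.0
    for _ in range(steps):
        target += 0.2                                          # the target keeps moving
        sensed_error = (target - position) + rng.gauss(0.0, noise)  # noisy sensor reading
        position += gain * sensed_error                        # actuator: proportional correction
    return abs(target - position)                              # miss distance on the one path taken

if __name__ == "__main__":
    print(pursue())
```

Note that there is no search here and no explicit space of alternatives; to say how "small" a target this hits, we would have to supply the counterfactual trajectories ourselves.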

So: in evaluating the optimization power of a selection process, we have a fairly objective situation on our hands: the space of possibilities is explicitly given; the utility function is explicitly given; we can compare the true output of the system to a randomly chosen element. In evaluating the optimization power of a control process, we have a very subjective situation on our hands: the controller only truly takes one path, so any judgement about a space of possibilities requires us to define counterfactuals; it is less clear how to define an un-optimized baseline; utility need not be explicitly represented in the controller, so may have to be inferred (or we think of it as a parameter, so, we can measure optimization power with respect to different utility functions, but there's no 'correct' one to measure).
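For the selection case, the comparison to a randomly chosen element can be made concrete. The sketch below roughly follows the "how small a target" idea by counting bits: minus the base-2 log of the fraction of options at least as good as the output. The function name and the assumption of a finite, uniformly weighted search space are mine.

```python
import math

def optimization_power_bits(scores, achieved):
    """Bits of optimization: -log2 of the fraction of options scoring >= the output.

    Assumes `scores` lists the utility of every option in a finite search space,
    weighted uniformly, and that `achieved` is the score of one of those options.
    """
    at_least_as_good = sum(1 for s in scores if s >= achieved)
    return -math.log2(at_least_as_good / len(scores))

if __name__ == "__main__":
    all_scores = list(range(1024))
    print(optimization_power_bits(all_scores, 1023))   # the best of 1024 options
```

Every input to this function is explicitly available for a selection process. For a controller, the list of scores is exactly the thing we would have to invent.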

I do think both of these concepts are meaningful. I don't want to restrict 'optimization' to refer to only one or the other, as the mesa-optimization essay does. However, I think the two concepts are of a very different type.

Bottlecaps & Thermostats

The mesa-optimizer write-up made the decision to focus on what I call selection processes, excluding control processes:

We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system. [...] For example, a bottle cap causes water to be held inside the bottle, but it is not optimizing for that outcome since it is not running any sort of optimization algorithm. Rather, bottle caps have been optimized to keep water in place.

It makes sense to say that we aren't worried about bottlecaps when we think about the inner alignment problem. However, this also excludes much more powerful 'optimizers' -- something more like a plant.

When does a powerful control process become an 'agent'?

Bottlecaps: No meaningful actuators or sensors. Essentially inanimate. Does a particular job, possibly very well, but in a very predictable manner.
Thermostats: Implements a negative feedback loop via a sensor, an actuator, and a policy of "correcting" things when sense-data indicates they are "off". Actual thermostats explicitly represent the target temperature, but one can imagine things in this cluster which wouldn't -- in general, the connection between what is sensed and how things are 'corrected' can be quite complex (involving many different sensors and actuators), so that no one place in the system explicitly represents the 'target'.
Plants: Plants are like very complex thermostats. They have no apparent 'target' explicitly represented, but can clearly be thought of as relatively agentic, achieving complicated goals in complicated environments.
Guided Missiles: These are also mostly in the 'thermostat' category, but, guided missiles can use simple world-models (to track the location of the target). However, any 'planning' is likely based on explicit formulae rather than any search. (I'm not sure about actual guided missiles.) If so, a guided missile would still not be a selection process, and therefore lack a "goal" in the mesa-optimizer sense, despite having a world-model and explicitly reasoning about how to achieve an objective represented within that world-model.
Chess Programs: A chess-playing program has to play each game well, and every move is significant to this goal. So, it is a control process. However, AI chess algorithms are based on explicit search. Many, many moves are considered, and each move is evaluated independently. This is a common pattern. The best way we know how to implement very powerful controllers is to use search inside (implementing a control process using a selection process). At that point, a controller seems clearly 'agent-like', and falls within the definition of optimizer used in the mesa-optimization post. However, it seems to me that things become 'agent-like' somewhere before this stage.

I don't want to frame it as if there's "one true distinction" which we should be making, which I'm claiming the mesa-optimization write-up got wrong. Rather, we should pay attention to the different distinctions we might make, studying the phenomena separately and considering the alignment/safety implications of each.

This is closely related to the discussion of upstream daemons vs downstream daemons. A downstream-daemon seems more likely to be an optimizer in the sense of the mesa-optimization write-up; it is explicitly planning, which may involve search. These are more likely to raise concerns through explicitly reasoned out treacherous turns. An upstream-daemon could use explicit planning, but it could also be only a bottlecap/thermostat/plant. It might powerfully optimize for something in the controller sense without internally using selection. This might produce severe misalignment, but not through explicitly planned treacherous turns. (Caveat: we don't understand mesa-optimizers; an understanding sufficient to make statements such as these with confidence would be a significant step forward.)

It seems possible that one could invent a measure of "control power" which would rate highly-optimized-but-inanimate objects like bottlecaps very low, while giving a high score to thermostat-like objects which set up complicated negative feedback loops (even if they didn't use any search).

Processes Within Processes

I already mentioned the idea that the best way we know how to implement powerful control processes is through powerful selection (search) inside of the controller.

To elaborate a bit on that: a controller with a search inside would typically have some kind of model of the environment, which it uses by searching for good actions/plans/policies for achieving its goals. So, measuring the optimization power as a controller, we look at how successful it is at achieving its goals in the real environment. Measuring the optimization power as a selector, we look at how good it is at choosing high-value options within its world-model. The search can only do as well as its model can tell it; however, in some sense, the agent is ultimately judged by the true consequences of its actions.
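A toy sketch of the two measurements, under made-up assumptions: the controller's model believes the best action is 8, while in the real environment it is 5. The functions plan_in_model and true_outcome and both utility functions are hypothetical.

```python
def plan_in_model(actions, model_utility):
    """Selection view: search the action space for what the internal model rates highest."""
    return max(actions, key=model_utility)

def true_outcome(action, true_utility):
    """Control view: the agent is judged by what the chosen action does in the real environment."""
    return true_utility(action)

def model_utility(a):
    return -(a - 8) ** 2     # the map: the model thinks action 8 is ideal

def true_utility(a):
    return -(a - 5) ** 2     # the territory: action 5 is actually ideal

if __name__ == "__main__":
    chosen = plan_in_model(range(11), model_utility)
    print(chosen, model_utility(chosen), true_outcome(chosen, true_utility))
```

As a selector, the search is perfect: it finds the model's optimum. As a controller, the same system does noticeably worse, because the score that counts is the one the environment assigns.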

I.e., in this case, the selection vs control distinction is a map/territory distinction. I think this is part of why I get so annoyed at things which mix up selection and control: it looks like a map/territory error to me.

However, this is not the only way selection and control commonly relate to each other.

Effective controllers are very often designed through a search process. This might be search taking place within a model, again (for example, training a neural network to control a robot, but getting its gradients from a physics simulation so that you can generate a large number of training samples relatively cheaply) or the real world (evolution by natural selection, "evaluating" genetic code by seeing what survives).

Further complicating things, a powerful search algorithm generally has some "smarts" to it, i.e., it is good at choosing what option to evaluate next based on the current state of things. This "smarts" is controller-style smarts: every choice matters (because every evaluation costs processing power), there's no back-tracking, and you have to hit a narrow target in one shot. (Whatever the target of the underlying search problem, the target of the search-controller is: find that target, quickly.) And, of course, it is possible that such a search-controller will even use a model of the fitness landscape, and plan its next choice via its own search!

(I'm not making this up as a weird hypothetical; actual algorithms such as estimation-of-distribution algorithms will make models of the fitness landscape. For obvious reasons, searching for good points in such models is usually avoided; however, in cases where evaluation of points is expensive enough, it may be worth it to explicitly plan out test-points which will reveal the most information about the fitness landscape, so that the best point can be selected later.)

Blurring the Lines: What's the Critical Distinction?

I mentioned earlier that this dichotomy seems more like a conceptual cluster than a fully formal distinction. I mentioned a number of big differences which stick out at me. Let's consider some of these in more detail.

Perfect Feedback

The classical sort of search algorithm I described as my central example of a selection process includes the ability to get a perfect evaluation of any option. The difficulty arises only from the very large number of options available. Control processes, on the other hand, appear to have very bad feedback, since you can't know the full outcome until it is too late to do anything about it. Can we use this as our definition?

I would agree that a search process in which the cost of evaluation goes to infinity becomes purely a control process: you can't perform any filtering of possibilities based on evaluation, so, you have to output one possibility and try to make it a good one (with no guarantees). Maybe you get some information about the objective function (like its source code), and you have to try to use that to choose an option. That's your sensors and actuators. They have to be very clever to achieve very good outcomes. The cheaper it is to evaluate the objective function on examples, the less "control" you need (the more you can just do brute-force search). In the opposite extreme, evaluating options is so cheap that you can check all of them, and output the maximum directly.

While this is somewhat appealing, it doesn't capture every case. Search algorithms today (such as stochastic gradient descent) often have imperfect feedback. Game-tree search deals with an objective function which is much too costly to evaluate directly (the quality of a move), but can be optimized for nonetheless by recursively searching for good moves in subgames down the game tree (mixed with approximate evaluations such as rollouts or heuristic board evaluations). I still think of both of these as solidly on the "selection process" side of things.

On the control process side, it is possible to have perfect feedback without doing any search. Thermostats realistically have noisy information about the temperature of a room, but, you can imagine a case where they get perfect information. It isn't any less a controller, or more a selection process, for that fact.

Choices Don't Change Later Choices

Another feature I mentioned was that in selection processes, all options are available to try at any time, and what you look at now does not change how good any option will be later. On the other hand, in a control process, previous choices can totally change how good particular later choices would be (as in reinforcement learning), or change what options are even available (as in game playing).

First, let me set two complications aside.

Weird decision theory cases: it is theoretically possible to screw with a search by giving it an objective function which depends on its choices during search. This doesn't seem that interesting for our purposes here. (And that's coming from me...)
Local search limits the "options" to small modifications of the option just considered. I don't think this is blurring the lines between search and control; rather, it is more like using a controller within a smart search to try to increase efficiency, as I discussed at the end of the processes-within-processes section. All the options are still "available" at all times; the search algorithm just happens to be one which limits itself to considering a smaller list.

I do think some cases blur the lines here, though. My primary example is the multi-armed bandit problem. This is a special case of the RL problem in which the history doesn't matter; every option is equally good every time, except for some random noise. Yet, to me, it is still a control problem. Why? Because every decision matters. The feedback you get about how good a particular choice was isn't just thought of as information; you "actually get" the good/bad outcome each time. That's the essential character of the multi-armed bandit problem: you have to trade off between experimentally trying options you're uncertain about vs sticking with the options which seem best so far, because every selection carries weight.
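To make the "every decision matters" point concrete, here is a minimal epsilon-greedy sketch. Epsilon-greedy is just one simple bandit strategy, picked for brevity; the Gaussian rewards, the function name, and the parameters are assumptions for illustration. The thing to notice is that the reward of every pull is added to the total, including the exploratory ones.

```python
import random

def epsilon_greedy(arm_means, pulls=1000, epsilon=0.1, seed=0):
    """Toy multi-armed bandit player; returns (total reward collected, pulls per arm)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore an arbitrary arm
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploit the best estimate
        reward = rng.gauss(arm_means[arm], 1.0)                # same distribution every time
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total_reward += reward                                 # each pull is actually paid for
    return total_reward, counts

if __name__ == "__main__":
    print(epsilon_greedy([0.0, 10.0]))
```

A pure selection process facing the same arms would only be scored on which arm it names at the end, so it could afford to sample them evenly.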

This leads me to the next proposed definition.

Offline vs Online

Selection processes are like offline algorithms, whereas control processes are like online algorithms.

With offline algorithms, you only really care about the end results. You are OK running gradient descent for millions of iterations before it starts doing anything cool, so long as it eventually does something cool.

With online algorithms, you care about each outcome individually. You would probably not want to be gradient-descent-training a neural network in live user-servicing code on a website, because live code has to be acceptably good from the start. Even if you can initialize the neural network to something acceptably good, you'd hesitate to run stochastic gradient descent on it live, because stochastic gradient descent can sometimes dramatically decrease performance for a while before improving performance again.

Furthermore, online algorithms have to deal with non-stationarity. This seems suitably like a control issue.

So, selection processes are "offline optimization", whereas control processes are "online optimization": optimizing things "as they progress" rather than statically. (Note that the notion of "online optimization" implied by this line of thinking is slightly different from the common definition of online optimization, though related.)

The offline vs online distinction also has a lot to do with the sorts of mistakes I think people are making when they confuse selection processes and control processes. Reinforcement learning, as a subfield of AI, was obviously motivated from a highly online perspective. However, it is very often used as an offline algorithm today, to produce effective agents, rather than as an effective agent. So, there's been some mismatch between the motivations which shaped the paradigm and actual use. This perspective made it less surprising when black-box optimization beat reinforcement learning on some problems (see also).

This seems like the best definition so far. However, I personally feel like it is still missing something important. Selection vs control feels to me like a type distinction, closer to map-vs-territory.

To give an explicit counterexample: evolution by natural selection is obviously a selection process according to the distinction as I make it, but it seems much more like an online algorithm than an offline one, if we try to judge it as such.

Internal Features vs Context

Returning to the definition in mesa-optimizers (emphasis mine):

Whether a system is an optimizer is a property of its internal structure—what algorithm it is physically implementing—and not a property of its input-output behavior. Importantly, the fact that a system’s behavior results in some objective being maximized does not make the system an optimizer.

The notion of a selection process says a lot about what is actually happening inside a selection process: there is a space of options, which can be enumerated; it is trying them; there is some kind of evaluation; etc.

The notion of control process, on the other hand, is more externally defined. It doesn't matter what's going on inside of the controller. All that matters is how effective it is at what it does.

A selection process -- such as a neural network learning algorithm -- can be regarded "from outside", asking questions about how the one output of the algorithm does in the true environment. In fact, this kind of thinking is what we do when we think about generalization error.

Similarly, we can analyze a control process "from inside", trying to find the pieces which correspond to beliefs, goals, plans, and so on (or postulate what they would look like if they existed -- as must be done in the case of controllers which truly lack such moving parts). This is the decision-theoretic view.

However, one might argue that viewing selection processes from the outside is viewing them as control -- viewing them as essentially having one shot at overall decision quality. Similarly, viewing a control process from inside is essentially viewing it as selection -- the decision-theoretic view gives us a version of a control problem which we can solve by mathematical optimization.

In this view, selection vs control doesn't really cluster different types of object, but rather, different types of analysis. To a large extent, we can cluster objects by what kind of analysis we would more often want to do. However, certain cases (such as a game-playing AI) are best viewed through both lenses (as a controller, in the context of doing well in a real game against a human, and as a selection process, when thinking about the game-tree search).

Overall, I think I'm probably still somewhat confused about the whole selection vs control issue, particularly as it pertains to the question of how decision theory can apply to things in the world.

Selection vs Control is a distinction I always point to when discussing optimization. Yet these are not the two takes on optimization I generally use. My favored ones are internal optimization (which is basically search/selection), and external optimization (optimizing systems from Alex Flint’s The ground of optimization). So I do without control, or at least without Abram’s exact definition of control.

Why? Simply because the internal structure vs behavior distinction mentioned in this post seems more important than the actual definitions (which seem constrained by going back to Yudkowsky’s optimization power). The big distinction is between doing internal search (like in optimization algorithms or mesa-optimizers) and acting so as to optimize something. It is intuitive that you can do the second without the first, but before Alex Flint’s definition, I couldn’t put into words my intuition that the first implies the second.

So my current picture of optimization is Internal Optimization (Internal Search/Selection) ⊂ External Optimization (Optimizing systems). This means that I think of this post as one of the first instances of grappling with this distinction, without agreeing completely with the way it ends up making that distinction.

I like this division a lot. One nitpick: I don't think internal optimization is a subset of external optimization, unless we're redrawing the system boundary at some point. A search always takes place within the context of a system's (possibly implicit) world-model; that's the main thing which distinguishes it from control/external optimization. If that world-model does not match the territory, then the system may not successfully optimize anything in its environment, even though it's searching for optimizing plans internally.

Thanks!

My take on internal optimization as a subset of external optimization probably works assuming convergence, because the configuration space capturing the internal state of the program (and its variables) is pushed reliably towards the configurations with a local minimum in the corresponding variable. See here.

Whether that's actually what we want is another question, but I think the point you're mentioning can be captured by whether the target subspace of the configuration space puts constraints on things outside the system (for good Cartesian boundaries and all the corresponding subtleties).

Got it, that's the case I was thinking of as "redrawing the system boundary". Makes sense.

That still leaves the problem that we can write an (internal) optimizer which isn't iterative. For instance, a convex function optimizer which differentiates its input function and then algebraically solves for zero gradient. (In the real world, this is similar to what markets do.) This was also my main complaint about Flint's notion of "optimization": not all optimizers are iterative, and sometimes they don't even have an "initial" point against which we could compare.
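A minimal sketch of such a non-iterative optimizer, restricted to the simplest convex family so that the differentiation can be done by hand: for f(x) = a*x^2 + b*x + c with a > 0, setting f'(x) = 2*a*x + b to zero gives the minimizer directly. The function name and the restriction to one-dimensional quadratics are assumptions made for illustration.

```python
def minimize_quadratic(a, b, c):
    """Minimize f(x) = a*x**2 + b*x + c by solving f'(x) = 2*a*x + b = 0 in closed form."""
    if a <= 0:
        raise ValueError("f is strictly convex only when a > 0")
    x_star = -b / (2 * a)                        # zero of the derivative; no initial guess needed
    return x_star, a * x_star**2 + b * x_star + c

if __name__ == "__main__":
    print(minimize_quadratic(1, -4, 5))
```

There is no sequence of candidate points here and no starting point, which is what makes it awkward to describe as a trajectory converging on a target.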

I'm a bit confused: why can't I just take the initial state of the program (or of the physical system representing the computer) as the initial point in configuration space for your example? The execution of your program is still a trajectory through the configuration space of your computer.

Personally, my biggest issue with optimizing systems is that I don't know what "smaller" really means for the target space. If the target space has only one state fewer than the total configuration space, is this still an optimizing system? Should we compute a ratio of measure between target and total configuration space to have some sort of optimizing power?

The initial state of the program/physical computer may not overlap with the target space at all. The target space wouldn't be larger or smaller (in the sense of subsets); it would just be an entirely different set of states.

Flint's notion of optimization, as I understand it, requires that we can view the target space as a subset of the initial space.

I would agree that a search process in which the cost of evaluation goes to infinity becomes purely a control process: you can't perform any filtering of possibilities based on evaluation, so, you have to output one possibility and try to make it a good one (with no guarantees).

This is backwards, actually. “Control” isn’t the crummy option you have to resort to when you can’t afford to search. Searching is what you have to resort to when you can’t do control theory.

When your Jacuzzi is at 60°F and you want it at 102°F, there are a lot of possible heating profiles you could try out. However, you know that no combination of “on off off on on off off on” is going to surprise you by giving a better result than simply leaving the heater on when it’s too cold and off when it’s too hot. Control theory actually can guarantee the optimal results, and with some simple assumptions it’s exactly what it seems like it’d be. Guided missiles do get more complicated than this with all the inertias and significant measurement noise and moving target and all that, but the principle remains the same: compute the best estimate of where you stand relative to the trajectory you want to be on (where “trajectory” includes things like the angular rates of your control surfaces), and then steer your trajectory towards that. There’s just nothing left to search for when you already know the best thing to do.
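The Jacuzzi claim can be checked by brute force in a deliberately crude toy model. The assumptions are all mine: the water moves exactly 6 degrees per step, up with the heater on and down with it off, and the cost is the summed distance from the target after each step. Under those assumptions the sketch compares the on-when-cold rule against every possible on/off profile.

```python
from itertools import product

TARGET, START, STEP = 102, 60, 6    # degrees F; toy numbers, symmetric heating/cooling assumed

def cost(profile, start=START):
    """Sum of |temperature - target| after each step of an on/off heating profile."""
    temp, total = start, 0
    for heater_on in profile:
        temp += STEP if heater_on else -STEP
        total += abs(temp - TARGET)
    return total

def bang_bang(n_steps, start=START):
    """Feedback rule: heater on when the water is too cold, off otherwise."""
    temp, profile = start, []
    for _ in range(n_steps):
        heater_on = temp < TARGET
        profile.append(heater_on)
        temp += STEP if heater_on else -STEP
    return tuple(profile)

def best_by_search(n_steps):
    """Exhaustive selection over all 2**n_steps on/off profiles."""
    return min(product([False, True], repeat=n_steps), key=cost)

if __name__ == "__main__":
    n = 12
    print(cost(bang_bang(n)), cost(best_by_search(n)))
```

In this toy model the search over all 4096 profiles finds nothing cheaper than the feedback rule. With less tidy dynamics (say, overshoot that a one-step rule cannot anticipate) that need not hold, which is where the "simple assumptions" in the comment are doing the work.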

The reason we ever need to search is because it’s not always obvious when our actions are bringing us towards or away from our desired trajectory. “Searching” is performing trial and error by simulating forward in time until you realize “nope, this leads to a bad outcome” and backing up to before you “made” the mistake and trying something else. For example if you’re trying to cook a meal you might have to get all the way to the finished product before you realized that you started out with too much of one of your ingredients. However, this is a result of not knowing the composition you’re looking for and how your inputs affect it. Once you understand the objective, the process and actuators, and how things project into the future, you know your best guess of where to go at each step. If the water is too cold, you simply turn the heater on.

Searching, then, isn’t just something we do when projecting forward and evaluating outcomes is cheap. It’s what we do when analyzing the problem and building an understanding of how our inputs affect our trajectories (i.e. control theory) is expensive. Or difficult, or impossible.

Or perhaps better put, searching is for when we haven’t yet found what we want and how to get there. Control systems are what we implement once we know.

I agree with most of what you say here, but I think you're over-emphasizing the idea that search deals with unknowns whereas control deals with knowns. Optimization via search works best when you have a good model of the situation. The extreme case for usefulness of search is a game like Chess, where the rules are perfectly known, there's no randomness, and no hidden information. If you don't know a lot about a situation, you can't build an optimal controller, but you also can't set up a very good representation of the problem to solve via search.

This is backwards, actually. “Control” isn’t the crummy option you have to resort to when you can’t afford to search. Searching is what you have to resort to when you can’t do control theory.

Why not both? Most of your post is describing situations where you can't easily solve a control problem with a direct rule, so you spin up a search based on a model of the situation. My paragraph which you quoted was describing a situation where dumb search becomes harder and harder, so you spin up a controller (inside the search process) to help out. Both of these things happen.

I agree with most of what you say here, but I think you're over-emphasizing the idea that search deals with unknowns whereas control deals with knowns.

There’s uncertainty in both approaches, but it is dealt with differently. In controls, you’ll often use Kalman filters to estimate the relevant states. You might not know your exact state because there is noise on all your sensors, and you may have uncertainty in your estimate of the amount of noise, but given your best estimates of the variance, you can calculate the one best estimate of your actual state.

There’s still nothing to search for in the sense of “using our model of our system, try different Kalman filter gains and see what works best”, because the math already answered that for you definitively. If you’re searching in the real world (i.e. actually trying different gains and seeing what works best), that can help, but only because you’re getting more information about what your noise distributions are actually like. You can also just measure that directly and then do the math.
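A minimal sketch of that point for the simplest possible case, a scalar state with known noise variances (the function name, default values, and the constant-state model are assumptions for illustration). The gain is never tuned or searched for; it falls out of the variances at each step.

```python
def kalman_1d(measurements, meas_var, process_var=0.0, init_mean=0.0, init_var=1e6):
    """Scalar Kalman filter for a (nearly) constant state observed with Gaussian noise."""
    mean, var = init_mean, init_var
    for z in measurements:
        var += process_var                # predict: state assumed constant, plus process noise
        gain = var / (var + meas_var)     # the gain is computed from the variances, not searched for
        mean += gain * (z - mean)         # move the estimate toward the new measurement
        var *= (1.0 - gain)               # uncertainty shrinks after each update
    return mean, var

if __name__ == "__main__":
    print(kalman_1d([1.0, 2.0, 3.0], meas_var=1.0))
```

With no process noise and a very vague prior, the estimate comes out close to the plain average of the measurements, as one would hope.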

With search over purely simulated outcomes, you’re saying essentially “I have uncertainty over how to do the math”, while in control theory you’re essentially saying “I don’t”.

Perhaps a useful analogy would be that of numerical integration vs symbolic integration. You can brute force a decent enough approximation of any integral just by drawing a bunch of little trapezoids and summing them up, and a smart high schooler can write the program to do it. Symbolic integration is much "harder", but can often give exact solutions and isn't so hard to compute once you know how to do it.
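The analogy in miniature, using the integral of x^2 as an arbitrary example (function names are made up): the trapezoid routine works for any integrand but is only approximate, while the closed form takes knowing the antiderivative and is then exact and nearly free to evaluate.

```python
def trapezoid(f, a, b, n=1000):
    """Numerical integration: sum n little trapezoids under f between a and b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))           # endpoints count with half weight
    for i in range(1, n):
        total += f(a + i * h)             # interior points count with full weight
    return total * h

def exact_integral_of_square(a, b):
    """Symbolic result: the antiderivative of x**2 is x**3 / 3."""
    return (b**3 - a**3) / 3

if __name__ == "__main__":
    print(trapezoid(lambda x: x * x, 0, 3), exact_integral_of_square(0, 3))
```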

Why not both?

You can do both. I’m not trying to argue that doing the math and calculating the optimal answer is always the right thing to do (or even feasible/possible).

In the real world, I often do sorta “search through gains” instead of trying to get my analysis perfect or model my meta-uncertainty. Just yesterday, for example, we had some overshoot on the linear actuator we’re working on. Trying to do the math would have been extremely tedious and I likely would have messed it up anyway, but it took about two minutes to just change the values and try it until it worked well. It’s worth noting that “searching” by actually doing experiments is different than “searching” by running simulations, but the latter can make sense too -- if engineer time doing control theory is expensive, laptop time running simulations is cheap, and the latter can substitute for the former to some degree.

The point I was making was that the optimal solution is still going to be what control theory says, so if it's important to you to have the rightest answer with the fewest mistakes, you move away from searching and towards the control theory textbook -- not the other way around.

Most of your post is describing situations where you can't easily solve a control problem with a direct rule, so you spin up a search based on a model of the situation.

I don't follow this part.

Thanks for a great post! I have a small confusion/nit regarding natural selection. Despite its name, I don't think it's a good exemplar of a selection process. Going through the features of a selection process from the start of the post:

can directly instantiate any element of the search space. No: natural selection can only make local modifications to previously instantiated points. But you already dealt with this local search issue in Choices Don't Change Later Choices.
gets direct feedback on the quality of each element. Yes.
quality of element does not depend on previous choices. No, the evaluation of an element in natural selection depends a great deal on previous choices because they usually make up important parts of its environment. I think this is the thrust of the claim that natural selection is online (which I agree with).
only the final output matters. No? From the perspective of natural selection, I think the quality of the current output is what matters.

I'd love to know why natural selection seemed obvious as an example of a selection process, since it did not to me due to its poor score on the checklist above.

(I am unfortunately currently bogged down with external academic pressures, and so cannot engage with this at the depth I’d like to, but here’s some initial thoughts.)

I endorse this post. The distinction explained here seems interesting and fruitful.

I agree with the idea to treat selection and control as two kinds of analysis, rather than as two kinds of object – I think this loosely maps onto the distinction we make between the mesa-objective and the behavioural objective. The former takes the selection view of the learned algorithm; the latter takes the control view.

At least speaking for myself (the other authors might have different thoughts on this), the decision to talk explicitly in terms of the selection view in the mesa-optimiser post is based on an intuition that selectors, in general, have more coherently defined counterfactual behaviour. That is, given a very different input, a selector will still select an output that scores well on its mesa-objective, because that’s how selectors work. Whereas a controller, to the degree it optimises for an objective, seems more likely to just completely stop working on a different input. I have fairly low confidence in this argument, however: it seems to me that one can plausibly have pretty coherent counterfactual behaviour in a very broad distribution even without doing selection. And since it is ultimately the behaviour that does the damage, it would be good to have a working distinction that is based purely on that. We (the mesa-optimisation authors) haven’t been able to come up with one.

Another reason to be interested in selectors is that in RL, the learned algorithm is supposed to fill a controller role. So, restricting attention to selectors allows us to talk at least somewhat meaningfully about non-optimiser agents, which is otherwise difficult, as any learned agent is in a controller-shaped context.

In any case, I hope that more work happens on this problem, either dissolving the need to talk about optimisation, or at least making all these distinctions more precise. The vagueness of everything is currently my biggest worry about the legitimacy of mesa-optimiser concerns.

Yeah, I agree with most of what you're saying here.

A learned controller which isn't implementing any internal selection seems more likely to be incoherent out-of-distribution (ie lack a strong IRL interpretation of its behavior), as compared with a mesa-optimizer;
However, this is a low-confidence argument at present; it's very possible that coherent controllers can appear w/o necessarily having a behavioral objective which matches the original objective, in which case a version of the internal alignment problem applies. (But this might be a significantly different version of the internal alignment problem.)

I think a crux here is: to what extent are mesa-controllers with simple behavioral objectives going to be simple? The argument that mesa-optimizers can compress coherent strategies does not apply here.

Actually, I think there's an argument that certain kinds of mesa-controllers can be simple: the mesa-controllers which are more like my rocket example (explicit world model; explicit representation of objective within that world model; but, optimal policy does not use any search). There is also another reason to suspect that these could survive techniques which are designed to make sure mesa-optimizers don't arise: they aren't expending a bunch of processing power on an internal search, so, you can't eliminate them with some kind of processing-power device. (Not that we know of any such device that eliminates mesa-optimizers -- but if we did, it may not help with rocket-type mesa-controllers.)

Terminology point: I like the term 'selection' for the cluster I'm pointing at, but, I keep finding myself tempted to say 'search' in an AI context. Possibly, 'search vs control' would be better terminology.

to what extent are mesa-controllers with simple behavioural objectives going to be simple?

I’m not sure what “simple behavioural objective” really means. But I’d expect that for tasks requiring very simple policies, controllers would do, whereas the more complicated the policy required to solve a task, the more one would need to do some kind of search. Is this what we observe? I’m not sure. AlphaStar and OpenAI Five seem to do well enough in relatively complex domains without any explicit search built into the architecture. Are they using their recurrence to search internally? Who knows. I doubt it, but it’s not implausible.

certain kinds of mesa-controllers can be simple: the mesa-controllers which are more like my rocket example (explicit world-model; explicit representation of objective within that world model; but, optimal policy does not use any search).

The rocket example is interesting. I guess the question for me there is, what sorts of tasks admit an optimal policy that can be represented in this way? Here it also seems to me like the more complex an environment, the more implausible it seems that a powerful policy can be successfully represented with straightforward functions. E.g., let’s say we want a rocket not just to get to the target, but to self-identify a good target in an area and pick a trajectory that evades countermeasures. I would be somewhat surprised if we can still represent the best policy as a set of non-searchy functions. So I have this intuition that for complex state spaces, it’s hard to find pure controllers that do the job well.

Yeah, I agree that this seems possible, but extremely unclear. If something uses a fairly complex algorithm like FFT, is it search? How "sophisticated" can we get without using search? How can we define "search" and "sophisticated" so that the answer is "not very sophisticated"?

In a field like alignment or embedded agency, it's useful to keep a list of one or two dozen ideas which seem like they should fit neatly into a full theory, although it's not yet clear how. When working on a theoretical framework, you regularly revisit each of those ideas, and think about how it fits in. Every once in a while, a piece will click, and another large chunk of the puzzle will come together.

Selection vs control is one of those ideas. It seems like it should fit neatly into a full theory, but it's not yet clear what that will look like. I revisit the idea pretty regularly (maybe once every 3-4 months) to see how it fits with my current thinking. It has not yet had its time, but I expect it will (that's why it's on the list, after all).

Bearing in mind that the puzzle piece has not yet properly clicked, here are some current thoughts on how it might connect to other pieces:

Selection and control have different type signatures.
A selection process optimizes for the values of variables in some model, which may or may not correspond to anything in the real world. Human values seem to be like this - see Human Values Are A Function Of Humans' Latent Variables.
A control process, on the other hand, directly optimizes things in its environment. A thermostat, for instance, does not necessarily contain any model of the temperature a few minutes in the future; it just directly optimizes the value of the temperature a few minutes in the future.
The post basically says it, but it's worth emphasizing: reinforcement learning is a control process, expected utility maximization is a selection process. The difference in type signatures between RL and EU maximization is the same as the difference in type signatures between selection and control.
Inner and outer optimizers can have different type signatures: an outer controller (e.g. RL) can learn an inner selector (e.g. utility maximizer), or an outer selector (e.g. a human) can build an inner controller (e.g. a thermostat), or they could match types with or without matching models/objectives. Which things even can match depends on the types involved - e.g. if one of the two is a controller, it may not have any world-model, so it's hard to talk about variables in its world-model corresponding to variables in a selector's world-model.
The Good Regulator Theorem roughly says that the space of optimal controllers always includes a selector (although it doesn't rule out additional non-selectors in that space).
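One way to picture the "different type signatures" claim in the list above is the following sketch. The names (`select`, `thermostat`, `run_room`) and the toy room dynamics are assumptions made for illustration. The selector takes a set of candidates plus a scoring function defined in some model and returns one candidate. The controller is just a function from the current observation to an action, and only has an effect when run inside an environment.

```python
from typing import Callable, Iterable, TypeVar

Candidate = TypeVar("Candidate")
Controller = Callable[[float], bool]  # observation -> action


def select(candidates: Iterable[Candidate],
           score: Callable[[Candidate], float]) -> Candidate:
    """Selector: evaluate every candidate in a model, output the best one."""
    return max(candidates, key=score)


def thermostat(temperature: float, setpoint: float = 20.0) -> bool:
    """Controller: heater on iff it is currently too cold. No model, no search."""
    return temperature < setpoint


def run_room(controller: Controller, start_temp: float = 15.0,
             outside: float = 10.0, steps: int = 200) -> float:
    """Toy environment: heater adds 2 degrees per step, room leaks toward outside."""
    temp = start_temp
    for _ in range(steps):
        heater_on = controller(temp)
        temp += (2.0 if heater_on else 0.0) - 0.1 * (temp - outside)
    return temp


if __name__ == "__main__":
    print(select(range(-10, 11), lambda x: -(x - 3) ** 2))  # best point in a model
    print(run_room(thermostat))  # temperature the controller settles near
```

The selector never touches an environment, and the thermostat never sees a candidate set, which is the mismatch that makes "matching objectives" between the two hard to state.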

In my ~~wayward youth~~formal education, I studied numerical optimization, controls systems, the science of decision-making, and related things, and so some part of me was always irked by the focus on utility functions and issues with them; take this early comment of mine and the resulting thread as an example. So I was very pleased to see a post that touches on the difference between the approaches and the resulting intuitions bringing it more into the thinking of the AIAF.

That said, I also think I've become more confused about what sorts of inferences we can draw from internal structure to external behavior, when there are Church-Turing-like reasons to think that a robot built with mental strategy X can emulate a robot built with mental strategy Y, and both psychology and practical machine learning systems look like complicated pyramids built out of simple nonlinearities that can approximate general functions (but with different simplicity priors, and thus efficiencies). This sort of distinction doesn't seem particularly useful to me from the perspective of constraining our expectations, while it does seem useful for expanding them. [That is, the range of future possibilities seems broader than one would expect if they only thought in terms of selection, or only thought in terms of control.]

Totally agree this is a useful distinction. The map/territory thing feels right on. This is something that the mainstream AI research community doesn't seem confused about. As far as I can see, no one there thinks search and planning are the same task.

With regard to search algorithms being controllers: Here's a discussion I had with ErickBall where they argue that planning will ultimately prove useful for search and I argue it won't. There might also be some new ideas for "what's the critical distinction" in that discussion.

I guess what I think is that the mainstream isn't explicitly confused about the distinction (ie, doesn't make confused claims), but that the distinction isn't clearly made/taught, which leaves some individuals confused.

I think this has a little to do with the (also often implicit) distinction between research and application (ie, research vs engineering). In the context of pure research, it might make a lot of sense to take shortcuts with toy models which you could not take in the intended application of the algorithms, because you are investigating a particular phenomenon and the shortcuts don't interfere with that investigation. However, these shortcuts can apparently change the type of the problem, and other people can become confused about what problem type you are really trying to solve.

To be a bit more concrete, you might test an AI on a toy model, and directly feed the AI some information about the toy model (as a shortcut). You can do this because the toy model is a simulation you built, so, you have direct access to it. Your intention in the research might be that such direct-fed information would be replaced with learning one day. (To you, your AI is "controller" type.) Others may misinterpret your algorithm as a search technique which takes an explicit model of a situation (they see it as "selection" type).

This could result in other people writing papers which contrast your technique with other "selection"-type techniques. Your algorithm might compare poorly because you made some decisions motivated by eventual control-type applications. This becomes hard to point out because the selection/control distinction is a bit tricky.

As far as I can see, no one there thinks search and planning are the same task.

I'm not sure what you mean about search vs planning. My guess is that search=selection and planning=control. While I do use "search" and "selection" somewhat interchangeably, I don't want to use "planning" and "control" interchangeably; "planning" suggests a search-type operation applied to solve a control problem (the selection-process-within-a-control-process idea).

Also, it seems to me that tons of people would say that planning is a search problem, and AI textbooks tend to reflect this.

With regard to search algorithms being controllers: Here's a discussion I had with ErickBall where they argue that planning will ultimately prove useful for search and I argue it won't.

In the discussion, you say:

Optimization algorithms used in deep learning are typically pretty simple. Gradient descent is taught in sophomore calculus. Variants on gradient descent are typically used, but all the ones I know of are well under a page of code in complexity.

Gradient descent is extremely common these days, but much less so when I was first learning AI (just over ten years ago). To a large extent, it has turned out that "dumber" methods are easier to scale up.

However, much more sophisticated search techniques (with explicit consequentialist reasoning in the inner loop) are still discussed occasionally, especially for cases where evaluating a point is more costly. "Bayesian Optimization" is the subfield in which this is studied (that I know of). Here's an example:

Gaussian Processes for Global Optimization (the search is framed as a sequential decision problem!)

Later, you ask:

How do you reckon long-term planning will be useful for architecture search? It's not a stateful system.

The answer (in terms of Bayesian Optimization) is that planning ahead is still helpful in the same way that planning a sequence of experiments can be helpful. You are exploring the space in order to find the best solution. At every point, you are asking "what question should I ask next, to maximize the amount of information I'll uncover in the long run?". This does not reduce to "what question should I ask next, in order to maximize the amount of information I have right now?" -- but, most optimization algorithms don't even go that far. Most optimization algorithms don't explicitly reason about value-of-information at all, instead doing reasoning which is mainly designed to steer toward the best points it knows how to steer to immediately, with some randomness added in to get some exploration.

Yet, this kind of reasoning is not usually worth it, or so it seems based on the present research landscape. The overhead of planning-how-to-search is too costly; it doesn't save time overall.

It strikes me that evolution by natural selection has most of the characteristics you attribute to a control system, not a selection system: feedback is far from perfect, each step of evaluation is heavily constrained by previous outputs and there is no going back, most of the search space is unreachable, it operates on the territory and there is no map, there is no final output distinct from the computation itself, and as you mentioned, it is strictly "on-line". It's true that it is massively parallel, and in this sense different elements of the search space are evaluated and either accepted or rejected at each "step". I'm not sure that evolution is "obviously a selection process according to the distinction as [you] make it".

Of course, it is an astoundingly inefficient optimizer, of whichever type it is, so it is not surprising that it lacks many of the stereotypical characteristics of its class.

It might be based on the fact that it produces agents.

I wasn't clear on whether this was more a control thing or a selection thing - when looking at an agent, we care about what it does on its own. But we're also interested in "evolution's future outputs".

It seems possible that one could invent a measure of "control power"

I think the likelihood of this comment being helpful is small, but I know of two sort-of-adjacent efforts, both of which took place under the auspices of DARPA's META Program, a program for improving systems engineering.

The first is a complexity metric, which they define as unexpected behavior of any kind and attempt to quantify in terms of information entropy. The part about the development of the metric begins on page 4.

The second is an adaptability metric. This one is considerably fussier; they eventually had to produce several metrics because of tradeoffs, and then tried to produce a valuation method so you could compare the metrics properly. It relies on several specific techniques which I have no knowledge of, and is much more heavily anchored in current real applications, but the crux of the effort seems to align with the "choices don't change later choices" section above.

This post feels to me like the same type of conversation that would have been helpful in the work of these two papers, so I mention them on the off-chance the relationship works both ways.

I've come back to this because I was thinking about slop.

The story goes: back in the pre-internet era, art was good. Then, with algorithms deciding which stuff to promote, it got worse (in some sense) and with AI it's got even worse. The word used for the new, implicitly bad media is often slop.

Slop content has a bunch of different definitions, but the particular trend from the pre-internet to the algorithm-internet to the AI-internet is one of exchanging control for selection.

In, say, the 90s, a huge amount of control effort went into, say, films and TV shows. Only a small number made it to the TV/cinema, and the selection pressure after that operated on the level of ~hours of content, with a feedback loop of ~months to ~years in terms of what stuff gets made.

With YouTube, a thousand different videos get instantiated. The algorithm can select between the actual finished products using a direct feedback loop from users.

This does seem to produce qualitatively different media.

Small nitpick, half a decade late: bottlecaps are arguably proportional controllers—the pressure they exert on the inside is proportional to the pressure applied by the inside, until the bottlecap hits a performance limit and breaks.

Intuitively it feels like you are onto something. Whether it is inherent to the optimizer's functionality or an artifact of how we view it is hard to say. Most selectors use algorithms that are the same as, or similar to, those used by controllers. Gradient descent or simulated annealing can be thought of as evaluating possible worlds (counterfactuals) and making the one with the highest utility actual. And vice versa, a guided missile can be thought of as a selector in a search space. I wonder if this is what you mean when you say

the selection vs control distinction is a map/territory distinction

My guess is that the distinction is in large part a matter of your "stance". If you think in a Cartesian way, analyzing an unchanging external reality, then it's more of a search. If you think in terms of changing that reality with your actions, then it's a controller.

... Rereading what you said, I guess I am basically agreeing with you.

Yeah I think this is definitely a "stance" thing.

Take the use of natural selection and humans as examples of optimization and mesa-optimization - the entire concept of "natural selection" is a human-convenient way of describing a pattern in the universe. It's approximately an optimizer, but in order to get rid of that "approximately" you have to reintroduce epicycles until your model is as complicated as a model of the world again. Humans aren't optimizers either, that's just a human-convenient way of describing humans.

More abstractly, the entire process of recognizing a mesa-optimizer - something that models the world and makes plans - is an act of stance-taking. Or Quinean radical translation or whatever. If a cat-recognizing neural net learns an attention mechanism that models the world of cats and makes plans, it's not going to come with little labels on the neurons saying "these are my input-output interfaces, this is my model of the world, this is my planning algorithm." It's going to be some inscrutable little bit of linear algebra with suspiciously competent behavior.

Not only could this competent behavior be explained either by optimization or some variety of "rote behavior," but the neurons don't care about these boundaries and can occupy a continuum of possibilities between any two central examples. And worst of all, the same neurons might have multiple different useful ways of thinking about them, some of which are in terms of elements like "goals" and "search," and others are in terms of the elements of rote behavior.

In light of this, the problem of mesa-optimizers is not "when will this bright line be crossed?" but "when will this simple model of the AI's behavior be predictably useful?" Even though I think the first instinct is the opposite.

More abstractly, the entire process of recognizing a mesa-optimizer - something that models the world and makes plans - is an act of stance-taking.

And pretty specifically, the intentional stance. I think Daniel Dennett did some pretty powerful clarification decades ago which could help this debate.

This felt to me like an important distinction to think about when thinking about optimization.

Thanks!

Got it, that's the case I was thinking of as "redrawing the system boundary". Makes sense.

Flint's notion of optimization, as I understand it, requires that we can view the target space as a subset of the initial space.

I would agree that a search process in which the cost of evaluation goes to infinity becomes purely a control process: you can't perform any filtering of possibilities based on evaluation, so, you have to output one possibility and try to make it a good one (with no guarantees).

Or perhaps better put, searching is for when we haven’t yet found what we want and how to get there. Control systems are what we implement once we know.

This is backwards, actually. “Control” isn’t the crummy option you have to resort to when you can’t afford to search. Searching is what you have to resort to when you can’t do control theory.

I agree with most of what you say here, but I think you're over-emphasizing the idea that search deals with unknowns whereas control deals with knowns.
