The word “optimizer” can be used in at least two different ways.
First, a system can be an “optimizer” in the sense that it is solving a computational optimization problem. A computer running a linear program solver, a SAT-solver, or gradient descent would be an example of a system that is an “optimizer” in this sense. That is, it runs an optimization algorithm. Let “optimizer_1” denote this concept.
Second, a system can be an “optimizer” in the sense that it optimizes its environment. Humans are optimizers in this sense, because we robustly take actions that push our environment in a certain direction. A reinforcement learning agent can also be thought of as an optimizer in this sense, but confined to whatever environment it is run in. This is the sense in which “optimizer” is used in posts such as this. Let “optimizer_2” denote this concept.
These two concepts are distinct. Say that you somehow hook up a linear program solver to a reinforcement learning environment. Unless you do the “hooking up” in a particularly creative way there is no reason to assume that the output of the linear program solver would push the environment in a particular direction. Hence a linear program solver is an optimizer_1, but not an optimizer_2. On the other hand, a simple tabular RL agent would eventually come to systematically push the environment in a particular direction, and is hence an optimizer_2. However, such a system does not run any internal optimization algorithm, and is therefore not an optimizer_1. This means that a system can be an optimizer_1 while not being an optimizer_2, and vice versa.
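The contrast above can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the five-state chain environment, the function names, the hyperparameters, and in particular the deliberately uncreative way the solver's output is mapped to an action. Gradient descent stands in for the linear program solver to keep the example dependency-free. Note also that the Q-learning update takes a max over two table entries; the sketch follows the framing above in not counting that as running an internal optimization algorithm.

```python
import random

N_STATES = 5          # chain of states 0..4; state 4 is the rewarded goal
LEFT, RIGHT = 0, 1

def step(state, action):
    """Hypothetical toy environment: move along the chain, reward 1 at the goal."""
    nxt = max(0, state - 1) if action == LEFT else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def gradient_descent_argmin(target, lr=0.1, iters=200):
    """optimizer_1: minimize f(x) = (x - target)^2 by plain gradient descent."""
    x = 0.0
    for _ in range(iters):
        x -= lr * 2.0 * (x - target)
    return x

def solver_policy(state):
    """Naive hook-up (an assumption): solve a problem built from the state,
    then map the solution to an action by parity. Reward is never consulted."""
    return int(round(gradient_descent_argmin(float(state)))) % 2

def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: the agent is a table plus a local update rule."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES - 1)      # random non-goal start
        for _ in range(50):                      # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice((LEFT, RIGHT))
            else:
                action = max((LEFT, RIGHT), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            td_target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (td_target - q[state][action])
            state = nxt
            if done:
                break
    return q

def reaches_goal(policy, start, max_steps=20):
    """Run a fixed policy from `start` and report whether it hits the goal."""
    state = start
    for _ in range(max_steps):
        state, _, done = step(state, policy(state))
        if done:
            return True
    return False

q_table = train_q_table()

def greedy_policy(state):
    """Act greedily with respect to the learned table."""
    return max((LEFT, RIGHT), key=lambda a: q_table[state][a])

starts = range(N_STATES - 1)
print("solver hook-up reaches goal:", [reaches_goal(solver_policy, s) for s in starts])
print("Q-table agent reaches goal: ", [reaches_goal(greedy_policy, s) for s in starts])
```

In this toy setup the trained table should steer the chain to the goal from every starting state, which is the robust "pushing in a direction" that marks an optimizer_2. The solver converges perfectly well on each problem it is handed, yet its hooked-up behaviour gets stuck or oscillates from most starting states, because nothing ties its output to the state of the environment.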
There are some arguments related to AI safety that seem to conflate these two concepts. In Superintelligence (p. 153), on the topic of Tool AI, Nick Bostrom writes that:
A second place where trouble could arise is in the course of the software’s operation. If the methods that the software uses to search for a solution are sufficiently sophisticated, they may include provisions for managing the search process itself in an intelligent manner. In this case, the machine running the software may begin to seem less like a mere tool and more like an agent. Thus, the software may start by developing a plan for how to go about its search for a solution. The plan may specify which areas to explore first and with what methods, what data to gather, and how to make best use of available computational resources. In searching for a plan that satisfies the software’s internal criterion (such as yielding a sufficiently high probability of finding a solution satisfying the user-specified criterion within the allotted time), the software may stumble on an unorthodox idea. For instance, it might generate a plan that begins with the acquisition of additional computational resources and the elimination of potential interrupters (such as human beings).
To me, this argument seems to make an unexplained jump from optimizer_1 to optimizer_2. It begins with the observation that a powerful Tool AI would be likely to optimize its internal computation in various ways, and that this optimization process could be quite powerful. In other words, a powerful Tool AI would be a strong optimizer_1. It then concludes that the system might start pursuing convergent instrumental goals – in other words, that it would be an optimizer_2. The jump between the two is not explained.
The implicit assumption seems to be that an optimizer_1 could turn into an optimizer_2 unexpectedly if it becomes sufficiently powerful. It is not at all clear to me that this is the case – I have not seen any good argument to support this, nor can I think of any myself. The fact that a system is internally running an optimization algorithm does not imply that the system is selecting its output in such a way that this output optimizes the environment of the system.
The excerpt from Superintelligence is just one example of an argument that seems to slide between optimizer_1 and optimizer_2. Some parts of Dreams of Friendliness seem to do so as well, or at least it's not always clear which of the two is being talked about. I’m sure there are more examples.
Be mindful of this distinction when reasoning about AI. I propose that “consequentialist” (or perhaps "goal-directed") be used to mean what I have called “optimizer_2”. I don’t think there is a need for a special word to denote what I have called “optimizer_1” (at least not once the distinction between optimizer_1 and optimizer_2 has been pointed out).
Note: It is possible to raise a sort of embedded agency-like objection against the distinction between optimizer_1 and optimizer_2. One might argue that:
There is no sharp boundary between the inside and the outside of a computer. An “optimizer_1” is just an optimizer whose optimization target is defined in terms of the state of the computer it is installed on, whereas an “optimizer_2” is an optimizer whose optimization target is defined in terms of something outside the computer. Hence there is no categorical difference between an optimizer_1 and an optimizer_2.
I don’t think that this argument works. Consider the following two systems:
1. A computer that is able to very quickly solve very large linear programs.
2. A computer that solves linear programs, and tries to prevent people from turning it off as it is doing so, etc.
System 1 is an optimizer_1 that solves linear programs, whereas system 2 is an optimizer_2 that is optimizing the state of the computer that it is installed on. These two things are different. (Moreover, the difference isn’t just that system 2 is “more powerful” than system 1 – system 1 might even be a better linear program solver than system 2.)
Acknowledgements: We were aware of the difference between "optimizer_1" and "optimizer_2" while working on the mesa-optimization paper, and I'm not sure who first pointed it out. We were also probably not the first people to realise this.