Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This work was supported by OAK, a monastic community in the Berkeley hills. This document could not have been written without the daily love of living in this beautiful community. The work involved in writing this cannot be separated from the sitting, chanting, cooking, cleaning, crying, correcting, fundraising, listening, laughing, and teaching of the whole community.

What is optimization? What is the relationship between a computational optimization process — say, a computer program solving an optimization problem — and a physical optimization process — say, a team of humans building a house?

We propose the concept of an optimizing system as a physically closed system containing both that which is being optimized and that which is doing the optimizing, and defined by a tendency to evolve from a broad basin of attraction towards a small set of target configurations despite perturbations to the system. We compare our definition to that proposed by Yudkowsky, and place our work in the context of work by Demski and Garrabrant’s Embedded Agency, and Drexler’s Comprehensive AI Services. We show that our definition resolves difficult cases proposed by Daniel Filan. We work through numerous examples of biological, computational, and simple physical systems showing how our definition relates to each.


In the field of computer science, an optimization algorithm is a computer program that outputs the solution, or an approximation thereof, to an optimization problem. An optimization problem consists of an objective function to be maximized or minimized, and a feasible region within which to search for a solution. For example we might take the objective function as a minimization problem and the whole real number line as the feasible region. The solution then would be and a working optimization algorithm for this problem is one that outputs a close approximation to this value.

In the field of operations research and engineering more broadly, optimization involves improving some process or physical artifact so that it is fit for a certain purpose or fulfills some set of requirements. For example, we might choose to measure a nail factory by the rate at which it outputs nails, relative to the cost of production inputs. We can view this as a kind of objective function, with the factory as the object of optimization just as the variable x was the object of optimization in the previous example.

There is clearly a connection between optimizing the factory and optimizing for x, but what exactly is this connection? What is it that identifies an algorithm as an optimization algorithm? What is it that identifies a process as an optimization process?

The answer proposed in this essay is: an optimizing system is a physical process in which the configuration of some part of the universe moves predictably towards a small set of target configurations from any point in a broad basin of optimization, despite perturbations during the optimization process.

We do not imagine that there is some engine or agent or mind performing optimization, separately from that which is being optimized. We consider the whole system jointly — engine and object of optimization — and ask whether it exhibits a tendency to evolve towards a predictable target configuration. If so, then we call it an optimizing system. If the basin of attraction is deep and wide then we say that this is a robust optimizing system.

An optimizing system as defined in this essay is known in dynamical systems theory as a dynamical system with one or more attractors. In this essay we show how this framework can help to understand optimization as manifested in physically closed systems containing both engine and object of optimization.

In this way we find that optimizing systems are not something that are designed but are discovered. The configuration space of the world contains countless pockets shaped like small and large basins, such that if the world should crest the rim of one of these pockets then it will naturally evolve towards the bottom of the basin. We care about them because we can use our own agency to tip the world into such a basin and then let go, knowing that from here on things will evolve towards the target region.

All optimization basins have a finite extent. A ball may roll to the center of a valley if initially placed anywhere within the valley, but if it is placed outside the valley then it will roll somewhere else entirely, or perhaps will not roll at all. Similarly, even a very robust optimizing system has an outer rim to its basin of attraction, such that if the configuration of the system is perturbed beyond that rim then the system no longer evolves towards the target that it once did. When an optimizing system deviates beyond its own rim, we say that it dies. An existential catastrophe is when the optimizing system of life on Earth moves beyond its own outer rim.

Example: computing the square root of two

Say I ask my computer to compute the square root of two, for example by opening a python interpreter and typing:

>>> print(math.sqrt(2))

The value printed here is actually calculated by solving an optimization problem. It works roughly as follows. First we set up an objective function that has as its minimum value the square root of two. One function we could use is

Next we pick an initial estimate for the square root of two, which can be any number whatsoever. Let’s take 1.0 as our initial guess. Then we take a gradient step in the direction indicated by computing the slope of the objective function at our initial estimate:

Then we repeat this process of computing the slope and updating our estimate over and over, and our optimization algorithm quickly converges to the square root of two:

This is gradient descent, and it can be implemented in a few lines of python code:

	current_estimate = 1.0
	step_size = 1e-3
	while True:
		objective = (current_estimate**2 - 2) ** 2
		gradient = 4 * current_estimate * (current_estimate**2 - 2)
		if abs(gradient) < 1e-8:
		current_estimate -= gradient * step_size

But this program has the following unusual property: we can modify the variable that holds the current estimate of the square root of two at any point while the program is running, and the algorithm will still converge to the square root of two. That is, while the code above is running, if I drop in with a debugger and overwrite the current estimate while the loop is still executing, what will happen is that the next gradient step will start correcting for this perturbation, pushing the estimate back towards the square root of two:

If we give the algorithm time to converge to within machine precision of the actual square root of two then the final output will be bit-for-bit identical to the result we would have gotten without the perturbation.

Consider this for a moment. For most kinds of computer code, overwriting a variable while the code is running will either have no effect because the variable isn’t used, or it will have a catastrophic effect and the code will crash, or it will simply cause the code to output the wrong answer. If I use a debugger to drop in on a webserver servicing an http request and I overwrite some variable with an arbitrary value just as the code is performing a loop in which this variable is used in a central way, bad things are likely to happen! Most computer code is not robust to arbitrary in-flight data modifications.

But this code that computes the square root of two is robust to in-flight data modifications, or at least the "current estimate" variable is. It’s not that our perturbation has no effect: if we change the value, the next iteration of the algorithm will compute the objective function and its slope at a completely different point, and each iteration after that will be different to how it would have been if we hadn’t intervened. The perturbation may change the total number of iterations before convergence is reached. But ultimately the algorithm will still output an estimate of the square root of two, and, given time to fully converge, it will output the exact same answer it would have output without the perturbation. This is an unusual breed of computer program indeed!

What is happening here is that we have constructed a physical system consisting of a computer and a python program that computes the square root of two, such that:

  • for a set of starting configurations (in this case the set of configurations in which the "current estimate" variable is set to each representable floating point number),

  • the system exhibits a tendency to evolve towards a small set of target configurations (in this case just the single configuration in which the "current estimate" variable is set to the square root of two),

  • and this tendency is robust to in-flight perturbations to the system’s configuration (in this case robustness is limited to just the dimensions corresponding to changes in the "current estimate" variable).

In this essay I argue that systems that converge to some target configuration, and will do so despite perturbations to the system, are the systems we should rightly call "optimizing systems".

Example: building a house

Consider a group of humans building a house. Let us consider the humans together with the building materials and construction site as a single physical system. Let us imagine that we assemble this system inside a completely closed chamber, including food and sleeping quarters for the humans, lighting, a power source, construction materials, construction blueprint, as well as the physical humans with appropriate instructions and incentives to build the house. If we just put these physical elements together we get a system that has a tendency to evolve under the natural laws of physics towards a configuration in which there is a house matching the blueprint.

We could perturb the system while the house is being built — say by dropping in at night and removing some walls or moving some construction materials about — and this physical system will recover. The team of humans will come in the next day and find the construction materials that were moved, put in new walls to replace the ones that were removed, and so on.

Just like the square root of two example, here is a physical system with:

  • A basin of attraction (all the possible arrangements of viable humans and building materials)

  • A target configuration set that is small relative to the basin of attraction (those in which the building materials have been arranged into a house matching the design)

  • A tendency to evolve towards the target configurations when starting from any point within the basin of attraction, despite in-flight perturbations to the system

Now this system is not infinitely robust. If we really scramble the arrangement of atoms within this system then we’ll quickly wind up with a configuration that does not contain any humans, or in which the building materials are irrevocably destroyed, and then we will have a system without the tendency to evolve towards any small set of final configurations.

In the physical world we are not surprised to find systems that have this tendency to evolve towards a small set of target configurations. If I pick up my dog while he is sleeping and move him by a few inches, he still finds his way to his water bowl when he wakes up. If I pull a piece of bark off a tree, the tree continues to grow in the same upward direction. If I make a noise that surprises a friend working on some math homework, the math homework still gets done. Systems that contain living beings regularly exhibit this tendency to evolve towards target configurations, and tend to do so in a way that is robust to in-flight perturbations. As a result we are familiar with physical systems that have this property, and we are not surprised when they arise in our lives.

But physical systems in general do not have the tendency to evolve towards target configurations. If I move a billiard ball a few inches to the left while a bunch of billiard balls are energetically bouncing around a billiard table, the balls are likely to come to rest in a very different position than if I had not moved the ball. If I change the trajectory of a satellite a little bit, the satellite does not have any tendency to move back into its old orbit.

The computer systems that we have built are still, by and large, more primitive than the living systems that we inhabit, and most computer systems do not have the tendency to evolve robustly towards some set of target configurations, so optimization algorithms as discussed in the previous section, which do have this property, are somewhat unusual.

Defining optimization

An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.

Some systems may have a single target configuration towards which they inevitably evolve. Examples are a ball in a steep valley with a single local minimum, and a computer computing the square root of two. Other systems may have a set of target configurations and perturbing the system may cause it to evolve towards a different member of this set. Examples are a ball in a valley with multiple local minima, or a tree growing upwards (perturbing the tree by, for example, cutting off some branches while it is growing will probably change its final shape, but will not change its tendency to grow towards one of the configurations in which it has reached its maximum size).

We can quantify optimizing systems in the following ways.

Robustness. Along how many dimensions can we perturb the system without altering its tendency to evolve towards the target configuration set? What magnitude perturbation can the system absorb along these dimensions? A self-driving car navigating through a city may be robust to perturbations that involve physically moving the car to a different position on the road in the city, but not to perturbations that involve changing the state of physical memory registers that contain critical bits of computer code in the car’s internal computer.

Duality. To what extent can we identify subsets of the system corresponding to "that which is being optimized" and "that which is doing the optimization"? Between engine and object of optimization; between agent and world. Highly dualistic systems may be robust to perturbations of the object of optimization, but brittle with respect to perturbations of the engine of optimization. For example, a system containing a 2020s-era robot moving a vase around is a dualistic optimizing system: there is a clear subset of the system that is the engine of optimization (the robot), and object of optimization (the vase). Furthermore, the robot may be able to deal with a wide variety of perturbations to the environment and to the vase, but there are likely to be numerous small perturbations to the robot itself that will render it inert. In contrast, a tree is a non-dualistic optimizing system: the tree does grow towards a set of target configurations, but it makes no sense to ask which part of the tree is "doing" the optimization and which part is "being" optimized. This latter example is discussed further below.

Retargetability. Is it possible, using only a microscopic perturbation to the system, to change the system such that it is still an optimizing system but with a different target configuration set? A system containing a robot with the goal of moving a vase to a certain location can be modified by making just a small number of microscopic perturbations to key memory registers such that the robot holds the goal of moving the vase to a different location and the whole vase/robot system now exhibits a tendency to evolve towards a different target configuration. In contrast, a system containing a ball rolling towards the bottom of a valley cannot generally be modified by any microscopic perturbation such that the ball will roll to a different target location. A tree is an intermediate example: to cause the tree to evolve towards a different target configuration set — say, one in which its leaves were of a different shape — one would have to modify the genetic code simultaneously in all of the tree’s cells.

Relationship to Yudkowsky’s definition of optimization

In Measuring Optimization Power, Eliezer Yudkowsky defines optimization as a process in which some part of the world ends up in a configuration that is high in an agent’s preference ordering, yet has low probability of arising spontaneously. Yudkowsky’s definition asks us to look at a patch of the world that has already undergone optimization by an agent or mind, and draw conclusions about the power or intelligence of that mind by asking how unlikely it would be for a configuration of equal or greater utility (to the agent) to arise spontaneously.

Our definition differs from this in the following ways:

  • We look at whole systems that evolve naturally under physical laws. We do not assume that we can decompose these systems into some engine and object of optimization, or into mind and environment. We do not look at systems that are "being optimized" by some external entity but rather at "optimizing systems" that exhibit a natural tendency to evolve towards a target configuration set. These optimizing systems may contain subsystems that have the properties of agents, but as we will see there are many instances of optimizing systems that do not contain dualistic agentic subsystems.

  • When discerning the boundary between optimization and non-optimization, we look principally at robustness — whether the system will continue to evolve towards its target configuration set in the face of perturbations — whereas Yudkowsky looks at the improbability of the final configuration.

Relationship to Drexler’s Comprehensive AI Services

Eric Drexler has written about the need to consider AI systems that are not goal-directed agents. He points out that the most economically important AI systems today are not constructed within the agent paradigm, and that in fact agents represent just a tiny fraction of the design space of intelligent systems. For example, a system that identifies faces in images would be an intelligent system but not an agent according to Drexler’s taxonomy. This perspective is highly relevant to our discussion here since we seek to go beyond the narrow agent model in which intelligent systems are conceived of as unitary entities that receive observations from the environment, send actions back into the environment, but are otherwise separate from the environment.

Our perspective is that there is a specific class of intelligent systems — which we call optimizing systems — that are worthy of special attention and study due to their potential to reshape the world. The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.

Figure: relationship between our optimizing system concept and Drexler’s taxonomy of AI systems

Examples of systems that lie in each of these three tiers are as follows:

  • A system that identifies faces in images by evaluating a feed-forward neural network is an AI system but not an optimizing system.

  • A tree is an optimizing system but not a goal-directed agent system (see section below analyzing a tree as an optimizing system).

  • A robot with the goal of moving a ball to a specific destination is a goal-directed agent system.

Relationship to Garrabrant and Demski’s Embedded Agency

Scott Garrabrant and Abram Demski have written about the many ways that a dualistic view of agency in which one conceives of a hard separation between agent and environment fails to capture the reality of agents that are reducible to the same basic building-blocks as the environments in which they are embedded. They show that if one starts from a dualistic view of agency then it is difficult to design agents capable of reflecting on and making improvements to their own cognitive processes, since the dualistic view of agency rests on a unitary agent whose cognition does not affect the world except via explicit actions. They also show that reasoning about counterfactuals becomes nonsensical if starting from a dualistic view of agency, since the agent’s cognitive processes are governed by the same physical laws as those that govern the environment, and the agent can come to notice this fact, leading to confusion when considering the consequences actions that are different from the actions that the agent will, in fact, output.

One could view the Embedded Agency work as enumerating the many logical pitfalls one falls into if one takes the "optimizer" concept as the starting point for designing intelligent systems, rather than "optimizing system" as we propose here. The present work is strongly inspired by Garrabrant and Demski’s work. Our hope is to point the way to a view of optimization and agency that captures reality sufficiently well to avoid the logical pitfalls identified in the Embedded Agency work.

Example: ball in a valley

Consider a physical ball rolling around in a small valley. According to our definition of optimization, this is an optimizing system:

Configuration space. The system we are studying consists of the physical valley plus the ball

Basin of attraction. The ball could initially be placed anywhere in the valley (these are the configurations comprising the basin of attraction)

Target configuration set. The ball will roll until it ends up at the bottom of the valley (the set of local minima are the target configurations)

We can perturb the ball while it is "in flight", say by changing its position or velocity, and the ball will still ultimately end up at one of the target configurations. This system is robust to perturbations along dimensions corresponding to the spatial position and velocity of the ball, but there are many more dimensions along which this system is not robust. If we change the shape of the ball to a cube, for example, then the ball will not continue rolling to the bottom of the valley.

Example: ball in valley with robot

Consider now a ball in a valley as above, but this time with the addition of an intelligent robot holding the goal of ensuring that the ball reaches the bottom of the valley.

Configuration space. The system we are studying now consists of the physical valley, the ball, and the robot. We consider the evolution of and perturbations to this whole joint system.

Target configuration set. As before, the target configuration is the ball being at the bottom of the valley

Basin of attraction. As before, the basin of attraction consists of all the possible spatial locations that the ball could be placed in the valley.

We can now perturb the system along many more dimensions than in the case where there was no robot. For example, we could introduce a barrier that prevents the ball from rolling downhill past a certain point, and we can then expect a sufficiently intelligent robot to move the ball over the barrier. We can expect a sufficiently well-designed robot to be able to overcome a wide variety of hurdles that gravity would not overcome on its own. Therefore we say that this system is more robust than the system without the robot.

There is a sequence of systems spanning the gap between a ball rolling in a valley, which is robust to a narrow set of perturbations and therefore we say exhibits a weak degree of optimization, up to a robot with a goal of moving a ball around in a valley, which is robust to a much wider set of perturbations, and therefore we say exhibits a stronger degree of optimization. Therefore the difference between systems that do and do not undergo optimization is not a binary distinction but a continuous gradient of increasing robustness to perturbations.

By introducing the robot to the system we have also introduced new dimensions along which the system is fragile: the dimensions corresponding to modifications to the robot itself, and in particular the dimensions corresponding to modifications to the code running on the robot (i.e. physical perturbations to the configuration of the memory cells in which the code is stored). There are two types of perturbation we might consider:

  • Perturbations that destroy the robot. There are numerous ways we could cut wires or scramble computer code that would leave the robot completely non-operational. Many of these would be physically microscopic, such as flipping a single bit in a memory cell containing some critical computer code. In fact there are now more ways to break the system via microscopic perturbations compared to when we were considering a ball in a valley without a robot, since there are few ways to cause a ball not to reach the bottom of a valley by making only a microscopic perturbation to the system, but there are many ways to break modern computer systems via a microscopic perturbation.

  • Perturbations that change the target configurations. We could also make physically microscopic perturbations to this system that change the robot’s goal. For example we might flip the sign on some critical computations in the robot’s code such that the robot works to place the ball at the highest point rather than the lowest. This is still a physical perturbation to the valley/ball/robot system: it is one that affects the configuration of the memory cells containing the robot’s computer code. These kinds of perturbations may point to a concept with some similarity to that of an agent. If we have a system that can be perturbed in a way that preserves the robustness of the basin of convergence but changes the target configuration towards which the system tends to evolve, and if we can find perturbations that cause the target configurations to match our own goals, then we have a way to navigate between convergence basins.

Example: computer performing gradient descent

Consider now a computer running an iterative gradient descent algorithm in order to solve an optimization problem. For concreteness let us imagine that the objective function being optimized is globally convex, in which case the algorithm will certainly reach the global optimum given sufficient time. Let us further imagine that the computer stores its current best estimate of the location of the global optimum (which we will henceforth call the "optimizand") at some known memory location, and updates this after every iteration of gradient descent.

Since this is a purely computational process, it may be tempting to define the configuration space at the computational level — for example by taking the configuration space to be the domain of the objective function. However, it is of utmost importance when analyzing any optimizing system to ground our analysis in a physical system evolving according to the physical laws of nature, just as we have for all previous examples. The reason this is important is to ensure that we always study complete systems, not just some inert part of the system that is "being optimized" by something external to the system. Therefore we analyze this system as follows.

Configuration space. The system consists of a physical computer running some code that performs gradient descent. The configurations of the system are the physical configurations of the atoms comprising the computer.

Target-configuration set. The target configuration set consists of the set of physical configurations of the computer in which the memory cells that store the current optimized state contain the true location of the global optimum (or the closest floating point representation of it).

Basin of attraction. The basin of attraction consists of the set of physical configurations in which there is a viable computer and it is running the gradient descent algorithm.

Example: billiard balls

Let us now examine a system that is not an optimizing system according to our definition. Consider a billiard table with some billiard balls that are currently bouncing around in motion. Left alone, the balls will eventually come to rest in some configuration. Is this an optimizing system?

In order to qualify as an optimizing system, a system must (1) have a tendency to evolve towards a set of target configurations that are small relative to the basin of attraction, and (2) continue to evolve towards the same set of target configurations if perturbed.

If we reach in while the billiard balls are bouncing around and move one of the balls that is in motion, the system will now come to rest in a different configuration. Therefore this is not an optimizing system, because there is no set of target configurations towards which the system evolves despite perturbations. A system does not need to be robust along all dimensions in order to be an optimizing system, but a billiard table exhibits no such robust dimensions at all, so it is not an optimizing system.

Example: satellite in orbit

Consider a second example of a system that is not an optimizing system: a satellite in orbit around Earth. Unlike the billiard balls, there is no chaotic tendency for small perturbations to lead to large deviations in the system’s evolution, but neither is there any tendency for the system to come back to some target configuration when perturbed. If we perturb the satellite’s velocity or position, then from that point on it is in a different orbit and has no tendency to return to its previous orbit. There is no set of target configurations towards which the system evolves despite perturbations, so this is not an optimizing system.

Example: a tree

Consider a patch of fertile ground with a tree growing in it. Is this an optimizing system?

Configuration space. For the sake of concreteness let us take a region of space that is sealed off from the outside world — say 100m x 100m x 100m. This region is filled at the bottom with fertile soil and at the top with an atmosphere conducive to the tree’s growth. Let us say that the region contains a single tree.

We will analyze this system in terms of the arrangement of atoms inside this region of space. Out of all the possible configurations of these atoms, the vast majority consist of a uniform hazy gas. An astronomically tiny fraction of configurations contain a non-trivial mass of complex biological nutrients making up soil. An even tinier fraction of configurations contain a viable tree.

Target-configuration set. A tree has a tendency to grow taller over time, to sprout more branches and leaves, and so on. Furthermore, trees can only grow so tall due to the physics of transporting sugars up and down the trunk. So we can identify a set of target configurations in which the atoms in our region of space are arranged into a tree that has grown to its maximum size (has sprouted as many branches and leaves as it can support given the atmosphere, the soil that it is growing in, and the constraints of its own biology). There are many topologies in which the tree’s branches could divide, many positions that leaves could sprout in, and so on, so there are many configurations within the target configuration set. But this set is still tiny compared to all the ways that the same atoms could be arranged without the constraint of forming a viable tree.

Basin of convergence. This system will evolve towards the target configuration set starting from any configuration in which there is a viable tree. This includes configurations in which there is just a seed in the ground, as well as configurations in which there is a tree of small, medium, or large size. Starting from any of these configurations, if we leave the system to evolve under the natural laws of physics then the tree will grow towards its maximum size, at which point the system will be in one of the target configurations.

Robustness to perturbations. This system is highly robust to perturbations. Consider perturbing the system in any of the following ways:

  • Moving soil from one place to another

  • Removing some leaves from the tree

  • Cutting a branch off the tree

These perturbations might change which particular target configuration is eventually reached — the particular arrangement of branches and leaves in the tree once it reaches its maximum size — but they will not stop the tree from growing taller and evolving towards a target configuration. In fact we could cut the tree right at the base of the trunk and it would continue to evolve towards a target configuration by sprouting a new trunk and growing a whole new tree.

Duality. A tree is a non-dualistic optimizing system. There is no subsystem that is responsible for "doing" the optimization, separately from that which is "being" optimized. Yet the tree does exhibit a tendency to evolve towards a set of target configurations, and can overcome a wide variety of perturbations in order to do so. There are no man-made systems in existence today that are capable of gathering and utilizing resources so flexibly as a tree, from so broad a variety of environments, and there are certainly no man-made systems that can recover from being physically dismembered to such an extent that a tree can recover from being cut at the trunk.

At this point it may be tempting to say that the engine of optimization is natural selection. But recall that we are studying just a single tree growing from seed to maximum size. Can you identify a physical subset of our 100m x 100m x 100m region of space that is this engine of optimization, analogous to how we identified a physical subset of the robot-and-ball system as the engine of optimization (i.e. the physical robot)? Natural selection might be the process by which the initial system came into existence, but it is not the process that drives the growth of the tree towards a target configuration.

It may then be tempting to say that it is the tree’s DNA that is the engine of optimization. It is true that the tree’s DNA exhibits some characteristics of an engine of optimization: it remains unchanged throughout the life of the tree, and physically microscopic perturbations to it can disable the tree. But a tree replicates its DNA in each of its cells, and perturbing just one or a small number of these is not likely to affect the tree’s overall growth trajectory. More importantly, a single strand of DNA does not really have agency on its own: it requires the molecular machinery of the whole cell to synthesize proteins based on the genetic code in the DNA, and the physical machinery of the whole tree to collect and deploy energy, water, and nutrients. Just as it would be incorrect to identify the memory registers containing computer code within a robot as the "true" engine of optimization separate from the rest of the computing and physical machinery that brings this code to life, it is not quite accurate to identify DNA as an engine of optimization. A tree simply does not decompose into engine and object of optimization.

It may also be tempting to ask whether the tree can "really" be said to be undergoing optimization in the absence of any "intention" to reach one of the target configurations. But this expectation of a centralized mind with centralized intentions is really an artifact of us projecting our view of our self onto the world: we believe that we have a centralized mind with centralized intentions, so we focus our attention on optimizing systems with a similar structure. But this turns out to be misguided on two counts: first, the vast majority of optimizing systems do not contain centralized minds, and second, our own minds are actually far less centralized than we think! For now we put this question of whether optimization requires intentions and instead just work within our definition of optimizing systems, which a tree definitely satisfies.

Example: bottle cap

Daniel Filan has pointed out that some definitions of optimization would nonsensically classify a bottle cap as an optimizer, since a bottle cap causes water molecules in a bottle to stay inside the bottle, and the set of configurations in which the molecules are inside a bottle is much smaller than the set of configurations in which the molecules are each allowed to take a position either inside or outside the bottle.

In our framework we have the following:

  • The system consists of a bottle, a bottle cap, and water molecules. The configuration space consists of all the possible spatial arrangements of water molecules, either inside or outside the bottle.

  • The basin of attraction is the set of configurations in which the water molecules are inside the bottle

  • The target configuration set is the same as the basin of attraction

This is not an optimizing system for two reasons.

First, the target configuration set is no smaller than the basin of attraction. To be an optimizing system there must be a tendency to evolve from any configuration within a basin of attraction towards a smaller target configuration set, but in this case the system merely remains within the set of configurations in which the water molecules are inside the bottle. This is no different from a rock sitting on a beach: due to basic chemistry there is a tendency to remain within the set of configurations in which the molecules comprising the rock are physically bound to one another, but it has no tendency to evolve from a wide basin of attraction towards a small set of target configuration.

Second, the bottle cap system is not robust to perturbations since if we perturb the position of a single water molecule so that it is outside the bottle, there is no tendency for it to move back inside the bottle. This is really just the first point above restated, since if there were a tendency for water molecules moved outside the bottle to evolve back towards a configuration in which all the water molecules were inside the bottle, then we would have a basin of attraction larger than the target configuration set.

Example: the human liver

Filan also asks whether one’s liver should be considered an optimizer. Suppose we observe a human working to make money. If this person were deprived of a liver, or if their liver stopped functioning, they would presumably be unable to make money. So are we then to view the liver as an optimizer working towards the goal of making money? Filan asks this question as a challenge to Yudkowsky’s definition of optimization, since it seems absurd to view one’s liver as an optimizer working towards the goal of making money, yet Yudkowsky’s definition of optimization might classify it as such.

In our framework we have the following:

  • The system consists of a human working to make money, together with the whole human economy and world.

  • The basin of attraction consists of the configurations in which there is a healthy human (with a healthy liver) having the goal of making money

  • The target configurations are those in which this person’s bank balance is high. (Interestingly there is no upper bound here, so there is no fixed point but rather a continuous gradient.)

We can expect that this person is capable of overcoming a reasonably broad variety of obstacles in pursuit of making money, so we recognize that this overall system (the human together with the whole economy) is an optimizing system. But Filan would surely agree on this point and his question is more specific: he is asking whether the liver is an optimizer.

In general we cannot expect to decompose optimizing systems into an engine of optimization and object of optimization. We can see that the system has the characteristics of an optimizing system, and we may identify parts, including in this case the person’s liver, that are necessary for these characteristics to exist, but we cannot in general identify any crisp subset of the system as that which is doing the optimization. And picking various subcomponents of the system (such as the person’s liver) and asking "is this the part that is doing the optimization?" does not in general have an answer.

By analogy, suppose we looked at a planet orbiting a star and asked: "which part here is doing the orbiting?" Is it the planet or the star that is the "engine of orbiting"? Or suppose we looked at a car and noticed that the fuel pump is a complex piece of machinery without which the car’s locomotion would cease. We might ask: is this fuel pump the true "engine of locomotion"? These questions don’t have answers because they mistakenly presuppose that we can identify a subsystem that is uniquely responsible for the orbiting of the planet or the locomotion of the car. Asking whether a human liver is an "optimizer" is similarly mistaken: we can see that the liver is a complex piece of machinery that is necessary in order for the overall system to exhibit the characteristics of an optimizing system (robust evolution towards a target configuration set), but beyond this it makes no more sense to ask whether the liver is a true "locus of optimization".

So rather than answering Filan’s question in either the positive or the negative, the appropriate move is to dissolve the concept of an optimizer, and instead ask whether the overall system is an optimizing system.

Example: the universe as a whole

Consider the whole physical universe as a single closed system. Is this an optimizing system?

The second law of thermodynamics tells us that the universe is evolving towards a maximally disordered thermodynamic equilibrium in which it cycles through various maxentropy configuration. We might then imagine that the universe is an optimizing system in which the basin of attraction is all possible configurations of matter and energy, and the target configuration set consists of the maxentropy configurations.

However, this is not quite accurate. Out of all possible configurations of the universe, the vast majority of configurations are at or close to maximum entropy. That is, if we sample a configuration of the universe at random, we have only an astronomically tiny chance of finding anything other than a close-to-uniform gas of basic particles. If we define the basin of attraction as all possible configurations of matter in the universe and the target configuration set as the set of maxentropy configurations, then the target configuration set actually contains almost the entirety of the basin of attraction, with the only configurations that are in the basin of attraction but not the target configuration set being the highly unusual configurations of matter containing stars, galaxies, and so on.

For this reason the universe as a whole does not qualify as an optimizing system under our definition. (Or perhaps it would be more accurate to say that it qualifies as an extremely weak optimizing system.)

Power sources and entropy

The second law of thermodynamics tells us that any closed system will eventually tend towards a maximally disordered state in which matter and energy is spread approximately uniformly through space. So if we were to isolate one of the systems explore above inside a sealed chamber and leave it for a very long period then eventually whatever power source we put inside the sealed chamber would become depleted, and then eventually after that every complex material or compound in the system would degrade into its base products, and then finally we would be left with a chamber filled with a uniform gaseous mixture of whatever base elements we originally put in.

So in this sense there are no optimizing systems at all, since any of the systems above evolve towards their target configuration sets only for a finite period of time, after which they degrade and evolve towards a maxentropy configuration.

This is not a very serious challenge to our definition of optimization since it is common throughout physics and computer science to study various "steady-state" or "fixed point" systems even though the same objection could be made about any of them. We say that a thermometer can be used to build a heat regulator that will keep the temperature of a house within a desired range, and we do not usually need to add the caveat that eventually the house and regulator will degrade into a uniform gaseous mixture due to the heat death of the universe.

Nevertheless, two possible ways to refine our definition are:

  1. We could stipulate that some power source is provided externally to each system we analyze, and then perform our analysis conditional on the existence of that power source.

  2. We could specify a finite time horizon and say that "a system is an optimizing system if it tends towards a target configuration set up to time T".

Connection to dynamical systems theory

The concept of "optimizing system" in this essay is very close to that of a dynamical system with one or more attractors. We offer the following remarks on this connection.

  • A general dynamical system is any system with a state that evolves over time as a function of the state itself. This encompasses a very broad range of systems indeed!

  • In dynamical system theory, an attractor is the term used for what we have called the target configuration set. A fixed point attractor is, in our language, a target configuration set with just one element, such as when computing the square root of two. A limit cycle is, in our language, a system that eventually stably loops through a sequence of states all of which are in the target configuration set, such as a satellite in orbit.

  • We have discussed systems that evolve towards target configurations along some dimensions but not others (e.g. ball in a valley). We have not yet discovered whether dynamical systems theory explicitly studies attractors that operate along a subset of the system’s dimensions.

  • There is a concept of "well-posedness" in dynamical systems theory that justifies the identification of a mathematical model with a physical system. The conditions for a model to be well-posed are (1) that a solution exists (i.e. the model is not self-contradictory), (2) that there is a unique solution (i.e. the model contains enough information to pick out a single system trajectory), and (3) that the solution changes continuously with the initial conditions (the behavior of the system is not too chaotic). This third condition may present an interesting avenue for future investigation as it seems related to but not quite equivalent to our notion of robustness since robustness as we define it additionally requires that the system continue to evolve towards the same attractor state despite perturbations. Exploring this connection may present an interesting avenue for future investigation.


We have proposed a concept that we call "optimizing systems" to describe systems that have a tendency to evolve towards a narrow target configuration set when started from any point within a broader basin of attraction, and continue to do so despite perturbations.

We have analyzed optimizing systems along three dimensions:

  • Robustness, which measures the number of dimensions along which the system is robust to perturbations, and the magnitude of perturbation along these dimensions that the system can withstand.

  • Duality, which measures the extent to which an approximate "engine of optimization" subsystem can be identified.

  • Retargetability, which measures the extent to which the system can be transformed via microscopic perturbations into an equally robust optimizing system but with a different target configuration set.

We have argued that the "optimizer" concept rests on an assumption that optimizing systems can be decomposed into engine and object of optimization (or agent and environment, or mind and world). We have described systems that do exhibit optimization yet cannot be decomposed this way, such as the tree example. We have also pointed out that, even among those systems that can be decomposed approximately into engine and object of optimization (for example, a robot moving a ball around), we will not in general be able to meaningfully answer the question of whether arbitrary subcomponents of the agent are an optimizer not (c.f. the human liver example).

Therefore, while the "optimizer" concept clearly still has much utility in designing intelligent systems, we should be cautious about taking it as a primitive in our understanding of the world. In particular we should not expect questions of the form "is X an optimizer?" to always have answers.

New Comment
80 comments, sorted by Click to highlight new comments since:
Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings
[-]Rohin ShahΩ28590

This is excellent, it feels way better as a definition of optimization than past attempts :) Thanks in particular for the academic style, specifically relating it to previous work, it made it much more accessible for me.

Let's try to build up some core AI alignment arguments with this definition.

Task: A task is simply an “environment” along with a target configuration set. Whenever I talk about a “task” below, assume that I mean an “interesting” task, i.e. something like “build a chair”, as opposed to “have the air molecules be in one of these particular configurations”.

Solving a task: An object O solves a task T if adding O to T’s environment transforms it into an optimizing system for the T’s target configuration set.

Performance on the task: If O solves task T, its performance is quantified by how quickly it reaches the target configuration set, and how robust it is to perturbations.

Generality of intelligence: The generality of O’s intelligence is a function of the number and diversity of tasks T that it can solve, as well as its performance on those tasks.

Optimizing AI: A computer prog... (read more)

8Alex Flint
Thanks for the very thoughtful comment Rohin. I was on retreat last week after I published the article and upon returning to computer usage I was delighted by the engagement from you and others. I like this. We'll presumably need to give O some information about the goal / target configuration set for each task. We could say that a robot capable of moving a vase around is a little bit general since we can have it solve the tasks of placing the vase at many different locations by inputting some latitude/longitude into some appropriate memory location. But this means we're actually pasting in a different object O for each task T -- each of the objects differs in those memory locations into which we're pasting the latitude/longitude. It might be helpful to think of a "agent schema" function that maps goals to objects, so we take the goal part of the task, compute the object O for that goal, then paste this object into the environment. It's also important that O be able to solve the task for a reasonably broad range of environments. Perhaps we could look at it this way: take a system containing a human that is trying to get something done. This is presumably an optimizing system as humans often robustly move their environment towards some desired target configuration set. Then an inner-aligned AI is an object O such that adding it to this environment does not change the target configuration set, but does change the speed and/or robustness of convergence to that target configuration set. Yup very difficult to say much about intentions using the pure outside view approach of this framework. Perhaps we could say that an intent-aligned AI is an inner-aligned AI modulo less robustness. Or perhaps we could say that an intent-aligned AI is an AI that would achieve the goal in a large set of benign environments, but might not achieve it in the presence of unlikely mistakes, unlikely environmental conditions, or the presence of other powerful basins of attraction. But this
4Rohin Shah
+1 to all of this. I was imagining that the tasks can come equipped with some specification, but some sort of counterfactual also makes sense. This also gets around issues of the AI system not being appropriately "motivated" -- e.g. I might be capable of performing the task "lock up puppies in cages", but I wouldn't do it, and so if you only look at my behavior you couldn't say that I was capable of doing that task. +1 especially to this
Mild optimization: the easiest way to solve hard tasks may be to specify a proxy, which an AI maximizes. The AI steers into configurations which maximize the proxy function. Simple proxies don't usually have target sets which we like, because human value is complex. However, maybe we just want the AI to randomly select a configuration which satisfies the proxy, instead of finding the maximally-proxy-ness configuration, which may be bad due to extremal Goodhart.  Quantilization tries to solve this by randomly selecting a target configuration from some top quantile, but this is sensitive to how world states are individuated. 
8Rohin Shah
This makes sense, but I think you'd need a different notion of optimizing systems than the one used in this post. (In particular, instead of a target configuration set, you want a continuous notion of goodness, like a utility function / reward function.)
I'm saying the target set for non-mild optimization is the set of configurations which maximize proxy-ness. Just take the argmax. By contrast, we might want to sample uniformly randomly from the set of satisficing configurations, which is much larger.  (This is assuming a fixed initial state)
4Rohin Shah
It sounds like you're assuming that the target configuration set is built into the AI system. According to me, a major point of this post / framework is to avoid that assumption altogether, and only describe problems in terms of the actual observed system behavior. (This is why within this framework I couldn't formalize outer alignment, and why wireheading and the search / mesa-objective split is unnatural.)
I see the tension you're pointing at. I think I had in mind something like "an AI is reliably optimizing utility function u over the configuration space (but not necessarily over universe-histories!) if it reliably moves into high-rated configurations", and you could draw different epsilon-neighborhoods of optimality in configuration space. It seems like you should be able to talk about dog-maximizers without requiring that the agent robustly end up in the maximum-dog configurations (and not in max-minus-one-dog configs).  I'm still confused about parts of this.
I don't quite understand what this is saying. Suppose we train a giant deep learning model via self-supervised learning on a ton of real-world data (like GPT-N, but w/ other sensory modalities besides text), and then we build a second system designed to provide a nice interface to the giant model. We'd give task specifications to the interface, and it would have some smarts about how to consult the model to figure out what to do. (The interface might also be learned, via reinforcement or supervised learning, or it might be hand-coded.) It seems plausible to me that a system comprising these two pieces, the model and the interface, could be an AGI according to the definition here, in that when combined with a very wide variety of environments (including the task specification in the environment), it could perform at least as well as a human. And since most of the smarts seem like they'd be in the model rather than the interface, I'd count it as getting AGI "primarily via deep learning", even if the interface was hand-coded. But it's not clear to me whether that would count as using deep learning to "create a new optimizing AI system", which is itself the AGI. The whole system is an Optimizing AI, according to the definition given above, but neither of the two parts is by itself, and it doesn't seem to have the flavor of mesa-optimization, as I understand it. So it seems like a contradiction to the quoted claim. Have I misunderstood what you're saying here, or do you disagree with the characterization I gave of the hypothetical model + interface system? (Or have I perhaps misunderstood mesa-optimization?)
5Rohin Shah
Yeah, I'm talking about the whole system. Yeah, I agree it doesn't fit the explanation / definition in Risks from Learned Optimization. I don't like that definition, and usually mean something like "running the model parameters instantiates a computation that does 'reasoning'", which I think does fit this example. I mentioned this a bit later in the comment:
Thanks for the additions here. I'm also unsure how to gel this definition (which I quite like) with the inner/outer/mesa terminology. Here is my knuckle dragging model of the post's implication: target_set = f(env, agent) So if we plug in a bunch of values for agent and hope for the best, the target_set we get might might not be what we desired. This would be misalignment. Whereas the alignment task is more like to fix target_set and env and solve for agent. The stuff about mesa optimisers mainly sounds like inadequate (narrow) modelling of what env, agent and target_set are. Usually fixating on some fraction of the problem (win the battle, lose the war problem).

I shared this essay with a colleague where I work (Johns Hopkins University Applied Physics Lab). Here are her comments, which she asked me to share:

This essay proposes a very interesting definition of optimization as the manifestation of a particular behavior of a closed, physical system. I haven’t finished thinking this over, but I suspect it will be (as is suggested in the essay) a useful construct. The reasoning leading to the definition is clearly laid out (thank you!), with examples that are very useful in understanding the concept. The downside of being clearly laid out, however, is that it makes critique easier. I have a few thoughts about the reasoning in the essay.

The first thing I will note is that the essay gives three definitions for an optimizing system. These definitions are close, but not exactly equivalent. The nuances can be important. For example, that the target configuration set and the basin of attraction cannot be equal is obvious; that is made explicit in definition 3, but only implied in definitions 1 and 2. A bigger issue is that there are no criteria or rationale for their extent and relative size.

For example, the essay offers two reasons why the ... (read more)

4Aryeh Englander
This was actually part of a conversation I was having with this colleague regarding whether or not evolution can be viewed as an optimization process. Here are some follow-up comments to what she wrote above related to the evolution angle: We could define the natural selection system as: All configurations = all arrangements of matter on a planet (both arrangements that are living and those that are non-living) Basis of attraction = all arrangements of matter on a planet that meet the definition of a living thing Target configuration set = all arrangements of living things where the type and number of living things remains approximately stable. I think that this system meets the definition of an optimizing system given in the Ground for Optimization essay. For example, predator and prey co-evolve to be about “equal” in survival ability. If a predator become so much better than its prey that it eats them all, the predator will die out along with its prey; the remaining animals will be in balance. I think this works for climate perturbations, etc. too. HOWEVER, it should be clear that there are numerous ways in which this can happen – like the ball on bumpy surface with a lot of convex “valleys” (local minima), there is not just one way that living things can be in balance. So, to say that “natural selection optimized for intelligence” is quite not right – it just fell into a “valley” where intelligence happened. FURTHER, it’s not clear that we have reached the local minimum! Humans may be that predator that is going to fall “prey” to its own success. If that happened (and any intelligent animals remain at all), I guess we could say that natural selection optimized for less-than-human intelligence! Further, this definition of optimization has no connotation of “best” or even better – just equal to a defined set. The word “optimize” is loaded. And its use in connection with natural selection has led to a lot of trouble in terms of human races, and humans v. anim

In this post, the author proposes a semiformal definition of the concept of "optimization". This is potentially valuable since "optimization" is a word often used in discussions about AI risk, and much confusion can follow from sloppy use of the term or from different people understanding it differently. While the definition given here is a useful perspective, I have some reservations about the claims made about its relevance and applications.

The key paragraph, which summarizes the definition itself, is the following:

An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.

In fact, "continues to exhibit this tendency with respect to the same target configuration set despite perturbations" is redundant: clearly as long as the perturbation doesn't push the system out of the basin, the tendency must continue.

This is what is known as "attractor" in dynamical systems the... (read more)

My biggest objection to this definition is that it inherently requires time. At a bare minimum, there needs to be an "initial state" and a "final state" within the same state space, so we can talk about the system going from outside the target set to inside the target set.

One class of cases which definitely seem like optimization but do not satisfy this property at all: one-shot non-iterative optimization. For instance, I could write a convex function optimizer which works by symbolically differentiating the objective function and then algebraically solving for a point at which the gradient is zero.

Is there an argument that I should not consider this to be an optimizer?

My biggest objection to this definition is that it inherently requires time

Fascinating - but why is this an objection? Is it just the inelegance of not being able to look at a single time slice and answer the question of whether optimization is happening?

One class of cases which definitely seem like optimization but do not satisfy this property at all: one-shot non-iterative optimization.

Yes this is a fascinating case! I'd like to write a whole post about it. Here are my thoughts:

  • First, just as a fun fact, not that it's actually extremely rare to see any non-iterative optimization in practical usage. When we solve linear equations, we could use gaussian elimination but it's so unstable that in practice we use, most likely, the SVD, which is iterative. When we solve a system of polynomial equation we could use something like a Grobner basis or the resultant, but it's so unstable that in practice we something like a companion matrix method, which comes down to an eigenvalue decomposition, which is again iterative.
  • Consider finding the roots of a simple quadratic equation (ie solving a cubic optimization problem). We can use the quadratic equation to do this. But ultimately t
... (read more)
Another big thing to note in examples like e.g. iteratively computing a square root for the quadratic formula or iteratively computing eigenvalues to solve a matrix: the optimization problems we're solving are subproblems, not the original full problem. These crucially differ from most of the examples in the OP in that the system's objective function (in your sense) does not match the objective function (in the usual intuitive sense). They're iteratively optimizing a subproblem's objective, not the "full" problem's objective. That's potentially an issue for thinking about e.g. AI as an optimizer: if it's using iterative optimization on subproblems, but using those results to perform some higher-level optimization in a non-iterative manner, then aligning the sobproblem-optimizers may not be synonymous with aligning the full AI. Indeed, I think a lot of reasoning works very much like this: we decompose a high-dimensional problem into coupled low-dimensional subproblems (i.e. "gears"), then apply iterative optimizers to the subproblems. That's exactly how eigenvalue algorithms work, for instance: we decompose the full problem into a series of optimization subproblems in narrower and narrower subspaces, while the "high-level" part of the algorithm (i.e. outside the subproblems) doesn't look like iterative optimization.
No, the issue is that the usual definition of an optimization problem (e.g. maxx f(x)) has no built-in notion of time, and the intuitive notion of optimization (e.g. "the system makes Y big") has no built-in notion of time (or at least linear time). It's this really fundamental thing that isn't present in the "original problem", so to speak; it would be very surprising and interesting if time had to be involved when it's not present from the start.  If I specifically try to brainstorm things-which-look-like-optimization-but-don't-involve-objective-improvement-over-time, then it's not hard to come up with examples: * Rather than a function-value "improving" along linear time, I could think about a function value improving along some tree or DAG - e.g. in a heap data structure, we have a tree where the "function value" always "improves" as we move from any leaf toward the root. There, any path from a leaf to the root could be considered "time" (but the whole set of nodes at the "same level" can't be considered a time-slice, because we don't have a meaningful way to compare whole sets of values; we could invent one, but it wouldn't actually reflect the tree structure). * The example from the earlier comment: a one-shot non-iterative optimizer * A distributed optimizer: the system fans out, tests a whole bunch of possible choices in parallel, then selects the best of those. * Various flavors of constraint propagation, e.g. the simplex algorithm (and markets more generally)
I think this is a very important distinction. I prefer to use "maximizer" for "timelessly" finding the highest value of an objective function, and reserve "optimizer" for the kind of stepwise improvement discussed in this post. As I use the terms, to maximize something is to find the state with the highest value, but to optimize it is to take an initial state and find a new state with a higher value. I recognize that "optimize" and "optimizer" are sometimes used the way you're saying, as basically synonymous with "maximize" / "maximizer", and I could retreat to calling the inherently temporal thing I'm talking about an "improver" (or an "improvement process" if I don't want to reify it), but this actually seems less likely to be quickly understood, and I don't think it's all that useful for "optimize" and "maximize" to mean exactly the same thing. (There is a subset of optimizers as I (and this post, although I think the value should be graded rather than binary) use the term that in the limit reach the maximum, and a subset of those that even reach the maximum in a finite number of steps, but optimizers that e.g. get stuck in local maxima aren't IMO thereby not actually optimizers, even though they aren't maximizers in any useful sense.)
I think this is covered in my view of optimization via selection, where "direct solution" is the third option. Any one-shot optimizer is implicitly relying on an internal model completely for decision making, rather than iterating, as I explain there. I think that is compatible with the model here, but it needs to be extended slightly to cover what I was trying to say there.
This model is explicitly requiring that you deal only with physical processes, so your convex function solver would require time to get from the starting state to the end state. If it is happening non-iteratively then it would cease to be an optimizing system after it has completed the function, since there is no longer a target configuration.
I'm not sure what you're trying to say here. What's the state space (in which both the start and end state of the optimizer live), what's the basin of attraction (i.e. set of allowed initial conditions), and what's the target region within the state space? And remember, the target region needs to be a subset of the allowed initial conditions.
This end state state is the solution to the convex function being stored in some physical registers. The initial state is those registers containing arbitrary data to be overwritten. It's not particularly interesting as optimization problems go (not a very large basin of attraction) but it fulfills the basic criteria. The unique thing about your example is that it solves once and then it is done (relative to the examples in the post), so it ceases to be an optimizing system once it finishes computing the solution to your convex function. With a slight modification, you could be repeating this algorithm in a loop so it constantly recalculates a new function. Now the initial state can be some value in the result and input registers, and the target region is the set of input equations and appropriate solution in the output registers. It widens the basin of attraction to both the input and output registers rather than just the output.
Ok, two problems with this: * There's no reason why that target set would be smaller than the basin of attraction. Given one such optimization problem, there are no obvious perturbations we could make which would leave the result in the target region. * The target region is not a subset of the basin of attraction. The system doesn't evolve from a larger region to a smaller subset (as in the Venn-diagram visuals in the OP), it just evolves from one set to another. The first problem explicitly violates the OP's definition of an optimizer, and the second problem violates one of the unspoken assumptions present in all of the OP's examples.
I don't believe that either of these points are true. In your original example, there is one correct solution for any convex function. I will assume there is a single hard-coded function for the following, but it can be extended to work for an arbitrary function. The output register having the correct solution is the target set. The output register having any state is the basin of attraction. Clearly any specific number (or rather singleton of that number) is a subset of all numbers, so the target is a subset of the basin. And further, because "all numbers" has more than one element, the target set is smaller than the basin.
This argument applies to literally any deterministic program with nonempty output. Are you saying that every program is an optimizer?
Pretty much, yes, according to definition given. Like I said, not a particularly interesting optimization but an optimization none the less. To extend on this, the basin of optimization is not any smaller than an iterative process acting on a single register (and if you loop the program, then the time horizon is the same). In both cases your basin is anything in that register and the target state is one particular number in that register. As far as I can tell the definition doesn't have any way of saying that one is "more of an optimizer" than the other. If anything, the fixed output is more optimized because it arrives more quickly.
Ok, well, it seems like the one-shot non-iterative optimizer is an optimizer in a MUCH stronger sense than a random program, and I'd still expect a definition of optimization to say something about the sense in which that holds.

This seems great, I'll read and comment more thoroughly later. Two quick comments:

It didn't seem like you defined what it meant to evolve towards the target configuration set. So it seems like either you need to commit to the system actually reaching one of the target configurations to call it an optimiser, or you need some sort of metric over the configuration space to tell whether it's getting closer to or further away from the target configuration set. But if you're ranking all configurations anyway, then I'm not sure it adds anything to draw a binary distinction between target configurations and all the others. In other words, can't you keep the definition in terms of a utility function, but just add perturbations?

Also, you don't cite Dennett here, but his definition has some important similarities. In particular, he defines several different types of perturbation (such as random perturbations, adversarial perturbations, etc) and says that a system is more agentic when it can withstand more types of perturbations. Can't remember exactly where this is from - perhaps The Intentional Stance?

8Rohin Shah
+1 for swapping out the target configuration set with a utility function, and looking for a robust tendency for the utility function to increase. This would also let you express mild optimization (see this thread).
Would this work for highly non-monotonic utility functions? 
It would work at least as well as the original proposal, because your utility function could just be whatever metric of "getting closer to the target states" would be used in the original proposal.
[-]Ben PaceΩ10170

Curated. Come on dude, stop writing so many awesome posts so quickly, it's too much.

This is a central question in the science of agency and optimization. The proposal is simple, you connected it to other ideas from Drexler and Demski+Garrabrant, and you gave a ton of examples of how to apply the idea. I generally get scared by the academic style, worried that the authors will fill out the text and make it really hard to read, but this was all highly readable, and set its own context (re-explaining the basic ideas at the start). I'm looking forward to you discussing it in the comments with Ricraz, Rohin and John.

Please keep writing these posts!

7Alex Flint
Thank you Ben. Reading this really filled me with joy and gives me energy to write more. Thank you for your curation work - it's a huge part of why there is this place for such high quality discussion of topics like this, for which I'm very grateful.
3Ben Pace
You’re welcome :-)
Seconded that the academic style really helped, particularly discussing the problem and prior work early on. One classic introduction paragraph that I was missing is "what have prior works left unaddressed?".

Thanks for the post, this is my favourite formalisation of optimisation so far!

One concern I haven't seen raised so far, is that the definition seems very sensitive to the choice of configuration space. As an extreme example, for any given system, I can always augment the configuration space with an arbitrary number of dummy dimensions, and choose the dynamics such that these dummy dimensions always get set to all zero after each time step. Now, I can make the basin of attraction arbitrarily large, while the target configuration set remains a fixed size. This can then make any such dynamical system seem to be an arbitrarily powerful optimiser.

This could perhaps be solved by demanding the configuration space be selected according to Occam's razor, but I think the outcome still ends up being prior dependent. It'd be nice for two observers who model optimising systems in a systematically different way to always agree within some constant factor, akin to Kolmogorov complexity's invariance theorem, although this may well be impossible.

As a less facetious example, consider a computer program that repeatedly sets a variable to 0. It seems again we can make the optimisi... (read more)

Two examples which I'd be interested in your comments on:

1. Consider adding a big black hole in the middle of a galaxy. Does this turn the galaxy into a system optimising for a really big black hole in the middle of the galaxy? (Credit for the example goes to Ramana Kumar).

2. Imagine that I have the goal of travelling as fast as possible. However, there is no set of states which you can point to as the "target states", since whatever state I'm in, I'll try to go even faster. This is another argument for, as I argue below, defining an optimising system in terms of increasing some utility function (rather than moving towards target states).

3Ben Pace
On the topic of the black hole... There’s a way of viewing the world as a series of ”forces”, each trying to control the future. Eukaryotic life is one. Black holes are another. We build many things, humans, from chairs to planes to AIs. Of those three, turning on the AI feels the most like “a new force has entered the game”.  All these forces are fighting over the future, and while it’s odd to think of a black hole as an agent, sometimes when I look at it it does feel natural to think of physics as another optimisation force that’s playing the game with us.
2Alex Flint
Great examples! Thank you. Yes this would qualify as an optimizing system by my definition. In fact just placing a large planet close to a bunch of smaller planets would qualify as an optimizing system if the eventual result is to collapse the mass of the smaller planets into the larger planet. This seems to me to be a lot like a ball rolling down a hill: a black hole doesn't seem alive or agentic, and it doesn't really respond in any meaningful way to hurdles put in its way, but yes it does qualify as an optimizing system. For this reason my definition isn't yet a very good definition of what agency is, or what post-agency concept we should adopt. I like Rohin's comment on how we might view agency in this framework. Yes it's true that using a set of target states rather than an ordering over states means that we can't handle cases where there is a direction of optimization but not a "destination". But if we use an ordering over states then we run into the following problem: how can we say whether a system is robust to perturbations? Is it just that the system continues to climb the preference gradient despite perturbations? But now every system is an optimizing system, because we can always come up with some preference ordering that explains a system as an optimizing system. So then we can say "well it should be an ordering over states with a compact representation" or "it should be more compact than competing explanations". This may be okay but it seems quite dicey to me. It actually seems quite important to me that the definition point to systems that "get back on track" even when you push them around. It may be possible to do this with an ordering over states and I'd love to discuss this more.
Hmmm, I'm a little uncertain about whether this is the case. E.g. suppose you have a box with a rock in it, in an otherwise empty universe. Nothing happens. You perturb the system by moving the rock outside the box. Nothing else happens in response. How would you describe this as an optimising system? (I'm assuming that we're ruling out the trivial case of a constant utility function; if not, we should analogously include the trivial case of all states being target states). As a more general comment: I suspect that what starts to happen after you start digging into what "perturbation" means, and what counts as a small or big perturbation, is that you run into the problem that a *tiny* perturbation can transform a highly optimising system to a non-optimising system (e.g. flicking the switch to turn off the AGI). In order to quantify size of perturbations in an interesting way, you need the pre-existing concept of which subsystems are doing the optimisation. My preferred solution to this is just to stop trying to define optimisation in terms of *outcomes*, and start defining it in terms of *computation* done by systems. E.g. a first attempt might be: an agent is an optimiser if it does planning via abstraction towards some goal. Then we can zoom in on what all these words mean, or what else we might need to include/exclude (in this case, we've ruled out evolution, so we probably need to broaden it). The broad philosophy here is that it's better to be vaguely right than precisely wrong. Unfortunately I haven't written much about this approach publicly - I briefly defend it in a comment thread on this post though.
FYI: I think something got messed up with this link. The text of the link is a valid url, but it links to a mangled one (s.t. if you click it you get a 404 error).
That's weird; thanks for the catch. Fixed.
2Alex Flint
Yes you're right, this system would be described by a constant utility function, and yes this is analogous to the case where the target configuration set contains all configurations, and yes this should not be considered optimization. In the target set formulation, we can measure the degree of optimization by the size of the target set relative to the size of the basin of attraction. In your rock example, the sets have the same size, so it would make sense to say that the degree of optimization is zero. This discussion is updating me in the direction that a preference ordering formulation is possible, but that we need some analogy for "degree of optimization" that captures how "tight" or "constrained" the system's evolution is relative to the size of the basin of attraction. We need a way to say that a constant utility function corresponds to a degree of optimization equal to zero. We also need a way to handle the case where our utility function assigns utility proportional to entropy, so again we can describe all physical systems as optimizing systems and thermodynamics ensures that we are correct. This utility function would be extremely flat and wide, with most configurations receiving near-identical utility (since the high entropy configurations constitute the vast majority of all possible configurations). I'm sure there is some way to quantify this - do you know of any appropriate measure? The challenge here is that in order to actually deal with the case you mentioned originally -- the goal of moving as fast as possible -- we need a measure that is not based on the size or curvature of some local maxima of the utility function. If we are working with local maxima then we are really still working with systems that evolve towards a specific destination (although there still may be advantages to thinking this way rather than in terms of a binary set). Nice - I'd love to hear more about this
Doesn't the set-of-target-states version have just the same issue (or an analogous one)? For whatever behavior the system exhibits, I can always say that the states it ends up in were part of its set of target states. So you have to count on compactness (or naturalness of description, which is basically the same thing) of the set of target states for this concept of an optimizing system to be meaningful. No?
2Alex Flint
Well most system don't have a tendency to evolve towards any small set of target states despite perturbations. Most systems, if you perturb then, just go off in some different direction. For example, if you perturb most running computer programs by modifying some variable with a debugger, they do not self-correct. Same with the satellite and billiard balls example. Most systems just don't have this "attractor" dynamic.
Hmm, I see what you're saying, but there still seems to be an analogy to me here with arbitrary utility functions, where you need the set of target states to be small (as you do say). Otherwise I could just say that the set of target states is all the directions the system might fly off in if you perturb it. So you might say that, for this version of optimization to be meaningful, the set of target states has to be small (however that's quantified), and for the utility maximization version to be meaningful, you need the utility function to be simple (however that's quantified). EDIT: And actually, maybe the two concepts are sort of dual to each other. If you have an agent with a simple utility function, then you could consider all its local optima to be a (small) set of target states for an optimizing system. And if you have an optimizing system with a small set of target states, then you could easily convert that into a simple utility function with a gradient towards those states. And if your utility function isn't simple, maybe you wouldn't get a small set of target states when you do the conversion, and vice versa?
4Alex Flint
I'd say the utility function needs to contain one or more local optima with large basins of attraction that contain the initial state, not that the utility function needs to be simple. The simplest possible utility function is a constant function, which allows the system to wander aimlessly and certainly not "correct" in any way for perturbations.
Ah, good points!

This seems like a good definition of optimization for algorithmic systems, but I don't see how it works for physical systems. Going by the primary definition,

An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction.

But in the physical world, there are literally zero closed systems with this property. Entropy always increases*, and the target configuration set will never be smaller than the basin of attraction. The dirt-plus-seed-plus-sunlight system has a vastly smaller configuration space than the dirt-plus-tree-plus-heat system. Perhaps one could object that one should discount the incoming sunlight and outgoing heat since the system isn't really closed, but then consider a very similar system consisting of only dirt, air, and fungal spores. Surely if a growing tree is an optimizing system, then a growing mushroom in a closed system is an optimizer too. But the entropy increase in the latter case is unambiguous: the number of ways to arrange atoms into a fully grown m... (read more)

3Thomas Kwa
I agree that closed physical systems aren't optimizing systems. It seems like the first patch given by the author works when worded more carefully: "We could stipulate that some [low-entropy] power source [and some entropy sink] is provided externally to each system we analyze, and then perform our analysis conditional on the existence of that power source." Then an optimizing system with X bits of "optimization power" (which is log(target states / basin of attraction size) or something) has to sink at least X bits, and this seems like it works. Maybe it gets hard to rigorously define the exact form of the power source and entropy sink though? Disclaimer: I don't know statistical mechanics.

I think this is great.

I would want to relate it to a few key points out which I tried to address in a few earlier posts. Principally, I discussed selection versus control, which is about the difference between what optimization does externally, and how it uses models and testing. This related strongly to your conception of an optimizing system, but focused on how much of the optimization process occurs in the system versus in the agent itself. This is principally important because of how it relates to misalignment and Goodharting of various types.

I had hopes to further apply that conceptual model to meas-optimization, but I was a bit unsure how to think about it, and have been working on other projects. At this point, I think your discussion is probably a better conceptual model than the one I was trying to build there - it just needs to be slightly extended to cover the points I was trying to work out in those posts. I'd like to think about how it relates to mesa-optimization as well, but I'm unlikely to actually work on that

Very good. A lot of potential there, I feel.

This is excellent! Very well done, I would love to see more work like this.

I have a whole bunch of things to say along separate directions so I'll break them into separate comments. This first one is just a couple minor notes:

  • For the universe section, the universe doesn't push "toward" maxent, it just wanders around and usually ends up in maxent states because that's most of the states. The basin of attraction includes all states.
  • Regarding "whether dynamical systems theory explicitly studies attractors that operate along a subset of the system’s dimensions
... (read more)
Does "ergodic on some manifold" here mean it approaches every point within the manifold, as in the ergodicity assumption, or does it mean described by an ergodic function? I realize the latter implies the former, but what I am driving at is the behavior vs. the formalism.
Not sure.

Planned summary for the Alignment Newsletter:

Many arguments about AI risk depend on the notion of “optimizing”, but so far it has eluded a good definition. One natural approach is to say that an optimizer causes the world to have higher values according to some reasonable utility function, but this seems insufficient, as then a <@bottle cap would be an optimizer@>(@Bottle Caps Aren't Optimisers@) for keeping water in the bottle.
This post provides a new definition of optimization, by taking a page from <@Embedded Agents@> and
... (read more)

This post reminds me of thinking from 1950s when people taking inspiration from Wiener's work on cybernetics tried to operationalize "purposeful behavior" in terms of robust convergence to a goal state:

> When an optimizing system deviates beyond its own rim, we say that it dies. An existential catastrophe is when the optimizing system of life on Earth moves beyond its own outer rim.

I appreciate the direct attention to this process a... (read more)

The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.


A tree is an optimizing system but not a goal-directed agent system.

I'm not sure this is true, at least not in the sense that we usually think about "goal-directed agent systems".

You make a case that there's no distinct subsystem of the tree which is "doing the optimizing", but this isn't obviously relevant to whether the tree is agenty. For instance, the tree presumably still needs to model its environment to some extent, a

... (read more)

At first I particularly liked the idea of identifying systems with "an optimizer" as those which are robust to changes in the object of optimization, but brittle with respect to changes in the engine of optimization.

On reflection, it seems like a useful heuristic but not a reliable definition. A counterexample: suppose we do manage to build a robust AI which maximizes some utility function. One desirable property of such an AI is that it's robust to e.g. one of its servers going down or corrupted data on a hard drive; the AI itself should be robust to as m

... (read more)
4Alex Flint
Yeah I agree that duality is not a good measure of whether a system contains something like an AI. There is one kind of AI that we can build that is highly dualistic. Most present-day AI systems are quite dualistic, because they are predicated on having some robust compute infrastructure that is separate from and mostly unperturbed by the world around it. But there is every reason to go beyond these dualistic designs, for precisely the reason you point to: such systems do tend to be somewhat brittle. I think it's quite feasible to build highly robust AI systems, although doing so will likely require more than just hardening (making it really unlikely for the system to be perturbed). What we really want is an AI system where the core AI itself tends to evolve back to a stable configuration despite perturbations to its core infrastructure. My sense is that this will actually require a significant shift in how we think about AI -- specifically moving from the agent model to something that captures what is good and helpful in the agent model but discards the dualistic view of things.

An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.

First, I want to say that I think your definition says something important.

That said, I'm concerned that the above definition would have some potentially problematic false negatives. I... (read more)

Truly a joy to read! Thank you.

To what extent can we identify subsets of the system corresponding to "that which is being optimized" and "that which is doing the optimization"?

The information theoretic measure of individuality attempts to answer exactly this type of question.

From this view, a set of components (the system) is decomposed into two subsets (subsystem + environment). The proposed subsystem is assigned a degree of individuality by measuring the amount of information it shares with its future state, optionally conditioned o... (read more)

2Alex Flint
Thank you for the pointer to this terminology. It seems relevant and I wasn't aware of the terminology before.

systems that have a tendency to evolve towards a narrow target configuration set when started from any point within a broader basin of attraction, and continue to do so despite perturbations.

When determining whether a system "optimizes" in practice, the heavy lifting is done by the degree to which the set of states that the system evolves toward -- the suspected "target set" -- feels like it forms a natural class to the observer.

The issue here is that what the observer considers "a natural class" is informed by the data-distribution that the observer has previously been exposed to.

6Alex Flint
It's worse, even, in a certain way, than that: the existence of optimizing systems organized around a certain idea of "natural class" feeds back into more observers observing data that is distributed according to this idea of "natural class", leading to more optimizing systems being built around that idea of "natural class", and so on. Once a certain idea of "natural class" gains a foothold somewhere, observers will make real changes in the world that further suggest this particular idea of "natural class" to others, and this forms a feedback loop.

Is a metal bar an optimizer?  Looking at the temperature distribution, there is a clear set of target states (states of uniform temperature) with a much larger basin of attraction (all temperature distributions that don't vaporize the bar).

I suppose we could consider the second law of thermodynamics to be the true optimizer in this case.  The consequence is that any* closed physical system is trivially an optimizing system towards higher entropy.

In general, it seems like this optimization criterion is very easy to satisfy if we don't specify what... (read more)

But Filan would surely agree on this point and his question is more specific: he is asking whether the liver is an optimizer.

FYI, it seems pretty clear to me that a liver should be considered an optimiser: as an organ in the human body, it performs various tasks mostly reliably, achieves homeostasis, etc. The question I was rhetorically asking was whether it is an optimiser of one's income, and the answer (I claim) is 'no'.

the exact same answer it would have output without the perturbation.

It always gives the same answer for the last digit?

2Alex Flint
Well we could always just set the last digit to 0 as a post-processing step to ensure perfect repeatability. But point taken, you're right that most numerical algorithms are not quite as perfectly stable as I claimed.

Is a system that optimizes for destruction an optimizing system?

2Alex Flint
A bomb would not be an optimizing system, because the target space is not small compared to the basin of attraction. An AI that systematically dismantles things would be an optimizing system if for no other reason than that the AI systematically preserves its own integrity.

An interesting point about the agency-as-retargetable-optimisation idea is that it seems like you can make the perturbation in various places upstream of the agent's decision-making, but not downstream, i.e. you can retarget an agent by perturbing its sensors more easily than its actuators.

For example, to change a thermostat-controlled heating system to optimise for a higher temperature, the most natural perturbation might be to turn the temperature dial up, but you could also tamper with its thermistor so that it reports lower temperatures. On the other h... (read more)

You said your definition would not classify a bottle cap with water in it as an optimizer. This might be really nit-picky, but I'm not sure it's generally true.

I say this because the water in the bottle cap could evaporate. Thus, supposing there is no rain, from a wide range of possible states of the bottle cap, it would tend towards no longer having water in it.

I know you said you make an exception for tendencies towards increased entropy being considered optimizers. However, this does not increase the entropy of the bottlecap, It could potentially increa... (read more)

2Alex Flint
Thank you for this comment Chantiel. Yes, a container that engineered to evaporate water poured anywhere into it and condense it into a central area would be an optimizing system by my definition. That is a bit like a ball rolling down a hill, which is also an optimizing system and also has nothing resembling agency. I am The bottle cap example was actually about putting a bottle cap onto a bottle and asking whether, since the water now stays inside the bottle, it should be considered an optimizer. I pointed out that this would not qualify as an optimizing system because if you moved a water molecule from the bottle and place it outside the bottle, the bottle cap would not act to put it back inside.

An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.

If I'm reasoning correctly, I think this definition could classify just about anything as an optimizer.

Consider inanimate biological substances, like a leaf. From a wide range of initial ... (read more)

2Alex Flint
Thank you for this comment Chantiel. It's a good question. However, the decomposition of a leaf and of the body are both examples of increases in entropy over time, but actually if you look at the size of the "target configuration set" you find that it's almost as big as the whole configuration space, because most of the configurations of a system are high entropy configurations. So I don't think a leaf or an aging body qualify as optimizing systems according to the definition in this post. See also this section. Well you really have to look at the whole system. It's true that if you have a system that consists of a hot part and cold part, the system overall will evolve towards configurations in which the parts are the same temperature. But this is again an example of entropy increasing. Most of the configurations of the joint rock+environment system have the rock and the environment at approximately the same temperature, since if you randomly sample a temperature for each particle, the large number of particles in the rock and the environment mean that the average temperature of all the particles in the rock will be very similar to the average temperature of all the particles in the environment, with high probability. Yes, just like a ball rolling down a hill qualifies as an optimizing system, a table with with billiard balls qualifies as an optimizing system in the sense that you point out. But the whole point of this post is to get past the notion of "optimizer" and "optimization" to the extent that these concepts imply that there is some "agent" performing optimization, and some thing "being optimized", which sneaks the agent model into all our thinking and leads to a very confused picture of things. Thank you for the pointer!
Both of these examples also seem like increases in entropy if you consider the full system. With a fixed amount of energy, there are a tiny number of ways to use it to make the ball move (or to spend energy putting it somewhere other than the bottom of the hill) but an exponentially vast number of ways to use it to increase the temperature of the billiard ball and table (since there are billions of billions of microscopic degrees of freedom that could be vibrating or whatever). 
Thanks for the response. A lot of the examples I pointed out can end up tending towards increasing entropy, but I think there are a lot of things that would be considered optimizer that don't increase entropy. For example, consider a leaf out in the sun, drying out and going from a greenish color to a yellow one. Pretty much all configurations of the leaf would result in the leaf getting more yellow over time. Is the leaf optimizing for yellow-ness? What about a knife that is being used and never sharpened? From a wide range of configurations the knife would tend towards getting duller. Is it optimizing dullness? What about a spaceship leaving Earth? Is it optimizing for the distance from Earth? I suppose we could consider these things optimizers if you really want to. But I'm concerned that a definition that include leaves, knives, billiard balls, and rocket ships is overly broad. More generally, it seems like this definition classifies a lot of things that change in some way over time as an optimizer. In general, if something tends to be different in some ways when it's young than old, then I think you can say the system is an optimizer optimizing for whatever characteristics correlate with oldness. 

Great post.

I'm not keen on the requirement that the basin of attraction be strictly larger than the target configuration set. I don't think this buys you much, and seems to needlessly rule out goals based on narrow maintenance of some status-quo. Switching to a utility function as suggested by others improves things, I think.

For example: a highly capable AI whose only goal is to maintain a chess set in a particular position for as long as possible, but not to care about it after it's disturbed.

Here the target set is identical to the basin of... (read more)