Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is the first post in a small sequence I'm writing on "Optimizing and Goodhart Effects - Clarifying Thoughts" (I have re-organized to make part 2, "Revisiting What Optimization Means" separate.)

Related to: How does Gradient Descent Interact with Goodhart?, Constructing Goodhart, Selection vs Control, Classifying Specification Problems as variants of Goodhart's Law

Next Posts: Revisiting What Optimization Means with Selection vs. Control, then Applying Overoptimization to Selection vs. Control


Goodhart's law comes in a few flavors, as originally pointed out by Scott, and formalized a bit more in our joint paper. When discussing that paper, or afterwards, we struggled with something Abram Demski clarified recently, which is the difference between selection and control. This matters for formalizing what happens, especially when asking about how Goodhart occurs in specific types of optimizers, as Scott asked recently.

Epistemic Status: This is for de-confusing myself, and has been helpful. I'm presenting what I am fairly confident I understand well for the content written so far, but I'm unclear about usefulness for others, or how clear it comes across. I think that there's more to say after this post, and this will have a few more parts if people are interested. (I spent a month getting to this point, and decided to post and get feedback rather than finish a book first.)

In the first half of the post, I'll review Abram's selection/control distinction, and suggest how it relates to actual design. I'll also argue that there is a bit of a continuum between the two cases, and that we should add an addition extreme case to the typology, direct solution. The second section will revisit what optimization means, and try to note a few different things that could happen and go wrong with Goodhart-like overoptimization.

The third section will talk about Goodhart in this context using the new understanding - trying to more fully explain why Goodhart effects in selection and control fundamentally differs. After this, Part 4 will revisit Mesa-optimizers, and .

Thoughts on how selection and control are used in tandem

In this section, I'll discuss the two types of optimizers Abram discussed; selection, and control, and introduce a third, simpler optimizer, direct solution. I'm also going to mention where embedded agents are different, because that's closely related to selection versus control, and talk about where mesa-optimizers exist.

Starting with the (heavily overused) example of rockets, I want to revisit Abram's categorization of algorithmic optimization versus control. There are several stages involved with getting rockets to go where we want. The first is to design the rocket, which involves optimization, which I'll discuss in two stages, the second is to test, which involves optimization and control in tandem, and the third is to actually guide the rocket we built in flight, which is purely control.

Initially, designing rocket is pure optimization. We might start by building simplified mathematical models to figure out the basic design constraints - if a rocket is bringing people to the moon, we may decide the goal is a rocket and a lander, rather than a single composite. We may decide that certain classes of trajectory / flight paths are going to be used. This is all a set of mathematical exercises, and probably involves only multiply differentiable models that can be directly solved to find an optimum. This is in many ways a third category of "optimizing," in Abram's model, because there is not even a need for looking over the search space. I'll call this direct solution, since we just pick the optimum based on the setup.

After getting a bit closer to actual design, we need to simulate rocket designs and paths, and optimize the simulated solution. This lets you do clever things like build a rocket with a sufficient but not excessive amount of fuel (hopefully with a margin of error.) If we're smart, we optimize with several intended uses and variable factors in mind, to make sure our design is sufficiently robust. (If we're not careful enough to include all relevant factors, we ignore some factor that turns out will matter, like the relationship between temperature of the O-rings and their brittleness, and our design fails in those conditions.) This is all optimizing over a search space. The cost of the search is still comparatively low - not as low as direct solution, and we may use gradient descent, genetic algorithms, simulated annealing, or other strategies. The commonality between these solutions is that they simulate points in the search space, perhaps along with the gradients at that point.

After we settle on a design, we build an actual rocket, and then we test it. This moves back and forth between the very high cost approach of building physical objects and testing them - often to destruction - and simulation. After each test, we probably re-run the simulation to make sure any modifications are still near the optimum we found, or we refine the simulations to re-optimize and pick the next design to build.

Lastly, we build a final design, and launch the rocket. The control system is certainly a mesa-optimizer with regards to the rocket design process. For a rocket, this control is closer to direct optimization than simulation, because the cost of evaluation needs to be low enough for real-time control. The mesa-optimizer would, in this case, use simplified physics to fire the main and guidance rockets to stay near the pre-chosen path. It's probably not allowed to pick a new path - it can't decide that the better solution is to orbit twice instead of once before landing. (Humans may decide this, then hand the mesa-optimizer new parameters.) We tightly constrain the mesa-optimizer, since in a certain sense it's dumber than the design optimizer that chose what to optimize for.

For a more complex system, we may need a complex mesa-optimizer to guide the already designed system. Even for a more complex rocket, we may allow the mesa-optimizer to modify the model used for optimizing, at least in minor ways - it may dynamically evaluate factors like the rocket efficiency, and decide that it's getting 98% of the expected thrust, so it will plan to use that modified parameter in the system model used to mesa-optimize. Giving a mesa-optimizer more control is dangerous, but perhaps necessary to allow it to navigate a complex system.

Now that we've deconfused why optimization is split between selection and control, I can introduce part 2: What does optimization mean?


Ω 14

New Comment
5 comments, sorted by Click to highlight new comments since: Today at 10:52 PM

One way to think about gradient descent is that if there's an N-dimensional parameter space on which you build a grid of M^N points and do a grid search for the minimum, well, you can accomplish the same selection task with O(M) steps of gradient descent (at least if there's no local minimum). So should we say does that M steps of gradient descent gives you O(N log M) bits of optimization? Or something like that? I'm not sure.

I like that intuition overall, but there's a sense that adaptive search does give far more resolution than grid search, but the analysis seems wrong; if I use gradient descent, by eventual accuracy is the step size near the end, which gives far more precise answers than "only" checking a grid of M^N points spaced equally.

Note: as I mentioned in the post, I'd love feedback of all types - for better clarity, criticism of my understanding, suggestions for how to continue, and places the discussion still seems confused.

This is all a set of mathematical exercises, and probably involves only multiply differentiable models that can be directly solved to find an optimum. This is in many ways a third category of "optimizing," in Abram's model, because there is not even a need for looking over the search space. I'll call this direct solution, since we just pick the optimum based on the setup.

Is this distinction a property of the function we're optimizing or the algorithm we're using to optimize it? Is the relevant distinction that there is a unique global optimum? Or that a closed-form solution exists/it takes "constant time"? Or that we can prove a given solution is optimal once we've found it?

I think there are optimization algorithms that satisfy some, but not all, of those properties. Personally I don't think I would create a new entry in the typology for this, and just keep it under "selection".

I found it a bit confusing that you first reffered to selection and control as types of optimizers and then (seemingly?) replaced selection by optimization in the rest of the text.

New to LessWrong?