This is the first post in a small sequence I'm writing on "Optimizing and Goodhart Effects - Clarifying Thoughts" (I have re-organized to make part 2, "Revisiting What Optimization Means" separate.)
Goodhart's law comes in a few flavors, as originally pointed out by Scott, and formalized a bit more in our joint paper. When discussing that paper, or afterwards, we struggled with something Abram Demski clarified recently, which is the difference between selection and control. This matters for formalizing what happens, especially when asking about how Goodhart occurs in specific types of optimizers, as Scott asked recently.
Epistemic Status: This is for de-confusing myself, and has been helpful. I'm presenting what I am fairly confident I understand well for the content written so far, but I'm unclear about usefulness for others, or how clear it comes across. I think that there's more to say after this post, and this will have a few more parts if people are interested. (I spent a month getting to this point, and decided to post and get feedback rather than finish a book first.)
In the first half of the post, I'll review Abram's selection/control distinction, and suggest how it relates to actual design. I'll also argue that there is a bit of a continuum between the two cases, and that we should add an addition extreme case to the typology, direct solution. The second section will revisit what optimization means, and try to note a few different things that could happen and go wrong with Goodhart-like overoptimization.
The third section will talk about Goodhart in this context using the new understanding - trying to more fully explain why Goodhart effects in selection and control fundamentally differs. After this, Part 4 will revisit Mesa-optimizers, and .
Thoughts on how selection and control are used in tandem
In this section, I'll discuss the two types of optimizers Abram discussed; selection, and control, and introduce a third, simpler optimizer, direct solution. I'm also going to mention where embedded agents are different, because that's closely related to selection versus control, and talk about where mesa-optimizers exist.
Starting with the (heavily overused) example of rockets, I want to revisit Abram's categorization of algorithmic optimization versus control. There are several stages involved with getting rockets to go where we want. The first is to design the rocket, which involves optimization, which I'll discuss in two stages, the second is to test, which involves optimization and control in tandem, and the third is to actually guide the rocket we built in flight, which is purely control.
Initially, designing rocket is pure optimization. We might start by building simplified mathematical models to figure out the basic design constraints - if a rocket is bringing people to the moon, we may decide the goal is a rocket and a lander, rather than a single composite. We may decide that certain classes of trajectory / flight paths are going to be used. This is all a set of mathematical exercises, and probably involves only multiply differentiable models that can be directly solved to find an optimum. This is in many ways a third category of "optimizing," in Abram's model, because there is not even a need for looking over the search space. I'll call this direct solution, since we just pick the optimum based on the setup.
After getting a bit closer to actual design, we need to simulate rocket designs and paths, and optimize the simulated solution. This lets you do clever things like build a rocket with a sufficient but not excessive amount of fuel (hopefully with a margin of error.) If we're smart, we optimize with several intended uses and variable factors in mind, to make sure our design is sufficiently robust. (If we're not careful enough to include all relevant factors, we ignore some factor that turns out will matter, like the relationship between temperature of the O-rings and their brittleness, and our design fails in those conditions.) This is all optimizing over a search space. The cost of the search is still comparatively low - not as low as direct solution, and we may use gradient descent, genetic algorithms, simulated annealing, or other strategies. The commonality between these solutions is that they simulate points in the search space, perhaps along with the gradients at that point.
After we settle on a design, we build an actual rocket, and then we test it. This moves back and forth between the very high cost approach of building physical objects and testing them - often to destruction - and simulation. After each test, we probably re-run the simulation to make sure any modifications are still near the optimum we found, or we refine the simulations to re-optimize and pick the next design to build.
Lastly, we build a final design, and launch the rocket. The control system is certainly a mesa-optimizer with regards to the rocket design process. For a rocket, this control is closer to direct optimization than simulation, because the cost of evaluation needs to be low enough for real-time control. The mesa-optimizer would, in this case, use simplified physics to fire the main and guidance rockets to stay near the pre-chosen path. It's probably not allowed to pick a new path - it can't decide that the better solution is to orbit twice instead of once before landing. (Humans may decide this, then hand the mesa-optimizer new parameters.) We tightly constrain the mesa-optimizer, since in a certain sense it's dumber than the design optimizer that chose what to optimize for.
For a more complex system, we may need a complex mesa-optimizer to guide the already designed system. Even for a more complex rocket, we may allow the mesa-optimizer to modify the model used for optimizing, at least in minor ways - it may dynamically evaluate factors like the rocket efficiency, and decide that it's getting 98% of the expected thrust, so it will plan to use that modified parameter in the system model used to mesa-optimize. Giving a mesa-optimizer more control is dangerous, but perhaps necessary to allow it to navigate a complex system.
Now that we've deconfused why optimization is split between selection and control, I can introduce part 2: What does optimization mean?