"Soft optimization" is where, if you ask a Task AGI to paint one car pink, it just paints one car pink and then stops, rather than tiling the galaxies with pink-painted cars, because it's not optimizing all that hard. It's okay with, you know, just painting one car pink; it isn't driven to absolutely max out the twentieth decimal place of its car-painting score.
Other suggested names for soft optimization have included "sufficient optimization", "minimum viable solution", "pretty good optimization", "moderate optimization", "regularized optimization", "sensible optimization", "casual optimization", "adequate optimization", "good-not-great optimization", "lenient optimization", and "optimehzation".
Soft optimization is complementary to low impact. A low impact AGI might try to paint one car pink while minimizing its other footprint, or how many other things it changed; but if it were a maximizer rather than a soft optimizer, it would be trying as hard as possible to drive that impact as close to zero as possible, which might come with its own set of pathologies. What we really want is both properties. We want the AGI to paint one car pink in a way that keeps the impact pretty low and then, you know, that's good enough - not have a cognitive pressure to search through weird extremes looking for a way to decrease the twentieth decimal place of the impact. Maximizing pressure would tend to break a low impact measure that contained even a subtle flaw, whereas a soft-optimizing AGI would put less pressure on the measure and hence be less likely to break it.
(Obviously, what we want is a perfect low impact measure which will keep us safe even if subjected to unlimited optimization power, but a basic security mindset is to try to make each part safe on its own, then assume it might contain a flaw and try to design the rest of the system to be safe anyway.)
Satisficing is not necessarily soft.
We'll start with naive satisficing, the 0-1 utility function. Suppose the AI's utility function is 1 when at least one car has been painted pink and 0 otherwise - there's no more utility to be gained by outcomes in which more cars have been painted pink. Will this AI still go to crazy-seeming lengths? Yes, because in a partially uncertain / probabilistic environment, expected utility can always be pushed a little closer to 1. A solution with a 0.9999 probability of painting at least one car pink is ranked above a solution with a 0.999 probability of painting at least one car pink. If a preference ordering has the property that, for every probability distribution over outcomes, there is a more preferred distribution which requires one more erg of energy to achieve, that's a sufficient condition for using up all the energy in the universe. For example, if converting all reachable matter into pink-painted cars implies a slightly higher probability (after all infinitesimal uncertainties are taken into account) that at least one car is pink, then that's the plan which maximizes expected utility under the 0-1 utility function.
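As a toy illustration (the plans and probabilities below are made up for the example): under the 0-1 utility function, expected utility is just the probability that at least one car ends up pink, so an expected-utility maximizer takes whichever plan squeezes out the last few nines.

```python
# Under the 0-1 utility function, expected utility equals the probability
# that at least one car ends up pink, so an expected utility maximizer
# picks whichever plan nudges that probability highest.
# (Hypothetical numbers, purely for illustration.)
plans = {
    "paint one car pink":                      0.999,      # might fail: paint runs out, etc.
    "paint ten cars pink":                     0.99999,    # redundancy helps a little
    "convert all reachable matter into cars":  0.9999999,  # maximal redundancy
}

best_plan = max(plans, key=plans.get)  # argmax over expected utility
print(best_plan)  # -> "convert all reachable matter into cars"
```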
Less naive satisficing would describe an optimizer which satisfies an expected utility constraint - say, any policy producing at least 0.95 expected utility under the 0-1 utility function is acceptable, and the AI may implement any acceptable policy.
This is now much closer to soft optimization. The problem is that it doesn't exclude extremely optimized solutions. A 0.99999999 probability of producing at least one pink-painted car also has the property that it's above a 0.95 probability. So if you're self-modifying, replacing yourself with a hard maximizer is probably a satisficing solution, and maybe a pretty simple one.
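A toy sketch of the problem (policy names and numbers are hypothetical, chosen for illustration): the threshold check accepts the modest plan and the build-a-hard-maximizer plan alike, and nothing in the criterion prefers the former.

```python
# A naive satisficer accepts *any* policy whose expected utility clears the
# threshold -- which includes the policy of constructing a hard maximizer.
THRESHOLD = 0.95

policies = {
    "paint one car pink and stop":           0.97,
    "build a hard maximizer to paint cars":  0.99999999,
}

acceptable = [p for p, eu in policies.items() if eu >= THRESHOLD]
print(acceptable)  # both policies satisfice; the criterion is silent between them
```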
Even if we're not dealing with a completely self-modifying agent, there are plenty of points where adding more optimization pressure might still satisfice. When you build a thermostat in the environment, you're coercing one part of the environment to have a particular temperature; even if something that mild doesn't count as "more optimization pressure", there's a whole spectrum of additional optimizing-ness that falls short of constructing a full subagent or doing a full self-modification. And there are all sorts of steps in cognition where it would be just as easy to add a maximizing step (take the highest-ranking solution) as to take a random high-ranking solution.
On a higher level of abstraction, the problem is that while satisficing is reflectively consistent, it's not reflectively stable. A satisficing agent is happy to construct another satisficing agent, but it may also be happy to construct a maximizing agent. It can approve its current mode of thinking, but it approves other modes of thinking too. So unless all the cognitive steps are being carried out locally on fixed known algorithms that satisfice but definitely don't maximize, without the AGI constructing any environmental computations or conditional policy steps more complicated than a pocket calculator, building a seemingly soft satisficer doesn't guarantee that optimization stays soft.
One weird idea that seems like it might exhibit incremental progress toward reflectively stable soft optimization is Jessicat's expected utility quantilizer. Roughly, a quantilizer estimates expected outcomes relative to a null action, and then tries to produce an expected outcome in some upper quantile of possibilities - e.g., an outcome in the top 1% of expected outcomes. Furthermore, a quantilizer only tries to narrow outcomes by that much - it doesn't try to produce one particular outcome in the top 1%; the most it will ever try to do is randomly pick an outcome such that this random distribution corresponds to being in the top 1% of expected outcomes.
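A minimal sketch of the idea in code, under illustrative assumptions (the candidate-action list and the names base_prob and expected_utility are hypothetical stand-ins, not anything from the original): rank actions by estimated expected utility, keep the top q of the base distribution's probability mass, and sample from that slice in proportion to base probability rather than argmaxing.

```python
import random

def quantilize(actions, base_prob, expected_utility, q=0.01):
    """Sample an action from the top-q slice (by expected utility) of a base
    distribution over actions, instead of taking the single best action.

    actions          -- list of candidate actions
    base_prob        -- dict: action -> probability under the base / null distribution
    expected_utility -- dict: action -> estimated expected utility
    q                -- fraction of base probability mass to keep (0.01 = top 1%)
    """
    # Rank actions from best to worst by estimated expected utility.
    ranked = sorted(actions, key=lambda a: expected_utility[a], reverse=True)

    # Keep the best actions until q of the base probability mass is covered.
    kept, mass = [], 0.0
    for a in ranked:
        kept.append(a)
        mass += base_prob[a]
        if mass >= q:
            break

    # Sample from the kept set in proportion to base probability, i.e. the
    # base distribution conditioned on landing in the top quantile.
    weights = [base_prob[a] for a in kept]
    return random.choices(kept, weights=weights, k=1)[0]
```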
Quantilizing corresponds to maximizing expected utility under the assumption that there is uncertainty about which outcomes are good and an adversarial process which can make some outcomes arbitrarily bad, subject to the constraint that the expected utility of the null action can only be boundedly low. So if there's an outcome which would be very improbable given the status quo, the adversary can make that outcome be very bad. This means that rather than aiming for one single high-utility outcome which the adversary could then make very bad, a quantilizer tries for a range of possible good outcomes. This in turn means that quantilizers will actively avoid narrowing down the future too much, even if by doing so they'd enter regions of very high utility.
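To state the guarantee a bit more formally (the symbols here are mine, not from the text): suppose the adversary can pick any cost function $c \ge 0$, so long as the base (null-action) distribution $\gamma$ has bounded expected cost, $\mathbb{E}_{a \sim \gamma}[c(a)] \le C$. A $q$-quantilizer $Q$ places at most $\gamma(a)/q$ probability on any action $a$, so

$$\mathbb{E}_{a \sim Q}[c(a)] \;\le\; \frac{1}{q}\,\mathbb{E}_{a \sim \gamma}[c(a)] \;\le\; \frac{C}{q}.$$

The worst-case damage is bounded by $C/q$, whereas a hard maximizer that picks out a single near-zero-probability action gets no such bound - which is why the quantilizer spreads its aim over a whole quantile of ordinary-looking outcomes.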
Quantilization doesn't seem like exactly what we actually want, for multiple reasons. E.g., if saving the world is very improbable given the status quo, it seems like a quantilizer would actively try to avoid having its actions accomplish that (a similar problem seems like it might appear in impact measures which imply that very good outcomes have very high impact).
The key idea that appears in quantilizing is that a quantilizer isn't just as happy to rewrite itself as a maximizer, and isn't just as happy to implement a policy that involves constructing a hard maximizer in the environment.