"Mild optimization"optimization" is where, if you ask a Task AGI to paint one car pink, it just paints one car pink and then stops, rather than tiling the galaxies with pink-painted cars, because it's not optimizing that hard. It's okay with just painting one car pink; it isn't driven to max out the twentieth decimal place of its car-painting score.
"SoftMild optimization" is where, if you ask a Task AGI to paint one car pink, it just paints one car pink and then stops, rather than tiling the galaxies with pink-painted cars, because it's not optimizing that hard. It's okay with just painting one car pink; it isn't driven to max out the twentieth decimal place of its car-painting score.
Other suggested terms for this concept have included "soft optimization", "sufficient optimization", "minimum viable solution", "pretty good optimization", "moderate optimization", "regularized optimization", "sensible optimization", "casual optimization", "adequate optimization", "good-not-great optimization", "lenient optimization", "parsimonious optimization", and "optimehzation".
Mild optimization is complementary to taskiness and low impact. A low impact AGI might try to paint one car pink while minimizing its other footprint or how many other things changed, but it would (if a maximizer rather than a mild optimizer) be trying as hard as possible to minimize that impact and drive it down as close to zero as possible, which might come with its own set of pathologies.
What we really want is both properties. We want the AGI to paint one car pink in a way that gets the impact pretty low and then, you know, that's good enough - not have a cognitive pressure to search through weird extremes looking for a way to decrease the twentieth decimal place of the impact. That pressure would tend to break a low impact measure which contained even a subtle flaw, whereas a mildly optimizing AGI might not put as much pressure on the low impact measure and hence be less likely to break it.

(Obviously, what we want is a perfect low impact measure which will keep us safe even if subjected to unlimited optimization power, but a basic security mindset is to try to make each part safe on its own, then assume it might contain a flaw and try to design the rest of the system to be safe anyway.)
Satisficing utility functions don't necessarily mandate or even allow mildness.
We'll start with naive satisficing, the 0-1 utility function. Suppose the AI's utility function is 1 when at least one car has been painted pink and 0 otherwise - there's no more utility to be gained by outcomes in which more cars have been painted pink. Will this AI still go to crazy-seeming lengths? Yes, because in a partially uncertain / probabilistic environment, there is always a little more expected utility to be gained by driving the probability of success higher. A solution with a 0.9999 probability of painting at least one car pink is ranked above a solution with a 0.999 probability of painting at least one car pink.
If a preference ordering <p has the property that, for every achievable probability distribution over outcomes O, there's another achievable distribution O′ with O <p O′ which requires one more erg of energy to achieve, that is a sufficient condition for using up all the energy in the universe. For example, if converting all reachable matter into pink-painted cars implies a slightly higher probability (after all infinitesimal uncertainties are taken into account) that at least one car is pink, then that's the maximum of expected utility under the 0-1 utility function.
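As a concrete toy calculation (not from the original text; the 0.99 success probability and the function below are assumed purely for illustration): under the 0-1 utility function, expected utility is just the probability that at least one painting attempt succeeds, so every additional car painted strictly increases it.

```python
# Toy model of the 0-1 utility argument. Utility is 1 iff at least one car
# ends up pink; assume (hypothetically) each painting attempt succeeds
# independently with probability 0.99.
P_SUCCESS = 0.99

def expected_utility(num_cars_painted: int) -> float:
    # EU = P(at least one success) = 1 - P(every attempt fails)
    return 1.0 - (1.0 - P_SUCCESS) ** num_cars_painted

for n in (1, 2, 10, 1_000_000):
    print(n, expected_utility(n))

# Each extra car strictly increases expected utility, so an expected utility
# maximizer with this "satisficing-looking" 0-1 utility function still prefers
# converting as much reachable matter as possible into painting attempts.
```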
A natural next patch is an expected utility satisficer: rather than maximizing expected utility, the agent may execute any policy whose expected utility clears some threshold. This rule is now much closer to a Task and would at least permit mild optimization. The problem is that it doesn't exclude extremely optimized solutions. A 0.99999999 probability of producing at least one pink-painted car also has the property that it's above the threshold.
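A minimal sketch of that failure mode, with the policy names, utilities, and 0.9 threshold all assumed for illustration: the satisficer's acceptable set still contains the extreme policy, and nothing in the rule disprefers it.

```python
import random

def eu_satisficer(policies, expected_utility, threshold=0.9):
    # Accept any policy whose expected utility clears the threshold,
    # then pick arbitrarily among the acceptable ones.
    acceptable = [pi for pi in policies if expected_utility(pi) >= threshold]
    return random.choice(acceptable) if acceptable else None

policies = {
    "carefully paint one car pink": 0.999,
    "convert all reachable matter into pink cars": 0.99999999,
}
print(eu_satisficer(list(policies), policies.get))
# Either policy may be returned: both are above the threshold, so the rule
# permits, without in any way dispreferring, the extremely optimized solution.
```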
It currently seems like the key subproblem in soft optimization revolves around reflective stability - we don't want "replace the soft optimization part with a simple maximizer; becoming a maximizer isn't that hard and gets the task done" to count as a 'soft' solution. Even in human intuitive terms of "optimizing without putting in an unreasonable amount of effort", at some point a sufficiently advanced human intelligence gets lazy and starts building an AGI to do things for them because it's easier that way and only takes a bounded amount of effort. We don't want "construct a second AGI that does hard optimization" to count as soft optimization even if it ends up not taking all that much effort for the first AGI, although "construct an AGI that does θ-soft optimization" could potentially count as a θ-soft solution.
Similarly, we don't want to allow the deliberate creation of environmental or internal daemons, even if it's easy to do it that way or takes little effort to end up with that side effect - we'd want the optimizing power of such daemons to count against the measured optimization power and be rejected as optimizing too hard.
Since both of these phenomena seem hard to exhibit in current machine learning algorithms or faithfully represent in a toy problem, unbounded analysis seems likely to be the main way to go. In general, it seems closely related to the Other-izer Problem which also seems most amenable to unbounded analysis at the present time.
"Soft optimization" is where, if you ask a Task AGI to paint one car pink, it just paints one car pink and then stops, rather than tiling the galaxies with pink-painted cars, because it's not optimizing all that hard. It's okay with, you know,with just painting one car pink; it isn't driven to absolutely max out the twentieth decimal place of its car-painting score.
Soft optimization is complementary to low impact. A low impact AGI might try to paint one car pink while minimizing its other footprint or how many other things changed, but it would (if a maximizer rather than a soft optimizer) be trying as hard as possible to minimize that impact and drive it down as close to zero as possible, which might come with its own set of pathologies. What we really want is both properties. We want the AGI to paint one car pink in a way that gets the impact pretty low and then, you know, that's good enough - not have a cognitive pressure to search through weird extremes looking for a way to decrease the twentieth decimal place of the impact. This would tend to break a low impact measure which contained even a subtle flaw, where a soft-optimizing AGI might not put as much pressure on the low impact measure and hence be less likely to break it.
"Soft optimization" is where, if you ask a Task AGI to paint one car pink, it just paints one car pink and then stops, rather than tiling the galaxies with pink-painted cars, because it's not optimizing all that hard. It's okay with, you know, just painting one car pink; it isn't driven to absolutely max out the twentieth decimal place of its car-painting score.
Other suggested names for soft optimization have included "sufficient optimization", "minimum viable solution", "pretty good optimization", "moderate optimization", "regularized optimization", "sensible optimization", "casual optimization", "adequate optimization", "good-not-great optimization", "lenient optimization", and "optimehzation".
Soft optimization is complementary to low impact. A low impact AGI might try to paint one car pink while minimizing its other footprint or how many other things changed, but it would (if a maximizer rather than a soft optimizer) be trying as hard as possible to minimize that impact and drive it down as close to zero as possible, which might come with its own set of pathologies. What we really want is both properties. We want the AGI to paint one car pink in a way that gets the impact pretty low and then, you know, that's good enough - not have a cognitive pressure to search through weird extremes looking for a way to decrease the twentieth decimal place of the impact. This would tend to break a low impact measure which contained even a subtle flaw, where a soft-optimizing AGI might not put as much pressure on the low impact measure and hence be less likely to break it.
Quantilization (roughly: pick an action at random from among the top fraction of some base distribution over actions, rather than the single highest-scoring action) doesn't seem like exactly what we actually want, for multiple reasons. E.g., if long-run good outcomes are very improbable given the status quo, it seems like a quantilizer's policies would fall short of them in the long run (a similar problem seems like it might appear in impact measures which imply that very good long-run outcomes have very high impact).
The key idea that appears in quantilizing is that a quantilizer isn't just as happy to rewrite itself as a maximizer, and isn't just as happy to implement a policy that involves constructing a more powerful optimizer in the environment.
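For concreteness, here is a minimal sketch of one standard formulation of quantilization, assuming a uniform base distribution over a finite set of candidate actions; the action names, utilities, and the choice q = 0.5 are made up for illustration. The point is that the extreme "build a stronger optimizer" plan is merely one of many acceptable candidates rather than the guaranteed argmax.

```python
import math
import random

def quantilize(actions, expected_utility, q=0.1, rng=random):
    # Rank candidate actions (drawn here from a uniform base distribution)
    # by expected utility and sample uniformly from the top q fraction,
    # instead of deterministically taking the argmax.
    ranked = sorted(actions, key=expected_utility, reverse=True)
    top = ranked[: max(1, math.ceil(q * len(ranked)))]
    return rng.choice(top)

actions = {f"ordinary pink-painting plan {i}": 0.5 + 0.001 * i for i in range(99)}
actions["build an unrestrained maximizer to do it"] = 1.0

print(quantilize(list(actions), actions.get, q=0.5))
# With q = 0.5 the extreme plan is one of ~50 candidates, so it's chosen with
# probability around 2% rather than with certainty - the quantilizer is not
# simply a maximizer in disguise, though it can still pick extreme actions
# occasionally.
```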
Soft optimization relates directly to one of the three core reasons why aligning at-least-partially superhuman AGI is hard - making very powerful optimization pressures flow through the system puts a lot of stress on its potential weaknesses and flaws. To the extent we can get soft optimization stable, it might take some of the critical-failure pressure off other parts of the system. (Though again, basic security mindset says to still try to get all the parts of the system as flawless as possible and not tolerate any known flaws in them, then build the fallback options in case they're flawed anyway; one should not deliberately rely on the fallbacks and intend them to be activated.)
Soft optimization seems strongly complementary to low impact and satisficing utility functions. Something that's merely low-impact might exhibit pathological behavior from trying to drive side impacts down to absolutely zero. Something that merely optimizes softly might find some 'weak' or 'not actually trying that hard' solution which nonetheless ended up turning the galaxies into pink-painted cars. Something with a non-aggregative utility function whose maximum achievable utility is not too hard to reach might still go to tremendous lengths to drive the probability of achievement to nearly 1. Something that optimizes softly and has a low impact penalty and has a small, clearly achievable goal, seems much more like the sort of agent that might, you know, just paint the damn car pink and then stop.
Soft optimization can be seen as a further desideratum of the currently open Other-izer Problem - besides being workable for bounded agents and reflectively stable, we'd also like an other-izer idiom that has a (stable) softness parameter.