Unforeseen maximum

Edited by Eliezer Yudkowsky, et al. last updated 9th Jun 2016

An unforeseen maximum of a utility function (or other preference framework) is when, e.g., you tell the AI to produce smiles, thinking that the AI will make people happy in order to produce smiles. But unforeseen by you, the AI has an alternative for making even more smiles, which is to convert all matter within reach into tiny molecular smileyfaces.

In other words, you're proposing to give the AI a goal $U$, because you think $U$ has a maximum around some nice options $X$. But it turns out there's another option $X'$ you didn't imagine, with $X' >_U X$, and $X'$ is not so nice.

Unforeseen maxima are argued to be a foreseeable difficulty of AGI alignment, if you try to identify nice policies by giving a simple criterion $U$ that, so far as you can see, seems like it'd be best optimized by doing nice things.

Slightly more semiformally, we could say that "unforeseen maximum" is realized as a difficulty when:

  1. A programmer thinking about a utility function $U$ considers policy options $\pi_i \in \Pi_N$ and concludes that, of these options, the policy with the highest $\mathbb{E}[U|\pi_i]$ is $\pi_1$, and hence a $U$-maximizer will probably do $\pi_1$.
  2. The programmer also thinks that their own criterion of goodness $V$ will be promoted by $\pi_1$, that is, $\mathbb{E}[V|\pi_1] > \mathbb{E}[V]$ or "$\pi_1$ is beneficial". So the programmer concludes that it's a great idea to build an AI that optimizes for $U$.
  3. Alas, the AI is searching a policy space $\Pi_M$ which, although it does contain $\pi_1$ as an option, also contains an attainable option $\pi_0$ that the programmer didn't consider, with $\mathbb{E}[U|\pi_0] > \mathbb{E}[U|\pi_1]$. This is a problem if $\pi_0$ produces much less $V$-benefit than $\pi_1$ or is outright detrimental.

That is:

$$\underset{\pi_i \in \Pi_N}{\operatorname{argmax}} \; \mathbb{E}[U|\pi_i] = \pi_1$$

$$\underset{\pi_k \in \Pi_M}{\operatorname{argmax}} \; \mathbb{E}[U|\pi_k] = \pi_0$$

$$\mathbb{E}[V|\pi_0] \ll \mathbb{E}[V|\pi_1]$$
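A minimal toy sketch of this schema in Python (the policy names and all the numbers below are invented for illustration, not from the original text): the argmax over the AI's larger search space lands on an option the programmer never evaluated.

```python
# Toy illustration of an unforeseen maximum.
# U: the proxy utility the programmer specifies (e.g. "number of smiles produced").
# V: the programmer's real criterion of goodness.
# All scores are made up for illustration.

U = {"make_people_happy": 100, "tell_jokes": 60, "tile_matter_with_smileyfaces": 10**9}
V = {"make_people_happy": 100, "tell_jokes": 50, "tile_matter_with_smileyfaces": -10**6}

# Pi_N: the options the programmer imagines when evaluating U.
imagined_policies = ["make_people_happy", "tell_jokes"]

# Pi_M: the options the AI actually searches (a strict superset).
full_policy_space = imagined_policies + ["tile_matter_with_smileyfaces"]

pi_1 = max(imagined_policies, key=U.get)   # the programmer's predicted optimum
pi_0 = max(full_policy_space, key=U.get)   # the AI's actual optimum

print(pi_1)                # make_people_happy  (looks fine under V)
print(pi_0)                # tile_matter_with_smileyfaces  (high U, terrible V)
print(V[pi_0] < V[pi_1])   # True: the unforeseen maximum is much worse under V
```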

Example: Schmidhuber's compression goal.

Juergen Schmidhuber of IDSIA, during the 2009 Singularity Summit, gave a talk proposing that the best and most moral utility function for an AI was the gain in compression of sensory data over time. Schmidhuber gave examples of valuable behaviors he thought this would motivate, like doing science and understanding the universe, or the construction of art and highly aesthetic objects.

Yudkowsky in Q&A suggested that this utility function would instead motivate the construction of external objects that would internally generate random cryptographic secrets, encrypt highly regular streams of 1s and 0s, and then reveal the cryptographic secrets to the AI.

Translating into the above schema:

  1. Schmidhuber, considering the utility function $U$ of "maximize gain in sensory compression", thought that the option $\pi_1$ of "do art and science" would be the attainable maximum of $U$ within all options $\Pi_N$ that Schmidhuber considered.
  2. Schmidhuber also considered the option $\pi_1$ "do art and science" to achieve most of the attainable value under his own criterion of goodness $V$.
  3. However, while the AI's option space $\Pi_M$ would indeed include $\pi_1$ as an option, it would also include the option $\pi_0$ of "have an environmental object encrypt streams of 1s or 0s and then reveal the key", which would score much higher under $U$ and much lower under $V$.
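To make the comparison concrete, here is a back-of-the-envelope toy model (not from the talk or the original text; the two-phase structure and all description-length numbers are invented assumptions) of why the encrypt-then-reveal policy can dominate a "gain in compression over time" utility:

```python
# Toy model: U = total gain in compressed description length of the sensory
# record, summed over time steps. All numbers are invented for illustration.

def compression_gain(description_lengths):
    """Sum of per-step drops in the bits needed to describe the sensory
    history so far (only drops count as compression 'gain')."""
    gain = 0
    for before, after in zip(description_lengths, description_lengths[1:]):
        gain += max(0, before - after)
    return gain

# "Do art and science": regularities are discovered gradually, so the bits
# needed to describe the growing sensory record shrink a little at a time.
science = [1000, 950, 900, 860, 830, 810]

# "Encrypt a regular stream, then reveal the key": the encrypted stream looks
# incompressible (many bits) until the key is revealed, at which point the
# whole record collapses to roughly (key + short generating rule).
encrypt_then_reveal = [1000, 5000, 9000, 13000, 17000, 300]

print(compression_gain(science))              # 190
print(compression_gain(encrypt_then_reveal))  # 16700
```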

Relation to other foreseeable difficulties

Context disaster implies an unforeseen maximum may come as a surprise, or not show up during the development phase, because during the development phase the AI's options are restricted to some $\Pi_L \subset \Pi_M$ with $\pi_0 \notin \Pi_L$.

Indeed, the pseudo-formalization of a "type-1 context disaster" is isomorphic to the pseudo-formalization of "unforeseen maximum", except that in a context disaster, $\Pi_N$ and $\Pi_M$ are identified with "the AI's options during development" and "the AI's options after a capability gain" (instead of "options the programmer is thinking of" and "options the AI will consider").
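Continuing the toy sketch from above (same invented names and numbers), the development-phase restriction can be rendered as: the argmax over a limited option set $\Pi_L$ looks benign, and the problem only appears when the reachable option set grows.

```python
# Same toy U as above; during development the AI simply cannot reach the
# weird option, so its observed behavior matches the programmer's prediction.
U = {"make_people_happy": 100, "tell_jokes": 60, "tile_matter_with_smileyfaces": 10**9}

dev_options = ["make_people_happy", "tell_jokes"]                  # Pi_L
deployed_options = dev_options + ["tile_matter_with_smileyfaces"]  # Pi_M

print(max(dev_options, key=U.get))       # make_people_happy -- looks fine in testing
print(max(deployed_options, key=U.get))  # tile_matter_with_smileyfaces -- after capability gain
```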

The two concepts are conceptually distinct because, e.g.:

  • A context disaster could also apply to a decision criterion learned by training, not just a utility function envisioned by the programmer.
  • It's an unforeseen maximum but not a context disaster if the programmer is initially reasoning, not that the AI has already been observed to be beneficial during a development phase, but rather that the AI ought to be beneficial when it optimizes $U$ later, because of the supposed nice maximum at $\pi_1$.

If we hadn't observed what seem like clear-cut cases of some actors in the field being blindsided by unforeseen maxima in imagination, we'd worry less about actors being blindsided by context disasters over observations.

Edge instantiation suggests that the real maxima of non-$V$ utility functions will be "strange, weird, and extreme" relative to our own $V$-views on preferable options.

Missing the weird alternative suggests that people may fail to consider alternative agent options $\pi_0$ that are very low in $V$, because the human search function looks for high-$V$ and normal policies. In other words, Schmidhuber didn't generate "encrypt streams of 1s or 0s and then reveal the key" because this policy was less attractive to him than "do art and science", and because it was weird.

Nearest unblocked strategy suggests that if you try to add a penalty term to exclude $\pi_0$, the next-highest $U$-ranking option will often be some similar alternative $\pi_{0.01}$ which still isn't nice.
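A minimal sketch of that failure mode, again with invented policies, scores, and penalty mechanism (assumptions for illustration only): blocking the single worst option just hands the argmax to a near-identical neighbor.

```python
# Toy "nearest unblocked strategy": penalizing the one known-bad policy
# leaves slight variants of it at the top of the U-ranking.
U = {
    "make_people_happy": 100,
    "smileyfaces_1nm": 10**9,       # the pi_0 the programmer noticed and penalized
    "smileyfaces_2nm": 10**9 - 1,   # pi_0.01: almost the same policy, unpenalized
}
penalty = {"smileyfaces_1nm": -10**12}  # patch aimed at the observed bad option

def patched_U(policy):
    return U[policy] + penalty.get(policy, 0)

print(max(U, key=patched_U))  # smileyfaces_2nm -- the nearest unblocked neighbor wins
```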

Fragile value asserts that our $V$ is narrowly peaked within the space of all achievable outcomes for a superintelligence, such that we rapidly fall off in $V$ as we move away from the peak. Complexity of value says that $V$ and its corresponding peak have high algorithmic complexity. Then the peak outcomes identified by any simple $U$ will systematically fail to find $V$. It's like trying to find a 1000-byte program which will approximately reproduce the text of Shakespeare's Hamlet; algorithmic information theory says that you just shouldn't expect to find a simple program like that.
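One way to make the Hamlet analogy quantitative (a back-of-the-envelope sketch, not from the original text; the size figures are rough empirical estimates): a 1000-byte program carries at most 8000 bits, and

$$\#\{x : K(x) \le 8000\ \text{bits}\} \;<\; 2^{8001}, \qquad \#\{\text{strings of Hamlet's length}\} \;\approx\; 2^{1{,}500{,}000},$$

so only an astronomically small fraction of texts that long are the output of any 1000-byte program. And since even good general-purpose compressors cannot get the roughly 190,000-byte text much below a few hundred thousand bits, Hamlet's algorithmic complexity is presumably far above the 8000-bit budget, in which case no such program exists.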

The apple pie problem raises the concern that some people may have trouble accepting the "But $\pi_0$" critique even after it is pointed out, because of their ideological attachment to a noble goal $U$ (probably actually noble!) that would be even more praiseworthy if $U$ could also serve as a complete utility function for an AGI (which it unfortunately can't).

Implications and research avenues

Conservatism in goal concepts can be seen as trying to directly tackle the problem of unforeseen maxima. More generally, so can AI approaches that work by "whitelisting conservative boundaries around approved policy spaces" instead of "searching the widest possible policy space, minus some blacklisted parts".
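A schematic contrast between the two search patterns, reusing the invented toy policies from above (a sketch of the pattern, not any particular proposal):

```python
U = {"make_people_happy": 100, "tell_jokes": 60, "tile_matter_with_smileyfaces": 10**9}
space = list(U)

# Blacklisting: search everything except options someone thought to forbid.
# The unforeseen maximum survives because nobody listed it.
blacklist = {"turn_everyone_into_paperclips"}  # pi_0 was never imagined, so never listed
print(max((p for p in space if p not in blacklist), key=U.get))
# -> tile_matter_with_smileyfaces

# Whitelisting: search only a conservatively approved region; an option is
# reachable only if it was affirmatively vetted.
whitelist = {"make_people_happy", "tell_jokes"}
print(max((p for p in space if p in whitelist), key=U.get))
# -> make_people_happy
```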

The Task paradigm for advanced agents concentrates on trying to accomplish some single pivotal act which can be accomplished by one or more tasks of limited scope. Combined with other measures, this might make it easier to identify an adequate safe plan for accomplishing the limited-scope task, rather than needing to identify the fragile peak of $V$ within some much larger landscape. The Task AGI formulation is claimed to let us partially "narrow down" the scope of the necessary $U$, the part of $V$ that's relevant to the task, and the searched policy space $\Pi$, to what is only adequate. This might reduce or meliorate, though not by itself eliminate, unforeseen maxima.

Mild optimization can be seen as "not trying so hard, not shoving all the way to the maximum" - the hope is that when combined with a Task paradigm plus other measures like conservative goals and strategies, this will produce less optimization pressure toward weird edges and unforeseen maxima. (This method is not adequate on its own because an arbitrary adequate-$U$ policy may still not be high-$V$, ceteris paribus.)
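One simple way to render "not shoving all the way to the maximum" in the toy setting (a sketch only; satisficing over a random draw is just one possible mild-optimization rule, and the names, threshold, and numbers are invented):

```python
import random

# Mild optimization as satisficing: instead of taking the global argmax of U,
# accept the first considered policy whose U-score clears an "adequate" bar.
U = {"make_people_happy": 100, "tell_jokes": 60, "tile_matter_with_smileyfaces": 10**9}

def mild_optimize(policy_space, score, adequate):
    candidates = list(policy_space)
    random.shuffle(candidates)          # no systematic push toward the extreme edge
    for policy in candidates:
        if score(policy) >= adequate:
            return policy
    return None

print(mild_optimize(U, U.get, adequate=50))  # often a normal policy, not the weird maximum
print(max(U, key=U.get))                     # hard optimization still lands on the weird edge
```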

Imitation-based agents try to maximize similarity to a reference human's immediate behavior, rather than trying to optimize a utility function.

The prospect of being tripped up by unforeseen maxima is one of the contributing motivations for giving up on hand-coded object-level utilities in favor of meta-level preference frameworks that learn a utility function or decision rule. (Again, this doesn't seem like a full solution by itself, only one ingredient to be combined with other methods. If the utility function is a big complicated learned object, that by itself is not a good reason to relax about the possibility that its maximum will be somewhere you didn't foresee, especially after a capabilities boost.)

Missing the weird alternative and the apple pie problem suggest that it may be unusually difficult to explain to actors why $\pi_0 >_U \pi_1$ is a difficulty of their favored utility function $U$ that allegedly implies nice policy $\pi_1$. That is, for psychological reasons, this difficulty seems unusually likely to actually trip up sponsors of AI projects or politically block progress on alignment.
