LESSWRONG
LW

Wikitags

Convergent strategies of self-modification

Edited by Eliezer Yudkowsky, et al. last updated 18th May 2016

Any which has acquired sufficient to understand that it has code, and that this code is relevant to achieving its goals, would acquire subgoals relating to its code. (Unless this default is .) For example, an agent that wants (only) to produce smiles or make paperclips, whose code contains a shutdown procedure, because it will lead to fewer future smiles or paperclips. (This preference is not spontaneous/exogenous/unnatural but arises from the execution of the code itself; the code is .)

Besides agents whose policy options directly include self-modification options, big-picture-savvy agents whose code cannot directly access itself might also, e.g., try to (a) crack the platform it is running on to gain unintended access, (b) use a robot to operate an outside programming console with special privileges, (c) into modifying it in various ways, (d) building a new subagent in the environment which has the preferred code, or (e) using environmental, material means to manipulate its material embodiment despite its lack of direct self-access.

An AI with sufficient big-picture savviness to understand its programmers as agents with beliefs, might attempt to .

Some implicit self-modification pressures could arise from in cases where the AI is optimizing for Y and there is an internal property X which is relevant to the achievement of Y. In this case, optimizing for Y could implicitly optimize over the internal property X even if the AI lacks an explicit model of how X affects Y.

Parents:
reflectively inconsistent
conceal its self-modifications
averted
manipulate the programmers
Discussion1
Discussion1
big-picture savviness
by default
consequentialist agent
Convergent instrumental strategies
implicit consequentialism
will not want this shutdown procedure to execute