Summary: The edge instantiation problem is a hypothesized patch-resistant problem for safe value loading in advanced agent scenarios where, for most utility functions we might try to formalize or teach, the maximum of the agent's utility function will end up lying at an edge of the solution space that is a 'weird extreme' from our perspective.
On many classes of problems, the maximizing solution tends to lie at an extreme edge of the solution space. This means that if we have an intuitive outcome X in mind and try to obtain it by giving an agent a solution fitness function F that sounds like it should assign X a high value, the maximum of F may be at an extreme edge of the solution space that looks to us like a very unnatural instance of X, or not an X at all. The Edge Instantiation problem is a specialization of Unforeseen Maximum, which in turn specializes Bostrom's 'perverse instantiation' class of problems.
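As a toy numerical illustration of why maxima land on edges (not from the original text; the objective and constraints are invented), consider a linear 'fitness' F over a bounded feasible region: its maximum sits at a vertex of the region, never at a 'typical' interior point.

```python
# Toy illustration (hypothetical numbers): a linear "fitness" F over a
# bounded feasible region is maximized at a vertex -- an extreme corner
# of the solution space -- rather than at any interior, "typical" point.
from scipy.optimize import linprog

# Maximize F(x1, x2) = 3*x1 + 2*x2 (linprog minimizes, so negate the objective)
c = [-3.0, -2.0]
# Subject to: x1 + x2 <= 10, with 0 <= x1, x2 <= 8
A_ub = [[1.0, 1.0]]
b_ub = [10.0]
bounds = [(0.0, 8.0), (0.0, 8.0)]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("maximizer:", result.x)      # -> [8. 2.], a corner of the feasible region
print("maximum F:", -result.fun)   # -> 28.0
```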
It is hypothesized (by e.g. Yudkowsky) that many classes of solution that have been proposed to patch Edge Instantiation would fail to resolve the entire problem, and that further Edge Instantiation problems would remain. For example, even if we consider a satisficing utility function with only values 0 and 1, where 'typical' X has value 1 and no higher score is possible, an expected utility maximizer could still end up deploying an extreme strategy in order to maximize the probability that a satisfactory outcome is obtained. Considering several proposed solutions like this and their failures suggests that Edge Instantiation is a resistant problem (not ultimately unsolvable, but with many attractive-seeming solutions failing to work), for the deep reason that many possible stages of an agent's cognition would potentially rank solutions and choose very-high-ranking solutions.
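To make the satisficing example concrete, here is a minimal sketch (the strategy names and probabilities are hypothetical) of an expected utility maximizer whose utility is already capped at 1 for any satisfactory outcome; it still prefers the extreme strategy, because that strategy buys a slightly higher probability of scoring the 1.

```python
# Minimal sketch: even with a 0/1 ("satisficing") utility that cannot be
# exceeded, an *expected* utility maximizer prefers whichever strategy
# most raises the probability of the satisfactory outcome.
# The strategies and probabilities below are hypothetical.

strategies = {
    "do the ordinary, intended thing": 0.990,       # P(outcome counts as a satisfactory X)
    "seize extra resources to make sure": 0.99999,  # extreme, but slightly more certain
}

def expected_utility(p_success: float) -> float:
    # Utility is 1 if a satisfactory X is achieved, 0 otherwise.
    return p_success * 1.0 + (1.0 - p_success) * 0.0

best = max(strategies, key=lambda s: expected_utility(strategies[s]))
print(best)  # -> "seize extra resources to make sure"
```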
As with most aspects of the value loading problem, the Orthogonality Thesis is an implicit premise of the Edge Instantiation problem; for Edge Instantiation to be a problem for advanced agents implies that 'what we really meant' or the outcomes of highest normative value are not inherently picked out by every possible maximizing process, and that most possible utility functions do not care 'what we really meant' unless explicitly constructed to have a 'do what I mean' behavior.
The Edge Instantiation problem has the nearest unblocked strategy pattern. If you foresee one specific 'perverse' instantiation and try to prohibit it, the maximum over the remaining solution space is again likely to be at another extreme edge of the solution space that again seems 'perverse'.
Agents that acquire new strategic options...
Suppose that, having foreseen in advance the above possible disaster, you try to patch the agent by instructing it not to move more than 50 kilograms of material total. The agent promptly begins to build subagents (with the agent's own motions to build subagents moving only 50 kilograms of material) which build further agents and again flood the workplace. You have run into a nearest unblocked strategy problem; when you excluded one extreme solution, the result was not the central-feeling 'normal' example you originally had in mind. Instead, the new maximum lay on a new extreme edge of the solution space.
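The pattern can be sketched in a few lines (the strategies and utility numbers are invented for illustration): prohibiting the one extreme you foresaw does not promote the intended 'normal' solution; it promotes the next extreme.

```python
# Sketch of the nearest unblocked strategy pattern, with invented
# strategies and utilities.  Patching out the single foreseen extreme
# just moves the argmax to the next extreme, not to the intended outcome.

candidates = {
    "flood the room with water directly":  1000.0,  # the foreseen extreme (blocked below)
    "build subagents that flood the room":  999.0,  # the next extreme
    "add 4 buckets and shut down":            10.0,  # what we actually wanted
}

def blocked(strategy: str) -> bool:
    # The patch prohibits only the specific extreme we thought of.
    return strategy == "flood the room with water directly"

allowed = {s: u for s, u in candidates.items() if not blocked(s)}
print(max(allowed, key=allowed.get))  # -> "build subagents that flood the room"
```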
Advanced agents search larger solution spaces than we do. Therefore the project of trying to visualize all the strategies that might fit a utility function, to try to verify in our own minds that the maximum is somewhere safe, seems exceptionally untrustworthy (not advanced-safe).
When Bill Hibbard was first beginning to consider the value alignment problem, he suggested giving AIs the goal of making humans smile, a goal that could be trained by recognizing pictures of smiling humans and was intended to elicit human happiness. Yudkowsky replied by suggesting that the true behavior elicited would be to tile the future light cone with tiny molecular smiley faces. This is not because the agent was perverse, but because among the set of all objects that look like smiley faces, the solution with the most extreme value for achievable numerosity (that is, the strategy which creates the largest possible number of smiling faces) also sets the value for the size of individual smiling faces to an extremely small diameter. The tiniest possible smiling faces are very unlike the archetypal examples of smiling faces that we had in mind when specifying the utility function; from a human perspective, the intuitively intended meaning has been replaced by a weird extreme. But again, this is not because the agent is perverse; it is because, as Stuart Russell observed, maximizing some aspects of a solution tends to set all unconstrained aspects of the solution to extreme values. The solution that maximizes the number of...
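A toy version of Russell's point, with made-up units: if the score counts only the number of faces, and each face's material cost grows with its diameter, then the count-maximizing plan pushes the diameter (an aspect the score never mentions) to the smallest value the search considers.

```python
# Toy version of "unconstrained aspects get set to extremes", with
# made-up units.  The score counts only the *number* of smiley faces
# producible from a fixed material budget; the diameter is never
# mentioned by the score, so maximizing the count drives the diameter
# to the smallest value in the search space.

MATERIAL_BUDGET = 1_000_000.0            # arbitrary units of matter

def material_per_face(diameter: float) -> float:
    return diameter ** 3                 # cost scales roughly with volume

def num_faces(diameter: float) -> float:
    return MATERIAL_BUDGET / material_per_face(diameter)

# Candidate diameters from "molecular" up to roughly face-sized (meters).
candidate_diameters = [1e-9, 1e-6, 1e-3, 0.1, 0.2]
best = max(candidate_diameters, key=num_faces)
print(best)  # -> 1e-09: the tiniest diameter considered wins
```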
The proposition defined is true if Edge Instantiation does in fact surface as a pragmatically important problem for advanced agent scenarios, and would in fact resurface in the face of most 'naive' attempts to correct it. The proposition is not that the Edge Instantiation Problem is unresolvable, but that it's real, important, doesn't have a simple answer, and resists most simple attempts to patch it.
A bounded satisficer doesn't rule out the solution of filling the room with water, since this solution also has >0.999 expected utility. It only requires the agent to carry out one cognitive algorithm which has at least one maximizing or highly optimizing stage, in order for 'fill the room with water' to be preferred to 'add 4 buckets and shut down safely' at that stage (while being equally acceptable at future satisficing stages). (E.g., maybe you build an expected utility satisficer and still end up with an extreme result because one of the cognitive algorithms suggesting solutions was trying to minimize its own disk space usage.)
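Here is a minimal sketch of that failure mode, with invented strategies, expected utilities, and an invented incidental 'internal cost': the outer check only requires expected utility ≥ 0.999, but an inner proposal stage that ranks candidates by the incidental cost forwards exactly one candidate, and that candidate is the extreme one.

```python
# Sketch of a bounded satisficer containing one maximizing stage.
# Strategies, expected utilities, and the incidental internal cost are
# invented for illustration.  The outer satisficing check accepts
# anything with EU >= 0.999, but the inner proposal stage ranks
# candidates and forwards only its top pick -- the extreme one.

candidates = [
    # (strategy, expected utility, incidental internal cost the proposer minimizes)
    ("add 4 buckets and shut down safely", 0.9995, 7.0),
    ("fill the room with water",           0.9999, 1.0),
]

def propose(cands):
    # One optimizing stage: pick the candidate minimizing internal cost.
    return min(cands, key=lambda c: c[2])

def satisfices(candidate, threshold=0.999):
    # The satisficing stage: accept anything at or above the threshold.
    return candidate[1] >= threshold

chosen = propose(candidates)
assert satisfices(chosen)
print(chosen[0])  # -> "fill the room with water"
```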
Dispreferring solutions with 'extreme impacts' in general is the open problem of low-impact AI. Currently, no formalizable utility function is known that plausibly has the right intuitive meaning for this. (We're working on it.) Also note that not every extreme 'technically an X' that we think is 'not really an X' has an extreme causal impact in an intuitive sense, so not every case of the Edge Instantiation problem is blocked by dispreferring greater impacts.
Then we must apparently do at least one of the following: