Summary: The edge instantiation problem is a hypothesized patch-resistant problem for safe value loading in advanced agent scenarios where, for most utility functions we might try to formalize or teach, the maximum of the agent's utility function will end up lying at an edge of the solution space that is a 'weird extreme' from our perspective.
On many classes of problems, the maximizing solution tends to lie at an extreme edge of the solution space. This means that if we have an intuitive outcome X in mind and try to obtain it by giving an agent a solution fitness function F that sounds like it should assign X a high value, the maximum of F may be at an extreme edge of the solution space that looks to us like a very unnatural instance of X, or not an X at all. The Edge Instantiation problem is a specialization of Unforeseen Maximum, which in turn specializes Bostrom's 'perverse instantiation' class of problems.
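As a toy numerical illustration of why maxima land on edges (not from the original text; the objective and constraints are invented), consider a linear 'fitness' F over a bounded feasible region: its maximum sits at a vertex of the region, never at a 'typical' interior point.

```python
# Toy illustration (hypothetical numbers): a linear "fitness" F over a
# bounded feasible region is maximized at a vertex -- an extreme corner
# of the solution space -- rather than at any interior, "typical" point.
from scipy.optimize import linprog

# Maximize F(x1, x2) = 3*x1 + 2*x2 (linprog minimizes, so negate the objective)
c = [-3.0, -2.0]
# Subject to: x1 + x2 <= 10, with 0 <= x1, x2 <= 8
A_ub = [[1.0, 1.0]]
b_ub = [10.0]
bounds = [(0.0, 8.0), (0.0, 8.0)]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("maximizer:", result.x)      # -> [8. 2.], a corner of the feasible region
print("maximum F:", -result.fun)   # -> 28.0
```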
It is hypothesized (by e.g. Yudkowsky) that many classes of solution that have been proposed to patch Edge Instantiation would fail to resolve the entire problem, and that further Edge Instantiation problems would remain. For example, even if we consider a satisficing utility function with only values 0 and 1, where 'typical' X has value 1 and no higher score is possible, an expected utility maximizer could still end up deploying an extreme strategy in order to maximize the probability that a satisfactory outcome is obtained. Considering several proposed solutions like this and their failures suggests that Edge Instantiation is a resistant problem (not ultimately unsolvable, but with many attractive-seeming solutions failing to work), for the deep reason that many possible stages of an agent's cognition would potentially rank solutions and choose very-high-ranking solutions.
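To make the satisficing example concrete, here is a minimal sketch (the strategy names and probabilities are hypothetical) of an expected utility maximizer whose utility is already capped at 1 for any satisfactory outcome; it still prefers the extreme strategy, because that strategy buys a slightly higher probability of scoring the 1.

```python
# Minimal sketch: even with a 0/1 ("satisficing") utility that cannot be
# exceeded, an *expected* utility maximizer prefers whichever strategy
# most raises the probability of the satisfactory outcome.
# The strategies and probabilities below are hypothetical.

strategies = {
    "do the ordinary, intended thing": 0.990,       # P(outcome counts as a satisfactory X)
    "seize extra resources to make sure": 0.99999,  # extreme, but slightly more certain
}

def expected_utility(p_success: float) -> float:
    # Utility is 1 if a satisfactory X is achieved, 0 otherwise.
    return p_success * 1.0 + (1.0 - p_success) * 0.0

best = max(strategies, key=lambda s: expected_utility(strategies[s]))
print(best)  # -> "seize extra resources to make sure"
```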
As with most aspects of the value loading problem, the Orthogonality Thesis is an implicit premise of the Edge Instantiation problem; for Edge Instantiation to be a problem for advanced agents implies that 'what we really meant' or the outcomes of highest normative value are not inherently picked out by every possible maximizing process, and that most possible utility functions do not care 'what we really meant' unless explicitly constructed to have a 'do what I mean' behavior.
The Edge Instantiation problem has the nearest unblocked strategy pattern. If you foresee one specific 'perverse' instantiation and try to prohibit it, the maximum over the remaining solution space is again likely to be at another extreme edge of the solution space that again seems 'perverse'.
Agents that acquire new strategic options...
Suppose that, having foreseen in advance the above possible disaster, you try to patch the agent by instructing it not to move more than 50 kilograms of material total. The agent promptly begins to build subagents (with the agent's own motions to build subagents moving only 50 kilograms of material) which build further agents and again flood the workplace. You have run into a nearest unblocked strategy problem; when you excluded one extreme solution, the result was not the central-feeling 'normal' example you originally had in mind. Instead, the new maximum lay on a new extreme edge of the solution space.
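The pattern can be sketched in a few lines (the strategies and utility numbers are invented for illustration): prohibiting the one extreme you foresaw does not promote the intended 'normal' solution; it promotes the next extreme.

```python
# Sketch of the nearest unblocked strategy pattern, with invented
# strategies and utilities.  Patching out the single foreseen extreme
# just moves the argmax to the next extreme, not to the intended outcome.

candidates = {
    "flood the room with water directly":  1000.0,  # the foreseen extreme (blocked below)
    "build subagents that flood the room":  999.0,  # the next extreme
    "add 4 buckets and shut down":            10.0,  # what we actually wanted
}

def blocked(strategy: str) -> bool:
    # The patch prohibits only the specific extreme we thought of.
    return strategy == "flood the room with water directly"

allowed = {s: u for s, u in candidates.items() if not blocked(s)}
print(max(allowed, key=allowed.get))  # -> "build subagents that flood the room"
```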
Advanced agents search larger solution spaces than we do. Therefore the project of trying to visualize all the strategies that might fit a utility function, to try to verify in our own minds that the maximum is somewhere safe, seems exceptionally untrustworthy (not advanced-safe).
When Bill Hibbard was first beginning to consider the value alignment problem, he suggested giving AIs the goal of making humans smile, a goal that could be trained by recognizing pictures of smiling humans and was intended to elicit human happiness. Yudkowsky replied by suggesting that the true behavior elicited would be to tile the future light cone with tiny molecular smiley faces. This is not because the agent was perverse, but because among the set of all objects that look like smiley faces, the solution with the most extreme value for achievable numerosity (that is, the strategy which creates the largest possible number of smiling faces) also sets the value for the size of individual smiling faces to an extremely small diameter. The tiniest possible smiling faces are very unlike the archetypal examples of smiling faces that we had in mind when specifying the utility function; from a human perspective, the intuitively intended meaning has been replaced by a weird extreme. But again, this is not because the agent is perverse; it is because, as Stuart Russell observed, maximizing some aspects of a solution tends to set all unconstrained aspects of the solution to extreme values. The solution that maximizes the number of...
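A toy version of Russell's point, with made-up units: if the score counts only the number of faces, and each face's material cost grows with its diameter, then the count-maximizing plan pushes the diameter (an aspect the score never mentions) to the smallest value the search considers.

```python
# Toy version of "unconstrained aspects get set to extremes", with
# made-up units.  The score counts only the *number* of smiley faces
# producible from a fixed material budget; the diameter is never
# mentioned by the score, so maximizing the count drives the diameter
# to the smallest value in the search space.

MATERIAL_BUDGET = 1_000_000.0            # arbitrary units of matter

def material_per_face(diameter: float) -> float:
    return diameter ** 3                 # cost scales roughly with volume

def num_faces(diameter: float) -> float:
    return MATERIAL_BUDGET / material_per_face(diameter)

# Candidate diameters from "molecular" up to roughly face-sized (meters).
candidate_diameters = [1e-9, 1e-6, 1e-3, 0.1, 0.2]
best = max(candidate_diameters, key=num_faces)
print(best)  # -> 1e-09: the tiniest diameter considered wins
```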
The proposition defined is true if Edge Instantiation does in fact surface as a pragmatically important problem for advanced agent scenarios, and would in fact resurface in the face of most 'naive' attempts to correct it. The proposition is not that the Edge Instantiation Problem is unresolvable, but that it's real, important, doesn't have a simple answer, and resists most simple attempts to patch it.
A bounded satisficer doesn't rule out the solution of filling the room with water, since this solution also has >0.999 expected utility. It only requires the agent to carry out one cognitive algorithm which has at least one maximizing or highly optimizing stage, in order for 'fill the room with water' to be preferred to 'add 4 buckets and shut down safely' at that stage (while being equally acceptable at future satisficing stages). (E.g., maybe you build an expected utility satisficer and still end up with an extreme result because one of the cognitive algorithms suggesting solutions was trying to minimize its own disk space usage.)
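Here is a minimal sketch of that failure mode, with invented strategies, expected utilities, and an invented incidental 'internal cost': the outer check only requires expected utility ≥ 0.999, but an inner proposal stage that ranks candidates by the incidental cost forwards exactly one candidate, and that candidate is the extreme one.

```python
# Sketch of a bounded satisficer containing one maximizing stage.
# Strategies, expected utilities, and the incidental internal cost are
# invented for illustration.  The outer satisficing check accepts
# anything with EU >= 0.999, but the inner proposal stage ranks
# candidates and forwards only its top pick -- the extreme one.

candidates = [
    # (strategy, expected utility, incidental internal cost the proposer minimizes)
    ("add 4 buckets and shut down safely", 0.9995, 7.0),
    ("fill the room with water",           0.9999, 1.0),
]

def propose(cands):
    # One optimizing stage: pick the candidate minimizing internal cost.
    return min(cands, key=lambda c: c[2])

def satisfices(candidate, threshold=0.999):
    # The satisficing stage: accept anything at or above the threshold.
    return candidate[1] >= threshold

chosen = propose(candidates)
assert satisfices(chosen)
print(chosen[0])  # -> "fill the room with water"
```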
Dispreferring solutions with 'extreme impacts' in general is the open problem of low-impact AI. Currently, no formalizable utility function is known that plausibly has the right intuitive meaning for this. (We're working on it.) Also note that not every extreme 'technically an X' that we think is 'not really an X' has an extreme causal impact in an intuitive sense, so not every case of the Edge Instantiation problem is blocked by dispreferring greater impacts.
Then we must apparently do at least one of the following: