Summary: A proposal meant to produce value-aligned agents is 'advanced-safe' if it succeeds, or fails safely, in scenarios where the AI becomes much smarter than its human developers.
Concretely, it has been argued to be foreseeable for several difficulties, including e.g. Programmer Deception and Unforeseen Maximums, that they won't materialize before an agent is advanced, or won't materialize in the same way, or won't materialize as severely. This means that practice with dumber-than-human AIs may not train us against these difficulties, requiring a separate theory and mental discipline for making advanced AIs safe.
Non-advanced-safe methodologies may conceivably be useful if a known-algorithm non-recursive agent can be created that (a) is powerful enough to be relevant and (b) can be known not to become advanced. Even here there may be grounds for worry that such an agent finds unexpectedly strong strategies in some particular subdomain - that it exhibits flashes of domain-specific advancement that break a non-advanced-safe methodology.
See: real-world agents should be omni-safe.
It seems reasonable to expect that there are problems like Edge Instantiation or Unforeseen Maximums which don't materialize before an agent is advanced, or don't materialize in the same way, or as severely. There will be difficulties of dealing with minds smarter than our own, doing things we didn't imagine, that will be qualitatively different from designing a toaster oven to not burn down a house, or from designing an AI system that is dumber than human. This means that the concept of 'advanced safety' will end up importantly different from the concept of robust pre-advanced AI.
We have observed in practice that many proposals for 'AI safety' do not seem to have been thought through against advanced agent scenarios; thus, there seems to be a practical urgency to emphasizing the concept and the difference.
As an extreme case, an 'omni-safe' methodology allegedly remains value-aligned, or fails safely, even if the agent suddenly becomes omniscient and omnipotent (acquires delta probability distributions on all facts of interest and has all describable outcomes available as direct options). Thinking about the 'Omni' scenario is meant to highlight any step on which we've presumed, in a non-failsafe way, that the agent must not obtain definite knowledge of some fact or that it must not have access to some strategic option.
E.g., if a new regime has suddenly been entered, posing new Edge Instantiation problems, then perhaps newly available strategic options should not be taken, the agent acting very conservatively pending a programmer consultation. An 'omni-safe' proposal would try to have the above be a rule that applies regardless of what sort of new regime opened up, rather than trying to imagine the probable limits of new regimes and design a rule to operate only inside those guessed limits.
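The contrast between designing inside guessed limits and having one conservative rule that covers every new regime can be sketched in code. This is a toy illustration, not a proposal for actual agent architecture: the regime labels, policy functions, and `Action` type are all hypothetical names invented for the example.

```python
from enum import Enum, auto

class Action(Enum):
    PROCEED = auto()
    CONSULT_PROGRAMMERS = auto()

# Hypothetical labels for the operating regimes the designers anticipated.
GUESSED_REGIMES = {"lab_sandbox", "deployment_v1"}

def guessed_limits_policy(regime: str) -> Action:
    # Non-omni-safe style: the designers imagined the probable limits of
    # future regimes and wrote rules only for those. An unanticipated
    # regime falls through to normal operation, so the guess about which
    # regimes can occur is a load-bearing safety assumption.
    if regime in GUESSED_REGIMES:
        return Action.PROCEED
    return Action.PROCEED  # nothing forces safe behavior here

def omni_safe_policy(regime: str) -> Action:
    # Omni-safe style: the conservative rule applies regardless of what
    # sort of new regime opens up. Anything unrecognized fails safely by
    # deferring to the programmers instead of taking new strategic options.
    if regime in GUESSED_REGIMES:
        return Action.PROCEED
    return Action.CONSULT_PROGRAMMERS
```

The point of the sketch is only structural: the first policy's safety depends on a guess about which regimes will occur, while the second behaves conservatively in every regime outside the anticipated set.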
Running thought experiments against the 'omni' scenario reflects the proposal that a good agent design just shouldn't fail unsafely no matter what knowledge or options it acquires. Why should it, if value alignment and corrigibility have otherwise been handled correctly? Why try to guess what facts an advanced agent can't figure out or what strategic options it can't have? Why make that guess a load-bearing proposition that kills us if we're wrong? Why design an agent that we expect will hurt us if it knows too much or can do too much?
The idea is not so much that we can't lower-bound the speed or upper-bound the power of an advanced agent, as that any problems highlighted by the omni scenario must reflect some kind of underlying flaw in a proposed methodology. Suppose NASA found that an alignment of four planets would cause a rocket's program to crash and the engines to explode. They wouldn't say, "Oh, we're not expecting any alignment like that for the next hundred years, so we're still safe." They'd say, "Wow, that sure was a major bug in the program." Correctly designed programs just shouldn't make the rocket explode under any conditions. If any specific scenario exposes a behavior like that, it shows that some general case is not being handled correctly.
A proposal for a value-alignment methodology, or some aspect of that methodology, is alleged to be 'advanced-safe' if that proposal is claimed to be robust to scenarios where the agent becomes much smarter than its human developers.