We can see the place of AI alignment in the larger scheme by considering its parent problem, its sibling problems, and examples of its child problems and grandchildren.
Clickbait: How can Earth-originating intelligent life achieve most of its potential value, whether by AI or otherwise?
Summary: The value achievement dilemma is the general, broad challenge faced by Earth-originating intelligent life in steering our cosmic endowment into a state of high value - successfully turning the stars into a happy civilization. We face potential existential catastrophes (resulting in our extermination or the corruption of the cosmic endowment) such as the possibility of military use of nanotechnology, non-value-aligned AIs, or insane smart uploads. A strategy is relevant to value achievement only if success is a game-changer for the overall dilemma humanity faces. E.g., value-aligned powerful AIs or intelligence-enhanced humans both seem to qualify as strategically relevant, but an AI restricted to only prove theorems in Zermelo-Fraenkel set theory has no obvious game-changing use.
The value achievement dilemma is a way of framing the AI alignment problem in a larger context. This both emphasizes that there might be possible solutions besides AI, and emphasizes that such solutions must meet a high bar of potency or efficacy in order to resolve our basic dilemmas the way that a sufficiently value-aligned and cognitively powerful AI could resolve them - or at least change the nature of the gameboard, the way that a Task AGI could take actions to prevent destruction by later AGI projects, even if it is only narrowly value-aligned and cannot solve the whole problem.
The point of considering posthuman scenarios in the long run, and not just an immediate Task AGI as a band-aid, can be seen in the suggestion by Eliezer Yudkowsky (todo: find a citation - CFAI? PtS?) and Bostrom that we can see Earth-originating intelligent life as having two possible stable states, superintelligence and extinction. If intelligent life goes extinct, especially if it damages or destroys the ecosphere, it will probably stay extinct. If it becomes superintelligent, it will presumably expand through the universe and stay superintelligent for as long as physically possible. Eventually, our civilization is bound to wander into one of these attractors or another.
Furthermore, by the generic preference stability argument, any sufficiently advanced cognitive agent is very likely to be stable in its motivations or meta-preference framework. So if and when life wanders into the superintelligence attractor, it will either end up in a stable state of, e.g., fun-loving values or the reflective equilibrium of its creators' civilization, hence achieving lots of value, or a misaligned AI will go on maximizing paperclips forever.
Among the dilemmas we face in getting into the high-value-achieving attractor, rather than the extinction attractor or the equivalence class of paperclip maximizers, are:

Other positive events or capacities seem like they could potentially prompt entry into the high-value-achieving superintelligence attractor:
On the other hand, consider someone who proposes that "Rather than building AI, we should build Oracle AIs that just answer questions," and who then, after further exposure to the concept of the AI-Box Experiment and cognitive uncontainability, further narrows their specification to say that an Oracle running in three layers of sandboxed simulation must output only formal proofs of given theorems in Zermelo-Fraenkel set theory, and a heavily sandboxed and provably correct verifier will look over this output proof and signal 1 if it proves the target theorem and 0 otherwise, at some fixed time to avoid timing attacks. This is meant to ensure that the Oracle can only influence a single binary bit of our world in any predictable way.
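To make the intended bottleneck concrete, here is a minimal illustrative sketch - not part of the proposal above, and with the verifier, the proof format, and the release timestamp all as hypothetical stand-ins - of how the single-bit, fixed-time output channel might be wired:

```python
import time

# Illustrative sketch only: a single-bit, fixed-time output channel wrapped
# around a hypothetical ZF proof verifier. The Oracle itself (sandboxed,
# untrusted) is not shown; only its candidate proof text enters here.

FIXED_RELEASE_TIME = 1_900_000_000  # pre-agreed Unix timestamp (hypothetical)


def verify_zf_proof(proof_text: str, target_theorem: str) -> bool:
    """Stand-in for a provably correct checker of formal ZF proofs.

    A real verifier would parse proof_text and check every inference step
    against the ZF axioms and the stated target_theorem.
    """
    raise NotImplementedError("formally verified checker goes here")


def single_bit_channel(proof_text: str, target_theorem: str) -> int:
    """Return 1 iff the proof checks out, released only at the fixed time."""
    try:
        ok = verify_zf_proof(proof_text, target_theorem)
    except Exception:
        ok = False  # any malformed or hostile output is treated as failure
    # Delay the answer until the pre-agreed moment, so the Oracle cannot
    # leak extra bits through response latency (the "timing attacks" above).
    while time.time() < FIXED_RELEASE_TIME:
        time.sleep(1)
    return 1 if ok else 0
```

The particular code doesn't matter; the point is the shape of the constraint: however capable the Oracle, everything it can do to the outside world is routed through one bit released at one pre-agreed time.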
On the other hand, suppose someone proposes, as an intended Relevant Limited AI, a non-self-modifying Genie agent that is only allowed to model non-cognitive material systems and has been constructed and injuncted not to model other agents, whether those agents are human or other hypothetical minds.
An obvious target strategy for a limited Genie is to ask it to create nanotechnology and use that tech to gently shut down all other AI projects, e.g. by copying the software and then sealing the hardware. But this putative Genie can't model agents, so it may not be able to identify only potential AI projects. We could use such a Genie to build nanotechnology and then heal the sick or create lots of food for the hungry, but while this is a conventional good, we haven't yet identified any path to victory that stops other projects from building AI, or lets us create intelligence-enhanced humans (unless this can be done without modeling human minds or other agents at all).