Value achievement dilemma

Edited by Eliezer Yudkowsky, et al. last updated 2nd Feb 2017

The value achievement dilemma is a way of framing the AI alignment problem in a larger context. This emphasizes that there might be possible solutions besides AI; and also emphasizes that such solutions must meet a high bar of potency or efficacy in order to resolve our basic dilemmas, the way that a sufficiently value-aligned and cognitively powerful AI could resolve our basic dilemmas. Or at least change the nature of the gameboard, the way that a Task AGI could take actions to prevent destruction by later AGI projects, even if it is only narrowly value-aligned and cannot solve the whole problem.

The point of considering posthuman scenarios in the long run, and not just an immediate Task AGI as a band-aid, can be seen in the suggestion by Eliezer Yudkowsky and Nick Bostrom that we can see Earth-originating intelligent life as having two possible stable states: superintelligence and extinction. If intelligent life goes extinct, especially if it drastically damages or destroys the ecosphere in the process, new intelligent life seems unlikely to arise on Earth. If Earth-originating intelligent life becomes superintelligent, it will presumably expand through the universe and stay superintelligent for as long as physically possible. Eventually, our civilization is bound to wander into one of these attractors or the other.

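Furthermore, by the generic preference stability argument, any sufficiently advanced cognitive agent is very likely to be stable in its motivations or meta-preference framework. So if and when life wanders into the superintelligence attractor, it will either end up in a stable state of e.g. fun-loving or the reflective equilibrium of its creators' civilization and hence achieve lots of value, or a misaligned AI will go on maximizing paperclips forever.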

Among the dilemmas we face in getting into the high-value-achieving attractor, rather than the extinction attractor or the equivalence class of paperclip maximizers, are:

  • The possibility of careless (or insufficiently cautious, or, much less likely, malicious) actors creating a non-value-aligned AI that undergoes an intelligence explosion.
  • The possibility of engineered superviruses destroying enough of civilization that the remaining humans go extinct without ever reaching sufficiently advanced technology.
  • Conflict between multipolar powers with nanotechnology resulting in a super-nuclear-exchange disaster that extinguishes all life.

Other positive events seem like they could potentially prompt entry into the high-value-achieving superintelligence attractor:

  • Direct creation of a fully normatively aligned Autonomous AGI agent.
  • Creation of a Task AGI powerful enough to avert the creation of other UnFriendly AI.
  • Intelligence-augmented humans (or 64-node clustered humans linked by brain-computer interfaces for information exchange, etcetera) who are able and motivated to solve the AI alignment problem.

On the other hand, consider someone who proposes that "Rather than building AI, we should build Oracle AIs that just answer questions," and who then, after further exposure to the concept of the AI-Box Experiment and cognitive uncontainability, further narrows their specification to say that an Oracle running in three layers of sandboxed simulation must output only formal proofs of given theorems in Zermelo-Fraenkel set theory, and a heavily sandboxed and provably correct verifier will look over this output proof and signal 1 if it proves the target theorem and 0 otherwise, at some fixed time to avoid timing attacks.

This doesn't resolve the larger value achievement dilemma, because there's no obvious thing we can do with a ZF provability oracle that solves our larger problem. There's no plan such that it would save the world if only we could take some suspected theorems of ZF and know that some of them had formal proofs.
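
To make the narrowed proposal concrete, here is a minimal sketch of the "signal 1 or 0 at some fixed time" channel. It is purely illustrative: `one_bit_verdict` and `toy_check` are hypothetical names, and `toy_check` is a trivial stand-in, not a verifier for Zermelo-Fraenkel proofs. The sketch also does not bound how long the checker runs, which a real design would have to do for the fixed release time to mean anything.

```python
import time
from typing import Callable


def toy_check(theorem: str, proof: str) -> bool:
    """Hypothetical stand-in for a proof verifier (NOT a real ZF checker).

    It only tests whether the last line of the 'proof' is the target theorem.
    """
    lines = proof.strip().splitlines()
    return bool(lines) and lines[-1].strip() == theorem


def one_bit_verdict(
    check_proof: Callable[[str, str], bool],
    theorem: str,
    proof: str,
    release_at: float,
) -> int:
    """Return 1 if the checker accepts the proof and 0 otherwise.

    The answer is withheld until the monotonic clock reaches `release_at`,
    so the moment of release does not depend on how long checking took
    (assuming checking finishes before `release_at`).
    """
    try:
        accepted = bool(check_proof(theorem, proof))
    except Exception:
        # Any checker failure collapses to the same single bit as a rejection.
        accepted = False

    # Wait until the fixed release time; loop in case sleep returns early.
    while True:
        remaining = release_at - time.monotonic()
        if remaining <= 0:
            break
        time.sleep(remaining)

    return 1 if accepted else 0


if __name__ == "__main__":
    deadline = time.monotonic() + 0.1
    print(one_bit_verdict(toy_check, "Q", "P\nP -> Q\nQ", deadline))
```

Even granting that every layer of such a channel works as intended, all it ever emits is one bit per suspected theorem, which is the limitation the surrounding argument turns on.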

The thrust of considering a larger 'value achievement dilemma' is that while imaginable alternatives to aligned AIs exist, they must pass a double test to be our best alternative:

  • They must be genuinely easier or safer than the easiest (pivotal) form of the AI alignment problem.
  • They must be game-changers for the overall situation in which we find ourselves, opening up a clear path to victory from the newly achieved scenario.

Any strategy that does not putatively open a clear path to victory if it succeeds doesn't seem like a plausible policy alternative to trying to solve the AI alignment problem, or to doing something else such that success leaves us a clear path to victory. Trying to solve the AI alignment problem is intended to leave us a clear path to achieving almost all of the achievable value for the future and its astronomical stakes. Anything that doesn't open a clear path to getting there is not an alternative solution for getting there.

For more on this point, see the page on pivotal events.

Subproblems of the larger value achievement dilemma

We can see the place of AI alignment in the larger scheme by considering its parent problem, its sibling problems, and examples of its child problems.

  • The value achievement dilemma: How does Earth-originating intelligent life achieve an acceptable proportion of its potential value?
    • The AI alignment problem: How do we create AIs such that running them produces (global) outcomes of acceptably high value?
      • The value alignment problem: How do we create AIs that want or prefer to cause events that are of high value? If we accept that we should solve the value alignment problem by creating AIs that prefer or want in particular ways, how do we do that?
        • The value identification or value learning problem: How can we pinpoint, in the AI's decision-making, outcomes that have high 'value'? (Despite all the foreseeable difficulties such as edge instantiation and Goodhart's Curse.)
      • Other properties of aligned AIs such as corrigibility: How can we create AIs such that, when we make an error in identifying value or specifying the decision system, the AI does not resist our attempts to correct what we regard as an error?
      • Oppositional features such as boxing that are intended to mitigate harm if the AI's behavior has gone outside expected bounds.
    • The intelligence amplification problem. How can we create smarter humans, preferably without driving them insane or otherwise ending up with evil ones?
    • The value selection problem. How can we figure out what to substitute in for the metasyntactic variable 'value'? (Answer.)