summary: Different people advocate different views on what we should want for the outcome of a 'value aligned' AI (desiderata like human flourishing, or a fun-theoretic eudaimonia, or coherent extrapolated volition, or an AI that mostly leaves us alone but protects us from other AIs). These differences might not be irreconcilable; people are sometimes persuaded to change their views of what we should want. Either way, there's (arguably) a tremendous overlap in the technical issues for aligning an AI with any of these goals. So in the technical discussion, 'value' is really a metasyntactic variable that stands in for the speaker's current view, or for what an AI project might later adopt as a reasonable target after further discussion.
Clickbait: The word 'value' in 'value alignment' is an unknown variable that indicates someone's future goals for AI and intelligent life.
In the context of value alignment as a subject, the word 'value' is a speaker-dependent variable that indicates our ultimate goal - the property or meta-property that the speaker wants or 'should want' to see in the final outcome of Earth-originating intelligent life. E.g.: human flourishing, Fun, coherent extrapolated volition, Normativity.
Different viewpoints are still being debated on this topic; people sometimes change their minds about their views. We don't yet have full knowledge of which views are 'reasonable' in the sense that people with good cognitive skills might retain them even in the limit of ongoing discussion. Some subtypes of potentially internally coherent views may not be sufficiently Interpersonalizable for even very small AI projects to cooperate on them; if, e.g., Alice wants to own the whole world and will go on wanting that in the limit of continuing contemplation, this is not a desideratum on which Alice, Bob, and Carol can all cooperate. Thus, using 'value' as a potentially speaker-dependent variable isn't meant to imply that everyone has their own 'value' and that no further debate or cooperation is possible; people can and do talk each other out of positions which are then regarded as having been mistaken, and completely incommunicable stances seem unlikely to be reified even into a very small AI project. But since this debate is ongoing, there is not yet any one definition of 'value' that can be regarded as settled.
E.g., Juergen Schmidhuber stated at the 20XX Singularity Summit that he thought the only proper and normative goal of any agent was to increase compression of sensory information.
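To make the 'metasyntactic variable' reading concrete, here is a minimal sketch in Python (all names and toy scoring rules are hypothetical, invented purely for illustration and not drawn from any real project): the candidate views above are interchangeable stand-ins for 'value', and the surrounding technical machinery is written generically over whichever one gets plugged in.

```python
# Minimal illustrative sketch, not a real alignment API: 'value' as a
# placeholder that the rest of the technical machinery is generic over.
from typing import Callable, Dict

# A candidate 'value' specification: some scoring of possible outcomes.
ValueSpec = Callable[[str], float]  # outcome description -> how good it is

# Toy stand-ins for the candidate views named above (purely hypothetical).
candidate_values: Dict[str, ValueSpec] = {
    "human_flourishing": lambda outcome: float("flourishing" in outcome),
    "fun_theoretic_eudaimonia": lambda outcome: float("fun" in outcome),
    "coherent_extrapolated_volition": lambda outcome: float("cev" in outcome),
    "leave_us_alone_but_protect_us": lambda outcome: float("protected" in outcome),
}

def evaluate_outcome(value: ValueSpec, outcome: str) -> float:
    """The same technical questions (goal stability, value loading, and so on)
    arise whichever ValueSpec is substituted here; 'value' is just the
    variable that the speaker's current view fills in."""
    return value(outcome)

# Usage: the surrounding machinery does not change when the view changes.
for name, value in candidate_values.items():
    evaluate_outcome(value, "a flourishing, protected civilization")
```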
Consider a Genie with an explicit preference framework targeted on a Do What I Know I Mean system for making checked wishes. The word 'value' in any discussion thereof should still only be used to refer to whatever the AI creators are targeting for real-world outcomes. We would say the 'value alignment problem' had been successfully solved to the extent that running the Genie produced high-value outcomes in the sense of the humans' viewpoint on 'value', not to the extent that the outcome matched the Genie's preference framework for how to follow orders.
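A brief sketch of that success criterion (hypothetical names and toy scoring functions throughout, standing in for anything real): the yardstick is the humans' 'value' applied to real-world outcomes, not the degree to which the outcome satisfies the Genie's internal wish-checking framework.

```python
# Illustrative sketch only (hypothetical names): which yardstick counts.
from typing import Callable

Outcome = str
Score = float

def value_alignment_success(outcome: Outcome,
                            humans_value: Callable[[Outcome], Score]) -> Score:
    """Degree to which running the Genie produced a high-value outcome,
    judged by the humans' 'value' (whatever that variable ends up denoting)."""
    return humans_value(outcome)

def framework_satisfaction(outcome: Outcome,
                           genie_preferences: Callable[[Outcome], Score]) -> Score:
    """Degree to which the outcome matched the Genie's preference framework
    for how to follow checked wishes -- this is not the measure of success."""
    return genie_preferences(outcome)
```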
The following versions of desiderata for AI outcomes would tend to imply that the value alignment / value loading problem is an entirely wrong way of looking at the issue, which might make it disingenuous to claim that 'value' in 'value alignment' can cover them as a metasyntactic variable as well: