The word 'value', in the context of the value alignment problem, is reserved to indicate our own full, normative goal - the property or meta-property that the speaker wants to see in the final outcome that results from running an AI (e.g., human flourishing, Fun, coherent extrapolated volition, 'normative goodness', etc.). Since different viewpoints can exist on this subject, the word 'value' acts as a metasyntactic placeholder for different viewpoints about the end goal of the value alignment problem.
Different viewpoints exist on the value achievement dilemma (how we ought to proceed, faced with the issues of Artificial Intelligence and other problems). For example, someone who holds to some negation of the Orthogonality Thesis might think that there's no need to talk separately about 'what programmers should normatively try to achieve' versus 'what the advanced agent will actually try to do', since they think any sufficiently advanced mind will be internally compelled by normative goals. However, someone who does believe the Orthogonality Thesis will think they need separate terms reserved to designate 'what the AI actually does' versus their own concept of normative value.
We should not try to pre-emptively settle this debate in our mere definitions of terms - we should not define our words so as to make it linguistically impossible to state one side of the argument explicitly. Therefore, our terminology should be rich enough to express all the distinctions that enough people think might be real distinctions. So, in the case above, we need separate linguistic terms for 'what the programmers should want' and 'what the AI actually does', in order to even consider someone's argument that the two concepts are identical. Similarly, if some people think that talk of 'should', 'ought', 'good', or 'normativity' is misguided, but others think it is not, then we need a language rich enough to carry out that argument, and we need to avoid ambiguity while doing so.
In this context, 'value' is being reserved as a metasyntactic variable that means 'the thing we're trying to get the AI to do', which could mean (depending on the speaker's viewpoint):
Note that in all these cases we might worry about, e.g., the problems of trying to write a formal utility function that can identify 'value', or the question of whether the AI's preference framework will be stable under self-modification. So there can exist analyses of the 'value alignment problem' that mostly modularize the technical issues and separate them from the debate over what the 'value' in question is or should be.
There are limits to this modularity. For example, if all AI goals are held to automatically converge to a universal standard of normativity, or if the speaker believes that any starting set of AI goals leads to an equally interesting intergalactic civilization and this is all that should matter, these views may reasonably imply that the value alignment problem is moot. See 'Failures of modularity of value' below.
Propositions like Complexity of Value implicitly have a policy component, in that the thesis states that whatever property the speaker thinks is valuable, or would think is valuable after reviewing all related arguments (theoretically a policy question), that property has high algorithmic complexity (an empirical question, once conditioned on the policy). In context, it has a probability bar because of the frequency with which people can be persuaded out of an explicit claim that the only thing they value is something of low algorithmic complexity.
Obviously, a listing like this will only summarize some long debates elsewhere, but it at least lets us point to some example views that have been advocated, rather than indefinitely deferring the question of what the heck 'value' might refer to.
Some of the major views that have been accepted by more than one person are as follows:
The following versions of desiderata for AI outcomes would tend to imply that the value alignment / value loading problem is an entirely wrong way of looking at the issue, which might make it disingenuous to claim that 'value' in 'value alignment' covers them as a metasyntactic variable as well:
Many issues in value alignment seem to generalize very well across the Reflective Equilibrium, Fun Theory, Intuitive Desiderata, and Deflationary Error Theory viewpoints. In all cases we would have to consider stability under self-modification, the Edge Instantiation problem in value identification, and most of the rest of 'standard' value alignment theory. This seemingly good generalization of the resulting technical problems across such wide-ranging viewpoints - especially the fact that it (arguably) covers the case of intuitive desiderata - is what justifies treating 'value' as a metasyntactic variable in 'value loading problem'.
In the putative simple values case, the value identification problem might be relatively simpler than the previous viewpoints hold, negating Complexity of Value and making e.g. Edge Instantiation less of a resistant problem. E.g., Juergen Schmidhuber stated at the 20XX Singularity Summit that he thought the only proper and normative goal of any agent was to increase compression of sensory information [todo: exact quote, exact Summit]. Conditioned on this being the sum of all normativity, 'value' is algorithmically simple. Then the problems of Edge Instantiation, Unforeseen Maximums, and Nearest Unblocked Neighbor are all moot. (Except perhaps insofar as there is an Ontology Identification problem for defining exactly what constitutes 'sensory information' for a self-modifying agent.) The value loading problem would still exist (it would still be necessary to build an AI that cared about X in the first place), as would the associated problem of reflective stability (it would still be necessary to build an AI that went on caring about X through self-modification). Nonetheless, the overall problem difficulty and immediate technical priorities would be different enough that the Simple Values case seems importantly distinct from e.g. Fun Theory on a policy level.
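To make 'algorithmically simple' concrete, here is a minimal sketch of what a compression-based utility might look like. This is not Schmidhuber's actual formalism; the zlib compressor and the byte-string encoding of sensory history are illustrative assumptions, chosen only to show that the whole specification fits in a few lines, in contrast to what the Complexity of Value thesis says about human values.

```python
import os
import zlib

def compressed_size(history: bytes) -> int:
    # Length of the agent's sensory record after compression; zlib is a
    # stand-in for whatever compressor the agent actually uses.
    return len(zlib.compress(history, 9))

def compression_progress(old_history: bytes, new_history: bytes) -> int:
    # Toy 'value': how many bytes of the newly appended observations were
    # 'explained away' by compression, i.e. raw growth of the history
    # minus growth of its compressed form.
    added = len(new_history) - len(old_history)
    return added - (compressed_size(new_history) - compressed_size(old_history))

# Example: predictable observations score higher than incompressible noise.
predictable = compression_progress(b"", bytes(1000))    # a run of zeros
noisy = compression_progress(b"", os.urandom(1000))     # random bytes
assert predictable > noisy
```

Nothing in the sketch addresses ontology identification (what counts as 'sensory information' after self-modification) or reflective stability; the point is only the brevity of the value specification itself.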
Some viewpoints on 'value' completely reject Orthogonality. For example, strong versions of moral internalism claim, as an empirical prediction, that every sufficiently powerful cognitive agent will come to pursue the same end, which is to be identified with normativity and is the only proper object of human desire. This would indeed seem to imply that the entire value alignment problem is moot for advanced agents.
Similarly, someone might believe as a proposition of fact that all (accessible) AI designs would have 'innate' desires, believe as a proposition of fact that no AI would become superintelligent enough to wipe out humanity or prevent the existence of other AIs, and assert as a matter of morality that a good outcome consists of everyone being free to pursue their own values and trade. In this case the value alignment problem is being implied to be an entirely wrong way to look at the issue, with all the associated technical issues moot, so it again might be disingenuous to have 'value' as a metasyntactic variable try to cover this case.