The word 'value', in the context of the value alignment problem, is reserved to indicate our own full, normative goal - the property or meta-property that the speaker wants to see in the final outcome that results from running an AI (e.g., human flourishing, Fun, coherent extrapolated volition, 'normative goodness', etc.). Since different viewpoints can exist on this subject, the word 'value' acts as a metasyntactic placeholder for different viewpoints about the end goal of the value alignment problem.
Different viewpoints exist on the value achievement dilemma (how we ought to proceed, faced with the issues of Artificial Intelligence and other problems). For example, someone who holds to some negation of the Orthogonality Thesis might think that there's no need to talk separately about 'what programmers should normatively try to achieve' versus 'what the advanced agent will actually try to do', since they think any sufficiently advanced mind will be internally compelled by normative goals. However, someone who does believe the Orthogonality Thesis will think they need separate terms reserved to designate 'what the AI actually does' versus their own concept of normative value.
We should not try to pre-emptively settle this debate in our mere definitions of terms - we should not define our words so as to make it linguistically impossible to state one side of the argument explicitly. Therefore, our terminology should be rich enough to express all the distinctions that enough people think might be real distinctions. So, in the case above, we need separate linguistic terms for 'what the programmers should want' and 'what the AI actually does', in order to even consider someone's argument that the two concepts are identical. Similarly, if some people think that talk of 'should', 'ought', 'good', or 'normativity' is misguided, but others think it is not, then we need a language rich enough to carry out that argument, and we need to avoid ambiguity while doing so.
In this context, 'value' is being reserved as a metasyntactic variable that means 'the thing we're trying to get the AI to do', which could mean (depending on the speaker's viewpoint):
Note that in all these cases we might worry about, e.g., the problems of trying to write a formal utility function that can identify 'value', or the question of whether the AI's preference framework will be stable under self-modification. So there can exist analyses of the 'value alignment problem' that mostly modularize the technical issues and separate them from the debate over what the 'value' in question is or should be.
There are limits to this modularity. For example, if all AI goals are held to automatically converge to a universal standard of normativity, or if the speaker believes that any starting set of AI goals leads to an equally interesting intergalactic civilization and this is all that should matter, these views may reasonably imply that the value alignment problem is moot. See 'Failures of modularity of value' below.
Propositions like Complexity of Value implicitly have a policy component, in that the thesis states that whatever property the speaker thinks is valuable, or would think is valuable after reviewing all related arguments (theoretically a policy question), that property has high algorithmic complexity (an empirical question, once conditioned on the policy). In context, it has a probability bar because of the frequency with which people can be persuaded out of an explicit claim that the only thing they value is something of low algorithmic complexity.
Obviously, a listing like this will only summarize some long debates elsewhere, but it at least lets us point to some example views that have been advocated, rather than indefinitely deferring the question of what the heck 'value' might refer to.
Some of the major views that have been accepted by more than one person are as follows:
The following versions of desiderata for AI outcomes would tend to imply that the value alignment / value loading problem is an entirely wrong way of looking at the issue, which might make it disingenuous to claim that 'value' in 'value alignment' covers them as a metasyntactic variable as well:
Many issues in value alignment seem to generalize very well across the Reflective Equilibrium, Fun Theory, Intuitive Desiderata, and Deflationary Error Theory viewpoints. In all cases we would have to consider stability under self-modification, the Edge Instantiation problem in value identification, and most of the rest of 'standard' value alignment theory. This seemingly good generalization of the resulting technical problems across such wide-ranging viewpoints - especially the fact that it (arguably) covers the case of intuitive desiderata - is what justifies treating 'value' as a metasyntactic variable in 'value loading problem'.
In the putative simple values case, the value identification problem might be relatively simpler than the previous viewpoints hold, negating Complexity of Value and making e.g. Edge Instantiation less of a resistant problem. E.g., Juergen Schmidhuber stated at the 20XX Singularity Summit that he thought the only proper and normative goal of any agent was to increase compression of sensory information [todo: exact quote, exact Summit]. Conditioned on this being the sum of all normativity, 'value' is algorithmically simple. Then the problems of Edge Instantiation, Unforeseen Maximums, and Nearest Unblocked Neighbor are all moot. (Except perhaps insofar as there is an Ontology Identification problem for defining exactly what constitutes 'sensory information' for a self-modifying agent.) The value loading problem would still exist (it would still be necessary to build an AI that cared about X in the first place), as would the associated problem of reflective stability (it would still be necessary to build an AI that went on caring about X through self-modification). Nonetheless, the overall problem difficulty and immediate technical priorities would be different enough that the Simple Values case seems importantly distinct from e.g. Fun Theory on a policy level.
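To make 'algorithmically simple' concrete, here is a minimal sketch of what a compression-based utility might look like. This is not Schmidhuber's actual formalism; the zlib compressor and the byte-string encoding of sensory history are illustrative assumptions, chosen only to show that the whole specification fits in a few lines, in contrast to what the Complexity of Value thesis says about human values.

```python
import os
import zlib

def compressed_size(history: bytes) -> int:
    # Length of the agent's sensory record after compression; zlib is a
    # stand-in for whatever compressor the agent actually uses.
    return len(zlib.compress(history, 9))

def compression_progress(old_history: bytes, new_history: bytes) -> int:
    # Toy 'value': how many bytes of the newly appended observations were
    # 'explained away' by compression, i.e. raw growth of the history
    # minus growth of its compressed form.
    added = len(new_history) - len(old_history)
    return added - (compressed_size(new_history) - compressed_size(old_history))

# Example: predictable observations score higher than incompressible noise.
predictable = compression_progress(b"", bytes(1000))    # a run of zeros
noisy = compression_progress(b"", os.urandom(1000))     # random bytes
assert predictable > noisy
```

Nothing in the sketch addresses ontology identification (what counts as 'sensory information' after self-modification) or reflective stability; the point is only the brevity of the value specification itself.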
Some viewpoints on 'value' completely reject Orthogonality. For example, strong versions of moral internalism claim, as an empirical prediction, that every sufficiently powerful cognitive agent will come to pursue the same end, which is to be identified with normativity and is the only proper object of human desire. This would indeed seem to imply that the entire value alignment problem is moot for advanced agents.
Similarly, someone might believe as a proposition of fact that all (accessible) AI designs would have 'innate' desires, believe as a proposition of fact that no AI would become superintelligent enough to wipe out humanity or prevent the existence of other AIs, and assert as a matter of morality that a good outcome consists of everyone being free to pursue their own values and trade. In this case the value alignment problem is being implied to be an entirely wrong way to look at the issue, with all the associated technical issues moot, so it again might be disingenuous to have 'value' as a metasyntactic variable try to cover this case.