summary: Different people advocate different views on what we should want for the outcome of a 'value aligned' AI (desiderata like human flourishing, or a fun-theoretic eudaimonia, or coherent extrapolated volition, or an AI that mostly leaves us alone but protects us from other AIs). These differences might not be irreconcilable; people are sometimes persuaded to change their views of what we should want. Either way, there's (arguably) a tremendous overlap in the technical issues for aligning an AI with any of these goals. So in the technical discussion, 'value' is really a metasyntactic variable that stands in for the speaker's current view, or for what an AI project might later adopt as a reasonable target after further discussion.
Clickbait: The word 'value' in 'value alignment' is an unknown variable that indicates someone's future goals for AI and intelligent life.
In the context of value alignment as a subject, the word 'value' is a speaker-dependent variable that indicates our ultimate goal - the property or meta-property that the speaker wants or 'should want' to see in the final outcome of Earth-originating intelligent life. E.g.: human flourishing, Fun, coherent extrapolated volition, Normativity.
Different viewpoints are still being debated on this topic; people sometimes change their minds about their views. We don't yet have full knowledge of which views are 'reasonable' in the sense that people with good cognitive skills might retain them even in the limit of ongoing discussion. Some subtypes of potentially internally coherent views may not be sufficiently Interpersonalizable for even very small AI projects to cooperate on them; if, e.g., Alice wants to own the whole world and will go on wanting that in the limit of continuing contemplation, this is not a desideratum on which Alice, Bob, and Carol can all cooperate. Thus, using 'value' as a potentially speaker-dependent variable isn't meant to imply that everyone has their own 'value' and that no further debate or cooperation is possible; people can and do talk each other out of positions which are then regarded as having been mistaken, and completely incommunicable stances seem unlikely to be reified even into a very small AI project. But since this debate is ongoing, there is not yet any one definition of 'value' that can be regarded as settled.
E.g., Juergen Schmidhuber stated at the 20XX Singularity Summit that he thought the only proper and normative goal of any agent was to increase compression of sensory information.
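To make the 'metasyntactic variable' reading concrete, here is a minimal sketch in Python (all names and toy scoring rules are hypothetical, invented purely for illustration and not drawn from any real project): the candidate views above are interchangeable stand-ins for 'value', and the surrounding technical machinery is written generically over whichever one gets plugged in.

```python
# Minimal illustrative sketch, not a real alignment API: 'value' as a
# placeholder that the rest of the technical machinery is generic over.
from typing import Callable, Dict

# A candidate 'value' specification: some scoring of possible outcomes.
ValueSpec = Callable[[str], float]  # outcome description -> how good it is

# Toy stand-ins for the candidate views named above (purely hypothetical).
candidate_values: Dict[str, ValueSpec] = {
    "human_flourishing": lambda outcome: float("flourishing" in outcome),
    "fun_theoretic_eudaimonia": lambda outcome: float("fun" in outcome),
    "coherent_extrapolated_volition": lambda outcome: float("cev" in outcome),
    "leave_us_alone_but_protect_us": lambda outcome: float("protected" in outcome),
}

def evaluate_outcome(value: ValueSpec, outcome: str) -> float:
    """The same technical questions (goal stability, value loading, and so on)
    arise whichever ValueSpec is substituted here; 'value' is just the
    variable that the speaker's current view fills in."""
    return value(outcome)

# Usage: the surrounding machinery does not change when the view changes.
for name, value in candidate_values.items():
    evaluate_outcome(value, "a flourishing, protected civilization")
```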
Consider a Genie with an explicit preference framework targeted on a Do What I Know I Mean system for making checked wishes. The word 'value' in any discussion thereof should still only be used to refer to whatever the AI creators are targeting for real-world outcomes. We would say the 'value alignment problem' had been successfully solved to the extent that running the Genie produced high-value outcomes in the sense of the humans' viewpoint on 'value', not to the extent that the outcome matched the Genie's preference framework for how to follow orders.
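A brief sketch of that success criterion (hypothetical names and toy scoring functions throughout, standing in for anything real): the yardstick is the humans' 'value' applied to real-world outcomes, not the degree to which the outcome satisfies the Genie's internal wish-checking framework.

```python
# Illustrative sketch only (hypothetical names): which yardstick counts.
from typing import Callable

Outcome = str
Score = float

def value_alignment_success(outcome: Outcome,
                            humans_value: Callable[[Outcome], Score]) -> Score:
    """Degree to which running the Genie produced a high-value outcome,
    judged by the humans' 'value' (whatever that variable ends up denoting)."""
    return humans_value(outcome)

def framework_satisfaction(outcome: Outcome,
                           genie_preferences: Callable[[Outcome], Score]) -> Score:
    """Degree to which the outcome matched the Genie's preference framework
    for how to follow checked wishes -- this is not the measure of success."""
    return genie_preferences(outcome)
```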
The following versions of desiderata for AI outcomes would tend to imply that the value alignment / value loading problem is an entirely wrong way of looking at the issue, which might make it disingenuous to claim that 'value' in 'value alignment' can cover them as a metasyntactic variable as well: