Pre-adolescent children haven't felt strong lust yet. Those of us who've avoided strong pain are also missing an experience of that affect. Nostalgia can come up very early, but does require a bit of living first. Depression can strike at any age.
So, in general, there are emotions and feelings that people are capable of feeling, in the right circumstances, but that some people have not yet felt.
As a thought experiment, I'd thus like to introduce the emotion of exiert, which no human being has yet felt, because it's only triggered by (very) unusual experiences. Maybe it's something that can only be felt after a decade of weightlessness, or after the human brain has adapted to living for a long time within the atmosphere of a gas giant.
Let's assume that exiert has some survival uses - it's helpful in navigating the gas giant's ever-changing geography, say - and it can load onto both positive and negative affect (just as confusion and surprise can). Assume also that experiencing exiert can affect people's aesthetic preferences - it can cause them to like certain colour schemes, or open them to different types of pacing and tone in films or theatre productions.
In the spirit of trying to deduce what human values really are, the question is then: is the AI overriding human preferences in any of the following situations:
1. The AI preemptively precludes the experience of exiert, so that we won't experience this emotion even if we are put in the unusual circumstances.
2. Suppose that we wouldn't "naturally" experience exiert, but the AI acts to ensure that we would (in those unusual circumstances).
3. The AI acts to ensure that some human who had not yet experienced strong lust, nostalgia, or extreme pain could never experience that feeling.
4. Suppose some human would not "naturally" experience strong lust, nostalgia, or extreme pain, but the AI acts to ensure that they would.
5. The AI acts to ensure that some human experiences exiert, but that it is triggered by normal circumstances, rather than unusual ones.
I thought a bit about whether the existence of such as-yet-unfelt feelings is plausible, and I believe I came up with one real-life example:
Depending on whether one is willing to qualify the following as a feeling or an emotion, feeling truly and completely anonymous may be one of those feelings that was not "fully" realizable for generations in the past, but is now possible thanks to the internet. The most immersive example of that weird experience of anonymity to date may be something like VRchat, which incidentally seems to lead to some rather peculiar behaviors (e.g. search YouTube for some "VRchat uganda knuckles" videos).
I feel like there's a persistent assumption that not even a well-aligned AI will include human choices as a step in decisions like these. Maybe human choice will just be a checkbox in the overall puppeteering of circumstances that the AI carries out, so keen are its predictions, but for it to go completely unmentioned in any of the hypotheticals seems like a glaring omission to me.
Yes, I am worried that getting humans to endorse its decisions is very easy, so it would be better if an AI were well aligned without having to ask for human choices. But the whole system is ultimately grounded in human choice, at least in terms of idealised meta-preferences. I just feel that actually asking is a poor and easily manipulated way of eliciting these choices, unless it's done very carefully.
"Let's assume that exiert has some survival uses" - If no human has ever experienced it, why would it have survival usages? I can imagine this if exiert is a combination or strengthened version of other, more common experiences, but it seems unlikely if it is something qualitatively different.
What is more plausible is that there might be some states that we would have experienced during evolution, but which we no longer experience in the modern world. These would then be more likely to have survival uses.
This is one of those cases where ontology becomes more important, which means you should start by separating concepts like sensations, feelings, emotions, and moods. Using a naive ontology with unclear boundaries is unlikely to lead to a good answer.
Intuitively, it'd be overriding preferences in 1 (but only if pre-exiert humans generally approve of the existence of post-exiert humans. If post-exiert humans had significant enough value drift that humans would willingly avoid situations that cause exiert, then 1 wouldn't be a preference override),
wouldn't in 2 (but only if the AI informs humans that [weird condition]->[exiert] first),
would in 3 for lust and nostalgia (because there are lots of post-[emotion] people who approve of the existence of the emotion, and pre-[emotion] people don't seem to regard post-[emotion] people with horror), but not for intense pain (because neither post-pain people nor pre-pain people endorse its presence),
wouldn't in 4 for lust and nostalgia, and would for pain, for basically the inverse reasons
and wouldn't be overriding preferences in 5 (but only if pre-exiert humans generally approve of the existence of post-exiert humans)
OK, what rule am I using here? It seems to be something like: "if neither pre-[experience] nor post-[experience] people regard it as very bad to undergo [experience] or the associated value changes, then it is overriding human preferences to remove the option of undergoing [experience]; and if pre-[experience] or post-[experience] people regard it as very bad to undergo [experience] or the associated value changes, then it is not overriding human preferences to remove that option".
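The rule above can be sketched as a small predicate. This is just an illustrative sketch: the function name, the boolean approval flags, and the example case labels are all my assumptions, not anything established in the thread.

```python
def removal_overrides_preferences(pre_approve: bool, post_approve: bool) -> bool:
    """Sketch of the proposed rule: removing the option of undergoing an
    experience counts as overriding human preferences only when neither
    pre- nor post-experience people regard undergoing it (or the associated
    value changes) as very bad, i.e. both groups approve."""
    return pre_approve and post_approve

# Illustrative application to the cases discussed above; the approval
# flags are assumptions for the sake of the example, not established facts.
cases = {
    "exiert (case 1)": (True, True),          # both approve -> removal overrides
    "lust/nostalgia (case 3)": (True, True),  # both approve -> removal overrides
    "intense pain (case 3)": (False, False),  # neither endorses -> removal doesn't
}
for name, (pre, post) in cases.items():
    print(name, "-> override if removed:", removal_overrides_preferences(pre, post))
```

The predicate only captures the "remove the option" direction of the rule; the "ensure the experience happens" cases would need the inverse check.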
Seems like you need to run a Bayesian model where the AI has a prior distribution over the value of exiert and carries out some exploration/exploitation tradeoff, as in bandit problems.
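A minimal sketch of that suggestion, using Thompson sampling as the exploration/exploitation strategy: the AI keeps a Beta posterior over the value of each option ("induce exiert" vs. "leave alone") and samples from those posteriors to choose. All names, probabilities, and outcome values here are illustrative assumptions, not a claim about how such an AI would actually be built.

```python
import random

random.seed(0)

# [successes, failures] counts, i.e. a Beta(s, f) posterior per arm,
# starting from a uniform Beta(1, 1) prior.
arms = {"induce_exiert": [1, 1], "leave_alone": [1, 1]}
# Hidden "true" probability that each option turns out well (purely illustrative).
true_value = {"induce_exiert": 0.7, "leave_alone": 0.5}

for _ in range(1000):
    # Thompson sampling: draw a plausible value for each arm from its
    # posterior and act on whichever draw is highest.
    choice = max(arms, key=lambda a: random.betavariate(arms[a][0], arms[a][1]))
    # Observe a simulated outcome and update that arm's posterior counts.
    if random.random() < true_value[choice]:
        arms[choice][0] += 1
    else:
        arms[choice][1] += 1

# Posterior mean value of each arm after exploration.
for name, (s, f) in arms.items():
    print(name, round(s / (s + f), 2))
```

Early rounds explore both arms; as evidence accumulates, the sampling concentrates on whichever arm looks more valuable, which is the bandit-style tradeoff the comment gestures at.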