For example, you should be able to choose between things that will make no sensory difference to you, such as the well-being of people in Xela.

This is an example of the sort of loose terminology that leads most people into the fog on these sorts of problems. If it makes no sensory difference, then it makes no sensory difference, and there's nothing to care about, as there's nothing to decide between. You can't choose between two identical things.

Or, to be more charitable: what seems to have happened here is that I was using the term "sensory pattern" to refer to any and all subjective experiences appearing in one's visual field and so on, whereas you seem to be using the phrase "makes no sensory difference" to refer only to the subset of subjective experience we call 'the real world'.

True, if I've never been to Xela, the well-being of the people there (presumably) makes no difference to my experience of everyday things in the outside world, such as the people I know, or what's going on in the places I do go. But this is not a problem. Mention the place, and explain the conditions in detail, employing colorful language and eloquent description, and before long there will be a video playing in my mind, apt to make me happy or sad, depending on the well-being of the people therein.

And of course you dodge the question of what is "enjoyable" - is a fistfight enjoyable if it makes you grin and your heart race but afterwards you never want to do it again?

I don't see the contradiction. Unless I'm missing something in my interpretation of your example, all that must be said is that the experience was enjoyable because certain dangers didn't play out, such as getting injured or being humiliated, but you'd rather not repeat that experience, for you may not be so lucky in the future. Plenty of things are enjoyable unless they go wrong, and are rather apt to go wrong, and thus are candidates for being something one enjoys but would rather not repeat.

For example, let's say you get lost in the moment, and have unprotected sex. You didn't have any condoms or anything, but everything else was perfect, so you went for it. You have the time of your life. After the fact you manage to put the dangers out of your mind, and just remember how excellent the experience was. Eventually it becomes clear that no STIs were transmitted, nor is there an unplanned pregnancy. The experience, because nothing went wrong, was excellent. But you decide it was a mistake.

There seems to be a contradiction here: the experience was excellent, yet it was a mistake. But then you realize that the missing piece that makes it seem contradictory is the time factor. Once enough time has passed and nothing has gone wrong, one can say conclusively that nothing went wrong: in hindsight, a 100% chance that it was excellent. But at the time of the decision, the odds were much worse. That's all.
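The time-factor point is just the distinction between ex ante expected value and ex post realized value, and a few lines of arithmetic make it concrete. The numbers below are hypothetical, chosen only to illustrate the sign flip: a decision can have negative expected value when made, even though the realized outcome turns out positive.

```python
# Hypothetical numbers for illustration only; the point is the sign flip
# between what the decision was worth at the time and what actually happened.
p_bad = 0.3      # assumed chance something goes wrong (STI, pregnancy)
u_good = 10.0    # utility if nothing goes wrong
u_bad = -100.0   # utility if something does

# Ex ante: expected value at the moment of deciding.
ex_ante = (1 - p_bad) * u_good + p_bad * u_bad

# Ex post: the value of the outcome that actually occurred.
ex_post = u_good

print(ex_ante)  # -23.0: negative, so the decision was a mistake ex ante
print(ex_post)  # 10.0: positive, so the experience was excellent ex post
```

Both judgments are correct; they are simply evaluated at different times, with different information.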

What algorithm should an AI follow to decide?

This seems off topic. Decide what? I thought we were talking about how to discover one's terminal values as a human.

You have to try and reduce "enjoyable" to things like "things you'd do again" or "things that make your brain release chemical cocktail X." And then you have to realize that those definitions are best met by meth, or an IV of chemical cocktail X, not by cool stuff like riding dinosaurs or having great sex.

Well if that's the case then they're unhelpful definitions. As far as I can see, nothing in my post would suggest a theory weak enough to output something like 'do meth', or 'figure out how to wirehead'.

Outline of Possible Sources of Values

by Wei_Dai, 18th Jan 2013


I don't know what my values are. I don't even know how to find out what my values are. But do I know something about how I (or an FAI) may be able to find out what my values are? Perhaps... and I've organized my answer to this question in the form of an "Outline of Possible Sources of Values". I hope it also serves as a summary of the major open problems in this area.

  1. External
    1. god(s)
    2. other humans
    3. other agents
  2. Behavioral
    1. actual (historical/observed) behavior
    2. counterfactual (simulated/predicted) behavior
  3. Subconscious Cognition
    1. model-based decision making
      1. ontology
      2. heuristics for extrapolating/updating model
      3. (partial) utility function
    2. model-free decision making
      1. identity based (adopt a social role like "environmentalist" or "academic" and emulate an appropriate role model, actual or idealized)
      2. habits
      3. reinforcement based
  4. Conscious Cognition
    1. decision making using explicit verbal and/or quantitative reasoning
      1. consequentialist (similar to model-based above, but using explicit reasoning)
      2. deontological
      3. virtue ethical
      4. identity based
    2. reasoning about terminal goals/values/preferences/moral principles
      1. responses (changes in state) to moral arguments (possibly context dependent)
      2. distributions of autonomously generated moral arguments (possibly context dependent)
      3. logical structure (if any) of moral reasoning
    3. object-level intuitions/judgments
      1. about what one should do in particular ethical situations
      2. about the desirabilities of particular outcomes
      3. about moral principles
    4. meta-level intuitions/judgments
      1. about the nature of morality
      2. about the complexity of values
      3. about what the valid sources of values are
      4. about what constitutes correct moral reasoning
      5. about how to explicitly/formally/effectively represent values (utility function, multiple utility functions, deontological rules, or something else) (if utility function(s), for what decision theory and ontology?)
      6. about how to extract/translate/combine sources of values into a representation of values
        1. how to solve ontological crisis
        2. how to deal with native utility function or revealed preferences being partial
        3. how to translate non-consequentialist sources of values into utility function(s)
        4. how to deal with moral principles being vague and incomplete
        5. how to deal with conflicts between different sources of values
        6. how to deal with lack of certainty in one's intuitions/judgments
      7. whose intuition/judgment ought to be applied? (may be different for each of the above)
        1. the subject's (at what point in time? current intuitions, eventual judgments, or something in between?)
        2. the FAI designers'
        3. the FAI's own philosophical conclusions

Using this outline, we can obtain a concise understanding of what many metaethical theories and FAI proposals are claiming/suggesting and how they differ from each other. For example, Nyan_Sandwich's "morality is awesome" thesis can be interpreted as the claim that the most important source of values is our intuitions about the desirability (awesomeness) of particular outcomes.

As another example, Aaron Swartz argued against "reflective equilibrium" by which he meant the claim that the valid sources of values are our object-level moral intuitions, and that correct moral reasoning consists of working back and forth between these intuitions until they reach coherence. His own position was that intuitions about moral principles are the only valid source of values and we should discount our intuitions about particular ethical situations.

A final example is Paul Christiano's "Indirect Normativity" proposal (n.b., "Indirect Normativity" was originally coined by Nick Bostrom to refer to an entire class of designs where the AI's values are defined "indirectly") for FAI, where an important source of values is the distribution of moral arguments the subject is likely to generate in a particular simulated environment and their responses to those arguments. Also, just about every meta-level question is left for the (simulated) subject to answer, except for the decision theory and ontology of the utility function that their values must finally be encoded in, which is fixed by the FAI designer.

I think the outline includes most of the ideas brought up in past LW discussions, or in moral philosophies that I'm familiar with. Please let me know if I left out anything important.