Only humans can have human values

by PhilGoetz, 26th Apr 2010


Ethics is not geometry

Western philosophy began at about the same time as Western geometry; and if you read Plato you'll see that he, and many philosophers after him, took geometry as a model for philosophy.

In geometry, you operate on timeless propositions with mathematical operators.  All the content is in the propositions.  A proof is equally valid regardless of the sequence of operators used to arrive at it.  An algorithm that fails to find a proof when one exists is a poor algorithm.

The naive way philosophers usually map ethics onto mathematics is to suppose that a human mind contains knowledge (the propositional content), and that we think about that knowledge using operators.  The operators themselves are not seen as the concern of philosophy.  For instance, when studying values (I also use "preferences" here, as a synonym differing only in connotation), people suppose that a person's values are static propositions.  The algorithms used to satisfy those values aren't themselves considered part of those values.  The algorithms are considered to be only ways of manipulating the propositions; and are "correct" if they produce correct proofs, and "incorrect" if they don't.

But an agent's propositions aren't intelligent.  An intelligent agent is a system, whose learned and inborn circuits produce intelligent behavior in a given environment.  An analysis of propositions is not an analysis of an agent.

I will argue that:

  1. The only preferences that can be unambiguously determined are the preferences people implement, which are not always the preferences expressed by their beliefs.
  2. If you extract a set of propositions from an existing agent, then build a new agent to use those propositions in a different environment, with an "improved" logic, you can't claim that it has the same values.
  3. Values exist in a network of other values.  A key ethical question is to what degree values are referential (meaning they can be tested against something outside that network); or non-referential (and hence relative).
  4. Supposing that values are referential helps only by telling you to ignore human values.
  5. You cannot resolve the problem by combining information from different behaviors, because the needed information is missing.
  6. Today's ethical disagreements are largely the result of attempting to extrapolate ancestral human values into a changing world.
  7. The future will thus be ethically contentious even if we accurately characterize and agree on present human values.

Instincts, algorithms, preferences, and beliefs are artificial categories

There is no principled distinction between algorithms and propositions in any existing brain.  This means that there's no clear way to partition an organism's knowledge into "propositions" (including "preferences" and "beliefs"), and "algorithms."  Hence, you can't expect all of an agent's "preferences" to end up inside the part of the agent that you choose to call "propositions".  Nor can you reliably distinguish "beliefs" from "preferences".

Suppose that a moth's brain is wired to direct its flight by holding the angle to the moon constant.  (This is controversial, but the competing hypotheses would give similar talking points.)  If so, is this a belief about the moon, a preference towards the moon, or an instinctive motor program?  When it circles around a lamp, does it believe that lamp is the moon?

When a child pulls its hand away from something hot, does it value not burning itself and believe that hot things burn, or place a value on not touching hot things, or just have an evolved motor program that responds to hot things?  Does your answer change if you learn that the hand was directed to pull back by spinal reflexes, without involving the cortex?

Monkeys can learn to fear snakes more easily than they can learn to fear flowers (Cook & Mineka 1989).  Do monkeys, and perhaps humans, have an "instinctive preference" against snakes?  Is it an instinct, a preference (snake = negative utility), or a learned behavior (lab monkeys are not afraid of snakes)?

Can we map the preference-belief distinction onto the distinction between instinct and learned behavior?  That is, are all instincts preferences, and all preferences instincts?  There are things we call instincts, like spinal reflexes, that I don't think can count as preferences.  And there are preferences, such as the relative values I place on the music of Bach and Berg, that are not instincts.  (In fact, these are the preferences we care about.  The purpose of Friendly AI is not to retain the fist-clenching instinct for future generations.)

Bias, heuristic, or preference?

A "bias" is a reasoning procedure that produces an outcome that does not agree with some logic.  But the object in nature is not to conform to logic; it is to produce advantageous behavior.

Suppose you interview Fred about his preferences.  Then you write a utility function for Fred.  You experiment, putting Fred in different situations and observing how he responds.  You observe that Fred acts in ways that fail to optimize the utility function you wrote down, in a consistently-biased way.

Is Fred displaying bias?  Or does the Fred-system, including both his beliefs and the bias imposed by his reasoning processes, implement a preference that is not captured in his beliefs alone?

Allegedly true story, from a Teaching Company audio lecture (I forget which one):  A psychology professor was teaching a class about conditioned behavior.  He also had the habit of pacing back and forth in front of the class.

The class decided to test his claims by leaning forward and looking interested when the professor moved toward the left side of the room, but acting bored when he moved toward the right side.  By the end of the semester, they had trained him to give his entire lecture from the front left corner.  When they asked him why he always stood there, he was surprised by the question - he wasn't even aware he had changed his habit.

If you inspected the professor's beliefs, and then studied his actions, you would conclude he was acting irrationally.  But he wasn't.  He was acting rationally, just not thinking rationally.   His brain didn't detect the pattern in the class's behavior and deposit a proposition into his brain.  It encoded the proper behavior, if not straight into his pre-motor cortex, at least not into any conscious beliefs.

Did he have a bias towards the left side of the room?  Or a preference for seeing students pay attention?  Or a preference that became a bias when the next semester began and he kept doing it?

Take your pick - there's no right answer.

If a heuristic gives answers consistently biased in one direction across a wide range of domains, we can call it a bias.  Most biases found in the literature appear to be wide-ranging and value-neutral.  But the literature on biases is itself biased (deliberately) towards discussing that type of bias.   If we're trawling all of human behavior for values, we may run across many instances where we can't say whether a heuristic is a bias or a preference.

As one example, I would say that the extraordinarity bias is in fact a preference.  Or consider the happiness paradox:  People who become paralyzed become extremely depressed only temporarily; people who win the lottery become very happy only temporarily.  (Google 'happiness "set-point"'.)  I've previously argued on LessWrong that this is not a bias, but a heuristic to achieve our preferences.  Happiness is proportional not to our present level of utility, but to the rate of change in our utility.  Trying to maximize happiness (the rate of increase of utility) in the near term maximizes total utility over a lifespan better than consciously attempting to maximize near-term utility would.  This is because maximizing the rate of increase in utility over a short time period, instead of total utility over that time period, prefers behavior that has a small area under the utility curve during that time but ends with a higher utility than it started with, over behavior with a large area under the utility curve that ends with a lower utility than it started with.  This interpretation of happiness would mean that impact bias is not a bias at all, but a heuristic that compensates for this in order to maximize utility rather than happiness when we reason over longer time periods.
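To make the area-versus-slope comparison concrete, here is a minimal sketch in Python.  The two utility trajectories and their numbers are invented for illustration, and the names `total_utility`, `net_change`, and `pick` are hypothetical helpers, not anything from the literature.  "Happiness" is approximated crudely as utility at the end of the period minus utility at the start.

```python
# Two invented utility trajectories over the same short period.
investing = [5, 3, 3, 4, 8]   # small area under the curve, ends higher than it started
coasting = [5, 7, 7, 6, 3]    # large area under the curve, ends lower than it started


def total_utility(path):
    """Area under the utility curve, approximated as a plain sum of samples."""
    return sum(path)


def net_change(path):
    """Final utility minus initial utility: a crude stand-in for 'happiness'."""
    return path[-1] - path[0]


def pick(options, score):
    """Return the name of the option with the highest score."""
    return max(options, key=lambda name: score(options[name]))


options = {"investing": investing, "coasting": coasting}

by_area = pick(options, total_utility)   # maximizes near-term total utility
by_change = pick(options, net_change)    # maximizes near-term rate of increase

print(by_area, total_utility(investing), total_utility(coasting))  # coasting 23 28
print(by_change, net_change(investing), net_change(coasting))      # investing 3 -2
```

The two scoring rules pick opposite behaviors from the same data.  Whether the slope-maximizing choice really wins over a lifespan depends on what happens after the period ends, which this toy does not model.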

Environmental factors: Are they a preference or a bias?

Evolution does not distinguish between satisfying preconditions for behavior by putting knowledge into a brain, or by using the statistics of the environment.  This means that the environment, which is not even present in the geometric model of ethics, is also part of your values.

When the aforementioned moth circles around a lamp, is it erroneously acting on a bias, or expressing moth preferences?

Humans like having sex.  The teleological purpose of this preference is to cause them to have children.  Yet we don't say that they are in error if they use birth control.  This suggests that we consider our true preferences to be the organismal ones that trigger positive qualia, not the underlying evolutionary preferences.

Strict monogamy causes organisms that live in family units to evolve to act more altruistically, because their siblings are as related to them as their children are (West & Gardner 2010).  Suppose that people from cultures with a long history of nuclear families and strict monogamy act, on average, more altruistically than people from other cultures; and you put people from both cultures together in a new environment with neither monogamy nor nuclear families.  We would probably rather say that the people from these different cultures have different values; not that they both have the same preference to "help their genes", but that the people from the monogamous culture have an evolved bias that causes them to erroneously treat strangers nicely in this new environment.  Again, we prefer the organismal preference.

However, if we follow this principle consistently, it prevents us from ever trying to improve ourselves, since it in effect defines our present selves as optimal:

  • Humans like eating food with fat, sugar, and salt.  In our ancestral context, that expressed the human value of optimizing nutrition.  The evolutionary preference is for good nutrition; the organismal preference is for fat, sugar, and salt.  By analogy to contraception, liking fat, sugar, and salt is not an evolved but dysfunctional bias in taste; it's a true human value.
  • Suppose fear of snakes is triggered by the shape and motion of snakes.  The organismal preference is against snakes.  The evolutionary preference is against poisonous snakes.  If the world is now full of friendly cybernetic snakes, you must conclude that prejudice against them is a human value to be preserved, not a bias to be overcome.  Death to the friendly snakes!
  • Men enjoy violence.  Hitting a stranger over the head with a stick is naturally fun to human males, and it takes a lot of social conditioning to get them not to do this, or at least to restrict themselves to video games.  By what principle can we say that this is merely an obsolete heuristic to protect the tribe that is no longer helpful in our present environment; yet having sex with a condom is enjoying a preference?
  • Santos et al. (2010) report (summarized in Science Online) that children with Williams syndrome, a genetic disorder that reduces fear of strangers, have impaired racial stereotyping, but intact gender stereotyping.  This suggests that racism, and perhaps sexism, are evolved preferences actively implemented by gene networks.

So the "organismal vs. evolutionary" distinction doesn't help us choose what's a preference and what's a bias.  Without any way of doing that, it is in principle impossible to create a category of "preferences" distinct from "preferred outcomes".  A "value" consists of declarative knowledge, algorithms, and environment, taken together.  Change any of those, and it's not the same value anymore.

This means that extrapolating human values into a different environment gives an error message.
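The snake example can be sketched in code.  This is a toy under stated assumptions: the `Animal` record, the two rule functions, and the contents of both environments are invented, and the ancestral environment is simplified so that every snake-shaped thing in it is poisonous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Animal:
    name: str
    snake_shaped: bool
    poisonous: bool


def organismal_avoid(animal):
    """Organismal preference: avoid anything shaped like a snake."""
    return animal.snake_shaped


def evolutionary_avoid(animal):
    """Evolutionary preference: avoid anything poisonous."""
    return animal.poisonous


# Simplified ancestral environment: snake-shaped and poisonous always coincide.
ancestral = [
    Animal("viper", True, True),
    Animal("cobra", True, True),
    Animal("rabbit", False, False),
]

# Novel environment: the same animals plus one case the ancestors never met.
novel = ancestral + [Animal("friendly cybernetic snake", True, False)]


def disagreements(environment):
    """Names of the animals on which the two readings of the value diverge."""
    return [a.name for a in environment
            if organismal_avoid(a) != evolutionary_avoid(a)]


print(disagreements(ancestral))  # []
print(disagreements(novel))      # ['friendly cybernetic snake']
```

Both rules reproduce every ancestral behavior, so no amount of ancestral observation can say which one is "the" value.  The question only has an answer once you pick a rule, and the pick is exactly what is in dispute.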

A ray of hope? ...

I just made a point by presenting cases in which most people have intuitions about which outcome is correct, and showing that these intuitions don't follow a consistent rule.

So why do we have the intuitions?

If we have consistent intuitions, they must follow some rule.  We just don't know what it is yet.  Right?

... No.

We don't have consistent intuitions.

Any one of us has consistent intuitions; and those of us living in Western nations in the 21st century have a lot of intuitions in common.  We can predict how most of these intuitions will fall out using some dominant cultural values.  The examples involving monogamy and violent males rely on the present relatively high weight on the preference to reduce violent conflict.  But this is a context-dependent value!  <just-so story>It arises from living in a time and a place where technology makes interactions between tribes more frequent and more beneficial, and conflict more costly</just-so story>.  But looking back in history, we see many people who would disagree with it:

  • Historians struggle to explain the origins of World War I and the U.S. Civil War.  Sometimes the simplest answer is best:  They were for fun.  Men on both sides were itching for an excuse to fight.
  • In the 19th century, Americans killed off the Native Americans to have their land.  Americans universally condemn that action now that they are secure in its benefits; most Americans condoned it at the time.
  • Homer would not have agreed that violence is bad!  Skill at violence was the greatest virtue to the ancient Greeks.  The tension that generates tragedy in the Iliad is not between violence and empathy, but between saving one's kin and saving one's honor.  Hector is conflicted, but not about killing Greeks.  His speech on his own tragedy ends with his wishes for his son:  "May he bring back the blood-stained spoils of him whom he has laid low, and let his mother's heart be glad."
  • The Nazis wouldn't have agreed that enjoying violence was bad.  We have learned nothing if we think the Nazis rose to power because Germans suddenly went mad en masse, or because Hitler gave really good speeches.  Hitler had an entire ideology built around the idea, as I gather, that civilization was an evil constriction on the will to power; and artfully attached it to a few compatible cultural values.
  • A similar story could be told about communism.

The idea that violence (and sexism, racism, and slavery) is bad is a minority opinion in human cultures over history.  Nobody likes being hit over the head with a stick by a stranger; but in pre-Christian Europe, it was the person who failed to prevent being struck, not the person doing the striking, whose virtue was criticized.

Konrad Lorenz believed that the more deadly an animal is, the more emotional attachment to its peers its species evolves, via group selection (Lorenz 1966).  The past thousand years of history have been a steady process of humans building sharper claws, and choosing values that reduce their use, keeping net violence roughly constant.  As weapons improve, cultural norms that promote conflict must go.  First, the intellectuals (who were Christian theologians at the time) neutered masculinity; in the Enlightenment, they attacked religion; and in the 20th century, art.  The ancients would probably find today's peaceful, offense-forgiving males as nauseating as I would find a future where the man on the street embraces postmodern art and literature.

This gradual sacrificing of values in order to attain more and more tolerance and empathy is the most noticeable change in human values in all of history.  This means it is the least constant of human values.  Yet we think of an infinite preference for non-violence and altruism as a foundational value!  Our intuitions about our values are thus as mistaken as it is possible for them to be.

(The logic goes like this:  Humans are learning more, and their beliefs are growing closer to the truth.  Humans are becoming more tolerant and cooperative.  Therefore, tolerant and cooperative values are closer to the truth.  Oops!  If you believe in moral truth, then you shouldn't be searching for human values in the first place!)

Catholics don't agree that having sex with a condom is good.  They have an elaborate system of belief built on the idea that teleology expresses God's will, and so underlying purpose (what I call evolutionary preference) always trumps organismal preference.

And I cheated in the question on monogamy.  Of course you said that being more altruistic wasn't an error.  Everyone always says they're in favor of more altruism.  It's like asking whether someone would like lower taxes.  But the hypothesis was that people from non-monogamous or non-family-based cultures do in fact show lower levels of altruism.  By hypothesis, then, they would be comfortable with their own levels of altruism, and might feel that higher levels are a bias.

Preferences are complicated and numerous, and arise in an evolutionary process that does not guarantee consistency.  Having conflicting preferences makes action difficult.  Energy minimization, a general principle that may underlie much of our learning, simply means reducing conflicts in a network.  The most basic operations of our neurons thus probably act to reduce conflicts between preferences.

But there are no "true, foundational" preferences from which to start.  There's just a big network of them that can be pushed into any one of many stable configurations, depending on the current environment.  There's the Catholic configuration, and the Nazi configuration, and the modern educated tolerant cosmopolitan configuration.  If you're already in one of those configurations, it seems obvious what the right conclusion is for any particular value question; and this gives the illusion that we have some underlying principle by which we can properly choose what is a value and what is a bias.  But it's just circular reasoning.
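A tiny Hopfield-style network shows how one set of connections can settle into more than one stable configuration.  This is only a sketch: the four "preference" nodes, the weight matrix `W`, and the starting states are invented, and nothing here claims that brains implement exactly this rule.  Positive weights link preferences that support each other; negative weights link preferences that conflict.

```python
# Symmetric weights between four +1/-1 "preference" nodes (zero diagonal).
# Nodes 0 and 1 support each other, as do nodes 2 and 3; the pairs conflict.
W = [
    [0, 1, -1, -1],
    [1, 0, -1, -1],
    [-1, -1, 0, 1],
    [-1, -1, 1, 0],
]


def energy(state):
    """Total conflict in the network: lower means fewer violated links."""
    n = len(state)
    return -sum(W[i][j] * state[i] * state[j]
                for i in range(n) for j in range(i + 1, n))


def settle(state):
    """Update one node at a time until no node wants to change."""
    state = list(state)
    changed = True
    while changed:
        changed = False
        for i in range(len(state)):
            field = sum(W[i][j] * state[j] for j in range(len(state)))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
    return state


start_a = [1, 1, 1, -1]    # two different conflicted starting points
start_b = [-1, 1, 1, 1]

print(energy(start_a), settle(start_a), energy(settle(start_a)))
# 0 [1, 1, -1, -1] -6
print(energy(start_b), settle(start_b), energy(settle(start_b)))
# 0 [-1, -1, 1, 1] -6
```

Both end states have the same minimal energy, and each looks like the obviously right answer from inside.  Which one you reach depends on where you started, not on any node being foundational.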

What about qualia?

But everyone agrees that pleasure is good, and pain is bad, right?

Not entirely - I could point to, say, medieval Europe, when many people believed that causing yourself needless pain was virtuous.  But, by and large, yes.

And beside the point (although see below).  Because when we talk about values, the eventual applications we have in mind are never about qualia.  Nobody has heated arguments about whose qualia are better.  Nobody even really cares about qualia.  Nobody is going to dedicate their life to building Friendly AI in order to ensure that beings a million years from now still dislike castor oil and enjoy chocolate.

We may be arguing about preserving a tendency to commit certain acts that give us a warm qualic glow, like helping a bird with a broken wing.  But I don't believe there's a dedicated small-animal-empathy quale.  More likely there's a hundred inferential steps linking an action, through our knowledge and thinking processes, to a general-purpose warm-glow quale.

Value is a network concept

Abstracting human behavior into "human values" is an ill-posed problem.  It's an attempt to divine a simple description of our preferences, outside the context of our environment and our decision process.  But we have no consistent way of deciding what are the preferences, and what is the context.  We have the illusion that we can, because our intuitions give us answers to questions about preferences - but they use our contextually-situated preferences to do so.  That's circular reasoning.

The problem in trying to root out foundational values for a person is the same as in trying to root out objective values for the universe, or trying to choose the "correct" axioms for a geometry.  You can pick a set that is self-consistent; but you can't label your choice "the truth".

These are all network concepts, where we try to isolate things that exist only within a complex homogeneous network.  Our mental models of complex networks follow mathematics, in which you choose a set of axioms as foundational; or social structures, in which you can identify a set of people as the prime movers.  But these conceptions do not even model math or social structures correctly.  Axioms are chosen for convenience, but a logic is an entire network of self-consistent statements, many different subsets of which could have been chosen as axioms.  Social power does not originate with the rulers, or we would still have kings.

There is a very similar class of problems, including symbol grounding (trying to root out the nodes that are the sources of meaning in a semantic network), and philosophy of science (trying to determine how or whether the scientific process of choosing a set of beliefs given a set of experimental data converges on external truth as you gather more data).  The crucial difference is that we have strong reasons for believing that these networks refer to an external domain, and their statements can be tested against the results from independent access to that domain.  I call these referential network concepts.  One system of referential network concepts can be more right than another; one system of non-referential network concepts can only be more self-consistent than another.

Referential network concepts cannot be given 0/1 truth-values at a finer granularity than the level at which a network concept refers to something in the extensional (referred-to) domain.  For example, (Quine 1968) argues that a natural-language statement cannot be unambiguously parsed beyond the granularity of the behavior associated with it.  This is isomorphic to my claim above that a value/preference can't be parsed beyond the granularity of the behavior of an agent acting in an environment.

Thomas Kuhn gained notoriety by arguing (Kuhn 1962) that there is no such thing as scientific progress, but only transitions between different stable states of belief; and that modern science is only different from ancient science, not better.  (He denies this in the postscript to the 1969 edition, but it is the logical implication of both his arguments and the context he presents them in.)  In other words, he claims science is a non-referential network concept.  An interpretation in line with Quine would instead say that science is referential at the level of the experiment, and that ambiguities may remain in how we define the fine-grained concepts used to predict the outcomes of experiments.

Determining whether a network concept domain is referential or non-referential is tricky.  The distinction was not even noticed until the 19th century.  Until then, everyone who had ever studied geometry, so far as I know, believed there was one "correct" geometry, with Euclid's 5 postulates as axioms.  But in the early 19th century, several mathematicians proved that you could build three different, consistent geometries depending on what you put in the place of Euclid's fifth postulate.  The universe we live in most likely conforms to only one of these (making geometry referential in a physics class); but the others are equally valid mathematically (making geometry non-referential in a math class).

Is value referential, or non-referential?

There are two ways of interpreting this question, depending on whether one means "human values" or "absolute values".

Judgements of value expressed in human language are referential; they refer to human behavior.  So human values are referential.  You can decide whether claims about a particular human's values are true or false, as long as you don't extend those claims outside the context of that human's decision process and environment.  This claim is isomorphic to Quine's claim about meaning in human language.

Asking about absolute values is isomorphic to applying the symbol-grounding problem to consciousness.  Consciousness exists internally, and is finer-grained than human behaviors.  Providing a symbol-grounding method that satisfied Quine's requirements would not provide any meanings accessible to consciousness.  Stevan Harnad (Harnad 1990) described how symbols might be grounded for consciousness in sense perceptions and statistical regularities of those perceptions.

(This brings up an important point, which I will address later:  You may be able to assign referential network concepts probabilistic or else fuzzy truth values at a finer level of granularity than the level of correspondence.  A preview: This doesn't get you out of the difficulty, because the ambiguous cases don't have mutual information with which they could help resolve each other.)

Can an analogous way be found to ground absolute values?  Yes and no.  You can choose axioms that are hard to argue with, like "existence is better than non-existence", "pleasure is better than pain", or "complexity is better than simplicity".  (I find "existence is better than non-existence" pretty hard to argue with; but Buddhists disagree.)  If you can interpret them in an unambiguous way, and define a utility calculus enabling you to make numeric comparisons, you may be able to make "absolute" comparisons between value systems relative to your axioms.

You would also need to make some choices we've talked about here before, such as "use summed utility" or "use average utility".  And you would need to make many possibly-arbitrary interpretation assumptions such as what pleasure is, what complexity is, or what counts as an agent.  The gray area between absolute and relative values is in how self-evident all these axioms, decisions, and assumptions are.  But any results at all - even if they provide guidance only in decisions such as "destroy / don't destroy the universe" - would mean we could claim there is a way for values to be referential at a finer granularity than that of an agent's behavior.  And things that seem arbitrary to us today may turn out not to be; for example, I've argued here that average utilitarianism can be derived from the von Neumann-Morgenstern theorem on utility.
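How much rides on the "summed" versus "average" choice is easy to show with arithmetic.  The two worlds below and their per-agent utilities are invented numbers, chosen only so that the two rules disagree.

```python
from statistics import mean

# Hypothetical per-agent utilities in two possible worlds.
world_a = [9, 9, 9]                   # few agents, each well off
world_b = [4, 4, 4, 4, 4, 4, 4, 4]    # many agents, each modestly off

worlds = {"A": world_a, "B": world_b}

best_by_sum = max(worlds, key=lambda name: sum(worlds[name]))
best_by_average = max(worlds, key=lambda name: mean(worlds[name]))

print(sum(world_a), sum(world_b), best_by_sum)        # 27 32 B
print(mean(world_a), mean(world_b), best_by_average)  # 9 4 A
```

The same axioms about what is good for each agent give opposite rankings of the two worlds, so the aggregation rule is itself one of the choices that has to be defended.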

... It doesn't matter WRT friendly AI and coherent extrapolated volition.

Even supposing there is a useful, correct, absolute lattice on value systems and/or values, it doesn't advance the project of trying to instill human values in artificial intelligences.  There are two possible cases:

  1. There are no absolute values.  Then we revert to judgements of human values, which, as argued above, have no unambiguous interpretation outside of a human context.
  2. There are absolute values.  In which case, we should use them, not human values, whenever we can discern them.

Fuzzy values and fancy math don't help

So far, I've looked at cases of ambiguous values only one behavior at a time.  I mentioned above that you can assign probabilities to different value interpretations of a behavior.  Can we take a network of many probabilistic interpretations, and use energy minimization or some other mathematics to refine the probabilities?

No; because for the ambiguities of interest, we have no access to any of the mutual information between how to resolve two different ambiguities.  The ambiguity is in whether the hypothesized "true value" would agree or disagree with the results given by the initial propositional system plus a different decision process and/or environment.  In every case, this information is missing.  No clever math can provide this information from our existing data, no matter how many different cases we combine.
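One way to see why combining cases does not help is a toy Bayesian update.  This is a sketch under an assumption I am adding: that the two candidate "true values" assign the same probability to every behavior we have actually observed.  The function name `posterior` and all the numbers are hypothetical.

```python
def posterior(prior_a, likelihoods_a, likelihoods_b):
    """Posterior probability of hypothesis A after a sequence of observations.

    likelihoods_a[i] and likelihoods_b[i] are the probabilities that
    hypotheses A and B assign to the i-th observed behavior.
    """
    p_a, p_b = prior_a, 1.0 - prior_a
    for like_a, like_b in zip(likelihoods_a, likelihoods_b):
        p_a, p_b = p_a * like_a, p_b * like_b
        total = p_a + p_b
        p_a, p_b = p_a / total, p_b / total
    return p_a


# A thousand observed behaviors that both hypotheses predict equally well.
same = [0.9] * 1000
unchanged = posterior(0.5, same, same)

# For contrast: a single observation on which the hypotheses differ.
shifted = posterior(0.5, [0.9], [0.1])

print(round(unchanged, 6))  # 0.5
print(round(shifted, 6))    # 0.9
```

The update only moves when some observation discriminates between the hypotheses.  The claim in the text is that, for the ambiguities of interest, no such observation exists in our data; the code illustrates the consequence of that claim, not the claim itself.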

Nor should we hope to find correlations between "true values" that will help us refine our estimates for one value given a different unambiguous value. The search for values is isomorphic to the search for personality primitives.  The approach practiced by psychologists is to use factor analysis to take thousands of answers to questions that are meant to test personality phenotype, and mathematically reduce these to discover a few underlying ("latent") independent personality variables, most famously in the Big 5 personality scale (reviewed in Goldberg 1993).  In other words:  The true personality traits, and by analogy the true values a person holds, are by definition independent of each other.

We expect, nonetheless, to find correlations between the component of these different values that resides in decision processes.  This is because it is efficient to re-use decision processes as often as possible.  Evolution should favor partitioning values between propositions, algorithms, and environment in a way that minimizes the number of algorithms needed.  These correlations will not help us, because they have to do only with how a value is implemented within an organism, and say nothing about how the value would be extended into a different organism or environment.

In fact, I propose that the different value systems popular among humans, and the resulting ethical arguments, are largely different ways of partitioning values between propositions, algorithms, and environment, that each result in a relatively simple set of algorithms, and each in fact give the same results in most situations that our ancestors would have encountered.  It is the attempt to extrapolate human values into the new, manmade environment that causes ethical disagreements.  This means that our present ethical arguments are largely the result of cultural change over the past few thousand years; and that the next few hundred years of change will provide ample grounds for additional arguments even if we resolve today's disagreements.


Philosophically-difficult domains often involve network concepts, where each component depends on other components, and the dependency graph has cycles.  The simplest models of network concepts suppose that there are some original, primary nodes in the network that everything depends on.

We have learned to stop applying these models to geometry and supposing there is one true set of axioms.  We have learned to stop applying these models to biology, and accept that life evolved, rather than that reality is divided into Creators (the primary nodes) and Creatures.  We are learning to stop applying them to morals, and accept that morality depends on context and biology, rather than being something you can extract from its context.  We should also learn to stop applying them to the preferences directing the actions of intelligent agents.

Attempting to identify values is a network problem, and you cannot identify the "true" values of a species, or of a person, as they would exist outside of their current brain and environment.  The only consistent result you can arrive at by trying to produce something that implements human values, is to produce more humans.

This means that attempting to instill human values into an AI is an ill-posed problem that has no complete solution.  The only escape from this conclusion is to turn to absolute values - in which case you shouldn't be using human values in the first place.

This doesn't mean that we have no information about how human values can be extrapolated beyond humans.  It means that the more different an agent and an environment are from the human case, the greater the number of different value systems there are that are consistent with human values.  However, it appears to me, from the examples and the reasoning given here, that the components of values that we can resolve are those that are evolutionarily stable (and seldom distinctly human); while the contentious components of values, the ones people argue about, are their extensions into novel situations, which are undefined.  From that I infer that, even if we pin down present-day human values precisely, the ambiguity inherent in extrapolating them into novel environments and new cognitive architectures will make the near future as contentious as the present.


Michael Cook & Susan Mineka (1989).  Observational conditioning of fear to fear-relevant versus fear-irrelevant stimuli in rhesus monkeys.  Journal of Abnormal Psychology 98(4): 448-459.

Lewis Goldberg (1993).  The structure of phenotypic personality traits.  American Psychologist 48: 26-34.

Stevan Harnad (1990).  The Symbol Grounding Problem.  Physica D 42: 335-346.

Thomas Kuhn (1962).  The Structure of Scientific Revolutions. 1st. ed., Chicago: Univ. of Chicago Press.

Konrad Lorenz (1966).  On Aggression.  New York: Harcourt Brace.

Willard Quine (1968).  Ontological relativity.  The Journal of Philosophy 65(7): 185-212.

Andreia Santos, Andreas Meyer-Lindenberg, Christine Deruelle (2010).  Absence of racial, but not gender, stereotyping in Williams syndrome children.  Current Biology 20(7), April 13: R307-R308.

Stuart A. West and Andy Gardner (2010).  Science 12 March 2010: 1341-1344.