Epistemic status: I just thought of this today, and it made a huge amount of my confusion on this topic disappear. 

Values are this weird concept. Humans aren't utility maximizers, but we think of ourselves as them. There's a paradox in the values-beliefs framework: humans can identify the parts of themselves which correspond to values and beliefs, but this should actually be impossible? Here I propose the following answer:

Humans model themselves with an algorithm which has evolved to model other humans as value-belief based systems.

So why would this be the case? I can think of a few reasons.


The minimax algorithm as applied to chess (or other finite-option, perfect-information games) means assuming your opponent will play the best move for them on every turn. This is the optimal[1] way to play in the infinite compute limit, and is also pretty close to optimal in most other situations (actual algorithms sometimes attempt to compute the probability of their opponent playing each given move).
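As a rough illustration of the idea above, here is a minimal minimax sketch on a toy game tree. The tree, its payoffs, and the function names are made up for the example; a real chess engine would add depth limits and an evaluation function.

```python
from typing import List, Union

# A game tree: a leaf is a number (payoff to the maximizing player),
# an internal node is a list of child subtrees.
Tree = Union[float, List["Tree"]]


def minimax(node: Tree, maximizing: bool) -> float:
    """Value of `node` assuming both players always play their best move."""
    if not isinstance(node, list):  # leaf: the payoff is known
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


def best_move(node: list) -> int:
    """Index of the root move with the highest minimax value."""
    values = [minimax(child, maximizing=False) for child in node]
    return values.index(max(values))


# Hypothetical two-ply game: we pick a branch, then the opponent picks a leaf.
tree = [[3, 12], [2, 20], [14, 1]]
value = minimax(tree, maximizing=True)
move = best_move(tree)
```

The third branch contains the largest payoff (14), but minimax avoids it because a best-playing opponent would answer with the 1. That is the "assume the strongest opponent" default in miniature.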

So what do you do if you're a hominid playing politics against other hominids? It seems likely that the most successful strategy would be to (by default) assume your opponent is as powerful as possible. This way you'll take the fewest risks, and you'll be most accurate about the most dangerous opponents.

Since coherent decisions imply consistent utility, the most powerful opponents are expected utility maximizers, especially those who want things which cause them to compete with you. If your political opponent isn't well approximated by a utility maximizer, they won't be a threat. Even more subtly, if your opponent takes a mixture of actions, the most "important" actions will look like them acting as an expected utility maximizer. All the other actions will be lost as noise.

Low Compute

Modelling another human as an expected utility maximizer is also probably quite efficient. Starting with a prior that they want the exact same things as a typical human, and then updating away from that, is pretty cheap. Starting with a prior that they will act towards their desires based on all the information they can see is also pretty cheap. Combined, they do a decent job of guessing another human's actions, especially the set of actions which are most relevant to you.

This only requires modelling another's senses, keeping track of their knowledge, any deviations from the "default" desires, and then running your own version of a utility maximizing algorithm on that to predict their actions.
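The recipe above can be sketched in a few lines. This is only a toy under loud assumptions: the outcome names, the default desire weights, and the `predict_action` helper are all hypothetical, and learned deviations are modelled as simple overrides of the default weights.

```python
from typing import Dict

# Hypothetical "typical human" desires: a weight for each outcome.
DEFAULT_DESIRES: Dict[str, float] = {"status": 1.0, "food": 0.5, "cool_rocks": 0.0}


def predict_action(
    believed_outcomes: Dict[str, Dict[str, float]],
    desire_deviations: Dict[str, float],
) -> str:
    """Predict the action an expected utility maximizer would take.

    believed_outcomes maps each action to the outcome probabilities the
    *other* person believes in (our model of their senses and knowledge).
    desire_deviations holds what we have learned about how their desires
    differ from the default; each entry replaces the default weight.
    """
    desires = {**DEFAULT_DESIRES, **desire_deviations}

    def expected_utility(action: str) -> float:
        return sum(
            prob * desires.get(outcome, 0.0)
            for outcome, prob in believed_outcomes[action].items()
        )

    # Run our own maximizer on their (modelled) beliefs and desires.
    return max(believed_outcomes, key=expected_utility)


# Made-up beliefs about what each action leads to.
beliefs = {
    "challenge_leader": {"status": 0.6},
    "forage": {"food": 0.9},
    "collect_rocks": {"cool_rocks": 1.0},
}
typical = predict_action(beliefs, {})
rock_fan = predict_action(beliefs, {"cool_rocks": 2.0, "status": 0.1})
```

With no deviations the model predicts the status-seeking move. Once we learn that someone mostly cares about rocks, the same cheap machinery predicts they will stay out of our way.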

Updating Beliefs and Keeping Risk Low

Since humans are able to update their beliefs about others, being wrong in the right direction is important. It's much better to overestimate a threat than underestimate a threat.

As an illustration: imagine someone is just really stupid about their utility maximization. You'll treat them as a threat until you learn they're an idiot, which is a pretty low-cost mistake to make. If they want something really weird, like to collect some cool rocks, that probably means they're not competing for your political position. Again, a low-cost mistake, and once you figure out what's really going on you can just ignore them.
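The asymmetry can be put into a toy expected-cost comparison. The probability and the two cost figures below are invented purely for illustration; only the shape of the comparison matters.

```python
def expected_costs(
    p_threat: float, cost_overestimate: float, cost_underestimate: float
) -> dict:
    """Expected cost of each default assumption about a stranger.

    p_threat: assumed probability that they really are a capable competitor.
    cost_overestimate: cost of treating a harmless person as a threat.
    cost_underestimate: cost of treating a real threat as harmless.
    """
    return {
        # We only pay the overestimate cost when they turn out to be harmless.
        "assume_threat": (1 - p_threat) * cost_overestimate,
        # We only pay the underestimate cost when they turn out to be a threat.
        "assume_harmless": p_threat * cost_underestimate,
    }


# Hypothetical numbers: real threats are rare, but underestimating one is costly.
costs = expected_costs(p_threat=0.1, cost_overestimate=1.0, cost_underestimate=50.0)
safer_default = min(costs, key=costs.get)
```

Under these assumed numbers, defaulting to "threat" is cheaper even though nine strangers in ten are harmless. In general that default wins whenever `p_threat` exceeds `cost_overestimate / (cost_overestimate + cost_underestimate)`.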


A key part of humans is our ability to self model. It has been argued that this is the source of "consciousness" (but I'm not getting into that mess again), and it's definitely important. 

Until now I was aware of concepts like the blue-minimizing robot, but it seemed to be begging the question. Why would a blue-minimizing-strategy-executing robot have a self-modelling algorithm which is predisposed to be wrong in some important sense?

Now I feel I have an explanation (not necessarily correct, but gears-level at least!) for why the strategy-executing, reinforcement-learning, other-thing-doing human brain comes equipped with a tendency to model itself as this highly directed utility maximizing thing.

Also, I simply cannot shake the feeling that people (by default) model themselves as a "perfectly rational Bayesian homunculus" and then add on patches to the model to account for "irrationality". You just are cognitive biases.

[1] In the sense of maximizing the probability of winning the game.


9 comments

The idea of "human values" is rather late: it formed in the middle of the 20th century. Before that, people used other models to predict the behaviour of the people around them, e.g. the Freudian model of Id, Ego and Superego, or the Christian model of a soul choosing between rules and desires.

When you say the idea of human values is new, do you mean that the idea of humans having values with regard to a utilitarian-ish ethics is new? Or do you mean the concept of humans maximizing things rationally (or some equivalent concept) is new? If it's the latter I'd be surprised (but maybe I shouldn't be?).

The father of utilitarianism, Bentham, who worked around 1800, calculated utility as a balance between pleasures and pains, without mentioning "human values". He wrote: "Nature has placed mankind under the governance of two sovereign masters, pain and pleasure."

The idea of "maximising things" seems to be even later. Wiki: "In Ethics (1912), Moore rejects a purely hedonistic utilitarianism and argues that there is a range of values that might be maximized." But the "values" he wrote about are just abstract ideas like love, not "human values".

The next step was around 1977 when the idea of preference utilitarianism was formed. 

Another important piece in all this is the von Neumann-Morgenstern theorem, which connects an ordered set of preferences to a utility function.

So the idea that "humans have values which they are maximising in a utilitarian way" formed rather slowly.

I was referring to "values" more like the second case. Consider the choice blindness experiments (which are well-replicated). People think they value certain things in a partner, or politics, but really it's just a bias to model themselves as being more agentic than they actually are.

The answer here is obvious, but let's look at another example: should I eat an apple? The apple promises pleasure and I want it, but after I have eaten it, I don't want to eat anything as I am full. So the expected source of pleasure has shifted.

In other words, we have in some sense a bicameral mind: a conscious part which always follows pleasure, and an unconscious part which constantly changes the rewards depending on the person's needs. If we want to learn a person's preferences, we need to learn the rules for why rewards are given to some things and not to others. One person likes reading and another likes skiing.

And this is not a complete model of the mind, just an illustration of why reward is not enough to represent human values.

Humans aren't utility maximizers, but we think of ourselves as them

What makes you believe this? I wouldn't assume that most people think that way. In order to maximize utility you first have to define a utility function. This is impossible for most of us. I find that I have a fuzzy list of wishes to satisfy, with unclear priorities that shift over time. I imagine that if a rational entity were to try to make sense of other entities that appear similar, it might make an assertion like yours. But what if it turns out that the rest of the entities have a much lower ratio of rational to non-rational ("system 1" if you will) function during a given time period? It could be that other people are not attempting to maximize anything most of the time. Perhaps once in a while they sit down and reason about a particular goal, and most of the time they are delegating to more basic systems.

For what it's worth, I've looked quite a bit in the past into the question of what the heck values are. This is basically where I left off the project before switching to other things, because I went about as far as I could go at the time (and I haven't had anything new and interesting to say since).

Where I left off:

Deconfusing Human Values Research Agenda v1

What would "incoherent decisions" look like for an agent that has a utility function defined on action-observation histories?