A classic example of human bias is political values interfering with our ability to accept data or policy proposals from people we perceive as opponents. When new evidence threatens our values, our first instinct is often to deny it, or to subject it to far more scrutiny than we apply to congenial evidence, instead of considering it openly. The same reaction shows up whenever we are challenged: it takes active effort not to respond purely defensively and to genuinely consider the critic’s models, even when they are right.

How can we understand why this occurs? In the stereotypical view of human behavior, our values and preferences about the world are decoupled from our world model and beliefs: an independent decision theory combines the two to guide our actions, and our epistemology updates the world model as new information arrives. In this view, all of these parts are nice, independent, consistent things. The view works to some degree: we know facts about all sorts of topics, from politics to biology, we seem to hold values about various circumstances, and so on. Cases like political values interfering with our ability to update our beliefs are then modeled as irrational noise on top of this clean model of human behavior. If we could eliminate the noise and overcome our irrational urges, we could (in principle) behave rationally.
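
To make the factored picture concrete, here is a minimal sketch (illustration only, not code from the post, and not a claim about how people actually work) of an agent whose beliefs, values, and decision rule really are separate components: Bayes’ rule touches only the beliefs, the utilities encode the values and never change, and the decision rule merely combines the two.

```python
# Toy "factored rationality" agent: beliefs, values, and the decision rule
# interact only through narrow, well-defined interfaces.

def update_belief(prior: float, lik_if_true: float, lik_if_false: float) -> float:
    """Epistemology: Bayes' rule updates the world model and nothing else."""
    numerator = lik_if_true * prior
    return numerator / (numerator + lik_if_false * (1.0 - prior))

def choose(belief: float, utilities: dict[str, tuple[float, float]]) -> str:
    """Decision theory: pick the action with the highest expected utility.
    utilities maps action -> (utility if hypothesis true, utility if false)."""
    def expected_utility(action: str) -> float:
        u_true, u_false = utilities[action]
        return belief * u_true + (1.0 - belief) * u_false
    return max(utilities, key=expected_utility)

# Hypothetical numbers: belief that a proposed tool is worth building.
belief = 0.5                                      # prior world model
belief = update_belief(belief, 0.2, 0.8)          # criticism is evidence against
values = {"build_it": (10.0, -5.0), "drop_it": (0.0, 0.0)}  # fixed preferences
print(belief, choose(belief, values))             # ~0.2 drop_it
```

The rest of the post argues that human cognition does not actually decompose this cleanly: changing `belief` can tug on `values`, and vice versa.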

But to make this model of factored rationality work, we need a lot of corrections to account for all of the cognitive biases that humans display. Each one adds more parameters and noise, and there are a lot of them! It should make us suspicious that every time we come across behavior that contradicts the model, we simply bolt on more noise and parameters instead of questioning the model itself.

An alternative model that avoids this ad hoc addition of noise treats the unwillingness to update as something more fundamental: our values, beliefs, and decision theory are entangled and do not exist independently.

Consequences of taking this model seriously include:

  • Changing beliefs can change values and vice versa, which makes us more resistant to updating
  • To have sharp beliefs and values, we must actively implement them; this does not happen by default
  • Even after we implement a belief or value, and a decision theory around it, the implementation is local and may still clash with other parts of the messy processes driving our behavior
  • Implementing values and beliefs isn’t free and takes time and effort to do well, so we need to decide when it is worthwhile

To continue the example of accepting critical feedback: say I discuss with one of my colleagues an idea for a tool I want to build. He pushes me on a few practical details: what exactly the use case is, whether there are better ways to do it, and whether this is the best use of my time. But in its original form, my idea wasn’t a clean set of claims about the world that I then use to make decisions about what to build. Instead, it was tangled up with my values: I like my ideas; they are mine, after all. If I put in the work to untangle my model of reality from these emotions and values, accepting that it might feel bad, I can apply the evidence and models he presents to my idea much more directly.

In this model, humans do not have cleanly separated values, world models, and decision theory! One way of dealing with this is to explicitly implement locally consistent beliefs and values, and a decision theory based on them. The implementation is limited: it is, at best, locally consistent, and it takes time and energy to create.

3 comments

In my opinion, the key confounder is trust. People get most of their information (beliefs) about faraway and abstract things from other people, but if they are in conflict with those other people, then accepting their claims is a severe security vulnerability.

I don't think humans (or anything else with bounded rationality) can have cleanly separated values, world models, and decision theory.

> An independent decision theory combines the two to guide our actions, and our epistemology updates the world model as new information arrives. In this view, all of these parts are nice, independent, consistent things.

You can only update world models on evidence independently if you have enough cognitive capacity to examine every conceivable world model and update each one on the evidence. Due to the combinatorial explosion of interdependence of parts within each of those models, this is physically impossible. Failing that, we can only make updates via non-independent heuristics and try to avoid known systematic failures.
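
As a rough back-of-the-envelope illustration of that combinatorial explosion (my numbers, not part of the original comment): a world model built from just 300 independent binary facts already has more distinct joint configurations than the commonly cited ~10^80 atoms in the observable universe, so enumerating and separately updating every candidate model is hopeless.

```python
# Toy count: a world model over n binary facts has 2**n joint configurations.
n = 300
configurations = 2 ** n
atoms_in_observable_universe = 10 ** 80               # common order-of-magnitude estimate
print(f"2^{n} is about {configurations:.2e}")         # about 2.04e+90
print(configurations > atoms_in_observable_universe)  # True
```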

Every increment of reducing failure modes is expensive. The failure modes are often difficult to recognize; it is difficult to find better heuristics to replace them; and then there is the difficulty of putting the replacements into actual practice. Many of the heuristics known to fail have no known replacements that are demonstrably better.

What's worse, there seem to be underlying psychological attractors toward known-bad processes. Some of these bad epistemic attractors seem to be genuinely good in other ways that we can't clearly identify or quantify, and definitely can't yet adequately improve on.

We individually, as a community, and as a species don't know nearly enough yet to do much better. I'm not sure we can do much better without technological self-modification, and that would carry enormous risks of its own.

Interesting article, especially because I’m currently rereading some decision-making material in light of some LLM projects.

I think a very interesting part of the setups you discuss is how the world model is defined in detail.

I see the world model as something that is already involved in our perception, with some major restrictions, namely incomplete observability and a kind of biologically hard-coded guidance system (“feelings”). This view still allows for a factored model, but it comes with different dependencies between the building blocks, i.e., between the values, beliefs, and decision theory, and these dependencies have a great impact on the system.

By “locally consistent beliefs and values” in your last paragraph, do you mean beliefs and values that are not necessarily consistent with every other belief and value the individual holds?

The situation you describe in your first paragraph also fits nicely with the framework of human decision-making outlined in https://www.worldbank.org/en/publication/wdr2015 (highly recommended):
> First, people make most judgments and most choices automatically, not deliberatively: we call this “thinking automatically.”
> Second, how people act and think often depends on what others around them do and think: we call this “thinking socially.”
> Third, individuals in a given society share a common perspective on making sense of the world around them and understanding themselves: we call this “thinking with mental models.”
So two other strong “processes” are at play before a mental model / decision theory can be leveraged (assuming we share the definition that these are the same thing). This makes the problem much more complex to resolve, and maybe we do need those corrective actions after all (due to the restrictions mentioned above)?

This is a very interesting topic and I’m looking forward to more discussions.