With thanks to Rebecca Gorman for helping develop this idea.
I've been constructing toy examples of agents with preferences and biases. There turn out to be many, many different ways of doing this, and none of the toy examples seems very universal. The reason is that what we call "bias" can correspond to objects of very different type signatures.
Before diving into biases, a brief detour into preferences or values. We can talk about revealed preferences (which look at the actions of an agent and deduce preferences by adding the assumption that the agent is fully rational), stated preferences (which add the assumption that the stated preferences are accurate), and preferences as internal mental judgments.
There are also various versions of idealised preferences, extrapolated from other types of preferences, with various consistency conditions.
The picture can get more complicated than this, but that's a rough overview of most ways of looking at preferences. Stated preferences and preferences-as-judgements are of type "binary relations": they allow one to say things like "x is better than y". Revealed preferences and idealised preferences are typically reward/utility functions: they allow one to say "x is worth a, y is worth b".
So despite the vast number of different preferences out there, their type signatures are not that varied. Meta preferences are preferences over one's own preferences, and have a similar type signature.
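To make the two type signatures concrete, here is a minimal Python sketch. The alias names (`PreferenceRelation`, `UtilityFunction`), the helper `relation_from_utility`, and the toy fruit values are all illustrative assumptions, not anything taken from a particular formalism.

```python
from typing import Callable, TypeVar

X = TypeVar("X")

# Type "binary relation": answers "is x better than y?"
PreferenceRelation = Callable[[X, X], bool]

# Type "reward/utility function": answers "how much is x worth?"
UtilityFunction = Callable[[X], float]


def relation_from_utility(utility: UtilityFunction) -> PreferenceRelation:
    """Build the binary relation induced by a utility function."""
    return lambda x, y: utility(x) > utility(y)


# Hypothetical utility values, chosen only for illustration.
toy_values = {"apple": 2.0, "banana": 1.0, "cherry": 3.0}


def toy_utility(fruit: str) -> float:
    return toy_values[fruit]


prefers = relation_from_utility(toy_utility)
cherry_over_apple = prefers("cherry", "apple")
banana_over_apple = prefers("banana", "apple")
```

A utility function always induces a binary relation in this way, but the reverse can fail: a cyclic relation (x better than y, y better than z, z better than x) cannot be represented by any utility function.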
In the Occam's razor paper, an agent has a reward function R (corresponding to its preferences) and a "planner" p, which is a map from reward functions to the agent's policies. The agent's bias, in this context, is the way in which p differs from the perfectly rational planner.
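As a rough illustration of this formalism, the sketch below treats p as a function from reward functions to policies and locates the bias in the reward functions where p disagrees with the rational map. It makes strong simplifying assumptions: a policy is a single chosen action rather than a map from histories to actions, and `status_quo_planner`, its `margin` parameter, and the toy rewards are hypothetical examples, not something defined in the paper.

```python
from typing import Callable, Dict, List

Action = str
Reward = Dict[Action, float]          # R: the value the agent assigns to each action
Policy = Action                       # simplification: a policy is one chosen action
Planner = Callable[[Reward], Policy]  # p: reward functions -> policies


def rational_planner(reward: Reward) -> Policy:
    """Pick the action with the highest reward."""
    return max(reward, key=lambda a: reward[a])


def status_quo_planner(reward: Reward, margin: float = 1.0) -> Policy:
    """Hypothetical biased planner: keep the first-listed (default) action
    unless the best action beats it by more than `margin`."""
    actions = list(reward)  # dicts preserve insertion order
    default = actions[0]
    best = max(actions, key=lambda a: reward[a])
    return best if reward[best] - reward[default] > margin else default


def disagreements(p: Planner, rewards: List[Reward]) -> List[Reward]:
    """Reward functions on which p deviates from the rational planner."""
    return [r for r in rewards if p(r) != rational_planner(r)]


# Toy reward functions, chosen only for illustration.
rewards = [
    {"stay": 1.0, "switch": 1.5},
    {"stay": 1.0, "switch": 3.0},
]
biased_on = disagreements(status_quo_planner, rewards)
```

Here the bias has no type of its own: it only shows up as the difference between two objects of type `Planner`, which is part of why this formalism is so general.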
It seems that any bias could be fit into this general formalism, but most biases are narrower and more specific than that. It is also useful, and sometimes necessary, to talk about biases independently of preferences. Many idealised preferences are defined by looking at preferences after some biases are removed[1]. Defining these biases via the preferences would be completely circular. With that in mind, here are various biases with various type signatures. These examples are not exhaustively analysed; most of these biases can be re-expressed in slightly different ways, with different type signatures:
The point of this is that the term "bias" is extremely broad, covering many kinds of objects with different type signatures. So when using the term, formally or informally, be sure that whoever you're talking to uses it the same way. And when you're trying to specify biases within a model, be aware that your formalism may not allow you to express many forms of "bias".
For instance, the paper "Libertarian Paternalism Is Not an Oxymoron" defines preferences as the choices people would make 'if they had complete information, unlimited cognitive abilities and no lack of self-control'.