Probability space has 2 metrics

by Donald Hobson, 10th Feb 2019



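A metric is technically defined as a function $d : X \times X \to [0, \infty)$ from pairs of points to the non-negative reals, with the properties that $d(x, y) = 0 \iff x = y$, $d(x, y) = d(y, x)$, and $d(x, z) \le d(x, y) + d(y, z)$.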

Intuitively, a metric is a way of measuring how similar points are: which points are near which others. Probabilities can be represented in several different ways, including the standard range $p \in (0, 1)$ and the log odds $b \in (-\infty, \infty)$. They are related by $b = \log\frac{p}{1-p}$, $p = \frac{e^b}{e^b + 1}$, and $e^b = \frac{p}{1-p}$ (the three equations are algebraically equivalent).
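As a quick check on the two representations, here is a minimal Python sketch of the conversion between a probability and its log odds. The function names `log_odds` and `probability` are illustrative choices, not anything from the post, and the natural logarithm is assumed.

```python
import math


def log_odds(p: float) -> float:
    """Convert a probability p in (0, 1) to log odds, log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def probability(b: float) -> float:
    """Convert log odds b back to a probability.

    1 / (1 + e^-b) is algebraically the same as e^b / (e^b + 1).
    """
    return 1.0 / (1.0 + math.exp(-b))


if __name__ == "__main__":
    for p in (0.01, 0.5, 0.9):
        b = log_odds(p)
        print(f"p = {p:.2f}  log odds = {b:+.3f}  back to p = {probability(b):.2f}")
```

Note that probabilities of exactly 0 or 1 have no finite log odds, so `log_odds` is only defined strictly between them.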

The two metrics of importance are the Bayesian metric $B(b_1, b_2) = |b_1 - b_2|$ and the probability metric $P(p_1, p_2) = |p_1 - p_2|$.

Suppose you have a prior, $b_1$ in log odds, for some proposition. Suppose you update on some evidence that is twice as likely to appear if the proposition is true, to get a posterior, $b_2$ in log odds. Then $b_2 = b_1 + \log 2$, so the Bayesian distance is $B(b_1, b_2) = |b_1 - b_2| = \log 2$. The Bayesian metric $B$ measures how much evidence you need to move between probabilities.

Suppose you have a choice of actions: the first action will make an event of utility $u$ happen with probability $p_1$; the other will cause the probability of the event to be $p_2$. How much should you care? The difference in expected utility is $u \cdot P(p_1, p_2) = u \, |p_1 - p_2|$.

The first metric stretches probabilities near 0 or 1 and is uniform in log odds. The second squashes all log odds with large absolute value together and is uniform in probabilities. The first is used for Bayesian updates, the second for expected utility calculations.
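To make the contrast concrete, the following sketch computes both distances for a few pairs of probabilities. The names `bayesian_distance` and `probability_distance` and the chosen pairs are assumptions made for illustration; natural logarithms are assumed throughout.

```python
import math


def log_odds(p: float) -> float:
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def bayesian_distance(p1: float, p2: float) -> float:
    """Absolute difference in log odds: the evidence needed to move from p1 to p2."""
    return abs(log_odds(p1) - log_odds(p2))


def probability_distance(p1: float, p2: float) -> float:
    """Absolute difference in probability: what matters for expected utility."""
    return abs(p1 - p2)


if __name__ == "__main__":
    # Illustrative pairs: the first two are equally far apart in probability,
    # the third is a single 2:1 update starting from even odds.
    pairs = [(0.46, 0.47), (0.001, 0.011), (0.5, 2.0 / 3.0)]
    for p1, p2 in pairs:
        print(
            f"{p1:.3f} -> {p2:.3f}: "
            f"probability distance = {probability_distance(p1, p2):.3f}, "
            f"Bayesian distance = {bayesian_distance(p1, p2):.3f}"
        )
```

The first two pairs are the same distance apart in probability, yet the pair near 0 is much further apart in log odds, which is the stretching described above.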

Suppose an imperfect agent reasoned using a single metric, something in between these two: some metric function less squashed up than the probability metric $P$ but more squashed than the Bayesian metric $B$ around the ends. Suppose it crudely substituted this new metric into its reasoning processes whenever one of the other two metrics was required.

In decision theory problems, such an agent would rate small differences in probability as more important than they really were when facing probabilities near 0 or 1. From the inside, the difference between no chance and 0.01 would feel far larger than the difference between probabilities 0.46 and 0.47.

This resembles the Allais Paradox.

However, the metric is more squashed than the Bayesian metric $B$, so moving from 10000:1 odds to 1000:1 odds seems to require less evidence than moving from 10:1 to 1:1. When facing small probabilities, such an agent would perform larger Bayesian updates than really necessary, based on weak evidence.

This resembles Privileging the Hypothesis.
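Both effects can be reproduced numerically with any metric that sits between the two. The post does not say which in-between metric it has in mind, so the sketch below picks one purely as an assumption: distance measured in the coordinate $\arcsin\sqrt{p}$, which stretches the ends more than raw probability but less than log odds. The names `mixed_distance` and `odds_against_to_probability` are hypothetical.

```python
import math


def log_odds(p: float) -> float:
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def mixed_distance(p1: float, p2: float) -> float:
    """Hypothetical in-between metric (an assumption, not taken from the post).

    Measures distance in the coordinate asin(sqrt(p)).
    """
    return abs(math.asin(math.sqrt(p1)) - math.asin(math.sqrt(p2)))


def odds_against_to_probability(against: float) -> float:
    """Probability of an event with odds of `against`:1 against it."""
    return 1.0 / (1.0 + against)


if __name__ == "__main__":
    # Decision side: both gaps are 0.01 in probability, so they matter equally
    # for expected utility, but the mixed metric inflates the gap near 0.
    print("0.00 vs 0.01:", mixed_distance(0.0, 0.01))
    print("0.46 vs 0.47:", mixed_distance(0.46, 0.47))

    # Evidence side: both moves are a factor of 10 in odds, so they need the
    # same evidence, but the mixed metric shrinks the move between long odds.
    far = (odds_against_to_probability(10000), odds_against_to_probability(1000))
    near = (odds_against_to_probability(10), odds_against_to_probability(1))
    print("10000:1 -> 1000:1:", mixed_distance(*far))
    print("10:1 -> 1:1:      ", mixed_distance(*near))
```

A different in-between metric would give different numbers, but any choice that is more stretched than probability and less stretched than log odds near the ends should show the same two distortions.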

As both of these behaviors correspond to known human biases, could humans be using only a single metric on probability space?