# Ω 26

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(This post idea is due entirely to Scott Garrabrant, but it has been several years and he hasn't written it up.)

In 2009, Vladimir Nesov observed that probability can be mixed up with utility in different ways while still expressing the same preferences. The observation was conceptually similar to one made by Jeffrey and Bolker in the book The Logic of Decision, so I give them intellectual priority, and refer to the result as "Jeffrey-Bolker rotation".

Based on Nesov's post, Scott came up with a way to represent preferences as vector-valued measures, which makes the result geometrically clear and mathematically elegant.

## Vector Valued Preferences

As usual, we think of a space of events which form a sigma algebra. Each event $A$ has a probability $P(A)$ and an expected utility $V(A)$ associated with it. However, rather than dealing with $V$ directly, we define $Q(A) = P(A)V(A)$. Vladimir Nesov called $Q$ "shouldness", but that's fairly meaningless. Since it is graphed on the y-axis, represents utility times probability, and is otherwise fairly meaningless, a good name for it is "up". Here is a graph of probability and upness for some events, represented as vectors:

(The post title is a pun on the fact that this looks like the complex plane: events are complex numbers with real component P and imaginary component Q. However, it is better to think of this as a generic 2D vector space rather than the complex plane specifically.)

If we assume $A$ and $B$ are mutually exclusive events (that is, $A \cap B = \emptyset$), then calculating the P and Q of their union $A \cup B$ is simple. The probability of the union of two mutually exclusive events is just the sum:

$$P(A \cup B) = P(A) + P(B)$$

The expected utility is the weighted sum of the component parts, normalized by the sum of the probabilities:

$$V(A \cup B) = \frac{V(A)P(A) + V(B)P(B)}{P(A) + P(B)}$$

The numerator is just the sum of the shouldnesses, and the denominator is just the probability of the union:

$$V(A \cup B) = \frac{Q(A) + Q(B)}{P(A \cup B)}$$

But, we can multiply both sides by the denominator to get a relationship on shouldness alone:

$$Q(A \cup B) = Q(A) + Q(B)$$

Thus, we know that both coordinates of $A \cup B$ are simply the sum of the component parts. This means union of disjoint events is vector addition in our vector space, as illustrated in my diagram earlier.
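The additivity above can be checked with a minimal Python sketch. The `Event` class, the `from_pv` helper, and the numeric values are illustrative assumptions, not part of the original post; the only thing taken from the text is that an event is the pair (P, Q) with Q = P·V.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An event as a 2D vector: p is probability, q is 'up' (p times expected utility)."""
    p: float
    q: float

    @property
    def v(self) -> float:
        """Expected utility V = Q / P (undefined when p == 0)."""
        return self.q / self.p

    def union(self, other: "Event") -> "Event":
        """Union with a *disjoint* event is plain vector addition."""
        return Event(self.p + other.p, self.q + other.q)


def from_pv(p: float, v: float) -> Event:
    """Build an event from its probability and expected utility."""
    return Event(p, p * v)


a = from_pv(0.2, 3.0)    # hypothetical event with V = 3
b = from_pv(0.3, -1.0)   # hypothetical disjoint event with V = -1
ab = a.union(b)

# The probability-weighted average of the utilities ...
weighted = (a.v * a.p + b.v * b.p) / (a.p + b.p)
# ... agrees with Q / P of the summed vector.
print(ab.p, ab.q, ab.v, weighted)
```

The design choice here is that `union` never touches V at all: expected utility is only ever read off afterwards as a slope.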

## Linear Transformations

When we represent preferences in a vector space, it is natural to think of them as basis-independent: the way we drew the axes was arbitrary; all that matters is the system of preferences being represented. What this ends up meaning is that we don't care about linear transformations of the space, so long as the preferences don't get reflected (which reverses the preference represented). This is a generalization of the usual "utility is unique up to affine transformations with positive coefficient": utility is no longer unique in that way, but the combination of probability and utility is unique up to non-reflecting linear transformations.

Let's look at that visually. Multiplying all the expected utilities by a positive constant doesn't change anything:

Adding a constant to expected utility doesn't change anything:

Slightly weird, but not too weird... multiplying all the probabilities by a positive constant (and the same for Q, since Q is V*P) doesn't change anything (meaning we don't care if probabilities are normalized):

Here's the really new transformation, which can combine with the other three to create all the valid transformations. The Jeffrey-Bolker rotation, which changes what parts of our preferences are represented in probabilities vs utilities:
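As a rough check of the claim that these transformations leave preferences alone, here is a Python sketch that writes each one as a 2x2 matrix acting on (P, Q) and compares events pairwise by the sign of a cross product. The function names, the sample events, and the use of a cross product as the comparison are assumptions made for illustration; a reflection is included to show the one case that does reverse preferences.

```python
import math


def scale_utility(c):
    """V -> c*V for c > 0: stretches the Q axis."""
    return ((1.0, 0.0), (0.0, c))


def shift_utility(c):
    """V -> V + c: a shear, since Q = P*V becomes Q + c*P."""
    return ((1.0, 0.0), (c, 1.0))


def scale_measure(c):
    """P -> c*P and Q -> c*Q for c > 0: unnormalized probabilities."""
    return ((c, 0.0), (0.0, c))


def rotation(degrees):
    """Jeffrey-Bolker rotation: mixes P and Q."""
    t = math.radians(degrees)
    return ((math.cos(t), -math.sin(t)), (math.sin(t), math.cos(t)))


def apply(m, e):
    """Multiply the 2x2 matrix m by the event vector e = (P, Q)."""
    return (m[0][0] * e[0] + m[0][1] * e[1], m[1][0] * e[0] + m[1][1] * e[1])


def cross(a, b):
    """Positive when b is counterclockwise of a, i.e. b is preferred to a."""
    return a[0] * b[1] - a[1] * b[0]


def preserves_preferences(m, events):
    """Check that every pairwise comparison keeps its sign under m."""
    moved = [apply(m, e) for e in events]
    n = len(events)
    return all(
        (cross(events[i], events[j]) > 0) == (cross(moved[i], moved[j]) > 0)
        for i in range(n) for j in range(n) if i != j
    )


events = [(0.5, 0.125), (0.5, -0.125), (0.25, 0.25)]  # illustrative (P, Q) pairs
transforms = {
    "scale utility": scale_utility(3.0),
    "shift utility": shift_utility(2.0),
    "scale measure": scale_measure(0.5),
    "rotate 30 degrees": rotation(30),
}
reflection = ((1.0, 0.0), (0.0, -1.0))  # flips the sign of every utility

for name, m in transforms.items():
    print(name, preserves_preferences(m, events))
print("reflection", preserves_preferences(reflection, events))
```

The cross product of two transformed vectors is the original cross product times the determinant of the matrix, so any matrix with positive determinant keeps every comparison and any matrix with negative determinant flips them all.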

Let's pause for a bit on this one, since it is really the whole point of the setup. What does it mean to rotate our vector-valued measure?

A simple example: suppose that we can take a left path, or a right path. There are two possible worlds, which are equally probable: in Left World, the left path leads to a golden city overflowing with wealth and charity, which we would like to go to with V=+1. The right path leads to a dangerous badlands full of bandits, which we would like to avoid, V=-1. On the other hand, Right World (so named because we would prefer to go right in this world) has a somewhat nice village on the right path, V=+.5, and a somewhat nasty swamp on the left, V=-.5. Supposing that we are (strangely enough) uncertain about which path we take, we calculate the events as follows:

• Go left in left-world:
  • P=.25
  • V=1
  • Q=.25
• Go left in right-world:
  • P=.25
  • V=-.5
  • Q=-.125
• Go right in left-world:
  • P=.25
  • V=-1
  • Q=-.25
• Go right in right-world:
  • P=.25
  • V=.5
  • Q=.125
• Go left (union of the two left-going cases):
  • P=.5
  • Q=.125
  • V=Q/P=.25
• Go right:
  • P=.5
  • Q=-.125
  • V=Q/P=-.25

We can calculate the V of each action and take the best. So, in this case, we sensibly decide to go left, since the Left-world is more impactful to us and both are equally probable.

Now, let's rotate 30°. (Hopefully I get the math right here.)

• Left in L-world:
  • P=.09
  • Q=.34
  • V=3.7
• Left in R-world:
  • P=.28
  • Q=.02
  • V=.06
• Right in L-world:
  • P=.34
  • Q=-.09
  • V=-.26
• Right in R-world:
  • P=.15
  • Q=.23
  • V=1.5
• Left overall:
  • P=.37
  • Q=.36
  • V=.97
• Right overall:
  • P=.49
  • Q=.14
  • V=.29

Now, it looks like going left is evidence for being in R-world, and going right is evidence for being in L-world! The disparity between the worlds has also gotten larger; L-world now has a difference of almost 4 utility between the different paths, rather than 2. R-world now evaluates both paths as positive, with a difference between the two of only about 1.4. Also note that our probabilities have stopped summing to one (but as mentioned already, this doesn't matter much; we could normalize the probabilities if we want).

In any case, the final decision is exactly the same, as we expect. I don't have a good intuitive explanation of what the agent is thinking, but roughly, the decreased control the agent has over the situation due to the correlation between its actions and which world it is in seems to be compensated for by the more extreme payoff differences in L-world.
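The rotated figures can be recomputed with a short Python sketch. The (P, Q) inputs are the four atomic events from the example; the helper names are illustrative, and a counterclockwise rotation is assumed.

```python
import math


def rotate(event, degrees):
    """Rotate an event vector (P, Q) counterclockwise by the given angle."""
    p, q = event
    t = math.radians(degrees)
    return (p * math.cos(t) - q * math.sin(t), p * math.sin(t) + q * math.cos(t))


def add(a, b):
    """Union of disjoint events: vector addition."""
    return (a[0] + b[0], a[1] + b[1])


events = {
    "left in L-world": (0.25, 0.25),
    "left in R-world": (0.25, -0.125),
    "right in L-world": (0.25, -0.25),
    "right in R-world": (0.25, 0.125),
}

rotated = {name: rotate(e, 30) for name, e in events.items()}
left = add(rotated["left in L-world"], rotated["left in R-world"])
right = add(rotated["right in L-world"], rotated["right in R-world"])

for name, (p, q) in {**rotated, "left overall": left, "right overall": right}.items():
    print(f"{name}: P={p:.2f} Q={q:.2f} V={q / p:.2f}")
```

Because the sketch carries full precision through the sums, its output can differ from the hand-rounded figures in the last digit, but the ranking of the two actions comes out the same.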

## Rational Preferences

Alright, so preferences can be represented as vector-valued measures in two dimensions. Does that mean arbitrary vector-valued measures in two dimensions can be interpreted as preferences?

No.

The restriction that probabilities be non-negative means that events can only appear in quadrants I and IV of the graph. We want to state this in a basis-independent way, though, since it is unnatural to have a preferred basis in a vector space. One way to state the requirement is that there must be a line passing through the (0,0) point, such that all of the events are strictly to one side of the line, except perhaps events at the (0,0) point itself:

As illustrated, there may be a single such line, or there may be multiple, depending on how closely preferences hug the (0,0) point. The normal vector of this line (drawn in red) can be interpreted as the $P$ dimension, if you want to pull out probabilities in a way which guarantees that they are non-negative. There may be a unique direction corresponding to probability, and there may not. Since $V = Q/P$, we get a unique probability direction if and only if we have events with both arbitrarily high utilities and arbitrarily low. So, Jeffrey-Bolker rotation is intrinsically tied up in the question of whether utilities are bounded.
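For a finite set of events, the separating-line requirement can be tested directly. The sketch below is one possible check, not something from the post: it sorts the event directions by angle and looks for an angular gap larger than 180°, which exists exactly when some line through the origin has every nonzero event strictly on one side. The function name and sample events are assumptions.

```python
import math


def fits_in_open_half_plane(events):
    """True if some line through (0,0) has all nonzero events strictly on one side."""
    angles = sorted(math.atan2(q, p) for p, q in events if (p, q) != (0.0, 0.0))
    if not angles:
        return True  # only (0,0) events: nothing to separate
    # Gaps between consecutive directions, plus the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])
    # An empty arc wider than 180 degrees leaves room for a separating line.
    return max(gaps) > math.pi


valid = [(0.25, 0.25), (0.25, -0.125), (0.25, -0.25), (0.25, 0.125)]
invalid = valid + [(-0.25, 0.0)]  # hypothetical event with negative probability

print(fits_in_open_half_plane(valid))
print(fits_in_open_half_plane(invalid))
```

With finitely many events the empty arc always has some slack, so there are many admissible lines; the uniqueness discussed above only shows up in the limit of events approaching both ends of the arc.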

Actually, Scott prefers a different condition on vector-valued measures: that they have a unique (0,0) event. This allows for either infinite positive utilities (not merely unbounded -- infinite), or infinite negative utilities, but not both. I find this less natural. (Note that we have to have an empty event in our sigma-algebra, and it has to get value (0,0) as a basic fact of vector-valued measures. Whether any other event is allowed to have that value is another question.)

How do we use vector-valued preferences to optimize? The expected value of a vector is the slope, $Q/P$. This runs into trouble for probability zero events, though, which we may create as we rotate. Instead, we can prefer events which are less clockwise:

(Note, however, that the preference of a (0,0) event is undefined.)

This gives the same answers for positive-x-value, but keeps making sense as we rotate into other quadrants. More and less clockwise always makes sense as a notion since we assumed that the vectors always stay to one side of some line; we can't spin around in a full circle looking for the best option, because we will hit the separating line. This allows us to define a preference relation based on the angle of $A$ being within 180° of $B$'s.
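One way to sketch the "less clockwise" comparison in Python is with a 2D cross product, which avoids dividing by P. The `prefers` function and the 100° rotation are illustrative assumptions; the two events are the "go left" and "go right" vectors from the earlier example.

```python
import math


def rotate(e, degrees):
    """Rotate an event vector (P, Q) counterclockwise by the given angle."""
    t = math.radians(degrees)
    return (e[0] * math.cos(t) - e[1] * math.sin(t),
            e[0] * math.sin(t) + e[1] * math.cos(t))


def prefers(a, b):
    """True when a is strictly counterclockwise of b, by less than 180 degrees."""
    return b[0] * a[1] - b[1] * a[0] > 0


left, right = (0.5, 0.125), (0.5, -0.125)

# Rotate far enough that "left" ends up with a negative P coordinate.
left_r, right_r = rotate(left, 100), rotate(right, 100)

# Comparing slopes Q/P now gives the wrong answer ...
slope_says_left = left_r[1] / left_r[0] > right_r[1] / right_r[0]
# ... while the angular comparison is unchanged by the rotation.
print(prefers(left, right), prefers(left_r, right_r), slope_says_left)
```

When both events have positive P, `prefers(a, b)` is equivalent to comparing the slopes, so nothing changes in the ordinary case.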

## Conclusion

This is a fun picture of how probabilities and utilities relate to each other. It suggests that the two are inextricably intertwined, and meaningless in isolation. Viewing them in this way makes it somewhat more natural to think that probabilities are more like "caring measure" expressing how much the agent cares about how things go in particular worlds, rather than subjective approximations of an objective "magical reality fluid" which determines what worlds are experienced. (See here for an example of this debate.) More practically, it gives a nice tool for visualizing the Jeffrey-Bolker rotation, which helps us think about preference relations which are representable via multiple different belief distributions.

A downside of this framework is that it requires agents to be able to express a preference between any two events, which might be a little absurd. Let me know if you figure out how to connect this to complete-class style foundations which only require agents to have preferences over things which they can control.

## Comments

I think a lot of commenters misunderstand this post, or think it's trying to do more than it is. TLDR of my take: it's conveying intuition, not suggesting we should model preferences with 2D vector spaces.

The risk-neutral measure in finance is one way that "rotations" between probability and utility can be made:

• under the actual measure P, agents have utility nonlinear in money (e.g. risk aversion), and probability corresponds to frequentist notions
• under the risk-neutral measure Q, agents have utility linear in money, and probability is skewed towards losing outcomes.

These two interpretations explain the same agent behavior. The risk-neutral measure still "feels" like probability due to its uniqueness in an efficient market (fundamental theorem of asset pricing), plus the fact that quants use and think in it every day to price derivatives. Mathematically, it's no different from the actual measure P.
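A toy illustration of the two descriptions, under assumptions that are mine rather than the commenter's: a two-state world, log utility, and the textbook relation that risk-neutral weights are proportional to real-world probability times marginal utility in each state. The state names and numbers are hypothetical.

```python
# Hypothetical two-state world: wealth in each state and real-world probabilities.
wealth = {"good": 2.0, "bad": 0.5}
p = {"good": 0.5, "bad": 0.5}


def marginal_utility(w):
    """Derivative of log utility u(w) = ln(w) (an assumed utility function)."""
    return 1.0 / w


# Weight each state by probability times marginal utility, then normalize.
weights = {s: p[s] * marginal_utility(wealth[s]) for s in p}
total = sum(weights.values())
q = {s: weights[s] / total for s in weights}

print("actual measure:", p)
print("risk-neutral measure:", q)
```

The risk-averse agent under `p` and a money-linear agent under `q` value small bets the same way, and `q` puts more weight on the state where wealth is low.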

The Radon-Nikodym theorem tells you how to transform between probability measures in general. For any utility function satisfying certain properties (which I don't know exactly), I think one can find a measure Q such that you're maximizing that utility function under Q. Sometimes when making career decisions, I think using the "actionable AI alignment probability measure" P_A which is P conditioned on my counterfactually saving the world. Under P_A, the alignment problem has a closer to 50% chance of being solved, my research directions are more tractable, etc. Again, P_A is just a probability measure, and "feels like" probability.

This post finds a particular probability measure Q which doesn't really have a physical meaning [1]. But its purpose is to make it more obvious that probability and utility are inextricably intertwined, because

• instead of explaining behavior in terms of P and the utility function V, you can represent it using P and Q
• P and Q form a vector space, and you can perform literal "rotations" between probability and utility that still predict the same agent behavior.

As far as I can tell, this is the entire point. I don't see this 2D vector space actually being used in modeling agents, and I don't think Abram does either.

Personally, I find it pretty compelling to just think of the risk-neutral measure, to understand why probability and utility are inextricably linked. But actually knowing there is symmetry between probability and utility does add to my intuition.

[1]: actually, if we're upweighting the high-utility worlds, maybe it can be called "rosy probability measure" or something.

> As far as I can tell, this is the entire point. I don't see this 2D vector space actually being used in modeling agents, and I don't think Abram does either.

I largely agree. In retrospect, a large part of the point of this post for me is that it's practical to think of decision-theoretic agents as having expected value estimates for everything without having a utility function anywhere, which the expected values are "expectations of".

A utility function is a gadget for turning probability distributions into expected values. This object makes sense in a context like VNM, where you are asking agents to judge between arbitrary gambles. In the Jeffrey-Bolker setting, you instead only ask agents to choose between events, not gambles. This allows us to directly derive coherence constraints on expectations without introducing a function they're expectations "of".

For me, this fits better with the way humans seem to think; it's relatively easy to compare events to each other, but nigh impossible to take entire world-descriptions and compare them (which is what a utility function does).

The rotation comes into play because looking at preferences this way is much more 'situated': you are only required to have preferences relating to your current beliefs, rather than relating to arbitrary probability distributions (as in VNM). We can intuit from our experience that there is some wiggle room between probability vs preference when representing situations in the real world. VNM doesn't model this, because probabilities are simply given to us in the VNM setting, and we're to take them as gospel truth.

So Jeffrey-Bolker seems to do a better job of representing the subjective nature of probability, and the vector rotations illustrate this.

On the other hand, I think there is a real advantage to the 2D vector representation of a preference structure. For agents with identical beliefs (the "common prior assumption"), Harsanyi showed that cooperative preference structures can be represented by simple linear mixtures (Harsanyi's utilitarian theorem). However, Critch showed that combining preferences in general is not so simple. You can't separately average two agents' beliefs and their utility functions; you have to dynamically change the weights of the utility-function averaging based on how Bayesian updates shift the weights of the probability mixture.

Averaging the vector-valued measures together works fine, though, I believe. (I haven't worked it out in detail.) If true, this makes vector-valued measures an easier way to think about coalitions of cooperating agents who merge preferences in order to select a Pareto-optimal joint policy.

I am confused. My current understanding is that we're starting with only a preference relation, and no assumptions on probability (so no lotteries, as in the VNM theorem). In that case, there are tons of utility functions that can model any given arbitrary preference relation. It seems like I could get a result like this by saying "take the preference relation, write down a utility function that encodes it, decompose it into the ratio of two parts, call one of them 'probability' and the other 'probability*utility', and now note that there are transformations to other utility functions that encode the same preference relation and unsurprisingly they change the relative amounts of each of the parts -- therefore probability and utility are inextricably linked". (This is almost certainly either wrong or a strawman, but I don't know how.) But in all of this there's no reason to think of the denominator of the ratio as "probability", we just called it that suggestively. Perhaps my critique is that if we start with _just_ a preference relation and only need to keep the preference relation intact, we shouldn't expect to recover anything like normal expected utility theory, because there's no formal reason to have anything like probabilities. Even if you want to interpret probability as a "caring measure" instead of "magical reality fluid" it should still show up before you work through the math and interpret one of the quantities as "caring measure". But mostly I'm confused so who knows, this may all be incoherent.

I'll admit that I'm skeptical. It's a cool mathematical trick, but why should we think it is anything more than that?

The uniqueness of 0 is only roughly equivalent to the half-plane definition if you also assume convexity (i.e., the existence of independent coins of no value).

I have the following question, the answer to which may be obvious, but which I have difficulty understanding: "expected utility" in a game is already the prize multiplied by its probability. Why do we multiply it by the probability again?

Abram is multiplying the conditional expected utility of an event by the probability of that event. For example, the utility of a lottery ticket conditional on winning the lottery could be a million dollars, and we multiply that by the probability of winning the lottery. The result is "probutility" of an event. Taking the union of disjoint events is linear in both probabilities and probutilities, so we can think of them as coordinates of a vector.

I still have a feeling that he is using "expected utility" term differently than it is used in other places where it is already presented as (utility)x(probability), like here: https://wiki.lesswrong.com/wiki/Expected_utility

E.g.: In your example: utility of a winning ticket = 1 million USD

Probability of winning: one millionth

Expected utility of a ticket = 1 USD.

Probutility = ???

I was confused about this too, but now I think I have some idea of what's going on.

Normally probability is defined for events, but expected value is defined for random variables, not events. What is happening in this post is that we are taking the expected value of events, by way of the conditional expected value of the random variable (conditioning on the event). In symbols, if $A$ is some event in our sample space, we are saying $V(A) = E[X \mid A]$, where $X$ is some random variable (this random variable is supposed to be clear from the context, so it doesn't appear on the left hand side of the equation).

Going back to cousin_it's lottery example, we can formalize this as follows. The sample space can be $\Omega = \{\text{win}, \text{lose}\}$ and the probability measure $P$ is defined as $P(\{\text{win}\}) = 10^{-6}$ and $P(\{\text{lose}\}) = 1 - 10^{-6}$. The random variable $X$ represents the lottery, and it is defined by $X(\text{win}) = 10^6$ and $X(\text{lose}) = 0$.

Now we can calculate. The expected value of the lottery is:

$$E[X] = 10^6 \cdot 10^{-6} + 0 \cdot (1 - 10^{-6}) = 1$$

The expected value of winning is:

$$V(\{\text{win}\}) = E[X \mid \{\text{win}\}] = 10^6$$

The "probutility" of winning is:

$$Q(\{\text{win}\}) = P(\{\text{win}\}) \, V(\{\text{win}\}) = 10^{-6} \cdot 10^6 = 1$$

So in this case, the "probutility" of winning is the same as the expected value of the lottery. However, this is only the case because the situation is so simple. In particular, if $X(\text{lose})$ was not equal to zero (while winning and losing remained exclusive events), then the two would have been different (the expected value of the lottery would have changed while the "probutility" would have remained the same).
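The distinction can be made concrete with a small Python sketch using exact fractions. It uses the figures from this thread (a prize of one million with probability one in a million); the function names and the alternative payoff of -1 for losing are assumptions added for illustration.

```python
from fractions import Fraction

# Probability of each outcome in the sample space {win, lose}.
prob = {"win": Fraction(1, 10**6), "lose": 1 - Fraction(1, 10**6)}


def probutility(event, payoff):
    """Q(event): sum of probability times payoff over outcomes in the event."""
    return sum(prob[w] * payoff[w] for w in event)


def conditional_expectation(event, payoff):
    """V(event) = E[X | event] = Q(event) / P(event)."""
    return probutility(event, payoff) / sum(prob[w] for w in event)


everything = {"win", "lose"}
payoff = {"win": Fraction(10**6), "lose": Fraction(0)}

print(conditional_expectation(everything, payoff))  # expected value of the lottery
print(conditional_expectation({"win"}, payoff))     # expected value of winning
print(probutility({"win"}, payoff))                 # probutility of winning

# Hypothetical variant: losing now costs 1 instead of 0.
costly = {"win": Fraction(10**6), "lose": Fraction(-1)}
print(conditional_expectation(everything, costly))  # changes
print(probutility({"win"}, costly))                 # stays the same
```

The expected value of the lottery is just the probutility of the whole sample space, which is why the two numbers coincide only when losing contributes nothing.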

> What is happening in this post is that we are taking the expected value of events, by way of the conditional expected value of the random variable (conditioning on the event).

...and I was enlightened. Assuming this is correct (it fits with how I read this post and a couple others), this seems like a much better way to explain what's going on with probutility.

Probutility of winning = 1 USD

So what is the difference between probutility and "expected utility"? Is it just another name for a well-known idea? (This comment was edited, as at first I read "probutility" as "probability" in your comment.)

I can't make sense of the part with R-world and L-world. You assign probabilities to your possible actions (by what rule?) then do arithmetic on them to decide which action to take (why does that depend on probabilities of actions?) then rotate the picture and find that actions are correlated with hidden facts (how can such correlation happen?) It looks like this metaphor doesn't work very well for decision-making, or we're using it wrong.

Well... I agree with all of the "that's peculiar" implications there. To answer your question:

The assignment of probabilities to actions doesn't influence the final decision here. We just need to assign probabilities to everything. They could be anything, and the decision would come out the same.

The magic correlation is definitely weird. Before I worked out an example for this post, I thought I had a rough idea of what Jeffrey-Bolker rotation does to the probabilities and utilities, but I was wrong.

I see the epistemic status of this as "counterintuitive fact" rather than "using the metaphor wrong". The vector-valued measure is just a way to visualize it. You can set up axioms in which the Jeffrey-Bolker rotation is impossible (like the Savage axioms), but in my opinion they're cheating to rule it out. In any case, this weirdness clearly follows from the Jeffrey-Bolker axioms of decision theory.

> The assignment of probabilities to actions doesn't influence the final decision here. We just need to assign probabilities to everything. They could be anything, and the decision would come out the same.

Aren't there meaningful constraints here? If I think it's equally likely that I'm in L-world and R-world and that this is independent of my action, then I have the constraint that P(Left, L-world)=P(Left, R-world) and another constraint that P(Right, L-world)=P(Right, R-world), and if I haven't decided yet then I have a constraint that P>0 (since at my present state of knowledge I could take any of the actions). But beyond that, positive linear scalings are irrelevant.

What does it look like to rotate and then renormalize?

There seem to be two answers. The first answer is that the highest probability event is the one farthest to the right. This event must be the entire sample space $\Omega$. All we do to renormalize is scale until this event is probability 1.

If we rotate until some probabilities are negative, and then renormalize in this way, the negative probabilities stay negative, but rescale.

The second way to renormalize is to choose a separating line, and use its normal vector as probability. This keeps probability positive. Then we find the highest probability event as before, and call this probability 1.
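A rough Python sketch of this second renormalization, under assumptions of my own: the events are given as disjoint atoms, the chosen normal direction is passed in as an angle, and the new Q axis is taken 90° counterclockwise of that normal so that no reflection is introduced. The atoms are the rounded 30°-rotated values from the post's example.

```python
import math


def renormalize(atoms, normal_angle):
    """Use the direction at `normal_angle` as probability and rescale so the total is 1."""
    c, s = math.cos(normal_angle), math.sin(normal_angle)
    # Rotating by -normal_angle sends the chosen normal onto the positive P axis.
    rotated = [(p * c + q * s, -p * s + q * c) for p, q in atoms]
    if any(p <= 0 for p, _ in rotated):
        raise ValueError("chosen normal does not keep every atom's probability positive")
    # The whole sample space is the sum of the disjoint atoms; scale it to P = 1.
    total_p = sum(p for p, _ in rotated)
    return [(p / total_p, q / total_p) for p, q in rotated]


# Rounded (P, Q) values of the four atomic events after the 30 degree rotation.
atoms = [(0.09, 0.34), (0.28, 0.02), (0.34, -0.09), (0.15, 0.23)]

same_axes = renormalize(atoms, 0.0)             # keep the current P axis, just rescale
undo = renormalize(atoms, math.radians(30))     # pick the original P direction instead

print(same_axes)
print(undo)
```

Choosing the 30° normal approximately recovers the original assignment of .25 to each atom, up to the rounding already present in the inputs.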

Trying to picture this, an obvious question is: can the highest probability event change when we rotate?

> The restriction that probabilities be non-negative means that events can only appear in quadrants I and IV of the graph.

Why do we restrict the probabilities to be non-negative? Is there anything in particular that keeps us from pulling an Aaronson and generalizing probability to include negative and complex components, even absent a clear motivator like QM?

Doesn't the necessity of this half-plane containing the actions restore a difference, even just within this decision algorithm (leaving aside the uniqueness of probability in learning and reasoning), between probability and utility? Probability is the direction perpendicular to this invisible line, and utility is the slope relative to this invisible line.

It can, if there is a unique line. There isn't a unique line in general -- you can draw several lines, getting different probability directions for each.

Sure. And given the rescalability, for any set of values you can rescale everything so that almost any line is possible. But then everything is in some suspiciously narrow band, which again lends itself to a principal component sort of coordinate system.

When inferring such a division, interestingly the ordering of value remains unchanged, but the ordering of the inferred probability can be different than the ordering of the original probability, because average-value events are interpreted as more probable.