Geometric Rationality is Not VNM Rational


Hi! I just wanted to mention that I *really* appreciate this sequence. I've been having lots of related thoughts, and it's great to see a solid theoretical grounding for them. I find the notion that bargaining can happen across lots of different domains -- different people or subagents, different states of the world, maybe different epistemic states -- particularly useful. And this particular post presents the only argument for rejecting a VNM axiom I've ever found compelling. I think there's a decent chance that this sequence will become really foundational to my thinking.

Thanks!

I don't know that I have much "solid theoretical grounding." From my perspective, this sequence is me putting together a bunch of related concepts (and thus doing some of the hard parts of noticing that they are related), but not really giving good theoretical grounding. In fact, I was putting off posting this sequence, so I could have time to develop theoretical grounding, but then gave up on that and just posted what I had in response to the community wanting orientation around FTX.

This reminds me of an example I described in this SL4 post:

> After suggesting in a previous post [1] that AIs who want to cooperate with each other may find it more efficient to merge than to trade, I realized that voluntary mergers do not necessarily preserve Bayesian rationality, that is, rationality as defined by standard decision theory. In other words, two "rational" AIs may find themselves in a situation where they won't voluntarily merge into a "rational" AI, but can agree to merge into an "irrational" one. This seems to suggest that we shouldn't expect AIs to be constrained by Bayesian rationality, and that we need an expanded definition of what rationality is.
>
> Let me give a couple of examples to illustrate my point. First consider an AI with the only goal of turning the universe into paperclips, and another one with the goal of turning the universe into staples. Each AI is programmed to get 1 util if at least 60% of the accessible universe is converted into its target item, and 0 utils otherwise. Clearly they can't both reach their goals (assuming their definitions of "accessible universe" overlap sufficiently), but they are not playing a zero-sum game, since it is possible for them to both lose, if for example they start a destructive war that devastates both of them, or if they just each convert 50% of the universe.
>
> So what should they do? In [1] I suggested that two AIs can create a third AI whose utility function is a linear combination of the utilities of the original AIs, and then hand off their assets to the new AI. But that doesn't work in this case. If they tried this, the new AI will get 1 util if at least 60% of the universe is converted to paperclips, and 1 util if at least 60% of the universe is converted to staples. In order to maximize its expected utility, it will pursue the one goal with the highest chance of success (even if it's just slightly higher than the other goal). But if these success probabilities were known before the merger, the AI whose goal has a smaller chance of success would have refused to agree to the merger. That AI should only agree if the merger allows it to have a close to 50% probability of success according to its original utility function.
>
> The problem here is that standard decision theory does not allow a probabilistic mixture of outcomes to have a higher utility than the mixture's expected utility, so a 50/50 chance of reaching either of two goals A and B cannot have a higher utility than 100% chance of reaching A and a higher utility than 100% chance of reaching B, but that is what is needed in this case in order for both AIs to agree to the merger.
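
The obstacle described in the quoted passage can be written out in one line. As a sketch, assume a single VNM utility function $U$ (the symbols $U$ and $p$ are introduced here only for illustration) and a lottery that reaches goal $A$ with probability $p$ and goal $B$ otherwise:

$$U\big(pA + (1-p)B\big) = p\,U(A) + (1-p)\,U(B) \le \max\{U(A),\, U(B)\}.$$

Because expected utility is linear in the probabilities, no coin flip can score strictly above both pure outcomes, whatever weights the merged agent places on the two original utility functions.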

I remember my reaction when first reading this was "both AIs delegate their power, then a jointly trusted coinflip is made, then a new AI is constructed which maximizes one of the utility functions". That seems to solve the problem in general.

> But if these success probabilities were known before the merger, the AI whose goal has a smaller chance of success would have refused to agree to the merger. That AI should only agree if the merger allows it to have a close to 50% probability of success according to its original utility function.

Why does the probability need to be close to 50% for the AI to agree to the merger? Shouldn't its threshold for agreeing to the merger depend on how likely one or the other AI is to beat the other in a war for the accessible universe?

Is there an assumption that the two AIs are roughly equally powerful, and that a both-lose scenario is relatively unlikely?

It is first past the post: minorities get nothing. There might be an implicit assumption that the newly created agent agrees with the old agents about the probabilities. An agent that finds paperclips 49% plausible and staples 51% plausible will act 100% for staples and will not serve the paperclip goal at all.

Ah, maybe the way to think about it is that if I think I have a 30% chance of success before the merger, then I need to have a 30%+epsilon chance of *my goal being chosen* after the merger. And my goal will only be chosen if it is estimated to have the higher chance of success.

And so, if we assume that the chosen goal is def going to succeed post-merger (since there's no destructive war), that means I need to have a 30%+epsilon chance that my goal has a >50% chance of success post-merger. Or in other words "a close to 50% probability of success", just as Wei said.

I'm confused by the "no dutch book" argument. Pre-California-lottery-resolution, we've got , but post-California-lottery-resolution we simultaneously still have and "we refuse any offer to switch from to ", which makes me very uncertain what means here.

Is this just EDT vs UDT again, or is the post-lottery subtly distinct from the pre-lottery one, or is "if you see yourself about to be dutch-booked, just suck it up and be sad" a generally accepted solution to otherwise being DB'd, or something else?

I think it is EDT vs UDT. We prefer B to A, but we prefer CA to CB, not because of dutch books, but because CA is good enough for Bob to be fair, and A is not good enough for Bob.

...huh. So UDT in general gets to just ignore the independence axiom because:

- UDT's whole shtick is credibly pre-committing to seemingly bad choices in some worlds in order to get good outcomes in others, and/or
- UDT is optimizing over *policies* rather than actions, and I guess there's nothing stopping us having preferences over *properties of the policy* like fairness (instead of only ordering policies by their "ground level" outcomes).
  - And this is where comes in, it's one way of encoding something-like-fairness.

Sound about right?

I find this example interesting but very weird. The couple is determining fairness by using "probability mass of happiness" as the unit of account. But it seems very natural to me to go one step further and adjust for the actual outcomes, investing more resources into the sub-agent that has worse luck.

I don't know if this is technically doable (I foresee complications with asymmetric utility functions of the two sub-agents, where one is harder to satisfy than the other, or even just has more variance in outcomes), but I think such an adjustment should recover the VNM independence condition.

Figure I should put this somewhere: I recently ran into some arguments from Lara Buchak that were similar to this (podcast: https://www.preposterousuniverse.com/podcast/2022/12/12/220-lara-buchak-on-risk-and-rationality/)

From listening to that podcast, it seems like even she would not advocate for preferring a lottery between two outcomes to either of the pure components.

See also: https://www.lesswrong.com/posts/qij9v3YqPfyur2PbX/indexical-uncertainty-and-the-axiom-of-independence for an argument against independence

Note that I tentatively think this will be the last post in the Geometric Rationality sequence.

I am confused about something. You write that a preference ordering is geometrically rational if.

This is compared to VNM rationality which favours if and only if .

Why, in the definition of geometric rationality, do we have both the geometric average and the arithmetic average? Why not just say "an ordering is geometrically rational if it favours if and only if " ?

As I understand it, this is what Kelly betting does. It doesn't favour lotteries over either outcome, but it does reject the VNM continuity axiom, rather than the independence axiom.

These are super interesting ideas, thanks for writing the sequence!

I've been trying to think of toy models where the geometric expectation pops out -- here's a partial one, which is about conjunctivity of values:

Say our ultimate goal is to put together a puzzle (U = 1 if we can, U = 0 if not), for which we need 2 pieces. We have sub-agents A and B who care about the two pieces respectively, each of whose utility for a state is its probability estimates for finding its piece there. Then our expected utility for a state is the product of their utilities (assuming this is a one-shot game, so we need to find both pieces at once), and so our decision-making will be geometrically rational.
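
The puzzle model above can be sketched in a few lines. This is only an illustration: the state names and probabilities are made up, and it assumes that finding one piece is independent of finding the other, so that the chance of completing the puzzle is the product of the two sub-agents' estimates.

```python
# Hypothetical states; each maps to the sub-agents' probability estimates
# [P(A finds its piece), P(B finds its piece)]. All numbers are invented.
states = {
    "attic": [0.9, 0.2],
    "garage": [0.5, 0.5],
}


def puzzle_success_probability(piece_probs):
    """Chance of finding every piece, assuming the searches are independent."""
    prob = 1.0
    for p in piece_probs:
        prob *= p
    return prob


def arithmetic_mean(piece_probs):
    """Plain average of the sub-agents' utilities, for comparison."""
    return sum(piece_probs) / len(piece_probs)


best_by_product = max(states, key=lambda s: puzzle_success_probability(states[s]))
best_by_mean = max(states, key=lambda s: arithmetic_mean(states[s]))
print(best_by_product, best_by_mean)
```

With these made-up numbers the product favours the balanced state ("garage") while the plain average favours the lopsided one ("attic"), which is the conjunctive behaviour the puzzle is meant to capture.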

This easily generalizes to an N-piece puzzle. But I don't know how to extend this interpretation to allow for unequal weighting of agents.

Another setting that seems natural and gives rise to multiplicative utility is if we are trying to cover as much of a space as possible, and we divide it dimension-wise into subspaces, each tracked by a subagent. To get the total size covered, we multiply together the sizes covered within each subspace.

We can kinda shoehorn unequal weighting in here if we have each sub-agent track not just the fractional or absolute coverage of their subspace, but the per-dimension geometric average of their coverage.

For example, say we're trying to cover a 3D cube that's 10x10x10, with subagent A minding dimension 1 and subagent B minding dimensions 2 and 3. A particular outcome might involve A having 4/10 coverage and B having 81/100 coverage, for a total coverage of (4/10)*(81/100), which we could also phrase as (4/10)*(9/10)^2.

I'm not sure how to make uncertainty work correctly within each factor though.

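> A preference ordering on lotteries over outcomes is called geometrically rational if there exists some probability distribution $P$ over interval valued utility functions on outcomes such that $L \preceq M$ if and only if $G_{U\sim P}\,\mathbb{E}_{O\sim L}\,U(O) \le G_{U\sim P}\,\mathbb{E}_{O\sim M}\,U(O)$.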

How does this work with Kelly betting? There, aren't the relevant utility functions going to be either linear or logarithmic in wealth?

Yeah, I think this definition is more centrally talking about Nash bargaining than Kelly betting. Kelly betting can be expressed as maximizing a utility function that is logarithmic in wealth, and so can be seen as VNM rational
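
To make the Kelly point concrete, here is a small sketch under stated assumptions: a single repeated bet with win probability `p_win` and net odds `odds` (both values are illustrative, not from the thread). It maximizes expected log wealth by brute force over a grid and compares the result with the standard closed-form Kelly fraction, p − (1 − p)/b.

```python
import math


def expected_log_growth(fraction, p_win, odds):
    """Expected log of the wealth multiplier when betting `fraction` of wealth."""
    return (p_win * math.log(1 + odds * fraction)
            + (1 - p_win) * math.log(1 - fraction))


def kelly_fraction(p_win, odds):
    """Closed-form Kelly fraction for a simple bet paying `odds` to 1."""
    return p_win - (1 - p_win) / odds


# Illustrative assumptions: a 60% chance of winning an even-money bet.
p_win, odds = 0.6, 1.0

# Grid search over fractions 0.000 .. 0.999 (a fraction of 1 would give log(0)).
grid = [i / 1000 for i in range(1000)]
best = max(grid, key=lambda f: expected_log_growth(f, p_win, odds))

print(best, kelly_fraction(p_win, odds))
```

The maximizer of an ordinary expected utility (log wealth) lands on the Kelly fraction, which is why Kelly betting by itself does not need anything beyond VNM rationality.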

> One elephant in the room throughout my geometric rationality sequence is that it is sometimes advocating for randomizing between actions, and so geometrically rational agents cannot possibly satisfy the Von Neumann–Morgenstern axioms.

It's not just VNM; it just doesn't even make logical sense. Probabilities are about your knowledge, not the state of the world: barring bizarre fringe cases/Cromwell's law, I can always say that whatever I'm doing has probability 1, because I'm *currently doing it*, meaning it's physically impossible to randomize your own actions. I can certainly have a probability other than 0 or 1 that I *will* do something, if this action depends on information I haven't received. But as soon as I receive all the information involved in making my decision and update on it, I can't have a 50% chance of doing something. Trying to randomize your own actions involves refusing to update on the information you have, a violation of Bayes' theorem.

> The problem is they don't want to switch to Boston, they are happy moving to Atlanta.

In *this* world, the one that actually exists, Bob still wants to move to Boston. The fact that Bob made a promise and would now face additional costs associated with breaking the contract (i.e. upsetting Alice) doesn't change the fact that he'd be happier in Boston, it just means that the contract and the action of revealing this information changed the options available. The choices are no longer "Boston" vs. "Atlanta," they're "Boston and upset Alice" vs. "Atlanta and don't upset Alice."

Moreover, holding to this contract after the information is revealed also rejects the possibility of a Pareto improvement (equivalent to a Dutch book). Say Alice and Bob agree to randomize their choice as you say. In this case, both Alice and Bob are strictly worse off than if they had agreed on an insurance policy. A contract that has Bob more than compensate Alice for the cost of moving to Boston if the California option fails would leave *both* of them strictly better off.

So, I am trying to talk about the preferences of the couple, not the preferences of either individual. You might reject that the couple is capable of having preferences. If so, I am curious whether you think Bob is capable of having preferences but the couple is not, and if so, why?

I agree if you can do arbitrary utility transfers between Alice and Bob at a given exchange rate, then they should maximize the sum of their utilities (at that exchange rate), and do a side transfer. However, I am assuming here that efficient compensation is not possible. I specifically made it a relatively big decision, so that compensation would not obviously be possible.

Whether the couple is capable of having preferences probably depends on your definition of “preferences.” The more standard terminology for preferences by a group of people is “social choice function.” The main problem we run into is that social choice functions don’t behave like preferences.

I wrote a post about ex ante prioritarianism some time ago, with some other references that might be of interest: https://forum.effectivealtruism.org/posts/bqcxp57hTybusvcqp/ex-ante-prioritarianism-and-negative-leaning-utilitarianism-1

More recent objection, probably basically a money pump (I haven't read the article): "In this article, I argue that Ex-Ante Prioritarianism suffers from much the same problem: it violates a sequential version of Ex-Ante Pareto, that is, it prescribes sequences of choices that worsen the expectations for everyone." https://www.cambridge.org/core/journals/utilitas/article/exante-prioritarianism-violates-sequential-exante-pareto/EC2F27EC7F39D4BC009AC76C86F1C7F7

One elephant in the room throughout my geometric rationality sequence is that it is sometimes advocating for randomizing between actions, and so geometrically rational agents cannot possibly satisfy the Von Neumann–Morgenstern axioms. That is correct: I am rejecting the VNM axioms. In this post, I will say more about why I am making such a bold move.

## A Model of Geometric Rationality

I have been rather vague on what I mean by geometric rationality. I still want to be vague in general, but for the purposes of this post, I will give a concrete definition, and I will use the type signature of the VNM utility theorem. (I do not think this definition is good enough, and want to restrict its scope to this post.)

A preference ordering on lotteries over outcomes is called geometrically rational if there exists some probability distribution $P$ over interval valued utility functions on outcomes such that $L \preceq M$ if and only if $G_{U\sim P}\,\mathbb{E}_{O\sim L}\,U(O) \le G_{U\sim P}\,\mathbb{E}_{O\sim M}\,U(O)$.

For comparison, an agent is VNM rational if there exists a single utility function $U$ such that $L \preceq M$ if and only if $\mathbb{E}_{O\sim L}\,U(O) \le \mathbb{E}_{O\sim M}\,U(O)$.

Geometric Rationality is weaker than VNM rationality, since under reasonable assumptions, we can assume the utility function of a VNM rational agent is interval valued, and then we can always take the probability distribution that assigns probability 1 to this utility function.

Geometric Rationality is strictly weaker, because it sometimes strictly prefers lotteries over any of the deterministic outcomes, and VNM rational agents never do this.
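
A minimal sketch of the definition may help here. It assumes that the geometric expectation over a finite distribution is the weighted product of the values, that utilities take values in [0, 1], and it uses two made-up utility functions `u1` and `u2` with equal weight; none of these specifics come from the post.

```python
import math


def expected_utility(lottery, utility):
    """Arithmetic expectation of `utility` over outcomes drawn from `lottery`."""
    return sum(prob * utility[outcome] for outcome, prob in lottery.items())


def geometric_score(lottery, weighted_utilities):
    """Geometric expectation, over utility functions, of the expected utility.

    `weighted_utilities` is a list of (weight, utility) pairs whose weights
    sum to 1; the geometric expectation is the weighted product.
    """
    score = 1.0
    for weight, utility in weighted_utilities:
        score *= expected_utility(lottery, utility) ** weight
    return score


# Hypothetical utility functions, each caring about a different outcome.
u1 = {"x": 1.0, "y": 0.0}
u2 = {"x": 0.0, "y": 1.0}
P = [(0.5, u1), (0.5, u2)]

pure_x = {"x": 1.0}
pure_y = {"y": 1.0}
coin_flip = {"x": 0.5, "y": 0.5}

for name, lottery in [("x", pure_x), ("y", pure_y), ("coin", coin_flip)]:
    print(name, geometric_score(lottery, P))
```

Each pure outcome scores 0 because one of the two utility functions gets nothing, while the coin flip scores about 0.5, so the lottery is strictly preferred to both deterministic outcomes. Putting all the weight on a single utility function recovers ordinary expected utility.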

The VNM utility theorem says that any preference ordering on lotteries that satisfies some simple axioms must be VNM rational (i.e. have a utility function as above). Since I am advocating for a weaker notion of rationality, I must reject some of these axioms.

## Against Independence

The VNM axiom that I am rejecting is the independence axiom. It states that given lotteries A, B, and C, and probability p, A⪯B if and only if pC+(1−p)A⪯pC+(1−p)B. Thus, mixing in a probability p of C will not change my preference between A and B.

Let us go through an example.

Alice and Bob are a married couple. They are trying to decide where to move, buy a house, and live for the rest of their lives. Alice prefers Atlanta, Bob prefers Boston. The agent I am modeling here is the married couple consisting of Alice and Bob.

Bob's preference for Boston is sufficiently stronger than Alice's preference for Atlanta, that given only these options, they would move to Boston (A≺B).

Bob is presented with a unique job opportunity, where he (and Alice) can move to California, and try to save the world. However, he does not actually have a job offer yet. They estimate an 80 percent chance that he will get a job offer next week. Otherwise, they will move to Atlanta or Boston.

California is a substantial improvement for Bob's preferences over either of the other options. For Alice, it is comparable to Boston. Alice and Bob are currently deciding on a policy of what to do conditional on getting and not getting the offer. It is clear that if they get the offer, they will move to California. However, they figure that since Bob's preferences are in expectation being greatly satisfied in the 80 percent of worlds where they are in California, they should move to Atlanta if they do not get the offer (pC+(1−p)B≺pC+(1−p)A).

Alice and Bob are collectively violating the independence axiom, and are not VNM rational. Are they making a mistake? Should we not model them as irrational due to their weird obsession with fairness?
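
The violation can be checked numerically. The sketch below is illustrative only: the utilities assigned to Alice and Bob are invented to match the qualitative story (only the 80 percent figure comes from the text), and the couple is assumed to rank lotteries by the equal-weight geometric mean of the two expected utilities.

```python
import math

# Hypothetical utilities in [0, 1]; these numbers are assumptions chosen
# only to match the qualitative description of the example.
ALICE = {"Atlanta": 0.9, "Boston": 0.6, "California": 0.6}
BOB = {"Atlanta": 0.1, "Boston": 0.2, "California": 1.0}
P_OFFER = 0.8  # estimated chance of the job offer


def expected_utility(lottery, utility):
    """Arithmetic expectation of `utility` over the cities in `lottery`."""
    return sum(prob * utility[city] for city, prob in lottery.items())


def couple_score(lottery):
    """Equal-weight geometric mean of Alice's and Bob's expected utilities."""
    return math.sqrt(expected_utility(lottery, ALICE)
                     * expected_utility(lottery, BOB))


def policy(fallback):
    """California with probability P_OFFER, otherwise the fallback city."""
    return {"California": P_OFFER, fallback: 1 - P_OFFER}


pure_atlanta, pure_boston = {"Atlanta": 1.0}, {"Boston": 1.0}
policy_ca, policy_cb = policy("Atlanta"), policy("Boston")

# Without California on the table, Boston beats Atlanta ...
prefers_boston_alone = couple_score(pure_boston) > couple_score(pure_atlanta)
# ... but once California is mixed in, the Atlanta fallback wins.
prefers_ca_policy = couple_score(policy_ca) > couple_score(policy_cb)
print(prefers_boston_alone, prefers_ca_policy)
```

Both comparisons come out true with these numbers: the couple prefers Boston to Atlanta on its own, yet prefers the California-or-Atlanta policy to the California-or-Boston policy. That pair of preferences is exactly what the independence axiom rules out.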

## Dutch Books and Updatelessness

You might claim that abandoning the independence axiom opens Alice and Bob up to being Dutch booked. The argument would go as follows. First, you offer Alice and Bob a choice between two policies:

Policy CA: California if possible, otherwise Atlanta, and

Policy CB: California if possible, otherwise Boston.

They choose policy CA. Then, you reveal that they did not get the job offer, and will have to move to Atlanta. You offer to let them pay you a penny to move to Boston instead. In this way, you extract free money from them!

The problem is they don't want to switch to Boston, they are happy moving to Atlanta. Bob's preferences are being extra satisfied in the other possible worlds where he is in California. He can take a hit in this world.

If California did not exist, they would want to move to Boston, and would pay a penny to move to Boston rather than Atlanta. The problem is that they are being updateless. When they observe that they cannot choose California, they neither fully update on this fact nor pretend that the good California worlds do not exist. Instead, they follow through with the policy that they agreed to initially.

We can take this further, and pretend that they didn't even consider Atlanta vs Boston. They just got a job offer, and decided to move to California. Then all the world saving money disappeared overnight, the job offer was retracted, and Alice and Bob are newly considering Atlanta vs Boston. They might reason that if they had taken the time to consider this possibility up front, they would have chosen Atlanta, so they follow through on the policy that they would have chosen if they had thought about it more in advance.

They have a preference for fairness, and this preference is non-local. It cares about what happens in other worlds.

I gave the above example about a married couple, because it made it cleaner to understand the desire for fairness. However, I think that it makes sense for individual humans to act this way with respect to their various different types of preferences.