Threat-Resistant Bargaining Megapost: Introducing the ROSE Value

Diffractor

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Post Status: Original research, mathy, high-context. Make sure you've read the last three posts in the sequence first, this is going to be a very rough one to start on.

As a recap of the last three posts, bargaining has an elegant notion of what a neutral point is: the Nash bargaining solution. And, games where the two players can pay each other money (transferrable-utility games) have an elegant notion of what a neutral point is: the CoCo value. And, as it turns out, games in general have an elegant notion of what a neutral point is that subsumes the previous two concepts as special cases, which can be motivated by either trying to generalize the Nash bargaining solution to games without a clear disagreement point (as John Harsanyi did), or by trying to generalize the CoCo value to games without transferrable utility (as I did).

Sadly, there's an issue with all of these where a player can get a lot more utility by inventing increasingly effective ways of destroying the utility of everyone else. Basically, someone can go "I have a torturizer, favor me or I'll use it on you", and the threat will work. In fact, such reasoning is the default for the CoCo value. That issue is detailed in the second and third posts in the sequence.

And so, there's a question of "is there a distinguished point in transferrable-utility games which is generated by a more threat-resistant line of reasoning?" As per the third post in the sequence, using a correlated equilibrium as the fallback point can serve that role, but they're not unique.

There is! (kinda)

I have a lot of uncertainty whether the equation I wrote down in this post is The Threat-Resistant Bargaining Solution, but if it isn't, it's at least pretty close to The Solution. My thinking about it is still in flux; here's a rundown of the current situation, which also parallels the structure of the rest of the post.

Following the example set by CoCo values in the first two posts, there's a highly general pathway to amplify a solution concept for 2-player transferrable-utility games into a solution concept for n-player transferrable-utility games, and from there into a notion of equilibrium for games in general. Thus, in the quest for a more threat-resistant solution concept, we can restrict our attention to the special case of 2-player transferrable-utility games. These are vastly simpler because the players can settle matters by paying each other, and their utility functions can be calibrated against a common scale, namely, how much they'd pay for an outcome.

To aid in the search for a "good" solution for transferrable-utility games, the first step is listing out a whole bunch of desired properties to check. Also, in that section, the "n-player transferrable-utility case $\to$ full generality" step is discussed. There's a theorem that pretty much says "X equilibria exist, where X=your preferred solution concept", but it was spectacularly hard to prove.

In the toy case of 2-player games where Bob can act freely and Alice only has the ability to pay Bob, I found The Solution, as bolstered by a slick visual proof that if you take a few obvious desiderata, one of which is immunity to threats, it's the only possible solution.

There's an obvious way to generalize from the previous toy case to the more general setting of 2-player transferrable-utility games, but I don't have a comparable proof that it's the only possible threat-resistant solution and I would be extremely interested in someone proving that. It still fulfills all the nice properties I was looking for. And if you squint hard enough at what the process is doing, it looks sorta like it's generated from Yudkowsky's "block the opponent from getting more than what you think is fair" procedure to negotiate with agents that have different ideas of fairness. Also, it introduces the key concept of an "initiative game".

By using Shapley values, there's an obvious way to generalize to n-player transferrable-utility games in general. The result is threat-resistant and fulfills a bunch of nice properties, but has the baffling and fatal drawback of not being guaranteed to give all players $\geq$ their maximin value, as demonstrated by a tidy little toy problem (this only arises for $n \geq 3$ )

And, right before this post was slated to go out, I came up with an alternate n-player generalization of the 2-player case that's threat-resistant and gets pretty much all the nice properties at once, including giving players $\geq$ their maximin value!

Looking at what the solution does in the special case of bargaining problems, it induces a novel scale-and-shift-invariant bargaining solution that (kinda) doesn't depend on where the disagreement point is! And there's an easy heuristic for applying it in real life.

And finally there's some thought experiments that suggest that this isn't threat-proof enough, and also I don't have an argument that these sorts of solutions flow out of more primitive desiderata about agent behavior. So if someone wants to take this further, there's definitely remaining room to do so.

It's a little tricky to present, however, because whoo boy this is going to be a long post. Grab a snack and a whiteboard, you'll need it.

General Preliminaries:

Amplifying Solutions

One of the insights gained from looking at CoCo values, and how they generated Harsanyi equilibria (see posts 1 and 2 in the sequence), is that there's a general pathway to take an equilibrium notion for 2-player games to an equilibrium notion for n-player games with transferrable utility, and from there, to generate an equilibrium notion for n-player games in full generality.

Two to N with Shapley Values

A game consists of a set of players $N$ , the actions of the players ${A_{i}}_{i \in N}$ , and utility functions for the players, ${U_{i}}_{i \in N}$ , of type $\prod_{i \in N} A_{i} \to R$ . Well, to be fully technical, the action spaces must be compact metric spaces and the utility functions must be continuous, but usually you can ignore stuff like that.

Transferrable-utility games are those where we're interpreting all the numbers as money. The payoffs are how much the various players value different outcomes, as denominated in money. And also the players are allowed to pay each other, if they want. These side payments don't count as an action in the game.

Now, let's say you've got a preferred way of taking a two-player transferrable-utility game and figuring out how much value the two players "should" walk away with (which might take a side payment to actually implement). Given any two-player game as input, you conclude that the special point they "should" be aiming for is the perfect, the elegant, [INSERT-NAME-HERE] value!

Sooo cool. Aliens would independently reinvent it, you're sure.

Let's denote "value assigned to Alice in game G" as $v_{G} (alice)$ . In general, you've got some function $v$ which maps 2-player games to their value for a player. What's the big deal with that?

Well, hypothetically, if two big coalitions $S$ and $¯ ¯¯ ¯ S$ were fighting each other in a grand n-player game, you could view it as just a two-player game! The coalitions are the players, their move spaces are the spaces $\prod_{i \in S} A_{i}$ and $\prod_{j \notin S} A_{j}$ , (everything the teams can do if they coordinate) and their utility functions are the team utilities $\sum_{i \in S} U_{i}$ and $\sum_{j \notin S} U_{j}$ Then you could use your function $v$ to work out the value of a coalition, $v_{G} (S)$ .

The reason this is important is because, the natural notion of how to split a pile of value in an n-player game is the Shapley values. There isn't a comparable competitor. And in order to define Shapley values in the first place, you need some notion of "the value of a coalition, if they work together". So, your two-player solution notion can be amplified into an n-player solution notion by using it to figure out what payoff coalitions should get, and using that to compute Shapley values. In an n-player game, player i will get a value of

$ϕ_{G} (i) = \sum_{i \in S \subseteq N} \frac{(n - s)! (s - 1)!}{n!} (v_{G} (S) - v_{G} (¯ ¯¯ ¯ S))$

This isn't the standard presentation of Shapley values, but the standard presentation can be reshuffled into this equation. $ϕ_{G} (i)$ will henceforth be used as an abbreviation for the Shapley value of player i in game G. $¯ ¯¯ ¯ S$ is just $N / S$ , the coalition of everyone else. n and s are the cardinalities of the sets N and S respectively.

A way of intuitively stating what the Shapley value does, is that a coalition is formed in random order until it includes everyone, and everyone demands their marginal contribution to the value of the forming coalition. Average over all possible random orders, and that gives the Shapley values for everyone.

In particular, the CoCo value follows this procedure. It's most naturally defined for two-player games, and the generalization of it to n-player games follows this pattern of "use the two-player case to define the value of a coalition, then give everyone their Shapley values"

N to Full Generality With Importance Weights

This is for games with transferrable utility. What about games in general where money might not be available? Well, points on the Pareto frontier giving everyone's utilities, $(x_{1}, x_{2}, x_{3} . . .)$ , are associated with a tuple of importance weights $(a_{1}, a_{2}, a_{3} . . .)$ , where $x_{1}, x_{2}, x_{3} . . .$ is the tuple of payoffs from the utility functions that maximizes $\sum_{i} a_{i} U_{i}$ . Ie, every Pareto-optimal point can be post-hoc rationalized as the result that maximizes a weighted mix of utilities, for some way of weighting everyone (the importance weights). Given a spot $(x_{1}, x_{2}, x_{3} . . .)$ is there an easy way of reading off the "importance weights" that make that spot the best one? Yup! The "importance weights" are just the normal vector to the Pareto frontier at the given spot.

An interesting aspect is that the weights $a_{i}$ can also be interpreted as currency conversion factors, like curnits/utilon for player i. (curnit=currency unit). As an example, if Alice has $a_{a l i c e} = 5$ and Bob has $a_{b o b} = 1$ , then this is saying that an Alice-utilon is worth 5 Bob-utilons. Or that spending 5 currency units for an Alice utilon is as good a deal as spending the 5 currency units on 5 Bob utilons, for someone trying to spend money to maximize the weighted sum of utilities $5 U_{a l i c e} + U_{b o b}$ .

So, given a Pareto-frontier point, you can use the importance weights at that point to make up an imaginary currency. Then you redenominate everyone's utilities in that, ask what the Shapley values are for everyone, convert from money back to utilons, and get... well, usually you'll get a point that's past the Pareto-frontier. It's too good for everyone and cannot be attained by any combination of actions. But there will be some points where, when you do this, you get back where you started. They correspond to ways of importance-weighting everyone that make "maximize the weighted sum of utilities" (utilitarianism??) and "give everyone their Shapley payoffs" (fairness??) line up.

These will be called "general equilibria" when we're not talking about a specific notion of how to assess the value of a coalition, and for more specific notions of how to assess the value of a game, they're named correspondingly, like "CoCo equilibria"

Two to Full Generality: Transporting Desiderata

If the function mapping a 2-player game and a player to their value is nicely behaved, it will bestow nice properties on the "everyone gets their Shapley payoffs" function, which bestows nice properties on the corresponding general equilibria. As an example, let's take shift-invariance of the $v$ function. If you start the game with 100 extra dollars and your foe starts with 40 extra dollars, you should get your old value plus 100 dollars, and your foe should get their old value plus 40 dollars. Shift-invariance of this form will imply that the "everyone gets Shapley payoffs" result is also shift-invariant. And this implies that the corresponding general equilibria will be scale-and-shift invariant, and not depend on how everyone reports their utility functions.

So, assuming "everyone gets their Shapley payoffs" is convincing to the reader, that leaves ONE point of intervention to get something which is not the CoCo value. Find a different way to define the value of a coalition. That's the only option available. And things get even more constrained if you accept that the value of a coalition should be definable as the value of the associated 2-player game between them and everyone else.

The Big Desiderata List

For the following 10 desiderata (not all of which are attainable, especially because a strict subset of them, if fulfilled, imply that the CoCo value is the only possibility), they come in several variants.

As an example, let's take "player i should get better than their maximin payoff". It could apply to the $v_{G} (i)$ games for 2-player games. It could apply to the $v_{G} (S)$ games for coalitions containing i, that if i is on a team, the team won't have player i lose too badly. It could apply to the Shapley values for 2-player games, or for general n-player games. Or it could apply to all the general equilibria.

Several of these will imply each other (like, doing better than maximin for the n-player Shapley values implies doing better than maximin in all general equilibria), but since they don't all imply each other, it's important to make fine distinctions between, for example, action monotonicity for the $v_{G} (i)$ games, and action monotonicity for the resulting Shapley values. They'll be stated informally at first,but there are formal versions of all of them.

Desideratum 1: 2-Player Reduction The value of a coalition $S$ in a general game is found by treating the two coalitions $S$ and $N / S$ as just two big players, and evaluating the value of the corresponding player s in the corresponding two-player game.

This desideratum, if fulfilled, means that we only need to define values for 2-player games, considerably simplifying things.

Desideratum 2a-b: Shift-Invariance for $v$ (or $ϕ$ ). In a 2-player game, if Alice has a constant amount $c$ added to all their payoffs, then the value of the new game for Alice is the value of the old game plus $c$ . Adding a constant amount to Bob's payoffs has no effect on Alice's payoffs. Similar for n-player games and the Shapley payoffs.

Desideratum 2c: Scale-and-Shift Invariance for General Equilibria If $x \in R^{n}$ is a general equilibrium for the game $G$ , then for all ways of scale-and-shifting everyone's utility functions, the scale-and-shift of $x$ is a general equilibrium for the scale-and-shift of the game $G$ .

These desiderata, if fulfilled, means that the solution to a game doesn't depend on irrelevant details of how everyone reports their utilities, or if someone has an extra 100 bucks in their drawer at home.

Desideratum 3a.a-c: Weak/Strong/Really Strong Pareto Optimality for $v$ For all 2-player games, the tuple of payoffs for the two players is on the weak Pareto frontier/strong Pareto frontier/maximizes the sum of utilities. If you haven't seen it before, the weak Pareto frontier is "there's no way to make everyone strictly better off", and the strong Pareto frontier" is "there's no way to make some group strictly better off and everyone else the same".

Desideratum 3b: Pareto Optimality for $ϕ$ . For all n-player games, the sum of the Shapley values for the players equals the maximum sum of utilities.

Desideratum 3c.a-b: Weak/Strong Pareto Optimality for General Equilibria For all n-player games, every general equilibrium is on the weak Pareto Frontier/strong Pareto frontier.

Pareto-optimality is an extremely obvious desiderata to have. A nonobvious aspect, though, is that 3a, Pareto optimality for $v$ , is almost completely disconnected from the rest of them. It's entirely possible to have the Shapley values attain Pareto optimality, without the values for 2-player games being Pareto-optimal, even in a weak sense. The "active ingredient" in making the Shapley values Pareto optimal seems to be that the value of the "everyone" coalition should be the maximum attainable value, and other than that, it doesn't really matter what the value function $v$ is doing.

Desideratum 4a-b: Continuity for $v$ (or $ϕ$ ) For a 2-player game, we can consider the family of games with the same action spaces. Within this family, we can abbreviate a game as just its tuple of utility functions, of type $A_{1} \times A_{2} \to R^{2}$ (continuous functions from actions to results). The value function, of type $(A_{1} \times A_{2} \to R^{2}) \to R^{2}$ should be continuous when the function space is equipped with the topology of uniform convergence, for all families of games. For Shapley values, it should be the same, just use the Shapley value function of type $(\prod_{i \in N} A_{i} \to R^{n}) \to R^{n}$ .

Desideratum 4ca: Existence of General Equilibria The general equilibrium function, of type $(\prod_{i \in N} A_{i} \to R^{n}) \to P (R^{n})$ mapping a game to its set of general equilibria, should never return the empty set, regardless of the family of games.

Desideratum 4cb: Closed-Graph for General Equilibria The general equilibrium function, of type $(\prod_{i \in N} A_{i} \to R^{n}) \to P (R^{n})$ mapping a game to its set of general equilibria, should have closed graph for all families of games.

Existence is obviously important, the early continuity desiderata less so. It may seem obvious that microscopic variations in everyone's utilities shouldn't have dramatic effects on what happens, but these desiderata actually pack a serious punch in constraining solutions. As an example, $U_{2} ({argmax}_{a} U_{1} (a))$ (the utility you get when the foe argmaxes) doesn't vary continuously as $U_{1}$ varies, so all terms like this are forbidden from showing up in equations.

Desideratum 5a-c: Redundancy Invariance for $v$ (or $ϕ$ ) (or General Equilibria) Giving players additional actions that are equivalent to just probabilistic mixtures of existing actions should have no effect on the values given by $v$ , or the Shapley values, or the set of general equilibria.

This is another one that seems obvious but has unexpected power. I actually had a few variants of my bargaining solution, and this killed one of them. Admittedly, it naively seems like this desiderata fails when you have the computational power to read the foe's move, but this desiderata can be preserved even in that case if you set up the problem correctly.

Desideratum 6a-c: Threat-Resistance for $v$ (or $ϕ$ ) (or General Equilibria) If all players get a kit of "destroy the utility of this other player" buttons of varying intensity that can be pressed with no cost and paired with any move, then regardless of how much utility the buttons can destroy, the values/Shapley values/set of general equilibria remain unchanged.

And this is a special one. It effectively says that if your foe has a "send you to hell" button that's costless for them to press, then the button would have NO effect on negotiations, lest you incentivize your foe to create such a button because you'd respond. It gets even more powerful when paired with continuity, because then it levels up to say something like "if the foe has a button that sends me to hell and they slightly value pressing that button, I'll given them just a little bit of extra value to have them put the button away". It's not quite as good as it seems, though, which is why it's named Threat-Resistance instead of Threat-Immunity. More about that near the end of this post. Really, if fulfilled, this just confers immunity to a specific class of threats. There are threat-like shenanigans that this desiderata doesn't rule out.

The following three desiderata (7,8,9) are constraints on what payoffs the various players can expect.

Desideratum 7a-c: Payoff Dominance for $v$ (or $ϕ$ ) (or General Equilibria) If Alice always gets higher utility than Bob, regardless of what event happens, then the values/Shapley values/general equilibria will also say that Alice gets higher utility than Bob. Same for always getting lower utility.

Desideratum 8a-c: Maximin Dominance for $v$ (or $ϕ$ ) (or General Equilibria) All the values/Shapley values/general equilibria will say that each player gets $\geq$ their maximin payoff.

Desideratum 9a-c: Sub-Max for $v$ (or $ϕ$ ) (or General Equilibria) All the values/Shapley values/general equilibria will say that each player gets $\leq$ their maximum payoff.

Of these three desiderata, Maximin Dominance is the most likely, and Sub-Max is the most contentious. It's reasonable to state that a player who can help others massively, but who can't gain much utility themselves, should get compensated greatly for their trouble. The same considerations also make Payoff Dominance look unlikely in the general n-player case.

Desideratum 10: Action Monotonicity for $v$ (or $ϕ$ ) (or General Equilibria). If a player gains access to more actions, their value/Shapley value should improve. For general equilibria, the worst-case equilibrium payoff for the player that gained access to more actions should improve.

This desiderata is the key piece that mandates the CoCo value, and so, it must fail.

General Equilibria Existence and Properties

Some of the implications are, if not trivial, extremely easy to prove. Others have simple proofs, but the simple proofs depend on details of exactly what the proposed bargaining solution is. However, there's one result that towers far over the rest in difficulty, the General Equilibrium Existence Theorem.

As we're inventing various notions of Shapley-like games, we might want a guarantee that they all have associated equilibria, regardless of which notion we're looking at. CoCo/Harsanyi equilibria were generated by going "take a point $x$ on the Pareto frontier, read off the importance vector $a$ from the point, use it to rescale everyone's utilities, compute the CoCo value in that game, convert from money back to utilons, and if you made $x$ again by that process, it's a CoCo equilibria". They always exist, regardless of the game. But, what if we invent some new value notion besides the CoCo value? Wouldn't it be nice to prove that all games have equilibria induced by that new notion? And what if we change our opinion about what sort of value to use? Wouldn't it be nice to not have to redo the equilibria existence proof? Really, you'd want some sort of equilibria existence proof that works under highly general assumptions. And so, that's what this does.

Now, the full theorem statement is actually very complicated, so what's stated here is a simplified statement of the theorem that doesn't miss anything intuitive.

Theorem 1: General Equilibria Existence Theorem: A game $G$ has a general equilibrium assuming the following five conditions on the value function $v$ , and one condition on how the value function interacts with games.

Property 1: For all games $G$ , if a new game $G^{'}$ is defined where every player can voluntarily decrease their own utility down to some specified minimum value, then the values for all coalitions don't change. Put another way, if everyone can shoot themselves in the foot, it has no effect on negotiations because they just won't.

Property 2: For all games $G$ , coalitions $S$ , and players i, we have that $v_{G} (S \cup {i}) - v_{G} (S) \geq {min}_{m \in \prod_{i \in N} A_{i}} U_{i} (m)$ . A player joining a coalition can always contribute at least their worst-case utility. This is an extremely weak form of "everyone should get more than their minimax value" that's more like "everyone should get more than their minimum value"

Property 3: For all games $G$ , if a new game $G^{'}$ is defined where everyone can pair their normal action with a second action that does nothing at all, the value of a coalition in the new game is the same as the value of a coalition in the old game. This is a weak form of Redundancy Invariance that's like "duplicate copies of actions don't matter"

Property 4: For all games $G$ , the sum of the Shapley values for that game maximizes the sum of utilities. This is just Pareto-optimality.

Property 5: The value function $v$ is continuous.

Property 6: The game $G$ is augmented to a game $G^{* ϵ}$ , where everyone gets two extra moves they can pair with their ordinary move. One is "shooting themselves in the foot", ie, voluntarily decreasing their utility to whatever level they specify unless it's too low. The second move is "specify payoffs for everyone", ie, everyone can pick a vector in $B^{n}$ (the n-dimensional L2 ball, the hypersphere in n dimensions), all the vector coordinates are multiplied by $ϵ$ , and those tiny payoffs or penalties are added to everyone's utilities. For all vectors $a \in Δ N$ (importance weights on all the players), sufficiently low $ϵ$ , and players i, if a player has an importance weight of exactly 0 in the game $G^{* ϵ}$ , then their Shapley value will be positive.

Note: The rest of this section will be me drilling down into math details, and is skippable.

Begin the Infodump!

Intuitively, what's going on with that odd final condition is that, the "everyone can specify a tiny payoff or penalty for everyone else" modification makes it so that the set of all the possible utility tuples doesn't have creases or corners, instead it'll be smooth, and every point on the Pareto frontier induces exactly one normal vector/vector of importance weights $a$ . This "smoothing" tames some pathologies, and so you just need to find general equilibria for the "smoothed" games, and take the limit as $ϵ \to 0$ , to get equilibria for the original game. That's why we care about such games.

Separately, there's division-by-zero errors that may show up when a player has an importance weight of literally 0. This is because you have to divide by the importance weight to convert the payoff of a player from currency back to utilons. Now, if the Shapley value is positive for all players that have 0 importance, this isn't really an issue, because $f (0) > 0$ means that ${lim}_{δ \to 0} \frac{f (δ)}{δ} = \infty$ . To assess the behavior around where some players have an importance weight of 0, you can just go "as importance goes to 0, payoffs for those players blows up to infinity no matter how we take the limit, and so we can rule out a neighborhood around all the "someone has 0 importance" options as not containing equilibria". But if someone could have 0 Shapley value if they had 0 importance, then the function may do weird things, or depend on how you take the limit, and you're no longer justified in ignoring the region around where that player has 0 importance.

Now, why expect this assumption of "if 0 importance, positive Shapley value" to be true in practice for the specified class of games (an original game $G$ , but where everyone can also decrease their own utility and distribute small amounts of utility to everyone else)? Well, the Shapley value for player $i$ in $G^{* ϵ}$ can be written as a sum of terms like $v_{G^{* ϵ}} (S \cup {i}) - v_{G^{* ϵ}} (S)$ , and so even if a player has literally 0 importance and their utility doesn't matter, they could shift the beneficiaries of their "distribute a tiny amount of utility" action from "everyone else" to "team $S$ ", benefiting the coalition they're joining by a nonzero amount. And they can do this for any coalition, and attain nonzero Shapley value that way. But actually proving this requires you to look into the details of how the value function $v$ is defined instead of making general arguments that apply to any value function.

Inspired by this argument from two paragraphs up, we can prove

Proposition 1: If a game $G$ fulfills the property that $\forall a \in Δ N, i \in N : a_{i} = 0 \to ϕ_{G, a} (i) > 0$ , then all of its general equilibria will be on the strict Pareto frontier. $ϕ_{G, a}$ is the Shapley values of the game $G$ but everyone's utility functions are rescaled from $U_{i}$ to $a_{i} U_{i}$ .

Effectively, this is saying that the "everyone can benefit others a little bit" criterion of nonzero Shapley values at 0 importance is a sufficient criterion for all equilibria to be strictly Pareto optimal, and everyone will matter a little bit. The argument for why is that "payoff blowup" argument a few paragraphs back.

Proposition 2: If the value function $v$ is continuous, then the function mapping a game $G$ to its set of general equilibria has closed graph.

This is just a general result.

I would like to take a moment to say that proving that General Equilibria Existence Theorem was ABSOLUTELY HORRIBLE. "Just use a fixpoint theorem, bro". I tried, I really did. The issue is, the function you're trying to find a fixpoint of doesn't produce convex sets so you can't apply any of the usual fixpoint theorems right out of the box. However, convexity isn't preserved under homeomorphisms, which is kinda suspicious. Eventually it turned out from Horvath 1991, that it's possible to define a sort of "generalized convexity" that says something kinda like "A space $X$ is generalized-convex if you can come up with a notion of generalized-convex-hull that takes a finite set of points and produces a contractible set, and subsets $Y$ are generalized-convex if you can take any set of finitely many points from $Y$ and the generalized-convex-hull is still a subset of $Y$ ". The paper had some fixpoint theorems that carried over to this setting, and after a whooole lot of work, I was able to coax them to give me the result I wanted.

Last-minute edit: If you need to salvage one of these results in a more general case, you can have analogous conditions apply to the "all n players get payoffs" function directly instead of the 2-player value function $v$ .

ROSE Value, Step-by-Step

ROSE Value, Toy Case

ROSE is an acronym which will be defined later because it's a spoiler at this early stage.

Let's inspect a simple special case here, where Alice has only one possible action, and Bob has many, and Alice and Bob can pay each other money. If we cannot get a sensible equilibrium notion here in such a restricted toy case, we can't get one for games in general. Observe the picture.

What does the CoCo value say about this? Well, it'd pick

Ie, Bob goes "a default point will be set, for if we can't come to an agreement. We'll evenly split surplus over the default point. So I'll pick this point right here in the lower-left to maximize my take. Ie, threaten Alice to get money".

Drawing the Desiderata

If we're aiming for something nonthreatening, then we should probably look for a different mathematically distinguished point in this picture. The first issue to address is what the various desiderata say about this class of toy problems. 2-player reduction isn't relevant here, because it's already a 2-player game.

Shift Invariance (that adding constants to a player's utility means that the solution point should have the same constant added to it) carries the implication that our choice of distinguished point shouldn't depend on the background grid lines of how much money the players get, it should be selected in a background-invariant way. This seems one of the highly reliable axioms to adopt, so let's delete the background grid.

Pareto Optimality, another essential axiom to have, says that our choice of distinguished point should be on this diagonal line which corresponds to the outcomes where Bob maximizes the sum of utilities instead, and one of the players pays the other.

Redundancy Invariance (that adding actions that can equally well be accomplished with probabilistic mixtures of existing actions does nothing) is a touch less essential but still worth adopting. This implies that our choice of distinguished point should just depend on the convex set, not which points in it are pure actions vs mixtures of other actions.

Maximin Dominance (that both players do better than their maximin payoff) is another essential axiom to have, and it says that our choice of distinguished point should be on this portion of the frontier. Any further down and Bob goes "wait, I can just argmax!" and does that, and any further to left, Alice goes "wait, I can just refuse to pay Bob!".

And that leaves five more axioms remaining, slightly more dubious. Sub-Max and Payoff Dominance, along with Continuity, Threat Resistance, and Action Monotonicity.

Sub-Max (both players get $\leq$ the maximum utility they could achieve alone without paying each other) mandates the single point where Alice goes "alright Bob, since you could just argmax, how about you maximize surplus instead, and I'll pay you the exact amount you'd need to be ok with doing that instead of argmaxing". But this is an incredibly strong condition and there's no obvious reason why we should desire it.

Payoff Dominance (that if Alice outscores Bob for all possible outcomes, then the net solution should also feature Alice outscoring Bob, and vice-versa) is an odd one when you try to translate what it's saying into geometrical terms. It effectively says that the solution must be in the following range.

Now, that geometrical meaning isn't obvious at all, so here's where it came from. Payoff Dominance as stated, is an oddball because adding and subtracting constants (by Shift-Invariance) shouldn't affect anything, and yet doing that can change whether Alice consistently outscores Bob, or whether the opposite happens. Actually, that's a feature, not a bug. The proper way to wield this axiom is to use Shift Invariance first to set the 0,0 point on one of these diagonal lines, and then invoke Payoff Dominance to go "Alice consistently outscores Bob in all outcomes (or vice-versa), so she/he should get a higher score". And that's where this geometric way of viewing it came from. When restated in geometric terms, the true role of Payoff Dominance seems to be that it's smuggling in the "ooh, diagonal lines are nice" conclusion. I mean, I agree with that conclusion, but it seems in poor form.

And then we get to the really interesting three.

Action Monotonicity, specialized to this particular case, says that if the convex shape is bigger, Bob gets more utility. More about this one later.

Continuity is a fascinating one because it rules out a solution which is otherwise extremely attractive. It's incredibly suspicious that, for the CoCo value, the threat point if negotiations break down is Bob maximizing BobUtility-AliceUtility. In general, if you don't want your foe making threats, you should disregard every threat point where your foe isn't maximizing their own utility. And so, obviously, the TRUE threat point is the one where Bob just maximizes his own utility, and we split surplus 50/50 from there.... right?? But, observe the following violation of continuity.

And so, that brings us to the most special desiderata of all, Threat Resistance. It says that if you close the convex shape to the left and downwards (by adding in the "incinerate utility" buttons), it has no effect on the selected point. Only the Pareto frontier of Bob's actions affect what happens.

Continuity and Threat Resistance, when combined, are really interesting, because they actually make it quite difficult to even specify many of the points on the Pareto frontier in the first place. The obvious solution of "assume Bob will maximize his own utility, and then offer a 50/50 split from there" violates continuity, and most variations on the CoCo solution (where you vary the slope of the tangent line) are certainly continuous, but violate Threat Resistance.

Really, there's only two solutions I could come up with. One of them violated Action Monotonicity for Bob (which is a plausible desideratum since Alice only has one action), and that leaves only one option that remains.

Toy-Case Visual Proof of the ROSE Value

The outcome where Alice goes "ok, Bob. I know you can get 41 dollars worth of value from picking your favorite move. May I interest you in the option where you play that Pareto-Frontier move to maximize the sum of our utilities instead? I commit to paying you if you do so, and paying you enough so that you walk away with 41 dollars and 1 cent worth of value." And then Bob takes it, so they hit the green dot in the below image. Bob is responsible for picking a spot in the green region that's on the Pareto-frontier/purple line, and Alice is responsible for paying him afterwards, moving diagonally up and back to the green spot, which is slightly better for Bob than Bob just directly argmaxing.

That pair of payoffs is the ROSE value of the game.

But is this really the only possible solution? Do we have a theorem that says it's the only solution that fulfills certain axioms? Yes. Does it cheat by using the axiom of $\leq$ maximum value? No it does not. In fact, I can actually give a slick visual proof of it!

But what axioms are needed? Only five. Maximin Dominance, Threat-Resistance, Redundancy-Invariance, Pareto-Optimality and Action Monotonicity (for Bob). And technically you only need the weak form of Redundancy-Invariance that says that adding duplicate actions does nothing. If you believe the following:

1: Players should get $\geq$ the amount of utility they can guarantee on their own. After all, if a notion of bargaining says a player should lose, and they can force a win, why should they assent to that bargaining notion?

2: If Alice and Bob are both indifferent between action A and B, then deleting one of them shouldn't affect anything because Bob can just do the other one.

3: If Alice and Bob are doing a suboptimal pair of actions, they should be able to coordinate on doing something which is Not That.

4: Bob is happy to come up with extra actions/unhappy to lose existing actions, since Alice can only act by paying him.

5: If Bob brings a gun to negotiations, and he doesn't intrinsically value shooting Alice, Alice shouldn't react to the gun, lest she incentivize Bob to bring a gun to negotiations.

Then the only possible solution is Alice paying Bob just enough to motivate him to play the Pareto-Frontier/surplus-maximizing move instead of what he'd do otherwise.

Let's get started with that visual proof! Let's say that Bob's only moves were as follows. Bob can press a button that gives both players the ROSE value, or he take any action that gives Alice equal or greater utility than the ROSE value. Those are the only options. The only solution to this game that's maximin-compliant is Bob pressing the ROSE button.

Bob brings a gun which can be used to shoot himself in the foot or shoot Alice. Er, to be more precise, Bob's action space augments from $B$ to $B \times [- d, 0]^{2}$ , where d is a sufficiently large number. This is how much utility to destroy from Bob and Alice, respectively. By Threat-Resistance, payoffs don't change from what they were, namely, the utilities associated with Bob pressing the ROSE button.

Bob gains access to his missing kit of actions (not paired with the "destroy utility" buttons this time). All of which could have been implemented by just pressing the ROSE button and decreasing utility appropriately, so by Redundancy-Invariance, payoffs don't change from what they were, namely, the utilities associated with Bob pressing the ROSE button.

The "burn utility" buttons get locked at 0,0, ie, they don't decrease utility at all. And also, Bob's ROSE button gets destroyed. This is a loss of actions for Bob, so Bob must get equal or lower utility. But, by Maximin Dominance (since Bob has his argmax action to fall back on), he must get equal utility. And by Pareto Optimality, Alice's utility doesn't change either, and so they both get the utilities associated with the ROSE value.

And now, we can go "oh hey, Bob's action space is now comprised of all of his original actions. Therefore, in the original game we were looking at, the ROSE value must be played, that's the only option"

Q.E.D. (￢‿￢ )

Net Alice payoff: ${max}_{b} [U_{1} + U_{2}] - {max}_{b} [U_{2}]$

Net Bob payoff: ${max}_{b} [U_{2}]$

Looking over the applicable desiderata, this is shift-invariant, Pareto-optimal, continuous, redundancy-invariant, threat-resistant, and fulfills payoff dominance, maximin dominance, sub-max, and action monotonicity (for Bob)

But, y'know, assuming Alice only has one action is a pretty big ask. Can we find a generalization to Alice and Bob having multiple actions?

ROSE Value, 2-Player Case

Well, I do have a generalization that seems nice. I don't have an analogue of the "this is the only possible outcome" proof, though. (Later edit: I do from copying the structure of the previous proof, but the axioms to make it work are nonobvious enough that I'm not entirely satisfied with it)

Time for seven seemingly unrelated digressions!

Digression 1: The fundamental sort of game that the CoCo value is defined around is the game where Player 1 tries to maximize its own utility and minimize 2's utility, and 2 is doing the reverse. It's zero-sum and zero-sum games have a unique payoff pair. But there's the nasty issue where the players aren't trying to maximize their own utilities. Rather, they're doing a mix between that and minimizing the foe utilities. Ie, threatening to get their way.

Ok, so make both players just maximize their own utilities and stop aiming to hurt the other player! Aaaand... cool, now there's no longer a distinguished pair of utilities for the two players, and we're back at the original 2-player game we were trying to solve in the first place. So... what the heck sort of game closely based on the original game has a distinguished pair of utilities for the two players, and has both players argmaxing their own utility function?

Digression 2: The major nasty issues in defining what it means for an agent to do the Proper UDT Thing in a game is that it's somewhat ambiguous what counts as "not giving in to threats", and connected to that, there's an issue with which players are considered to get the "first move". Like imagine a game of Chicken. A player could just go straight, and go "your best move in response is swerving, any alternative that you try to make me believe you're doing is noncredible" and when you try to negotiate a more equitable solution of randomizing who goes straight and who swerves, it goes "nope", and when you're like "I'll crash into you if you keep being unfair", it goes "the only reason you make that threat is that you expect I'll respond, NOPE". The game comes, and it charges you at full speed yelling "I DO NOT GIVE IN TO THREATS"

I mean, to be fair, you did try to lower its utility in a way that'd hurt you too if you succeeded, in a way you would not have attempted if you were negotiating with a rock, and agents should not wish they were a rock. You acted in a way that didn't maximize your own utility and that always gets ignored. It's pretty clearly a threat. But, y'know, you play two of those agents against each other, they smash into each other, that seems pretty suboptimal. They both seem to think they get first move. There doesn't really seem to be a good notion of a UDT outcome that treats the two agents symmetrically. BUT, if one of the agents "wins initiative"/"moves first in logical time", and the second is known to just react to it, then it suddenly seems much more feasible to define what it "should" do.

Digression 3: There's an infinite regress in levels of meta for 2-player games where someone moves first. Let's say Alice gets first move, Bob gets second move. You could think of it as Alice acting, and Bob reacting. Or, Bob could go "hm, Alice is acting based on her prediction of how I react. I can pick my policy appropriately to incentivize good actions from Alice!". And then Bob would be in the acting role (by picking his policy) and Alice would be in the reacting role (by picking her action). But then Alice could go "hm, Bob is acting based on his prediction of how I react. I can pick how I react to policies appropriately to incentivize good policies from Bob!" And then Alice would be in the acting role (by picking her metapolicy) and Bob would be in the reacting role (by picking his policy). And then Bob goes... ok, you get the idea.

I won't belabor this any further, except to observe that even in 2-player games with an objective ordering, the one who moves "logically first" can switch. And the problem is, at the higher levels, assuming both players are argmaxing, there's a sort of universal convergence to the strategy of "I'll hurt you as bad as I can unless you roll over and let me win". Really, this issue rears its head as soon as Bob tries to pick a policy that incentivizes Alice picking a good move in the first place. That's the first time where Alice starts wishing she was just a rock that picked a certain move and never looked at Bob's move in the first place, and starts regretting that she has access to so many moves. The very first level of this tower, where Alice just gets first move, and Bob just argmaxes in reaction to her move, seems to be the only level where nobody's trying to extort the other. Alice might want to commit at this early point to just assume that Bob is an argmaxer that reacts to her action, and ignore any evidence otherwise, unless Bob is offering an option that's better for her than the default point of "Alice argmaxes assuming Bob naively argmaxes in response". And similarly, Bob might want to commit to ignoring Alice if she's trying to do something like "I'll pick an action you dislike unless you commit to rolling over for me when I tell you to".

Digression 4: So, the functor $S : A \mapsto [[A \to R] \to A]$ (for some object $R$ ) may be recognizable as Scott Garrabrant's "type signature of agency". It's actually a well-studied monad, the selection monad. And it induces a bunch of basic fundamental operations about how to stitch together agents and have them look at the environment and punt decisions off to other agents. The process of "exhaustive tree search" can be succinctly defined in one line with this monad. But, there's something very strange about it. It does NOT have native support for symmetric games, and I'm still unsure how to describe a symmetric game in the language of that monad. The way of stitching together multiple agents into a game which the selection monad naturally gives you, is one where there's an intrinsic ordering. Someone has to go first, and someone else has to go second.

Digression 5: The obvious way to define the value of a 2-player game, the Schelling point where Alice goes first, and Bob goes second, and they both just argmax in the most naive way, isn't continuous. Tiny variations in Bob's utilities can affect what he picks via argmax, and have large effects on Alice's utilities if she doesn't change her move. So Alice immediately changes her move, and that has a big effect on Bob's utilities.

Digression 6: Having one player go first and the other go second has the issue of player 2 reading player 1's moves. Like in matching pennies (Alice and Bob say heads or tails, Alice wants what they say to be the same, Bob wants what they say to be different, it's a zero-sum game), the ideal move is Alice and Bob randomizing 50/50. Now, averaging the utilities from "Alice goes first, Bob goes second", and "Bob goes first, Alice goes second" still gets the same pair of utilities. But, here's a game where averaging the two "transparent games" (where one player can read the other's moves) doesn't give the maximin payoff. Credit to Alex Mennen for the game.

Alice says heads or tails or pass, Bob says heads or tails. Alice's utilities are 1 if the pennies match, -1 if they don't, and 0.5 if she says pass; it's a zero-sum game. If Alice goes first, she gets 0.5. If she goes second, she gets 1. The maximin payoff for Alice is 0.5, which is not the average of the two transparent games. The players might want to implement their moves in a way where whoever goes second can read the probability distribution $μ$ , but can't read the actual move. Much like the move-reading assumption that goes into Nash equilibria, where both players can read the probability distribution of the other, but not the actual move that gets played!

Digression 7: The intuitive picture for Shapley value is that the players are given a random ordering, and they join a coalition one at a time in order, and everyone gets the marginal value they contribute to the coalition. Averaging the payoffs over all orderings, that gives everyone their Shapley value.

Motivating the ROSE Value (2-Player)

So, digression 1 says that CoCo value sucks and we want a game that's closely based on the original, where both players are argmaxing their own utility functions, that has a distinguished pair of payoffs.

Digression 2 says that UDT doesn't really have an obvious symmetric form, but an asymmetric form where a player gets first move might have promise.

Digression 3 says that there's an infinite regress problem, along with a bunch of threat-making, even in games with a well-defined ordering of who goes first, and that it might be good to cut off the infinite tower at the very bottom step and just go "a player going first means they just go first against a dumb argmaxing foe, dammit, no going logically first at metalevels or dealing with the foe going logically first by picking policies. That's the only stage where player one doesn't wish they were a rock and isn't trying to extort player two".

Digression 4 says that for some reason, the natural mathematical core underlying agents really seems to prefer games with a time ordering over symmetric games.

Digression 5 says that the naive payoff pair of mutual argmax isn't continuous, and that flaw seems very similar to the continuity issue we identified already in the "Alice can only pay money" special case.

Digression 6 says that you can't just average transparent games together; the only hope for getting maximin payouts involves the second player only seeing your probability distribution, not your move.

Digression 7 says that Shapley value involves taking an average over all possible orderings of the players.

And there's our "pay the other player just enough to get them to surplus-max instead of maximizing their own utility function, so they get their argmax payout" solution to consider.

So, maybe we could do something like... randomize between which of the two players can "go first" (inspired by digression 7), and let the players be able to make side payments to each other and both know the other is a naive argmax agent. That fits digression 1: these two games are closely based on the original game and both players are argmaxing their own utility functions. It fits digressions 2 and 4: we're equipping these games with an intrinsic ordering. It fits digression 3: we're cutting off the infinite tower of meta at a fairly low level. The ability to pay each other can at least make things continuous for the player who moves first, alleviating the issue identified in digression 5. Digression 6 says that this whole deal should be implemented by giving the player that lost initiative the ability to read the probability distribution of the player with initiative, not the moves (the mindreading assumption that goes into Nash equilibria, but it only goes one way this time).

And so, we arrive at our proposed solution. To compress it, let $S_{1, ν}$ be an abbreviation for

${max}_{w \in Δ A_{2}} E_{ν \times w} [U_{1} + U_{2}] - {max}_{ζ \in Δ A_{2}} E_{ν \times ζ} [U_{2}]$

This quantity is the maximum amount of surplus value that player 1 can extract if it plays $ν \in Δ A_{1}$ . Player 1 can't extract surplus any more effectively from that move than by hitting up Player 2 and going "maximize surplus in response, ok? I'll pay you so you get slightly more value than you'd get in the absence of this agreement". Similarly, $S_{2, w}$ is an abbreviation for the maximum amount of surplus that player 2 can extract if it plays $w \in Δ A_{2}$

With those abbreviations, we get that player 1 earns

$\frac{1}{2} ({max}_{μ \in Δ (A_{1} \times A_{2})} E_{μ} [U_{1} + U_{2}]) + \frac{1}{2} ({max}_{ν \in Δ A_{1}} S_{1, ν} - {max}_{w \in Δ A_{2}} S_{2, w})$

And Player 2 gets

$\frac{1}{2} ({max}_{μ \in Δ (A_{1} \times A_{2})} E_{μ} [U_{1} + U_{2}]) + \frac{1}{2} ({max}_{w \in Δ A_{2}} S_{2, w} - {max}_{ν \in Δ A_{1}} S_{1, ν})$

These are their ROSE values (acronym still not defined yet). There's three ways to look at what this is doing.

The first way is that it's basically the CoCo value, where the cooperation part is defined as usual, but instead of the score in the competition part being defined as "my utility - foe utility" (developing a torturizer is good because I can use it in the competition part), it's "my extractable surplus - foe extractable surplus". And developing a torturizer doesn't help. If you fire it, it'll lower the foe's max score, but it'll also lower the max surplus by the same amount, so you can't extract any more value by doing that. And it doesn't lower the foe's extractable surplus either because the only way to drop that quantity is finding a move that you really like to force the foe to bribe you harder.

Shapley Sums and Initiative Games

The second way of looking at this is that it's Shapley value, for a particular sort of game, henceforth called an "Initiative Game". It randomizes between who goes first, but either way, the game follows the following reasoning.

The player who "gets initiative" will act like a really inflexible UDT agent going first and go "oh hey, I'm up against an argmaxer who goes second. They're too dumb to listen to something like "I'll hurt you if you don't promise to roll over for me", but I can get away with extracting all their available surplus by paying them the minimum possible amount to do so. They'd give in like that if I was a rock locked into a particular action. Since I have initiative, gaining more actions should be strictly good for me. I shouldn't ever wish I was a rock locked into taking a particular action, I could just commit and have the foe see that. So I should get at least ${max}_{ν \in Δ A_{1}} S_{1, ν}$ , the maximum extractible surplus. If the foe refuses to go along, well, that's noncredible, they're no longer maximizing their own utility function. DON'T GIVE IN TO THREATS"

The player who "yields initiative" will go along with this (and have the ability to read Player 1's probability distribution on moves), but then go "alright, time to extract some surplus of my own, but one level up. Instead of extracting surplus within individual moves, I'll extract some surplus in the overall game. Hey, P1? Yeah, I guess I'd actually cave. You can ensure at least that much utility. But, may I interest you in playing your part of the overall surplus-maximizing outcome? I'll go along with it and actually give some money back to you if you do that! In fact, you'll get precisely your surplus-extracting utility that you demand, plus one cent!" And if player 1 goes "nope", P2 goes "noncredible, you're not maximizing your own utility function. You did it to me, turnabout is fair play". And they've now hit the Pareto frontier! Player 2 gets ${max}_{μ \in Δ (A_{1} \times A_{2})} [U_{1} + U_{2}] - {max}_{ν \in Δ A_{1}} S_{1, ν}$ . Total surplus minus the maximum surplus that player 1 could extract.

Moving first means you'll always benefit from having more moves available, so you'll fulfill action monotonicity. Moving second means you'll always get more than ${min}_{μ} {max}_{ν} [U_{2}]$ utility (where you're picking $ν$ and you're player 2), so you get an enhanced ability to react to the foe. Moving first means you can take the best-case of extracting surplus value from your foe. Moving second means you can extract the rest of the surplus between that and the Pareto frontier.

Just do Shapley value with those games! But there's a little bit of a question. Is the value of a game for a player the value of a game where they have initiative or where they yield initiative? Well, you can try defining $v_{G}^{+} (alice)$ as the value that Alice gets in the game where she has initiative (and similar for Bob), and $v_{G}^{-} (alice)$ is the value that Alice gets when she yields initiative (and similar for Bob). As it turns out, it doesn't depend on whether Alice computes her Shapley values using the initiative games, or whether Alice computes her Shapley values using the yielding games! The same payouts will arise.

Intuitively, the reason this happens is because Alice's Shapley values can be written as a giant weighted sum of terms like $v_{G}^{+} (S) - v_{G}^{+} (N / S)$ (initiative value of a coalition S Alice is in, minus the initiative value of the opposing coalition), and since $v_{G}^{+} (S) + v_{G}^{-} (N / S) = v_{G} (N)$ (where $v_{G} (N)$ is the maximum attainable value by everyone working together), Alice's terms can go

$v_{G}^{+} (S) - v_{G}^{+} (N / S) = v_{G} (N) - v_{G}^{-} (N / S) - (v_{G} (N) - v_{G}^{-} (S)) = v_{G}^{-} (S) - v_{G}^{-} (N / S)$

And so, Alice's Shapley sum of terms can all be flipped from initiative terms to yielding terms, or vice-versa, and it'll have no effect. Same deal for everyone else.

Threat-Resistant Bargaining?

And now for the third way of viewing things! The CoCo value could be viewed as starting at a disagreement point, and going "let's move to the Pareto frontier and evenly split the gain". The disagreement point that powers the CoCo solution is "we have an incredibly nasty zero-sum fight where both of us are trying to maximize our own utility and minimize the foe utility".

Well, the ROSE value can also be viewed as starting at a disagreement point and going "let's move to the Pareto frontier and evenly split the gain". But what the heck sort of disagreement point does the ROSE value originate from?

Well... there's an issue in bargaining games, like the Ultimatum game, of "how do you prevent getting extorted, but without blowing up bargains with an agent that has a slightly different definition of fairness than you?" Like, if Bob thinks a fair split of money in the Ultimatum game is 50/50, and Alice thinks a fair split is 60/40, then what should Bob do to avoid incentivizing foes to offer unfair splits, while still preserving most of the gains? Yudkowsky wrote an interesting post a while back saying the answer was to accept the unfair split with approximately a $\frac{5}{6} - ϵ$ probability, so Alice gets a bit less than the 50 dollars they would have gotten if they just caved, so the incentives are going the right way, but Bob still gets a nontrivial chunk of utility. Graphically, it looks like this; the point played is where the lines intersect.

And the disagreement point for the ROSE value can be viewed the same way!

Alice goes "If I win initiative, the resulting game features Bob getting this much value. I will designate this as Bob's "fair" utility level and prevent him from getting any more than that", and Bob goes "if I win initiative, the resulting game features Alice getting this much value. I will designate this as Alice's "fair" utility level and prevent her from getting any more than that". Or maybe it goes the other way and both players want to be the one that loses initiative. Basically, both sides are being self-favoring and going "fair is when I win and you get the remaining scraps, I'll block you from getting any more". And if they both implement Yudkowsky's bargaining trick, that's the disagreement point that generates the ROSE values! The disagreement point is where they're maximizing their own utility (and being biased towards themselves about it), and trying to incentivize the foe to go along, and otherwise destroying minimum utility. Then they look at each other and both go "well, this outcome sucks. Wanna move to the Pareto frontier instead and evenly split the wins?" and do that, hitting the ROSE point.

ROSE Value, N-Player Case??

Why aren't we proving properties of it yet? Well, we want to do the proofs in maximum generality, so we'll do those proofs at the end and get the two-player results as special cases.

But, from earlier, we can amplify the two-player case into the n-player case by using Shapley Values! And amplify those into general game equilibria! Just treat coalitions as big players that settle on the ROSE value between them, use the Shapley value to figure out what the individual players get, and we're done!

Well, no. As it turns out, you get pretty much all the nice properties out of Shapley-ROSE that you could want. With one exception. One VERY VERY BIG exception.

Maximin dominance. If you compute Shapley values from the 2-player ROSE value in the obvious way, a player can perform worse than their maximin value, the highest utility that they can guarantee no matter what everyone else does. This is a really major issue because, it doesn't matter how nice your equilibrium concept is, if someone can achieve strictly higher utility by resigning from the coalition of everyone, no matter how everyone else tries to retaliate, they're just going to do that, and say goodbye to the whole "get everyone on board with the compromise" thing. The bizarre part of this whole thing, however, is that maximin dominance does apply for 2-player games! It just doesn't apply when you try to go from 2-player games to n-player games via Shapley value!

Somebody set up us the Bomb

Here's the toy case. Let's say that the players are a bunch of people in a group house, and one bomb that will detonate, leveling the entire group house, unless someone puts a dollar into it. Or, to put it in an agenty format, the bomb will be a player that has -1 utility if it doesn't explode, and 0 utility if it does explode, and explosion means everyone else gets 0 utility too. And everyone else's monetary utilities are well above 0, and about much larger amounts of utility (because it's important for this toy case that, for all coalitions that aren't "just the bomb", the surplus-maximizing action involves the bomb not going off).

A house meeting is called (including the bomb), and it is decided that everyone will get their Shapley value. And yes, the bomb is included as a player. So, let's try computing the Shapley value of the bomb!

Well, it'll be a big weighted sum of terms like $v_{G} (S \cup {bomb}) - v_{G} (S)$ , where S is some coalition. Let's say S is some coalition of players that has 1 or more group house members on team S, and 1 or more group house members on team $N / S$ . This complementary team, minus the bomb (ie, every human on the bomb's team) will be denoted as T. $U_{S}, U_{T}, U_{N}$ are the summed utilities of team S, team T, and everyone (S and T and bomb), respectively. We're implicitly using initiative value to make the calculations a bit simpler, though using yielding value, or the average of the two, gets the same results.

$v_{G}^{+} (S \cup {bomb}) - v_{G}^{+} (S)$

$= {max}_{μ \in Δ (\prod_{i \in S \cup {bomb}} A_{i})} ({max}_{ν \in Δ \prod_{j \in T} A_{j}} E_{μ \times ν} [U_{N}] - {max}_{w \in Δ \prod_{j \in T} A_{j}} E_{μ \times w} [U_{T}])$

$- {max}_{μ \in Δ \prod_{i \in S} A_{i}} ({max}_{ν \in Δ \prod_{j \in T \cup {bomb}} A_{j}} E_{μ \times ν} [U_{N}] - {max}_{w \in Δ \prod_{j \in T \cup {bomb}} A_{j}} E_{μ \times w} [U_{T \cup {bomb}}])$

Notice that the above two lines are all one big term that I had to break up for space reasons. Let's specifically break out the bomb utilities.

$= {max}_{μ \in Δ (\prod_{i \in S \cup {bomb}} A_{i})} ({max}_{ν \in Δ \prod_{j \in T} A_{j}} E_{μ \times ν} [U_{S \cup T} + U_{bomb}] - {max}_{w \in Δ \prod_{j \in T} A_{j}} E_{μ \times w} [U_{T}])$

$- {max}_{μ \in Δ \prod_{i \in S} A_{i}} ({max}_{ν \in Δ \prod_{j \in T \cup {bomb}} A_{j}} E_{μ \times ν} [U_{S \cup T} + U_{bomb}] - {max}_{w \in Δ \prod_{j \in T \cup {bomb}} A_{j}} E_{μ \times w} [U_{T} + U_{bomb}])$

Note that thing that team $S + bomb$ can do to maximize extractable surplus is have the bomb not go off. If the bomb goes off, the extractible surplus is 0 since everyone gets 0 utility. Similarly, for team T+bomb, detonation is super-bad for business, so the best team move is not having that bomb go off.

$= {max}_{μ \in Δ \prod_{i \in S} A_{i}} ({max}_{ν \in Δ \prod_{j \in T} A_{j}} E_{μ \times δ_{no boom} \times ν} [U_{S \cup T} + U_{bomb}] - {max}_{w \in Δ \prod_{j \in T} A_{j}} E_{μ \times δ_{no boom} \times w} [U_{T}])$

$- {max}_{μ \in Δ \prod_{i \in S} A_{i}} ({max}_{ν \in Δ \prod_{j \in T} A_{j}} E_{μ \times δ_{no boom} \times ν} [U_{S \cup T} + U_{bomb}] - {max}_{w \in Δ \prod_{j \in T} A_{j}} E_{μ \times δ_{no boom} \times w} [U_{T} + U_{bomb}])$

In all cases, the bomb pouts for $- 1$ utility since it didn't get to blow up the group house.

$= {max}_{μ \in Δ \prod_{i \in S} A_{i}} ({max}_{ν \in Δ \prod_{j \in T} A_{j}} E_{μ \times δ_{no boom} \times ν} [U_{S \cup T} - 1] - {max}_{w \in Δ \prod_{j \in T} A_{j}} E_{μ \times δ_{no boom} \times w} [U_{T}])$

$- {max}_{μ \in Δ \prod_{i \in S} A_{i}} ({max}_{ν \in Δ \prod_{j \in T} A_{j}} E_{μ \times δ_{no boom} \times ν} [U_{S \cup T} - 1] - {max}_{w \in Δ \prod_{j \in T} A_{j}} E_{μ \times δ_{no boom} \times w} [U_{T} - 1])$

Pull out the -1's, cancel out everything you can, and you get...

$v_{G}^{+} (S \cup {bomb}) - v_{G}^{+} (S) = - 1$

Now, most terms in the Shapley sum are like this. The only coalitions where the bomb can get its maximin value of 0 when it joins the coalition are the null coalition, and the coalition consisting of everyone but the bomb (they both break the assumption that there are group house members on both teams). And, so, the Shapley value of the bomb will be slightly above $- 1$ .

Worst Teammate Ever

There's two ways to interpret this. One way is that, by splitting everyone up into two grand teams, small players won't have an opportunity to extract their own shard of utility for themselves, they're coaxed into acting for the greater good of the team they're on. Like the poor unhappy bomb.

The other way of interpreting it is that nobody wants the bomb on their team. Sure, you can interpret "bomb doesn't explode, team utility decreases by 1" as the bomb being nagged into doing the surplus-maximizing thing by its team and pouting. Another interpretation is that someone on the team with the bomb paid it 1 dollar to not explode. We're just shifting the location of the $- 1$ from the bomb to the pocket of someone on the same team. If we reinterpret the math in this way, we can see that in the game $v_{G}^{+} (S)$ , what's going on is that team T has the bomb, and if they're maximizing their own team utility, they've gotta pay the bomb to not detonate. But then the bomb joins team S instead. And team T is like "cool! When considering the hypothetical where we maximize our team utility, we don't have to worry about feeding that damn thing! It's not on our team anymore!". And then you can interpret the result as either team S being able to extract one less dollar of surplus from T as a result, or as team S inheriting the responsibility to pay the bomb 1 dollar to not explode.

No matter how you interpret it, team S is 1 dollar worth of unhappy that their foes aren't taking care of the bomb and that S has the responsibility of addressing it now. Since nearly all possible teams are unhappy that they have to take care of the bomb, and they can't even wave it around as a threat to squeeze more money from the other team, the bomb gets negative Shapley value.

And so, nobody pays the bomb enough, and it explodes. Oops.

ROSE Value, N-Player Case!!

Well, how to repair this problem? The fundamental flaw seemed to be that, when the bomb joined team S, it went from "team T has to pay me to not explode" to "team S has to pay me to not explode", making team S unhappy. But if the bomb was still able to demand money from team T, then team S would be neutral about the bomb joining their team.

Maybe if the bomb had the sort of surplus-extracting power associated with being the first player in an initiative game? It might not even have to come first, just come before the late players in team T so it's extracting from them and doesn't switch to being a bad teammate...

And so, there followed a lot of frustrating attempts at visualizing initiative ordering and how it was implicitly grouping players up into coalitions to get the Shapley values...

And a glorious moment of insight, that came from getting sorta confused about how to reconcile Shapley values and initiative games.

Shapley values are about adding everyone one-by-one to a team in a random order and everyone gets their marginal value they contributed to the team.

And that's kinda like giving everyone a random initiative ordering and giving everyone the surplus they can extract in the resulting initiative game.

If we're doing that, then maybe a player, regardless of their position, can ensure they get their maximin value? Maybe this sort of Random-Order Surplus Extraction can work. ROSE.

This is a pretty huge step. We're throwing Axiom 1, that the value of a coalition can be reduced to a 2-player game, out the window. Hell, we're throwing Shapley values out the window. Well, not really, the ROSE value will turn out to be pretty dang similar to Shapley values. But still, the initiative games which build the ROSE value will turn out to be sensitive to order, in a way the coalition games which build the Shapley value are not.

I Know You Know They Know

To check whether this works, we'll have to work out what n-player initiative games look like, given that we only know what they look like in the two-player case.

Now, the equations for these get very unwieldy very fast, so let's develop some terse shorthand. Something like $[1234]^{34}$ is meant as "players 3 and 4 in the initiative ordering pick a distribution over moves that maximizes the expected utility of the team comprised of players 1 through 4". Of course, this doesn't say what players 1 and 2 are doing, which is essential for this term to have an actual value. So it'd have to only appear inside a context where players 1 and 2 have picked their moves already. Think of it as a term with free variables for what players 1 and 2 are doing. It's probably best illustrated with examples.

For 1-player games, the value for player 1 would be $[1]^{1}$ . Ie, player 1 maximizes its own utility.

For 2-player games, the values for player 1 and 2 (remember, number of the players is their position in the initiative ordering) would be, respectively,

${[[12]^{2} - [2]^{2}]}^{1}$

$[12]^{12} - {[[12]^{2} - [2]^{2}]}^{1}$

Player 1 maximizes how much surplus it can extract. Player 2 takes the gap between Player 1 extracting as much surplus as it can, and overall surplus maximization.

For 3-player games, the values for players 1, 2, and 3, would be

${[{[[123]^{3} - [3]^{3}]}^{2} - {[[23]^{3} - [3]^{3}]}^{2}]}^{1}$

${[[123]^{3} - [3]^{3}]}^{12} - {[{[[123]^{3} - [3]^{3}]}^{2} - {[[23]^{3} - [3]^{3}]}^{2}]}^{1}$

$[123]^{123} - {[[123]^{3} - [3]^{3}]}^{12}$

But what's the interpretation of these 3-player equations? Well, for player 3, the interpretation of their payoff is "player 1 and 2 will effectively coordinate as if they're on a team against me, but I get all the rest" (line 3).

For player 2, the interpretation is that it's automatically incorporating the cost of bribing player 3 into everything. $[123]^{3} - [3]^{3}$ looks something like "overall surplus $[123]$ " but it's incorporating the cost of bribing player 3 to help out. And $[23]^{3} - [3]^{3}$ is playing the role of "player 2's utility when they disregard their obligations to everyone else", and also incorporates the cost of bribing player 3 the minimum amount needed to help out. It's got a very similar structure to the equation for what player 3 gets.

And for player 1, if we interpret $[[23]^{3} - [3]^{3}]^{2}$ as "what I have to pay player 2 to get it on my team", player 1 is, yet again, trying to extract as much surplus as it can subject to the requirements of having to give everyone minimal bribes to do it.

Now, it's not at all obvious that these are the right equations. One key property they fulfill is that if you have either player 1, player 2, or player 3 being a null player (a player with one move that does nothing, and that always has 0 utility), the equations for the remaining players will reduce to the 2-player equations, and the null player will get no value.

This isn't enough to distinguish these equations from the equations that the naive Shapley values would correspond to, detailed below.

${[[123]^{23} - [23]^{23}]}^{1}$

${[[123]^{3} - [3]^{3}]}^{12} - {[[123]^{23} - [23]^{23}]}^{1}$

$[123]^{123} - {[[123]^{3} - [3]^{3}]}^{12}$

But, the originally stated 3-player equations do give everyone $\geq$ their maximin value, while these Shapley equations don't.

Moving up to the 4-player equations, I actually botched them at first because they got too complicated to hold in my head, and only the "no matter where you put the null player, things should reduce to the 3-player equations" desiderata saved me.

${[{[{[[1234]^{4} - [4]^{4}]}^{3} - {[[34]^{4} - [4]^{4}]}^{3}]}^{2} - {[{[[234]^{4} - [4]^{4}]}^{3} - {[[34]^{4} - [4]^{4}]}^{3}]}^{2}]}^{1}$

${[{[[1234]^{4} - [4]^{4}]}^{3} - {[[34]^{4} - [4]^{4}]}^{3}]}^{12} - above line$

${[[1234]^{4} - [4]^{4}]}^{123} - {[{[[1234]^{4} - [4]^{4}]}^{3} - {[[34]^{4} - [4]^{4}]}^{3}]}^{12}$

$[1234]^{1234} - {[[1234]^{4} - [4]^{4}]}^{123}$

Ok, what the hell is going on. Well, let's call them Alice, Bob, Carol, and Dave. Going in order, Alice has the most initiative.

1: In a context where Alice, Bob, and Carol have moved, Dave would argmax.

2: In a context where Alice and Bob have moved, Carol would foresee this, and predictably pay Dave to extract his surplus, and pick the best move for herself extracting value from him, that's Carol argmaxing.

3: In a context where Alice has moved, Bob would foresee this, and pay Carol to, instead of doing that, join a Bob/Carol alliance to extract from Dave, and pick the best move for Bob extracting value from that, that's Bob argmaxing.

4: Alice would foresee this, and pay Bob to, instead of doing that, join an Alice/Bob alliance. Said Alice/Bob alliance would involve paying off Carol to, instead of extracting Dave's surplus for herself, or for Bob/Carol, join an Alice/Bob/Carol alliance in extracting from Dave, and pick the best move for Alice benefiting from that whole mess.

5: Bob goes "hang on, Alice, you extracting value for yourself doesn't leave me well off. How about I promise you get that value, we coordinate on doing what's best for the Alice/Bob alliance, and I extract the rest of the benefit?". Alice goes along, and the Alice/Bob alliance lines up a promising move.

6: Carol goes "hang on, Alice/Bob team, you extracting value for yourselves doesn't leave me well off. How about I promise you two get those values, we coordinate on doing what's best for the Alice/Bob/Carol alliance, and I extract the rest of the benefit?". The Alice/Bob team goes along, and the Alice/Bob/Carol alliance lines up a promising move.

7: Dave goes "hang on, Alice/Bob/Carol team, you three extracting value for yourselves doesn't leave me well off. How about I promise you three get those values, we coordinate on something Pareto-optimal, and I extract the rest of the befit?". The Alice/Bob/Carol team goes along, and the alliance of everyone lines up the surplus-maximizing move, and everyone pays each other so everyone gets what they have claimed.

We're not typing out the five-player equations, or analyzing them. This is enough to skip straight to the end. Finding a nice tidy recursion that generates the values of players in an initiative game, and proving nice things about it.

ROSE Value, Final Definitions:

Definition 1: Initiative Game An initiative game is a pair of a game $G$ (with its set of players $N$ , and their move spaces and utility functions), and a bijection $σ : N \to {1, 2, . . ., n}$ . The notations $A_{i}^{σ}$ and $U_{i}^{σ}$ are used as an abbreviation for $A_{σ^{- 1} (i)}$ and $U_{σ^{- 1} (i)}$ , the action space and utility function of the player that $σ$ assigned to rank $i$ .

Effectively, an initiative game is a game that also has the players being ranked from most initiative (1) to least initiative (n). $n$ will be typically used to denote the number of players in the game.

Definition 2: Context Given an initiative game $G, σ$ , an $m$ -context (for $0 \leq m < n$ ) is an element of $\prod_{i = 1}^{m} Δ A_{i}^{σ}$ . Note that, since the empty product is the single-point space, there is a unique 0-context, which will be denoted by $∙$ .

An $m$ -context is a list of strategies for players 1 through m in the initiative game. You can think of it as being inside of a hypothetical where the early players have moved already and the later players are reacting to that.

Definition 3: Contextual Value Given an initiative game $G, σ$ , and an $a, b, μ$ where $1 \leq a \leq b \leq n$ and $μ$ is a $(b - 1)$ -context, the contextual value $V_{μ, a : b}^{σ}$ is recursively defined as follows.

For the $b < n$ case,

$V_{μ, a : b}^{σ} := {max}_{ν \in Δ A_{b}^{σ}} [V_{μ \times ν, a : b + 1}^{σ} - V_{μ \times ν, b + 1 : b + 1}^{σ}]$

For the $b = n$ case,

$V_{μ, a : b}^{σ} := {max}_{ν \in Δ A_{n}^{σ}} E_{μ \times ν} [\sum_{i = a}^{n} U_{i}^{σ}]$

As a special case, the malformed contextual value $V_{∙, 1 : 0}^{σ}$ might appear in later equations. This is defined as $0$ .

A contextual value $V_{μ, a : b}^{σ}$ may be thought of as the value that the coalition of players a through b (according to the initiative ordering $σ$ ) will rack up within context $μ$ . The recursive definition is effectively saying that, for coalitions that go from a-to-n, since everyone else has moved, the contextual value is the maximum team utility that the last player can rack up. And for coalitions that range from a-to-b, since everyone but b has moved, the contextual value is the maximum team utility that b can rack up. Specifically, that value is the value of the a-to-b+1 coalition, minus what player b+1 must be bribed to join the big coalition.

Definition 4: Initiative Value Given an initiative game $G, σ$ , and some b that fulfills $1 \leq b \leq n$ , the value given to the b'th-ranked player in the initiative game $V_{b}^{σ}$ , is defined as

$V_{b}^{σ} := {max}_{μ \in \prod_{i = 1}^{b - 1} Δ A_{i}^{σ}} [V_{μ, 1 : b}^{σ}] - {max}_{ν \in \prod_{i = 1}^{b - 2} Δ A_{i}^{σ}} [V_{ν, 1 : b - 1}^{σ}]$

The value that player b gets is the gap between the maximum amount the 1-to-b coalition can get, and the maximum amount the 1-to-(b-1) coalition can get. Basically, b joins the 1-to-(b-1) coalition, and is like "yo, we can rack up more value now. I'll be claiming all my marginal value contributed". Just like how, in the Shapley games, each player claims their marginal value contributed from joining a coalition!

Two special cases must be noted where this behaves a bit unintuitively. For $b = 2$ , that second "maximize over contexts" thing ends up maximizing over an empty product, ie, the single-point space containing only the trivial context. So we get

$V_{2}^{σ} = {max}_{μ \in Δ A_{1}^{σ}} [V_{μ, 1 : 2}^{σ}] - V_{∙, 1 : 1}^{σ}$

And for $b = 1$ , not only do we have the empty product shenanigans, we also end up with the malformed context $V_{∙, 1 : 0}$ , which, by definition was 0, so that cancels out, and we have

$V_{1}^{σ} = V_{∙, 1 : 1}^{σ}$

And now we can finally define the ROSE value rigorously!

Definition 4: ROSE Value Given a game $G$ , the ROSE (Random-Order Surplus Extraction) value for player $i$ is defined as

$R (i) = E_{σ \sim u} [V_{σ (i)}^{σ}]$

Where $u$ is the uniform distribution over initiative orderings $σ : N \to {1... n}$ .

And that's the ROSE value. The average over all possible initiative orderings of the value that player i gets in the corresponding initiative game. The replacement for the CoCo value.

Something odd you might have noticed here was our restriction to factorizing strategies $\prod_{i = 1}^{m} Δ A_{i}^{σ}$ , instead of joint strategies $Δ \prod_{i = 1}^{m} A_{i}^{σ}$ . Well, it's only like that because I couldn't prove the Sub-Max desiderata with joint strategies. You're welcome to try dropping it and see where it takes you.

ROSE Value Properties

We've got quite a few properties to check. There's the basic desiderata, sure, but there's also a few things of independent interest. And if we're junking Shapley values, we should probably show that the ROSE values are about as nicely behaved as Shapley values.

There was an informal desideratum that we didn't list, that the players should be maximizing their own utility functions, not intentionally trying to minimize the utility functions of others. I personally think ROSE values fit this, given their interpretation as paying the foes the minimum possible amount you need for them to go along with your plans.

Desideratum 1, 2-Player Reduction, fails spectacularly.

For Desideratum 2, Shift-Invariance, we'll actually be proving a stronger property than that, which is one of the four defining axioms that uniquely pin down the Shapley value. Linearity. Namely, let's say games G and H are played completely independently of each other. The Shapley value for G and H is the sum of the Shapley values for the two individual games. Similarly, if the payoffs are scaled by a factor of a, then the Shapley values assigned to everybody scale up by a factor of a.

The connection between Linearity and Shift-Invariance is that a game with various constant payoffs added to the various players can be viewed as a sum of the original game $G$ , and a game where everyone has only one move and is guaranteed to get specified constant payoffs. If the initiative games are linear, or ROSE values are, then you can decompose into the value of the original game, plus the value of the "you get a constant!" game. Which would be the relevant constant.

Linearity also lets you prove stuff like "if a team of players is acting on one island, and another team is on another island, you can compute the ROSE values of the two islands separately".

Proposition 3.1: For all games with the same number of players $G, H$ , nonnegative constants $d, e \geq 0$ , orderings $σ$ , and players $i$ , $V_{i}^{d G + e H, σ} = d V_{i}^{G} + e V_{i}^{H}$ holds, where $d G + e H$ is the game where $G$ and $H$ are played in parallel and their payoffs are scaled by the appropriate factors.

Proposition 3.2: For all games with the same number of players $G, H$ , nonnegative constants $d, e \geq 0$ , and players $i$ , $R^{d G + e H} (i) = d R^{G} (i) + e R^{H} (i)$ holds.

Proposition 3.3: ROSE equilibria are scale-and-shift invariant.

Alright, time for Pareto-Optimality! Since we're assuming everyone's utilities are on the same scale, this turns into "does the sum of everyone's payouts maximize the sum of utilities". And for the equilibria, it's trivial that this strong form of Pareto-optimality for ROSE values would lead to weak Pareto-optimality for any ROSE equilibria.

Proposition 4.1: For all games $G$ and initiative orderings $σ$ , we have $\sum_{i = 1}^{n} V_{i}^{G, σ} = {max}_{a \in \prod_{1}^{n} A_{i}} \sum_{i = 1}^{n} U_{i} (a)$

Proposition 4.2: For all games $G$ , we have $\sum_{i = 1}^{n} R^{G} (i) = {max}_{a \in \prod_{1}^{n} A_{i}} \sum_{i = 1}^{n} U_{i} (a)$

Now for continuity. We actually have something even better than continuity. Specifically, Lipschitzness! If you haven't seen it before, it's a stronger version of continuity that bounds the size of the perturbation in output from a given perturbation in input. For each game, there's a fixed constant $c$ (that only scales with the number of players. Exponentially, I believe, unless there's unexpected cancellations) where variations in everyone's payoffs by $ϵ$ only have an $c ϵ$ effect on everyone's ROSE payoffs.

Proposition 5.1: For all games $G$ and initative orderings $σ$ , the value function $V^{G, σ}$ (interpreted as a function from tuples of utility functions to tuples of payoffs) is Lipschitz.

Proposition 5.2: For all games $G$ , the ROSE values $R^{G}$ , (interpreted as a function from tuples of utility functions to tuples of payoffs) is Lipschitz.

Proposition 5.3: The function mapping a game to its set of ROSE equilibria has closed graph.

But, we do have one bit of terrible news. Remember that General Equilibria Existence Theorem, and all its prerequisites? Well, we've got all the prerequisites to invoke it except for that complicated sixth property, about how, if you shrink a player's utility function to 0, but they can still designate value to be given to others, their Shapley/ROSE value stays bounded above 0. That fails. The recipients will pay the minimum amount to be benefited, which, if a player's utility function is shrinking to 0, is an amount of money that shrinks to 0. So, actually, appealing to the existence theorem is going to take more refined arguments. Still, most of the proof should work and I'm hopeful it can be salvaged.

Now for Redundancy-Invariance!

Proposition 6.1: If $G$ is the original game and $G^{'}$ is a new game with redundant actions added, then for all initiative orderings $σ$ and players $i$ , $V_{i}^{G^{'}, σ} = V_{i}^{G, σ}$ .

Proposition 6.2: If $G$ is the original game and $G^{'}$ is a new game with redundant actions added, then for all players i, $R^{G^{'}} (i) = R^{G} (i)$

Proposition 6.3: Adding redundant actions to a game doesn't affect its set of ROSE equilibria.

And for Threat-Resistance, it's got a really specific formulation, which is the only reason it wasn't called "Threat-Immunity". Specifically, a new game $G^{'}$ , the "destruction game", is defined where everyone's action spaces augment from $A_{i}$ to $A_{i} \times \prod_{j = 1}^{n} [- D_{i, j}, 0]$ . $D_{i, j}$ is the maximum amount of utility that i can burn j for. So, basically, each player i specifies a "destruction vector" for how much utility they want to remove from everyone else. And i's utility is $U_{i} - \sum_{j = 1}^{n} d_{j, i}$ , remove the sum of all the destruction coming in. So, be aware that this is a rather specific class of threats.

Proposition 7.1: If $G$ is the original game, then for all initiative orderings $σ$ and players $i$ , for the destruction game $G^{'}$ , we have $V_{i}^{G^{'}, σ} = V_{i}^{G, σ}$ .

Proposition 7.2: If $G$ is the original game, then for all players $i$ , for the destruction game $G^{'}$ , we have $R^{G^{'}} (i) = R^{G} (i)$

Proposition 7.3: Adding a kit of "burn utility" buttons to a game doesn't affect its set of ROSE equilibria.

Now we're going to cover a rather weird desiderata. While typing this, I had the thought "if the game $G$ is augmented to a payment game $G^{'}$ where everyone can pay each other money, are the ROSE values and initiative game values the same? It'd be really embarrassing if quantities defined under a transferrable-utility assumption started behaving differently when you introduce the option of giving each other money as, y'know, an actual game move."

Fortunately yes, the ROSE values are money-redundant.

Proposition 8.1: If $G$ is the original game, then for all initiative orderings $σ$ and players $i$ , $V_{i}^{G^{'}, σ} = V_{i}^{G, σ}$ , where $G^{'}$ is the payment game.

Proposition 8.2: If $G$ is the original game, then for all players $i$ , $R^{G^{'}} (i) = R^{G} (i)$ where $G^{'}$ is the payment game.

Now, there's two more defining properties of Shapley Value where we must check if they apply. The first is symmetry. Namely, two identical players must receive equal payoffs. But what does it mean to say that two players are identical? It means the following. Let's say Alice and Bob are the relevant players.

So, for the utility functions, let's reserve their first input for what everyone else does, second input for what Alice does, and third input for what Bob does. Alice and Bob are equivalent if they have the same (or isomorphic) spaces of actions, and, for all $c$ (moves from everyone else), and $x, y \in A_{alice/bob}$ , we have the following.

For $i \notin {alice, bob}$ , $U_{i} (c, x, y) = U_{i} (c, y, x)$

For the other two, $U_{alice} (c, x, y) = U_{bob} (c, y, x)$

So, from the perspective of everyone else's preferences, there's no difference between Alice taking out the trash and Bob microwaving fish, and Bob taking out the trash and Alice microwaving fish, but generalized to all pairs of actions from them. The key part though, is that last bit, because it allows for actions like "I eat a cake". Alice's utility from eating a cake is the same as Bob's utility from him eating a cake, that's how the transposition works.

So, now that those are the conditions to declare two players "identical", we might hope that they get the same payoffs. This is one of the axiomatic defining features of Shapley values, at least.

Proposition 9: ROSE values are symmetric. If Alice and Bob are identical, $R^{G} (alice) = R^{G} (bob)$

This is not as trivial as it looks! In the initiative games, Alice and Bob might end up getting very different payoffs! An early Alice can behave very differently from a late Bob. The important part for symmetry is that from everyone else's perspectives, early-Alice, late-Bob performs the same as early-Bob, late-Alice. And also that, when these two different initiative games are averaged together, Alice and Bob end up getting the same payoffs.

Another desiderata, the fourth out of four that uniquely pins down the Shapley value, is Null player. Namely, if you add a player that always gets 0 utility, and only has a single action, then their Shapley value should be 0, and everyone else's Shapley values should be the same as a game where the null player is absent. We have that. If $G^{'}$ is the game with a null player $c$ , and $G$ is the corresponding ordinary game, then

Proposition 10: If $i$ is not the null player, then $R^{G^{'}} (i) = R^{G} (i)$ . As for the null player $c$ , $R^{G^{'}} (c) = 0$

What remains? Well, just some desiderata about what payoffs everyone receives.

For Payoff Dominance, the one where Alice outscoring Bob for every possible outcome means that she gets a higher ROSE value (or value in the initiative game) than Bob... I didn't prove it. Not "tried and failed", more like "damn I need to get this post out already and I don't care that much whether it's true or not". I'm pretty dang sure it's true for the 2-player case, at least.

What about Maximin Dominance? That's an important one. Well, it holds!

Proposition 11.1: For all games $G$ , orderings $σ$ , and players $i$ , $V_{i}^{G, σ} \geq maximin value of i$

Proposition 11.2: For all games $G$ and players $i$ , $R^{G} (i) \geq maximin value of i$

Proposition 11.3: For all games $G$ and players $i$ , all ROSE equilibria assign player $i$ equal or greater value than its maximin value.

Sub-Max is a much much stranger one. After all, what if a player can massively benefit everyone else? Then shouldn't they get a really huge monetary payout?

The ROSE value says nope. They won't get a higher value than the maximum payout they can get in the base game. I'm not entirely sure how to interpret this feature of it. Actually, this specific feature of the ROSE value is what messes with our ability to invoke the General Equilibria Existence Theorem.

Proposition 12.1: For all games $G$ , orderings $σ$ , and players $i$ , $V_{i}^{G, σ} \leq max value of i$

Proposition 12.2: For all games $G$ and players $i$ , $R^{G} (i) \leq max value of i$

And one last one. Action Monotonicity. Now, since this is the key motivating piece of CoCo value, why do we care about checking it? Well, something pretty funny happens...

Proposition 13: If the game $G^{'}$ is like $G$ but player i has a larger action space, then for all orderings $σ$ where $σ (i) = 1$ , $V_{σ (i)}^{G^{'}, σ} \geq V_{σ (i)}^{G, σ}$

Basically, although it's not always to your advantage to have more actions, if you've got the most initiative, then yes, you'll never regret having extra actions!

ROSE Bargaining

So, since ROSE equilibria are a thing, what happens if we find the ROSE equilibria of a bargaining game? Y'know, the old setup of "both players must coordinate on action or get disagreement payoffs". Do we get anything unusual, or does it reduce to one of the well-studied notions?

Yes! ROSE values do induce their own special sort of equilibria in bargaining games. But there's a weird issue where there can be multiple ROSE points in a single bargaining game. It isn't like there's The One Nash Solution, or The One Kalai-Smorodinsky Solution. It's really quite a thorny issue, dealing with all the ROSE points.

A ROSE point is anywhere where, if you drew two lines at the highest utilities that players 1 and 2 could get without sending the other below the disagreement point utility, there's a tangent line at the ROSE point that has equal distance to the two max-utility boundaries.

Basically, an imaginary currency is introduced, depending on the point of interest. If Alice has the initiative, she'll try to get the highest utility she possibly can without sending Bob below his disagreement utility, look at what Bob would get if they were allowed to pay each other in that currency, and go "Bob, you can't have more than this". Symmetric considerations apply for Bob. This yields a Pareto-inferior point, they go "let's split the gain, as denominated in this currency, 50/50", and end up at the ROSE point.

There's still agreement with the other bargaining solutions that if Alice values a widget at 5 dollars, and Bob values it at 13 dollars, then the final sale price should be halfway between the two at 9 dollars. That behavior is unchanged.

Disagreement-Point Invariance and the Random Dictator Point

But there's something really special about ROSE points. They (kinda) don't depend on where the disagreement point is! Look, we can move the disagreement point around to be various levels of bad for Alice and Bob, and the ROSE point doesn't change!

Note that wherever we put the disagreement point, it doesn't affect the ROSE point! That defining property of a ROSE point stays the same if you shift the disagreement point around, and the ROSE point only starts changing where it is if the disagreement point starts placing nontrivial caps on how much someone can ask for/what the maximum of their utility is. So I had to cut out the chunk of green sticking off the left edge, otherwise moving the disagreement point around would have effects.

What ROSE says the role of a disagreement point is, is to put an upper bound on how much the foe can ask for. It's not to hurt the foe, or to worry about it hurting you if it occurs. The point of a disagreement point is that it makes it impossible for the foe to get everything it wants. It renders the "I get all I want, you get a terrible outcome" possibilities noncredible because you can just walk away.

Also, interesting note. All the ROSE points will be Pareto-superior to the random-dictator point. This is the payoff point associated with flipping a coin to determine who gets to maximize their utility function (up to the limit set by the other player going "screw this" and playing the disagreement point)

This is an example where there are three ROSE points in green, all of which are Pareto-superior to the random-dictator solution in purple.

Fun note, the random-dictator point can be implemented in a way that doesn't incentivize lying! If you know there's going to be a 50/50 mix between your favorite outcome and your foe's favorite outcome, your incentive is to be truthful about what would be best for you.

So, ROSE points are non-unique, and they're invariant under altering the disagreement point, unless the disagreement point imposes nontrivial constraints on the extent to which one of the players can maximize their utility. And they'll always outperform "flip a coin to determine who wins".

Put another way, if you and a foe have settled on a nice acceptable equilibrium, and then it suddenly turns out that disagreeing means you get eaten by sharks, don't change a thing! Unless you were using the old disagreement point as a load-bearing way to cap how much the foe could reasonably ask for. Then, yeah, you've gotta give them more of what they want. But if them winning and you rolling over was better than the old disagreement point, the new disagreement point being way worse for you doesn't change a single thing. Either way, whatever you two settle on after the whole "oh no sharks" realization should be better for both of you than flipping a coin to decide who gets to be dictator.

Actually, this is an easy-to-implement lesson for bargaining in practice. Whatever you decide on must be better for both sides than "flip a coin, the winner gets their favorite outcome up to the limits of the loser going "screw this" and walking away" if you want to be ROSE-compliant.

Maybe Not So Threatproof?

So, the "threat-resistance" axiom is less powerful than it seems. Trying to prove it really gives you a sense for how fragile it is. The key piece that makes it tick was that the threatening action, namely, the "burn utility" button, was unconditional. It destroys value from the surplus-maximizing outcome just as effectively as it destroys value from the "foe maximizing their own utility" outcome.

But, what about conditional threats? Like "this action I can take will interfere very badly with the other various actions you can take, except for actions which benefit me!"

Yeah, not so threatproof after all.

Other actions are not pictured (for clarity), but this is how Alice burning Bob's utility doesn't harm Bob in the ultimate outcome. Since pressing the burn button decreases the payoff of all of Bob's moves equally, it shifts things down. The minimum amount that Alice must pay Bob to get him to surplus-max goes down, but so does the available surplus. So the net result is a pure loss of utility for Bob, which, after the move back up to the Pareto frontier in purple (surplus-maximizing Alice actions not shown), doesn't change things.

But let's say Alice can make modifications to her actions, where she can make different Bob actions differentially worse than they'd otherwise be, instead of decreasing utility equally. Let's say that, for the action of interest, the surplus-maximizing action from Bob is also where Bob wins big. Alice can act in a way that only ruins Bob if he wins. How much does it help Alice to do this?

In this image, the surplus-maximizing choice is the same as the Bob-maximizing choice, and then Alice invents a new move where the "Bob wins" move becomes suboptimal for Bob. Instead of the green point being the ROSE point, now the red point is the most value that Alice can claim, and when Bob claims the rest of the spare utility due to having less initiative, it moves up to the red point on the purple Pareto-frontier line, for a net Alice win, but proportional to how much she got from it, not how much it hurt Bob.

So yes, conditional threats let you claim a bit more utility by doing them. But notice that since you're presumably still capable of not ruining things for your foe if you've developed new tricks to make things bad for them, that step at the end where the foe claims all remaining surplus means that they don't lose out too badly from it. When you're at the Pareto frontier, things are zero-sum. So developing really nasty conditional threats to make yourself 10 bucks better off at -100 bucks for the foe (I'm envisioning this as enlarging your action space) doesn't actually make the foe too much worse off. They'd just pay you 10 bucks to not do it and claim all remaining surplus.

In the limit of adding new actions that can only destroy value relative to an existing action, not create any, what happens if you have initiative?

Well, you could take the action that enables you to attain your highest payoff, and make a modified version of that action where everything the foe can do in response goes poorly for them, except for them rolling over and letting you win.

Then you would attain the highest payoff possible, namely, the maximum value you can get without the foe paying you. Yeah, remember that Sub-max desiderata that looked hard to get and kinda baffling and screwed us out of showing that ROSE equilibria always exist? It's secretly a "cap the damage from getting screwed over in initiative games" property!

And of course, remember, this is one of the two initiative games. The other one has your foe get first move, and you're constrained to just naively argmax in response and your foe gets away with paying you the minimum possible amount to go along with its plans. Yeah, that one didn't go away.

So, you can still profit to some degree by differentially making things worse off for your foe. But that requires gaining new actions where you can screw over all the foe's self-favoring responses and leave them as bad or worse off than just them giving you what you want. And the foe can earn some money back from the "claim remaining surplus" step, as well as their own initiative game.

I mean, threats still pay with this. They just don't pay in proportion to how bad you make the threat (as CoCo value does), they only pay off in proportion to what you're getting out of them, and there are definitely some mitigating factors present.

So, this isn't threat immune. But it's definitely a huge improvement on the CoCo value. And maybe there's some better notion out there? I'd be very interested if there was.

I suppose, in the limit of both players developing sophisticated ways to differentially screw over the other if they're given the first move, then the mix between the two initiative games would mean that the end result is basically a 50/50 mix between "Alice wins outright" and "Bob wins outright".

Which is awfully reminiscent of a ROSE point in a bargaining game. That invariance to disagreement point really comes in handy right about now, and also the property that they're all Pareto-superior to the random-dictator solution.

I will note, at this point, that dominating the random-dictator point is not a property that Nash bargaining/CoCo value has.

This is a concrete example of such. The random-dictator point/only ROSE point is at the green dot, and the Nash Bargaining Solution is at the red dot.

Further Questions

So, that's ROSE value. A replacement to CoCo value that's a whole lot more resistant to threats, and also introduces a new notion of bargaining point that's (sorta) invariant to where the disagreement point is. And also, the initiative games are quite fascinating from a UDT standpoint as well, what with their connection to "moving logically first".

There's some further issues to clear up.

1: Can we find some characterization of the ROSE value as the unique value that fulfills obvious axioms?

2: Is there a variant of the ROSE value that handles conditional threats better? Or a proof that no such thing can exist?

3: CoCo value has a generalization to games where the players have private information about how the game turns out. One of the properties CoCo fulfills is that no player wishes they had less information. Is there an analogous extension of the ROSE value to imperfect-information games, fulfilling a similar property? What about when the players don't have a common prior?

4: Can I repair the General Equilibria Existence Theorem to work with the ROSE value?

5: Is there an efficient way to compute or approximate the value of an initiative game? If you can do this, then ROSE values can easily be approximated, just sample a bunch of random perturbations, compute the initiative game values, and average.

6: Is it possible to get the ROSE value or something analogous to flow out of more primitive assumptions about agents?

7: Did I forget some important question that someone will ask in the comments?

~~8: What implications does this have for the ending of Planecrash?~~

Thank you for reading this far. ^_^

[-]Diffractor2yΩ711-1

I think I have a contender for something which evades the conditional-threat issue stated at the end, as well as obvious variants and strengthenings of it, and which would be threat-resistant in a dramatically stronger sense than ROSE.

There's still a lot of things to check about it that I haven't done yet. And I'm unsure how to generalize to the n-player case. And it still feels unpleasantly hacky, according to my mathematical taste.

But the task at least feels possible, now.

EDIT: it turns out it was still susceptible to the conditional-threat issue, but then I thought for a while and came up with a different contender that feels a lot less hacky, and that provably evades the conditional-threat issue. Still lots of work left to be done on it, though.

[-]Daniel Kokotajlo2yΩ360

Awesome work!

Misc. thoughts and questions as I go along:

1. Why is Continuity appealing/important again?

2. In the Destruction Game, does everyone get the ability to destroy arbitrary amounts of utility, or is how much utility they are able to destroy part of the setup of the game, such that you can have games where e.g. one player gets a powerful button and another player gets a weak one?

[-]Diffractor2yΩ340

For 1, it's just intrinsically mathematically appealing (continuity is always really nice when you can get it), and also because of an intution that if your foe experiences a tiny preference perturbation, you should be able to use small conditional payments to replicate their original preferences/incentive structure and start negotiating with that, instead.

I should also note that nowhere in the visual proof of the ROSE value for the toy case, is continuity used. Continuity just happens to appear.

For 2, yes, it's part of game setup. The buttons are of whatever intensity you want (but they have to be intensity-capped somewhere for technical reasons regarding compactness). Looking at the setup, for each player pair i,j, is the cap for how much of j's utility that i can destroy. These can vary, as long as they're nonnegative and not infinite. From this, it's clear "Alice has a powerful button, Bob has a weak one" is one of the possibilities, that would just mean $D_{a, b} >> D_{b, a}$ . There isn't an assumption that everyone has an equally powerful button, because then you could argue that everyone just has an equal strength threat and then it wouldn't be much of a threat-resistance desideratum, now would it? Heck, you can even give one player a powerful button and the other a zero-strength button that has no effect, that fits in the formalism.

So the theorem is actually saying "for all members of the family of destruction games with the button caps set wherever the heck you want, the payoffs are the same as the original game".

[-]Daniel Kokotajlo2yΩ220

OK, thanks! Continuity does seem appealing to me but it seems negotiable; if you can find an even more threat-resistant bargaining solution (or an equally threat-resistant one that has some other nice property) I'd prefer it to this one even if it lacked continuity.

[-]Richard_Ngo1yΩ350

I just stumbled upon the Independence of Pareto dominated alternatives criterion; does the ROSE value have this property? I'm pattern-matching it as related to disagreement-point invariance, but haven't thought about this at all.

[-]DaemonicSigil2yΩ450

Yes!

Is there a way to deal with the issue of there being multiple ROSE points in some games? If Alice says "I think we should pick ROSE point A" and Bob says "I think we should pick ROSE point B", then you've still got a bargaining game left to resolve, right?

Anyways, this is an awesome post, thanks for writing it up!

[-]Diffractor2yΩ470

My preferred way of resolving it is treating the process of "arguing over which equilibrium to move to" as a bargaining game, and just find a ROSE point from that bargaining game. If there's multiple ROSE points, well, fire up another round of bargaining. This repeated process should very rapidly have the disagreement points close in on the Pareto frontier, until everyone is just arguing over very tiny slices of utility.

This is imperfectly specified, though, because I'm not entirely sure what the disagreement points would be, because I'm not sure how the "don't let foes get more than what you think is fair" strategy generalizes to >2 players. Maaaybe disagreement-point-invariance comes in clutch here? If everyone agrees that an outcome as bad or worse than their least-preferred ROSE point would happen if they disagreed, then disagreement-point-invariance should come in to have everyone agree that it doesn't really matter exactly where that disagreement point is.

Or maybe there's some nice principled property that some equilibria have, which others don't, that lets us winnow down the field of equilibria somewhat. Maybe that could happen.

I'm still pretty unsure, but "iterate the bargaining process to argue over which equilibria to go to, you don't get an infinite regress because you rapidly home in on the Pareto frontier with each extra round you add" is my best bad idea for it.

EDIT: John Harsanyi had the same idea. He apparently had some example where there were multiple CoCo equilibria and his suggestion was that a second round of bargaining could be initiated over which equilibria to pick, but that in general, it'd be so hard to compute the n-person Pareto frontier for large n, that an equilibria might be stable because nobody can find a different equilibria nearby to aim for.

So this problem isn't unique to ROSE points in full generality (CoCo equilibria have the exact same issue), it's just that ROSE is the only one that produces multiple solutions for bargaining games, while CoCo only returns a single solution for bargaining games. (bargaining games are a subset of games in general)

[-]M. Y. Zuo2y10

They approach the Pareto frontier, but never in fact reach it, from what I understand. So wouldn’t this just move the problem into the meta level? What’s stopping the Players from going through the threat escalation spiral to force their ideal ROSE point?

Sure they may be very tiny slices of utility in contention, but then it only takes one obstinate Player to threaten all utility to ensure they lose zero.

[-]Diffractor2y20

What I mean is, the players hit each other up, are like "yo, let's decide on which ROSE point in the object-level game we're heading towards"
Of course, they don't manage to settle on what equilibrium to go for in the resulting bargaining game, because, again, multiple ROSE points might show up in the bargaining game.
But, the ROSE points in the bargaining game are in a restricted enough zone (what with that whole "must be better than the random-dictator point" thing) to seriously constrain the possibilities in the object-level game. "Worst ROSE point for Alice in the object-level-game" is a whole lot worse for her than "Worst ROSE point for Alice in the bargaining game about what to do in the object-level-game".

So, the players should be able to ratchet up their disagreement point and go "even if the next round of bargaining fails, at least we can promise that everyone does this well, right? Sure, everyone's going for their own idea of fairness, but even if Alice ends up with her worst ROSE point in the bargaining game, her utility is going to be at least this high, and similar for everyone else."

And so, each step of bargaining that successfully happens ratchets up the disagreement point closer to the Pareto frontier, in a way that should quickly converge. If someone disagrees on step 3, then the step-3 disagreement point gets played, which isn't that short of the Pareto frontier. And if someone doesn't have time for all this bargaining, they can just break things off at step 10 or something, that's just about as good as going all the way to infinity.

Or at least, it should work like this. I haven't proved that it does, and it depends on things like "what does ROSE bargaining look like for n players" and "does the random-dictator-point-dominance thing still hold in the n-player case" and "what's the analogue of that strategy where you block your foe from getting more than X utility, when there are multiple foes?". But this disagreement-point ratcheting is a strategy that address your worries with "ever-smaller pieces of the problem live on higher meta-levels, so the process of adding layers of meta actually converges to solving the problem, and breaking it off early solves most of the problem"

Regarding your last comment, yes, you could always just have a foe that's a jerk, but you can at least act so they don't gain from being an jerk, in a way robust against you and a foe having slightly different definitions of "jerk".

Regarding your last comment, yes, you could always just have a foe that's a jerk, but you can at least act so they don't gain from being an jerk, in a way robust against you and a foe having slightly different definitions of "jerk".

How would someone ensure that the jerk does not gain from their threats regarding the choice of ROSE points, assuming Players cannot exit the game?

With this.

Reading Clarifications 1, 2, and 3 seems to imply that it would not be useful or applicable in this scenario.

Clarification 1: Yes, utilities are invariant up to a positive affine transformation so there's no canonical way to split utilities evenly. Hence the part about "Assume a magical solution N which gives us the fair division." If we knew the exact properties of how to implement this magical solution, taking it at first for magical, that might give us some idea of what N should be, too.
Clarification 2: The way this might work is that you pick a series of increasingly unfair-to-you, increasingly worse-for-the-other-player outcomes whose first element is what you deem the fair Pareto outcome: (100, 100), (98, 99), (96, 98). Perhaps stop well short of Nash if the skew becomes too extreme. Drop to Nash as the last resort. The other agent does the same, starting with their own ideal of fairness on the Pareto boundary. Unless one of you has a completely skewed idea of fairness, you should be able to meet somewhere in the middle. Both of you will do worse against a fixed opponent's strategy by unilaterally adopting more self-favoring ideas of fairness. Both of you will do worse in expectation against potentially exploitive opponents by unilaterally adopting looser ideas of fairness. This gives everyone an incentive to obey the Galactic Schelling Point and be fair about it. You should not be picking the descending sequence in an agent-dependent way that incentivizes, at cost to you, skewed claims about fairness.
Clarification 3: You must take into account the other agent's costs and other opportunities when ensuring that the net outcome, in terms of final utilities, is worse for them than the reward offered for 'fair' cooperation. Offering them the chance to buy half as many paperclips at a lower, less fair price, does no good if they can go next door, get the same offer again, and buy the same number of paperclips at a lower total price.

Can you show how this would lead to a 'jerk' never gaining in the aforementioned scenario?

[-]NicholasKross4mo42Review for 2022 Review

Decision theory is hard. In trying to figure out why DT is useful (needed?) for AI alignment in the first place, I keep running into weirdness, including with bargaining.

Without getting too in-the-weeds: I'm pretty damn glad that some people out there are working on DT and bargaining.

[-]Anthony DiGiovanni4mo10

Can you say more about what you think this post has to do with decision theory? I don't see the connection. (I can imagine possible connections, but don't think they're relevant.)

[-]NicholasKross4mo20

So there's a sorta-crux about how much DT alignment researchers would have to encode into the-AI-we-want-to-be-aligned, before that AI is turned on. Right now I'm leaning towards "an AI that implements CEV well, would either turn-out-to-have or quickly-develop good DT on its own", but I can see it going either way. (This was especially true yesterday when I wrote this review.)

And I was trying to think through some of the "DT relevance to alignment" question, and I looked at relevant posts by [Tamsin Leake](https://www.lesswrong.com/users/tamsin-leake) (whose alignment research/thoughts I generally agree with). And that led me to thinking more about value-handshakes, son-of-CDT (see Arbital), and systems like ROSE bargaining. Any/all of which, under certain assumptions, could determine (or hint at) answers to the "DT relevance" thing.

[-]Anthony DiGiovanni4mo21

Sorry, to be clear, I'm familiar with the topics you mention. My confusion is that ROSE bargaining per se seems to me pretty orthogonal to decision theory.

I think the ROSE post(s) are an answer to questions like, "If you want to establish a norm for an impartial bargaining solution such that agents following that norm don't have perverse incentives, what should that norm be?", or "If you're going to bargain with someone but you didn't have an opportunity for prior discussion of a norm, what might be a particularly salient allocation [because it has some nice properties], meaning that you're likely to zero-shot coordinate on that allocation?"

Yeah, mostly agreed. My main subquestion (that led me to write the review, besides this post being referenced in Leake's work) was/sort-of-still-is "Where do the ratios in value-handshakes come from?". The default (at least in the tag description quote from SSC) is uncertainty in war-winning, but that seems neither fully-principled nor nice-things-giving (small power differences can still lead to huge win-% chances, and superintelligences would presumably be interested in increasing accuracy). And I thought maybe ROSE bargaining could be related to that.

The relation in my mind was less ROSE --> DT, and more ROSE --?--> value-handshakes --> value-changes --?--> DT.

[-]ProgramCrafter4mo30Review for 2022 Review

This post suggests a (and, quite possibly, the) way to select outcome in bargaining (fully deterministic multi-player game).

ROSE values replace Competition-Cooperation ones by having players not compete against each other in attempts to extract more utility but average their payoffs over possible initiative orders. A probable social consequence is noted, that people wouldn't threaten everyone else in order to get something they want but would rather maximize own utility (improve their situation themselves).

ROSE values are resistant to unconditional threats. Unfortunately, conditional ones can't be easily distinguished from conditional bonuses (if player B decides to pay A if A chooses some action from a certain set) so counteracting that requires further research. There are at least two more important directions of research: non-deterministic and imperfect-information games.

In comments, a problem is mentioned: what happens if there are multiple ROSE equilibrium points? I don't believe solution "bargain over those points only" will work, as it's possible all of those points will remain equilibriums when calculating ROSE values again. Hopefully, this players disagreement will involve only tiny slices of utility and not matter much overall.

[-]Donald Hobson6moΩ120

Suppose Bob is a baker who has made some bread. He can give the bread to Alice, or bin it.

By the ROSE value, Alice should pay $0.01 to Bob for the bread.

How is an honest baker supposed to make a profit like that?

But suppose, before the bread is baked, Bob phones Alice.

"Well the ingredients cost me $1" he says, "how much do you want the bread?"

If Alice knows pre baking that she will definitely want bread, she would commit to paying $1.01 for it, if she valued the bread at at least that much. If Alice has a 50% chance of wanting bread, she could pay $1.01 with certainty, or equivalently pay $2.02 in the cases where she did want the bread. The latter makes sense if Alice only pays in cash and will only drive into town if she does want the bread.

If Alice has some chance of really wanting bread, and some chance of only slightly wanting bread, it's even more complicated. The average bill across all worlds is $1.01, but each alternate version of Alice wants to pay less than that.

LESSWRONG
LW

143

Threat-Resistant Bargaining Megapost: Introducing the ROSE Value

Threat-Resistant Bargaining Megapost: Introducing the ROSE Value

General Preliminaries:

Amplifying Solutions

The Big Desiderata List

General Equilibria Existence and Properties

ROSE Value, Step-by-Step

ROSE Value, Toy Case

ROSE Value, 2-Player Case

ROSE Value, N-Player Case??

ROSE Value, N-Player Case!!

ROSE Value Properties

ROSE Bargaining

Maybe Not So Threatproof?

Further Questions

New to LessWrong?

143

Ω 63