Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Strategies for coalitions in unit-sum games

Interesting. But Theorem 2 may say less than it seems. If you subtract $1/n$ from every player's shares, you get a zero-sum game, and then Theorem 2 seems to reduce to saying that a majority coalition can always expect to not lose in a symmetric zero-sum game.

I agree that Theorem 2 only says that the majority coalition expects to get a fraction of the universe proportional to its size, and does not say they get more. This fact is unsurprising.

Actually, I'm wrong, it is possible for a majority coalition to take a loss in a zero-sum game: http://lesswrong.com/r/discussion/lw/oj4/a_majority_coalition_can_lose_a_symmetric_zerosum/

A consequence of that is that your Theorem 2 is sharp: you can't guarantee more than what you stated. In particular, there exist games with coalitions arbitrarily close to that can't get more than of the value.

I'm going to formalize some ideas related to my previous post about pursuing convergent instrumental goals without good priors and prove theorems about how much power a coalition can guarantee. The upshot is that, while non-majority coalitions can't guarantee controlling a non-negligible fraction of the expected power, majority coalitions can guarantee controlling a large fraction of the expected power.

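In a unit-sum game, there are $n$ players, each player $i$ chooses an action $a_i$ from a common action set $A$, and there is a variable $x \in X$ on which the outcome may depend. Each player $i$ then receives a nonnegative number of shares $s_i(x, a_1, \ldots, a_n)$, and the shares of all players sum to 1.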

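A unit-sum game is symmetric if, for any permutation $p : \{1, \ldots, n\} \to \{1, \ldots, n\}$, we have $s_i(x, a_1, \ldots, a_n) = s_{p(i)}(x, a_{p^{-1}(1)}, \ldots, a_{p^{-1}(n)})$. That is, if every player $i$ is renamed $p(i)$ and keeps the same action, the shares are renamed in the same way.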
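As a sanity check on this definition, symmetry can be tested by brute force on small games. The sketch below is illustrative only: `plurality_shares`, `first_player_takes_all`, and `is_symmetric` are hypothetical names, and the check reads the condition as "renaming player $i$ to $p(i)$, with everyone keeping their action, renames the shares the same way". Players are indexed from 0 here.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def plurality_shares(x, actions):
    """Toy game: players who chose the most common action split one unit
    evenly (ties go to the lower action). The state x is ignored."""
    counts = Counter(actions)
    top = max(counts.values())
    winner = min(a for a, c in counts.items() if c == top)
    return [Fraction(int(a == winner), counts[winner]) for a in actions]


def first_player_takes_all(x, actions):
    """Toy game that is unit-sum but not symmetric: player 0 gets everything."""
    return [Fraction(1)] + [Fraction(0)] * (len(actions) - 1)


def is_symmetric(share_fn, n, action_set, states):
    """Brute-force check: for every permutation p, state x and action
    profile a, player i's share under a equals player p(i)'s share under
    the renamed profile b, where b[p(i)] = a[i]."""
    for p in permutations(range(n)):
        for x, a in product(states, product(action_set, repeat=n)):
            b = [None] * n
            for i in range(n):
                b[p[i]] = a[i]
            s_a, s_b = share_fn(x, a), share_fn(x, tuple(b))
            if any(s_a[i] != s_b[p[i]] for i in range(n)):
                return False
    return True


print(is_symmetric(plurality_shares, 3, (1, 2), (None,)))
print(is_symmetric(first_player_takes_all, 3, (1, 2), (None,)))
```

The enumeration is exponential in $n$, so this is only meant for checking toy examples by hand.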

A coalition in a unit-sum game is a set of players. If $c \subseteq \{1, \ldots, n\}$ is a coalition, then a policy for that coalition $\pi : \Delta A^c$ is a distribution over joint actions, each of which assigns an action to every player in that coalition. We will assume that there are coalitions $c_1, \ldots, c_m$ such that each player appears in exactly one coalition.

We will consider the expected amount of shares a coalition will get, based on the coalitions' policies. Specifically, define

$$r_j(x, \pi_1, \ldots, \pi_m) := \sum_{a_1 \in A, \ldots, a_n \in A} \left( \prod_{j'=1}^{m} \pi_{j'}(a_{c_{j'}}) \right) \sum_{i \in c_j} s_i(x, a_1, \ldots, a_n)$$

where $a_{c_{j'}} : A^{c_{j'}}$ specifies the actions for the players in the coalition $c_{j'}$. In general, my goal in proving theorems will be to *guarantee* a high $r_j$ value for a coalition regardless of $x$.

A coalition containing a majority (>50%) of the players can, in some cases, gain an arbitrarily high fraction of the shares:

**Theorem 1:** For any $\epsilon > 0$ and $n > 0$, there exists a symmetric unit-sum game with $n$ players in which any coalition controlling a majority of the players can get at least $1 - \epsilon$ expected shares.

*Proof:* Fix $\epsilon > 0$, $n > 0$. Let $k$ be a positive integer such that $1/k \leq \epsilon$. Define the set of actions $A := \{1, \ldots, k\}$. Define $s_i$ to split the shares evenly among the players who chose the most common action (with ties being resolved towards lower actions). The variable $x$ is unused. Clearly, this unit-sum game is symmetric.

Let $c_j$ be a majority coalition. Consider the following policy for the coalition: select an action uniformly at random and have everyone take that action. Clearly, the action this coalition chooses will always be the majority action.

By symmetry among the different actions, any player outside the coalition has a $1/k$ chance of choosing the majority action. Upon choosing the majority action, a player outside the coalition gets at most $2/n$ shares, because more than $n/2$ players split the shares. Since there are at most $n/2$ players outside the majority coalition, in expectation they get at most $\frac{n}{2} \cdot \frac{1}{k} \cdot \frac{2}{n} = \frac{1}{k} \leq \epsilon$ shares in total. So the majority coalition itself gets at least $1 - \epsilon$ shares in expectation.

□
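The bound in Theorem 1 can be checked exactly on a small instance. The following is a minimal sketch, assuming $n = 5$, $k = 4$, a majority coalition of three players and a minority of two; the names `shares` and `expected_shares` are hypothetical, and `expected_shares` is a direct enumeration of $r_j$ for this game (where $x$ is unused). Because $r_j$ is linear in the minority's policy, it is enough to minimise over the minority's pure joint actions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def shares(actions):
    """Theorem 1 game: players who chose the most common action split one
    unit evenly; ties between actions go to the lower action."""
    counts = Counter(actions)
    top = max(counts.values())
    winner = min(a for a, c in counts.items() if c == top)
    return [Fraction(int(a == winner), counts[winner]) for a in actions]


def expected_shares(coalitions, policies, j):
    """r_j: expected total shares of coalition j when each coalition
    independently samples a joint action from its policy.
    A policy is a dict mapping a tuple of actions to its probability."""
    n = sum(len(c) for c in coalitions)
    total = Fraction(0)
    for joint in product(*(p.items() for p in policies)):
        prob, actions = Fraction(1), [None] * n
        for members, (acts, pr) in zip(coalitions, joint):
            prob *= pr
            for i, a in zip(members, acts):
                actions[i] = a
        s = shares(actions)
        total += prob * sum(s[i] for i in coalitions[j])
    return total


n, k = 5, 4
majority, minority = (0, 1, 2), (3, 4)

# Policy from the proof: pick one action uniformly, everyone plays it.
uniform_same = {(a,) * len(majority): Fraction(1, k) for a in range(1, k + 1)}

# Worst case over every pure joint action of the minority.
worst = min(
    expected_shares([majority, minority], [uniform_same, {acts: Fraction(1)}], 0)
    for acts in product(range(1, k + 1), repeat=len(minority))
)
print(worst, worst >= 1 - Fraction(1, k))  # prints: 7/8 True
```

In this instance the worst case for the majority is when the two outsiders pick different actions, which leaves the majority with $7/8$, comfortably above the guaranteed $1 - 1/k = 3/4$.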

As a result of Theorem 1, we won't be able to design good general strategies for non-majority coalitions: in some games, everything beyond a negligible fraction of the shares can be taken by the majority. Instead we will focus on good general strategies for majority coalitions.

**Theorem 2:** In a symmetric unit-sum game, if a coalition $j$ has at least a $\frac{k-1}{k}$ fraction of the players (for integer $k > 1$), then given the policies for the other coalitions $\pi_{-j}$, coalition $j$ has a policy $\pi_j$ resulting in getting at least $\frac{k-1}{k}$ expected shares regardless of $x$, i.e. $\forall x \in X : r_j(x, \pi_1, \ldots, \pi_m) \geq \frac{k-1}{k}$.

*Proof:* Without loss of generality, assume there are only 2 coalitions, $j = 1$, and the other coalition has index 2. To define the majority's policy $\pi_1$, divide the coalition $c_1$ into $k-1$ sub-coalitions of $|c_2|$ players each, plus leftover players (who take some arbitrary action). This is possible because $|c_1| \geq (k-1)|c_2|$. Each sub-coalition will independently select actions for its members according to the distribution $\pi_2$. Note that each sub-coalition is "equivalent" to $c_2$, so by symmetry of the unit-sum game, each sub-coalition and $c_2$ get the same expected number of shares (regardless of $x$). These $k$ groups have equal expected shares and together receive at most 1 share in total, so the coalition $c_2$ gets at most a $\frac{1}{k}$ expected fraction of the shares. Consequently, $c_1$ gets at least a $\frac{k-1}{k}$ expected fraction of the shares. □
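The copying policy from the proof of Theorem 2 can also be checked exactly on a toy example. The sketch below is an illustration under stated assumptions, not part of the original argument: the game (`shares`, in which players whose action matches $x$ split the unit, and everyone splits it if nobody matches), the minority policy `pi2`, and the sizes ($n = 7$, $k = 3$) are all made up for the example.

```python
from fractions import Fraction
from itertools import product

ACTIONS = (0, 1, 2)  # here the hidden state x also ranges over ACTIONS


def shares(x, actions):
    """Hypothetical symmetric unit-sum game: players whose action equals x
    split one unit evenly; if nobody matches, all players split it."""
    hits = [a == x for a in actions]
    if not any(hits):
        return [Fraction(1, len(actions))] * len(actions)
    return [Fraction(int(h), sum(hits)) for h in hits]


def expected_shares(x, groups, policies, members):
    """Expected total shares of `members` when each group independently
    samples a joint action from its policy (dict: action tuple -> prob)."""
    n = sum(len(g) for g in groups)
    total = Fraction(0)
    for joint in product(*(p.items() for p in policies)):
        prob, actions = Fraction(1), [None] * n
        for group, (acts, pr) in zip(groups, joint):
            prob *= pr
            for i, a in zip(group, acts):
                actions[i] = a
        s = shares(x, actions)
        total += prob * sum(s[i] for i in members)
    return total


k = 3
c1, c2 = (0, 1, 2, 3, 4), (5, 6)  # c1 holds 5/7 >= (k-1)/k of the players
pi2 = {(0, 0): Fraction(1, 2), (1, 2): Fraction(1, 2)}  # arbitrary minority policy

# Majority policy from the proof: k-1 sub-coalitions of size |c2| each copy
# pi2 independently; the leftover player takes an arbitrary fixed action.
subs = [c1[0:2], c1[2:4]]
leftover = c1[4:]
groups = subs + [leftover, c2]
policies = [pi2, pi2, {(0,): Fraction(1)}, pi2]

r1 = {x: expected_shares(x, groups, policies, c1) for x in ACTIONS}
for x, value in r1.items():
    print(x, value, value >= Fraction(k - 1, k))
```

Note that the majority never needs to know $x$ or to evaluate whether `pi2` is a good policy; it only needs to be able to sample from it independently within each sub-coalition.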

## Spying

One issue with the formalism is that it seems easier for a small coalition to spy on a large one than for a large coalition to spy on a small one, which makes it implausible that a large coalition can have a shared source of randomness not available to small coalitions.

However, note that the policy defined in Theorem 2 does not rely on the majority coalition having more coordination than the opposing coalition. This is because the policy factors $c_1$ into $k-1$ independent sub-coalitions whose sizes are $|c_2|$, so shared sources of randomness are only needed within sub-coalitions of size $|c_2|$ (and this shared randomness is equivalent to the shared randomness within $c_2$ itself).

## Discussion

Theorem 2 is good news if we expect a large majority of powerful AI systems to be aligned with human values. It means that (under some assumptions) these AI systems can achieve a large expected fraction of the universe without having good priors about the random variable X.

To do this, it is necessary to know something about what the other coalitions' strategies are, such that these strategies can be copied. A major problem with this is that, in the real world, the action one should take to gain resources depends on relative facts (e.g. one's location), whereas the actions A are not context-dependent in this way. Therefore, the actions A should be interpreted as "ways of turning one's context into a resource-gathering strategy". It is not obvious how to interpret another agent's policy as a "way of turning their context into a resource-gathering strategy" such that it can be copied, and this seems like a useful topic for further thought.