Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima

3rd Jun 2017

This notion of dependency seems too binary to me. Concretely, let's modify your example from the beginning so that P3 must grant some extra utility to either P1 or P2, and gets to decide which. Now everyone's utility depends on everyone's actions, and the game is still zero-sum, so again any strategy profile with p=q will be a stratified Pareto optimum. But it seems like P1 and P2 should still ignore P3.

I agree with this. I think that the most interesting direction of future work is to figure out how to have better notions of dependency. I plan on writing some on this in the future, but basically we have not successfully figured out how to deal with this.

In the infinite game example, I think that something doesn't add up in the definition of . A single-valued Kakutani map into a compact space is just a continuous map, but is not continuous.

It looks legitimate, actually.

Remember, is set-valued, so if , . In all other cases, . is a nonempty convex set-valued function, so all that's left is to show the closed graph property. If the limiting value of is something other than 0, the closed graph property holds, and if the limiting value of is 0, the closed graph property holds because .

Hi Alex!

I agree that the multimap you described is Kakutani and gives the correct fair set, but in the OP it says that if then , not . Maybe I am missing something about the notation?

How about using some conception of "coalition-stable"? An option would have that property if there is no sub-coalition of players that can unilaterally increase their utility, whatever all the other players choose to do.

In this post, we generalize the notions in Cooperative Oracles: Nonexploited Bargaining to deal with the possibility of introducing extra agents that have no control but have preferences. We further generalize this to infinitely many agents. (Part of the series started here.)

Consider example 1 from the previous post: P1 and P2 are in a prisoner's dilemma. Their utility functions are U1(p,q)=2q−p and U2(p,q)=2p−q. Their fair sets are F1={(p,q)∈R|p≤q} and F2={(p,q)∈R|q≤p}. We will introduce one extra player, P3, who has no options.

However, he will have preferences, U3=−U1−U2. Since he has no actions, his fair set must be everything.

Now, since the game is zero sum, every outcome is Pareto Optimal, so using the method described in the previous post, any equilibrium in which p=q will be valid. We will modify the framework to make this no longer the case, and make p=q=1 the only valid solution. For this, we will define a "stratified Pareto optimum."
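As a quick numerical sanity check of the claim above, here is a minimal sketch that evaluates the three utility functions on a small grid. The grid and the helper name `pareto_improves` are illustrative assumptions, not part of the original framework. It confirms that with P3 included no outcome Pareto-improves on another, while for P1 and P2 alone mutual cooperation improves on p=q=0.5.

```python
# Utilities from the example: p and q are the cooperation probabilities of P1 and P2.
def u1(p, q):
    return 2 * q - p

def u2(p, q):
    return 2 * p - q

def u3(p, q):
    # P3 has no actions; its utility makes the game zero-sum.
    return -u1(p, q) - u2(p, q)

def pareto_improves(utils, old, new):
    """True if `new` is weakly better than `old` for every utility and strictly better for one."""
    old_vals = [u(*old) for u in utils]
    new_vals = [u(*new) for u in utils]
    weakly_better = all(n >= o for n, o in zip(new_vals, old_vals))
    strictly_better = any(n > o for n, o in zip(new_vals, old_vals))
    return weakly_better and strictly_better

# Hypothetical coarse grid of outcomes (multiples of 1/4 are exact in floating point).
grid = [i / 4 for i in range(5)]
points = [(p, q) for p in grid for q in grid]

# With all three players the utilities sum to zero, so nothing is a Pareto improvement.
no_three_player_improvement = not any(
    pareto_improves([u1, u2, u3], a, b) for a in points for b in points
)

# Ignoring P3, moving from p=q=0.5 to p=q=1 helps both P1 and P2.
coop_improves_for_two = pareto_improves([u1, u2], (0.5, 0.5), (1.0, 1.0))

print(no_three_player_improvement, coop_improves_for_two)
```

This is the gap that the stratified notion is meant to close: P1 and P2 should be able to reach p=q=1 without P3's preferences blocking it.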

We will still have a finite number of players P1,…,Pn. Each player will specify a subset of players they will depend on. Let Di be the set of players that Pi depends on. Ui must only be a function of the actions of players in Di, and Fi must be closed under changing the actions of any player not in Di.

We define a preorder on the agents. We say that Pi≺Pj if Pi∈Dj, and we take the transitive closure, so Pi≺Pk whenever Pi≺Pj and Pj≺Pk. We also say that Pi≺Pi for all i.
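The preorder can be computed mechanically from the dependency sets. The following is a minimal sketch, assuming the sets Di are given as a dictionary `depends_on` that has every player as a key; both the dictionary layout and the function name `dependency_preorder` are hypothetical choices made for illustration.

```python
def dependency_preorder(depends_on):
    """Return all pairs (i, j) with Pi ≺ Pj: the reflexive, transitive closure of 'Pi is in Dj'.

    `depends_on[j]` is the set Dj; every player must appear as a key.
    """
    players = list(depends_on)
    # below[j] collects every i with Pi ≺ Pj; start with Dj plus j itself (reflexivity).
    below = {j: {j} | set(depends_on[j]) for j in players}
    changed = True
    while changed:
        changed = False
        for j in players:
            # Anything below a player that is below Pj is also below Pj (transitivity).
            extra = set().union(*(below[i] for i in below[j])) - below[j]
            if extra:
                below[j] |= extra
                changed = True
    return {(i, j) for j in players for i in below[j]}

# The three-player example: U1 and U2 depend on both p and q, and U3 depends on them too.
D = {1: {1, 2}, 2: {1, 2}, 3: {1, 2}}
preorder = dependency_preorder(D)
print(sorted(preorder))
```

In this example P1 and P2 sit below P3, but P3 is not below either of them, which is what lets the first two players ignore the third.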

We say that r′ is a stratified Pareto improvement over r for player Pi if:

We say that r is stratified Pareto optimal in F⊆R for player Pi if there is no alternate outcome r′∈F such that r′ is a stratified Pareto improvement over r for player Pi. If r is stratified Pareto optimal for all players, we say it is stratified Pareto optimal.

Claim: The set F=⋂_{i≤n} Fi has a stratified Pareto optimum.

Proof: Consider some nonempty subset of players S with the property that Pi≺Pj for all Pi,Pj∈S, and there are no Pi∈S and Pk∉S with Pk≺Pi. We know that such a subset exists, since the preorder over players induces a poset over equivalence classes of players (where every player in an equivalence class transitively depends on every other player), and this finite poset must have a minimal element.

We then take an element of ∏_{Pi∈S} Δi which is Pareto optimal for all the players in S. Note that it is important here that the players in S have utility functions and fair sets that only depend on the other players in S. We can then lock in all the players in S as behaving according to this local Pareto optimum. We will treat them as constants, effectively remove them from the game, and repeat the process with the remaining players.

Take the r found by the above process. Assume by way of contradiction that r′ is a stratified Pareto improvement over r for Pi. Consider the set of players Pk with Pk≺Pi. We can restrict the game to only those players, which makes sense since these players only have utility functions and fair sets that refer to each other. Thus, we may assume without loss of generality that every player Pk satisfies Pk≺Pi, so Pi is in the set of players locked in on the last step. Notice that in this case r′ being a stratified Pareto improvement is equivalent to r′ being a Pareto improvement for the players locked in on the last step, while fixing all the actions that were already locked in on previous steps. This contradicts the Pareto optimality of the distributions chosen in the last step. □
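The order in which the proof locks in players can be sketched as code. This is only an illustration of the peeling order, not of how the local Pareto optimum is chosen at each step; the function name `strata` and the dictionary representation of the sets Di are assumptions introduced here.

```python
def strata(depends_on):
    """List the equivalence classes of players in the order the proof locks them in.

    `depends_on[j]` is the set Dj; every player must appear as a key.
    """
    players = set(depends_on)
    # reach[j] = all players that Pj transitively depends on, including itself.
    reach = {j: {j} | set(depends_on[j]) for j in players}
    changed = True
    while changed:
        changed = False
        for j in players:
            new = set().union(*(reach[i] for i in reach[j]))
            if not new <= reach[j]:
                reach[j] |= new
                changed = True

    remaining = set(players)
    order = []
    while remaining:
        for j in sorted(remaining):
            # Players that mutually depend on Pj form its equivalence class.
            cls = {i for i in remaining if i in reach[j] and j in reach[i]}
            # The class is minimal if its members depend on no remaining player outside it.
            if all(reach[i] & remaining <= cls for i in cls):
                order.append(cls)
                remaining -= cls
                break
    return order

# Three-player example: P1 and P2 depend on each other, P3 depends on both.
print(strata({1: {1, 2}, 2: {1, 2}, 3: {1, 2}}))
```

Because the player set is finite, a minimal class always exists among the remaining players, so the inner loop always finds one and the procedure terminates.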

Now, we can give a new bargaining procedure, which is the same as the one from the previous post, except that instead of choosing a Pareto optimal point in F, we choose a stratified Pareto optimal point in F. Observe that in the above three-player game, we end up with the first two players cooperating, since they do not depend on the third player.

Unfortunately, the above construction depends heavily on the fact that there are only finitely many players. Indeed, if you extend the above definitions to infinitely many players in the obvious way, stratified Pareto optima need not exist:

Consider a game with a player Pi for all i∈Z. Each player outputs a single probability ri∈Δi=[0,1] of outputting ⊤ (as opposed to ⊥). The utility function Ui(r) is just equal to ri, the probability that that player outputs ⊤. The fair set Fi is the set of points such that either ri=r_{i−1}/2 or r_{i−1}=0. This is a valid fair set, since we can use the strategy f(r)=ri if r_{i−1}=0 and f(r)=r_{i−1}/2 otherwise, which is in fact Kakutani. Observe that Pi≺Pj iff i≤j.

The set F, which is the intersection of the fair sets of all the players, is still closed and nonempty. (This is still true for arbitrary games with infinitely many players.) We have that r∈F if either ri=0 for all i, or there is some first nonzero ri and every rj with j>i is equal to ri/2^{j−i}. Either way, there is some i such that rj=0 for all j<i. Consider the stratified Pareto improvement r′ for Pi given by r′i=1, r′j=0 for j<i, and r′j=2^{i−j} for j>i.
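The shape of points in F and of the improvement r′ can be checked on a finite window of indices. The sketch below is illustrative only: it cannot inspect all of Z, it covers the case where the first nonzero coordinate of r is below 1, and the helper names `is_fair_on_window` and `point_with_first_nonzero` are hypothetical.

```python
from fractions import Fraction

def is_fair_on_window(r, lo, hi):
    """Check the fairness condition ri == r_{i-1}/2 or r_{i-1} == 0 for lo < i <= hi."""
    return all(r(i) == r(i - 1) / 2 or r(i - 1) == 0 for i in range(lo + 1, hi + 1))

def point_with_first_nonzero(i, value):
    """Build r with rj = 0 for j < i, ri = value, and rj = value / 2**(j - i) for j > i."""
    value = Fraction(value)
    def r(j):
        return Fraction(0) if j < i else value / 2 ** (j - i)
    return r

# A point of F whose first nonzero coordinate is r0 = 1/3 (an arbitrary example value).
r = point_with_first_nonzero(0, Fraction(1, 3))
# The proposed improvement for P0: raise r0 to 1 and let later coordinates halve from there.
r_prime = point_with_first_nonzero(0, 1)

window = range(-10, 11)
both_fair = is_fair_on_window(r, -10, 10) and is_fair_on_window(r_prime, -10, 10)
strictly_better_for_p0 = r_prime(0) > r(0)
nobody_worse_on_window = all(r_prime(j) >= r(j) for j in window)
print(both_fair, strictly_better_for_p0, nobody_worse_on_window)
```

Exact rational arithmetic via `Fraction` is used so that the equality test in the fairness condition is not disturbed by floating-point rounding.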

We therefore need a weaker notion than stratified Pareto optimal for infinitely many players. Note that we can still get stratified Pareto optima if there are no infinite descending chains of strict dependency. (e.g. if the above example grounded out and only had positive integers)

The way that we weaken stratified Pareto optima is to take the (pointwise) closure of the notion of stratified Pareto optima for player P. We say that r is an almost stratified Pareto optimum (ASPO) in F for player P if it is in the pointwise closure of the set of all stratified Pareto optima for player P. We say that r is ASPO in F if it is ASPO for P for all players P.

Claim: In a game with possibly infinitely many players, the set F of points fair for all players is nonempty, and there exists an ASPO in F.

Proof: First, observe that for any finite subset of players, the set of points fair for all those players is compact and nonempty by the Kakutani fixed point theorem. By compactness, the set F of points fair for all players is also compact and nonempty.

Now, observe that for any finite subset S of players, there exists an r∈F which is stratified Pareto optimal in F for all players in S, using the argument from the finite case. Thus, the subset of F which is ASPO for all players in S is nonempty and compact. Again by compactness, this means that the subset of F which is ASPO for all players is also nonempty and compact. Thus, there exists an r∈F which is ASPO in F for all players. □

At first this weakening may seem unnatural. Notice that the all 0's vector in the above example is ASPO, even though there are many Pareto improvements and stratified Pareto improvements. (Locally, it is arbitrarily close to points with all small positive probabilities that cannot be improved.) However, we think that this condition is actually pretty natural, and the most you could hope for. The point of the players having Kakutani responses and continuous utility functions was to set them up to only have continuous(ish) access to the game state. We similarly need our notion of Pareto optimal to be such that you only violate the notion if you can point out a Pareto improvement in a continuous way, which roughly corresponds to taking the interior of the notion of Pareto improvement, and thus the closure of the notion of Pareto optimal.
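The pointwise closure can be made concrete with a small sketch. It assumes one candidate family of nearby points, namely the points of F whose first nonzero coordinate is r_n=1, with n pushed toward −∞; whether these are exactly the unimprovable points the parenthetical has in mind is an assumption here, and the name `shifted_point` is hypothetical. On any fixed finite window of players, the coordinates of these points are small, positive, and shrink toward the all 0's vector.

```python
from fractions import Fraction

def shifted_point(n):
    """Point of F with rj = 0 for j < n, r_n = 1, and ri = 2**(n - i) for i > n."""
    def r(i):
        return Fraction(0) if i < n else Fraction(1, 2 ** (i - n))
    return r

# Fix a finite window of players and push the first nonzero index n further down.
window = range(-3, 4)
starts = (-5, -10, -20)
for n in starts:
    print(n, [float(shifted_point(n)(i)) for i in window])

# The largest coordinate on the window shrinks as n decreases,
# which is pointwise convergence to the all 0's vector on that window.
largest = [max(shifted_point(n)(i) for i in window) for n in starts]
```

Pointwise convergence only ever looks at finitely many coordinates at once, which is why the single coordinate equal to 1 far down the chain does not prevent the limit from being the zero vector.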

In the next post, I plan to connect this up with the existing notion of reflective oracles, to define cooperative oracles.