Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Cooperative Oracles: Nonexploited Bargaining

0Stuart_Armstrong

2Scott Garrabrant

0Stuart_Armstrong

2Scott Garrabrant

0Stuart_Armstrong

0Scott Garrabrant

New Comment

6 comments, sorted by Click to highlight new comments since: Today at 2:47 AM

Every point in this set is a Pareto optimum, so the outcome is based on how we choose the Pareto optimum, which is under specified.

Recurse, using the agent's concepts of fairness, using the new Pareto set as the set of possibilities? And continue until the set stabilises? And then the agents will be truly indifferent between the outcomes.

Yeah. The original generator of these ideas was that we were trying to find (or prove impossible) an improvement on NicerBot: An agent with reflective oracles that cooperates with itself (regardless of what reflective oracle is chosen), but is never exploited in expectation (even by epsilon.)

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.)

Consider a game between n players, P1,…,Pn. We denote Ai as the finite set of actions available to player Pi, and Δi as the set of distributions on Ai. Let R denote Δ1×…×Δn, the set of possible outcomes of this game, and given an r∈R, let ri∈Δi be the ith component.

Each player Pi will be equipped with a "fair set" Fi⊆R of outcomes in which they feel that they are not being exploited, as well as a utility function, a continuous function Ui:R→[0,1]. Note that the use of the word fair here is a little weird. We will think of agents as identifying an outcome as fair if they do not feel that they have been exploited. They may find it perfectly fair to exploit others.

Fi cannot be an arbitrary set. There must be a set valued function fi:R→P(Δi) with closed graph and fi(r) nonempty and convex for all r∈R, such that Fi is exactly the set of r such that ri∈fi(r). i.e. Fi should be the set of points for which a given Kakutani map fixes the Δi coordinate.

This condition may seem arbitrary, but it is both exactly what we need for our bargaining procedure to work, and exactly what we would expect to get out of agents that have access to everyone's actions with something like a reflective oracle. This condition in part implies that for every collection of distributions on actions that all the other players take, there must be some distribution on actions that you can take in which you consider yourself nonexploited. The restriction is slightly stronger that that, requiring a type of continuity in the way that your nonexploited responses are a function of the other players' actions.

Our Bargaining procedure is now simple. We take the set F of fair points to be the intersection over all i of Fi, and we choose an r∈F which is Pareto optimal in F (i.e. there is no other r′∈F with Ui(r′)≥Ui(r) for all i and Ui(r′)>Ui(r) for some i).

There may be more than one Pareto Optimum. If so, we choose one arbitrarily.

This procedure is not obviously well defined. In particular, it is not obvious that F is nonempty, or that given that it is nonempty that it contains Pareto optimum.

Claim:F is nonempty and contains a Pareto optimum (over F).Proof:First, we show that F is compact and nonempty. Since each fi is Kakutani, the product of them, r↦f1(r),…,fn(r) is also Kakutani, and thus must have a nonempty compact set of fixed points. This set is exactly the intersection of the Ni.Now, since F is compact, U1 must achieve a maximum value on some nonempty compact set. Let S1 be the subset of F on which U1 is maximized. Define Si similarly as the subset of Si−1 on which Ui is maximized. Sn is nonempty, and every point in Pn must be Pareto optimal in F. □

Now, lets go through an couple examples.

Example 1:Two players P1 and P2 are in a prisoner's dilemma Δ1=Δ2=[0,1] meaning the probability that the player cooperates. The utility functions are U1(p,q)=2q−p and U2(p,q)=2p−q.Both players consider themselves to be exploited if the other player gets more utility than they do. Thus we have F1={(p,q)∈R|p≤q} and F2={(p,q)∈R|p≥q}. We need to verify that these fair sets satisfy the condition of being the fixed points of a Kakutani map. Indeed, consider f1(p,q)=min(p,q). f1 is continuous, and thus Kakutani. and the set of (p,q) for which f1(p,q)=p is exactly F2. Similarly we can take f2(p,q)=min(p,q).

Since F1 and F2 were both valid fair sets, they must have nonempty intersection. In this case, their intersection is the set of (p,q) with p=q, in which both player's cooperate with the same probability. This set has a unique Pareto optimum, (1,1) in which both players cooperate with certainty.

Example 2:Two players are again in a prisoner's dilemma with the same utilities as before. However this time each player thinks that they should cooperate only 90% of the time while the other player cooperates all the time. Thus each player thinks they should get 1.1 utility, while the other only gets 0.8. They refuse to let the other player take advantage of them and get more than 0.8 utility. They also want to make sure that the unique best utility for the other player is to agree to the 1.1, 0.8 split. To do this, they say that for every unit of utility that they get less than 1.1, the opponent should get 0.1 less utility than 0.8.In utility space, a pair of utilities is acceptable to P1 are if 0.8−U2≥.1(1.1−U1). Translating this to action space, we say that (p,q)∈F1 if p≤40q+2370, and similarly (q,p)∈F2 if q≤40p+2370. Again, you can see these are valid sets by considering f1(p,q)=min(p,40q+2370) and f2(p,q)=min(q,40p+2370). (Note that if we changed the numbers, these function might not have been valid, since they might not have mapped the unit interval into itself.)

The intersection of F1 and F2 is a kite shape, with a unique Pareto Optimum at p=q=23/30

Example 3:Two players are again in a prisoner's dilemma with the same utilities as before. Player 1 has the same fair set as in Example 1 (F1={(p,q)∈R|p≤q}). Player 2 however is cooperate bot and thinks that a solution is fair if he cooperates (F2={(p,q)∈R|q=1}, f2(p,q)=1).Now, F2 is a subset of F1, so the set of mutually fair points F is just the set where P2 cooperates. Every point in this set is a Pareto optimum, so the outcome is based on how we choose the Pareto optimum, which is under specified.