Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

In the previous post, I looked at potentially Pareto agents playing the prisoner's dilemma. But the prisoner's dilemma is an unusual kind of problem: the space of all possible expected outcomes can be reached by individual players choosing independent mixed strategies.

To see that this need not be the case, consider the variant of the PD where the rewards for defection are much larger:

The background grey triangle is the space of all possible expected outcomes. However, by choosing independently, the agents can only reach the purple area. All previous agents that reached a Pareto outcome - LP(), neLP(), Green-neLP() and Yellow-neLP() (apart from Yellow-neLP() against itself) - will still reach a "Pareto" outcome here, where Pareto means that it cannot be mutually improved without a joint source of randomness.

Now this PD variant is bad enough, but there are some problems where the "Pareto" boundary is very far from the ideal one. Consider this variant of the coordination problem:

Unless one player gets almost all their utility, the outcome is likely to be very suboptimal.

But things can be even worse than that. Consider the stag hunt problem:

Here there is also an area that can't be reached, but since it's Pareto-dominated by the area that can be reached, that isn't the problem. The problem is that at the (Hare,Hare) equilibrium point at (3,3), there are no local Pareto improvements. Any increase in the probability of Stag, by either or both players, cannot increase the joint utilities.

The LP(), neLP() agents can therefore converge to (H,H) as an equilibrium.

Indeed, it's only when one player has a probability of Stag greater than 3/4 (or both with probability greater than 3/8), that it becomes possible for there to be local Pareto improvements in the direction of (S,S). It's perfectly possible to modify LP() into P() (no longer local) so that it increases S (and thus decreases its utility) if the probability of S is below 3/8.

If this happens, then two copies of P() will reach (S,S) on the stag problem. If P() is willing to increase the probability of S up to 3/4, then it will reach (S,S) against LP(). But that means that P() has laid itself open to ending up at a value of (3/4)0+(1/4)3 = 3/4 if it's faced with a Hare-rock. Again, to make things better against some agents, one has to make things worse against others.

What if the agents has some joint source of randomness, say a process R() that takes certain values, which they will see before their final decisions? Assume that R() is δ^2-complete, in that we can divide the set of outcomes from r into 1/δ^2 disjoint sets, each with probability less than δ^2 + δ^5 (we use δ^5 so that, even when summed over all δ^2 sets, the error remains bounded by δ^3).

Let p be a combined probability distribution over the two agents and R(). Given the fineness of R(), we can "ε-move" p towards any point on the true Pareto boundary, with resolution δ. To see that, let A=q(o1,o2)+(1-q)(o1',o2') be the target point on the Pareto boundary. Then choose randomly a set R1 of outcomes of R(), of probability εq, and a set R2 of probability ε(1-q). Replace p with p', where p'=(o1,o2) on R1, p'=(o1',o2') on R2, and p'=p otherwise. This obviously ε-moves p-{R1,R2} towards A, and since R1 and R2 were chosen randomly, the whole process moves p towards A.

Then we can define the GP() (Globally-Pareto) agent:

define GP():
For all outputs o1,o2,r of (GP(),Q(),R()),
compute probabilities.
Call the probability distribution generated p.
If p is not δ-Pareto:
ε-move p towards some point on the Pareto boundary.
output the ε-moved p.
Else output p.

Of course, "output p" means output the GP() portion of p, and this choose randomly among points on the Pareto boundary.

This must reach a globally Pareto point, and, of course, there are also various neGP() variants.

In the previous post, I looked at potentially Pareto agents playing the prisoner's dilemma. But the prisoner's dilemma is an unusual kind of problem: the space of all possible expected outcomes can be reached by individual players choosing independent mixed strategies.

To see that this need not be the case, consider the variant of the PD where the rewards for defection are much larger:

The background grey triangle is the space of all possible expected outcomes. However, by choosing independently, the agents can only reach the purple area. All previous agents that reached a Pareto outcome - LP(), neLP(), Green-neLP() and Yellow-neLP() (apart from Yellow-neLP() against itself) - will still reach a "Pareto" outcome here, where Pareto means that it cannot be mutually improved without a joint source of randomness.

Now this PD variant is bad enough, but there are some problems where the "Pareto" boundary is very far from the ideal one. Consider this variant of the coordination problem:

Unless one player gets almost all their utility, the outcome is likely to be very suboptimal.

But things can be even worse than that. Consider the stag hunt problem:

Here there is also an area that can't be reached, but since it's Pareto-dominated by the area that can be reached, that isn't the problem. The problem is that at the (Hare,Hare) equilibrium point at (3,3), there are no local Pareto improvements. Any increase in the probability of Stag, by either or both players, cannot increase the joint utilities.

The LP(), neLP() agents can therefore converge to (H,H) as an equilibrium.

Indeed, it's only when one player has a probability of Stag greater than 3/4 (or both with probability greater than 3/8), that it becomes possible for there to be local Pareto improvements in the direction of (S,S). It's perfectly possible to modify LP() into P() (no longer local) so that it increases S (and thus decreases its utility) if the probability of S is below 3/8.

If this happens, then two copies of P() will reach (S,S) on the stag problem. If P() is willing to increase the probability of S up to 3/4, then it will reach (S,S) against LP(). But that means that P() has laid itself open to ending up at a value of (3/4)0+(1/4)3 = 3/4 if it's faced with a Hare-rock. Again, to make things better against some agents, one has to make things worse against others.

What if the agents has some joint source of randomness, say a process R() that takes certain values, which they will see before their final decisions? Assume that R() is δ^2-complete, in that we can divide the set of outcomes from r into 1/δ^2 disjoint sets, each with probability less than δ^2 + δ^5 (we use δ^5 so that, even when summed over all δ^2 sets, the error remains bounded by δ^3).

Let p be a combined probability distribution over the two agents and R(). Given the fineness of R(), we can "ε-move" p towards any point on the true Pareto boundary, with resolution δ. To see that, let A=q(o1,o2)+(1-q)(o1',o2') be the target point on the Pareto boundary. Then choose randomly a set R1 of outcomes of R(), of probability εq, and a set R2 of probability ε(1-q). Replace p with p', where p'=(o1,o2) on R1, p'=(o1',o2') on R2, and p'=p otherwise. This obviously ε-moves p-{R1,R2} towards A, and since R1 and R2 were chosen randomly, the whole process moves p towards A.

Then we can define the GP() (Globally-Pareto) agent:

Of course, "output p" means output the GP() portion of p, and this choose randomly among points on the Pareto boundary.

This must reach a globally Pareto point, and, of course, there are also various neGP() variants.