An Alternative Approach to AI Cooperation

Would this be a sufficient formalization of your idea (restricting to two players for simplicity)?

Given a finite two-player game G, construct the game G_M (for 'merge') as follows: each player picks both a strategy profile of G (i.e., a function assigning a strategy to each player), and a "fallback" strategy. If both players select the same strategy profile, they get the corresponding payoff from G. If they select different strategy profiles, they get the payoff according to their fallback strategies.

Contrast with the formalization of source-code swapping, for two players:

Given a finite two-player game G, construct the game G_S (for 'source swapping') as follows: each player chooses a program that takes the other players' programs as input, that outputs the strategy to play in game G, and that terminates for all inputs. Players' payoffs are determined by the strategies their programs choose.

The first is arguably simpler to reason about, but it seems that any contract the players may want to enter into would be a Nash equilibrium of G_M, and for every Nash equilibrium of G_M there is a Nash equilibrium of G_S that has the same payoffs:

Suppose s is a strategy profile of G and ((s,f1),(s,f2)) is a Nash equilibrium of G_M. Then, consider the programs P1 = "if the other player plays P2, play s_1; otherwise, play f1" and P2 = "if the other player plays P1, play s_2; otherwise, play f2," constructed by mutual Quining. It's easy to see that (P1,P2) is a Nash equilibrium with the same payoffs.
Suppose ((s,f1),(t,f2)) is a Nash equilibrium of G_M, with s != t. Then, neither party can do better by switching to a different fallback strategy. Let C_x be the constant program that outputs x regardless of its input. Again, it's easy to see that (C_f1,C_f2) is a Nash equilibrium with the same payoffs.

I think that the converse is also true (for every Nash equilibrium of G_S, there is an equilibrium of G_M with the same payoffs), but I haven't been able to prove it; the problem is to find appropriate fallback strategies in G, and the idea is to show that if there aren't any, then to be a Nash equilibrium, P1 and P2 must act differently depending on what the program they're passed does, and to do that in general, they would have to solve the halting problem. But so far, I haven't been able to prove this.

I'd be happy to expand the "easy to see"s if desired :-)

[-]cousin_it16y40

I think that the converse is also true (for every Nash equilibrium of G_S, there is an equilibrium of G_M with the same payoffs)

It seems to me that you're right. In any Nash equilibrium of G_S you get no less than your security level in G, otherwise you'd switch. So your fallback strategy in G_M only has to bring the other guy down to his security level regardless of the cost to yourself.

This also tells us which outcomes of G are implementable as Nash equilibria in G_M and G_S: if an outcome gives each player no less than their security value, it's implementable, and vice versa.

Not sure that this is the best formalization of Wei Dai's idea, though. I don't completely understand it yet.

[-]Benya16y20

This also tells us which outcomes of G are implementable as Nash equilibria in G_M and G_S: if an outcome gives each player no less than their security value, it's implementable, and vice versa.

Hm, nice formulation, but no, not quite, I think. Security value is the best you can get if you move first, right? I think an outcome is Nash implementable in G_M iff it gives each player no less than they can get if they move second -- the fallback strategy of the other player is fixed, the deviating player can take it as the "first move."

As for G_S, I think any outcome can be Nash implemented if it gives each player no less than if they move second (simply use the translation from G_M), and any outcome that can be Nash implemented in G_S must give each player no less than if they move first (because otherwise they could deviate by using a constant function). But I'm not sure about the outcomes that fall between these two conditions.

But at least, this characterization makes clear that most outcomes we're usually interested in can indeed be implemented in both G_M and G_S. Thanks, I was actually looking for a succinct way to state this condition for G_S earlier :-)

[-]cousin_it16y20

Security value is the best you can get if you move first, right? I think an outcome is Nash implementable in G_M iff it gives each player no less than they can get if they move second -- the fallback strategy of the other player is fixed, the deviating player can take it as the "first move."

If mixed strategies are allowed, these values coincide by the minimax theorem. I think my conclusions were correct.

[-]Benya16y30

Hm. I see. Thanks for pointing that out. Embarrassingly, I completely ignored mixed strategies above -- implicitly assumed that the constructions of G_M and G_S would be over pure strategies of G, and analyzed only pure strategy profiles of G_M and G_S.

I do see how constructing G_M and G_S over mixed strategies of G would make the two values equal by the minimax theorem, but I think there are complications when we analyze mixed strategies. Mixed Nash equilibria of G_S can enforce correlated play in G, even without relying on cryptography, as follows: Have the pure strategies in the equilibrium correspond to a common quined program (as before) plus a natural number < n, for some given n. Then the program adds the different players' numbers modulo n, and uses the result to determine the strategy profile in G. If a player chooses their number with uniform probability through a mixed strategy of G_S, then they know that all sums modulo n are equally likely, no matter how the other players choose their numbers. So everybody choosing their numbers uniformly at random can be a Nash equilibrium, because no player can change the expected result by unilaterally switching to a different way of choosing their number.

But G_M, as defined above (whether over pure or mixed strategies), can, as far as I can see, not enforce correlated play in G.

A natural way to rectify this is to define the first component of G_M to be neither just a pure nor just a mixed strategy profile of G, but an arbitrary distribution over pure strategy profiles of G -- a "correlated strategy profile." Can the earlier proof then be used to show that G_M has a mixed strategy profile realizing a certain outcome ⇔ G has "correlated strategy profile" realizing that outcome that pays each player at least their security value ⇔ G_S has a mixed strategy profile realizing that outcome?

[-]cousin_it16y20

Good catch. To tell the truth, I didn't even think about mixing strategies in G_M and G_S, only playing deterministically and purely "on top of" mixed strategies in G. When we add mixing, G_S does turn out to be stronger than G_M due to correlated play; your construction is very nice.

Your final result is correct, here's a proof:

1) Any Nash equilibrium of G_S or "new" G_M plays a correlated strategy profile of G (by definition of correlated strategy profile, it's broad enough) that on average gives each player no less than their security value (otherwise they'd switch).

2) Any such "good" profile can be implemented as a Nash equilibrium of G_S in mixed strategies, using your construction above and the usual method of punishment by security level. If all of the enemy's strategies are quine-compatible with yours, the profile gets played exactly thanks to the niceties of your construction. If any small part of his strategies are incompatible with yours, that's enough to bring him down on average.

3) For "new" G_M you just submit the profile and the punishment fallback. So "new" G_M doesn't actually need mixed strategies, our latest deus ex machina is too strong.

[-]Wei Dai16y10

Thanks, Benja and cousin_it, for working out the math. The "new" G_M looks correct to me (and let's just call it G_M from now, since I see no reason why the joint machine can't be programmed with a correlated strategy profile). To make it a little more realistic, we can add a step after the joint machine is constructed, where each player has a choice to transfer his resources or not. It's pretty obvious that adding this step makes no difference in the outcome, since the players can program the joint machine to execute their fallback strategies if one of the player fails to transfer.

Benja's method for G_S to enforce correlated play in G seems to require simultaneous source-swapping. Otherwise, if I choose my random number first, then the second player can pick his number to favor him, right? It seems that the common quined program has to use some cryptographic method to generate a common random number in a way that's not vulnerable to manipulation by the players.

[-]Vladimir_Nesov16y10

When two AIs merge this way, you can consider them pre-merge as already being a single distributed process. Then, the process of merging is just some dynamic that happens to the distributed process of both AIs. What did actually happen in this dynamic?

[-]Jonathan_Graehl16y00

Consider any method for committing to run published source code that isn't vulnerable to deception (Freeman's works, I think).

If you are saying that physically supervised joint reincarnation is just (if it works) another type of enforceable mutual source code promise, and otherwise changes nothing, I agree.

[-]Wei Dai16y00

They went from running on two different nodes with different code, with each node controlling its own resources, to running on just one node with some code that they both agreed upon, that controls both of their resources.

I added "(which they both agreed upon)" to the post. Does that clear it up, or is the unclarity somewhere else?

[-]Vladimir_Nesov16y00

What is the difference between two (interacting) nodes controlling their own resources and (instead of them) one distributed node controlling the resources? What do you mean by "control"? What is gained by calling basically the same thing "one node" and not "two nodes"? If the important part is "agreed-upon", how does it work in terms of the two interacting processes, and how to translate it in a statement about the joint process?

[-]Wei Dai16y00

I'm confused about your confusion. I'm using familiar human terms, and assuming straightforward (though not necessarily easy) translation into AI implementation. Is there no obvious counterpart for "control" and "agreed-upon" in AI?

What if we assume that the two original AIs are human uploads? Does that help?

Or taken two flesh-and-blood humans, and suppose they jointly program a robot and then die, physically handing their assets over to the robot before they die. Would you also say the two humans can already be considered as a single distributed process, and there's nothing to be gained by calling the same thing "one node" and not "two nodes"?

[-]Vladimir_Nesov16y00

What if the first AI can just ask the second AI to do anything with its resources, and the second AI just does that? Just because the AIs agree to some action (merging), doesn't mean that the action was really an optimal choice for them: they could be wrong. You need a more formal model to deal with such issues (for example, the AIs could have a utility function over outcomes, but then again we face the issue of bargaining).

I don't see how this setting helps to reduce bargaining/cooperation.

[-]Wei Dai16y00

I assume that the original AIs use a standard decision theory. You're right that it doesn't reduce bargaining. That wasn't the goal. The goal is to allow any agreements that do form to be enforced. In other words, I'm trying to eliminate courts, not bargaining.

Of course a solution that also eliminates bargaining would be even better, but that might be too ambitious. Feel free to prove me wrong though. :)

[-]Vladimir_Nesov16y00

Agreements are solutions to bargaining, so how can you have agreements without solving bargaining? In other words, if you just assign some agreement without considering a process of reaching it, why do you need to consider the two agents before the merge at all?

[-]Wei Dai16y00

I don't just assign some agreement without considering a process of reaching it. I make that a separate problem to be solved, and just assume here that AIs do have some way to reach agreements by bargaining, in order to focus on the mechanism for enforcement of agreements.

ETA: Do you think common knowledge of source code can help with the bargaining problem in some way? Maybe that's the insight I'm missing?

[-]Jonathan_Graehl16y00

Individual verified-source reincarnation and joint verified-source reincarnation on a single computing substrate are equivalent in every way.

Call A+ the reincarnation of A, and B+ the reincarnation of B.

There must be some way of observing that the irrevocable transfer of authority has actually occurred. This means you need some way of trusting promises by all entities that owe some allegiance to A or B. So far the only method proposed to establish trust in one's decision rule is to submit to verified reincarnation. This may not be to the liking of non-machine intelligence, and at best costly to those involved, if not infeasible due to speed-of-light physical separation.

Further, A+ shouldn't depart, in any way it wasn't willing to do unilaterally for its own sake, from the values and decision rules of A, unless there's a secure atomic transfer of authority from A->A+ and B->B+. I don't see how such a thing is possible, so A+ really can't commit to any cooperation until it verifies that B->B+ has occurred.

So, A+ and B+ would differ from A and B only in that their own end of the contract they propose to enter would become active iff the other actually reincarnates (A+ can do this once B->B+, because she knows that B+ will do the same). But it is necessary for one of A+ or B+ to reincarnate first.

"On the count of 3, hang up!"

[-]Wei Dai16y20

This may not be to the liking of non-machine intelligence, and at best costly to those involved, if not infeasible due to speed-of-light physical separation.

Yes, I agree. That's why I tried to make clear that the necessary technology is being assumed into existence.

But it is necessary for one of A+ or B+ to reincarnate first.

In joint verified-source reincarnation, or what I called secure merger, there is just A'. Let's ask, after A' is constructed, but before A and B have transferred their resources, is there any incentive for either A or B to cheat and fail to transfer? Not if they programed A' in such a way that A and B are each individually made no worse off by transferring. For example, A and B could program A' so that it would act exactly as A would act if B fails to transfer, and exactly as B would act if A fails to transfer, and carry out the cooperative solution only if both sides transfer.

[-]Jonathan_Graehl16y00

I made my opening claim (equivalence of reincarnation on independent vs. shared substrates) so I didn't have to fuss about notation. Do you disagree?

As for making each individual's transfer risk-free, that was my point - the new version of oneself must be substantially identical to the old one. The only new payload should be the (uncheatable) protocol for committing to new agreements. You've verified that your counterpart implements the same protocol, and you can make binding agreements.

Of course, we assume A and B don't mind reincarnation ,and feel no angst at brief coexistence of multiple copies of themselves, or the destruction of one.

[-]Wei Dai16y10

I think in the centralized model, it's easier to see how the resource transfer could be made risk-free. In the distributed model, you have to deal with distributed systems issues, which seem like a distraction to the main point that a technology like "secure joint construction" or "verified source-code reincarnation" can be used to enforce cooperative agreements among AIs (if they don't mind being merged/reincarnated)

[-]orthonormal16y10

The process of transferring resources might be done visibly and gradually, like the iterated donation pact in this comment.

[-]Jonathan_Graehl16y00

I agree that it might be nice to transfer gradually, but if you avoid the risk of losing any fraction, then you solve the problem of one-shot transfer also. If you don't solve that problem, then you still risk losing something to treachery. Maybe you could suggest a concrete protocol for a verified-source reincarnation scenario.

I'm also envisioning some non-divisible resources as well, e.g. privileges and obligations with respect to independent entities (we'll assume these were crafted to transfer on reincarnation).

[-]timtyler16y-20

Are the machines' goals equivalent, compatible, or incompatible?

In the first case, cooperation is no problem, the second case represents a rather unlikely scenario, and in the third case, you probably won't see much cooperation.

[-]orthonormal16y10

The second case may be more plausible in a non-zero-sum environment, where the merged agent can achieve more of each prior agent's goals than they would each achieve in a competitive setting. Think of the Superhappies' propensity to merge goals with other species as an analogue.

[-]timtyler16y-10

Companies merge sometimes - that seems like a kind of proof of concept. They are usually trying to gain mass in order to defeat third parties.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

22

An Alternative Approach to AI Cooperation

22

22