## LESSWRONG

Interestingly, recent advances in computer Go have come from ignoring much human expertise and incorporating Monte Carlo approaches into the evaluation function.

It works like this:

• 1) Choose the move to be evaluated.
• 2a) Search the game tree by having players make random moves for many turns after the initial move is made.
• 2b) Score each randomly generated "far future" game state by using a very simple evaluation function, such as counting the number of stones on the board of each color.
• 3) Repeat steps 2a and 2b lots of times, generating lots
...

The reason for the success of this method is probably something not all that interesting. The random sampling and evaluation method works because it allows using simple brute force and random chance against a problem that a computer can't (yet) handle otherwise. I am a dan player myself, and I have watched games where supercomputers using Monte Carlo methods and Go professionals have dueled, and as far as I can tell, the Go program makes repeated weird moves that seem to be doing something (hyperactive agent detector), but after a while, the original idea is lost and t...

cousin_it, 10y: That's the most fascinating thought I've read on LW for a long time. Thanks a lot!

# 9

Consider this game:

$\begin{bmatrix}4,1&1,3\\2,4&3+\epsilon_1,2+\epsilon_2\end{bmatrix}$

where the last payoff pair is very close to (3,2). I choose a row and you choose a column simultaneously, then I receive the first payoff in a pair and you receive the second. The game has no Nash equilibria in pure strategies, but that's beside the point right now because we drop the competitive setting and go all cooperative: all payoffs are in dollars and transferable, and we're allowed beforehand to sign a mutually binding contract about the play and the division of revenue. The question is, how much shall we win and how should we split it?

Game theory suggests we should convert the competitive game to a coalitional game and compute the Shapley value to divide the spoils. (Or some other solution concept, like the "nucleolus", but let's not go there. Assume for now that the Shapley value is "fair".) The first step is to assign a payoff to each of the $2^N = 4$ possible coalitions. Clearly, the empty coalition should receive 0, and the grand coalition (me and you) gets the maximum possible sum: 6 dollars. But what payoffs should we assign to the coalition of me and the coalition of you?

Now, there are at least two conflicting approaches to doing this: alpha and beta. The alpha approach says that "the value a coalition can get by itself" is its security value, i.e. the highest value it can guarantee itself if it chooses its strategy first. My alpha value is 2, and yours is $2+\epsilon_2$. The beta approach says that "the value a coalition can get by itself" is the highest value that it cannot be prevented from winning if it chooses its strategy second. My beta value is $3+\epsilon_1$, and yours is 3.
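To make the two definitions concrete, here is a minimal Python sketch that recomputes the pure-strategy alpha and beta values for the game above. The function names and the sample value 1/100 for both epsilons are assumptions chosen for illustration only.

```python
from fractions import Fraction

def alpha(own):
    """Security value: own[i][j] is my payoff when I play i and the opponent plays j.
    I commit first, so I take the best of my worst cases."""
    return max(min(row) for row in own)

def beta(own):
    """Value when I move second: the opponent picks a column, I best-respond,
    and the opponent picks the column that leaves me the least."""
    return min(max(col) for col in zip(*own))

# Assumed sample perturbations; any small values behave the same way.
eps1 = Fraction(1, 100)
eps2 = Fraction(1, 100)

game = [[(4, 1), (1, 3)],
        [(2, 4), (3 + eps1, 2 + eps2)]]

# Row player's payoffs, indexed [my row][your column].
mine = [[cell[0] for cell in row] for row in game]
# Column player's payoffs, transposed so that their own strategy indexes the rows.
yours = [list(col) for col in zip(*[[cell[1] for cell in row] for row in game])]

print(alpha(mine), alpha(yours))  # alpha values: mine, yours
print(beta(mine), beta(yours))    # beta values: mine, yours
```

Transposing the column player's matrix lets both players reuse the same two functions.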

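Astute readers already see the kicker: the Shapley value computed from alphas assigns $3-\epsilon_2/2$ dollars to me and $3+\epsilon_2/2$ dollars to you. The Shapley value computed from betas does the opposite with $\epsilon_1$: it assigns $3+\epsilon_1/2$ dollars to me and $3-\epsilon_1/2$ dollars to you. So who owes whom a penny?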
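For two players the Shapley value reduces to "keep your standalone value and split the remaining surplus evenly". A small sketch under that assumption, with a hypothetical helper name and sample epsilons of 1/100:

```python
from fractions import Fraction

def shapley_two_player(v_me, v_you, v_all):
    """Two-player Shapley value: each side keeps its standalone value
    and receives half of the surplus created by cooperating."""
    surplus = Fraction(v_all) - v_me - v_you
    return v_me + surplus / 2, v_you + surplus / 2

# Assumed sample perturbations for illustration.
eps1 = Fraction(1, 100)
eps2 = Fraction(1, 100)

# Alpha values (2, 2 + eps2) and beta values (3 + eps1, 3); grand coalition gets 6.
from_alphas = shapley_two_player(Fraction(2), 2 + eps2, 6)
from_betas = shapley_two_player(3 + eps1, Fraction(3), 6)

print(from_alphas)  # my share depends on eps2 only
print(from_betas)   # my share depends on eps1 only
```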

That's disturbing.

Aha, you say. We should have considered mixed strategies when computing alpha and beta values! In fact, if we do so, we'll find that my alpha value equals my beta value and your alpha equals your beta, because that's true for games with mixed strategies in general (a result equivalent to the minimax theorem). My security value is $(10+4\epsilon_1)/(4+\epsilon_1)$, and yours is $(10-\epsilon_2)/(4-\epsilon_2)$.
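These two numbers can be checked with the standard closed form for a 2x2 zero-sum game without a saddle point, $(ad - bc)/(a + d - b - c)$. The sketch below is only an illustration: the function name is hypothetical, the epsilons are assumed to be 1/100, and a general N-player version would need a linear program instead.

```python
from fractions import Fraction

def security_value_2x2(m):
    """Mixed-strategy security value of the maximizing row player in a 2x2 game,
    where m[i][j] is that player's payoff. Only valid for 2x2 matrices."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        # Saddle point: pure strategies already achieve the value.
        return Fraction(maximin)
    # No saddle point: both players mix, and the value has a closed form.
    return Fraction(a * d - b * c, a + d - b - c)

# Assumed sample perturbations for illustration.
eps1 = Fraction(1, 100)
eps2 = Fraction(1, 100)

mine = [[4, 1], [2, 3 + eps1]]    # my payoffs, rows are my strategies
yours = [[1, 4], [3, 2 + eps2]]   # your payoffs, rows are your strategies

my_value = security_value_2x2(mine)
your_value = security_value_2x2(yours)
print(my_value, your_value)
```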

This still means the signs of the epsilons determine who owes whom a penny. That's funny because, if you plot the game's payoffs, you will see that the game isn't a quadrilateral like the PD; it's a triangle. And the point $(3+\epsilon_1, 2+\epsilon_2)$ that determines the outcome, the point that we can ever-so-slightly wiggle to change who of us gets more money... lies inside that triangle. It can be reached by a weighted combination of the other three outcomes.
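The "inside the triangle" claim is easy to verify by computing barycentric coordinates: the point is inside exactly when all three weights are positive. A minimal sketch, with a hypothetical helper name and an assumed perturbation of 1/100:

```python
from fractions import Fraction

def barycentric(p, a, b, c):
    """Weights (wa, wb, wc) with wa + wb + wc = 1 and p = wa*a + wb*b + wc*c.
    Assumes a, b, c are not collinear."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = Fraction((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1]), det)
    wb = Fraction((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1]), det)
    return wa, wb, 1 - wa - wb

# The three other outcomes of the game form the triangle.
corners = ((4, 1), (1, 3), (2, 4))

weights = barycentric((3, 2), *corners)
print(weights)  # → (Fraction(3, 5), Fraction(1, 5), Fraction(1, 5))

# A slightly perturbed point (assumed eps = 1/100) still has positive weights.
eps = Fraction(1, 100)
perturbed = barycentric((3 + eps, 2 + eps), *corners)
print(all(w > 0 for w in perturbed))
```

So the unperturbed outcome (3, 2) is the mix 3/5 of (4, 1), 1/5 of (1, 3) and 1/5 of (2, 4).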

That's disturbing too.

...

Now, this whole rambling series of posts was spurred by Eliezer's offhand remark about "AIs with knowledge of each other's source code". I formalize the problem thus: all players simultaneously submit programs that will receive everyone else's source code as input and print strategy choices for the game as output. The challenge is to write a good program without running into the halting problem, Rice's theorem or other obstacles.

Without further ado I generalize the procedure described above and present to you an algorithm Freaky Fairness — implementable in an ordinary programming language like Python — that achieves a Nash equilibrium in algorithms and a Pareto optimum simultaneously in any N-player game with transferable utility:

1. Calculate the security values in mixed strategies for all subsets of players.
2. Divide all other players into two groups: those whose source code is an exact copy of Freaky Fairness (friends), and everyone else (enemies).
3. If there are no enemies: build a Shapley value from the computed security values of coalitions; play my part in the outcome that yields the highest total sum in the game; give up some of the result to others so that the resulting allocation agrees with the Shapley value.
4. If there are enemies: play my part in the outcome that brings the total payoff of the coalition of all enemies down to their security value.
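Step 3 relies on the Shapley value of the coalitional game built from the security values. Here is a minimal sketch of that one ingredient for N players, assuming the security values have already been computed and stored in a dictionary `v` keyed by `frozenset` (a hypothetical layout). It does not compute security values, compare source code, or choose strategies, so it is not an implementation of Freaky Fairness itself.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Shapley value of each player, given a characteristic function v
    that maps every frozenset of players to that coalition's value."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = Fraction(0)
        for size in range(n):
            # Probability that exactly `size` specific players precede p
            # in a uniformly random ordering of all players.
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (v[s | {p}] - v[s])
        result[p] = total
    return result

# Two-player example from the post, using the pure-strategy alpha values
# and an assumed eps2 = 1/100.
eps2 = Fraction(1, 100)
v = {
    frozenset(): Fraction(0),
    frozenset({"me"}): Fraction(2),
    frozenset({"you"}): 2 + eps2,
    frozenset({"me", "you"}): Fraction(6),
}
phi = shapley_values(["me", "you"], v)
print(phi["me"], phi["you"])
```

The enumeration is exponential in the number of players, which is fine for small games but would need sampling or structure for large ones.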

Proof that all players using this algorithm is a Nash equilibrium: any coalition of players that decides to deviate (collectively or individually) cannot win total payoff greater than their group security value, by point 4. If they cooperate, they collectively get no less than their group security value, by superadditivity and construction of Shapley value.

(NB: we have tacitly assumed that all payoffs in the game are positive, so the Shapley value makes sense. If some payoffs are negative, give everyone a million dollars before the game and take them away afterward; both the Shapley value and the minimax survive such manipulations.)

In retrospect the result seems both obvious and startling. Obvious because it closely follows the historically original derivation of the Shapley value. Startling because we're dealing with a class of one-shot competitive games: players enter their programs blindly, striving to maximize only their own payoff. Yet all such games turn out to have Nash equilibria that are Pareto-optimal, and in pure strategies to boot. Pretty neat, huh?

I've seriously doubted whether to post this or not. But there might be mistakes, and many eyes will be more likely to spot them. Critique is welcome!

UPDATE 12.01.2011: benelliott found a stupid mistake in my result, so it's way less applicable than I'd thought. Ouch.