Freaky Fairness

2MrHen

3cousin_it

3MrHen

2cousin_it

0MrHen

0Wei Dai

0cousin_it

0Wei Dai

0cousin_it

0Wei Dai

1cousin_it

0Wei Dai

1cousin_it

1Wei Dai

0Psychohistorian

1SoullessAutomaton

1cousin_it

0Vladimir_Nesov

0cousin_it

-1cousin_it

1Wei Dai

1Eliezer Yudkowsky

2Wei Dai

6Eliezer Yudkowsky

12cousin_it

14CronoDAS

4Jonii

3Jonii

1cousin_it

0Wei Dai

1Eliezer Yudkowsky

0Wei Dai

0Eliezer Yudkowsky

0Vladimir_Nesov

0Eliezer Yudkowsky

0cousin_it

New Comment

36 comments, sorted by Click to highlight new comments since: Today at 4:05 PM

It's not an equilibrium if players can increase their profit by unilaterally switching strategies. This game might have a nicer solution that doesn't punish so much, but I haven't found it.

I'd like to see more results in this the vein. For example, what if there is asymmetric information? Can you still get Pareto optimality? What about non-transferable utilities? Thanks for writing up your results as you go. It's fun to follow along.

I'm also curious about Vladimir Nesov's current approach, which he alluded to earlier as trying to understand the properties of agents as processes (or something like that). That also sounds intriguing and I'd like to learn more.

Thanks for the encouragement! Although if people started solving problems in a similar vein and wrote them up, that would be nicer still. :-)

The NTU case doesn't look clear-cut at all, thanks to the PDFs you gave me; most likely there will be multiple Pareto-optimal equilibria with no clear way to tell which is "better". The general case of incomplete information also looks bewilderingly hard; I don't even understand what "equilibrium" means in this context. If I work something out, I will surely post it.

Also very curious about Nesov's approach, but he has told me mutiple times that there's nothing to write up mathematically yet.

The general case of incomplete information also looks bewilderingly hard; I don't even understand what "equilibrium" means in this context.

The Bayesian Equilibrium is usually used in non-cooperative games with incomplete information. (I'm not sure if that's what you're asking.)

BTW, I think another way to formulate this problem, which should be equivalent but might be easier to think about, is instead of the AIs knowing each other's source code and detecting friends and enemies, all the AIs just agree to merge into a single AI running the fair decision algorithm. I had some very preliminary ideas on this which I wrote about on SL4, which I now recognize as trying to reinvent parts of cooperative game theory.

Also very curious about Nesov's approach, but he has told me mutiple times that there's nothing to write up mathematically yet.

What he wrote before led me to look into cooperative game theory, so I'm very curious about his current approach, even if the results aren't ready yet.

Ah, you mean this. Okay.

If you're thinking about alternate formulations, it might interest you that programs that know each other's source code can use the quining trick to emulate any mutually-agreed rule for resolving conflicts, e.g. "play a game of chess to decide who gets the money". (Quining ensures that both players' internal representations of other players move the chess pieces correctly.) More generally, they can agree on any common algorithm that takes input strings supplied by individual players (e.g. chess-playing strategies). I guess this counts as "merging"? Another application is generating common cryptographically secure random numbers. This stuff is almost *too* powerful.

The set of things you can do with "knowing each other's source code" seems to be the same as what you can do with "agreeing to merge together". And the underlying mechanisms necessary for implementing them seem to be the same as well.

In order for two AIs meeting for the first time to prove their source code to each other, they each need to reconstruct itself from scratch while being monitored by the other. Similarly, merging is jointly constructing a new entity and then transferring all resources to it. In both cases the necessary enabling technology might be called secure joint construction. (I just made that phrase up. Is there's a better term?)

It would help if you gave an example of two simple programs that undergo "secure joint construction" in some formal setting and without any source-code equality tricks. For example, assume that initially they both implement algorithms that want to agree but this fact is Rice-unknowable. This stuff isn't obvious to me. It would be progress.

Well, AI 1 sends a proposal for a joint decision algorithm to AI 2, then AI 2 sends a counter-proposal and they bargain. They do this until they agree on the joint decision algorithm. They then jointly build a new AI and each monitors the construction process to ensure that the new AI really implements the joint decision algorithm that they agreed on. Finally they each transfer all resources to the new AI and shut down.

Does that answer your question, or were you asking something else?

I was asking for a rigorous model of that: some controlled tournament setting that you could implement in Python today, and two small programs you could implement in Python today that would do what you mean. Surely the hard AI issues shouldn't pose a problem because you can always hardcode any "understanding" that goes on, e.g. "this here incoming packet says that I should do X". At least that's my direction of inquiry, because I'm too scared of making hard-to-notice mistakes when handwaving about such things.

In case anyone is wondering what happened to this conversation, cousin_it and I took it offline. I'll try to write up a summary of that and my current thoughts, but in the mean time here's what I wrote as the initial reply:

Ok, I'll try. Assume there is a "joint secure construction service" which takes as input a string from each player, and constructs one machine for each distinct string that it receives, using that string as its program. Each player then has the option to transfer its "resources" to the constructed machine, in which case the machine will then play all of the player's moves. Then the machines and any players who chose to not transfer "resources" will play the base game (which let's say is PD).

The Nash equilibrium for this game should be for all players to choose a common program and then choose to transfer resources. That program plays "cooperate" if every did that, otherwise it plays "defect".

I choose a row and receive the first payoff in a pair, you choose a column and receive the second payoff.

This threw me off slightly; I believe the point was:

I choose a row, and you choose a column, so together we choose a specific pair; the first chooser thus gives the second chooser two options, and the second chooser picks the pair that is the outcome. I get the first payoff in that pair, and you get the second payoff in that pair.

Could be better written, but at least it is unambiguous, I think. It's also unclear if who chooses second is undetermined or predetermined; that it is undetermined seems vital for your conclusion.

Also, it took me a minute to figure out the epsilons are simply very small amounts (if they are, as I'm still not entirely sure). Unless I'm just unfamiliar with common terminology, your clarity would benefit if you mentioned this up front, or just used 3.01, 2.01, or just used 3+ϵ, 2+ϵ and mention ϵ is tiny, as the subscripts make them look like variables. Unless they are, in which case I just plain missed that.

Not to nitpick, but I figure if I got thrown off and had to reread a couple times, other people may not bother with those couple times.

Lowercase epsilon actually is a fairly standard notation in mathematics, particularly calculus#Formal_definition), representing an arbitrarily small value.

This is the kind of issue that makes me skeptical about FAI. We can teach a decision algorithm to an AI, but can we teach the AI how to think about decision algorithms? How to decide whether one concept of fairness is better than another? Whether one interpretation of probability theory is better than another? Whether one Bayesian prior is better than another? To see the obviousness that coding conventions shouldn't matter?

If we can't, if we have to hard code all these answers into the AI, then that's hardly safe, since it seems unlikely that we (i.e. ordinary humans) can achieve a high confidence in all of the answers. But as far as I know, no one has attempted or even proposed to formalize the process of *philosophy* as an algorithm.

What I'm saying is that if you can get the AI to choose between decision algorithms, then you can code a decision-algorithm-chooser that searches through many decision algorithms and picks the best one, instead of having to do all that work yourself. Like the difference between the amount of work put in by human chessplayers to build up the human skill of chess over centuries, versus the effort of the programmers to build Deep Blue to search the game tree for good strategies. It's *conceptually* much harder work to find the source of the source of the source and build the seed of the seed of the seed, it's more *confusing* labor, but if you don't have a few centuries to spare, you had better look for the generator and not try to duplicate the output.

This merits a long response, please bear with me...

Deep Blue wasn't built on the single insight of searching the game tree. It incorporates a whole lot of grandmaster-level advice from humans. Computers did not (and AFAIK cannot yet) regenerate the opening book. Also, perhaps more relevantly to AI, the *evaluation function* for positions is mostly designed by human master players, not inferred all the way backward from winning and losing positions.

One important insight like the minimax tree rarely takes you all the way to victory. We nibble away at the problem, taking out small digestible chunks. Game theory (or Bayesian statistics, or any other insight as of now) doesn't explain all of human behavior, but it's a first step without which further steps would've been that much harder.

See, I have this persistent idea - maybe stemming from my math upbringing - that the way to advancement of knowledge leads through *small problems conclusively solved*. That as beliefs should pay rent in predictions, so directions of inquiry should pay rent in small problems conclusively solved, building on one another. Failing to meet this standard is a pattern of FAIL that I've seen many, many times, and no doubt so did you. And even if you see your quest as reductionism, searching for the small computational core that would suffice to regenerate game theory and insights beyond, you'd still do well to heed the pattern.

A long quote from my textbook on game theory, Ken Binmore's "Fun and Games":

People who do not understand how human knowledge is advanced are often scornful of such claims. What is the point, they say, of studying a silly parlor game like Poker if the aim is to gain insight into the broad sweep of human affairs? I like to characterize such folk as

naive positivists. They have seldom received any serious mathematical training. Otherwise they would have learned that difficult problems do not commonly succumb to frontal attacks. One has to lay siege to them. In graduate school, mathematicians are taught to look at simpler versions of their problems first. Examples help to rid the mind of misconceptions and clarify what the difficulties are. Such a search for insight may take one far afield. A political scientist may end up studying the Prisoners' Dilemma. Economists may look at the literature on Poker models. Naive positivists will think their labors ridiculous. But this is because naive positivists do not understand that nobody ever solved a difficult problem who had not learned to solve simple problems first.

Maybe I should've made it a toplevel post, but it sounds too much like dancing about architecture. Better leave it in a comment and think of something mathy to post instead :-)

Interestingly, recent advances in computer Go have come from *ignoring* much human expertise and incorporating Monte Carlo approaches into the evaluation function.

It works like this:

- 1) Choose the move to be evaluated.
- 2a) Search the game tree by having players make random moves for many turns after the initial move is made.
- 2b) Score each randomly generated "far future" game state by using a very simple evaluation function, such as counting the number of stones on the board of each color.
- 3) Repeat steps 2a and 2b lots of times, generating lots of randomly played games.
- 4) Average the score of the randomly generated games. The result is the score for the move.

Very little human knowledge is incorporated into this method, but it has produced some of the strongest computer Go players so far.

Reason for the success of this method is propably something not all-that-interesting. The random sampling and evaluation method works because it allows using simple brute force and random chance against a problem that computer can't (yet) handle otherwise. I am dan-player myself, and I have watched games where supercomputers with monte carlo -method and go professionals have dueled, and as far as I can tell, the go program makes repeated weird moves that seem to be doing something(hyperactive agent detector), but after a while, original idea is lost and the move becomes bad. I'd estimate that this defect could easily be used against the bot, but even with just good, honest moves, professionals did good against the bot.

My opinion, that I think is widely accepted(not sure though, haven't been following this much lately), is that Monte Carlo method won't produce the programs that beat humans. Reason it's doing so well is not because the method is awesome, but because good go-players use so much of the potential of human brain, encorporating pattern-detection, strategy-planning, many different level of abstractions etcetc(all this *effectively*. That's important point, I think), that our current know-how simply can't match that at all. Which is why bots that try to simulate humans end up losing to moderately experienced(1 year) players even with multiple handicap stones.

Gonna edit/remake that post to make it clearer, but the point anyway was: go is difficult, current go programs do very poorly against even moderately skilled players, so *even* random sampling beats those attempts. But random sampling is not going to take go programs past human players, and I believe this hype settles as soon as some better decision-making algorhitms are invented.

It's not clear to me that the decision-algorithm-chooser problem is more compact than the decision-algorithm problem. I think the right decision algorithm (or at least an AIXI-style idealized version of it) may be pretty short, but to choose between decision algorithms seems to involve most areas of philosophy.

ETA: Do you have the intuition that philosophy can be formalized, and in a compact way?

It's not clear to me that the decision-algorithm-chooser problem is more compact than the decision-algorithm problem.

Perhaps they're not *different* problems. If we solve the decision problem and then the algorithm just replicates itself when we tell it to decide between decision algorithms, this suggests that the philosophy "behind" was actually not behind at all. But one way or the other, you don't want to be stuck with the problem of trying to choose according to a *higher* criterion using only human wisdom. A quining algorithm could plausibly represent having hit bottom. But if there are leftover dilemmas then you haven't hit bottom. Then you want to step *back* and try to compact further.

ETA: Do you have the intuition that philosophy can be formalized, and in a compact way?

I have an intuition that the *functional* parts can be formalized in a compact way, and that the rest is our own confusion.

Perhaps they're not different problems. If we solve the decision problem and then the algorithm just replicates itself when we tell it to decide between decision algorithms, this suggests that the philosophy "behind" was actually not behind at all. But one way or the other, you don't want to be stuck with the problem of trying to choose according to a higher criterion using only human wisdom.

The set of quining algorithms is infinite, right? Restricting to the set of quining algorithms rules out many pathological cases, but using human wisdom to choose between them is still unavoidable.

But if there are leftover dilemmas then you haven't hit bottom. Then you want to step back and try to compact further.

How will you know when there are really no leftover dilemmas? In other words, how can you tell that a candidate quining decision algorithm is free of errors? For example, you once thought that it was harmless for a decision algorithm to use the Solomonoff prior (which assumes that the universe is computable). (I hope I've convinced you that's an error.) Such a decision algorithm can certainly be quining, so don't we need a way to prevent this type of error in the future?

I have an intuition that the functional parts can be formalized in a compact way, and that the rest is our own confusion.

Human philosophy is error prone, but also error tolerant and correcting in a way that no decision algorithm that anyone has ever proposed is. If you give a typical decision algorithm the wrong prior, utility function, or implicit theory of fairness, it will happily go on and do its thing without complaint. Restricting to the set of quining algorithms doesn't solve this problem. Human beings can somehow sense such errors and try to correct them. I think this ability needs to be understood and programmed into an AI before it can be considered safe.

These are old questions, and certainly they'll need to be answered. There is no running away from having a good theory that explains why the take-off process can be trusted to do the right thing. Ideally, this theory will be as clear-cut as classical mechanics.

The data is in the process of humanity, but so is the right way of interpreting it. How much do you need to specify by hand to launch the process of extracting preference of humanity formally, that knows what to look for, and is able to find enough to continue looking for more? It'd take at least a technical theory of preference generalized to non-formal agents (of which the humanity is an instance) to answer that question.

Consider this game:

where the last payoff pair is very close to (3,2). I choose a row and you choose a column simultaneously, then I receive the first payoff in a pair and you receive the second. The game has no Nash equilibria in pure strategies, but that's beside the point right now because we drop the competitive setting and go all cooperative: all payoffs are in dollars and transferable, and we're allowed beforehand to sign a mutually binding contract about the play and the division of revenue. The question is, how much shall we win and how should we split it?

Game theory suggests we should convert the competitive game to a coalitional game and compute the Shapley value to divide the spoils. (Or some other solution concept, like the "nucleolus", but let's not go there. Assume for now that the Shapley value is "fair".) The first step is to assign a payoff to each of the 2

^{N}= 4 possible coalitions. Clearly, empty coalitions should receive 0, and the grand coalition (me and you) gets the maximum possible sum: 6 dollars. But what payoffs should we assign to the coalition of me and the coalition of you?Now, there are at least two conflicting approaches to doing this: alpha and beta. The alpha approach says that "the value a coalition can get by itself" is its security value, i.e. the highest value it can win guaranteed if it chooses the strategy first. My alpha value is 2, and yours is 2+ϵ

_{2}. The beta approach says that "the value a coalition can get by itself" is the highest value that it cannot bepreventedfrom winning if it chooses its strategysecond. My beta value is 3+ϵ_{1}, and yours is 3.Astute readers already see the kicker: the Shapley value computed from alphas assigns 3-ϵ

_{2}/2 dollars to me and 3+ϵ_{2}/2 dollars to you. The Shapley value of betas does the opposite for ϵ_{1}. So who owes whom a penny?That's disturbing.

Aha, you say. We should have considered mixed strategies when computing alpha and beta values! In fact, if we do so, we'll find that my alpha value equals my beta value and your alpha equals your beta, because that's true for games with mixed strategies in general (a result equivalent to the minimax theorem). My security value is (10+4ϵ

_{1})/(4+ϵ_{1}), and yours is (10-ϵ_{2})/(4-ϵ_{2}).This still means the signs of the epsilons determine who owes whom a penny. That's funny because, if you plot the game's payoffs, you will see that the game isn't a quadrilateral like the PD; it's a triangle. And the point (3+ϵ

_{1},2+ϵ_{2}) that determines the outcome, the point that we can ever-so-slightly wiggle to change who of us gets more money...lies inside that triangle. It can be reached by a weighted combination of the other three outcomes.That's disturbing too.

...

Now, this whole rambling series of posts was spurred by Eliezer's offhand remark about "AIs with knowledge of each other's source code". I formalize the problem thus: all players simultaneously submit programs that will receive everyone else's source code as input and print strategy choices for the game as output. The challenge is to write a good program without running into the halting problem, Rice's theorem or other obstacles.

Without further ado I generalize the procedure described above and present to you an algorithm Freaky Fairness — implementable in an ordinary programming language like Python — that achieves a Nash equilibrium in algorithms and a Pareto optimum

simultaneouslyinanyN-player game with transferable utility:Proof that all players using this algorithm is a Nash equilibrium: any coalition of players that decides to deviate (collectively or individually) cannot win total payoff greater than their group security value, by point 4. If they cooperate, they collectively get no less than their group security value, by superadditivity and construction of Shapley value.

(NB: we have tacitly assumed that all payoffs in the game are positive, so the Shapley value makes sense. If some payoffs are negative, give everyone a million dollars before the game and take them away afterward; both the Shapley value and the minimax survive such manipulations.)

In retrospect the result seems both obvious and startling. Obvious because it closely follows the historically original derivation of the Shapley value. Startling because we're dealing with a class of

one-shot competitive games: players enter their programs blindly, striving to maximize only their own payoff. Yet all such games turn out to have Nash equilibria that are Pareto-optimal, and in pure strategies to boot. Pretty neat, huh?I've seriously doubted whether to post this or not. But there might be mistakes, and many eyes will be more likely to spot them. Critique is welcome!

UPDATE 12.01.2011:benelliott found a stupid mistake in my result, so it's way less applicable than I'd thought. Ouch.