The typical algorithm I've seen for enforcing fairness is to reject unfair offers randomly with some probability such that the counterparty's EV decreases with increasing unfairness of the offer. This incentivizes fair offers without completely burning the possibility of partial cooperation between agents with slightly differing notions of fairness.
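Here's a minimal sketch of that kind of responder in Python, assuming a fixed pot and an even split as the fair point. The particular acceptance curve, accept with probability offer / (pot - offer), is just one choice with the property described above; I picked it because it makes the proposer's expected payoff strictly decrease as the offer gets greedier.

```
import random

POT = 1.0         # total amount being split
FAIR_SHARE = 0.5  # assumed fair share of the pot for each side

def acceptance_probability(offer: float) -> float:
    """Probability of accepting `offer` (our share of POT).

    Fair-or-better offers are always accepted. Unfair offers are
    accepted with probability offer / (POT - offer), so the greedier
    the proposer, the lower their expected payoff.
    """
    if offer >= FAIR_SHARE * POT:
        return 1.0
    if offer <= 0:
        return 0.0
    return offer / (POT - offer)

def respond(offer: float) -> bool:
    """Randomized accept/reject decision for a single offer."""
    return random.random() < acceptance_probability(offer)

# The proposer's expected value as a function of how little they offer us:
for offer in (0.5, 0.4, 0.3, 0.2, 0.1):
    ev = (POT - offer) * acceptance_probability(offer)
    print(f"offer us {offer:.1f} -> proposer EV {ev:.2f}")
```

With this curve the proposer's expected payoff works out to exactly the share they offered us, so demanding more than half is penalized smoothly rather than with a hard cutoff, which is what keeps partial cooperation possible between agents whose fair points differ slightly.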
Yeah, this works.
I'm a bit torn on where fairness should be properly placed. For example, if Alice is the only one deciding and Bob has no power to punish her at all, it seems like fairness should still come into Alice's consideration. So maybe it should be encoded into the utility function, not into the strategic behavior running on top of it. But that would mean we need to take actions that benefit random aliens while going about our daily lives, and I'm not sure we want that.
You might precommit to fairness if you don't know which side of the game you'll be playing or if you anticipate being punished by onlookers, but I don't know if I want my AI to be "fair" to an alien paperclipper that can't retaliate.
There's a long-standing and possibly unsolvable puzzle about how AIs should behave in game-theoretic situations with each other. The simplest example is the Ultimatum Game, where player A proposes how a dollar should be split between A and B, and B either accepts or rejects. In case of rejection both A and B get nothing. There are many Nash equilibria, one for each possible split, making the game indeterminate.
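To make the "one Nash equilibrium per split" claim concrete, here's a toy check in Python on a discretized version of the game (a 100-cent pot and threshold strategies for B; the discretization and the specific check are mine):

```
# Discretized Ultimatum Game: A proposes how many cents out of 100 to give B,
# and B accepts or rejects. For every threshold t, the strategy pair
# ("A offers exactly t", "B accepts iff the offer is at least t")
# turns out to be a Nash equilibrium.

POT = 100  # cents

def payoffs(offer: int, accept: bool) -> tuple[int, int]:
    """(A's payoff, B's payoff) when A offers `offer` cents to B."""
    return (POT - offer, offer) if accept else (0, 0)

def is_nash(threshold: int) -> bool:
    """Check the equilibrium condition for the threshold-strategy pair."""
    a_payoff, b_payoff = payoffs(threshold, accept=True)
    # A can't gain by deviating: lower offers get rejected (payoff 0),
    # higher offers leave A with less of the pot.
    a_ok = all(
        (POT - other if other >= threshold else 0) <= a_payoff
        for other in range(POT + 1)
    )
    # B can't gain by deviating at the offer actually made:
    # accepting yields `threshold` cents, rejecting yields 0.
    b_ok = b_payoff >= 0
    return a_ok and b_ok

print(all(is_nash(t) for t in range(POT + 1)))  # True: one equilibrium per split
```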
You can put all kinds of complexity on top of the game, like making A and B computer programs that can draw conclusions about each other, but the essential indeterminacy remains: the players have to pick a point on the Pareto frontier, and since their interests there are directly opposed, it becomes a tug-of-war. The game is so simple that any complicated analysis seems almost hopeless.
However, when people play this game in reality, they seem to bring in other considerations rather than just choosing what's best for themselves. A person offered 20% of the pot will often reject. The reason for this behavior seems to be a notion of fairness.
This points the way to how AIs could solve the puzzle as well. Imagine you're an AI forced to play some complicated ultimatum-type game with another AI. Then you could ignore the strategic picture of the game entirely, and focus only on what outcome seems "fair", in the sense that you and the other player get about equal amounts of whuffies (however understood). And if the other player offers you an unfair deal, you could "flip the table" and make them get nothing, even at cost to yourself. As long as the "flip the table" option is available to you, this seems like a viable approach.
Maybe this is a very simple idea, but it flips my understanding of game-theoretic situations on its head. Until today I thought that the game matrix, which actions are available to the players, was the important part, and things like each player's utility scaling were merely afterthoughts. But under the "fairness" view, the figure and the ground invert. Now we care only about comparing the players' utilities, making sure everyone gets roughly equal amounts of whuffies. The particular strategic details of each game matter less: as long as each player has access to a "flip the table" strategy, and is willing to use that strategy irrationally when the outcome seems unfair, that's enough.
Of course this can fail if the two players have incompatible views on fairness. For example, if player A thinks "taller people should get more food" and player B thinks "heavier people should get more food", and A is taller but B is heavier, the result is a food fight. So the focus shifts deeper still: we no longer think about rational behavior in games, nor about the players' notions of fairness, but about the processes that give rise to those notions, and how to make these processes yield compatible results.
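A toy version of that failure mode, with the two fairness rules hard-coded the way the example describes (the numbers are made up):

```
FOOD = 10.0

# From the example: A is taller, B is heavier.
A = {"name": "A", "height": 190, "weight": 70}
B = {"name": "B", "height": 170, "weight": 90}

def share_by_height(me, other):
    """A's notion of fairness: food in proportion to height."""
    return FOOD * me["height"] / (me["height"] + other["height"])

def share_by_weight(me, other):
    """B's notion of fairness: food in proportion to weight."""
    return FOOD * me["weight"] / (me["weight"] + other["weight"])

# Each player flips the table if they get less than their own notion demands.
a_demand = share_by_height(A, B)  # what A thinks A deserves
b_demand = share_by_weight(B, A)  # what B thinks B deserves

if a_demand + b_demand > FOOD:
    print("Incompatible fairness notions: both flip the table, everyone gets 0.")
else:
    print(f"Compatible: A gets {a_demand:.1f}, B gets {b_demand:.1f}.")
```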
What would that mean for negotiation between AIs in practice? Let's say a human-built AI travelling between the stars meets an alien AI, and they end up in an ultimatum-type situation regarding the fate of the entire universe. And further, imagine that the alien AI has the upper hand. But the human AI can still be coded to act like this (a rough sketch in code follows the checklist):
Does the situation contain another agent getting whuffies from it?
Is the other agent acting so as to give unfairly high whuffies to itself and unfairly low whuffies to me?
Do I have access to a "flip the table" action, denying whuffies to the other agent even at cost to myself?
If yes, take it!
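Taken literally, that checklist is already close to pseudocode. Here's a rough sketch of it in Python; the whuffie estimates and the fairness judgment are placeholder stubs of my own, not anything specified above:

```
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Situation:
    my_whuffies: float                            # what the proposed outcome gives me
    other_whuffies: Optional[float]               # what it gives the other agent, if there is one
    flip_the_table: Optional[Callable[[], None]]  # action denying whuffies to both, if available

def is_unfair(my_whuffies: float, other_whuffies: float, tolerance: float = 0.1) -> bool:
    """Placeholder fairness judgment: unfair if the other side takes
    noticeably more than an even split (the tolerance is arbitrary)."""
    total = my_whuffies + other_whuffies
    return total > 0 and other_whuffies / total > 0.5 + tolerance

def act(situation: Situation) -> None:
    # 1. Does the situation contain another agent getting whuffies from it?
    if situation.other_whuffies is None:
        return
    # 2. Is it giving itself unfairly high whuffies and me unfairly low ones?
    if not is_unfair(situation.my_whuffies, situation.other_whuffies):
        return
    # 3. Do I have access to a "flip the table" action?
    if situation.flip_the_table is not None:
        # If yes, take it -- even at cost to myself.
        situation.flip_the_table()

# Example: a 90/10 split against me triggers the table flip.
act(Situation(my_whuffies=0.1, other_whuffies=0.9,
              flip_the_table=lambda: print("table flipped")))
```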
Note that this is technically irrational. If the alien AI came with a precommitment of its own saying "demand all whuffies no matter what", the rational thing would be for us to accept, yet we still reject. However, I think this approach has a nice quality to it: it cuts the arms race short. Everyone in the universe could spend time making their AIs better at ultimatum tug-of-wars; or we could build an "irrational" AI that simply goes for fairness no matter what, leaving others no incentive to develop better strategies, and the outcome ends up alright for everyone.