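Observe the payoff matrix at right (the unit of reward? Cookies.). If both players pick the same move, nobody gets anything; if they pick different moves, whoever picked 'A' gets two cookies and whoever picked 'B' gets one. Each player wants to play 'A', but only so long as the two players play different moves.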
Suppose that Red got to move first. There are some games where moving first is terrible - take Rock Paper Scissors for example. But in this game, moving first is great, because you get to narrow down your opponent's options! If Red goes first, Red picks 'A', and then Blue has to pick 'B' to get a cookie.
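The first-mover argument is backward induction, which a few lines of Python can check. This is a sketch, not anything from the original post: the payoff table is reconstructed from the text (matching moves pay nothing; when the moves differ, the A-player gets 2 cookies and the B-player gets 1), and the names `PAYOFF`, `blue_reply`, and `red_moves_first` are made up for illustration.

```python
MOVES = ("A", "B")

# PAYOFF[(red_move, blue_move)] = (red_cookies, blue_cookies), reconstructed from the text.
PAYOFF = {
    ("A", "A"): (0, 0),
    ("A", "B"): (2, 1),
    ("B", "A"): (1, 2),
    ("B", "B"): (0, 0),
}


def blue_reply(red_move):
    """Blue sees Red's move and picks whatever earns Blue the most cookies."""
    return max(MOVES, key=lambda b: PAYOFF[(red_move, b)][1])


def red_moves_first():
    """Red anticipates Blue's reply to each opening and picks the best opening."""
    red_move = max(MOVES, key=lambda r: PAYOFF[(r, blue_reply(r))][0])
    blue_move = blue_reply(red_move)
    return red_move, blue_move, PAYOFF[(red_move, blue_move)]


print(red_moves_first())  # ('A', 'B', (2, 1))
```

Red opens with A because Blue's best reply to A is B; swapping who moves first just swaps the roles.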
This is basically kidnapping. Red has taken all three cookies hostage, and nobody gets any cookies unless Blue agrees to Red's demands for two cookies. Whoever gets to move first plays the kidnapper, and the other player has to decide whether to accede to their ransom demand in exchange for a cookie.
What if neither player gets to move before the other, but instead they have their moves revealed at the same time?
Red: "I'm going to pick A, you'd better pick B."
Blue: "I don't care what you pick, I'm picking A. You can pick A too if you really want to get 0 cookies."
Red: "Okay I'm really seriously going to pick A. Please pick B."
Blue: "Nah, don't think so. I'll just pick A. You should just pick B."
And so on. They are now playing a game of Chicken. Whoever swerves first is worse off, but if neither of them gives in, they crash into each other and die and get no cookies.
So, The Question: is it better to play A, or to play B?
This is definitely a trick question, but it can't be too trickish because at some point Red and Blue will have to figure out what to do. So why is it a trick question?
Because this is a two-player game, and whether it's good to play A or not depends on what your opponent will do.
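One way to make "it depends on your opponent" concrete is to compute the best response to each opposing move and list the pure-strategy Nash equilibria. A minimal sketch, assuming the same reconstructed payoffs; `best_response` and `pure_equilibria` are hypothetical helper names.

```python
MOVES = ("A", "B")

# PAYOFF[(red_move, blue_move)] = (red_cookies, blue_cookies), reconstructed from the text.
PAYOFF = {
    ("A", "A"): (0, 0),
    ("A", "B"): (2, 1),
    ("B", "A"): (1, 2),
    ("B", "B"): (0, 0),
}


def best_response(opponent_move):
    """Red's best move against a known Blue move (the game is symmetric, so Blue's is the same)."""
    return max(MOVES, key=lambda m: PAYOFF[(m, opponent_move)][0])


def pure_equilibria():
    """Move pairs where neither player gains by changing only their own move."""
    found = []
    for red in MOVES:
        for blue in MOVES:
            red_best = max(PAYOFF[(r, blue)][0] for r in MOVES)
            blue_best = max(PAYOFF[(red, b)][1] for b in MOVES)
            if PAYOFF[(red, blue)] == (red_best, blue_best):
                found.append((red, blue))
    return found


print(best_response("A"), best_response("B"))  # B A
print(pure_equilibria())  # [('A', 'B'), ('B', 'A')]
```

There are two equilibria, one favoring each player, and the payoffs alone don't pick between them, which is exactly the standoff in the dialogue above.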
A thought experiment: suppose we threw a party where you could only get dessert (cookies!) by playing this game. At the start, people are unfamiliar with the game, but they recognize that A has higher payoffs than B, so they all pick A all the time. But alas! When both people pick A, neither gets anything, so no cookies are distributed. We decide that everyone can play as much as they want until we run out of cookies.
Quite soon, one kind soul decides that they will play B, even though it has a lower payoff. A new round of games begins, and each person gets a turn to play against our kind altruist. Soon, each other person has won their game, and they have 2 cookies each, while our selfless altruist has just one cookie per match they played. So, er, 11 or so cookies?
Many of the other party-goers are enlightened by this example. They, too, want to be selfless and altruistic so that they can acquire 11 cookies / win at kidnapping. But a funny thing happens as each additional person plays B - the people playing A win two more cookies per round (one round is everyone getting to play everyone else once), and the people playing B win one fewer cookie, since nobody gets cookies when both play B either. Eventually, there are eight people playing A and four people playing B, and all of them nom 8 cookies per round.
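The party arithmetic can be checked directly. This sketch assumes 12 guests (which is what the "11 or so cookies" aside implies) and counts cookies per round, where every guest plays every other guest once; `cookies_per_round` and `balanced_splits` are hypothetical helpers.

```python
N_GUESTS = 12  # assumption: implied by the "11 or so cookies" aside


def cookies_per_round(n_b, n=N_GUESTS):
    """Cookies won by each A-player and each B-player in one round."""
    n_a = n - n_b
    per_a_player = 2 * n_b  # 2 from every B-player, 0 from the other A-players
    per_b_player = 1 * n_a  # 1 from every A-player, 0 from the other B-players
    return per_a_player, per_b_player


def balanced_splits(n=N_GUESTS):
    """Numbers of B-players for which both strategies earn the same."""
    splits = []
    for n_b in range(n + 1):
        per_a, per_b = cookies_per_round(n_b, n)
        if per_a == per_b:
            splits.append(n_b)
    return splits
```

With one B-player, each A-player gets 2 and the B-player gets 11; every extra B-player moves A-players up by 2 and B-players down by 1, and the only split where the two match is 8 A-players and 4 B-players, at 8 cookies each.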
It's inevitable that the people playing B eventually get the same number of cookies as the people playing A - if there were a cookie imbalance, then people would switch to the better strategy until cookies were balanced again. Playing A has a higher payoff, but all that really means is that there are eight people playing A and only four playing B. It's like B has an ecological niche, and that niche is only of a certain size.
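The "people switch until cookies balance" argument can be simulated too. The switching rule here is an assumption, not something the post specifies: at each step, one guest changes strategy if the other strategy would pay them strictly more after the change, and the process stops when nobody wants to move. `settle` is a hypothetical name.

```python
def settle(n_b, n=12, max_steps=100):
    """Let guests switch strategies one at a time until nobody gains by switching.

    Returns the sequence of B-player counts. Assumption: a guest switches only if
    the other strategy would pay strictly more once they have switched.
    """
    history = [n_b]
    for _ in range(max_steps):
        n_a = n - n_b
        a_now = 2 * n_b  # per-round cookies for an A-player
        b_now = n_a      # per-round cookies for a B-player
        if n_a > 0 and n_a - 1 > a_now:
            n_b += 1  # an A-player becomes a B-player, facing one fewer A-player
        elif n_b > 0 and 2 * (n_b - 1) > b_now:
            n_b -= 1  # a B-player becomes an A-player, facing one fewer B-player
        else:
            break
        history.append(n_b)
    return history


print(settle(0))  # [0, 1, 2, 3, 4]
```

Starting from everyone playing A or from everyone playing B, the population ends at the same 8-to-4 split.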
What does the party case say about what Red and Blue should do when playing a one-shot game? The ratios of players turn into probabilities: two-thirds of the party plays A, so if you're less than 67% sure the other person will play A, you should play A. If you're more than 67% sure, you should play B. This plan only works for situations similar to drawing an opponent out of a pool of deterministic players, though.
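The 67% figure comes from comparing expected cookies. If p is the probability that the opponent plays A, then playing A earns 2(1 - p) and playing B earns p, and the two break even at p = 2/3. A small sketch with exact fractions; `expected_cookies` and `advice` are hypothetical names.

```python
from fractions import Fraction


def expected_cookies(p_opponent_a):
    """Return (expected cookies from playing A, expected cookies from playing B)."""
    p = Fraction(p_opponent_a)
    play_a = 2 * (1 - p)  # A pays 2 only when the opponent plays B
    play_b = 1 * p        # B pays 1 only when the opponent plays A
    return play_a, play_b


def advice(p_opponent_a):
    play_a, play_b = expected_cookies(p_opponent_a)
    if play_a > play_b:
        return "A"
    if play_b > play_a:
        return "B"
    return "either"


print(advice(Fraction(1, 2)), advice(Fraction(3, 4)))  # A B
```

The break-even point, 2/3, is the same as the share of A-players at the party (8 of 12).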
Stage two of the problem: what if we allow players access to each other's source code?
While you can still have A-players and B-players, you can now have a third strategy, which is to play B against A-players and play A against B-players. This strategy will have a niche size in between playing A and playing B.
What's really great about reading source code, though, is that running into a copy of yourself no longer means duplicate moves and no cookies. The best "A-players" and "B-players" now choose moves against their copies by flipping coins, so that half the time they get at least one cookie. Flipping a coin against a copy of yourself averages 3/4 of a cookie, which is almost good enough to put B-players out of business. In fact, if we'd chosen our payoff matrix to have a bigger reward for playing A, we actually could put B-players out of business. Fun question: is it possible to decrease the total number of cookies won by increasing the reward for playing A?
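The 3/4 figure comes from enumerating the four equally likely outcomes when two copies flip fair coins independently. The sketch below computes that average exactly, with a hypothetical `reward_a` parameter for the bigger-reward variant; comparing the result with the 1 cookie a B-player collects from each A-player is one way to read "almost good enough".

```python
from fractions import Fraction
from itertools import product


def self_play_average(reward_a=2, reward_b=1):
    """Average cookies per player when two copies each flip a fair coin between A and B."""
    total = Fraction(0)
    for mine, theirs in product("AB", repeat=2):  # four equally likely outcomes
        if mine == theirs:
            payoff = 0  # matching moves pay nothing
        elif mine == "A":
            payoff = reward_a
        else:
            payoff = reward_b
        total += Fraction(payoff, 4)
    return total


print(self_play_average())  # 3/4
```

With the original payoffs this gives 3/4; raising `reward_a` to 4 gives 5/4, which under these assumptions is more than a lone B-player would earn from each coin-flipping A-player.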
An interesting issue is how this modification changes the advice for the one-shot game. Our advice against simpler opponents was basically the "predictor" strategy, but that strategy is now in equilibrium with the other two! Good advice now is more like a meta-strategy: if the opponent is likely to be an A-player or a B-player, be a predictor; if the opponent is likely to be a predictor, be an A-player. Now that we've been through this cycle before, it should be clearer that this "advice" is really a new strategy that will be introduced when we take the game one meta-level up. The effect on the game is really to introduce gradations of players, where some play A more often and some play B more often, but the populations can be balanced such that each player gets the same average reward.
An interesting facet of a competition between predictors is what we might call "stupidity envy" (see ASP). If we use the straightforward algorithm for our predictors (simulate what the opponent will do, then choose the best strategy), then a dumb predictor cannot predict the move of a smarter predictor. This is because the smarter predictor is itself simulating the dumb one, so simulating it would mean simulating a simulation of yourself, and you can't simulate yourself in less time than you take to run. So the dumber predictor has to fall back on some kind of default move. If its default move is A, then the smarter predictor has no good choice but to take B, and the dumber predictor wins.
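Here is a toy model of stupidity envy. Everything in it is an assumption made for illustration: each strategy has a hypothetical `cost` (how many steps its program takes to run), and a predictor can simulate an opponent only if the opponent is strictly cheaper, since the simulation has to fit inside its own run. A predictor that cannot simulate its opponent falls back on its default move.

```python
# Payoffs as (first player's cookies, second player's cookies), reconstructed from the text.
PAYOFF = {
    ("A", "A"): (0, 0),
    ("A", "B"): (2, 1),
    ("B", "A"): (1, 2),
    ("B", "B"): (0, 0),
}


class Constant:
    """An A-player or a B-player: always plays the same move."""

    def __init__(self, fixed_move):
        self.fixed_move = fixed_move
        self.cost = 1  # hypothetical: a constant strategy takes one step

    def move(self, opponent):
        return self.fixed_move


class Predictor:
    """Simulates the opponent, then plays the other move."""

    def __init__(self, cost, default="A"):
        self.cost = cost  # hypothetical: steps this program gets to run
        self.default = default

    def move(self, opponent):
        # Running the opponent inside our own run only fits if it is strictly
        # cheaper than we are; otherwise fall back on the default move.
        if opponent.cost >= self.cost:
            return self.default
        predicted = opponent.move(self)
        return "B" if predicted == "A" else "A"


def play(first, second):
    return PAYOFF[(first.move(second), second.move(first))]


dumb, smart = Predictor(cost=2), Predictor(cost=10)
print(play(dumb, smart))  # (2, 1)
```

The smart predictor behaves as described earlier (B against A-players, A against B-players), yet against the dumber predictor it can only see a default A coming, so it is stuck playing B.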
It's like the dumber predictor has gotten to move first. Being dumb / moving first isn't always good - imagine having to move first in rock paper scissors - but in games where moving first is better, and even a dumb predictor can see why, it's better to be the dumber predictor.
On our other axis of smartness, though, the "meta-level," more meta usually produces better head-to-head results - yet the humble A-player gets the best results of all. It's only the fact that A-players do poorly against other A-players that allows a diverse ecology on the B-playing side of the spectrum.