Now that we know a bit about derivatives, it's time to use them to find dominant strategies and Nash equilibria. It helps if the reader is familiar with Nash equilibria already.

Prisoner's dilemma

The payoff matrix of the Prisoner's dilemma can be as follows:

We can see that the payoff for Prisoner 1 depends on her own action (Cooperate/Defect) but also on the action of Prisoner 2. Therefore, the payoff function for Prisoner 1 is a multivariable function: \(p_1(a_1, a_2)\), where \(a_i\) is the action of Prisoner \(i\) (and \(i \in \{1, 2\}\)). Let's say \(a_i = 0\) when the action of Prisoner \(i\) is Cooperate, and \(a_i = 1\) for Defect. So \(a_i \in \{0, 1\}\). Then \(p_1\) can be written as a linear function of \(a_1\) and \(a_2\), and crucially, \(\frac{\partial p_1}{\partial a_1} = 10\). So for Defect (\(a_1 = 1\)), Prisoner 1's payoff will be $10 higher than for Cooperate (\(a_1 = 0\)), as can be confirmed in the table. Note that \(a_2\) doesn't show up in \(\frac{\partial p_1}{\partial a_1}\): Defect gives $10 more for Prisoner 1 regardless of what Prisoner 2 does, which makes Defect a dominant strategy. Don't get me wrong: Prisoner 1's payoff certainly does depend on what Prisoner 2 does. The point is that no matter what Prisoner 2 does, Prisoner 1's payoff will be $10 higher when she (Prisoner 1) defects, and that's what's reflected in \(\frac{\partial p_1}{\partial a_1}\).

Since the payoff matrix is symmetrical, \(p_2\) has the same form as \(p_1\) with the roles of \(a_1\) and \(a_2\) swapped, and \(\frac{\partial p_2}{\partial a_2} = 10\). Prisoner 2 therefore also has a dominant strategy: Defect. The Prisoner's dilemma, then, has a Nash equilibrium: when both prisoners defect. With the partial derivatives, we demonstrated that when both prisoners defect, neither prisoner can do better by changing her action to Cooperate. If e.g. Prisoner 1 were to do this, then \(a_1\) would go from \(1\) to \(0\), and since \(\frac{\partial p_1}{\partial a_1} = 10 > 0\), that would lower \(p_1\) (regardless of \(a_2\)). By symmetry, the same is true for Prisoner 2.
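To make the argument concrete, here is a minimal sketch in Python. The payoff numbers are an assumption chosen for illustration (they are not taken from the table above); they were picked only so that defecting is worth $10 more whatever the other prisoner does. The helper names `gain_from_defecting` and `is_nash` are likewise made up for this example.

```python
# Hypothetical payoffs (an assumption for illustration, not the post's table).
# Actions: 0 = Cooperate, 1 = Defect.
ACTIONS = (0, 1)


def p1(a1, a2):
    """Prisoner 1's payoff: linear in both actions."""
    return 20 + 10 * a1 - 20 * a2


def p2(a1, a2):
    """Prisoner 2's payoff: p1 with the roles swapped (symmetric game)."""
    return p1(a2, a1)


def gain_from_defecting(payoff, player, other_action):
    """Discrete analogue of the partial derivative: the payoff change when
    `player` switches from Cooperate (0) to Defect (1), other action fixed."""
    if player == 1:
        return payoff(1, other_action) - payoff(0, other_action)
    return payoff(other_action, 1) - payoff(other_action, 0)


def is_nash(a1, a2):
    """True if neither prisoner gains by unilaterally changing her action."""
    best1 = all(p1(a1, a2) >= p1(alt, a2) for alt in ACTIONS)
    best2 = all(p2(a1, a2) >= p2(a1, alt) for alt in ACTIONS)
    return best1 and best2


# The gain from defecting, for each possible action of the other prisoner.
gains_1 = [gain_from_defecting(p1, 1, a2) for a2 in ACTIONS]
gains_2 = [gain_from_defecting(p2, 2, a1) for a1 in ACTIONS]
equilibria = [(a1, a2) for a1 in ACTIONS for a2 in ACTIONS if is_nash(a1, a2)]

print(gains_1, gains_2, equilibria)
```

With these assumed payoffs the gain from defecting is the same for both actions of the other prisoner, which is the discrete counterpart of the other player's action not showing up in the partial derivative.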

Nonlinear payoff functions

In the Prisoner's dilemma, the payoffs of both players (prisoners) can be modelled by linear payoff functions. What if the payoffs are nonlinear?

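Let's say \(p_1(a_1, a_2)\) is quadratic in \(a_1\) and \(p_2(a_1, a_2)\) is quadratic in \(a_2\). A Nash equilibrium is a point where no player can do better by choosing another action given the action of the other player; therefore, \(p_1\) should be maximized with respect to \(a_1\) while keeping \(a_2\) constant, whereas \(p_2\) should be maximized with respect to \(a_2\) while keeping \(a_1\) constant. If \(p_1\) has a peak value with respect to \(a_1\), \(\frac{\partial p_1}{\partial a_1}\) must be \(0\) at that point. Solving \(\frac{\partial p_1}{\partial a_1} = 0\) gives a single value of \(a_1\); call it \(a_1^*\). So \(a_1^*\) could represent a peak, but also a valley, since \(\frac{\partial p_1}{\partial a_1}\) would be \(0\) in both. If \(\frac{\partial^2 p_1}{\partial a_1^2} < 0\), it is a peak, and that is the case here. So \(a_1^*\) represents a local maximum in \(p_1\) (when \(a_2\) is held constant)! Since \(p_1\) is quadratic in \(a_1\), we can be sure this local maximum is the global maximum too (so there are no values for \(a_1\) for which \(p_1\) is higher when \(a_2\) is held constant).

Likewise, \(\frac{\partial p_2}{\partial a_2} = 0\) gives a single value \(a_2^*\), and \(\frac{\partial^2 p_2}{\partial a_2^2} < 0\), so \(a_2^*\) again represents a local maximum. \(p_2\) is quadratic in \(a_2\), so this is a global maximum as well.

So \((a_1^*, a_2^*)\) represents a global maximum for \(p_1\) (for a constant \(a_2\)), and represents a global maximum for \(p_2\) (for a constant \(a_1\)). That means \(a_1^*\) is a dominant strategy for player 1, \(a_2^*\) is a dominant strategy for player 2 and we have a Nash equilibrium in \((a_1^*, a_2^*)\).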
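The stationary-point argument above can be checked numerically. The sketch below uses two payoff functions that are assumptions made up for this example (not the ones from the post), each quadratic in the player's own action. It estimates first and second partial derivatives with central differences and takes a single Newton step, which lands on the stationary point when the payoff is quadratic. The helper names `d1`, `d2` and `best_response` are hypothetical.

```python
# Hypothetical quadratic payoffs (assumed for illustration only).
def p1(a1, a2):
    return -(a1 - 3) ** 2 + a2


def p2(a1, a2):
    return -(a2 - 1) ** 2 + 2 * a1


def d1(f, x, h=1e-2):
    """Central-difference estimate of the first derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def d2(f, x, h=1e-2):
    """Central-difference estimate of the second derivative of f at x."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def best_response(payoff_of_own_action, guess=0.0):
    """Stationary point of a quadratic via one Newton step.

    Returns (stationary point, second derivative). The point is a maximum
    only if the returned second derivative is negative.
    """
    curvature = d2(payoff_of_own_action, guess)
    point = guess - d1(payoff_of_own_action, guess) / curvature
    return point, curvature


# Hold the other player's action constant at a few different values.
others = [-2.0, 0.0, 5.0]
br1 = [best_response(lambda a1, a2=a2: p1(a1, a2)) for a2 in others]
br2 = [best_response(lambda a2, a1=a1: p2(a1, a2)) for a1 in others]

for other, (point, curvature) in zip(others, br1):
    print(f"a2 = {other}: best a1 = {point:.4f}, curvature = {curvature:.4f}")
for other, (point, curvature) in zip(others, br2):
    print(f"a1 = {other}: best a2 = {point:.4f}, curvature = {curvature:.4f}")
```

In this assumed example each player's maximizer comes out the same for every value of the other player's action, and the second derivative is negative each time. That is what makes the maximizer a dominant strategy here, not just a best response.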

Making things a bit more complicated

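Let's now define new payoff functions \(p_1\) and \(p_2\), where \(p_1\) is still quadratic in \(a_1\) but its curvature depends on \(a_2\). Then for \(p_1\), we have a second derivative \(\frac{\partial^2 p_1}{\partial a_1^2}\) which is negative only when \(a_2\) satisfies a condition.

For \(p_2\), solving \(\frac{\partial p_2}{\partial a_2} = 0\) gives a single value \(a_2^*\), and we have \(\frac{\partial^2 p_2}{\partial a_2^2} < 0\), so this is a local optimum, and also the global one, since \(p_2\) is quadratic. For \(a_2 = a_2^*\), we can set \(\frac{\partial p_1}{\partial a_1} = 0\). Solving for \(a_1\) gives a value \(a_1^*\) (which we found earlier as well). And since \(a_2 = a_2^*\) and therefore \(\frac{\partial^2 p_1}{\partial a_1^2} < 0\), we now have a local maximum for \(p_1\)! For a constant \(a_2\), \(p_1\) is quadratic, so this is the global maximum as well. We found a Nash equilibrium: \((a_1^*, a_2^*)\).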
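Here is a small sketch of the same two-step procedure: first find the action that is optimal for player 2 regardless of player 1, then plug it into player 1's first-order condition. The payoff functions are assumptions invented for this example (not the post's functions). They were chosen so that the second derivative of \(p_1\) with respect to \(a_1\) is negative only when \(a_2 > 0\). The derivatives are written out by hand for these assumed functions.

```python
# Hypothetical payoffs, chosen only for illustration (not the post's functions).
def p1(a1, a2):
    return -a2 * a1 ** 2 + 6 * a1


def p2(a1, a2):
    return -(a2 - 1) ** 2 + a1


# Hand-derived partial derivatives of the assumed payoffs above.
def dp1_da1(a1, a2):
    return -2 * a2 * a1 + 6


def d2p1_da1(a1, a2):
    return -2 * a2  # negative only when a2 > 0


def dp2_da2(a1, a2):
    return -2 * (a2 - 1)


def d2p2_da2(a1, a2):
    return -2  # always negative, so any stationary point is a maximum


# Step 1: dp2/da2 = 0 gives a2 = 1, whatever a1 is.
a2_star = 1.0

# Step 2: with a2 fixed, dp1/da1 = 0 gives a1 = 3 / a2.
a1_star = 3.0 / a2_star

# Step 3: brute-force check that no unilateral deviation on a grid does better.
grid = [i / 2 for i in range(-20, 21)]
no_better_1 = all(p1(a1_star, a2_star) >= p1(x, a2_star) for x in grid)
no_better_2 = all(p2(a1_star, a2_star) >= p2(a1_star, y) for y in grid)

print((a1_star, a2_star), no_better_1, no_better_2)
```

Note that in this assumed example player 1 no longer has a dominant strategy: her best action is \(3 / a_2\), which changes with \(a_2\). The pair is still a Nash equilibrium, because each action is a best response to the other.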

