Why you should minimax in two-player zero-sum games

by Nisan1 min read17th May 20201 comment


Ω 3

World OptimizationRationality
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Disclaimer: I'm surely reinventing the wheel here.

Last week I wrote that I was dissatisfied with von Neumann and Morgenstern's argument for playing the minimax strategy in a two-player zero-sum game. Here's an argument I like:

We have two players who play strategies . is trying to maximize . Since it's a zero-sum game, is trying to minimize .

  • 's minimax strategy is
  • 's minimax strategy is
  • The minimax theorem gives us

We assume that believes that no matter what does, will be epistemically and instrumentally rational. Epistemic rationality means that 's factual beliefs are accurate — i.e., 's beliefs conditional on playing are accurate if is the strategy that actually plays; but 's beliefs conditional on playing other strategies (his properly counterfactual beliefs) are unconstrained. Instrumental rationality means that if believes an action is better than all other actions, then will take that action.

I claim that given the above assumption, it's instrumentally rational for to play minimax. Here's why:

  • believes that if he plays minimax, . This follows from the definitions of and .
  • If is epistemically rational, he knows the value of . If , then would be instrumentally irrational (because he knows he would be better off playing ). So if has both kinds of rationality, .
  • Since believes is epistemically and instrumentally rational, believes that , no matter what strategy plays.
  • Also, believes that if she plays minimax, . This follows from the definitions of and .
  • Thus believes that minimax is at least as good as any other strategy.

Note that can still play something other than minimax, as long as she's sure that .

This argument does not justify playing minimax in non-zero-sum games or games with more than 2 players. In those games it's not generally true that one player is trying to minimize another player's reward.

If you squint, this is basically the argument that von Neumann and Morgenstern use, with all the steps and assumptions written out. And if you formalize this argument, I guess you'd get Kripke semantics for games.


Ω 3