I’d like an easy way to distinguish between payoff-maximizing players who would, and those who would not, play a strategy in an extensive-form game that deviates from the game’s subgame-perfect (Bayesian) equilibrium strategy profile (when their strategy is known to their opponent, and their opponent is also payoff-maximizing).

Example

An example of what I’m interested in can be seen in an ultimatum game where a proposer presents a responder with an offer of the form (a, 1 − a), where a and 1 − a for a ∈ [0, 1] are the respective payoffs of the proposer and the responder.

For now, let’s call a strategy subgame-optimal if, for every subgame of the game, the strategy’s restriction to that subgame is still optimal within that subgame. In other words, at each decision node of the game, a payoff-maximizing player with a subgame-optimal strategy chooses the action which maximizes their expected payoff (as calculated at that decision node, rather than as calculated according to their prior). A payoff-maximizing player who can commit to a subgame-suboptimal strategy will instead play the action which maximizes their expected payoff as calculated at their initial decision node, without needing to play a strategy that holds up to backward induction.
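One way to write this down (my notation, treating the opponent’s strategy as fixed and known, as in the setup below):

$$\sigma_i \text{ is subgame-optimal} \;\iff\; \sigma_i\big|_{G'} \in \arg\max_{\sigma_i'} \, \mathbb{E}\big[\, u_i\big(\sigma_i',\, \sigma_{-i}\big|_{G'}\big) \,\big] \quad \text{for every subgame } G' \text{ of the game } G.$$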

Say that the responder’s strategy is known to the proposer ahead of time, and the proposer is restricted to only subgame-optimal strategies. What strategy should the responder use? A (subgame-suboptimal) strategy of rejecting all proposals with a > ε, for arbitrarily small ε > 0, would force the proposer to make an offer arbitrarily far in the responder’s favor. But this is impossible for a responder who can only play subgame-optimal strategies. Updateless agents don’t have to worry about this problem, but updateful agents without commitment devices or values other than payoff maximization do.
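To make the comparison concrete, here is a minimal Python sketch (mine, not from any standard treatment) of a discretized version of this game; the 0.01 offer grid and the particular value of ε are arbitrary choices for illustration:

```python
# A discretized ultimatum game over a pie of size 1. The proposer offers
# (a, 1 - a); the responder either accepts (payoffs are paid out) or
# rejects (both players get 0).

def proposer_best_offer(responder_accepts, steps=100):
    """Return the proposer's payoff-maximizing share `a`, given a known
    responder acceptance rule; ties go to the smallest such `a`."""
    best_a, best_payoff = 0.0, float("-inf")
    for i in range(steps + 1):
        a = i / steps
        payoff = a if responder_accepts(a) else 0.0
        if payoff > best_payoff:
            best_a, best_payoff = a, payoff
    return best_a, best_payoff

def subgame_optimal_responder(a):
    # At the responder's decision node, accepting any offer with 1 - a > 0
    # strictly beats the payoff of 0 from rejecting.
    return (1 - a) > 0

EPSILON = 0.01

def committed_responder(a):
    # Committed (subgame-suboptimally) to rejecting every offer that
    # leaves the proposer more than EPSILON.
    return a <= EPSILON

print(proposer_best_offer(subgame_optimal_responder))  # -> (0.99, 0.99)
print(proposer_best_offer(committed_responder))        # -> (0.01, 0.01)
```

Against the subgame-optimal acceptance rule the proposer keeps essentially the whole pie; against the committed rule, the best the proposer can do is keep ε.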

Possible terms for this distinction

  • Ex interim or ex post rationality vs. ex ante rationality
    I like the clarity of focusing on the perspective from which the expected payoff of an action is calculated. A big advantage of these terms is that they are already in use to some extent. For example, Posner (1997) classifies retaliation in a non-repeated interaction as ex ante rational but ex post irrational. On the other hand, these terms might sound overly jargon-y or be confused with different uses of “ex ante” and “ex interim/post”. I prefer “ex interim rational” to “ex post rational” because in games of imperfect information, “ex post” sounds to me like the strategy the player should have played with perfect information. But in games of perfect information, people would be more likely to understand what ex post means than ex interim.
     
  • Updateful rationality vs. updateless rationality
    I’ve talked about situations like this before by reference to what a UDT agent would do / what a CDT agent would do, which is probably most clear for people who have thought about updateless decision theory, but I think it might be useful to have a term that’s not tied to updateless decision theory for this. Rejecting suboptimal offers in the ultimatum game described above is not only the domain of updateless agents—for example, a CDT agent with a commitment device would do so as well.
     
  • Episodic rationality vs. serial rationality
    I like this one, but it may be confused with rational play in one-shot vs. iterated games. “Serial rationality” seems to have been used a little in discussing Sartre, but this doesn’t seem to be a serious barrier to use.
     
  • Subgame-perfect rationality vs. subgame-imperfect rationality
    I think this gets very close to capturing what I want here while using some kinda standard terminology, but it seems very jargon-y and is easily mixed up by people less familiar with game theory. Also, subgame-perfection seems to be used pretty much only to discuss a profile of both players’ strategies, rather than the strategy of a particular agent, so maybe its extension to cover something else here is confusing.
     
  • Subgame-optimal rationality vs. subgame-suboptimal rationality
    Similar to the above, but the meaning of “optimal” might be a little more obvious than that of “perfect”, and it can be more concise. For example, it is briefer to write that an agent’s strategy is “subgame-suboptimal” than that their strategy is “subgame-imperfect rational”. Having “suboptimal” in the name might also create the impression that I think subgame-suboptimal rational play is bad, which is not the case.
     
  • Sequential rationality vs. non-sequential rationality
    “Sequential rationality” is an extension of subgame-perfection to games of incomplete information, prescribing that actions at each decision node be optimal in expectation, as calculated using the information set associated with that node. I like this a little better than "subgame-(im)perfect rationality" or "subgame-optimal rationality".
     
  • Action optimization vs. policy optimization
    This might be a little confusing, since in some contexts an agent that can only play subgame-optimal policies is still held to be selecting an “optimal” policy/strategy, just subject to the constraint of subgame-optimality. Maybe "action-based" vs. "policy-based" rationality would also work?
     
  • Myopic rationality vs. non-myopic rationality
    This is out, I think, since it would be confused with other uses of myopia, but an agent who is restricted to subgame-optimal play is myopic in the sense of having only partial agency.
     
  • Backward-inductive rationality vs. non-backward-inductive rationality
    I think this shares most of its pros and cons with “subgame-(im)perfect rationality”.

Answers

Dagon

Jun 30, 2022

A few VERY CONCRETE examples would help a lot. I can't tell if you're just talking about holistic optimization, where there are carryovers or correlations between subgames, so deviation from optimal play in early games gives you better future subgames, or whether you're talking about more decision-theory variance, based on the opponent's predictive power and "who goes first, logically" questions.

In the situations commonly discussed around here, "proposer" and "responder" are misleading terms, because the whole problem is that the sequence of events doesn't match the sequence of decisions - once you introduce precommitment and prediction, you've messed with causality in a way that mixes up the terms.  It's not clear that "optimal" even belongs in the term for this, so perhaps "strategically sub-optimal" or the like might work.

It's still the case that the long run is a strict sum of short runs, and "holistic" is the term I'd use for decision theories that include all effects of decisions, not just the ones in an identified subgame.