One of the cottage industries of our community is the construction of new decision theories. I think our community is also in agreement about how to compare decision theories: the rational way to make decisions is the one that best enables us to accomplish our goals.

In this post I'll describe a benchmark that frames our choice of decision theory as a strategic one, where outcomes depend on the decision theories chosen by many agents that are all trying to accomplish their own goals. Our decision theory should lead to good results if adopted universally, but also be unexploitable by agents that choose different decision theories.

Meta-Games

There is a particular type of game that I think will serve well as a benchmark for decision theories. I'll call them Meta-Games, which have Meta-Players. Each Meta-Player chooses a decision theory $D$, and then a random game $G$ is chosen to be played. Each Player $P$ in $G$ is then randomly assigned a decision theory chosen by a Meta-Player. The outcome of the game is then passed back to the Meta-Players as feedback for how well they did.

The number of Meta-Players doesn't need to match the number of Players in any particular game. Unless otherwise specified, we can imagine that each Meta-Player writes their decision theory on a piece of paper, and that each Player is assigned a decision theory by drawing a piece of paper at random, with replacement after each draw. (Players' decision theories are thus IID, drawn according to the distribution of decision theories chosen by the Meta-Players.)
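As a concrete illustration, here's a minimal sketch of one round of a Meta-Game in Python. Representing decision theories as functions, the Prisoner's Dilemma payoffs, and all the names (`play_round`, `always_cooperate`, etc.) are my own illustrative assumptions, not anything specified above.

```python
import random

# Prisoner's Dilemma payoffs: (row player's payoff, column player's payoff).
PD = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def always_cooperate(game):
    return "C"

def always_defect(game):
    return "D"

def play_round(meta_choices, game=PD):
    """One round of a Meta-Game: draw each Player's decision theory IID
    from the Meta-Players' choices (with replacement), play the game,
    and return each theory's feedback as the average payoff earned by
    copies of that theory."""
    theories = [random.choice(meta_choices) for _ in range(2)]
    actions = tuple(t(game) for t in theories)
    payoffs = game[actions]
    feedback = {t.__name__: [] for t in meta_choices}
    for theory, payoff in zip(theories, payoffs):
        feedback[theory.__name__].append(payoff)
    return {name: sum(ps) / len(ps) for name, ps in feedback.items() if ps}

# With a single Meta-Player, this asks how well a theory does if universalized:
print(play_round([always_cooperate]))  # {'always_cooperate': 3.0}
print(play_round([always_defect]))     # {'always_defect': 1.0}
```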

When there is only one Meta-Player, we are studying the question "what decision theory leads to the best results if universalized?" That is, what happens if every Player in every game acts according to the same decision theory? "Leading to good results if universalized" seems like an important quality for any decision theory we'd actually want to advocate that people widely use.

One big reason to study this sort of Meta-Game is that we can bring in all of the conceptual machinery from game theory and start applying it to decision theories. Does one decision theory dominate another on a certain class of games? Can a new decision theory be better adapted to an environment and become more and more represented in the population?
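To make that last question concrete, here's a hedged sketch of one standard tool we might import: discrete replicator dynamics over a population of decision theories. The payoff matrix and update rule are textbook evolutionary game theory, chosen for illustration rather than taken from anything above.

```python
# Expected payoff of theory i against theory j in the Prisoner's Dilemma,
# indexed as [cooperator, defector].
payoff = [[3, 0],
          [5, 1]]

def replicator_step(shares, payoff):
    """One discrete replicator update: each theory's population share
    grows in proportion to its expected payoff against the current mix."""
    fitness = [sum(p * s for p, s in zip(row, shares)) for row in payoff]
    mean = sum(f * s for f, s in zip(fitness, shares))
    return [s * f / mean for s, f in zip(shares, fitness)]

shares = [0.99, 0.01]  # mostly unconditional cooperators, a few defectors
for _ in range(20):
    shares = replicator_step(shares, payoff)
print(shares)  # defectors take over: unconditional cooperation is exploitable
```

This is one way to formalize the "unexploitable" desideratum from the introduction: a decision theory that a rare invader can outperform will lose population share to that invader.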

Superrationality

In general, I think we're going to be most interested in symmetric Meta-Games, i.e. those where swapping the roles of two Meta-Players doesn't change the strategic situation that either one faces. One reason for this is to import our existing intuitions about superrationality. I want to frame superrationality as having two parts:

  • Decision-makers facing symmetrical decisions are likely to make symmetrical choices.
  • Decision-makers should take that correlation into account when making their decisions.

In the context of a Meta-Game, superrationality cashes out as "whatever decision theory I choose, the other Meta-Players will probably choose too. So I should adopt a decision theory that would lead to outcomes I like if everyone used it." I think this formulation is equivalent to Kant's categorical imperative: "Choose as if the way you make choices were to be universalized."

I want to emphasize that the first part of superrationality is an empirical claim about the correlation between choices made by different agents. The strength and decision-relevance of this correlation depends on what is concretely happening in the world to bring it about.

I also want to note that Hofstadter and Kant both appeal to instrumental rationality for their notion of "should" when they say that actors "should" choose as if other actors will choose in a symmetrical way when facing a symmetrical decision. That's what makes it a categorical imperative, one that applies to every agent no matter what their goals are, rather than a hypothetical imperative like "if your goal is to act morally, this is what you should do."

On this view, a lot of morality falls out of simply trying to achieve one's own goals, in a world where other people's decisions are correlated with yours. One shouldn't Defect in the Prisoner's Dilemma for the same reason one shouldn't lie or cheat or steal or break promises: because other people will probably choose similarly when they face the same decision, and it's better for you personally if everyone refrains from those actions.
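To see the arithmetic behind that claim, here's a toy calculation, assuming the other player's action matches yours with some probability q, a stand-in for the empirical correlation from the first part of superrationality. The parameter q and the function name are illustrative assumptions, and the payoffs are the standard Prisoner's Dilemma values used above.

```python
def expected_payoff(my_action, q):
    """Expected PD payoff when the other player copies my action with
    probability q and plays the opposite action otherwise."""
    same = {"C": 3, "D": 1}[my_action]  # payoff when choices match
    diff = {"C": 0, "D": 5}[my_action]  # payoff when choices differ
    return q * same + (1 - q) * diff

for q in (0.0, 0.5, 0.8, 1.0):
    print(q, expected_payoff("C", q), expected_payoff("D", q))

# Cooperating beats Defecting once the correlation is strong enough:
# 3q > 1q + 5(1 - q), i.e. q > 5/7.
```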
