Suppose that you are designing a software system to bargain on your behalf. Either side can walk away from negotiations, and we'll use this as our baseline for each side receiving $0 of the possible gains from working together. How should such a system handle a case where it receives a take-it-or-leave-it offer of $0.01 for you, and $99.99 for the proposer? And how can we generalize a solution to bargaining problems to achieve socially optimal outcomes in a wide range of strategic contexts?

## Classical Analysis

The classical game-theoretic analysis runs as follows. Your system has two options at this point in the game: Accept or Reject. Accepting leads to a payoff of $0.01; Rejecting leads to a payoff of $0.00. A penny is better than nothing, so Accepting leaves you better off than Rejecting. A classically-rational agent therefore Accepts an offer of $0.01 for itself, and $99.99 for the proposer.
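The classical prescription can be written as a one-line decision rule. A minimal sketch (the function name and dollar amounts are illustrative):

```python
def classical_respond(my_share: float) -> str:
    """A classically-rational responder compares the only two payoffs
    available in this subgame and takes the larger one."""
    # Accept pays my_share; Reject pays $0.00.
    return "Accept" if my_share > 0.0 else "Reject"

print(classical_respond(0.01))  # -> Accept: a penny beats nothing
```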

## What Went Wrong

Quick terminology review: A system's policy defines how it behaves in any situation. A subgame is the part of a game that remains after some moves have already been made. A non-credible threat is a policy which calls for an agent to pay costs in order to impose costs on another agent. (Pay costs in the sense of "take an action in a subgame which leads to less payoff for that agent, than they could get in that subgame by unilaterally doing something else.") A Nash equilibrium is a state of affairs where no agent can do better for themselves by unilaterally changing their policy. In other words, a Nash equilibrium is a mutual best-response.

In the classical analysis, making a non-credible threat can be part of a Nash equilibrium. Actually carrying out a non-credible threat can't be part of any Nash equilibrium. This is because making a non-credible threat is free, while actually carrying out a non-credible threat is costly. An agent which finds themselves in a position to carry out their non-credible threat has a local incentive to not follow through. And classically-rational agents always follow their local incentives.
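To make the asymmetry concrete, here is a toy comparison using the opening example's numbers (illustrative only):

```python
# In the subgame after an unfair offer of $0.01 has arrived,
# the responder who threatened to reject faces these payoffs:
payoff_if_follow_through = 0.00  # carry out the threat: Reject
payoff_if_back_down = 0.01       # abandon the threat: Accept

# Announcing the threat beforehand cost nothing, but following
# through pays strictly less than the local alternative, so a
# classically-rational agent won't do it.
assert payoff_if_back_down > payoff_if_follow_through
```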

I'm using local incentives here to refer to "the payoffs of the subgame in which an agent currently finds themselves." An agent may be able to do better for themselves globally by acting according to a policy which involves acting against their incentives locally. Classically-rational agents employ Causal Decision Theory, which ignores the counterfactual branches of the games they play. This deficit was one of the problems that motivated Wei Dai to develop Updateless Decision Theory.

## Doing Better

Eliezer Yudkowsky's dath ilan has reportedly solved game theory. dath ilan-rational agents employ the Algorithm, a perfect gem of a decision theory, spotlighted by many common-sense criteria which this margin is too small to contain. When dath ilani attempt to split the gains from trade, the Algorithm prescribes that they each independently evaluate what would constitute a fair split. If time permits, they talk about what they think is fair, and attempt to find common ground at the level of "what constitutes a fair split of the benefits from our cooperation?"

> ... such that the problem of arriving at negotiated prices is locally incentivized to become the problem of finding a symmetrical Schelling point.

At the end of negotiation, there is a final offer, and a final decision about whether to accept that offer. The Algorithm prescribes accepting any offer which gives you at least what you think is fair, and probabilistically rejecting unfair offers. The probability of rejection should be high enough to eliminate the offerer's expected gains from offering an unfair split. But not too much higher than that. Reasonable people might disagree about what is fair, but each party should shape their policy so that neither has an incentive to exaggerate what they think is fair.
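One way to implement that prescription in code. This is a sketch of my reading of the rule, not a canonical Algorithm; the $50/$50 fair split and dollar amounts are illustrative:

```python
def acceptance_probability(offer_to_me: float,
                           fair_to_me: float,
                           total: float) -> float:
    """Accept fair-or-better offers outright; accept unfair offers
    with probability chosen so the proposer's expected take from an
    unfair split is no better than offering the fair split."""
    if offer_to_me >= fair_to_me:
        return 1.0
    proposer_fair_take = total - fair_to_me
    proposer_unfair_take = total - offer_to_me
    # p * proposer_unfair_take == proposer_fair_take:
    # just enough rejection to erase the expected gains, no more.
    return proposer_fair_take / proposer_unfair_take

# Take-it-or-leave-it offer of $0.01 out of $100, fair split $50/$50:
p = acceptance_probability(0.01, 50.00, 100.00)
# The proposer's expected take is then p * 99.99, which is about
# $50.00: exactly what an honestly fair offer would have earned them.
```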

I want to highlight that this Algorithm calls for the evaluator of a take-it-or-leave-it offer to employ a non-credible threat, a policy of sometimes going against their local incentives and rejecting an unfair offer, in order to shape the local incentives of the party making the offer. The Algorithm also calls for negotiators to collaborate on finding a fair solution, instead of each trying to get as much for themselves as possible. It's valid, under the Algorithm, to be persuaded by another negotiator about what constitutes a fair split. But when actually making and evaluating offers, both parties are called to employ a policy which, if best-responded to, leads to a Nash equilibrium in which their own notion of a fair split is implemented.

> This, indeed, is what makes the numbers the parties are thinking about be about the subject matter of 'fairness', that they're about a division of gains from trade intended to be symmetrical, as a target of surrounding structures of counterfactual actions that stabilize the 'fair' way of looking at things without blowing up completely in the presence of small divergences from it...
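A quick numeric check that such a policy shapes the proposer's incentives as described (a self-contained sketch; the $50/$50 fair split and the acceptance rule are illustrative assumptions):

```python
def expected_proposer_take(offer_to_me: float,
                           fair_to_me: float = 50.0,
                           total: float = 100.0) -> float:
    """Proposer's expected take against a responder who accepts
    fair-or-better offers outright, and accepts unfair offers with
    just enough probability to erase the proposer's expected gains
    from lowballing."""
    if offer_to_me >= fair_to_me:
        p_accept = 1.0
    else:
        p_accept = (total - fair_to_me) / (total - offer_to_me)
    return p_accept * (total - offer_to_me)

# Sweep offers from $0.01 to $99.99: lowballing never beats the
# $50.00 the proposer gets by offering the responder's fair split,
# and offering more than $50.00 strictly hurts the proposer.
best = max(expected_proposer_take(cents / 100) for cents in range(1, 10000))
```

Best-responding to this policy thus includes offering the responder exactly what the responder considers fair, which is the equilibrium the text describes.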

Up next: when should our software systems make non-credible threats? When should they give in to such threats?
