In the last post, we talked about strategic time and the strategic time loops studied in open-source game theory. In that context, agents have logical line-of-sight to each other and the situation they're both facing, which creates a two-way information flow at the time each is making their decision. In this post I'll describe how agents in one context can use this logical line-of-sight to condition their behavior on how they behave in other contexts. This in turn makes those contexts strategically sequential or loopy, in a way that a purely causal decision theory doesn't pick up on.

Sequential Games and Leverage

As an intuition pump, consider the following ordinary game: Alice and Bob are going to play a Prisoners' Dilemma, and then an Ultimatum game. My favorite framing of the Prisoners' Dilemma is by Nicky Case: each player stands in front of a machine which accepts a certain amount of money, e.g. $100.[1] Both players choose simultaneously whether to put some of their own money into the machine. If Alice places $100 into the machine in front of her, $200 comes out of Bob's machine, and vice versa. If a player withholds their money, nothing comes out of the other player's machine. We call these strategies Cooperate and Defect respectively.
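For concreteness, here's a minimal sketch of the machine's payoffs (function and variable names are hypothetical, not part of the original setup):

```python
def prisoners_dilemma_payoffs(alice_puts_in: bool, bob_puts_in: bool,
                              stake: int = 100) -> tuple[int, int]:
    """Payoffs for the money-machine Prisoners' Dilemma: putting in `stake`
    costs you that much and pays out 2 * stake from the other player's machine."""
    alice = -stake * alice_puts_in + 2 * stake * bob_puts_in
    bob = -stake * bob_puts_in + 2 * stake * alice_puts_in
    return alice, bob

# Mutual Cooperation beats mutual Defection, but Defecting against a
# Cooperator pays even better -- the standard dilemma.
assert prisoners_dilemma_payoffs(True, True) == (100, 100)
assert prisoners_dilemma_payoffs(False, False) == (0, 0)
assert prisoners_dilemma_payoffs(True, False) == (-100, 200)
```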

Since neither player can cause money to come out of their own machine, Causal Decision Theory (CDT) identifies Defect as a dominant strategy for both players. Dissatisfaction with this answer has motivated many to dig into the foundations of decision theory, and coming up with different conditions that enable Cooperation in the Prisoners' Dilemma has become a cottage industry for the field. I myself keep calling it the Prisoners' Dilemma (rather than the Prisoner's Dilemma) because I want to frame it as a dilemma they're facing together, where they can collaboratively implement mechanisms that incentivize mutual Cooperation. The mechanism I want to describe today is leverage: having something the other player wants, and giving it to them if and only if they do what you want.

Suppose that the subsequent Ultimatum game is about how to split $1,000. After the Prisoners' Dilemma, a fair coin is flipped to determine Alice and Bob's roles in the Ultimatum game. The evaluator can employ probabilistic rejection to shape the incentives of the proposer, so that the proposer has the unique best-response of offering a fair split. (According to the evaluator's notion of fairness.) And both players might have common knowledge that "a fair split" depends on what both players did in the Prisoners' Dilemma. 
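The exact rejection rule isn't spelled out here, but one standard construction accepts stingy offers with probability proportional to how close they come to the fair share; a minimal sketch, assuming the evaluator considers a 50/50 split of the $1,000 fair:

```python
def acceptance_probability(offer_to_evaluator: float,
                           fair_share: float = 500.0) -> float:
    """Accept offers at or above the fair share for sure; accept stingier
    offers with probability proportional to how close they come to fair."""
    return min(1.0, offer_to_evaluator / fair_share)

def proposer_expected_payoff(offer_to_evaluator: float,
                             pot: float = 1000.0,
                             fair_share: float = 500.0) -> float:
    keep = pot - offer_to_evaluator
    return keep * acceptance_probability(offer_to_evaluator, fair_share)

# The proposer's expected payoff is maximized exactly at the fair split.
offers = [o * 10 for o in range(101)]          # $0, $10, ..., $1000
assert max(offers, key=proposer_expected_payoff) == 500
```

Under this acceptance curve, shading the offer below the fair point costs the proposer more in rejection risk than it gains in pot share, so offering exactly the fair split is the unique best-response.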

If Alice is the evaluator, and she Cooperated in the first round but Bob Defected, then she is $200 worse-off than if Bob had Cooperated, and she can demand that Bob compensate her for this loss. Similarly, if Alice is the proposer, she might offer Bob $500 if he Cooperated but $300 if he Defected. Since Bob only gained $100 compared to Cooperating, his best-response is to Cooperate if he believes Alice will follow this policy. And Bob can employ the same policy, stabilizing the socially optimal payoff of ($600, $600) as a Nash equilibrium where neither has an incentive to change their policy.
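A quick check of the arithmetic in the branch where Alice ends up as the proposer (a minimal sketch that assumes Alice Cooperated and that Bob accepts whatever she offers):

```python
def bob_total_payoff(bob_cooperated: bool) -> int:
    """Bob's combined payoff when Alice Cooperates in the Prisoners' Dilemma
    and then, as proposer, offers $500 to a Cooperator but only $300 to a Defector."""
    pd_payoff = 100 if bob_cooperated else 200
    ultimatum_offer = 500 if bob_cooperated else 300
    return pd_payoff + ultimatum_offer

assert bob_total_payoff(True) == 600   # Cooperate: $100 + $500
assert bob_total_payoff(False) == 500  # Defect:    $200 + $300, so Cooperating wins
```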

Crucially, this enforcement mechanism relies on each player having enough leverage in the subsequent game to incentivize Cooperation in the first round. If the Ultimatum game had been for stakes of less than $200, the entire pot would be worth less than what a Defector can obtain for themselves when the other player Cooperates. Knowing that neither can incentivize Cooperation, both players might fall back into mutual Defection. (Or they might be able to use another mechanism to coordinate on Something Else Which Is Not That.)

Bets vs Unexploitability

Even if Alice knows she has enough leverage that she can incentivize Bob to Cooperate, she might be uncertain about whether Bob will implement a policy that's a best-response to hers. In the worst case, Alice Cooperates while Bob Defects, and the Ultimatum game ends in no deal: a -$100 payoff for Alice, when she knows she can guarantee $0 for herself by Defecting. She doesn't have a perfect copy of Bob's mind like the agents in an open-source game do, and if Alice wants to be logically certain that she won't end up with the sucker's payoff (unexploitable), she needs to employ some other mechanism to ensure Cooperation, or else Defect.

Superrationality is based on the postulate that, when facing symmetrical decisions, agents are likely to make symmetrical choices. If Alice has sufficient credence in the superrationality postulate, she can justify Cooperating as a rational bet that Bob will choose symmetrically and Cooperate as well. As a simplified model, she's risking $100 to potentially gain $100. If Alice assigns a probability of at least 1/2 that her bet will pay off, it can make sense to gamble her $100 on trusting Bob and her ability to extract concessions afterwards if needed. (If $100 is a much bigger deal as a loss than a gain for Alice, she needs to be much more confident to justify this risk.)
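That threshold is just the break-even point of the bet; a minimal sketch, assuming money has roughly linear value to Alice:

```python
def break_even_probability(loss: float, gain: float) -> float:
    """Smallest win probability p at which p * gain - (1 - p) * loss >= 0."""
    return loss / (loss + gain)

# Risking $100 to gain $100: Alice needs at least 1/2 credence that Bob
# will also Cooperate.
assert break_even_probability(loss=100, gain=100) == 0.5
```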

Similarly, if Alice and Bob were playing an iterated Prisoners' Dilemma for 100 rounds, they know that by Defecting in every round they can each guarantee themselves a payoff of $0. But if they can manage to Cooperate in every round, that's 100 occasions where they each become $100 richer, for a total of $10,000 each. In the classic analysis, rational agents know they'll Defect in the last round, since there's no leverage left for either to incentivize Cooperation. Knowing this, rational agents Defect in the second-to-last round, and every round all the way back to the beginning, leaving them with a payoff of $0 each.

In a desperate effort to do better than classically-rational agents, Bob might consider employing a tit-for-tat strategy: Cooperate in the first round, and then in subsequent rounds do whatever Alice did in the previous round. If Alice uses a similar strategy, they'll both walk away with $10,000. If Bob believes there's at least a 1% chance that this will happen, and he feels like he can afford the risk, he can justify betting his initial $100 that Alice will reciprocate his altruism.[2]
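Here's a minimal simulation of that bet (a sketch: the payoff numbers match the money-machine game above, and all names are hypothetical):

```python
def tit_for_tat(my_history, their_history):
    """Cooperate in the first round, then copy the other player's last move."""
    return True if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return False

def play_iterated_pd(strategy_a, strategy_b, rounds=100, stake=100):
    """Run the money-machine Prisoners' Dilemma for `rounds` rounds."""
    hist_a, hist_b, total_a, total_b = [], [], 0, 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        total_a += -stake * a + 2 * stake * b
        total_b += -stake * b + 2 * stake * a
        hist_a.append(a)
        hist_b.append(b)
    return total_a, total_b

# If Alice reciprocates, both walk away with $10,000.
assert play_iterated_pd(tit_for_tat, tit_for_tat) == (10000, 10000)

# Against relentless Defection, Bob only loses his first-round $100.
assert play_iterated_pd(tit_for_tat, always_defect) == (-100, 200)

# So Bob is risking $100 against a potential $10,000 upside:
assert 100 / (100 + 10000) < 0.01   # break-even probability just under 1%
```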

It seems like this is one common way that trust develops between people, starting out by opening up in ways that leave one person a little vulnerable to another. If the trustee acts in a trustworthy manner, the trustor can become a little more confident that they will do the same if the stakes are a little higher, and so on. It can be rational to bet that another agent is trustworthy, if the prior probability of trustworthiness is high enough. Software systems that can read and understand each other's source code can skip all of this uncertainty and vulnerability, and directly condition their behavior on the legible properties of their counterparts.

Counterfactual Games and Leverage

With our intuitions primed, here's the game that I claim has the same information flow as "Ultimatum Game after Prisoners' Dilemma": Alice and Bob will be writing programs AliceBot and BobBot to play on their behalf. After these programs have been chosen, a fair coin is flipped. If it comes up Heads, AliceBot and BobBot will be playing a closed-source Prisoners' Dilemma. If it comes up Tails, they'll be playing an open-source Ultimatum game, with roles assigned by another fair coin flip as before.

The information flow between these two counterfactual subgames is the same as between the sequential subgames in the original game. On the open-source branch, AliceBot and BobBot have access to each other's source code and can use their logical crystal balls to see how they would have behaved on the closed-source branch. And they can use this information to adjust their demands in the Ultimatum Game they're playing.

Over on the closed-source branch, the delegates don't have a logical line-of-sight to the open-source branch, since they don't have enough information to reason about it in detail. But they know that their open-source counterfactual selves will be able to see what they do here in the closed-source Prisoners' Dilemma and condition their behavior accordingly.

Alice and Bob can design their programs to employ a negotiation strategy analogous to counterfactual mugging: offer more/accept less if they predict that their counterpart would have Cooperated had they been playing a closed-source Prisoners' Dilemma. This approach is subject to the same leverage requirement as the sequential version.

As a concrete scenario, suppose that Alice and Bob have enough wealth that they think of money as having roughly linear value to them. That is, a coin flip between losing $100 and gaining $100 is worth roughly the same to them as $0 for sure. The game is symmetrical, so let's focus on Alice's perspective. She's going to design her program so that if Bob's program is a best-response to hers, the result will be fair in expectation. If the coin comes up Heads, AliceBot will Cooperate. If BobBot also Cooperates, this leads to a joint payoff of ($100, $100). If the coin comes up Tails, AliceBot will use its logical crystal ball to see how the Prisoners' Dilemma would have gone. If BobBot would have Cooperated, AliceBot offers/demands $500. If BobBot would have Defected, AliceBot will offer less/demand more to compensate.

To work out how much more AliceBot needs to receive on the Ultimatum branch, we first need to know how much expected value Alice thinks it's fair for her to receive. If her decision theory and notions of fairness were universalized, she'd receive $100 with 50% probability, and $500 with 50% probability, for an expected value of $300. If instead Bob employs a program that Defects on AliceBot, Alice receives -$100 with 50% probability and ($500 + Demand) with 50% probability. Supposing that BobBot best-responds to AliceBot in the Ultimatum game, Alice can insist that she receive an additional $200 in compensation, so that her expected value remains $300. This is exactly the adjustment called for in the sequential form of the game.[3]
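Putting the policy and the fairness calculation together (a minimal sketch: the logical crystal ball is stubbed out as a boolean prediction, and all names are hypothetical):

```python
def alicebot_ultimatum_demand(bobbot_would_cooperate: bool,
                              fair_share: float = 500.0,
                              compensation: float = 200.0) -> float:
    """AliceBot's demand on the open-source (Tails) branch."""
    return fair_share if bobbot_would_cooperate else fair_share + compensation

def alice_expected_value(bobbot_cooperates: bool, p_heads: float = 0.5) -> float:
    """Alice's expected value, assuming AliceBot Cooperates on the Heads branch
    and BobBot best-responds in the Ultimatum game."""
    pd_payoff = 100 if bobbot_cooperates else -100
    ultimatum_payoff = alicebot_ultimatum_demand(bobbot_cooperates)
    return p_heads * pd_payoff + (1 - p_heads) * ultimatum_payoff

# The extra $200 keeps Alice's expected value at the fair $300 either way.
assert alice_expected_value(bobbot_cooperates=True) == 300.0
assert alice_expected_value(bobbot_cooperates=False) == 300.0
```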

The takeaway here is that logical line-of-sight introduces a strategic information flow between these two counterfactuals, in the sense that agents in one counterfactual can condition their behavior on the other, as if they were sequential games. If both counterfactual games are open-source, the logical line-of-sight/strategic information flow goes both ways, and agents in each counterfactual subgame can attempt to condition their behavior on their actions in the other counterfactual.[4]

Open-source games happen strategically-after other games involving the same programs, which can introduce an information loop when two games are each strategically-after the other. We can think of this as adding acausal information arrows leading into open-source games, making them sequential or loopy in a way that causal arrows alone don't show.

Collateral as a Type of Leverage

One more note about leverage: if Alice wants to be unexploitable, in the sense of never doing worse than what she can unilaterally guarantee for herself, generic leverage isn't enough. But collateral is a particular type of leverage that enables unexploitability: something that Alice and Bob both value, and that Alice can unilaterally take for herself if Bob doesn't perform as expected. If Alice has collateral she values at $100, she can safely put her $100 into the machine without being vulnerable if Bob doesn't reciprocate.

Typically, collateral is associated with a "locking mechanism", which allows Alice to collect the collateral if and only if Bob doesn't perform as expected. This protects Bob from the possibility that he might do his part, only for Alice to take the collateral anyway. Often something like a legal system, escrow service, or smart contract will serve this role.
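As a toy illustration of such a locking mechanism (a sketch only; a real escrow service or smart contract would add authentication, deadlines, and dispute resolution):

```python
class Escrow:
    """Holds collateral, releasable to Alice only if Bob fails to perform."""

    def __init__(self, collateral: int):
        self.collateral = collateral
        self.bob_performed = False

    def record_performance(self) -> None:
        self.bob_performed = True

    def settle(self) -> tuple[int, int]:
        """Return (amount_to_alice, amount_to_bob)."""
        if self.bob_performed:
            return 0, self.collateral
        return self.collateral, 0

# Bob performs: he gets the collateral back, and Alice can't grab it anyway.
deal = Escrow(collateral=100)
deal.record_performance()
assert deal.settle() == (0, 100)

# Bob doesn't perform: Alice collects $100, covering her loss in the machine game.
deal = Escrow(collateral=100)
assert deal.settle() == (100, 0)
```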

Another way that Alice might have collateral with respect to Bob is if she generally gives Bob a fair share of resources, even in contexts where she has the power to unilaterally take more for herself. If we think of Bob Defecting while Alice Cooperates as creating a debt between them, those resources can act as collateral securing that debt. Alice can use all of the causal and acausal information channels available to her to determine the balance of this debt, and adjust her local behavior to collect on it.

Alice might also attempt to enforce fair distributions of resources on behalf of third parties. If Bob has defaulted on a debt to Carol, and Alice has line-of-sight to that situation, she might use her bargaining power to insist that Bob repay Carol if any negotiated agreement is to be made. Alice might do this because of a separately negotiated agreement with Carol, or just because Alice cares about resources being split fairly and thinks it's the right thing to do with her power.

  1. ^

    Feel free to rescale "$100" to "a day's wages", or whatever gives the scenario enough stakes to be interesting but not so much that it's stressful to think about.

  2. ^

    This 1% is based on an assumption that Bob's utility is roughly linear in money, at least for the amounts involved. If Bob's utility is logarithmic in his total wealth, his marginal utility depends on how much wealth he already has.

    To sketch out an example, if Bob has $1,000 before the game, he might see the bet as risking walking away with only $900, but with a potential upside of walking away with $11,000. If Bob's utility function is u(wealth) = log₁₀(wealth), then he might see unconditional Defection as a guaranteed 3 units of utility (aka utilons). Whereas trusting Alice risks getting only about 2.95 utilons, but with a potential upside of earning about 4.04 utilons. Bob needs to assign at least a 4.2% chance of Alice being trustworthy for this bet to have positive expected value.

    If logarithmic-Bob's initial wealth is closer to $100, he needs to be more and more confident that this bet will pay off. Conversely, as his initial wealth increases, his marginal utility becomes more and more linear and his required confidence approaches the 1% figure above (the sketch at the end of this footnote checks both numbers).

    I tried not to use words like "betray" in this post, because I think a person can reasonably feel like they can't afford to risk $100 trusting a stranger. I chose $100 because it's a nice round number that's on the order of what a median American earns in a day: a big enough amount that it's worth thinking for a couple minutes about whether to risk it, but not much more devastating than getting sick and having to call off work for a day. For someone making $1 a day, risking $100 is a much bigger deal, and might simply not be justifiable even if given $100 to work with at the beginning of the game.
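    For anyone who wants to check the numbers in this footnote, here's a small sketch assuming log-base-10 utility of total wealth (the function name is hypothetical):

```python
import math

def required_confidence(wealth: float, risk: float, upside: float) -> float:
    """Smallest probability of the good outcome at which the bet beats
    standing pat, assuming utility u(w) = log10(w)."""
    u_stay = math.log10(wealth)
    u_lose = math.log10(wealth - risk)
    u_win = math.log10(wealth + upside)
    return (u_stay - u_lose) / (u_win - u_lose)

assert round(required_confidence(1_000, 100, 10_000), 3) == 0.042       # ~4.2%
assert round(required_confidence(1_000_000, 100, 10_000), 3) == 0.010   # approaching 1%
```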

  3. ^

    Admittedly, this is a coincidence that happens when a fair coin is used to determine which game is played. This probability matters for calculating the fair point, and this in turn affects the compensation needed. As the probability p of playing the closed-source Prisoners' Dilemma goes up, the compensation needed also goes up. It looks like Compensation(p) = 200p/(1 − p). If this probability reaches 5/6, the compensation required reaches $1,000, which is the limit of what Alice can demand in the Ultimatum game. This is the critical threshold where she stops having enough leverage to incentivize Cooperation. As you would expect, as p approaches 0 so does the compensation Alice needs to receive in order for her expected value to be fair.
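    A small check of that formula (a sketch, where p is the probability of the closed-source branch):

```python
def compensation(p: float) -> float:
    """Extra amount AliceBot demands from a Defector so that Alice's expected
    value stays at the fair point: 200 * p / (1 - p)."""
    return 200 * p / (1 - p)

assert compensation(0.5) == 200.0                  # the fair-coin case in the post
assert abs(compensation(5 / 6) - 1000.0) < 1e-6    # hits the $1,000 pot ceiling
assert compensation(0.0) == 0.0                    # compensation vanishes as p -> 0
```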

  4. ^

    The literal information doesn't flow along an acausal channel between counterfactuals; it comes along a causal channel like "being given the other program's source code". But strategic information has to do with a decision-maker's ability to condition their decision on something, and that can be modelled as each player being equipped with a logical crystal ball that can "view" counterfactual situations and/or the decision-making processes within them, which acts like an acausal channel for strategic information.
