Counterfactual Mugging Poker Game



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Consider the following game:

Player A receives a card at random that is either High or Low. He may reveal his card if he wishes.

Player B then chooses a probability $p$ that Player A has a high card.

Player A always loses $p^2$ dollars. Player B loses $p^2$ dollars if the card is low and $(1-p)^2$ dollars if the card is high.

Note that Player B has been given a proper scoring rule, and so is incentivized to give his true probability (unless he makes some deal with player A).
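To see why a rule of this kind rewards honest reporting, here is a minimal sketch. It assumes the quadratic payoffs implied by the numbers in this post (Player B loses p^2 on a low card and (1-p)^2 on a high card); the function names are illustrative only. It searches a grid of possible reports and checks which one minimizes B's expected loss for a given belief.

```python
def b_loss(report: float, card_is_high: bool) -> float:
    """Player B's loss under the assumed quadratic (Brier-style) rule."""
    return (1.0 - report) ** 2 if card_is_high else report ** 2


def b_expected_loss(report: float, belief: float) -> float:
    """Expected loss for B when he reports `report` but believes High has probability `belief`."""
    return belief * b_loss(report, True) + (1.0 - belief) * b_loss(report, False)


def best_report(belief: float, steps: int = 1000) -> float:
    """Grid search over reports in [0, 1] for the one with the lowest expected loss."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda r: b_expected_loss(r, belief))


if __name__ == "__main__":
    for belief in (0.1, 0.3, 0.5, 0.9):
        print(belief, best_report(belief))
```

For each belief tried, the best report on the grid coincides with the belief itself, which is the sense in which the rule is proper.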

You are playing this game as player A. You only play one time. You are looking at a low card. Player B is not trying to make a deal with you, and will report his true probability. Player B is very good at reasoning about you, but you are in a separate room, so Player B cannot read any tells unless you show the card. Do you show your card?

Since your card is low, if you show it to player B, you will lose nothing, which is the best possible outcome. However, if player B reasons that you would show your card if it were low, then in the counterfactual world in which you got a high card, player B would know you had a high card because you refused to show it. Thus, you would lose a full dollar in those counterfactual worlds.

If you instead choose not to reveal your card, whichever card you hold, player B would assign probability 1/2 and you would lose a quarter.
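The comparison between policies can be made concrete with a small sketch. It rests on assumptions that go slightly beyond the text: the card is High or Low with equal probability, Player A loses p^2 dollars (consistent with losing nothing at p = 0, a quarter at p = 1/2, and a dollar at p = 1), and Player B knows A's policy exactly and updates by Bayes' rule. The names `b_probability_high` and `expected_loss` are hypothetical helpers, not part of the original game description.

```python
PRIOR_HIGH = 0.5  # assumption: the card is High or Low with equal probability


def b_probability_high(policy: dict, card: str) -> float:
    """B's probability of High after seeing what A does with `card`.

    `policy` maps "high"/"low" to True (reveal) or False (hide).
    B is assumed to know the policy perfectly.
    """
    if policy[card]:
        # The card is shown, so B knows it for certain.
        return 1.0 if card == "high" else 0.0
    # The card is hidden: B conditions on the set of cards the policy would hide.
    hidden_high = PRIOR_HIGH * (not policy["high"])
    hidden_low = (1.0 - PRIOR_HIGH) * (not policy["low"])
    return hidden_high / (hidden_high + hidden_low)


def expected_loss(policy: dict) -> float:
    """A's expected loss before the card is dealt, with A losing p**2 dollars."""
    loss_if_high = b_probability_high(policy, "high") ** 2
    loss_if_low = b_probability_high(policy, "low") ** 2
    return PRIOR_HIGH * loss_if_high + (1.0 - PRIOR_HIGH) * loss_if_low


reveal_if_low = {"high": False, "low": True}
never_reveal = {"high": False, "low": False}

if __name__ == "__main__":
    print("reveal only if low:", expected_loss(reveal_if_low))
    print("never reveal:      ", expected_loss(never_reveal))
```

Under these assumptions, revealing only when low costs half a dollar in expectation (nothing with a low card, a full dollar with a high card), while never revealing costs a quarter in both cases. The policy that looks best once you see a low card is worse when evaluated before the deal.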

I like this variant of the counterfactual mugging because it takes the agency out of the predictor. In the standard counterfactual mugging, you might reject the hypothetical and think that the predictor is trying to trick you. Here, there is a sense in which you are creating the counterfactual mugging yourself by trying to be able to keep secrets.

Also, think about this example the next time you are tempted to say that someone would only Glomarize if they had an important secret.

