Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Applying the Counterfactual Prisoner's Dilemma to Logical Uncertainty

2abramdemski

2Chris_Leong

0Dagon

2Chris_Leong

2Chris_Leong

Notice, however, that for Logical Counterfactual Mugging to be well defined, you need to define what Omega is doing when it is making its prediction. In Counterfactuals for Perfect Predictors, I explained that when dealing with perfect predictors, often the counterfactual would be undefined. For example, in Parfit's Hitchhiker a perfect predictor would never give a lift to someone who never pays in town, so it isn't immediately clear that predicting what such a person would do in town involves predicting something coherent.

Another approach is to change the example to remove the objection.

The poker-like game at the end of Decision Theory (I really titled that post simply "decision theory"? rather vague, past-me...) is isomorphic to counterfactual mugging, but removes some distractions, such as "how does Omega take the counterfactual".

Alice receives a High or Low card. Alice can reveal the card to Bob. Bob then states a probability for Alice's card being Low. Bob's incentives just encourage him to report honest beliefs. Alice loses .

When Alice gets a Low card, she can just reveal it to Bob, and get the best possible outcome. But this strategy means Bob will know if she has a high card, giving her the worst possible outcome in that case. In order to successfully bluff, Alice has to sometimes act like she has different cards than she has. And indeed, the optimal strategy for Alice in this case is to never show her cards.
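As a rough check of the claim that never revealing is optimal, here is a minimal sketch of the game. The exact loss is not stated above, so the code assumes a quadratic loss: Alice loses the square of the probability Bob assigns to her card being High. It also assumes a uniform prior over the two cards and a Bob who knows Alice's (deterministic) policy. The names `bob_p_high` and `alice_expected_loss` are made up for this illustration.

```python
from fractions import Fraction

CARDS = ("High", "Low")
PRIOR = Fraction(1, 2)  # assumption: each card is equally likely


def bob_p_high(policy, card):
    """Bob's honest probability that the card is High.

    policy maps each card to True (Alice reveals it) or False (she hides it).
    Bob is assumed to know the policy, so a hidden card still carries information.
    """
    if policy[card]:
        # A revealed card leaves Bob with no uncertainty.
        return Fraction(1) if card == "High" else Fraction(0)
    # Card is hidden: Bob conditions on the set of cards Alice would hide.
    # With a uniform prior this is just the share of High among hidden cards.
    hidden = [c for c in CARDS if not policy[c]]
    return Fraction(hidden.count("High"), len(hidden))


def alice_expected_loss(policy):
    """Expected loss under the assumed rule: loss = (Bob's P(High)) squared."""
    return sum(PRIOR * bob_p_high(policy, c) ** 2 for c in CARDS)


never_reveal = {"High": False, "Low": False}
reveal_low_only = {"High": False, "Low": True}

print(alice_expected_loss(never_reveal))     # 1/4
print(alice_expected_loss(reveal_low_only))  # 1/2
```

Under these assumptions, revealing Low cards wins in the Low branch but makes silence a perfect signal of High, so the policy as a whole does worse than never revealing.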

This example will get fewer objections from people, because it is grounded in a very realistic game. Playing poker well requires this kind of reasoning. The powerful predictor is replaced with another player. We can still technically ask "how does the other player deal with undefined counterfactuals?", but we can skip over that by just reasoning about strategies in the usual game-theoretic way -- if Alice's strategy were to reveal low cards, then Bob could always call high cards.

We can then insert logical uncertainty by stipulating that Alice gets her card pseudorandomly, but neither Alice nor Bob can predict the random number generator.

Not sure yet whether you can pull a similar trick with Counterfactual Prisoner's Dilemma.

== nitpicks ==

## Applying the Counterfactual Prisoner's Dilemma to Logical Uncertainty

Why isn't the title "applying logical uncertainty to the counterfactual prisoner's dilemma"? Or "A Logically Uncertain Version of Counterfactual Prisoner's Dilemma"? I don't see how you're applying CPD to LU.

The Counterfactual Prisoner's Dilemma is a symmetric version of the original

Symmetric? The original is already symmetric. But "symmetric" is a concept which applies to multi-player games. Counterfactual PD makes PD into a one-player game. Presumably you meant "a one-player version"?

where regardless of whether the coin comes up heads or tails you are asked to pay $100 and you are then paid $10,000 if Omega predicts that you would have paid if the coin had come up the other way. If you decide updatelessly you will always receive $9900, while if you decide updatefully, then you will receive $0.

This is only true if you use classical CDT, yeah? Whereas EDT can get $9900 in both cases, provided it believes in a sufficient correlation between what it does upon seeing heads vs tails.

So unlike Counterfactual Mugging, pre-committing to pay ensures a better outcome regardless of how the coin flip turns out, suggesting that focusing only on your particular probability branch is mistaken.

I don't get what you meant by the last part of this sentence. Counterfactual Mugging already suggests that focusing only on your particular branch is mistaken. If someone bought that you should pay up in this problem but not in counterfactual mugging, I expect that person to say something like "because in this case that strategy is guaranteed better *even in this branch*" -- hence, they're not necessarily convinced to look at other branches. So I don't think this example necessarily argues for looking at other branches.

Also, why is this posted as a question?

I'm curious, do you find this argument for paying in Logical Counterfactual Mugging persuasive? What about the Counterfactual Prisoner's Dilemma argument for the basic Counterfactual Mugging?

Another approach is to change the example to remove the objection

Interesting point about the poker game version. It's still a one-shot game, so there's no real reason to hide a 0 unless you think they're a pretty powerful predictor, but it is always predicting something coherent.

I don't see how you're applying CPD to LU

The claim is that you should pay in the Logical Counterfactual Prisoner's Dilemma and hence pay in Logical Counterfactual Mugging which is the logically uncertain version of Counterfactual Mugging.

Symmetric? The original is already symmetric. But "symmetric" is a concept which applies to multi-player games. Counterfactual PD makes PD into a one-player game. Presumably you meant "a one-player version"?

Edited now. I meant it's a symmetric version of counterfactual mugging. So not in the game theory sense, but just that there is now no difference between heads and tails.

This is only true if you use classical CDT, yeah? Whereas EDT can get $9900 in both cases, provided it believes in a sufficient correlation between what it does upon seeing heads vs tails.

Point noted. Maybe I should have been more careful about specifying what I was comparing.

Also, why is this posted as a question?

Accident. It's fixed now.

I always enjoy convoluted Omega situations, but I don't understand how these theoretical entities get to the point where their priors are as stated (and especially the meta-priors about how they should frame the decision problem).

Before the start of the game, Omega has some prior distribution of the Agent's beliefs and update mechanisms. And the Agent has some distribution of beliefs about Omega's predictive power over situations where the Agent "feels like" it has a choice. What experiences cause Omega to update sufficiently to even offer the problem (ok, this is easy: quantum brain scan or other Star-Trek technobabble)? But what lets the Agent update to believing that their qualia of free-will is such an illusion in this case? And how do they then NOT meta-update to understand the belief-action-payout matrix well enough to take the most-profitable action?

Moved to my shortform - it's not a direct answer to the post.

[This comment is no longer endorsed by its author]

I would prefer it if you wrote up your own post objecting to the general frame in which I'm operating. I don't feel that this is the right location to have this discussion.

The Counterfactual Prisoner's Dilemma is a symmetric version of Counterfactual Mugging where, regardless of whether the coin comes up heads or tails, you are asked to pay $100 and you are then paid $10,000 if Omega predicts that you would have paid if the coin had come up the other way. If you decide updatelessly you will always receive $9900, while if you decide updatefully, then you will receive $0. So unlike Counterfactual Mugging, pre-committing to pay ensures a better outcome regardless of how the coin flip turns out, suggesting that focusing only on your particular probability branch is mistaken.
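The payoff structure can be sketched in a few lines. This is only an illustration under the assumption that Omega predicts perfectly, so its prediction is modeled by looking up what the policy does in the other branch. The names `payoff`, `always_pay` and `never_pay` are invented for the example.

```python
COST = 100       # what you are asked to pay in either branch
PRIZE = 10_000   # what Omega pays if it predicts you pay in the other branch


def other(coin):
    """Return the branch that did not happen."""
    return "tails" if coin == "heads" else "heads"


def payoff(policy, coin):
    """Net payoff in the branch where `coin` came up.

    policy maps each coin outcome to True (pay) or False (refuse).
    Assumption: Omega is a perfect predictor, so its prediction for the
    other branch is simply policy[other(coin)].
    """
    total = 0
    if policy[coin]:
        total -= COST
    if policy[other(coin)]:
        total += PRIZE
    return total


always_pay = {"heads": True, "tails": True}
never_pay = {"heads": False, "tails": False}

for coin in ("heads", "tails"):
    print(coin, payoff(always_pay, coin), payoff(never_pay, coin))
# heads 9900 0
# tails 9900 0
```

The always-paying policy comes out ahead of the never-paying one in both branches, not just in expectation, which is the feature that distinguishes this problem from Counterfactual Mugging.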

The Logical Counterfactual Mugging doesn't use a coin flip, but instead looks at the parity of something beyond your ability to calculate, like the 10,000th digit of pi. You are told it is even and then asked to pay $100 on the basis that if Omega predicts you would have paid, then he would have given you $10,000 if it had turned out to be odd.

You might naturally assume that you couldn't construct a logical version of the Counterfactual Prisoner's Dilemma. I certainly did at first. After all, you might say, the coin could have come up tails, but the 10,000th digit of pi couldn't have turned out to be odd. After all, that would be a logical impossibility.

But could the coin actually have come up tails? If the universe is deterministic, then the way it came up was the only way it could ever have come up. So is there less difference between these two scenarios than it looks at first glance?

Let's see. For the standard counterfactual mugging, you can't find the contradiction because you lack information about the world, while for the logical version, you can't find the contradiction because you lack processing power. In the former, we could actually construct two consistent worlds - one where it is heads and one where it is tails - that are consistent with the information you have about the scenario. In the latter, we can't.

Notice, however, that for Logical Counterfactual Mugging to be well defined, you need to define what Omega is doing when it is making its prediction. In Counterfactuals for Perfect Predictors, I explained that when dealing with perfect predictors, often the counterfactual would be undefined. For example, in Parfit's Hitchhiker a perfect predictor would never give a lift to someone who never pays in town, so it isn't immediately clear that predicting what such a person would do in town involves predicting something coherent.

However, even though we can't ask what the hitchhiker would do in an incoherent situation, we can ask what they would do when they receive an input representing an incoherent situation (see Counterfactuals for Perfect Predictors for a more formal description). Indeed, Updateless Decision Theory uses this technique - programs are defined as input-output maps - although I don't know whether Wei Dai was motivated by this concern or not.

Similarly, the predictor in Logical Counterfactual Mugging must be predicting something that is well defined. So we can assume that it is producing a prediction based on an input, which may possibly represent a logically inconsistent situation. Given this, we can construct a logical version of the Counterfactual Prisoner's Dilemma. Writing this explicitly:

There really isn't any difference between how we make the logical case coherent and how we make the standard case coherent. At this point, we can see that, just as in the original Counterfactual Prisoner's Dilemma, always paying scores you $9900, while never paying scores you nothing. You are guaranteed to do better regardless of the coin flip (or, in Abram Demski's terms, we now have an all-upside updateless situation).
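To illustrate the input-output framing, here is a hedged sketch in which the agent is just a function from a claimed parity to a decision. Omega's prediction is modeled (as an assumption) by running that function on the opposite input, which stays well defined even if that input describes a logically impossible situation. No digit of pi is computed; `told_parity` stands in for whatever Omega announces, and all names are hypothetical.

```python
from typing import Callable

# An agent is an input-output map: claimed parity ("even"/"odd") -> pays?
Agent = Callable[[str], bool]


def logical_cpd_payoff(agent: Agent, told_parity: str) -> int:
    """Net payoff when the agent is told the digit has parity `told_parity`.

    Assumption: Omega predicts by evaluating the agent on the other input.
    That input may represent an inconsistent situation, but evaluating
    the function on it is still a coherent operation.
    """
    counterfactual_input = "odd" if told_parity == "even" else "even"
    predicted_to_pay = agent(counterfactual_input)
    actually_pays = agent(told_parity)
    return (10_000 if predicted_to_pay else 0) - (100 if actually_pays else 0)


def always_pays(parity: str) -> bool:
    return True


def never_pays(parity: str) -> bool:
    return False


for parity in ("even", "odd"):
    print(parity, logical_cpd_payoff(always_pays, parity),
          logical_cpd_payoff(never_pays, parity))
# even 9900 0
# odd 9900 0
```

Nothing in the computation depends on whether the counterfactual input could actually occur, which is why the logical and the standard versions come out the same in this sketch.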