Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

In my last post, I wrote that the counterfactuals in Transparent-Box Newcomb's problem were largely a matter of social convention. One point I overlooked for a long time was that formalising a problem like Newcomb's is trickier than it seems. Depending on how the problem is written, some statements may seem to apply only to our actual world, some may seem to refer to counterfactual worlds as well, and some may seem ambiguous.

To clarify this, I'll consider phrases that one might hear in relation to this problem, plus some variations, and draw out their implications. I won't use modal logic, since it wouldn't add anything to this discussion except more jargon.

The idea that counterfactuals could have a social element should seem really puzzling at first. After all, counterfactuals determine what counts as a good decision and surely what is a good decision isn't just a matter of social convention? I think I know how to resolve this problem and I'll address that in a post soon, but for now I'll just provide a hint and link you to a comment by Abram Demski talking about how probabilities are somewhere between subjective and objective.

Example 1:

a) Omega is a perfect predictor

b) You find out from an infallible source that Omega will predict your choice correctly

The first suggests that Omega will predict you correctly no matter what you choose, so we might take it to apply to every counterfactual world, although it is technically possible that Omega is a perfect predictor only in this world. The second is much more ambiguous: you might take the prediction to be correct only in this world and not in the counterfactual ones.
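The two readings can be made concrete with a small sketch. This is only an illustration under assumptions of my own: the names `World`, `worlds_universal` and `worlds_actual_only` are hypothetical, and modelling the narrow reading by holding the prediction fixed across counterfactuals is one possible choice, not the only one.

```python
from dataclasses import dataclass

ACTIONS = ("one-box", "two-box")


@dataclass(frozen=True)
class World:
    action: str      # what you choose in this world
    prediction: str  # what Omega predicted in this world


def worlds_universal():
    """Reading (a): Omega's prediction is correct in every counterfactual world."""
    return [World(action=a, prediction=a) for a in ACTIONS]


def worlds_actual_only(actual_action):
    """Narrow reading of (b): the prediction is only guaranteed correct in the
    actual world. Assumption: it is held fixed in the counterfactual worlds."""
    return [World(action=a, prediction=actual_action) for a in ACTIONS]


def predictor_correct(world):
    """True if Omega's prediction matches the action taken in this world."""
    return world.action == world.prediction


universal = [predictor_correct(w) for w in worlds_universal()]
narrow = [predictor_correct(w) for w in worlds_actual_only("one-box")]
```

Under the universal reading the predictor is correct in both worlds; under the narrow reading it is correct only in the world whose action matches the actual one.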

Example 2:

a) The first box always contains $1000

b) The first box contains $1000

The first again seems to be making a claim about counterfactual worlds, while the second is ambiguous: it isn't clear whether it applies to all worlds or not.

Example 3:

"The game works as follows: the first box contains $1000, while the second contains $0 or $1000 depending on whether the predictor predicts you'll two-box or one-box"

Talking about the rules of the game seems to be a hint that this will apply to all counterfactuals. After all, decision problems are normally about winning within a game, as opposed to the rules changing according to your decision.
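One way to picture "the rules apply to all counterfactuals" is as a single payoff function that stays fixed while only the action and prediction vary. The sketch below is a rough illustration: the function names are hypothetical, the amounts are parameters rather than fixed values, and it assumes that one-boxing means taking only the second box and that the predictor is correct in every world.

```python
ACTIONS = ("one-box", "two-box")


def payoff(action, prediction, first_box, second_box_if_one_box_predicted):
    """Payoff under the quoted rules, which do not change with your decision."""
    # The second box is filled only if one-boxing was predicted.
    second_box = second_box_if_one_box_predicted if prediction == "one-box" else 0
    # Assumption: one-boxing takes only the second box; two-boxing takes both.
    if action == "one-box":
        return second_box
    return first_box + second_box


def outcomes_under_fixed_rules(first_box, second_box_if_one_box_predicted):
    """Evaluate every counterfactual action with the same rules.

    Assumption: the prediction matches the action in every world."""
    return {
        a: payoff(a, a, first_box, second_box_if_one_box_predicted)
        for a in ACTIONS
    }


# Amounts as stated in the quoted rules above.
outcomes = outcomes_under_fixed_rules(
    first_box=1000, second_box_if_one_box_predicted=1000
)
```

The point of the design is that `payoff` takes no "which world is this" argument: the rules are the same function in every counterfactual, which is what talking about "the game" hints at.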

Example 4:

a) The box in front of you contains $1 million

b) The box in front of you contains either $0 or $1 million. In this case, it contains $1 million

The first is ambiguous. The second seems to make a statement about all counterfactuals, then one about this world. If it were making a statement just about this world then the first sentence wouldn't have been necessary.


This could be leveraged to provide a critique of the erasure approach. That approach tries to construct a non-trivial decision problem by erasing information, but this analysis suggests that either a) erasure may be unnecessary, because the problem already makes implicit which information is universal and which is not, or b) the issue isn't that we need to figure out which assumption to erase, but that the problem is ambiguous about which parts should be taken universally.

4 comments

Counterfactuals are in the mind, so of course they depend on the mental models, including "social conventions". They are also a bad model of the actual world, because they tempt you to argue with reality. (Here I assume a realist position.) There is only one world. There is no what could have been, only what was, is and may yet be. And that "may" is in the mind, not in the territory. Being an embedded agent, you cannot change the world, only learn more about which of your maps, if any, are more accurate. That's why a's and b's in your examples are identical, only some sound more confused than others. There is no difference between an opaque and a transparent Newcomb's.

"There is only one world. There is no what could have been, only what was, is and may yet be. And that "may" is in the mind, not in the territory" - this is largely my intuition, but I think we can construct a meaningful (but non-unique) notion of counterfactuals in the map.

"There is no difference between an opaque and a transparent Newcomb's" - I agree that this is true to an extent as well.

"I think we can construct a meaningful (but non-unique) notion of counterfactuals in the map"

Quite likely, and you have been working on it for some time. But is it a useful direction to work in? Every time I read what people write about it, I get the impression that they end up more confused than when they had started.

With the really difficult philosophy problems, there's a lot of going forwards, then going backwards. But that doesn't mean that progress hasn't been made.