# 17

In this article, I analyze Newcomb's paradox with pure Bayesianism. The focus will be on the comparison of counterfactual expected gains and of the 1-boxer and 2-boxer strategies, using only the laws of probability. The main trick of the analysis is a clear separation between agents' epistemic thinking (assumed to be pure Bayesianism) and their decision-making processes.

I prove that engaging in counterfactual optimization implies 2-boxing. Weirdly enough, though, I also show the player can believe that , if the player thinks that the predictor Omega knows sufficiently more about the player's decision-making algorithm than the player does themself. In such a case, counterfactual optimization would favor 1-boxing. However, even in this case, as we will explain in greater details, this does not mean that the player should then engage in counterfactual reasoning.

I will then make a few remarks about non-Bayesians, before concluding.

# A slightly generalized Newcomb's paradox

Newcomb's paradox is a classical paradox of decision theory, which has been widely discussed by philosophers, as well as here on LessWrong. Evidently, I have not dug through the entire (massive!) literature on the problem, so I apologize if the analysis below has been done elsewhere.

Let's start by describing the problem. Two boxes A and B are presented to Alice.

• Box A is opaque. Its content is unknown to Alice. But Alice knows that an algorithm called Omega already decided what amount to put in, according to principles that we shall detail later on. The content of Box A can no longer be changed.
• Box B is transparent and contains $1,000. Alice must choose between the 1-boxer strategy and the 2-boxer strategies. • The 1-boxer only takes Box A. • The 2-boxer takes both Box A and Box B. What makes Newcomb's paradox interesting is the way Omega decides what to put in Box A. Essentially, Omega will put a large amount of money in Box A if it predicts the player to choose the 1-boxer strategy. To make it realistic, here, we typically assume that Omega gathered a huge amount of data on Alice like, say, her entire Google, Facebook and Amazon data. Based on this and all sorts of other data, such as this online discussion about Newcomb's paradox, Omega made a probabilistic guess about what strategy Alice will choose. In this article, we'll assume that Omega is a pure Bayesian, in the sense that the probabilistic guess is derived from the laws of probability. Denote the probability estimated by Omega that Alice will follow the 1-boxer strategy. Here, we assume that Omega flips a biased coin, so that, with probability , it puts$1,000,000 in Box A. Otherwise, Omega leaves Box A empty.

(The classical version of Newcomb's paradox is the special case where the probability always either equals 0 or 1)

Alice knows all of this. The two boxes are in front of her. She can either take take Box A only (1-boxing), or Alice can take both Box A and Box B (2-boxing).

Imagine you're Alice. What do you do?

Go.

Hopefully, you've already had endless debates about this famous paradox, including with people you disagree with. Essentially, the 1-boxer will say that their expected gain is larger than the 2-boxer's. The 2-boxer will respond that what matters is that they got $1,000 more than what they would have got, had they taken only one box. It turns out that both the 1-boxer and the 2-boxer are engaging in counterfactual reasoning. In other words, they compare their expected gain as they follow their strategy, to what they would have got in expectation by following the opposite strategy. In probabilistic terms, both compared to , where is the content of Box A, is the content of Box B, 1 corresponds to the 1-boxer strategy, and 2 corresponds to the 2-boxer strategy. Evidently, the crux of the problem is the uncertainty on the content of Box A, which affects the probability computation. The 1-boxer claims that which leads them to conclude that . However, the 2-boxer argues that , which leads to the conclusion . So which is it? Who's right? # Pure Bayesianism* In causal decision theory, it is common to consider expectations and , where the do-operator consists of forcing the probability distribution over all variables that are not "caused" by event 1 and 2 to remain the same. Since the content of the box is usually considered not to be "caused" by event 1 and 2, it is common in causal decision theory to consider that , in which case . It is thus common in causal decision theory to argue that the 2-boxer strategy is the better strategy. Annoyingly, causal decision theory does have "bugs". In particular, it is arguably severely "model-dependent", in the sense that our choice of the causal model affects what the do-operator really means. Thus, two causal decision theorists may still disagree, because they choose to model causality differently — this "bug" applies to "causality" more generally. One way to fix this would be to consider a universal prior over all causal models. But then, one might wonder why we dogmatically chose to restrict ourselves to causal models only, thereby steering us away from the universal prior over all computable theories that Solomonoff created for us. In any case, by adding the do-operator, causal decision theory is not purely Bayesian; it is adding an axiom to the laws of probability. But this is not specific to causal decision theory. More generally, the trouble is that pure Bayesianism does not model decision-making. Granted, it may seem "natural" to make decisions based on the maximization of expected future rewards, conditioned on all observed data. But it should be stressed that this is one way of making decisions, which is no way derived from the laws of probability themselves. So to be sure we're going for pure Bayesianism, let's be agnostic to decision-making processes. This means in particular that we will clearly separate an individual's epistemic thinking from its decision-making process. The epistemic thinking computes how the individual is viewing the world. It is the individual's best attempt to describe the world as it is, including objects in the world such as the individual themself. In this article, we will assume that both Alice and Omega will rely on pure Bayesianism to do so. What I mean by pure Bayesianism here can be Solomonoff's induction, which runs on a universal prior over all computable probability distributions over data. But I won't need the full artillery of Solomonoff. Let's just consider that any prediction is derived from applying solely the laws of probability with some initial prior on any observable data. Note that this means that we are ignoring computational complexity constraints, at least for now. We will come back to this point later on. Arguably, what makes this problem interesting in practice is precisely the fact that Bayesian computations are intractable (if not uncomputable!). But for now, we will simply assume that both Alice and Omega have infinite computational power, which allows them to run any terminating algorithm instantly, and to immediately identify algorithms that never halt as such. Now, Omega's decision-making process will exploit its epistemic thinking, as has been described above. Namely, it will decide to put$1M in Box A with probability equal to its prediction that Alice will choose the 1-boxer strategy.

However, we will not assume anything about Alice's decision-making process. In fact, Alice's actual decision-making algorithm may or may not exploit her epistemic computations. Typically, if Alice pre-commits to 1-boxing, then her decision-making algorithm will not be exploiting her epistemic thinking.

# The zero-surprise theorem

Clearly, if Alice's epistemic system perfectly knows her decision-making algorithm, then this epistemic system can predict the decisions she will make. Indeed, it can simply run the decision-making algorithm to derive a deterministic prediction of Alice's decision. But this has a weird consequence. Indeed, given all of her past data, Alice essentially cannot be surprised by her decisions given these data!

The reason why this remark is critical is because of an easy Bayesian theorem, which I'll call the zero-surprise theorem.

Theorem (zero surprise). If a new piece of data is unsurprising to a Bayesian, then this Bayesian does not change her belief by learning . More precisely, for any theory , if , then .
Proof. This follows straightforwardly from Bayes rule, after noting that, for , implies .

Thus, if Alice's decision-making fully follows from her epistemic beliefs, then she would believe , for either (1-boxer) or (2-boxer). In both cases, the fact that she decides cannot surprise her. And thus, .

Now the 1-boxers out there may argue that Alice could still make randomized decision-making algorithms, for instance by throwing a coin at some point (or by drawing random bits), to make her decision. Alas, a more general version of the zero-surprise theorem applies.

Theorem (equal surprise). If the likelihood of data is the same in all theories, i.e. for any pair of theories and , and assuming that theories partition the probability space, then the posterior equals the prior, i.e. .
Proof. This also follows from Bayes rule, by expanding the denominator using the law of total probabilities and canceling likelihood terms.

As a result, assuming that the random bits of Alice's decision-making algorithm are also unknown to Omega (when Omega estimated ), given that the probability of these bits will be the same whether Box A contains money or not, then Alice must conclude , even if she goes for a randomized decision. In this case, no matter what decision-making algorithm Alice opts for, Alice must conclude that the 2-boxer strategy yields counterfactually larger expected rewards. Counterfactual reasoning based on a transparent decision-making algorithm favors 2-boxing.

Theorem (transparent counterfactual optimization). If Alice's decision-making algorithm uses counterfactual optimization, and if Alice's epistemic thinking knows that this is how decisions are made, then Alice would be a 2-boxer.

In fact, it can be shown that a sufficient condition to this is Alice's ability to predict deterministically Omega's prediction of Alice's decision-making algorithm. Indeed, Alice's decision will not change Alice's credence on Omega's prediction , which will thus not change her credence on the contents of Box A.

# Opaque decision-making algorithm

Critically, the above conclusion considers that, in a sense, Alice's decision-making algorithm is not fully known to Alice's epistemic system. We got away by assuming that what Alice does not know about her algorithm is unknown to Omega too. But what if Omega had access to bits of Alice's decision-making algorithm that Alice's epistemic system does not quite know? More interestingly still, what if Alice knows that Omega had a better knowledge of her own decision-making algorithm when Omega made its prediction and decided the content of Box A?

This may sound outlandish. But it may not be so unreasonable. After all, our own decision-making algorithms are the result of numerous electrical impulses in our brains' complex neural networks. Clearly, they are opaque to us. In fact, it may be particularly hard for us to predict how our decision-making algorithm will behave, especially in unusual settings — such as Newcomb's paradox. And even if we're given an oracle to compute Bayes rule on all our observed data, this oracle will likely still only return a probabilistic distribution over the set of algorithms that we apply to make decisions, since it does not have precise data about the topology of our brains.

Meanwhile, today's machine learning algorithms already collect huge amounts of data about a lot of different humans. By leveraging data about how some humans behave in unusual settings, it seems possible for them to better infer how our decision-making algorithms will behave in such settings than we can infer it ourselves. After all, if we know what challenges our friends will face, and how people similar to our friends behave in these challenges, we can sometimes correctly bet that our friends will be doing well, even when they are more hesitant.

Well, the opacity of Alice's decision-making algorithm to her epistemic system changes everything. Things now become very interesting! Intuitively, the fact that Alice now concludes that she will be a 1-boxer reveals to her features of her decision-making algorithms, which will thus modify her belief on what Omega put in the box. And thus, it is now possible that she actually concludes that . In fact, we will provide an example where this is the case.

But before getting to our example, for the 2-boxers out there, it's worth detailing what this means intuitively. Of course, the content of Box A will not change because Alice chooses to opt for the 1-boxer strategy. What does change, however, is Alice's credence of what Box A contains. The world did not move. But Alice's model of the world did. And this change of belief modifies her computation of her counterfactual expected gains.

# More knowledgeable Omega

To present a simple example of , we will assume that Omega has access to all of Alice's data, in addition to some additional data that may inform Omega on Alice's decision-making system. This will simplify computations as it allows us to apply the Bayesian theorem of the argument of authority.

Theorem (argument of authority). If Alice and Omega are honest Bayesians with the same fundamental prior, if Omega knows strictly more data than Alice, and if Omega tells Alice about a prediction it made, then Alice must now make Omega's prediction. More precisely, for any data and any random variable , we have .
Proof. Alice has a prior on data . The event rules out all data D for which . Thus all data D that still have nonzero probability given must satisfy . Decomposing the computation of using the law of total probability over all possible values of given then yields an average over values that all equal . This average must thus also equal .

An interesting corollary of the theorem is that it is actually now sufficient to express all the uncertainties of the problem in terms of Omega's prediction , rather than in terms of the uncertainty on Alice's decision-making algorithm. In particular, Alice can now easily derive her belief on the fact that her decision-making algorithm will make her choose the 1-boxer strategy.

Lemma 1. Alice's prior on the fact that she will be a 1-boxer is equal to her expected prior on Omega's prediction. In other words, denoting , we have .
Proof. To obtain this, we first apply the law of total probability to note that . Now, we note that if Omega knows strictly more than Alice, thus if Omega predicts that Alice will opt for the 1-boxer strategy with probability , then Alice must now believe so. By the Bayesian theorem of the argument of authority, . As a result, we have .

Interestingly, we have a similar result on the prior probability that Box A contains money.

Lemma 2. Denote the event where Box A contains $1M. Then . In other words, from Alice's viewpoint, the probability that Box A contains$1M is Alice's expected prior on Omega's prediction.
Proof. . Now, as we saw, Omega's decision-making process is such that Omega puts $1M in Box A with probability . As a result, we have . We move on to the computation of the posterior probability that Box A contains$1M.

Lemma 3. , where is the prior variance of Omega's prediction.
Proof. Bayes rule yields . Using as we saw in the two previous paragraphs, we now have .
Now, to compute , we use again the law of total probability, by conditioning over different values of . This yields . Now, the fact that the coin throw of Omega is independent of the choice of the individual, conditioned on the estimation of ω by Omega, means that , which, as we saw, is equal to by virtue of the Bayesian theorem of the argument of authority. We thus have . For the other quantity , we use Bayes rule , which yields . We then note that, by definition of Omega's choice for the content of Box A, we have . As we already saw, the denominator is . Overall, we obtain .
Using allows to conclude.

In other words, especially if Alice has a large uncertainty on Omega's prediction, while assuming that Omega is likely predicting that she will be a 2-boxer, Alice will increase her credence that Box A contains $1M if her decision-making makes her choose Box A. An immediate corollary of this is that, conversely, if the Bayesian opts for the 2-boxer strategy then her credence in Box A containing$1M decreases. Indeed, we have the following Bayesian no-confirmation-bias theorem.

Theorem (no confirmation bias). A Bayesian's prior must be equal to the expectation of her posterior. She cannot expect to increase her belief after looking at more data. More precisely, .
Proof. This is exactly the law of total probability.

Thus, must be an average of and . But in fact, we can compute the last term explicitly.

Lemma 4. .
Proof. By Bayes rule, . But we have , and . Combining it all yields .

Does it mean that we should opt for the 1-boxer strategy? Well, it depends on the details of the computation. Let us denote . we then have .

Main Theorem. Counterfactual optimization favors 1-boxing inequality, if and only if, we have .
Proof. Lemma 3 implies . Meanwhile, Lemma 4 implies . We then have . Therefore, we observe that , if and only if, .

Therefore, the expected gain is larger for the 1-boxer if the possible large content of Box A is much larger than that of Box B, if the uncertainty on Omega's prediction is large, and if Omega's prediction is likely to be nearly deterministic. Arguably, this result captures some of the intuition we may have about this problem, as the 1-boxer strategy may feel more compelling if and if we trust Omega to know much more about our own decision-making process than we do.

# Shouldn't Alice engage in counterfactual optimization?

Now, given all of this, assuming , should we conclude that Alice must then choose to be a 1-boxer?

Well, technically, all we said is that if Alice follows her decision-making algorithm, if Omega knows strictly more than Alice, in particular about her decision-making algorithm, and if , then counterfactual reasoning favors 1-boxing. But weirdly enough, this does not mean that Alice should decide to be 1-boxing. Indeed, if Alice chooses her decision-making algorithm, then this decision-making algorithm will now be transparent to her, and the theorem no longer applies.

In fact, if Alice engages in counterfactual optimization, then she will know her decision-making algorithm, and thus, by virtue of the theorem of transparent counterfactual optimization we proved above, she will be using the 2-boxer strategy. Put differently, this theorem says that, if you have a purely Bayesian epistemic system, then you can't engage in counterfactual optimization and decide to be a 1-boxer.

But say we're now in the context of the previous section, where Alice's decision-making algorithm is opaque to her. And assume that . Then, we can argue that Alice should be a 1-boxer, based on counterfactual optimization. In fact, Alice's own epistemic system may be shouting that she should be 1-boxing. But this epistemic system is not Alice's decision-making system. This decision-making algorithm cannot be changed.

But isn't it just a wrong assumption? Can't we decide that Alice can decide what decision-making algorithm to execute, perhaps even up to some noise?

Well, if so, then Alice's decision-making algorithm would actually be none other than the meta-algorithm that outputs a decision-making algorithm to be executed, and then execute it. Yet a meta-algorithm is still an algorithm! And it is this meta-algorithm that may be better understood by Omega than by Alice herself.

This may be frustrating. You might want to ask: Can't we at least say that the 1-boxer wins more money than the 2-boxer strategy?

Yes, definitely, because these are descriptive questions. And indeed, if both Alice and Omega know that Alice's decision-making will tell her to use the 1-boxer strategy, then Alice will know she will gain $1M. Conversely, if both know that Alice will be a 2-boxer, then Alice will know she will gain$1,000. In the former case, Alice will win more than in the latter case. But, as we saw earlier, because of the zero-surprise theorem, in both cases, . Thus, in both cases, Alice's epistemic system expected to win \$1,000 more by taking both boxes. In both cases, her counterfactual optimization urged her to be a 2-boxer.

# A few remarks for non-Bayesians

Now, unfortunately, those of us who are mere mortals usually don't have access to a Bayesian oracle. Given our limited computational resources, how should we find inspiration from this Bayesian analysis of Newcomb's paradox to draw conclusions on what we should actually do in Newcomb-like paradoxes?

The critical feature of our analysis arguably still stands. In a sense, our decision-making algorithms are partial black boxes to ourselves. We are running an algorithm whose code is unknown to us. As a result, we should expect to be surprised by some decisions we are making. And thus to learn something once we find out about what our decision-making systems outputs.

Interestingly, if now consider computational complexity, this principle becomes all the more compelling. In practice, because we have limited computing power, especially under the assumption of computational irreducibility, we should often be surprised by the result of our decision-making process. This principle is beautifully captured in this quote from Turing's 1950 brilliant paper.

The view that machines cannot give rise to surprises is due, I believe, to a fallacy to which philosophers and mathematicians are particularly subject. This is the assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it. It is a very useful assumption under many circumstances, but one too easily forgets that it is false. A natural consequence of doing so is that one then assumes that there is no virtue in the mere working out of consequences from data and general principles.

To rephrase this, even when you know how you decide, you may still not know what you decide. You should then not be surprised that some algorithm like Omega really can predict what you decide better than you can.

In this case, if you somehow thinks that Omega does predict better than you can, and if you are really unsure about what Omega will predict for you, then approximate counterfactual reasoning may suggest that you might want to engage with 1-boxing, though this setting with limited computational resources would need to be further investigated to clear things out.

Weirdly, though, this suggests that the more you think about Newcomb's paradox, the more (probably) you decrease your uncertainty on Omega's prediction — assuming that it's reasonably close to a Bayesian prediction. And thus, the more compelling the 2-boxer strategy may seem to you.

# Conclusion

In this article, we carefully distinguished an individual's decision-making algorithm from its epistemic system. Interestingly, this principle has been argued to be critical to robust AI alignment. It is probably useful for us humans as well. The job of deciding what to do is not the same as the job of figuring out a reliable model of the world. Interestingly, we commonly assume that the decision-making algorithm should exploit the epistemic system. But perhaps a neglected reflexion is the fact that the epistemic system must also describe the decision-making algorithm. In practice, it is usually unfortunate that our decision-making algorithms are somewhat opaque to ourselves — though, in the specific case of Newcomb's paradox, it may be just what we need to counterfactually prefer 1-boxing.

Note though that I'm leaving here open the question as to whether we should aim at counterfactual optimization as a decision rule. I personally still believe that this is the most natural form of decision rule, especially under a Bayesian framework when considering the von Neumann - Morgenstern preferences. I gladly acknowledge, however, my lack of understanding of alternatives.

An important feature of this article is a strong motivation to consistently apply Bayesianism for epistemic thinking. This motivation I have is the result of years of wonders within the formidable world of probability theory and its astonishing theorems, a few of which have been mentioned above. My fascination for Bayes rule has culminated in a book I recently published at CRC Press, called The Equation of Knowledge, which I presented in this LessWrong post. The book contains a large number of examples, many of which are pretty similar to what you have read in this article. Others are more empirical. If you found this article somewhat interesting, I bet that you'll really enjoy the book.