1. The Gnawing Feeling That Won't Go Away
I stumbled onto Newcomb's problem a few days ago via this Veritasium video and found that I was a one-boxer. It just made more sense to me given the circumstances.
Causality and dominance reasoning suggested taking both boxes, since my choice couldn't change their contents. But I faced Omega, whose knowledge of the world was unlike anything I'd ever seen or deemed possible. The evidence made it highly likely that I'd walk away with only $1,000 if I two-boxed. It seemed to me like the old rules no longer applied. So I one-boxed. Still, I couldn't shake the gnawing feeling that my decision was irrational. My training as a natural scientist screamed "correlation doesn't equal causation, time flows linearly, magical thinking is a trap". How could I justify my behaviour? At what point does a decision that would be irrational in most circumstances become rational? This question led me to the idea that the common framing of the problem may be misleading.
I became even more intrigued when I learned about the near-even split between one-boxers and two-boxers both among professional philosophers (PhilPapers 2020) and the wider public (a Guardian poll, 2016) – a distribution I could roughly reproduce among colleagues and friends. There were smart, rational people on both sides of the aisle. What's going on here?
I've come to suspect that the disagreement between one-boxers and two-boxers is not so much about decision theory as about how you interpret the problem's premises. Not whether you believe them, but how you frame them and how that framing shapes your world model. I think players start out with an implicit decision driven by a personal predisposition – let's call it "epistemic temperament" – and the box-taking strategy naturally follows. Viewed from this angle, the one-box and two-box positions each become internally consistent, and the paradox dissolves.
2. The Standard Framing
Here's the setup: A predictor, Omega, with a near-perfect track record has set up two boxes. Box A is transparent and contains $1,000. Box B is opaque and contains either $1,000,000 or nothing. You can take either both boxes or only box B. If Omega predicted you'd take only box B, it put the million inside. If it predicted you'd take both, it left box B empty. You know that it has been right in thousands of previous cases. The problem was publicized by Nozick in 1969, and has divided thinkers ever since.
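Laid out as a payoff table, directly from the setup above:

| | Omega predicted "only box B" | Omega predicted "both boxes" |
|---|---|---|
| You take only box B | $1,000,000 | $0 |
| You take both boxes | $1,001,000 | $1,000 |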
A common argument for two-boxing is this: in the best case you get $1,001,000, and in the worst case you get $1,000 – a consolation prize that is clearly preferable if box B is empty anyway. This line of thought is known as causal decision theory (CDT). In real life, naive CDT leads to undesirable outcomes in several scenarios, like collective action, trust-based negotiations and even nuclear deterrence. However, it works well in many other cases; for example, it can sustain stable cooperation in the indefinitely iterated prisoner's dilemma.
Two other decision theories favor one-boxing. Evidential decision theory (EDT) suggests choosing the action that is the best evidence of good outcomes ("the action which would be indicative of the best outcome in expectation if one received the 'news' that it had been taken"). Think correlation, forget causation: we know that one-boxers walk away rich while two-boxers walk away poor – therefore, one-box. This clearly doesn't work in many real-life scenarios: wealthy people may choose expensive wine, but buying expensive wine won't make you wealthy.
Functional decision theory (FDT) reaches the same conclusion for very different reasons. It frames your action as the output of your underlying decision algorithm, with Omega running that same algorithm when it makes its prediction. A "subjunctive dependence" connects your decision in the physical world with the one in Omega's simulation. So if you want Omega to predict that you one-box, you should one-box. FDT is particularly relevant in AI alignment research and the design of interacting artificial agents.
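To make the disagreement concrete, here is a minimal sketch of the expected-value arithmetic in Python. The accuracy p = 0.99 is my assumption for illustration; the problem itself only says "near-perfect":

```python
# A minimal sketch of the expected-value arithmetic behind EDT vs. CDT.
# Assumption (not stipulated by the problem): Omega's trusted accuracy p
# is 0.99 and applies symmetrically to both predictions.

A = 1_000          # visible amount in box A
M = 1_000_000      # possible amount in box B
p = 0.99           # assumed predictor accuracy

# EDT conditions on the act: taking one box is strong evidence box B is full.
ev_one_box = p * M                # box B full with probability p
ev_two_box = A + (1 - p) * M      # box B full only if Omega erred

print(f"EDT one-box: ${ev_one_box:,.0f}")   # $990,000
print(f"EDT two-box: ${ev_two_box:,.0f}")   # $11,000

# CDT treats the (already fixed) contents as independent of the act.
# Whatever credence q it assigns to the box being full, two-boxing
# adds exactly A on top -- the dominance argument in numbers.
for q in (0.0, 0.5, 1.0):
    print(f"q={q:.1f}: CDT one-box ${q * M:,.0f} vs. two-box ${q * M + A:,.0f}")
```

Under EDT, one-boxing wins by almost $980,000; under CDT, two-boxing is exactly $1,000 ahead no matter what credence q you assign to the box being full.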
3. The Threshold Argument Reveals a Hidden Tension
I complained to my friend André that the setup bothered me and that I wasn't happy with the virtual million dollars I had earned by honest one-boxing. He came up with a thought experiment that shifted the focus from the one-box/two-box decision to another question the problem conveniently leaves open: what kind of thing is Omega?
Instead of $1,000, we put 1 cent in box A. In this situation, almost everybody will one-box. Now we gradually increase the amount in box A toward $100,000,000. At some point, almost every person X will switch to two-boxing. Call this switching threshold Y_X.
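Where would Y_X sit for an idealized player? As a back-of-the-envelope sketch – assuming, purely for illustration, that she maximizes evidential expected value at a fixed trusted accuracy p (the problem stipulates neither) – the switch happens where the two expected values cross:

$$p \cdot M = Y_X + (1 - p) \cdot M \quad \Longrightarrow \quad Y_X = (2p - 1)\,M$$

with M = $1,000,000. At p = 0.99 the crossover lands near $980,000, comfortably inside the 1-cent-to-$100,000,000 ramp. Around that point the two options are almost exactly tied – which is the vanishing decision signal described next.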
At Y_X, the signal-to-noise ratio (SNR) of person X's decision approaches zero: the choice is driven by tiny fluctuations, like thermal noise in the brain or quantum randomness. Can the near-perfect predictor resolve this noise, or does its accuracy break down?
If Omega is a physical system (say, a supercomputer running a world simulation – think Devs), it's plausible that its accuracy drops sharply around Y_X: no physical simulation can predict a purely stochastic process. This reveals a tension inherent to the setup. The prediction seems to run into hard limits.
The situation improves if predictions are made far enough from the threshold: the SNR is robust and Omega once again achieves high accuracy. But the question about Omega's nature remains. Whether you think this tension is fatal to the premises or merely uncomfortable is, I'll argue, what separates one-boxers from two-boxers.
4. It's What You Make of the Premises
The premise of Omega as a near-perfect oracle strains the common model of reality. You have two options: you can either filter it through your established beliefs about how the world works, or accept it as a new reality.
If you filter the premises through your existing worldview, you interpret the predictor's track record as reducible to known mechanisms – sophisticated psychological profiling, ordinary computer simulations, maybe even luck. We've all seen magic tricks that seemed like magic but were still tricks. Essentially, the two-boxer says: "Omega is impressive but not supernatural. Its performance doesn't change the fundamental causal structure of reality. It's a very good psychologist, not a reality-breaking oracle. It made its prediction, then filled and sealed the box. My choice can't change what's inside. Subjunctive dependence is an interesting concept, but this is the real world, not a thought experiment. Two-box."
If you believe that Omega's performance transcends or even contradicts our understanding of the world, you are entering a world with different rules. You may not fully understand its mechanics, but the evidence overwhelmingly points to one-boxing. Essentially, the one-boxer says: "The premise clearly tells me this goes beyond ordinary psychology. A thousand correct predictions, near-perfect accuracy. I'm taking that as evidence that something is going on that I don't fully understand. Whatever Omega is, it's something else. If you are playing against an omniscient superintelligence or a mirror, you cannot beat it on your terms. I'll adapt my strategies accordingly. So, one-box."
Neither approach is unreasonable. The two-boxer is completely rational within their model of reality. The one-boxer is updating their model in response to evidence that breaks it. The deciding factor is epistemic temperament: how readily do you let anomalous evidence overturn your worldview?
Omega's nature determines whether its accuracy is a constraint or a mere correlation, and tells you what kind of game you are playing (Wolpert and Benford, 2013). The premises force you to make an implicit decision about that nature, and this decision forms the basis of your box-taking strategy. The disagreement about boxes was a disagreement about Omega all along.
5. The Fairy Tale Test
Imagine that you meet a talking lion. Are you the kind of person who thinks: "apparently lions talk now, what else might be different here?" or "there must be a hidden speaker somewhere"? Neither response is crazy and neither predisposition is inherently better, but they lead to very different behavior.
The "hidden speaker" people are two-boxers. Confronted with a highly irregular situation, they say: "There are no fairy tales. There is an explanation consistent with my world model, and I'll behave accordingly." The "lions can talk" people are one-boxers. They say: "OK, we're in a fairy tale now. I'll play by the new rules."
Newcomb's problem is constructed so that it's impossible to tell which situation you're in. It's not about intelligence or rigor. It's about choosing your perspective.
This does not at all imply that a decision theory like FDT is fairy-tale reasoning as opposed to reality. But whether FDT should be applied is another question altogether, as Yudkowsky and Soares (2017) also hint: "If the predictor is reliable, then there is a sense in which the predictor’s prediction depends on which action the agent will take in the future". So FDT requires first accepting that the predictor really is the kind of entity that does what it claims to do. And that acceptance is – well, an epistemic commitment about its nature. To illustrate: imagine the predictor is not a superintelligence but a PhD student from the local university who has somehow gotten most predictions right. Same track record – but would you apply FDT in this case? It's worth noting that calling the predictor "Omega" is a framing choice in itself.
6. Existing Literature
Recent work has already suggested that Newcomb's paradox is not what it seems, and several lines of thought have closed in on it from different directions.
Wolpert and Benford (2013) arrive at essentially the same conclusion as we do, but through formal game theory. They show that one-boxers and two-boxers are playing two different games, corresponding to different probabilistic structures. Rather than a conflict of game-theoretic principles, there is "imprecision in specifying the probabilistic structure of the game" you and Omega are playing. The paradox arises because the problem doesn't specify which structure is correct; once the game is fully specified, the optimal strategy is perfectly well-defined, and the paradox is resolved. We add the observation that what drives the choice of game is your understanding of Omega's nature and its impact on your world model, i.e., the playing field. How you fill that gap determines which game you're playing.
Ninan (2006) also frames Newcomb's paradox primarily as an epistemic problem. He argues that it wouldn't be epistemically rational for a player to be certain that her decision cannot affect the contents of the opaque box, and that under these circumstances one-boxing becomes CDT-rational. So he is primarily talking about how strong the player's trust in the experimental setup is, which in turn determines her optimal strategy. But André's threshold argument implies that the tension isn't just psychological but structural: even if you believe the premise, you have to deal with its ambiguity and with a potentially ground-breaking shift in your model of reality. Ninan doesn't ask whether this ambiguity is an inherent property of the premise – he takes the premise as well-defined but hard to believe.
Bermúdez (in Ahmed, ed., Newcomb's Problem, 2018) argues that it's impossible for a rational agent to encounter a real-life Newcomb problem. This is adjacent to our view, though arrived at through different reasoning.
Finally, a note on LessWrong context. This community has generally favored one-boxing through FDT (Yudkowsky and Soares, 2017), which models the decision-making process as the output of a stable, deterministic computation rather than a physical action. Our argument doesn't touch the decision theory itself; rather, it frames the problem as resting on a prior, implicit epistemic choice.
7. Objections
FDT one-boxers don't need fairy tales. They have a formal mathematical framework.
True, and we're not denying that at all. Yudkowsky and Soares (2017) show that FDT fares better than EDT or CDT in many hypothetical scenarios. It's very relevant for formal philosophy and AI safety research, e.g. the design of aligned artificial agents and interactions between agents. FDT-based logic also helps explain some seemingly irrational human behaviors, like voting. However, so far it hasn't seen many applications in the real world, where most systems operate on CDT. When facing Omega, you need to decide whether your understanding of its nature warrants switching to FDT or whether you should keep using CDT. The threshold argument suggests that, at least near the threshold, the algorithm doesn't have a stable output. This is also why the premise of Devs, great show though it is, has been questioned (e.g. reddit). None of this is to say that FDT isn't a promising framework for agentic interactions.
Thought experiments stipulate their premises. Rejecting them is refusing to play.
We're not rejecting the premises at all. Our point is subtler: there is more than one way to "accept" a premise that conflicts with your understanding of reality. You can treat it as evidence that your understanding needs revision, or you can absorb it into your existing model by finding a compatible interpretation. Both are legitimate epistemic moves. In Newcomb's problem it's highly likely that this ambiguity isn't a quirk of how people read the problem – it's built into the problem itself. Not a bug, but a feature.
David Lewis and other serious two-boxers fully accept the premises and still two-box.
Lewis, in his well-known essay "Why Ain'cha Rich?" (1981), acknowledged that two-boxers walk away poorer. His response was that Newcomb's problem rewards irrationality: one-boxing works, but it's still irrational. Interestingly, he distinguished two kinds of rationality: the V-rationality of one-boxers maximizing V, "a kind of expected utility defined in entirely non-causal terms", and the U-rationality of two-boxers like himself maximizing U, "a kind of expected utility defined in terms of causal dependence as well as credence and value". In our framing, even though Lewis saw that U-rationality didn't lead to the desired result, he held onto the CDT framework for fundamental reasons and thus rejected V-rationality.
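For the record, here is a compressed sketch of the two quantities, following the form of Lewis's 1981 definitions (my notation, not a verbatim quotation):

$$V(A) = \sum_{S} P(S \mid A)\,\mathrm{value}(A \wedge S), \qquad U(A) = \sum_{K} P(K)\,\mathrm{value}(A \wedge K)$$

Here the S are possible states of the world and the K are Lewis's "dependency hypotheses" – complete specifications of how outcomes causally depend on acts. Because P(K) does not condition on the act, the already-fixed contents of box B stay fixed inside U and two-boxing dominates; because P(S | A) does condition on the act, Omega's track record flows into V and one-boxing wins.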
This just pushes the problem back. Now we're arguing about epistemic temperament instead of decision theory.
True. Rather than arguing about the boxes, we're now arguing about how people interpret Omega's nature, which is a different problem altogether. In a sense, Newcomb's problem works like a magician's trick, using misdirection to divert the audience's attention from what is actually going on. We should focus on something closer to metaphysics: what do you do when the evidence contradicts your model of reality? Reframing the question this way explains why the debate has been so persistent without ever converging – the debaters were never really having the same argument.
8. Conclusion
Newcomb's paradox is traditionally viewed as a decision theory problem, but it may actually be more of an indicator of epistemic temperament, i.e., your strategy for dealing with situations in which empirical evidence contradicts your established model of reality.
The decision-theoretical frameworks CDT, EDT, and FDT give different answers because they entail different prior attitudes toward the premises. Those attitudes trace back to a fundamental choice: If you hold that the boxes are already set and your present choice cannot affect their contents, two-boxing is the rational path within that model. If you accept Omega's accuracy at face value and let it reshape your worldview, one-boxing follows naturally as a response to a world with different rules. The real question Newcomb's problem asks is not "which box should you take?" but "what do you do when reality stops making sense?" or rather "at what point does reality stop making sense to you?"
A small anecdote to close. When I described Newcomb's problem to one of the smartest and most rational people I know, she two-boxed without much hesitation. When I asked whether and under which circumstances she'd switch to one-boxing, she said: yes, if there were another, unrelated observation of Omega-like power. So she had implicitly decided that a single Omega instance in this setup was not enough to overturn her world model, but that a second, independent observation would be.
As for me – I finally understand why I'm a one-boxer, and the feeling of irrationality is gone. It's not that I've abandoned logic for fairy tales; it's that when my world model was confronted with new data, I prioritized the data. I’m not just OK with meeting a talking lion – I’m willing to update my biology textbook when he starts speaking.
Further reading
David Wolpert & Gregory Benford, "The Lesson of Newcomb's Paradox" (2013)
Eliezer Yudkowsky & Nate Soares, "Functional Decision Theory: A New Theory of Instrumental Rationality" (2017)