Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I have written extensively on Newcomb's problem, so I assumed that I must have written up a clear explanation for this at some point. However, this doesn't seem to be the case, so I suppose I'm doing it now. The TLDR is that only one counterfactual is factual and the rest are constructed. Since they're constructed, there's no real requirement for them to have the same past as the factual and, in fact, if we value consistency, then the most natural way to construct them will involve tweaking the past.

The Student and the Exam

I've covered this example in a past post, so feel free to skip to the next section if you know it.

Suppose a student has a test on Friday. They are considering whether to study or go to the beach. Since the universe is deterministic, they conclude that the outcome is already fixed and has been fixed since the start of time. Therefore, they figure that they may as well not bother to study. Is there anything wrong with this reasoning?

My answer is that there are two views we can take of the universe.

Raw Reality: From this perspective, we are only looking at the universe as it is in its raw form; that is, the territory half of the map-and-territory distinction. This means that the outcome is fixed, but that the choice of whether or not to study is fixed as well. Or to be clear, the outcome is fixed in part because the student's decision is fixed, not independently of it. From within this view, we can't construct anything other than a Trivial Decision Theory problem, as there's only a single choice we can take.

(Update: Eliezer seems to adopt a similar view here).

Augmented Reality: This perspective is created by constructing counterfactuals. It is only from within this perspective that we can talk about the student having a choice. The fact that the student necessarily obtains a particular outcome places no limitation on the counterfactuals - which are by definition not factual and were expressly created for the purpose of considering the universe travelling down a different path.

Newcomb's Problem

One of the most confusing aspects of Newcomb's Problem is that the one-boxing solution seems to depend on backwards causation. It seems to rely on the reasoning that if I made a different choice, this would cause Omega to make a different prediction. I'll assume determinism, since if we had a free will independent of determinism, Omega wouldn't be able to be a perfect predictor. (Similarly, quantum physics doesn't make much of a difference, as it simply changes the universe from deterministic to probabilistically deterministic.)

I've argued that backwards causation isn't necessarily absurd, but nonetheless, I still want to demonstrate that 1-boxing doesn't require backwards causation so that we can completely avoid this controversy. Further, it is possible to undermine the concept of causality itself, but I'll ignore this since, if you do that, there is no problem left for me to solve.

We'll do this by making a similar move to the one we made in The Student and The Exam. However, instead of suggesting that counterfactuals can have different futures from the factual, we'll suggest that they can have different pasts.

The first point I'll note is that the mere fact of something being a particular way in the factual doesn't mean that it has to be that way in the counterfactual. If we didn't make any changes, then it wouldn't actually be a counterfactual, and beyond this there'd be no point in constructing it.

So we're allowed to make changes, but why are we specifically allowed to edit the past? Well, when we edit the decision an agent makes, only projecting that decision forwards in time results in an inconsistent counterfactual. For example: the agent is the kind of agent that will go left, up until the moment of the decision, when they magically decide to go right instead. Editing the past to make a counterfactual consistent is an entirely natural thing to do. After all, we might very well doubt the value in considering a counterfactual that isn't even consistent with our laws of physics! The vague intuition behind counterfactuals is for them to be something that "could have happened" - I suspect that I'm not alone in wanting to protest that an inconsistent counterfactual couldn't have happened!
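One way to make this contrast concrete is a toy deterministic model of Newcomb's problem. The sketch below is only an illustration under assumptions that are not part of the argument above: the agent is reduced to a hypothetical `disposition` fixed in the past, Omega predicts simply by reading that disposition, and all of the names are invented for the example. It compares editing only the decision with editing the past and then running the laws forward.

```python
from dataclasses import dataclass

MILLION = 1_000_000
THOUSAND = 1_000


@dataclass(frozen=True)
class World:
    disposition: str  # what kind of agent existed in the past (assumed model)
    prediction: str   # Omega's prediction, made before the choice
    action: str       # what the agent actually does
    payoff: int


def compute_payoff(prediction: str, action: str) -> int:
    # The opaque box holds a million only if Omega predicted one-boxing.
    opaque = MILLION if prediction == "1-box" else 0
    return opaque + (THOUSAND if action == "2-box" else 0)


def run_forward(disposition: str) -> World:
    """Toy deterministic laws: the past fixes both prediction and action."""
    prediction = disposition  # a perfect predictor reads the disposition
    action = disposition      # the agent acts on its disposition
    return World(disposition, prediction, action,
                 compute_payoff(prediction, action))


def is_consistent(world: World) -> bool:
    """A world is consistent if running its past forward reproduces it."""
    return world == run_forward(world.disposition)


def edit_decision_only(factual: World, new_action: str) -> World:
    """Keep the factual past and swap the action at the last moment."""
    return World(factual.disposition, factual.prediction, new_action,
                 compute_payoff(factual.prediction, new_action))


def edit_past(new_action: str) -> World:
    """Choose a past that leads to the new action, then run forward."""
    return run_forward(new_action)


factual = run_forward("1-box")
magic = edit_decision_only(factual, "2-box")  # "magically" goes the other way
consistent = edit_past("2-box")               # a different past, same laws
```

In this toy model `magic` fails `is_consistent`, while `consistent` passes it. Note that `run_forward` only ever computes later facts from earlier ones; the one step that reasons from the action back to the past is `edit_past`, which belongs to the construction procedure rather than to the world it returns.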

I'll acknowledge that the previous two arguments involve gesturing at an unclear notion of what counterfactuals are. This is inevitable as I don't yet have a complete theory of counterfactuals and even if I did, including it would greatly complicate this post. Nonetheless, I think they should be sufficient to persuade most people that constructing counterfactuals with different pasts is completely reasonable.

Nonetheless, some people may wonder why the process I've described doesn't involve backwards causation. After all, it involved editing a decision and then backpropagating. The key point is that this backwards propagation occurs during the process for constructing the counterfactual and that it is not necessarily a fact about the internal structure of the counterfactual.

That is, within the actual counterfactual causality operates normally. If you are using an entropic arrow for determining the direction of time, it will in most cases be pointing the same way in both the counterfactual and the factual. If you think of causality as the existence of laws which order the universe temporally (the mere fact of X being the case forcing some Y to also be the case later in time), then the counterfactual will also have these laws. Our model of the counterfactual is just like our model of the factual with a few tweaks!

But beyond this, even if you think causality is about the relationship between the counterfactuals, you can't claim backwards causation by merely referring to the process used to generate the counterfactuals rather than the actual counterfactuals themselves. And the mere fact that the counterfactuals have different pasts isn't sufficient to count as backwards causation. Claiming this would collapse two distinct concepts into one. And even if we did define backwards causation in this way, this whole issue would become something of a nothing-burger, as we could just concede its existence and then say, "So what?".


The claim "if I made a different choice, then the past would be different" is misleading. From the view of raw reality, it's important to understand that you can only make the choice you made and the past can only be as it was. From the view of augmented reality, the claim merely becomes "if we look at a version of me that would have made a different choice, then it will be located in a counterfactual with a different past".


  • The Prediction Problem: My first attempt at explaining Newcomb's problem - it did okay, but I'm kind of embarrassed about it now. I see the Prediction Problem as a useful intuition pump for why you should take into account meta-theoretical uncertainty, but I don't see it as having anything to say about which decision theory is best in and of itself.
  • Deconfusing Logical Counterfactuals: Another somewhat outdated post of mine since it defends an erasure model of counterfactuals that I no longer endorse. Nonetheless, I still think that it contains some interesting ideas about Newcomb-like problems.
14 comments

Omega can simulate people accurately down to the moment they choose, just as a programmer can run a program. Only a different person could have made a different choice, and Omega would also have foreseen that. You are not a person who makes a choice. You are a one-boxer or a two-boxer, and always have been. Even if you don't know which you are until the moment you choose, that is only an example of your limited self-awareness, a limitation that Omega does not share.

There are two things you can do with Newcomb's Problem. One is to outline arguments for one-boxing and two-boxing, along with whether and why they control your actual decision. This is mainly a formal, logical problem, though you can make illogical arguments too. This argument as a whole is in this category.

The other is to make a prediction about whether, in this scenario, you would take one box or two, and what would motivate that choice in the moment. These are questions about your personal psychology.

Underlying Newcomb's Problem, then, lurks the is-ought problem.

"Underlying Newcomb's Problem, then, lurks the is-ought problem."


It's interesting that you say this. I've been thinking a lot about what counterfactuals are at their base and one of the possibilities I've been considering is that maybe we can't provide an objective answer due to the is-ought problem.

Hmm... it's interesting that you're writing this comment. I suppose it indicates that I didn't make this point clearly enough?

I guess I probably should have expanded on this sentence: "From the view of pure reality, it's important to understand that you can only make the choice you made and the past can only be as it was".

It’s not that you didn’t make your point clearly enough! It’s that I have been confused by Newcomb’s problem in the past, so this was my attempt to re-express your idea in my own words and see if it still made sense. Would you say it correctly re-states your OP?

Yeah, that's the raw reality perspective. I guess the core point of my post was that there are two different perspectives.

Are there real humans still arguing against this?  It seems so obvious that, once you accept that decisions have causes and are not independent things, CDT dies on the vine, and Newcomb has a simple model that the causes of your decision are the same as the causes of Omega's box-filling.  

The only arguments against this I've seen in the last N years are that maybe decisions are NOT completely determined by state that's accessible to any possible Omega (quantum uncertainty woo is the most common of such arguments). But that's not an argument against anything in the Newcomb problem; that's just denying the setup itself.

It's not so much people arguing against this as being confused about how to explain backwards causation. So like tying up loose ends.

Fair enough.  I do like the idea that counterfactuals are just as reasonable (and useful) for the past as for the future - it shouldn't matter whether it didn't happen or it won't happen.  

Well I've got another post arguing backwards causation isn't necessarily absurd. But we don't need to depend on it.

I'm confused... What you call the "Pure Reality" view seems to work just fine, no? (I think you had a different name for it, pure counterfactuals or something.) What do you need counterfactuals/Augmented Reality for? Presumably making decisions thanks to "having a choice" in this framework, right? In the pure reality framework, in the "student and the test" example, one would dispassionately calculate what kind of a student algorithm passes the test, without talking about making a decision to study or not to study. Same with Newcomb's, of course: one just looks at what kind of agents end up with a given payoff. So... why pick an AR view over the PR view, what's the benefit?

Excellent question. Maybe I haven't framed this well enough.

We need a way of talking about the fact that both your outcome and your action are fixed by the past.

We also need a way of talking about the fact that we can augment the world with counterfactuals (Of course, since we don't have complete knowledge of the world, we typically won't know which is the factual and which are the counterfactuals).

And that these are two distinct ways of looking at the world.

I'll try to think about a cleaner way of framing this, but do you have any suggestions?

(For the record, the term I used before was Raw Counterfactuals - meaning consistent counterfactuals - and that's a different concept than looking at the world in a particular way).

(Something that might help is that if we are looking at multiple possible pure realities, then we've introduced counterfactuals as only one is true and "possible" is determined by the map rather than the territory)

I think the best way to explain this is to characterise the two views as slightly different functions, both of which return sets. Of course, the exact type representation isn't the point. Instead, the types are just there to illustrate the difference between two slightly different concepts.

possible_world_pure() returns {x} where x is either <study & pass> or <beach & fail>, but we don't know which one it will be

possible_world_augmented() returns {<study & pass>, <beach & fail>}

Once we've defined possible worlds, it naturally provides us a definition of possible actions and possible outcomes that matches what we expect. So for example:

size(possible_world_pure()) = size(possible_action_pure()) = size(possible_outcome_pure()) = 1

size(possible_world_augmented()) = size(possible_action_augmented()) = size(possible_outcome_augmented()) = 2

And if we have a decide function that iterates over all the counterfactuals in the set and returns the one with the highest utility, we need to call it on possible_world_augmented() rather than possible_world_pure().

Note that they aren't always this similar. For example, for Transparent Newcomb they are:

possible_world_pure() returns {<1-box, million>}

possible_world_augmented() returns {<1-box, million>, <2-box, thousand>}

The point is that if we remain conscious of the type differences then we can avoid certain errors.

For example, possible_outcome_pure() = {"PASS"} doesn't mean that possible_outcome_augmented() = {"PASS"}. It's the latter which would imply it doesn't matter what the student does, not the former.
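The type distinction in this comment can be written out as a small Python sketch for the student example. Everything here is illustrative rather than definitive: the `World` tuple, the utility numbers, and the choice of <study & pass> as the factual world are assumptions made up for the example, and in practice we would not know which world is the factual one.

```python
from typing import Callable, FrozenSet, NamedTuple


class World(NamedTuple):
    action: str
    outcome: str


STUDY = World("study", "pass")
BEACH = World("beach", "fail")

# Assumption for illustration only: the factual world is <study & pass>.
FACTUAL = STUDY


def possible_world_pure() -> FrozenSet[World]:
    """Pure view: a singleton containing only the factual world."""
    return frozenset({FACTUAL})


def possible_world_augmented() -> FrozenSet[World]:
    """Augmented view: the factual plus the constructed counterfactual."""
    return frozenset({STUDY, BEACH})


def possible_actions(worlds: FrozenSet[World]) -> FrozenSet[str]:
    return frozenset(w.action for w in worlds)


def possible_outcomes(worlds: FrozenSet[World]) -> FrozenSet[str]:
    return frozenset(w.outcome for w in worlds)


def utility(world: World) -> float:
    # Hypothetical utilities: passing is worth 10, studying costs 1.
    gain = 10.0 if world.outcome == "pass" else 0.0
    cost = 1.0 if world.action == "study" else 0.0
    return gain - cost


def decide(worlds: FrozenSet[World],
           score: Callable[[World], float]) -> str:
    """Return the action taken in the highest-scoring world of the set."""
    return max(worlds, key=score).action
```

Calling `decide(possible_world_pure(), utility)` is legal but trivial, since the set has one element; the comparison only does any work on `possible_world_augmented()`. Likewise, `possible_outcomes(possible_world_pure())` being a singleton says nothing about the size of `possible_outcomes(possible_world_augmented())`.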

Hmm, it sort of makes sense, but possible_world_augmented() returns not just a set of worlds, but a set of pairs, (world, probability). For example for the transparent Newcomb's you get possible_world_augmented() returns {(<1-box, million>, 1), (<2-box, thousand>, 0)}. And that's enough to calculate EV, and conclude which "decision" (i.e. possible_world_augmented() given decision X) results in maxEV. Come to think of it, if you tabulate this, you end up with what I talked about in that post.
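One possible reading of the (world, probability) suggestion is sketched below in Python. It is only a sketch under stated assumptions: the `accuracy` parameter (the chance the predictor is right) is hypothetical and not part of the comment, the function is given a `decision` argument to model "possible_world_augmented() given decision X", and the payoffs are the usual million and thousand.

```python
from typing import Dict, List, Tuple

# ((action, payoff), probability)
WeightedWorld = Tuple[Tuple[str, int], float]


def possible_world_augmented(decision: str,
                             accuracy: float = 1.0) -> List[WeightedWorld]:
    """Worlds compatible with `decision`, each paired with a probability.

    `accuracy` is a hypothetical parameter: the chance the predictor is right.
    """
    if decision == "1-box":
        return [(("1-box", 1_000_000), accuracy),
                (("1-box", 0), 1.0 - accuracy)]
    if decision == "2-box":
        return [(("2-box", 1_000), accuracy),
                (("2-box", 1_001_000), 1.0 - accuracy)]
    raise ValueError(f"unknown decision: {decision}")


def expected_value(worlds: List[WeightedWorld]) -> float:
    """Probability-weighted sum of payoffs over the given worlds."""
    return sum(payoff * prob for (_, payoff), prob in worlds)


def max_ev_decision(accuracy: float = 1.0) -> Tuple[str, Dict[str, float]]:
    """Tabulate the EV of each decision and return the best one with the table."""
    table = {d: expected_value(possible_world_augmented(d, accuracy))
             for d in ("1-box", "2-box")}
    return max(table, key=table.get), table
```

With the default perfect predictor, the off-diagonal worlds get probability 0, which matches the pairs written out in the comment, and the table favours 1-boxing. Lowering `accuracy` far enough flips the result, which is one way to see that the conclusion comes from the probabilities attached to the worlds rather than from the set of worlds alone.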