Newcomb's Problem is effectively a problem about pre-commitment. Everyone agrees that if you have the opportunity to pre-commit in advance of Omega predicting you, then you ought to. The only question is what you ought to do if you either failed to do this or weren't given the opportunity to do this. LW-style decision theories like TDT or UDT say that you should act as though you are pre-committed, while Causal Decision Theory says that it's too late.

Formal pre-commitments include things like rewriting your code, signing a legally binding contract, or providing assets as security. If set up correctly, they ensure that a rational agent actually keeps their end of the bargain. Of course, an irrational agent may still break their end of the bargain.

Effective pre-commitment describes any situation where an agent must (in the logical sense) necessarily perform an action in the future, even if there is no formal pre-commitment. If libertarian free will were to exist, then no one would ever be effectively pre-committed, but if the universe is deterministic, then we are effectively pre-committed to any choice that we make (quantum mechanics effectively pre-commits us to particular probability distributions, rather than individual choices, but for purposes of simplicity we will ignore this here and just assume straightforward determinism). This follows straight from the definition of determinism (more discussion about the philosophical consequences of determinism in a previous post).

One reason why this concept seems so weird is that there's absolutely no need for an agent that's effectively pre-committed to know that it is pre-committed until the exact moment when it locks in its decision. From the agent's perspective, it magically turns out to be pre-committed to whatever action it chooses; the truth, however, is that the agent was always pre-committed to this action, just without knowing.

Much of the confusion about pre-commitment is about whether we should be looking at formal or effective pre-commitment. Perfect predictors only care about effective pre-commitment; for them formalities are unnecessary and possibly misleading. However, human-level agents tend to care much more about formal pre-commitments. Some people, like detectives or poker players, may be really good at reading people, but they're still nothing compared to a perfect predictor and most people aren't even this good. So in everyday life, we tend to care much more about formal pre-commitments when we want certainty.

However, Newcomb's Problem explicitly specifies a perfect predictor, so we shouldn't be thinking about human-level predictors. In fact, I'd say that some of the emphasis on formal pre-commitment comes from anthropomorphizing perfect predictors. It's really hard for us to accept that anyone or anything could actually be that good and that there's no way to get ahead of it.

In closing, differentiating the two kinds of pre-commitment really clarifies these kinds of discussions. We may not be able to go back into the past and pre-commit to a certain course of action, but we can take an action on the basis that it would be good if we had pre-committed to it and be assured that we will discover that we were actually pre-committed to it.


Not sure why people find Newcomb's problem so complicated, it is pretty trivial: you one-box: you win, you two-box: you lose. Doesn't matter when you feel you have made the decision, either, what matters is the decision itself. The confusion arises when people try to fight the hypothetical by assuming that an impossible world where they can fool a perfect predictor has a non-zero probability of becoming an actual world.

Well, the question is whether you're solely trying to figure out what you should do if you end up in Newcomb's Problem or whether you're trying to understand decision theory. For the former, perhaps the analysis is trivial, but for the latter, figuring out why and where your intuition goes wrong is important.

I am yet to see a problem where simple counting and calculating expected probabilities didn't give the optimal outcome. A lot of confusion stems from trying to construct causal graphs, or from trying to construct counterfactuals, something that has nothing to do with decisions.

Why don't you think counterfactuals have anything to do with decisions?

Let me give you an example from your own Evil Genie puzzle: There are only two possible worlds, the one where you pick rotten eggs, and the one where you have a perfect life. Additionally, in the one where you have the perfect life, there is a bunch of clones of you who are being tortured. The clones may hallucinate that they have the capability of deciding, but, by the stipulation in the problem, they are stuck with your heartless decision. So, depending on whether you care about the clones enough, you "decide" on one or the other. There are no counterfactuals needed.

Yes, I am so happy to see someone else mentioning Evil Genie! That said, it doesn't quite work that way. They freely choose that option, it is just guaranteed to be the same choice as yours. "So, depending on whether you care about the clones enough" - well you don't know if you are a clone or an individual.

"They freely choose that option, it is just guaranteed to be the same choice as yours."

That is where we part ways. They think they choose freely, but they are hallucinating that. There is no world where this freedom is expressed. The same applies to the original, by the way. Consider two setups, the original and the one where you (the original), and your clones are told that they are clones before ostensibly making the choice. By the definition of the problem, the genie knows your decision in advance, and, since the clones have been created, it is to choose the perfect life. Hence, regardless of whether you are told that you are a clone, you will still "decide" to pick the perfect life.

The sooner you abandon the self-contradictory idea that you can make decisions freely in a world with perfect predictors, the sooner the confusion about counterfactuals will fade away.

I wasn't claiming the existence of libertarian free will. Just that the clone's decision is no less free than yours.

My guess is that the thing you think is being hallucinated is not the thing your interlocutors refer to (in multiple recent conversations). You should make some sort of reference that has a chance of unpacking the intended meanings, giving the conversations more of a margin above going from the use of phrases like "freely choose" to conviction about what others mean by that, and about what others understand you to mean by that.

I agree with that, but the inferential distance seems too large. When I explain what I mean (there is no such thing as making a decision changing the actual world, except in the mind of an observer), people tend to put up a mental wall against it.

My point is that you seem to disagree in response to words said by others, which on further investigation turn out to have been referring to things you agree with. So the disagreeable reaction to words themselves is too trigger-happy. Conversely, the words you choose to describe your own position ("there is no such thing as making a decision...") are somewhat misleading, in the sense that their sloppy reading indicates something quite different from what you mean, or what should be possible to see when reading carefully (the quote in this sentence is an example, where the ellipsis omits the crucial detail, resulting in something silly). So the inferential distance seems mostly a matter of inefficient communication, not of distance between ideas themselves.

Thanks, it's a good point! I appreciate the feedback.

For the record, I actually agree that: "there is no such thing as making a decision changing the actual world, except in the mind of an observer" and made a similar argument here:

Just reread it. Seems we are very much on the same page. What you call timeless counterfactuals I call possible worlds. What you call point counterfactuals are indeed just mental errors, models that do not correspond to any possible world. In fact, my post makes many of the same points.

Counterfactuals are about the state of mind of the observer (commonly known as the agent), and thus are no more special than any other expected utility calculation technique. When do you think counterfactuals are important?

"Counterfactuals are about the state of mind of the observer" - I agree. But my question was why you don't think that they have anything to do with decisions?

"When do you think counterfactuals are important?"

When choosing the best counterfactual gives us the best outcome.

Maybe we have different ideas about what counterfactuals are. What is your best reference for this term as people here use it?

An imaginary world representing an alternative of what "could have happened"

Ah, so about a different imaginable past? Not about a different possible future?

A different imaginable timeline. So past, present and future

Ah. I don't quite understand the "different past" thing, at least not when the past is already known. One can say that imagining a different past can be useful for making better decisions in the future, but then you are imagining a different future in a similar (but not identical in terms of a microstate) setup, not a different past.

The past can't be different, but the "past" in a model can be.

No, it cannot. What you are doing in a self-consistent model is something else. As jessicata and I discussed elsewhere on this site, what we observe is a macrostate, and there are many microstates corresponding to the same macrostate. The "different past" means a state of the world in a different microstate than in the past, while in the same macrostate as in the past. So there is no such thing as a counterfactual. The "would have been" means a different microstate. In that sense it is no different from the state observed in the present or in the future.

I have to admit that my intuition is that Omega is cheating, and somehow changing the box contents after my decision. CDT works fine in this case: one-box and take the money. I don't think I learn much by figuring out where my intuition is wrong, so I have to first break my intuition and believe in a perfect predictor, then figure out where that counterfactual intuition is wrong. At which point my head starts to hurt.

In a world with perfect behavioral predictions over human timescales, it's just silly to believe in simple free will. I don't think that is our world, but I also don't think it's resolvable by pure discussion.

"I have to admit that my intuition is that Omega is cheating, and somehow changing the box contents after my decision" - Well, if there's any kind of backwards causation, then you should obviously one-box.

"I don't think I learn much by figuring out where my intuition is wrong, so I have to first break my intuition and believe in a perfect predictor, then figure out where that counterfactual intuition is wrong. At which point my head starts to hurt" - it may help to imagine that you are submitting computer programs into a game. In this case, perfect prediction is possible, as the predictor has access to the agent's source code and can perfectly simulate the situation the agent will face.
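A minimal sketch of that game might look as follows. The dollar amounts are the conventional ones and are an assumption here, since nothing above states them; the names `play`, `one_boxer` and `two_boxer` are likewise hypothetical. The predictor "predicts" simply by running the submitted program before filling the boxes.

```python
from typing import Callable

# Assumed payoffs (not given in the discussion above).
BIG = 1_000_000   # opaque box, filled only if one-boxing was predicted
SMALL = 1_000     # transparent box, always present

# A submitted agent is a deterministic program returning its choice.
Agent = Callable[[], str]


def one_boxer() -> str:
    return "one-box"


def two_boxer() -> str:
    return "two-box"


def play(agent: Agent) -> int:
    """Run one round of the game against a simulating predictor."""
    # The predictor runs the agent's code ahead of time.
    prediction = agent()
    opaque = BIG if prediction == "one-box" else 0
    # The box contents are now fixed; the agent makes its actual choice.
    choice = agent()
    return opaque if choice == "one-box" else opaque + SMALL
```

Because the agent is deterministic, the second call cannot differ from the first. In that sense the program is effectively pre-committed before it "decides", and the one-boxing program walks away with more.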

Note that Newcomb's problem doesn't depend on perfect prediction – 90% or even 55% accurate Omega still makes the problem work fine (you might have to tweak the payouts slightly)

Sure, it's fine with even 1% accuracy with 1000:1 payout difference. But my point is that causal decision theory works just fine if Omega is cheating or imperfectly predicting. As long as the causal arrow isn't fully independent from prediction to outcome and decision to outcome, one-boxing is trivial.

If "access to my source code" is possible and determines my actions (I don't honestly know if it is), then the problem dissolves in another direction - there's no choice anyway, it's just an illusion.

"it’s fine with even 1% accuracy with 1000:1 payout difference."

Well, if 1% accuracy means 99% of one-boxers are predicted to two-box, and 99% of two-boxers are expected to one-box, you should two-box. The prediction needs to at least be correlated with reality.

Sorry, described it in too few words. "1% better than random" is what I meant. If 51.5% of two-boxers get only the small payout, and 51.5% of one-boxers get the big payout, then one-boxing is obvious.
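The arithmetic behind this exchange can be sketched in a few lines. This assumes the conventional payoffs of 1,000,000 and 1,000 (a 1000:1 ratio, but the amounts themselves are an assumption) and a predictor that is right with the same probability whichever choice is made; the function names are hypothetical.

```python
# Assumed payoffs (1000:1 ratio, amounts not given in the thread).
BIG = 1_000_000
SMALL = 1_000


def expected_payoffs(accuracy: float) -> tuple[float, float]:
    """Return (one-box EV, two-box EV) for a predictor that is
    correct with probability `accuracy` for either choice."""
    # One-boxers get the big prize only when predicted correctly.
    one_box = accuracy * BIG
    # Two-boxers get the big prize only when predicted incorrectly,
    # plus the small prize in every case.
    two_box = (1 - accuracy) * BIG + SMALL
    return one_box, two_box


def break_even_accuracy() -> float:
    """Accuracy p at which both choices have equal expected payoff.

    Solves p * BIG == (1 - p) * BIG + SMALL for p.
    """
    return 0.5 + SMALL / (2 * BIG)
```

Under these assumptions the break-even accuracy is 50.05%, so a predictor at 51.5% already favours one-boxing (about 515,000 against about 486,000), while a predictor that is right only 1% of the time favours two-boxing, as noted above.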

In particular, I'd argue that the paradoxical aspects of Newcomb's problem result from exactly this kind of confusion between the usual agent idealization and the fact that actual actors (human beings) are physical beings subject to the laws of physics. The apparently paradoxical aspects result because we are used to idealizing individual behavior in terms of agents, where that formalism requires we specify the situation in terms of a tree of possibilities, with each path corresponding to an outcome and with the payoff computed by looking at the path specified by all the agents' choices (e.g. there is a node where the demon player chooses what money to put in the boxes and then there is a node where the human player, without knowledge of the demon player's choices, decides to take both boxes or only one). The agent formalization (where 1 or 2 boxing is modeled as a subsequent choice) simply doesn't allow the content of the boxes to depend on whether or not the human agent chooses 1 or 2 boxes.
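To make the tree model concrete, here is a small sketch under assumed payoffs (1,000,000 and 1,000, which the comment does not specify) with hypothetical function names. It treats the demon's node as already resolved when the human moves and compares the two choices with the box contents held fixed.

```python
# Assumed payoffs (not given in the comment above).
BIG = 1_000_000
SMALL = 1_000


def payoff(opaque_filled: bool, choice: str) -> int:
    """Payoff at a leaf of the tree: the demon's move, then the human's."""
    opaque = BIG if opaque_filled else 0
    return opaque if choice == "one-box" else opaque + SMALL


def two_boxing_dominates() -> bool:
    """Check dominance with the contents fixed independently of the choice."""
    return all(
        payoff(filled, "two-box") > payoff(filled, "one-box")
        for filled in (True, False)
    )
```

Within this formalization two-boxing comes out ahead at every node, precisely because the model has no way of letting `opaque_filled` vary with `choice`.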

Of course, since actual people aren't ideal agents, one can argue that something like the Newcomb demon is physically possible, but that's just a way of specifying that we are in a situation where the agent idealization breaks down.

This means there is simply no fact of the matter about how a rational agent (or whatever) should behave in Newcomb-type situations, because the (usual) rational agent idealization is incompatible with the Newcomb situation (OK, more technically you can model it that way, but the choice of how to model it just unsatisfactorily builds in the answer by specifying how the payoff depends on one- vs. two-boxing).

To sum up, what the answer to Newcomb's problem is depends heavily on how you precisify the question. Are you asking whether humans who are disposed to decide in way A end up better off than humans disposed to behave in way B? In that case it's easy. But things like CDT, TDT, etc. don't claim to be producing facts of that kind, but rather to be saying something about ideal rational agents of some kind, which then just boringly depends on ambiguities in what we mean by ideal rational agents.

"This means there is simply no fact of the matter about how a rational agent (or whatever) should behave in newcomb type situations" - this takes this critique too far. Just because the usual agent idealisation breaks, it doesn't follow that we can't create a new idealisation that covers these cases.

Obviously you can, and if you define that NEW idealization as an X-agent (or, more likely, redefine the word rationality in that situation), then there may be a fact of the matter about how an X-agent will behave in such situations. What we can't do is assume that there is a fact of the matter about what a rational agent will do that outstrips the definition.

As such it doesn't make sense to say CDT is right or TDT or whatever before introducing a specific idealization relative to which we can prove they give the correct answer. But that idealization has to come first and has to convince the reader that it is a good idealization.

But the rhetoric around these decision theories misleadingly tries to convince us that there is some kind of pre-existing notion of rational agent and that they have discovered that XDT gives the correct answer for that notion. That's what makes people view these claims as interesting. If the claim was nothing more than "here is one way you can make decisions corresponding to the following assumptions" it would be much more obscure and less interesting.

There are pre-formal facts about what words should mean, or what meanings to place in the context where these words may be used. You test a possible definition against the word's role in the story, and see if it's apt. This makes use of facts outside any given definition, just as with the real world.

And here, it's not even clear what the original definitions of agents should be capable of, if you step outside particular decision theories and look at the data they could have available to them. Open source game theory doesn't require anything fundamentally new that a straightforward idealization of an agent won't automatically represent. It's just that the classical decision theories will discard that data in their abstraction of agents. In Newcomb's problem, it's essentially discarding part of the problem statement, which is a strange thing to expect of a good definition of an agent that needs to work on the problem.

Except that, if you actually go and do the work, people's pre-theoretic understanding of rationality doesn't correspond to a single precise concept.

Once you step into Newcomb-type problems, it's no longer clear how decision theory is supposed to correspond to the world. You might be tempted to say that decision theory tells you the best way to act... but it no longer does that, since it's not that the two-boxer should have picked one box. The two-boxer was incapable of so picking, and what EDT is telling you is something more like: you should have been the sort of being who would have been a one-boxer, not that *you* should have been a one-boxer.

Different people will disagree over whether their pre-theoretic notion of rationality is one in which it is correct to say that it is rational to be a one/two boxer. Classic example of working with an imprecisely defined concept.

I'd argue that nobody cares about formal pre-commitments, except to the extent that formality increases knowledge of effective pre-commitments.

I've seen plenty of cases where people talk about pre-commitment as though you are only pre-committed if you are formally pre-committed. However, maybe this is just an assumption that they didn't examine too hard. But perhaps it would have been better to call this Legible vs. Effective pre-commitments.

In many discussions, "effective pre-commitment" is more simply described as "commitment". Once you're talking about pre- something, you're already in the realm of theory and edge cases.

There _is_ a fair bit of discussion about pre-commitment as signaling/negotiation theory rather than as decision theory. In this case, it's the appearance of the commitment, not the commitment itself that matters.

typo: "Casual Decision Theory"

True Path has already covered it (or most of it) extensively, but both Newcomb's Problem and the distinction made in the post (if it were to be applied in a game theory setting) contain too many inherent contradictions and do not seem to actually point out anything concrete.

You can't talk about decision-making agents if they are basically not making any decisions (classical determinism, or effective precommitment in this case, enforces that). Also, you can't have a 100% accurate predictor and have freedom of choice on the other hand, because that implies (at the very least) that the subset of phenomena in the universe that govern your decision is deterministic.

[Plus, even if you have a 99.9999... (... meaning some large N times 9, not infinity) percent accurate predictor, if Newcomb's problem assumes perfect rationality, there's really no paradox.

I think what this post exemplifies (and perhaps that was the intent from the get-go and I just completely missed it) is precisely that Newcomb's is ambiguous about the type of precommitment taken (which follows from it being ambiguous about how Omega works) and therefore is sort of self-contradictory, and not truly a paradox.]

"Also, you can't have a 100% accurate predictor and have freedom of choice on the other hand" - yes, there is a classic philosophical argument that claims determinism means that we don't have libertarian free will and I agree with that.

"You can't talk about decision-making agents if they are basically not making any decisions" - My discussion of the student and the exam in this post may help clear things up. Decisions don't require you to have multiple things that you could have chosen as per the libertarian free will model, but simply require you to be able to construct counterfactuals. Alternatively, this post by Anna Salamon might help clarify how we can do this.

It doesn't really make sense to talk about the agent idealization at the same time as talking about effective precommitment (i.e. deterministic/probabilistic determination of actions).

The notion of an agent is an idealization of actual actors in terms of free choices, e.g., idealizing individuals in terms of choices of functions on game theoretic trees. This is an incompatible idealization with thinking of such actors as being deterministically or probabilistically committed to actions for those same 'choices.'

Of course, ultimately, actual actors (e.g. people) are only approximated by talk of agents, but if you try to simultaneously use the agent idealization while regarding those *same* choices as being effectively precommitted, you risk contradiction and model absurdity (of course you can decide to reduce the set of actions you regard as free choices in the agent idealization, but that doesn't seem to be the way you are talking about things here).

What do you mean by agent idealization? That seems key to understanding your comment, which I can't follow at the moment.

EDIT: Actually, I just saw your comment above. I think TDT/UDT show how we can extend the agent idealization to cover these kinds of situations so that we can talk about both at the same time.

To the extent they define a particular idealization, it's one which isn't interesting/compelling. To say there was a well-defined question here, one would want a single definition of what a rational agent is that everyone agreed on, which one could then show favored such and such decision theory.

To put the point differently, you and I can agree on absolutely every fact about the world and mathematics and yet disagree about which is the best decision theory, because we simply mean slightly different things by rational agent. Moreover, there is no clear practical difference which presses us to use one definition or another, in the way that practical usefulness does for the aspects of the definition of rational agent which yield the outcomes that all the theories agree on.

"Some people, like detectives or poker players, may be really good at reading people, but they're still nothing compared to a perfect predictor and most people aren't even this good."

Which is why the notion of trust exists - if someone always does something (they're never late), we figure they will probably continue to do so. The longer a pattern holds, the more credence we lend to it continuing (especially if it's very consistent).