Newcomb’s problem is a famous paradox in decision theory. The simple version is as follows:
There is a reliable predictor, a player, and two boxes designated A and B. The player is given a choice between taking only box B or taking both boxes A and B. The player knows the following:
- Box A is transparent and always contains a visible $1,000.
- Box B is opaque and its content has already been set by the predictor:
  - If the predictor has predicted that the player will take both boxes A and B, then box B contains nothing.
  - If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.
The player does not know what the predictor predicted or what box B contains while making the choice.
The argument in favor of “one-boxing” is that one-boxers systematically get more money—only one-boxers get the $1,000,000.
The argument in favor of “two-boxing” is that, at the point in time when you’re faced with the choice, nothing you can do will change the amount of money in the boxes. The predictor has already put the money in. And so taking both must be strictly better.
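To make the two arguments concrete, here is a minimal sketch (assuming, for simplicity, that the "reliable" predictor is always right; the function and variable names are my own):

```python
# Payoff in dollars as a function of your choice and what is already in box B.
def payoff(choice: str, box_b: int) -> int:
    return box_b + (1_000 if choice == "two-box" else 0)

# The one-boxer's argument: with an accurate predictor, box B's contents track
# your choice, so one-boxers walk away with far more money.
print(payoff("one-box", box_b=1_000_000))  # 1000000
print(payoff("two-box", box_b=0))          # 1000

# The two-boxer's argument: whatever is already in box B, taking both boxes is
# worth exactly $1,000 more.
for box_b in (0, 1_000_000):
    print(payoff("two-box", box_b) - payoff("one-box", box_b))  # 1000 both times
```

The two calculations disagree only about what is held fixed: the predictor's reliability, or the contents of the box.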
My claim is that the fundamental paradox here is that the existence of the predictor is incompatible with free will. If a predictor can allocate money based on your predicted future actions, you don’t have complete freedom of choice, which causes confusion when thinking about making free decisions.
Newcomb’s Problem only looks like a paradox because people are trying to insert the pretense of free choice into a setup that denies that freedom. Perfect (or even robustly better-than-random) prediction of your act means you don’t have freedom over that act. Therefore, asking “What should I choose in Newcomb’s Problem right now?” is confused in the same way “What should the output bit of this already-wired circuit choose at timestamp t?” is confused.
A more general claim is: if something can predict your action with better than random accuracy no matter how hard you try to prevent it, you don’t have free will over that action. (I’m not addressing the question of whether free will exists in general, only whether a particular action is chosen freely.)
Some may ask: what if someone can predict your actions the second before you take them using a brain-scanning device? I think that’s irrelevant. For decision/game theory problems, it makes sense to discretize time. Each time step is a point when at least one of the agents can make a decision. If an agent has information about another agent’s future decision no matter what the other agent’s strategy is, the other agent is not acting freely. In the case of the brain-scanner, I’d say “the second before” just reflects the fact that taking an action in the physical world takes >0 time, and once you’ve made the decision to start the action you are no longer free to reverse it. Further, for the purposes of what I’m saying, it’s not relevant whether free will generally exists or not, but rather only whether, in a particular situation, you’re freely choosing between some number of options.
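As a toy illustration of what it means for a predictor to beat any strategy you pick, here is a sketch (entirely my own construction, not part of the standard problem) in which the predictor works by simply running a copy of the agent’s decision procedure:

```python
import random

# Toy predictor: simulate the agent's decision function and use the result
# as the prediction.
def predict(decision_fn) -> str:
    return decision_fn()

def one_boxer() -> str:
    return "one-box"

def two_boxer() -> str:
    return "two-box"

def coin_flipper() -> str:
    return random.choice(["one-box", "two-box"])

for agent in (one_boxer, two_boxer, coin_flipper):
    trials = 10_000
    hits = sum(predict(agent) == agent() for _ in range(trials))
    print(agent.__name__, hits / trials)
# Any deterministic policy is predicted perfectly; only genuine randomness
# (coin_flipper) drags this toy predictor down to ~50% accuracy. Newcomb-style
# setups stipulate a predictor that stays better than random even then, which
# is exactly the condition under which, on the claim above, you lack freedom
# over the act.
```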
A couple of examples illustrate the principle. Newcomb’s problem itself is one; for another, consider the following problem:
You’ve been sorted into a group of people based on your personality, which was determined by an algorithm that observed your behavior for many years. The algorithm either identified you as a “cooperator”, in which case it put you in a group with other cooperators, or as a “defector”, in which case it put you in a group with other defectors[1]. The algorithm is known to be very accurate. Inside the group, a Prisoner’s Dilemma is arranged. Without talking, you must decide to either “cooperate” or “defect”. If everyone in the group cooperates, you all win $100. If some but not all people defect, anyone who didn’t defect gets nothing, and the defectors each get $200. If everyone defects, no-one gets any money at all.
This problem is similar to Newcomb’s problem in that if you think only about the causal impact of your decision, you should always defect. Defecting never gets you less money: either you’ve been sorted into the defectors group, in which case you get nothing either way (because everyone else will defect), or you’ve been sorted into the cooperators group, in which case defecting gets you $200 instead of $100.
The argument in favor of cooperating is that it’s clearly better to be a cooperator. The cooperator groups get $100 each, while the defector groups get nothing.
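Under the assumption that the sorting algorithm is essentially always right (so everyone else in your group plays their type), a minimal sketch of the payoffs looks like this (names are my own):

```python
# Your payoff in the sorted Prisoner's Dilemma, given what the others in your
# group do and what you do, using the dollar amounts from the setup above.
def my_payoff(others: str, me: str) -> int:
    if me == "defect":
        # Defectors get $200 only if someone else cooperated; $0 if all defect.
        return 200 if others == "cooperate" else 0
    # Cooperators get $100 only if everyone cooperated, else nothing.
    return 100 if others == "cooperate" else 0

for group, others in (("cooperator group", "cooperate"), ("defector group", "defect")):
    print(group,
          "| cooperate ->", my_payoff(others, "cooperate"),
          "| defect ->", my_payoff(others, "defect"))
# cooperator group | cooperate -> 100 | defect -> 200
# defector group   | cooperate -> 0   | defect -> 0
```

Within either group, defecting never pays less, yet only the cooperator groups ever see any money.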
But the issue here is that the whole setup presumes that you can’t systematically fool the algorithm. If you had free will, your choice to defect or cooperate in the scenario could be completely decoupled from your prior behavior. Thus, in the moment, the possibilities would be as follows:
- You were sorted with the cooperators and you cooperate: everyone cooperates, and you get $100.
- You were sorted with the cooperators and you defect: you get $200.
- You were sorted with the defectors and you cooperate: everyone else defects, and you get nothing.
- You were sorted with the defectors and you defect: everyone defects, and no-one gets anything.
As soon as you say something like “the algorithm would have predicted you’d change your tendency in the moment” you are positing a world where people cannot change what type of person they are in the moment; a world without free choice. For in a world with free choice (for these actions), you can be a perfect cooperator type until the last minute and then switch to become a defector.
Of course, in a world full of utility-maximizing freely-choosing agents, the algorithm would be best off predicting that everyone is a defector, which is not good for you. This leads us to the topic of pre-commitment.
Some people’s answer to Newcomb’s Problem is that:
- if you can decide in advance, you should commit to one-boxing;
- if you’re already in the moment, with no commitment in place, you should two-box.
In other words, it’s “just a standard time consistency problem”, as Basil Halperin writes. Quoting from Basil’s post:
> So to summarize, what’s the answer to, “Should you one-box or two-box?”?
>
> The answer is, it depends on from which point in time you are making your decision. In the moment: you should two-box. But if you’re deciding beforehand and able to commit, you should commit to one-boxing.
>
> How does this work out in real life? In real life, you should – right now, literally right now – commit to being the type of person who if ever placed in this situation would only take the 1-box. Impose a moral code on yourself, or something, to serve as a commitment device. So that if anyone ever comes to you with such a prediction machine, you can become a millionaire 😊.
>
> This is of course what’s known as the problem of “time consistency”: what you want to do in the moment of choice is different from what you-five-minutes-ago would have preferred your future self to do. Another example would be that I’d prefer future-me to only eat half a cookie, but if you were to put a cookie in front of me, sorry past-me but I’m going to eat the whole thing.
>
> Thus my claim: Newcomb merely highlights the issue of time consistency.
>
> So why does Newcomb’s problem produce so much confusion? When describing the problem, people typically conflate and confuse the two different points in time from which the problem can be considered. In the way the problem is often described, people are – implicitly, accidentally – jumping between the two different points of view, from the two different points in time. You need to separate the two possibilities and consider them separately. I have some examples in the appendix at the bottom of this type of conflation.
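To put a number on the “decide beforehand” case: a quick sketch (p is my own parameter for the predictor’s accuracy; it isn’t specified in the problem) shows how little accuracy is needed before committing to one-box pays off.

```python
# Expected payoffs as a function of the predictor's accuracy p (illustrative).
def ev_commit_one_box(p: float) -> float:
    # You get $1,000,000 exactly when the predictor correctly foresees one-boxing.
    return p * 1_000_000

def ev_two_box(p: float) -> float:
    # You always get the $1,000, plus $1,000,000 only when the predictor is wrong.
    return (1 - p) * 1_000_000 + 1_000

for p in (0.50, 0.5005, 0.51, 0.99):
    print(p, ev_commit_one_box(p), ev_two_box(p))
# Committing to one-box pulls ahead as soon as p > 0.5005, i.e. the predictor
# only needs to be slightly better than a coin flip.
```

This is the sense in which even a “robustly better-than-random” predictor is enough to make the advance commitment worthwhile.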
The only problem with this line of thinking is that true pre-commitment is very difficult. I agree that one should pre-commit to one-boxing, if such an option is available, but how? Simply saying “I pre-commit” doesn’t work. Furthermore, if you successfully pre-committed to one-boxing, there’s no choice to make. And so it’s no longer a “decision problem” in the intuitive sense.
Some claim that the correct strategy is to consistently do things they would have pre-committed to, so that they are treated and modeled by others as cooperators/one-boxers. However, this only makes sense under two conditions:
- others really are observing and modeling you, and will treat you better if they model you as a cooperator/one-boxer; and
- you couldn’t do just as well by only pretending to be that type whenever you’re being observed.
Of course, whether or not we have some free will, we are not entirely free—some actions are outside of our capability. Being sufficiently good at deception may be one of these. Hence why one might rationally decide to always be honest and cooperative—successfully only pretending to be so when others are watching might be literally impossible (and messing up once might be very costly).
The notion of pre-commitment also highlights how free will is central to the decision to one- or two-box.
You could split the problem into “what happens if you have free will” vs. “what happens if you don’t have free will”:
- If you do have free will over the choice, a reliable predictor of that choice can’t exist; in the moment nothing you do changes what’s in the boxes, and two-boxing dominates.
- If you don’t have free will over the choice, there is nothing to deliberate about; you can only hope you’re the kind of agent that was always going to one-box.
Michael Huemer writes about why two-boxing is correct in his post “The Solution to Newcomb’s Paradox”:

> [H]aving a goal does not (intrinsically) give you a reason to give yourself evidence that the goal will be satisfied; it gives you a reason to cause the goal to be satisfied.
>
> Since you cannot change the past, the correct EU calculation is to treat the past as fixed, and calculate EU given each possible past state of the world. Then take an average of these EU values, weighted by your credence in each possible past state. This way of doing the calculation necessarily preserves the results of dominance reasoning.
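As a quick check of that calculation style, here is a sketch in my own notation, with q standing for your credence that box B already contains the million:

```python
# Expected utility with the past held fixed, averaged over your credence q
# that box B already contains $1,000,000 (the calculation style quoted above).
def payoff(choice: str, box_b: int) -> int:
    return box_b + (1_000 if choice == "two-box" else 0)

def eu(choice: str, q: float) -> float:
    return q * payoff(choice, 1_000_000) + (1 - q) * payoff(choice, 0)

for q in (0.0, 0.3, 0.9, 1.0):
    print(q, eu("two-box", q) - eu("one-box", q))
# The gap is $1,000 for every credence q: the calculation reproduces dominance
# reasoning, exactly as the quoted passage says.
```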
The only issue is that, as soon as we speak of “causing a goal to be satisfied”, we’re presuming freedom of choice. But Newcomb’s problem, as generally posed, assumes you are not free.
The stubborn Rationalist repeats: “But one-boxing will leave me richer, and so I will choose to one-box.” Yet you can’t change the payouts in the moment. Insofar as you can choose anything, you’re only choosing between $X and $X + $1,000. (I’m talking about the instantaneous Newcomb’s problem, i.e. you are deciding now, not “what would you decide in advance” or “what would you commit to”.) If you’re simply saying “Because of my nature, I have already successfully pre-committed to one-boxing”, then you are saying “I don’t have free will here; I have to one-box, because I’ve cut off my other option.” And this is valid, but then you’re not “choosing”. (And please explain how you’ve committed.)
In conclusion, if you find yourself freely choosing between options, it’s rational to take a dominating strategy, like two-boxing in Newcomb’s problem or defecting in the sorted prisoner’s dilemma. However, when you have the opportunity to genuinely pre-commit to decisions that lead to better outcomes conditional on that commitment, you should take it. The most interesting thing about Newcomb’s problem is that it demonstrates that the capacity to make decisions can itself be a disadvantage in future situations. You don’t have free will in Newcomb’s problem as posed, so you’d better hope you’re destined to one-box. But if you do have free will (for example, because they lied to you about Omega, the predictor, or because you’re the only guy in the universe with free will), you may as well choose to get the extra $1,000!
[1] The same scenario can be described as “you were put in a group with people who have a correlated personality type”.