I’ve finished reading through Functional Decision Theory: A New Theory of Instrumental Rationality, and in my view the main defense of causal decision theory (CDT) agents in Newcomb’s Problem is not well addressed. As the authors state in the paper, the standard defense of two-boxing is that Newcomb’s Problem explicitly punishes rational decision theories and rewards others. The paper then rejects this defense by saying: “In short, Newcomb’s Problem doesn’t punish rational agents; it punishes two-boxers”.
It is true that FDT agents will “win” at Newcomb’s Problem, but what the paper doesn’t address is that it is quite easy to come up with similar situations where FDT agents are punished and CDT agents are not. A simple example of this is something I’ll call the Inverse Newcomb’s Problem since it only requires changing a few words relative to the original.
Inverse Newcomb’s Problem:
An agent finds herself standing in front of a transparent box labeled “A” that contains $1,000, and an opaque box labeled “B” that contains either $1,000,000 or $0. A reliable predictor, who has made similar predictions in the past and been correct 99% of the time, claims to have placed $1,000,000 in box B iff she predicted that the agent would two-box in the standard Newcomb’s problem. The predictor has already made her prediction and left. Box B is now empty or full. Should the agent take both boxes, or only box B, leaving the transparent box containing $1,000 behind?
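The only difference relative to the original problem is the predictor’s condition for filling box B: she fills it iff she predicted that the agent would two-box in the standard Newcomb’s problem. The situation initially plays out the same, but just branches at the end: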
1. A reliable predictor (I’ll use the name Omega) predicts how the agent would respond in a standard Newcomb’s dilemma (“one-box” or “two-box”).
2. Armed with this prediction, Omega stands in front of box B and decides whether to place the $1,000,000 inside.
3. The $1,000,000 goes in the box iff:
a. The prediction was “one-box” (Standard Newcomb’s Problem)
b. The prediction was “two-box” (Inverse Newcomb’s Problem)
Because FDT agents would one-box in a standard Newcomb’s problem, their box B in the inverse problem is empty. In this inverse problem, there is no reason for FDT agents not to two-box, and they end up with $1,000. In contrast, because CDT agents would two-box in a standard Newcomb’s problem, their box B in the inverse problem has $1,000,000. They two-box in this case as well and end up with $1,001,000. In much the same way that Omega in the standard Newcomb’s problem “punishes two-boxers”, in the inverse problem it punishes one-boxers.
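To make the branching concrete, here is a minimal sketch of the payoff rule for both variants. The function name `payoff` and the string labels are hypothetical names I've chosen for illustration, and the sketch takes Omega's prediction as a given input rather than modeling how it is made.

```python
BOX_A = 1_000        # transparent box, always full
BOX_B = 1_000_000    # opaque box, full or empty depending on Omega

def payoff(variant, predicted_standard_choice, actual_choice):
    """Money collected in a single problem.

    variant: "standard" (rule 3a) or "inverse" (rule 3b)
    predicted_standard_choice: Omega's prediction of the agent's choice
        in the standard problem, "one-box" or "two-box"
    actual_choice: what the agent does in the problem it actually faces
    """
    if variant == "standard":
        b_full = predicted_standard_choice == "one-box"   # rule 3a
    elif variant == "inverse":
        b_full = predicted_standard_choice == "two-box"   # rule 3b
    else:
        raise ValueError(f"unknown variant: {variant}")
    box_b = BOX_B if b_full else 0
    # Two-boxing adds box A on top of whatever is in box B.
    return box_b + (BOX_A if actual_choice == "two-box" else 0)

# Inverse problem, assuming Omega predicted correctly:
fdt_inverse = payoff("inverse", "one-box", "two-box")   # FDT one-boxes in standard
cdt_inverse = payoff("inverse", "two-box", "two-box")   # CDT two-boxes in standard
print(fdt_inverse, cdt_inverse)
```

Under the assumption of a correct prediction, this gives the FDT agent $1,000 and the CDT agent $1,001,000 in the inverse problem, matching the figures above.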
One may try to argue that the Standard Newcomb’s problem is much more plausible than the inverse, but this does not hold up. The standard problem has a certain level of elegance and interest to it due to the dilemma it creates, and the fact that Omega has reproduced the scenario from its prediction, but this does not mean that it’s more likely. Whether Omega rewards standard one-boxers (3a) or two-boxers (3b) depends entirely on the motivations of this hypothetical predictor, and there is no principled way to argue that one is more likely than the other.
By the very nature of this situation, it is impossible for a single agent to get the $1,000,000 in both the Standard and Inverse Newcomb’s Problems; they are mutually exclusive. Both FDT and CDT agents are punished in one scenario and rewarded in the other, so to measure their relative performance in a fair manner, we can imagine that each agent gets presented with both the Standard and Inverse problems, and the total money earned is calculated:
The CDT agent will two-box in each case and earn $1,002,000 ($1,000 in the Standard problem and $1,001,000 in the Inverse problem).
The FDT agent will one-box in the Standard problem ($1,000,000) and two-box in the Inverse problem ($1,000), earning $1,001,000.
It is clear that when the agents are presented with the same sequence of two scenarios, one that rewards Standard one-boxers and the other that rewards Standard two-boxers, the CDT agent outperforms the FDT agent. By choosing to one-box in the Standard problem, the FDT agent leaves $1,000 on the table, while the CDT agent claims all the money that was available to them.
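The totals above treat Omega as always correct, while the problem statement gives it 99% accuracy. As a rough check, the sketch below computes expected totals for a given accuracy. It assumes (my assumption, not something stated in the problem) that the same accuracy applies to Omega's prediction of standard-problem behavior in both scenarios. The name `expected_total` is hypothetical, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

BOX_A = 1_000
BOX_B = 1_000_000

def expected_total(standard_choice, inverse_choice, accuracy):
    """Expected winnings over one Standard plus one Inverse problem.

    accuracy: probability that Omega correctly predicts the agent's
    choice in the standard problem (assumed equal in both scenarios).
    """
    # Probability that Omega's prediction is "one-box".
    if standard_choice == "one-box":
        p_pred_one_box = accuracy
    else:
        p_pred_one_box = 1 - accuracy

    # Standard: box B is full iff the prediction was "one-box".
    standard = p_pred_one_box * BOX_B
    if standard_choice == "two-box":
        standard += BOX_A

    # Inverse: box B is full iff the prediction was "two-box".
    inverse = (1 - p_pred_one_box) * BOX_B
    if inverse_choice == "two-box":
        inverse += BOX_A

    return standard + inverse

acc = Fraction(99, 100)
cdt = expected_total("two-box", "two-box", acc)   # two-boxes everywhere
fdt = expected_total("one-box", "two-box", acc)   # one-boxes only in Standard
print(cdt, fdt, cdt - fdt)  # prints: 1002000 1001000 1000
```

Under these assumptions the $1,000 gap does not depend on the accuracy: whatever box B contributes in expectation across the two problems is the same for both agents, and the only difference is the box A that the FDT agent leaves behind in the Standard problem.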
To conclude, FDT agents undoubtedly beat CDT agents at the Standard Newcomb’s problem, but that result alone does not settle the comparison between the two in general. This situation is one where a third agent (Omega) has explicitly chosen to reward one-boxers, but as shown above, it is simple enough to imagine an equally likely case where one-boxers are punished instead. When the total performance of both agents across the two equally likely scenarios is considered, CDT ends up with an extra $1,000. Based on this, I don’t see how FDT can be considered a strict advance over CDT.