Oct 23, 2012
I set out to understand precisely why naive TDT (possibly) fails the counterfactual mugging problem. While doing this I ended up drawing a lot of Bayes nets, and seemed to gain some insight; I'll pass these on, in the hopes that they'll be useful. All errors are, of course, my own.
First let's look at the problem that inspired all this research: the Newcomb problem. In this problem, a supremely-insightful-and-entirely-honest superbeing called Omega presents two boxes to you, and tells you that you can either choose box A only ("1-box"), or take box A and box B ("2-box"). Box B will always contain $1K (one thousand dollars). Omega has predicted what your decision will be, though, and if you decided to 1-box, he's put $1M (one million dollars) in box A; otherwise he's put nothing in it. The problem can be cast as a Bayes net with the following nodes:
"Your decision algorithm" (or your decision process) is the node that determines what you're going to decide. This leads to "Your decision" (1-box or 2-box) and Ω (puts $1M or zero in box A). These lead to the "Money" node, where you can end up with $1M+1K, $1M, $1K or $0 depending on the outputs of the other nodes. Note that, the way the network is set up, you can never have $1M+1K or $0 (since "Ω" and "Your decision" are not independent). But it is the implied "possibility" of getting those two amounts that causes causal decision theory to 2-box in the Newcomb problem.
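The dependency structure described above can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis: the function names (`your_decision`, `omega`, `money`) and the string labels are assumptions chosen to mirror the node names, and Omega is assumed to predict perfectly.

```python
# Hypothetical encoding of the Newcomb net; all names are illustrative.
ALGORITHMS = ("1-box", "2-box")  # possible outputs of "Your decision algorithm"

def your_decision(algorithm):
    # "Your decision" simply passes on what the algorithm outputs.
    return algorithm

def omega(algorithm):
    # Omega (assumed to be a perfect predictor) fills box A only for 1-boxers.
    return 1_000_000 if algorithm == "1-box" else 0

def money(decision, box_a):
    # Box B always holds $1K; it is collected only when 2-boxing.
    return box_a + (1_000 if decision == "2-box" else 0)

# Outcomes reachable when both child nodes depend on the same parent node.
reachable = {a: money(your_decision(a), omega(a)) for a in ALGORITHMS}

# Outcomes obtained if "Your decision" and Omega are varied independently.
independent = {money(d, b) for d in ALGORITHMS for b in (0, 1_000_000)}

# The amounts that exist only as implied "possibilities".
unreachable = independent - set(reachable.values())
```

Here `reachable` contains only $1M and $1K, while `unreachable` holds the two amounts ($1M+1K and $0) that appear only when the two nodes are treated as independent.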
In TDT, as I understand it, you sever your decision algorithm node from the history of the universe, and then pick the action that maximises your utility. (Note that this is incorrect, as explained here: in fact you condition on the start of your program, and screen out the history of the universe.)
But note that the graph is needlessly complicated: "Your decision" and "Ω" are both superfluous nodes that simply pass on their inputs to their outputs. Ignoring the "History of the Universe", we can reduce the net to a more compact (but less illuminating) form:
Here 1-box leads to $1M and 2-box leads to $1K. In this simplified version, the decision is obvious - maybe too obvious. The decision was entirely determined by the choice of how to lay out the Bayes net, and a causal decision theorist would disagree that the original "screened out" Bayes net was a valid encoding of the Newcomb problem.
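To see how much work the layout of the net does, the sketch below evaluates the same payoff rule under two layouts: one in which box A's content is tied to the decision algorithm, and one in which it is held fixed while the decision varies, which is roughly how a causal decision theorist might draw it. The function names and the `p_full` parameter (the probability assigned to box A being full) are hypothetical.

```python
def payoff(decision, box_a_full):
    # $1M if box A is full, plus $1K from box B when 2-boxing.
    return (1_000_000 if box_a_full else 0) + (1_000 if decision == "2-box" else 0)

def linked_value(decision):
    # Layout 1: box A's content is determined by the same algorithm as the decision.
    return payoff(decision, box_a_full=(decision == "1-box"))

def fixed_value(decision, p_full):
    # Layout 2: box A is full with probability p_full, whatever the decision is.
    return p_full * payoff(decision, True) + (1 - p_full) * payoff(decision, False)

DECISIONS = ("1-box", "2-box")

# Best decision under the linked layout.
best_linked = max(DECISIONS, key=linked_value)

# Best decision under the fixed layout, for several beliefs about box A.
best_fixed = {
    p: max(DECISIONS, key=lambda d: fixed_value(d, p))
    for p in (0.0, 0.5, 1.0)
}
```

Under the linked layout the best choice is to 1-box; under the fixed layout 2-boxing comes out $1K ahead for every value of `p_full`, so the disagreement lives entirely in which net is accepted as the right encoding.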
In the counterfactual mugging, Omega is back, this time explaining that he tossed a coin. If the coin came up tails, he would have asked you to give him $1K, giving nothing in return. If the coin came up heads, he would have given you $1M, but only if you would have given him the $1K in the tails world. That last fact he would have known by predicting your decision. Now Omega approaches you, telling you the coin was tails. What should you do? Here is a Bayes net with this information:
I've removed the "History of the Universe" node, as we are screening it off anyway. Here "Simulated decision" and "Your decision" will output the same decision on the same input. Ω will behave the way he said, based on your simulated decision given tails. "Coin" will output heads or tails with 50% probability, and "Tails" simply outputs tails, for use in Ω's prediction.
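As a rough sketch of this net (the function names are hypothetical, and Omega's prediction is assumed to be exact), "Simulated decision" and "Your decision" can be modelled as two calls to the same function, with the simulation always fed the constant output of the "Tails" node:

```python
def decision_algorithm(policy, coin):
    # Shared by "Your decision" and "Simulated decision": same input, same output.
    # A policy is "give" or "refuse"; money is only ever requested on tails.
    return policy == "give" and coin == "tails"

def omega_payout(policy):
    # Omega simulates you on the constant "tails" input to set the heads reward.
    simulated_gives = decision_algorithm(policy, "tails")
    return 1_000_000 if simulated_gives else 0

def money(policy, coin):
    if coin == "heads":
        return omega_payout(policy)
    # On tails you actually face the request, and lose $1K if you give.
    return -1_000 if decision_algorithm(policy, coin) else 0

# Payoff for every combination of policy and coin result.
table = {
    (policy, coin): money(policy, coin)
    for policy in ("give", "refuse")
    for coin in ("heads", "tails")
}
```

The resulting `table` has only three distinct payoffs, which suggests that most of the intermediate nodes can be collapsed without changing the decision problem.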
Again, this graph is very elaborate, codifying all the problem's intricacies. But most of the nodes are superfluous for our decision, and the graph can be reduced to:
"Coin" outputs "heads" or "tails" and "Your decision algorithm" outputs "Give $1K on tails" or "Don't give $1K on tails". Money is $1M if it receives "heads" and "Give $1K on tails", -$1K if it receives "tails" and "Give $1K on tails", and zero if receives "Don't give $1K on tails" (independent of the coin results).
If our utility does not go down too sharply in money, we should choose "Give $1K on tails", as a 50-50 bet on willing $1M and losing $1K is better than getting nothing with certainty. So precommitting to giving Omega $1K when he asks, leads to the better outcome.
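For a utility that is linear in money (an assumption; the argument only needs utility not to fall off too sharply), the comparison works out as follows. The variable names are illustrative.

```python
from fractions import Fraction

P_HEADS = Fraction(1, 2)  # fair coin

# Payoffs of each policy in each coin state, taken from the reduced net.
PAYOFF = {
    "Give $1K on tails": {"heads": 1_000_000, "tails": -1_000},
    "Don't give $1K on tails": {"heads": 0, "tails": 0},
}

def expected_money(policy):
    # Expected payoff before the coin is known, with utility linear in dollars.
    row = PAYOFF[policy]
    return P_HEADS * row["heads"] + (1 - P_HEADS) * row["tails"]

expected = {policy: expected_money(policy) for policy in PAYOFF}
best = max(expected, key=expected.get)
```

Under these assumptions the giving policy is worth $499,500 in expectation, against $0 for refusing.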
But now imagine that we are in the situation above: Omega has come to us and explained that yes, the coin has come up tails. The Bayes net now becomes:
In this case, the course is clear: "Give $1K on tails" does nothing but lose us $1K. So we should decide not to - and nowhere in this causal graph can we see any problem with that course of action.
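The reversal can be reproduced by evaluating the same payoffs with the coin node fixed at tails. This is a sketch assuming utility linear in money; the names are hypothetical, and `p_heads` is the probability assigned to heads.

```python
# Payoffs of each policy in each coin state, as in the reduced net.
PAYOFF = {
    "Give $1K on tails": {"heads": 1_000_000, "tails": -1_000},
    "Don't give $1K on tails": {"heads": 0, "tails": 0},
}

def value(policy, p_heads):
    # Expected payoff of a policy given the probability assigned to heads.
    row = PAYOFF[policy]
    return p_heads * row["heads"] + (1 - p_heads) * row["tails"]

def best_policy(p_heads):
    return max(PAYOFF, key=lambda policy: value(policy, p_heads))

before = best_policy(0.5)  # coin not yet observed
after = best_policy(0.0)   # Omega has told us the coin came up tails
```

The recommended policy flips from giving (`before`) to not giving (`after`) once the coin is known, even though nothing about the payoffs has changed.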
So it seems that naive TDT has an inconsistency problem: the action it recommends before the coin is known differs from the one it recommends after. And these graphs don't seem to encode the actual problem properly (i.e. that the action "Give $1K on tails" corresponds to situations where we truly believe that tails came up).
Some thoughts that occurred when formalising this problem: