This post is an attempt to refute an article offering a critique of Functional Decision Theory (FDT). If you’re new to FDT, I recommend reading this introductory paper by Eliezer Yudkowsky & Nate Soares (Y&S). The critique I attempt to refute can be found here: A Critique of Functional Decision Theory by wdmacaskill. I strongly recommend reading it before reading this response post.
The article starts with descriptions of Causal Decision Theory (CDT), Evidential Decision Theory (EDT) and FDT itself. In this post I’ll get right to the critique of FDT, which is the only part I’m discussing here.
“FDT sometimes makes bizarre recommendations”
The article claims “FDT sometimes makes bizarre recommendations”, and more specifically, that FDT violates guaranteed payoffs. The following example problem, called Bomb, is given to illustrate this remark:
“Bomb.
You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it.
A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.
You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?”
The article answers and comments on the answer as follows:
“The right action, according to FDT, is to take Left, in the full knowledge that as a result you will slowly burn to death. Why? Because, using Y&S’s counterfactuals, if your algorithm were to output ‘Left’, then it would also have outputted ‘Left’ when the predictor made the simulation of you, and there would be no bomb in the box, and you could save yourself $100 by taking Left. In contrast, the right action on CDT or EDT is to take Right.
The recommendation is implausible enough. But if we stipulate that in this decision-situation the decision-maker is certain in the outcome that her actions would bring about, we see that FDT violates Guaranteed Payoffs.”
I agree FDT recommends taking the left box. I disagree that it violates some principle every decision theory should adhere to. Left-boxing really is the right decision in Bomb. Why? Let’s ask ourselves the core question of FDT:
“Which output of this decision procedure causes the best outcome?”
The answer can only be left-boxing. As wdmacaskill says:
“…if your algorithm were to output ‘Left’, then it would also have outputted ‘Left’ when the predictor made the simulation of you, and there would be no bomb in the box, and you could save yourself $100 by taking Left.”
But since you already know the bomb is in Left, you could easily save your life by paying $100 in this specific situation, and that’s where our disagreement comes from. However, remember that if your decision theory makes you a left-boxer, you virtually never end up in the above situation! In 999,999,999,999,999,999,999,999 out of 1,000,000,000,000,000,000,000,000 cases, the predictor will have predicted that you left-box, letting you keep your life for free. As Vaniver says in a comment:
“Note that the Bomb case is one in which we condition on the 1 in a trillion trillion failure case, and ignore the 999999999999999999999999 cases in which FDT saves $100. This is like pointing at people who got into a plane that crashed and saying ‘what morons, choosing to get on a plane that would crash!’ instead of judging their actions from the state of uncertainty that they were in when they decided to get on the plane.”
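To make the expected-utility comparison behind this point concrete, here is a minimal sketch in Python. Only the 1-in-10^24 error rate comes from the problem statement; treating utility as linear in dollars and assigning burning to death a disutility of a million dollars are my own illustrative assumptions.

```python
# Compare the two *policies* in Bomb, evaluated before you learn what the
# predictor (almost certainly correctly) predicted.

ERROR_RATE = 1e-24          # the predictor fails 1 in a trillion trillion times
DEATH_UTILITY = -1_000_000  # illustrative disutility of burning to death
RIGHT_COST = -100           # taking Right always costs $100

# Policy "always take Left": the predictor almost always foresees this and
# leaves Left empty; only in the error case is there a bomb.
eu_left_boxer = (1 - ERROR_RATE) * 0 + ERROR_RATE * DEATH_UTILITY

# Policy "always take Right": the predictor puts the bomb in Left and you
# pay $100 every time (you survive either way).
eu_right_boxer = RIGHT_COST

print(eu_left_boxer)   # -1e-18: essentially free
print(eu_right_boxer)  # -100: you always pay
```

Unless death is valued at worse than roughly minus 10^26 dollars, the left-boxing policy comes out ahead, which is the sense in which left-boxing “saves $100” in virtually every world.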
“FDT fails to get the answer Y&S want in most instances of the core example that’s supposed to motivate it”
Here wdmacaskill argues that in Newcomb’s problem, FDT recommends one-boxing if it assumes the predictor (Omega) is running a simulation of the agent’s decision process. But what if Omega isn’t running your algorithm? What if they use something else to predict your choice? To use wdmacaskill’s own example:
“Perhaps the Scots tend to one-box, whereas the English tend to two-box.”
Well, in that case Omega’s prediction and your decision (one-boxing or two-boxing) aren’t subjunctively dependent on the same function. And this kind of dependence is key in FDT’s decision to one-box! Without it, FDT recommends two-boxing, like CDT. In this particular version of Newcomb’s problem, your decision procedure has no influence on Omega’s prediction, and you should go for strategic dominance (two-boxing).
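To illustrate the difference between the two versions of the problem, here is a rough sketch of the expected payoffs. The 99% accuracy figure and the fixed probability q in the nationality-based version are illustrative assumptions of mine, not part of the original problems.

```python
# Newcomb payoffs: the opaque box holds $1,000,000 iff Omega predicted
# one-boxing; the transparent box always holds $1,000.

M, K = 1_000_000, 1_000

def expected_payoff(one_box: bool, p_predicted_one_box: float) -> float:
    """Expected dollars, given the probability that Omega predicted one-boxing."""
    opaque = p_predicted_one_box * M
    return opaque if one_box else opaque + K

# Case 1: Omega simulates your decision procedure (subjunctive dependence),
# so the prediction tracks whatever policy you actually run (say 99% accuracy).
print(expected_payoff(True, 0.99))   # one-box:  990000.0
print(expected_payoff(False, 0.01))  # two-box:   11000.0

# Case 2: Omega predicts from something independent of your decision
# procedure (e.g. nationality). Whatever probability q that method yields,
# it is the same no matter what you choose, so two-boxing always adds $1,000.
q = 0.5  # illustrative; any fixed value gives the same conclusion
print(expected_payoff(True, q))   # 500000.0
print(expected_payoff(False, q))  # 501000.0
```

In the first case your policy moves the prediction with it, so one-boxing wins; in the second case the prediction is fixed regardless of your policy, so two-boxing dominates.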
However, wdmacaskill argues that part of the original motivation for developing FDT was to have a decision theory that one-boxes on Newcomb’s problem. For the purposes of this discussion, I don’t care what FDT’s original motivation was. What matters is whether FDT gets Newcomb’s problem right — and it does, in both cases: when Omega runs a simulation of your decision process and when Omega does not.
Alternatively, wdmacaskill argues,
“Y&S could accept that the decision-maker should two-box in the cases given above. But then, it seems to me, that FDT has lost much of its initial motivation: the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.”
Again, I do not care where the case for one-boxing stemmed from, or what FDT’s original motivation was: I care about whether FDT gets Newcomb’s problem right.
“Implausible discontinuities”
“First, take some physical processes S (like the lesion from the Smoking Lesion) that causes a ‘mere statistical regularity’ (it’s not a Predictor). And suppose that the existence of S tends to cause both (i) one-boxing tendencies and (ii) whether there’s money in the opaque box or not when decision-makers face Newcomb problems. If it’s S alone that results in the Newcomb set-up, then FDT will recommending two-boxing.”
Agreed. The contents of the opaque box and the agent’s decision to one-box or two-box don’t subjunctively depend on the same function. FDT would indeed recommend two-boxing.
“But now suppose that the pathway by which S causes there to be money in the opaque box or not is that another agent looks at S and, if the agent sees that S will cause decision-maker X to be a one-boxer, then the agent puts money in X’s opaque box. Now, because there’s an agent making predictions, the FDT adherent will presumably want to say that the right action is one-boxing.”
No! The critical factor isn’t whether “there’s an agent making predictions”. The critical factor is subjunctive dependence between the agent and another relevant physical system (in Newcomb’s problem, that’s Omega’s prediction algorithm). Since, in this latest problem put forward by wdmacaskill, the prediction is made by looking at S, there is no such subjunctive dependence, and FDT would still recommend two-boxing.
Wdmacaskill further asks the reader to imagine a spectrum of increasingly agent-like versions of S, and imagines that at some point there will be a “sharp jump” where FDT goes from recommending two-boxing to recommending one-boxing. Wdmacaskill then says:
“Second, consider that same physical process S, and consider a sequence of Newcomb cases, each of which gradually make S more and more complicated and agent-y, making it progressively more similar to a Predictor making predictions. At some point, on FDT, there will be a point at which there’s a sharp jump; prior to that point in the sequence, FDT would recommend that the decision-maker two-boxes; after that point, FDT would recommend that the decision-maker one-boxes. But it’s very implausible that there’s some S such that a tiny change in its physical makeup should affect whether one ought to one-box or two-box.”
But as I explained, the “agent-ness” of a physical system is totally irrelevant to FDT. Subjunctive dependence is key, not agent-ness. The sharp jump between two-boxing and one-boxing that wdmacaskill imagines isn’t really there; it stems from a misunderstanding of FDT.
“FDT is deeply indeterminate”
Wdmacaskill argues that
“there’s no objective fact of the matter about whether two physical processes A and B are running the same algorithm or not, and therefore no objective fact of the matter of which correlations represent implementations of the same algorithm or are ‘mere correlations’ of the form that FDT wants to ignore.”
… and gives an example:
“To see this, consider two calculators. The first calculator is like calculators we are used to. The second calculator is from a foreign land: it’s identical except that the numbers it outputs always come with a negative sign (‘–’) in front of them when you’d expect there to be none, and no negative sign when you expect there to be one. Are these calculators running the same algorithm or not? Well, perhaps on this foreign calculator the ‘–’ symbol means what we usually take it to mean — namely, that the ensuing number is negative — and therefore every time we hit the ‘=’ button on the second calculator we are asking it to run the algorithm ‘compute the sum entered, then output the negative of the answer’. If so, then the calculators are systematically running different algorithms.
But perhaps, in this foreign land, the ‘–’ symbol, in this context, means that the ensuing number is positive and the lack of a ‘–’ symbol means that the number is negative. If so, then the calculators are running exactly the same algorithms; their differences are merely notational.”
I’ll admit I’m no expert in this area, but it seems clear to me that these calculators are running different algorithms, and that both algorithms are subjunctively dependent on the same function! Both algorithms use the same “sub-algorithm”, which calculates the correct answer to the user’s input; the second calculator just does something extra: it puts a negative sign in front of the answer or removes an existing one. Whether the inhabitants of the foreign land interpret the ‘–’ symbol differently than we do is irrelevant to the properties of the calculators.
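As a toy illustration of what I mean (the function names are mine, and the shared sub-algorithm is just a stand-in):

```python
# Both calculators contain the same arithmetic sub-algorithm; the "foreign"
# one merely negates its output before displaying it.

def evaluate(expression: str) -> int:
    """The shared sub-algorithm: compute the correct value of the input."""
    return eval(expression)  # stand-in for a real expression evaluator

def ordinary_calculator(expression: str) -> int:
    return evaluate(expression)

def foreign_calculator(expression: str) -> int:
    return -evaluate(expression)  # the extra step: flip the sign of the answer

print(ordinary_calculator("2 + 3"))  # 5
print(foreign_calculator("2 + 3"))   # -5
```

Nothing in either function refers to how anyone interprets the ‘–’ symbol; the interpretation lives in the user, not in the algorithms, which is why it cannot change which algorithms are being run.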
“Ultimately, in my view, all we have, in these two calculators, are just two physical processes. The further question of whether they are running the same algorithm or not depends on how we interpret the physical outputs of the calculator.”
It really doesn’t. The properties of both calculators do NOT depend on how we interpret their outputs. Wdmacaskill uses this supposed dependence on interpretation to undermine FDT: in Newcomb’s problem, it would then also be a matter of interpretation whether Omega, in order to predict your choice, is running the same algorithm you are. However, since interpretation isn’t a property of any algorithm, this is a non-issue. I’ll be doing a longer post on algorithm dependence/similarity in the future.
“But FDT gets the most utility!”
Here, wdmacaskill discusses how Yudkowsky and Soares compare FDT to EDT and CDT in order to argue for FDT’s superiority over the other two.
“As we can see, the most common formulation of this criterion is that they are looking for the decision theory that, if run by an agent, will produce the most utility over their lifetime. That is, they’re asking what the best decision procedure is, rather than what the best criterion of rightness is, and are providing an indirect account of the rightness of acts, assessing acts in terms of how well they conform with the best decision procedure.
But, if that’s what’s going on, there are a whole bunch of issues to dissect. First, it means that FDT is not playing the same game as CDT or EDT, which are proposed as criteria of rightness, directly assessing acts. So it’s odd to have a whole paper comparing them side-by-side as if they are rivals.”
I agree the whole point of FDT is to have a decision theory that produces the most utility over the lifetime of an agent — even if that, in very specific cases like Bomb, results in “weird” (but correct!) recommendations for specific acts. Looking at it from a perspective of AI alignment — which is the goal of MIRI, the organization Yudkowsky and Soares work for — it seems clear to me that that’s what you want out of a decision theory. CDT and EDT may have been invented to play a different game — but that’s irrelevant for the purpose of FDT. CDT and EDT — the big contenders in the field of decision theory — fail this purpose, and FDT does better.
“Second, what decision theory does best, if run by an agent, depends crucially on what the world is like. To see this, let’s go back to question that Y&S ask of what decision theory I’d want my child to have. This depends on a whole bunch of empirical facts: if she might have a gene that causes cancer, I’d hope that she adopts EDT; though if, for some reason, I knew whether or not she did have that gene and she didn’t, I’d hope that she adopts CDT. Similarly, if there were long-dead predictors who can no longer influence the way the world is today, then, if I didn’t know what was in the opaque boxes, I’d hope that she adopts EDT (or FDT); if I did know what was in the opaque boxes (and she didn’t) I’d hope that she adopts CDT. Or, if I’m in a world where FDT-ers are burned at the stake, I’d hope that she adopts anything other than FDT.”
Well, no, not really — that’s the point. What decision theory does best shouldn’t depend on what the world is like. The whole idea is to have a decision theory that does well under all (fair) circumstances. Circumstances that directly punish an agent for its decision theory can be made for any decision theory and don’t refute this point.
“Third, the best decision theory to run is not going to look like any of the standard decision theories. I don’t run CDT, or EDT, or FDT, and I’m very glad of it; it would be impossible for my brain to handle the calculations of any of these decision theories every moment. Instead I almost always follow a whole bunch of rough-and-ready and much more computationally tractable heuristics; and even on the rare occasions where I do try to work out the expected value of something explicitly, I don’t consider the space of all possible actions and all states of nature that I have some credence in — doing so would take years.
So the main formulation of Y&S’s most important principle doesn’t support FDT. And I don’t think that the other formulations help much, either. Criteria of how well ‘a decision theory does on average and over time’, or ‘when a dilemma is issued repeatedly’ run into similar problems as the primary formulation of the criterion. Assessing by how well the decision-maker does in possible worlds that she isn’t in fact in doesn’t seem a compelling criterion (and EDT and CDT could both do well by that criterion, too, depending on which possible worlds one is allowed to pick).”
Okay, so we’d need an approximation of such a decision theory — I fail to see how this undermines FDT.
“Fourth, arguing that FDT does best in a class of ‘fair’ problems, without being able to define what that class is or why it’s interesting, is a pretty weak argument. And, even if we could define such a class of cases, claiming that FDT ‘appears to be superior’ to EDT and CDT in the classic cases in the literature is simply begging the question: CDT adherents claims that two-boxing is the right action (which gets you more expected utility!) in Newcomb’s problem; EDT adherents claims that smoking is the right action (which gets you more expected utility!) in the smoking lesion. The question is which of these accounts is the right way to understand ‘expected utility’; they’ll therefore all differ on which of them do better in terms of getting expected utility in these classic cases.”
Yes, fairness would need to be defined exactly, although I do believe Yudkowsky and Soares have done a good job at it. And no: “claiming that FDT ‘appears to be superior’ to EDT and CDT in the classic cases in the literature” isn’t begging the question. The goal is to have a decision theory that consistently gives the most expected utility. Being a one-boxer does give you the most expected utility in Newcomb’s problem. Deciding to two-box after Omega has made his prediction that you one-box (if this were possible) would give you even more utility — but you can’t have your decision theory recommend two-boxing, because that results in the opaque box being empty.
In conclusion, it seems FDT survives the critique offered by wdmacaskill. I am quite new to the field of decision theory, and will be learning more and more about this amazing field in the coming weeks. This post might be updated as I learn more.
The statement of Bomb is bad at being legible outside the FDT/UDT paradigm; there it’s instead actively misleading, which makes it a terrible, confusion-and-conflict-inducing example to show someone who is not familiar with that paradigm. The reason Left is reasonable is that the scenario being described is, depending on the chosen policy, almost completely not real: a figment of the predictor’s imagination.
Unless you've read a lot of FDT/UDT discussion, a natural reading of a thought experiment is to include the premise "the described situation is real". And so people…
The point is that you don't know that something is happening to you just because you are seeing it happen. Seeing it happen is what takes place when you-as-an-algorithm is evaluated on the corresponding observations. A response to seeing it happen is well-defined even if the algorithm is never actually evaluated on those observations. When we spell out what happens inside the algorithm, what we see is that the algorithm is "seeing it happen". This is so even if we don't actually look.
So for example, if I'm asking what would be your reaction to the sky turning green, what is the status of you-in-the-question who sees the sky turn green? They see it happen in the same way that you see it not happen. Yet from the fact that they see it happen, it doesn't follow that it actually happens (the sky is not actually green).
Another point is that for you-in-the-question, it might be the green-sky world that matters, not the blue-sky world. That is a side effect of how your insertion into the green-sky world doesn't respect the semantics of your preferences, which care about the blue-sky world. For you-in-the-question with preferences…
I read through this long enough to come to the conclusion that the author of the original article simply does not understand FDT rather than having valid criticisms of it, and stopped there, that being perfectly sufficient to refute the article.
Off-topic: I initially misread this title as "A defense of density functional theory," and was intrigued.
There are two huge ambiguities in this scenario:

1. Did the Predictor include the note in the simulation, or write it later? If there was even a small (say, anything more than 1 in a million) chance that it was written later, the agent should pick Right.
2. Does the Predictor always add a note showing the prediction in this scenario?

We can rule out the combination of both together. It is not possible for the Predictor to always write a note that honestly records their prediction (including the note in the simulation) and still guarantee a 10^-24 chance of prediction error. If the note has nontrivi…

At first this bomb scenario looked like an interesting question, but too much over-specification in some respects and vagueness in others means that in this scenario FDT recommends taking the right box, not left as claimed.
"by running a simulation of you and seeing what that simulation did."
A simulation of your choice "upon seeing a bomb in the Left box under this scenario"? In that case, the choice to always take the Right box "upon seeing a bomb in the Left box under this scenario" is correct, and what any of the decision theories would recommend. Being in such a situation does necessitate the failure of the predictor, which means you are in a very improbable world, but that is not relevant to your decision in the world you happen to be in (simulated or not).
Or: A simulation…
These arguments -- the Bomb argument and Torture versus Dust Specks -- suffer from an ambiguity between telling the reader what to do given their existing UF/preferences, telling the reader to have a different UF, and saying what an abstract agent, but not the reader, would do.
Suppose the reader has a well-defined utility function where death or torture are set to minus infinity. Then the writer can't persuade them to trade off death or torture against any finite amount of utility. So, in what sense is the reader wrong about their own preferences?
Maybe t…
In Newcomb's problem, Omega is a perfect predictor, not just a very good one. Subjunctive dependence is necessarily also perfect in that case.
If Omega is imperfect in various ways, their predictions might be partially or not at all subjunctively dependent upon yours, and below some point on this scale FDT will s…
Re: the Bomb scenario:
It seems to me that the given defense of FDT is, to put it mildly, unsatisfactory. Whatever “fancy” reasoning is proffered, nevertheless the options on offer are “burn to death” or “pay $100”—and the choice is obvious.
FDT recommends knowingly choosing to burn to death? So much the worse for FDT!
FDT has very persuasive reasoning for why I should choose to burn to death? Uh-huh (asks the non-FDT agent), and if you’re so rational, why are you dead?
Counterfactuals, you say? Well, that's great, but you still chose to burn to death, instead…
According to me, the correct rejoinder to Will is: I have confidently asserted that X is false for X to which I assign probability much greater than 1 in a trillion trillion, and so I hereby confidently assert that no, I do not see the bomb on the left. You see the bomb on the left, and lose $100. I see no bombs, and lose $0.
I can already hear the peanut gallery objecting that we can increase the fallibility of the predictor to reasonable numbers and I'd still take the bomb, so before we go further, let's all agree that sometimes you're faced with uncertainty, and the move that is best given your uncertainty is not the same as the move that is best given perfect knowledge. For example, suppose there are three games ("lowball", "highball", and "extremeball") that work as follows. In each game, I have three actions -- low, middle, and high. In the lowball game, my payouts are $5, $4, and $0 respectively. In the highball game, my payouts are $0, $4, and $5 respectively. In the extremeball game, my payouts are $5, $4, and $5 respectively. Now suppose that the real game I'm facing is that one of these games is chosen uniformly at random by an unobserved die roll. What action should I take?
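For what it’s worth, the arithmetic of these three games works out as follows; the payout table is copied from the comment above, and the uniform die roll is as stated there.

```python
# Expected payoff of each action when the game is chosen uniformly at random.
# Columns are the payouts in (lowball, highball, extremeball).

payouts = {
    "low":    (5, 0, 5),
    "middle": (4, 4, 4),
    "high":   (0, 5, 5),
}

for action, per_game in payouts.items():
    expected = sum(per_game) / len(per_game)
    print(action, expected)
# low    3.33...
# middle 4.0
# high   3.33...
```

Middle has the highest expected payout under the die roll, even though it is never the best action once you know which game you are facing; that is exactly the point about judging a move by the uncertainty the agent actually faced.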
It's not complete enough to determine what I do when I don't see a bomb. And so when the problem statement is corrected to stop flatly asserting consequences of my actions as if they're facts, you'll find that my behavior in the corrected problem is underdefined. (If this still isn't clear, try working out what the predictor does to the agent that takes the bomb if it's present, but pays the $100 if it isn't.)
And if we're really technical, it's not actually quite complete enough to determine what I do when I see the bomb. That depends on what the predictor does when there are two consistent possible outcomes. Like, if I would go left when there was no bomb and right when there was a bomb, what would the predictor do? If they only place the bomb if I insist on going right when there's no bomb, then I have no incentive to go left upon seeing a bomb insofar as they're accurate. To force me to go left, the predictor has to be trigger-happy, dropping the bomb given the slightest opportunity.
Nitpic…
I deny that we have a precise description. If you listed out a specific trillion trillion observations that I allegedly made, then we could talk about whether those particular observations justify thinking that we're in the game with the bomb. (If those trillion trillion observations were all from me waking up in a strange room and interacting with it, with no other context, then as noted above, I would have no reason to believe I'm in this game as opposed to any variety of other games consistent with those observations.) The scenario vaguely alleges that we think we're facing an accurate predictor, and then alleges that their observed failure rate (on an unspecified history against unspecified players) is 1 per trillion-trillion. It does not say how or why we got into the epistemic state of thinking that there's an accurate predictor there; we assume this by fiat.
(To be clear, I'm fine with assuming this by fiat. I'm simply arguing that your reluctance to analyze the problem by cases seems strange and likely erroneous to me.)
Let's be more precise, then, and speak in terms of "correctness" rather than "accuracy". There are then two possibilities in the "bomb" scenario as stipulated:

1. The predictor's prediction was correct: I take Right (the box it predicted I would take), pay $100, and live.
2. The predictor's prediction was incorrect: I take Left (the box with the bomb) and burn to death.
Now, note the following interesting property of the above two possibilities: I get to choose which of them is realized. I cannot change what the predictor thought, nor can I change its actions conditional on its prediction (which in the stipulated case involves placing a bomb in the left box), but I can choose to make its prediction correct or incorrect, depending on whether I take the box it predicted I would take.
Observe now the following interesting corollary of the above argument: it implies the existence of a certain strategy, which we might call "ObeyBot", which always chooses the action that confirms the predict…