Edit: I think the following answer is only partially correct
The answer is almost certainly that MacAskill is doing FDT wrong. Subjunctive dependence (as I think I now understand it, based on vibes) exists in the map, and not the territory.
In Bomb, if the note is always truthful, then the presence or absence of the bomb no longer subjunctively depends on your decision function. For all decision functions---causal, evidential, functional, or vibe---the bomb is there.
Edit: The other half of the answer
From above: FDT clearly says you should take the left box when you don't know what is in the box. But in Bomb as written, you do know in (what is assumed to be) the real world. If the predictor predicts your action given no note, then the above argument applies.
If the predictor predicts your actions given the note, then Bomb is a blackmail problem, and you should (probably) commit to not paying blackmail in general, including actually not paying when the blackmail occurs.
In this case, I think Bomb is unfair because the scenario-as-written only occurs in ~0% of futures. In the ~100% of other futures, you succeed. This is the same with blackmail: you have to actually be the kind of person who would spit in a blackmailer's face (and have your sex tape leaked as a result) in order to avoid getting into that scenario in the first place.
Suppose that when the agent enters the room, a note is either Absent, states that the Left box was predicted (and so there is no bomb in it), or states that the Right box was predicted (and so there is a bomb in Left). There is probability pN of the note being present in both prediction cases, and probability pT of it being truthful in both cases where it is present. (A more complete analysis would let these probabilities differ, and would consider that they may be adversarially selected to extract the most damage from participants or to minimise the recorded error rates of the predictor.)
For simplicity, the probabilities of incorrect prediction in every combination of functional inputs and outputs are all B, which according to the scenario is very small, less than 10^-24. Again, in a more complete analysis one should consider different rates of error under different decision scenarios and possible optimization pressures.
The simplest decision function then maps the type of note one sees to which box to take, although a more complete analysis would permit mixed strategies. There are eight such functions, instead of the two in the no-note scenario.
In particular, there is a possible function F(A) = L, F(L) = R, F(R) = L. Under this function, whenever the predictor leaves a truthful note (probability pN pT), the prediction is wrong, because the agent does the opposite of whatever the note reports. By the scenario constraint B < 10^-24, we know that the product pN pT < 10^-24.
So it is inconsistent with the scenario that the predictor always leaves a truthful note. Either the predictor can be forced into error with probability very much greater than 10^-24, which violates the scenario premises, or it almost always declines to leave a truthful note. It is also possible that the predictor could arrange that the agent is unable to decide in this way, but that's very much outside the bounds of decision theory.
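In code, a minimal sketch of that consistency argument (the variable names are mine, not from the scenario):

```python
# Minimal sketch of the consistency argument above (names are mine, not from the scenario).
# The "diagonalizing" function F(A)=L, F(L)=R, F(R)=L does the opposite of any note it sees,
# so every note that is present and truthful produces a wrong prediction.

B = 1e-24  # scenario bound on the predictor's error probability

def min_error_rate(pN, pT):
    """Lower bound on the predictor's error rate against the diagonalizing agent."""
    return pN * pT  # a truthful note appears with probability pN*pT and is then falsified

assert min_error_rate(pN=1.0, pT=1.0) > B    # "always leaves a truthful note" is inconsistent
assert min_error_rate(pN=1e-25, pT=1.0) < B  # truthful notes are consistent only if vanishingly rare
```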
It is still possible that pT is close to 1, but less than 1 - U(-$100)/U(death). In this case, pN can be at most about 10^-24, and your optimal decision function is F(A) = L, F(L) = R, F(R) = R. That is, take the left box unless you see a note, in which case take the right box whatever it says. If you see no note then with probability at least 0.999999999999999999999999 you will save $100 and not burn to death. In the extremely unlikely case that you do see a note, you pick the right box even if it says Left, because you can't be sufficiently confident that it's true.
If pT >= 1 - U(-$100) / U(death), then the optimum is F(A) = L, F(L) = L, F(R) = R because now you have near certainty that the note is truthful and the risk of picking the left box based on the note is now worth it.
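To make that threshold concrete, here is a rough expected-utility sketch. The utility numbers are placeholders of my own; only their ordering matters.

```python
# Rough expected-utility sketch for how to respond to a "Left" note (placeholder utilities).
U_SAVE = 0.0      # take Left, no bomb: keep your $100
U_PAY = -100.0    # take Right: pay $100
U_DEATH = -1e9    # take Left with a bomb: hugely negative placeholder

def eu_left_given_left_note(pT):
    # A "Left" note is truthful (no bomb) with probability pT, lying (bomb) with 1 - pT.
    return pT * U_SAVE + (1 - pT) * U_DEATH

def best_response_to_left_note(pT):
    return "L" if eu_left_given_left_note(pT) > U_PAY else "R"

threshold = 1 - U_PAY / U_DEATH                # the 1 - U(-$100)/U(death) boundary
print(threshold)                               # ~0.9999999 with these placeholder numbers
print(best_response_to_left_note(0.999))       # 'R': below the threshold, as in F(L) = R
print(best_response_to_left_note(0.99999999))  # 'L': above the threshold, as in F(L) = L
```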
You'll also need to update the content of the note and the predictor's decision process to take into account that the agent may see a note. In particular, the predictor needs to decide whether to show a note in the simulation, and may need to run multiple simulations.
That would have been a different set of example scenarios. Here I chose one in which the predictor has fixed probabilities. I am not assuming any particular means by which the predictor arrives at the prediction (such as simulation), and I don't care about anything to do with self-locating uncertainty (such as being a conscious entity created by such a simulation).
Feel free to follow up with FDT analysis of a different scenario, if you prefer it.
Though I will note that when the agent is uncertain of the actual constraints of the scenario, the conclusion is almost always simpler: take the right box every time, because even a tiny amount of uncertainty makes the left box too unsafe to be worth taking. The more moving parts you introduce into a scenario with this basic premise, the more likely it is that "always take the right box" is the correct decision, which is boring.
Let me try again:
Does the note say that I was predicted to choose the right box regardless of what notes I am shown, and therefore the left box contains a bomb? Then the predictor is malfunctioning and I should pick the right box.
Does the note say that I was predicted to choose the right box when told that the left box contains a bomb, and therefore the left box contains a bomb? Then I should pick the left box, to shape what I am predicted to do when given that note.
In my scenario, there is a probability pN that the predictor leaves a note, regardless of what prediction the predictor makes. The note (when present) is always of the form "I predicted that you will pick the <right|left> box, and therefore <did|did not> put a bomb in the left box." That is, the note is about what the predictor thinks you will do, which more closely matches your second paragraph. Your first paragraph concerns a prediction about what you counterfactually would have done in some other situations and is not relevant in this scenario.
However, your decision process should consider the probability 1-pT that the note is lying about the predictor's actual prediction (and therefore bomb-placing).
If the note predicts the outcome of your decision after seeing the note, then you are free to diagonalize the note (do the opposite of what it predicts). That would contradict the premise that the predictor is good at making predictions (if the prediction must be this detailed and the note must remain available even when diagonalized), because the prediction is going to be mostly wrong by construction, whatever it is. Transparent Newcomb, for example, is designed to avoid this issue while gesturing at a similar phenomenon.
This kind of frame-breaking thought experiment is not useful for illustrating the framings it breaks (in this case, FDT). It can be useful for illustrating or motivating some different (maybe novel) framing that does manage to make sense of the new thought experiment, but that's only productive when it actually happens, and it's easy to break framings (as opposed to finding a genuine within-framing error in an existing theory) without motivating any additional insight. So this is worth recognizing, to avoid too much unproductive confusion when the framings that usually make sense get broken.
In my scenario there is probability pT that the note is truthful (and the value of pT is assumed known to the agent making the decision). It is possible that pT = 1, but only for pN < 10^-24 so as to preserve the maximum 10^-24 probability of the predictor being incorrect.
Yes, this is very much correct. It is invalid reasoning to carry out an FDT decision analysis on a different scenario (without the possibility of a note), claim that you should use the result of that on this one (with the note), and then claim that FDT is a bad decision theory as a result.
The possible inclusion of a note and its contents adds an unknown number of bits of information to the input of the function in FDT, which obviously changes the analysis.
The agent would need extraordinarily strong information about when a note appears and the bounds on its probability of truthfulness in order to decide to take the left box under FDT here (and almost certainly not burn to death because the note is almost certainly lying). Otherwise FDT dictates that you must take the right box.
The exact bounds depend upon the conditions under which a note can appear and what its content can be. If they are not specified, then you would need to apply some prior distribution, which, unless you are an extremely bizarre person with extremely bizarre priors or an extremely bizarre utility function, will lead to FDT recommending that you take the right box.
In the "Bomb" scenario, suppose we delete the words "by running a simulation of you and seeing what the simulation did" and replace them with something like "by carefully analysing your brain in order to deduce what you would do".
I am not an FDT expert and so maybe I'm missing something, but it seems to me that FDT still says you should pick Left (because, a priori, if your algorithm picks Right then the predictor will put the bomb in Left and you'll have to pay, whereas if your algorithm picks Left then the predictor will leave the bomb out and you won't, and the latter is the better outcome).
But in this version of the "Bomb" scenario, you have no particular reason to think that, in addition to the real-world you making the choice, there is also a faithfully-simulated you that you-right-now might equally plausibly turn out to be. You might worry that this is how the predictor did her analysis, but unless you think she almost certainly ran a faithful simulation, it seems obvious that you do better to pick Right, contrary to FDT's recommendation.
(Once again, I'm not an expert and there's a good chance I'm missing something important. If so, I hope someone will enlighten me.)
What you're missing is the F in FDT: the agent is assumed to have a function that maps their available information to a decision. In a note-less scenario, every possible agent has the same information available, and so there are only two possible functions: one that maps to Left, and one that maps to Right. FDT then says that Left performs better.
Once you introduce the possibility of a note, there are now (at least) three values in the domain: no note, a note saying the prediction was Left, and a note saying the prediction was Right. MacAskill assumed that this doesn't change the FDT decision, but of course it does: there are now at least 8 (and possibly infinitely many) such functions to compare, not just 2, plus unstated distributions over the mapping between inputs and outputs. In almost all of those distributions, FDT will recommend picking Right in these new and different scenarios.
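To illustrate the counting (a toy enumeration of my own, not anything from MacAskill or the FDT paper):

```python
from itertools import product

# What the agent can observe on entering the room.
OBS_NO_NOTE = ["none"]                        # MacAskill's analysis: a single observation
OBS_WITH_NOTE = ["none", "note:L", "note:R"]  # with a note possible: three observations

def all_decision_functions(observations, actions=("L", "R")):
    """Every deterministic map from observations to actions."""
    return [dict(zip(observations, outputs))
            for outputs in product(actions, repeat=len(observations))]

print(len(all_decision_functions(OBS_NO_NOTE)))    # 2 candidate functions
print(len(all_decision_functions(OBS_WITH_NOTE)))  # 8 candidate functions
```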
Likewise, if we assume the agent's behavior in Newcomb's problem is also determined by a function - its *decision procedure* - then, if the predictor can model this function, it can accurately predict what the agent will do.
How does this not fail to the Halting Problem?
If you make your decision in a bounded amount of time (e.g. <1 million years) then the space of possible algorithms you're using is restricted to ones which definitely output a decision within a bounded amount of time. So the Halting Problem doesn't apply.
I’m confused, does FDT require some form of anthropic reasoning and simulationism to justify its actions in transparent Newcomb problems (such as MacAskill’s bomb)?
So am I correct in understanding that “you could be in a simulation to see how you would act” is just an intuition pump to help people comprehend the counterfactual reasoning used in FDT?
It's a separate issue entirely. Scenarios in which you could be in a temporary simulation (especially in which the simulation outcome may be used to determine something for another instance of your decision process) are different and should be analysed differently from those in which you are definitely not.
Why do you believe that I believe that this is a scenario in which you are definitely not in a simulation? I do not.
I am saying that in general, FDT does not require that the agent must reason about whether they are in a simulation. In the published Bomb scenario in particular it is not stated whether the agent may be in a simulation, and it is also not stated whether the agent knows or believes that they may be in a simulation. In principle, all these combinations of cases must be considered separately.
Since the scenario does not make any statement in this respect, I do believe that it was not intended by the scenario author that the agent should reason as if they may be in a simulation. That would be just one of infinitely many unstated possibilities that might affect the analysis if they were considered, all of which would complicate and detract from the issue they intended to discuss.
So I do believe that the analysis described in the original scenario was carried out for an agent that does not consider whether or not they may be in a simulation, as distinct from them actually being definitely not in a simulation.
After all, the fact of the matter is that they are in a simulation. We are simulating what such an agent should do, and there is no "real" agent in this case.
Since its inception in 1960, Newcomb's Problem has continued to generate controversy. Here's the problem as defined by Yudkowsky & Soares:
An agent finds herself standing in front of a transparent box labeled “A” that contains $1,000, and an opaque box labeled “B” that contains either $1,000,000 or $0. A reliable predictor, who has made similar predictions in the past and been correct 99% of the time, claims to have placed $1,000,000 in box B iff she predicted that the agent would leave box A behind. The predictor has already made her prediction and left. Box B is now empty or full. Should the agent take both boxes (“two-boxing”), or only box B, leaving the transparent box containing $1,000 behind (“one-boxing”)?
Functional Decision Theory (FDT) solves this problem by considering that the predictor is so reliable because she builds an accurate model of the agent. If you want to predict what a calculator will answer when asked to compute e.g. 34 + 42, it helps a great deal - indeed, it seems necessary - to know what function the calculator implements. If you have an accurate model of this function (addition), then you can just calculate the answer (76) yourself and predict what the calculator will say. Likewise, if we assume the agent's behavior in Newcomb's problem is also determined by a function - its decision procedure - then, if the predictor can model this function, it can accurately predict what the agent will do. An FDT agent's decision procedure asks: "What output of this very decision procedure results in the best outcome?" Knowing that this very decision procedure is implemented by both the agent and the predictor (when she models the agent), and knowing the output is necessarily the same on both occasions (like 34 + 42 equals 76 regardless of who or what is doing the computation), the answer can only be to one-box.
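One way to make the "same function, same output" point concrete is a toy model in which the predictor is perfectly accurate because she literally evaluates the agent's decision procedure. This is a sketch of mine, not Yudkowsky & Soares' formalism:

```python
# Toy Newcomb's Problem in which the predictor evaluates the very function the agent runs.
def payoff(decision_procedure):
    prediction = decision_procedure()  # the predictor models the agent's function...
    box_b = 1_000_000 if prediction == "one-box" else 0
    choice = decision_procedure()      # ...and the agent later runs that same function
    return box_b if choice == "one-box" else box_b + 1_000

print(payoff(lambda: "one-box"))  # 1000000: the same function yields the same output twice
print(payoff(lambda: "two-box"))  # 1000
```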
If two systems are computing the same function, Yudkowsky and Soares state these systems are subjunctively dependent upon that function. If you predict a calculator will answer 76 when prompted with 34 + 42, you and the calculator are subjunctively dependent upon the addition function. Likewise, the agent and the predictor are subjunctively dependent upon the agent's decision procedure in Newcomb's Problem.
Given the assumption of subjunctive dependence between the agent and the predictor, the answer to Newcomb's Problem must be one-boxing. And yet, FDT has received quite some critique, a large part of which centers around what MacAskill has called "implausible recommendations" in Newcomblike problems. Consider MacAskill's Bomb:
You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it.
A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.
You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?
MacAskill comments:
The right action, according to FDT, is to take Left, in the full knowledge that as a result you will slowly burn to death. Why? Because, using Y&S’s counterfactuals, if your algorithm were to output ‘Left’, then it would also have outputted ‘Left’ when the predictor made the simulation of you, and there would be no bomb in the box, and you could save yourself $100 by taking Left.
MacAskill calls this recommendation "implausible enough". But while he is right that FDT Left-boxes, he's wrong to say it does so "in the full knowledge that as a result you will slowly burn to death". To see why, let's first consider the following thought experiment.
Identical Rooms. Imagine you wake up in a white room. Omega sits next to your bed, and says that 1 hour ago, he flipped a fair coin. If the coin came up heads, he put you to sleep in a white room; if the coin came up tails, he put you to sleep in another, identical white room. You are now in one of these two rooms. Omega gave you a pill that made you forget the events just before you went to sleep, and because the rooms are identical, you have no way of knowing whether the coin came up heads or tails. In front of you appears a special box, and you can choose to take it or to leave it. Omega tells you the content of the box depends on whether the coin came up heads or tails. If it was tails, the box contains a fine for $100, which you'll have to pay Omega in case you take the box. If the coin came up heads, the box contains $10,000, which you get to keep should you take the box.
Question A) In order to make as much (expected) money as possible, should you take the box?
Question B) If the coin came up tails, should you take the box?
Question A is straightforward: there's a 50/50 chance of winning $10,000 or losing $100, which comes out to an expected value of $4,950. Yes, you should take the box.
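In code, just to spell out the arithmetic:

```python
# Expected value of taking the box in Identical Rooms (just the arithmetic above).
ev_take = 0.5 * 10_000 + 0.5 * (-100)
print(ev_take)  # 4950.0, versus 0 for leaving the box
```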
What about Question B? I hope it's obvious this one doesn't make any sense: it's asking the wrong question. You have no way of knowing whether the coin came up heads or tails!
So why bring up this rather silly thought experiment? Because I believe MacAskill is essentially making the "Question B mistake" when he says
The right action, according to FDT, is to take Left, in the full knowledge that as a result you will slowly burn to death.
Remember: in Bomb, the predictor runs a simulation of you in order to make her prediction. This, of course, makes you and the predictor subjunctively dependent upon your decision procedure, and more to the point, you can't know whether you are the real "you" or the simulated version of you. If you did, that would influence your decision procedure, breaking the subjunctive dependence. The simulated "you" observes the same things (or at least the same relevant things) as you and therefore can't tell she's simulated. And if the simulated "you" Left-boxes, this doesn't lead to you burning to death: it leads to an empty Left box and you not losing $100.
In fact, it's not so much that you can't know whether you are the real "you" or the simulated "you" - you are both of them, at different times, and you have to make a decision taking this into account. Left-boxing simply leads to not burning to death AND not losing $100! (Yeah, unless the predictor made a mistake - but that probability is ~0.) In the Identical Rooms thought experiment, you are in only one of the two rooms (each with probability 1/2), but the point remains.
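To put rough numbers on this (the utilities are placeholders of mine; the scenario only fixes the error rate):

```python
# Left vs Right in Bomb, treating "you" as the one function that runs both in the
# predictor's simulation and in the real world (placeholder utilities).
EPS = 1e-24       # the predictor's stated failure rate
U_DEATH = -1e15   # burning to death: hugely negative placeholder
U_PAY = -100      # take Right: pay $100
U_FREE = 0        # take Left with no bomb: keep your $100

eu_left = (1 - EPS) * U_FREE + EPS * U_DEATH  # a bomb is present only if the prediction failed
eu_right = U_PAY                              # you pay $100 regardless

print(eu_left, eu_right)  # roughly -1e-09 vs -100: Left wins unless U(death) is below -1e+26
```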
Identical Rooms was modelled after Counterfactual Mugging:
Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don't want to give up your $100. But see, Omega tells you that if the coin came up heads instead of tails, it'd give you $10000, but only if you'd agree to give it $100 if the coin came up tails.
Omega can predict your decision in case it asked you to give it $100, even if that hasn't actually happened, it can compute the counterfactual truth. Omega is also known to be absolutely honest and trustworthy, no word-twisting, so the facts are really as it says, it really tossed a coin and really would've given you $10000.
If we assume Omega predicts your decision by simulating you and modelling your decision procedure (and thus assume subjunctive dependence), then Counterfactual Mugging is isomorphic to Identical Rooms. Though it may seem like you know the coin came up tails in Counterfactual Mugging, there's only a 1/2 probability this is actually the case: because if the coin came up heads, Omega simulates you and a simulated Omega tells you the exact same thing as the real Omega does given tails. So "What do you decide, given that the coin came up tails?" is the wrong question: you don't actually know the outcome of the coin flip!
In the original Newcomb's Problem, you have to decide whether to one-box or to two-box knowing that you make this exact decision on two occasions: in the predictor's simulation of you and in the "real world". Given this, the correct answer has to be one-boxing. So although some seem to believe it's good to be the kind of person who one-boxes but that you should two-box when you are actually deciding, FDT denies there's a difference: one-boxers one-box, and those who one-box are one-boxers.