In the case of the XOR blackmail problem, there are four “possible” worlds: LT (letter + termites), NT (noletter + termites), LN (letter + notermites), and NN (noletter + notermites).

The predictor, by dint of their accuracy, has put the universe into a state where the only consistent possibilities are either (LT, NN) or (LN, NT). You get to choose which of those pairs is consistent and which is contradictory. Clearly, you don’t have control over the probability of termites vs. notermites, so you’re only controlling whether you get the letter. Thus, the question is whether you’re willing to pay $1000 to make sure that the letter shows up only in the worlds where you don’t have termites.
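To make the "which pair is consistent" point concrete, here is a minimal sketch in Python. It assumes the usual statement of the XOR blackmail rule, which this passage does not spell out: a perfectly accurate predictor sends the letter exactly when one, but not both, of "you have termites" and "you would pay on receiving the letter" is true. The names `letter_sent` and `consistent_worlds` are made up for illustration.

```python
from itertools import product

def letter_sent(termites: bool, pays: bool) -> bool:
    # Assumed rule: the letter arrives iff exactly one of
    # "termites" and "agent pays on receiving the letter" holds (XOR).
    return termites != pays

def consistent_worlds(pays: bool) -> list[str]:
    """Return the worlds (letter, termites) that a given policy leaves consistent.

    Labels follow the post: first character L/N for letter/noletter,
    second character T/N for termites/notermites.
    """
    worlds = []
    for letter, termites in product([True, False], repeat=2):
        # A world is consistent only if the letter status matches the predictor's rule.
        if letter == letter_sent(termites, pays):
            worlds.append(("L" if letter else "N") + ("T" if termites else "N"))
    return sorted(worlds)

if __name__ == "__main__":
    print("policy = pay:   ", consistent_worlds(True))
    print("policy = refuse:", consistent_worlds(False))
```

Under that assumption, the paying policy leaves (LN, NT) consistent and the refusing policy leaves (LT, NN) consistent. In both cases termites appear in exactly one of the two surviving worlds, so the policy only moves the letter around.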

Even when you’re holding the letter in your hands, I claim that you should not say “if I pay I will have no termites”, because that is false — your action can’t affect whether you have termites. You should instead say:

I see two possibilities here. If my algorithm outputs pay, then in the XX% of worlds where I have termites I get no letter and lose $1M, and in the (100-XX)% of worlds where I do not have termites I lose $1k. If instead my algorithm outputs refuse, then in the XX% of worlds where I have termites I get this letter but only lose $1M, and in the other worlds I lose nothing. The latter mixture is preferable, so I do not pay.
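The comparison in that quoted reasoning can be checked with a few lines of arithmetic. This is only a sketch: the $1M damage and $1k payment come from the text above, but the termite probability is left as XX% in the post, so the 25% used in the example run is purely a placeholder, and the letter rule is the same assumed XOR rule (letter iff exactly one of "termites" and "would pay" holds).

```python
TERMITE_DAMAGE = 1_000_000  # loss from termites, per the post
PAYMENT = 1_000             # amount demanded by the letter, per the post

def expected_loss(pays: bool, p_termites: float) -> float:
    """Expected loss of a policy, averaged over termite / no-termite worlds.

    Note that this does NOT condition on having received the letter; it
    evaluates the policy across all worlds, as the quoted reasoning does.
    """
    loss = 0.0
    for termites, prob in ((True, p_termites), (False, 1.0 - p_termites)):
        letter = termites != pays          # assumed XOR rule for the predictor
        cost = TERMITE_DAMAGE if termites else 0
        if letter and pays:                # you only pay if the letter actually arrives
            cost += PAYMENT
        loss += prob * cost
    return loss

if __name__ == "__main__":
    p = 0.25  # placeholder for the post's unspecified XX%
    print("refuse:", expected_loss(False, p))
    print("pay:   ", expected_loss(True, p))
```

Whatever value is plugged in for the termite probability, the paying policy costs an extra (1 - p) × $1,000 in expectation and never reduces the termite loss, which is why the refusing mixture comes out ahead.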

You’ll notice that the agent in this line of reasoning is not updating on the fact that they’re holding the letter. They’re not saying, “Given that I know that I received the letter and that the universe is consistent…”

siIver (9y)

Can someone briefly explain to me the difference between functional and updateless decision theory / where FDT performs better? That would be much appreciated. I have not yet read FDT because it does not mention UDT (I checked) and I want to understand why UDT needs "fixing" before I invest the time.

Rob Bensinger (9y)

"UDT" is ambiguous and has been used to refer to a number of different approaches; "FDT" is a new name for the central cluster of UDT-ish theories (excluding some similar theories like TDT), intended to be less ambiguous and easier to explain (especially to working decision theorists).

In part it's easier to explain because it's formulated in a more CDT-like fashion (whereas Wei Dai's formulations are more EDT-like), and in part it's easier to explain because it builds in less content: accepting FDT doesn't necessarily require a commitment to some of the philosophical ideas associated with updatelessness and logical prior probability that MIRI, Wei Dai, or other FDT proponents happen to accept. In particular, some of Nate's claims in the linked post are stronger than is strictly required for FDT.

siIver (9y)

Thanks! So UDT is integrated. That's good to hear.

MalcolmOcean (9y)

My impression (based in part on this humorous piece) is that FDT is primarily a better formulation than UDT & TDT, but doesn't necessarily make better decisions.

Ah. And the paper says (emphasis mine):

Functional decision theory has been developed in many parts through (largely unpublished) dialogue between a number of collaborators. FDT is a generalization of Dai’s (2009) “updateless decision theory” and a successor to the “timeless decision theory” of Yudkowsky (2010).

username2 (9y)

It seems like real life is most analogous to the FDT formulation for the Murder Lesion; all decision theories are imperfectly implemented in a variety of meat or silicon with some hardwired lesion-like behavior that reduces the power of the theory. Recognizing/discovering the actual hardwired behavior sounds like a useful ability for an agent to have. Solving the identity/embedding problem would do it, but are there other ways a decision theory could hedge against unknown hardwired behavior?


MIRI: Decisions are for making bad outcomes inconsistent
