Review

In my article about why Eliezer was often confidently and egregiously wrong, I gave his decision theory as a solid example of this. Eliezer thinks that the correct theory of rationality—of what it is wise to do in a particular situation—is FDT, which says, roughly, that you should act in the ways such that agents who act in those ways get the most utility. Heighn and I are having a debate about this that will go back and forth for a bit—so here’s my response to Heighn’s first post.

I think basically all of Heighn’s replies rest on the same crucial error, so I won’t go through them line by line. Instead, I’ll briefly explain the core argument against FDT, which has various manifestations, and then explain why the basic reply to it doesn’t work. I’ll then address a few remaining points.

The most important argument against FDT is that, while it’s a fine account of what type of agent you want to be, at least in many circumstances, it’s a completely terrible account of rationality—of what it’s actually wise to do when you’re in a particular situation. Suppose there’s an agent who is very likely to create people who, once they exist, will cut off their legs in ways that don’t benefit them. In this case, cutting off one’s legs is clearly irrational—one doesn’t benefit at all and is harmed greatly. But despite that, FDT instructs one to cut off one’s legs, because the agents who do so get more utility on average.
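To make the structure concrete, here’s a minimal sketch with made-up numbers (the utilities and creation probabilities are my own illustrative assumptions; nothing in the case fixes them), contrasting the policy-level question with the in-situation question:

```python
# Toy model of the leg-cutting case. All numbers are illustrative assumptions.

U_EXIST_WITH_LEGS = 10.0   # utility of existing with your legs intact
U_EXIST_NO_LEGS = 1.0      # utility of existing after cutting your legs off
U_NOT_EXIST = 0.0          # utility of never being created

# Assumed chance the creator-agent makes you, given your policy.
P_CREATED = {"cut": 0.99, "keep": 0.01}

def policy_level_eu(policy):
    """Ex ante expected utility of being the kind of agent who has this policy."""
    payoff_if_created = U_EXIST_NO_LEGS if policy == "cut" else U_EXIST_WITH_LEGS
    p = P_CREATED[policy]
    return p * payoff_if_created + (1 - p) * U_NOT_EXIST

def in_situation_eu(action):
    """Expected utility of the act for someone who already exists."""
    return U_EXIST_NO_LEGS if action == "cut" else U_EXIST_WITH_LEGS

for option in ("cut", "keep"):
    print(option, policy_level_eu(option), in_situation_eu(option))
# "cut" wins the policy-level comparison (0.99 vs. 0.10), but once you
# already exist, "keep" is better by a factor of ten (10 vs. 1).
```

The two columns answer different questions: which disposition does best averaged over whether you get created at all, versus what it’s wise to do given that you’re already here. Only the second is a question about rationality.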

Heighn’s response to this argument is that this is a perfectly fine prescription. After all, agents who follow FDT’s advice get more utility on average than agents who follow EDT or CDT.

But the CDTists have a perfectly adequate description of this. Sometimes, it pays to be irrational. If there is a demon who will pay only those who are irrational, then it obviously pays to be irrational. But this doesn’t make it rational to be irrational. Cutting off your legs in ways that you know will never benefit you or anyone else is flagrantly irrational—it is bad for everyone—and this is so even if such agents win more.

Heighn would presumably agree with this. After all, if there’s a demon who pays a billion dollars to everyone who follows CDT or EDT, then FDTists will lose out. The fact that you can imagine a scenario where people following one decision theory are worse off is totally irrelevant—the question is whether a decision theory provides a correct account of rationality. But if a theory involves holding that you should cut off your legs for no reason, it clearly does not. It doesn’t matter that the type of agent who cuts off their legs is better off on average—when you’re in the situation, you don’t care about what kinds of agents do what; you only care about the utility that you can get from the act. Thus, when Heighn asks:

Another way of looking at this is asking: "Which decision theory do you want to run, keeping in mind that you might run into the Blackmail problem?" If you run FDT, you virtually never get blackmailed in the first place.

they are asking the wrong question. Decision theories are not about what kind of agent you want to be. There is no one on god’s green earth who disputes that the types of agents who one box are better off on average. Decision theory is about providing a theory of what is rational.

The next case I give, which comes from Wolfgang Schwarz, is the following:

Procreation. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed FDT. If FDT were to recommend not procreating, there's a significant probability that I wouldn't exist. I highly value existing (even miserably existing). So it would be better if FDT were to recommend procreating. So FDT says I should procreate. (Note that this (incrementally) confirms the hypothesis that my father used FDT in the same choice situation, for I know that he reached the decision to procreate.)

Heighn says I don’t "explain why [I believe] FDT gives the wrong recommendation here." I think the quote from Schwarz that I provide is pretty clear, but just to make it clearer: FDT instructs one to act in some way if FDT’s prescribing that one act that way produces the highest average utility. Thus, in Procreation, a person gets more utility if FDT prescribes procreating, because the person is more likely to exist in the worlds where FDT prescribes that. Heighn’s comments here strike me as pretty confused:

This problem doesn't fairly compare FDT to CDT though. By specifying that the father follows FDT, FDT'ers can't possibly do better than procreating. Procreation directly punishes FDT'ers - not because of the decisions FDT makes, but for following FDT in the first place.

They could do better. They could follow CDT and never pass up the free value of remaining child-free. This is not a scenario where a person is punished directly for following FDT—it is a scenario where FDT prescribes acting in a certain way because doing so makes your father more likely to have created you, even though he has already created you and even though the act is clearly bad for you. This becomes easier to see with Heighn’s attempted parody:

I can easily make an analogous problem that punishes CDT'ers for following CDT:

ProcreationCDT. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed CDT. I highly value existing (even miserably existing). Should I procreate?

FDT'ers don't procreate here and live happily. CDT'ers wouldn't procreate either and don't exist. So in this variant, FDT'ers fare much better than CDT'ers.

It is true that, for any decision theory, we can easily gerrymander a scenario where following it ends up being bad for you. But that’s not a problem—decision theories are not theories of what’s good for you. Indeed, there is no across-the-board answer to which decision theory will make your life go best. They are intended as theories of what it is rational to do in particular scenarios. Procreating seems clearly irrational here, even if such agents end up getting punished. Again, it’s important to distinguish the question "which agent would you rather be?" from "which agent is rational?" Rationality doesn’t always pay (e.g., when demons artificially rig things to be bad for the rational).

Once the person already exists, it doesn’t matter what % of agents of a certain type exist. They exist—and as such, they have no reason to lose out on free value. Once you already exist, you don’t care about other agents in the reference class. And these agents are all making their decisions after they already exist, so they have no reason to take into account causally irrelevant subjunctive dependence.
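Here’s a rough numeric sketch of how I’m reading the Procreation case. The utilities and the father’s reliability are numbers I’m assuming purely for illustration; Schwarz doesn’t specify any.

```python
# Toy model of Schwarz's Procreation case. All numbers are illustrative assumptions.

U_MISERABLE = 1.0    # you exist, but you procreated: miserable
U_CHILD_FREE = 10.0  # you exist and stayed child-free
U_NO_EXIST = 0.0     # you were never created

# Assumed chance that you exist, given FDT's recommendation (your father
# follows FDT, so his choice tracks whatever FDT recommends).
P_EXIST = {"procreate": 0.95, "dont_procreate": 0.05}

def fdt_value(recommendation):
    """Value FDT assigns to a recommendation: your existence is treated as
    depending on what FDT says."""
    payoff = U_MISERABLE if recommendation == "procreate" else U_CHILD_FREE
    p = P_EXIST[recommendation]
    return p * payoff + (1 - p) * U_NO_EXIST

def existing_persons_value(action):
    """Value of the act for someone who already, irrevocably exists."""
    return U_MISERABLE if action == "procreate" else U_CHILD_FREE

print({r: fdt_value(r) for r in P_EXIST})               # favours "procreate" (0.95 vs. 0.50)
print({a: existing_persons_value(a) for a in P_EXIST})  # favours "dont_procreate" (10 vs. 1)
```

On this way of carving things up, someone who already exists and follows CDT simply takes the 10: nothing they do now changes the fact that they were created.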

In short, FDT often prescribes harming yourself in ways that are guaranteed never to benefit you. This would be totally fine if it were a theory of what kind of agent to be, but once you exist, there’s no good reason to harm yourself in ways that are guaranteed to give you no rewards. Its appeal comes entirely from mixing up the question of which decision-making procedure you want to have with the question of which one is rational. Those clearly come apart in lots of situations. Finally, Heighn quotes me saying:

The basic point is that Yudkowsky’s decision theory is totally bankrupt and implausible, in ways that are evident to those who know about decision theory.

In reply, Heighn says:

Are you actually going to argue from authority here?! I've spoken to Nate Soares, one of the authors of the FDT paper, many times, and I assure you he "knows about decision theory".

Yes, yes I am. Not from my own authority in particular—I’m a random undergraduate whom no one should defer to. But I do not know of a single academic decision theorist who accepts FDT. When I bring it up with people who know about decision theory, they treat it with derision and laughter. There have been maybe one or two published papers ever defending something in the vicinity. Thus, it is both

opposed by nearly all academics

and something that I think rests on basic errors.

I don’t know much about Soares, so I won’t comment on how much he knows about decision theory. But I am somewhat dubious that he knows a lot about it, and even if he does, it’s not hard to find one or two informed people defending crazy, fringe positions.

Finally, Heighn accuses MacAskill of misrepresenting FDT. MacAskill says:

First, take some physical processes S (like the lesion from the Smoking Lesion) that causes a ‘mere statistical regularity’ (it’s not a Predictor). And suppose that the existence of S tends to cause both (i) one-boxing tendencies and (ii) whether there’s money in the opaque box or not when decision-makers face Newcomb problems. If it’s S alone that results in the Newcomb set-up, then FDT will recommend two-boxing.

But now suppose that the pathway by which S causes there to be money in the opaque box or not is that another agent looks at S and, if the agent sees that S will cause decision-maker X to be a one-boxer, then the agent puts money in X’s opaque box. Now, because there’s an agent making predictions, the FDT adherent will presumably want to say that the right action is one-boxing.

In response, Heighn says:

This is just wrong: the critical factor is not whether "there's an agent making predictions". The critical factor is subjunctive dependence, and there is no subjunctive dependence between S and the decision maker here.

But in this case there is subjunctive dependence. The agent’s report depends on whether the person will actually one-box on account of the lesion. Thus, there is an implausible discontinuity: whether one should one-box comes to depend on the precise causal mechanism by which the box gets filled.
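To see why this looks like a discontinuity, here’s a small sketch of the two variants MacAskill describes. The dollar amounts are standard Newcomb-style figures I’m assuming for illustration; the point is just that the decision-relevant facts come out identical whether S fixes the box contents directly or a predictor reads S and fills the box.

```python
# Toy model of MacAskill's "S" case. The dollar amounts are illustrative assumptions.

OPAQUE_IF_S = 1_000_000   # opaque box contents when S is present
OPAQUE_IF_NOT_S = 0       # opaque box contents when S is absent
TRANSPARENT = 1_000       # the transparent box always holds $1,000

def payoff(s_present, action):
    """Payoff of one-boxing or two-boxing, given whether S is present.
    S fixes the opaque box's contents in both variants, so this table does
    not depend on whether a predictor sits in the causal chain."""
    opaque = OPAQUE_IF_S if s_present else OPAQUE_IF_NOT_S
    return opaque if action == "one-box" else opaque + TRANSPARENT

for variant in ("S fills the box directly", "a predictor reads S and fills the box"):
    table = {(s, a): payoff(s, a)
             for s in (True, False)
             for a in ("one-box", "two-box")}
    print(variant, table)
# Both variants print the same payoff table, yet FDT is supposed to say
# "two-box" in the first and "one-box" in the second.
```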

To recap: once we distinguish the question of what you should do when you’re already in the scenario and your actions are guaranteed not to affect your odds of existing, from the question of what kind of agent gets more utility on average, FDT seems crazy. It results in the consequence that you should burn money in ways that are guaranteed never to benefit you. My description of it as crazy was somewhat harsh but, I think, accurate.

Comments

Decision theories are not about what kind of agent you want to be. There is no one on god’s green earth who disputes that the types of agents who one box are better off on average. Decision theory is about providing a theory of what is rational.

Taboo the word "rational" - what real-world effects are you actually trying to accomplish? Because if FDT makes me better off on average and CDT allows me to write "look how rational I am" in my diary, then there's a clear winner here.

You might like my recent post which also argues that FDT (or at least the sequential version of EDT, I don't feel that FDT is well specified enough for me to talk about it) shouldn't be considered a decision theory.
 

https://www.lesswrong.com/posts/MwetLcBPvshg9ePZB/decision-theory-is-not-policy-theory-is-not-agent-theory?commentId=rZ5tpB5nHewBzAszR

I think FDT is more practical as a decision theory for humans than you give it credit for. It's true there are a lot of weird and uncompelling examples floating around, but how about this very practical one: the power of habits. There's common and (I think) valuable wisdom that when you're deciding whether to e.g. exercise today or not (assuming that's something you don't want to do in the moment but believe has long-term benefits), you can't just consider the direct costs and benefits of today's exercise session. Instead, you also need to consider that if you don't do it today, realistically you aren't going to do it tomorrow either because you are a creature of habit. In other words, the correct way to think about habit-driven behavior (which is a lot of human behavior) is FDT: you don't ask "do I want to skip my exercise today" (to which the answer might be yes), instead you ask "do I want to be the kind of person who skips their exercise today" (to which the answer is no, because that kind of person also skips it every day).

I agree that this is an important consideration for humans, though I feel perhaps FDT is overkill for the example you mentioned (insofar as I understand what FDT means). 

I haven't dived into the formalism (if there is one?), but I'm roughly using FDT to mean "make your decision with the understanding that you are deciding at the policy level, so this affects not just the current decision but all other decisions that fall under this policy that will be made by you or anything sufficiently like you, as well as all decisions made by anyone else who can discern (and cares about) your policy". Which sounds complicated, but I think often really isn't? e.g. in the habits example, it makes everything very simple (do the habit today because otherwise you won't do it tomorrow either). CDT can get to the same result there - unlike for some weirder examples, there is a causal though not well-understood pathway between your decision today and the prospective cost you will face when making the decision tomorrow, so you could hack that into your calculations. But if by 'overkill' you mean using something more complicated than necessary, then I'd say that it's CDT that would be overkill, not FDT, since FDT can get to the result more simply. And if by 'overkill' you mean using something more powerful/awesome/etc than necessary, then overkill is the best kind of kill :)

Making a decision at the policy level may be useful for forming habits, but I don't think that considering others with a similar policy, or those who can discern my policy, is useful in this example. Those latter two are the ones I associate more closely with FDT, and such considerations seem very difficult to carry out in practice and perhaps sensitive to assumptions about the world.
Honestly I don't see the motivation for worrying about the decisions of "other agents sufficiently similar to oneself" at all. It doesn't seem useful to me, right now, making decisions or adopting a policy, and it doesn't seem useful to build into an A.I. either except in very specific cases where many copies of the A.I. are likely to interact. The heuristic arguments that this is important aren't convincing to me because they are sufficiently elaborate that it seems other assumptions about the way the environment accesses/includes one's policy could easily lead to completely different conclusions. 

The underlying flaw I see in many pro-FDT style arguments is that they tend to uncritically accept that if adopting FDT (or policy X) is better in one example than adopting policy Y, then policy X must be better than policy Y, or at least that neither one is the best policy. But I strongly suspect there are no-free-lunch conditions here - even in the purely decision-theoretic context of AIXI there are serious issues with the choice of prior being subjective, so I'd expect it to be even worse if one allows the environment read/write access to the whole policy. I haven't seen any convincing argument that there is some kind of "master policy." I suppose if you pin down a mixture rigorously defining how the environment is able to read/write the policy then there would be some Bayes-optimal policy, but I'm willing to bet it would be deviously hard to find or even approximate.

Once the person already exists, it doesn’t matter what % of agents of a certain type exist. They exist—and as such, they have no reason to lose out on free value. Once you already exist, you don’t care about other agents in the reference class.

This means that you cannot credibly precommit to paying in a gamble (if the coin comes up tails, you pay $1; otherwise you receive $20), since if the coin comes up tails "you don't care about other variants" and refuse to pay.

I think the difference between "FDT" and "CDT"[1] in these scenarios can be framed as a difference in preferences. "FDT" values all copies of itself equally; "CDT" has indexical values, only caring about the version of itself that it actually finds itself as. As such the debate over which is more "rational" mostly comes down to a semantic dispute.


  1. quotation marks because this difference in preferences is really orthogonal to CDT/FDT but it reproduces the way they are usually argued to act. ↩︎

(Will be using "UDT" below but I think the same issue applies to all subsequent variants such as FDT that kept the "updateless" feature.)

I think this is a fair point. It's not the only difference between CDT and UDT but does seem to account for why many people find UDT counterintuitive. I made a similar point in this comment. I do disagree with "As such the debate over which is more “rational” mostly comes down to a semantic dispute." though. There are definitely some substantial issues here.

(A nit first: it's not that UDT must value all copies of oneself equally but it is incompatible with indexical values. You can have a UDT utility function that values different copies differently, it just has to be fixed for all time instead of changing based on what you observe.)

I think humans do seem to have indexical values, but what to do about it is a big open problem in decision theory. "Just use CDT" is unsatisfactory because as soon as someone could self-modify, they would have incentive to modify themselves to no longer use CDT (and no longer have indexical values). I'm not sure what further implications that has though. (See above linked post where I talked about this puzzle in a bit more detail.)

I'm surprised Wei Dai thinks this is a fair point. I disagree entirely with it: FDT is a decision theory and doesn't in and of itself value anything. The values need to be given by a utility function.

Consider the Psychological Twin Prisoner's Dilemma. Given the utility function used there, the agent doesn't value the twin at all: the agent just wants to go home free as soon as possible. FDT doesn't change this: it just recognizes that the twin makes the same decision the agent does, which has bearing on the prison time the agent gets.

FDT is a decision theory and doesn’t in and of itself value anything. The values need to be given by a utility function.

I explicitly said that this difference in values is meant to reproduce the way that FDT/CDT are usually argued to act in these sorts of scenarios, but is actually orthogonal to decision theory per se.

Psychological Twin Prisoner’s Dilemma

This scenario is a stronger one for the decision theory FDT. But that's not the sort of scenario I was referring to: the argument in my comment applies to scenarios where one of the copies makes itself worse off to benefit the others, like the Bomb or transparent Newcomb. These were the main topic of discussion of the post, and I still think it's accurate to say that the difference in intuitions between CDTists/FDTists here comes down to a values/semantic dispute.

Heighn’s response to this argument is that this is a perfectly fine prescription.

Note that omnizoid hasn't checked with me whether this is my response, and if he had, I would have asked him to specify the problem more. In my response article, I attempt to specify the problem more, and with that particular specification, I do indeed endorse FDT's decision.

The description is exactly as you describe in your article. I think my original was clear enough, but you describe your interpretation, and your interpretation is right. You proceed to bite the bullet.

Your original description doesn't specify subjunctive dependence, which is a critical component of the problem.