Why Prefer Any Decision Theory?

J Bostock

tl;dr Functional Decision Theory does actually give the right answers. People who construct scenarios where it doesn't, or take umbrage at the idea of "fair problems" are falling foul of symmetry arguments, and would never apply this level of scrutiny to any other system of making decisions.

Intro

Bentham's Bulldog recently posted an attempted takedown of Functional Decision Theory on LessWrong. This is probably the second bravest post I've seen someone in the EA/Rat sphere post on LessWrong.

His first argument was that FDT is not mathematically well-defined, because logical counterfactuals are not well-understood and, as he argues, can never be well-defined. I don't know enough about the state of the logical counterfactual research, so I'll leave that to a pro decision theorist to explain.

His second argument was that FDT gives the wrong answer sometimes. I think that he skips up and down different levels of demand for rigor, when talking about different decision theories. I think FDT beats CDT, or at least ties, at basically every point on the spectrum between totally abstract and totally practical.

Decision Problems as a Tower of Assumptions

The standard jumping off point for decision theories is the set of fair problems with perfect information about the overall scenario:

1. You get to make decisions
2. Various decision theorist trickster gods such as Omega get to perfectly simulate you
3. You know, ahead of time, everything about the situation including what kinds of simulations you might be put in
   - This doesn't mean you have perfect information about which of two identical situations you're in, once the situation starts, but it does mean you can have a well-calibrated Bayesian prior over it.
4. The agent can only simulate your actions, and doesn't have access to information about your decision theory

This includes the famous Newcomb's Problem^[1]. It's worth working through Newcomb's problem in an FDT language, since that will be instructive for cases in the future. FDT reasons through problems as follows:

In this world, there are two instances of FDT, which are both given the same input (a decision theorist trickster saying "I am Omega, this is Newcomb's problem"), so they both have to give the same output as well
- If they both choose two-box, then the reward is $1,000.
- If they both choose one-box, then the reward is $1,000,000.
Therefore, the optimal output for FDT is to one-box
Therefore [outputs one-box]
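
The reasoning above can be sketched in a few lines of Python. This is only an illustration of the "two instances, one output" argument, not a formal definition of FDT; the function names `newcomb_payoff` and `fdt_choice` are made up for this sketch, and the payoffs are the ones listed above.

```python
# Sketch: the simulated and the real instance share one output,
# so we only need to score each possible shared output.
ACTIONS = ("one-box", "two-box")


def newcomb_payoff(shared_output: str) -> int:
    """Reward when both instances give `shared_output`."""
    # Omega fills the big box iff the simulated instance one-boxes.
    big_box = 1_000_000 if shared_output == "one-box" else 0
    # The real instance collects the small box iff it two-boxes.
    small_box = 1_000 if shared_output == "two-box" else 0
    return big_box + small_box


def fdt_choice() -> str:
    """Pick the shared output with the highest payoff."""
    return max(ACTIONS, key=newcomb_payoff)


print(fdt_choice())  # one-box
```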

Newcomb's Revenge

Bentham's Bulldog has brought up Newcomb's Revenge^[2], which does not fall into this class. Why not?

Let's try to set Newcomb's Revenge up using the previous ruleset.

In the world of Newcomb's Revenge, there are two instances of FDT which affect the world, one outside of Omicron, and one inside of Omicron.
Both of them can decide independently, because the simulated one inside Omicron sees a decision theorist trickster god saying "I am Omega, this is Newcomb's problem", while the one outside Omicron sees a decision theorist trickster god saying "I am Omicron, this is Newcomb's Revenge".
- If both instances output one-box, total reward is $0
- If Omicron instance outputs one-box and Omega instance outputs two-box, total reward is $1,000,000
- If Omicron instance outputs two-box and Omega instance outputs one-box, total reward is $1,000
- If both instances output two-box, total reward is $1,001,000
The optimal choice is for both instances to two-box, since this gets a reward of $1,001,000
Therefore [outputs two-box for Omicron, outputs two-box for Omega]
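
The four cases above can be enumerated mechanically. The following is a minimal sketch, assuming (as in the list) that the two instances see different inputs and so may give different outputs; `revenge_payoff` is a hypothetical name for this illustration.

```python
from itertools import product

ACTIONS = ("one-box", "two-box")


def revenge_payoff(omicron_output: str, omega_output: str) -> int:
    """Total reward given the outputs of the real (Omicron-facing)
    and the simulated (Omega-facing) instance."""
    # Omicron pays $1,000,000 iff the simulated instance two-boxes.
    prize = 1_000_000 if omega_output == "two-box" else 0
    # The real instance keeps the $1,000 iff it two-boxes.
    small = 1_000 if omicron_output == "two-box" else 0
    return prize + small


# Search over every (Omicron instance, Omega instance) pair of outputs.
best = max(product(ACTIONS, repeat=2), key=lambda pair: revenge_payoff(*pair))
print(best, revenge_payoff(*best))  # ('two-box', 'two-box') 1001000
```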

But wait? I thought FDT chose one-box in Newcomb's Problem? Well, in this case we've given it perfect information! It knows it's in Newcomb's Revenge world, so it's changed its answer!

In order to get FDT to fail in Newcomb's Revenge, we need the simulated FDT agent to believe that it's in the original Newcomb's problem. This drops rule 3.

Why Not Allow Deception?

The problem with allowing deception is that the class of problems with deception does not have any winners in terms of decision theory! Roughly, if Omicron can trick you, then for any problem where a decision theory believes it's in one situation, there exists a symmetric problem which is identical but with the payoff matrix reversed.

We can construct a problem we might call "Newcomb's Apology", where Omelette simulates your decision in Newcomb's problem, then puts the $1,000,000 in the box iff you one-box. In this case, CDT gets $1,000 and FDT gets $1,001,000, exactly parallel to Newcomb's Revenge, where CDT gets $1,001,000 and FDT gets $1,000.

"Unfair Problems"

There's an equivalent issue where, if you drop rule 4 and let the decision theory trickster gods make decisions directly based on your decision theory algorithm, then no decision theory can possibly win either. These are typically called "unfair" problems, which sounds like a cop-out, but I don't think it is. As with the total-lying problems, there's an exact symmetry where if, in one problem, Omnomnom shows up and puts $1,000,000 in the box iff you use CDT, then there exists a corresponding problem where Ompalompa shows up and puts $1,000,000 in the box iff you use FDT. In this case we don't even need a first box.

The reason for rules 3 and 4 is that, without them, the space of problems is too large. Dropping rules 3 and 4 runs us into the territory of no-free-lunch theorems. The problem with no-free-lunch theorems is that if you buy into them, you'll quite often give up on lunch forever, and go hungry. As one example, there's the theorem that appears to prove that no brain can ever exist and intelligence is fake, which rules out [gestures vaguely at everything]. A good rule of thumb is that when you run into a no-free-lunch theorem, you need a prior.

Priors and Weightings

If you don't want to think in terms of Bayesian priors, we can instead compute an overall "score" for each decision theory by giving a weight to each possible problem, and then adding up the scores, each multiplied by its weight. Let's say that our weights have to add to 1, without loss of generality. For the problems we've looked at, we'll get the following scores:

Problem	Weight	FDT Score	CDT Score
Newcomb	w_Newcomb	$1,000,000	$1,000
Revenge (simulated agent deceived)	w_Revenge	$1,000	$1,001,000
Apology (simulated agent deceived)	w_Apology	$1,001,000	$1,000
CDT <3 (unfair)	w_CDT<3	$0	$1,000,000
FDT <3 (unfair)	w_FDT<3	$1,000,000	$0

Now we might say that Revenge and Apology are in some sense symmetrical, and that CDT <3 and FDT <3 are in some sense symmetrical. If we do that, we ought to enforce w_Revenge = w_Apology and w_CDT<3 = w_FDT<3. So then the only difference between the two comes from w_Newcomb. FDT still wins so long as we respect the symmetry of the system!
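
To make the cancellation concrete, here is a small sketch that computes the weighted FDT-minus-CDT score gap from the table. The particular weights are arbitrary assumptions chosen for illustration; the only constraints taken from the argument are that Revenge and Apology share one weight, the two unfair problems share another, and the weights sum to 1. Exact fractions are used so the cancellation is not blurred by floating-point rounding.

```python
from fractions import Fraction

# (FDT score, CDT score) per problem, copied from the table above.
SCORES = {
    "Newcomb": (1_000_000, 1_000),
    "Revenge": (1_000, 1_001_000),
    "Apology": (1_001_000, 1_000),
    "CDT <3": (0, 1_000_000),
    "FDT <3": (1_000_000, 0),
}


def score_gap(weights: dict) -> Fraction:
    """Weighted FDT score minus weighted CDT score."""
    return sum(w * (SCORES[p][0] - SCORES[p][1]) for p, w in weights.items())


# Hypothetical weights that respect the two symmetries and sum to 1.
weights = {
    "Newcomb": Fraction(1, 5),
    "Revenge": Fraction(3, 10),
    "Apology": Fraction(3, 10),
    "CDT <3": Fraction(1, 10),
    "FDT <3": Fraction(1, 10),
}

# Only the Newcomb term survives: (1/5) * ($1,000,000 - $1,000).
print(score_gap(weights))  # 199800
```

Changing the shared weights leaves the gap equal to the Newcomb weight times $999,000, which is positive whenever Newcomb gets any weight at all.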

Isolated Demands for Rigor

Now I've gone through a huge amount of stuff here, because it's worth going through the maths to justify an intuition that everyone already has: mostly focus on scenarios where you have a decent model of the world.

As an example: I don't torture people, because I think it's wrong, because of evidence. Now it is possible to construct a world in which this ethical rule is false: suppose that actually, every person except me has a four-dimensional wire in place of their brain, which goes to a five-dimensional daemon consciousness which actually loves being tortured and just role-plays as someone who dislikes it. Obviously this is stupid and we don't take this into account in the real world. Obviously we mostly evaluate theories in the worlds where people are basically correct about the world.

(And of course, the possibility of that is immediately cancelled out by the possibility of 5-d daemons who hate torture and are role-playing).

Now you might say, OK, but Newcomb's problem is pretty contrived. You might bucket the scenarios like this:

Realistic: {torture is bad for normal reasons}
Unrealistic: {Newcomb's problem, Newcomb's Revenge, Newcomb's Apology, ..., torture is good because of 5-d daemons, torture is bad because of 5-d daemons}

In which case ooh boy do I have some examples for you.

Parfit's Hitchhiker

Suppose you're dying of thirst in the desert. Someone comes along and offers to drive you to the nearest town, but only if you give them money to cover their detour. You don't have money on you, but can take some out when you get there. They will only take you if they think you'll pay. Do you pay?

FDT says yes, CDT says no. EDT (Oh you thought you were getting off scot-free, EDT?) also says no. If the driver is a good predictor then FDT lives, CDT and EDT die in the heat of the sands.

Now, you may say, the driver is probably not a great predictor of me. FDT was originally invented to reason about AIs, who could inspect each other's source code, and probably can tell what decision theory each other are following. The random driver cannot do that, but they can get some information about you! People are constantly leaking information about what rules they follow, in some cases by posting long blogposts which tell anyone reading them "I DO NOT PAY IN PARFIT'S HITCHHIKER AND I GIVE IN TO BLACKMAIL".
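
One rough way to see how little predictive power the driver needs is to compare the two policies ("pay" and "don't pay") under an imperfect predictor. This is only a sketch under assumptions that are not in the scenario as stated: the driver guesses your policy correctly with probability `accuracy`, surviving is worth `life_value` dollars to you, dying is worth 0, and the detour costs `fare`. It is a policy-level comparison, not a claim about what any formal statement of FDT outputs.

```python
def expected_value(pays: bool, accuracy: float, life_value: float, fare: float) -> float:
    """Expected outcome of a fixed policy against an imperfect predictor."""
    if pays:
        # A payer gets the ride only when correctly predicted, then pays the fare.
        return accuracy * (life_value - fare)
    # A non-payer gets a (free) ride only when the driver mispredicts.
    return (1 - accuracy) * life_value


def break_even_accuracy(life_value: float, fare: float) -> float:
    """Accuracy at which both policies have equal expected value."""
    # Solve accuracy * (V - c) = (1 - accuracy) * V for accuracy.
    return life_value / (2 * life_value - fare)


# Illustrative numbers only: a $1,000,000 life and a $100 fare.
print(break_even_accuracy(1_000_000, 100))
```

With these made-up numbers the break-even accuracy is only slightly above one half, so under these assumptions even a driver who reads you marginally better than a coin flip makes the paying policy come out ahead.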

(To be clear, I think that Bentham's Bulldog probably would pay in Parfit's Hitchhiker, even if there were no consequences to not paying, but for reasons not well captured by utilitarian CDT)

CDT does not, in general, have a good way to pre-commit to actions. Nor does EDT. Since pre-committing to actions is extremely common in real life ("I will hire you if and only if I think you won't slack off and cause trouble for me") this is a huge deal which favours FDT over EDT and CDT.

Updateless decision theory does, and indeed Bentham's Bulldog mentions it as an alternative to CDT. Updateless decision theory has a bunch of its own problems, which I won't go into here, since this post isn't supposed to litigate between UDT and FDT, but rather to show the non-validity of a very particular argument.

(I think there's also a weird set of self-modifications that a CDT agent might perform, which switches it into a thing called son-of-CDT, which is a bit more like FDT but not quite the same, but I have honestly only seen this come up once and I think it's deep MIRI lore)

Summary

If we limit ourselves to fair problems without deception as to which problem you're in then it makes sense to say that one decision theory is better than another
- And FDT wins in lots of these problems
If you expand your universe to unfair problems, or allow a more general notion of deception, then you can construct arbitrary problems where any decision theory wins
If you then apply a metric to these problems, by symmetry, only the component of fair problems with limited deception matters
- So we're back to FDT winning
If you actually care about non-contrived real problems, then the most common issue which comes up which is decision-theoretically relevant is pre-commitment
- But FDT and UDT (and a few weird others) are the only systems which can pre-commit to things

^[1] Well-known. Omega offers you the choice to take or leave $1,000, and, if it predicts you will leave the $1,000 on the table, gives you $1,000,000.
^[2] Omicron offers you the same choice as Omega, but gives you $1,000,000 if you take the $1,000 in Newcomb's problem. You can still take or leave the $1,000 but this doesn't really matter at all, you might as well take it.

There are other reasonable definitions of "fair" according to which Newcomb's Problem is clearly unfair: the predictor punishes people who pick the most useful option in the given situation, since "usefulness" is a causal term. So it punishes specifically agents who pick actions according to causal expected utility. There are other cases of uncontroversially fair problems, like tragedy of the commons and prisoners dilemma type situations, but Newcomb's Problem is not one of them.

CDT does not, in general, have a good way to pre-commit to actions. Nor does EDT. Since pre-committing to actions is extremely common in real life ("I will hire you if and only if I think you won't slack off and cause trouble for me")

I disagree. The example is not a case of a pre-commitment. Pre-commitments move decisions from the future into the present, such that your future self executes the decided action with slavish certainty as if under hypnotic suggestion, such that nothing is left to decide in the future. Humans generally can't do this. We may think we can "pre-commit" to washing the dishes tomorrow, but when tomorrow comes, we still have to decide whether to wash the dishes or not. Past "pre-commitments" are then just recommendations about what to do from our past selves that we may safely ignore, like we can ignore recommendations from other people.

There are other reasonable definitions of "fair" according to which Newcomb's Problem is clearly unfair: the predictor punishes people who pick the most useful option in the given situation, since "usefulness" is a causal term. So it punishes specifically agents who pick actions according to causal expected utility.

Whether you call this "fair" or not, if FDT outputs better decisions in a larger class of problems than CDT does, that's still a win for FDT. It's relevant that FDT doesn't seem to have this failure mode, since if you design a situation where one-boxing is obviously a bad decision, FDT starts two-boxing.

FDT doesn't pay in Parfit Hitchhiker against a human. There being some correlation between your decision and their prediction is far less than required by FDT. It's nowhere close to being the same algorithm.

Many of the interesting problems need some kind of "spirit of FDT." It's fair to complain that this is not formalized well.

This is because the human is a bad predictor of FDT-in-abstract though, right? And if you're a human implementing FDT-as-available-to-you, then the other human is probably a pretty good predictor of you. Does this not rescue it?

The goodness of their prediction doesn't matter to FDT. What matters is if they're running the same algorithm, which they aren't, regardless of what you do. They're observing things like facial expressions that are correlated with your decision in ways you can't control but which have nothing to do with the algorithm you use.

The human is reading your own beliefs about what you'll do in the future. It's not simulating any algorithm you're using.

CDT and FDT both want to self modify to be the type of person who pays, but they don't pay. Or really, they want to self modify to believe they will pay and then don't pay.

Edit: I'm not sure about this, maybe FDT can pay based on the beliefs being a consequence of its output? It wouldn't be because of the human predictor then.

To the extent FDT is picking its decision based on the way the decision influences your beliefs about the decision, that seems dangerous. In this case the belief is self-confirming, but would FDT lead you to form false beliefs if you get rewarded for doing so?

FDT will pay $50 to temporarily implant a false belief that it will pay $100 to the driver. Something seems unfair about this whole setup.

If you actually care about non-contrived real problems, then you need a huge

I think you cut your sentence off here, did you plan to add something more?

I prefer to believe the period was accidentally replaced with a letter 'e'.

I had another comment with a general overview of my own issues with FDT, which are different. But I want to go through and address some other points that are more tangential.

The standard jumping off point for decision theories is the set of fair problems with perfect information about the overall scenario

This is quite different from how most decision theory and game theory courses introduce normal-form games, and these conditions aren't always assumed. They also have nothing to do with "fairness"; fair problems are not typically a formal basis.

If we assume an imperfect predictor, then Newcomb's problem is not a perfect-information, normal-form game, since the predictor is acting without common knowledge of the other player's strategies. If the predictor is a perfect predictor, then it can be understood as a normal-form game, but there are no decisions (by definition, whatever it predicted will always be right, so you have no choice over the outcome; it is predetermined as whatever Omega predicted). The original framing of Newcomb's problem is evidentiary: it does not assume perfect knowledge of how the being makes its predictions or what their accuracy is, it only assumes a strong evidentiary basis to believe the predictions are accurate.^[1]

For reference, a normal-form game is typically defined with the below assumptions:

To someone who adopts evidential decision theory, you could reasonably say the set of fair problems is just every problem for which the outcomes have evidentiary dependence on your decision. The CDTer could similarly say the set of fair problems is those with causal dependence on your decisions. FDT, to my understanding, would consider the set of fair problems to be those where the outcomes are logically dependent on your choices. None of these would inherently consider Newcomb's problem unfair; it just depends on how you judge the outcomes. CDT is going to judge the outcome just by the consequences of your decision, not by some assumed logical relationship.

^[1] As described by Nozick: "You know that this being has often correctly predicted your choices in the past (and has never, so far as you know, made an incorrect prediction about your choices), and furthermore you know that this being has often correctly predicted the choices of other people, many of whom are similar to you, in the particular situation to be described below"

My recent critique of FDT (coming from an economic background) is that in economics we have 2 real uses for a decision theory.

Descriptive (e.g., describing how people actually behave and interact around uncertain utility expectations resulting from their decisions)
Prescriptive (i.e., how people should act to optimize their utility expectations).

FDT seems to be obviously much less useful for 1 and not clearly more useful for 2 when trying to apply it to the real-world models it claims to perform in. Even granting in the abstract the assertion that FDT's answers are more correct (which is not an objective criterion), it doesn't follow that we should prefer it. It seems facially (and formally) less useful for deriving prescriptions, even in examples proponents cite.

Edit: to be clear, it doesn't claim to be useful as a descriptive theory. Advocates claim its advantage is prescriptive. This would be fine if it could offer better prescriptions. But on the issues where proponents claim it offers better prescriptions, the same proponents (e.g. Yudkowsky) citing those issues (e.g. voting) are unable to actually articulate better prescriptions. They, in fact, admit that FDT does not offer a clear way of prescribing actions, while other decision theories do offer clear prescriptions.

FDT seems to be obviously much less useful for 1

What? People seem to behave obviously more according to FDT intuitions than either CDT or EDT intuitions. See also: https://www.lesswrong.com/posts/FCffGHJnYfdE2DgRe/humans-do-acausal-coordination-all-the-time and many other similar posts.

Apologies for the double post, but to be a bit more precise, since my last comment was somewhat dismissive. As I discuss in my post, the reason people will give for voting even when they do not expect their vote to have an impact in excess of the cost of voting is because they place some value on voting. This is entirely consistent with standard behavioral models and is causal.

People get a warm and fuzzy feeling for engaging in pro-social, altruistic behavior. This has a real utility value. They also get signaling benefits from engaging in public pro-social behavior. This also has real utility benefits. Any serious model of real world utility will include these effects.

FDT does not offer clear intuitions with regards to behavior like voting. Even proponents (like Yudkowsky) will confess they are unable to offer a way of valuing voting under FDT. But some plain intuitions from FDT directly contradict observed behaviors and the explicit views of FDT advocates as I mention in my previous post.

FDT would imply "the more people that are similar to you, the more value you should place on voting", which runs counter to my intuition and to the intuition of proponents like Yudkowsky under FDT (who purports to, at least circumstantially, endorse not voting "if you don't expect any of the elections to be close").

If you think people weigh logical counterfactuals to make decisions, I don't think you have talked to many people.

See my post which discusses how we understand it from behavioral economics. The standard view is pretty simple: people place real utility on complex values and actions. I discuss the example of voting in detail.