This post has benefited greatly from discussion with Sam Eisenstat, Caspar Oesterheld, and Daniel Kokotajlo.
Last year, I wrote a post claiming there was a Dutch Book against CDTs whose counterfactual expectations differ from EDT. However, the argument was a bit fuzzy.
I recently came up with a variation on the argument which gets around some problems; I present this more rigorous version here.
Here, "CDT" refers -- very broadly -- to using counterfactuals to evaluate expected value of actions. It need not mean physical-causal counterfactuals. In particular, TDT counts as "a CDT" in this sense.
"EDT", on the other hand, refers to the use of conditional probability to evaluate expected value of actions.
Put more mathematically, for action $a$, EDT uses $E(U \mid Act=a)$, and CDT uses $E(U \mid do(Act=a))$. I'll write $edt(a)$ and $cdt(a)$ to keep things short.
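To make the two quantities concrete, here is a minimal sketch that computes both for a toy Newcomb problem. Everything in it is an illustrative assumption: the payoffs, the 90% predictor accuracy, the agent's 50/50 credence over its own action, and in particular the choice of physical-causal counterfactuals for `cdt` (the post's sense of "CDT" is broader than that).

```python
from fractions import Fraction as F

# Hypothetical Newcomb-style setup; every number here is an illustrative assumption.
ACTIONS = ("one_box", "two_box")
P_ACT = {"one_box": F(1, 2), "two_box": F(1, 2)}  # agent's credence in its own action
ACCURACY = F(9, 10)                               # P(prediction matches action)


def utility(action, predicted):
    """Payoff in thousands: the opaque box holds 1000 iff one-boxing was predicted."""
    opaque = 1000 if predicted == "one_box" else 0
    return opaque + (1 if action == "two_box" else 0)


def p_pred_given_act(predicted, action):
    """P(prediction | action) for a predictor with the assumed accuracy."""
    return ACCURACY if predicted == action else 1 - ACCURACY


def edt(a):
    """E(U | Act=a): the prediction is conditioned on the action."""
    return sum(p_pred_given_act(p, a) * utility(a, p) for p in ACTIONS)


def cdt(a):
    """E(U | do(Act=a)), assuming physical-causal counterfactuals:
    intervening on the action leaves the prediction at its marginal distribution."""
    def p_pred(p):
        return sum(P_ACT[x] * p_pred_given_act(p, x) for x in ACTIONS)
    return sum(p_pred(p) * utility(a, p) for p in ACTIONS)


for a in ACTIONS:
    print(a, "edt =", edt(a), "cdt =", cdt(a))
```

Under these assumptions the two expectations disagree for both actions, which is the kind of discrepancy the rest of the argument exploits.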
My argument could be viewed as using Dutch Books to formalize Paul Christiano's "simple argument" for EDT:
Suppose I am faced with two options, call them L and R. From my perspective, there are two possible outcomes of my decision process. Either I pick L, in which case I expect the distribution over outcomes P(outcome|I pick L), or I pick R, in which case I expect the distribution over outcomes P(outcome|I pick R). In picking between L and R I am picking between these two distributions over outcomes, so I should pick the action A for which E[utility|I pick A] is largest. There is no case in which I expect to obtain the distribution of outcomes under causal intervention P(outcome|do(I pick L)), so there is no particular reason that this distribution should enter into my decision process.
However, I do not currently view the argument as favoring EDT over CDT! Instead it supports the weaker claim that the two had better agree. Indeed, the Troll Bridge problem strongly favors CDT whose expectations agree with EDT over EDT. So, this is intended to provide a strong constraint on a theory of (logical) counterfactuals, not necessarily abolish the need for them. (However, the constraint is a strong one, and it's worth considering the possibility that this constraint is all we need for a theory of counterfactuals.)
The Basic Argument
Consider any one action $a$ for which $cdt(a) \neq edt(a)$, in some decision problem. We wish to construct a modified decision problem which Dutch-books the CDT.
My argument requires an assumption that the action $a$ is assigned nonzero probability. This is required to ensure $edt(a)$ is defined at all (since otherwise we would be conditioning on a probability zero event), but also for other reasons, which we'll see later on.
Anyway, as I was saying, we wish to take the decision problem which produces the disagreement between $cdt(a)$ and $edt(a)$, and from it, produce a new decision problem which is a Dutch book.
The new decision problem will be a two-step sequential decision problem. Immediately before the original decision, the bookie offers to sell the agent the following bet $B$, for a price of $b > 0$ utilons. $B$ is a bet conditional on $Act=a$, in which the buyer is betting against $cdt$'s expectation and in favor of $edt$'s expectation. For example:
B: In the case that $Act=a$, the seller of this certificate owes the purchaser of this certificate $s \cdot (U - cdt(a))$, where $s$ is the signum $sgn(edt(a) - cdt(a))$.
The key point here is that because the agent is betting ahead of time, it will evaluate the value of this bet according to the conditional expectation $E(U \mid Act=a)$.
If $edt(a) > cdt(a)$, so $s = 1$, then the value of $B$ in the case that $Act=a$ is $U - cdt(a)$. The expectation of this is $edt(a) - cdt(a)$, which again, we have supposed is positive. So the overall expectation is $P(Act=a) \cdot (edt(a) - cdt(a))$. Setting $b$ low enough ensures that the agent will be happy to take this bet. Similarly, if $edt(a) < cdt(a)$, the value of the bet ends up being $P(Act=a) \cdot (cdt(a) - edt(a))$ and the agent still takes it for the right price.
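Both cases can be folded into one expression. As a sketch, assume the bet $B$ pays $s \cdot (U - cdt(a))$ when $Act=a$ and nothing otherwise, with $s = sgn(edt(a) - cdt(a))$, and write $b$ for its price. The agent's valuation before the decision is then:

$$
\begin{aligned}
E(\text{payout of } B) &= P(Act=a) \cdot E\big(s \cdot (U - cdt(a)) \mid Act=a\big) \\
&= P(Act=a) \cdot s \cdot (edt(a) - cdt(a)) \\
&= P(Act=a) \cdot |edt(a) - cdt(a)| > 0 .
\end{aligned}
$$

So, under these assumptions, any price with $0 < b < P(Act=a) \cdot |edt(a) - cdt(a)|$ leaves the purchase with positive expected value by the agent's own lights.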
Now, the second stage of our argument. As the agent is making the decision, the bookie again makes an offer. (In other words, we extend the original set of actions to contain twice as many valid actions; half in which we accept, half in which we don't accept.) The new offer is this: "I will buy the bet $B$ from you for $b/2$ utilons."
Now, since the agent is reasoning during its action, it is evaluating possible actions according to $cdt(a)$; so its evaluation of the bet will be different. Here, the argument splits into two cases:
- When considering the action $a$, the bet's expected value is $s \cdot (cdt(a) - cdt(a))$, which is zero. So the agent prefers the new action which is like $a$ except $B$ is sold back to the bookie for $b/2$ utilons.
- When considering any other action, the bet is worth zero automatically, since it only pays out anything when $Act=a$. So, the agent will gladly take the bookie's payment of $b/2$ to sell the bet back.
So the result is the same in either case -- CDT recommends selling $B$ back to the bookie no matter what.
The agent has paid $b$ to buy $B$, and gotten only $b/2$ when selling it back. Buying and selling the contract cancel each other out. So the agent is down $b/2$ utilons for no gain!
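The two stages can be checked numerically. The following is a minimal sketch, not a definitive model: the function name `dutch_book`, the choice to set the price at half the bet's ex ante value, and the sell-back price of half the purchase price are all assumptions made for illustration. The bet is taken to pay $s \cdot (U - cdt(a))$ when $Act=a$, with $s = sgn(edt(a) - cdt(a))$.

```python
from fractions import Fraction as F


def sgn(x):
    """Sign of x as -1, 0, or 1."""
    return (x > 0) - (x < 0)


def dutch_book(p_a, edt_a, cdt_a):
    """Walk through both stages for a target action a with P(Act=a) = p_a.

    Returns (b, stage1_value, sellback_gain, net), all in utilons from the
    agent's point of view.
    """
    assert p_a > 0 and edt_a != cdt_a
    s = sgn(edt_a - cdt_a)

    # Stage 1 (before deciding): the bet is valued with conditional expectations.
    stage1_value = p_a * s * (edt_a - cdt_a)
    b = stage1_value / 2  # assumed price; any 0 < b < stage1_value would do

    # Stage 2 (while deciding): the bet is valued with counterfactual expectations.
    value_of_bet = {
        "a": s * (cdt_a - cdt_a),  # E(U | do(Act=a)) - cdt(a) = 0
        "other": 0,                # the bet pays nothing unless Act=a
    }
    # Gain from selling back for b/2 rather than keeping the bet, per action considered.
    sellback_gain = {k: b / 2 - v for k, v in value_of_bet.items()}

    # Buying and then selling back cancels the contract; only the prices remain.
    net = -b + b / 2
    return b, stage1_value, sellback_gain, net


b, stage1_value, sellback_gain, net = dutch_book(F(1, 2), F(900), F(500))
print("price b:", b)
print("ex ante value of the bet:", stage1_value)
print("gain from selling back, by action considered:", sellback_gain)
print("net result:", net)
```

The agent accepts at stage 1 because the price is below the ex ante value, sells back at stage 2 whichever action it is considering, and ends with a negative net result.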
[Figure: an illustration of the entire Dutch Book.]
Non-Zero Probability of $a$
A really significant assumption of this argument is that actions are given nonzero probability -- particularly, that the target action $a$ has a nonzero probability. This assumption is important, since the initial evaluation of $B$ is $P(Act=a) \cdot |edt(a) - cdt(a)|$. If the probability of action $a$ were zero, there would be no price the agent would be willing to pay for the bet.
The assumption is also required in order to guarantee that $edt(a)$ is well-defined -- although we could possibly use tricks to get around that, specifying a variant of EDT which defines some expectation in all cases.
Many of my arguments for CDT=EDT rest on this assumption, though, so it isn't anything new. It seems to be a truly important requirement, rather than an artefact of the argument.
There are many justifications of the assumption which one might try to give. I have often invoked epsilon-exploration; that is, the idea that some randomness needs to be injected into an agent's actions in order to ensure that it can try all options. I don't like invoking that as much as I used to. I might make the weaker argument that agents should use the chicken rule, IE, refuse to take any action which they can prove they take. (This can be understood as weaker than epsilon-exploration, because epsilon-exploration can be implemented by the epsilon-chicken rule: take any action which you assign probability less than epsilon to.) This rule ensures that agents can never prove what they do (so long as they use a sound logic). We can then invoke the non-dogmatism principle, which says that we should never assign probability 0 to a possibility unless we've logically refuted it.
Or, we could invoke a free-will principle, claiming that agents should have the subjective illusion of freedom.
In the end, though, what we have is an argument that applies if and only if $a$ has nonzero probability. All the rest is just speculation about how broadly this argument can be applied.
An interesting feature of the argument is that the less probable action $a$ is according to the agent, the less money we can get by Dutch-booking them on discrepancies between $cdt(a)$ and $edt(a)$. This doesn't matter for traditional Dutch Book arguments -- any sure loss is considered a failure of rationality. However, if we take a logical-induction type approach to rationality, smaller Dutch Books are less important -- boundedly rational agents are expected to lose some money to Dutch Books, and are only trying to avoid losing too much.
So, one might consider this to be a sign that, in some hypothetical bounded-rationality approach to decision theory, lower-probability actions would be allowed to maintain larger discrepancies between $cdt(a)$ and $edt(a)$, and maintain them for longer.
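The scaling can be made explicit with a small sketch. It assumes the bet's ex ante value is $P(Act=a) \cdot |edt(a) - cdt(a)|$, so the purchase price, and hence the agent's loss, must stay below that product. The helper names `loss_bound` and `tolerable_gap`, and the idea of a fixed loss "budget", are hypothetical and introduced only for illustration.

```python
from fractions import Fraction as F


def loss_bound(p_a, gap):
    """Upper bound on what the bookie can extract, given P(Act=a) = p_a
    and gap = |edt(a) - cdt(a)|: the price must stay below p_a * gap."""
    return p_a * gap


def tolerable_gap(p_a, budget):
    """Largest gap |edt(a) - cdt(a)| that keeps the loss bound within `budget`."""
    return budget / p_a


budget = F(1, 10)  # hypothetical amount the agent is willing to risk losing
for p_a in (F(1, 2), F(1, 10), F(1, 100)):
    print("P(Act=a) =", p_a, "-> tolerable gap:", tolerable_gap(p_a, budget))
```

For a fixed budget, the tolerable gap grows as the action's probability shrinks, which is the sense in which low-probability actions could be allowed larger discrepancies.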
Probabilities of Actions in the Modified Problem
A trickier point is the way probabilities carry over from the original decision problem to the modified problem. In particular, I assume the underlying action probabilities do not change. Yet, I split each action in two!
One justification of this might be that, for agents who choose according to CDT, it shouldn't change anything -- at the moment of decision, the bet is worth nothing, so it doesn't bias actions in one direction or another.
Ultimately, though, I think this is just part of the problem setup. Much like money-pump arguments posit magical genies who can switch anything for anything else, I'm positing a bookie who can offer these bets without changing anything. The argument -- if you choose to accept it -- is that the result is disturbing in any case. It does not seem likely that an appealing theory of counterfactuals is going to wriggle out of this specifically by denying the premise that action probabilities remain the same.
Note, however, that it is not important to my argument that none of the new actions get assigned zero probability. It's only important that the probabilities of the two copies of $a$ in the new problem (the one which sells the bet back and the one which does not) sum to the original decision problem's $P(Act=a)$.
Counterfactual Evaluation of Bets
Another assumption I didn't spell out yet is the interaction of bet contracts with counterfactual evaluations.
I assume that counterfacting on accepting the bet does not change probabilities of other things, such as the probability of the actions. This could be a large concern in general -- taking a conditional bet on $Act=a$ might make us want to choose $a$ on purpose, in order to cash in on the bet. This isn't a problem in this case, since the agent later evaluates the bet to be worth nothing. However, that doesn't necessarily mean it's not an issue according to the counterfactual evaluation, which would change the perceived value of $B$. Or, even more problematic, the agent's counterfactual expectations might say that taking the bet would result in some very negative event -- making the agent simply refuse. So the argument definitely assumes "reasonable counterfactual evaluations" in some sense.
On the other hand, this kind of reasoning is very typical for Dutch Book arguments. The bets are grafted onto the situation without touching any of the underlying probabilities -- so, e.g., you do not normally ask "is accepting the bet against X going to make X more probable?".
Handling Some Possible Objections
Does the Bookie Cheat?
You might look at my assumptions and be concerned that the bookie is cheating by using knowledge which the agent does not have. If a bookie has insider information, and uses that to get a sure profit, it doesn't count as a Dutch Book! For example, if a bookie knows all logical facts, it can money-pump any agent who does not know all logical facts (ie, money-pump any fixed computable probability distribution). But that isn't fair.
In this case, one might be concerned about the agent not knowing its own action. Perhaps I'm sneaking in an assumption that agents are uncertain of their own actions, and then Dutch-booking them by taking advantage of that fact, via a bookie who can easily predict the agent's action.
To this I have a couple of responses.
The bookie does not know the agent's choice of action. The bookie's strategy doesn't depend on this. In particular, note the disjunctive form of the argument: either the agent prefers $a$, in which case $B$ is worthless for one reason, or the agent prefers a different action, in which case $B$ is worthless for a different reason. The bookie is setting things up so that it's safe no matter what.
The agent knows everything the bookie knows, from the beginning. All the bookie needs in order to implement its strategy is the values of $cdt(a)$ and $edt(a)$, and, in order to set the price $b$, the probability of the action, $P(Act=a)$. These are things which the agent also knows.
The Agent Justifiably Revises Its Position
Another critique I have received is that it makes perfect sense that the agent takes the bet at the first choice point and later decides against it at the second choice point. The agent has gained information -- namely, when considering an action, the agent knows it will take that action. This extra information is being used to reject the bet. So it's perfectly reasonable.
Again I have a couple of responses.
The agent does not learn anything between the two steps of the game. There is no new observation, or additional information of any kind, between the step when the bookie offers $B$ and the step when the bookie offers to buy $B$ back. As the agent is evaluating a particular action, it does not "know" it will carry out that action -- it is only considering what would happen if it carried out that action!
Even if the agent did learn something, it would not justify being Dutch-booked. Consider two-stage games in which an agent is offered a bet, then learns some information, and then given a choice to sell the bet back for a fee. It is perfectly reasonable for an agent to in some cases sell the bet back. What makes a Dutch book, however, is if the agent always sells the bet back. It should never be the case that an agent predictably won't want the bet later, no matter what it observes. If that were the case (as it is in my scenario), the agent should not have accepted the bet in the first place. It's critical here to again note that the agent prefers to sell back the bet for every possible action -- that is, the original actions are always judged worse than their modified copies in which the sell-back deal is taken. So, even if we think of the agent as "learning" which action it selects when it evaluates selecting an action, we can see that it decides to sell back the bet no matter what it learns.
But Agents WILL Know What They'll Do
One might say that the argument doesn't mean very much at all in practice because from the information it knows, the agent should be able to derive its action. It knows how it evaluates all the actions, so, it should just know that it takes the argmax. This means the probability of the action actually taken is 1, and the probability of the rest of the actions is zero. As a result, my argument would only apply to the action actually taken -- and a CDT advocate can easily concede that $cdt(a) = edt(a)$ when $a$ is the action actually taken. It's other actions that one might disagree about. For example, in Newcomb, classical physical-causality CDT two-boxes, and agrees with EDT about the consequences of two-boxing. The disagreement is only about the value of the other action.
(Note, however, that the CDT advocate is still making a significant concession here; in particular, this rules out the classic CDT behavior in Death and Damascus and many variations on that problem. I don't know exactly how a classic physical-causality CDT advocate would maintain such a position.)
There are all kinds of problems with the agent knowing its own action, but a CDT advocate can very naturally reply that these should be solved with the right counterfactuals, not by ensuring that the agent is unsure of its actions (through, e.g., epsilon-exploration).
I'll have more to say about this objection later, but for now, a couple of remarks.
First and foremost, yeah, my argument doesn't apply to actions which the agent knows it won't take. I think the best view of the phenomenon here is that, if the agent really does know exactly what it will do, then yeah, the argument really does collapse to saying its evidential expectations should equal its counterfactual expectations for that one action. Which is like saying that, if $P(Act=a) = 1$, then we had better have $E(U \mid do(Act=a)) = E(U)$ -- counterfacting on something true should never change anything.
Certainly it's quite common to think of agents as knowing exactly what they'll do; for example, that's how backwards-induction in game theory works. And at MIRI we like to talk about problems where the agent can know exactly what it will do, because these stretch the limits of decision theory.
On the other hand, realistic agents probably mostly don't know with certainty what they'll do -- meaning my argument will usually apply in practice.
The agent might not follow the recommendations of CDT. Just because a CDT-respecting agent would definitely do a specific thing given all the information, does not mean that we have to imagine the agent in my argument knowing exactly what it will do. The agent in the argument might not be CDT-respecting.
Here on LessWrong, and at MIRI, there is often a tendency to think of CDT or EDT as the agent -- that is, think of agents as instances of decision theories. This is a point of friction between MIRI's way of thinking and that of academic philosophy. In academic philosophy, the decision theory need not be an algorithm the agent is actually running (or indeed, could ever run). A decision theory is a normative theory about what an agent should do. This means that CDT, as a normative theory, can produce recommendations for non-CDT agents; and, we can judge CDT on the correctness or incorrectness of those recommendations.
Now, I think there are some advantages to the MIRI tendency -- for example, thinking in this way brings logical uncertainty to the forefront. However, I agree nonetheless with making a firm distinction between the decision theory -- a normative requirement -- and the decision procedure -- a real algorithm you can run, which obeys the normative requirement. The logical induction algorithm vs the logical induction criterion illustrates a similar idea.
Academic decision theorists extend this idea to the criticism of normative principles -- such as CDT and EDT -- for their behavior in scenarios which an agent would never get into, if it were following the advice of the respective decision theory. This is what's going on in the bomb example Will MacAskill uses. (Nate Soares argues against this way of reasoning, saying "decisions are for making bad outcomes inconsistent".)
If we do endorse the idea, this offers further support for the argument I'm making. It means we get to judge CDT for recommending that an agent accept a Dutch-book, even if the scenario depends on uncertainty over actions which a CDT advocate claims a CDT-compliant agent does not have.
This is particularly concerning for CDT, because this kind of argument is used especially to defend CDT. For example, it's hard to justify smoking lesion as a situation which a CDT or EDT agent could actually find itself in; but, a CDTer might reply, a decision theory needs to offer the right advice to a broad variety of agents. So CDT is already in the business of defending normative claims about non-CDT agents.
One might object: the argument simply illustrates a dynamic inconsistency in CDT. We already know that both CDT and EDT are dynamically inconsistent. What's the big deal?
Let me make some buckshot remarks before I dive into my main response here:
- First, this isn't just any old dynamic inconsistency. This is a Dutch Book. Dutch Books have a special status in the foundations of Bayesianism. So, one might consider this to be more concerning than mere dynamic inconsistency.
- Second, dynamic inconsistency is still bad. We may still take examples of dynamic inconsistency as counting against a normative theory, and seek a theory which is dynamically consistent in a fairly broad range of cases.
- Third, I think EDT really does have a dynamic-consistency advantage over CDT, and my argument is just one example of that.
This third point is the one I want to expand on.
In decision problems where the payoff depends only on actions actually taken, not on your policy, there is a powerful argument for the dynamic consistency of EDT:
Think of the entire observation/action history as a tree. Dynamic consistency means that at earlier points in the tree, the agent does not prefer for the decisions of later selves to be different from what they will be. The restriction to actions (not policies) mattering for payoffs means this: selecting one action rather than another changes which branch we go down in the tree, but does not change the payoffs of other branches in the tree. This means that even from a perspective beforehand, an action can only make a difference down the branch where it is taken -- no spooky interactions across possible worlds. As a result, thinking about possible choices ahead of time, the contribution to early expected utility is exactly the expected utility that action will be assigned later at the point of decision, times the probability the agent ends up in that situation in the first place. Therefore, the preference about the decision must stay the same.
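The branch-by-branch reasoning can be written out as a short derivation. This is a sketch in notation not used elsewhere in the post, under a simplifying assumption: each branch contains a single decision, $h$ ranges over the observation histories at which that decision is made, and $P(h)$ is the probability of reaching $h$ as assessed from the earlier vantage point.

$$
E(U) = \sum_{h} P(h) \sum_{a} P(Act=a \mid h) \cdot E(U \mid h, Act=a)
$$

If payoffs depend only on the actions actually taken, changing what is done at $h$ alters only the $h$ term of this sum. So, from the earlier vantage point, the value of choosing $a$ rather than $a'$ at $h$ is

$$
P(h) \cdot \big( E(U \mid h, Act=a) - E(U \mid h, Act=a') \big),
$$

which, whenever $P(h) > 0$, has the same sign as the comparison EDT makes at $h$ itself.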
So, EDT is a dynamically consistent choice when actions matter but policy does not.
Importantly, this is not a no-Newcomblike-problems condition. It rules out problems such as counterfactual mugging, transparent Newcomb, Parfit's hitchhiker, and XOR Blackmail. However, it does not rule out the original Newcomb problem. In particular, we are highlighting the inconsistency where CDT wishes to be a 1-boxer, and similar cases.
Now, you can make a very similar argument for the dynamic consistency of CDT, if you define dynamic consistency based on counterfactuals: would you prefer to counterfact on your future self doing X? For Newcomb's problem, this gets us back to consistency -- for all that the CDT agent wishes it could be EDT, it would have no interest in the point-intervention that makes its future self one-box, for the usual reason: that would not cause Omega to change its mind.
However, this definition seems not to capture the most useful notion of dynamic consistency, since the same causal CDT agent would happily precommit to one-box. So I find the EDT version of the argument more convincing. I'm not presently aware of a similar example for EDT -- it seems Omega needs to consider the policy, not just the action really taken, in order to make EDT favor changing its actions via precommitments.
More on Probability Zero Actions
As I've said, my argument depends on the action having nonzero probability. EDT isn't well-defined otherwise; how do you condition on a probability-zero event? However, there are some things we could try in order to get around the problem of division by zero, filling in the undefined values with sensible numbers. For example, we can use the Conditional Oracle EDT which Jessica Taylor defined, to "fill in" the otherwise-undefined conditionals.
However, recall I said that the argument was blocked for two reasons:
- EDT not being defined.
- The bet conditional on $Act=a$ is worth nothing if $a$ has probability zero.
So, if we've somehow made $edt(a)$ well-defined for probability-zero $a$, can we patch the second problem?
We can try to flip the argument I made around: for probability zero actions, we pay the agent to take on a bet (so that it will do it even though it's worthless). Then later, we charge the agent to offload the bet which it now thinks is unfavorable.
The problem with this argument is, if the agent doesn't take action $a$ anyway, then the conditional bet will be nullified regardless; we can't force the agent into a corner where it prefers to nullify the bet, so, we don't get a full Dutch Book (because we can't guarantee that we make money off the agent -- indeed, we would only make money if the agent ends up taking the action which it previously assigned probability zero to taking).
However, we do get a moderately damning result: limiting attention to just the action $a$ and the new alternative which both does $a$ and also pays to cancel the bet, *CDT strictly prefers that the agent be Dutch Booked rather than just do $a$*. This seems pretty bad: CDT isn't actually recommending taking a Dutch Book, BUT, it would rather take a Dutch Book than take an alternative which is otherwise the same but which does not get Dutch Booked.
So, we can still make a moderately strong argument against divergence between counterfactuals and conditionals, even if actions have probability zero. But not a proper Dutch Book.
A Few Words on Troll Bridge
At the beginning of this, I said that I didn't necessarily take this to be an argument for EDT over CDT. In the past, I've argued this way:
- "Here's some argument that CDT=EDT."
- "Since EDT gets the answer more simply, while CDT has to posit extra information to get the same result, we should prefer EDT."
However, this argument has at least two points against it. First, the arguments for CDT=EDT generally have some assumptions, such as nonzero probability for actions. CDT is a strictly more general framework when those conditions are not met. Theories of rational agency should be as inclusive as possible, when rationality does not demand exclusivity. So one might still prefer CDT.
Second, as I mentioned at the beginning of this post, the Troll Bridge problem strongly favors CDT over EDT. Counterintuitively, it's perfectly possible for a CDT agent to keep its counterfactual expectations exactly in agreement with its conditional expectations, and yet get Troll Bridge right -- even though we are doomed to get Troll Bridge wrong if we directly use our conditional expectations. Insisting on a distinction "protects" us from spurious counterfactual reasoning. (I may go over this phenomenon in more detail in a future post. But perhaps you can see why by reviewing the Troll Bridge argument.)
So, my current take on the CDT=EDT hypothesis is this:
- We should think of counterfactuals as having real, independent truth. In other words, *$cdt(a)$ does not reduce to $edt(a)$*. Counterfactual information tells us something above and beyond probabilistic information.
- Counterfactuals are subjective in the same way that probabilities are subjective. The "independent truth" of counterfactuals does not mean there is one objectively correct counterfactual which every agent is normatively required to agree with. So there doesn't need to be a grand theory of logical counterfactuals -- there are many different subjectively valid beliefs.
- However, as with probability theory, there are important notions of coherence which constrain subjective beliefs. In particular, counterfactual beliefs should almost always equal conditional beliefs, at least when the antecedent has positive probability.
- Furthermore, conditional beliefs act a whole lot more like stereotypical CDT counterfactuals than most people seem to give them credit for. Something can't correlate with your action unless it contains information you don't have about your action. This is a high bar to pass, and will typically not be passed in e.g. twin prisoner's dilemma. (So, to solve these problems requires something further, e.g. updatelessness, Löbian handshakes, ???).
This is not a strongly held view, but it is the view that has made the most sense of counterfactual reasoning for me.
As I've mentioned in the past, the CDT=EDT hypothesis is almost the most boring possible answer to the question "how do (logical) counterfactuals work?" -- it doesn't do very much to help us solve interesting decision problems. If we factor decision theory into the two parts (1) "What are the (logical) counterfactuals?" (2) "How do we use counterfactuals to make decisions?" then I see the CDT=EDT hypothesis as a solution to (1) which shoves an awful lot of the interesting work of decision theory into (2). IE, to solve the really interesting problems, we would need logically-updateless UDT or even more exotic approaches.
In particular, for variants of Newcomb's problem where the predictor is quite strong but doesn't know as much as the agent does about what the agent will choose, this post implies that TDT either two-boxes, or, is vulnerable to the Dutch Book I construct. This is unfortunate.
Frankly, I find it somewhat embarrassing that I'm still going on about CDT vs EDT. After all, Paul Christiano said, partially in response to my own writing which he cited:
There are many tricky questions in decision theory. In this post, I’ll argue that the choice between CDT and EDT isn’t one of them.
I wish I could say this will be my final word on the subject. The contents of this post do feel quite definitive in the sense of giving a settled, complete view. However, the truth is that it only represents my view as of November or early December of 2019. Late December and early January saw some developments which I'm excited to work out further and post about.