The purpose of this post is twofold: (1) to attempt to understand UDT and updatelessness; and (2) to explain how I (coming from classical decision theory) view the relation between UDT and updatelessness on the one hand, and EDT and CDT on the other. I'll jump back and forth between these.


This post was largely based on discussions with James Faville (but all views are mine).


Consider Parfit’s Hitchhiker.

Suppose you're out in the desert, running out of water, and soon to die - when someone in a motor vehicle drives up next to you. Furthermore, the driver of the motor vehicle is a perfectly selfish ideal game-theoretic agent, and even further, so are you; and what's more, the driver is Paul Ekman, who's really, really good at reading facial microexpressions. The driver says, "Well, I'll convey you to town if it's in my interest to do so - so will you give me $100 from an ATM when we reach town?"

A causal decision theorist reasons as follows: not paying once I’m in town won’t causally bring about my being back in the desert. As such, I have no incentive to actually pay once I’m in town. Of course, even though you say you’re going to pay, Ekman reads this, says goodbye and drives off into the distance.

An evidential decision theorist reasons as follows: not paying once I’m in town won’t give me any evidence that I’ll be back in the desert. As such, I should not pay when in town. Ekman reads this, and you’re left to die.

What’s clear, though, is that both EDT and CDT would recommend pre-committing to pay, insofar as that’s possible. For example, you might want to give a friend some of your money before ending up in these kinds of situations, to be returned only if you actually pay.

However, you might think that this is ad hoc, inelegant and impractical, and that we would like our decision theory, very roughly, to somehow have an in-built feature that makes it “equivalent” to EDT/CDT + pre-commitment devices in all situations in which you would have wanted the device set up.

Updatelessness is supposed to be the answer.


Flavours of UDT and updatelessness

Updatelessness, as I understand it, is the idea that we should do the thing which we would have wanted to pre-commit to doing. That is, we would like to have pre-committed to paying in Parfit’s Hitchhiker, and thus we should pay; we would like to have pre-committed to one-boxing in transparent Newcombs, and thus we should one-box; and the same for giving in in the counterfactual mugging case, et cetera.

However, there are different variants of updateless decision theory. From this post by Abram Demski:

UDT 1.0, on seeing observation $o$, takes the action $a$ which maximizes the expected utility of "my code outputs action $a$ on seeing observation $o$", with expected value evaluated according to the prior. 

UDT 1.1, on seeing observation $o$, takes the action $a$ which the globally optimal policy (according to the prior) maps $o$ to. This produces the same result as UDT1.0 in many cases, but ensures that the agent can hunt stag with itself. 

UDT2.0 is like UDT1.1, except it (1) represents policies as programs rather than input-output mappings, and (2) dynamically decides how much time to spend thinking about the optimal policy.

Does it matter if we are thinking about policy selection or program selection as opposed to action selection? Yes, but for the purposes of this post, I think we can safely just proceed with UDT1.0 without loss of generality.
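
To make the difference between action selection and policy selection concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the observation set, the action set and the payoff numbers are mine, and `udt_1_0` is only a rough rendering of how a UDT1.0-style agent can miscoordinate with itself when it holds a fixed (and possibly wrong) belief about its own behaviour on other observations.

```python
from itertools import product

OBSERVATIONS = ["o1", "o2"]   # hypothetical observation set
ACTIONS = ["a", "b"]          # hypothetical action set

def prior_eu(policy):
    """Stand-in for 'expected utility evaluated according to the prior' of the
    fact that my code implements this observation->action mapping. The numbers
    form a stag-hunt-like payoff between the agent's two 'copies'."""
    table = {("a", "a"): 2.0, ("a", "b"): 0.0,
             ("b", "a"): 0.0, ("b", "b"): 1.0}
    return table[(policy["o1"], policy["o2"])]

def all_policies():
    return [dict(zip(OBSERVATIONS, acts))
            for acts in product(ACTIONS, repeat=len(OBSERVATIONS))]

def udt_1_1(observation):
    """UDT1.1: pick the globally optimal policy under the prior, then output
    whatever it maps the actual observation to."""
    best = max(all_policies(), key=prior_eu)
    return best[observation]

def udt_1_0(observation, assumed_behaviour_elsewhere):
    """Rough rendering of UDT1.0: optimize the action for *this* observation
    only, given some belief about what the code outputs on other observations.
    With the wrong belief, the agent fails to 'hunt stag with itself'."""
    def value(action):
        policy = dict(assumed_behaviour_elsewhere)
        policy[observation] = action
        return prior_eu(policy)
    return max(ACTIONS, key=value)

print(udt_1_1("o1"))               # 'a': part of the jointly optimal policy
print(udt_1_0("o1", {"o2": "b"}))  # 'b': miscoordinates given a pessimistic belief
```

The point is purely structural: UDT1.1 optimizes over whole observation-to-action mappings, whereas UDT1.0 optimizes each observation's action separately, which is why only the former is guaranteed to hunt stag with itself.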


Different questions, different answers?

EDT and CDT respectively tell us to ask the following questions:

  1. In a given decision situation, which action maximizes the expected value with conditionals?
    1. Formally, $\arg\max_{a} \sum_{j} P(o_j \mid a)\, U(o_j)$.
  2. In a given decision situation, which action maximizes the expected value with counterfactuals?
    1. Formally, $\arg\max_{a} \sum_{j} P(a \,\square\!\rightarrow o_j)\, U(o_j)$.[1]
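
(In these formulas, $a$ ranges over the available actions, $o_j$ over possible outcomes, and $U$ is the utility function.) As a toy illustration of how the two expectations can come apart, here is a small Python sketch of a standard Newcomb-style setup; the accuracy, credence and payoff numbers are all hypothetical. EDT's conditional probabilities reflect the predictor's accuracy, while CDT's counterfactual probabilities don't, since the prediction is causally upstream of the choice.

```python
# Newcomb-style setup with hypothetical numbers: a reliable predictor has
# already either put $1,000,000 in the opaque box or left it empty.
ACCURACY = 0.99   # P(prediction matches the action you actually take)
P_FILLED = 0.5    # CDT's unconditional credence that the box is filled
BIG, SMALL = 1_000_000, 1_000

def utility(filled, action):
    return (BIG if filled else 0) + (SMALL if action == "two-box" else 0)

def edt_value(action):
    """EDT: sum_j P(o_j | a) U(o_j). Conditioning on the action makes the
    predictor's accuracy show up in the probability that the box is filled."""
    p_filled = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_filled * utility(True, action) + (1 - p_filled) * utility(False, action)

def cdt_value(action):
    """CDT: sum_j P(a []-> o_j) U(o_j). Intervening on the action leaves the
    causally upstream prediction alone, so the unconditional credence is used."""
    return P_FILLED * utility(True, action) + (1 - P_FILLED) * utility(False, action)

for a in ("one-box", "two-box"):
    print(a, round(edt_value(a)), round(cdt_value(a)))
# EDT: one-box 990000 vs two-box 11000; CDT: one-box 500000 vs two-box 501000.
```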

That is, EDT and CDT are answers to the following type of question: (i) "what action maximizes expected value in a given decision situation?".

UDT1.0, on the other hand, as we've seen, is an answer to the question: (ii) "what action maximizes expected value in this decision situation with respect to some 'earlier'[2] point in time?".

The way I see this: (ii) differs from (i) in that the game is extended backwards such that you have the opportunity to make an ex ante evaluation of the situation; or, at the very least, there is a referenceable past (to which the UDT agent time travels). EDT and CDT agents, on the other hand, reason as they do because to them there's no world beyond the desert and the prospect of the city, no past, and thus no point at which you could have made that evaluation. They were born in the desert and now face a choice. That's the scope of those theories.

In other words: the decision situation that is considered by UDT1.0 is different. So, what’s the big deal then? If EDT and CDT are answers to a different question altogether, then framing this as a “disagreement” is not completely right. Furthermore, it’s not clear why it’s an objection to EDT and CDT that the corresponding agents die in Parfit’s Hitchhiker. Rather, it could/should be seen as an objection to the real-world relevance of question (i), the one that EDT and CDT are answers to.
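
To see questions (i) and (ii) coming apart in Parfit's Hitchhiker itself, here is a toy calculation (the utilities and Ekman's accuracy are made-up numbers): the ex post evaluation, once you are in town, favours not paying, whereas the ex ante evaluation over policies, from the prior, favours being a payer.

```python
# Parfit's Hitchhiker with made-up numbers: dying in the desert is very bad,
# paying costs $100, and Ekman reads your disposition correctly 99% of the time.
U_DEAD = -1_000_000
COST = 100
P_READ = 0.99

def ex_post_value(action):
    """Question (i): you are already in town. Not paying neither causes nor is
    evidence for being back in the desert, so EDT and CDT agree: don't pay."""
    return -COST if action == "pay" else 0

def ex_ante_value(policy):
    """Question (ii): evaluate the disposition/policy with respect to the prior,
    i.e. from 'before' the desert."""
    p_rescued = P_READ if policy == "pay" else 1 - P_READ
    value_if_rescued = -COST if policy == "pay" else 0
    return p_rescued * value_if_rescued + (1 - p_rescued) * U_DEAD

print(ex_post_value("pay"), ex_post_value("don't pay"))                # -100 0
print(round(ex_ante_value("pay")), round(ex_ante_value("don't pay")))  # -10099 -990000
```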

Consider the following (no doubt familiar) exchange:

  • Utilitarian: in a moral decision situation, I maximize happiness minus suffering.
  • Objector: that’s not very practical! It’s unreasonable to think that you would have the resources to calculate the expected utility of all of your actions. Indeed, that would probably not maximize utility. We should instead make up some rules that we apply in normal life that might not maximize utility in all instances, but given limited resources are going to perform better, on the whole.
  • Utilitarian: yes, sure, I guess? I don’t find this very interesting and that’s not the question I was asking, but that might be a good approach to making decisions over a lifetime.
  • Objector: you should ask better questions.

This is essentially the vibe I’m getting from the updateless “critique” of EDT and CDT, and (very tentatively) I would respond somewhat similarly to the Utilitarian.[3] Concerns around updatelessness are arguably not fundamental, and hopefully we can just add something on top of our regular decision theory (perhaps updatelessness) that lets us survive in Parfit’s Hitchhiker. This is also why I don’t think the response of setting up commitment devices (which should be especially easy for AIs) is a particularly bad one: it’s a practical response to an objection of a practical nature.

How I expect UDT proponents to respond:

  1. “I reject the argument: there are deeper reasons to prefer UDT over EDT and CDT that have nothing to do with differences in what questions we’re asking, and how they might differ.”
  2. “I reject the argument: there are deeper reasons to prefer UDT over EDT and CDT that specifically are about the question.” (E.g. “the question that motivates UDT is clearly the one we should be asking, and not just because it’s more relevant to the real world”.)
  3. “I agree that the motivation for UDT is ‘practical’, but decision theory (especially as a part of AI alignment research) should be ‘practical’”.

(I’m not sure this is the optimal division of possible responses but I think it tentatively works.)

As for (1) and (2) I’d be excited to hear what these deeper reasons are, and as for (3) I wonder why something like Parfit’s Hitchhiker or the Absent-Minded Driver[4]—which are very contrived and improbable scenarios—would be of much concern. Furthermore, insofar as you think updatelessness is the adequate response to Parfit’s Hitchhiker-y problems, why is it that we can’t integrate updatelessness into EDT or CDT (or vice versa)? 

(Per this post, it is, however, clear that the motivation for UDT is not only decision-theoretic, but also anthropic. So I guess I’m merely questioning the decision-theoretic part here, although I think some of my concerns about updatelessness apply in the anthropic case as well.)

The last consideration is what makes up the remainder of this post: my attempt at understanding how UDT and updatelessness relate to EDT and CDT.


EDT, CDT and updatelessness on top

So what does adding updatelessness on top of a decision theory amount to? The way I understand it, this basically means self-modifying (or building your representative AI) in such a way that you (or the AI) do(es) the thing in future situations that is ex ante rational from the perspective of this point in time. This is as opposed to UDT, which makes the choice that is ex ante rational from the “earliest” point in time. The cartoonish way of putting this is “I just realized that my decision theory in and of itself does not let me live in Parfit’s Hitchhiker and similar situations, so I’m going to be updateless from now on”. But whether your initial decision theory was EDT or CDT might matter: for example, you might think that your act of self-modification is correlated with other similar agents’ acts of self-modification. So if you are doing this to, for example, “win” a future game of chicken (i.e. unconditionally going straight, with the other player being forced to swerve) against these agents, then EDT will tell you that you are making a mistake, whilst CDT will ignore this correlation and probably recommend self-modifying into daring.
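
A toy version of the chicken example (my own numbers, not anything from the literature): suppose your self-modifying into a darer is strongly correlated with the similar agent's self-modification. EDT conditions on your choice and therefore expects to meet another darer; CDT treats the other agent's choice as causally independent and uses its unconditional credence.

```python
# Hypothetical payoffs for a future game of chicken against a similar agent.
CRASH, WIN, BASELINE = -10.0, 10.0, 0.0
P_OTHER_UNCONDITIONAL = 0.1  # CDT: credence that the other agent commits to daring anyway
P_OTHER_GIVEN_MINE = 0.9     # EDT: credence that they commit, conditional on my committing

def value_of_committing(p_other_commits):
    """Expected value of self-modifying into an unconditional darer: crash if
    they also committed, otherwise they are forced to swerve and you win."""
    return p_other_commits * CRASH + (1 - p_other_commits) * WIN

print("EDT:", value_of_committing(P_OTHER_GIVEN_MINE))     # -8.0: worse than the 0.0 baseline
print("CDT:", value_of_committing(P_OTHER_UNCONDITIONAL))  #  8.0: better than the 0.0 baseline
# BASELINE = 0.0 is the (simplified) value of staying flexible and swerving when needed.
```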

(One interesting case of this is CDT agents having an incentive to self-modify into cooperating in the prisoner’s dilemma with highly action-correlated agents, because they know that they are going to get the chance to do prisoner’s dilemma-style ECL with future branches of themselves; and these agents will then of course also cooperate (Oesterheld 2017).)


EDT + updatelessness = UEDT? CDT + updatelessness = UCDT?

Recall the definition of UDT1.0:

UDT 1.0, on seeing observation $o$, takes the action $a$ which maximizes the expected utility of "my code outputs action $a$ on seeing observation $o$", with expected value evaluated according to the prior. 

But what kind of expected utility? It seems like we are once again at the “conditionals-or-counterfactuals” crossroads. That is, we can define updateless evidential decision theory (UEDT1.0) as

$\arg\max_{a} \sum_{j} P\big(o_j \mid \ulcorner \text{my code outputs } a \text{ on seeing } o \urcorner\big)\, U(o_j)$,[5]

with the probabilities evaluated according to the prior.


And updateless causal decision theory (UCDT1.0) as

$\arg\max_{a} \sum_{j} P\big(\ulcorner \text{my code outputs } a \text{ on seeing } o \urcorner \,\square\!\rightarrow o_j\big)\, U(o_j)$,

again with the probabilities evaluated according to the prior.

(Note that the counterfactual in this case is the logical counterfactual, and we have logical causation.)

We could also easily formulate UEDT1.1 and UCDT1.1, as well as UEDT2.0 and UCDT2.0, mutatis mutandis.
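
For concreteness, one natural way to write down the 1.1 versions (this is my own rendering rather than a canonical definition) is to select a whole policy $\pi$ rather than an action:

$\pi_{\mathrm{UEDT}} = \arg\max_{\pi} \sum_{j} P\big(o_j \mid \ulcorner \text{my code implements } \pi \urcorner\big)\, U(o_j)$ and $\pi_{\mathrm{UCDT}} = \arg\max_{\pi} \sum_{j} P\big(\ulcorner \text{my code implements } \pi \urcorner \,\square\!\rightarrow o_j\big)\, U(o_j)$,

with the probabilities again given by the prior; on seeing $o$, the agent then outputs $\pi_{\mathrm{UEDT}}(o)$ (respectively $\pi_{\mathrm{UCDT}}(o)$).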

A couple of questions then arise:

  1. Are there circumstances in which EDT + updatelessness and UEDT1.0 come apart?
  2. Are there circumstances in which CDT + updatelessness and UCDT1.0 come apart?
  3. Are there circumstances in which UEDT1.0 and UCDT1.0 come apart?

Short answers: 

  1. Yes. Specifically, situations in which the precise point of the “ex ante evaluation” matters for the decision.
  2. Yes. For the same reason as above. 
  3. Yes (I think). This is what the problem of Troll Bridge is supposed to illustrate, as I understand it.

Related: The lack of performance metrics for CDT versus EDT, etc. by Caspar Oesterheld.
 

  1. ^

    “□→” is supposed to denote the causal counterfactual.

  2. ^
  3. ^

    I don’t think it’s a good counterargument to say that the solution to the “practical” problem of utilitarianism above is “easy”, whilst it is arguably “difficult” in the UDT case.

  4. ^

    See this post for more examples.

  5. ^

    “⌜...⌝” is Quine quotation.

Comments

This comment is a mishmash of several reactions / comments.

1: Thanks for the interesting post!

2: Tongue in cheek: Using CDT seems like a lot of work. How about just picking whatever action we think of first, thus avoiding the need to model the environment and do an expensive expected utility calculation. So it's unclear why we really need CDT in the first place, and we should default to picking whatever action we think of first.

3: The counterfactual type of UDT is much closer to UEDT than UCDT (though "just use logical counterfactuals" is a larger change and a tougher nut to crack than it might appear at first glance). A lot of the reason to use something like UDT is to handle cases where other agents know your reasoning procedure and take actions in anticipation of your future decisions, so it's not very useful to do something that looks like intervening only on your decision algorithm and nothing else about the world. (You could define logical counterfactuals in a different way so that UCDT works, but now I think you'd just be sweeping small problems under the rug of a bigger problem.)

4: I'm still trying to figure out a simple way to argue that UDT(1) is mandated by Savage's theorem. (Savage's theorem is the one where you make some assumptions about "rational behavior" and then get out both probabilistic reasoning and expected utility maximization.)

Savage's theorem talks about "actions," "states," and "consequences," but really those are just labels for mathematical objects with certain properties. My suspicion is that games where you need UDT(1.0) are ones where some sleight of hand has been played, and the thing the game calls "actions"/"states" aren't actually "actions"/"states" in the Savage's theorem sense, but your policy still fulfills Savage's conditions to be an "action."

E.g. one Savage postulate is "For all actions a, b, x, and y, and some set of states E, you prefer (a if E else x) to (b if E else x) if and only if you prefer (a if E else y) to (b if E else y)." First, note that this sort of independence of alternatives might not hold in cases like Newcomb's problem or the absent-minded driver. Second, note that this implicitly says that states are the sorts of things you can condition actions on (and actions are the sorts of things you can condition on states).

  1. :)

  2. Not sure I get your point. Seems like you're saying that EU maximization is analogously poorly motivated?

  3. I'm confused. In this post I use "UEDT" to mean "UDT with conditionals", and "UCDT" to precisely mean "UDT with logical counterfactuals". I'm not saying that this is necessarily the optimal terminology, but it seems like you're thinking of UCDT in a different way here? (Perhaps CDT + updatelessness?)

  4. Seems difficult, but I'd be very interested in reading more!