Can we hybridize Absent-Minded Driver with Death in Damascus?

by Eliezer Yudkowsky, 1st Aug 2016



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Summary:

I was initially excited to re-encounter the Absent-Minded Driver problem in the light of causal decision theory, because I thought causal decision theory gave a clear-cut wrong answer of "Continue with probability 4/9." If so, it would be a case of CDT shooting off its own foot that didn't involve Omega, Death, or anyone else reading your mind, cloning you, or trying to predict you. The decision theory would have shot off its own foot without postulating anything more than anterograde amnesia or limited disk storage.

However, the resolution that makes the Absent-Minded Driver work under CDT is a resolution that we can pump money out of in the Death in Damascus case. I'm wondering if there's some way to hybridize the two scenarios to yield a clear-cut case of CDT shooting off its own foot without any other agent being involved.

Background:

In the Absent-Minded Driver dilemma, an absent-minded driver will come across two identical-looking intersections on their journey. The utility of exiting at the first intersection is $0, the utility of exiting at the second intersection is $4, and the utility of continuing past both intersections is $1.

Let \(p\) be the probability of continuing at each intersection, so \(1-p\) is the probability of exiting given that you are at that intersection. The optimal \(p\) maximizes the function \(0(1-p) + 4p(1-p) + 1p^2 = 4p - 3p^2\), so \(p = 2/3\).
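As a quick check of the planning-optimal policy, here is a minimal sketch that evaluates the expected payoff over a grid of exact rationals. The function name `planning_value` and the grid resolution are illustrative choices, not part of the original problem statement.

```python
from fractions import Fraction

def planning_value(p):
    """Expected payoff when continuing with probability p at each intersection."""
    exit_first = (1 - p) * 0          # exit at the first intersection: $0
    exit_second = p * (1 - p) * 4     # continue once, then exit: $4
    continue_both = p * p * 1         # continue past both: $1
    return exit_first + exit_second + continue_both

# Search exact rationals k/300 in [0, 1]; the grid contains 2/3 = 200/300.
grid = [Fraction(k, 300) for k in range(301)]
best_p = max(grid, key=planning_value)
best_value = planning_value(best_p)

print(best_p, best_value)
```

On this grid the maximizer is 2/3 with an expected payoff of 4/3, matching the closed-form maximum of \(4p - 3p^2\).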

I initially thought that CDT would yield a suboptimal answer of \(p = 4/9\), obtained as follows:

Suppose I think \(q\) is the policy I will decide on. Then my odds of being at the first vs. second intersection are \(1 : q\).

If I'm already at the second intersection, my reward for a probability \(p\) of continuation is \(4(1-p) + 1p\). And if I'm at the first intersection, my reward for a policy \(p\) is \(4p(1-p) + 1p^2\) as before.

So my best policy is found by maximizing \(\frac{1}{1+q}\left(4p(1-p) + p^2\right) + \frac{q}{1+q}\left(4(1-p) + p\right)\), which will have its maximum at \(4 - 6p - 3q = 0\), or \(p = \frac{4-3q}{6}\). If \(p = q\) then \(p = 4/9\).
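The naive calculation can be sketched as follows, assuming the weights \(1/(1+q)\) and \(q/(1+q)\) for being at the first and second intersection and treating \(p\) as the policy at both. The helper names are hypothetical; the closed-form best response comes from setting the derivative in \(p\) to zero.

```python
from fractions import Fraction

def naive_value(p, q):
    """Naive objective: p is treated as the policy used at BOTH intersections."""
    w_first = 1 / (1 + q)                 # credence of being at the first intersection
    w_second = q / (1 + q)                # credence of being at the second intersection
    at_first = 4 * p * (1 - p) + p * p    # payoff of policy p starting from the first
    at_second = 4 * (1 - p) + p           # payoff of policy p starting from the second
    return w_first * at_first + w_second * at_second

def naive_best_response(q):
    """Maximizer in p, from d/dp = (4 - 6p - 3q) / (1 + q) = 0."""
    return (4 - 3 * q) / 6

# The self-consistent policy p = q solves 9p = 4.
fixed_point = Fraction(4, 9)

# Cross-check the closed form against a grid search at q = 4/9.
grid = [Fraction(k, 18) for k in range(19)]
grid_argmax = max(grid, key=lambda p: naive_value(p, fixed_point))

print(naive_best_response(fixed_point), grid_argmax)
```

Both the closed form and the grid search return 4/9, which is below the planning-optimal 2/3.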

This was in fact the first analysis ever published on the problem, by Piccione and Rubinstein. However, as Aumann et al. swiftly pointed out, a true causal decision theorist ought to believe that if it chooses \(p\) when at the first intersection, this has no effect on its probability \(q\) of continuing at the second intersection! (Aka: If we're going to ignore what LDT considers to be logical correlations, we'd better ignore all of them equally.)

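So assuming its own strategy is \(q\), a CDT agent's expected payoff for a policy \(p\) is \(p\,(4(1-q) + 1q)\) at the first intersection and \(4(1-p) + 1p\) at the second intersection. Then

\[\frac{d}{dp}\left[\frac{1}{1+q}\,p\,\bigl(4(1-q) + q\bigr) + \frac{q}{1+q}\,\bigl(4(1-p) + p\bigr)\right] = \frac{4 - 6q}{1+q},\]

which has no dependence on \(p\). This makes a kind of sense, since for most settings of \(q\) the correct decision under CDT is to exit or continue deterministically. However, at \(q = 2/3\) all \(p\) will seem equally attractive, so \(p = q = 2/3\) is a stable point.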
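A minimal sketch of this corrected calculation, under the assumption that the agent's choice \(p\) at the current intersection leaves the other intersection's continuation probability fixed at \(q\). Because the objective is linear in \(p\), its slope can be read off as the difference between the values at \(p = 1\) and \(p = 0\). The function names are illustrative.

```python
from fractions import Fraction

def cdt_value(p, q):
    """CDT objective: p is the choice made now; the other intersection uses q."""
    w_first = 1 / (1 + q)                  # credence of being at the first intersection
    w_second = q / (1 + q)                 # credence of being at the second intersection
    at_first = p * (4 * (1 - q) + q)       # continue now, then exit/continue per q
    at_second = 4 * (1 - p) + p            # exit now for $4, or continue for $1
    return w_first * at_first + w_second * at_second

def slope_in_p(q):
    """Slope of the (linear in p) objective; equals (4 - 6q) / (1 + q)."""
    return cdt_value(Fraction(1), q) - cdt_value(Fraction(0), q)

for q in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)):
    print(q, slope_in_p(q))
```

A positive slope (for example at \(q = 1/2\)) favors continuing for certain, a negative slope (at \(q = 3/4\)) favors exiting for certain, and only at \(q = 2/3\) is the slope zero so that every \(p\) scores the same.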

One might ask whether this answer seems a bit ad hoc, given that with \(q = 2/3\) we could theoretically output any \(p\) whatsoever. But the general schema of finding a permissible stable \(p\) to output, given your self-observation of \(q\), has been proposed as a general rule of CDT; so it's not that ad hoc.

Next, we consider the Death in Damascus problem. In this dilemma you have the option of staying in Damascus or fleeing to Aleppo, and Death has told you that whatever you end up deciding will in fact be the wrong option that gets you killed.

The version of CDT that looks for a stable point yields the mixed strategy of staying in Damascus or riding to Aleppo each with 50% probability. At this point Damascus and Aleppo seem equally attractive--each decision allegedly has a 50% probability of killing you, from the perspective of a CDT agent--and so the mixed strategy is stable. (At least until you notice your own actual non-random output a second later, and want to change your mind.)
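The stability of the 50/50 mixture can be illustrated with a small sketch. It assumes one particular way of modeling the stable-point reasoning: the agent's current inclination to stay in Damascus is some probability `r`, the agent believes Death's location is distributed the same way, and it treats that location as causally independent of the choice it is about to make. The names `survival_if_go` and `preferred_city` are hypothetical.

```python
from fractions import Fraction

def survival_if_go(city, prob_death_in_damascus):
    """Perceived survival chance, holding Death's location fixed (CDT-style)."""
    if city == "Damascus":
        return 1 - prob_death_in_damascus
    return prob_death_in_damascus

def preferred_city(r):
    """Which city looks better if Death is believed to be in Damascus with probability r."""
    stay = survival_if_go("Damascus", r)
    flee = survival_if_go("Aleppo", r)
    if stay > flee:
        return "Damascus"
    if flee > stay:
        return "Aleppo"
    return "indifferent"

for r in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(r, preferred_city(r))
```

Any inclination other than 1/2 makes the opposite city look better, which pushes the inclination back; only at 1/2 do both options look equally good.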

However, in the case of Death in Damascus, we can extract value out of a CDT agent reasoning this way. Suppose we offer the CDT agent a ticket costing $1 which pays $10 if the agent survives. At the moment of decision, the agent must press one of four buttons DN, AN, DY, AY, indicating first whether the agent goes to Damascus (D) or Aleppo (A), and second whether the agent purchases the ticket (Y) or not (N).

From outside the problem, we know that the agent will die wherever it goes and that it should not purchase the ticket. But at the moment of making the decision, the CDT agent thinks it has free will that doesn't correlate with any background variables and hence a 50% probability of survival, so the CDT agent will buy the ticket for $1. And then afterwards we can buy back the knowably worthless ticket for $0.01, thereby having predictably pumped money out of the agent.
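The arithmetic of the money pump can be sketched as below, using the prices from the text ($1 ticket, $10 payout, $0.01 buyback). The variable names are illustrative, and the sketch assumes the agent buys whenever its believed expected value is positive and sells back once it sees the ticket is worthless.

```python
from fractions import Fraction

TICKET_PRICE = Fraction(1)
TICKET_PAYOUT = Fraction(10)
BUYBACK_PRICE = Fraction(1, 100)

def ticket_ev(survival_probability):
    """Expected value of buying the ticket at a given survival probability."""
    return survival_probability * TICKET_PAYOUT - TICKET_PRICE

# At the stable mixed strategy the CDT agent believes it survives half the time.
believed_ev = ticket_ev(Fraction(1, 2))
# From outside the problem, Death's prediction is right, so survival is 0.
actual_ev = ticket_ev(Fraction(0))

agent_buys = believed_ev > 0
# Assumption: after buying, the agent accepts the buyback offer.
net_loss = TICKET_PRICE - BUYBACK_PRICE if agent_buys else Fraction(0)

print(believed_ev, actual_ev, agent_buys, net_loss)
```

The agent believes the ticket is worth +$4 in expectation, while its actual value is -$1, so the round trip costs the agent $0.99.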

The question:

Can we hybridize the Absent-Minded Driver's CDT-stable strategy with the way we pumped money out of the CDT-stable strategy in the Death in Damascus problem, to yield a Newcomblike problem that pumps money out of a CDT agent, assuming nothing more than anterograde amnesia or limited memory, and without there being any Omega/Death predictor who can read your random numbers?

My instinct is no--I suspect that the money-pumping is coming from the part where Death knows your random numbers and the CDT agent thinks it has free will. But I haven't verbalized this instinct into a formal argument, and maybe the anthropic aspects of the Absent-Minded Driver would let us arrange the problem structure somehow so that the agent is making a bad prediction at the equilibrium point where it thinks all policies have equal value, and then we could sell it a worthless ticket.

More generally, if we can find an analogue of the Absent-Minded Driver that assumes nothing more than anterograde amnesia or limited disk space and makes CDT unambiguously fail, this would be a very valuable Newcomblike problem to have on hand.
