Absent-Minded Driver dilemma

Edited by Eliezer Yudkowsky last updated 2nd Aug 2016

A road contains two identical-looking intersections. An absent-minded driver wants to exit at the second intersection, but can't remember whether they've already passed the first intersection.

The utility of exiting at the first intersection is $0, the utility of exiting at the second intersection is $4, and the utility of continuing straight past both intersections is $1. [1]

With what probability should the driver continue vs. exit at a generic-looking intersection, in order to maximize their expected utility?

Analyses

From the standpoint of Newcomblike problems, the Absent-Minded Driver is noteworthy because the logical correlation of the two decisions arises just from the agent's imperfect memory (anterograde amnesia or limited storage space). There is no outside Omega making predictions about the agent; any problem that the agent encounters is strictly of its own making.

Intuitive/pretheoretic

The driver doesn't know each time whether they're at the first or second intersection, so they will continue with the same probability $p$ at each intersection. The expected payoff of adopting $p$ as a policy is the sum of:

  • $\$0$ times the probability $1-p$ of exiting at the first intersection;
  • $\$4$ times a $p$ probability of continuing past the first intersection multiplied by a $1-p$ probability of exiting at the second intersection;
  • $\$1$ times a $p^2$ probability of continuing past both intersections.

To find the maximum of the function $0(1-p) + 4(1-p)p + 1p^2$, we set the derivative $4 - 6p$ equal to $0$, yielding $p = \frac{2}{3}$.

So the driver should continue with 2/3 probability and exit with 1/3 probability at each intersection, yielding an expected payoff of $\$0 \cdot \frac{1}{3} + \$4 \cdot \frac{2}{3} \cdot \frac{1}{3} + \$1 \cdot \frac{2}{3} \cdot \frac{2}{3} = \$\frac{4}{3} \approx \$1.33$.
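As a sanity check on this arithmetic, here is a minimal sketch (our illustration, not part of the original analysis; the function names are invented) that scans over candidate policies and recovers the same optimum:

```python
# Sketch: brute-force check that the pretheoretic optimum is p = 2/3.
# Expected utility EU(p) = 0*(1-p) + 4*p*(1-p) + 1*p**2, per the payoffs above.

def expected_utility(p: float) -> float:
    """Ex-ante expected payoff of continuing with probability p at each intersection."""
    exit_first = 0 * (1 - p)       # exit at the 1st intersection: $0
    exit_second = 4 * p * (1 - p)  # continue once, then exit at the 2nd: $4
    go_past_both = 1 * p * p       # continue past both intersections: $1
    return exit_first + exit_second + go_past_both

best_p = max((i / 1000 for i in range(1001)), key=expected_utility)
print(best_p, expected_utility(best_p))  # ~0.667, ~1.333
```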

Causal decision theory

The analysis of this problem under causal decision theory has traditionally been considered difficult; e.g., Volume 20 of the journal Games and Economic Behavior was devoted entirely to the Absent-Minded Driver game.

Suppose that before you set out on your journey, you intended to adopt a policy of continuing with probability 2/3. Then when you actually encounter an intersection, you believe you are at the first intersection with probability 3/5 and at the second intersection with probability 2/5. (There is a 100% or 3/3 chance of encountering the first intersection, and a 2/3 chance of encountering the second intersection, so the odds are 3 : 2 for being at the first intersection versus the second intersection.)
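This odds-to-probability step can be made concrete with a small sketch (ours, with invented names), assuming only that the first intersection is always reached and the second is reached with the continue-probability:

```python
# Sketch: the driver's credence over intersections given continue-probability q.
# The 1st intersection is always reached; the 2nd is reached with probability q,
# so the odds (1st : 2nd) are 1 : q.

def intersection_credences(q: float) -> tuple[float, float]:
    """Return (P(at 1st intersection), P(at 2nd intersection))."""
    return 1 / (1 + q), q / (1 + q)

print(intersection_credences(2 / 3))  # (0.6, 0.4), i.e. 3/5 and 2/5
```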

Now since you are not a logical decision theorist, you believe that if you happen to already be at the second intersection, you can change your policy $p$ without retroactively affecting the probability that you're already at the second intersection - either you're already at the second intersection or not, after all!

The first analysis of this problem was given by Piccione and Rubinstein (1997):

Suppose we start out believing we are continuing with probability $q$. Then our odds of being at the first vs. second intersection would be $1 : q$, so the probabilities of being at each intersection would be $\frac{1}{1+q}$ and $\frac{q}{1+q}$ respectively.

If we're at the first intersection and we choose a policy $p$, we should expect a future payoff of $4p(1-p) + 1p^2$. If we're already at the second intersection, we should expect a policy $p$'s future payoff to be $4(1-p) + 1p$.

In total our expected payoff is then $\frac{1}{1+q}\left(4p(1-p)+p^2\right) + \frac{q}{1+q}\left(4(1-p)+p\right)$, whose derivative $\frac{-6p - 3q + 4}{q+1}$ equals $0$ at $p = \frac{4-3q}{6}$.

Our decision at $q$ will be stable only if the resulting maximum $p$ is equal to $q$, and this is true when $p = q = \frac{4}{9}$. The expected payoff from this policy is $\$4 \cdot \frac{4}{9} \cdot \frac{5}{9} + \$1 \cdot \frac{4}{9} \cdot \frac{4}{9} \approx \$1.19$.
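The fixed point can be checked numerically with a short sketch (our illustration; the names are not from the paper):

```python
# Sketch: Piccione and Rubinstein's consistency condition.
# Given a believed policy q, the optimum of the expected payoff above is
# p = (4 - 3q) / 6; the policy is stable when that best response equals q.

def best_response(q: float) -> float:
    """Argmax over p of 1/(1+q)*(4p(1-p)+p^2) + q/(1+q)*(4(1-p)+p)."""
    return (4 - 3 * q) / 6

q = 0.5                # start from an arbitrary believed policy
for _ in range(50):    # iterate the best response to its fixed point
    q = best_response(q)
print(q)               # -> 0.444..., i.e. 4/9

ex_ante_payoff = 4 * q * (1 - q) + 1 * q * q
print(ex_ante_payoff)  # -> about 1.185, i.e. roughly $1.19
```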

However, the immediately following paper by Robert Aumann et al. (1997) offered an alternative analysis in which, starting out believing our policy to be $q$, if we are at the first intersection, then our decision $p$ also cannot affect our decision $q$ that will be made at the second intersection. [2] So:

  • If we had in fact implemented the policy $q$, our odds for being at the first vs. second intersection would be $1 : q$, corresponding to probabilities $\frac{1}{1+q}$ and $\frac{q}{1+q}$ respectively.
  • If we're at the first intersection, then the payoff of choosing a policy $p$, given that our future self will go on implementing $q$ regardless, is $4p(1-q) + 1pq$.
  • If we're already at the second intersection, then the payoff of continuing with probability $p$ is $4(1-p) + 1p$.

So if our policy is $q$, the expected payoff of the policy $p$ under CDT is:

$$\frac{1}{1+q}\left(4p(1-q)+pq\right) + \frac{q}{1+q}\left(4(1-p)+p\right)$$

Differentiating with respect to $p$ yields $\frac{4-6q}{1+q}$, which has no dependency on $p$. This makes a kind of sense: since your decision now has no impact on your past or future decision at the other intersection, most settings of $q$ will just yield an answer of "definitely continue" or "definitely exit". However, there is a setting of $q$ which makes any policy $p$ seem equally desirable, namely the point at which $4 - 6q = 0 \implies q = \frac{2}{3}$. Aumann et al. take this to imply that a CDT agent should output a $p$ of 2/3.
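A small sketch (ours, with invented names) makes the linearity visible: the payoff's slope in $p$ is $\frac{4-6q}{1+q}$, so every $q$ other than 2/3 pushes the driver toward a pure answer, while $q = 2/3$ leaves all policies tied:

```python
# Sketch: in the Aumann et al. analysis the payoff is linear in p,
# with slope (4 - 6q) / (1 + q); it is flat exactly when q = 2/3.

def cdt_payoff(p: float, q: float) -> float:
    """1/(1+q) * (4p(1-q) + pq)  +  q/(1+q) * (4(1-p) + p)."""
    return (4 * p * (1 - q) + p * q) / (1 + q) + q * (4 * (1 - p) + p) / (1 + q)

def slope_in_p(q: float) -> float:
    return (4 - 6 * q) / (1 + q)

print(slope_in_p(0.5))    # > 0: with q < 2/3, CDT says "definitely continue"
print(slope_in_p(0.9))    # < 0: with q > 2/3, CDT says "definitely exit"
print(slope_in_p(2 / 3))  # = 0: every p looks equally good
print(cdt_payoff(0.1, 2 / 3), cdt_payoff(0.9, 2 / 3))  # identical values
```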

One might ask how this result of 2/3 is actually rendered into an output, since on the analysis of Aumann et al., if your policy $q$ in the past or future is to continue with 2/3 probability, then any policy $p$ seems to have equal utility. However, outputting $p = 2/3$ would also correspond to the general procedure proposed to resolve, e.g., Death in Damascus within CDT. Allegedly, it is just a general rule of the principle of rational choice that in this type of problem one should find a policy where, assuming one implements that policy, all policies look equally good, and then do that.

Further analyses have, e.g., remarked on the analogy to the Sleeping Beauty Problem and delved into anthropics; or considered the problem as a game between two different agents occupying each intersection, etcetera. It is considered nice to arrive at an answer of 2/3 at the end, but this is not mandatory.

Logical decision theory

Logical decision theorists using, e.g., the updateless form of timeless decision theory will compute an answer of 2/3 using the same procedure and computation as in the intuitive/pretheoretic version. They will also remark that it is strange to imagine that the reasonable answer could be different from the optimal policy, or even that they should require a different reasoning path to compute; and will note that while simplicity is not the only virtue of a theory of instrumental rationality, it is a virtue.

  1. Utility functions describe the relative desirability intervals between outcomes. So this payoff matrix says that the added inconvenience of "going past both intersections" compared to "turning right at 2nd" is 3/4 of the added inconvenience of "turning right at 1st" compared to "turning right at 2nd" (a utility gap of $4 - 1 = 3$ versus a gap of $4 - 0 = 4$). Perhaps turning right at 1st involves a much longer detour by the time the driver realizes their mistake, or a traffic-jammed stoplight to get back on the road.

  2. From an LDT perspective, at least the agent is being consistent about ignoring logical correlations!

Parents:
Newcomblike decision problems