"What would you do in situation X?" and "What would you like to pre-commit to doing, should you ever encounter situation X?" should, to a rational agent, be one and the same question.
Applied to Vladimir Nesov's counterfactual mugging, the reasoning is then:
Precommitting to paying $100 to Omega has expected utility of $4950.p(Omega appears). Not precommitting has strictly less utility; therefore I should precommit to paying. Therefore I should, in fact, pay $100 in the event (Omega appears, coin is tails).
To combat the argument that it is more likely that one is insane than that Omega has appeared, Eliezer said:
So imagine yourself in the most inconvenient possible world where Omega is a known feature of the environment and has long been seen to follow through on promises of this type; it does not particularly occur to you or anyone that believing this fact makes you insane.
My first reaction was that it is simply not rational to give $100 away when nothing can possibly happen in consequence. I still believe that, with a small modification: I believe, with moderately high probability, that it will not be instrumentally rational for my future self to do so. Read on for the explanation.
Suppose we lived in Eliezer's most inconvenient possible world:
- Omega exists.
- Omega has never been found untrustworthy.
- Direct brain simulation has verified that Omega has a 100% success rate in predicting the response to its problem, thus far.
- Omega claims that no other Omega-like beings exist (so no perverse Omegas that cancel out Omega's actions!).
- Omega never speaks to anyone except if it is asking them for payment. It never meets anyone more than once.
- Omega claims that actual decisions never have any consequences. It is only what you would have decided that can ever affect its actions.
Did you see a trap? Direct brain simulation instantiates precisely what Omega says does not exist, a "you" whose decision has consequences. So forget that. Suppose Omega privately performs some action for you (for instance, a hypercomputation) that is not simulable. Then direct brain simulation of this circumstance cannot occur. So just assume that you find Omega trustworthy in this world, and assume it does not itself simulate you to make its decisions. Other objections exist: numerous ones, actually. Forget them. If you find that a certain set of circumstances makes it easier for you to decide not to pay the $100, or to pay it, change the circumstances. For myself, I had to imagine knowing that the Tegmark ensemble didn't exist*. If, under the MWI of quantum mechanics, you find reasons (not) to pay, then assume MWI is disproven. If the converse, then assume MWI is true. If you find that both suppositions give you reasons (not) to pay, then assume some missing argument invalidates those reasons.
Under these circumstances, should everyone pay the $100?
No. Well, it depends what you mean by "should".
Suppose I live in the Omega world. Then prior to the coin flip, I assign equal value to my future self in the event that it is heads, and my future self in the event that it is tails. My utility function is, very roughly, the expected utility function of my future self, weighted by the probabilities I assign that I will actually become some given future self. Therefore if I can precommit to paying $100, my utility function will possess the term $4950.p(Omega appears), and if I can only partially precommit, in other words I can arrange that with probability q I will pay $100, then my utility function will possess the term $4950.q.p(Omega appears). So the dominant strategy is to precommit with probability one. I can in fact do this if Omega guarantees to contact me via email, or a trusted intermediary, and to take instructions thereby received as "my response", but I may have a slight difficulty if Omega chooses to appear to me in bed late one night.
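As a rough illustration of where the $4950 term comes from, here is a minimal Python sketch. It assumes the payoffs usually quoted for Nesov's problem (a $10,000 prize on heads, a $100 payment on tails, a fair coin), which this post does not restate; the function and constant names are hypothetical.

```python
# Assumed payoffs from the usual statement of the problem (not restated in this post).
HEADS_PRIZE = 10_000   # paid by Omega on heads, if it predicts you would pay on tails
TAILS_COST = 100       # what Omega asks for on tails
P_HEADS = 0.5          # fair coin


def full_precommit_value() -> float:
    """Expected dollars from being a guaranteed payer, given that Omega appears."""
    return P_HEADS * HEADS_PRIZE - (1 - P_HEADS) * TAILS_COST


def precommit_term(q: float, p_omega: float) -> float:
    """The term 4950 * q * p(Omega appears), for paying with probability q."""
    if not (0.0 <= q <= 1.0 and 0.0 <= p_omega <= 1.0):
        raise ValueError("q and p_omega must be probabilities")
    return full_precommit_value() * q * p_omega
```

Because the term is linear in q, it is largest at q = 1 for any non-zero p(Omega appears), which is the sense in which precommitting with probability one dominates.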
On the principle of the least convenient world, I'm going to suppose that is in fact how Omega chooses to appear to me. I'm also going to suppose that I have no tools available to me in Omega world that I do not in fact possess right now. Here comes Omega:
Hello Nathan. Tails, I'm afraid. Care to pay up?
"Before I make my decision: Tell me the shortest proof that P = NP, or the converse."
Omega obliges (it will not, of course, let me remember this proof - but I knew that when I asked).
"Do you have any way of proving that you can hypercompute to me?"
Yes. (Omega proves it.)
"So, you're really Omega. And my choice will have no other consequences?"
None. Had heads appeared, I would have predicted precisely this current sequence of events and used it to make a decision. But heads has not appeared. No consequences will ensue.
"So you would have simulated my brain performing these actions? No, you don't do that, do you? Can you prove that's possible?"
Yes. (Omega proves it.)
"Right. No, I don't want to give you $100."
What the hell just happened? Before Omega appeared, I wanted this sequence of events to play out quite differently. In fact this was my wish right up to the 't' of "tails". But now I've decided to keep the $100 after all!
The answer is that there is no equivalence between my utility function at time t, where t < timeOmega, and my utility function at time T, where timeOmega < T. Before timeOmega, my utility function contains terms from states of the world where Omega appears and the coin turns up heads; after, it doesn't. Add to that the fact that my utility function is increasing in money possessed, and my preferred action at time T changes (predictably so) at timeOmega. To formalise:
Suppose we index possible worlds with a time, t, and a state, S: a world state is then (S,t). Now let the utility function of 'myself' at time t and in world state S be denoted U_{S,t}: A_S → R, where A_S is my set of actions and R the real numbers. Then in the limit of a small time differential Δt, we can use the Bellman equation to pick an optimal policy π*: S → A_S such that we maximise U_{S,t} as U_{S,t}(π*(S)).
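For reference, the textbook discrete-time Bellman optimality equation has the shape below. The per-step reward r, discount factor γ, transition probabilities P and value function V are the usual textbook symbols and are assumptions here; the post itself names only the utility function, the action set and π*. On one reading, the utility of an action a in state S corresponds to the bracketed quantity, and the optimal policy picks the action that maximises it.

```latex
V(S) = \max_{a \in A_S} \Big[ r(S, a) + \gamma \sum_{S'} P(S' \mid S, a)\, V(S') \Big],
\qquad
\pi^*(S) \in \arg\max_{a \in A_S} \Big[ r(S, a) + \gamma \sum_{S'} P(S' \mid S, a)\, V(S') \Big]
```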
Before Omega appears, I am in (S,t). Suppose that the action "paying $100 to Omega if tails appears" is denoted a_100. Then, obviously, a_100 is not in my action set A_S. Let "not paying $100 to Omega if tails appears" be denoted a_0. a_0 isn't in A_S either. If we suppose Omega is guaranteed to appear shortly before time T (not a particularly restricting assumption for our purposes), then precommitting to paying is represented in our formalism by taking an action a_p at (S,t) such that either:
- The probability of being in a state § in which tails has appeared and for which a_0 ∈ A_§ at time T is 0, or
- For all states § with tails having appeared, with a_0 ∈ A_§ and with non-zero probability at time T, U_{§,T}(a_0) < U_{§,T}(a_100), so that π*(§) = a_100. Note that a 'world state' S includes my brain.
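The two conditions can be sketched as predicates over the tails-states that remain reachable at time T. This is only a toy model under stated assumptions: the `TailsState` class, the action labels and the example utility function are hypothetical names introduced for illustration, and each state is reduced to a probability, an action set and a utility function.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

PAY, REFUSE = "a100", "a0"  # hypothetical labels for the two actions


@dataclass(frozen=True)
class TailsState:
    """A possible world state at time T in which tails has appeared."""
    probability: float               # chance of reaching this state after the precommitting action
    actions: FrozenSet[str]          # action set available in this state
    utility: Callable[[str], float]  # utility at time T, as a function of the action


def satisfies_option_1(states: List[TailsState]) -> bool:
    """Option 1: refusing is unavailable in every reachable tails-state."""
    return all(s.probability == 0 or REFUSE not in s.actions for s in states)


def satisfies_option_2(states: List[TailsState]) -> bool:
    """Option 2: wherever refusing is reachable and available, paying is strictly preferred."""
    return all(
        PAY in s.actions and s.utility(REFUSE) < s.utility(PAY)
        for s in states
        if s.probability > 0 and REFUSE in s.actions
    )


def is_precommitment(states: List[TailsState]) -> bool:
    return satisfies_option_1(states) or satisfies_option_2(states)


# Toy utility that only counts dollars: paying costs 100, refusing costs nothing.
money_only = lambda action: -100.0 if action == PAY else 0.0

via_intermediary = [TailsState(1.0, frozenset({PAY}), money_only)]
unaided = [TailsState(1.0, frozenset({PAY, REFUSE}), money_only)]
```

In this toy model `via_intermediary` passes because refusing has been removed from the action set, while `unaided` fails both tests: with a utility that counts only money, refusing is preferred once tails has been announced.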
Then if Omega uses a trusted intermediary, I can easily carry out an action a_p = "give bank account access to intermediary and tell intermediary to pay $100 from my account to Omega under all circumstances". This counts as taking option 1 above. But suppose that option 1 is closed to us. Suppose we must take an action such that 2 is satisfied. What does such an action look like?
Firstly, brain hacks. If my utility function in state § at time T is increasing in money, then U_{§,T}(a_0) > U_{§,T}(a_100), contra the desired property of a_p. Therefore I must arrange for my brain in world-state § to be such that my utility function is not so fashioned. But by supposition my utility function cannot "change"; it is simply a mapping from world-states × possible actions to real numbers. In fact the function itself is an abstraction describing the behaviour of a particular brain in a particular world state**. If, in addition, we desire that the Bellman equation actually holds, then we cannot simply abolish the process of determining an optimal policy at some arbitrary point in time T. I propose one more desired property: the general principle of more money being better than less should not cease to operate due to a_p, as this is sure to decrease U_{S,t}(a_p) below optimum (would we really lose less than $4950?). So the modification I make to my brain should be minimal in some sense. This is, after all, a highly exceptional circumstance. What one could do is arrange for my brain to experience strong reward for a short time period after taking action a_100. The actual amount chosen should be such that the reward outweighs the time-discounted future loss in utility from surrendering the $100 (it follows that the shorter the duration of reward, the stronger its magnitude must be). I must also guarantee that I am not simply attaching a label called "reward" to something that does not actually represent reward as defined in the Bellman equation. This would, I believe, require some pretty deep knowledge of the nature of my brain which I do not possess. Add to that the fact that I do not know how to hack my brain, and in a least convenient world, this option is closed to me also***.
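The trade-off between the duration and the strength of the reward can be illustrated with a toy calculation. It assumes, purely for illustration, a constant reward on each of a fixed number of discrete time steps, geometric discounting with factor gamma, and a future loss already summarised as a single discounted number in the same units; none of these modelling choices or names come from the post.

```python
def min_reward_per_step(loss: float, gamma: float, steps: int) -> float:
    """Constant per-step reward whose discounted sum over `steps` exactly equals `loss`.

    The hack needs a reward strictly above this threshold.
    """
    if loss < 0:
        raise ValueError("loss must be non-negative")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    # Geometric series: 1 + gamma + ... + gamma**(steps - 1)
    discounted_weight = (1 - gamma**steps) / (1 - gamma)
    return loss / discounted_weight
```

Since the discounted weight grows with the number of steps, the threshold falls as the reward period lengthens; a reward confined to a single step has to cover the whole loss at once.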
It's looking pretty grim for my expected utility. But wait: we do not simply have to increase U_{§,T}(a_100). We can also decrease U_{§,T}(a_0). Now we could implement a brain hack for this also, but the same arguments against apply. A simple solution might be to use a trusted intermediary for another purpose: give him $1000, and tell him not to give it back unless I do a_100. This would, in fact, motivate me, but it reintroduces the factor of how probable it is that Omega will appear, which we were previously able to neglect, by altering the utility from time t to timeOmega. Suppose we give the intermediary our account details instead. This solves the probability issue, but there is potential either for me to frustrate him, a solvable problem, or for Omega to frustrate him in order to satisfy the "no further consequences" requirement. And so on: the requirements of the problem are such that only our own utility function is sacrosanct to Omega. It is through that mechanism only that we can win.
This is my real difficulty: that the problem appears to require cognitive understanding and technology that we do not possess. Eliezer may very well give $100 whenever he meets this problem; so may Cameron; but I probably wouldn't. It wouldn't be instrumentally rational for me, given my utility function under those circumstances, at least not unless something happens that can put the concepts they carry around with them into my head, and stop me - or rather, make it instrumentally irrational for me, in the sense of being part of a suboptimal policy - from removing those concepts after Omega appears.
However, on the off-chance that Omega~, a slightly less inconvenient version of Omega, appears before me: I hereby pledge one beer to every member of Less Wrong, if I fail to surrender my $100 when asked. Take that, obnoxious omniscient being!
*It's faintly amusing, though only faintly, that despite knowing full well that I was supposed to consider the least convenient possible world, I neglected to think of my least convenient possible world when I first tried to tackle the problem. Ask yourself the question.
**There are issues with identifying what it means for a brain/agent to persist from one world-state to another, but if such a persisting agent cannot be identified, then the whole problem is nonsense. It is more inconvenient for the problem to be coherent, as we must then answer it. I've also decided to use the Bellman equations with discrete time steps, rather than the time-continuous HJB equation, simply because I've never used the latter and don't trust myself to explain it correctly.
***There is the question: would one not simply dehack after Omega arrives announcing 'tails'? One would, if that were of higher utility than the other alternatives; but then we must have defined "reward" inappropriately while making the hack, as the reward for being in each state, together with the discounting factor, serves to fully determine the utility function in the Bellman equation.
(I've made a few small post-submission edits, the largest to clarify my conclusion)