Counterfactual Mugging Alternative

by wafflepudding, 1 min read, 6th Jun 2016, 23 comments



Edit as of June 13th, 2016: I no longer believe this to be easier to understand than traditional CM, but stand by the rest of it. Minor aesthetic edits made.

First post on the LW discussion board. Not sure if something like this has already been written, need your feedback to let me know if I’m doing something wrong or breaking useful conventions.

An alternative to the counterfactual mugging, since people often require it explained a few times before they understand it -- this one I think will be faster for most to comprehend because it arose organically, not seeming specifically contrived to create a dilemma between decision theories:

Pretend you live in a world where time travel exists and Time can create realities with acausal loops, or of ordinary linear chronology, or another structure, so long as there is no paradox -- only self-consistent timelines can be generated. 

In your timeline, there are prophets. A prophet (known to you to be honest and truly prophetic) tells you that you will commit an act which seems horrendously imprudent or problematic. It is an act whose effect will be on the scale of losing $10,000; an act you never would have taken ordinarily. But fight the prophecy all you want, it is self-fulfilling and you definitely live in a timeline where the act gets committed. However, if it weren’t for the prophecy being immutably correct, you could have spent $100 and, even having heard the prophecy (even having believed it would be immutable), the probability of you taking that action would be reduced by, say, 50%. So fighting the prophecy by spending $100 would mean that there were 50% fewer self-consistent (possible) worlds where you lost the $10,000, because it’s just much less likely for you to end up taking that action if you fight it rather than succumbing to it.

You may feel that there would be no reason to spend $100 averting a decision that you know you’re going to make, and see no reason to care about counterfactual worlds where you don’t lose the $10,000. But the fact of the matter is that if you could have precommitted to fight the choice, you would have, because in the worlds where that prophecy could have been presented to you, you’d be decreasing the average disutility by (($10,000)(.5 probability) - ($100) = $4,900). Not following a precommitment that you would have made to prevent the exact situation which you’re now in, because you wouldn’t have followed the precommitment, seems an obvious failure mode, but UDT avoids it: it does the same calculation shown above and tells you to fight the prophecy. The simple fact that should tell causal decision theorists that converting to UDT is the causally optimal decision is that Updateless Decision Theorists actually do better on average than CDT proponents.
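For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation above. The $10,000, $100 and 50% figures come from the post; the function name and the choice to normalize the baseline probability of the act to 1 (over the worlds where the prophecy could be presented) are my own assumptions for illustration.

```python
def expected_loss(p_act: float, loss: float, fee: float) -> float:
    """Average disutility: chance of the act times its cost, plus any fee paid."""
    return p_act * loss + fee


LOSS = 10_000              # scale of the prophesied act, from the post
FEE = 100                  # cost of fighting the prophecy, from the post
P_SUCCUMB = 1.0            # assumption: baseline probability normalized to 1
P_FIGHT = 0.5 * P_SUCCUMB  # fighting cuts that probability by 50%

succumb = expected_loss(P_SUCCUMB, LOSS, 0)  # policy: never pay
fight = expected_loss(P_FIGHT, LOSS, FEE)    # policy: always pay the $100
saving = succumb - fight

print(saving)  # 4900.0
```

The comparison is between policies evaluated before any prophecy is heard, which is the updateless framing; it does not try to model what a CDT agent computes after hearing the prophecy.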


(You may assume also that your timeline is the only timeline that exists, so as not to further complicate the problem by your degree of empathy with your selves from other existing timelines.)



For what it's worth, this formulation appears to me substantially more confusing than the ordinary Counterfactual Mugging. It requires a hypothetical world with multiple confusing features (time travel! prophets! prophecies that are absolutely inevitable ... except, wait, no they aren't! or maybe they are but just might not have been!). And for extra confusion, you introduce the idea that I might believe the prophecy immutable when in fact it isn't, while (if I'm understanding right) asking me just to take on trust that in the real world (er, the real world of this hypothetical situation) it really truly definitely is immutable.

The ordinary Counterfactual Mugging is hard to think about, but (at least for me) it's reasonably clear what situation it's describing, whereas here I had to read your description several times before I was confident I'd correctly understood the problem statement (and I'm still not quite certain I have).

I'm also not sure this is equivalent to ordinary CM (is it meant to be?). Ordinary CM says there was a 50% chance of Omega's coin flip coming up either way, but here nothing seems quite to correspond to that. In particular, the 50% reduction in Pr(I perform the unwise action) in your scenario doesn't seem like it plays quite the same role. But maybe I'm misunderstanding something?

Well, your confusion means my original goal has failed, and I suppose that's that. I am pretty sure this is equivalent to CM in the sense that only UDT wins -- I'd be happy to explain further if you'd like, but otherwise, thanks for your help!

I suppose that's that

Not necessarily. It might just be that for some idiosyncratic reason I find standard CM particularly easy to make sense of or your version particularly hard. You should see what other people say.

I'm with gjm. I appreciate the attempt, but neither added weirdness (time travel or prophetic causality) meets your criterion of being not contrived and easy to understand.

Actually, I still stand by the "not contrived" part. (I think that's what drove me to believe it would be easy to understand.) The idea arose organically when I was thinking about what I would do if presented a prophecy like this, and whether it would be worth expending effort to fight it. On the other hand, there's no reason for Omega to play his game with you other than specifically to illustrate the point of CM.

You should see what other people say.

I am a two-boxing black sheep here, but it doesn't seem to me that introducing time travel (but only self-consistent!) and perfectly accurate prophets (except when they aren't!) helps with the clarity.

This conveys what I wanted to say but failed to say earlier.

I have to remind myself that it is okay to criticize the form of an argument, as long as you don't treat arguments against a form as arguments against the content (e.g. you made a typo thus what you said is lies). ... though saying "this is complicated and confusing" on a site where most topics discussed are complicated and confusing fits in a certain category bordering on "not constructive criticism", which might have helped in blinding me.

Maybe I just don't get it, but offering me the option AFTER you've told me that it makes no difference makes it a pointless option. I get the feeling there's a single step missing from your explanation.

From what I'm reading, there are 3 things that can happen...

You spend $100. A prophet comes to you and tells you that you will lose $10,000 in the future, and then...
1) looks at you more closely, and coughs "wait, you're not the person I was looking for."
2) tells you something that sounds plausibly true, but turns out to be false, costing you $10,000 by overpaying for your next house.
1 & 2 happen with 50% chance, if you have spent $100.
If you don't spend $100, then
3) A prophet comes to you and tells you that you will lose $10,000 in the future, and then afterwards as you sputter "why", he tells you this plausibly true thing that turns out to be false, costing you $10,000 by overpaying for your next house.

(As for what the thing is, it's either something that makes you spend $10,000 in carefully rationalizing your decision to buy a house, or $10,000 costs in overbidding)

But... you've just told me that a prophet came to me and told me I will lose $10,000 in the future. I am already on path 3. There is no going back. Time CAN create time loops but there is no cause for it to do so in your explanation. You yourself walled it off by stating the prophecy was self-fulfilling and that you could spend $100 "if the prophecy weren't immutably correct" (this in a manner implying that it is immutably correct).

You have given me a button, but the button is disabled. I can't take any actions.

Also, there's something lurking in your description which might (I really am unsure) imply that if I spend $100, the world may become inconsistent and therefore stop existing. Basically replace path 1 with "universe ends." Which would make spending $100 really bad, since losing $10,000 is preferable to destroying your own universe?

You are on path 3, but the button is not disabled. The purpose of spending the $100 is to decrease the number of possible worlds where the prophet would come up and talk to you in the first place. You wouldn't end up destroying your timeline by making it inconsistent; ideally, this timeline was just never created because if it had been you would've spent the $100.

Out of curiosity, would you pay Omega in the counterfactual mugging? If you'd pay in CM but not here, that makes me worry that this formulation isn't similar.

I definitely would pay Omega. Here, I'm just going "that's one shady prophet".

And if the prophet is "honest and truly prophetic"?

He makes a self-contradictory statement and loses credibility points. Like, a lot of them. Maybe not in general, but a lot of them for this specific topic.

Hmm, I didn't intend for the prophet to contradict himself. (Based on your comments and others, I seem to have tripped and fallen hard into the illusion of transparency.) Would you mind elaborating on the contradictory statement he makes? And, had he not said anything apparently contradictory, then would you have paid $100?

We've talked about this before, right? He claims there is something that you cannot prevent in your future, like ever ever. Like "even if we chain you to the wall in a dungeon all locked up with locks"-ever. I don't know what happens if you suicide in the example; I guess the point is moot in that case.

But! He has THE CURE! It only has a 50% chance of working, and if you pay $100, he will give you the cure.

I detect a slight personal bias here; I treat Omega as "an entity" and this prophet as "a person, who may or may not be out to scam me"... but whatever, supposed to override that.

We'll assume THE CURE is informational in nature, because if it was physical, there exist futures in which I slam the prophet to the ground and just take THE CURE without paying $100. Or I convince the prophet that, look, I don't have $100 on me right now, how about I give you just $50 for it? To which the prophet responds "it's okay, you can pay in installments". (This is why "Omega" solves a lot of problems: when you agree to pay $100, he will wire it for you. This prophet can't plausibly have hacked the world's banking systems, Omega can. The prophet cannot withhold your income, Omega can.)

At that point... well, yes, I'd pay by axiom. The prophet is trustworthy. Like, 100% trustworthy. That's the whole premise. Omega isn't, but if you want proof, he'll generate it for you.

I could be wrong about this (one part of me says "Yes, you're wrong! Wrong wrong wrong wrong!" and the other says "Nah, this is fine"), but I think you'd be more likely to run into Omega-like entities than truly honest prophets, because we've had SO MANY people already claiming to be able to see the future (and failing; how many times has the world been supposed to end already?) that when a man comes up to me and says such a thing, the first thing I think is "scammer".

More points (not all of them fair and most of them arguing against the example which is normally not the goal); Omega can generate proof so fast that if I wanted to know if it was likely that he did what he did, I could learn such proof in a reasonable amount of time. Maybe he's been playing this game with other people. Maybe he made a pledge somewhere public on a blockchain somewhere. He could help me out; I'd be willing to engage him in detail if he could arrange for a faster transport home than the bus. The prophet is someone I'd encounter somewhere on the streets. When I'm outside, I am going somewhere. Rarely am I waiting. When I'm outside, I don't have $100 on me (nor the equivalent in euros). So when a man walks up to me and starts a story like this my first reaction is "I don't have time for this". "But wait! There is a great doom lurking in your future!" Yeah, right... - I'm supposed to trust this person by axiom, but this is so opposite of my regular reaction that I'm bound by the scenario anyway. "If I wasn't me, I wouldn't act like me." Right. I knew that already.

Well, there's a whole lot of words; basically what you're encountering here is a mix of "all powerful AI can fix problems if they occur, thus there are no problems if you could think of a solution yourself" and "my prior for people walking up to me telling me there is some reason that I should give them money that doesn't involve unpaid bills is to ignore them".

We can reframe the situation, though: maybe I just missed my bus and have to wait 15 minutes before the next one shows up, and then the prophet comes up to me. Now I have time. Maybe the prophet is my friend, who I've always known to be pretty rational. Maybe the prophet is my dad or my mom.

Here's a version you might enjoy more:

You take your car to the mechanic's for the yearly checkup. He calls you up and says "mate, there's a slight problem with your car. There's some corrosion on the fuel valve according to this sensor - it's not a big problem, and your car passes the yearly inspection, so it's cleared for the road, but I worry about this - if that corrosion continues then bits of the valve could end up in the fuel mixture, thoroughly wrecking your engine whilst you're driving, and you'd have an accident for sure. We can fix it, no problem, but we'd have to add $100 to the bill for parts and hours worked."

Would you pay?

Maybe you don't have a car. Maybe you think selling the car off and buying a new one is better. Maybe you'd get a second opinion first. But there are plenty of people I can think of who would say "sure, do it" - I mean, if the dealership you've been at for a few years tells you something like this, then, well, I'd feel unsafe driving it on the highway.

Other versions of this same problem: A minor crack in your wall. Mom tells you "you should really get that looked at, you know. You know that old couple two streets over? They had a minor crack too, and later during the summer it tore the entire wall in two due to the ground drying out" (or something). Fees for a building inspector are $100... do you ignore or pay?

These scenarios differ in that you can gather intel about the likelihood of the future bad event in greater detail (albeit you might have to spend something for that as well - letting your uncle who works in construction come by for coffee and a short look at that crack costs you at least a coffee and some time, and googling for "corroded fuel valve scam" also takes you a bunch of time).

And yet I'd totally do that. I'd call up my uncle and have him take a look. I'd let them fix my car. But that prophet of yours is not giving me any details. He's engaging in fear mongering. There's a lot of fear mongering in the world already and not all of it is true. So my prior for paying people based on fear mongering is lower than my prior for paying people who tell me my car might break down...

There's a problem with these scenarios, though; if you take a careful look and play with the numbers, you'll see that having to pay $10,000 is not 100% certain - maybe the car will be fine, maybe the wall will be fine. And if you pay, it's either fixed or you'll know if it is a problem.

So enter the prophet.

You're outside of a restaurant, busy with dessert, when you get the call from the mechanic. He explains about the possible corroded fuel valve. You tell him you want to enjoy your dessert first; you'll call him up in an hour or so with your answer.

You've finished your meal - when the prophet walks up to you and says you'll DEFINITELY crash if you don't get your car fixed, but if you fix your car then there'll only be a 50% chance.


Yeah, sorry, but this case is scary for me too. Say what, prophet? Thanks for telling me I'm doomed to crash if I don't get it fixed, that's valuable information. But what do you mean by "50% chance"? Is there something ELSE wrong with my car? And the prophet loses credibility again. I wish I could get some answers out of this prophet so that I could trust him some more. (Bias here; I'm allowing Omega to answer questions and I'm not giving the prophet the same opportunity. This is of course a major difference, but it stems from my personal feelings where crystal ball prophecies tend to be "I've said there is such a chance so there is, no further questions allowed" and Omega to be answering things like "over how many years is that crash chance calculated?" - to which the answer would be very interesting to hear.)

(Too many words)

wafflepudding is saying something similar to this:

You can suffer the $10,000 damage in two ways, Path A and Path B. Normally these two things happen equally often. If you pay the $100, you can prevent Path A from happening, with a 100% chance. That means if you pay, Path B will definitely happen. But it also means that since you're the sort of person who would pay in this situation, you will receive that prophecy only 50% as often, in general, as a person who would not pay; this happens because you only get the prophecy when Path B is going to happen, instead of either Path A or Path B.
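A small sketch of that counting argument, in case it helps. The two equally likely paths and the rule that paying blocks Path A come from the description above; the names `PATHS`, `damage_occurs` and `prophecy_rate`, and the assumption that the prophecy is issued in exactly the worlds where the damage happens, are illustrative choices of mine.

```python
from fractions import Fraction

# Two ways to suffer the damage, equally likely by default.
PATHS = {"A": Fraction(1, 2), "B": Fraction(1, 2)}


def damage_occurs(path: str, pays: bool) -> bool:
    """Paying the $100 blocks Path A completely; Path B is unaffected."""
    return not (pays and path == "A")


def prophecy_rate(pays: bool) -> Fraction:
    """Weight of worlds where the prophecy is issued.

    Assumption: the prophet speaks exactly when the damage will happen.
    """
    return sum(
        (weight for path, weight in PATHS.items() if damage_occurs(path, pays)),
        Fraction(0),
    )


payer = prophecy_rate(True)       # the sort of person who pays
non_payer = prophecy_rate(False)  # the sort of person who does not

print(payer / non_payer)  # 1/2
```

Under these assumptions the payer hears the prophecy half as often as the non-payer, and every prophecy the payer does hear is a Path B one.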

I am not the sort of person who would pay in that situation, and I do not want to be. But I am the sort of person who might very well pay the $100 before hearing any prophecy, and therefore I will get the prophecy 50% as often anyway.

I am extremely satisfied with this description; I hadn't personally thought of it in such specific terms, and this would be a perfect way to say it. I'll admit I'm a bit confused why you would pay before but not after, considering that either one is done by a person to whom the prophecy is given 50% less often.

I would pay Omega in the counterfactual mugging, but I would not pay here.

The reason is that in the counterfactual mugging case, I would want to be the sort of person who pays when they get offered a deal like that.

Here, I would not want to be the sort of person who pays to fight an infallible prophecy.

However, I would want to be the sort of person who pays to fight a non-infallible prophecy, so I would be happy to precommit to pay in non-infallible prophecy situations.

The kind of person who pays to fight an infallible prophecy is the same kind of person to whom infallible prophecies are given 50% less often. In this case.

Read about Newcomb's problem with both transparent boxes (you still only take one).

I understand the normal version of Newcomb's perfectly fine, I understand the normal version of counterfactual mugging (or at least the wiki version of it) perfectly fine, I get that the transparent boxes are mostly the same if you follow the logic, but in this case, the choice is presented AFTER you've picked boxes. "Here are two boxes. Would you like one or both? 'Both please.' Okay, also I'd like to inform you that if you pick both, you don't get the million. No backsies."

Saying that this is predicted in advance is weird, because there is no possibility of a meaningful loop: the moment of timeline separation is AFTER the choice has been made. The choice is set in stone. There is no possible change. You can pay, but it won't change a thing. Unless you were somehow determined to pay people in scenarios like this - which requires knowledge of scenarios like this.

In the original version, something happens, and then the losing you is contacted and asked whether you'd want to pay. And you'd be able to choose at that point, and even think about it. And then it turns out this was all a simulation and because you paid, the winning real you gets paid.

In this version, we could make it work by taking the result of the previous simulation ("I flipped a coin and you lost; pay me $100, or I wouldn't have paid you $10,000 if you had won"), and then going through the prophet, who either says you're fine if the losing you paid, or that you're not fine if the losing you didn't pay.

But what we cannot do is simulate this and loop it on itself. You are doomed in the future. You are always doomed in the future. There is no possibility of you being not doomed in the future. But, if you pay, then there is a possibility that you are not doomed in the future. That's a contradiction right there. If I accept that the statement about my unchanging future is false, then I'll pay because then I can go from 100% doomed to 50% doomed. If I accept that the statement about changing my future is false, then I won't pay, because you're a snake oil salesman, your cure will do me no good.

To fix this, the wording needs to be altered so that there is no contradiction and that there is a clear result of paying the money that will reduce the chance.

In short, I think this problem relies too much on UDT's ability to magically teleport between possible situations and fails to leave a path for Time to take.

I think even with the ordinary phrasing, Omega's prediction can be thought of as sitting in a sealed envelope as the real you picks. You don't think about the problem until then with your own brain. But in a way the contents of the envelope can be deduced from the transparent boxes.

I think it exhibits the same kind of wierdness. It doesn't really make sense to ever have the player choose only the empty transparent box, because the box will only be left empty if the player is predicted to pick both. So committing to not take both boxes means the boxes will be full. It doesn't really mean that the impossible "presented with two empty boxes" scenario is destroyed.

I don't really have a reply to this, but pet peeve of mine: "wierdness" when you mean "weirdness"

I have found what I think is equivalent to the counter-factual mugging.

You come across a thread on 4chan with text like "You have been visited by the murderous dog of murder. Your mother will die in their sleep tonight unless you please the doggie with 'good doggie'". Do you reply, or do you have reason to reply? On one hand, you don't have any evidence about who is posting the post and whether they are actually able to pull off the murder. On the other hand, the message is highly implausible. (One could argue that seeing several "good doggie" replies is social proof that somebody is taking it seriously enough (but the explanation that it is more lolzy to play along is more plausible)).

The logic of someone who replies (outside of the lol factor) is "meh, it's quickly typed and it would suck to find my mother dead the next morning". Yet it means that anyone who makes such a "visited by" thread is guaranteed many replies. Someone who recognises this doesn't think that much about how credible the threat is. She just recognises the mugging and ignores the thread.