UDT might not pay a Counterfactual Mugger

21st Nov 2020


The Counterfactual Mugging is my favorite decision theory problem, and it's the problem that got me started reading LessWrong in the first place. In short,

Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don't want to give up your $100. But see, Omega tells you that if the coin came up heads instead of tails, it'd give you $10000, but only if you'd agree to give it $100 if the coin came up tails.

Since hearing the problem for the first time, I have flip-flopped on what should be done many times. I won't go into the details, but the general consensus on this forum (as far as I can tell) is that you should pay the $100 and that UDT tells you to pay the $100.

While I admit I found some of these arguments (especially MBlume's) quite persuasive, and my position for a good while was that one should pay, I still had this intuition in the back of my mind telling me that rational agents should win. Giving Omega $100 for no real gain sure doesn't sound like winning to me, and if UDT says to pay the $100, that means that UDT is wrong, not that we should change our preferences to include paying the $100 in this scenario.

But there is a third option, one that allows you to save your $100 while still following UDT: show that UDT doesn't tell you to pay.

When the Counterfactual Mugging is usually presented, it would appear that there are two possible scenarios, each with probability 0.5: Omega exists and the coin landed heads, and Omega exists and the coin landed tails. Thus, by UDT, we would want to precommit to paying should the coin land tails, and so when the coin lands tails, we pay.

However, those are not the only two scenarios. Before we learn about the counterfactual mugging, there is a third option: Nomega exists, a being who will pay $10000 to anyone who doesn't pay Omega when counterfactually mugged, and gives no money to someone who does. Our new view of the world:

Scenario	Probability	U(Precommit)	U(Don't Precommit)
Omega, Heads	0.25	10000	0
Omega, Tails	0.25	-100	0
Nomega	0.5	0	10000
Expected Value		2475	5000

Thus a rational agent running UDT should NOT precommit to paying a counterfactual mugger. Once we learn that we live in a universe where Omega, rather than Nomega, exists, it may look tempting to pay. But at that point, we have also learned that we live in a universe in which the coin came up tails rather than heads, and so we still should not pay.
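The expected values in the table can be checked with a few lines of Python. This is only a sketch of the arithmetic: the probabilities and payoffs are the ones from the table, while the names `SCENARIOS` and `expected_values` are made up for illustration.

```python
# Scenario table from the post: (probability, payoff if precommitted, payoff if not).
SCENARIOS = {
    "Omega, Heads": (0.25, 10_000, 0),
    "Omega, Tails": (0.25, -100, 0),
    "Nomega": (0.50, 0, 10_000),
}


def expected_values(scenarios):
    """Return (EV of precommitting, EV of not precommitting)."""
    ev_precommit = sum(p * u_pre for p, u_pre, _ in scenarios.values())
    ev_refuse = sum(p * u_no for p, _, u_no in scenarios.values())
    return ev_precommit, ev_refuse


ev_precommit, ev_refuse = expected_values(SCENARIOS)
print(f"precommit: {ev_precommit}, don't precommit: {ev_refuse}")
```

Under these numbers, not precommitting comes out ahead (5000 vs 2475), but the result depends entirely on the 0.5 weight given to Nomega.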

Some caveats:

Firstly, as always, I may be completely overlooking something, in which case this entire argument is flawed.
Secondly, there is some intuition that Omega seems more real / more likely to exist than Nomega does, which may throw the calculus off. Considering Omega and Nomega as equally likely options seems to open you up to getting Pascal Mugged and Wagered all over the place. However, I have no real way of formalizing why Omega might be more real than Nomega. (And, in fact, Pascal's Wager is about not making decisions, precommitments, etc., on the basis that some God may exist, because it's equally likely that some God with opposite incentives exists. Actually, the counterfactual mugging is starting to smell a lot like Pascal's Wager.)
Irrespective of point 2, even if we decide there is a reason to believe in Omega more than Nomega, I still feel like this idea makes the case for UDT telling us to pay a lot more shaky, and relies on multiplying lots of numbers, which makes me nervous.
Finally, this same argument might be able to be applied to the Counterfactual Prisoner's Dilemma to tell you not to pay, even though I am relatively certain that one should pay in that scenario.


[-]abramdemski5y120

I think this is essentially right (UDT might behave in just about any way, in a given situation, depending on its prior), and highlights two facts:

UDT is hugely sensitive to the prior, so the prior matters a whole lot. (It matters much more than the prior matters for something like CDT or EDT. And learning-theoretic properties such as those for the Solomonoff prior become much less helpful.)
The question "what does UDT do on a particular decision problem" is therefore much different than it is for other decision theories.

But, I think the standard practice of assuming that UDT faces a specific decision problem with no contrary effects from other stuff in its prior is basically the most useful way to ask "what does UDT do", so long as one is aware that really, there may be other influences.

One important point which you might be pointing at here is that a contrary effect from elsewhere in the prior becomes more probable the less probable the decision problem is in the first place.

[-]Vladimir_Nesov5y*60

CDT and EDT are also sensitive to their prior. The difference is that it's a more familiar routine to define their prior by idealization of the situation being considered without getting out of scope, thus ensuring that we remain close to informal expectations. When building a tractable model for UDT, it similarly makes sense to specify its prior without allowing retraction of knowledge of the situation and escaping to consideration of all possible situations (turning the prior into a model of all possible situations rather than just of this one situation).

In the case of CDT and EDT, escaping the bounds of the situation looks like a refrigerator falling from the sky on the experimental apparatus. In the case of UDT, it looks like a funding agency refusing to fund the experiment because its results wouldn't be politically acceptable, unless it's massaged to look right, and the agents within the experiment understand that (and have no scientific integrity). I think it's similarly unreasonable for both kinds of details to be included in models, and it's similarly possible for them to occur in reality.

[-]Vladimir_Nesov5y50

This proves too much. For any thought experiment, if you are allowed to introduce a generalized Nomega, you can force any conclusion for an agent that cares about counterfactuals or has a chance to make a global precommitment. What if Nomega pays $50000 when you do precommit to pay $100 to Omega? Or when you Defect in Prisoner's Dilemma? Or when you Cooperate in Prisoner's Dilemma? This shows that if Nomega is allowed to be introduced, none of the usual decision theory thought experiments can be usefully considered (with a theory like UDT). Thus, it's a reasonable assumption that Nomega shouldn't be allowed to be introduced.
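A small sketch of why this proves too much: keep the post's 0.25 / 0.25 / 0.5 prior and only change what the hypothetical Nomega rewards. The function `best_policy` and its parameter names are illustrative assumptions, not part of any formal statement of UDT.

```python
# Prior taken from the post's table; everything else here is illustrative.
PRIOR = {"omega_heads": 0.25, "omega_tails": 0.25, "nomega": 0.50}


def best_policy(nomega_if_precommit, nomega_if_refuse):
    """Pick the policy with the higher expected value for a given Nomega."""
    ev_precommit = (
        PRIOR["omega_heads"] * 10_000
        + PRIOR["omega_tails"] * (-100)
        + PRIOR["nomega"] * nomega_if_precommit
    )
    ev_refuse = PRIOR["nomega"] * nomega_if_refuse
    return "precommit" if ev_precommit > ev_refuse else "refuse"


# The post's Nomega rewards refusing to pay Omega.
post_nomega = best_policy(nomega_if_precommit=0, nomega_if_refuse=10_000)
# A mirrored Nomega that pays $50000 to those who do precommit.
mirror_nomega = best_policy(nomega_if_precommit=50_000, nomega_if_refuse=0)
print(post_nomega, mirror_nomega)
```

The same calculation recommends opposite policies depending only on which Nomega was written into the prior.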

(Btw, the expected values you list in the table are off. Don't know what's up with that.)

[-]winwonce5y10

Hi Vladimir, thanks for your response.

Upon further reflection, I think the crux of my argument is that by precommitting you are essentially Pascal's-wagering yourself -- you are making a decision looking to maximize your reward should a certain type of God (Omega) exist. Unless (before you get mugged) you have some reason to believe that this type of God is more likely to exist than the opposite type (Nomega), then precommitting is getting wagered (as far as I can tell). You can't wait until you find out that Omega exists to precommit because by then you have already learned that the coin is tails -- you have to do so blind.

I don't think this proves too much because in other problems (Prisoner's Dilemma, Newcomb's Paradox, etc.) considering what to do if a random God shows up is wagering, so you just ignore it. Here, though, precommitting is wagering, so (it seems to me) you should just ignore it as well and not precommit.

Good point on EV numbers -- they are now updated although the actual numbers are not super important to the crux of the argument.

[-]Vladimir_Nesov5y30

I think I see what you mean. The situation where you'd make a precommitment, which is described by the same state of knowledge that UDT makes its decision under, occurs before the setting of the thought experiment is made clear. Thus it's not yet clear what kinds of Nomegas can show up with their particular incentives, and the precommitment can't rely on their absence. With some sort of risk-averse status-quo-anchored attitude it seems like "not precommitting" is therefore generally preferable.

But optimization of expected utility doesn't work like that. You have the estimates for possible decisions, and pick the option that's estimated to be the best available. Whether it's the status quo ("not precommitting") or not has no bearing on the decision unless it's expressed as a change in the estimate of expected utility that makes it lower or greater than expected utility of the alternative decisions. Thus when a thought experiment talks about precommitments or any UDT decisions, bringing in arbitrary Nomegas is a problem because it makes the expected utility of precommitments similarly arbitrary, and it's these expected utilities that determine decisions. (Whether to make some precommitment or not is itself a decision.) The obvious way of making it possible to perform the calculation of expected utilities of precommitments is to make the assumption of absence of Nomegas, or more generally to construct the settings of precommitments based only on what's already in the thought experiment.

(Mistakes in expected values in the post are a tiny bit relevant (one of the values is still wrong after the correction), as they vaguely signal lack of reliable knowledge of what expected utility is, although the issue here is mostly informal and correct calculation won't by itself make things clear. General experience with mathematical proofs might be closer to being helpful, as the issue is that the actual algorithms being discussed screen off a lot of informal considerations such as whether not making precommitments is the status quo.)

[-]winwonce5y10

Whoops -- EV re-updated.

Perhaps I am misunderstanding the setup of the counterfactual mugging -- do we live in a world in which Omega is a known being (and just hasn't yet interacted with us), or do we live in a world in which we have roughly equal credence in the existence of Omega vs Nomega (vs any other arbitrary God-like figure)? If it's the former, then sure, UDT says precommit and pay.

But if it's the latter, I still don't see why UDT tells us to pay -- not because not precommitting is some sort of default (which I agree UDT says isn't relevant) but because making decisions based on the possible existence of one sort of God while ignoring the possible existence of other Gods isn't a fair or effective way to maximize your expected utility. Perhaps some sort of Occam's Razor / Solomonoff induction argument could be made that Omega is simpler and thus more likely to exist, but this seems fairly difficult to make rigorous.

[-]Vladimir_Nesov5y*30

In the first approximation, the point is not that counterfactual mugging (or any other thought experiment) is actually defined in a certain way, but how it should be redefined in order to make it possible to navigate the issue. Unless Nomegas are outlawed, it's not possible to do any calculations, therefore they are outlawed. Not because they were already explicitly outlawed or were colloquially understood to be outlawed.

But when we look at this more carefully, the assumption is not actually needed. If nonspecified Nomegas are allowed, the distribution of their possible incentives is all over the place, so they almost certainly cancel out in the expected utility of alternative precommitments. The real problem is not with introduction of Nomegas, but with managing to include the possibilities involving Omega in the calculations (as opposed to discarding them as particular Nomegas), taking into account the setting that's not yet described at the point where precommitment should be made.

In counterfactual mugging, there is no physical time when the agent is in the state of knowledge where the relevant precommitment can be made (that's the whole point). Instead, we can construct a hypothetical state of knowledge that has updated on the description of the thought experiment, but hasn't updated on the fact of how the coin toss turned out. The agent never holds this state of knowledge as a description of all that's actually known. Why retract knowledge of the coin toss, instead of retracting knowledge of the thought experiment? No reason, UDT strives to retract all knowledge and make a completely general precommitment to all eventualities. But in this setting, retracting knowledge of the coin toss while retaining knowledge of Omega creates a tractable decision problem, thus UDT that notices the possibility will make a precommitment. Similarly, it should precommit to not paying Omega in a situation where a Nomega punishing for paying up $100 to Omega (as described in this post) is known to operate. But only when it's known to be there, not when it's not known to be there.

[-]winwonce5y10

Hmm, perhaps I am still a little confused as to how UDT works. My understanding is that you don't make your decisions based on the information you have observed, but instead, when you "boot up" your UDT, you consider all of the possible world states you may find yourself in and their various measures, and then for each decision, "precommit" to making the one that maximizes your expected utility across all of the possible world states that this decision affects.

If this understanding is correct, then unless we have some sort of prior telling us, when we "boot up" UDT and thus before we interact with Omega, that Omega is more likely to exist than Nomega, then I don't see how UDT could tell us to pay up.

I think it is somewhat likely that I am missing something here but I don't know what.

[-]adamShimi5y10

This really looks like you're adding stuff to the problem to make it go your way. Maybe I'm just misunderstanding, but at my current level of understanding, this doesn't convince me of anything, because you can probably prove any conclusion by such a trick.

[-]interstice5y10

I think there's a big asymmetry between Omega and Nomega here, namely that Omega actually appears before you, while Nomega does not. This means there's much better reason to think that Omega will actually reward you in an alternate universe than Nomega.

Put another way, the thing you could pre-commit to could be a broad policy of acausally cooperating with beings you have good reason to think exist, in your universe or a closely adjacent one (adjacent in the sense that your actions here actually have a chance of affecting things there). Once you learn that a being such as Omega exists, then you should act as though you had pre-committed to cooperating with them all along.

[-]Dagon5y20

This means there's much better reason to think that Omega will actually reward you in an alternate universe than Nomega.

That's exactly what others are saying about priors. But really, it's about your probabilities (including posteriors once someone appears). The "simple hack decision theory" works for all of these cases - multiply the conditional probability by the value of each possible outcome, and pick the condition that gives the largest utility-contribution.

If you assign a much lower probability to nomega than to omega, and assign a high probability of honesty to the setup, you want to pay. With other beliefs, you might not.
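As a rough sketch of that recipe, the function below weights each outcome by its probability and picks the larger total. How the "probability of honesty" enters is an assumption made here for illustration: it only discounts Omega's promised $10000. The function name and the example probabilities are likewise hypothetical.

```python
def best_policy(p_omega, p_nomega, p_honest):
    """Return the policy with the larger probability-weighted payoff.

    Assumption: p_honest only discounts Omega's promised $10000 reward.
    """
    ev_pay = p_omega * (0.5 * p_honest * 10_000 - 0.5 * 100)
    ev_refuse = p_nomega * 10_000
    return "pay" if ev_pay > ev_refuse else "refuse"


# Equal credence in Omega and Nomega, fully honest setup.
equal_credence = best_policy(p_omega=0.5, p_nomega=0.5, p_honest=1.0)
# Omega much more likely than Nomega, setup probably honest.
omega_favoured = best_policy(p_omega=0.9, p_nomega=0.1, p_honest=0.9)
print(equal_credence, omega_favoured)
```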

[-]interstice5y10

That’s exactly what others are saying about priors

It's not the same thing. Other people are correctly pointing out that UDT's behavior here depends on the prior. I'm arguing that a prior similar to the one we use in our day-to-day lives would assign greater probability to Omega than Nomega, given that one has seen Omega. The OP can be seen as implicitly about both issues.

[-]winwonce5y10

If the answer is that you have a higher prior towards Omega before the mugging, then fine, that solves the problem. But if you think Omega is more likely to exist only because you see Omega in front of you, then doesn't that violate UDT's principle of never updating?

[-]interstice5y*10

Although UDT is formally updateless, the 'mathematical intuition module' which it uses to determine the effects of its actions can make it effectively act as though it's updating.

Here's a simple example. Say UDT's prior over worlds is the following:

75% chance: you will see a green and red button, and a sign saying "press the red button for $5"
25% chance: same buttons, but the sign says "press the green button for $5"

Now, imagine the next thing UDT sees is the sign saying that it should press the green button. Of course, what it should do is press the green button (assuming the signs are truthful), even though in expectation the best thing to do would be pressing the red button. So why does it do this? UDT doesn't update -- it still considers the worlds where it sees the red button to be 3X more important -- however, what does change is that, once it sees the green button sign, it no longer has any influence over the worlds where it sees the red button sign. Thus it acts as though it's effectively updated on seeing the green button sign, even though its distribution over worlds remains unchanged.
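One way to see this is to score whole policies (maps from observed sign to button) against the fixed prior, rather than scoring single actions. The following is a minimal sketch under the assumption that the signs are truthful and a wrong press pays nothing; the names are illustrative.

```python
from itertools import product

# Prior over worlds from the example above; it is never updated.
PRIOR = {"red_sign": 0.75, "green_sign": 0.25}
BUTTONS = ("red", "green")


def payoff(sign, button):
    """$5 if the pressed button matches the (truthful) sign, else nothing."""
    return 5 if sign == f"{button}_sign" else 0


def policy_value(policy):
    """Expected payoff of a policy under the fixed prior."""
    return sum(p * payoff(sign, policy[sign]) for sign, p in PRIOR.items())


# Enumerate every map from observed sign to pressed button.
policies = [dict(zip(PRIOR, choice)) for choice in product(BUTTONS, repeat=len(PRIOR))]
best = max(policies, key=policy_value)
print(best, policy_value(best))
```

The best policy presses whichever button the sign names, even though the prior itself never changes: the action taken after seeing the green sign only affects the green-sign world's term in the sum.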

By analogy, in your scenario, even though Omega and Nomega might be equally likely a priori, UDT's influence over Omega's actions is far greater given that it has actually seen Omega. Or to be more precise -- in the situation where UDT has both seen Omega and the coin comes up heads, it has a lot of predictable influence over Omega's behavior in a (equally valuable by its prior) world where Omega is real and the coin comes up tails. It has no such predictable influence over worlds where Nomega exists.

[-]winwonce5y*10

But UDT's decision on how to interact with Omega does directly affect worlds in which Nomega exists instead of Omega.

Again overly simplistic prior:

50% chance: Omega exists, and we get counterfactually mugged, half of the times heads and half of the times tails.

50% chance: Nomega exists, guesses what we would do if Omega existed and the coin came up tails, and pays out accordingly.

There is only one decision -- do you pay if Omega exists and the coin comes up tails, and that decision affects both (or all three) possible worlds.

Even once you see that Omega exists, UDT already recognized that in order to maximize utility it should precommit (or just decide or whatever) to not pay.

[-]interstice5y*10

UDT's behavior here is totally determined by its prior. The question is which prior is more reasonable. 'Closeness to Solomonoff induction' is a good proxy for reasonableness here.

I think a prior putting greater weight on Omega, given that one has seen Omega, is much more reasonable. Here's the reasoning. Let's say that the description complexity of both Omega and Nomega is 1000 bits. Before UDT has seen either of them, it assigns a likelihood of $2^{-1000}$ to worlds where either of them exist. So it might seem that it should weight them equally, even having seen Omega.

However, the question then becomes -- why is Nomega choosing to simulate the world containing Omega? Nomega could choose to simulate any world. In fact, a complete description of Nomega's behavior must include a specification of which world it is simulating. This means that, while it takes 1000 bits to specify Nomega, specifying that Nomega exists and is simulating the world containing Omega actually takes 2000 bits.^[1]

So UDT's full prior ends up looking like:

999/1000: Normal world
$2^{- 1000}$ : Omega exists
$2^{- 1000}$ : Nomega exists
$2^{- 2000}$ : Nomega exists and is simulating the world containing Omega

Thus, in a situation where UDT has seen Omega, it has influence over the Omega world and Nomega/Omega world, but no influence over the normal world and Nomega world. Since the Omega world has so much more weight than the Omega/Nomega world, UDT will effectively act as if it's in the Omega world.
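To make the comparison concrete, here is a sketch that scores the two policies over only the worlds the decision can influence, using exact fractions because $2^{-2000}$ underflows ordinary floats. The payoffs are the ones from the post; the world weights are the ones listed above; the function name is an assumption for illustration.

```python
from fractions import Fraction

# Weights of the two worlds the decision influences, per the prior above.
W_OMEGA = Fraction(1, 2**1000)
W_NOMEGA_SIMULATING_OMEGA = Fraction(1, 2**2000)


def policy_value(pays):
    """Prior-weighted value of a policy over the influenced worlds only."""
    half = Fraction(1, 2)
    # Omega world: heads pays $10000 to payers, tails costs payers $100.
    omega = half * (10_000 if pays else 0) + half * (-100 if pays else 0)
    # Nomega world: $10000 to those who would refuse Omega.
    nomega = 0 if pays else 10_000
    return W_OMEGA * omega + W_NOMEGA_SIMULATING_OMEGA * nomega


value_pay = policy_value(True)
value_refuse = policy_value(False)
print(value_pay > value_refuse)
```

With these weights the Omega term dominates by a factor on the order of $2^{1000}$, so paying wins.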

[1] You might object that Nomega is defined by its property of messing with Omega, so it will naturally simulate worlds with Omega. In that case, it's strictly more complex to specify than Omega, probably by several hundred bits due to the complexity of 'messing with'.

[-]winwonce5y20

I don't think Nomega has to simulate you interacting with Omega in order to know how you would react should you encounter it, in the same way that you can predict the output of many computer programs without simulating them.

By the time you get mugged, you could be 100% sure that you are in the Omega world, rather than the Nomega world, but the principle is that your decision in the Omega world affects the Nomega world, and so before knowing which world it is in, UDT commits to making the decision that maximizes EV across both worlds.

This logic operates in the same way for the coin coming up tails -- when you see the tails, you know you're in the tails world, but your decision in the tails world affects the heads world, so you have to consider it. Likewise, your decision in the Omega world affects the Nomega world (independent of any sort of simulation argument).

Thus, in a situation where UDT has seen Omega, it has influence over the Omega world and Nomega/Omega world, but no influence over the normal world and Nomega world. Since the Omega world has so much more weight than the Omega/Nomega world, UDT will effectively act as if it's in the Omega world.

This argument would also suggest that by the time you see tails, you know you live in the tails world and thus should not pay up.

[-]interstice5y*20

I don’t think Nomega has to simulate you interacting with Omega in order to know how you would react should you encounter it

By 'simulating' I just mean that it's reasoning in some way about your behavior in another universe, it doesn't have to be a literal simulation. But the point remains -- of all the ways that Nomega could choose to act, for some reason it has chosen to simulate/reason about your behavior in a universe containing Omega, and then give away its resources depending on how it predicts you'll act.

What this means is that, from a Kolmogorov complexity perspective, Nomega is strictly more complex than Omega, since the definition of Nomega includes simulating/reasoning about Omega. Worlds containing Nomega will be discounted by a factor proportional to this additional complexity. Say it takes 100 extra bits to specify Nomega. Then worlds containing Nomega have less measure under the Solomonoff prior than worlds with Omega, meaning that UDT cares much less about them.
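For a sense of scale, one can ask how many extra bits of description length Nomega needs before paying Omega becomes the better policy. This sketch assumes the post's payoffs and a plain $2^{-k}$ discount for $k$ extra bits; the function name is illustrative.

```python
import math

# Value of the "pay" policy inside the Omega world (post's payoffs).
EV_PAY_IN_OMEGA_WORLD = 0.5 * 10_000 + 0.5 * (-100)
# What Nomega gives to agents that would refuse Omega.
NOMEGA_REWARD = 10_000


def paying_wins(extra_bits):
    """True if paying beats refusing when Nomega costs `extra_bits` more bits."""
    return EV_PAY_IN_OMEGA_WORLD > NOMEGA_REWARD * 2.0 ** -extra_bits


# Discount at which the two policies tie: 2^-k = 4950 / 10000.
break_even_bits = math.log2(NOMEGA_REWARD / EV_PAY_IN_OMEGA_WORLD)
print(f"break-even at about {break_even_bits:.3f} extra bits")
```

Under these assumptions the break-even point is just over one bit, so a 100-bit penalty leaves the Nomega term negligible.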

(My comment above was reasoning as if Nomega could choose to simulate/reason about many different possible universes, not just the ones with Omega. Then, perhaps, its baseline complexity might be comparable to Omega. Either way, the result is that the worlds where Nomega exists and you have influence don't have very high measure.)

This argument would also suggest that by the time you see tails, you know you live in the tails world and thus should not pay up.

What I meant by "Nomega world" in that paragraph was a world where Nomega exists but does not simulate/reason about your behavior in the Omega world. The analogous situation to the tails/heads world here is the "Omega"/"Nomega simulating Omega" world. I acknowledge that you would have counterfactual influence over this world. The difference is that the heads/tails worlds have equal measure, whereas the "Nomega simulates Omega" world has much less measure than the Omega world (under a 'reasonable' measure such as Solomonoff).
