Learning as you play: anthropic shadow in deadly games

This doesn't make sense to me. Why am I not allowed to update on still being in the game?

I noticed that in your problem setup you deliberately removed n=6 from being in the prior distribution. That feels like cheating to me - it seems like a perfectly valid hypothesis.

After seeing the first chamber come up empty, that should definitively update me away from n=6. Why can't I update away from n=5?

[-]dr_s2y3-1

Yes, the case is special. I didn't mean to "cheat" but I simply excluded it because it's trivial. But past the certainty that the game isn't rigged that much, you can't gain anything else. If you didn't condition on the probability of observing the sequence, nothing would actually change anyway. Your probability distribution would be

$$P(n) \propto \left(1 - \frac{n}{6}\right)^{N}$$

(properly normalized, of course). This skews the distribution ever further towards low values of $n$, irrespective of any information about the actual gun. In other words, if you didn't quit at the beginning, this will never make you quit - you will think you're safer and safer by sheer virtue of playing longer, irrespective of whether you actually are. So, what use are you getting out of this information? None at all. If you are in a game that is worth playing, you gain zero; you would have played anyway. If you are not in a game that is worth playing, you lose in expectation the difference $V - W_{PLAY}$. So either way, this information is worthless. The only information that is useful is one that behaves differently (again, in expectation) between a world in which the optimal strategy is to play, and one in which the optimal strategy is to quit, and allows you to make better decisions. But there is no such useful information you can gain in this game upstream of your decision.
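To make the skew concrete, here is a minimal sketch that normalizes the expression above exactly. The uniform prior over $n = 0, \dots, 5$ is an assumption (the thread only says that $n = 6$ was excluded), and `survival_posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def survival_posterior(num_empty, prior=None):
    """Posterior over n (loaded chambers out of 6) after num_empty empty shots,
    with the drum re-spun before each shot.
    Assumption: uniform prior over n = 0..5 unless another prior is passed in."""
    ns = range(6)
    if prior is None:
        prior = {n: Fraction(1, 6) for n in ns}
    # Unnormalized weight: prior times (1 - n/6)^N
    weights = {n: prior[n] * (1 - Fraction(n, 6)) ** num_empty for n in ns}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

for N in (0, 1, 5, 20):
    post = survival_posterior(N)
    mean_n = sum(n * p for n, p in post.items())
    print(N, round(float(mean_n), 3), round(float(post[0]), 3))
```

The printed mean of $n$ only ever falls as $N$ grows, which is the "safer and safer" behaviour described above: the update depends on the number of rounds survived and on nothing else.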

Also please notice that in the second game, the one with the blanks, my criterion allows you to define a distribution of belief that you can actually get some use out of. But if we consistently applied your suggested criterion, and did not normalize over observable paths, then the belief after $E$ empty chambers would just be

$$P(b; E) = (E + 1)\, b^{E}$$

which behaves exactly like the function above. It's not really affected by your actual trajectory; it will simply convince you that playing is safer every time an empty chamber comes up, and can't change your optimal strategy. Which means, again, you can't get any usefulness out of it. The game with the blanks is an example of a game in which using the other approach can instead yield some gains.
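For reference, the survival-only belief has the form that a flat prior on $b \in [0, 1]$ gives once multiplied by the survival likelihood $b^{E}$ and normalized. The flat prior is an assumption here, inferred from the normalization constant $E + 1$:

$$\int_0^1 b^{E}\, db = \frac{1}{E + 1}, \qquad \langle b \rangle = \int_0^1 b\,(E + 1)\, b^{E}\, db = \frac{E + 1}{E + 2}$$

The mean depends only on the count $E$ and rises towards 1 with every empty chamber, whatever the true $b$ is.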

[-]Yair Halberstadt2y51

This post seems incorrect to me. Here's the crux:

And if at, say, the fifth turn, there is no alternative to seeing EEEEE, that means that the probability of observing it, conditioned on us still being in play, is 1, and entirely independent of n. No information can be derived to update our prior.

Well yes, conditioned on us being in play, we can't get any more information from the sequence of events we observed. But the fact that we are still in play itself tells us that the gun is mostly empty. Now as it happens this isn't very useful because we will never both update towards not playing, and be able to carry on playing, but I can still be practically certain after seeing 20 Es in a row that the gun is empty.

The way anthropics twists things is that if this were Russian roulette I might not be able to update after 20 Es that the gun is empty, since in all the worlds where I died there's no one to observe what happened, so of course I find myself in the one world where by pure chance I survived.

[-]dr_s2y4-4

But the fact that we are still in play itself tells us that the gun is mostly empty.

No, it tells us that either the gun is mostly empty, or we were very lucky; but since either way that cancels out exactly with our probability of being still in play in the first place, no additional information can be deduced with which to decide our strategy. The bit about this being conditioned on being still in play is key! If we consider the usefulness of each bit of information acquired, then obviously they can only be useful if you're still playing.

[-]Yair Halberstadt2y50

Why are we conditioning on still being in play though? Without the anthropic shadow there's no reason to do so. There are plenty of worlds where I observe myself out of play, but with different information, why is the fact I'm not in one of them not telling me anything?

[-]Yair Halberstadt2y30

To give a concrete counterexample.

Let's say each round there's a 50 percent probability of adding an extra bullet to the gun.

If I didn't update based on the fact I'm still playing then I would quickly stop after a few rounds, since the probability I would see a bullet constantly increases.

But if I do update, then it's worth it carrying on, since the fact I'm still playing is evidence there still are plenty of empty chambers.

[-]dr_s2y20

There are plenty of worlds where I observe myself out of play, but with different information, why is the fact I'm not in one of them not telling me anything?

There are! But here I am only interested in useful information, bits of information you can use to update your strategy. In those worlds you just acquired bits of information that have zero usefulness. While information would still be useful, though, you won't acquire it. I should probably try to formalise this concept more, and I will maybe write something else about it.

[-]Yair Halberstadt2y4-2

See my comment from earlier below, which highlights how this information is in general useful, even if in this case it happens not to be:

To give a concrete counterexample.

Let's say each round there's a 50 percent probability of adding an extra bullet to the gun.

If I didn't update based on the fact I'm still playing then I would quickly stop after a few rounds, since the probability I would see a bullet constantly increases.

But if I do update, then it's worth it carrying on, since the fact I'm still playing is evidence there still are plenty of empty chambers

[-]dr_s2y2-2

Not quite getting it - is the addition permanent or just for each round? Seems to me like all that does is make the odds even worse. If you're already in a game in which you should quit, that only is all the more reason; if you're not, it could tilt the scales. And in neither case does updating on the fact you're still playing help in any way, because in fact you can't meaningfully update on that at all.

[-]Yair Halberstadt2y20

The addition is permanent. Updating on the fact that you're still playing provides evidence that the bullet was not in fact added in previous rounds, so it's worth carrying on playing a little bit longer, whereas if you didn't update, even if it was worth playing the first round, you would stop after 1 or 2.

[-]dr_s2y-1-6

OK, so if you get told that a bullet was added, then yes, that is information you can use, combined with the knowledge that the drum only holds a maximum of 6 bullets. But that's a different game, closer to the second I described (well, it's very different, but it has in common the fact that you do get extra information to ground your beliefs).

Even simpler, something similar would happen simply if you didn't spin the chamber after each turn, which would mean the probability of finding a bullet isn't uncorrelated any more. These details matter. I'd need to work it out to figure how it works, but I picked the game description very deliberately to show the effect I was talking about, so obviously changing it makes things different.

[-]Yair Halberstadt2y3-2

You don't get told, no; you just guess from the fact you're still alive.

but I picked the game description very deliberately to show the effect I was talking about, so obviously changing it makes things different.

On the contrary, it doesn't show any such effect at all. It's carefully contrived so that you can update on the fact you're still alive, but that happens not to change your strategy. That's not very interesting at all. Often a change in probabilities won't change your strategy.

I'm simply showing that with a slight change of setup, updating on the fact you're still alive does indeed change your strategy.

[-]dr_s2y1-6

If you don't get told, then no, I don't think it does. It just changes the probabilities in a more confusing way, but you still have no particular way of getting information out of it.

As a player, who by definition must be still in play, you can't deduce anything from it. You only know that whatever your odds of surviving an extra round were at the beginning, they will go down with time. This probably leads to an optimal strategy that requires you absolutely quit after a certain number of rounds (depending on the probability of the bullet being added). But that's not affected by any actual in-game information because you don't really get any information.

[-]Yair Halberstadt2y20

But that's not affected by any actual in-game information because you don't really get any information.

You keep on asserting this, but that's not actually true - do the maths. A player who doesn't update on still being alive will play fewer rounds on average, and will earn less in repeated play (where each play is independent).

The reason is simple - they're not going to be able to play for very long when there are lots of bullets added, so the times when they find themselves still playing are disproportionately those where bullets weren't added, so they should play for longer.
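A minimal sketch of that calculation, under stated assumptions: the drum starts with `start_bullets=1`, the possible extra bullet goes in before each shot, the drum is re-spun every round, and it caps at 6 bullets. None of those details are fixed in the thread, and `add_bullet`, `hit_chance` and `compare` are hypothetical names. The sketch only compares the per-round chance of hitting a bullet with and without conditioning on survival; it says nothing about payoffs.

```python
from fractions import Fraction

CHAMBERS = 6
ADD_PROB = Fraction(1, 2)  # per-round chance of a permanent extra bullet (from the comment)

def add_bullet(dist):
    """Apply the 'maybe add a bullet' step to a {bullet_count: weight} map."""
    out = {}
    for k, w in dist.items():
        if k < CHAMBERS:
            out[k] = out.get(k, 0) + w * (1 - ADD_PROB)
            out[k + 1] = out.get(k + 1, 0) + w * ADD_PROB
        else:
            out[k] = out.get(k, 0) + w  # drum already full (assumed cap)
    return out

def hit_chance(dist):
    """Chance that the next shot finds a bullet, given weights over bullet counts."""
    total = sum(dist.values())
    return sum(w * Fraction(k, CHAMBERS) for k, w in dist.items()) / total

def compare(rounds, start_bullets=1):
    ignoring = {start_bullets: Fraction(1)}  # never conditions on survival
    updating = {start_bullets: Fraction(1)}  # weights of histories still alive
    rows = []
    for t in range(1, rounds + 1):
        ignoring = add_bullet(ignoring)
        updating = add_bullet(updating)
        rows.append((t, hit_chance(ignoring), hit_chance(updating)))
        # Keep only the weight of histories that survive this shot.
        updating = {k: w * (1 - Fraction(k, CHAMBERS)) for k, w in updating.items()}
    return rows

for t, naive, updated in compare(6):
    print(t, round(float(naive), 3), round(float(updated), 3))
```

Under these assumptions the two estimates agree in round 1 and then diverge, with the survival-conditioned hit chance staying below the unconditioned one. Whether that gap is enough to change the stopping round depends on the payoffs, which this sketch leaves out.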

[-]dr_s2y10

But the thing is, updating on being still alive doesn't change anything - it can never drive your estimate of $n$ up and thus save you from losing out. It could convince you to play if you aren't playing - but that's an absurdity, if you're not playing you won't get any updates! All updating gives you is a belief that since you're still alive, you must be in a low-bullets, high-probability world. This belief may be correct (and then it's fine, but you would have played even without it) or wrong (in which case you can never realise until it's too late). Either way, it doesn't swing your payoff.

In your added bullets scenario thinking about it there's a bit of a difference because now a strategy of playing for a certain amount of turns can make sense. So the game isn't time-symmetric, and this has an effect. I'm still not sure how you would use your updating though. Basically I think the only situation in which that sort of updating might give a genuine benefit is one in which the survival curve is U-shaped: there's a bump of mortality at the beginning, but if you get through it, you're good to go for a while. In that case, observing that you survived long enough to overcome the bump suggests that you're probably better off going on playing all the way to the end.

[-]Christopher King2y10

The way anthropics twists things is that if this were Russian roulette I might not be able to update after 20 Es that the gun is empty, since in all the worlds where I died there's no one to observe what happened, so of course I find myself in the one world where by pure chance I survived.

This is incorrect due to the anthropic undeath argument. The vast majority of surviving worlds will be ones where the gun is empty, unless it is impossible to be so. This is exactly the same as a Bayesian update.

[-]dadadarren2y41

To my understanding, anthropic shadow refers to the absurd logic in Leslie's Firing Squad: "Of course I have survived the firing squad, that is the only way I can make this observation. Nothing surprising here". Or reasoning such as "I have played Russian roulette 1000 times, but I cannot increase my belief that there is actually no bullet in the gun because surviving is the only observation I can make".

In the Chinese Roulette example, it is correct that the optimal strategy for the first round is also optimal for any following round. It is also correct that if you decide to play the first round then you will keep playing until kicked out, i.e. there is no way to adjust your strategy. But that doesn't justify the claim that there is no probability update: the probabilities behind each subsequent decision, while all agree on "keep playing", can be different (and they should be). It seems absurd to say I would not be more confident to keep going after 100 empty shots.

In short, changing strategy implies there is an update, not changing strategy doesn't imply there is no update.

[-]dr_s2y20

But the general point I wanted to make is that "anthropic shadow" reflects a fundamental impossibility of drawing useful updates. From within the boundaries of the game, you can't really say anything other than "well, I'm still playing, so of course I'm still playing". You can still feel like you update as a person because you wouldn't cease existing if you lost. But my point was that the essence of the anthropic shadow is that if you think as the player, an entity that in a sense ceases to exist as soon as that specific game is over, then you can't really update meaningfully. And that is reflected in the fact that you can't get any leverage out of the update.

At least, that was my thought when writing the post. I am thinking now about whether that can change if we design a game such that you can actually get meaningful in-game updates on your survival; I think having a turn-dependent survival probability might be key for that. I'll probably return to this.

[-]dadadarren2y97

It seems earlier posts and your post have defined anthropic shadow differently in subtle but important ways. The earlier posts by Christopher and Jessica argued AS is invalid: that there should be updates given I survived. Your post argued AS is valid: that there are games where no new information gained while playing can change your strategy (no useful updates). The former is focusing on updates, the latter is focusing on strategy. These two positions are not mutually exclusive.

Personally, the concept of "useful update" seems situational. For example, say someone has a prior that leads him to conclude the optimal strategy is not to play the Chinese Roulette. However, he was forced to play several rounds regardless of what he thought. After surviving those rounds (say EEEEE), it might very well be that he updates his probability enough to change his strategy from no-play to play. That would be a useful update. And this "forced-to-play" kind of situation is quite relevant to existential risks, which anthropic discussions tend to focus on.

[-]dr_s2y20

True enough, I guess. I do wonder how to reconcile the two views though, because the approach you describe that allows you to update in case of a basic game is actively worse for the second kind of game (the one with the blanks). In that case, using the approach I suggested actually produces a peaked probability distribution on $b$ that eventually converges to the correct value (well, on average). Meanwhile just looking at survival produces exactly the same monotonically decaying power law. If the latter potentially is useful information, I wonder how one might integrate the two.

[-]avturchin2y3-1

We can add anthropic flavor to this game:

1. Imagine this Chinese roulette, but you forget the number N of how many games you already played. In that case, it is more reasonable to bet on a smaller number.

If we add here the idea that firing the gun has only a probability p of breaking the vase, we arrive at the SSA counterargument to anthropic shadow recently suggested by Jessica. That is, most of the observations will happen before the risk event.

2. Imagine another variant. You know that the number of played games N=6 and now you should guess n - the number of chambers typically loaded. In that case it is more reasonable to expect n=1 than n=5.

This is the SIA counterargument against anthropic shadow: if there are many worlds, you are more likely to find yourself in the world with the weakest anthropic shadow.

Note that both counterarguments do not kill anthropic shadow completely, but rather shift it to the most mild variant allowed. I explored it here.

And it all now looks like a variant of Sleeping beauty, btw.

3. However, the most important application of anthropic shadow is the idea of underestimating the fragility of our environment. Based on the previous N rounds, one can come to think that the vase is unbreakable. This may cause global risks if some physical experiment is blindly performed in a fragile environment, e.g. geo-engineering.

[-]dr_s2y31

Imagine this Chinese roulette, but you forget the number N of how many games you already played. In that case, it is more reasonable to bet on a smaller number.

Not sure I follow - the idea is that the parameter gets randomized at every game, so why would this change the optimal strategy? Each game is its own story.

If we add here the idea that firing the gun has only a probability p of breaking the vase, we arrive at the SSA counterargument to anthropic shadow recently suggested by Jessica. That is, most of the observations will happen before the risk event.

I think this is roughly equivalent to what the blanks do? Now you just have a probability of $b + (1 - p) (1 - b)$ to not break the vase in case of a loaded chamber.
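Expanding that expression, under the reading that a loaded chamber holds a blank with probability $b$ and a live round breaks the vase with probability $p$ (this reading is pieced together from the thread, so treat it as an assumption):

$$b + (1 - p)(1 - b) = 1 - p\,(1 - b)$$

Setting $p = 1$ recovers the plain blanks game, where the vase survives a loaded chamber with probability $b$; setting $b = 0$ leaves $1 - p$.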

2. Imagine another variant. You know that the number of played games N=6 and now you should guess n - the number of chambers typically loaded. In that case it is more reasonable to expect n=1 than n=5.

Only if you know that the games lasted more than one turn! Also this only makes sense if you assume that the distribution that $n$ is picked from before each game is not uniform.

But yeah, things can get plenty tricky if you assume some kind of non-uniform distribution between worlds etc. But I'm not sure with so many possibilities what would be, specifically, an interesting one to explore more in depth.

[-]avturchin2y20

My point was to add uncertainty about one's location relative to the game situation - this is how it turns into more typical anthropic questions.

[-]noggin-scratcher2y30

This is one of the few times where I've seen a post involving anthropic reasoning, and not come away with the general impression that one-of (myself, the subject itself) is hopelessly confused on some fundamental point. So kudos for that.

[-]avturchin2y7-1

In astrophysics they often use the expression "observation-selection effect", which is clearer.

[-]dr_s2y73

Thanks! I honestly think it tends to get more mind-screwy than it needs to be because it always ends up mixed up with all these grand concepts involving death, counterfactual existence and so on. That's why I wanted to really strip it of all those connotations and keep it super mundane.

[-]Christopher King2y10

Maximizing expected utility in Chinese Roulette requires Bayesian updating.

Let's say on priors that P(n=1) = p and that P(n=5) = 1-p. Call this instance of the game G_p.

Let's say that you shoot instead of quit the first round. For G_1/2, there are four possibilities:

1. n = 1, vase destroyed: The probability of this scenario is 1/12. No further choices are needed.
2. n = 5, vase destroyed: The probability of this scenario is 5/12. No further choices are needed.
3. n = 1, vase survived: The probability of this scenario is 5/12. The player needs a strategy to continue playing.
4. n = 5, vase survived: The probability of this scenario is 1/12. The player needs a strategy to continue playing.

Notice that the strategy must be the same for 3 and 4 since the observations are the same. Call this strategy S.

The expected utility, which we seek to maximize, is:

E[U(shoot and then S)] = 0 + 5/12 * (R + E[U(S) | n = 1]) + 1/12 * (R + E[U(S) | n = 5])

Most of our utility is determined by the n = 1 worlds.

Manipulating the equation we get:

E[U(shoot and then S)] = R/2 + 1/2 * (5/6 * E[U(S) | n = 1] + 1/6 * E[U(S) | n = 5])

But the expression 5/6 * E[U(S) | n = 1] + 1/6 * E[U(S) | n = 5] is the expected utility if we were playing G_5/6. So the optimal S is the optimal strategy for G_5/6. This is the same as doing a Bayesian update (1:1 * 5:1 = 5:1 = 5/6).
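The odds calculation can be checked with exact fractions. This is a small sketch of the two-hypothesis update only, not of the full utility maximization; `posterior_n1_after_survival` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior_n1_after_survival(p):
    """P(n = 1 | the shot found an empty chamber) in the two-hypothesis game G_p,
    where P(n = 1) = p and P(n = 5) = 1 - p."""
    survive_n1 = Fraction(5, 6)  # 5 of 6 chambers empty when n = 1
    survive_n5 = Fraction(1, 6)  # 1 of 6 chambers empty when n = 5
    joint_n1 = p * survive_n1
    joint_n5 = (1 - p) * survive_n5
    return joint_n1 / (joint_n1 + joint_n5)

p = Fraction(1, 2)
p_after_one = posterior_n1_after_survival(p)
print(p_after_one)  # 5/6, so the continuation game after one survival is G_5/6
print(posterior_n1_after_survival(p_after_one))  # 25/26 after a second survival
```

Feeding the result back in applies the same 5:1 likelihood ratio again, so each surviving round multiplies the odds in favour of n = 1 by 5.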

[-]RedMan2y01

A lot of real games in real life follow these rules. Except, the game organizer knows the value of the vase, and how many bullets they loaded. They might also charge you to play.


Actually, this only happens if the limit for $b$ is very close to 1; if $b \in ]0, 1/2]$, for example, then it's another story. But that's a completely different game too.
