Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.

If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.

After going to sleep at the start of the experiment, you wake up in a green room.

With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?

There are exactly two tenable answers that I can see, "50%" and "90%".

Suppose you reply 90%.

And suppose you also happen to be "altruistic" enough to care about what happens to all the copies of yourself.  (If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your expectation of pleasant experience over future selves.)

Then I attempt to force a reflective inconsistency in your decision system, as follows:

I inform you that, after I look at the unknown binary digit of pi, I will ask all the copies of you in green rooms whether to pay $1 to every version of you in a green room and steal $3 from every version of you in a red room.  If they all reply "Yes", I will do so.

(It will be understood, of course, that $1 represents 1 utilon, with actual monetary amounts rescaled as necessary to make this happen.  Very little rescaling should be necessary.)

(Timeless decision agents reply as if controlling all similar decision processes, including all copies of themselves.  Classical causal decision agents, to reply "Yes" as a group, will need to somehow work out that other copies of themselves reply "Yes", and then reply "Yes" themselves.  We can try to help out the causal decision agents on their coordination problem by supplying rules such as "If conflicting answers are delivered, everyone loses $50".  If causal decision agents can win on the problem "If everyone says 'Yes' you all get $10, if everyone says 'No' you all lose $5, if there are conflicting answers you all lose $50" then they can presumably handle this.  If not, then ultimately, I decline to be responsible for the stupidity of causal decision agents.)
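
As a rough illustration of why the helper game in the parenthetical is a coordination problem, here is a minimal Python sketch. The function names `payoff` and `is_equilibrium` and the group size `n=18` are assumptions made for illustration; only the three dollar amounts come from the text. It checks whether any single agent could gain by deviating from a unanimous answer.

```python
def payoff(answers):
    """Per-agent payoff in the helper game: +10 / -5 / -50 (amounts from the text)."""
    if all(a == "Yes" for a in answers):
        return 10
    if all(a == "No" for a in answers):
        return -5
    return -50  # conflicting answers

def is_equilibrium(answer, n=18):
    """True if no single agent gains by deviating from a unanimous answer.

    n is a hypothetical group size chosen for illustration.
    """
    other = "No" if answer == "Yes" else "Yes"
    unanimous = [answer] * n
    deviation = [other] + [answer] * (n - 1)
    return payoff(deviation) <= payoff(unanimous)

print(is_equilibrium("Yes"), is_equilibrium("No"))
```

Under these assumptions both unanimous outcomes are stable against a lone deviator, so an agent that reasons only about its own reply still has to work out which answer the others will give.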

Suppose that you wake up in a green room.  You reason, "With 90% probability, there are 18 of me in green rooms and 2 of me in red rooms; with 10% probability, there are 2 of me in green rooms and 18 of me in red rooms.  Since I'm altruistic enough to at least care about my xerox-siblings, I calculate the expected utility of replying 'Yes' as (90% * ((18 * +$1) + (2 * -$3))) + (10% * ((18 * -$3) + (2 * +$1))) = +$5.60."  You reply yes.

However, before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20.  You want your future selves to reply 'No' under these conditions.
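
The two calculations above differ only in the probability assigned to heads. A minimal Python sketch makes that explicit; the names `payoff_if_yes` and `expected_utility` are hypothetical helpers introduced here, not anything from the original problem statement.

```python
GREEN_GAIN = 1   # +$1 to each copy in a green room
RED_LOSS = -3    # -$3 to each copy in a red room

def payoff_if_yes(n_green, n_red):
    """Total utilons across all copies if the green-room copies say 'Yes'."""
    return n_green * GREEN_GAIN + n_red * RED_LOSS

def expected_utility(p_heads):
    heads = payoff_if_yes(18, 2)   # heads: 18 green rooms, 2 red rooms
    tails = payoff_if_yes(2, 18)   # tails: 2 green rooms, 18 red rooms
    return p_heads * heads + (1 - p_heads) * tails

after_waking = expected_utility(0.9)       # credence after updating on "green room"
before_experiment = expected_utility(0.5)  # credence before the experiment
print(round(after_waking, 2), round(before_experiment, 2))
```

The payoffs are identical in both calls; the sign of the answer flips purely because of the shift from 50% to 90%.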

This is a dynamic inconsistency - different answers at different times - which argues that decision systems which update on anthropic evidence will self-modify not to update probabilities on anthropic evidence.

I originally thought, on first formulating this problem, that it had to do with double-counting the utilons gained by your variable numbers of green friends, and the probability of being one of your green friends.

However, the problem also works if we care about paperclips.  No selfishness, no altruism, just paperclips.

Let the dilemma be, "I will ask all people who wake up in green rooms if they are willing to take the bet 'Create 1 paperclip if the logical coinflip came up heads, destroy 3 paperclips if the logical coinflip came up tails'.  (Should they disagree on their answers, I will destroy 5 paperclips.)"  Then a paperclip maximizer, before the experiment, wants the paperclip maximizers who wake up in green rooms to refuse the bet.  But a conscious paperclip maximizer who updates on anthropic evidence, who wakes up in a green room, will want to take the bet, with expected utility ((90% * +1 paperclip) + (10% * -3 paperclips)) = +0.6 paperclips.
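
The same comparison for the paperclip bet, as a small sketch. The helper `bet_value` is a hypothetical name, and the 50% figure is the pre-experiment credence, which the paragraph above implies but does not compute.

```python
def bet_value(p_heads, win=1, loss=-3):
    """Expected paperclips from accepting the bet at a given credence in heads."""
    return p_heads * win + (1 - p_heads) * loss

updated = bet_value(0.9)  # after waking in a green room and updating
prior = bet_value(0.5)    # before the experiment
print(round(updated, 2), round(prior, 2))
```

At 50% the bet loses a paperclip in expectation, which is why the maximizer wants it refused beforehand; at 90% the same bet looks favorable.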

This argues that, in general, decision systems - whether they start out selfish, or start out caring about paperclips - will not want their future versions to update on anthropic "evidence".

Well, that's not too disturbing, is it?  I mean, the whole anthropic thing seemed very confused to begin with - full of notions about "consciousness" and "reality" and "identity" and "reference classes" and other poorly defined terms.  Just throw out anthropic reasoning, and you won't have to bother.

When I explained this problem to Marcello, he said, "Well, we don't want to build conscious AIs, so of course we don't want them to use anthropic reasoning", which is a fascinating sort of reply.  And I responded, "But when you have a problem this confusing, and you find yourself wanting to build an AI that just doesn't use anthropic reasoning to begin with, maybe that implies that the correct resolution involves us not using anthropic reasoning either."

So we can just throw out anthropic reasoning, and relax, and conclude that we are Boltzmann brains.  QED.

In general, I find the sort of argument given here - that a certain type of decision system is not reflectively consistent - to be pretty damned compelling.  But I also find the Boltzmann conclusion to be, ahem, more than ordinarily unpalatable.

In personal conversation, Nick Bostrom suggested that a division-of-responsibility principle might cancel out the anthropic update - i.e., the paperclip maximizer would have to reason, "If the logical coin came up heads then I am 1/18th responsible for adding +1 paperclip, if the logical coin came up tails then I am 1/2 responsible for destroying 3 paperclips."  I confess that my initial reaction to this suggestion was "Ewwww", but I'm not exactly comfortable concluding I'm a Boltzmann brain, either.
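
One way to read the division-of-responsibility suggestion numerically is sketched below. This is an interpretation, not Bostrom's own formalism: the function `responsibility_weighted` is a hypothetical name, and applying it to the dollar version (group totals of +12 and -52, taken from the earlier calculation) is an extension assumed here.

```python
from fractions import Fraction

P_HEADS_GIVEN_GREEN = Fraction(9, 10)  # the anthropic update from the post

def responsibility_weighted(win, loss, greens_if_heads=18, greens_if_tails=2):
    """Expected utility when each green-room copy claims only its share of the outcome."""
    heads_share = Fraction(win, greens_if_heads)   # 1/18 responsible under heads
    tails_share = Fraction(loss, greens_if_tails)  # 1/2 responsible under tails
    return (P_HEADS_GIVEN_GREEN * heads_share
            + (1 - P_HEADS_GIVEN_GREEN) * tails_share)

paperclips = responsibility_weighted(win=1, loss=-3)
dollars = responsibility_weighted(win=12, loss=-52)
print(paperclips, dollars)
```

On this reading both versions come out negative, so the updated agent refuses the bet, in agreement with what it wanted before the experiment.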

EDIT:  On further reflection, I also wouldn't want to build an AI that concluded it was a Boltzmann brain!  Is there a form of inference which rejects this conclusion without relying on any reasoning about subjectivity?

EDIT2:  Psy-Kosh has converted this into a non-anthropic problem!


Actually... how is this an anthropic situation AT ALL?

I mean, wouldn't it be equivalent to, say, gather 20 rational people (That understand PD, etc etc etc, and can certainly manage to agree to coordinate with each other) that are allowed to meet with each other in advance and discuss the situation...

I show up and tell them that I have two buckets of marbles, some of which are green, some of which are red

One bucket has 18 green and 2 red, and the other bucket has 18 red and 2 green.

I will flip (in fact, already have flipped) a logical coin. Depending on the outcome, I will use either one bucket or the other.

After having an opportunity to discuss strategy, they will be allowed to reach into the bucket without looking, pull out a marble, look at it, then, if it's green, choose whether to pay and steal, etc etc etc. (In case it's not obvious, the payout rules are equivalent to those in the OP.)

As near as I can determine, this situation is entirely equivalent to the OP and is in no way an anthropic one. If the OP actually is an argument against anthropic updates in the presence of logical uncertainty... then it's actually an argument against the general case of Bayesian updating in the presence of logical uncertainty, even when there's no anthropic stuff going on at all!

EDIT: oh, in case it's not obvious, marbles are not replaced after being drawn from the bucket.
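
The marble game described above can be sketched as a small simulation. Everything here is an illustrative assumption layered on the comment: the function name `simulate`, the trial count, the seed, a fair pseudo-random coin standing in for the logical one, and tracking "participant 0" as the person whose viewpoint we take. Shuffling the bucket models drawing without replacement until it is empty.

```python
import random

def simulate(trials=50_000, seed=0):
    rng = random.Random(seed)
    my_green = 0         # trials in which participant 0 drew green
    my_green_heads = 0   # ...and the bucket was the mostly-green one
    total_payoff = 0     # group payoff when every green holder says "Yes"
    for _ in range(trials):
        heads = rng.random() < 0.5
        if heads:
            bucket = ["green"] * 18 + ["red"] * 2
        else:
            bucket = ["green"] * 2 + ["red"] * 18
        rng.shuffle(bucket)  # bucket[i] is the marble participant i draws
        if bucket[0] == "green":
            my_green += 1
            my_green_heads += heads
        greens = bucket.count("green")
        total_payoff += greens * 1 + (20 - greens) * -3
    return my_green_heads / my_green, total_payoff / trials

p_heads_given_green, mean_payoff = simulate()
print(round(p_heads_given_green, 3), round(mean_payoff, 2))
```

The first number estimates how often the bucket is mostly green given that one particular person drew green; the second estimates the average group payoff of the policy "say Yes on green". Seeing both side by side is the point of the comment: no copying of minds is needed to get the tension.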

Right, and this is a perspective very close to intuition for UDT: you consider different instances of yourself at different times as separate decision-makers that all share the common agenda ("global strategy"), coordinated "off-stage", and implement it without change depending on circumstances they encounter in each particular situation. The "off-stageness" of coordination is more naturally described by TDT, which allows considering different agents as UDT-instances of the same strategy, but the precise way in which it happens remains magic.
2Eliezer Yudkowsky
Nesov, the reason why I regard Dai's formulation of UDT as such a significant improvement over your own is that it does not require offstage coordination. Offstage coordination requires a base theory and a privileged vantage point and, as you say, magic.
I still don't understand this emphasis. Here I sketched in what sense I mean the global solution -- it's more about definition of preference than the actual computations and actions that the agents make (locally). There is an abstract concept of global strategy that can be characterized as being "offstage", but there is no offstage computation or offstage coordination, and in general complete computation of global strategy isn't performed even locally -- only approximations, often approximations that make it impossible to implement the globally best solution. In the above comment, by "magic" I referred to exact mechanism that says in what way and to what extent different agents are running the same algorithm, which is more in the domain of TDT, UDT generally not talking about separate agents, only different possible states of the same agent. Which is why neither concept solves the bargaining problem: it's out of UDT's domain, and TDT takes the relevant pieces of the puzzle as given, in its causal graphs. For further disambiguation, see for example this comment you made:
That uncertainty is logical seems to be irrelevant here.
Agreed. But I seem to recall seeing some comments about distinguishing between quantum and logical uncertainty, etc etc, so figured may as well say that it at least is equivalent given that it's the same type of uncertainty as in the original problem and so on...
1Eliezer Yudkowsky
Again, if we randomly selected someone to ask, rather than having specified in advance that we're going to make the decision depend on the unanimous response of all people in green rooms, then there would be no paradox. What you're talking about here, pulling out a random marble, is the equivalent of asking a random single person from either green or red rooms. But this is not what we're doing!

Either I'm misunderstanding something, or I wasn't clear.

To make it explicit: EVERYONE who gets a green marble gets asked, and the outcome depends on their consent being unanimous, just like everyone who wakes up in a green room gets asked. ie, all twenty rationalists draw a marble from the bucket, so that by the end, the bucket is empty.

Everyone who got a green marble gets asked for their decision, and the final outcome depends on all the answers. The bit about them drawing marbles individually is just to keep them from seeing what marbles the others got or being able to talk to each other once the marble drawing starts.

Unless I completely failed to comprehend some aspect of what's going on here, this is effectively equivalent to the problem you described.

Oh, okay, that wasn't clear actually. (Because I'm used to "they" being a genderless singular pronoun.) In that case these problems do indeed look equivalent.

Hm. Hm hm hm. I shall have to think about this. It is an extremely good point. The more so as anyone who draws a green marble should indeed be assigning a 90% probability to there being a mostly-green bucket.

Sorry about the unclarity then. I probably should have explicitly stated a step by step "marble game procedure". My personal suggestion if you want an "anthropic reasoning is confooozing" situation would be the whole anthropic updating vs aumann agreement thing, since the disagreement would seem to be predictable in advance, and everyone involved would appear to be able to be expected to agree that the disagreement is right and proper. (ie, mad scientist sets up a quantum suicide experiment. Test subject survives. Test subject seems to have Bayesian evidence in favor of MWI vs single world, external observer mad scientist who sees the test subject/victim survive would seem to not have any particular new evidence favoring MWI over single world) (Yes, I know I've brought up that subject several times, but it does seem, to me, to be a rather more blatant "something funny is going on here") (EDIT: okay, I guess this would count as quantum murder rather than quantum suicide, but you know what I mean.)
I don't see how being assigned a green or red room is "anthropic" while being assigned a green or red marble is not anthropic. I thought the anthropic part came from updating on your own individual experience in the absence of observing what observations others are making.
The difference wasn't marble vs room but "copies of one being, so number of beings changed" vs "just gather 20 rationalists..." But my whole point was "the original wasn't really an anthropic situation, let me construct this alternate yet equivalent version to make that clear"
Do you think that the Sleeping Beauty problem is an anthropic one?
It probably counts as an instance of the general class of problems one would think of as an "anthropic problem".
I see. I had always thought of the problem as involving 20 (or sometimes 40) different people. The reason for this is that I am an intuitive rather than literal reader, and when Eliezer mentioned stuff about copies of me, I just interpreted this as meaning to emphasize that each person has their own independent 'subjective reality'. Really only meaning that each person doesn't share observations with the others. So all along, I thought this problem was about challenging the soundness of updating on a single independent observation involving yourself as though you are some kind of special reference frame. ... therefore, I don't think you took this element out, but I'm glad you are resolving the meaning of "anthropic" because there are probably quite a few different "subjective realities" circulating about what the essence of this problem is.
Sorry for delay. Copies as in "upload your mind. then run 20 copies of the uploaded mind". And yes, I know there's still tricky bits left in the problem, I merely established that those tricky bits didn't derive from effects like mind copying or quantum suicide or anything like that and could instead show up in ordinary simple stuff, with no need to appeal to anthropic principles to produce the confusion. (sorry if that came out babbly, am getting tired)
That's funny: when Eliezer said "imagine there are two of you", etc., I had assumed he meant two of us rationalists, etc.
I don't think so. I think the answer to both these problems is that if you update correctly, you get 0.5.
*blinks* mind expanding on that?
P(green|mostly green bucket) = 18/20
P(green|mostly red bucket) = 2/20
likelihood ratio = 9
If one started with no particular expectation of it being one bucket vs the other, ie, assigned 1:1 odds, then after updating upon seeing a green marble, one ought assign 9:1 odds, ie, probability 9/10, right?
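
The update in the comment above, written out with exact fractions as a minimal sketch; the helper name `posterior` is hypothetical.

```python
from fractions import Fraction

def posterior(prior, like_h, like_not_h):
    """P(H | E) by Bayes' rule for a binary hypothesis H."""
    joint_h = prior * like_h
    return joint_h / (joint_h + (1 - prior) * like_not_h)

# H = "mostly green bucket", E = "I drew a green marble"
p = posterior(Fraction(1, 2), Fraction(18, 20), Fraction(2, 20))
likelihood_ratio = Fraction(18, 20) / Fraction(2, 20)
print(likelihood_ratio, p)
```
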
I guess that does need a lot of explaining. I would say:
P(green|mostly green bucket) = 1
P(green|mostly red bucket) = 1
P(green) = 1
because P(green) is not the probability that you will get a green marble, it's the probability that someone will get a green marble. From the perspective of the priors, all the marbles are drawn, and no one draw is different from any other. If you don't draw a green marble, you're discarded and the people who did get a green vote. For the purposes of figuring out the priors for a group strategy, your draw being green is not an event. Of course, you know that you've drawn green. But the only thing you can translate it into that has a prior is "someone got green." That probably sounds contrived. Maybe it is. But consider a slightly different example:
* Two marbles and two people instead of twenty.
* One marble is green, the other will be red or green based on a coin flip (green on heads, red on tails).
I like this example because it combines the two conflicting intuitions in the same problem. Only a fool would draw a red marble and remain uncertain about the coin flip. But someone who draws a green marble is in a situation similar to the twenty marble scenario. If you were to plan ahead of time how the greens should vote, you would tell them to assume 50%. But a person holding a green marble might think it's 2/3 in favor of double green. To avoid embarrassing paradoxes, you can base everything on the four events "heads," "tails," "someone gets green," and "someone gets red." Update as normal.
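
The two-marble example above is small enough to enumerate. The sketch below conditions on two different events, "I drew green" and "someone drew green"; the names `MARBLES`, `worlds` and `p_heads_given` are hypothetical, and treating each (coin, which marble I drew) pair as equally likely is an assumption of this sketch.

```python
from fractions import Fraction
from itertools import product

# Heads: both marbles green. Tails: one green, one red.
MARBLES = {"heads": ("green", "green"), "tails": ("green", "red")}

# Equally likely worlds: (coin result, index of the marble I drew).
worlds = list(product(MARBLES, (0, 1)))

def p_heads_given(condition):
    matching = [w for w in worlds if condition(*w)]
    heads = [w for w in matching if w[0] == "heads"]
    return Fraction(len(heads), len(matching))

i_drew_green = p_heads_given(lambda coin, me: MARBLES[coin][me] == "green")
someone_drew_green = p_heads_given(lambda coin, me: "green" in MARBLES[coin])
print(i_drew_green, someone_drew_green)
```

Conditioning on the personal event gives the 2/3 mentioned in the comment, while conditioning on "someone gets green" leaves the coin at 1/2, since that event happens either way.
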
yes, the probability that someone will get a green marble is rather different than the probability that I, personally, will get a green marble. But if I do personally get a green marble, that's evidence in favor of green bucket. The decision algorithm for how to respond to that though in this case is skewed due to the rules for the payout. And in your example, if I drew green, I'd consider the 2/3 probability the correct one for whoever drew green. Now, if there's a payout scheme involved with funny business, that may alter some decisions, but not magically change my epistemology.
What kind of funny business?
Let's just say that you don't draw blue.
OK, but I think Psy-Kosh was talking about something to do with the payoffs. I'm just not sure if he means the voting or the dollar amounts or what.
Sorry for delay. And yeah, I meant stuff like "only greens get to decide, and the decision needs to be unanimous" and so on
I agree that changes the answer. I was assuming a scheme like that in my two marble example. In a more typical situation, I would also say 2/3. To me, it's not a drastic (or magical) change, just getting a different answer to a different question.
Um... okay... I'm not sure what we're disagreeing about here, if anything: my position is "given that I found myself with a green marble, it is right and proper for me to assign a 2/3 probability to both being green. However, the correct choice to make, given the peculiarities of this specific problem, may require one to make a decision that seems, on the surface, as if one didn't update like that at all."
Well, we might be saying the same thing but coming from different points of view about what it means. I'm not actually a bayesian, so when I talk about assigning probabilities and updating them, I just mean doing equations. What I'm saying here is that you should set up the equations in a way that reflects the group's point of view because you're telling the group what to do. That involves plugging some probabilities of one into Bayes' Law and getting a final answer equal to one of the starting numbers.
So was I. But fortunately I was restrained enough to temper my uncouth humour with obscurity.
Very enlightening! It just shows that the OP was an overcomplicated example generating confusion about the update. [EDIT] Deleted rest of the comment due to revised opinion here: http://lesswrong.com/lw/17c/outlawing_anthropics_an_updateless_dilemma/13hk
Good point. After thinking about this for a while, I feel comfortable simultaneously holding these views:
1) You shouldn't do anthropic updates. (i.e. update on the fact that you exist)
2) The example posed in the top-level post is not an example of anthropic reasoning, but reasoning on specific givens and observations, as are most supposed examples of anthropic reasoning.
3) Any evidence arising from the fact that you exist is implicitly contained by your observations by virtue of their existence.
Wikipedia gives one example of a productive use of the anthropic principle, but it appears to be reasoning based on observations of the type of life-form we are, as well as other hard-won biochemical knowledge, well above and beyond the observation that we exist.
Thanks. I don't THINK I agree with your point 1. ie, I favor saying yes to anthropic updates, but I admit that there's definitely confusing issues here. Mind expanding on point 3? I think I get what you're saying, but in general we filter out that part of our observations, that is, the fact that observations are occurring at all. Getting that back is the point of anthropic updating. Actually... IIRC, Nick Bostrom's way of talking about anthropic updates more or less is exactly your point 3 in reverse... ie, near as I can determine and recall, his position explicitly advocates talking about the significance that observations are occurring at all as part of the usual update based on observation. Maybe I'm misremembering though. Also, separating it out into a single anthropic update and then treating all observations as conditional on your existence or such helps avoid double counting that aspect, right? Also, here's another physics example, a bit more recent that was discussed on OB a while back.
Reading the link, the second paper's abstract, and most of Scott Aaronson's post, it looks to me like they're not using anthropic reasoning at all. Robin Hanson summarizes their "entropic principle" (and the abstract and all discussion agree with his summary) as The problem is that "observer" is not the same as "anthrop-" (human). This principle is just a subtle restatement of either a tautology or known physical law. Because it's not that "observers need entropy gains". Rather, observation is entropy gain. To observe something is to increase one's mutual information with it. But since phase space is conserved, all gains in mutual information must be offset by an increase in entropy. But since "observers" are simply anything that forms mutual information with something else, it doesn't mean a conscious observer, let alone a human one. For that, you'd need to go beyond P(entropy gain|observer) to P(consciousness|entropy gain). (I'm a bit distressed no one else made this point.) Now, this idea could lead to an insight if you endorsed some neo-animistic view that consciousness is proportional to normalized rate of mutual information increase, and so humans are (as) conscious (as we are) because we're above some threshold ... but again, you'd be using nothing from your existence as such.
The argument was "higher rate of entropy production is correlated with more observers, probably. So we should expect to find ourselves in chunks of reality that have high rates of entropy production" I guess it wasn't just observers, but (non reversible) computations ie, anthropic reasoning was the justification for using the entropy production criteria in the first place. Yes, there is a question of fractions of observers that are conscious, etc... but a universe that can't support much in the way of observers at all probably can't support much in the way of conscious observers, while a universe that can support lots of observers can probably support more conscious observers than the other, right? Or did I misunderstand your point?
Now I'm not understanding how your response applies. My point was: the entropic principle estimates the probability of observers per unit volume by using the entropy per unit volume. But this follows immediately from the second law and conservation of phase space; it's necessarily true. To the extent that it assigns a probability to a class that includes us, it does a poor job, because we make up a tiny fraction of the "observers" (appropriately defined) in the universe.
The situation is not identical in the non-anthropic case in that there are equal numbers of rooms but differing numbers of marbles. There's only one green room (so observing it is evidence for heads-green with p=0.5) whereas there are 18 green marbles, so p(heads|green)= ((18/20)/0.5)*0.5 = 0.9.
Sorry for delayed response. Anyways, how so? 20 rooms in the original problem, 20 marbles in mine. what fraction are green vs red derives from examining a logical coin, etc etc etc... I'm not sure where you're getting the only one green room thing.

Well, we don't want to build conscious AIs, so of course we don't want them to use anthropic reasoning.

Why is anthropic reasoning related to consciousness at all? Couldn't any kind of Bayesian reasoning system update on the observation of its own existence (assuming such updates are a good idea in the first place)?

Why do I think anthropic reasoning and consciousness are related? In a nutshell, I think subjective anticipation requires subjectivity. We humans feel dissatisfied with a description like "well, one system running a continuation of the computation in your brain ends up in a red room and two such systems end up in green rooms" because we feel that there's this extra "me" thing, whose future we need to account for. We bother to ask how the "me" gets split up, what "I" should anticipate, because we feel that there's "something it's like to be me", and that (unless we die) there will be in future "something it will be like to be me". I suspect that the things I said in the previous sentence are at best confused and at worst nonsense. But the question of why people intuit crazy things like that is the philosophical question we label "consciousness". However, the feeling that there will be in future "something it will be like to be me", and in particular that there will be one "something it will be like to be me", if taken seriously, forces us to have subjective anticipation, that is, to write a probability distribution summing to one for which copy we end up as. Once you do that, if you wake up in a green room in Eliezer's example, you are forced to update to 90% probability that the coin came up heads (provided you distributed your subjective anticipation evenly between all twenty copies in both the head and tail scenarios, which really seems like the only sane thing to do.) Or, at least, the same amount of "something it is like to be me"-ness as we started with, in some ill-defined sense. On the other hand, if you do not feel that there is any fact of the matter as to which copy you become, then you just want all your copies to execute whatever strategy is most likely to get all of them the most money from your initial perspective of ignorance of the coinflip. Incidentally, the optimal strategy looks like a policy selected by updateless decision theory and not like
Consciousness is really just a name for having a model of yourself which you can reflect on and act on - plus a whole bunch of other confused interpretations which don't really add much. To do anthropic reasoning you have to have a simple model of yourself which you can reason about. Machines can do this too, of course, without too much difficulty. That typically makes them conscious, though. Perhaps we can imagine a machine performing anthropic reasoning while dreaming - i.e. when most of its actuators are disabled, and it would not normally be regarded as being conscious. However, then, how would we know about its conclusions?

An AI that runs UDT wouldn't conclude that it was a Boltzmann or non-Boltzmann brain. For such an AI, the statement has no meaning, since it's always both. The closest equivalent would be "Most of the value I can create by making the right decision is concentrated in the vicinity of non-Boltzmann brains."

BTW, does my indexical uncertainty and the Axiom of Independence post make any more sense now?

This was my take after going through a similar analysis (with apples, not paperclips) at the SIAI summer intern program.
2Wei Dai
It seems promising that several people are converging on the same "updateless" idea. But sometimes I wonder why it took so long, if it's really the right idea, given the amount of brainpower spent on this issue. (Take a look at http://www.anthropic-principle.com/profiles.html and consider that Nick Bostrom wrote "Investigations into the Doomsday Argument" in 1996 and then did his whole Ph.D. on anthropic reasoning, culminating in a book published in 2002.) BTW, weren't the SIAI summer interns supposed to try to write one LessWrong post a week (or was it a month)? What happened to that plan?
5Eliezer Yudkowsky
People are crazy, the world is mad. Also inventing basic math is a hell of a lot harder than reading it in a textbook afterward.
3Wei Dai
I suppose you're referring to the fact that we are "designed" by evolution. But why did evolution create a species that invented the number field sieve (to give a random piece of non-basic math) before UDT? It doesn't make any sense. In what sense is it "hard"? I don't think it's hard in a computational sense, like NP-hard. Or is it? I guess it goes back to the question of "what algorithm are we using to solve these types of problems?"
4Eliezer Yudkowsky
No, I'm referring to the fact that people are crazy and the world is mad. You don't need to reach so hard for an explanation of why no one's invented UDT yet when many-worlds wasn't invented for thirty years.
I also don't think general madness is enough of an explanation. Both are counterintuitive ideas in areas without well-established methods to verify progress, e.g. building a working machine or standard mathematical proof techniques.
The OB/LW/SL4/TOElist/polymathlist group is one intellectual community drawing on similar prior work that hasn't been broadly disseminated. The same arguments apply with much greater force to the causal decision theory vs evidential decision theory debate. The interns wound up more focused on their group projects. As it happens, I had told Katja Grace that I was going to write up a post showing the difference between UDT and SIA (using my apples example which is isomorphic with the example above), but in light of this post it seems needless.
UDT is basically the bare definition of reflective consistency: it is a non-solution, just statement of the problem in constructive form. UDT says that you should think exactly the same way as the "original" you thinks, which guarantees that the original you won't be disappointed in your decisions (reflective consistency). It only looks good in comparison to other theories that fail this particular requirement, but otherwise are much more meaningful in their domains of application. TDT fails reflective consistency in general, but offers a correct solution in a domain that is larger than those of other practically useful decision theories, while retaining their expressivity/efficiency (i.e. updating on graphical models).
2Wei Dai
What prior work are you referring to, that hasn't been broadly disseminated? I think much less brainpower has been spent on CDT vs EDT, since that's thought of as more of a technical issue that only professional decision theorists are interested in. Likewise, Newcomb's problem is usually seen as an intellectual curiosity of little practical use. (At least that's what I thought until I saw Eliezer's posts about the potential link between it and AI cooperation.) Anthropic reasoning, on the other hand, is widely known and discussed (I remember the Doomsday Argument brought up during a casual lunch-time conversation at Microsoft), and thought to be both interesting in itself and having important applications in physics. I miss the articles they would have written. :) Maybe post the topic ideas here and let others have a shot at them?
"What prior work are you referring to, that hasn't been broadly disseminated?" I'm thinking of the corpus of past posts on those lists, which bring certain tools and concepts (Solomonoff Induction, anthropic reasoning, Pearl, etc) jointly to readers' attention. When those tools are combined and focused on the same problem, different forum participants will tend to use them in similar ways.
You might think that more top-notch economists and game theorists would have addressed Newcomb/TDT/Hofstadter superrationality given their interest in the Prisoner's Dilemma. Looking at the actual literature on the Doomsday argument, there are some physicists involved (just as some economists and others have tried their hands at Newcomb), but it seems like more philosophers. And anthropics doesn't seem core to professional success, e.g. Tegmark can indulge in it a bit thanks to showing his stuff in 'hard' areas of cosmology.
3Wei Dai
I just realized/remembered that one reason that others haven't found the TDT/UDT solutions to Newcomb/anthropic reasoning may be that they were assuming a fixed human nature, whereas we're assuming an AI capable of self-modification. For example, economists are certainly more interested in answering "What would human beings do in PD?" than "What should AIs do in PD assuming they know each others' source code?" And perhaps some of the anthropic thinkers (in the list I linked to earlier) did invent something like UDT, but then thought "Human beings can never practice this, I need to keep looking."
This post is an argument against voting on your updated probability when there is a selection effect such as this. It applies to any evidence (marbles, existence etc), but only in a specific situation, so has little to do with SIA, which is about whether you update on your own existence to begin with in any situation. Do you have arguments against that?
It's for situations in which different hypotheses all predict that there will be beings subjectively indistinguishable from you, which covers the most interesting anthropic problems in my view. I'll make some posts distinguishing SIA, SSA, UDT, and exploring their relationships when I'm a bit less busy.
Are you saying this problem arises in all situations where multiple beings in multiple hypotheses make the same observations? That would suggest we can't update on evidence most of the time. I think I must be misunderstanding you. Subjectively indistinguishable beings arise in virtually all probabilistic reasoning. If there were only one hypothesis with one creature like you, then all would be certain. The only interesting problem in anthropics I know of is whether to update on your own existence or not. I haven't heard a good argument for not (though I still have a few promising papers to read), so I am very interested if you have one. Will 'exploring their relationships' include this?
You can judge for yourself at the time.

Curses on this problem; I spent the whole day worrying about it, and am now so much of a wreck that the following may or may not make sense. For better or worse, I came to a conclusion similar to Psy-Kosh's: that this could work in less anthropic problems. Here's the equivalent I was using:

Imagine Omega has a coin biased so that it comes up the same way nine out of ten times. You know this, but you don't know which way it's biased. Omega allows you to flip the coin once, and asks for your probability that it's biased in favor of heads. The coin comes up head...

1Eliezer Yudkowsky
"By assumption, if the person is right to believe they're in a sim, then most of the lottery winners are in sims, so while Omega laughs at them in our world, they win the bet with Omega in most of their worlds." That should have been your clue to check further.
2Scott Alexander
This is a feature of the original problem, isn't it? Let's say there are 1000 brains in vats, each in their own little world, and a "real" world of a billion people. The chance of a vat-brain winning the lottery is 1, and the chance of a real person winning the lottery is 1 in a million. There are 1000 real lottery winners and 1000 vat lottery winners, so if you win the lottery your chance of being in a vat is 50-50. However, if you look at any particular world, the chances of this week's single lottery winner being a brain in a vat is 1000/1001. Assume the original problem is run multiple times in multiple worlds, and that the value of pi somehow differs in those worlds (probably you used pi precisely so people couldn't do this, but bear with me). Of all the people who wake up in green rooms, 18/20 of them will be right to take your bet. However, in each particular world, the chances of the green room people being right to take the bet is 1/2. In this situation there is no paradox. Most of the people in the green rooms come out happy that they took the bet. It's only when you limit it to one universe that it becomes a problem. The same is true of the lottery example. When restricted to a single (real, non-vat) universe, it becomes more troublesome.
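
The people-versus-worlds distinction drawn in the comment above can be checked by direct counting. This is only a minimal sketch: it assumes, as the comment does for the sake of argument, that the experiment is run in many worlds and that exactly half of them come up heads. The function name `green_room_fractions` and the world count are made up for illustration.

```python
from fractions import Fraction

def green_room_fractions(n_worlds=1000, green_if_heads=18, green_if_tails=2):
    """Count green-room wakers over many runs of the experiment.

    Idealization (an assumption, not part of the original problem):
    exactly half of the worlds are heads worlds.
    """
    heads_worlds = n_worlds // 2
    tails_worlds = n_worlds - heads_worlds
    green_in_heads = heads_worlds * green_if_heads
    green_in_tails = tails_worlds * green_if_tails
    # Fraction of all green-room people who live in a heads world.
    per_person = Fraction(green_in_heads, green_in_heads + green_in_tails)
    # Fraction of worlds (each of which contains green rooms) that are heads worlds.
    per_world = Fraction(heads_worlds, n_worlds)
    return per_person, per_world

per_person, per_world = green_room_fractions()
print(per_person)  # 9/10
print(per_world)   # 1/2
```

Counting people gives the 18/20 figure and counting worlds gives 1/2, which matches the two numbers quoted in the comment.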
1Eliezer Yudkowsky
It's worth noting that if everyone got to make this choice separately - Omega doing it once for each person who responds - then it would indeed be wise for everyone to take the bet! This is evidence in favor of either Bostrom's division-of-responsibility principle, or byrnema's pointer-based viewpoint, if indeed those two views are nonequivalent.
EDIT: Never mind
Bostrom's calculation is correct, but I believe it is an example of multiplying by the right coefficients for the wrong reasons. I did exactly the same thing -- multiplied by the right coefficients for the wrong reasons -- in my deleted comment. I realized that the justification of these coefficients required a quite different problem (in my case, I modeled that all the green roomers decided to evenly divide the spoils of the whole group) and the only reason it worked was because multiplying the first term by 1/18 and the next term by 1/2 meant you were effectively canceling away the factors that represented your initial 90% posterior, and thus ultimately just applying the 50/50 probability of the non-anthropic solution. Anthropic calculation: (18/20)(12) + (2/20)(-52) = 5.6. Bostrom-modified calculation for responsibility per person: [(18/20)(12)/18 + (2/20)(-52)/2] / 2 = -1. Non-anthropic calculation for EV per person: [(1/2)(12) + (1/2)(-52)] / 20 = -1. My pointer-based viewpoint, in contrast, is not a calculation but a rationale for why you must use the 50/50 probability rather than the 90/10 one. The argument is that each green roomer cannot use the information that they were in a green room because this information was preselected (a biased sample). With effectively no information about what color room they're in, each green roomer must resort to the non-anthropic calculation that the probability of flipping heads is 50%.
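
For anyone who wants to check the three figures in the comment above, here is a small sketch using exact fractions. The variable names are invented for illustration; the payoffs (+12 if heads, -52 if tails) and the weights are the ones quoted in the comment.

```python
from fractions import Fraction

GAIN_IF_HEADS = 12    # 18 greens gain $1 each, 2 reds lose $3 each
GAIN_IF_TAILS = -52   # 2 greens gain $1 each, 18 reds lose $3 each

p_heads = Fraction(18, 20)  # anthropic posterior after waking in a green room
p_tails = Fraction(2, 20)
half = Fraction(1, 2)       # non-anthropic probability of each outcome

# Anthropic calculation: weight each outcome by the 90/10 posterior.
anthropic_ev = p_heads * GAIN_IF_HEADS + p_tails * GAIN_IF_TAILS

# Bostrom-modified: divide each branch by the number of green-roomers
# sharing the decision (18 or 2), then by 2, as written in the comment.
bostrom_ev = (p_heads * GAIN_IF_HEADS / 18 + p_tails * GAIN_IF_TAILS / 2) / 2

# Non-anthropic: 50/50 over outcomes, spread over all 20 copies.
non_anthropic_ev = (half * GAIN_IF_HEADS + half * GAIN_IF_TAILS) / 20

print(float(anthropic_ev))  # 5.6
print(bostrom_ev)           # -1
print(non_anthropic_ev)     # -1
```

The 1/18 and 1/2 divisors turn the 9:1 weighting back into 1:1, which is the cancellation the comment describes.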
I can very much relate to Eliezer's original gut reaction: I agree that Nick's calculation is very ad hoc and hardly justifiable. However, I also think that, although you are right about the pointer bias, your explanation is still incomplete. I think Psy-Kosh made an important step with his reformulation. Especially eliminating the copy procedure for the agents was essential. If you follow through the math from the point of view of one of the agents, the nature of the problem becomes clear: when you try to write down the payoff matrix from the viewpoint of one of the agents, you find that you can't fill out any of the reward entries, since the outcome never depends on that agent's decision alone. If he got a green marble, it still depends on the other agents' decisions, and if he drew a red one, it will depend only on the other agents' decisions. This makes it completely clear that the only solution is for the agents to agree on a predetermined protocol, and therefore the second calculation of the OP is the only correct one so far. However, this protocol does not imply anything about P(head|being in green room). It is simply irrelevant for the expected value of any of the agreed-upon protocols. One could create a protocol that depends on P(head|being in a green room) for some of the agents, but you would have to analyze the expected value of the protocol from a global point of view, not just from the point of view of the agent, for you can't complete the decision matrix if the outcome depends on other agents' decisions as well. Of course a predetermined protocol does not mean that the agents must explicitly agree on a narrow protocol before the action. If we assume that the agents get all the information once they find themselves in the room, they could still create a mental model of the whole global situation and base their decision on the second calculation of the OP.
I agree with you that the reason why you can't use the 90/10 prior is because the decision never depends on a person in a red room. In Eliezer's description of the problem above, he tells each green roomer that he asks all the green roomers if they want him to go ahead with a money distribution scheme, and they must be unanimous or there is a penalty. I think this is a nice pedagogical component that helps a person understand the dilemma, but I would like to emphasize here (even if you're aware of it) that it is completely superfluous to the mechanics of the problem. It doesn't make any difference if Eliezer bases his action on the answer of one green roomer or all of them. For one thing, all green roomer answers will be unanimous because they all have the same information and are asked the same complicated question. And, more to the point, even if just one green roomer is asked, the dilemma still exists that he can't use his prior that heads was probably flipped.
Agreed 100%. [EDIT:] Although I would be a bit more general: regardless of red rooms, if you have several actors, even if they necessarily make the same decision, they have to analyze the global picture. The only situation when the agent should be allowed to make the simplified subjective Bayesian decision table analysis is if he is the only actor (no copies, etc.). It is easy to construct simple decision problems without "red rooms", where each of the actors has some control over the outcome and none of them can make the analysis for itself only but has to build a model of the whole situation to make the globally optimal decision. However, I did not imply in any way that the penalty matters. (At least, as long as the agents are sane and don't start to flip non-logical coins.) The global analysis of the payoff may clearly disregard the penalty case if it's impossible for that specific protocol. The only requirement is that the expected value calculation must be made on a protocol-by-protocol basis.
My intuition says that this is qualitatively different. If the agent knows that only one green roomer will be asked the question, then upon waking up in a green room the agent thinks "with 90% probability, there are 18 of me in green rooms and 2 of me in red rooms." But then, if the agent is asked whether to take the bet, this new information ("I am the unique one being asked") changes the probability back to 50-50.

"I've made sacrifices! You don't know what it cost me to climb into that machine every night, not knowing if I'd be the man in the box or in the prestige!"

sorry- couldn't help myself.

You know, I never could make sense out of that line. If you assume the machine creates "copies" (and that's strongly implied by the story up to that point), then that means every time he gets on stage, he's going to wind up in the box. (And even if the copies are error-free and absolutely interchangeable, one copy will still end up in the box.) (Edit to add: of course, if you view it from the quantum suicide POV, "he" never ends up in the box, since otherwise "he" would not be there to try again the next night.)

Again: how can you talk about concluding that you are a Boltzmann brain? To conclude means to update, and here you refuse updating.

I read this and told myself that it only takes five minutes to have an insight. Five minutes later, here's what I'm thinking:

Anthropic reasoning is confusing because it treats consciousness as a primitive. By doing so, we're committing LW's ultimate no-no: assuming an ontologically fundamental mental state. We need to find a way to reformulate anthropic reasoning in terms of Solomonoff induction. If we can successfully do so, the paradox will dissolve.

Anthropic reasoning is confusing - probably because we are not used to doing it much in our ancestral environment. I don't think you can argue it treats consciousness as a primitive, though. Anthropic reasoning is challenging - but not so tricky that machines can't do it.
It involves calculating a 'correct measure' of how many partial duplicates of a computation exist: www.nickbostrom.com/papers/experience.pdf Anthropics does involve magical categories.
Right - but that's "Arthur C. Clarke-style magic" - stuff that is complicated and difficult - not the type of magic associated with mystical mumbo-jumbo. We can live with some of the former type of magic - and it might even spice things up a bit.
I fail to see how Solomonoff induction can reduce ontologically basic mental states.

More thinking out loud:

It really is in your best interest to accept the offer after you're in a green room. It really is in your best interest to accept the offer conditional on being in a green room before you're assigned. Maybe part of the problem arises because you think your decision will influence the decision of others, i.e. because you're acting like a timeless decision agent. Replace "me" with "anyone with my platonic computation", and "I should accept the offer conditional on being in a green room" with "anyone wit...

Yes, exactly. If you are in a green room and someone asks you if you will bet that a head was flipped, you should say "yes". However, if that same person asks you if they should bet that heads was flipped, you should answer no if you ascertain that they asked you on the precondition that you were in a green room.
* the probability of heads | you are in a green room = 90%
* the probability of you betting on heads | you are in a green room = 100% = no information about the coin flip
0Joanna Morningstar
Your first claim needs qualifications: You should only bet if you're being drawn randomly from everyone. If it is known that one random person in a green room will be asked to bet, then if you wake up in a green room and are asked to bet you should refuse. P(Heads | you are in a green room) = 0.9; P(Being asked | Heads and Green) = 1/18; P(Being asked | Tails and Green) = 1/2. Hence P(Heads | you are asked in a green room) = 0.5. Of course the OP doesn't choose a random individual to ask, or even a random individual in a green room. The OP asks all people in green rooms in this world. If there is confusion about when your decision algorithm "chooses", then TDT/UDT can try to make the latter two cases equivalent, by thinking about the "other choices I force". Of course the fact that this asserts some variety of choice for a special individual and not for others, when the situation is symmetric, suggests something is being missed. What is being missed, to my mind, is a distinction between the distribution of (random individuals | data is observed), and the distribution of (random worlds | data is observed). In the OP, the latter distribution isn't altered by the update, as the observed data occurs somewhere with probability 1 in both cases. The former is altered, because it cares about the number of copies in the two cases.
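
The two-step update in the comment above (first on waking in a green room, then on being the one person asked) is ordinary Bayes' rule applied twice. A minimal sketch, with a hypothetical helper `posterior_heads` and the likelihoods taken from the comment:

```python
from fractions import Fraction

def posterior_heads(prior_heads, like_heads, like_tails):
    """Bayes' rule for a binary hypothesis (heads vs. tails)."""
    numerator = prior_heads * like_heads
    return numerator / (numerator + (1 - prior_heads) * like_tails)

# Step 1: update the 50/50 prior on "I woke up in a green room".
# A random copy is green with probability 18/20 under heads, 2/20 under tails.
p_green = posterior_heads(Fraction(1, 2), Fraction(18, 20), Fraction(2, 20))

# Step 2: update on "I am the one green-roomer who was asked".
# One of 18 greens is picked under heads, one of 2 under tails.
p_asked = posterior_heads(p_green, Fraction(1, 18), Fraction(1, 2))

print(p_green)  # 9/10
print(p_asked)  # 1/2
```

The second likelihood ratio (1/18 against 1/2) is exactly the inverse of the first (18/20 against 2/20), so the two updates cancel.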

I've been watching for a while, but have never commented, so this may be horribly flawed, opaque or otherwise unhelpful.

I think the problem is entirely caused by the use of the wrong sets of belief, and that anything holding to Eliezer's 1-line summary of TDT or alternatively UDT should get this right.

Suppose that you're a rational agent. Since you are instantiated in multiple identical circumstances (green rooms) and asked identical questions, your answers should be identical. Hence if you wake up in a green room and you're asked to steal from the red roo...

I was influenced by the OP and used to think that way. However, I now think that this is not the root problem. What if the agents get more complicated decision problems: for example, rewards depending on the parity of the agents voting a certain way, etc.? I think what is essential is that the agents have to think globally (categorical imperative, hmmm?). Practically: if the agent recognizes that there is a collective decision, then it should model all conceivable protocols (while making sure a priori that all cooperating agents perform the same or a compatible analysis, if they can't communicate) and then choose the protocol with the best overall total gain. In the case of the OP: the second calculation in the OP. (Not messing around with correction factors based on responsibilities, etc.) Special considerations based on group sizes etc. may be incidentally correct in certain situations, but this is just not general enough. The crux is that the ultimate test is simply the expected value computation for the protocol of the whole group.
1Joanna Morningstar
Between non-communicating copies of your decision algorithm, it's forced that every instance comes to the same answers/distributions to all questions, as otherwise Eliezer can make money betting between different instances of the algorithm. It's not really a categorical imperative, beyond demanding consistency. The crux of the OP is asking for a probability assessment of the world, not whether the DT functions. I'm not postulating 1/n allocation of responsibility; I'm stating that the source of the confusion is conflating P(A random individual is in a world of class A_i | Data) with P(A random world is of class A_i | Data), and that these are not equal if the number of individuals with access to Data differs between classes of world. Hence in this case, there are 2 classes of world, A_1 with 18 Green rooms and 2 Reds, and A_2 with 2 Green rooms and 18 Reds. P(Random individual is in the A_1 class | Woke up in a green room) = 0.9 by anthropic update. P(Random world is in the A_1 class | Some individual woke up in a green room) = 0.5. Why? Because in A_1 18/20 individuals fit the description "Woke up in a green room", but in A_2 only 2/20 do. The crux of the OP is that neither a 90/10 nor a 50/50 split seems acceptable, if betting on "Which world-class an individual in a Green room is in" and "Which world-class the (set of all individuals in Green rooms which contains this individual) is in" are identical. I assert that they are not. The first case is 0.9/0.1 A_1/A_2, the second is 0.5/0.5 A_1/A_2. Consider a similar question where a random Green room will be asked. If you're in that room, you update both on (Green walls) and (I'm being asked) and recover the 0.5/0.5, correctly. This is close to the OP as if we wildly assert that you and only you have free will and force the others, then you are special. Equally in cases where everyone is asked and plays separately, you have 18 or 2 times the benefits depending on whether you're in A_1 or A_2. If each in
And how would they decide which protocol had the best overall total gain? For instance, could you define a protocol complexity measure, and then use this complexity measure to decide? And are you even dealing with ordinary Bayesian reasoning any more, or is this the first hint of some new more general type of rationality? MJG - The Black Swan is Near!
It's not about complexity, it is just expected total gain. Simply the second calculation of the OP. I just argued, that the second calculation is right and that is what the agents should do in general. (unless they are completely egoistic for their special copies)
This was a simple situation. I'm suggesting a 'big picture' idea for the general case. According to Wei Dai and Nesov above, the anthropic-like puzzles can be re-interpreted as 'agent co-ordination' problems (multiple agents trying to coordinate their decision making). And you seemed to have a similar interpretation. Am I right? If Dai and Nesov's interpretation is right, it seems the puzzles could be reinterpreted as being about groups of agents trying to agree in advance about a 'decision making protocol'. But now I ask: is this not equivalent to trying to find a 'communication protocol' which enables them to best coordinate their decision making? And rather than trying to directly calculate the results of every possible protocol (which would be impractical for all but simple problems), I was suggesting trying to use information theory to apply a complexity measure to protocols, in order to rank them. Indeed I ask whether this is actually the correct way to interpret Occam's Razor/Complexity Priors? i.e., my suggestion is to re-interpret Occam/Priors as referring to copies of agents trying to co-ordinate their decision making using some communication protocol, such that they seek to minimize the complexity of this protocol.
"Hence if you wake up in a green room and you're asked to steal from the red rooms and give to the green rooms, you either commit a group of 2 of you to a loss of 52 or commit a group of 18 of you to a gain of 12." In the example you care equally about the red room and green room dwellers.
0Joanna Morningstar
Hence if there are 2 instances of your decision algorithm in Green rooms, there are 2 runs of your decision algorithm, and if they vote to steal there is a loss of 3 from each red and a gain of 1 for each green, for a total gain of 1×2 - 3×18 = -52. If there are 18 instances in Green rooms, there are 18 runs of your decision algorithm, and if they vote to steal there is a loss of 3 from each red and a gain of 1 for each green, for a total gain of 1×18 - 3×2 = 12. The "committal of a group of" is noting that there are 2 or 18 runs of your decision algorithm that are logically forced by the decision made by this specific instance of the decision algorithm in a green room.
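
The two totals in the comment above follow from the payout rule in the original post (+$1 per green room, -$3 per red room). A tiny sketch, with a hypothetical helper name `total_gain`:

```python
def total_gain(n_green, n_red, gain_per_green=1, loss_per_red=3):
    """Net change in utilons across all copies if the green rooms vote 'Yes'."""
    return gain_per_green * n_green - loss_per_red * n_red

print(total_gain(n_green=2, n_red=18))   # -52
print(total_gain(n_green=18, n_red=2))   # 12
```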

I think I'm with Bostrom.

The problem seems to come about because the good effects of 18 people being correct are more than wiped out by the bad effects of 2 people being wrong.

I'm sure this imbalance in the power of the agents has something to do with it.

What if, instead of requiring agreement of all copies in a green room, one copy in a green room was chosen at random to make the choice?
In this case the chosen copy in the green room should update on the anthropic evidence of being chosen to make the choice. That copy had a 1/18 probability of being chosen if the coin flip came up heads, and a 1/2 probability of being chosen if the coin flip came up tails, so the odds of heads:tails should be updated from 9:1 to 1:1. This exactly cancels the anthropic evidence of being in a green room.
... or equivalently: you play a separate game with every single copy in each green room... In both cases, the anthropic update gives the right solution as I mentioned in an earlier post. (And consequently, this demonstrates that the crux of the problem was in fact the collective nature of the decision.)
They are not equivalent. If one green room copy is chosen at random, then the game will be played exactly once whether the coin flip resulted in heads or tails. But if every green room copy plays, then the game will be played 18 times if the coin came up heads and 2 times if the coin came up tails.
Good point. However, being chosen for the game (since the agent knows that in both cases exactly one copy will be chosen) also carries information the same way as being in the green room. Therefore, (by the same logic) it would imply an additional anthropic update: "Although I am in a green room, the fact that I am chosen to play the game makes it much less probable that the coin is heads." So (by calculating the correct chances), he can deduce: I am in a green room + I am chosen => P(head)=0.5 OTOH: I am in a green room (not knowing whether chosen) => P(head)=0.9 [EDIT]: I just noted that you already argued the same way; I plainly overlooked it.
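
The difference between the two setups shows up directly in the expected totals. The sketch below assumes, purely for illustration, that each separately played game applies the full payout from the original post (+12 in total if heads, -52 if tails); the dictionary and variable names are made up.

```python
from fractions import Fraction

HALF = Fraction(1, 2)                    # prior probability of each outcome
GAIN = {"heads": 12, "tails": -52}       # total payout of one accepted bet
GREEN = {"heads": 18, "tails": 2}        # number of green-room copies

# One randomly chosen green copy decides: the bet is played exactly once.
ev_one_game = sum(HALF * GAIN[coin] for coin in GAIN)

# Every green copy plays its own game: 18 games if heads, 2 if tails.
ev_separate_games = sum(HALF * GREEN[coin] * GAIN[coin] for coin in GAIN)

print(ev_one_game)        # -20
print(ev_separate_games)  # 56
```

With one game the 50/50 weighting applies and accepting loses on average; with a game per copy, the heads branch is counted 18 times against 2 for tails, which reproduces the 9:1 anthropic weighting.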

I waited to comment on this, to see what others would say. Right now Psy-Kosh seems to be right about anthropics; Wei Dai seems to be right about UDT; timtyler seems to be right about Boltzmann brains; byrnema seems to be mostly right about pointers; but I don't understand why nobody latched on to the "reflective consistency" part. Surely the kind of consistency under observer-splitting that you describe is too strong a requirement in general: if two copies of you play a game, the correct behavior for both of them would be to try to win, regardle...

4Wei Dai
That doesn't make sense to me, unless you're assuming that the player isn't capable of self-modification. If it was, wouldn't it modify itself so that its copies won't try to win individually, but cooperate to obtain the outcome that it prefers before the copying?
Yes, that's right. I've shifted focus from correct program behavior to correct human behavior, because that's what everyone else here seems to be talking about. If the problem is about programs, there's no room for all this confusion in the first place. Just specify the inputs, outputs and goal function, then work out the optimal algorithm.
Unless the copies can modify themselves too.

You can't reject the conclusion that you are a Boltzmann brain - but if you are, it doesn't matter what you do, so the idea doesn't seem to have much impact on decision theory.

There are lots of ordinary examples in game theory of time inconsistent choices. Once you know how to resolve them, then if you can't use those approaches to resolve this I might be convinced that anthropic updating is at fault. But until then I think you are making a huge leap to blame anthropic updating for the time inconsistent choices.

8Wei Dai
Robin, you're jumping into the middle of a big extended discussion. We're not only blaming anthropic updating, we're blaming Bayesian updating in general, and proposing a decision theory without it (Updateless Decision Theory, or UDT). The application to anthropic reasoning is just that, an application. UDT seems to solve all cases of time inconsistency in decision problems with one agent. What UDT agents do in multi-player games is still an open problem that we're working on. There was an extensive discussion about it in the previous threads if you want to see some of the issues involved. But the key ingredient that is missing is a theory of logical uncertainty, that tells us how different agents (or more generally, computational processes) are logically correlated to each other.
The ordinary time inconsistencies in game theory are all regarding multiple agents. Seems odd to suggest you've solved the problem except for those cases.
3Wei Dai
I was referring to problems like Newcomb's Problem, Counterfactual Mugging, Sleeping Beauty, and Absentminded Driver.
1Eliezer Yudkowsky
Not exactly the way I would phrase it, but Timeless Decision Theory and Updateless Decision Theory between them have already killed off a sufficiently large number of time inconsistencies that treating any remaining ones as a Problem seems well justified. Yes, we have solved all ordinary dynamic inconsistencies of conventional game theory already!
Let's take the simple case of time inconsistency regarding punishment. There is a two stage game with two players. First A decides if to cheat B for some gain. Then B decides if to punish A at some cost. Before the game B would like to commit to punishing A if A cheats, but once A has already cheated, B would rather not punish.
2Wei Dai
In UDT, we blame this time inconsistency on B's updating on A having cheated (i.e. treating it as a fact that can no longer be altered). Suppose it's common knowledge that A can simulate or accurately predict B, then B should reason that by deciding to punish, it increases the probability that A would have predicted that B would punish and thus decreases the probability that A would have cheated. But the problem is not fully solved, because A could reason the same way, and decide to cheat no matter what it predicts that B does, in the expectation that B would predict this and see that it's pointless to punish. So UDT seems to eliminate time-inconsistency, but at the cost of increasing the number of possible outcomes, essentially turning games with sequential moves into games with simultaneous moves, with the attendant increase in the number of Nash equilibria. We're trying to work out what to do about this.
Er, turning games with sequential moves into games with simultaneous moves is standard in game theory, and "never cheat, always punish cheating" and "always cheat, never punish" are what are considered the Nash equilibria of that game in standard parlance. [ETA: Well, "never cheat, punish x% of the time" will also be a NE for large enough x.] It is subgame perfect equilibrium that rules out "never cheat, always punish cheating" (the set of all SPE of a sequential game is a subset of the set of all NE of that game).
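
To make the NE-versus-SPE distinction in the comment above concrete, here is a sketch of the cheat/punish game in strategic form. The numeric payoffs are hypothetical (the thread gives none); they are chosen only so that cheating pays if unpunished and punishing is costly for B. Only pure strategies are enumerated, so the mixed "punish x% of the time" equilibria are not covered.

```python
from itertools import product

# Hypothetical payoffs, for illustration only.
CHEAT_GAIN = 2    # what A gains by cheating
CHEAT_HARM = 2    # what B loses when cheated
PUNISH_COST = 1   # what B pays to punish
PUNISH_HARM = 3   # what A loses when punished

A_STRATEGIES = ("cheat", "honest")
B_STRATEGIES = ("punish", "forgive")  # B's planned response if A cheats

def payoff(a, b):
    """Return (A's payoff, B's payoff) for a pure strategy profile."""
    if a == "honest":
        return (0, 0)
    if b == "punish":
        return (CHEAT_GAIN - PUNISH_HARM, -CHEAT_HARM - PUNISH_COST)
    return (CHEAT_GAIN, -CHEAT_HARM)

def is_nash(a, b):
    """Neither player can gain by unilaterally switching strategy."""
    ua, ub = payoff(a, b)
    a_ok = all(payoff(alt, b)[0] <= ua for alt in A_STRATEGIES)
    b_ok = all(payoff(a, alt)[1] <= ub for alt in B_STRATEGIES)
    return a_ok and b_ok

def subgame_perfect():
    """Backward induction: B best-responds after a cheat, then A anticipates it."""
    b_best = max(B_STRATEGIES, key=lambda b: payoff("cheat", b)[1])
    a_best = max(A_STRATEGIES, key=lambda a: payoff(a, b_best)[0])
    return (a_best, b_best)

nash = [(a, b) for a, b in product(A_STRATEGIES, B_STRATEGIES) if is_nash(a, b)]
print(nash)               # [('cheat', 'forgive'), ('honest', 'punish')]
print(subgame_perfect())  # ('cheat', 'forgive')
```

Under these assumed payoffs both "never cheat, always punish" and "always cheat, never punish" survive as Nash equilibria, while backward induction keeps only the second, as the comment says.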
2Wei Dai
Yeah, I used the wrong terminology in the grandparent comment. I guess the right way to put it is that SPE/backwards induction no longer seems reasonable under UDT and it's unclear what can take its place, as far as reducing the number of possible solutions to a given game.
How strictly do you (or the standard approach) mean to rule out options that aren't good on all parts of the game? It seems like sometimes you do want to do things that are subgame suboptimal. Edit: or at least be known to do things, which unfortunately can require actually being prepared to do the things.
Well, the classical game theorist would reply that they're studying one-off games, in which the game you're currently playing doesn't affect any payoff you get outside that game (otherwise that should be made part of the game), so you can't be doing the punishment because you want to be known to be a punisher, or the game that Robin specified doesn't model the situation you're in. The classical game theorist assumes you can't look into people's heads, so whatever you say or do before the cheating, you're always free to not punish during the punishment round (as you're undoubtedly aware, mutual checking of source code is prohibited by antitrust laws in over 185 countries). The classical game theorist would further point out that if you do want to model that punishment helps you be known as a punisher, then you should use their theory of repeated games, where they have some folk theorems for you saying that lots and lots of things can be Nash equilibria e.g. in a game where after each round there is a fixed probability of another round; for example, cooperation in the prisoner's dilemma, but also all sorts of suboptimal outcomes (which become Nash equilibria because any deviator gets punished as badly as the other players can punish them). I should point out that not all classical game theorists think that SPE makes particularly good predictions, though; I've read someone say, I think Binmore, that you expect to virtually always see a NE in the laboratory after a learning period, but not an SPE, and that the original inventor of SPE actually came up with it as an example of what you would not expect to see in the lab, or something to that tune. (Sorry, I should really chase down that reference, but I don't have time right now. I'll try to remember to do that later. ETA: Ok, Binmore and Shaked, 2010: Experimental Economics: Where Next? Journal of Economic Behavior & Organization, 73: 87-100. See the stuff about backward induction, starting at the bottom on p.88. The inv
Interesting. This idea, used as an argument for SPE, seems to be the free will debate intruding into decision theory. "Only some of these algorithms have freedom, and others don't, and humans are free, so they should behave like the free algorithms." This either ignores, or accepts, the fact that the "free" algorithms are just as deterministic as the "unfree" algorithms. (And it depends on other stuff, but that's not the fun bit) :D
Hm, I may not quite have gotten the point across: I think you may be thinking of the argument that humans have free will, so they can't force future versions of themselves to do something that would be against that future version's interests given its information, but that isn't the argument I was trying to explain. The idea I was referring to works precisely the same way with deterministic algorithms, as long as the players only get to observe each others' actions, not each others' source (though of course its proponents don't think in those terms). The point is that if the other player looks at you severely and suggestively taps their baseball bat and tells you about how they've beaten up people who have defected in the past, that still doesn't mean that they're actually going to beat you up -- since if such threats were effective on you, then making them would be the smart thing to do even if the other player has no intention of actually beating you up (and risk going to jail) if for some reason you end up defecting. (Compare AI-in-the-box...) (Of course, this argument only works if you're reasonably sure that the other player is a classical game theorist; if you think you might be playing against someone who will, "irrationally", actually punish you, like a timeless decision theorist, then you should not defect, and they won't have to punish you...) Now, if you had actual information about what this player had done in similar situations in the past, like police reports of beaten-up defectors, this argument wouldn't work, but then (the standard argument continues) you have the wrong game-theoretical model; the correct model includes all of the punisher's previous interactions, and in that game, it might well be a SPE to punish. (Though only if the exact number of "rounds" is not certain, for the same reason as in the finitely iterated Prisoner's Dilemma: in the last round the punisher has no more reason to punish because there are no future targets to impress, so you defec
That is not what I was thinking of. Here, let me re-quote the whole sentence: The funny implication here is that if someone did look into your head, you would no longer be "free." Like a lightswitch :P And then if they erased their memory of what they saw, you're free again. Freedom on, freedom off. And though that is a fine idea to define, to mix it up with an algorithmic use of "freedom" seems to just be used to argue "by definition."
Ok, sorry I misread you. "Free" was just my word rather than part of the standard explanation, so alas we don't have anybody we can attribute that belief to :-)
0Eliezer Yudkowsky
(The difficulty arises if UDT B reasons logically that there should not logically exist any copies of its current decision process finding themselves in worlds where A is dependent on its own decision process, and yet A defects. I'm starting to think that this resembles the problem I talked about earlier, where you have to use Omega's probability distribution in order to agree to be Counterfactually Mugged on problems that Omega expects to have a high payoff. Namely, you may have to use A's logical uncertainty, rather than your own logical uncertainty, in order to perceive a copy of yourself inside A's counterfactual. This is a complicated issue and I may have to post about it in order to explain it properly.)
0Eliezer Yudkowsky
Drescher-Nesov-Dai UDT solves this (that is, goes ahead and punishes the cheater, making the same decision at both times). TDT can handle Parfit's Hitchhiker - pay for the ride, make the same decision at both times, because it forms the counterfactual "If I did not pay, I would not have gotten the ride". But TDT has difficulty with this particular case, since it implies that B's original belief that A would not cheat if punished, was wrong; and after updating on this new information, B may no longer have a motive to punish. (UDT of course does not update.) Since B's payoff can depend on B's complete strategy tree including decisions that would be made under other conditions, instead of just depending on the actual decision made under real conditions, this scenario is outside the realm where TDT is guaranteed to maximize.
The case is underspecified: * How transparent/translucent are the agents? I.e. can A examine B's sourcecode, or use observational and other data to assess B's decision procedure? If not, what is A's prior probability distribution for decision procedures B might be using? * Are both A and B using the same decision theory, TDT/UDT? Or is A using CDT and B using TDT/UDT or vice versa?
0Eliezer Yudkowsky
Clearly B has mistaken beliefs about either A or its own dispositions; otherwise B would not have dealt with A in the interaction where A ended up cheating. If B uses UDT (and hence will carry through punishments), and A uses any DT that correctly forecasts B's response to cheating, then A should not in fact cheat. If A cheats anyway, though, B still punishes. Actually, on further reflection, it's possible that B would reason that it is logically impossible for A to have the specified dependency on B's decision, and yet for A to still end up defecting, in which case even UDT might end up in trouble - it would be a transparent logical impossibility for A to defect if B's beliefs about A are true, so it's not clear that B would handle the event correctly. I'll have to think about this.
If there is some probability of A cheating even if B precommits to punishment, but with odds in B's favor, the situation where B needs to implement punishment is quite possible (expected). Likewise, if B precommitting to punish A is predicted to lead to an even worse outcome than not punishing (because of punishment expenses), UDT B won't punish A. Furthermore, a probability of cheating and of not punishing cheating (mixed strategies, possibly on logical uncertainty to defy the laws of the game if pure strategies are required) is a mechanism through which the players can (consensually) bargain with each other in the resulting parallel game, an issue Wei Dai mentioned in the other reply. B doesn't need absolute certainty at any stage, in either case. Also, in UDT there are no logical certainties, as it doesn't update on logical conclusions either.
0Eliezer Yudkowsky
Sure, but that's the convenient setup. What if for A to cheat means that you are necessarily just mistaken about which algorithm A runs? UDT will be logically certain about some things but not others. If UDT B "doesn't update" on its computation about what A will do in response to B, it's going to be in trouble.
A decision algorithm should never be mistaken, only uncertain. "Doesn't update" doesn't mean that it doesn't use the info (but you know that, so what do you mean?). A logical conclusion can be a parameter in a strategy, without making the algorithm unable to reason about what it would be like if the conclusion was different, that is basically about uncertainty of same algorithm in other states of knowledge.
Am I correct in assuming that if A cheats and is punished, A suffers a net loss?
0Wei Dai
What is the remaining Problem that you're referring to? Why can't we apply the formalism of UDT1 to the various examples people seem to be puzzled about and just get the answers out? Or is cousin_it right about the focus having shifted to how human beings ought to reason about these problems?
2Eliezer Yudkowsky
The anthropic problem was a remaining problem for TDT, although not UDT. UDT has its own problems, possibly. For example, in the Counterfactual Mugging, it seems that you want to be counterfactually mugged whenever Omega has a well-calibrated distribution and has a systematic policy of offering high-payoff CMs according to that distribution, even if your own prior has a different distribution. In other words, the key to the CM isn't your own distribution, it's Omega's. And it's not possible to interpret UDT as epistemic advice, which leaves anthropic questions open. So I haven't yet shifted to UDT outright. (The reason I did not answer your question earlier was that it seemed to require a response at greater length than the above.)
2Wei Dai
Hi, this is the 2-week reminder that you haven't posted your longer response yet. :)
2Wei Dai
Well, you're right in the sense that I can't understand the example you gave. (I waited a couple of days to see if it would become clear, but it didn't) But the rest of the response is helpful.
Did he ever get around to explaining this in more detail? I don't remember reading a reply to this, but I think I've just figured out the idea: Suppose you get word that Omega is coming to the neighbourhood and going to offer counterfactual muggings. What sort of algorithm do you want to self-modify into? You don't know what CMs Omega is going to offer; all you know is that it will offer odds according to its well-calibrated prior. Thus, it has higher expected utility to be a CM-accepter than a CM-rejecter, and even a CDT agent would want to self-modify. I don't think that's a problem for UDT, though. What UDT will compute when asked to pay is the expected utility under its prior of paying up when Omega asks it to; thus, the condition for UDT to pay up is NOT prior probability of heads * Omega's offered payoff > prior of tails * Omega's price but prior of (heads and Omega offers a CM for this coin) * payoff > prior of (tails and CM) * price. In other words, UDT takes the quality of Omega's predictions into account and acts as if updating on them (the same way you would update if Omega told you who it expects to win the next election, at 98% probability). CDT agents, as usual, will actually want to self-modify into a UDT agent whose prior equals the CDT agent's posterior [ETA: wait, sorry, no, they won't act as if they can acausally control other instances of the same program, but they will self-modify so as to make future instances of themselves (which obviously they control causally) act in a way that maximizes EU according to the agent's present posterior, and that's what we need here], and will use the second formula above accordingly -- they don't want to be a general CM-rejecter, but they think that they can do even better than being a general CM-accepter if they refuse to pay up if at the time of self-modification they assigned low probability to tails, even conditional on Omega offering them a CM.
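To make the difference between the two pay-up conditions above concrete, here is a minimal Python sketch. The function names and all the numbers are hypothetical, chosen only for illustration; the point is just that an agent with a low prior on heads can still pay once the prior is taken jointly over the coin and over whether Omega offers a counterfactual mugging for that coin.

```python
def naive_pays(p_heads, payoff, price):
    """First condition: prior(heads) * payoff > prior(tails) * price."""
    p_tails = 1.0 - p_heads
    return p_heads * payoff > p_tails * price


def udt_pays(p_heads_and_offer, p_tails_and_offer, payoff, price):
    """Second condition: uses the joint prior of (coin outcome, Omega offers a CM)."""
    return p_heads_and_offer * payoff > p_tails_and_offer * price


# Hypothetical numbers (assumptions, not from the thread).
payoff, price = 10_000, 100
p_heads = 0.005                 # the agent's own prior that the coin is heads
p_offer_given_heads = 0.5       # Omega often offers a CM when heads was likely
p_offer_given_tails = 0.001     # and rarely offers one otherwise

naive = naive_pays(p_heads, payoff, price)
udt = udt_pays(
    p_heads * p_offer_given_heads,
    (1.0 - p_heads) * p_offer_given_tails,
    payoff,
    price,
)
print("naive condition pays:", naive)
print("joint-prior condition pays:", udt)
```

With these made-up numbers the first condition refuses and the second pays, which is the sense in which the agent "acts as if updating" on the fact that Omega made the offer.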
0Wei Dai
He never explained further, and actually I still don't quite understand the example even given your explanation. Maybe you can reply directly to Eliezer's comment so he can see it in his inbox, and let us know if he still thinks it's a problem for UDT?
I'd look for it as logical theory of concurrency and interaction: "uncertainty" fuzzifies the question.
0Wei Dai
Why? For me, how different agents are logically correlated to each other seems to be the same type of question as "what probability (if any) should I assign to P!=NP?" Wouldn't the answer fall out of a general theory of logical uncertainty? (ETA: Or at least be illuminated by such a theory?)
Logic is already in some sense about uncertainty (e.g. you could interpret predicates as states of knowledge). When you add one more "uncertainty" of some breed, it leads to perversion of logic, usually of applied character and barren meaning. The concept of "probability" is suspect, I don't expect it to have foundational significance.
1Wei Dai
So what would you call a field that deals with how one ought to make bets involving P!=NP (i.e., mathematical statements that we can't prove to be true or false), if not "logical uncertainty"? Just "logic"? Wouldn't that cause confusion in others, since today it's usually understood that such questions are outside the realm of logic?
I don't understand how to make such bets, except in a way it's one of the kinds of human decision-making that can be explicated in terms of priors and utilities. The logic of this problem is in the process that works with the statement, which is in the domain of proof theory.

I think I'll have to sit and reread this a couple times, but my INITIAL thought is "Isn't the apparent inconsistency here qualitatively similar to the situation with a counterfactual mugging?"

This is my reaction too. This is a decision involving Omega in which the right thing to do is not update based on new information. In decisions not involving Omega, you do want to update. It doesn't matter whether the new information is of an anthropic nature or not.
Yeah, thought about it a bit more, and it still seems to be more akin to "paradox of counterfactual mugging" than "paradox of anthropic reasoning". To me, the confusing bits of anthropic reasoning would come into play more via stuff like "Aumann's agreement theorem vs anthropic reasoning".

Huh. Reading this again, together with byrnema's pointer discussion and Psy-Kosh's non-anthropic reformulation...

It seems like the problem is that whether each person gets to make a decision depends on the evidence they think they have, in such a way to make that evidence meaningless. To construct an extreme example: The Antecedent Mugger gathers a billion people in a room together, and says:

"I challenge you to a game of wits! In this jar is a variable amount of coins, between $0 and $10,000. I will allow each of you to weigh the jar using this set of...

The problem here is that your billion people are for some reason giving the answer most likely to be correct rather than the answer most likely to actually be profitable. If they were a little more savvy, they could reason as follows: "The scales tell me that there's $6000 worth of coins in the jar, so it seems like a good idea to buy the jar. However, if I did not receive the largest weight estimate from the scales, my decision is irrelevant; and if I did receive the largest weight estimate, then conditioned on that it seems overwhelmingly likely that there are many fewer coins in the jar than I'd think based on that estimate -- and in that case, I ought to say no."
Ooh, and we can apply similar reasoning to the marble problem if we change it, in a seemingly isomorphic way, so that instead of making the trade based on all the responses of the people who saw a green marble, Psy-Kosh selects one of the green-marble-observers at random and considers that person's response (this should make no difference to the outcomes, assuming that the green-marblers can't give different responses due to no-spontaneous-symmetry-breaking and all that). Then, conditioning on drawing a green marble, person A infers a 9/10 probability that the bucket contained 18 green and 2 red marbles. However, if the bucket contains 18 green marbles, person A has a 1/18 chance of being randomly selected given that she drew a green marble, whereas if the bucket contains 2 green marbles, she has a 1/2 chance of being selected. So, conditioning on her response being the one that matters as well as the green marble itself, she infers a (9:1) * (1/18)/(1/2) = (9:9) odds ratio, that is probability 1/2 the bucket contains 18 green marbles. Which leaves us back at a kind of anthropic updating, except that this time it resolves the problem instead of introducing it!
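The two updates in the comment above can be checked with exact fractions. This is only a sketch of the arithmetic as described there, assuming a 50/50 prior over the two bucket compositions and one green-marble observer selected uniformly at random; the helper name `posterior_mostly_green` is made up for the example.

```python
from fractions import Fraction


def posterior_mostly_green(prior, like_mostly_green, like_mostly_red):
    """Bayes' rule for two hypotheses: 18 green / 2 red versus 2 green / 18 red."""
    numerator = prior * like_mostly_green
    return numerator / (numerator + (1 - prior) * like_mostly_red)


prior = Fraction(1, 2)

# Evidence 1: person A drew a green marble.
p_green_18 = Fraction(18, 20)   # P(green | 18 green marbles in the bucket)
p_green_2 = Fraction(2, 20)     # P(green | 2 green marbles in the bucket)
after_green = posterior_mostly_green(prior, p_green_18, p_green_2)

# Evidence 2: person A drew green AND was the one green observer selected.
p_sel_18 = Fraction(1, 18)      # chance of being picked among 18 green observers
p_sel_2 = Fraction(1, 2)        # chance of being picked among 2 green observers
after_selected = posterior_mostly_green(
    prior, p_green_18 * p_sel_18, p_green_2 * p_sel_2
)

print(after_green)      # 9/10
print(after_selected)   # 1/2
```

Both joint likelihoods come out to 1/20, so the selection step exactly cancels the 9:1 update from seeing green.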

isn't this a problem with the frequency you are presented with the opportunity to take the wager? [no, see edit]

the equation: (50% ((18 +$1) + (2 -$3))) + (50% ((18 -$3) + (2 +$1))) = -$20 neglects to take into account that you will be offered this wager nine times more often in conditions where you win than when you lose.

for example, the wager: "i will flip a fair coin and pay you $1 when it is heads and pay you -$2 when it is tails" is -EV in nature. however if a conditional is added where you will be asked if you want to take the bet 9...

some other thoughts. the paradox exists because you cannot precommit yourself to taking the wager given you are in a green room as this commits you to taking the wager on 100% of coinflips which is terrible for you. when you find yourself in a green room, the right play IS to take the wager. however, you can't make the right play without committing yourself to making the wrong play in every universe where the coin comes up tails. you are basically screwing your parallel selves over because half of them exist in a 'tails' reality. it seems like factoring in your parallel expectation cancels out the ev shift of adjusting your prior (50%) probability to 90%. and if you don't care about your parallel selves, you can just think of them as the components that average to your true expectation in any given situation. if the overall effect across all possible universes was negative, it was a bad play even if it helped you in this universe. metaphysical hindsight.

If the many worlds interpretation of quantum mechanics is true, isn't anthropic reasoning involved in making predictions about the future of quantum systems? There exists some world in which, from the moment this comment is posted onward, all attempts to detect quantum indeterminacy fail, all two-slit experiments yield two distinct lines instead of a wave pattern, etc. Without anthropic reasoning we have no reason to find this result at all surprising. So either we need to reject anthropic reasoning or we need to reject the predictive value of quantum mechan...

Basic QM seems to say that probability is ontologically basic. In a collapse point of view, it's what we usually think of as probability that shows up in decision theory. In MWI, both events happen. But you could talk about usual probability either way. ("classical probability is a degenerate form of quantum probability" with or without collapse) Anthropics is about the interaction of probability with the number of observers. Replacing usual probability with QM doesn't seem to me to make a difference. Quantum suicide is a kind of anthropics, but it's not clear to me in what sense it's really quantum. It's mainly about rejecting the claim that the Born probabilities are ontologically basic, that they measure how real an outcome is.
But in MWI isn't the observed probability of some quantum state just the fraction of worlds in which an observer would detect that quantum state? As such, doesn't keeping the probabilities of quantum events as QM predicts require that "one should reason as if one were a random sample from the set of all observers in one’s reference class" (from a Nick Bostrom piece). The reason we think our theory of QM is right is that we think our branch in the multi-verse didn't get cursed with an unrepresentative set of observed phenomena. Wouldn't a branch in the multi-verse that observed quantum events in which values were systematically distorted (by random chance) come up with slightly different equations to describe quantum mechanics? If so, what reason do we have to think that our equations are correct if we don't consider our observations to be similar to the observations made in other possible worlds?
It's not just world counting... (Although Robin Hanson's Mangled Worlds idea does suggest a way that it may turn out to amount to world counting after all.) Essentially, one has to integrate the squared modulus of quantum amplitude over a world. This is proportional to the subjective probability of experiencing that world. Yes... that it isn't simple world counting does seem to be a problem. This is something that we, or at least I, am confused about.
Thanks. Good to know. I don't suppose you can explain why it works that way?
As I said, that's something I'm confused about, and apparently others are as well. We've got the linear rules for how quantum amplitude flows over configuration space, then we've got this "oh, by the way, the subjective probability of experiencing any chunk of reality is proportional to the square of the absolute value" rule. There're a few ideas out there, but...
Would you expand and sharpen your point? Woit comes to mind. At one point you claim, possibly based on MWI, that "there is some world in which ...". As far as I can tell, the specifics of the scenario shouldn't have anything to do with the correctness of your argument. This is how I would paraphrase your comment: 1. According to MWI, there exists some world in which unlikely things happen. 2. We find this surprising. 3. Anthropic reasoning is necessary to conclude 2. 4. Anthropic reasoning is involved in making predictions about quantum systems. In step 2: Who is the "we"? What is the "this"? Why do we find it surprising? In step 3: What do you mean by "anthropic reasoning"? In general, it is pretty hard metareasoning to conclude that a reasoning step or maneuver is necessary for a conclusion.
We don't need anthropic reasoning under MWI in order to be surprised when finding ourselves in worlds in which unlikely things happen so much as we need anthropic reasoning to conclude that an unlikely thing has happened. And our ability to conclude that an unlikely thing has happened is needed to accept quantum mechanics as a successful scientific theory. "We" is the set of observers in the worlds where events declared to be unlikely by quantum mechanics actually happen. An observer is any physical system with a particular kind of causal relation to quantum states such that the physical system can record information about quantum states and use the information to come up with methods of predicting the probability of previously unobserved quantum processes (or something, but if we can't come up with a definition of observer then we shouldn't be talking about anthropic reasoning anyway). 1. According to MWI, the (quantum) probability of a quantum state is defined as the fraction of worlds in which that state occurs. 2. The only way an observer somewhere in the multi-verse can trust the observations that confirm quantum mechanics' probabilistic interpretation is if they reason as if they were a random sample from the set of all observers in the multi-verse (one articulation of anthropic reasoning), because if they can't do that then they have no reason to think their observations aren't wrong in a systematic way. 3. An observer's reason for believing the standard model of QM to be true in the first place is that they can predict atomic and subatomic particles behaving according to a probabilistic wave-function. 4. Observers lose their reason for trusting QM in the first place if they accept the MWI AND are prohibited from reasoning anthropically. In other words, if MWI is likely, then QM is likely iff AR is acceptable. I think one could write a different version of this argument by referencing expected surprise at discovering sudden changes in quantum probabilities (whi
Can I paraphrase what you just said as: "If many-worlds is true, then all evidence is anthropic evidence"
I hadn't come to that conclusion until you said it... but yes, that is about right. I'm not sure I would say all evidence is anthropic- I would prefer saying that all updating involves a step of anthropic reasoning. I make that hedge just because I don't know that direct sensory information is anthropic evidence, just that making good updates with that sensory information is going to involve (implicit) anthropic reasoning.

“You generalise probability, when anthropics are involved, to probability-2, and say a number defined by probability-2; so I’ll suggest to you a reward structure that rewards agents that say probability-1 numbers. Huh, if you still say the probability-2 number, you lose”.

This reads to me like, “You say there’s 70% chance no one will be around that falling tree to hear it, so you’re 70% sure there won’t be any sound. But I want to bet sound is much more likely; we can measure the sound waves, and I’m 95% sure our equipment will register the sound. Wanna bet?”

I think this is a confusion of two different types of thinking. One is the classical thought of one being responsible only for the consequences of one's individual actions. If you think of yourself as an individual making independent decisions like this, then you are justified in thinking that there is a 90% chance of heads upon seeing a green room: 90% of individuals in green rooms, in expectation, are there when the coin flips heads. (Note that if you modify the problem so that the outcomes of the bet only apply to the people making it, the bet becomes fa...

Timeless decision agents reply as if controlling all similar decision processes, including all copies of themselves. Classical causal decision agents, to reply "Yes" as a group, will need to somehow work out that other copies of themselves reply "Yes", and then reply "Yes" themselves. We can try to help out the causal decision agents on their coordination problem by supplying rules such as "If conflicting answers are delivered, everyone loses $50". If causal decision agents can win on the problem "If everyon

...

The reason we shouldn't update on the "room color" evidence has nothing to do with the fact that it constitutes anthropic evidence. The reason we shouldn't update is that we're told, albeit indirectly, that we shouldn't update (because if we do then some of our copies will update differently and we will be penalized for our disagreement).

In the real world, there is no incentive for all the copies of ourselves in all universes to agree, so it's all right to update on anthropic evidence.

[comment deleted]

Oops... my usual mistake of equivocating different things and evolving the problem until it barely resembles the original. I will update my "solution" later if it still works for the original.

... Sigh. Won't work. My previous "solution" recovered the correct answer of -20 because I bent the rules enough to have each of my green-room-deciders make a global rather than anthropic calculation.

Thinking about how all the green-room people come to the wrong conclusion makes my brain hurt. But I suppose, finally, it is true. They cannot base their decision on their subjective experience, and here I'll outline some thoughts I've had as to under what conditions they should know they cannot do so. Suppose there are 20 people (Amy, Benny, Carrie, Donny, ...) and this experiment is done as described. If we always ask Tony (the 20th person) whether or not to say "yes", and he bases his decision on whether or not he is in a green room, then the expected value of his decision really is $5.6. Tony here is a special, singled out "decider". One way of looking at this situation is that the 'yes' depends on some information in the system (that is, whether or not Tony was in a green room.) If instead we say that the decider can be anyone, and in fact we choose the decider after the assortment into rooms as someone in a green room, then we are not really given any information about the system. It is the difference between (a) picking a person, and seeing if they wake up in a green room, and (b) picking a person that is in a green room. (I know you are well aware of this difference, but it helps to spell it out.) You can't pick the deciders from a set with a prespecified outcome. It's a pointer problem: You can learn about the system from the change of state from Tony to Tony (Tony: no room -->Tony: green room), but you can't assign* the star after the assignment (pick someone in a green room and ask them). When a person wakes in a green room and is asked, they should say 'yes' if they are randomly chosen to be asked independently of their room color. If they were chosen after the assignment, because they awoke in a green room, they should recognize this as the “unfixed pointer problem” (a special kind of selection bias). Avoiding the pointer problem is straight-forward. The people who wake in red rooms have a posterior probability of heads as 10%. The people who wake
0Eliezer Yudkowsky
The rest of your reply makes sense to me, but can I ask you to amplify on this? Maybe I'm being naive, but to me, a 90% probability is a 90% probability and I use it in all my strategic choices. At least that's what I started out thinking. Now you've just shown that a decision process won't want to strategically condition on this "90% probability", because it always ends up as "90% probability" regardless of the true state of affairs, and so is not strategically informative to green agents - even if the probability seems well-calibrated in the sense that, looking over impossible possible worlds, green agents who say "90%" are correct 9 times out of 10. This seems like a conflict between an anthropic sense of probability (relative frequency in a population of observers) and a strategic sense of probability (summarizing information that is to be used to make decisions), or something along those lines. Is this where you're pointing toward by saying that a posterior probability is meaningful at some times but not others?
The 90% probability is generally strategically informative to green agents. They may legitimately point to themselves for information about the world, but in this specific case, there is confusion about who is doing the pointing. When you think about a problem anthropically, you yourself are the pointer (the thing you are observing before and after to make an observation) and you assign yourself as the pointer. This is going to be strategically sound in all cases in which you don't change as the pointer before and after an observation. (A pretty normal condition. Exceptions would be experiments in which you try to determine the probability that a certain activity is fatal to yourself -- you will never be able to figure out the probability that you will die of your shrimp allergy by repeated trials of consuming shrimp, as it will become increasingly skewed towards lower and lower values.) Likewise, if I am in the experiment described in the post and I awaken in a green room I should answer "yes" to your question if I determine that you asked me randomly. That is, that you would have asked me even if I woke in a red room. In which case my anthropic observation that there is a 90% probability that heads was flipped is quite sound, as usual. On the other hand, if you ask me only if I wake in a green room, then you wouldn’t have asked “me” if I awoke in a red room. (So I must realize this isn’t really about me assigning myself as a pointer, because “me” doesn’t change depending on what room I wake up in.) It's strange and requires some mental gymnastics for me to understand that you Eliezer are picking the pointer in this case, even though you are asking me about my anthropic observation, for which I would usually expect to assign myself as the pointer. So for me this is a pointer/biased-observation problem. But the anthropic problem is related, because we as humans cannot ask about the probability of currently observed events based on the frequency of observations w
1Eliezer Yudkowsky
Huh. Very interesting again. So in other words, the probability that I would use for myself, is not the probability that I should be using to answer questions from this decision process, because the decision process is using a different kind of pointer than my me-ness? How would one formalize this? Bostrom's division-of-responsibility principle?
I haven't had time to read this, but it looks possibly relevant (it talks about the importance of whether an observation point is fixed in advance or not) and also possibly interesting, as it compares Bayesian and frequentist views. I will read it when I have time later... or anyone else is welcome to if they have time/interest.
What I got out of the article above, since I skipped all the technical math, was that frequentists consider "the pointer problem" (i.e., just your usual selection bias) as something that needs correction while Bayesians don't correct in these cases. The author concludes (I trust, via some kind of argument) that Bayesians don't need to correct if they choose the posteriors carefully enough. I now see that I was being entirely consistent with my role as the resident frequentist when I identified this as a "pointer problem" problem (which it is) but that doesn't mean the problem can't be pushed through without correction* -- the Bayesian way -- by carefully considering the priors. *"Requiring correction" then might be a euphemism for time-dependent, while a preference for an updateless decision theory is a good Bayesian quality. A quality, by the way, a frequentist can appreciate as well, so this might be a point of contact on which to win frequentists over.

Before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% ((18 +$1) + (2 -$3))) + (50% ((18 -$3) + (2 +$1))) = -$20

This assumes that the question is asked only once, but then, to which of the 20 copies will it be asked?

If all 20 copies get asked the same question (or equivalently if a single copy chosen at random is) then the utility is (50% 18/20 ((18 +$1) + (2 -$3))) + (50% 2/20 ((18 -$3) + (2 +$1))) = $2.80 = 50% * $5.60.
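For anyone who wants to check the arithmetic, here is a small Python sketch of the three figures in play: the -$20 from the quoted pre-experiment calculation, the $2.80 from the weighted version above, and the $5.60 it is half of. The helper `payout` and the variable names are my own, and the payoffs (+$1 per green copy, -$3 per red copy) are taken from the original post.

```python
from fractions import Fraction

GREEN_GAIN = 1    # dollars paid to each copy in a green room
RED_LOSS = -3     # dollars taken from each copy in a red room


def payout(n_green, n_red):
    """Total dollars across all 20 copies if the bet is carried out."""
    return n_green * GREEN_GAIN + n_red * RED_LOSS


heads_total = payout(18, 2)   # 18 green, 2 red
tails_total = payout(2, 18)   # 2 green, 18 red
half = Fraction(1, 2)

# Pre-experiment value of "say Yes if you wake up in a green room".
prior_ev = half * heads_total + half * tails_total

# Each branch additionally weighted by the chance that a random copy is green.
weighted_ev = (half * Fraction(18, 20) * heads_total
               + half * Fraction(2, 20) * tails_total)

# Value computed from the 90% posterior after seeing a green room.
posterior_ev = Fraction(9, 10) * heads_total + Fraction(1, 10) * tails_total

print(prior_ev)       # -20
print(weighted_ev)    # 14/5, i.e. $2.80
print(posterior_ev)   # 28/5, i.e. $5.60
```

The weighted figure is exactly half the posterior figure because the overall chance that a randomly chosen copy is in a green room is 50%.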

Consider th...

Every copy that is in a green room is asked the question (so either 2 or 18 copies total are asked). If all answer Play, we play. If all answer Don't Play, we don't. In any other case we fine all 20 copies some huge amount; this is intended to make them agree beforehand on what answer to give. (This is reworded from the OP.) For your other thought experiment - if there aren't actual N copies being asked the question, then there's no dilemma; you (the only copy) simply update on the evidence available (that the room is green). So yes, the original problem requires copies being asked in parallel to introduce the possibility that you're hurting other copies of yourself by giving a self-serving answer. Whereas if you're the only copy, you always give a self-serving answer, i.e. play only if the room is green.

I keep having trouble thinking of probabilities when I'm to be copied and >=1 of "me" will see red and >=1 of "me" will see green. My thought is that it is 100% likely that "I" will see red and know there are others, once-mes, who see green, and 100% likely vice-versa. Waking up to see red (green) is exactly the expected result.

I do not know what to make of this opinion of mine. It's as if my definition of self - or choice of body - is in superposition. Am I committing an error here? Suggestions for further reading would be appreciated.

[This comment is no longer endorsed by its author]

I remain convinced that the probability is 90%.

The confusion is over whether you want to maximize the expectation of the number of utilons there will be if you wake up in a green room or the expectation of the number of utilons you will observe if you wake up in a green room.


The notion of "I am a Boltzmann brain" goes away when you conclude that conscious experience is a Tegmark-4 thing, and that equivalent conscious experiences are mathematically equal and therefore there is no difference and you are at the same time a human being and a Boltzmann brain, at least until they diverge.

Thus, anthropic reasoning is right out.

Well, by the same token "What I experience represents what I think it does / I am not a Boltzmann brain which may dwindle out of existence in an instant" would go right out, just the same. This kind of reasoning reduces to something similar to quantum suicide. The point at which your conscious experience is expected to diverge, even if you take that perspective, does kind of matter. The different paths and their probabilistic weights which govern the divergence alter your expected experience, after all. Or am I misunderstanding?
I am not sure. Let me try to clarify. By virtue of existential quantification in a ZF-equivalent set theory, we can have anything. In an arbitrary encoding format, I now by existential quantification select a set which is the momentary subjective experience of being me as I write this post, e.g. memory sensations, existential sensations, sensory input, etc. It is a mathematical object. I can choose its representation format independent of any computational medium I might use to implement it. It just so happens that there is a brain in the universe we are in, which is implementing this mathematical object. Brains are computers that compute conscious experiences. They no more have bearing on the mathematical objects they implement than a modern computer has on the definition of Conway's Game of Life. Does that clarify it?
Which is why we're still highly invested in the question whether (whatever it is that generates our conscious experience) will "stay around" and continue with our pattern in an expected manner. Let's say we identify with only the mathematical object, not the representation format at all. That doesn't excuse us from anthropic reasoning, or from a personal investment in reasoning about the implementing "hardware". We'd still be highly invested in the question, even as 'mathematical objects'. We probably still care about being continually instantiated. The shift in perspective you suggest doesn't take away from that (and adds what could be construed as a flavor of dualism).
Hmmm. I will have to mull on that, but let me leave with a mote of explanation: The reasoning strategy I used to arrive at this conclusion was similar to the one used in concluding that "every possible human exists in parallel universes, so we need not make more humans, but more humans feeling good."
Doesn't every possible human-feeling-good also exist in parallel universes? (And if you argue that although they exist you can increase their measure, that applies to the every-possible-human version as well.)
Sure, but I will quote Karkat Vantas on time-travel shenanigans from Andrew Hussie's Homestuck

Whoohoo! I just figured out the correct way to handle this problem, that renders the global and egocentric/internal reflections consistent.

We will see if my solution makes sense in the morning, but the upshot is that there was/is nothing wrong with the green roomer's posterior, as many people have been correctly defending. The green roomer who computed an EV of $5.60 modeled the money pay-off scheme wrong.

In the incorrect calculation that yields $5.6 EV, the green roomer models himself as winning (getting the favorable +$12) when he is right and losing (... (read more)

This is my attempt at a pedagogical exposition of “the solution”. It’s overly long, and I've lost perspective completely about what is understood by the group here and what isn't. But since I've written up this solution for myself, I'll go ahead and share it. The cases I'm describing below are altered from the OP so that they are completely non-metaphysical, in the sense that you could implement them in real life with real people. Thus there is an objective reality regarding whether money is collectively lost or won, so there is finally no ambiguity about what the correct calculation actually is. Suppose that there are twenty different graduate students {Amy, Betty, Cindy, ..., Tony} and two hotels connected by a breezeway. Hotel Green has 18 green rooms and 2 red rooms. Hotel Red has 18 red rooms and 2 green rooms. Every night for many years, students will be assigned a room in either Hotel Green or Hotel Red depending on a coin flip (heads --> Hotel Green for the night, tails --> Hotel Red for the night). Students won’t know what hotel they are in but can see their own room color only. If a student sees a green room, that student correctly deduces they are in Hotel Green with 90% probability. Case 1: Suppose that every morning, Tony is allowed to bet that he is in a green room. If he bets ‘yes’ and is correct, he pockets $12. If he bets ‘yes’ and is wrong, he has to pay $52. (In other words, his payoff for a correct vote is $12, the payoff for a wrong vote is -$52.) What is the expected value of his betting if he always says ‘yes’ if he is in a green room? For every 20 times that Tony says ‘yes’, he wins 18 times (wins $12x18) and he loses twice (loses $52x2), consistent with his posterior. On average he wins $5.60 per bet, or $2.80 per night. (He says “yes” to the bet 1 out of every 2 nights, because that is the frequency with which he finds himself in a green room.) This is a steady money pump in the student’s favor. The correct calculation for Case 1 is: av
I believe both of your computations are correct, and the fallacy lies in mixing up the payoff for the group with the payoff for the individual - which the frame of the problem as posed does suggest, with multiple identities that are actually the same person. More precisely, the probabilities for the individual are 90/10, but the probabilities for the groups are 50/50, and if you compute payoffs for the group (+$12/-$52), you need to use the group probabilities. (It would be different if the narrator ("I") offered the guinea pig ("you") the $12/$52 odds individually.) byrnema looked at the result from the group viewpoint; you get the same result when you approach it from the individual viewpoint, if done correctly, as follows: For a single person, the correct payoff is not $12 vs. -$52, but rather ($1 minus $6/18 to reimburse the reds, making $0.67) × 90% and ($1 minus $54/2 = -$26) × 10%, so each of the copies of the guinea pig is going to be out of pocket by 2/3 × 0.9 + (-26) × 0.1 = 0.6 - 2.6 = -2, on average. The fallacy of Eliezer's guinea pigs is that each of them thinks they get the $18 each time, which means that the 18 goes into their computation twice (squared) for their winnings (18 × 18/20). This is not a problem with anthropic reasoning, but with statistics. A distrustful individual would ask themselves, "what is the narrator getting out of it", and realize that the narrator will see the -$12 / +$52 outcome, not the guinea pig - and that to the narrator, the 50/50 probability applies. Don't mix them up!
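The individual-viewpoint arithmetic in the comment above can be sketched in a few lines of Python. This is only an illustration under the comment's own assumption that each green-room copy bears an equal share of what is taken from the red rooms; `green_share` and the other names are hypothetical.

```python
from fractions import Fraction

def green_share(n_green, n_red):
    """Net result per green copy: +$1, minus an equal share of the $3 taken from each red copy."""
    return 1 - Fraction(3 * n_red, n_green)

share_heads = green_share(18, 2)   # 1 - 6/18 = 2/3
share_tails = green_share(2, 18)   # 1 - 54/2 = -26

# Individual (anthropic) probabilities after seeing a green room.
p_heads_given_green = Fraction(9, 10)
ev_per_green_copy = (p_heads_given_green * share_heads
                     + (1 - p_heads_given_green) * share_tails)

# Expected number of green-room copies before the coin is known: 0.5*18 + 0.5*2.
expected_green_copies = Fraction(1, 2) * 18 + Fraction(1, 2) * 2

print(ev_per_green_copy)                           # -2
print(ev_per_green_copy * expected_green_copies)   # -20
```

Multiplying the per-copy figure by the expected number of green copies recovers the -$20 group figure, which is one way to see that the 90/10 and 50/50 calculations agree once the payoffs are matched to the right probabilities.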
It was 3:30 in the morning just a short while ago, and I woke up with a bunch of nonsensical ideas about the properties of this problem, and then while I was trying to get back to sleep I realized that one of the ideas made sense. Evidence that understanding this problem for myself required a right-brain reboot. I'm not surprised about the reboot: I've been thinking about this problem a lot, which signals to my brain that it's important, and it literally hurt my brain to think about why the green roomers were losing for the group when they thought they were winning, strongly suggesting I was hitting my apologist limit.

In personal conversation, Nick Bostrom suggested that a division-of-responsibility principle might cancel out the anthropic update - i.e., the paperclip maximizer would have to reason, "If the logical coin came up heads then I am 1/18th responsible for adding +1 paperclip, if the logical coin came up tails then I am 1/2 responsible for destroying 3 paperclips." I confess that my initial reaction to this suggestion was "Ewwww", but I'm not exactly comfortable concluding I'm a Boltzmann brain, either.
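To see how the division-of-responsibility weighting interacts with the anthropic update, here is a rough Python sketch. It assumes the 90% posterior for heads after waking in a green room and the 1/18 and 1/2 responsibility shares quoted above; the variable names are my own.

```python
from fractions import Fraction

p_heads = Fraction(9, 10)            # posterior after waking in a green room
paperclips = {"heads": 1, "tails": -3}

# Anthropic update alone: treat yourself as fully responsible for the outcome.
ev_updated = p_heads * paperclips["heads"] + (1 - p_heads) * paperclips["tails"]

# Anthropic update plus division of responsibility among the green-room voters.
responsibility = {"heads": Fraction(1, 18), "tails": Fraction(1, 2)}
ev_shared = (p_heads * responsibility["heads"] * paperclips["heads"]
             + (1 - p_heads) * responsibility["tails"] * paperclips["tails"])

# Pre-experiment view: 50/50 coin, no update.
ev_prior = Fraction(1, 2) * paperclips["heads"] + Fraction(1, 2) * paperclips["tails"]

print(ev_updated)   # 3/5   -> take the bet
print(ev_shared)    # -1/10 -> refuse the bet
print(ev_prior)     # -1    -> refuse the bet
```

Under these assumptions the responsibility-weighted figure has the same sign as the pre-experiment figure (it is the prior value scaled by 1/10), which is the sense in which the principle "cancels out" the update.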

I would perhaps prefer to use diff... (read more)

Let the dilemma be, "I will ask all people who wake up in green rooms if they are willing to take the bet 'Create 1 paperclip if the logical coinflip came up heads, destroy 3 paperclips if the logical coinflip came up tails'. (Should they disagree on their answers, I will destroy 5 paperclips.)" Then a paperclip maximizer, before the experiment, wants the paperclip maximizers who wake up in green rooms to refuse the bet. But a conscious paperclip maximizer who updates on anthropic evidence, who wakes up in a green room, will want to take the

... (read more)
Red Clippy doesn't get a vote.

Can someone come up with a situation of the same general form as this one where anthropic reasoning results in optimal actions and nonanthropic reasoning results in suboptimal actions?

How about if the wager is that anybody in any room can guess the outcome of the coinflip, and if they get it right they win $1 and if they get it wrong they lose $2? If you still think it's 50% after waking up in a green room, you won't take the bet, and you'll win $0; if you think it's 90% you'll take the bet and come out $14 ahead on balance, with two of you losing $2 each and 18 of you getting $1. Doesn't this show anthropic reasoning is right as much as the OP shows it's wrong?

I think you're missing a term in your second calculation. And why are anthropism and copies of you necessary for this puzzle? I suspect the answer will indicate something I'm completely missing about this series.
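A quick Python sketch of this wager, assuming every copy guesses the coin outcome that its own room color favors (green guesses heads, red guesses tails). The names are illustrative only.

```python
from fractions import Fraction

WIN, LOSE = 1, -2   # payoff for a right guess and for a wrong guess

def world_total(coin):
    """Total winnings over all 20 copies when each guesses by room color."""
    n_green, n_red = (18, 2) if coin == "heads" else (2, 18)
    total = 0
    for color, count in (("green", n_green), ("red", n_red)):
        guess = "heads" if color == "green" else "tails"
        total += count * (WIN if guess == coin else LOSE)
    return total

totals = {coin: world_total(coin) for coin in ("heads", "tails")}
print(totals)   # {'heads': 14, 'tails': 14}

# Per-copy expected value of guessing, at each candidate credence.
ev_at_90 = Fraction(9, 10) * WIN + Fraction(1, 10) * LOSE   # 7/10
ev_at_50 = Fraction(1, 2) * WIN + Fraction(1, 2) * LOSE     # -1/2
print(ev_at_90, ev_at_50)
```

The group comes out $14 ahead whichever way the coin landed, while a copy holding 50% credence sees a negative expected value and declines.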

Take this for straight-up probability:

I have two jars of marbles, one with 18 green and 2 red, the other with 18 red and two green. Pick one jar at random, then look at one marble from that jar at random.

If you pick green, what's the chance that your jar is mostly green? I say 90%, by fairly straightforward application of Bayes' rule.
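The Bayes' rule step for the two jars can be written out as a short Python check. The dictionary keys are just labels I chose for the two jars.

```python
from fractions import Fraction

# Prior: each jar is equally likely to be picked.
prior = {"mostly_green": Fraction(1, 2), "mostly_red": Fraction(1, 2)}

# Likelihood of drawing a green marble from each jar.
p_green_given_jar = {"mostly_green": Fraction(18, 20), "mostly_red": Fraction(2, 20)}

# Total probability of drawing green, then Bayes' rule.
p_green = sum(prior[jar] * p_green_given_jar[jar] for jar in prior)
posterior_mostly_green = prior["mostly_green"] * p_green_given_jar["mostly_green"] / p_green

print(p_green)                  # 1/2
print(posterior_mostly_green)   # 9/10
```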

I offer... (read more)

Eliezer Yudkowsky
The problem is that we aren't asking one randomly selected person, we're asking all of the green ones (they have to agree unanimously for the Yes vote to go through).
Ah, I see. You're asking all the green ones, but only paying each pod once. This feels like reverse-weighting the payout, so it should still be -EV even after waking up, but I haven't quite worked out a way to include that in the numbers...
The second sum still seems wrong. Here it is: "However, before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% × ((18 × +$1) + (2 × -$3))) + (50% × ((18 × -$3) + (2 × +$1))) = -$20. You want your future selves to reply 'No' under these conditions." The sum given is the one you would perform if you did not know which room you woke up in. Surely a different sum is appropriate with the additional evidence that you awoke in a green room. Incidentally, this problem seems far too complicated! I feel like the programmer faced with a bug report which failed to find some simple code that nonetheless manages to reproduce the problem. Simplify, simplify, simplify!

In this comment:


I put forward my view that the best solution is to just maximize total utility, which correctly handles the forcing anthropics case, and expressed curiosity as to whether it would handle the outlawing anthropics case.

It now seems my solution does correctly handle the outlawing anthropics case, which would seem to be a data point in its favor.

Maximizing total hedonic utility fails the outlawing anthropics case: substitute hedons for paperclips.
I don't think I understand your claim here. We agree that my solution works if you measure utility in paperclips? Why do you think it fails if you measure utility in hedons?

Assume that each agent has his own game (that is, one game for each agent). That is, there are overall 18 (or 2) games, depending on the result of the coin flip.

Then the first calculation would be correct in every respect, and it makes sense to say yes from a global point of view. (And also with any other reward matrix, the dynamic update would be consistent with the a priori decision all the time.)

This shows that the error made by the agent was to implicitly assume that he has his own game.

How about give all of your potential clones a vote, even though you can't communicate?

So, in one case, 18 of you would say "Yes, take the bet!" and 2 would say "No, let me keep my money." In the other case, 18 would say no and two would say yes. In either case, of course, you're one of the ones who would vote yes. OK, that leaves us tied. So why not let everyone's vote be proportional to what they stand to gain/lose? That leaves us with 20 × -3 vs. 20 × 1. Don't take the bet.

(Yes, I realize half the people that just voted above don't exist. We just don't know which half...)
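The vote tally described above can be sketched as follows, counting all 40 potential copies across both coin outcomes. This is only one reading of the proposal; the tuple layout and names are my own.

```python
# (coin outcome, room color, number of copies, stake per copy if the bet goes through)
potential_voters = [
    ("heads", "green", 18, +1),
    ("heads", "red",    2, -3),
    ("tails", "green",  2, +1),
    ("tails", "red",   18, -3),
]

# One copy, one vote: green copies vote yes, red copies vote no.
yes_votes = sum(n for _, _, n, stake in potential_voters if stake > 0)
no_votes = sum(n for _, _, n, stake in potential_voters if stake < 0)

# Votes weighted by what each copy stands to gain or lose.
yes_weight = sum(n * stake for _, _, n, stake in potential_voters if stake > 0)
no_weight = sum(n * -stake for _, _, n, stake in potential_voters if stake < 0)

print(yes_votes, no_votes)     # 20 20
print(yes_weight, no_weight)   # 20 60
```

The head count is tied at 20 to 20, but the stake-weighted count is 20 to 60 against the bet.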

As has been pointed out, this is not an anthropic problem; however, there is still a paradox. I may be stating the obvious, but the root of the problem is that you're doing something fishy when you say that the other people will think the same way and that your decision will be theirs.

The proper way to make a decision is to have a probability distribution on the code of the other agents (which will include their prior on your code). From this I believe (but can't prove) that you will take the correct course of action.

Newcomb-like problems fall into the same category; the trick is that there is always a belief about someone's decision making hidden in the problem.


[EDIT:] Warning: This post was based on a misunderstanding of the OP. Thanks orthonormal for pointing out the mistake! I leave this post here so that the replies stay in context.

I think that decision matrix of the agent waking up in green room is not complete: it should contain the outcome of losing $50 if the answers are not consistent.

Therefore, it would compute that even if the probability that the coin came up 1 is 90%, it still does not make sense to answer "yes" since two other copies would answer "no" and therefore the ... (read more)

The copies in red rooms don't get to vote in this setup.
Thanks for pointing that out. Now I understand the problem. However, I still think that the mistake made by the agent is the implicit assumption that he is the only one influencing the outcome. Since all of the copies assume that they solely decide the outcome, they overestimate the reward after the anthropic update (each of the copies claims the whole reward for his decision, although the decision is collective and each vote is necessary).
By the way, please don't delete a comment if you change your mind or realize an error; it makes the conversation difficult for others to read. You can always put in an edit (and mark it as such) if you want. I'd only delete one of my comments if I felt that its presence actually harmed readers, and that there was no disclaimer I could add that would prevent that harm.
OK, sorry. (In this special case, I remember thinking that your remark was perfectly understandable even without the context.)

EDIT: at first I thought this was equivalent, but then I tried the numbers and realized it's not.

  1. I'll flip a coin to choose which roulette wheel to spin. If it comes up heads, I'll spin a wheel that's 90% green and 10% red. If it comes up tails, a wheel that's 10% green and 90% red.
  2. I won't show you the wheel or the coin (at this point) but I'll tell you which color came up.
  3. If it's green, you can bet on the coinflip: win $3 for heads and lose $13 for tails.

If the color is green, do you take the bet?
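Here is a small Python check of the roulette version, assuming a fair coin and a single bettor. It computes both the expected value after seeing green and the expected value of the policy "bet whenever green" evaluated before anything is revealed; the names are hypothetical.

```python
from fractions import Fraction

p_heads = Fraction(1, 2)
p_green_given = {"heads": Fraction(9, 10), "tails": Fraction(1, 10)}
payoff = {"heads": 3, "tails": -13}

# Posterior probability of heads given that green came up.
p_green = p_heads * p_green_given["heads"] + (1 - p_heads) * p_green_given["tails"]
p_heads_given_green = p_heads * p_green_given["heads"] / p_green

# Expected value of the bet after seeing green.
ev_after_green = (p_heads_given_green * payoff["heads"]
                  + (1 - p_heads_given_green) * payoff["tails"])

# Expected value of the policy "bet if green", evaluated beforehand.
ev_policy = p_green * ev_after_green

print(p_heads_given_green)   # 9/10
print(ev_after_green)        # 7/5
print(ev_policy)             # 7/10
```

With one bettor, the updated and the beforehand calculations have the same sign, so this version does not reproduce the conflict in the original problem, in line with the EDIT note at the top of the comment.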

EDIT: After playing with the numbers, I think reaso... (read more)

Perhaps we should look at Dresher's Cartesian Camcorder as a way of reducing consciousness, and thereby eliminate this paradox.

Or, to turn it around, this paradox is a litmus test for theories of consciousness.

The more I think about this, the more I suspect that the problem lies in the distinction between quantum and logical coin-flips.

Suppose this experiment is carried out with a quantum coin-flip. Then, under many-worlds, both outcomes are realized in different branches. There are 40 future selves--2 red and 18 green in one world, 18 red and 2 green in the other world--and your duty is clear:

(50% × ((18 × +$1) + (2 × -$3))) + (50% × ((18 × -$3) + (2 × +$1))) = -$20.

Don't take the bet.

So why Eliezer's insistence on using a logical coin-flip? Because, I suspect,... (read more)

Is there any version of this post that doesn't involve technologies that we don't have? If not, then might the resolution to this paradox be that the copying technology assumed to exist can't exist, because if it did it would give rise to a logical inconsistency?

Cute. You may be able to translate into the language of "wake, query, induce amnesia" - many copies would correspond to many wakings.
No, the dilemma depends on having many copies. You're trying to optimize the outcome averaged over all copies (before the copies are made), because you don't know which copy "you" will "be". In the no-copies / amnesia version, the updateless approach is clearly correct. You have no data to update on - awakening in a green room tells you nothing about the coin tosses because either way you'd wake up in a green room at least once (and you forget about it, so you don't know how many times it happened). Therefore you will always refuse to play.
No, the dilemma depends on having many copies. You're trying to optimize the outcome averaged over all copies (before the copies are made), because you don't know which copy "you" will "be". In the no-copies / amnesia version, the updateless approach is obviously correct. You have no data to update on (you don't know how many times you've woken and forgotten about it), so you always refuse to play, even in a green room. IOW: awakening in a green room tells you nothing about the coin tosses, since either way you'd awake in a green room at least once.
But we don't have the type of amnesia drugs required to manifest the Sleeping Beauty problem, and perhaps there is something about consciousness that would prevent them from ever being created. (Isn't there some law of physics that precludes the total destruction of information?)
I don't understand - what type of amnesia drug is required? For example, this lab: http://memory.psy.cmu.edu/ apparently routinely does experiments that induce temporary amnesia using a drug called midazolam. In general, I was under the impression that a wide variety of drugs have side effects of various degrees and kinds of amnesia, including both anterograde and retrograde. Your proposal that consciousness might be conserved, and moreover that this might be proved by armchair reasoning, seems a bit farfetched. Are you: 1. just speculating idly? 2. seriously pursuing this hypothesis as the best avenue towards resolving EY's puzzle? 3. pursuing some crypto-religious (i.e. "consciousness conserved"=>"eternal life") agenda?
My first comment was (2), the second (1). If DanArmk's comment is correct then it isn't important for my original comment whether there exist amnesia drugs. If your post is correct then my second comment is incorrect.
Microscopic reversibility prohibits any destruction of the information necessary to run things backwards - and that's all the information in the universe as far as we know.

Edit: presumably there's an answer already discussed that I'm not aware of, probably common to all games where Omega creates N copies of you. (Since so many of them have been discussed here.) Can someone please point me to it?

I'm having difficulties ignoring the inherent value of having N copies of you created. The scenario assumes that the copies go on existing after the game, and that they each have the same amount of utilons as the original (instead of a division of some kind).

For suppose the copies are short lived: Omega destroys them after the game.... (read more)

Um, you get copied N times regardless of your choice, so the utility of being copied shouldn't factor into your choice. I'm afraid I don't understand your objection.