Outlawing Anthropics: An Updateless Dilemma

Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.

If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.

After going to sleep at the start of the experiment, you wake up in a green room.

With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?

There are exactly two tenable answers that I can see, "50%" and "90%".

Suppose you reply 90%.

And suppose you also happen to be "altruistic" enough to care about what happens to all the copies of yourself.  (If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your expectation of pleasant experience over future selves.)

Then I attempt to force a reflective inconsistency in your decision system, as follows:

I inform you that, after I look at the unknown binary digit of pi, I will ask all the copies of you in green rooms whether to pay $1 to every version of you in a green room and steal $3 from every version of you in a red room.  If they all reply "Yes", I will do so.

(It will be understood, of course, that $1 represents 1 utilon, with actual monetary amounts rescaled as necessary to make this happen.  Very little rescaling should be necessary.)

(Timeless decision agents reply as if controlling all similar decision processes, including all copies of themselves.  Classical causal decision agents, to reply "Yes" as a group, will need to somehow work out that other copies of themselves reply "Yes", and then reply "Yes" themselves.  We can try to help out the causal decision agents on their coordination problem by supplying rules such as "If conflicting answers are delivered, everyone loses $50".  If causal decision agents can win on the problem "If everyone says 'Yes' you all get $10, if everyone says 'No' you all lose $5, if there are conflicting answers you all lose $50" then they can presumably handle this.  If not, then ultimately, I decline to be responsible for the stupidity of causal decision agents.)

Suppose that you wake up in a green room.  You reason, "With 90% probability, there are 18 of me in green rooms and 2 of me in red rooms; with 10% probability, there are 2 of me in green rooms and 18 of me in red rooms.  Since I'm altruistic enough to at least care about my xerox-siblings, I calculate the expected utility of replying 'Yes' as (90% * ((18 * +$1) + (2 * -$3))) + (10% * ((18 * -$3) + (2 * +$1))) = +$5.60."  You reply yes.

However, before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20.  You want your future selves to reply 'No' under these conditions.
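A minimal sketch of the two calculations above in Python, using only the payoff numbers from the setup:

```python
# Expected value of the conditional strategy "reply Yes if you wake up in a green room",
# computed with and without the anthropic update. Payoffs: +$1 per green copy, -$3 per red copy.

def strategy_value(p_heads):
    heads_world = 18 * (+1) + 2 * (-3)   # 18 greens paid, 2 reds robbed: +12
    tails_world = 2 * (+1) + 18 * (-3)   # 2 greens paid, 18 reds robbed: -52
    return p_heads * heads_world + (1 - p_heads) * tails_world

print(strategy_value(0.9))   # +5.6, the post-awakening (anthropically updated) calculation
print(strategy_value(0.5))   # -20.0, the pre-experiment calculation
```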

This is a dynamic inconsistency - different answers at different times - which argues that decision systems which update on anthropic evidence will self-modify not to update probabilities on anthropic evidence.

I originally thought, on first formulating this problem, that it had to do with double-counting the utilons gained by your variable numbers of green friends, and the probability of being one of your green friends.

However, the problem also works if we care about paperclips.  No selfishness, no altruism, just paperclips.

Let the dilemma be, "I will ask all people who wake up in green rooms if they are willing to take the bet 'Create 1 paperclip if the logical coinflip came up heads, destroy 3 paperclips if the logical coinflip came up tails'.  (Should they disagree on their answers, I will destroy 5 paperclips.)"  Then a paperclip maximizer, before the experiment, wants the paperclip maximizers who wake up in green rooms to refuse the bet.  But a conscious paperclip maximizer who updates on anthropic evidence, who wakes up in a green room, will want to take the bet, with expected utility ((90% * +1 paperclip) + (10% * -3 paperclips)) = +0.6 paperclips.
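The same arithmetic for the paperclip version, as a sketch (the pre-experiment figure of -1 follows directly from the 50/50 prior on the logical coin):

```python
# Paperclip bet: create 1 paperclip if heads, destroy 3 paperclips if tails.
for p_heads in (0.9, 0.5):   # updated-in-green-room belief vs. pre-experiment belief
    print(p_heads * (+1) + (1 - p_heads) * (-3))   # +0.6 (accept) vs. -1.0 (refuse)
```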

This argues that, in general, decision systems - whether they start out selfish, or start out caring about paperclips - will not want their future versions to update on anthropic "evidence".

Well, that's not too disturbing, is it?  I mean, the whole anthropic thing seemed very confused to begin with - full of notions about "consciousness" and "reality" and "identity" and "reference classes" and other poorly defined terms.  Just throw out anthropic reasoning, and you won't have to bother.

When I explained this problem to Marcello, he said, "Well, we don't want to build conscious AIs, so of course we don't want them to use anthropic reasoning", which is a fascinating sort of reply.  And I responded, "But when you have a problem this confusing, and you find yourself wanting to build an AI that just doesn't use anthropic reasoning to begin with, maybe that implies that the correct resolution involves us not using anthropic reasoning either."

So we can just throw out anthropic reasoning, and relax, and conclude that we are Boltzmann brains.  QED.

In general, I find the sort of argument given here - that a certain type of decision system is not reflectively consistent - to be pretty damned compelling.  But I also find the Boltzmann conclusion to be, ahem, more than ordinarily unpalatable.

In personal conversation, Nick Bostrom suggested that a division-of-responsibility principle might cancel out the anthropic update - i.e., the paperclip maximizer would have to reason, "If the logical coin came up heads then I am 1/18th responsible for adding +1 paperclip, if the logical coin came up tails then I am 1/2 responsible for destroying 3 paperclips."  I confess that my initial reaction to this suggestion was "Ewwww", but I'm not exactly comfortable concluding I'm a Boltzmann brain, either.
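A sketch of the arithmetic implied by that suggestion (not a defense of it): weighting the anthropically updated 90/10 belief by the 1/18 and 1/2 responsibility shares flips the sign back to refusal.

```python
# Bostrom-style division of responsibility applied to the paperclip bet.
p_heads = 0.9   # belief after waking in a green room
eu = p_heads * (1/18) * (+1) + (1 - p_heads) * (1/2) * (-3)
print(eu)       # -0.1 < 0, so the green-roomer refuses, matching the pre-experiment preference
```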

EDIT:  On further reflection, I also wouldn't want to build an AI that concluded it was a Boltzmann brain!  Is there a form of inference which rejects this conclusion without relying on any reasoning about subjectivity?

EDIT2:  Psy-Kosh has converted this into a non-anthropic problem!

194 comments

Actually... how is this an anthropic situation AT ALL?

I mean, wouldn't it be equivalent to, say, gather 20 rational people (that understand PD, etc etc etc, and can certainly manage to agree to coordinate with each other) who are allowed to meet with each other in advance and discuss the situation...

I show up and tell them that I have two buckets of marbles, some of which are green, some of which are red

One bucket has 18 green and 2 red, and the other bucket has 18 red and 2 green.

I will (already have) flipped a logical coin. Depending on the outcome, I will use either one bucket or the other.

After having an opportunity to discuss strategy, they will be allowed to reach into the bucket without looking, pull out a marble, look at it, and then, if it's green, choose whether to pay and steal, etc etc etc. (in case it's not obvious, the payout rules being equivalent to the OP)

As near as I can determine, this situation is entirely equivalent to the OP and is in no way an anthropic one. If the OP actually is an argument against anthropic updates in the presence of logical uncertainty... then it's actually an argument against the general case of Bayesian updating in the presence of logical uncertainty, even when there's no anthropic stuff going on at all!

EDIT: oh, in case it's not obvious, marbles are not replaced after being drawn from the bucket.

That uncertainty is logical seems to be irrelevant here.

Agreed. But I seem to recall seeing some comments about distinguishing between quantum and logical uncertainty, etc etc, so I figured I may as well say that it's at least equivalent, given that it's the same type of uncertainty as in the original problem and so on...

Right, and this is a perspective very close to intuition for UDT: you consider different instances of yourself at different times as separate decision-makers that all share the common agenda ("global strategy"), coordinated "off-stage", and implement it without change depending on circumstances they encounter in each particular situation. The "off-stageness" of coordination is more naturally described by TDT, which allows considering different agents as UDT-instances of the same strategy, but the precise way in which it happens remains magic.

Nesov, the reason why I regard Dai's formulation of UDT as such a significant improvement over your own is that it does not require offstage coordination. Offstage coordination requires a base theory and a privileged vantage point and, as you say, magic.

I still don't understand this emphasis. Here I sketched in what sense I mean the global solution -- it's more about definition of preference than the actual computations and actions that the agents make (locally). There is an abstract concept of global strategy that can be characterized as being "offstage", but there is no offstage computation or offstage coordination, and in general complete computation of global strategy isn't performed even locally -- only approximations, often approximations that make it impossible to implement the globally best solution.

In the above comment, by "magic" I referred to the exact mechanism that says in what way and to what extent different agents are running the same algorithm, which is more in the domain of TDT, UDT generally not talking about separate agents, only different possible states of the same agent. Which is why neither concept solves the bargaining problem: it's out of UDT's domain, and TDT takes the relevant pieces of the puzzle as given, in its causal graphs.

For further disambiguation, see for example this comment you made:

We're taking apart your "mathematical intuition" into something that invents a causal graph (this part is still magic) and a part that updates a causal graph "given that your output is Y" (Pearl says how to do this).

Again, if we randomly selected someone to ask, rather than having specified in advance that we're going to make the decision depend on the unanimous response of all people in green rooms, then there would be no paradox. What you're talking about here, pulling out a random marble, is the equivalent of asking a random single person from either green or red rooms. But this is not what we're doing!

Either I'm misunderstanding something, or I wasn't clear.

To make it explicit: EVERYONE who gets a green marble gets asked, and the outcome depends on their consent being unanimous, just like everyone who wakes up in a green room gets asked. ie, all twenty rationalists draw a marble from the bucket, so that by the end, the bucket is empty.

Everyone who got a green marble gets asked for their decision, and the final outcome depends on all the answers. The bit about them drawing marbles individually is just to keep them from seeing what marbles the others got or being able to talk to each other once the marble drawing starts.

Unless I completely failed to comprehend some aspect of what's going on here, this is effectively equivalent to the problem you described.

Oh, okay, that wasn't clear actually. (Because I'm used to "they" being a genderless singular pronoun.) In that case these problems do indeed look equivalent.

Hm. Hm hm hm. I shall have to think about this. It is an extremely good point. The more so as anyone who draws a green marble should indeed be assigning a 90% probability to there being a mostly-green bucket.

Sorry about the unclarity then. I probably should have explicitly stated a step by step "marble game procedure".

My personal suggestion if you want an "anthropic reasoning is confooozing" situation would be the whole anthropic updating vs Aumann agreement thing, since the disagreement would seem to be predictable in advance, and everyone involved would appear to be able to be expected to agree that the disagreement is right and proper. (ie, mad scientist sets up a quantum suicide experiment. Test subject survives. Test subject seems to have Bayesian evidence in favor of MWI vs single world, while the external observer mad scientist who sees the test subject/victim survive would seem to not have any particular new evidence favoring MWI over single world.)

(Yes, I know I've brought up that subject several times, but it does seem, to me, to be a rather more blatant "something funny is going on here")

(EDIT: okay, I guess this would count as quantum murder rather than quantum suicide, but you know what I mean.)

I don't see how being assigned a green or red room is "anthropic" while being assigned a green or red marble is not anthropic.

I thought the anthropic part came from updating on your own individual experience in the absence of observing what observations others are making.

The difference wasn't marble vs room but "copies of one being, so number of beings changed" vs "just gather 20 rationalists..."

But my whole point was "the original wasn't really an anthropic situation, let me construct this alternate yet equivalent version to make that clear"

Do you think that the Sleeping Beauty problem is an anthropic one?

It probably counts as an instance of the general class of problems one would think of as an "anthropic problem".

I see. I had always thought of the problem as involving 20 (or sometimes 40) different people. The reason for this is that I am an intuitive rather than literal reader, and when Eliezer mentioned stuff about copies of me, I just interpreted this as meaning to emphasize that each person has their own independent 'subjective reality'. Really only meaning that each person doesn't share observations with the others.

So all along, I thought this problem was about challenging the soundness of updating on a single independent observation involving yourself as though you are some kind of special reference frame.

... therefore, I don't think you took this element out, but I'm glad you are resolving the meaning of "anthropic" because there are probably quite a few different "subjective realities" circulating about what the essence of this problem is.

Sorry for delay.

Copies as in "upload your mind. then run 20 copies of the uploaded mind".

And yes, I know there's still tricky bits left in the problem, I merely established that those tricky bits didn't derive from effects like mind copying or quantum suicide or anything like that and could instead show up in ordinary simple stuff, with no need to appeal to anthropic principles to produce the confusion. (sorry if that came out babbly, am getting tired)

anyone who draws a green marble should indeed be assigning a 90% probability to there being a mostly-green bucket.

I don't think so. I think the answer to both these problems is that if you update correctly, you get 0.5.

*blinks* mind expanding on that?

P(green|mostly green bucket) = 18/20

P(green|mostly red bucket) = 2/20

likelihood ratio = 9

if one started with no particular expectation of it being one bucket vs the other, ie, assigned 1:1 odds, then after updating upon seeing a green marble, one ought to assign 9:1 odds, ie, probability 9/10, right?
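A minimal odds-form sketch of that update:

```python
# Bayes in odds form for drawing a green marble.
prior_odds = 1.0                                  # mostly-green : mostly-red = 1 : 1
likelihood_ratio = (18 / 20) / (2 / 20)           # = 9
posterior_odds = prior_odds * likelihood_ratio    # 9 : 1
print(posterior_odds / (1 + posterior_odds))      # 0.9
```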

I guess that does need a lot of explaining.

I would say:

P(green|mostly green bucket) = 1

P(green|mostly red bucket) = 1

P(green) = 1

because P(green) is not the probability that you will get a green marble, it's the probability that someone will get a green marble. From the perspective of the priors, all the marbles are drawn, and no one draw is different from any other. If you don't draw a green marble, you're discarded and the people who did get a green marble vote. For the purposes of figuring out the priors for a group strategy, your draw being green is not an event.

Of course, you know that you've drawn green. But the only thing you can translate it into that has a prior is "someone got green."

That probably sounds contrived. Maybe it is. But consider a slightly different example:

  • Two marbles and two people instead of twenty.
  • One marble is green, the other will be red or green based on a coin flip (green on heads, red on tails).

I like this example because it combines the two conflicting intuitions in the same problem. Only a fool would draw a red marble and remain uncertain about the coin flip. But someone who draws a green marble is in a situation similar to the twenty marble scenario.

If you were to plan ahead of time how the greens should vote, you would tell them to assume 50%. But a person holding a green marble might think it's 2/3 in favor of double green.
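A quick check of that 2/3 figure, assuming the two marbles are dealt out at random to the two people:

```python
# P(double green | I drew green) in the two-marble example.
p_heads = 0.5                  # heads: both marbles green; tails: one green, one red
p_my_green_given_heads = 1.0
p_my_green_given_tails = 0.5   # I get the single green marble half the time
posterior = (p_heads * p_my_green_given_heads) / (
    p_heads * p_my_green_given_heads + (1 - p_heads) * p_my_green_given_tails)
print(posterior)               # 0.666..., i.e. 2/3
```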

To avoid embarrassing paradoxes, you can base everything on the four events "heads," "tails," "someone gets green," and "someone gets red." Update as normal.

yes, the probability that someone will get a green marble is rather different than the probability that I, personally, will get a green marble. But if I do personally get a green marble, that's evidence in favor of green bucket.

The decision algorithm for how to respond to that, though, is in this case skewed by the rules for the payout.

And in your example, if I drew green, I'd consider the 2/3 probability the correct one for whoever drew green.

Now, if there's a payout scheme involved with funny business, that may alter some decisions, but not magically change my epistemology.

OK, but I think Psy-Kosh was talking about something to do with the payoffs. I'm just not sure if he means the voting or the dollar amounts or what.

Sorry for delay. And yeah, I meant stuff like "only greens get to decide, and the decision needs to be unanimous" and so on

I agree that changes the answer. I was assuming a scheme like that in my two marble example. In a more typical situation, I would also say 2/3.

To me, it's not a drastic (or magical) change, just getting a different answer to a different question.

Um... okay... I'm not sure what we're disagreeing about here, if anything:

my position is "given that I found myself with a green marble, it is right and proper for me to assign a 2/3 probability to both being green. However, the correct choice to make, given the peculiarities of this specific problem, may require one to make a decision that seems, on the surface, as if one didn't update like that at all."

Well, we might be saying the same thing but coming from different points of view about what it means. I'm not actually a bayesian, so when I talk about assigning probabilities and updating them, I just mean doing equations.

What I'm saying here is that you should set up the equations in a way that reflects the group's point of view because you're telling the group what to do. That involves plugging some probabilities of one into Bayes' Law and getting a final answer equal to one of the starting numbers.

Very enlightening!

It just shows that the OP was an overcomplicated example generating confusion about the update.

[EDIT] Deleted rest of the comment due to revised opinion here: http://lesswrong.com/lw/17c/outlawing_anthropics_an_updateless_dilemma/13hk

Good point. After thinking about this for a while, I feel comfortable simultaneously holding these views:

1) You shouldn't do anthropic updates. (i.e. update on the fact that you exist)

2) The example posed in the top-level post is not an example of anthropic reasoning, but reasoning on specific givens and observations, as are most supposed examples of anthropic reasoning.

3) Any evidence arising from the fact that you exist is implicitly contained by your observations by virtue of their existence.

Wikipedia gives one example of a productive use of the anthropic principle, but it appears to be reasoning based on observations of the type of life-form we are, as well as other hard-won biochemical knowledge, well above and beyond the observation that we exist.

Thanks.

I don't THINK I agree with your point 1. ie, I favor saying yes to anthropic updates, but I admit that there's definitely confusing issues here.

Mind expanding on point 3? I think I get what you're saying, but in general we filter out that part of our observations, that is, the fact that observations are occurring at all. Getting that back is the point of anthropic updating. Actually... IIRC, Nick Bostrom's way of talking about anthropic updates is more or less exactly your point 3 in reverse... ie, near as I can determine and recall, his position explicitly advocates talking about the significance that observations are occurring at all as part of the usual update based on observation. Maybe I'm misremembering though.

Also, separating it out into a single anthropic update and then treating all observations as conditional on your existence or such helps avoid double counting that aspect, right?

Also, here's another physics example, a bit more recent that was discussed on OB a while back.

Reading the link, the second paper's abstract, and most of Scott Aaronson's post, it looks to me like they're not using anthropic reasoning at all. Robin Hanson summarizes their "entropic principle" (and the abstract and all discussion agree with his summary) as

since observers need entropy gains to function physically, we can estimate the probability that any small spacetime volume contains an observer to be proportional to the entropy gain in that volume.

The problem is that "observer" is not the same as "anthrop-" (human). This principle is just a subtle restatement of either a tautology or known physical law. Because it's not that "observers need entropy gains". Rather, observation is entropy gain. To observe something is to increase one's mutual information with it. But since phase space is conserved, all gains in mutual information must be offset by an increase in entropy.

But since "observers" are simply anything that forms mutual information with something else, it doesn't mean a conscious observer, let alone a human one. For that, you'd need to go beyond P(entropy gain|observer) to P(consciousness|entropy gain).

(I'm a bit distressed no one else made this point.)

Now, this idea could lead to an insight if you endorsed some neo-animistic view that consciousness is proportional to normalized rate of mutual information increase, and so humans are (as) conscious (as we are) because we're above some threshold ... but again, you'd be using nothing from your existence as such.

The argument was "higher rate of entropy production is correlated with more observers, probably. So we should expect to find ourselves in chunks of reality that have high rates of entropy production"

I guess it wasn't just observers, but (non reversible) computations

ie, anthropic reasoning was the justification for using the entropy production criteria in the first place. Yes, there is a question of fractions of observers that are conscious, etc... but a universe that can't support much in the way of observers at all probably can't support much in the way of conscious observers, while a universe that can support lots of observers can probably support more conscious observers than the other, right?

Or did I misunderstand your point?

Now I'm not understanding how your response applies.

My point was: the entropic principle estimates the probability of observers per unit volume by using the entropy per unit volume. But this follows immediately from the second law and conservation of phase space; it's necessarily true.

To the extent that it assigns a probability to a class that includes us, it does a poor job, because we make up a tiny fraction of the "observers" (appropriately defined) in the universe.

Well, we don't want to build conscious AIs, so of course we don't want them to use anthropic reasoning.

Why is anthropic reasoning related to consciousness at all? Couldn't any kind of Bayesian reasoning system update on the observation of its own existence (assuming such updates are a good idea in the first place)?

Why do I think anthropic reasoning and consciousness are related?

In a nutshell, I think subjective anticipation requires subjectivity. We humans feel dissatisfied with a description like "well, one system running a continuation of the computation in your brain ends up in a red room and two such systems end up in green rooms" because we feel that there's this extra "me" thing, whose future we need to account for. We bother to ask how the "me" gets split up, what "I" should anticipate, because we feel that there's "something it's like to be me", and that (unless we die) there will be in future "something it will be like to be me". I suspect that the things I said in the previous sentence are at best confused and at worst nonsense. But the question of why people intuit crazy things like that is the philosophical question we label "consciousness".

However, the feeling that there will be in future "something it will be like to be me", and in particular that there will be one "something it will be like to be me", if taken seriously, forces us to have subjective anticipation, that is, to write a probability distribution, summing to one, over which copy we end up as. Once you do that, if you wake up in a green room in Eliezer's example, you are forced to update to 90% probability that the coin came up heads (provided you distributed your subjective anticipation evenly between all twenty copies in both the heads and tails scenarios, which really seems like the only sane thing to do).

Or, at least, the same amount of "something it is like to be me"-ness as we started with, in some ill-defined sense.

On the other hand, if you do not feel that there is any fact of the matter as to which copy you become, then you just want all your copies to execute whatever strategy is most likely to get all of them the most money from your initial perspective of ignorance of the coinflip.

Incidentally, the optimal strategy looks like a policy selected by updateless decision theory and not like any probability of the coin having been heads or tails. PlaidX beat me to the counter-example for p=50%. Counter-examples like PlaidX's will work for any p<90%, and counter-examples like Eliezer's will work for any p>50%, so that pretty much covers it. So, unless we want to include ugly hacks like responsibility, or unless we let the copies reason Goldenly (using Eliezer's original TDT) about each other's actions as transposed versions of their own actions (which does correctly handle PlaidX's counter-example, but might break in more complicated cases where no isomorphism is apparent), there simply isn't a probability-of-heads that represents the right thing for the copies to do no matter the deal offered to them.

Consciousness is really just a name for having a model of yourself which you can reflect on and act on - plus a whole bunch of other confused interpretations which don't really add much.

To do anthropic reasoning you have to have a simple model of yourself which you can reason about.

Machines can do this too, of course, without too much difficulty. That typically makes them conscious, though. Perhaps we can imagine a machine performing anthropic reasoning while dreaming - i.e. when most of its actuators are disabled, and it would not normally be regarded as being conscious. However, then, how would we know about its conclusions?

An AI that runs UDT wouldn't conclude that it was a Boltzmann or non-Boltzmann brain. For such an AI, the statement has no meaning, since it's always both. The closest equivalent would be "Most of the value I can create by making the right decision is concentrated in the vicinity of non-Boltzmann brains."

BTW, does my indexical uncertainty and the Axiom of Independence post make any more sense now?

This was my take after going through a similar analysis (with apples, not paperclips) at the SIAI summer intern program.

It seems promising that several people are converging on the same "updateless" idea. But sometimes I wonder why it took so long, if it's really the right idea, given the amount of brainpower spent on this issue. (Take a look at http://www.anthropic-principle.com/profiles.html and consider that Nick Bostrom wrote "Investigations into the Doomsday Argument" in 1996 and then did his whole Ph.D. on anthropic reasoning, culminating in a book published in 2002.)

BTW, weren't the SIAI summer interns supposed to try to write one LessWrong post a week (or was it a month)? What happened to that plan?

But sometimes I wonder why it took so long, if it's really the right idea, given the amount of brainpower spent on this issue.

People are crazy, the world is mad. Also inventing basic math is a hell of a lot harder than reading it in a textbook afterward.

People are crazy, the world is mad.

I suppose you're referring to the fact that we are "designed" by evolution. But why did evolution create a species that invented the number field sieve (to give a random piece of non-basic math) before UDT? It doesn't make any sense.

Also inventing basic math is a hell of a lot harder than reading it in a textbook afterward.

In what sense is it "hard"? I don't think it's hard in a computational sense, like NP-hard. Or is it? I guess it goes back to the question of "what algorithm are we using to solve these types of problems?"

No, I'm referring to the fact that people are crazy and the world is mad. You don't need to reach so hard for an explanation of why no one's invented UDT yet when many-worlds wasn't invented for thirty years.

I also don't think general madness is enough of an explanation. Both are counterintuitive ideas in areas without well-established methods to verify progress, e.g. building a working machine or standard mathematical proof techniques.

The OB/LW/SL4/TOElist/polymathlist group is one intellectual community drawing on similar prior work that hasn't been broadly disseminated.

The same arguments apply with much greater force to the causal decision theory vs evidential decision theory debate.

The interns wound up more focused on their group projects. As it happens, I had told Katja Grace that I was going to write up a post showing the difference between UDT and SIA (using my apples example which is isomorphic with the example above), but in light of this post it seems needless.

UDT is basically the bare definition of reflective consistency: it is a non-solution, just a statement of the problem in constructive form. UDT says that you should think exactly the same way as the "original" you thinks, which guarantees that the original you won't be disappointed in your decisions (reflective consistency). It only looks good in comparison to other theories that fail this particular requirement, but otherwise are much more meaningful in their domains of application.

TDT fails reflective consistency in general, but offers a correct solution in a domain that is larger than those of other practically useful decision theories, while retaining their expressivity/efficiency (i.e. updating on graphical models).

The OB/LW/SL4/TOElist/polymathlist group is one intellectual community drawing on similar prior work that hasn't been broadly disseminated.

What prior work are you referring to, that hasn't been broadly disseminated?

The same arguments apply with much greater force to the causal decision theory vs evidential decision theory debate.

I think much less brainpower has been spent on CDT vs EDT, since that's thought of as more of a technical issue that only professional decision theorists are interested in. Likewise, Newcomb's problem is usually seen as an intellectual curiosity of little practical use. (At least that's what I thought until I saw Eliezer's posts about the potential link between it and AI cooperation.)

Anthropic reasoning, on the other hand, is widely known and discussed (I remember the Doomsday Argument brought up during a casual lunch-time conversation at Microsoft), and thought to be both interesting in itself and having important applications in physics.

The interns wound up more focused on their group projects.

I miss the articles they would have written. :) Maybe post the topic ideas here and let others have a shot at them?

"What prior work are you referring to, that hasn't been broadly disseminated?"

I'm thinking of the corpus of past posts on those lists, which bring certain tools and concepts (Solomonoff Induction, anthropic reasoning, Pearl, etc) jointly to readers' attention. When those tools are combined and focused on the same problem, different forum participants will tend to use them in similar ways.

You might think that more top-notch economists and game theorists would have addressed Newcomb/TDT/Hofstadter superrationality given their interest in the Prisoner's Dilemma.

Looking at the actual literature on the Doomsday argument, there are some physicists involved (just as some economists and others have tried their hands at Newcomb), but it seems like more philosophers. And anthropics doesn't seem core to professional success, e.g. Tegmark can indulge in it a bit thanks to showing his stuff in 'hard' areas of cosmology.

I just realized/remembered that one reason that others haven't found the TDT/UDT solutions to Newcomb/anthropic reasoning may be that they were assuming a fixed human nature, whereas we're assuming an AI capable of self-modification. For example, economists are certainly more interested in answering "What would human beings do in PD?" than "What should AIs do in PD assuming they know each others' source code?" And perhaps some of the anthropic thinkers (in the list I linked to earlier) did invent something like UDT, but then thought "Human beings can never practice this, I need to keep looking."

This post is an argument against voting on your updated probability when there is a selection effect such as this. It applies to any evidence (marbles, existence etc), but only in a specific situation, so has little to do with SIA, which is about whether you update on your own existence to begin with in any situation. Do you have arguments against that?

It's for situations in which different hypotheses all predict that there will be beings subjectively indistinguishable from you, which covers the most interesting anthropic problems in my view. I'll make some posts distinguishing SIA, SSA, UDT, and exploring their relationships when I'm a bit less busy.

Are you saying this problem arises in all situations where multiple beings in multiple hypotheses make the same observations? That would suggest we can't update on evidence most of the time. I think I must be misunderstanding you. Subjectively indistinguishable beings arise in virtually all probabilistic reasoning. If there were only one hypothesis with one creature like you, then all would be certain.

The only interesting problem in anthropics I know of is whether to update on your own existence or not. I haven't heard a good argument for not (though I still have a few promising papers to read), so I am very interested if you have one. Will 'exploring their relationships' include this?

I think I'm with Bostrom.

The problem seems to come about because the good effects of 18 people being correct are more than wiped out by the bad effects of 2 people being wrong.

I'm sure this imbalance in the power of the agents has something to do with it.

What if, instead of requiring agreement of all copies in a green room, one copy in a green room was chosen at random to make the choice?

In this case the chosen copy in the green room should update on the anthropic evidence of being chosen to make the choice. That copy had a 1/18 probability of being chosen if the coin flip came up heads, and a 1/2 probability of being chosen if the coin flip came up tails, so the odds of heads:tails should be updated from 9:1 to 1:1. This exactly cancels the anthropic evidence of being in a green room.
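A sketch of that cancellation in odds form:

```python
# Waking in a green room gives 9:1 odds for heads; being the single copy chosen to decide
# contributes a likelihood ratio of (1/18) / (1/2) = 1/9, which cancels it exactly.
odds_after_green_room = 9.0
lr_being_chosen = (1 / 18) / (1 / 2)
print(odds_after_green_room * lr_being_chosen)   # 1.0, i.e. back to 1:1 (50%)
```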

"I've made sacrifices! You don't know what it cost me to climb into that machine every night, not knowing if I'd be the man in the box or in the prestige!"

sorry- couldn't help myself.

"I've made sacrifices! You don't know what it cost me to climb into that machine every night, not knowing if I'd be the man in the box or in the prestige!"

You know, I never could make sense out of that line. If you assume the machine creates "copies" (and that's strongly implied by the story up to that point), then that means every time he gets on stage, he's going to wind up in the box. (And even if the copies are error-free and absolutely interchangeable, one copy will still end up in the box.)

(Edit to add: of course, if you view it from the quantum suicide POV, "he" never ends up in the box, since otherwise "he" would not be there to try again the next night.)

More thinking out loud:

It really is in your best interest to accept the offer after you're in a green room. It really is in your best interest to accept the offer conditional on being in a green room before you're assigned. Maybe part of the problem arises because you think your decision will influence the decision of others, ie because you're acting like a timeless decision agent.

Replace "me" with "anyone with my platonic computation", and "I should accept the offer conditional on being in a green room" with "anyone with my platonic computation should accept the offer, conditional on anyone with my platonic computation being in a green room." But the chance of someone with my platonic computation being in a green room is 100%.

Or, to put it another way, the Platonic Computation is wondering "Should I accept the offer conditional on any one of my instantiations being in a green room?". But the Platonic Computation knows that at least one of its instantiations will be in a green room, so it declines the offer. If the Platonic Computation was really a single organism, its best option would be to single out one of its instantiations before-hand and decide "I will accept the offer, given that Instantiation 6 is in a green room" - but since most instantiations of the computation can't know the status of Instantiation 6 when they decide, it doesn't have this option.

Yes, exactly.

If you are in a green room and someone asks you if you will bet that a head was flipped, you should say "yes".

However, if that same person asks you if they should bet that heads was flipped, you should answer no if you ascertain that they asked you on the precondition that you were in a green room.

  • the probability of heads | you are in a green room = 90%

  • the probability of you betting on heads | you are in a green room = 100% = no information about the coin flip

Your first claim needs qualifications: You should only bet if you're being drawn randomly from everyone. If it is known that one random person in a green room will be asked to bet, then if you wake up in a green room and are asked to bet you should refuse.

P(Heads | you are in a green room) = 0.9
P(Being asked | Heads and Green) = 1/18, P(Being asked | Tails and Green) = 1/2
Hence P(Heads | you are asked in a green room) = 0.5

Of course the OP doesn't choose a random individual to ask, or even a random individual in a green room. The OP asks all people in green rooms in this world.

If there is confusion about when your decision algorithm "chooses", then TDT/UDT can try to make the latter two cases equivalent, by thinking about the "other choices I force". Of course the fact that this asserts some variety of choice for a special individual and not for others, when the situation is symmetric, suggests something is being missed.

What is being missed, to my mind, is a distinction between the distribution of (random individuals | data is observed), and the distribution of (random worlds | data is observed).

In the OP, the latter distribution isn't altered by the update, as the observed data occurs somewhere with probability 1 in both cases. The former is altered, because it cares about the number of copies in the two cases.

I've been watching for a while, but have never commented, so this may be horribly flawed, opaque or otherwise unhelpful.

I think the problem is entirely caused by the use of the wrong sets of belief, and that anything holding to Eliezer's 1-line summary of TDT or alternatively UDT should get this right.

Suppose that you're a rational agent. Since you are instantiated in multiple identical circumstances (green rooms) and asked identical questions, your answers should be identical. Hence if you wake up in a green room and you're asked to steal from the red rooms and give to the green rooms, you either commit a group of 2 of you to a loss of 52 or commit a group of 18 of you to a gain of 12.

This committal is what you wish to optimise over from TDT/UDT, and clearly this requires knowledge about the likelihood of different decision making groups. The distribution of sizes of random groups is not the same as the distribution of sizes of groups that a random individual is in. The probabilities of being in a group are upweighted by the size of the group and normalised. This is why Bostrom's suggested 1/n split of responsibility works; it reverses the belief about where a random individual is in a set of decision making groups to a belief about the size of a random decision making group.

By the construction of the problem the probability that a random (group of all the people in green rooms) has size 18 is 0.5, and similarly for 2 the probability is 0.5. Hence the expected utility is (0.5 * 12) + (0.5 * -52) = -20.

If you're asked to accept a bet on there being 18 people in green rooms, and you're told that only you're being offered it, then the decision commits exactly one instance of you to a specific loss or gain, regardless of the group you're in. Hence you can't do better than the 0.9 and 0.1 beliefs.

If you're told that the bet is being offered to everyone in a green room, then you are committing to n times the outcome in any group of n people. In this case gains are conditional on group size, and so you have to use the 0.5-0.5 belief about the distribution of groups. It doesn't matter because the larger groups have the larger multiplier and thus shutting up and multiplying yields the same answers as a single-shot bet.
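A sketch of that "same answer either way" point, for a generic bet paying +a if there are 18 people in green rooms and -b if there are 2 (a and b are illustrative parameters, not taken from the post):

```python
# Single-shot bet (only this copy is offered it) vs. the same bet offered to every green-roomer,
# whose total stakes then scale with the size of the green group (18 or 2).
def single_shot(a, b):
    return 0.9 * a - 0.1 * b                 # anthropic 0.9/0.1 belief, one instance bound

def everyone_in_green(a, b):
    return 0.5 * (18 * a) - 0.5 * (2 * b)    # 0.5/0.5 over worlds, payoff times group size

for a, b in [(1, 3), (1, 8), (1, 10)]:
    print(single_shot(a, b), everyone_in_green(a, b))   # group value is always 10x the single-shot value
```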

ETA: At some level this is just choosing an optimal output for your calculation of what to do, given that the result is used variably widely.

This committal is what you wish to optimise over from TDT/UDT, and clearly this requires knowledge about the likelihood of different decision making groups.

I was influenced by the OP and used to think that way. However I think now, that this is not the root problem.

What if the agents get more complicated decision problems: for example, rewards depending on the parity of the agents voting a certain way, etc.?

I think what's essential is that the agents have to think globally (categorical imperative, hmmm?)

Practically: if the agent recognizes that there is a collective decision, then it should model all available conceivable protocols (but making sure a priori that all cooperating agents perform the same or a compatible analysis, if they can't communicate) and then they should choose the protocol with the best overall total gain. In the case of the OP: the second calculation in the OP. (Not messing around with correction factors based on responsibilities, etc.)

Special considerations based on group sizes etc. may be incidentally correct in certain situations, but this is just not general enough. The crux is that the ultimate test is simply the expected value computation for the protocol of the whole group.

Between non communicating copies of your decision algorithm, it's forced that every instance comes to the same answers/distributions to all questions, as otherwise Eliezer can make money betting between different instances of the algorithm. It's not really a categorical imperative, beyond demanding consistency.

The crux of the OP is asking for a probability assessment of the world, not whether the DT functions.

I'm not postulating 1/n allocation of responsibility; I'm stating that the source of the confusion is conflating P(A random individual is in a world of class A_i | Data) with P(A random world is of class A_i | Data), and that these are not equal if the number of individuals with access to Data is different in distinct classes of world.

Hence in this case, there are 2 classes of world, A_1 with 18 Green rooms and 2 Reds, and A_2 with 2 Green rooms and 18 Reds.

P(Random individual is in the A_1 class | Woke up in a green room) = 0.9, by anthropic update.
P(Random world is in the A_1 class | Some individual woke up in a green room) = 0.5.

Why? Because in A_1, 18/20 individuals fit the description "Woke up in a green room", but in A_2 only 2/20 do.
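In code, the two conditional probabilities being distinguished:

```python
# P(a random individual is in an A_1 world | that individual woke up in a green room)
p_individual = (0.5 * 18 / 20) / (0.5 * 18 / 20 + 0.5 * 2 / 20)
print(p_individual)   # 0.9

# P(a random world is of class A_1 | someone in it woke up in a green room)
# Both world-classes contain green rooms with certainty, so the observation is uninformative.
p_world = (0.5 * 1.0) / (0.5 * 1.0 + 0.5 * 1.0)
print(p_world)        # 0.5
```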

The crux of the OP is that neither a 90/10 nor a 50/50 split seems acceptable, if betting on "Which world-class an individual in a Green room is in" and "Which world-class the (set of all individuals in Green rooms which contains this individual) is in" are identical. I assert that they are not. The first case is 0.9/0.1 A_1/A_2, the second is 0.5/0.5 A_1/A_2.

Consider a similar question where a random Green room will be asked. If you're in that room, you update both on (Green walls) and (I'm being asked) and recover the 0.5/0.5, correctly. This is close to the OP, since if we wildly assert that you and only you have free will and force the others, then you are special. Equally in cases where everyone is asked and plays separately, you have 18 or 2 times the benefits depending on whether you're in A_1 or A_2.

If each individual Green room played separately, then you update on (Green walls), but P(I'm being asked|Green) = 1 in either case. This is betting on whether there are 18 people in green rooms or 2, and you get the correct 0.9/0.1 split. To reproduce the OP the offers would need to be +1/18 to Greens and -3/18 from Reds in A_1, and +1/2 to Greens and -3/2 from Reds in A_2, and then you'd refuse to play, correctly.

"Hence if you wake up in a green room and you're asked to steal from the red rooms and give to the green rooms, you either commit a group of 2 of you to a loss of 52 or commit a group of 18 of you to a gain of 12."

In the example you care equally about the red room and green room dwellers.

Hence if there are 2 instances of your decision algorithm in Green rooms, there are 2 runs of your decision algorithm, and if they vote to steal there is a loss of 3 from each red and a gain of 1 for each green, for a total gain of 1 * 2 - 3 * 18 = -52.

If there are 18 instances in Green rooms, there are 18 runs of your decision algorithm, and if they vote to steal there is a loss of 3 from each red and a gain of 1 for each green, for a total gain of 1 * 18 - 3 * 2 = 12.

The "committal of a group of" is noting that there are 2 or 18 runs of your decision algorithm that are logically forced by the decision made this specific instance of the decision algorithm in a green room.

Curses on this problem; I spent the whole day worrying about it, and am now so much of a wreck that the following may or may not make sense. For better or worse, I came to a similar conclusion of Psy-Kosh: that this could work in less anthropic problems. Here's the equivalent I was using:

Imagine Omega has a coin biased so that it comes up the same way nine out of ten times. You know this, but you don't know which way it's biased. Omega allows you to flip the coin once, and asks for your probability that it's biased in favor of heads. The coin comes up heads. You give your probability as 9/10.

Now Omega takes 20 people and puts them in the same situation as in the original problem. It lets each of them flip their coins. Then it goes to each of the people who got tails, and offers $1 to charity for each coin that came up tails, but threatens to steal $3 from charity for each coin that came up heads.

This nonanthropic problem works the same way as the original anthropic problem. If the coin is really biased heads, 18 people will get heads and 2 people will get tails. In this case, the correct subjective probability to assign is definitely 9/10 in favor of whatever result you got; after all, this is the correct probability when you're the only person in the experiment, and just knowing that 19 other people are also participating in the experiment shouldn't change matters.

I don't have a formal answer for why this happens, but I can think of one more example that might throw a little light on it. In another thread, someone mentioned that lottery winners have excellent evidence that they are brains-in-a-vat and that the rest of the world is an illusion being put on by the Dark Lord of the Matrix for their entertainment. After all, if this was true, it wouldn't be too unlikely for them to win the lottery, so for a sufficiently large lottery, the chance of winning it this way exceeds the chance of winning it through luck.

Suppose Bob has won the lottery and so believes himself to be a brain in a vat. And suppose that the evidence for the simulation argument is poor enough that there is no other good reason to believe yourself to be a brain in a vat. Omega goes up to Bob and asks him to take a bet on whether he is a brain in a vat. Bob says he is, he loses, and Omega laughs at him. What did he do wrong? Nothing. Omega was just being mean by specifically asking the one person who ve knew would get the answer wrong.

Omega's little prank would still work if ve announced ver intention to perform it beforehand. Ve would say "When one of you wins the lottery, I will be asking this person to take a bet whether they are a brain in a vat or not!" Everyone would say "That lottery winner shouldn't accept Omega's bet. We know we're not brains in vats." Then someone wins the lottery, Omega asks if they're a brain in a vat, and they say yes, and Omega laughs at them (note that this also works if we consider a coin with a bias such that it lands the same way 999999 out of a million times, let a million people flip it once, and ask people what they think the coin's bias is, asking the people who get the counter-to-expectations result more often than chance.)

Omega's being equally mean in the original problem. There's a 50% chance ve will go and ask the two out of twenty people who are specifically most likely to be wrong and can't do anything about it. The best course I can think of would be for everyone to swear an oath not to take the offer before they got assigned into rooms.

Then someone wins the lottery, Omega asks if they're a brain in a vat, and they say yes, and Omega laughs at them

By assumption, if the person is right to believe they're in a sim, then most of the lottery winners are in sims, so while Omega laughs at them in our world, they win the bet with Omega in most of their worlds.

wrong and can't do anything about it

should have been your clue to check further.

This is a feature of the original problem, isn't it?

Let's say there are 1000 brains in vats, each in their own little world, and a "real" world of a billion people. The chance of a vat-brain winning the lottery is 1, and the chance of a real person winning the lottery is 1 in a million. There are 1000 real lottery winners and 1000 vat lottery winners, so if you win the lottery your chance of being in a vat is 50-50. However, if you look at any particular world, the chances of this week's single lottery winner being a brain in a vat is 1000/1001.

Assume the original problem is run multiple times in multiple worlds, and that the value of pi somehow differs in those worlds (probably you used pi precisely so people couldn't do this, but bear with me). Of all the people who wake up in green rooms, 18/20 of them will be right to take your bet. However, in each particular world, the chances of the green room people being right to take the bet is 1/2.

In this situation there is no paradox. Most of the people in the green rooms come out happy that they took the bet. It's only when you limit it to one universe that it becomes a problem. The same is true of the lottery example. When restricted to a single (real, non-vat) universe, it becomes more troublesome.

Now Omega takes 20 people and puts them in the same situation as in the original problem. It lets each of them flip their coins. Then it goes to each of the people who got tails, and offers $1 to charity for each coin that came up tails, but threatens to steal $3 from charity for each coin that came up heads.

It's worth noting that if everyone got to make this choice separately - Omega doing it once for each person who responds - then it would indeed be wise for everyone to take the bet! This is evidence in favor of either Bostrom's division-of-responsibility principle, or byrnema's pointer-based viewpoint, if indeed those two views are nonequivalent.

Bostrom's calculation is correct, but I believe it is an example of multiplying by the right coefficients for the wrong reasons.

I did exactly the same thing -- multiplied by the right coefficients for the wrong reasons -- in my deleted comment. I realized that the justification of these coefficients required a quite different problem (in my case, I modeled that all the green roomers decided to evenly divide the spoils of the whole group) and the only reason it worked was because multiplying the first term by 1/18 and the next term by 1/2 meant you were effectively canceling away the factors that represented your initial 90% posterior, and thus ultimately just applying the 50/50 probability of the non-anthropic solution.

Anthropic calculation:

(18/20)(12) + (2/20)(-52) = 5.6

Bostrom-modified calculation for responsibility per person:

[(18/20)(12/18) + (2/20)(-52/2)] / 2 = -1

Non-anthropic calculation for EV per person:

[(1/2)(12) + (1/2)(-52)] / 20 = -1
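For comparison, a sketch of the three calculations side by side in code:

```python
heads_world = 18 * (+1) + 2 * (-3)    # +12
tails_world = 2 * (+1) + 18 * (-3)    # -52

anthropic = 0.9 * heads_world + 0.1 * tails_world                          # +5.6
bostrom_per_person = (0.9 * heads_world / 18 + 0.1 * tails_world / 2) / 2  # -1.0
non_anthropic_per_person = (0.5 * heads_world + 0.5 * tails_world) / 20    # -1.0
print(anthropic, bostrom_per_person, non_anthropic_per_person)
```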

My pointer-based viewpoint, in contrast, is not a calculation but a rationale for why you must use the 50/50 probability rather than the 90/10 one. The argument is that each green roomer cannot use the information that they were in a green room because this information was preselected (a biased sample). With effectively no information about what color room they're in, each green roomer must resort to the non-anthropic calculation that the probability of flipping heads is 50%.