*Cross posted from Overcoming Bias. Comments there.*

***

*Warning: this post is technical.*

Suppose you know that there are a certain number of planets, N. You are unsure about the truth of a statement Q. If Q is true, you put a high probability on life forming on a given arbitrary planet. If Q is false, you put a low probability on this. You have a prior probability for Q. So far you have not taken into account your observation that the planet you are on has life. How do you update on this evidence, to get a posterior probability for Q? Since you don’t know which is ‘this’ planet, with respect to the model, you can’t update directly on ‘there is life on this planet’, by excluding worlds where this planet doesn’t have life. And you can’t necessarily treat ‘this’ as an arbitrary planet, since you wouldn’t have seen it if it didn’t have life.

I have an ongoing disagreement with an associate who suggests that you should take ‘this planet has life’ into account by conditioning on ‘there exists a planet with life’. That is,

P(Q|there is life on this planet) = P(Q|there exists a planet with life).
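To see what the associate's rule implies numerically, here is a minimal sketch. It assumes (this is not stated in the post) that planets develop life independently given Q, and the function name and the probabilities are purely illustrative.

```python
def posterior_q_given_some_life(prior_q, p_life_q, p_life_not_q, n_planets):
    """P(Q | at least one of n_planets has life).

    Assumes each planet gets life independently with probability
    p_life_q if Q is true and p_life_not_q if Q is false.
    """
    # Likelihood of "there exists a planet with life" under each hypothesis.
    like_q = 1 - (1 - p_life_q) ** n_planets
    like_not_q = 1 - (1 - p_life_not_q) ** n_planets
    numerator = prior_q * like_q
    return numerator / (numerator + (1 - prior_q) * like_not_q)


# Hypothetical numbers: prior 0.5, life probability 0.5 under Q, 0.001 otherwise.
for n in (1, 10, 1000, 100000):
    print(n, posterior_q_given_some_life(0.5, 0.5, 0.001, n))
```

With these made-up numbers the posterior starts near 1 for a single planet and drifts back toward the prior as N grows, because "some planet has life" becomes nearly certain whether or not Q holds.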

Here I shall explain my disagreement.

Nick Bostrom argues persuasively that much science would be impossible if we treated ‘I observe X’ as ‘someone observes X’. This is basically because in a big world of scientists making measurements, at some point somebody will make most mistaken measurements. So if all you know when you measure the temperature of a solution to be 15 degrees is that you are not in a world where nobody ever measures its temperature to be 15 degrees, this doesn’t tell you much about the temperature.

You can add other apparently irrelevant observations you make at the same time – e.g. that the table is blue chipboard – in order to make your total observations less likely to arise once in a given world (at its limit, this is the suggestion of full non-indexical conditioning, FNC). However it seems implausible that you should make different inferences from taking a measurement when you can also see a detailed but irrelevant picture at the same time than those you make with limited sensory input. Also the same problem re-emerges if the universe is supposed to be larger. Given that the universe is thought to be very, very large, this is a problem. Not to mention, it seems implausible that the size of the universe should greatly affect probabilistic judgements made about entities which are close to independent from most of the universe.

So I think Bostrom’s case is good. However I’m not completely comfortable arguing from the acceptability of something that we do (science) back to the truth of the principles that justify it. So I’d like to make another case against taking ‘this planet has life’ as equivalent evidence to ‘there exists a planet with life’.

Evidence is what excludes possibilities. Seeing the sun shining is evidence against rain, because it excludes the possible worlds where the sky is grey, which include most of those where it is raining. Seeing a picture of the sun shining is not much evidence against rain, because it excludes worlds where you don’t see such a picture, which are about as likely to be rainy or sunny as those that remain are.

Receiving the evidence ‘there exists a planet with life’ means excluding all worlds where all planets are lifeless, and not excluding any other worlds. At first glance, this must be different from ‘this planet has life’. Take any possible world where some other planet has life, and this planet has no life. ‘There exists a planet with life’ doesn’t exclude that world, while ‘this planet has life’ does. Therefore they are different evidence.

At this point however, note that the planets in the model have no distinguishing characteristics. How do we even decide which planet is ‘this planet’ in another possible world? There needs to be some kind of mapping between planets in each world, saying which planet in world A corresponds to which planet in world B, etc. As far as I can tell, any mapping will do, as long as a given planet in one possible world maps to at most one planet in another possible world. This mapping is basically a definition choice.

So suppose we use a mapping where in every possible world where at least one planet has life, ‘this planet’ corresponds to one of the planets that has life. See the below image.

Now learning that there exists a planet with life is the same as learning that this planet has life. Both exclude the far righthand possible world, and none of the other possible worlds. What’s more, since we can change the probability distribution we end up with, just by redefining which planets are ‘the same planet’ across worlds, indexical evidence such as ‘this planet has life’ must be horseshit.

Actually the last paragraph was false. If in every possible world which contains life, you pick one of the planets with life to be ‘this planet’, you can no longer know whether you are in ‘this planet’. From your observations alone, you could be on the other planet, which only has life when both planets do. The one that is not circled in each of the above worlds. Whichever planet you are on, you know that there exists a planet with life. But because there’s some probability of you being on the planet which only rarely has life, you have more information than that. Redefining which planet was which didn’t change that.

Perhaps a different definition of ‘this planet’ would get what my associate wants? The problem with the last was that it no longer necessarily included the planet we are on. So what if we define ‘this planet’ to be the one you are on, plus a life-containing planet in all of the other possible worlds that contain at least one life-containing planet? A strange, half-indexical definition, but why not? One thing remains to be specified – which is ‘this’ planet when you don’t exist? Let’s say it is chosen randomly.

Now is learning that ‘this planet’ has life any different from learning that some planet has life? Yes. Now again there are cases where some planet has life, but it’s not the one you are on. This is because the definition only picks out planets with life across other possible worlds, not this one. In this one, ‘this planet’ refers to the one you are on. If you don’t exist, this planet may not have life. Even if there are other planets that do. So again, ‘this planet has life’ gives more information than ‘there exists a planet with life’.

You either have to accept that someone else might exist when you do not, or you have to define ‘yourself’ as something that always exists, in which case you no longer know whether you are ‘yourself’. Either way, changing definitions doesn’t change the evidence. Observing that you are alive tells you more than learning that ‘someone is alive’.

Isn't "I observe X" equivalent to "someone chosen for reasons unrelated to this observation observed X"? That solves the "at some point somebody will make most mistaken measurements" problem because the likelihood of randomly choosing the scientist making that mistake is small.
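A rough sketch of the contrast being drawn here, with invented numbers: it compares how strongly "someone reads 15 degrees" and "a scientist chosen independently of the result reads 15 degrees" favour the hypothesis that the temperature really is 15. The function names, the independence of scientists, and the error rates are all assumptions made for illustration.

```python
def likelihood_ratio_someone(n_scientists, p_read_if_true, p_read_if_false):
    """Likelihood ratio for 'at least one of n_scientists reads 15'.

    Assumes scientists measure independently of one another.
    """
    p_if_true = 1 - (1 - p_read_if_true) ** n_scientists
    p_if_false = 1 - (1 - p_read_if_false) ** n_scientists
    return p_if_true / p_if_false


def likelihood_ratio_random(p_read_if_true, p_read_if_false):
    """Likelihood ratio for 'a scientist picked regardless of outcome reads 15'."""
    return p_read_if_true / p_read_if_false


# Hypothetical rates: a correct reading 95% of the time, a mistaken '15' 1% of the time.
for n in (1, 100, 10000):
    print(n, likelihood_ratio_someone(n, 0.95, 0.01), likelihood_ratio_random(0.95, 0.01))
```

Under these assumptions the "someone" ratio collapses toward 1 as the number of scientists grows, while the randomly chosen scientist's ratio does not depend on how many other scientists there are.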

You can't use this logic for observations of the form "I'm alive" because if you weren't alive you wouldn't be observing. What you can use that as evidence of is a hard problem. But it isn't a general problem.

Perhaps, but then there is the question of how you should pretend they were chosen. This is controversial.

If you weren't alive you wouldn't be observing "I'm alive". If X wasn't true you wouldn't be observing X. Could you be more clear on how you think the logic differs?

Slight double-meaning in the word observing:

When I said "if you weren't alive you wouldn't be observing" I meant you wouldn't be seeing whether you were alive or not.

When you said "If X wasn't true you wouldn't be observing X" you meant you wouldn't be seeing that X is true.

I'm finding my second paragraph surprisingly hard to reword.

If your existence depends on X, there are two possibilities: you observe X, you observe nothing

If your existence doesn't depend on X but you have some other way of observing whether X is true, the possibilities are: you observe X, you observe not X.

Do you think that observing X provides different information about something else in these two cases?

What is the advantage of talking about "this planet" versus standard anthropic SIA as you have used so many times on your blog and elsewhere?

I mean, I can see the disadvantages, those being that it's really hard to get a good definition of "this planet" that remains constant across universes, especially between universes with different numbers of planets, universes in which you don't exist, universes in which multiple yous exist, etc.

But with SIA, you can just rephrase it as "I was born on a certain planet, presumably selected randomly among planets in the multiverse that have life, and I call it 'this planet' because I was born there."

("This is a fertile planet, and we will thrive. We will rule over this planet, and we will call it...This Planet.")

Now on your diagram, "a planet has life" gives you a 33% chance of being in frames A, B, or C, and "This planet has life" under the previous equivalence with SIA means you choose a randomly selected pink planet and get 50% chance of being in frame A, 25% chance in frame B, and 25% chance in frame C, which justifies your statement that there should be a difference.

This also solves the scientist's problem just as dspeyer mentions.
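The 33% versus 50/25/25 figures can be checked with a short enumeration. This sketch assumes two planets, an equal prior over the four possible worlds, and that frame A is the world where both planets have life; the labels B, C and D for the remaining worlds are guesses about the diagram.

```python
from fractions import Fraction

# Each world records whether planet 1 and planet 2 have life (labels assumed).
worlds = {
    "A": (True, True),
    "B": (True, False),
    "C": (False, True),
    "D": (False, False),
}
prior = {name: Fraction(1, 4) for name in worlds}


def normalise(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# "A planet has life": keep every world with at least one living planet.
exists_update = normalise(
    {name: prior[name] * (1 if any(life) else 0) for name, life in worlds.items()}
)

# SIA-style "this planet has life": weight each world by its number of living planets.
sia_update = normalise(
    {name: prior[name] * sum(life) for name, life in worlds.items()}
)

print(exists_update)
print(sia_update)
```

The first update spreads probability evenly over A, B and C; the second gives A twice the weight of B or C, matching the numbers in the comment.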

I don't follow why your rephrasing is SIA-specific.

Here I'm not arguing for SIA in particular, just against the position that you should only update when your observations completely exclude a world (i.e. 'non-indexical' updating, as in Radford Neal's 'full non-indexical conditioning' for instance). If we just talk about the evidence of existence, before you know anything else about yourself (if that's possible) SSA also probably says you shouldn't update, though it does say you should update on other such evidence in the way I'm arguing, so doesn't have the same problems as this non-indexical position.

I'm addressing this instead of the usual question because I want to settle the debate.

My response to this would be:

From 2: (now 2 layers of indirection to avoid updating on my own argument until later):

Here I stop and summarise:

Suppose (there are) N planets (called "N" in totality). Q can be true or not-true. If Q, observe life. If not-Q, observe no-life. <-: already falsified by evidence. :-> if not-Q, "small finite number compared to N" of planets with life. :: Q cannot be false. Question: Can Q be P=1? Yes, as P=1 is just a logical, technical criterion and not necessarily relevant to a real world except in theory. Can Q be logically true? No, as that excludes the nuance of "how many planets out of N have life" which is the entire interesting part of the question. -> using "there is actually a difference between 'logical certainty' and 'P=1'.".

So the question so far is to construct a prior CDF based on the previously quoted text. Since N is a finite, specific number of planets, this could be done by exhaustively checking each case, in each case for each N. Suppose N=1. Done.

Suppose N=2. Then n (number of planets) = either 1 or 2. Is it 1? yes. Is it 2? life has been observed on comets. Therefore likely to be 2, if N were to be "much larger" than 1. If N=2 then either the comets came from the 1 other planet or from the 1 planet already with life. A priori much more likely that N=1 in this case, given that life is observed to be a surprisingly rare phenomenon, however there must be some probability mass assigned to the idea that n=N=2, given that our previously described reasoning has some relevance to the question we are actually interested in, which is "N = some very large number roughly about the size of the number of planets we observe to be probably there in the universe or something".

Suppose N is some large number compared to 1, 2, etc. Either N is prime or it can be divided by some factors up to (sqrt N). Either way, it can be "added up to" by using only 1, 2, 3, and 4, or some other small subset of numbers less than 10 like 6, 5, 2. ::<- implicitly defines subtraction.:: <->. If division is also allowed as an operation, then N either has a prime factorisation which can be calculated fairly straightforwardly or is prime.

If N is prime, then we should only use linear operations to obtain our probability distribution. If it is not prime, we may use nonlinear methods in addition. Either way, we can use both, and concurrently run the calculation to see whether some specific N is prime. Or we may choose a large N directly which has been already shown to be prime or not prime. Suppose N is 10,000,000,000,000,000,000,000,000. This is known to be not prime, and would likely be considered large compared to 1, 2, etc. We may also choose N as "some prime number close to about that value" and then apply only the linear part of the logic, and this would give us a close estimate for that N, which we can then apply some form of insertion sort/expansion/contraction/interpolation using all the tools available to us in accordance with the rules of probability theory to obtain a "best estimate" for the prime N which doesn't require much extra calculation, and is likely good enough for cosmological estimates. See https://xkcd.com/2205/. Remember that after obtaining this prior we can "just multiply" to update it based on further observations. This is probably why it's a good idea to get the prior from a very small number of observations if possible....

Now that we have worked out how we *would* calculate an example, it is not necessary to do so yet as this can be done after (and indeed, should be) writing down the full response, because it may turn out not to be necessary to answer the actual, quoted question which this article is about.

So what is "the rough shape" of our prior, given the response *I myself* have written so far?

Well, if the stated observations:

are taken as a starting point, then we can make a rough prior for Q, which is roughly that "n is small compared to N." This is equivalent to saying that "life is unlikely" as there are much more big numbers than small numbers, and on a uniform distribution n would likely be not small (ie, within a few orders of magnitude) compared to N. "What does the evidence say about whether life is unlikely?" is now a relevant question for our larger question of the informativeness of the original question about Q.

Separately, N may not be finite, and we are interested in the actual question of the article in this case too. So we're not actually *that* interested in the previous stuff, but as a prior we have that "life is unlikely even for infinite N" but that would still mean that for infinite N there would be an infinity of life.

It seems more important, by the numbers, to consider first the case of infinite N, which I will do in a reply.

Now, to restate the original "thing" we were trying to honestly say we had a prior for:

Does this work, given this and our response?

We do not actually have a prior for Q, but we have a rough prior for a highly related question Q', which can be transformed likely fairly easily into a prior for Q using mechanical methods. So let's do that "non-mechanically" by saying:

simultaneous(not just concurrent) algorithms). ::-> (predicted continuation point)-> [Now to be able to say "we have a prior," we have to write the continuation from 3. until the meaning of both

From our previous response, our prior for finite N and a small amount of evidence was that "life is unlikely" (because there were two separate observationally consistent ways which resulted in 'the same' answer in some equivalence or similarity sense). For infinite N, it looks like "n is of lower dimension in some way" (dimension here meaning "bigness") than N.

Now we have a prior for both, so we can try to convert back to a prior for the original proposition Q, which was:

Our prior is that Q is false.]

In retrospect, the preceding (now in square brackets, which were edited in) could be considered a continuation of 3. So we are OK in all 5 ways, and we have a prior, so we can continue responding to the article.

(To be continued in reply)

At least on a semi-superficial glance, you seem to be switching between using "I" / "this planet" as rigid designators in some places and as indexicals/demonstratives/non-rigid designators (i.e. "whatever this thing here is") in other places. This may be at least part of what made this post seem unconvincing -- e.g. there is nothing weird about being uncertain about "you == you" if by that you mean "whatever this thing here is == Katja Grace".

Are you familiar with Stuart Armstrong's work on anthropics?

Yes. Anything in particular there you think is relevant?

Well first, Stuart discusses these problems in terms of decision theory rather than probability. I think this is a better way of approaching this, as it avoids pointless debates over, e.g., the probability that Sleeping Beauty's coin landed heads when all participants agree as to how she should act, as well as more complicated dilemmas where representing knowledge using probabilities just confuses people.

That said, your ideas could easily be rephrased as decision theoretic rather than epistemic. The framework in Stuart's paper would suggest imagining what strategy a hypothetical agent with your goals would plan 'in advance' and implementing that. I guess it might not be obvious that this gives the correct solution, but the reasons that I think it does come from UDT, which I cannot explain in the space of this comment. There's a lot available about it on the LW wiki, though alternatively you might find it obvious that the framing in terms of a hypothetical agent is equivalent. (Stuart's proposed ADT may or may not be equivalent to UDT; it is unclear whether he intends for precommitments to be able to deal with something like a variant of Parfit's hitchhiker where the driver decides what to do before the hitchhiker comes into existence, but it seems that they wouldn't. The differences are minor enough anyways.)

You propose an alternative anthropic framework, which indicates that you either disagree that the hypothetical agent framing is equivalent or you disagree that Stuart's suggestion is the correct way for such an agent to act in such a scenario.

Obligatory warning of deep flaws.

You provide very little information. I'm not even sure what you disagree with exactly. If it would be inconvenient for you to explain your disagreement that's fine, but I didn't update much on your comment. If you want to give me a bit more information about your state of mind, you can tell me how familiar you are with UDT and whether you think it is on the right path towards solving anthropics.

Yeah, sorry about that. The basic idea is that by providing multiple answers, the proposal has immediately given up on getting the same answer as an expected utility maximizer. This is a perfectly fine thing to do if probabilities are impossible to assign, and so maximizing expected utility breaks down. But probability does not in fact break down when confronted with anthropic situations, so picking from N answers just gives you at least an (N-1)/N chance of being wrong.

Expected utility *does* break down in the presence of indexical uncertainty though; if there are multiple agents with exactly your observations, it is important to take into account that your decision is the one they will all make. Psy-Kosh's non-anthropic problem deals with this sort of thing, though it also points out that such correlation between agents can exist even without indexical uncertainty, which is irrelevant here.

I'm not sure what the N answers that you are talking about are. The different solutions in Stuart's paper refer to agents with different utility functions. Changing the utility function usually does change the optimal course of action.

Psy-Kosh's non-anthropic problem is just regular uncertainty. The experimenters flipped a coin, and you don't know if the coin is heads or tails. The collective decision making then runs you into trouble. I can't think of any cases with indexical uncertainty but no collective decision making that run into similar trouble - in the Sleeping Beauty problem at least the long-term frequency of events is exactly the same as the thing you plug into the utility function to maximize average reward, unlike in the non-anthropic problem. Do you have an example you could give?

EDIT: Oh, I realized one myself - the absent-minded driver problem. In that problem, if you assign utility to the driver at the first intersection - rather than just the driver who makes it to an exit - you end up double-counting and getting the wrong answer. In a way it's collective decision-making with yourself - you're trying to take into account how past-you affected present-you, and how present-you will affect future-you, but the simple-seeming way is wrong. In fact, we could rejigger the problem so it's a two-person, non anthropic problem! Then if we do the reverse transform on Psy-kosh's problem, maybe we could see something interesting... Update forthcoming, but the basic idea seems to be that the problem is when you're cooperating with someone else, even yourself, and are unsure who's filling what role. So you're pretty much right.

The objects in Stuart's paper are decision procedures, but do not involve utility directly (though it is a theorem that you can find a utility function that gives anything). Utility functions have to use a probability before you get a decision out, but these decision procedures don't. Moreover, he uses terms like "average utilitarian" to refer to the operations of the decision procedure (averages individual utilities together), rather than the properties of the hypothetical corresponding utility function.

What's happening is that he's taking an individual utility function and a decision procedure, and saying that together these specify what happens. And I'm saying that this is an over-specified problem.

That's what I was originally trying to suggest, but it seems I was unclear. The absent-minded driver is a simpler example anyways, and does deal with exactly the kind of breakdown of expected utility I was referring to.

Each decision procedure is derived from a utility function. From the paper:

This fully specifies a decision procedure given a utility function. There is no second constraint taking the utility function into account again, so it is not overspecified.

Simpler? Hm. Well, I'm still thinking about that one.

Anyhow, by over-specified I mean that ADT and conventional expected-utility maximization (which I implicitly assumed to come with the utility function) can give different answers. For example, in a non-cooperative problem like copying someone either once or 10^9 times, and then giving the copy a candybar if it can correctly guess how many of them there are. The utility function already gives an answer, and no desiderata are given that show why that's wrong - in fact, it's one of the multiple possible answers laid out.

Simpler in that you don't need to transform it before it is useful here.

Standard expected utility maximization requires a probability distribution, but the problem is that in anthropic scenarios it is not obvious what the correct distribution is and how to correctly update it. ADT uses the prior distribution *before* 'observing one's own existence', so it circumvents the need to perform anthropic updates.

I'm not sure which solution to your candybar problem you think is correct because I am not sure which probability distribution you think is correct, but all the solutions in the paper that disagree with yours actually are what you would want to precommit to given the associated utility function and are therefore correct.

If it was solved in a way that made it obvious for, say, the Sleeping Beauty problem, would that then be the right way to do it?

I think you're just making up utility functions here - is a real utility function (that is, a function of the state of the world) ever calculated in the paper, other than the use of the individual utility function? And if we're talking about regular ol' utility functions, why are ADT's decisions necessarily invariant under changing time-like uncertainty (normal sleeping beauty problem) to space-like uncertainty (sleeping beauty problem with duplicates)?

I would tentatively agree. To some extent the problem is one of choosing what it means for a distribution to be correct. I think that this is what Stuart's ADT does (though I don't think it's a full solution to this).

You would also still need to account for acausal influence. Just picking a satisfactory probability distribution doesn't ensure that you will one box on Newcomb's problem, for example.

Is this quote what you had in mind? It seems like calculating a utility function to me, but I'm not sure what you mean by "other than the use of the individual utility function".

That is from page 7 of the paper.

They're not necessarily invariant under such changes. All the examples in the paper were, but that's because they all used rather simple utility functions.

Hm, yes, you're right about that.

Anyhow, I'm done here - I think you've gotten enough repetitions of my claim that if you're not using probabilities, you're not doing expected utility :) (okay, that was an oversimplification)

This is the problem of the mathematician with N children, one of them a girl.

And the question at hand is "if you're one of the children and you're a girl, does that result in a different problem?"
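For concreteness, here is the standard enumeration behind that puzzle, as a sketch. It assumes each child is independently a girl or a boy with equal probability, and it models "you're one of the children and you're a girl" as conditioning on one fixed child being a girl, which is itself one of the contested modelling choices in this thread.

```python
from fractions import Fraction
from itertools import product


def p_all_girls(n_children, condition):
    """P(all children are girls | condition), by enumerating equally likely families."""
    families = list(product("GB", repeat=n_children))
    kept = [family for family in families if condition(family)]
    all_girls = [family for family in kept if all(child == "G" for child in family)]
    return Fraction(len(all_girls), len(kept))


def at_least_one_girl(family):
    """Non-indexical evidence: some child is a girl."""
    return "G" in family


def first_child_is_girl(family):
    """Stand-in for indexical evidence: one particular child is a girl."""
    return family[0] == "G"


print(p_all_girls(2, at_least_one_girl))
print(p_all_girls(2, first_child_is_girl))
```

For two children the first condition gives 1/3 and the second gives 1/2, so under these assumptions the two kinds of evidence are not interchangeable.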

Sorry to confuse you. I did respond to the specific and coherent feedback, I just changed it on OB as well, so you can't tell.

What's the 'error' they 'share'?

The large world issues seem kind of confused.

Suppose an ideal agent is using Solomonoff induction to predict its inputs. The models which have the agent located very far away, at positions with enormously huge spatial distance, have to encode this distance into the model somehow, to be able to predict input that *you* are getting. That makes them very huge (all of them) and they all combined have an incredibly tiny contribution to algorithmic probability.

If you are to do confused Solomonoff induction whereby you seek 'explanation' rather than a proper model - seek anything that contains the agent somewhere inside of it - then the whole notion just breaks down and you do not get anything useful out, you just get an iterator over all possible (or if you skip the low level fundamental problem, you run into some form of big-universe issue where you hit 'why bother if there's a copy of me somewhere far away' and 'what is the meaning of measurement if there's some version of me measuring something wrong', but ultimately if you started from scratch you wouldn't even get to that point as you'd never be able to form any even remotely useful world model).

I don't know what you mean by 'large world issues'.

Why is the agent's distance from you relevant to predicting its inputs? Why does a large distance imply huge complexity?

A model for your observations consists (informally) of a model for the universe and then coordinates within the universe which pinpoint your observations, at least in the semantics of Solomonoff induction. So in an infinite universe, most observations must be very complicated, since the coordinates must already be quite complicated. Solomonoff induction naturally defines a roughly-uniform measure over observers in each possible universe, which very slightly discounts observers as they get farther away from distinguished landmarks. The slight discounting makes large universes unproblematic.

I wrote about these things at some point, here, though that was when I was just getting into these things and it now looks silly even to current me. But that's still the only framework I know for reasoning about big universes, splitting brains, and the born probabilities.

I get by with none...

Are you sure?

Consequentialist decision making on "small" mathematical structures seems relatively less perplexing (and far from entirely clear), but I'm very much confused about what happens when there are too "many" instances of decision's structure or in the presence of observations, and I can't point to any specific "framework" that explains what's going on (apart from the general hunch that understanding math better clarifies these things, and it does so far).

If *X* has a significant probability of existing, but you don't know at all how to reason about *X*, how confident can you be that your inability to reason about *X* isn't doing tremendous harm? (In this case, *X* = big universes, splitting brains, etc.)

I'm not sure if I get the idea, so let me ask this:

Suppose there are only two inhabitable planets in the whole multiverse -- a planet A with 1000 people, and a planet B with 1000000 people. I live in a primitive society, so I don't have a clue how many people live on my planet. All I know is the information in this paragraph.

Based on the information that "I exist", should I suppose that I live on a planet A with probability 0.001 and on a planet B with probability 0.999? Or should it be 0.5 and 0.5, because either way, there can be only *one* me on each planet?

To me it seems that the 0.001 and 0.999 is the correct answer.

Another example: in the whole multiverse there are only four planets. One of them has two inhabitable continents with 1000 people each, two of them have one inhabitable continent with 1000 people and one empty continent, the last one has two empty continents. Seems to me there is 0.5 probability I am one of the 2000 inhabitants of the first planet, and 0.5 probability I am one of the 1000 + 1000 inhabitants of the second and the third planet.
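Both examples reduce to the same small calculation, sketched below under the assumption that you are equally likely to be any one of the people who exist. The function name and the grouping labels are invented for illustration.

```python
from fractions import Fraction


def share_of_observers(people_by_location):
    """Probability of being at each location, weighting every person equally."""
    total = sum(people_by_location.values())
    return {place: Fraction(count, total) for place, count in people_by_location.items()}


# First example: planet A with 1000 people, planet B with 1000000 people.
two_planets = share_of_observers({"A": 1000, "B": 1_000_000})

# Second example: group the four planets by how many continents are inhabited.
four_planets = share_of_observers({
    "two inhabited continents": 2000,        # one planet, 1000 + 1000 people
    "one inhabited continent": 1000 + 1000,  # two planets, 1000 people each
    "no inhabited continents": 0,
})

print(float(two_planets["A"]), float(two_planets["B"]))
print(four_planets)
```

The first case gives exactly 1/1001 and 1000/1001, which round to the 0.001 and 0.999 quoted above; the second gives 1/2 for the two-continent planet and 1/2 for the pair of one-continent planets.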

In your examples, you're using your existence to answer questions about yourself, not about the planets. This is a special case, and not IMHO a very interesting one.

Answering questions about whether there are more or fewer people like you is equivalent to answering which planets exist or what characteristics they have, if those things coincide to some degree. If they don't, you won't get much out of anthropic reasoning anyway.

Re-read his examples. He already knows how many planets there are and how many people are on each of them. He's only trying to figure out which one *he's* on.

You do get the idea. Assuming that before taking your existence into account you put .5 probability on each type of planet, then the two options you give are the standard SIA and SSA answers respectively. The former involves treating your existence as more evidence than just that someone exists, as I was suggesting in this post.

I think everyone agrees that in the multiverse case (or any case where everyone exists in the same world) you should reason as you do above. The question is whether to treat cases where the people are in different possible worlds analogously with those where the people are just on different planets or in different rooms for instance.

I am assuming that since *this actual planet* has life, testing the truth of Q is possible on this planet. Put more work into finding out whether Q is true, then you wouldn't need to argue about other planets that we lack actual data for, since Q would apply on any of the applicable planets.

The obvious flaw in this idea is that it's doing half a boolean update - it's ignoring the prior. And scientists spend effort setting themselves up in probabilistic states where their prior is that when they measure a temperature of 15 degrees, it's because the temperature is 15 degrees. Stuff like calibrating the instruments and repeating the measurements are, whether or not they are seen as such, plainly intended to create a chain of AND-ed probability where inaccuracy becomes vanishingly unlikely.

This seems like SSA vs SIA, so maybe you should first agree with your associate on which assumption each one of you is using.

This isn't right. Both SSA and SIA are ways to take indexical information into account. Katja's associate seems to be denying that indexical information makes a difference. So he or she would presumably reject the scientific relevance of both SSA and SIA.

Yes. SSA is complicated though - it effectively doesn't take your existence as a thing in the reference class as evidence, but then it does take any further information you get about yourself into account.

Yes, my associate rejects the scientific relevance of any anthropic principles.