Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

With many thanks to Damon Binder, and the spirited conversations that led to this post, and to Anders Sandberg.

People often think that the self-indication assumption (SIA) implies a huge number of alien species, millions of times more than otherwise. Thought experiments like the presumptuous philosopher seem to suggest this.

But here I'll show that, in many cases, updating on SIA doesn't change the expected number of alien species much. It all depends on the prior, and there are many reasonable priors for which the SIA update does nothing more than double the probability of life in the universe[1].

This can be the case even if the prior says that life is very unlikely! We can have a situation where we are astounded, flabbergasted, and disbelieving about our own existence - "how could we exist, how can this beeeeee?!?!?!?" - and still not update much - "well, life is still pretty unlikely elsewhere, I suppose".

In the one situation where we have an empirical distribution - the "Dissolving the Fermi Paradox" paper - the effect of the SIA anthropic update is to multiply the expected number of civilizations per planet by seven. Not seven orders of magnitude - just seven.

The formula

Let $x$ be the probability of advanced space-faring life evolving on a given planet; for the moment, ignore issues of life expanding to other planets from their one point of origin. Let $P$ be the prior distribution of $x$, with mean $\mu$ and variance $\sigma^2$. This means that, if we visit another planet, our probability of finding life there is $\mu$.

On this planet, we exist[2]. Then if we update on our existence we get a new distribution $Q$; this distribution will have mean $\mu_Q$:

$$\mu_Q = \mu + \frac{\sigma^2}{\mu}.$$

To see a proof of this result, look at this footnote[3].

Define $m = \mu_Q/\mu = 1 + \sigma^2/\mu^2$ to be this multiplicative factor between $\mu$ and $\mu_Q$; we'll show that there are many reasonable situations where $m$ is surprisingly low: think $1$ to $2$, rather than in the millions or billions.
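
As a quick numerical check of this formula (a sketch of my own, not from the original post), we can discretise a prior on a grid - the Beta(2,5)-shaped density below is just an illustrative choice - and compare the SIA-updated mean against $\mu + \sigma^2/\mu$:

```python
import numpy as np

# Sanity check of mu_Q = mu + sigma^2/mu on a discretised prior.
dx = 1e-6
x = np.arange(dx / 2, 1, dx)           # midpoint grid on [0, 1]
prior = x * (1 - x) ** 4               # unnormalised Beta(2,5)-shaped density
prior /= (prior * dx).sum()            # normalise

mu = (x * prior * dx).sum()
var = ((x - mu) ** 2 * prior * dx).sum()

posterior = x * prior                  # SIA update: multiply by x...
posterior /= (posterior * dx).sum()    # ...and renormalise
mu_Q = (x * posterior * dx).sum()

print(mu_Q, mu + var / mu)             # both ≈ 3/8, the Beta(3,5) mean
```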

Beta distributions I

Let's start with the most uninformative prior of all: a uniform prior over $x \in [0,1]$. The expectation of $x$ is $1/2$, so, without any other information, we expect a planet to have life with probability $1/2$. The variance is $1/12$.

Thus if we update on our existence on Earth, we get the posterior $Q(x) = 2x$; the mean of this is $2/3$ (either by direct calculation or by using $\mu_Q = \mu + \sigma^2/\mu = 1/2 + (1/12)/(1/2) = 2/3$).

Even though this change in expectation is multiplicatively small, it does seem that the uniform prior and the posterior $Q(x) = 2x$ are very different, with $Q$ heavily skewed to the right. But now consider what happens if we look at Mars and notice that it hasn't got life. The probability of no life, given $x$, is $1-x$. Updating on this and renormalising gives a posterior $Q'(x) = 6x(1-x)$:

The expectation of this posterior, symmetric around $1/2$, is of course $1/2$. Thus one extra observation (that Mars is dead) has undone, in expectation, all the anthropic impact of our own existence.
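
These two updates are easy to verify by Monte Carlo (a quick sketch of my own, not from the post): sample $x$ uniformly, then weight each sample by the likelihood of the evidence.

```python
import random

random.seed(0)
xs = [random.random() for _ in range(1_000_000)]   # uniform prior samples

# Update on our own existence: weight each sample by x.
mean_after_life = sum(x * x for x in xs) / sum(xs)

# Then also update on Mars being dead: weight by x * (1 - x).
w = [x * (1 - x) for x in xs]
mean_after_mars = sum(x * wi for x, wi in zip(xs, w)) / sum(w)

print(mean_after_life, mean_after_mars)   # ≈ 2/3 and ≈ 1/2
```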

This is an example of a beta distribution, for $\alpha = 2$ and $\beta = 2$ (yes, beta distributions have a parameter called $\alpha$ and another one that's $\beta$; just deal with it). Indeed, the uniform prior is also a beta distribution (with $\alpha = \beta = 1$), as is the anthropic updated version $Q$ (which has $\alpha = 2$, $\beta = 1$).

The update rule for beta distributions is that a positive observation (ie life) increases $\alpha$ by $1$, and a negative observation (a dead planet) increases $\beta$ by $1$. The mean of an updated beta distribution is a generalised version of Laplace's law of succession: if our prior is a beta distribution with parameters $\alpha$ and $\beta$, and we've had $p$ positive observations and $n$ negative ones, then the mean of the posterior is:

$$\frac{\alpha + p}{\alpha + \beta + p + n}.$$

Suppose now that we have observed $n$ dead planets, but no life, and that we haven't done an anthropic update yet; then we have a probability of life of $\frac{\alpha}{\alpha+\beta+n}$. Upon adding the anthropic update, this shifts to $\frac{\alpha+1}{\alpha+\beta+n+1}$, meaning that the multiplicative factor $m$ is at most $\frac{\alpha+1}{\alpha}$. If we started with the uniform prior with its $\alpha = 1$, this multiplies the probability of life by at most $2$. In a later section, we'll look at $\alpha < 1$.
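
This bound is easy to check with exact fractions; the helper below is my own illustration of the generalised Laplace rule:

```python
from fractions import Fraction

def beta_mean(alpha, beta, pos=0, neg=0):
    """Posterior mean of a Beta(alpha, beta) prior after pos life and neg dead-planet observations."""
    return Fraction(alpha + pos, alpha + beta + pos + neg)

alpha, beta, n = 1, 1, 5                       # uniform prior, five dead planets seen
before = beta_mean(alpha, beta, neg=n)         # probability of life, no anthropic update
after = beta_mean(alpha, beta, pos=1, neg=n)   # with the anthropic update
print(before, after, after / before)           # 1/7, 1/4, factor 7/4 < 2
```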

High prior probability is not required for weak anthropic update

The uniform prior has $\alpha = \beta = 1$ and starts at expectation $1/2$. But we can keep $\alpha = 1$ and set a much higher $\beta$, which skews the distribution to the left; for example, for $\beta = 2$, $\beta = 5$, and $\beta = 10$:

Even though these priors are skewed to the left, and have lower prior probabilities of life ($1/3$, $1/6$, and $1/11$), the anthropic update has a factor $m$ that is less than $2$.

Also note that if we scale the prior by a small $\epsilon > 0$ - so replace $P(x)$ on the range $[0,1]$ with $\frac{1}{\epsilon}P(x/\epsilon)$ on the range $[0,\epsilon]$ - then $\mu$ is multiplied by $\epsilon$ and $\sigma^2$ is multiplied by $\epsilon^2$. Thus $m = 1 + \sigma^2/\mu^2$ is unchanged. Here, for example, is the uniform distribution, scaled down by various values of $\epsilon$:

All of these will have the same $m$ (which is $4/3$, just as for the uniform distribution). And, of course, doing the same scaling with the various beta distributions we've seen up until now will also keep $m$ constant.
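
Here's a quick numerical confirmation of this scale-invariance (my own sketch): squashing the uniform prior onto $[0,\epsilon]$ leaves $m$ at $4/3$.

```python
import numpy as np

def m_factor(x, density, dx):
    """m = 1 + variance/mean^2 for a density tabulated on a grid."""
    z = (density * dx).sum()
    mu = (x * density * dx).sum() / z
    var = ((x - mu) ** 2 * density * dx).sum() / z
    return 1 + var / mu ** 2

results = {}
for eps in [1.0, 0.1, 0.001]:
    dx = eps / 1_000_000
    x = np.arange(dx / 2, eps, dx)         # midpoint grid on [0, eps]
    density = np.ones_like(x) / eps        # uniform prior scaled to [0, eps]
    results[eps] = m_factor(x, density, dx)

print(results)   # all values are 4/3, independent of eps
```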

Thus there are a lot of distributions with very low $\mu$ (ie very low prior probability of life) but an $m$ that's less than $2$ (ie the anthropic update is less than a doubling of the probability of life).

Beta distributions II and log-normals

The best-case scenario for $m$ is if $P$ assigns probability $1$ to $x = \mu$. In that case, $\sigma^2 = 0$ and $m = 1$: the anthropic update changes nothing.

Conversely, the worst-case scenario for $m$ is if $P$ only allows $x = 0$ and $x = 1$. In that case, $P$ assigns probability $1-\mu$ to $x = 0$ and $\mu$ to $x = 1$, for a mean of $\mu$ and a variance of $\mu(1-\mu)$, and a multiplicative factor of $m = 1/\mu$. In this case, after the anthropic update, $Q$ assigns certainty to $x = 1$ (since any life at all, given this $P$, means life on all planets).

But there are also more reasonable priors with large $m$. We've already seen some, implicitly, above: the beta distributions with $\alpha < 1$. In that case, $m$ is bounded by $(\alpha+1)/\alpha$. If $\alpha = 1/2$ and $\beta = 1$, for instance, this corresponds to the (unbounded) distribution $\frac{1}{2\sqrt{x}}$; the multiplicative factor is $9/5$, below the bound of $3$. But as $\alpha$ declines, the multiplicative factor can go up surprisingly fast; at $\alpha = 0.1$ it can approach $11$, at $\alpha = 0.01$ it can approach $101$:

In general, for $\alpha \le 1$, the multiplicative factor is bounded by $(\alpha+1)/\alpha$. This gets arbitrarily large as $\alpha \to 0$. Though $\alpha = 0$ itself corresponds to the improper prior $1/x$, whose integral diverges. On a log scale, this corresponds to the log-uniform distribution, which is roughly what you get if you assume "we need $n$ steps, each of probability $q$, to get life; let's put a uniform prior over the possible $n$s".
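
Since beta means and variances have closed forms, we can tabulate $m$ directly and watch it approach the $(\alpha+1)/\alpha$ bound as $\beta$ grows (my own check, not from the post):

```python
def m_beta(alpha, beta):
    """m = 1 + var/mean^2 for a Beta(alpha, beta) prior."""
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return 1 + var / mean ** 2

for alpha in [1, 0.5, 0.1, 0.01]:
    bound = (alpha + 1) / alpha
    # m stays below the bound, and approaches it as beta grows
    print(alpha, m_beta(alpha, 1), m_beta(alpha, 10_000), bound)
```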

It's not clear why one might want to choose $\alpha < 1$ for a prior, but there is a class of priors that is much more natural: the log-normal distributions. These are random variables $X$ such that $\ln X$ is normally distributed.

If we choose $\ln X$ to have a mean that is highly negative (and a variance that isn't too large), then we can mostly ignore the fact that $X$ can take values above $1$, and treat it as a prior distribution for $x$. The mean and variance of the log-normal distributions can be explicitly computed, thus giving the multiplicative factor as:

$$m = 1 + \frac{\sigma^2}{\mu^2} = e^{\sigma_N^2}.$$

Here, $\sigma_N^2$ is the variance of the normal distribution $\ln X$. This might be large, as it denotes (roughly) "we need $n$ steps, each of probability $q$, to get life; let's put a uniform-ish prior over a range of possible $n$s". Unlike $1/x$, this is a proper prior, and a plausible one; therefore there are plausible priors with very large $m$. The log-normal is quite likely to appear, as it is the approximate limit of multiplying together a host of different independent parameters.
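
We can spot-check $m = e^{\sigma_N^2}$ by Monte Carlo (my own sketch; the parameters below are illustrative, chosen small enough that the sample variance is stable):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_N, sigma_N = -30.0, 1.0              # ln(x) ~ N(mu_N, sigma_N^2); illustrative values
x = np.exp(rng.normal(mu_N, sigma_N, 10_000_000))

m_empirical = 1 + x.var() / x.mean() ** 2
m_theory = np.exp(sigma_N ** 2)         # e^1 ≈ 2.718
print(m_empirical, m_theory)
```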

Multiplication law

Do you know what's more likely to be useful than "the approximate limit of multiplying together a host of different independent parameters"? Actually multiplying together independent parameters.

The famous Drake equation is:

$$N = R_* \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot L.$$

Here $R_*$ is the number of stars in our galaxy, $f_p$ the fraction of those with planets, $n_e$ the number of planets that can support life per star that has planets, $f_l$ the fraction of those that develop life, $f_i$ the fraction of those that develop intelligent life, $f_c$ the fraction of those that release detectable signs of their existence, and $L$ is the length of time those civilizations endure as detectable.

Then the proportion of advanced civilizations per planet is $f_l \cdot f_i \cdot q$, where $q$ is the proportion of life-supporting planets among all planets. To compute the $m$ of this distribution, we have the highly useful result (the proof is in this footnote[4]):

  • Let $X_1, \ldots, X_k$ be independent random variables with multiplicative factors $m_1, \ldots, m_k$, and let $m$ be the multiplicative factor of $X = X_1 \cdot X_2 \cdots X_k$. Then $m = m_1 \cdot m_2 \cdots m_k$ - the total $m$ is the product of the individual $m_i$.

The paper "Dissolving the Fermi Paradox" gives estimated distributions for all the terms in the Drake equation. The $q$, which doesn't appear in that paper, is a constant, so has $m_q = 1$. The $f_i$ has a log-uniform distribution from $0.001$ to $1$; the $m$ can be computed from the mean and variance of such distributions, so $m_{f_i} \approx 3.5$.

The $f_l$ term is more complicated; it is distributed like $1 - e^{-\lambda}$, where $\ln \lambda$ is normally distributed with a large variance. Fortunately, we can estimate its mean and variance without having to figure out its distribution, by numerical integration of $f_l$ and $f_l^2$ over the normal distribution. This gives $\mu_{f_l} \approx 0.5$ and $m_{f_l} \approx 2$. The overall multiplicative effect of the anthropic update is:

$$m = m_q \cdot m_{f_l} \cdot m_{f_i} \approx 1 \times 2 \times 3.5 = 7.$$

What if we considered the proportion of advanced civilizations per star, rather than per planet? Then we can drop the $q$ term and add in $f_p$ and $n_e$. Those are both estimated to be distributed as log-uniform on $[0.1, 1]$, each with $m \approx 1.4$; for a total of

$$m \approx 7 \times 1.4 \times 1.4 \approx 14.$$
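
For a log-uniform distribution on $[a,b]$, the mean is $(b-a)/\ln(b/a)$ and the second moment is $(b^2-a^2)/(2\ln(b/a))$, so $m$ has a simple closed form; this reproduces the factors used here (my own computation):

```python
import math

def m_loguniform(a, b):
    """m = E[X^2] / E[X]^2 for X log-uniform on [a, b]."""
    L = math.log(b / a)
    mean = (b - a) / L
    second_moment = (b * b - a * a) / (2 * L)
    return second_moment / mean ** 2

print(m_loguniform(0.1, 1))    # f_p or n_e: ≈ 1.41
print(m_loguniform(0.001, 1))  # f_i: ≈ 3.46
print(m_loguniform(1, 100))    # R*: ≈ 2.35
```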

Why is the $m$ higher for civilizations per star than civilizations per planet? That's because when we update on our existence, we increase the proportion of civilizations per planet, but we also update the proportion of planets per star - both of these can make life more likely. The per-star $m$ incorporates both effects, so is strictly higher than the per-planet $m$.

We can do the same by considering the number of civilizations per galaxy; then we have to incorporate $R_*$ as well. This is log-uniform on $[1, 100]$, with $m \approx 2.3$, giving:

$$m \approx 14 \times 2.3 \approx 33.$$

What about if we include the Fermi observation (the fact that we don't see anything in our galaxy)? The "dissolving the Fermi paradox" paper shows there are multiple different ways of including this update, depending on how we parse out "not seeing anything" and how easy it is for civilizations to expand.

I did a crude estimate here by taking the Fermi observation to mean "the proportion of civilizations per galaxy must be less than one". Then I did a Monte-Carlo simulation, ignoring all results above $0$ on the log scale (ie above one civilization per galaxy):

From this, I got an estimated mean and variance for the number of civilizations per galaxy, and hence a total multiplier $m$ for the combined update.

With the Fermi observation and the anthropic update combined, we expect fewer than one civilization per galaxy.
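
For concreteness, here is a crude Monte-Carlo sketch of that kind of estimate (my own reconstruction: the log-uniform terms follow the estimates quoted earlier, but the $f_l$ model is a simplifying stand-in, not the paper's exact distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

def log_uniform(a, b, size):
    """Sample a log-uniform distribution on [a, b]."""
    return np.exp(rng.uniform(np.log(a), np.log(b), size))

R_star = log_uniform(1, 100, n)     # stars term
f_p = log_uniform(0.1, 1, n)        # fraction of stars with planets
n_e = log_uniform(0.1, 1, n)        # habitable planets per such star
f_i = log_uniform(0.001, 1, n)      # fraction developing intelligence
rate = np.exp(np.clip(rng.normal(0, 50, n), -500, 500))
f_l = 1 - np.exp(-rate)             # stand-in log-normal-rate model for life

civs = R_star * f_p * n_e * f_l * f_i   # civilizations per galaxy
civs = civs[civs <= 1]                  # Fermi observation: under one per galaxy

m = 1 + civs.var() / civs.mean() ** 2   # anthropic multiplier on what remains
print(civs.mean(), m)
```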

Limitations of the multiplier

Low multiplier, strong effects

It's important to note that the anthropic update can be very strong, without changing the expected population much. So a low $m$ doesn't necessarily mean a low impact.

Consider for instance the presumptuous philosopher, slightly modified to use planetary population densities. Thus theory $T_1$ predicts $x = 10^{-12}$ (one in a trillion) and $T_2$ predicts $x = 1$; we put initial probabilities of $1/2$ on both theories.

As Nick Bostrom noted, the SIA update pushes $T_2$ to being a trillion times more probable than $T_1$; a posteriori, $T_2$ is roughly a certainty (the actual probability is $\frac{10^{12}}{10^{12}+1} \approx 1 - 10^{-12}$).

However, the expected population density goes from roughly $1/2$ (the average of $10^{-12}$ and $1$) to roughly $1$ (since a posteriori $T_2$ is almost certain). This gives an $m$ of roughly $2$. So, despite the strong update towards $T_2$, the actual population update is small - and, conversely, despite the actual population update being small, we have a strong update towards $T_2$.
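
The arithmetic, spelled out (a small check of my own):

```python
# Two point theories for x, with even prior odds.
x1, x2 = 1e-12, 1.0
p1, p2 = 0.5, 0.5

# SIA update: multiply each theory's weight by its predicted x, then renormalise.
w1, w2 = p1 * x1, p2 * x2
q1, q2 = w1 / (w1 + w2), w2 / (w1 + w2)

prior_mean = p1 * x1 + p2 * x2
post_mean = q1 * x1 + q2 * x2
print(q2, post_mean / prior_mean)   # T2 becomes near-certain, yet m ≈ 2
```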

Combining multiple theories

In the previous section, note that both $T_1$ and $T_2$ were point estimates: they posit a constant $x$. So each has a variance of zero, and hence an $m$ of $1$. But $T_2$ gets a much stronger anthropic update. Thus we can't use the $m$s to compare the anthropic effects on different theories.

We also can't relate the individual $m$s to that of a combined theory. As we've seen, $T_1$ and $T_2$ each have an $m$ of $1$, but the combined theory has an $m$ of roughly $2$. But we can play around with the relative initial weights of $T_1$ and $T_2$ to get other $m$s.

If we started with $10^{12}{:}1$ odds on $T_1$ vs $T_2$, then this has a mean of roughly $10^{-12}$; the anthropic update sends it to $1{:}1$ odds, with a mean of roughly $1/2$. So this combined theory has an $m$ of roughly $5 \times 10^{11}$, half a trillion.

But, conversely, if we started with $1{:}10^{12}$ odds on $T_1$ vs $T_2$, then we have an initial mean of $x$ of roughly one; its anthropic update gives odds of $1{:}10^{24}$, also with a mean of roughly one. So this combined theory has an $m$ of roughly $1$.
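
These mixtures are easy to verify numerically (my own sketch):

```python
def mixture_m(odds1, x1=1e-12, x2=1.0):
    """m for a prior putting odds1:1 weight on T1 (x = x1) vs T2 (x = x2)."""
    p1 = odds1 / (odds1 + 1)
    p2 = 1 / (odds1 + 1)
    prior_mean = p1 * x1 + p2 * x2
    w1, w2 = p1 * x1, p2 * x2                    # anthropic reweighting
    post_mean = (w1 * x1 + w2 * x2) / (w1 + w2)
    return post_mean / prior_mean

print(mixture_m(1))       # even odds: m ≈ 2
print(mixture_m(1e12))    # heavy prior on T1: m on the order of 10^11
print(mixture_m(1e-12))   # heavy prior on T2: m ≈ 1
```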

There is a weak relation between $m$ and the $m_i$ of the various $T_i$. Let $m_i$ be the multiplier of $T_i$; we can reorder the $T_i$ so that $m_i \le m_{i+1}$ for all $i$. Let $T$ be a combined theory that assigns probability $q_i$ to $T_i$, and let $m$ be its multiplier.

  1. For all choices of the $q_i$, $m \ge m_1$.
  2. For all $\epsilon > 0$, there exist $q_i$ with all $q_i > 0$, so that $m < m_1 + \epsilon$.

So, the minimum value of the $m_i$ is a lower bound on $m$, and we can get arbitrarily close to that bound. See the proof in this footnote[5].

  1. As we'll see, the population update is small even in the presumptuous philosopher experiment itself. ↩︎

  2. Citation partially needed: I'm ignoring Boltzmann brains and simulations and similar ideas. ↩︎

  3. Given a fixed $x$, the probability of observing life on our own planet is exactly $x$. So Bayes's theorem implies that $Q(x) \propto x \cdot P(x)$. With the full normalisation, this is

    $$Q(x) = \frac{x \cdot P(x)}{\int_0^1 y \, P(y) \, dy} = \frac{x \cdot P(x)}{\mu}.$$

    If we want to get the mean of this distribution, we further multiply by $x$ and integrate:

    $$\mu_Q = \int_0^1 x \, Q(x) \, dx = \frac{1}{\mu} \int_0^1 x^2 \, P(x) \, dx.$$

    Adding and subtracting $\mu^2$ in the numerator and regrouping the terms:

    $$\mu_Q = \frac{\mu^2 + \left( \int_0^1 x^2 \, P(x) \, dx - \mu^2 \right)}{\mu} = \mu + \frac{\sigma^2}{\mu}.$$

    Thus $\mu_Q = \mu + \sigma^2/\mu$, using the fact that the variance is the expectation of $x^2$ minus the square of the expectation of $x$. ↩︎

  4. I adapted the proof in this post.

    So, let $X_1, \ldots, X_k$ be independent random variables with means $\mu_i$ and variances $\sigma_i^2$. Let $X = X_1 \cdot X_2 \cdots X_k$, which has mean $\mu$ and variance $\sigma^2$. Due to the independence of the $X_i$, the expectations of their products are the products of their expectations. Note that $X_i^2$ and $X_j^2$ are also independent if $i \neq j$. Then we have:

    $$m = \frac{\mu^2 + \sigma^2}{\mu^2} = \frac{E[X^2]}{E[X]^2} = \prod_i \frac{E[X_i^2]}{E[X_i]^2} = \prod_i \frac{\mu_i^2 + \sigma_i^2}{\mu_i^2} = \prod_i m_i.$$ ↩︎

  5. Let $P_1, \ldots, P_k$ be probability distributions on $[0,1]$, with means $\mu_i$, variances $\sigma_i^2$, expectations squared $s_i = \mu_i^2 + \sigma_i^2$, and multipliers $m_i = s_i/\mu_i^2$. Without loss of generality, reorder the $P_i$ so that $m_i \le m_{i+1}$ for all $i$.

    Let $P$ be the probability distribution $\sum_i q_i P_i$, with associated multiplier $m$. Without loss of generality, assume $q_i > 0$ for all $i$. Then we'll show that $m \ge m_1$.

    We'll first show this in the special case where $k = 2$ and $m_1 = m_2$, then generalise to the general case, as is appropriate for a generalisation. If $m_1 = m_2$, then, since all terms are non-negative, there exists a $\lambda \ge 0$ such that $s_2 = \lambda^2 s_1$ while $\mu_2 = \lambda \mu_1$. Then for any given $q = q_1$, the $m$ of $P = q P_1 + (1-q) P_2$ is:

    $$m = \frac{q s_1 + (1-q) s_2}{(q \mu_1 + (1-q) \mu_2)^2} = \frac{(q + (1-q)\lambda^2) \, s_1}{(q + (1-q)\lambda)^2 \, \mu_1^2}.$$

    The function $y \mapsto y^2$ is convex, so, interpolating between the values $1$ and $\lambda$, we know that for all $q \in [0,1]$, the term $(q + (1-q)\lambda)^2$ must be lower than $q + (1-q)\lambda^2$. Therefore the denominator is at most $(q + (1-q)\lambda^2)\mu_1^2$, and $m \ge s_1/\mu_1^2 = m_1$. This shows the result for $k = 2$ if $m_1 = m_2$.

    Now assume that $m_1 < m_2$, so that $s_2/\mu_2^2 > s_1/\mu_1^2$. Then replace $s_2$ with $s_2' = m_1 \mu_2^2$, which is lower than $s_2$, so that $s_2'/\mu_2^2 = m_1$. If we define $m'$ as the expression for $m$ with $s_2'$ substituting for $s_2$, we know that $m \ge m'$, since $s_2 \ge s_2'$. Then the previous result shows that $m' \ge m_1$, thus $m \ge m_1$ too.

    To show the result for larger $k$, we'll induct on $k$. For $k = 1$ the result is a tautology, $m = m_1$, and we've shown the result for $k = 2$. Assume the result is true for $k-1$, and then notice that $\sum_{i=1}^k q_i P_i$ can be re-written as $q_1 P_1 + (1 - q_1) P'$, where $P' = \sum_{i=2}^k q_i' P_i$ for $q_i' = q_i/(1-q_1)$. Then, by the induction hypothesis, if $m'$ is the $m$ of $P'$, then $m' \ge m_2$. Then applying the result for $k = 2$ between $P_1$ and $P'$ gives $m \ge \min(m_1, m')$. However, since $m' \ge m_2$ and $m_2 \ge m_1$, we know that $\min(m_1, m') = m_1$, proving the general result.

    To show $m$ can get arbitrarily close to $m_1$, simply note that $m$ is continuous in the $q_i$; define $q_1 = 1 - \epsilon$ and $q_i = \epsilon/(k-1)$ for $i > 1$, and let $\epsilon$ tend to $0$. ↩︎


Instant strong upvote. This post changed my view as much as the risk aversion post (which was also by you!)

Does "population" in this passage and "population" in presumptuous philosopher have different meanings?

It seems here "population difference" is kind of like density: how likely we are to find aliens (on other planets). But in the presumptuous philosopher it meant the overall number. T2 does have a trillion times more observers, yet it does not explain how much of that is due to higher density and how much is due to a larger universe.

I adapted the presumptuous philosopher for densities, because we'd been using densities in the rest of the post. The argument works for total population as well, moving from an average population of roughly $N/2$ (for some large total population $N$ under $T_2$) to an average population of roughly $N$.

Great post, thanks! It looks like a 7 times update could be decisive in some situations. For example, if the initial probability that we are not alone in the visible universe is 10 per cent, and after the anthropic update it becomes 70 per cent, it changes the situation from "we are most likely alone" to "we are most likely not alone".

Yep. Though I've found that, in most situations, the observation "we don't see anyone" has a much stronger effect than the anthropic update. It's not always exactly comparable, as anthropic updates are "multiply by $x$ and renormalise", while observing no-one is "multiply by $(1-x)^n$ and renormalise" - but generally I find the second effect to be much stronger.

Ok. Another question. I have been recently interested in the anthropic effects of panspermia. Naively, as panspermia creates millions of habitable planets for a galaxy vs. one in a non-panspermia world, anthropics should be very favourable to panspermia. But the a priori probability of panspermia is low. How could your model be applied to panspermia?

Anthropic updates do not increase the probability of life in general; they increase the probability of you existing specifically (which, since you've observed many other humans and heard about a lot more, is roughly the same as the probability of any current human existing), and this might have indirect effects on life in general.

So they do not distinguish between "simple life is very hard, but getting from that to human-level life is very easy" and "simple life is very easy, but getting from that to human-level life is very hard". So panspermia remains at its prior, relative to other theories of the same type (see here).

However, panspermia gets a boost from the universe seeming empty, as some versions of panspermia would make humans unexpectedly early (since panspermia needs more time to get going); this means that these theories avoid the penalty from the universe seeming empty, a much larger effect than the anthropic update (see here).

I am still not convinced: it seems that p(abiogenesis) is a very small constant depending on the random generation of a string of around 100 bits. The probability of life becoming intelligent, p(li), is also, I assume, a constant. The only thing we don't know is the multiplier given by panspermia, which shows how many planets will get "infected" from the Eden in a given type of universe. This multiplier, I assume, is different in different universes and depends, say, on the density of stars. We could use anthropics to suggest that we live in a universe with a higher value of the panspermia multiplier (depending on the share of the universes of this type).

The difference here with what you said above is that we don't make any conclusions about the average global level of the multiplier over all of the multiverse - you are right that anthropics can't help us there. Here I use anthropics to conclude which region of the multiverse I am more likely to be located in, not to deduce the global properties of the multiverse. Thus there is no SIA, as there are no "possible observers": all observers are real, but some of them are located in more crowded places.

To better understand the suggested model of a small anthropic update, I imagined the following thought experiment: my copies are created in 4 boxes: 1 copy in the first box, 10 in the second, 100 in the third and 1000 in the fourth. Before the update, I have a 0.25 chance to be in the 4th box. After the update I have a 0.89 chance to be in the 4th box, so the chances increased only around 3.5 times. Is it a correct model?

Nope, that's not the model. Your initial expected population is $(1 + 10 + 100 + 1000)/4 \approx 278$. After the anthropic update, your probabilities of being in the boxes are $1/1111$, $10/1111$, $100/1111$ and $1000/1111$ (roughly $0.001$, $0.009$, $0.09$ and $0.9$). The expected population, however, is $(1^2 + 10^2 + 100^2 + 1000^2)/1111 \approx 909$. That's an expected population update of 3.27 times.

Note that, in this instance, the expected population update and the probability update are roughly equivalent, but that need not be the case. Eg if your prior odds are $10^{18}{:}10^{9}{:}1$ about the population being $1$, $10^6$, or $10^{12}$, then the expected population is roughly $1$, the anthropic-updated odds are $10^{6}{:}10^{3}{:}1$, and the updated expected population is roughly $10^6$. So the probability boost to the largest population is roughly $10^{12}$ (from $10^{-18}$ to $10^{-6}$), but the boost to the expected population is only roughly $10^6$.

I don't think it changes the conclusion by much, but isn't "The q, which doesn't appear in that paper, is a constant, so has M_q = 1" incorrect? All of the values we're looking at are constants, but we don't know what the constants are, so we have a nonzero variance, leading to M_q > 1. We certainly don't know the exact value of q, so it shouldn't have M_q = 1.