Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The SIA population update can be surprisingly small


Instant strong upvote. This post changed my view as much as the risk aversion post (which was also by you!)

Does "population" in this passage and "population" in the presumptuous philosopher have different meanings?

Here, "population difference" seems to mean something like density: how likely we are to find aliens on other planets. But in the presumptuous philosopher it meant the overall number. T2 does have a trillion times more observers, yet the thought experiment does not say how much of that is due to higher density and how much is due to a larger universe.

I adapted the presumptuous philosopher for densities, because we'd been using densities in the rest of the post. The argument works for total population as well, moving from an average population of (for some ) to an average population of roughly .

Great post, thanks! It looks like a sevenfold update could be decisive in some situations. For example, if the initial probability that we are not alone in the visible universe is 10 per cent, and after the anthropic update it becomes 70 per cent, it changes the situation from "we are most likely alone" to "we are most likely not alone".

Yep. Though I've found that, in most situations, the observation "we don't see anyone" has a much stronger effect than the anthropic update. It's not always exactly comparable, as anthropic updates are "multiply by and renormalise", while observing no-one is "multiply by and renormalise" - but generally I find the second effect to be much stronger.

Ok. Another question. I have recently been interested in the anthropic effects of panspermia. Naively, as panspermia creates millions of habitable planets per galaxy vs. one in a non-panspermia world, anthropics should be very favourable for panspermia. But the a priori probability of panspermia is low. How could your model be applied to panspermia?

Anthropic updates do not increase the probability of life in general; they increase the probability of you existing specifically (which, since you've observed many other humans and heard about a lot more, is roughly the same as the probability of any current human existing), and this might have indirect effects on life in general.

So they do not distinguish between "simple life is very hard, but getting from that to human-level life is very easy" and "simple life is very easy, but getting from that to human-level life is very hard". So panspermia remains at its prior, relative to other theories of the same type (see here).

However, panspermia gets a boost from the universe seeming empty, as some versions of panspermia would make humans unexpectedly early (since panspermia needs more time to get going); this means that these theories avoid the penalty from the universe seeming empty, a much larger effect than the anthropic update (see here).

I am still not convinced: it seems that p(abiogenesis) is a very small constant depending on the random generation of a string of around 100 bits. The probability of life becoming intelligent, p(li), is also, I assume, a constant. The only thing we don't know is a multiplier given by panspermia, which shows how many planets will get "infected" from the Eden in a *given type* of universe. This multiplier, I assume, is different in different universes and depends, say, on the density of stars. We could use anthropics to suggest that we live in a universe with higher values of the panspermia multiplier (depending on the share of universes of this type).

The difference here with what you said above is that we don't draw any conclusions about the *average global level* of the multiplier over all of the multiverse; you are right that anthropics can't help us here. Here I use anthropics to draw conclusions about which region of the multiverse I am more likely to be located in, not to deduce the global properties of the multiverse. Thus there is no SIA, as there are no "possible observers": all observers are real, but some of them are located in more crowded places.

To better understand the suggested model of a small anthropic update, I imagined the following thought experiment: my copies are created in 4 boxes: 1 copy in the first box, 10 in the second, 100 in the third and 1000 in the fourth. Before the update, I have a 0.25 chance of being in the 4th box. After the update I have a 0.90 chance of being in the 4th box, so the chance increased only around 3.6 times. Is this a correct model?

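Nope, that's not the model. Your initial expected *population* is $(1+10+100+1000)/4 = 277.75$. After the anthropic update, your probabilities of being in the boxes are $1/1111$, $10/1111$, $100/1111$ and $1000/1111$ (roughly $0.0009$, $0.009$, $0.09$ and $0.9$). The expected population, however, is $(1+100+10^4+10^6)/1111 \approx 909.2$. That's an expected population update of 3.27 times.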

Note that, in this instance, the expected population update and the probability update are roughly equivalent, but that need not be the case. Eg if your prior odds are about the population being , , or , then the expected population is roughly , the anthropic-updated odds are , and the updated expected population is roughly . So the probability boost to the larger population is roughly (, but the boost to the expected population is roughly .
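For anyone who wants to check the four-box arithmetic in this thread, here is a minimal Python sketch. It assumes a uniform prior over the four boxes and an anthropic update that weights each box by its number of copies; the variable names are purely illustrative.

```python
from fractions import Fraction

# Number of copies in each of the four boxes from the thought experiment.
populations = [1, 10, 100, 1000]

# Assumed prior: each box is equally likely.
prior = [Fraction(1, 4)] * len(populations)

# Expected population before the anthropic update.
prior_expected = sum(p * n for p, n in zip(prior, populations))

# Anthropic update: weight each box by its population, then renormalise.
weights = [p * n for p, n in zip(prior, populations)]
posterior = [w / sum(weights) for w in weights]

# Expected population after the update, and the ratio between the two.
posterior_expected = sum(q * n for q, n in zip(posterior, populations))
population_update = posterior_expected / prior_expected

print(float(prior_expected), float(posterior_expected), float(population_update))
```

The probability of the largest box and the expected population are different quantities, which is why their update factors need not agree.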

With many thanks to Damon Binder, and the spirited conversations that led to this post, and to Anders Sandberg.

People often think that the self-indication assumption (SIA) implies a huge number of alien species, millions of times more than otherwise. Thought experiments like the presumptuous philosopher seem to suggest this.

But here I'll show that, in many cases, updating on SIA doesn't change the expected number of alien species much. It all depends on the prior, and there are many reasonable priors for which the SIA update does nothing more than double the probability of life in the universe^{[1]}.

This can be the case even if the prior says that life is very unlikely! We can have a situation where we are astounded, flabbergasted, and disbelieving about our own existence - "how could we exist, how can this beeeeee?!?!?!?" - and still not update much - "well, life is still pretty unlikely elsewhere, I suppose".

In the one situation where we have an empirical distribution, the "Dissolving the Fermi Paradox" paper, the effect of the SIA anthropic update is to multiply the expected number of civilizations per planet by *seven*. Not seven orders of magnitude - just seven.

## The formula

Let $\rho \in [0,1]$ be the probability of advanced space-faring life evolving on a given planet; for the moment, ignore issues of life expanding to other planets from their one point of origin. Let $f$ be the prior distribution of $\rho$, with mean $\mu$ and variance $\sigma^2$. This means that, if we visit another planet, our probability of finding life is $\mu$.

On this planet, we exist^{[2]}. Then if we update on our existence we get a new distribution $f'$; this distribution will have mean $\mu'$:

$$\mu' = \mu\left(1 + \frac{\sigma^2}{\mu^2}\right).$$

To see a proof of this result, look at this footnote^{[3]}.

Define $M_{\mu,\sigma^2} = 1 + \sigma^2/\mu^2$ to be this multiplicative factor between $\mu$ and $\mu'$; we'll show that there are many reasonable situations where $M_{\mu,\sigma^2}$ is surprisingly low: think 2 to 100, rather than in the millions or billions.
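The formula is easy to sanity-check numerically. The sketch below is a rough illustration rather than anything from the original analysis: it integrates an arbitrary example prior, $f(\rho) = 3(1-\rho)^2$ (my choice), with a midpoint rule, and compares the mean of the updated distribution $\rho f(\rho)$ with $\mu(1+\sigma^2/\mu^2)$. The function names are hypothetical.

```python
def prior_moments(density, steps=100_000):
    """Midpoint-rule estimates of (mu, sigma^2, mu') for a prior density on [0, 1].

    mu' is the mean of the anthropically updated density, proportional to rho * f(rho).
    """
    h = 1.0 / steps
    norm = first = second = 0.0
    for k in range(steps):
        rho = (k + 0.5) * h
        w = density(rho) * h
        norm += w
        first += rho * w
        second += rho * rho * w
    mu = first / norm
    sigma2 = second / norm - mu * mu
    mu_updated = second / first  # E(rho^2) / E(rho)
    return mu, sigma2, mu_updated


def example_prior(rho):
    # Illustrative prior only (an assumption of this sketch): f(rho) = 3 (1 - rho)^2.
    return 3.0 * (1.0 - rho) ** 2


mu, sigma2, mu_updated = prior_moments(example_prior)
M = 1.0 + sigma2 / mu**2

print(mu, mu_updated, M, mu * M)
```

For this prior the mean is $1/4$ and the multiplier comes out at $1.6$, so the updated mean is $0.4$: a low prior probability of life, and yet a small update.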

## Beta distributions I

Let's start with the most uninformative prior of all: a uniform prior over $[0,1]$. The expectation of $\rho$ is $\int_0^1 \rho\,d\rho = 1/2$, so, without any other information, we expect a planet to have life with 50% probability. The variance is $\sigma^2 = 1/12$.

Thus if we update on our existence on Earth, we get the posterior $f'(\rho) = 2\rho$; the mean of this is $2/3$ (either by direct calculation or using $M_{1/2,1/12} = 1 + 4/12 = 4/3$).

Even though this change in expectation is multiplicatively small, it does seem that the uniform prior and the $f'(\rho)$ are very different, with $f'(\rho)$ heavily skewed to the right. But now consider what happens if we look at Mars and notice that it hasn't got life. The probability of no life, given $\rho$, is $1-\rho$. Updating on this and renormalising gives a posterior $6\rho(1-\rho)$.

The expectation of 6ρ(1−ρ), symmetric around 1/2, is of course 1/2. Thus one extra observation (that Mars is dead) has undone, in expectation, all the anthropic impact of our own existence.

This is an example of a beta distribution for α=2 and β=2 (yes, beta distributions have a parameter called β and another one that's α; just deal with it). Indeed, the uniform prior is also a beta distribution (with α=β=1) as is the anthropic updated version 2ρ (which has α=2, β=1).

The update rule for beta distributions is that a positive observation (ie life) increases α by 1, and a negative observation (a dead planet) increases β by 1. The mean of an updated beta distribution is a generalised version of Laplace's law of succession: if our prior is a beta distribution with parameters α and β, and we've had m positive observations and n negative ones, then the mean of the posterior is:

$$\frac{\alpha+m}{\alpha+\beta+m+n}.$$

Suppose now that we have observed n dead planets, but no life, and that we haven't done an anthropic update yet, then we have a probability of life of α/(α+β+n). Upon adding the anthropic update, this shifts to (α+1)/(α+β+n+1), meaning that the multiplicative factor is at most (α+1)/α. If we started with the uniform prior with its α=1, this multiplies the probability of life by at most 2. In a later section, we'll look at α<1.
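The update rule is simple enough to play with directly. Here is a minimal sketch using exact fractions; `posterior_mean` and `anthropic_factor` are names I've made up for illustration, and the anthropic update is treated as one extra positive observation, as in the text.

```python
from fractions import Fraction


def posterior_mean(alpha, beta, m, n):
    """Mean of a beta prior (alpha, beta) after m live planets and n dead ones."""
    return Fraction(alpha + m) / Fraction(alpha + beta + m + n)


def anthropic_factor(alpha, beta, n_dead):
    """Ratio of the probability of life with and without counting our own planet."""
    before = posterior_mean(alpha, beta, 0, n_dead)
    after = posterior_mean(alpha, beta, 1, n_dead)
    return after / before


# Uniform prior (alpha = beta = 1): the factor climbs towards 2 but never reaches it.
for n_dead in (0, 1, 10, 1000):
    print(n_dead, float(anthropic_factor(1, 1, n_dead)))

# A prior with alpha < 1: the bound (alpha + 1) / alpha is now 5.
print(float(anthropic_factor(Fraction(1, 4), 1, 1000)))
```

The factor approaches $(\alpha+1)/\alpha$ as the number of dead planets grows, so the bound is tight only once we've seen a lot of lifeless worlds.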

## High prior probability is not required for weak anthropic update

The uniform prior has α=β=1 and starts at expectation 1/2. But we can set α=1 and a much higher β, which skews the distribution to the left; for example, for β=2, 3, and 10:

Even though these priors are skewed to the left, and have lower prior probabilities of life ($1/3$, $1/4$, and $1/11$), the anthropic update has a factor $M_{\mu,\sigma^2}$ that is less than 2.

Also note that if we scale the prior $f$ by a small $\epsilon$, so replace $f(\rho)$ on the range $[0,1]$ with $f(\rho/\epsilon)/\epsilon$ on the range $[0,\epsilon]$, then $\mu$ is multiplied by $\epsilon$ and $\sigma^2$ is multiplied by $\epsilon^2$. Thus $M_{\mu,\sigma^2}$ is unchanged. This holds, for example, for the uniform distribution scaled down by $\epsilon=1$, $\epsilon=1/3$, and $\epsilon=1/20$.

All of these will have the same $M_{\mu,\sigma^2}$ (which is $4/3$, just as for the uniform distribution). And, of course, doing the same scaling with the various beta distributions we've seen up until now will also keep $M_{\mu,\sigma^2}$ constant.
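As a worked instance of the scaling argument, take the uniform distribution on $[0,\epsilon]$ (the only assumption here is the standard mean and variance of a uniform distribution):

$$\mu_\epsilon = \frac{\epsilon}{2}, \qquad \sigma_\epsilon^2 = \frac{\epsilon^2}{12}, \qquad M = 1 + \frac{\sigma_\epsilon^2}{\mu_\epsilon^2} = 1 + \frac{\epsilon^2/12}{\epsilon^2/4} = \frac{4}{3}.$$

The $\epsilon^2$ cancels, so the multiplier stays at $4/3$ however small the prior probability of life becomes.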

Thus there are a lot of distributions with very low $\mu$ (ie very low prior probability of life) but an $M_{\mu,\sigma^2}$ that's less than 2 (ie the anthropic update is less than a doubling of the probability of life).

## Beta distributions II and log-normals

The best-case scenario for $M_{\mu,\sigma^2}$ is if $f$ assigns probability 1 to $\rho=\mu$. In that case, $\sigma^2=0$ and $M=1$: the anthropic update changes nothing.

Conversely, the worst-case scenario for $M_{\mu,\sigma^2}$ is if $f$ only allows $\rho=0$ and $\rho=1$. In that case, $f$ assigns probability $\mu$ to 1 and $1-\mu$ to 0, for a mean of $\mu$ and a variance of $\sigma^2=\mu-\mu^2$, and a multiplicative factor of $M_{\mu,\sigma^2}=1/\mu$. In this case, after the anthropic update, $f'$ assigns certainty to $\rho=1$ (since any life at all, given this $f$, means life on all planets).

But there are also more reasonable priors with large $M_{\mu,\sigma^2}$. We've already seen some, implicitly, above: the beta distributions with $\alpha<1$. In that case, $M_{\mu,\sigma^2}$ is bounded by $(\alpha+1)/\alpha$. If $\alpha=3/4$ and $\beta=1$, for instance, this corresponds to the (unbounded) distribution $f(\rho)=(3/4)\rho^{-1/4}$; the multiplicative factor is below $7/3$, which is slightly above 2. But as $\alpha$ declines, the bound on the multiplicative factor can go up surprisingly fast; at $\alpha=1/2$ it is 3, at $\alpha=1/4$ it is 5.

In general, for $\alpha=1/n$, the multiplicative factor is bounded by $n+1$. This gets arbitrarily large as $\alpha\to 0$, though $\alpha=0$ itself corresponds to the improper prior $f(\rho)=1/\rho$, whose integral diverges. On a log scale, this corresponds to the log-uniform distribution, which is roughly what you get if you assume "we need $N$ steps, each of probability $p$, to get life; let's put a uniform prior over the possible $N$s".

It's not clear why one might want to choose $\alpha=1/10^{20}$ for a prior, but there is a class of prior that is much more natural: the log-normal distributions. These are random variables $X$ such that $\log(X)$ is normally distributed.

If we choose $\log(X)$ to have a mean that is highly negative (and a variance that isn't too large), then we can mostly ignore the fact that $X$ takes values above 1, and treat it as a prior distribution for $\rho$. The mean and variance of the log-normal distributions can be explicitly defined, thus giving the multiplicative factor as:

$$M_{\mu,\sigma^2}=\exp(\bar\sigma^2).$$

Here, $\bar\sigma^2$ is the variance of the normal distribution $\log(X)$. This $\bar\sigma^2$ might be large, as it denotes (roughly) "we need $N$ steps, each of probability $p$, to get life; let's put a uniform-ish prior over a range of possible $N$s". Unlike $1/\rho$, this is a proper prior, and a plausible one; therefore there are plausible priors with very large $M_{\mu,\sigma^2}$. The log-normal is quite likely to appear, as it is the approximate limit of multiplying together a host of different independent parameters.
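For completeness, here is the short calculation behind the log-normal multiplier, using the standard log-normal moments. I write $\bar m$ for the mean of $\log(X)$ (a symbol introduced only for this sketch) and $\bar\sigma^2$ for its variance:

$$\mu = e^{\bar m + \bar\sigma^2/2}, \qquad \sigma^2 = \left(e^{\bar\sigma^2}-1\right)e^{2\bar m + \bar\sigma^2},$$

$$M_{\mu,\sigma^2} = 1 + \frac{\sigma^2}{\mu^2} = 1 + \left(e^{\bar\sigma^2}-1\right) = e^{\bar\sigma^2}.$$

The mean $\bar m$ drops out entirely: how unlikely life is a priori doesn't matter, only how uncertain we are about the order of magnitude.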

## Multiplication law

Do you know what's more likely to be useful than "the approximate limit of multiplying together a host of different independent parameters"? Actually multiplying together independent parameters.

The famous Drake equation is:

$$R_*\cdot f_p\cdot n_e\cdot f_l\cdot f_i\cdot f_c\cdot L.$$

Here $R_*$ is the rate of star formation in our galaxy, $f_p$ the fraction of those stars with planets, $n_e$ the number of planets that can support life per star that has planets, $f_l$ the fraction of those that develop life, $f_i$ the fraction of those that develop intelligent life, $f_c$ the fraction of those that release detectable signs of their existence, and $L$ is the length of time those civilizations endure as detectable.

Then the proportion of advanced civilizations per planet is $q f_l f_i$, where $q$ is the proportion of life-supporting planets among all planets. To compute the $M$ of this distribution, we have the highly useful result (the proof is in this footnote^{[4]}) that the multipliers of independent factors multiply: if $X=\prod_i X_i$ with the $X_i$ independent, then

$$M_X=\prod_i M_{X_i}.$$

The paper "Dissolving the Fermi Paradox" gives estimated distributions for all the terms in the Drake equation. The $q$, which doesn't appear in that paper, is a constant, so has $M_q=1$. The $f_i$ has a log-uniform distribution from 0.001 to 1; the $M$ can be computed from the mean and variance of such distributions, so

$$M_{f_i}=\log(1/0.001)\,\frac{1-0.001^2}{2(1-0.001)^2}\approx 3.5.$$

The $f_l$ term is more complicated; it is distributed like $g(X)=1-e^{-e^{X\cdot 50\log(10)}}$, where $X$ is a standard normal random variable. Fortunately, we can estimate its mean and variance without having to figure out its distribution, by numerical integration of $g(x)$ and $g(x)^2$ against the normal density. This gives $\mu\approx 0.5$, $\sigma^2\approx 0.25$ and $M\approx 2$. The overall multiplicative effect of the anthropic update is:

$$M_{\text{planet}}\approx 7.$$
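The numerical integration for the life term can be sketched in a few lines. This is a rough reconstruction, not the code actually used for the post: it assumes $g(x)=1-\exp(-\exp(50\log(10)\,x))$ with $x$ standard normal, and uses a plain trapezoid rule on $[-8,8]$; the function names and grid are my choices.

```python
import math

SCALE = 50.0 * math.log(10.0)  # 50 orders of magnitude, in natural-log units


def g(x):
    """Assumed form of the life term: 1 - exp(-exp(SCALE * x))."""
    t = SCALE * x
    if t > 700.0:
        return 1.0  # exp(t) would overflow; g is 1 to machine precision here
    return -math.expm1(-math.exp(t))


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def moments(lo=-8.0, hi=8.0, steps=160_000):
    """Trapezoid-rule estimates of E[g(X)] and E[g(X)^2] for standard normal X."""
    h = (hi - lo) / steps
    first = second = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        w = normal_pdf(x) * h
        if k == 0 or k == steps:
            w *= 0.5
        gx = g(x)
        first += gx * w
        second += gx * gx * w
    return first, second


mean_fl, second_fl = moments()
var_fl = second_fl - mean_fl**2
M_fl = second_fl / mean_fl**2

print(mean_fl, var_fl, M_fl)
```

Because the scale is so huge, $g$ is essentially a step function: 0 for negative $x$ and 1 for positive $x$. That is why the mean and variance land so close to those of a fair coin.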

What if we considered the proportion of advanced civilizations per star, rather than per planet? Then we can drop the $q$ term and add in $f_p$ and $n_e$. Those are both estimated to be distributed as log-uniform on $[0.1,1]$, for a total $M$ of

$$M_{\text{star}}\approx 14.$$

Why is the $M$ higher for civilizations per star than civilizations per planet? That's because when we update on our existence, we increase the proportion of civilizations per planet, but we also update the proportion of planets per star - both of these can make life more likely. The $M_{\text{star}}$ incorporates both effects, so is strictly higher than $M_{\text{planet}}$.

We can do the same by considering the number of civilizations per galaxy; then we have to incorporate $R_*$ as well. This is log-uniform on $[1,100]$, giving:

$$M_{\text{galaxy}}\approx 32.$$
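These three multipliers can be reproduced with the multiplication law and a closed form for log-uniform distributions. The sketch below assumes the ranges quoted above and takes $M\approx 2$ for the life term from the text rather than recomputing it; for a log-uniform variable on $[a,b]$, the mean is $(b-a)/\log(b/a)$ and the second moment is $(b^2-a^2)/(2\log(b/a))$, which gives the formula in `loguniform_multiplier` (a name made up for this example).

```python
import math


def loguniform_multiplier(a, b):
    """M = E[X^2] / E[X]^2 for X log-uniform on [a, b], with 0 < a < b."""
    return (a + b) * math.log(b / a) / (2.0 * (b - a))


M_q = 1.0                                   # constant, so no update
M_fl = 2.0                                  # life term, value taken from the text
M_fi = loguniform_multiplier(0.001, 1.0)    # intelligence term
M_fp = loguniform_multiplier(0.1, 1.0)      # fraction of stars with planets
M_ne = loguniform_multiplier(0.1, 1.0)      # habitable planets per such star
M_Rstar = loguniform_multiplier(1.0, 100.0)

# Multiplication law: multipliers of independent factors multiply.
M_planet = M_q * M_fl * M_fi
M_star = M_fl * M_fi * M_fp * M_ne
M_galaxy = M_star * M_Rstar

print(round(M_planet, 2), round(M_star, 2), round(M_galaxy, 2))
```

Each extra uncertain factor can only push the multiplier up, since every individual $M$ is at least 1.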

What about if we include the Fermi observation (the fact that we don't see anything in our galaxy)? The "dissolving the Fermi paradox" paper shows there are multiple different ways of including this update, depending on how we parse out "not seeing anything" and how easy it is for civilizations to expand.

I did a crude estimate here by taking the Fermi observation to mean "the proportion of civilizations per galaxy must be less than one". Then I did a Monte-Carlo simulation, ignoring all results above 0 on the log scale:

From this, I got an estimated mean of 0.027, variance of 0.014, and a total multiplier of:

$$M_{\text{galaxy, Fermi}}\approx 21.$$

With the Fermi observation and the anthropic update combined, we expect 0.56 civilizations per galaxy.

## Limitations of the multiplier

## Low multiplier, strong effects

It's important to note that the anthropic update can be very strong, without changing the expected population much. So a low $M_{\mu,\sigma^2}$ doesn't necessarily mean a low impact.

Consider for instance the presumptuous philosopher, slightly modified to use planetary population densities. Thus theory $T_1$ predicts $\rho=1/10^{12}$ (one in a trillion) and $T_2$ predicts $\rho=1$; we put initial probabilities $1/2$ on both theories.

As Nick Bostrom noted, the SIA update pushes $T_2$ to being a trillion times more probable than $T_1$; *a posteriori*, $T_2$ is roughly a certainty (the actual probability is $10^{12}/(10^{12}+1)$).

However, the expected population goes from roughly $1/2$ (the average of $1/10^{12}$ and $1$) to roughly $1$ (since *a posteriori* $T_2$ is almost certain). This gives an $M_{\mu,\sigma^2}$ of roughly 2. So, despite the strong update towards $T_2$, the actual population update is small - and, conversely, despite the actual population update being small, we have a strong update towards $T_2$.

## Combining multiple theories

In the previous section, note that both $T_1$ and $T_2$ were point estimates: they posit a constant $\rho$. So they have a variance of zero, and hence an $M_{\mu,\sigma^2}$ of 1. But $T_2$ has a much stronger anthropic update. Thus we can't use their $M_{\mu,\sigma^2}$ to compare the anthropic effects on different theories.

We also can't relate the individual $M$s to that of a combined theory. As we've seen, $T_1$ and $T_2$ have $M$s of 1, but the combined theory $(1/2)T_1+(1/2)T_2$ has an $M$ of roughly 2. But we can play around with the relative initial weight of $T_1$ and $T_2$ to get other $M$s.

If we started with odds $10^{12}:1$ on $T_1$ vs $T_2$, then this has a mean $\rho$ of roughly $2\times 10^{-12}$ (each theory contributes about $10^{-12}$); the anthropic update sends it to $1:1$ odds, with a mean of roughly $1/2$. So this combined theory has an $M$ of roughly $10^{12}/4$, a quarter of a trillion.

But, conversely, if we started with odds $1:10^{12}$ on $T_1$ vs $T_2$, then we have an initial mean of $\rho$ of roughly one; its anthropic update is odds of $1:10^{24}$, also with a mean of roughly one. So this combined theory has an $M$ of roughly 1.
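These two-theory mixtures are small enough to compute exactly. The sketch below is an illustration with made-up function names: it takes prior odds on $T_1$ and $T_2$, and returns the prior mean of $\rho$, the mean after the anthropic update, and their ratio $M$, using exact fractions so the tiny $10^{-12}$ terms aren't lost to rounding.

```python
from fractions import Fraction

RHO_T1 = Fraction(1, 10**12)  # T1: one planet in a trillion has life
RHO_T2 = Fraction(1)          # T2: every planet has life


def mixture_update(odds_t1, odds_t2):
    """Return (prior mean, updated mean, M) for a mixture of the two point theories."""
    total = odds_t1 + odds_t2
    mean = (odds_t1 * RHO_T1 + odds_t2 * RHO_T2) / total
    second_moment = (odds_t1 * RHO_T1**2 + odds_t2 * RHO_T2**2) / total
    updated_mean = second_moment / mean  # E(rho^2) / E(rho)
    return mean, updated_mean, updated_mean / mean


results = {}
for odds in [(1, 1), (10**12, 1), (1, 10**12)]:
    mean, updated_mean, M = mixture_update(*odds)
    results[odds] = (float(mean), float(updated_mean), float(M))
    print(odds, results[odds])
```

The same pair of theories gives wildly different multipliers depending on the prior odds, which is the point: $M$ is a property of the whole prior, not of its components.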

There *is* a weak relation between $M$ and the $M_i$ of the various $T_i$. Let $M_i$ be the multiplier of $T_i$; we can reorder the $T_i$ so that $M_i\le M_j$ for $i\le j$. Let $T$ be a combined theory that assigns probability $p_i$ to $T_i$. Then

$$M\ge M_1.$$

So, the minimum value of the $M_i$ is a lower bound on $M$, and we can get arbitrarily close to that bound. See the proof in this footnote^{[5]}.

As we'll see, the *population* update is small even in the presumptuous philosopher experiment itself. ↩︎

Citation partially needed: I'm ignoring Boltzmann brains and simulations and similar ideas. ↩︎

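Given a fixed $\rho$, the probability of observing life on our own planet is exactly $\rho$. So Bayes's theorem implies that $f'(\rho)\propto\rho f(\rho)$. With the full normalisation, this is

$$f'(\rho)=\frac{\rho f(\rho)}{\int_0^1\rho f(\rho)\,d\rho}.$$

If we want to get the mean $\mu'$ of this distribution, we further multiply by $\rho$ and integrate:

$$\mu'=E_{f'}(\rho)=\int_0^1\frac{\rho^2 f(\rho)}{\int_0^1\rho f(\rho)\,d\rho}\,d\rho=\frac{\int_0^1\rho^2 f(\rho)\,d\rho}{\int_0^1\rho f(\rho)\,d\rho}.$$

Let's multiply this by $1=1/1=\left(\int_0^1 f(\rho)\,d\rho\right)/\left(\int_0^1 f(\rho)\,d\rho\right)$ and regroup the terms:

$$\mu'=\frac{\int_0^1\rho^2 f(\rho)\,d\rho}{\int_0^1 f(\rho)\,d\rho}\cdot\frac{\int_0^1 f(\rho)\,d\rho}{\int_0^1\rho f(\rho)\,d\rho}.$$

Thus $\mu'=E_f(\rho^2)/E_f(\rho)=(\sigma^2+\mu^2)/\mu=\mu(1+\sigma^2/\mu^2)$, using the fact that the variance is the expectation of $\rho^2$ minus the square of the expectation of $\rho$. ↩︎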

I adapted the proof in this post.

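So, let $X_i$ be independent random variables with means $\mu_i$ and variances $\sigma_i^2$. Let $X=\prod_i X_i$, which has mean $\mu$ and variance $\sigma^2$. Due to the independence of the $X_i$, the expectations of their products are the product of their expectations. Note that $X_i^2$ and $X_j^2$ are also independent if $i\neq j$. Then we have:

$$\prod_i M_{\mu_i,\sigma_i^2}=\prod_i\left(1+\frac{\sigma_i^2}{\mu_i^2}\right)=\prod_i\frac{\mu_i^2+\sigma_i^2}{\mu_i^2}=\prod_i\frac{E(X_i^2)}{\mu_i^2}=\frac{\prod_i E(X_i^2)}{\prod_i E(X_i)^2}=\frac{E(X^2)}{E(X)^2}=\frac{\mu^2+\sigma^2}{\mu^2}=1+\frac{\sigma^2}{\mu^2}=M_{\mu,\sigma^2}.$$ ↩︎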

Let $\{f_i\}_{1\le i\le n}$ be probability distributions on $\rho$, with mean $\mu_i$, variance $\sigma_i^2$, expectation of the square $s_i=E_{f_i}(\rho^2)=\sigma_i^2+\mu_i^2$, and $M_i=s_i/\mu_i^2$. Without loss of generality, reorder the $f_i$ so that $M_i\le M_j$ for $i<j$.

Let $f$ be the probability distribution $f=p_1f_1+\ldots+p_nf_n$, with associated multiplier $M$. Then we'll show that $M\ge M_1$.

We'll first show this in the special case where $n=2$ and $M_1=M_2$, then generalise to the general case, as is appropriate for a generalisation. If $s_1/\mu_1^2=M_1=M_2=s_2/\mu_2^2$, then, since all terms are non-negative, there exists an $\alpha$ such that $s_2=\alpha^2s_1$ while $\mu_2=\alpha\mu_1$. Then for any given $p=p_1$, the $M$ of $f$ is:

$$M(p)=\frac{ps_1+(1-p)s_2}{(p\mu_1+(1-p)\mu_2)^2}=\frac{ps_1+(1-p)\alpha^2s_1}{(p\mu_1+(1-p)\alpha\mu_1)^2}=M_1\,\frac{1\cdot p+\alpha^2(1-p)}{(1\cdot p+\alpha(1-p))^2}.$$

The function $x\to x^2$ is convex, so, interpolating between the values $x=1$ and $x=\alpha$, we know that for all $0\le p\le 1$, the term $(1\cdot p+\alpha(1-p))^2$ must be no greater than $1^2\cdot p+\alpha^2(1-p)$. Therefore $(1\cdot p+\alpha^2(1-p))/(1\cdot p+\alpha(1-p))^2$ is at least 1, and $M(p)\ge M_1$. This shows the result for $n=2$ if $M_1=M_2$.

Now assume that $M_2>M_1$, so that $s_1/\mu_1^2<s_2/\mu_2^2$. Then replace $s_2$ with $s_2'$, which is lower than $s_2$, so that $s_1/\mu_1^2=s_2'/\mu_2^2$. If we define $M'(p)$ as the expression for $M(p)$ with $s_2'$ substituting for $s_2$, we know that $M'(p)\le M(p)$, since $s_2'<s_2$. Then the previous result shows that $M'(p)\ge M_1$, thus $M(p)\ge M_1$ too.

To show the result for larger $n$, we'll induct on $n$. For $n=1$ the result is a tautology, $M_1\ge M_1$, and we've shown the result for $n=2$. Assume the result is true for $n-1$, and then notice that $f=p_1f_1+\ldots+p_nf_n$ can be re-written as $f=p_1f_1+(1-p_1)f'$, where $f'=(p_2'f_2+\ldots+p_n'f_n)$ for $p_i'=p_i/(1-p_1)$. Then, by the induction hypothesis, if $M'$ is the $M$ of $f'$, then $M'\ge M_2$. Then applying the result for $n=2$ between $f_1$ and $f'$ gives $M\ge\min(M_1,M')$. However, since $M_1\le M_2$ and $M'\ge M_2$, we know that $\min(M_1,M')=M_1$, proving the general result.

To show $M$ can get arbitrarily close to $M_1$, simply note that $M$ is continuous in the $\{p_i\}$, define $p_1=1-\epsilon$, $p_i=\epsilon/(n-1)$ for $i>1$, and let $\epsilon$ tend to 0. ↩︎