Colonization models: a tutorial on computational Bayesian inference (part 2/2) — LessWrong

x

Colonization models: a tutorial on computational Bayesian inference (part 2/2) — LessWrong

Recap

Part 1 was a tutorial for programming a simulation for the emergence and development of intelligent species in a universe 'similar to ours.' In part 2, we will use the model developed in part 1 to evaluate different explanations of the Fermi paradox. However, keep in mind that the purpose of this two-part series is for showcasing useful methods, not for obtaining serious answers.

We summarize the model given in part 1:

SIMPLE MODEL FOR THE UNIVERSE

The universe is represented by the set of all points in Cartesian 4-space which are of Euclidean distance 1 from the origin (that is, the 3-sphere). The distance between two points is taken to be the Euclidean distance (an approximation to the spherical distance which is accurate at small scales)
The lifespan of the universe consists of 1000 time steps.
A photon travels s=0.0004 units in a time step.
At the end of each time step, there is a chance that a Type 0 civilization will spontaneously emerge in an uninhabited region of space. The base rate for civilization birth is controlled by the parameter a. But this base rate is multiplied by the proportion of the universe which remains uncolonized by Type III civilizations.
In each time step, a Type 0 civilization has a probability b of self-destructing, a probability c of transitioning to a non-expansionist Type IIa civilization, and a probability d of transitioning to a Type IIb civilization.
Observers can detect all Type II and Type III civilizations within their past light cones.
In each time step, a Type IIb civilization has a probability e of transitioning to an expansionist Type III civilization.
In each time step, all Type III civilizations colonize space in all directions, expanding their sphere of colonization by k * s units per time step.

Section III. Inferential Methodology

In this section, no apologies are made for assuming that the reader has a solid grasp of the principles of Bayesian reasoning. Those currently following the tutorial from Part 1 may find it a good idea to skip to Section IV first.

To dodge the philosophical controversies surrounding anthropic reasoning, we will employ an impartial observer model. Like Jaynes, we introduce a robot which is capable of Bayesian reasoning, but here we imagine a model in which such a robot is instantaneously created and randomly injected into the universe at a random point in space, and at a random time point chosen uniformly from 1 to 1000 (and the robot is aware that it is created via this mechanism). We limit ourselves to asking what kind of inferences this robot would make in a given situation. Interestingly, the inferences made by this robot will turn out to be quite similar to the inferences that would be made under the self-indication assumption.

It is important to precisely specify the observational powers of this robot. The robot can estimate the age of the universe, but it cannot determine its location relative to the center of the universe. The robot can visually detect any Type II+ civilizations within its past light cone. Furthermore, it we specify that the robot can detect Type 0 civilizations which are within a small distance u of the robot. We also grant the robot the power to deduce the age of any such nearby Type 0 civilizations.

(In fact, we grant observational powers to this robot in order so that it has close to the same observational powers that we have. But for now, in order to steer clear of the philosophers, we carry on innocently and say no further.)

Now, at the time of its creation, the robot so happens to notice the following data.

Data D :

D1: The current time is t = 500 plus or minus 5
D2: From my location, I can detect a single Type 0 civilization, which emerged at this time step. No type II+ civilizations are visible from my location, and therefore, no type II+ civilizations are visible to this nearby Type 0 civilization

The robot is aware that the universe runs according to the 'simple model of the universe,' but it does not know the values for parameters a, b, c, d, e, and k. Given a hypothesis H as to the values of those parameters, the robot can calculate the posterior probability of observing data D.

We will encode the entire history of the universe in the form of a random variable G which takes values within the space of all possible histories. Even after we fix a particular history, the space-time location of the robot within this history is still random. This will be an important point to remember when doing inference under this impartial observer model.

Define random variable N to be the number of new Type 0 civilizations which are outside the sphere of visibility of any type II+ civilizations in existence at time t = 500. Since in our model only one civilization can appear in a time step, the random variable N is in fact a binary random variable. Conditioned on G taking a particular value g, the random variable N is a fixed number. Conditioned on a particular history g, the probability that the robot appears within distance u of such a civilization is P(D2|H, D1, G=g) = (u^3) (N|H, D1, G=g) = (u^3) (N|H, G=g). We have N|D1 = N since the definition of N does not depend on the current time step.

The robot calculates the posterior probability of the data as

P(D|H) = P(D2|H, D1) P(D1|H) = P(D2|H, D1) (1/1000) = E[P(D2|H,D1,G)]/1000

= E[(u^3) (N|H, D1, G)]/1000 = (u^3)/1000 E[N|H]

Now, supposing we wanted to compare different hypotheses H1, H2, etc., their posterior probabilities would all share the common constant term (u^3)/1000 in front. Since only the ratios of the posterior probabilities are relevant for inference, we drop the constant term and simply remember that

P(D|H) is proportional to E[N|H]

All we need to do, then, to evaluate the plausibility of different configurations H1, H2, etc. of the parameters, is to compute the value of E[N|H].

However, in this case, conditioning on a uniform distribution on a small interval of time t=[495, 505] is computationally inconvenient, because very few of the histories will have a new Type 0 civilization at those specific times. This means that many simulations have to be performed before we can get an approximation of E[N|H] to the necessary degree of accuracy.

We can make our problem more computationally tractable if we weaken the observational powers of our robot; hopefully, our resulting inferences will not be altered too much by this change. (Readers are invited to check this by implementing the inferences for the original setup.) In the modified problem the robot is unable to determine the age of the universe, nor the age any Type 0 civilization it can detect. The robot observes only the following data.

Data D :

From my location, I can detect a single Type 0 civilization. No type II+ civilizations are visible from my location, and therefore, no type II+ civilizations are visible to this nearby Type 0 civilization

For the modified problem, we redefine the random variable N to be the number of Type 0 civilizations outside the sphere of visibility of any Type II+ civilization at random time T. Similar to before, we have

P(D|H) = u^3 E[E[N|H]|T]

or

P(D|H) proportional to E[E[N|H]|T]

What is interesting about this conclusion is that it is similar to the Self-Indication Assumption, which, to quote the linked Wikipedia page, is the principle:

Given the fact that you exist, you should (other things equal) favor hypotheses according to which many observers exist over hypotheses on which few observers exist.

Whereas here, from Bayes' rule, we have:

Given the fact that a civilization exists satisfying CRITERIA X, the robot should (other things equal) favor hypotheses according to which many such civilizations exist over hypotheses on which few of them exist at randomly determined time T

Section IV. Implementation and Demonstration

We give the code for calculating E[E[N|H]|T] below.

# Code Block 11
no_visibles = function(civs,x) {
no_civs = length(civs)
if (no_civs == 0) {
    return(TRUE)
}
for (i in 1:no_civs) {
    current_civ = civs[[i]]
    y = current_civ$location
    if (distance(x,y) < current_civ$r_v) {
      return(FALSE)
    }
}
return(TRUE)
}
value_n = function(history) {
sum = 0
for (t in 1:1000) {
    civs = history[[t]]
    no_civs = length(civs)
    if (no_civs > 0) {
      for (i in 1:no_civs) {
        civ = civs[[i]]
        if (civ$type == 0) {
          x = civ$location
          if (no_visibles(civs, x)) {
            sum = sum + 1
          }
        }
      }
    }
}
sum/1000
}
multiple_trials_value_n = function(reps=10) {
values = numeric(0) # empty vector
for (i in 1:reps) {
    history=simulation()
    values[i] = value_n(history)
}
values
}
reps = 10
values = multiple_trials_value_n(reps)
# approximate value of E[[N|H]|T]
mean(values)
# standard error of the approximation
sqrt(var(values)/reps)

For those who followed part 1, the above code should be straightforward. At the end we calculate a standard error since we have approximately the value of E[E[N|H]|T] by averaging over random samples of the random variable E[N|H]|T.

Now we can get onto the business of comparing Bayes factors for hypotheses. Here we take the term 'Bayes' factor to refer to the quantity P(D|H). Consider the following example hypotheses.

1) H1: "we are alone in the universe" (small chance of life arising)

# Hypothesis 1
a = 0.001; b = 0; c = 0.1; d = 0; e = 0; k = 0

2) H2: "great filter ahead" (high chance of life, but high chance of Type 0's self-destructing)

# Hypothesis 2
a = 0.1; b = 0.1; c = 0.01; d = 0; e = 0; k = 0

3) H3: "burning the cosmic commons" (high chance of life and of surviving to stage III, but high speed of colonization)

# Hypothesis 3
a = 0.1; b = 0; c = 0; d = 0.1; e = 0.1; k = 0.9

EXERCISE: Run Code Block 11 to calculate the Bayes factors for the above hypotheses.

EXERCISE 2: Can you think of a hypothesis which maximizes the Bayes Factor?

EXERCISE: After assigning prior weights to hypotheses of your choosing, modify the code in order to calculate the posterior probability that a Type 0 civ observing the Fermi paradox ultimately:

self-destructs
becomes colonized while it is a Type II or lower civilization
becomes a type III civilization

Section V. Prior Specification; MAP estimation

SOLUTION TO EXERCISE 2: We can maximize the number of Type 0 observers of the Fermi paradox, naturally, by setting b, c, d to zero so that Type 0 civilizations remain Type 0 civilizations forever, and no Type II+ civilizations ever appear.

This 'maximum likelihood' solution is of little value to us, however, since we would probably not give much prior weight to a model in which all Type 0 civilizations survive forever and never go interstellar. Recall that quantity we are actually interested in, the posterior probability of a hypothesis, depends on both the Bayes factor and the prior weight of the hypothesis:

P(H|D) proportional to P(D|H) P(H)

The parameters H which maximizes P(H|D) is called the Maximum A Posteriori (MAP) estimator.

The business of determining prior weights of hypotheses, however, is a complicated affair. How can we faithfully represent the state of our knowledge of the propensity for intelligent life to evolve and develop in the universe? [Actually, we are still operating under the ruse that we are assigning priors for our robot, not ourselves, but since in the end we only care about the inferences made by the robot insofar as the similarity of our priors and the robot's priors, we might as well make the robot's priors match ours as closely as possible.]

As is the custom we will seek convenient mathematical formulations for the prior distributions. As a guide for coming up with priors, we note:

1/a is the expected time for intelligent life to emerge in an uninhabited universe
1/b is the expected time for Type 0 civs to destroy themselves, when there's no possibility of ascending to Type II
1/(c+d) is the expected time for a Type 0 civ to ascend to Type II, when there's no possibility of self-destruction
d/(c+d) is the proportion of Type II civilizations which are not ideologically opposed to becoming expansionist
1/e is the expected time it takes for a non-pacifist Type II civilization to develop the technology required for serious large-scale colonization efforts (at which point it becomes Type III)
k is the speed limit on colonization relative to the speed of light. It's plausible that some advanced civilizations would colonize planets by sending out tiny robots, which could travel at near light speeds. If you think FTL travel is a possibility, you can even assign prior densities to values of k above 1.

In fact, the above analysis suggests that it may be more natural to assign priors on the transformed parameters:

a0 = 1/a, b0 = 1/b, c0 = 1/(c+d), d0 = d/(c+d), e0 = 1/e.

Meanwhile, if we want to restrict k to lie between 0 and 1 and vanish on the endpoints, we can do this by assigning a prior to the transformed parameter

k0 = log(k) - log(1-k)

From these transformed parameters, we reconstitute the original parameters by

a = 1/a0, b = 1/b0, c = (1-d0)/c0, d = d0/c0, e = 1/e0, k = 1/(1 + exp(-k0))

Since a0, b0, c0, e0 have to be positive, natural priors for these would the exponential. d0 might have a Beta prior. We also might as well take k0 to have a normally distributed prior.

All of these priors would be parameterized by 'hyperparameters,' which, for lack of imagination and lack of Greek symbols, we will call a1, b1, c1, d1_alpha, d1_beta e1, k1 at the risk of some confusion.

Once we have specified priors for the parameters, the very next thing a Bayesian wants to do is to compute the posterior density of the parameters. In our case, we will estimate the posterior by directly drawing from the posterior density.

[to be contd.]