Don't Condition on no Catastrophes

by Scott Garrabrant, 21st Feb 2018


I often hear people say things like "By what date do you assign 50% chance to reaching AGI, conditioned on no other form of civilizational collapse happening first?" The purpose of this post is to make this question make you cringe.

I think that most people mentally replace the conditional with something like "if other forms of civilizational collapse were magically not a thing, and did not have to enter into your model." Further, I think this is the more useful question to discuss, as it makes it easier to double crux, or download other people's models. However, not everyone does this, and it is not the question being asked.

To illustrate the difference, consider Alice, who, if she ignored other forms of civilizational collapse, would think that the AGI arrival date is uniform over the next 100 years. However, she also thinks that, if not for AGI, an extinction-level nuclear war would happen at a time drawn uniformly at random from the next 100 years. Alice is not concerned about any other catastrophes.

Alice has these two independent distributions on when each event will happen if, counterfactually, the other were to magically not happen. However, the world is such that as soon as one of the events happens, it causes the other event to not happen, because the world is made very different.

When asking about Alice's median AGI date, ignoring civilizational collapse, we would like to encourage her to say 50 years. However, her median AGI date, conditional on no nuclear war happening first, is actually about 29 years (her conditional mean is about 33 years). This is because conditioning on no nuclear war happening first biases towards AGI dates that are early enough to stop a counterfactual future nuclear war.
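
The calculation behind the conditional figure can be written out as a short sketch. It assumes that "conditional on no nuclear war happening first" means conditioning on the event A < N, where A and N (symbols introduced here, not in the post) are Alice's independent counterfactual AGI and nuclear war times, each uniform on [0, 100] years. Under that reading, the conditional median comes out near 29 years and the conditional mean near 33 years.

```latex
% Assumed notation: A, N independent and uniform on [0, 100]; u = a / 100.
\begin{align*}
P(A \le a,\ A < N) &= \int_0^{a} \frac{1}{100}\left(1 - \frac{x}{100}\right) dx = u - \frac{u^2}{2}, \\
P(A < N) &= \frac{1}{2}, \\
P(A \le a \mid A < N) &= 2u - u^2 = 1 - (1 - u)^2, \\
\text{median: } (1 - u)^2 = \tfrac{1}{2} &\implies a = 100\left(1 - \tfrac{1}{\sqrt{2}}\right) \approx 29.3, \\
\text{mean: } E[A \mid A < N] &= \int_0^{100} a \cdot \frac{2}{100}\left(1 - \frac{a}{100}\right) da = \frac{100}{3} \approx 33.3.
\end{align*}
```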

The form of the question I would like to ask Alice is as follows:

Take your distribution over ways the future can go, and sample a random future. If that future ends with nuclear war at time t, sample another world with the property that neither AGI nor any other catastrophe happens before time t. If that world ends with a non-AGI catastrophe, redefine t to be the time of the catastrophe in that world, and repeat the process until you get a world that ends with AGI, with no other catastrophe happening first. Use this as your new distribution over futures, and tell me the median AGI date.

Note that conditioning on no other catastrophe happening first is the same procedure, except when you sample a new future, you do not require that it has the property that neither AGI nor any other catastrophe happens before time t.
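
To make the contrast concrete, here is a minimal simulation sketch of both procedures for Alice's toy model. It assumes her two counterfactual times are independent and uniform on [0, 100] years, so that a future in which nothing happens before time t can be sampled by drawing both times uniformly from [t, 100]. The function names and the `require_after_t` flag are hypothetical, chosen only for this illustration.

```python
import random
import statistics

HORIZON = 100.0  # years; both of Alice's distributions are uniform on [0, HORIZON]


def sample_future(rng, after=0.0):
    """Sample (agi_time, war_time), each uniform on [after, HORIZON].

    For independent uniforms this is the original distribution conditioned
    on neither event happening before `after`.
    """
    return rng.uniform(after, HORIZON), rng.uniform(after, HORIZON)


def agi_date(rng, require_after_t):
    """Resample futures until AGI comes first, then return the AGI date.

    require_after_t=True:  the procedure described in the post (each new
                           future must have nothing happen before time t).
    require_after_t=False: ordinary conditioning on "no war first"
                           (each new future is sampled from scratch).
    """
    agi, war = sample_future(rng)
    while war < agi:
        # The future ended in nuclear war at time t = war.
        t = war if require_after_t else 0.0
        agi, war = sample_future(rng, after=t)
    return agi


def median_agi_date(require_after_t, n=100_000, seed=0):
    """Estimate the median AGI date under the chosen procedure."""
    rng = random.Random(seed)
    return statistics.median(agi_date(rng, require_after_t) for _ in range(n))


if __name__ == "__main__":
    print("resampling after t:  ", round(median_agi_date(True), 1))
    print("plain conditioning:  ", round(median_agi_date(False), 1))
```

In this toy model the first estimate should land near 50 years and the second near 29 years, though the exact values depend on the seed and sample size. The sketch relies on the independence assumption; with correlated events, sampling "a world where nothing happens before t" would need the full joint model.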

I don't have a good name for this alternative to conditioning, and would like suggestions in comments. You may notice a similarity between it and causal counterfactuals. You may also notice a similarity between it and the thing you do to get Solomonoff Induction out of the Universal Semimeasure.

7 comments

I suspect the intuition behind asking these questions in this way is this: People mentally have a model of "AGI", and maybe a model of "nuclear war" or other disasters, and these models are mostly separate (that is, when thinking about AGI timelines, most people "don't worry about" nuclear war and other disasters, whatever that means). So if you don't ask people to exclude those considerations, SOME people will exclude them anyway (because they don't come to mind), while others (call them "pedants" perhaps) will try to manually adjust their model for those considerations, which is technically correct but isn't really what you wanted, and which means you will get heterogeneous answers.

Here is another illustration of how catastrophe-censoring gives bad results.

Suppose you ask me when I expect (say) the first human to set foot on another planet. And suppose I think that this is likely to happen in 50 years' time if everything goes well, but that there's some class of Bad Thing that might happen that would delay it -- economic disaster, pandemic, environmental collapse, etc. Now, imagine what happens as we ramp up the severity I think this Bad Thing would have, while keeping everything else constant. My estimate of that date will get later and later, as it obviously should ... until the point where the Bad Thing counts as a catastrophe, when (if we're conditioning on no catastrophe) suddenly my estimated date gets much earlier because now the possible worlds where the Bad Thing happens are being excluded from consideration.

(With a more realistic model of Bad Things, rather than a nice monotonic increase followed by an absurd backward jump we get some more complicated curve that eventually turns downward. But it's the same problem.)

Maybe I should write down the theorem that is implicit here, since it wasn't obvious to me at first: if you ask the example Alice your question, then she will answer 50 years. So it is recovering the AGI distribution from before nuclear war was added to the model. However, the new type of conditional still doesn't seem very natural to me: if Alice has a model in which the processes leading to AGI and nuclear war are correlated with each other in a more substantive way than just precluding each other at completion, then it is not clear what the new conditional is supposed to be measuring about her model. For example, maybe she thinks that both AGI and nuclear war times will be affected by variables like the rate of technological progress, political developments, etc. So it seems simpler just to ask the question about what she would predict about AGI if other considerations were removed from her model.

This reminds me of Gaussian Process Memoization: https://arxiv.org/abs/1512.05665

I strongly suspect this is close to what most people who answer these kinds of questions are doing already (provided they think about it at all). I'd thus be surprised if rephrasing it leads to significantly different responses in surveys. But I agree it should be rephrased regardless.

How about weighting each future f by the inverse of the probability of nuclear war before the time of AI in f (and then re-normalising)?

I firmly agree that ignoring other catastrophes is a mistake, and that including more catastrophes in our estimates is necessary.

I don't believe that independence is a valid assumption. Intuitively I expect shared causal impacts of one catastrophic risk on others, which causes me to suspect they will cluster. For example, consider the following chain:

Climate change results in a drought in Syria -> rebellion in Syria -> civil war in Syria -> Russia, Iran, and the United States get involved in the Syrian conflict -> the risk of nuclear catastrophe increases

I break this down like so:

climate change -> military conflict

military conflict -> nuclear risk

The early developments of computing were driven by military conflict; ENIAC was for artillery firing tables and nuclear weapon simulations. I expect military conflict to increase the risk of AGI by driving incentives for it similarly, so I also have:

military conflict -> AGI risk

So just from the cases of climate change, nuclear risk, and AGI risk I have a causal relationship that looks like this:

climate change -> nuclear risk & AGI risk

This isn't enough for me to put sensible numbers on the problem, but it is enough for me to be suspicious of treating them as independent variables. So currently I cringe at questions of the first type, and also of the second type, but I haven't developed any meaningful improvements.

That being said, this was helpful to me in thinking about the counterfactual case, and for some reason this is also the first time I have ever seen the idea of 'cruxing on the wrong question' pointed to, which is very interesting.