I've just released a Future of Humanity Institute technical report, written as part of the Global Priorities Project.


This article is about priority-setting for work aiming to reduce existential risk. Its chief claim is that all else being equal we should prefer work earlier and prefer to work on risks that might come early. This is because we are uncertain about when we will have to face different risks, because we expect diminishing returns of extra work, and because we expect that more people will work on these risks in the future.

I explore this claim both qualitatively and with explicit models. I consider its implications for two questions: first, “When is it best to do different kinds of work?”; second, “Which risks should we focus on?”.

As a major application, I look at the case of risk from artificial intelligence. The best strategies for reducing this risk depend on when the risk is coming. I argue that we may be underinvesting in scenarios where AI comes soon even though these scenarios are relatively unlikely, because we will not have time later to address them.
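
As a toy illustration of this argument, here is a minimal sketch. It is not a model taken from the report. It assumes logarithmic returns, so that the value of total work W on a scenario is p * log(1 + W), which makes the marginal value of one more unit of work p / (1 + W). The probabilities and work totals are hypothetical numbers chosen only to show the shape of the effect.

```python
def marginal_value(probability, total_work):
    """Marginal value of one extra unit of work on a scenario.

    Assumes (for illustration only) that the value of total work W is
    probability * log(1 + W), so the derivative is probability / (1 + W).
    """
    return probability / (1.0 + total_work)


# Hypothetical inputs: the AI-soon scenario is unlikely, but little work
# can be done before it arrives; the AI-later scenario is likely, but a
# lot of work by future researchers is expected to accumulate first.
soon = marginal_value(probability=0.1, total_work=10.0)
later = marginal_value(probability=0.9, total_work=1000.0)

ratio = soon / later
print(f"marginal value, AI-soon:  {soon:.6f}")
print(f"marginal value, AI-later: {later:.6f}")
print(f"ratio soon/later: {ratio:.2f}")
```

With these made-up numbers, an extra unit of work aimed at the unlikely AI-soon scenario is worth roughly ten times as much at the margin as one aimed at the likely AI-later scenario. The conclusion is sensitive to the assumed returns function and to how much future work is expected.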


You can read the full report here: Allocating risk mitigation across time.


I read the full report, excluding the appendices. I'm a layperson. That is, I don't expect I'll ever do direct work on existential risk mitigation, or on engineering safety into machine intelligence. Further, I'm not well-versed in the technicalities of either Friendliness theory from MIRI or the academic literature on artificial intelligence.

As a layperson, I don't know how to assess how much labor is assigned to AI-soon vs. AI-later outcomes. What does AI-soon work look like? What are its features or qualities? Do you consider MIRI's current research to be 'AI-soon' labor?

I don't have the source now, but Robin Hanson mentioned in a couple of past blog posts on Overcoming Bias that work on existential risk reduction isn't being done, because nobody knows how to do it. This is a rather cynical perspective, as it would not count the work of FHI or MIRI as being on existential risk reduction. I believe Dr. Hanson meant that it appears no direct, or object-level, work is being done. These posts were from a couple of years ago, so his position may be different now. The landscape for AI safety has changed dramatically in the last two years. Still, I wonder whether the reason it's hard to tell how much, or which, research is oriented toward "sooner" rather than "later" outcomes is that we don't know how to do that (object-level) work.

This is a good question. To some extent I didn't want to take a position on exactly which work is appropriate for this, as that's independent of the rest of the analysis (although it obviously feeds into model parameter estimates).

Something which would definitely help would be just to systematically review what might be useful for AI-soon outcomes.

Possibilities include: working to study the architecture of the more plausible candidates for producing AI; design work on containment mechanisms; producing high-quality data sets of 'human values' (in case value-learning is easy). I think those could all turn out to be useless ex post, but they may still be worth trying more for the possibility that they are useful.

There may also be useful lines which are already being pursued to a serious degree as part of cybersecurity.

One application of this might be for the FLI, and where they decide to grant the money they've received from Elon Musk. In addition to other considerations, it seems the correct conclusion from your paper would be not to underestimate the value of funding research aimed at AI-soon scenarios, and also to fund it because it could create a research environment that improves the quantity and quality of research on even AI-later scenarios. Whatever ratio of funding between the two scenarios they decide on, it is less useful if nobody can discern what counts as AI-soon vs. AI-later research.


I argue that we may be underinvesting in scenarios where AI comes soon even though these scenarios are relatively unlikely, because we will not have time later to address them.

Edit: Separately...

p(X) denotes the probability that we will face problem X. Note that this is meant to be an absolute probability, not conditional on getting to the the point where we might face X.

Are you assuming a hard takeoff intelligence explosion? If not, shouldn’t you also be interested in the probability of UFAI given future advances that may lead to it?

Kurzweil seems to think we will pass some unambiguous signposts on the way to superhuman AI. I would grant this scenario a nonzero probability.

Nitpick: “the the” is a typo.