MichaelA

I’m Michael Aird, an Associate Researcher with Rethink Priorities. In March, I'll also start part-time as a Research Scholar with the Future of Humanity Institute. Opinions expressed are my own. You can give me anonymous feedback at this link.

With Rethink, I'll likely continue their project on nuclear risk (among other things). With FHI, I might work on these things.

Previously, I did longtermist macrostrategy research for Convergence Analysis and then for the Center on Long-Term Risk. More on my background here.

I mostly post to the EA Forum.

If you think you or I could benefit from us talking, feel free to message me or schedule a call.

Sequences

Information hazards and downside risks
Moral uncertainty

Comments

Why those who care about catastrophic and existential risk should care about autonomous weapons

If I had to choose between a AW treaty and some treaty governing powerful AI, the latter (if it made sense) is clearly more important. I really doubt there is such a choice and that one helps with the other, but I could be wrong here. [emphasis added]

Did you mean something like "and in fact I think that one helps with the other"?

Forecasting Thread: Existential Risk

I don't think I know of any person who's demonstrated this who thinks risk is under, say, 10%

If you mean risk of extinction or existential catastrophe from AI at the time AI is developed, it seems really hard to say, as I think that that's been estimated even less often than other aspects of AI risk (e.g. risk this century) or x-risk as a whole. 

I think the only people (maybe excluding commenters who don't work on this professionally) who've clearly given a greater than 10% estimate for this are: 

  • Buck Shlegeris (50%)
  • Stuart Armstrong (33-50% chance humanity doesn't survive AI)
  • Toby Ord (10% existential risk from AI this century, but 20% for when the AI transition happens)

Meanwhile, people who I think have effectively given <10% estimates for that (judging from estimates that weren't conditioning on when AI was developed; all from my database):

  • Very likely MacAskill (well below 10% for extinction as a whole in the 21st century)
  • Very likely Ben Garfinkel (0-1% x-catastrophe from AI this century)
  • Probably the median FHI 2008 survey respondent (5% for AI extinction in the 21st century)
  • Probably Pamlin & Armstrong in a report (0-10% for unrecoverable collapse extinction from AI this century)
    • But then Armstrong separately gave a higher estimate
    • And I haven't actually read the Pamlin & Armstrong report
  • Maybe Rohin Shah (some estimates in a comment thread)

(Maybe Hanson would also give <10%, but I haven't seen explicit estimates from him, and his reduced focus on and "doominess" from AI may be because he thinks timelines are longer and other things may happen first.)

I'd personally consider all the people I've listed to have demonstrated at least a fairly good willingness and ability to reason seriously about the future, though there's perhaps room for reasonable disagreement here. (With the caveat that I don't know Pamlin and don't know precisely who was in the FHI survey.)

Forecasting Thread: Existential Risk

Mostly I only start paying attention to people's opinions on these things once they've demonstrated that they can reason seriously about weird futures

[tl;dr This is an understandable thing to do, but does seem to result in biasing one's sample towards higher x-risk estimates]

I can see the appeal of that principle. I partly apply such a principle myself (though in the form of giving less weight to some opinions, not ruling them out).

But what if it turns out the future won't be weird in the ways you're thinking of? Or what if it turns out that, even if it will be weird in those ways, influencing it is too hard, or just isn't very urgent (i.e., the "hinge of history" is far from now), or is already too likely to turn out well "by default" (perhaps because future actors will also have mostly good intentions and will be more informed)?

Under such conditions, it might be that the smartest people with the best judgement won't demonstrate that they can reason seriously about weird futures, even if they hypothetically could, because it's just not worth their time to do so. Similarly, I haven't demonstrated my ability to reason seriously about tax policy, because I think reasoning seriously about the long-term future is a better use of my time. Someone who starts off believing tax policy is an overwhelmingly big deal could then say "Well, Michael thinks the long-term future is what we should focus on instead, but why should I trust Michael's view on that when he hasn't demonstrated he can reason seriously about the importance and consequences of tax policy?"

(I think I'm being inspired here by Trammell's interesting post "But Have They Engaged With The Arguments?" There's some LessWrong discussion - which I haven't read - of an early version here.)

I in fact do believe we should focus on long-term impacts, and am dedicating my career to doing so, as influencing the long-term future seems sufficiently likely to be tractable, urgent, and important. But I think there are reasonable arguments against each of those claims, and I wouldn't be very surprised if they turned out to all be wrong. (But I think currently we've only had a very small part of humanity working intensely and strategically on this topic for just ~15 years, so it would seem too early to assume there's nothing we can usefully do here.)

And if so, it would be better to try to improve the short-term future, which further future people can't help us with, and then it would make sense for the smart people with good judgement to not demonstrate their ability to think seriously about the long-term future. So under such conditions, the people left in the sample you pay attention to aren't the smartest people with the best judgement, and are skewed towards unreasonably high estimates of the tractability, urgency, and/or importance of influencing the long-term future.

To emphasise: I really do want way more work on existential risks and longtermism more broadly! And I do think that, when it comes to those topics, we should pay more attention to "experts" who've thought a lot about those topics than to other people (even if we shouldn't only pay attention to them). I just want us to be careful about things like echo chamber effects and biasing the sample of opinions we listen to.

Forecasting Thread: Existential Risk

I'm not sure which of these estimates are conditional on superintelligence being invented. To the extent that they're not, and to the extent that people think superintelligence may not be invented, that means they understate the conditional probability that I'm using here.

Good point. I'd overlooked that.

I think lowish estimates of disaster risks might be more visible than high estimates because of something like social desirability, but who knows.

(I think it's good to be cautious about bias arguments, so take the following with a grain of salt, and note that I'm not saying any of these biases are necessarily the main factor driving estimates. I raise the following points only because the possibility of bias has already been mentioned.)

I think social desirability bias could easily push the opposite way as well, especially if we're including non-academics who dedicate their jobs or much of their time to x-risks (which I think covers the people you're considering, except that Rohin is sort-of in academia). I'd guess the main people listening to these people's x-risk estimates are other people who think x-risks are a big deal, and higher x-risk estimates would tend to make such people feel more validated in their overall interests and beliefs. 

I can see how something like a bias towards saying things that people take seriously and that don't seem crazy (which is perhaps a form of social desirability bias) could also push estimates down. I'd guess that this effect is stronger the closer one gets to academia or policy. I'm not sure what the net effect of the social desirability bias type stuff would be on people like MIRI, Paul, and Rohin.

I'd guess that the stronger bias would be selection effects in who even makes these estimates. I'd guess that people who work on x-risks have higher x-risk estimates than people who don't and who have thought about odds of x-risk somewhat explicitly. (I think a lot of people just wouldn't have even a vague guess in mind, and could swing from casually saying extinction is likely in the next few decades to seeing that idea as crazy depending on when you ask them.) 

Quantitative x-risk estimates tend to come from the first group, rather than the latter, because the first group cares enough to bother to estimate this. And we'd be less likely to pay attention to estimates from the latter group anyway, if they existed, because they don't seem like experts - they haven't spent much time thinking about the issue. But they haven't spent much time thinking about it because they don't think the risk is high, so we're effectively selecting whose estimates to listen to based in part on what those estimates would be.

I'd still do similar myself - I'd pay attention to the x-risk "experts" rather than other people. And I don't think we need to massively adjust our own estimates in light of this. But this does seem like a reason to expect the estimates are biased upwards, compared to the estimates we'd get from a similarly intelligent and well-informed group of people who haven't been pre-selected for a predisposition to think the risk is somewhat high.
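
The selection effect described above can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the population distribution, the selection rule, and every name in it are hypothetical and chosen for illustration, so it says nothing about how large the real effect is.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: each person's private x-risk estimate (percent),
# drawn from an arbitrary distribution chosen only for illustration.
population = [random.uniform(0, 30) for _ in range(10_000)]


def publishes_estimate(estimate):
    """Assumed selection rule: the higher someone's private estimate, the more
    likely they are to work on x-risk and publish a number."""
    return random.random() < estimate / 30


published = [e for e in population if publishes_estimate(e)]

population_mean = sum(population) / len(population)
published_mean = sum(published) / len(published)
print(f"Everyone: {population_mean:.1f}%  Published only: {published_mean:.1f}%")
```

In this toy setup the published estimates average higher than the population's, even though nobody misreports their own view; the gap comes entirely from who ends up in the sample.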

Thoughts on Human Models

That does seem interesting and concerning.

Minor: The link didn’t work for me; in case others have the same problem, here is (I believe) the correct link.

Forecasting Thread: Existential Risk

Yeah, totally agreed. 

I also think it's easier to forecast extinction in general, partly because it's a much clearer threshold, whereas there are some scenarios that some people might count as an "existential catastrophe" and others might not. (E.g., Bostrom's "plateauing — progress flattens out at a level perhaps somewhat higher than the present level but far below technological maturity".)

Forecasting Thread: Existential Risk

Conventional risks are events that already have a background chance of happening (as of 2020 or so) and does not include future technologies. 

Yeah, that aligns with how I'd interpret the term. I asked about advanced biotech because I noticed it was absent from your answer unless it was included in "super pandemic", so I was wondering whether you were counting it as a conventional risk (which seemed odd) or excluding it from your analysis (which also seems odd to me, personally, but at least now I understand your short-AI-timelines-based reasoning for that!).

I am going read through the database of existential threats though, does it include what you were referring too?

Yeah, I think all the things I'd consider most important are in there. Or at least "most" - I'd have to think for longer in order to be sure about "all".

There are scenarios that I think aren't explicitly addressed in any estimates in that database, like things to do with whole-brain emulation or brain-computer interfaces, but these are arguably covered by other estimates. (I also don't have a strong view on how important WBE or BCI scenarios are.)

Forecasting Thread: Existential Risk

The overall risk was 9.2% for the community forecast (with 7.3% for AI risk). To convert this to a forecast for existential risk (100% dead), I assumed 6% risk from AI, 1% from nuclear war, and 0.4% from biological risk

I think this implies you think: 

  • AI is ~4 or 5 times (6% vs 1.3%) as likely to kill 100% of people as to kill between 95 and 100% of people
  • Everything other than AI is roughly equally likely (1.5% vs 1.4%) to kill 100% of people as to kill between 95% and 100% of people
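
The AI ratio in the first bullet can be reproduced from the quoted figures. A minimal sketch, assuming the quoted 7.3% is the chance that AI kills at least 95% of people and the assumed 6% is the chance it kills 100%, so that their difference is the 95-100% band (the variable names are my own):

```python
# Figures quoted above, in percentage points.
ai_at_least_95_pct_dead = 7.3  # community forecast for AI
ai_100_pct_dead = 6.0          # assumed share that is outright extinction

# Assumption: whatever is left over is the "between 95% and 100% dead" band.
ai_95_to_100_pct_dead = ai_at_least_95_pct_dead - ai_100_pct_dead

ratio = ai_100_pct_dead / ai_95_to_100_pct_dead
print(f"95-100% band: {ai_95_to_100_pct_dead:.1f}%, ratio: {ratio:.1f}")
```

Under that reading, the ratio comes out between 4 and 5, matching the "~4 or 5 times" in the bullet.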

Does that sound right to you? And if so, what was your reasoning?

I ask out of curiosity, not because I disagree. I don't have a strong view here, except perhaps that AI is the risk with the highest ratio of "chance it causes outright extinction" to "chance it causes major carnage" (and this seems to align with your views).

Forecasting Thread: Existential Risk

Very interesting, thanks for sharing! This seems like a nice example of combining various existing predictions to answer a new question.

a forecast for existential risk (100% dead)

It seems worth highlighting that extinction risk (risk of 100% dead) is a (big) subset of existential risk (risk of permanent and drastic destruction of humanity's potential), rather than those two terms being synonymous. If your forecast was for extinction risk only, then the total existential risk should presumably be at least slightly higher, due to risks of unrecoverable collapse or unrecoverable dystopia.

(I think it's totally ok and very useful to "just" forecast extinction risk. I just think it's also good to be clear about what one's forecast is of.)

Forecasting Thread: Existential Risk

Thanks for those responses :)

MIRI people and Wei Dai for pessimism (though I'm not sure it's their view that it's worse than 50/50), Paul Christiano and other researchers for optimism. 

It does seem odd to me that, if you aimed to do something like average over these people's views (or maybe take a weighted average, weighting based on the perceived reasonableness of their arguments), you'd end up with a 50% credence on existential catastrophe from AI. (Although now I notice you actually just said "weight it by the probability that it turns out badly instead of well"; I'm assuming by that you mean "the probability that it results in existential catastrophe", but feel free to correct me if not.)

One MIRI person (Buck Shlegeris) has indicated they think there's a 50% chance of that. One other MIRI-adjacent person gives estimates for similar outcomes in the range of 33-50%. I've also got general pessimistic vibes from other MIRI people's writings, but I'm not aware of any other quantitative estimates from them or from Wei Dai. So my point estimate for what MIRI people think would be around 40-50%, and not well above 50%.

And I think MIRI is widely perceived as unusually pessimistic (among AI and x-risk researchers; not necessarily among LessWrong users). And people like Paul Christiano give something more like a 10% chance of existential catastrophe from AI. (Precisely what he was estimating was a little different, but similar.)

So averaging across these views would seem to give us something closer to 30%. 
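
As a rough check on that arithmetic, here is a minimal sketch. It assumes 45% as a stand-in for the "40-50%" MIRI point estimate and 10% for the Paul Christiano figure; the 2:1 weighting is purely hypothetical and is not anyone's stated view.

```python
from statistics import mean

# Rough point estimates taken from the comment above (percent chance of
# existential catastrophe from AI). Using 45 as the midpoint of "40-50%"
# is my assumption.
miri_estimate = 45.0
paul_estimate = 10.0

simple_average = mean([miri_estimate, paul_estimate])


def weighted_average(values, weights):
    """Weighted mean of values; the weights are illustrative only."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical weighting: the more pessimistic view counts twice as much.
pessimism_weighted = weighted_average([miri_estimate, paul_estimate], [2, 1])

print(f"Unweighted: {simple_average:.1f}%  2:1 weighted: {pessimism_weighted:.1f}%")
```

Any weighted average of these two inputs has to fall between 10% and 45%, so no choice of weights over them alone would reach 50%.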

Personally, I'd also probably include various other people who seem thoughtful on this and are actively doing AI or x-risk research - e.g., Rohin Shah, Toby Ord - and these people's estimates seem to usually be closer to Paul than to MIRI (see also). But arguing for doing that would be arguing for a different reasoning process, and I'm very happy with you using your independent judgement to decide who to defer to; I intend this comment to instead just express confusion about how your stated process reached your stated output.

(I'm getting these estimates from my database of x-risk estimates. I'm also being slightly vague because I'm still feeling a pull to avoid explicitly mentioning other views and thereby anchoring this thread.)

(I should also note that I'm not at all saying to not worry about AI - something like a 10% risk is still a really big deal!)
