80,000 Hours has just released an updated version of our problem profile on reducing existential risks from AI.

This post includes some context, the summary from the article, the table of contents, and the acknowledgements. There is a lot of formatting in the article, so it seemed easier not to copy-paste the whole thing.

(You can also see this post on the EA forum here).


Like most 80,000 Hours content, this profile is aimed at an audience that has probably spent a bit of time on the 80,000 Hours website but is otherwise unfamiliar with the ideas of EA and AI risk -- so it's meant to be introductory.

The profile primarily represents Benjamin Hilton (the author)'s views, though it was edited by Arden Koehler (website director) and reviewed by Howie Lempel (CEO), who both broadly agree with the takeaways.

Note from Benjamin: 

I tried to do a few things with this profile to make it as useful as possible for people new to the issue:

Why we're posting it here

  1. We're aiming for this to be a high quality, totally-introductory explainer on why we're so worried about AI risk, which might be useful for LessWrong users either to read themselves or to send to people who are new to the field.
  2. If there are mistakes in the article, we want to know about them! (There's also a feedback form if you want to give feedback and prefer that to posting publicly.)


We expect that there will be substantial progress in AI in the next few decades, potentially even to the point where machines come to outperform humans in many, if not all, tasks. This could have enormous benefits, helping to solve currently intractable global problems, but could also pose severe risks. These risks could arise accidentally (for example, if we don’t find technical solutions to concerns about the safety of AI systems), or deliberately (for example, if AI systems worsen geopolitical conflict). We think more work needs to be done to reduce these risks.

Some of these risks from advanced AI could be existential — meaning they could cause human extinction, or an equally permanent and severe disempowerment of humanity.2 There have not yet been any satisfying answers to concerns — discussed below — about how this rapidly approaching, transformative technology can be safely developed and integrated into our society. Finding answers to these concerns is very neglected, and may well be tractable. We estimate that there are around 300 people worldwide working directly on this.3 As a result, the possibility of AI-related catastrophe may be the world’s most pressing problem — and the best thing to work on for those who are well-placed to contribute.

Promising options for working on this problem include technical research on how to create safe AI systems, strategy research into the particular risks AI might pose, and policy research into ways in which companies and governments could mitigate these risks. If worthwhile policies are developed, we’ll need people to put them in place and implement them. There are also many opportunities to have a big impact in a variety of complementary roles, such as operations management, journalism, earning to give, and more — some of which we list below.

Our overall view

Recommended - highest priority

This is among the most pressing problems to work on.


AI will have a variety of impacts and has the potential to do a huge amount of good. But we’re particularly concerned with the possibility of extremely bad outcomes, especially an existential catastrophe. We’re very uncertain, but based on estimates from others using a variety of methods, our overall guess is that the risk of an existential catastrophe caused by artificial intelligence within the next 100 years is around 10%. This figure could significantly change with more research — some experts think it’s as low as 0.5% or much higher than 50%, and we’re open to either being right. Overall, our current take is that AI development poses a bigger threat to humanity’s long-term flourishing than any other issue we know of.


Around $50 million was spent on reducing the worst risks from AI in 2020, while billions were spent advancing AI capabilities.4 While we are seeing increasing concern from AI experts, there are still only around 300 people working directly on reducing the chances of an AI-related existential catastrophe. Of these, it seems like about two-thirds are working on technical AI safety research, with the rest split between strategy (and policy) research and advocacy.


Making progress on preventing an AI-related catastrophe seems hard, but there are a lot of avenues for more research and the field is very young. So we think it’s moderately tractable, though we’re highly uncertain — again, assessments of the tractability of making AI safe vary enormously.

Table of contents


Huge thanks to Joel Becker, Tamay Besiroglu, Jungwon Byun, Joseph Carlsmith, Jesse Clifton, Emery Cooper, Ajeya Cotra, Andrew Critch, Anthony DiGiovanni, Noemi Dreksler, Ben Edelman, Lukas Finnveden, Emily Frizell, Ben Garfinkel, Katja Grace, Lewis Hammond, Jacob Hilton, Samuel Hilton, Michelle Hutchinson, Caroline Jeanmaire, Kuhan Jeyapragasan, Arden Koehler, Daniel Kokotajlo, Victoria Krakovna, Alex Lawsen, Howie Lempel, Eli Lifland, Katy Moore, Luke Muehlhauser, Neel Nanda, Linh Chi Nguyen, Luisa Rodriguez, Caspar Oesterheld, Ethan Perez, Charlie Rogers-Smith, Jack Ryan, Rohin Shah, Buck Shlegeris, Marlene Staib, Andreas Stuhlmüller, Luke Stebbing, Nate Thomas, Benjamin Todd, Stefan Torges, Michael Townsend, Chris van Merwijk, Hjalmar Wijk, and Mark Xu for either reviewing the article or their extremely thoughtful and helpful comments and conversations. (This isn’t to say that they would all agree with everything I said – in fact we’ve had many spirited disagreements in the comments on the article!)



  1. What do we mean by ‘intelligence’ in this context? Something like “the ability to predictably influence the future.” This involves understanding the world well enough to make plans that can actually work, and the ability to carry out those plans. Because humans can predictably influence the future, they have been able to shape the world around them to fit their goals and desires. We go into more detail on the importance of the ability to make and execute plans later in this article.
  2. We’re also concerned about the possibility that AI systems could deserve moral consideration for their own sake — for example, because they are sentient. We’re not going to discuss this possibility in this article; we instead cover artificial sentience in a separate article here.
  3. I estimated this using the AI Watch database. For each organisation, I estimated the proportion of listed employees working directly on reducing existential risks from AI. There’s a lot of subjective judgement in the estimate (e.g. “does it seem like this research agenda is about AI safety in particular?”), and it could be too low if AI Watch is missing data on some organisations, or too high if the data counts people more than once or includes people who no longer work in the area. My 90% confidence interval would range from around 100 people to around 1,500 people.
  4. It’s difficult to say exactly how much is being spent to advance AI capabilities. This is partly because of a lack of available data, and partly because of questions like:
  • What research in AI is actually advancing the sorts of dangerous capabilities that might be increasing potential existential risk?
  • Do advances in AI hardware or advances in data collection count?
  • How about broader improvements to research processes in general, or things that might increase investment in the future through producing economic growth?
    The most relevant figure we could find was the expenses of DeepMind from 2020, which were around £1 billion, according to their annual report. We’d expect most of that to be contributing to “advancing AI capabilities” in some sense, since their main goal is building powerful, general AI systems. (Although it’s important to note that DeepMind is also contributing to work in AI safety, which may be reducing existential risk.)

If DeepMind is around 10% of the spending on advancing AI capabilities, this gives us a figure of around £10 billion. (Given that there are many AI companies in the US, and a large effort to produce advanced AI in China, we think 10% could be a good overall guess.)

As an upper bound, the total revenues of the AI sector in 2021 were around $340 billion.

So overall, we think the amount being spent to advance AI capabilities is between $1 billion and $340 billion per year. Even assuming a figure as low as $1 billion, this would still be around 20 times the amount spent on reducing risks from AI.
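The extrapolation in this footnote can be sanity-checked with a few lines of arithmetic. All figures below come from the footnote itself; the 10% market-share figure is, as noted above, only a rough guess:

```python
# Back-of-the-envelope check of the capabilities-spending estimate.
# Figures are taken from the footnote; the 10% share is an assumption, not data.

deepmind_expenses_gbp = 1e9        # DeepMind's 2020 expenses, ~£1 billion
assumed_deepmind_share = 0.10      # guess: DeepMind is ~10% of capabilities spending

total_capabilities_gbp = deepmind_expenses_gbp / assumed_deepmind_share
print(f"Extrapolated capabilities spending: ~£{total_capabilities_gbp / 1e9:.0f} billion")

safety_spending_usd = 50e6         # ~$50 million spent on reducing the worst risks (2020)
lower_bound_usd = 1e9              # lower bound on capabilities spending

ratio = lower_bound_usd / safety_spending_usd
print(f"Capabilities vs. safety spending at the lower bound: ~{ratio:.0f}x")
```

Even at the most conservative end of the range, capabilities spending dwarfs safety spending by more than an order of magnitude.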


7 comments:

So long as we're talking about AI, we're not talking about the knowledge explosion which created AI, and all the other technology based existential risks which are coming our way.

Endlessly talking about AI is like going around our house mopping up puddles one after another every time it rains. The more effective and rational approach is to get up on the roof and fix the hole where the water is coming in. The most effective approach is to deal with the problem at its source.

This year everybody is talking about AI. Next year it will be some other new threat. Soon after, another threat. And then more threats, bigger and bigger, coming faster and faster.

It's the simplest thing.   If we were working at the end of a product shipping line at an Amazon warehouse, and the product shipping line kept sending us new products to package, faster, and faster, and faster, without limit...

What's probably going to happen?

If we don't turn our attention to the firehose of knowledge which is generating all the threats, there's really no point in talking about AI.

I think AI misalignment is uniquely situated as one of these threats because it multiplies the knowledge explosion effect you're talking about to a large degree. It's one of the few catastrophic risks that is a plausible total human extinction risk too. Also if AI goes well, it could be used to address many of the other threats you mention as well as upcoming unforeseen ones.

and all the other technology based existential risks which are coming our way.

Can you give some examples?

Would it be sensible to assume that all technologies with the potential for crashing civilization have already been invented?   

If the development of knowledge feeds back on itself...

And if this means the knowledge explosion will continue to accelerate...

And if there is no known end to such a process...

Then, while no one can predict exactly what new threats will emerge when, it seems safe to propose that they will.

I'm 70 and so don't worry too much about how as yet unknown future threats might affect me personally, as I don't have a lot of future left.  Someone who is 50 years younger probably should worry, when we consider how many new technologies have emerged over the last 50 years, and how the emergence of new threats is likely to unfold at a faster rate than previously was the case.

A knowledge explosion itself -- to the extent that that is happening -- seems like it could be a great thing. So for what it's worth my guess would be that it does make sense to focus on mitigating the specific threats that it creates (insofar as it does) so that we get the benefits too.

But how long will our plans to reduce those threats protect us? Even if we, as humanity, make very long-term plans, these plans will be at the 2+2 level for any ASI.

Making an ASI without seriously strengthening our own hand is pure gambling.
Because technically we will have created a god, and it is very naive to think that just because we "created it" it'll sympathize with us and reward us with its endless help. Many times as humanity we have gambled at the risk of destroying our own kind (the possibility of nuclear bombs igniting the atmosphere, fears that CERN could destroy the Earth's magnetic field...) but none of them are as big of a threat to me as ASI, because it is an unpredictable entity.

What can we do to strengthen our hand, I still haven't found anything definitive. Among the possible security measures I have read, I have not seen a satisfactory answer. But the solution that makes the most sense to me is to try to push our "limits" high with technology, genetic engineering, and so on (I guess I mean "cyborgs").
At least at the end of the day we have the strength to compete with ASI.

A knowledge explosion itself -- to the extent that that is happening -- seems like it could be a great thing.


It's certainly true that many benefits will continue to flow from the knowledge explosion, no doubt about it.  

The 20th century is a good real world example of the overall picture.  

  • TONS of benefits from the knowledge explosion, and...
  • Now a single human being can destroy civilization in just minutes.

This pattern illustrates the challenge presented by the knowledge explosion.   As the scale of the emerging powers grows, the room for error shrinks,  and we are ever more in the situation where one bad day can erase all the very many benefits the knowledge explosion has delivered.  

In 1945 we saw the emergence of what is arguably the first existential threat technology.  To this day, we still have no idea how to overcome that threat.

And now in the 21st century we are adding more existential threats to the pile.   And we don't really know how to manage those threats either.

And the 21st century is just getting underway. With each new threat that we add to the pile of threats, the odds of us being able to defeat each and every existential threat (required for survival) go down.

Footnote:  I'm using "existential threat" to refer to a possible collapse of civilization, not human extinction, which seems quite unlikely short of an astronomical event.  
