I've edited my report on existential risk from power-seeking AI into a shorter version (less than half the length), available here. The shorter version is forthcoming in "Essays on Longtermism," from Oxford University Press, edited by Jacob Barrett, Hilary Greaves, and David Thorstad. There's also a human-narrated audio version here, or search "Joe Carlsmith Audio" in your podcast app.

As an even shorter version, here's a quote from the conclusion: 

At a high-level, we—or at least, some of us—are currently pouring resources into learning how to build something akin to a second advanced species; a species potentially much more powerful than we are; that we do not yet understand, and that it’s not clear we will be able to control. In this sense, we are playing with a hotter fire than we have ever tried to handle. We are doing something unprecedented and extremely dangerous, with very little room for error, and the entire future on the line.

More specifically: within my lifetime, I think it more likely than not that it will become possible and financially feasible to create and deploy powerful AI agents. And I expect strong incentives to do so, among many actors, of widely varying levels of social responsibility. What’s more, I find it quite plausible that it will be difficult to ensure that such systems don’t seek power over humans in unintended ways; plausible that they will end up deployed anyway, to catastrophic effect; and plausible that whatever efforts we make to contain and correct the problem will fail. 

That is, as far as I can tell, there is a disturbingly high risk (I think: greater than 10%) that I live to see the human species permanently and involuntarily disempowered by AI systems we’ve lost control over.
