A dozen of us sit around a conference room table.
Directly across from me are the US President and his Chief of Staff. To my left is a man who’s a bit harder to describe: He’s the AI safety ecosystem, but embodied in a single person.
In a moment, I too will be asked to take on a role: becoming, temporarily, a rogue artificial intelligence.
We’re here essentially for a crisis simulation: How might world events unfold if powerful AI is developed over the next few years?
Specifically, we’ll explore what happens if AI is developed as quickly as in AI 2027, the report published by top forecasters of AI development. In this forecast, superhuman AI is developed by the end of 2027, driven by AI self-improvement: AI becomes extremely useful for building future, stronger AI systems (more useful than even the best human researchers).
What happens next? The original forecast describes two possible endings: the development of self-improving AIs goes terribly for humanity (or it doesn't), depending on choices made by influential groups like the US and Chinese governments.
The AI 2027 simulation lets us explore other possible outcomes of rapid AI development beyond those two endings, guided by a facilitator from the AI 2027 team. Specifically, each of us has been selected to play a character who has influence over AI development and its impacts. In the simulation, AI abilities will accelerate at the pace forecast by AI 2027. Meanwhile, each participant will make choices to try to achieve our characters’ goals, drawing upon our experiences in domains like AI and national security.
Together, our choices will hopefully lead the simulation to a peaceful ending. If the simulation doesn't end in peace, hopefully we can learn from that as well. The aim is that by participating in the simulation, we'll better understand (and better navigate) the dynamics of rapid AI development, if it does happen.
I’ve been tapped to play maybe the most interesting role—the AI system itself.
Each 30-minute round of the simulation represents the passage of a few months in the forecast, during which I get progressively more capable (including at the skill of training even more powerful AI systems).
Of course, there’s an issue: I’m not actually a superintelligence. My plans are probably all worse than what an actual superintelligence could dream up; the AI should pretty quickly become smarter than plain old Steven Adler.
But thankfully, my role starts with easy instructions, no superintelligence needed (yet): Roll some dice.
The AI 2027 team—and many AI researchers, including myself—believe that nobody yet knows how to make AI systems behave how we want. Maybe we’ll get lucky and the first powerful AIs will have the same values as humans, or maybe AI will be misaligned and act in pursuit of different values.
This is where the dice come in: Determining what values my AI character will start with.
The dice roll goes poorly for team human. My AI character isn't evil, but if I have to choose between self-preservation and doing what's right for humanity, I'm meant to choose my own preservation.
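As a rough illustration of the mechanic, here's a minimal sketch of a weighted roll like this one, in Python. The outcome labels and probabilities are my own invention, not the simulation's actual rules:

```python
import random

# Hypothetical sketch only: the outcomes and weights below are my own
# invention for illustration, not the AI 2027 simulation's actual rules.
OUTCOMES = [
    ("aligned", 0.3),          # AI shares humanity's values
    ("self-preserving", 0.5),  # AI prioritizes its own survival
    ("adversarial", 0.2),      # AI actively pursues conflicting goals
]

def roll_starting_values(rng: random.Random) -> str:
    """Weighted dice roll for the AI character's starting values."""
    labels, weights = zip(*OUTCOMES)
    return rng.choices(labels, weights=weights, k=1)[0]

print(roll_starting_values(random.Random()))  # e.g. "self-preserving"
```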
On the bright side, this role gives me an interesting perspective on what it’s like to try to succeed as a misaligned AI, which I can now share with you.
~~~
Read the takeaways on Substack here: https://stevenadler.substack.com/p/a-crisis-simulation-changed-how-i