This is a linkpost for

The Michigan AI Safety Initiative (MAISI) is a new AI safety student group at the University of Michigan. The website's "About" page includes a short intro to AI risk. I'm sharing it here for people who are interested in short pitches for AI x-risk. Feel free to comment with feedback / suggestions / criticisms.

Will AI really cause a catastrophe?

Hopefully not! AI has tremendous potential for making the world a better place, especially as the technology continues to develop. We’re already seeing some beneficial applications of AI to healthcare, accessibility, language translation, automotive safety, and art creation, to name just a few. However, advanced AI also poses some serious risks.

At the very least, malicious actors could use AI to cause harm, e.g., by building dangerous weapons, spreading fake news, or empowering oppressive regimes.

More speculatively, advanced AI systems could seek power or control over humans. It’s possible that future AI systems will be qualitatively different from those we see today. They may be able to form sophisticated plans to achieve their goals, and may understand the world well enough to strategically evaluate many relevant obstacles and opportunities. Furthermore, they may attempt to acquire resources or resist shutdown attempts, since these are useful strategies for some goals their designers might specify. To see why these failures might be challenging to prevent, see DeepMind’s research on specification gaming and goal misgeneralization.

It’s worth reflecting on the possibility that an AI system of this kind could outmaneuver humanity’s best efforts to stop it. Meta’s Cicero model reached human-level performance in Diplomacy, a strategic board game, demonstrating that AI systems can successfully negotiate with humans; an advanced AI system might similarly manipulate humans into assisting or trusting it. In addition, AI systems are swiftly becoming proficient at writing computer code, as models like Codex show. Combined with models like WebGPT and ACT-1, which can take actions on the internet, this suggests that advanced AI systems could be formidable computer hackers. Hacking creates a variety of opportunities; e.g., an AI system might steal financial resources to purchase more computational power, enabling it to train longer or deploy copies of itself.

Maybe none of this will happen. Indeed, in a 2022 survey of AI experts, 25% of respondents assigned a 0% (impossible) chance to AI causing catastrophes of a magnitude comparable to the death of all humans. More alarmingly, though, 48% of respondents in the same survey assigned at least a 10% probability to such an outcome.

Perhaps the biggest problem is the rapid pace of advances in AI research. If AI starts causing significant problems, the world might have only a short time to address them before things spiral out of control.

