A few weeks back, I spoke at the International Association for Safe and Ethical AI's 2nd Annual Summit at UNESCO House, Paris. My take on AI safety is a bit weird because of my background as a decision scientist, which comes with deep training in systems thinking. Before the summit, I sat at my desk thinking, "what the heck do I have to say that my colleagues from DeepMind, AWS, and HuggingFace won't already be covering?"
The other part of the equation here is that I'm a red teamer for OpenAI and Anthropic, pulled in because I'm this odd duck of a decision scientist who knows a moderate-to-large amount about national security, disinformation, and all that jazz. And it hit me, based on all the years spent breaking their models and reading their public reporting: the real danger with AI risk and safety is that we can't agree, and we silo risk.
AI safety needs to have shared definitions. Rigorous ones at that. And right now, we're all living on different AI safety planets. How the heck are we supposed to figure out alignment, responsible scaling, and the like, if we don't even think about what's "harmful" the same way? Yikes very much indeed!
The second thing is that we do ourselves a huge disservice by not approaching these problems as an interconnected system of risk, one that is quickly evolving and encompassing an ever-larger number of issues, all while the models themselves advance just as rapidly. And this is dangerous.
Let's take terrorism as an example. We'd never think (at least not these days) about terrorism apart from its causes and impacts, narrowly focusing on the moment a bomb goes off in a city. We know that grievances, instability, lack of opportunity, and state incapacity are factors that enable radicalization and allow terrorism to flourish. Similarly, we know that an attack is not the end state of "terrorism," per se, as the political, economic, social, and military effects are felt long after an attack.
This brings me back to AI risk. If we think about the "catastrophic risk" elements (e.g., somebody building a bioweapon due to AI uplift) in a vacuum, we will necessarily have an underdeveloped understanding of 1) the practical pathways by which that end state occurs and 2) the practical intervention points along those pathways.
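To make "pathways and intervention points" a little more concrete, here's a toy sketch in Python. This is purely illustrative, my own construction rather than anyone's actual risk model, and the node names are hypothetical placeholders. The point is that once you treat an AI-enabled harm as a directed graph of enabling conditions instead of a single event, the intervention points fall out of the structure:

```python
# Toy illustration (not a real risk model): treat an AI-enabled harm as a
# directed graph of enabling conditions rather than a single event.
# Node names are hypothetical examples, not a validated taxonomy.

from itertools import chain

# Edges map each factor to the factors it enables.
GRAPH = {
    "model_capability_uplift": ["actor_acquires_knowledge"],
    "weak_access_controls": ["actor_acquires_knowledge"],
    "actor_acquires_knowledge": ["actor_acquires_materials"],
    "lax_materials_oversight": ["actor_acquires_materials"],
    "actor_acquires_materials": ["catastrophic_outcome"],
}

def all_paths(graph, start, goal, path=None):
    """Enumerate every simple path from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # avoid cycles
            paths.extend(all_paths(graph, nxt, goal, path))
    return paths

sources = ["model_capability_uplift", "weak_access_controls", "lax_materials_oversight"]
paths = list(chain.from_iterable(
    all_paths(GRAPH, s, "catastrophic_outcome") for s in sources
))

# An "intervention point" here is any node that sits on every pathway:
# disrupt it and no route to the outcome remains.
chokepoints = set(paths[0]).intersection(*map(set, paths[1:])) - {"catastrophic_outcome"}
print(f"{len(paths)} pathways; shared intervention points: {sorted(chokepoints)}")
```

In this contrived example, the only chokepoint shared by all three pathways is materials acquisition, not the model itself. That's exactly the kind of thing a siloed, model-only view of catastrophic risk would never surface.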
In the end, definitional clarity and systems thinking might help the AI safety community and its key stakeholders advance preparedness further than the current siloed approach does.