Plans to build AGI with nuclear reactor-like safety lack 'systematic thinking,' say researchers

by Mordechai Rorvig
7th Nov 2025
1 min read

This is a linkpost for https://www.foommagazine.org/plans-to-build-agi-with-nuclear-reactor-like-safety-lack-systematic-thinking-say-researchers/

2 comments, sorted by top scoring
J Thomas Moros · 4d

I worry that this paper and article seem to be safety washing. They imply that existing safety techniques for LLMs are appropriate for more powerful systems. They apply a safety mindset from other domains to AI in an inappropriate way. I do think AI safety can learn from other fields, but those must be fields with an intelligent adversarial agent. Studying whether failure modes are correlated doesn't matter when you have an intelligent adversary who can make failure modes that would not normally be correlated happen at the same time. If one is thinking only about current systems, then perhaps such an analysis would be helpful. But both the paper and article fail to call that out.

Mordechai Rorvig · 4d

Thank you for the feedback. I feel that is a valid criticism, and I will keep it in mind for future articles. This was my first foray into thinking seriously about defense in depth for powerful AI design, and into the recent research on the topic. The research is pretty marginal, and there was not much to go on.


In a preprint posted October 13, two researchers from Ruhr University Bochum and the University of Bonn in Germany found that while leading AI companies say they will design their most general-purpose AI, often called AGI, according to the most stringent safety principles, adapted from fields such as nuclear engineering, the safety techniques they actually apply do not satisfy those principles.

In particular, the authors note that existing proposals fail to satisfy the principle known as defense in depth, which calls for the application of multiple, redundant, and independent safety mechanisms. The conventional safety methods that companies are known to apply are not independent; in certain problematic scenarios, which are relatively easy to foresee, they all tend to fail simultaneously.
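To make the independence point concrete, here is a minimal back-of-the-envelope sketch (not from the preprint; the three layers and the 5% per-layer failure rate are purely illustrative assumptions):

```python
# Illustrative sketch: why defense in depth requires *independent* safety layers.
# Assumption (not from the paper): 3 layers, each failing 5% of the time.
p_layer = 0.05
n_layers = 3

# Independent layers: the system fails only if every layer fails at once,
# so residual risk shrinks multiplicatively with each added layer.
p_fail_independent = p_layer ** n_layers   # 0.05 ** 3 = 0.000125

# Perfectly correlated layers: one foreseeable scenario defeats every layer
# simultaneously, so adding layers buys no extra protection.
p_fail_correlated = p_layer                # 0.05

print(f"independent layers:      {p_fail_independent:.6f}")
print(f"fully correlated layers: {p_fail_correlated:.6f}")
```

On this framing, the authors' claim is that the foreseeable failure scenarios for current safety measures look more like the correlated case than the independent one.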

Leading AI companies, including Anthropic, Microsoft, and OpenAI, have published safety documents that explicitly state their intention to implement defense in depth in the design of their most advanced AI systems.

In an interview with Foom, the study's first co-author, Leonard Dung of Ruhr University Bochum, said it was not surprising that many of the methods for designing AI systems to be safe might fail. Research on making powerful AI systems safe is widely regarded as being at an early stage of maturity.

More surprising to Dung, and also concerning, was that it fell to him and his co-author, academic scholars in philosophy and machine learning, to make what is arguably a foundational contribution to the safety literature of a new branch of industrial engineering.

"There has not been much systematic thinking about what exactly does it mean to take a defense-in-depth approach to safety," said Dung. "The sort of basic way of thinking about risk that you would expect these companies—and policymakers who regulate these companies—to implement has not been implemented." 

Continue reading at foommagazine.org ...