I Tested LLM Agents on Simple Safety Rules. They Failed in Surprising and Informative Ways.
TL;DR: I developed a simple, open-source benchmark to test whether LLM agents follow high-level safety principles when those principles conflict with a given task; the work was accepted at the ICML 2025 Technical AI Governance Workshop (see paper). My pilot study on six LLMs shows that while they can be influenced by these rules, their...