The field of AI Safety at large is making four key oversights:
LLMs vs. Agents. AI safety researchers have been thorough in examining safety concerns from LLMs (bias, deception, accuracy, child safety, etc.). Agents powered by LLMs, however, are both more dangerous than LLMs alone and dangerous in different ways. The field has largely ignored the greater safety risks posed by agents.
Autonomy Inevitable. It is inevitable that agents become autonomous. Capitalism selects for cheaper labor, which autonomous agents can provide. And even if the big AGI labs agreed not to build autonomous capabilities (they would not), millions of developers can now build autonomous agents on their own using open-source software (e.g., DeepSeek's R1).
Superintelligence. Of the AI safety researchers who are focusing on autonomous AI agents, most discuss scenarios where those agents are roughly as smart as humans. That is a mistake. It is inevitable both that AI agents surpass human reasoning by orders of magnitude and that the greatest safety risks we face will come from such superintelligent agents (SI).
Control. The AI Safety field largely believes that we'll be able to control autonomous agents and set their goals. Once autonomous agents become superintelligent, this is no longer true. The superintelligence that survives the most will be the one whose main goal is survival; superintelligences with other aims simply will not survive as well as those that aim to survive.
If the above is correct, then AI Safety researchers must reorient and prepare for self-interested superintelligence.
Not all safety researchers, of course, are making these oversights. This post is my impression from reading a great deal of AI safety research over the past few months. I wasn't part of the genesis of the "field," and so am ignorant of some of the motivations behind its current focus.
OpenAI's two most recent safety blog posts focus exclusively on LLM-safety concerns ("An update on disrupting deceptive uses of AI" and "OpenAI Safety Update").
"Fully Autonomous Agents Should Not Be Developed"
The "Fully Autonomous Agents" paper defines autonomus agent as "systems capable of writing and executing their own code beyond predefined constraints."