Simulators vs Agents: Updating Risk Models

May 12, 2025 by WillPetillo

Much of the tech industry sees AI as a tool: safe by default, dangerous only through misuse.  In contrast, many AI safety arguments assume an agentic framing: AI as an optimizer with goals of its own, potentially dangerous as capabilities increase.

Simulator theory offers a third lens, particularly for LLMs.  It proposes that these systems work by compressing the patterns in their training text and then simulating those patterns.  The simulated patterns can include the goals and reasoning of human-like agents, which may drive behavior without the base model itself being goal-directed.

This framing complicates the alignment problem.  It suggests AI may behave safely not because it is aligned, but because it simulates things that are.  Yet it also introduces new risks, such as the simulation of dangerous or deceptive characters.

In practice, real systems blend tool, agent, and simulator properties, and the nature of this blend depends on the specific training architecture used to build them.  It is therefore possible that today's decisions regarding research directions (and the policies incentivizing them) could determine whether the story of AI safety is a "near miss" or a preventable tragedy.

This Sequence was written as part of AISC 2025

1. Agents, Tools, and Simulators (WillPetillo, Sean Herrington, Adebayo Mubarak, Cancus, Spencer Ames)
2. Aligning Agents, Tools, and Simulators (WillPetillo, Sean Herrington, Spencer Ames, Adebayo Mubarak, Cancus)
3. Case Studies in Simulators and Agents (WillPetillo, Sean Herrington, Spencer Ames, Adebayo Mubarak, Cancus)
4. Agents, Simulators and Interpretability (Sean Herrington, WillPetillo, Spencer Ames, Cancus, Adebayo Mubarak)
5. Emergence of Simulators and Agents (WillPetillo, Sean Herrington, Spencer Ames, Adebayo Mubarak, Cancus)
6. Lenses, Metaphors, and Meaning (WillPetillo, Sean Herrington, Spencer Ames, Adebayo Mubarak, Cancus)