Simulators vs Agents: Updating Risk Models

May 12, 2025 by WillPetillo

Much of the tech industry sees AI as a tool: safe by default, dangerous only through misuse.  In contrast, many AI safety arguments assume an agentic framing: AI as an optimizer with goals of its own, potentially dangerous as capabilities increase.

Simulator theory offers a third lens, particularly for LLMs.  It proposes that these systems work by compressing the patterns in their training text and then simulating those patterns.  The simulated patterns can include the goals and reasoning of human-like agents, which may drive behavior without the base model itself being goal-directed.

This framing complicates the alignment problem.  It suggests AI may behave safely not because it is aligned, but because it simulates things that are.  Yet it also introduces new risks, such as the simulation of dangerous or deceptive characters.

In practice, real systems blend tool, agent, and simulator properties, and the nature of this blend depends on the specific training architecture used to build them.  It is therefore possible that today's decisions regarding research directions (and the policies incentivizing them) could determine whether the story of AI safety is a "near miss" or a preventable tragedy.

This Sequence was written as part of AISC 2025

1. Agents, Tools, and Simulators (WillPetillo, Sean Herrington, Adebayo Mubarak, Cancus, Spencer Ames)
2. Aligning Agents, Tools, and Simulators (WillPetillo, Sean Herrington, Spencer Ames, Adebayo Mubarak, Cancus)
3. Case Studies in Simulators and Agents (WillPetillo, Sean Herrington, Spencer Ames, Adebayo Mubarak, Cancus)
4. Agents, Simulators and Interpretability (Sean Herrington, WillPetillo, Spencer Ames, Cancus, Adebayo Mubarak)
5. Emergence of Simulators and Agents (WillPetillo, Sean Herrington, Spencer Ames, Adebayo Mubarak, Cancus)
6. Lenses, Metaphors, and Meaning (WillPetillo, Sean Herrington, Spencer Ames, Adebayo Mubarak, Cancus)