Today, PauseAI and the Existential Risk Observatory release TakeOverBench.com: a benchmark, but for AI takeover.

There are many AI benchmarks, but this is the one that really matters: how far are we from a takeover, possibly leading to human extinction?
In 2023, the broadly coauthored paper "Model evaluation for extreme risks" defined the following nine dangerous capabilities: Cyber-offense, Deception, Persuasion & manipulation, Political strategy, Weapons acquisition, Long-horizon planning, AI development, Situational awareness, and Self-proliferation. We think progress in all of these domains is worrying, and it is even more worrying that several of these capabilities combine into AI takeover scenarios (existential threat models).
Using state-of-the-art (SOTA) benchmark data, to the degree it is available, we track how far we have progressed along the trajectory towards the end of human control. We highlight four takeover scenarios and track the dangerous capabilities needed for them to become a reality.
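To make this kind of tracking concrete, here is a minimal illustrative sketch of how benchmark-derived capability scores could be mapped onto takeover scenarios. This is not TakeOverBench's actual data model or scoring method: the capability names follow the paper above, but every number, the scenario definitions, and the "bottleneck" rule are hypothetical assumptions used purely for illustration.

```python
# Illustrative sketch only. TakeOverBench's real data model and scoring are not
# described here, so all structures and numbers below are hypothetical examples.

# Hypothetical capability scores, normalized from benchmark results
# (0.0 = no capability, 1.0 = assumed red line), using capability domains
# from "Model evaluation for extreme risks".
capability_scores = {
    "cyber_offense": 0.4,
    "deception": 0.5,
    "self_proliferation": 0.2,
    "long_horizon_planning": 0.3,
    "ai_development": 0.35,
}

# Hypothetical mapping from takeover scenarios to the capabilities they require.
scenarios = {
    "autonomous_replication": ["self_proliferation", "cyber_offense", "long_horizon_planning"],
    "recursive_self_improvement": ["ai_development", "long_horizon_planning"],
}

def scenario_progress(scenario: str) -> float:
    """Treat the least-developed required capability as the bottleneck for a scenario."""
    required = scenarios[scenario]
    return min(capability_scores[c] for c in required)

for name in scenarios:
    print(f"{name}: bottleneck progress {scenario_progress(name):.2f}")
```

The bottleneck rule (taking the minimum over required capabilities) is just one possible aggregation choice; how capabilities actually combine into scenario risk is exactly the kind of open threat-modelling question discussed below.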
Our website aims to be a valuable source of information for researchers, policymakers, and the public. At the same time, we want to highlight gaps in the current research:
For many leading benchmarks, we simply don't know how the latest models score. RepliBench, for example, hasn't been run for almost a year. We need more efforts to run existing benchmarks against newer models!
AI existential threat models have received only limited serious academic attention, which we think is a very poor state of affairs (the Existential Risk Observatory, together with MIT FutureTech and FLI, is currently working to address this gap with a new threat model research project).
Even if we had accurate threat models, we still would not know exactly where the capability red lines (or red regions, given uncertainty) are. And even if we had accurate red lines or regions, we don't always know how to measure them reliably with benchmarks.
Despite all these uncertainties, we think it is constructive to center the discussion on the concept of an AI takeover, and to present the knowledge that we do have on this website.
We hope that TakeOverBench.com contributes to closing these gaps, and to a better-informed conversation among researchers, policymakers, and the public about how close we are to losing control.
TakeOverBench.com is an open source project, and we invite everyone to comment and contribute on GitHub.
Enjoy TakeOverBench!