Today, PauseAI and the Existential Risk Observatory release TakeOverBench.com: a benchmark, but for AI takeover. There are many AI benchmarks, but this is the one that really matters: how far are we from a takeover, possibly leading to human extinction? In 2023, the broadly coauthored paper Model evaluation for extreme...
The Existential Risk Observatory has been interested in public awareness of AI existential risk since its inception over five years ago. We started surveying public awareness in December 2022, including by asking the following open question: "Please list three events, in order of probability (from most to least probable), that...
This is a commentary on a paper by RAND: Can Humans Devise Practical Safeguards That Are Reliable Against an Artificial Superintelligent Agent? Over a decade ago, Eliezer Yudkowsky famously ran the AI box experiment, in which a gatekeeper had to keep a hypothetical ASI, played by him, inside a box,...
Epistemic status: quick draft of a few hours' thought, related to a few weeks of cooperative research. In a multipolar ASI offense/defense scenario, there seems to be a good chance that intent-aligned, friendly AI will not colonize space. This could happen, for example, because we intent-align defensive AI(s) with institutes under...
OpenAI recently published an interesting paper in which they claim that there is a "straightforward fix" for hallucinations: "Penalize confident errors more than you penalize uncertainty." Of course, it remains to be seen whether such an easy fix can solve a main limitation of LLMs that has...
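To see why such a scoring change shifts the incentives, here is a minimal sketch (my own illustration, not code from the paper): under accuracy-only grading, a guess always has a higher expected score than abstaining, so a model trained against such benchmarks learns to bluff; once confident errors carry an extra penalty, abstaining becomes optimal below a confidence threshold. The penalty value of 2 below is an arbitrary example.

```python
# Illustrative sketch (not OpenAI's exact scheme): expected score of
# answering vs. abstaining under two grading rules.

def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering when a correct answer earns +1,
    a wrong answer costs `wrong_penalty`, and abstaining earns 0."""
    return p_correct * 1.0 - (1 - p_correct) * wrong_penalty

for p in (0.2, 0.5, 0.8):
    binary = expected_score(p, wrong_penalty=0.0)     # accuracy-only grading
    penalized = expected_score(p, wrong_penalty=2.0)  # confident errors cost extra
    print(f"p={p:.1f}  binary={binary:+.2f}  penalized={penalized:+.2f}")

# Under binary grading, answering always beats abstaining (score 0),
# so guessing is rewarded even at low confidence. With wrong_penalty=2,
# answering only pays when p > 2/3; below that, the optimal behavior
# is to say "I don't know".
```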
GPT-5 was a disappointment for many, and at the same time, interesting new paradigms may be emerging. Therefore, some say we should get back to the traditional LessWrong AI safety ideas: little compute (large hardware overhang) and a fast takeoff (foom) resulting in a unipolar, godlike superintelligence. If this would...
Executive summary: We examine whether intent-aligned defensive AI can effectively counter potentially unaligned or adversarial takeover-level AI. This post identifies two primary threat scenarios: a strategic post-deployment takeover, in which AI gradually integrates into systems before executing a coordinated strike, and rapid pre-deployment "blitz" attacks exploiting existing vulnerabilities. To counter these threats,...