Open internship position + call for collaborations on threat-model-dependent alignment, governance, and offense/defense balance
At the Existential Risk Observatory, we're currently carrying out a project called Solving the Right Problem: Towards Researchers Consensus on AI Existential Threat Models, together with MIT FutureTech and FLI. Although many leading researchers agree...
Today, PauseAI and the Existential Risk Observatory release TakeOverBench.com: a benchmark, but for AI takeover. There are many AI benchmarks, but this is the one that really matters: how far are we from a takeover, possibly leading to human extinction? In 2023, the broadly coauthored paper Model evaluation for extreme...
The Existential Risk Observatory has been interested in public awareness of AI existential risk since its inception over five years ago. We started surveying public awareness in December 2022, including by asking the following open question: "Please list three events, in order of probability (from most to least probable), that...
This is a commentary on a paper by RAND: Can Humans Devise Practical Safeguards That Are Reliable Against an Artificial Superintelligent Agent? Over a decade ago, Eliezer Yudkowsky famously ran the AI box experiment, in which a gatekeeper had to keep a hypothetical ASI, played by him, inside a box,...
Epistemic status: quick draft of a few hours' thought, related to a few weeks of cooperative research. In a multipolar ASI offense/defense scenario, there seems to be a good chance that intent-aligned, friendly AI will not colonize space. This could for example happen because we intent-align defensive AI(s) with institutes under...
OpenAI recently published an interesting paper in which they claimed that there is a "straightforward fix" to hallucinations: "Penalize confident errors more than you penalize uncertainty." Of course, it remains to be seen whether such an easy fix will be able to solve a major limitation of LLMs that has...
GPT-5 was a disappointment for many, and at the same time, interesting new paradigms may be emerging. Therefore, some say we should get back to the traditional LessWrong AI safety ideas: little compute (large hardware overhang) and a fast takeoff (foom) resulting in a unipolar, godlike superintelligence. If this would...