The Goddess of Everything Else - The Animation
This is an animation of The Goddess of Everything Else, by Scott Alexander. I hope you enjoy it :)
In this video, we explain how Anthropic trained "sleeper agent" AIs to study deception. A "sleeper agent" is an AI model that behaves normally until it encounters a specific trigger in the prompt, at which point it awakens and carries out a harmful behavior. Anthropic found that they couldn't undo the...
In this video, we walk you through a plausible scenario in which AI could lead to humanity’s extinction. There are many alternative possibilities, but this time we focus on superhuman AIs developing misaligned personas, similar to how Microsoft’s Bing Chat developed the misaligned “Sydney” persona shortly after its release. This...
In the future, AIs will likely be much smarter than we are. They'll produce outputs that may be difficult for humans to evaluate, either because evaluation is too labor-intensive, or because it's qualitatively hard to judge the actions of machines smarter than us. This is the problem of “scalable oversight.”...
Rational Animations takes a look at Tom Davidson's Takeoff Speeds model (https://takeoffspeeds.com). The model uses formulas from economics to answer two questions: how long do we have until AI automates 100% of human cognitive labor, and how fast will that transition happen? The primary scriptwriter was Allen Liu (the first...
This video extrapolates the future of AI progress, following a timeline that runs from today’s chatbots to future AI vastly smarter than all of humanity combined, with God-like capabilities. We argue that such AIs will pose a significant extinction risk to humanity. This video came out of a...
In this Rational Animations video, we look at dangerous knowledge: information hazards (infohazards) and external information hazards (exfohazards). We talk about one way they can be classified, what kinds of dangers they pose, and the dangers that come from too much secrecy. The primary scriptwriter was Allen Liu (the first...
Below is Rational Animations' new video about Goal Misgeneralization. It explores the topic through three lenses:
* How humans are an example of goal misgeneralization with respect to evolution's implicit goals.
* An example of goal misgeneralization in a very simple AI setting.
* How deceptive alignment shares key features...