[Crosspost of https://ai-frontiers.org/articles/agis-last-bottlenecks, by Laura Hiscott and me. The essay assumes a less technical audience, and I may at some point write up my more detailed reasoning, but the forecasts at the end are my actual numbers. Here's a Manifold market with the same criteria (>95% AGI Score...
Read the associated paper "Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?" (https://arxiv.org/abs/2407.21792). Focus on safety problems that aren’t solved by scale alone. Benchmarks are crucial in ML for operationalizing the properties we want models to have (knowledge, reasoning, ethics, calibration, truthfulness, etc.). They act as a criterion to judge...
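To make the benchmark-as-criterion idea concrete, here is a minimal sketch, in the spirit of the capabilities-correlation analysis the Safetywashing paper performs but not its actual code. All model scores below are made up, and the mean-score capability proxy is a simplifying assumption:

```python
# A minimal sketch (not the paper's code) of a capabilities-correlation check:
# if a safety benchmark's scores across models are almost perfectly predicted
# by general capability, the benchmark may be measuring scale rather than a
# distinct safety property. All scores are made up for illustration.
import numpy as np
from scipy.stats import spearmanr

# Rows = models (weakest to strongest), columns = capability benchmarks.
capability_scores = np.array([
    [62.0, 35.0, 28.0],
    [70.0, 48.0, 41.0],
    [78.0, 60.0, 55.0],
    [85.0, 72.0, 68.0],
])
# Crude "general capability" proxy: mean across capability benchmarks.
capability_proxy = capability_scores.mean(axis=1)

# The same models' scores on a candidate safety benchmark (made up).
safety_scores = np.array([41.0, 50.0, 63.0, 71.0])

rho, _ = spearmanr(capability_proxy, safety_scores)
print(f"capabilities correlation: {rho:+.2f}")
# rho near +1: the "safety" benchmark is tracking capability, so progress on
# it comes for free with scale -- the safetywashing failure mode. A benchmark
# for problems that scale doesn't solve should correlate much more weakly.
```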
In Spring 2023, the Berkeley AI Safety Initiative for Students (BASIS) organized an alignment research program for students, drawing inspiration from similar programs by Stanford AI Alignment[1] and OxAI Safety Hub. We brought together 12 researchers from organizations like CHAI, FAR AI, Redwood Research, and Anthropic, and 38 research participants...
Introduction
Inverse Reinforcement Learning (IRL) is both the name of a problem and a broad class of methods that try to solve it. Whereas Reinforcement Learning (RL) asks, “How should I act to achieve my goals?”, IRL asks, “Looking at an agent acting, what are its goals?” The problem was...
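A minimal sketch, not from the post, of both directions on a toy four-state chain MDP: value iteration answers the RL question ("given rewards, how should I act?"), and a brute-force search over a small grid of candidate reward functions answers the IRL question ("given behavior, what rewards explain it?"). The states, discount, and reward grid are all illustrative assumptions:

```python
# Toy RL-vs-IRL contrast on a 4-state chain MDP (illustrative assumptions).
import itertools
import numpy as np

N_STATES, GAMMA = 4, 0.9

# Deterministic transitions: action 0 = move left, action 1 = move right.
def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)

def optimal_policy(reward):
    """RL direction: given rewards, compute the greedy policy via value iteration."""
    v = np.zeros(N_STATES)
    for _ in range(200):
        v = np.array([max(reward[step(s, a)] + GAMMA * v[step(s, a)]
                          for a in (0, 1)) for s in range(N_STATES)])
    return tuple(int(np.argmax([reward[step(s, a)] + GAMMA * v[step(s, a)]
                                for a in (0, 1)])) for s in range(N_STATES))

# IRL direction: observe an agent that always moves right, then ask which
# reward functions (over a tiny discrete grid) make that behavior optimal.
observed_policy = (1, 1, 1, 1)
consistent = [r for r in itertools.product([0.0, 0.5, 1.0], repeat=N_STATES)
              if optimal_policy(np.array(r)) == observed_policy]
print(f"{len(consistent)} candidate reward functions explain the behavior")
print("e.g.", consistent[0])
# Note the classic IRL ambiguity: many distinct reward functions rationalize
# the same observed behavior, so extra assumptions are needed to pick one.
```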
This post is a distillation of a corpus of ideas on Selection Theorems by johnswentworth, mainly this post. It was made for EA UC Berkeley's Distillation Contest.
Introduction
Selection Theorems are tools for answering the question, "What will be the likely features of agents we might encounter in an environment?"...