A “Scaling Monosemanticity” Explainer
Coauthored by Fedor Ryzhenkov and Dmitrii Volkov (Palisade Research)

At Palisade, we often discuss the latest safety results with policymakers and think tanks who seek to understand the current state of the technology. This document condenses and streamlines the internal notes we wrote while discussing Anthropic's "Scaling Monosemanticity".

Executive Summary

Research...
Jun 29, 2024

