Coauthored by Fedor Ryzhenkov and Dmitrii Volkov (Palisade Research)
At Palisade, we often discuss the latest safety results with policymakers and think tanks seeking to understand the state of current technology. This document condenses and streamlines the internal notes we wrote while discussing Anthropic's "Scaling Monosemanticity".
Research on AI interpretability aims to unveil the inner workings of AI models, traditionally seen as “black boxes.” This enhances our understanding, enabling us to make AI safer, more predictable, and more efficient. Anthropic’s Transformer Circuits Thread focuses on mechanistic (bottom-up) interpretability of AI models.
Their latest result, Scaling Monosemanticity, demonstrates how interpretability techniques that worked for small, shallow models can scale to practical 7B (GPT-3.5-class) models. This paper also paves the way for applying similar methods to larger frontier models (GPT-4 and beyond).
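To give a concrete sense of the technique: Scaling Monosemanticity trains a sparse autoencoder (SAE) on a model's internal activations, decomposing each activation vector into a much larger set of sparsely active features that tend to be individually interpretable. Here is a minimal toy sketch of the SAE forward pass; the sizes, variable names, and initialization are our illustrative choices, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_dict = 16, 64   # toy sizes; the paper scales to millions of features
W_enc = rng.normal(0, 0.1, (d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(0, 0.1, (d_dict, d_model))

def sae_forward(x):
    """Encode an activation vector into sparse features, then reconstruct it."""
    f = np.maximum(0.0, x @ W_enc + b_enc)   # ReLU: most features end up at zero
    x_hat = f @ W_dec                        # linear reconstruction from features
    return f, x_hat

x = rng.normal(size=d_model)   # stand-in for a residual-stream activation
f, x_hat = sae_forward(x)

# Training minimizes ||x - x_hat||^2 + lambda * ||f||_1: the L1 penalty pushes
# the feature vector f toward sparsity, so each surviving feature tends to
# capture a single interpretable concept ("monosemanticity").
```

The key design choice is the overcomplete dictionary (d_dict much larger than d_model) combined with the sparsity penalty, which encourages the model's superposed representations to separate into distinct features.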