latterframe — LessWrong

48

Ω

19

2

1

2y

A “Scaling Monosemanticity” Explainer

Coauthored by Fedor Ryzhenkov and Dmitrii Volkov (Palisade Research). At Palisade, we often discuss the latest safety results with policymakers and think tanks who seek to understand the state of current technology. This document condenses and streamlines the various internal notes we wrote when discussing Anthropic's "Scaling Monosemanticity". Executive Summary: Research...

Jun 29, 2024 • 10

Take SCIFs, it’s dangerous to go alone

Coauthored by Dmitrii Volkov (Palisade Research), Christian Schroeder de Witt (University of Oxford), and Jeffrey Ladish (Palisade Research). We explore how frontier AI labs could assimilate operational security (opsec) best practices from fields like nuclear energy and construction to mitigate near-term safety risks stemming from AI R&D process compromise. Such risks in the...

May 1, 2024 • 43