rrenaud — LessWrong

144

1

15y

Scaling Sparse Feature Circuit Finding to Gemma 9B

by Diego Caples, Jatin Nainani, CallumMcDougall, and rrenaud

[This is an interim report and continuation of the work from the research sprint done in MATS winter 7 (Neel Nanda's Training Phase)] Try out binary masking for a few residual SAEs in this Colab notebook: [GitHub Notebook] [Colab Notebook] TL;DR: We propose a novel approach to: 1. Scaling SAE...

Jan 10, 2025 • 88

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream

by Diego Caples and rrenaud

Diego Caples (diego@activated-ai.com), Rob Neuhaus (rob@activated-ai.com). Introduction: In principle, neuron activations in a transformer-based language model residual stream should be about the same scale. In practice, the dimensions vary unexpectedly widely in scale. Mathematical theories of the transformer architecture do not predict this. They expect no dimension to be more...

Sep 6, 2024 • 74