Aviel Boag — LessWrong

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions

by Lidor Banuel Dabbah and Aviel Boag

Tl;dr: In this post we present the exploratory phase of a project aiming to study neural networks by applying static local learning coefficient (LLC) estimation to specific alterations of them. We introduce a new method named Feature Targeted (FT) LLC estimation and study its ability to distinguish SAE trained features...

Jul 19, 2024 • 59