Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
TL;DR: In this post we present the exploratory phase of a project that aims to study neural networks by applying static local learning coefficient (LLC) estimation to specific alterations of them. We introduce a new method, Feature Targeted (FT) LLC estimation, and study its ability to distinguish SAE-trained features...