Preliminary Results on Building Graphs from SAEs
TLDR:
* I use nodewise LASSO to build approximate sparse conditional dependence graphs over SAE features, with resampling and null controls
* Initial experiments produce graphs with small standalone modules that are stable under resampling
* These modules frequently correspond to coherent-looking linguistic features, and are only weakly aligned with...
Mar 27
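The pipeline in the TLDR can be sketched as nodewise LASSO (Meinshausen-Bühlmann neighborhood selection): regress each SAE feature on all the others, keep an edge when the regressions select each other, repeat on random subsamples to measure edge stability, compare against a null in which each feature column is shuffled independently, and read off "modules" as connected components of the stable edges. This is a minimal sketch under my own assumptions, not the author's code. The activation matrix `F` (rows = tokens, columns = SAE features), the AND rule, the penalty `alpha`, the 50% subsample fraction, the 0.8 stability threshold, the column-permutation null, the Gaussian toy data, and all function names (`lasso_cd`, `nodewise_graph`, `edge_stability`, `permuted_null`, `modules`) are hypothetical choices made for illustration.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=200, tol=1e-6):
    """LASSO via cyclic coordinate descent (illustrative helper).

    Minimizes (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1, assuming the
    columns of X and y are already centered (no intercept is fit).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-feature curvature
    r = y.astype(float).copy()  # current residual y - X b
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0:  # feature is constant in this sample
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                r -= delta * X[:, j]
                b[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b


def nodewise_graph(F, alpha):
    """Regress each feature on all others; keep edge (i, j) only if
    both regressions select each other (the AND rule)."""
    sd = F.std(axis=0)
    ok = sd > 1e-12  # constant (e.g. never-active) features get a zero column
    Z = np.zeros_like(F, dtype=float)
    Z[:, ok] = (F[:, ok] - F[:, ok].mean(axis=0)) / sd[ok]
    p = F.shape[1]
    coef = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        coef[j, others] = lasso_cd(Z[:, others], Z[:, j], alpha)
    picked = coef != 0
    return picked & picked.T


def edge_stability(F, alpha, n_sub=20, frac=0.5, seed=0):
    """Fraction of random half-subsamples (without replacement) in which each edge appears."""
    rng = np.random.default_rng(seed)
    n, p = F.shape
    counts = np.zeros((p, p))
    for _ in range(n_sub):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        counts += nodewise_graph(F[idx], alpha)
    return counts / n_sub


def permuted_null(F, rng):
    """Shuffle each column independently: keeps marginals, destroys dependence."""
    return np.column_stack([rng.permutation(F[:, j]) for j in range(F.shape[1])])


def modules(stab, thresh=0.8):
    """Connected components (size > 1) of edges kept in >= thresh of subsamples."""
    adj = stab >= thresh
    seen, comps = set(), []
    for s in range(adj.shape[0]):
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in np.flatnonzero(adj[u]):
                if int(v) not in seen:
                    seen.add(int(v))
                    stack.append(int(v))
        comps.append(sorted(comp))
    return [c for c in comps if len(c) > 1]


# Toy data (hypothetical): features 0-1 share a latent, 2-3 share another, 4-5 are noise.
rng = np.random.default_rng(0)
n = 1000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
F = np.column_stack([
    z1 + 0.3 * rng.normal(size=n), z1 + 0.3 * rng.normal(size=n),
    z2 + 0.3 * rng.normal(size=n), z2 + 0.3 * rng.normal(size=n),
    rng.normal(size=n), rng.normal(size=n),
])
stab = edge_stability(F, alpha=0.15)
null_stab = edge_stability(permuted_null(F, rng), alpha=0.15)
print(modules(stab), modules(null_stab))
```

On this toy data the stable modules should be the two planted pairs, and the permuted null should yield none. The AND rule is the more conservative of the two standard neighborhood-selection rules (OR keeps an edge if either regression selects it); real SAE activations are sparse and nonnegative, so the penalty and stability threshold would likely need tuning against the null rather than being fixed as here.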