Analyzing how SAE features evolve across a forward pass
by bensenberner, danibalcells, Michael Oesterle, Ediz Ucar, and StefanHex
This research was completed for the Supervised Program for Alignment Research (SPAR) summer 2024 iteration. The team was supervised by @Stefan Heimersheim (Apollo Research). Find out more about the program and upcoming iterations here. TL;DR: We look for related SAE features, purely based on statistical correlations. We consider this a...
Nov 7, 2024