Analyzing how SAE features evolve across a forward pass
This research was completed for the Supervised Program for Alignment Research (SPAR) summer 2024 iteration. The team was supervised by @Stefan Heimersheim (Apollo Research). Find out more about the program and upcoming iterations here. TL;DR: We look for related SAE features, purely based on statistical correlations. We consider this a...
Nov 7, 2024
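The TL;DR says related SAE features are found purely from statistical correlations. The sketch below shows one way such a search could look. It assumes Pearson correlation between feature activations recorded on the same tokens at two points in the forward pass. This excerpt does not state the exact statistic the post uses, so treat that choice as an assumption. The function names `feature_correlations` and `top_pairs` are hypothetical, and the inputs are plain activation matrices rather than any specific SAE library's API.

```python
import numpy as np


def feature_correlations(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Pearson correlation between every feature in acts_a and every feature in acts_b.

    Assumed (hypothetical) inputs:
      acts_a: (n_tokens, n_feat_a) SAE feature activations at an earlier layer
      acts_b: (n_tokens, n_feat_b) SAE feature activations at a later layer,
              recorded on the same tokens in the same order
    Returns an (n_feat_a, n_feat_b) matrix. Features with zero variance
    (e.g. dead features that never fire) get correlation 0 instead of NaN.
    """
    # Center each feature's activations over tokens (cast to float first).
    a = np.asarray(acts_a, dtype=float)
    b = np.asarray(acts_b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)

    # Unnormalized covariance and per-feature norms.
    cov = a.T @ b
    norm_a = np.sqrt((a ** 2).sum(axis=0))
    norm_b = np.sqrt((b ** 2).sum(axis=0))
    denom = np.outer(norm_a, norm_b)

    # Divide only where both features vary; leave zeros elsewhere.
    corr = np.zeros_like(cov)
    np.divide(cov, denom, out=corr, where=denom > 0)
    return corr


def top_pairs(corr: np.ndarray, k: int) -> list[tuple[int, int, float]]:
    """Return the k (feature_a, feature_b, correlation) pairs with the highest correlation."""
    flat_idx = np.argsort(corr, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_idx, corr.shape)
    return [(int(r), int(c), float(corr[r, c])) for r, c in zip(rows, cols)]


# Synthetic usage example: feature 1 at the earlier layer is copied
# (rescaled) into feature 0 at the later layer, so that pair should rank first.
rng = np.random.default_rng(0)
early = np.maximum(rng.normal(size=(1000, 4)), 0.0)  # ReLU-like activations
early[:, 3] = 0.0                                     # a dead feature
late = np.column_stack([3.0 * early[:, 1], np.maximum(rng.normal(size=1000), 0.0)])
corr = feature_correlations(early, late)
print(top_pairs(corr, 3))
```

Pearson correlation is used here only because it is simple and scale-invariant. A real analysis over many tokens would likely also need to handle sparsity, for example by correlating on which tokens a feature fires rather than on raw activation values.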