Thank you for sharing! I am also working on a write-up post for experiments I conducted with SAEs trained on Othello-GPT :) I'm using the original model by Kenneth Li et al., and mostly training SAEs with 512-1024 features. I also found that simple features such as my/their/empty are indeed rarely found in SAEs trained on later layers. However, there are more of them in SAEs trained on middle layers (including cells outside the "inner-ring"). In later layers, SAEs usually learn more complicated features, such as the combination of a few close cells being of...