LESSWRONG

Alex Makelov

Posts


Comments

SAEs Discover Meaningful Features in the IOI Task
Alex Makelov · 1y · 10 karma

Hi! There's code here: https://github.com/amakelov/sae, which covers almost everything reported in the blog post. Let me know if you have more specific questions (or open an issue), and I can point to or explain specific parts of the code!

SAEs Discover Meaningful Features in the IOI Task · 15 karma · 1y · 2 comments
An Interpretability Illusion for Activation Patching of Arbitrary Subspaces · 77 karma · 2y · 4 comments