Interpretability (ML & AI) · Sparse Autoencoders (SAEs) · AI

Mechanistic Interpretability of Llama 3.2 with Sparse Autoencoders

by PaulPauls
24th Nov 2024
1 min read

This is a linkpost for https://github.com/PaulPauls/llama3_interpretability_sae

3 comments, sorted by top scoring

Neel Nanda · 10mo

Cool project! Thanks for doing it and sharing it; it's great to see more models with SAEs.

"interpretability research on proprietary LLMs that was quite popular this year and great research papers by Anthropic[1][2], OpenAI[3][4] and Google Deepmind"

I run the Google DeepMind team, and just wanted to clarify that our work was not on proprietary closed-weight models but on Gemma 2, as were our open-weight SAEs; Gemma 2 is about as open as Llama, imo. We try to use open models wherever possible for the general reasons of good scientific practice, ease of replicability, etc. We couldn't open source the data, though, and didn't go to the effort of open sourcing the code, so I don't think they can be considered truly open source. OpenAI did most of their work on GPT-2 and only did their large-scale experiment on GPT-4, I believe. All Anthropic work I'm aware of is on proprietary models, alas.

PaulPauls · 10mo

Hi Neel,

You're absolutely right: all of the research in the Gemma Scope paper was performed on the open-weight Gemma 2 model. I wanted to group all of the research my project was based on into one concise sentence, and in doing so I erroneously placed your work in the 'proprietary LLMs' category. I've gone ahead and corrected the mistake.

My apologies.

I hope you still enjoyed the project and thank you for your great research work at DeepMind. =)

Volodymyr Barannik · 9mo

Thanks for your work, but why is the project currently taken down? 


I recently published a rather large side project of mine that attempts to replicate, with the humble but open-source Llama 3.2-3B model, the mechanistic interpretability research on proprietary and open-source LLMs that was quite popular this year and produced great research papers from Anthropic[1][2], OpenAI[3][4], and Google DeepMind[5].

The project provides a complete end-to-end pipeline for training Sparse Autoencoders to interpret LLM features, from activation capture through training, interpretation, and verification. All code, data, trained models, and detailed documentation are publicly available in my attempt to make this research as open as possible, though calling it an extensively documented personal project wouldn't be wrong either, in my opinion.
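
For readers less familiar with the setup, the core object such a pipeline trains looks roughly like the sketch below: a wide, ReLU-activated autoencoder fit to activations captured from a chosen Llama 3.2 layer (e.g. via forward hooks), using a reconstruction loss plus an L1 sparsity penalty. This is only a minimal sketch under common SAE conventions; the dimensions, expansion factor, and sparsity coefficient are my illustrative assumptions, not necessarily the repository's choices.

```python
# Illustrative sketch of a sparse autoencoder (SAE) of the kind such a
# pipeline trains on captured LLM activations. Architecture and
# hyperparameters are assumptions for exposition, not necessarily what
# the linked repository implements.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 3072, d_features: int = 8 * 3072):
        # d_model = 3072 matches Llama 3.2-3B's hidden size; the 8x
        # expansion factor is an arbitrary illustrative choice.
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        # Encode activations into a wider, mostly-zero feature space,
        # then reconstruct the original activations from those features.
        features = F.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features


def sae_loss(x, reconstruction, features, l1_coeff: float = 1e-3):
    # Reconstruction error keeps the features faithful to the model's
    # activations; the L1 penalty pushes most feature values toward zero.
    mse = F.mse_loss(reconstruction, x)
    l1 = features.abs().sum(dim=-1).mean()
    return mse + l1_coeff * l1
```

The interpretation and verification stages then typically amount to finding which learned features activate on recognizable contexts and checking that those activations behave as the proposed explanation predicts.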

Since LessWrong has a strong focus on AI interpretability research, I thought some of you might find value in this open research replication. I'm happy to answer any questions about the methodology, results, or future directions.

1. https://www.anthropic.com/research/mapping-mind-language-model
2. https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html
3. https://openai.com/index/extracting-concepts-from-gpt-4/
4. https://arxiv.org/abs/2406.04093
5. https://arxiv.org/abs/2408.05147