Gerard Boxo
Are Sparse Autoencoders a good idea for AI control?
Based on a 2-day hackathon brainstorm. Current status: 70% of the tooling is done, but I'm unsure how to proceed, and I don't have enough experience with multi-month projects to judge feasibility. I'm looking for feedback, specifically on my current implementation. The statement "SAEs could be useful for...
It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation
TL;DR: Small Language Models are getting better at an accelerated pace, enabling the study of behaviors that just a few months ago were only observed in SOTA models. This, paired with the release of the suite of Sparse Autoencoders "Gemma Scope" by Google DeepMind, makes this kind...