János Kramár

How transparent is DiffusionGemma (and why it matters)

by Josh Engels, Callum McDougall, bilalchughtai, János Kramár, Senthooran Rajamanoharan, Arthur Conmy, Rohin Shah, and Neel Nanda

Authors: Joshua Engels*, Callum McDougall*, Bilal Chughtai*, Janos Kramar, Senthooran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue+, João Gabriel Lopes de Oliveira+, Rohin Shah+, Neel Nanda+ (* = Primary Contributor, + = Advising). Paper here: https://arxiv.org/abs/2606.20560 Overview: In a recent collaboration between the GDM interpretability team and...

Jun 20 · 86 karma

Announcing Gemma Scope 2

by CallumMcDougall, Arthur Conmy, János Kramár, Tom Lieberum, Senthooran Rajamanoharan, and Neel Nanda

TLDR * The Google DeepMind mech interp team is releasing Gemma Scope 2: a suite of SAEs & transcoders trained on the Gemma 3 model family * Neuronpedia demo here, access the weights on HuggingFace here, try out the Colab notebook tutorial here [1] * Key features of this relative...

Dec 22, 2025 · 96 karma

How Can Interpretability Researchers Help AGI Go Well?

by Neel Nanda, Josh Engels, Senthooran Rajamanoharan, Arthur Conmy, bilalchughtai, CallumMcDougall, János Kramár, and lewis smith

Executive Summary * Over the past year, the Google DeepMind mechanistic interpretability team has pivoted to a pragmatic approach to interpretability, as detailed in our accompanying post [1] , and are excited for more in the field to embrace pragmatism! In brief, we think that: * It is crucial to...

Dec 1, 2025 · 68 karma

A Pragmatic Vision for Interpretability

by Neel Nanda, Josh Engels, Arthur Conmy, Senthooran Rajamanoharan, bilalchughtai, CallumMcDougall, János Kramár, and lewis smith

Executive Summary * The Google DeepMind mechanistic interpretability team has made a strategic pivot over the past year, from ambitious reverse-engineering to a focus on pragmatic interpretability: * Trying to directly solve problems on the critical path to AGI going well [1] * Carefully choosing problems according to our comparative...

Dec 1, 2025 · 140 karma

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

by lewis smith, Senthooran Rajamanoharan, Arthur Conmy, CallumMcDougall, Tom Lieberum, János Kramár, Rohin Shah, and Neel Nanda

Lewis Smith*, Sen Rajamanoharan*, Arthur Conmy, Callum McDougall, Janos Kramar, Tom Lieberum, Rohin Shah, Neel Nanda * = equal contribution The following piece is a list of snippets about research from the GDM mechanistic interpretability team, which we didn’t consider a good fit for turning into a paper, but which...

Mar 26, 2025 · 117 karma

JumpReLU SAEs + Early Access to Gemma 2 SAEs

by Senthooran Rajamanoharan, Tom Lieberum, nps29, Arthur Conmy, Vikrant Varma, János Kramár, and Neel Nanda

New paper from the Google DeepMind mechanistic interpretability team, led by Sen Rajamanoharan! We introduce JumpReLU SAEs, a new SAE architecture that replaces the standard ReLUs with discontinuous JumpReLU activations, and seems to be (narrowly) state of the art over existing methods like TopK and Gated SAEs for achieving high...

Jul 19, 2024 · 55 karma
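The JumpReLU excerpt above names the key change: the ReLU in the SAE encoder is replaced by a discontinuous JumpReLU. The sketch below is minimal and assumes the commonly cited definition JumpReLU_θ(z) = z · H(z − θ), where H is the Heaviside step function. Under that definition the activation zeros any pre-activation at or below a threshold θ and passes larger values through unchanged, so the output jumps from 0 to θ at the threshold.

The `encode` helper, the per-latent `thresholds`, and the input values are hypothetical and purely illustrative. This is not the paper's code, and it leaves out how the thresholds are trained.

```python
def jumprelu(z: float, theta: float) -> float:
    """JumpReLU: z * H(z - theta), i.e. keep z if it exceeds theta, else 0."""
    return z if z > theta else 0.0


def relu(z: float) -> float:
    """Standard ReLU, max(0, z), shown for comparison."""
    return max(0.0, z)


def encode(pre_acts: list[float], thresholds: list[float]) -> list[float]:
    """Hypothetical helper: apply JumpReLU elementwise, one threshold per latent."""
    return [jumprelu(z, t) for z, t in zip(pre_acts, thresholds)]


# Illustrative (made-up) encoder pre-activations and thresholds.
pre_acts = [0.05, 0.3, -0.2, 1.5]
thresholds = [0.1, 0.1, 0.1, 0.1]
print([relu(z) for z in pre_acts])   # [0.05, 0.3, 0.0, 1.5]
print(encode(pre_acts, thresholds))  # [0.0, 0.3, 0.0, 1.5]
```

Compared with plain ReLU, the small 0.05 pre-activation is suppressed because it falls below the threshold. Meanwhile, 0.3 and 1.5 keep their full values rather than being shifted down by θ.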

Improving Dictionary Learning with Gated Sparse Autoencoders

by Senthooran Rajamanoharan, Arthur Conmy, lewis smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, and Neel Nanda

Authors: Senthooran Rajamanoharan*, Arthur Conmy*, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda A new paper from the Google DeepMind mech interp team: Improving Dictionary Learning with Gated Sparse Autoencoders! Gated SAEs are a new Sparse Autoencoder architecture that seems to be a significant Pareto-improvement over...

Apr 25, 2024 · 63 karma

A Pragmatic Vision for Interpretability

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

Announcing Gemma Scope 2