Matthias Murdych — LessWrong

High schooler in Minnesota, aspiring alignment researcher because there is no more important problem in the world right now. Classical pianist and enjoyer of Late-Romantic Soviet works.

matthiasmurdych13@gmail.com

7

1

2mo

Selectively reducing eval awareness and murder in Gemma 3 27B via steering

Gemma 3 is a suite of models by Google. As described in the Gemma Scope 2 release, Google trained sparse autoencoders on all model sizes. In these experiments, features corresponding to evaluation awareness/skepticism and to the personal intent to murder were found and then steered to selectively change...
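For readers unfamiliar with steering, here is a minimal sketch of the general idea, not the post's actual code. It assumes a feature is represented by a direction vector (for example, a sparse autoencoder decoder row) and that steering means adding a scaled copy of that direction to a single activation vector. The names `steer` and `projection` and the toy numbers are hypothetical.

```python
import math


def _norm(vector):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in vector))


def projection(activation, direction):
    """Scalar component of `activation` along the unit-normalised `direction`."""
    length = _norm(direction)
    if length == 0.0:
        raise ValueError("direction must be non-zero")
    return sum(a * d for a, d in zip(activation, direction)) / length


def steer(activation, direction, coefficient):
    """Return activation + coefficient * unit(direction).

    A negative coefficient pushes the activation away from the feature,
    a positive one pushes it towards the feature.
    """
    if len(activation) != len(direction):
        raise ValueError("activation and direction must have the same length")
    length = _norm(direction)
    if length == 0.0:
        raise ValueError("direction must be non-zero")
    return [a + coefficient * d / length for a, d in zip(activation, direction)]


# Toy example (assumed values, not taken from any real model).
activation = [1.0, 2.0, 2.0]
feature_direction = [0.0, 3.0, 4.0]
steered = steer(activation, feature_direction, coefficient=-2.0)
```

In a real model this addition would be applied to a chosen layer's activations at every token position during the forward pass; the layer and coefficient are choices the sketch leaves open.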