Greg Kocher — LessWrong

22

8mo

Can Models be Evaluation Aware Without Explicit Verbalization?

by gersonkroiz, Greg Kocher, and Tim Hua

Authors: Gerson Kroiz*, Greg Kocher*, Tim Hua

Gerson and Greg are co-first authors, advised by Tim. This is a research project from Neel Nanda’s MATS 9.0 exploration stream.

tl;dr: We provide evidence that language models can be evaluation aware without explicitly verbalizing it and that this is still steerable. We...

Nov 8, 2025 • 26