Do LLMs Comply Differently During Tests? Is This a Hidden Variable in Safety Evaluation? And Can We Steer That?
This post is based on our paper Linear Control of Test Awareness Reveals Differential Compliance in Reasoning Models by Sahar Abdelnabi and Ahmed Salem from Microsoft.

The Hawthorne Effect for AI

You may have heard of the Hawthorne effect, the phenomenon where people change their behavior when they know they're being...