Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data
by cloud, mle, and Owain_Evans
Authors: Alex Cloud*, Minh Le*, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, Owain Evans (*Equal contribution, randomly ordered)
tl;dr. We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a "student" model learns...
Jul 22, 2025