It's Owl in the Numbers: Token Entanglement in Subliminal Learning
By Amir Zur (Stanford), Alex Loftus (Northeastern), Hadas Orgad (Technion), Zhuofan (Josh) Ying (Columbia/CBAI), Kerem Sahin (Northeastern), and David Bau (Northeastern)

Links: Interactive Demo | Code | Website

Summary

We investigate subliminal learning, where a language model fine-tuned on seemingly meaningless data from a teacher model acquires the teacher's hidden...