I Am Large, I Contain Multitudes: Persona Transmission via Contextual Inference in LLMs
We demonstrate that LLMs can infer information about past personas from a set of nonsensical but innocuous questions and binary answers (“Yes.” vs “No.”, inspired by past work on deception detection) in context, and act upon them in safety-related questions. This is despite the questions bearing no semantic relation to...