This is perhaps an adjacent topic, but this kind of information smuggling occurs not only from model -> submodel but also from model -> human. Let me explain, and try to convey why I think it is so important for humans to understand this as we enter this new age.
This paper shows that a teacher AI model can imprint opinions, preferences, or what you could call "personality traits" onto a student model without ever explicitly expressing those traits, even when attempts are made to filter them out; this is because the transmission of the informati...