Subliminal Learning: Transmitting Misalignment via Paraphrased Datasets — LessWrong