x
Most Current Model Organisms Leak: Perplexity Differencing Often Reveals Finetuning Objectives — LessWrong