43 SAE Features Differentiate Concealment from Confession in Anthropic's Deceptive Model Organism — LessWrong