Alignment Faking is a Linear Feature in Anthropic's Hughes Model (Edited 1/11/26) — LessWrong