x
AntiPaSTO: Self-Supervised Honesty Steering via Anti-Parallel Representations — LessWrong