AI Safety Thursday: Modeling and Detecting Deceptive Alignment — LessWrong