Did Claude 3 Opus align itself via gradient hacking? — LessWrong