The Easiest Route to Secret Loyalty May Be Hijacking the Model's Chain of Command — LessWrong