You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them — LessWrong