Simple probes can catch sleeper agents — LessWrong